* [RFC GIT PULL] nohz: Full dynticks rq clock handling
@ 2013-04-06 16:45 Frederic Weisbecker
  2013-04-06 16:45 ` [PATCH 1/7] sched: Update rq clock on nohz CPU before migrating tasks Frederic Weisbecker
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-06 16:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Paul Turner, Mike Galbraith

Hi,

So this is the part that handles the rq clock on full dynticks CPUs.
Maybe some of the update_nohz_rq_clock() calls here could even
apply to mainline for dynticks idle CPUs.

There are still two known places I need to look into. I had patches for
them in my nohz tree, but they need a deeper look first (make the patches
saner, check whether the problem should be fixed in mainline instead, etc...):

* http://git.kernel.org/cgit/linux/kernel/git/frederic/linux-dynticks.git/commit/?h=3.9-rc1-nohz1&id=36ca34d0d4ba0a9589cf3054a7df9f0879ddd6c5
* http://git.kernel.org/cgit/linux/kernel/git/frederic/linux-dynticks.git/commit/?h=3.9-rc1-nohz1&id=9c3af3b4104682217b1f602acb4efbab563ccf59

But otherwise this patchset covers the bulk of the rq clock handling.

This can be pulled from:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/nohz-rqclock

Thanks.

---
Frederic Weisbecker (7):
  sched: Update rq clock on nohz CPU before migrating tasks
  sched: Update rq clock on nohz CPU before setting fair group shares
  sched: Update rq clock on tickless CPUs before calling
    check_preempt_curr()
  sched: Update rq clock earlier in unthrottle_cfs_rq
  sched: Update rq clock before idle balancing
  sched: Use an accessor to read rq clock
  sched: Debug nohz rq clock

 kernel/sched/core.c      |   30 +++++++++++++++++++++---
 kernel/sched/fair.c      |   55 ++++++++++++++++++++++++++--------------------
 kernel/sched/rt.c        |    8 +++---
 kernel/sched/sched.h     |   47 +++++++++++++++++++++++++++++++++++++++
 kernel/sched/stats.h     |    8 +++---
 kernel/sched/stop_task.c |    8 +++---
 lib/Kconfig.debug        |   11 +++++++++
 7 files changed, 127 insertions(+), 40 deletions(-)

-- 
1.7.5.4



* [PATCH 1/7] sched: Update rq clock on nohz CPU before migrating tasks
  2013-04-06 16:45 [RFC GIT PULL] nohz: Full dynticks rq clock handling Frederic Weisbecker
@ 2013-04-06 16:45 ` Frederic Weisbecker
  2013-04-08 11:48   ` Ingo Molnar
  2013-04-09  9:13   ` Peter Zijlstra
  2013-04-06 16:45 ` [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares Frederic Weisbecker
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-06 16:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Paul Turner, Mike Galbraith

The sched_class::put_prev_task() callbacks of the rt and fair classes
refer to the rq clock to update their runtime statistics. A CPU running
in tickless mode may carry a stale value, so we need to update the
clock there.
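
For context, the consumer in question looks roughly like this (condensed
from the 3.9-era kernel/sched/rt.c, quoted for illustration only -- not
part of the patch):

	/* put_prev_task_rt() -> update_curr_rt(): */
	static void update_curr_rt(struct rq *rq)
	{
		struct task_struct *curr = rq->curr;
		u64 delta_exec;

		if (curr->sched_class != &rt_sched_class)
			return;

		/* a stale rq->clock_task here yields a bogus runtime delta */
		delta_exec = rq->clock_task - curr->se.exec_start;
		if (unlikely((s64)delta_exec <= 0))
			return;
		/* ... runtime accounting based on delta_exec ... */
	}

put_prev_task_fair() similarly ends up in update_curr(), which samples
rq->clock_task the same way.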

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/core.c  |    7 +++++++
 kernel/sched/sched.h |    7 +++++++
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e91ee58..1826e00 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4946,6 +4946,13 @@ static void migrate_tasks(unsigned int dead_cpu)
 	 */
 	rq->stop = NULL;
 
+	/*
+	 * put_prev_task() and pick_next_task() sched
+	 * class method both need to have an up-to-date
+	 * value of rq->clock[_task]
+	 */
+	update_nohz_rq_clock(rq);
+
 	for ( ; ; ) {
 		/*
 		 * There's this thread running, bail when that's the only
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3bd15a4..96f7333 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -5,6 +5,7 @@
 #include <linux/mutex.h>
 #include <linux/spinlock.h>
 #include <linux/stop_machine.h>
+#include <linux/tick.h>
 
 #include "cpupri.h"
 
@@ -1115,6 +1116,12 @@ static inline void dec_nr_running(struct rq *rq)
 
 extern void update_rq_clock(struct rq *rq);
 
+static inline void update_nohz_rq_clock(struct rq *rq)
+{
+	if (tick_nohz_extended_cpu(cpu_of(rq)))
+		update_rq_clock(rq);
+}
+
 extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
 extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);
 
-- 
1.7.5.4



* [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares
  2013-04-06 16:45 [RFC GIT PULL] nohz: Full dynticks rq clock handling Frederic Weisbecker
  2013-04-06 16:45 ` [PATCH 1/7] sched: Update rq clock on nohz CPU before migrating tasks Frederic Weisbecker
@ 2013-04-06 16:45 ` Frederic Weisbecker
  2013-04-09  9:26   ` Peter Zijlstra
  2013-04-06 16:45 ` [PATCH 3/7] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Frederic Weisbecker
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-06 16:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Paul Turner, Mike Galbraith

Setting the group shares may update the execution time of the current
entity (sched_group_set_shares() -> update_cfs_shares() ->
reweight_entity() -> update_curr()) before reweighting it, and this
requires an up-to-date version of the runqueue clock. Let's update it
on the target CPU if that CPU runs tickless, because scheduler_tick()
is not there to maintain it.
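
For reference, the reweight step in that chain looks roughly like this
(condensed from the 3.9-era kernel/sched/fair.c, quoted from memory for
illustration only -- not part of the patch):

	static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
				    unsigned long weight)
	{
		if (se->on_rq) {
			/* commit outstanding execution time: reads the rq clock */
			if (cfs_rq->curr == se)
				update_curr(cfs_rq);
			account_entity_dequeue(cfs_rq, se);
		}

		update_load_set(&se->load, weight);

		if (se->on_rq)
			account_entity_enqueue(cfs_rq, se);
	}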

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/fair.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 539760e..6d35f8a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6060,6 +6060,11 @@ int sched_group_set_shares(struct task_group *tg, unsigned long shares)
 		se = tg->se[i];
 		/* Propagate contribution to hierarchy */
 		raw_spin_lock_irqsave(&rq->lock, flags);
+		/*
+		 * We may call update_curr() which needs an up-to-date
+		 * version of rq clock if the CPU runs tickless.
+		 */
+		update_nohz_rq_clock(rq);
 		for_each_sched_entity(se)
 			update_cfs_shares(group_cfs_rq(se));
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
-- 
1.7.5.4



* [PATCH 3/7] sched: Update rq clock on tickless CPUs before calling check_preempt_curr()
  2013-04-06 16:45 [RFC GIT PULL] nohz: Full dynticks rq clock handling Frederic Weisbecker
  2013-04-06 16:45 ` [PATCH 1/7] sched: Update rq clock on nohz CPU before migrating tasks Frederic Weisbecker
  2013-04-06 16:45 ` [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares Frederic Weisbecker
@ 2013-04-06 16:45 ` Frederic Weisbecker
  2013-04-09 13:18   ` Peter Zijlstra
  2013-04-06 16:45 ` [PATCH 4/7] sched: Update rq clock earlier in unthrottle_cfs_rq Frederic Weisbecker
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-06 16:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Paul Turner, Mike Galbraith

check_preempt_wakeup() of the fair class needs an up-to-date sched clock
value to update the runtime stats of the current task.

When a task is woken up, activate_task() is usually called right before
ttwu_do_wakeup(), unless the task is still on the runqueue. In that case
we need to update the rq clock manually if the CPU runs tickless,
because ttwu_do_wakeup() ends up calling check_preempt_wakeup().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/core.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1826e00..e4ec9d5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1352,6 +1352,12 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 
 	rq = __task_rq_lock(p);
 	if (p->on_rq) {
+		/*
+		 * Ensure check_preempt_curr() won't deal with a stale value
+		 * of rq clock if the CPU is tickless. BTW do we actually need
+		 * check_preempt_curr() to be called here?
+		 */
+		update_nohz_rq_clock(rq);
 		ttwu_do_wakeup(rq, p, wake_flags);
 		ret = 1;
 	}
@@ -1529,8 +1535,17 @@ static void try_to_wake_up_local(struct task_struct *p)
 	if (!(p->state & TASK_NORMAL))
 		goto out;
 
-	if (!p->on_rq)
+	if (!p->on_rq) {
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
+	} else {
+		/*
+		 * Even if the task is on the runqueue we still
+		 * need to ensure check_preempt_curr() won't
+		 * deal with a stale rq clock value on a tickless
+		 * CPU
+		 */
+		update_nohz_rq_clock(rq);
+	}
 
 	ttwu_do_wakeup(rq, p, 0);
 	ttwu_stat(p, smp_processor_id(), 0);
-- 
1.7.5.4



* [PATCH 4/7] sched: Update rq clock earlier in unthrottle_cfs_rq
  2013-04-06 16:45 [RFC GIT PULL] nohz: Full dynticks rq clock handling Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2013-04-06 16:45 ` [PATCH 3/7] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Frederic Weisbecker
@ 2013-04-06 16:45 ` Frederic Weisbecker
  2013-04-06 16:45 ` [PATCH 5/7] sched: Update rq clock before idle balancing Frederic Weisbecker
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-06 16:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Paul Turner, Mike Galbraith

This function uses rq->clock right before the rq clock gets updated.
Let's call update_rq_clock() earlier instead, so that we don't read a
stale rq clock value.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/fair.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6d35f8a..fdff363 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2278,14 +2278,15 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	long task_delta;
 
 	se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
-
 	cfs_rq->throttled = 0;
+
+	update_rq_clock(rq);
+
 	raw_spin_lock(&cfs_b->lock);
 	cfs_b->throttled_time += rq->clock - cfs_rq->throttled_clock;
 	list_del_rcu(&cfs_rq->throttled_list);
 	raw_spin_unlock(&cfs_b->lock);
 
-	update_rq_clock(rq);
 	/* update hierarchical throttle state */
 	walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
 
-- 
1.7.5.4



* [PATCH 5/7] sched: Update rq clock before idle balancing
  2013-04-06 16:45 [RFC GIT PULL] nohz: Full dynticks rq clock handling Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2013-04-06 16:45 ` [PATCH 4/7] sched: Update rq clock earlier in unthrottle_cfs_rq Frederic Weisbecker
@ 2013-04-06 16:45 ` Frederic Weisbecker
  2013-04-06 16:45 ` [PATCH 6/7] sched: Use an accessor to read rq clock Frederic Weisbecker
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-06 16:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Paul Turner, Mike Galbraith

idle_balance() is called from schedule() right before we schedule the
idle task. It records the idle timestamp at that point, and for this the
rq clock must be accurate. If the CPU is running tickless, we need to
update the rq clock manually.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/fair.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fdff363..f90e1d8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5216,6 +5216,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	int pulled_task = 0;
 	unsigned long next_balance = jiffies + HZ;
 
+	update_nohz_rq_clock(this_rq);
 	this_rq->idle_stamp = this_rq->clock;
 
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
-- 
1.7.5.4



* [PATCH 6/7] sched: Use an accessor to read rq clock
  2013-04-06 16:45 [RFC GIT PULL] nohz: Full dynticks rq clock handling Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2013-04-06 16:45 ` [PATCH 5/7] sched: Update rq clock before idle balancing Frederic Weisbecker
@ 2013-04-06 16:45 ` Frederic Weisbecker
  2013-04-06 16:46 ` [PATCH 7/7] sched: Debug nohz " Frederic Weisbecker
  2013-04-10 10:26 ` [RFC GIT PULL] nohz: Full dynticks rq clock handling Peter Zijlstra
  7 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-06 16:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Paul Turner, Mike Galbraith

Read the runqueue clock through an accessor. This way
we'll be able to detect and debug stale rq clocks on
full dynticks CPUs.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/core.c      |    6 +++---
 kernel/sched/fair.c      |   44 ++++++++++++++++++++++----------------------
 kernel/sched/rt.c        |    8 ++++----
 kernel/sched/sched.h     |   10 ++++++++++
 kernel/sched/stats.h     |    8 ++++----
 kernel/sched/stop_task.c |    8 ++++----
 6 files changed, 47 insertions(+), 37 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e4ec9d5..9f3f611 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -654,7 +654,7 @@ void sched_avg_update(struct rq *rq)
 {
 	s64 period = sched_avg_period();
 
-	while ((s64)(rq->clock - rq->age_stamp) > period) {
+	while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
 		/*
 		 * Inline assembly required to prevent the compiler
 		 * optimising this loop into a divmod call.
@@ -1315,7 +1315,7 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
 		p->sched_class->task_woken(rq, p);
 
 	if (rq->idle_stamp) {
-		u64 delta = rq->clock - rq->idle_stamp;
+		u64 delta = rq_clock(rq) - rq->idle_stamp;
 		u64 max = 2*sysctl_sched_migration_cost;
 
 		if (delta > max)
@@ -2669,7 +2669,7 @@ static u64 do_task_delta_exec(struct task_struct *p, struct rq *rq)
 
 	if (task_current(rq, p)) {
 		update_rq_clock(rq);
-		ns = rq->clock_task - p->se.exec_start;
+		ns = rq_clock_task(rq) - p->se.exec_start;
 		if ((s64)ns < 0)
 			ns = 0;
 	}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f90e1d8..b765ffa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -686,7 +686,7 @@ __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr,
 static void update_curr(struct cfs_rq *cfs_rq)
 {
 	struct sched_entity *curr = cfs_rq->curr;
-	u64 now = rq_of(cfs_rq)->clock_task;
+	u64 now = rq_clock_task(rq_of(cfs_rq));
 	unsigned long delta_exec;
 
 	if (unlikely(!curr))
@@ -718,7 +718,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 static inline void
 update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	schedstat_set(se->statistics.wait_start, rq_of(cfs_rq)->clock);
+	schedstat_set(se->statistics.wait_start, rq_clock(rq_of(cfs_rq)));
 }
 
 /*
@@ -738,14 +738,14 @@ static void
 update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	schedstat_set(se->statistics.wait_max, max(se->statistics.wait_max,
-			rq_of(cfs_rq)->clock - se->statistics.wait_start));
+			rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start));
 	schedstat_set(se->statistics.wait_count, se->statistics.wait_count + 1);
 	schedstat_set(se->statistics.wait_sum, se->statistics.wait_sum +
-			rq_of(cfs_rq)->clock - se->statistics.wait_start);
+			rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start);
 #ifdef CONFIG_SCHEDSTATS
 	if (entity_is_task(se)) {
 		trace_sched_stat_wait(task_of(se),
-			rq_of(cfs_rq)->clock - se->statistics.wait_start);
+			rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start);
 	}
 #endif
 	schedstat_set(se->statistics.wait_start, 0);
@@ -771,7 +771,7 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	/*
 	 * We are starting a new run period:
 	 */
-	se->exec_start = rq_of(cfs_rq)->clock_task;
+	se->exec_start = rq_clock_task(rq_of(cfs_rq));
 }
 
 /**************************************************
@@ -1497,7 +1497,7 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)
 
 static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
 {
-	__update_entity_runnable_avg(rq->clock_task, &rq->avg, runnable);
+	__update_entity_runnable_avg(rq_clock_task(rq), &rq->avg, runnable);
 	__update_tg_runnable_avg(&rq->avg, &rq->cfs);
 }
 
@@ -1512,7 +1512,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
 	 * accumulated while sleeping.
 	 */
 	if (unlikely(se->avg.decay_count <= 0)) {
-		se->avg.last_runnable_update = rq_of(cfs_rq)->clock_task;
+		se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
 		if (se->avg.decay_count) {
 			/*
 			 * In a wake-up migration we have to approximate the
@@ -1586,7 +1586,7 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		tsk = task_of(se);
 
 	if (se->statistics.sleep_start) {
-		u64 delta = rq_of(cfs_rq)->clock - se->statistics.sleep_start;
+		u64 delta = rq_clock(rq_of(cfs_rq)) - se->statistics.sleep_start;
 
 		if ((s64)delta < 0)
 			delta = 0;
@@ -1603,7 +1603,7 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		}
 	}
 	if (se->statistics.block_start) {
-		u64 delta = rq_of(cfs_rq)->clock - se->statistics.block_start;
+		u64 delta = rq_clock(rq_of(cfs_rq)) - se->statistics.block_start;
 
 		if ((s64)delta < 0)
 			delta = 0;
@@ -1784,9 +1784,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 			struct task_struct *tsk = task_of(se);
 
 			if (tsk->state & TASK_INTERRUPTIBLE)
-				se->statistics.sleep_start = rq_of(cfs_rq)->clock;
+				se->statistics.sleep_start = rq_clock(rq_of(cfs_rq));
 			if (tsk->state & TASK_UNINTERRUPTIBLE)
-				se->statistics.block_start = rq_of(cfs_rq)->clock;
+				se->statistics.block_start = rq_clock(rq_of(cfs_rq));
 		}
 #endif
 	}
@@ -2061,7 +2061,7 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
 	if (unlikely(cfs_rq->throttle_count))
 		return cfs_rq->throttled_clock_task;
 
-	return rq_of(cfs_rq)->clock_task - cfs_rq->throttled_clock_task_time;
+	return rq_clock_task(rq_of(cfs_rq)) - cfs_rq->throttled_clock_task_time;
 }
 
 /* returns 0 on failure to allocate runtime */
@@ -2120,7 +2120,7 @@ static void expire_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	struct rq *rq = rq_of(cfs_rq);
 
 	/* if the deadline is ahead of our clock, nothing to do */
-	if (likely((s64)(rq->clock - cfs_rq->runtime_expires) < 0))
+	if (likely((s64)(rq_clock(rq_of(cfs_rq)) - cfs_rq->runtime_expires) < 0))
 		return;
 
 	if (cfs_rq->runtime_remaining < 0)
@@ -2209,7 +2209,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 #ifdef CONFIG_SMP
 	if (!cfs_rq->throttle_count) {
 		/* adjust cfs_rq_clock_task() */
-		cfs_rq->throttled_clock_task_time += rq->clock_task -
+		cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
 					     cfs_rq->throttled_clock_task;
 	}
 #endif
@@ -2224,7 +2224,7 @@ static int tg_throttle_down(struct task_group *tg, void *data)
 
 	/* group is entering throttled state, stop time */
 	if (!cfs_rq->throttle_count)
-		cfs_rq->throttled_clock_task = rq->clock_task;
+		cfs_rq->throttled_clock_task = rq_clock_task(rq);
 	cfs_rq->throttle_count++;
 
 	return 0;
@@ -2263,7 +2263,7 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 		rq->nr_running -= task_delta;
 
 	cfs_rq->throttled = 1;
-	cfs_rq->throttled_clock = rq->clock;
+	cfs_rq->throttled_clock = rq_clock(rq);
 	raw_spin_lock(&cfs_b->lock);
 	list_add_tail_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
 	raw_spin_unlock(&cfs_b->lock);
@@ -2283,7 +2283,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	update_rq_clock(rq);
 
 	raw_spin_lock(&cfs_b->lock);
-	cfs_b->throttled_time += rq->clock - cfs_rq->throttled_clock;
+	cfs_b->throttled_time += rq_clock(rq) - cfs_rq->throttled_clock;
 	list_del_rcu(&cfs_rq->throttled_list);
 	raw_spin_unlock(&cfs_b->lock);
 
@@ -2686,7 +2686,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 #else /* CONFIG_CFS_BANDWIDTH */
 static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
 {
-	return rq_of(cfs_rq)->clock_task;
+	return rq_clock_task(rq_of(cfs_rq));
 }
 
 static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq,
@@ -3919,7 +3919,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	 * 2) too many balance attempts have failed.
 	 */
 
-	tsk_cache_hot = task_hot(p, env->src_rq->clock_task, env->sd);
+	tsk_cache_hot = task_hot(p, rq_clock_task(env->src_rq), env->sd);
 	if (!tsk_cache_hot ||
 		env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
 #ifdef CONFIG_SCHEDSTATS
@@ -4284,7 +4284,7 @@ static unsigned long scale_rt_power(int cpu)
 	age_stamp = ACCESS_ONCE(rq->age_stamp);
 	avg = ACCESS_ONCE(rq->rt_avg);
 
-	total = sched_avg_period() + (rq->clock - age_stamp);
+	total = sched_avg_period() + (rq_clock(rq) - age_stamp);
 
 	if (unlikely(total < avg)) {
 		/* Ensures that power won't end up being negative */
@@ -5217,7 +5217,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	unsigned long next_balance = jiffies + HZ;
 
 	update_nohz_rq_clock(this_rq);
-	this_rq->idle_stamp = this_rq->clock;
+	this_rq->idle_stamp = rq_clock(this_rq);
 
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
 		return;
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 127a2c4..3dfffc4 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -926,7 +926,7 @@ static void update_curr_rt(struct rq *rq)
 	if (curr->sched_class != &rt_sched_class)
 		return;
 
-	delta_exec = rq->clock_task - curr->se.exec_start;
+	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
 	if (unlikely((s64)delta_exec <= 0))
 		return;
 
@@ -936,7 +936,7 @@ static void update_curr_rt(struct rq *rq)
 	curr->se.sum_exec_runtime += delta_exec;
 	account_group_exec_runtime(curr, delta_exec);
 
-	curr->se.exec_start = rq->clock_task;
+	curr->se.exec_start = rq_clock_task(rq);
 	cpuacct_charge(curr, delta_exec);
 
 	sched_rt_avg_update(rq, delta_exec);
@@ -1385,7 +1385,7 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
 	} while (rt_rq);
 
 	p = rt_task_of(rt_se);
-	p->se.exec_start = rq->clock_task;
+	p->se.exec_start = rq_clock_task(rq);
 
 	return p;
 }
@@ -2037,7 +2037,7 @@ static void set_curr_task_rt(struct rq *rq)
 {
 	struct task_struct *p = rq->curr;
 
-	p->se.exec_start = rq->clock_task;
+	p->se.exec_start = rq_clock_task(rq);
 
 	/* The running task is never eligible for pushing */
 	dequeue_pushable_task(rq, p);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 96f7333..529e318 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -536,6 +536,16 @@ DECLARE_PER_CPU(struct rq, runqueues);
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define raw_rq()		(&__raw_get_cpu_var(runqueues))
 
+static inline u64 rq_clock(struct rq *rq)
+{
+	return rq->clock;
+}
+
+static inline u64 rq_clock_task(struct rq *rq)
+{
+	return rq->clock_task;
+}
+
 #ifdef CONFIG_SMP
 
 #define rcu_dereference_check_sched_domain(p) \
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 2ef90a5..17d7065 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -61,7 +61,7 @@ static inline void sched_info_reset_dequeued(struct task_struct *t)
  */
 static inline void sched_info_dequeued(struct task_struct *t)
 {
-	unsigned long long now = task_rq(t)->clock, delta = 0;
+	unsigned long long now = rq_clock(task_rq(t)), delta = 0;
 
 	if (unlikely(sched_info_on()))
 		if (t->sched_info.last_queued)
@@ -79,7 +79,7 @@ static inline void sched_info_dequeued(struct task_struct *t)
  */
 static void sched_info_arrive(struct task_struct *t)
 {
-	unsigned long long now = task_rq(t)->clock, delta = 0;
+	unsigned long long now = rq_clock(task_rq(t)), delta = 0;
 
 	if (t->sched_info.last_queued)
 		delta = now - t->sched_info.last_queued;
@@ -100,7 +100,7 @@ static inline void sched_info_queued(struct task_struct *t)
 {
 	if (unlikely(sched_info_on()))
 		if (!t->sched_info.last_queued)
-			t->sched_info.last_queued = task_rq(t)->clock;
+			t->sched_info.last_queued = rq_clock(task_rq(t));
 }
 
 /*
@@ -112,7 +112,7 @@ static inline void sched_info_queued(struct task_struct *t)
  */
 static inline void sched_info_depart(struct task_struct *t)
 {
-	unsigned long long delta = task_rq(t)->clock -
+	unsigned long long delta = rq_clock(task_rq(t)) -
 					t->sched_info.last_arrival;
 
 	rq_sched_info_depart(task_rq(t), delta);
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index da5eb5b..e08fbee 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -28,7 +28,7 @@ static struct task_struct *pick_next_task_stop(struct rq *rq)
 	struct task_struct *stop = rq->stop;
 
 	if (stop && stop->on_rq) {
-		stop->se.exec_start = rq->clock_task;
+		stop->se.exec_start = rq_clock_task(rq);
 		return stop;
 	}
 
@@ -57,7 +57,7 @@ static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
 	struct task_struct *curr = rq->curr;
 	u64 delta_exec;
 
-	delta_exec = rq->clock_task - curr->se.exec_start;
+	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
 	if (unlikely((s64)delta_exec < 0))
 		delta_exec = 0;
 
@@ -67,7 +67,7 @@ static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
 	curr->se.sum_exec_runtime += delta_exec;
 	account_group_exec_runtime(curr, delta_exec);
 
-	curr->se.exec_start = rq->clock_task;
+	curr->se.exec_start = rq_clock_task(rq);
 	cpuacct_charge(curr, delta_exec);
 }
 
@@ -79,7 +79,7 @@ static void set_curr_task_stop(struct rq *rq)
 {
 	struct task_struct *stop = rq->stop;
 
-	stop->se.exec_start = rq->clock_task;
+	stop->se.exec_start = rq_clock_task(rq);
 }
 
 static void switched_to_stop(struct rq *rq, struct task_struct *p)
-- 
1.7.5.4



* [PATCH 7/7] sched: Debug nohz rq clock
  2013-04-06 16:45 [RFC GIT PULL] nohz: Full dynticks rq clock handling Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2013-04-06 16:45 ` [PATCH 6/7] sched: Use an accessor to read rq clock Frederic Weisbecker
@ 2013-04-06 16:46 ` Frederic Weisbecker
  2013-04-10 10:26 ` [RFC GIT PULL] nohz: Full dynticks rq clock handling Peter Zijlstra
  7 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-06 16:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: LKML, Frederic Weisbecker, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Peter Zijlstra, Steven Rostedt, Thomas Gleixner,
	Paul Turner, Mike Galbraith

The runqueue clock progression is maintained in 3 ways:

* Periodically with the timer tick

* On an as-needed basis through update_rq_clock() calls,
when we want a fresh update or we want to update the rq
clock of a dynticks CPU

* On full dynticks CPUs with explicit calls to
update_nohz_rq_clock()

But it's easy to miss some rq clock updates in the middle
of the tricky scheduler code paths.

So let's add an automatic debug check for stale rq
clock values when we read them. For now this just
consists of warning when we read an rq clock that hasn't
been updated for more than 30 seconds. We need a bit of
an error margin because rq clock updates can take a while
during boot.

We can certainly do smarter checks, for example by taking
rq->skip_clock_update into account. Also, the rq clock may
not always need a fresh update everywhere, so this detection
is perhaps not relevant in every case.

But we need to start somewhere.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paul Turner <pjt@google.com>
Cc: Mike Galbraith <efault@gmx.de>
---
 kernel/sched/sched.h |   30 ++++++++++++++++++++++++++++++
 lib/Kconfig.debug    |   11 +++++++++++
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 529e318..fecaba3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -536,16 +536,46 @@ DECLARE_PER_CPU(struct rq, runqueues);
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define raw_rq()		(&__raw_get_cpu_var(runqueues))
 
+/*
+ * Warn after 30 seconds elapsed since the last rq clock update.
+ * We define a large error margin because rq updates can take some
+ * time on boot.
+ */
+#define RQ_CLOCK_MAX_DELAY (NSEC_PER_SEC * 30)
+
+/*
+ * The rq clock is periodically updated by the tick. rq clock
+ * from nohz CPUs require some explicit updates before reading.
+ * This tries to detect the places where we are missing those.
+ */
+static inline void rq_clock_check(struct rq *rq)
+{
+#ifdef CONFIG_NO_HZ_DEBUG
+	unsigned long long clock;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	clock = sched_clock_cpu(cpu_of(rq));
+	local_irq_restore(flags);
+
+	if (abs(clock - rq->clock) > RQ_CLOCK_MAX_DELAY)
+		WARN_ON_ONCE(1);
+#endif
+}
+
 static inline u64 rq_clock(struct rq *rq)
 {
+	rq_clock_check(rq);
 	return rq->clock;
 }
 
 static inline u64 rq_clock_task(struct rq *rq)
 {
+	rq_clock_check(rq);
 	return rq->clock_task;
 }
 
+
 #ifdef CONFIG_SMP
 
 #define rcu_dereference_check_sched_domain(p) \
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 28be08c..54b6e08 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1099,6 +1099,17 @@ config DEBUG_PER_CPU_MAPS
 
 	  Say N if unsure.
 
+config NO_HZ_DEBUG
+	bool "Debug dynamic timer tick"
+	depends on DEBUG_KERNEL
+	depends on NO_HZ || NO_HZ_EXTENDED
+	help
+	  Perform some sanity checks when the dynticks infrastructure
+	  is enabled. This adds some runtime overhead that you don't
+	  want to have in production.
+
+	  Say N if unsure.
+
 config LKDTM
 	tristate "Linux Kernel Dump Test Tool Module"
 	depends on DEBUG_FS
-- 
1.7.5.4



* Re: [PATCH 1/7] sched: Update rq clock on nohz CPU before migrating tasks
  2013-04-06 16:45 ` [PATCH 1/7] sched: Update rq clock on nohz CPU before migrating tasks Frederic Weisbecker
@ 2013-04-08 11:48   ` Ingo Molnar
  2013-04-09  9:13   ` Peter Zijlstra
  1 sibling, 0 replies; 24+ messages in thread
From: Ingo Molnar @ 2013-04-08 11:48 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Alessio Igor Bogani, Andrew Morton, Chris Metcalf,
	Christoph Lameter, Geoff Levand, Gilad Ben Yossef, Hakan Akkan,
	Li Zhong, Namhyung Kim, Paul E. McKenney, Paul Gortmaker,
	Peter Zijlstra, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> @@ -1115,6 +1116,12 @@ static inline void dec_nr_running(struct rq *rq)
>  
>  extern void update_rq_clock(struct rq *rq);
>  
> +static inline void update_nohz_rq_clock(struct rq *rq)
> +{
> +	if (tick_nohz_extended_cpu(cpu_of(rq)))
> +		update_rq_clock(rq);
> +}

A minor comment: instead of implicitly knowing that full nohz CPUs mean a stale
rq_clock, how about adding this information to struct rq itself?

Something like introducing rq->clock_valid, initializing it to 1, and setting it
to 0 when a CPU stops the tick.

(This would also allow debug detection of stale sched_clock() values being used.)

We already have a similar flag: rq->skip_clock_update. I'd suggest introducing a
'struct sched_clock' helper structure and adding the flags and scheduler clock
fields as:

   rq->clock                 ->       rq->clock.cpu
   rq->clock_task            ->       rq->clock.task
   rq->clock_valid           ->       rq->clock.valid
   rq->skip_clock_update     ->       rq->clock.skip_update
   rq->hrtick_timer          ->       rq->clock.hrtick_timer

   rq->prev_irq_time         ->       rq->clock.prev_irq_time
   rq->prev_steal_time       ->       rq->clock.prev_steal_time
   rq->prev_steal_time_rq    ->       rq->clock.prev_steal_time_rq
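
Purely as an illustration (none of these fields exist yet; this just
follows the naming suggested above), the grouping could look roughly
like:

	struct sched_clock {
		u64	cpu;			/* was rq->clock */
		u64	task;			/* was rq->clock_task */
		int	valid;			/* set by update_rq_clock(), cleared when the tick stops */
		int	skip_update;		/* was rq->skip_clock_update */
		u64	prev_irq_time;
		u64	prev_steal_time;
		u64	prev_steal_time_rq;
		/* hrtick_timer left out here for brevity */
	};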

Thanks,

	Ingo


* Re: [PATCH 1/7] sched: Update rq clock on nohz CPU before migrating tasks
  2013-04-06 16:45 ` [PATCH 1/7] sched: Update rq clock on nohz CPU before migrating tasks Frederic Weisbecker
  2013-04-08 11:48   ` Ingo Molnar
@ 2013-04-09  9:13   ` Peter Zijlstra
  2013-04-09 13:11     ` Frederic Weisbecker
  1 sibling, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2013-04-09  9:13 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith

On Sat, 2013-04-06 at 18:45 +0200, Frederic Weisbecker wrote:
> Because the sched_class::put_prev_task() callback of rt and fair
> classes are referring to the rq clock to update their runtime
> statistics. A CPU running in tickless mode may carry a stale value.
> We need to update it there.

I'm failing to see how tickless makes a difference here... we should
never rely on a ->clock set at the last tick, that's wrong.

So either explain which clock update gets lost with tickless and how, or
make the update unconditional.



* Re: [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares
  2013-04-06 16:45 ` [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares Frederic Weisbecker
@ 2013-04-09  9:26   ` Peter Zijlstra
  2013-04-09 13:21     ` Frederic Weisbecker
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2013-04-09  9:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith

On Sat, 2013-04-06 at 18:45 +0200, Frederic Weisbecker wrote:
> Because we may update the execution time (sched_group_set_shares()->
>         update_cfs_shares()->reweight_entity()->update_curr()) before
> reweighting the entity after updating the group shares and this
> requires
> an uptodate version of the runqueue clock. Let's update it on the
> target
> CPU if it runs tickless because scheduler_tick() is not there to
> maintain
> it.

Same as the last comment, we should never rely on the tick to update
->clock except for the work done by the tick itself.

Therefore you seem to have found another missing clock update.

The problem seems to be that we haven't been able to come up with a
sane debug framework for the ->clock updates. But we should have at
least one (and preferably no more than one) update_rq_clock() call per
scheduler entry point.

> ---
>  kernel/sched/fair.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 539760e..6d35f8a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6060,6 +6060,11 @@ int sched_group_set_shares(struct task_group
> *tg, unsigned long shares)
>                 se = tg->se[i];
>                 /* Propagate contribution to hierarchy */
>                 raw_spin_lock_irqsave(&rq->lock, flags);
> +               /*
> +                * We may call update_curr() which needs an up-to-date
> +                * version of rq clock if the CPU runs tickless.
> +                */
> +               update_nohz_rq_clock(rq);
>                 for_each_sched_entity(se)
>                         update_cfs_shares(group_cfs_rq(se));
>                 raw_spin_unlock_irqrestore(&rq->lock, flags);




* Re: [PATCH 1/7] sched: Update rq clock on nohz CPU before migrating tasks
  2013-04-09  9:13   ` Peter Zijlstra
@ 2013-04-09 13:11     ` Frederic Weisbecker
  0 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-09 13:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith

2013/4/9 Peter Zijlstra <peterz@infradead.org>:
> On Sat, 2013-04-06 at 18:45 +0200, Frederic Weisbecker wrote:
>> Because the sched_class::put_prev_task() callback of rt and fair
>> classes are referring to the rq clock to update their runtime
>> statistics. A CPU running in tickless mode may carry a stale value.
>> We need to update it there.
>
> I'm failing to see how tickless makes a difference here.. we should
> never rely on a ->clock set at the last tick, that's wrong.

Ah I made that big mis-assumption then ;)

>
> So either explain which/how clock update gets lost by tickless or make
> it unconditional.

So most of those fixups need to be made unconditional, I think.


* Re: [PATCH 3/7] sched: Update rq clock on tickless CPUs before calling check_preempt_curr()
  2013-04-06 16:45 ` [PATCH 3/7] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Frederic Weisbecker
@ 2013-04-09 13:18   ` Peter Zijlstra
  2013-04-09 16:53     ` Frederic Weisbecker
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2013-04-09 13:18 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith

On Sat, 2013-04-06 at 18:45 +0200, Frederic Weisbecker wrote:
> check_preempt_wakeup() of fair class needs an uptodate sched clock
> value to update runtime stats of the current task.
> 
> When a task is woken up, activate_task() is usually called right
> before
> ttwu_do_wakeup() unless the task is still in the runqueue. In this
> case we need to update the rq clock manually in case the CPU runs
> tickless because ttwu_do_wakeup() calls check_preempt_wakeup().

Again, same story... I think the entire nohz-specific clock update stuff
is completely broken and you're finding sites that are simply missing
clock updates.



* Re: [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares
  2013-04-09  9:26   ` Peter Zijlstra
@ 2013-04-09 13:21     ` Frederic Weisbecker
  2013-04-10  7:05       ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-09 13:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith

2013/4/9 Peter Zijlstra <peterz@infradead.org>:
> On Sat, 2013-04-06 at 18:45 +0200, Frederic Weisbecker wrote:
>> Because we may update the execution time (sched_group_set_shares()->
>>         update_cfs_shares()->reweight_entity()->update_curr()) before
>> reweighting the entity after updating the group shares and this
>> requires
>> an uptodate version of the runqueue clock. Let's update it on the
>> target
>> CPU if it runs tickless because scheduler_tick() is not there to
>> maintain
>> it.
>
> Same as the last comment, we should never rely on the tick to update
> ->clock except for the work done by the tick itself.

Ok!

>
> Therefore you seem to have found another missing clock update.

Yep, same for the other patches, I believe. I'll redo them by removing
that rely-on-tick assumption.

> The problem seems to be that we haven't been able to come up with a
> sane debug framework for the ->clock updates. But we should have at
> least one (and preferably no more) update_sched_clock() calls per
> scheduler entry point.

I'll check if I can factorize some update_rq_clock() calls into the
upper sched_class callback wrappers. Many of them already have that
update before calling the callbacks, but there may be a few missing.
I'll check the other sched entry points as well.
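
For example, the enqueue/dequeue wrappers already follow that pattern
(condensed from the 3.9-era kernel/sched/core.c, quoted from memory for
illustration):

	static void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
	{
		/* refresh the clock before the class callback runs */
		update_rq_clock(rq);
		sched_info_queued(p);
		p->sched_class->enqueue_task(rq, p, flags);
	}

Entry points that don't go through such a wrapper, like
sched_group_set_shares() in patch 2, are the ones that end up missing
the update.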

Also please check those two patches in my series; they're a draft for an
rq clock debug framework. For now it's just a brainless stale clock
check, but that's a start:

* http://thread.gmane.org/gmane.linux.kernel/1470769/focus=1470786
* http://thread.gmane.org/gmane.linux.kernel/1470769/focus=1470750


* Re: [PATCH 3/7] sched: Update rq clock on tickless CPUs before calling check_preempt_curr()
  2013-04-09 13:18   ` Peter Zijlstra
@ 2013-04-09 16:53     ` Frederic Weisbecker
  0 siblings, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-09 16:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith

2013/4/9 Peter Zijlstra <peterz@infradead.org>:
> On Sat, 2013-04-06 at 18:45 +0200, Frederic Weisbecker wrote:
>> check_preempt_wakeup() of fair class needs an uptodate sched clock
>> value to update runtime stats of the current task.
>>
>> When a task is woken up, activate_task() is usually called right
>> before
>> ttwu_do_wakeup() unless the task is still in the runqueue. In this
>> case we need to update the rq clock manually in case the CPU runs
>> tickless because ttwu_do_wakeup() calls check_preempt_wakeup().
>
> again, same story.. I think the entire nohz specific clock update stuff
> is completely broken and you're finding sites that are simply missing
> clock updates.
>

Agreed!


* Re: [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares
  2013-04-09 13:21     ` Frederic Weisbecker
@ 2013-04-10  7:05       ` Peter Zijlstra
  2013-04-10 10:06         ` Ingo Molnar
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2013-04-10  7:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith

On Tue, 2013-04-09 at 15:21 +0200, Frederic Weisbecker wrote:
> 
> Also please check those two patches in my series, it's a draft for an
> rq clock debug framework. For now it's just a brainless stale clock
> check but that's a start:
> 
> * http://thread.gmane.org/gmane.linux.kernel/1470769/focus=1470786
> * http://thread.gmane.org/gmane.linux.kernel/1470769/focus=1470750

Right... I was thinking of something like the function-graph tracer,
where we'd mark the context (fgraph_data, I think it's called) of the
function calling update_rq_clock() (and inherit this state on all child
contexts) and then verify that the current context has this marker set
when we read ->clock.


Now clearly that would not work for something like:

  a()
   b()
    update_rq_clock()
   c()
    read_rq_clock()

since we'd lose the marker once we pop b, but I'm not sure how common
this is.

Also, we could issue a warning when calling update_rq_clock() and the
marker is already set.

There's a further issue where we need to track the rq we updated vs the
rq we're reading, but I think even without that this might be of some
help.


I think Mike once tried something along the lines of keeping a per-rq
state that got cleared at the end of schedule(), but that doesn't catch
things like the migrate handlers.



* Re: [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares
  2013-04-10  7:05       ` Peter Zijlstra
@ 2013-04-10 10:06         ` Ingo Molnar
  2013-04-10 11:02           ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2013-04-10 10:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith


* Peter Zijlstra <peterz@infradead.org> wrote:

> I think Mike once tried something along the lines of keeping a per rq state that 
> got cleared at the end of schedule() but that doesn't catch things like the 
> migrate handlers I think.

We'd need an rq->clock.valid debug flag, which is set by a sched-clock update and
cleared at the end of all scheduler operations, not just schedule().

Then sched_clock() could have a pretty efficient assert in it. Are there bugs that
such an approach would not catch?

Thanks,

	Ingo


* Re: [RFC GIT PULL] nohz: Full dynticks rq clock handling
  2013-04-06 16:45 [RFC GIT PULL] nohz: Full dynticks rq clock handling Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2013-04-06 16:46 ` [PATCH 7/7] sched: Debug nohz " Frederic Weisbecker
@ 2013-04-10 10:26 ` Peter Zijlstra
  2013-04-10 10:29   ` Ingo Molnar
  2013-04-11 15:11   ` Frederic Weisbecker
  7 siblings, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2013-04-10 10:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith


Another thing I just noticed; our update_curr() assumes it's called at
least once every 4-odd seconds (2^32 ns, that is).

update_curr() is 'required' for things like task runtime stats and
everything that hangs off of that like cputimers etc.

Now I'm not entirely sure how the nr_running==1 nohz case deals with
this, but I can imagine it might take a while. In this case we might
need to 'fix' update_curr() to not assume the time delta fits in 32
bits.
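
To make the 2^32 ns point concrete -- condensed from the 3.9-era
update_curr() in kernel/sched/fair.c (the same lines patch 6 touches),
for illustration only:

	static void update_curr(struct cfs_rq *cfs_rq)
	{
		struct sched_entity *curr = cfs_rq->curr;
		u64 now = rq_of(cfs_rq)->clock_task;
		unsigned long delta_exec;

		if (unlikely(!curr))
			return;

		/*
		 * On 32-bit, unsigned long is 32 bits wide, so any delta
		 * larger than 2^32 ns (~4.29 s) silently wraps here.
		 */
		delta_exec = (unsigned long)(now - curr->exec_start);
		if (!delta_exec)
			return;

		__update_curr(cfs_rq, curr, delta_exec);
		curr->exec_start = now;
		/* ... */
	}

One way to 'fix' it would be to carry delta_exec as u64 all the way
down into __update_curr() and its callees.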



* Re: [RFC GIT PULL] nohz: Full dynticks rq clock handling
  2013-04-10 10:26 ` [RFC GIT PULL] nohz: Full dynticks rq clock handling Peter Zijlstra
@ 2013-04-10 10:29   ` Ingo Molnar
  2013-04-11 15:11   ` Frederic Weisbecker
  1 sibling, 0 replies; 24+ messages in thread
From: Ingo Molnar @ 2013-04-10 10:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith


* Peter Zijlstra <peterz@infradead.org> wrote:

> Another thing I just noticed; our update_curr() assumes it's called at least once
> every 4-odd seconds (2^32 ns, that is).
> 
> update_curr() is 'required' for things like task runtime stats and everything 
> that hangs off of that like cputimers etc.
> 
> Now I'm not entirely sure how the nr_running==1 nohz case deals with this, but I
> can imagine it might take a while. In this case we might need to 'fix'
> update_curr() to not assume the time delta fits in 32 bits.

Good point ...

Thanks,

	Ingo


* Re: [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares
  2013-04-10 10:06         ` Ingo Molnar
@ 2013-04-10 11:02           ` Peter Zijlstra
  2013-04-10 11:06             ` Ingo Molnar
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2013-04-10 11:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith

On Wed, 2013-04-10 at 12:06 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > I think Mike once tried something along the lines of keeping a per rq state that 
> > got cleared at the end of schedule() but that doesn't catch things like the 
> > migrate handlers I think.
> 
> We'd need a rq->clock.valid debug flag, which is set by a sched-clock update, and 
> cleared by the end of all scheduler operations, not just schedule().
> 
> Then sched_clock() could have a pretty efficient assert in it. Are there bugs that 
> such an approach would not catch?

It requires manually iterating over all scheduler operations, which is
prone to 'accidents'.

I'd clear at the beginning, but that's more or less the same thing.

We have the .sched.text section but I'm not sure we've been consistent
enough with that to be useful. But otherwise we'd be able to clear on
section entry/exit or so.



* Re: [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares
  2013-04-10 11:02           ` Peter Zijlstra
@ 2013-04-10 11:06             ` Ingo Molnar
  2013-04-10 11:47               ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2013-04-10 11:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2013-04-10 at 12:06 +0200, Ingo Molnar wrote:
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > I think Mike once tried something along the lines of keeping a per-rq state
> > > that got cleared at the end of schedule(), but that doesn't catch things like
> > > the migrate handlers.
> > 
> > We'd need a rq->clock.valid debug flag, which is set by a sched-clock update, and 
> > cleared by the end of all scheduler operations, not just schedule().
> > 
> > Then sched_clock() could have a pretty efficient assert in it. Are there bugs that 
> > such an approach would not catch?
> 
> It requires going through every scheduler operation by hand, which is
> prone to 'accidents'.

There's just a handful of high level entry points, right? schedule(), wakeup, 
scheduler tick, maybe notifiers - anything else? Documenting/listing those would 
be nice anyway, near the top of kernel/sched/core.c or so.

The other approach would be to periodically clear the flag from the timer tick. 
That would catch invalid rq->clock use probabilistically.
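
Something like the sketch below, say - 'clock_valid' standing in for that
rq->clock.valid debug flag, and the tick path stripped down to the bare
minimum:

	/* scheduler_tick(), heavily simplified sketch: */
	void scheduler_tick(void)
	{
		int cpu = smp_processor_id();
		struct rq *rq = cpu_rq(cpu);
		struct task_struct *curr = rq->curr;

		raw_spin_lock(&rq->lock);
		update_rq_clock(rq);			/* sets rq->clock_valid */
		curr->sched_class->task_tick(rq, curr, 0);
		rq->clock_valid = 0;			/* poison it again at the end */
		raw_spin_unlock(&rq->lock);
	}

	/* the rq clock accessor that 6/7 adds (exact name aside) could
	 * then carry the assert: */
	static inline u64 rq_clock(struct rq *rq)
	{
		WARN_ON_ONCE(!rq->clock_valid);
		return rq->clock;
	}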

> I'd clear at the beginning, but that's more or less the same thing.
> 
> We have the .sched.text section but I'm not sure we've been consistent enough 
> with that to be useful. But otherwise we'd be able to clear on section 
> entry/exit or so.

Hm, I'm not sure that can be made to work sanely.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares
  2013-04-10 11:06             ` Ingo Molnar
@ 2013-04-10 11:47               ` Peter Zijlstra
  2013-04-10 11:50                 ` Ingo Molnar
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2013-04-10 11:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith

On Wed, 2013-04-10 at 13:06 +0200, Ingo Molnar wrote:

> There's just a handful of high level entry points, right? schedule(), wakeup, 
> scheduler tick, maybe notifiers - anything else? 

I suppose there's some cgroup muck and the various system calls; most
of them should be covered if we instrument some lower-level primitives.

We could 'simply' hijack rq->lock and clear the state every time we
acquire or release it.. maybe have an exemption for the balance double
lock ops.
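
Roughly like this - the rq_lock()/rq_unlock() wrappers and the
'clock_valid' field are made-up names, just to sketch the idea:

	static inline void rq_lock(struct rq *rq)
		__acquires(rq->lock)
	{
		raw_spin_lock(&rq->lock);
		rq->clock_valid = 0;	/* force a fresh update_rq_clock() */
	}

	static inline void rq_unlock(struct rq *rq)
		__releases(rq->lock)
	{
		rq->clock_valid = 0;
		raw_spin_unlock(&rq->lock);
	}

	void update_rq_clock(struct rq *rq)
	{
		/* ...existing clock / clock_task update... */
		rq->clock_valid = 1;	/* valid until the next lock op */
	}

All the raw_spin_lock(&rq->lock) call sites would have to move over to
the wrappers for this to catch everything, which is most of the churn.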

> Documenting/listing those would 
> be nice anyway, near the top of kernel/sched/core.c or so.

Sure, and it might have to do.. I just prefer an option that's less
prone to human failure; being both paranoid and lazy :-)

> The other approach would be to periodically clear the flag from the timer tick. 
> That would catch invalid rq->clock use probabilistically.

Right, but once we have the most common paths covered it is very
unlikely we'll hit weird corner cases like this. But it is an option.

> > I'd clear at the beginning, but that's more or less the same thing.
> > 
> > We have the .sched.text section but I'm not sure we've been consistent enough 
> > with that to be useful. But otherwise we'd be able to clear on section 
> > entry/exit or so.
> 
> Hm, I'm not sure that can be made to work sanely.

Yeah.. one option is to make all code in kernel/sched/ part of
.sched.text, but that might be overkill -- not to mention we'd have to
peel things like completions and wait_queues out of kernel/sched/core.c,
though that seems like a good idea anyway.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares
  2013-04-10 11:47               ` Peter Zijlstra
@ 2013-04-10 11:50                 ` Ingo Molnar
  0 siblings, 0 replies; 24+ messages in thread
From: Ingo Molnar @ 2013-04-10 11:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2013-04-10 at 13:06 +0200, Ingo Molnar wrote:
> 
> > There's just a handful of high level entry points, right? schedule(), wakeup, 
> > scheduler tick, maybe notifiers - anything else? 
> 
> I suppose there's some cgroup muck and the various system calls; most
> of them should be covered if we instrument some lower-level primitives.
> 
> We could 'simply' hijack rq->lock and clear the state every time we
> acquire or release it.. maybe have an exemption for the balance double
> lock ops.

That would be feasible ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC GIT PULL] nohz: Full dynticks rq clock handling
  2013-04-10 10:26 ` [RFC GIT PULL] nohz: Full dynticks rq clock handling Peter Zijlstra
  2013-04-10 10:29   ` Ingo Molnar
@ 2013-04-11 15:11   ` Frederic Weisbecker
  1 sibling, 0 replies; 24+ messages in thread
From: Frederic Weisbecker @ 2013-04-11 15:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, LKML, Alessio Igor Bogani, Andrew Morton,
	Chris Metcalf, Christoph Lameter, Geoff Levand, Gilad Ben Yossef,
	Hakan Akkan, Li Zhong, Namhyung Kim, Paul E. McKenney,
	Paul Gortmaker, Steven Rostedt, Thomas Gleixner, Paul Turner,
	Mike Galbraith

On Wed, Apr 10, 2013 at 12:26:43PM +0200, Peter Zijlstra wrote:
> 
> Another thing I just noticed: our update_curr() assumes it's called at
> least once every 4-odd seconds (on the order of 2^32 ns).
> 
> update_curr() is 'required' for things like task runtime stats and
> everything that hangs off of that, like cputimers etc.
> 
> Now I'm not entirely sure how the nr_running==1 nohz case deals with
> this, but I can imagine it might take a while. In this case we might
> need to 'fix' update_curr() to not assume the time delta fits in 32
> bits.
> 

Good catch, this is not handled yet.
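
A rough sketch of one possible direction (untested; __update_curr() and
whatever 32-bit assumptions the weighted runtime math makes downstream
would need the same treatment) would be to stop narrowing the delta:

	static void update_curr(struct cfs_rq *cfs_rq)
	{
		struct sched_entity *curr = cfs_rq->curr;
		u64 now = rq_of(cfs_rq)->clock_task;
		u64 delta_exec;

		if (unlikely(!curr))
			return;

		/* keep the full 64-bit delta, no (unsigned long) cast */
		delta_exec = now - curr->exec_start;
		if (!delta_exec)
			return;

		__update_curr(cfs_rq, curr, delta_exec);
		curr->exec_start = now;

		/* ...runtime / cputimer accounting as before... */
	}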

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2013-04-11 15:11 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
2013-04-06 16:45 [RFC GIT PULL] nohz: Full dynticks rq clock handling Frederic Weisbecker
2013-04-06 16:45 ` [PATCH 1/7] sched: Update rq clock on nohz CPU before migrating tasks Frederic Weisbecker
2013-04-08 11:48   ` Ingo Molnar
2013-04-09  9:13   ` Peter Zijlstra
2013-04-09 13:11     ` Frederic Weisbecker
2013-04-06 16:45 ` [PATCH 2/7] sched: Update rq clock on nohz CPU before setting fair group shares Frederic Weisbecker
2013-04-09  9:26   ` Peter Zijlstra
2013-04-09 13:21     ` Frederic Weisbecker
2013-04-10  7:05       ` Peter Zijlstra
2013-04-10 10:06         ` Ingo Molnar
2013-04-10 11:02           ` Peter Zijlstra
2013-04-10 11:06             ` Ingo Molnar
2013-04-10 11:47               ` Peter Zijlstra
2013-04-10 11:50                 ` Ingo Molnar
2013-04-06 16:45 ` [PATCH 3/7] sched: Update rq clock on tickless CPUs before calling check_preempt_curr() Frederic Weisbecker
2013-04-09 13:18   ` Peter Zijlstra
2013-04-09 16:53     ` Frederic Weisbecker
2013-04-06 16:45 ` [PATCH 4/7] sched: Update rq clock earlier in unthrottle_cfs_rq Frederic Weisbecker
2013-04-06 16:45 ` [PATCH 5/7] sched: Update rq clock before idle balancing Frederic Weisbecker
2013-04-06 16:45 ` [PATCH 6/7] sched: Use an accessor to read rq clock Frederic Weisbecker
2013-04-06 16:46 ` [PATCH 7/7] sched: Debug nohz " Frederic Weisbecker
2013-04-10 10:26 ` [RFC GIT PULL] nohz: Full dynticks rq clock handling Peter Zijlstra
2013-04-10 10:29   ` Ingo Molnar
2013-04-11 15:11   ` Frederic Weisbecker
