* [PATCH v2 0/7] sched: Diagnostic checks for missing rq clock updates
@ 2016-09-21 13:38 Matt Fleming
  2016-09-21 13:38 ` [PATCH v2 1/7] sched/fair: Update the rq clock before detaching tasks Matt Fleming
                   ` (6 more replies)
  0 siblings, 7 replies; 44+ messages in thread
From: Matt Fleming @ 2016-09-21 13:38 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Byungchul Park, Frederic Weisbecker, Luca Abeni, Rik van Riel,
	Thomas Gleixner, Wanpeng Li, Yuyang Du, Petr Mladek, Jan Kara,
	Sergey Senozhatsky, linux-kernel, Mel Gorman, Mike Galbraith,
	Matt Fleming

There are currently no runtime diagnostic checks for detecting when we
have inadvertently missed a call to update_rq_clock() before accessing
rq_clock() or rq_clock_task().

The idea in these patches, which came from Peter, is to piggyback on
the rq->lock pin/unpin context to detect when we expected (and failed)
to see an update to the rq clock. They've already caught a couple of
bugs: see commit b52fad2db5d7 ("sched/fair: Update rq clock before
updating nohz CPU load").
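
For reference, the usage pattern the checks enforce looks roughly like
the sketch below (illustration only; example_read_clock() is a made-up
function, and the rq_pin_lock()/rq_unpin_lock() helpers are the ones
introduced later in this series):

	static u64 example_read_clock(struct rq *rq)
	{
		struct rq_flags rf;
		u64 now;

		raw_spin_lock(&rq->lock);
		rq_pin_lock(rq, &rf);

		/* Must come before any rq_clock()/rq_clock_task() read. */
		update_rq_clock(rq);

		/* OK: the clock was updated within this pin section. */
		now = rq_clock(rq);

		rq_unpin_lock(rq, &rf);
		raw_spin_unlock(&rq->lock);

		return now;
	}

Reading the clock inside a pin section without a preceding
update_rq_clock() (and without a skip being in effect) triggers the
new warning.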

All the diagnostic code is guarded by CONFIG_SCHED_DEBUG, but there
are minimal changes to __schedule() in patch 5 for the !SCHED_DEBUG
case.

Jan and Sergey, Petr asked that you be Cc'd on this series because of
the recent issues with using WARN_ON() in the async printk work.

Changes in v2:

 - Add a check for missing update_rq_clock() before rq_clock_task().

 - Address review comments from Yuyang where I messed up the
   __schedule() ::clock_update_flags manipulation.

Matt Fleming (7):
  sched/fair: Update the rq clock before detaching tasks
  sched/fair: Update rq clock before waking up new task
  sched/fair: Update rq clock in task_hot()
  sched: Add wrappers for lockdep_(un)pin_lock()
  sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock
  sched/fair: Push rq lock pin/unpin into idle_balance()
  sched/core: Add debug code to catch missing update_rq_clock()

 kernel/sched/core.c      |  92 +++++++++++++++++++++-------------------
 kernel/sched/deadline.c  |  10 ++---
 kernel/sched/fair.c      |  40 +++++++++++-------
 kernel/sched/idle_task.c |   2 +-
 kernel/sched/rt.c        |   6 +--
 kernel/sched/sched.h     | 107 ++++++++++++++++++++++++++++++++++++++++-------
 kernel/sched/stop_task.c |   2 +-
 7 files changed, 177 insertions(+), 82 deletions(-)

-- 
2.9.3

* [PATCH v2 1/7] sched/fair: Update the rq clock before detaching tasks
  2016-09-21 13:38 [PATCH v2 0/7] sched: Diagnostic checks for missing rq clock updates Matt Fleming
@ 2016-09-21 13:38 ` Matt Fleming
  2016-10-03 12:49   ` Peter Zijlstra
  2016-09-21 13:38 ` [PATCH v2 2/7] sched/fair: Update rq clock before waking up new task Matt Fleming
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 44+ messages in thread
From: Matt Fleming @ 2016-09-21 13:38 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Byungchul Park, Frederic Weisbecker, Luca Abeni, Rik van Riel,
	Thomas Gleixner, Wanpeng Li, Yuyang Du, Petr Mladek, Jan Kara,
	Sergey Senozhatsky, linux-kernel, Mel Gorman, Mike Galbraith,
	Matt Fleming

detach_task_cfs_rq() may indirectly call rq_clock() to inform the
cpufreq code that the rq utilisation has changed. In which case, we
need to update the rq clock.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/fair.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 986c10c25176..ab1cf3866a5b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8484,7 +8484,10 @@ static void detach_task_cfs_rq(struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-	u64 now = cfs_rq_clock_task(cfs_rq);
+	u64 now;
+
+	update_rq_clock(task_rq(p));
+	now = cfs_rq_clock_task(cfs_rq);
 
 	if (!vruntime_normalized(p)) {
 		/*
-- 
2.9.3

* [PATCH v2 2/7] sched/fair: Update rq clock before waking up new task
  2016-09-21 13:38 [PATCH v2 0/7] sched: Diagnostic checks for missing rq clock updates Matt Fleming
  2016-09-21 13:38 ` [PATCH v2 1/7] sched/fair: Update the rq clock before detaching tasks Matt Fleming
@ 2016-09-21 13:38 ` Matt Fleming
  2016-09-21 13:38 ` [PATCH v2 3/7] sched/fair: Update rq clock in task_hot() Matt Fleming
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 44+ messages in thread
From: Matt Fleming @ 2016-09-21 13:38 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Byungchul Park, Frederic Weisbecker, Luca Abeni, Rik van Riel,
	Thomas Gleixner, Wanpeng Li, Yuyang Du, Petr Mladek, Jan Kara,
	Sergey Senozhatsky, linux-kernel, Mel Gorman, Mike Galbraith,
	Matt Fleming

When initialising an entity's util and load averages we need an
up-to-date runqueue clock.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/fair.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ab1cf3866a5b..7f8a61e97599 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -731,7 +731,10 @@ void post_init_entity_util_avg(struct sched_entity *se)
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 	struct sched_avg *sa = &se->avg;
 	long cap = (long)(SCHED_CAPACITY_SCALE - cfs_rq->avg.util_avg) / 2;
-	u64 now = cfs_rq_clock_task(cfs_rq);
+	u64 now;
+
+	update_rq_clock(rq_of(cfs_rq));
+	now = cfs_rq_clock_task(cfs_rq);
 
 	if (cap > 0) {
 		if (cfs_rq->avg.util_avg != 0) {
-- 
2.9.3

* [PATCH v2 3/7] sched/fair: Update rq clock in task_hot()
  2016-09-21 13:38 [PATCH v2 0/7] sched: Diagnostic checks for missing rq clock updates Matt Fleming
  2016-09-21 13:38 ` [PATCH v2 1/7] sched/fair: Update the rq clock before detaching tasks Matt Fleming
  2016-09-21 13:38 ` [PATCH v2 2/7] sched/fair: Update rq clock before waking up new task Matt Fleming
@ 2016-09-21 13:38 ` Matt Fleming
  2016-09-21 13:38 ` [PATCH v2 4/7] sched: Add wrappers for lockdep_(un)pin_lock() Matt Fleming
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 44+ messages in thread
From: Matt Fleming @ 2016-09-21 13:38 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Byungchul Park, Frederic Weisbecker, Luca Abeni, Rik van Riel,
	Thomas Gleixner, Wanpeng Li, Yuyang Du, Petr Mladek, Jan Kara,
	Sergey Senozhatsky, linux-kernel, Mel Gorman, Mike Galbraith,
	Matt Fleming

When determining whether or not a task is likely to be cache hot based
on its execution start time, we need to ensure the runqueue task clock
is accurate and up to date.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/fair.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f8a61e97599..85ca4ddab0d3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6096,6 +6096,7 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 	if (sysctl_sched_migration_cost == 0)
 		return 0;
 
+	update_rq_clock(env->src_rq);
 	delta = rq_clock_task(env->src_rq) - p->se.exec_start;
 
 	return delta < (s64)sysctl_sched_migration_cost;
-- 
2.9.3

* [PATCH v2 4/7] sched: Add wrappers for lockdep_(un)pin_lock()
  2016-09-21 13:38 [PATCH v2 0/7] sched: Diagnostic checks for missing rq clock updates Matt Fleming
                   ` (2 preceding siblings ...)
  2016-09-21 13:38 ` [PATCH v2 3/7] sched/fair: Update rq clock in task_hot() Matt Fleming
@ 2016-09-21 13:38 ` Matt Fleming
  2017-01-14 12:40   ` [tip:sched/core] sched/core: " tip-bot for Matt Fleming
  2016-09-21 13:38 ` [PATCH v2 5/7] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock Matt Fleming
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 44+ messages in thread
From: Matt Fleming @ 2016-09-21 13:38 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Byungchul Park, Frederic Weisbecker, Luca Abeni, Rik van Riel,
	Thomas Gleixner, Wanpeng Li, Yuyang Du, Petr Mladek, Jan Kara,
	Sergey Senozhatsky, linux-kernel, Mel Gorman, Mike Galbraith,
	Matt Fleming

In preparation for adding diagnostic checks to catch missing calls to
update_rq_clock(), provide wrappers for (re)pinning and unpinning
rq->lock.

Because the pending diagnostic checks allow state to be maintained in
rq_flags across pin contexts, swap the 'struct pin_cookie' arguments
for 'struct rq_flags *'.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/core.c      | 80 ++++++++++++++++++++++++------------------------
 kernel/sched/deadline.c  | 10 +++---
 kernel/sched/fair.c      |  6 ++--
 kernel/sched/idle_task.c |  2 +-
 kernel/sched/rt.c        |  6 ++--
 kernel/sched/sched.h     | 31 ++++++++++++++-----
 kernel/sched/stop_task.c |  2 +-
 7 files changed, 76 insertions(+), 61 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 860070fba814..7950c372fca0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -185,7 +185,7 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 		rq = task_rq(p);
 		raw_spin_lock(&rq->lock);
 		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
-			rf->cookie = lockdep_pin_lock(&rq->lock);
+			rq_pin_lock(rq, rf);
 			return rq;
 		}
 		raw_spin_unlock(&rq->lock);
@@ -225,7 +225,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 		 * pair with the WMB to ensure we must then also see migrating.
 		 */
 		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
-			rf->cookie = lockdep_pin_lock(&rq->lock);
+			rq_pin_lock(rq, rf);
 			return rq;
 		}
 		raw_spin_unlock(&rq->lock);
@@ -1184,9 +1184,9 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		 * OK, since we're going to drop the lock immediately
 		 * afterwards anyway.
 		 */
-		lockdep_unpin_lock(&rq->lock, rf.cookie);
+		rq_unpin_lock(rq, &rf);
 		rq = move_queued_task(rq, p, dest_cpu);
-		lockdep_repin_lock(&rq->lock, rf.cookie);
+		rq_repin_lock(rq, &rf);
 	}
 out:
 	task_rq_unlock(rq, p, &rf);
@@ -1679,7 +1679,7 @@ static inline void ttwu_activate(struct rq *rq, struct task_struct *p, int en_fl
  * Mark the task runnable and perform wakeup-preemption.
  */
 static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
-			   struct pin_cookie cookie)
+			   struct rq_flags *rf)
 {
 	check_preempt_curr(rq, p, wake_flags);
 	p->state = TASK_RUNNING;
@@ -1691,9 +1691,9 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
 		 * Our task @p is fully woken up and running; so its safe to
 		 * drop the rq->lock, hereafter rq is only used for statistics.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		p->sched_class->task_woken(rq, p);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 	}
 
 	if (rq->idle_stamp) {
@@ -1712,7 +1712,7 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
 
 static void
 ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
-		 struct pin_cookie cookie)
+		 struct rq_flags *rf)
 {
 	int en_flags = ENQUEUE_WAKEUP;
 
@@ -1727,7 +1727,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 #endif
 
 	ttwu_activate(rq, p, en_flags);
-	ttwu_do_wakeup(rq, p, wake_flags, cookie);
+	ttwu_do_wakeup(rq, p, wake_flags, rf);
 }
 
 /*
@@ -1746,7 +1746,7 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 	if (task_on_rq_queued(p)) {
 		/* check_preempt_curr() may use rq clock */
 		update_rq_clock(rq);
-		ttwu_do_wakeup(rq, p, wake_flags, rf.cookie);
+		ttwu_do_wakeup(rq, p, wake_flags, &rf);
 		ret = 1;
 	}
 	__task_rq_unlock(rq, &rf);
@@ -1759,15 +1759,15 @@ void sched_ttwu_pending(void)
 {
 	struct rq *rq = this_rq();
 	struct llist_node *llist = llist_del_all(&rq->wake_list);
-	struct pin_cookie cookie;
 	struct task_struct *p;
 	unsigned long flags;
+	struct rq_flags rf;
 
 	if (!llist)
 		return;
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
-	cookie = lockdep_pin_lock(&rq->lock);
+	rq_pin_lock(rq, &rf);
 
 	while (llist) {
 		int wake_flags = 0;
@@ -1778,10 +1778,10 @@ void sched_ttwu_pending(void)
 		if (p->sched_remote_wakeup)
 			wake_flags = WF_MIGRATED;
 
-		ttwu_do_activate(rq, p, wake_flags, cookie);
+		ttwu_do_activate(rq, p, wake_flags, &rf);
 	}
 
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_unpin_lock(rq, &rf);
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
 
@@ -1870,7 +1870,7 @@ bool cpus_share_cache(int this_cpu, int that_cpu)
 static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
 {
 	struct rq *rq = cpu_rq(cpu);
-	struct pin_cookie cookie;
+	struct rq_flags rf;
 
 #if defined(CONFIG_SMP)
 	if (sched_feat(TTWU_QUEUE) && !cpus_share_cache(smp_processor_id(), cpu)) {
@@ -1881,9 +1881,9 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
 #endif
 
 	raw_spin_lock(&rq->lock);
-	cookie = lockdep_pin_lock(&rq->lock);
-	ttwu_do_activate(rq, p, wake_flags, cookie);
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_pin_lock(rq, &rf);
+	ttwu_do_activate(rq, p, wake_flags, &rf);
+	rq_unpin_lock(rq, &rf);
 	raw_spin_unlock(&rq->lock);
 }
 
@@ -2099,7 +2099,7 @@ out:
  * ensure that this_rq() is locked, @p is bound to this_rq() and not
  * the current task.
  */
-static void try_to_wake_up_local(struct task_struct *p, struct pin_cookie cookie)
+static void try_to_wake_up_local(struct task_struct *p, struct rq_flags *rf)
 {
 	struct rq *rq = task_rq(p);
 
@@ -2116,11 +2116,11 @@ static void try_to_wake_up_local(struct task_struct *p, struct pin_cookie cookie
 		 * disabled avoiding further scheduler activity on it and we've
 		 * not yet picked a replacement task.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		raw_spin_unlock(&rq->lock);
 		raw_spin_lock(&p->pi_lock);
 		raw_spin_lock(&rq->lock);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 	}
 
 	if (!(p->state & TASK_NORMAL))
@@ -2131,7 +2131,7 @@ static void try_to_wake_up_local(struct task_struct *p, struct pin_cookie cookie
 	if (!task_on_rq_queued(p))
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
 
-	ttwu_do_wakeup(rq, p, 0, cookie);
+	ttwu_do_wakeup(rq, p, 0, rf);
 	ttwu_stat(p, smp_processor_id(), 0);
 out:
 	raw_spin_unlock(&p->pi_lock);
@@ -2578,9 +2578,9 @@ void wake_up_new_task(struct task_struct *p)
 		 * Nothing relies on rq->lock after this, so its fine to
 		 * drop it.
 		 */
-		lockdep_unpin_lock(&rq->lock, rf.cookie);
+		rq_unpin_lock(rq, &rf);
 		p->sched_class->task_woken(rq, p);
-		lockdep_repin_lock(&rq->lock, rf.cookie);
+		rq_repin_lock(rq, &rf);
 	}
 #endif
 	task_rq_unlock(rq, p, &rf);
@@ -2845,7 +2845,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
  */
 static __always_inline struct rq *
 context_switch(struct rq *rq, struct task_struct *prev,
-	       struct task_struct *next, struct pin_cookie cookie)
+	       struct task_struct *next, struct rq_flags *rf)
 {
 	struct mm_struct *mm, *oldmm;
 
@@ -2877,7 +2877,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 * of the scheduler it's an obvious special-case), so we
 	 * do an early lockdep release here:
 	 */
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_unpin_lock(rq, rf);
 	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
 
 	/* Here we just switch the register state and the stack. */
@@ -3241,7 +3241,7 @@ static inline void schedule_debug(struct task_struct *prev)
  * Pick up the highest-prio task:
  */
 static inline struct task_struct *
-pick_next_task(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	const struct sched_class *class = &fair_sched_class;
 	struct task_struct *p;
@@ -3252,20 +3252,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie
 	 */
 	if (likely(prev->sched_class == class &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
-		p = fair_sched_class.pick_next_task(rq, prev, cookie);
+		p = fair_sched_class.pick_next_task(rq, prev, rf);
 		if (unlikely(p == RETRY_TASK))
 			goto again;
 
 		/* assumes fair_sched_class->next == idle_sched_class */
 		if (unlikely(!p))
-			p = idle_sched_class.pick_next_task(rq, prev, cookie);
+			p = idle_sched_class.pick_next_task(rq, prev, rf);
 
 		return p;
 	}
 
 again:
 	for_each_class(class) {
-		p = class->pick_next_task(rq, prev, cookie);
+		p = class->pick_next_task(rq, prev, rf);
 		if (p) {
 			if (unlikely(p == RETRY_TASK))
 				goto again;
@@ -3319,7 +3319,7 @@ static void __sched notrace __schedule(bool preempt)
 {
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
-	struct pin_cookie cookie;
+	struct rq_flags rf;
 	struct rq *rq;
 	int cpu;
 
@@ -3353,7 +3353,7 @@ static void __sched notrace __schedule(bool preempt)
 	 */
 	smp_mb__before_spinlock();
 	raw_spin_lock(&rq->lock);
-	cookie = lockdep_pin_lock(&rq->lock);
+	rq_pin_lock(rq, &rf);
 
 	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
 
@@ -3375,7 +3375,7 @@ static void __sched notrace __schedule(bool preempt)
 
 				to_wakeup = wq_worker_sleeping(prev);
 				if (to_wakeup)
-					try_to_wake_up_local(to_wakeup, cookie);
+					try_to_wake_up_local(to_wakeup, &rf);
 			}
 		}
 		switch_count = &prev->nvcsw;
@@ -3384,7 +3384,7 @@ static void __sched notrace __schedule(bool preempt)
 	if (task_on_rq_queued(prev))
 		update_rq_clock(rq);
 
-	next = pick_next_task(rq, prev, cookie);
+	next = pick_next_task(rq, prev, &rf);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
 	rq->clock_skip_update = 0;
@@ -3395,9 +3395,9 @@ static void __sched notrace __schedule(bool preempt)
 		++*switch_count;
 
 		trace_sched_switch(preempt, prev, next);
-		rq = context_switch(rq, prev, next, cookie); /* unlocks the rq */
+		rq = context_switch(rq, prev, next, &rf); /* unlocks the rq */
 	} else {
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock_irq(&rq->lock);
 	}
 
@@ -5487,7 +5487,7 @@ static void migrate_tasks(struct rq *dead_rq)
 {
 	struct rq *rq = dead_rq;
 	struct task_struct *next, *stop = rq->stop;
-	struct pin_cookie cookie;
+	struct rq_flags rf;
 	int dest_cpu;
 
 	/*
@@ -5519,8 +5519,8 @@ static void migrate_tasks(struct rq *dead_rq)
 		/*
 		 * pick_next_task assumes pinned rq->lock.
 		 */
-		cookie = lockdep_pin_lock(&rq->lock);
-		next = pick_next_task(rq, &fake_task, cookie);
+		rq_pin_lock(rq, &rf);
+		next = pick_next_task(rq, &fake_task, &rf);
 		BUG_ON(!next);
 		next->sched_class->put_prev_task(rq, next);
 
@@ -5533,7 +5533,7 @@ static void migrate_tasks(struct rq *dead_rq)
 		 * because !cpu_active at this point, which means load-balance
 		 * will not interfere. Also, stop-machine.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock(&rq->lock);
 		raw_spin_lock(&next->pi_lock);
 		raw_spin_lock(&rq->lock);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0c75bc656178..9c7abab9f61e 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -663,9 +663,9 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 		 * Nothing relies on rq->lock after this, so its safe to drop
 		 * rq->lock.
 		 */
-		lockdep_unpin_lock(&rq->lock, rf.cookie);
+		rq_unpin_lock(rq, &rf);
 		push_dl_task(rq);
-		lockdep_repin_lock(&rq->lock, rf.cookie);
+		rq_repin_lock(rq, &rf);
 	}
 #endif
 
@@ -1119,7 +1119,7 @@ static struct sched_dl_entity *pick_next_dl_entity(struct rq *rq,
 }
 
 struct task_struct *
-pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct sched_dl_entity *dl_se;
 	struct task_struct *p;
@@ -1134,9 +1134,9 @@ pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct pin_cookie coo
 		 * disabled avoiding further scheduler activity on it and we're
 		 * being very careful to re-start the picking loop.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		pull_dl_task(rq);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 		/*
 		 * pull_rt_task() can drop (and re-acquire) rq->lock; this
 		 * means a stop task can slip in, in which case we need to
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 85ca4ddab0d3..f0032827fb79 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5718,7 +5718,7 @@ preempt:
 }
 
 static struct task_struct *
-pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct cfs_rq *cfs_rq = &rq->cfs;
 	struct sched_entity *se;
@@ -5831,9 +5831,9 @@ idle:
 	 * further scheduler activity on it and we're being very careful to
 	 * re-start the picking loop.
 	 */
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_unpin_lock(rq, rf);
 	new_tasks = idle_balance(rq);
-	lockdep_repin_lock(&rq->lock, cookie);
+	rq_repin_lock(rq, rf);
 	/*
 	 * Because idle_balance() releases (and re-acquires) rq->lock, it is
 	 * possible for any higher priority task to appear. In that case we
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index dedc81ecbb2e..5a84cc97118f 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -24,7 +24,7 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 }
 
 static struct task_struct *
-pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	put_prev_task(rq, prev);
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index d5690b722691..ab79812ee33e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1524,7 +1524,7 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
 }
 
 static struct task_struct *
-pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct task_struct *p;
 	struct rt_rq *rt_rq = &rq->rt;
@@ -1536,9 +1536,9 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct pin_cookie coo
 		 * disabled avoiding further scheduler activity on it and we're
 		 * being very careful to re-start the picking loop.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		pull_rt_task(rq);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 		/*
 		 * pull_rt_task() can drop (and re-acquire) rq->lock; this
 		 * means a dl or stop task can slip in, in which case we need
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 420c05d099c3..bf48e7975c23 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -761,6 +761,26 @@ static inline void rq_clock_skip_update(struct rq *rq, bool skip)
 		rq->clock_skip_update &= ~RQCF_REQ_SKIP;
 }
 
+struct rq_flags {
+	unsigned long flags;
+	struct pin_cookie cookie;
+};
+
+static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
+{
+	rf->cookie = lockdep_pin_lock(&rq->lock);
+}
+
+static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
+{
+	lockdep_unpin_lock(&rq->lock, rf->cookie);
+}
+
+static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
+{
+	lockdep_repin_lock(&rq->lock, rf->cookie);
+}
+
 #ifdef CONFIG_NUMA
 enum numa_topology_type {
 	NUMA_DIRECT,
@@ -1212,7 +1232,7 @@ struct sched_class {
 	 */
 	struct task_struct * (*pick_next_task) (struct rq *rq,
 						struct task_struct *prev,
-						struct pin_cookie cookie);
+						struct rq_flags *rf);
 	void (*put_prev_task) (struct rq *rq, struct task_struct *p);
 
 #ifdef CONFIG_SMP
@@ -1463,11 +1483,6 @@ static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
 static inline void sched_avg_update(struct rq *rq) { }
 #endif
 
-struct rq_flags {
-	unsigned long flags;
-	struct pin_cookie cookie;
-};
-
 struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 	__acquires(rq->lock);
 struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
@@ -1477,7 +1492,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
 	__releases(rq->lock)
 {
-	lockdep_unpin_lock(&rq->lock, rf->cookie);
+	rq_unpin_lock(rq, rf);
 	raw_spin_unlock(&rq->lock);
 }
 
@@ -1486,7 +1501,7 @@ task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 	__releases(rq->lock)
 	__releases(p->pi_lock)
 {
-	lockdep_unpin_lock(&rq->lock, rf->cookie);
+	rq_unpin_lock(rq, rf);
 	raw_spin_unlock(&rq->lock);
 	raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
 }
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 604297a08b3a..9f69fb630853 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -24,7 +24,7 @@ check_preempt_curr_stop(struct rq *rq, struct task_struct *p, int flags)
 }
 
 static struct task_struct *
-pick_next_task_stop(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_stop(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct task_struct *stop = rq->stop;
 
-- 
2.9.3

* [PATCH v2 5/7] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock
  2016-09-21 13:38 [PATCH v2 0/7] sched: Diagnostic checks for missing rq clock updates Matt Fleming
                   ` (3 preceding siblings ...)
  2016-09-21 13:38 ` [PATCH v2 4/7] sched: Add wrappers for lockdep_(un)pin_lock() Matt Fleming
@ 2016-09-21 13:38 ` Matt Fleming
  2017-01-14 12:41   ` [tip:sched/core] " tip-bot for Matt Fleming
  2016-09-21 13:38 ` [PATCH v2 6/7] sched/fair: Push rq lock pin/unpin into idle_balance() Matt Fleming
  2016-09-21 13:38 ` [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock() Matt Fleming
  6 siblings, 1 reply; 44+ messages in thread
From: Matt Fleming @ 2016-09-21 13:38 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Byungchul Park, Frederic Weisbecker, Luca Abeni, Rik van Riel,
	Thomas Gleixner, Wanpeng Li, Yuyang Du, Petr Mladek, Jan Kara,
	Sergey Senozhatsky, linux-kernel, Mel Gorman, Mike Galbraith,
	Matt Fleming

rq_clock() is called from sched_info_{depart,arrive}() after resetting
RQCF_ACT_SKIP but prior to a call to update_rq_clock().

In preparation for pending patches that check whether the rq clock has
been updated inside of a pin context before rq_clock() is called, move
the reset of rq->clock_skip_update immediately before unpinning the rq
lock.

This will avoid the new warnings which check if update_rq_clock() is
being actively skipped.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7950c372fca0..1254629c9f2f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2871,6 +2871,9 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		prev->active_mm = NULL;
 		rq->prev_mm = oldmm;
 	}
+
+	rq->clock_skip_update = 0;
+
 	/*
 	 * Since the runqueue lock will be released by the next
 	 * task (which is an invalid locking op but in the case
@@ -3387,7 +3390,6 @@ static void __sched notrace __schedule(bool preempt)
 	next = pick_next_task(rq, prev, &rf);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
-	rq->clock_skip_update = 0;
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
@@ -3397,6 +3399,7 @@ static void __sched notrace __schedule(bool preempt)
 		trace_sched_switch(preempt, prev, next);
 		rq = context_switch(rq, prev, next, &rf); /* unlocks the rq */
 	} else {
+		rq->clock_skip_update = 0;
 		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock_irq(&rq->lock);
 	}
-- 
2.9.3

* [PATCH v2 6/7] sched/fair: Push rq lock pin/unpin into idle_balance()
  2016-09-21 13:38 [PATCH v2 0/7] sched: Diagnostic checks for missing rq clock updates Matt Fleming
                   ` (4 preceding siblings ...)
  2016-09-21 13:38 ` [PATCH v2 5/7] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock Matt Fleming
@ 2016-09-21 13:38 ` Matt Fleming
  2017-01-14 12:41   ` [tip:sched/core] " tip-bot for Matt Fleming
  2016-09-21 13:38 ` [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock() Matt Fleming
  6 siblings, 1 reply; 44+ messages in thread
From: Matt Fleming @ 2016-09-21 13:38 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Byungchul Park, Frederic Weisbecker, Luca Abeni, Rik van Riel,
	Thomas Gleixner, Wanpeng Li, Yuyang Du, Petr Mladek, Jan Kara,
	Sergey Senozhatsky, linux-kernel, Mel Gorman, Mike Galbraith,
	Matt Fleming

Future patches will emit warnings if rq_clock() is called before
update_rq_clock() inside a rq_pin_lock()/rq_unpin_lock() pair.

Since there is only one caller of idle_balance() we can push the
unpin/repin there.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/fair.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f0032827fb79..df9a5b16e1df 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3232,7 +3232,7 @@ static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq)
 	return cfs_rq->avg.load_avg;
 }
 
-static int idle_balance(struct rq *this_rq);
+static int idle_balance(struct rq *this_rq, struct rq_flags *rf);
 
 #else /* CONFIG_SMP */
 
@@ -3261,7 +3261,7 @@ attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
 static inline void
 detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
 
-static inline int idle_balance(struct rq *rq)
+static inline int idle_balance(struct rq *rq, struct rq_flags *rf)
 {
 	return 0;
 }
@@ -5825,15 +5825,8 @@ simple:
 	return p;
 
 idle:
-	/*
-	 * This is OK, because current is on_cpu, which avoids it being picked
-	 * for load-balance and preemption/IRQs are still disabled avoiding
-	 * further scheduler activity on it and we're being very careful to
-	 * re-start the picking loop.
-	 */
-	rq_unpin_lock(rq, rf);
-	new_tasks = idle_balance(rq);
-	rq_repin_lock(rq, rf);
+	new_tasks = idle_balance(rq, rf);
+
 	/*
 	 * Because idle_balance() releases (and re-acquires) rq->lock, it is
 	 * possible for any higher priority task to appear. In that case we
@@ -7767,7 +7760,7 @@ update_next_balance(struct sched_domain *sd, unsigned long *next_balance)
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
  */
-static int idle_balance(struct rq *this_rq)
+static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
 {
 	unsigned long next_balance = jiffies + HZ;
 	int this_cpu = this_rq->cpu;
@@ -7781,6 +7774,14 @@ static int idle_balance(struct rq *this_rq)
 	 */
 	this_rq->idle_stamp = rq_clock(this_rq);
 
+	/*
+	 * This is OK, because current is on_cpu, which avoids it being picked
+	 * for load-balance and preemption/IRQs are still disabled avoiding
+	 * further scheduler activity on it and we're being very careful to
+	 * re-start the picking loop.
+	 */
+	rq_unpin_lock(this_rq, rf);
+
 	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
 	    !this_rq->rd->overload) {
 		rcu_read_lock();
@@ -7858,6 +7859,8 @@ out:
 	if (pulled_task)
 		this_rq->idle_stamp = 0;
 
+	rq_repin_lock(this_rq, rf);
+
 	return pulled_task;
 }
 
-- 
2.9.3

* [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock()
  2016-09-21 13:38 [PATCH v2 0/7] sched: Diagnostic checks for missing rq clock updates Matt Fleming
                   ` (5 preceding siblings ...)
  2016-09-21 13:38 ` [PATCH v2 6/7] sched/fair: Push rq lock pin/unpin into idle_balance() Matt Fleming
@ 2016-09-21 13:38 ` Matt Fleming
  2016-09-21 15:58   ` Petr Mladek
  2017-01-14 12:44   ` [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls tip-bot for Matt Fleming
  6 siblings, 2 replies; 44+ messages in thread
From: Matt Fleming @ 2016-09-21 13:38 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Byungchul Park, Frederic Weisbecker, Luca Abeni, Rik van Riel,
	Thomas Gleixner, Wanpeng Li, Yuyang Du, Petr Mladek, Jan Kara,
	Sergey Senozhatsky, linux-kernel, Mel Gorman, Mike Galbraith,
	Matt Fleming

There are no diagnostic checks for figuring out when we've accidentally
missed update_rq_clock() calls. Let's add some by piggybacking on the
rq_*pin_lock() wrappers.

The idea behind the diagnostic checks is that upon pinning the rq lock
the rq clock should be updated, via update_rq_clock(), before anybody
reads the clock with rq_clock() or rq_clock_task().

The exception to this rule is when updates have explicitly been
disabled with the rq_clock_skip_update() optimisation.

There are some functions that only unpin the rq lock in order to grab
some other lock and avoid deadlock. In that case we don't need to
update the clock again and the previous diagnostic state can be
carried over in rq_repin_lock() by saving the state in the rq_flags
context.

Since this patch adds a new clock update flag to join those that
already exist in rq::clock_skip_update, that field has been renamed to
rq::clock_update_flags. An attempt has been made to keep the flag
manipulation code small and fast since it's used in the heart of the
__schedule() fast path.

For the !CONFIG_SCHED_DEBUG case the only object code change (other
than addresses) is the following change to reset RQCF_ACT_SKIP inside
of __schedule(),

  -       c7 83 38 09 00 00 00    movl   $0x0,0x938(%rbx)
  -       00 00 00
  +       83 a3 38 09 00 00 fc    andl   $0xfffffffc,0x938(%rbx)

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/core.c  | 11 +++++---
 kernel/sched/sched.h | 76 +++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 77 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1254629c9f2f..95f804b3a413 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -102,9 +102,12 @@ void update_rq_clock(struct rq *rq)
 
 	lockdep_assert_held(&rq->lock);
 
-	if (rq->clock_skip_update & RQCF_ACT_SKIP)
+	if (rq->clock_update_flags & RQCF_ACT_SKIP)
 		return;
 
+#ifdef CONFIG_SCHED_DEBUG
+	rq->clock_update_flags |= RQCF_UPDATED;
+#endif
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
 	if (delta < 0)
 		return;
@@ -2872,7 +2875,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		rq->prev_mm = oldmm;
 	}
 
-	rq->clock_skip_update = 0;
+	rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
 
 	/*
 	 * Since the runqueue lock will be released by the next
@@ -3358,7 +3361,7 @@ static void __sched notrace __schedule(bool preempt)
 	raw_spin_lock(&rq->lock);
 	rq_pin_lock(rq, &rf);
 
-	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
+	rq->clock_update_flags <<= 1; /* promote REQ to ACT */
 
 	switch_count = &prev->nivcsw;
 	if (!preempt && prev->state) {
@@ -3399,7 +3402,7 @@ static void __sched notrace __schedule(bool preempt)
 		trace_sched_switch(preempt, prev, next);
 		rq = context_switch(rq, prev, next, &rf); /* unlocks the rq */
 	} else {
-		rq->clock_skip_update = 0;
+		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
 		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock_irq(&rq->lock);
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bf48e7975c23..91f4b3d58d56 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -630,7 +630,7 @@ struct rq {
 	unsigned long next_balance;
 	struct mm_struct *prev_mm;
 
-	unsigned int clock_skip_update;
+	unsigned int clock_update_flags;
 	u64 clock;
 	u64 clock_task;
 
@@ -737,48 +737,112 @@ static inline u64 __rq_clock_broken(struct rq *rq)
 	return READ_ONCE(rq->clock);
 }
 
+/*
+ * rq::clock_update_flags bits
+ *
+ * %RQCF_REQ_SKIP - will request skipping of clock update on the next
+ *  call to __schedule(). This is an optimisation to avoid
+ *  neighbouring rq clock updates.
+ *
+ * %RQCF_ACT_SKIP - is set from inside of __schedule() when skipping is
+ *  in effect and calls to update_rq_clock() are being ignored.
+ *
+ * %RQCF_UPDATED - is a debug flag that indicates whether a call has been
+ *  made to update_rq_clock() since the last time rq::lock was pinned.
+ *
+ * If inside of __schedule(), clock_update_flags will have been
+ * shifted left (a left shift is a cheap operation for the fast path
+ * to promote %RQCF_REQ_SKIP to %RQCF_ACT_SKIP), so you must use,
+ *
+ *	if (rq->clock_update_flags >= RQCF_UPDATED)
+ *
+ * to check if %RQCF_UPDATED is set. It'll never be shifted more than
+ * one position though, because the next rq_unpin_lock() will shift it
+ * back.
+ */
+#define RQCF_REQ_SKIP	0x01
+#define RQCF_ACT_SKIP	0x02
+#define RQCF_UPDATED	0x04
+
+static inline void assert_clock_updated(struct rq *rq)
+{
+#ifdef CONFIG_SCHED_DEBUG
+	/*
+	 * The only reason for not seeing a clock update since the
+	 * last rq_pin_lock() is if we're currently skipping updates.
+	 */
+	WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
+#endif
+}
+
 static inline u64 rq_clock(struct rq *rq)
 {
 	lockdep_assert_held(&rq->lock);
+	assert_clock_updated(rq);
+
 	return rq->clock;
 }
 
 static inline u64 rq_clock_task(struct rq *rq)
 {
 	lockdep_assert_held(&rq->lock);
+	assert_clock_updated(rq);
+
 	return rq->clock_task;
 }
 
-#define RQCF_REQ_SKIP	0x01
-#define RQCF_ACT_SKIP	0x02
-
 static inline void rq_clock_skip_update(struct rq *rq, bool skip)
 {
 	lockdep_assert_held(&rq->lock);
 	if (skip)
-		rq->clock_skip_update |= RQCF_REQ_SKIP;
+		rq->clock_update_flags |= RQCF_REQ_SKIP;
 	else
-		rq->clock_skip_update &= ~RQCF_REQ_SKIP;
+		rq->clock_update_flags &= ~RQCF_REQ_SKIP;
 }
 
 struct rq_flags {
 	unsigned long flags;
 	struct pin_cookie cookie;
+#ifdef CONFIG_SCHED_DEBUG
+	/*
+	 * A copy of (rq::clock_update_flags & RQCF_UPDATED) for the
+	 * current pin context is stashed here in case it needs to be
+	 * restored in rq_repin_lock().
+	 */
+	unsigned int clock_update_flags;
+#endif
 };
 
 static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	rf->cookie = lockdep_pin_lock(&rq->lock);
+
+#ifdef CONFIG_SCHED_DEBUG
+	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
+	rf->clock_update_flags = 0;
+#endif
 }
 
 static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
 {
+#ifdef CONFIG_SCHED_DEBUG
+	if (rq->clock_update_flags > RQCF_ACT_SKIP)
+		rf->clock_update_flags = RQCF_UPDATED;
+#endif
+
 	lockdep_unpin_lock(&rq->lock, rf->cookie);
 }
 
 static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	lockdep_repin_lock(&rq->lock, rf->cookie);
+
+#ifdef CONFIG_SCHED_DEBUG
+	/*
+	 * Restore the value we stashed in @rf for this pin context.
+	 */
+	rq->clock_update_flags |= rf->clock_update_flags;
+#endif
 }
 
 #ifdef CONFIG_NUMA
-- 
2.9.3

* Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock()
  2016-09-21 13:38 ` [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock() Matt Fleming
@ 2016-09-21 15:58   ` Petr Mladek
  2016-09-21 19:08     ` Matt Fleming
  2016-09-22  8:04     ` Peter Zijlstra
  2017-01-14 12:44   ` [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls tip-bot for Matt Fleming
  1 sibling, 2 replies; 44+ messages in thread
From: Petr Mladek @ 2016-09-21 15:58 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, Ingo Molnar, Byungchul Park, Frederic Weisbecker,
	Luca Abeni, Rik van Riel, Thomas Gleixner, Wanpeng Li, Yuyang Du,
	Jan Kara, Sergey Senozhatsky, linux-kernel, Mel Gorman,
	Mike Galbraith

On Wed 2016-09-21 14:38:13, Matt Fleming wrote:
> There's no diagnostic checks for figuring out when we've accidentally
> missed update_rq_clock() calls. Let's add some by piggybacking on the
> rq_*pin_lock() wrappers.
> 
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index bf48e7975c23..91f4b3d58d56 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> +/*
> + * rq::clock_update_flags bits
> + *
> + * %RQCF_REQ_SKIP - will request skipping of clock update on the next
> + *  call to __schedule(). This is an optimisation to avoid
> + *  neighbouring rq clock updates.
> + *
> + * %RQCF_ACT_SKIP - is set from inside of __schedule() when skipping is
> + *  in effect and calls to update_rq_clock() are being ignored.
> + *
> + * %RQCF_UPDATED - is a debug flag that indicates whether a call has been
> + *  made to update_rq_clock() since the last time rq::lock was pinned.
> + *
> + * If inside of __schedule(), clock_update_flags will have been
> + * shifted left (a left shift is a cheap operation for the fast path
> + * to promote %RQCF_REQ_SKIP to %RQCF_ACT_SKIP), so you must use,
> + *
> + *	if (rq->clock_update_flags >= RQCF_UPDATED)
> + *
> + * to check if %RQCF_UPDATED is set. It'll never be shifted more than
> + * one position though, because the next rq_unpin_lock() will shift it
> + * back.
> + */
> +#define RQCF_REQ_SKIP	0x01
> +#define RQCF_ACT_SKIP	0x02
> +#define RQCF_UPDATED	0x04
> +
> +static inline void assert_clock_updated(struct rq *rq)
> +{
> +#ifdef CONFIG_SCHED_DEBUG
> +	/*
> +	 * The only reason for not seeing a clock update since the
> +	 * last rq_pin_lock() is if we're currently skipping updates.
> +	 */
> +	WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
> +#endif
> +}

I am afraid that it might eventually create a deadlock.
For example, there is the following call chain:

+ printk()
  + vprintk_func -> vprintk_default()
    + vprinkt_emit()
      + console_unlock()
        + up_console_sem()
	  + up()		# takes &sem->lock
	    + __up()
	      + wake_up_process()
	        + try_to_wake_up()
		  + ttwu_queue()
		    + ttwu_do_activate()
		      + ttwu_do_wakeup()
		        + rq_clock()
			  + lockdep_assert_held()
			    + WARN_ON_ONCE()
			      + printk()
			        + vprintk_func -> vprintk_default()
				  + vprintk_emit()
				    + console_try_lock()
				      + down_trylock_console_sem()
				        + __down_trylock_console_sem()
					  + down_trylock()

   DEADLOCK: Unable to take &sem->lock


We have recently discussed a similar deadlock; see the thread
around https://lkml.kernel.org/r/20160714221251.GE3057@ubuntu

A temporary solution would be to replace the WARN_ON_ONCE()
by printk_deferred(). Of course, this is far from ideal because
you do not get the stack, ...
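
For illustration only (just a sketch, not something from the series):
such a replacement could keep the same flag check and report via the
deferred path, at the cost of the backtrace:

	static inline void assert_clock_updated(struct rq *rq)
	{
	#ifdef CONFIG_SCHED_DEBUG
		/* Sketch: printk_deferred_once() does not take console_sem here. */
		if (rq->clock_update_flags < RQCF_ACT_SKIP)
			printk_deferred_once("sched: rq clock read before update_rq_clock() on CPU%d\n",
					     cpu_of(rq));
	#endif
	}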

Sergey is working on WARN_ON_ONCE_DEFERRED() but it is not
an easy task.


>  static inline u64 rq_clock(struct rq *rq)
>  {
>  	lockdep_assert_held(&rq->lock);
> +	assert_clock_updated(rq);
> +
>  	return rq->clock;
>  }
>  

I am not sure how the above call chain is realistic. But adding
WARN_ON() into the scheduler paths is risky in general.

Best Regards,
Petr

* Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock()
  2016-09-21 15:58   ` Petr Mladek
@ 2016-09-21 19:08     ` Matt Fleming
  2016-09-21 19:46       ` Thomas Gleixner
  2016-09-22  0:44       ` Sergey Senozhatsky
  2016-09-22  8:04     ` Peter Zijlstra
  1 sibling, 2 replies; 44+ messages in thread
From: Matt Fleming @ 2016-09-21 19:08 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Peter Zijlstra, Ingo Molnar, Byungchul Park, Frederic Weisbecker,
	Luca Abeni, Rik van Riel, Thomas Gleixner, Wanpeng Li, Yuyang Du,
	Jan Kara, Sergey Senozhatsky, linux-kernel, Mel Gorman,
	Mike Galbraith

On Wed, 21 Sep, at 05:58:27PM, Petr Mladek wrote:
> 
> I am not sure how the above call chain is realistic. But adding
> WARN_ON() into the scheduler paths is risky in general.

It's not clear to me why this should be the case. WARN_ON() calls have
existed in the scheduler paths since forever.

If the new async printk patches make that impossible then surely they
need fixing, not the scheduler?

* Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock()
  2016-09-21 19:08     ` Matt Fleming
@ 2016-09-21 19:46       ` Thomas Gleixner
  2016-09-22  0:44       ` Sergey Senozhatsky
  1 sibling, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2016-09-21 19:46 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Petr Mladek, Peter Zijlstra, Ingo Molnar, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rik van Riel, Wanpeng Li,
	Yuyang Du, Jan Kara, Sergey Senozhatsky, linux-kernel,
	Mel Gorman, Mike Galbraith

On Wed, 21 Sep 2016, Matt Fleming wrote:

> On Wed, 21 Sep, at 05:58:27PM, Petr Mladek wrote:
> > 
> > I am not sure how the above call chain is realistic. But adding
> > WARN_ON() into the scheduler paths is risky in general.
> 
> It's not clear to me why this should be the case. WARN_ON() calls have
> existed in the scheduler paths since forever.

Everything which ends up in printk within a rq->lock held section has
been prone to deadlocks for a very long time. Guess why
printk_deferred (the former printk_sched) exists.

Thanks,

	tglx

* Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock()
  2016-09-21 19:08     ` Matt Fleming
  2016-09-21 19:46       ` Thomas Gleixner
@ 2016-09-22  0:44       ` Sergey Senozhatsky
  1 sibling, 0 replies; 44+ messages in thread
From: Sergey Senozhatsky @ 2016-09-22  0:44 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Petr Mladek, Peter Zijlstra, Ingo Molnar, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rik van Riel, Thomas Gleixner,
	Wanpeng Li, Yuyang Du, Jan Kara, Sergey Senozhatsky,
	linux-kernel, Mel Gorman, Mike Galbraith

Hello,

On (09/21/16 20:08), Matt Fleming wrote:
> On Wed, 21 Sep, at 05:58:27PM, Petr Mladek wrote:
> > 
> > I am not sure how the above call chain is realistic. But adding
> > WARN_ON() into the scheduler paths is risky in general.
> 
> It's not clear to me why this should be the case. WARN_ON() calls have
> existed in the scheduler paths since forever.
> 
> If the new async printk patches make that impossible then surely they
> need fixing, not the scheduler?

it's not specific to async printk, because printk already invokes the
scheduler via semaphore up().

	-ss

* Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock()
  2016-09-21 15:58   ` Petr Mladek
  2016-09-21 19:08     ` Matt Fleming
@ 2016-09-22  8:04     ` Peter Zijlstra
  2016-09-22  8:36       ` Jan Kara
  1 sibling, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2016-09-22  8:04 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Matt Fleming, Ingo Molnar, Byungchul Park, Frederic Weisbecker,
	Luca Abeni, Rik van Riel, Thomas Gleixner, Wanpeng Li, Yuyang Du,
	Jan Kara, Sergey Senozhatsky, linux-kernel, Mel Gorman,
	Mike Galbraith

On Wed, Sep 21, 2016 at 05:58:27PM +0200, Petr Mladek wrote:
> > +static inline void assert_clock_updated(struct rq *rq)
> > +{
> > +#ifdef CONFIG_SCHED_DEBUG
> > +	/*
> > +	 * The only reason for not seeing a clock update since the
> > +	 * last rq_pin_lock() is if we're currently skipping updates.
> > +	 */
> > +	WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
> > +#endif
> > +}
> 
> I am afraid that it might eventually create a deadlock.
> For example, there is the following call chain:
> 

Yeah, meh. There's already plenty WARNs in the sched code. The idea of
course being that they should not trigger. If they do, something
buggered already, so who bloody cares about a deadlock later ;-)

* Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock()
  2016-09-22  8:04     ` Peter Zijlstra
@ 2016-09-22  8:36       ` Jan Kara
  2016-09-22  9:39         ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Jan Kara @ 2016-09-22  8:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Petr Mladek, Matt Fleming, Ingo Molnar, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rik van Riel, Thomas Gleixner,
	Wanpeng Li, Yuyang Du, Jan Kara, Sergey Senozhatsky,
	linux-kernel, Mel Gorman, Mike Galbraith

On Thu 22-09-16 10:04:36, Peter Zijlstra wrote:
> On Wed, Sep 21, 2016 at 05:58:27PM +0200, Petr Mladek wrote:
> > > +static inline void assert_clock_updated(struct rq *rq)
> > > +{
> > > +#ifdef CONFIG_SCHED_DEBUG
> > > +	/*
> > > +	 * The only reason for not seeing a clock update since the
> > > +	 * last rq_pin_lock() is if we're currently skipping updates.
> > > +	 */
> > > +	WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
> > > +#endif
> > > +}
> > 
> > I am afraid that it might eventually create a deadlock.
> > For example, there is the following call chain:
> > 
> 
> Yeah, meh. There's already plenty WARNs in the sched code. The idea of
> course being that they should not trigger. If they do, something
> buggered already, so who bloody cares about a deadlock later ;-)

Yeah, the trouble is that you usually won't see the WARN message
before deadlocking. So WARN_ON in scheduler is usually equivalent to

	if (condition)
		while (1);

;) Not really helping debugging much...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

* Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock()
  2016-09-22  8:36       ` Jan Kara
@ 2016-09-22  9:39         ` Peter Zijlstra
  2016-09-22 10:17           ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2016-09-22  9:39 UTC (permalink / raw)
  To: Jan Kara
  Cc: Petr Mladek, Matt Fleming, Ingo Molnar, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rik van Riel, Thomas Gleixner,
	Wanpeng Li, Yuyang Du, Sergey Senozhatsky, linux-kernel,
	Mel Gorman, Mike Galbraith

On Thu, Sep 22, 2016 at 10:36:37AM +0200, Jan Kara wrote:
> On Thu 22-09-16 10:04:36, Peter Zijlstra wrote:
> > On Wed, Sep 21, 2016 at 05:58:27PM +0200, Petr Mladek wrote:
> > > > +static inline void assert_clock_updated(struct rq *rq)
> > > > +{
> > > > +#ifdef CONFIG_SCHED_DEBUG
> > > > +	/*
> > > > +	 * The only reason for not seeing a clock update since the
> > > > +	 * last rq_pin_lock() is if we're currently skipping updates.
> > > > +	 */
> > > > +	WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
> > > > +#endif
> > > > +}
> > > 
> > > I am afraid that it might eventually create a deadlock.
> > > For example, there is the following call chain:
> > > 
> > 
> > Yeah, meh. There's already plenty WARNs in the sched code. The idea of
> > course being that they should not trigger. If they do, something
> > buggered already, so who bloody cares about a deadlock later ;-)
> 
> Yeah, the trouble is that you usually won't see the WARN message
> before deadlocking. So WARN_ON in scheduler is usually equivalent to
> 
> 	if (condition)
> 		while (1);
> 
> ;) Not really helping debugging much...

Only if you use the normal piece of crap printk() cruft ;-)

I have the below patch that cures all its woes. Of course, the driver
model is still broken, but it was broken before this as well.
vprintk_emit() really should not be exposed and used directly.

bit banging the serial port is absolutely awesome, the rest of printk
not so much.

---
 kernel/printk/printk.c | 63 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 42 insertions(+), 21 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index eea6dbc2d8cf..24951ed47835 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -344,6 +344,42 @@ __packed __aligned(4)
 #endif
 ;
 
+#ifdef CONFIG_EARLY_PRINTK
+struct console *early_console;
+
+static bool __read_mostly force_early_printk;
+
+static int __init force_early_printk_setup(char *str)
+{
+	force_early_printk = true;
+	return 0;
+}
+early_param("force_early_printk", force_early_printk_setup);
+
+static int early_vprintk(const char *fmt, va_list args)
+{
+	char buf[512];
+	int n;
+
+	n = vscnprintf(buf, sizeof(buf), fmt, args);
+	early_console->write(early_console, buf, n);
+
+	return n;
+}
+
+asmlinkage __visible void early_printk(const char *fmt, ...)
+{
+	va_list ap;
+
+	if (!early_console)
+		return;
+
+	va_start(ap, fmt);
+	early_vprintk(fmt, ap);
+	va_end(ap);
+}
+#endif
+
 /*
  * The logbuf_lock protects kmsg buffer, indices, counters.  This can be taken
  * within the scheduler's rq lock. It must be released before calling
@@ -1973,7 +2009,12 @@ asmlinkage __visible int printk(const char *fmt, ...)
 	int r;
 
 	va_start(args, fmt);
-	r = vprintk_func(fmt, args);
+#ifdef CONFIG_EARLY_PRINTK
+	if (force_early_printk && early_console)
+		r = early_vprintk(fmt, args);
+	else
+#endif
+		r = vprintk_func(fmt, args);
 	va_end(args);
 
 	return r;
@@ -2023,26 +2064,6 @@ DEFINE_PER_CPU(printk_func_t, printk_func);
 
 #endif /* CONFIG_PRINTK */
 
-#ifdef CONFIG_EARLY_PRINTK
-struct console *early_console;
-
-asmlinkage __visible void early_printk(const char *fmt, ...)
-{
-	va_list ap;
-	char buf[512];
-	int n;
-
-	if (!early_console)
-		return;
-
-	va_start(ap, fmt);
-	n = vscnprintf(buf, sizeof(buf), fmt, ap);
-	va_end(ap);
-
-	early_console->write(early_console, buf, n);
-}
-#endif
-
 static int __add_preferred_console(char *name, int idx, char *options,
 				   char *brl_options)
 {

* Re: [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock()
  2016-09-22  9:39         ` Peter Zijlstra
@ 2016-09-22 10:17           ` Peter Zijlstra
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2016-09-22 10:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: Petr Mladek, Matt Fleming, Ingo Molnar, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rik van Riel, Thomas Gleixner,
	Wanpeng Li, Yuyang Du, Sergey Senozhatsky, linux-kernel,
	Mel Gorman, Mike Galbraith

On Thu, Sep 22, 2016 at 11:39:09AM +0200, Peter Zijlstra wrote:
> bit banging the serial port is absolutely awesome, the rest of printk
> not so much.

I also have this second patch that goes on top of this.

I think I had more hacks (like using the per-cpu NMI buffers for output
instead of the on-stack thing), but I cannot seem to find them just now.

---
 kernel/printk/printk.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -356,14 +356,28 @@ static int __init force_early_printk_set
 }
 early_param("force_early_printk", force_early_printk_setup);
 
+static int early_printk_cpu = -1;
+
 static int early_vprintk(const char *fmt, va_list args)
 {
+	int n, cpu, old;
 	char buf[512];
-	int n;
+
+	cpu = get_cpu();
+	for (;;) {
+		old = cmpxchg(&early_printk_cpu, -1, cpu);
+		if (old == -1 || old == cpu)
+			break;
+
+		cpu_relax();
+	}
 
 	n = vscnprintf(buf, sizeof(buf), fmt, args);
 	early_console->write(early_console, buf, n);
 
+	smp_store_release(&early_printk_cpu, old);
+	put_cpu();
+
 	return n;
 }
 

* Re: [PATCH v2 1/7] sched/fair: Update the rq clock before detaching tasks
  2016-09-21 13:38 ` [PATCH v2 1/7] sched/fair: Update the rq clock before detaching tasks Matt Fleming
@ 2016-10-03 12:49   ` Peter Zijlstra
  2016-10-03 14:37     ` Matt Fleming
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2016-10-03 12:49 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Ingo Molnar, Byungchul Park, Frederic Weisbecker, Luca Abeni,
	Rik van Riel, Thomas Gleixner, Wanpeng Li, Yuyang Du,
	Petr Mladek, Jan Kara, Sergey Senozhatsky, linux-kernel,
	Mel Gorman, Mike Galbraith

On Wed, Sep 21, 2016 at 02:38:07PM +0100, Matt Fleming wrote:
> detach_task_cfs_rq() may indirectly call rq_clock() to inform the
> cpufreq code that the rq utilisation has changed. In which case, we
> need to update the rq clock.

Hurm, so it would've been good to know the callchain that got you
there.

There's two functions that use detach_task_cfs_rq(), one is through
sched_change_group() and that does indeed lack a rq_clock update.

The other is through switched_from() where it's far harder (but still
possible afaict) to miss the update.


Now, neither case is really a fast path, but it would be good to try
and avoid too many update_rq_clock() calls in the same rq-lock section.
So I'm not entirely sure about the placement here.

But let me go stare at the actual debug framework thing first.. I think
this patch is fallout/fixups from that.
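
The sched_change_group() callchain in question is roughly:

	sched_move_task()
	  task_rq_lock(tsk, &rf)
	  sched_change_group(tsk, TASK_MOVE_GROUP)
	    task_change_group_fair()
	      detach_task_cfs_rq(tsk)		/* ends up reading rq_clock() */
	  task_rq_unlock(rq, tsk, &rf)

so a minimal sketch of one possible placement (assuming the update is done
once at the top of the locked section rather than in detach_task_cfs_rq()
itself) would be:

	rq = task_rq_lock(tsk, &rf);
	update_rq_clock(rq);
	sched_change_group(tsk, TASK_MOVE_GROUP);
	task_rq_unlock(rq, tsk, &rf);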

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/7] sched/fair: Update the rq clock before detaching tasks
  2016-10-03 12:49   ` Peter Zijlstra
@ 2016-10-03 14:37     ` Matt Fleming
  2016-10-03 14:42       ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Matt Fleming @ 2016-10-03 14:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Byungchul Park, Frederic Weisbecker, Luca Abeni,
	Rik van Riel, Thomas Gleixner, Wanpeng Li, Yuyang Du,
	Petr Mladek, Jan Kara, Sergey Senozhatsky, linux-kernel,
	Mel Gorman, Mike Galbraith

On Mon, 03 Oct, at 02:49:07PM, Peter Zijlstra wrote:
> On Wed, Sep 21, 2016 at 02:38:07PM +0100, Matt Fleming wrote:
> > detach_task_cfs_rq() may indirectly call rq_clock() to inform the
> > cpufreq code that the rq utilisation has changed. In which case, we
> > need to update the rq clock.
> 
> > Hurm, so it would've been good to know the callchain that got you
> there.
> 
> There's two functions that use detach_task_cfs_rq(), one is through
> sched_change_group() and that does indeed lack a rq_clock update.
> 
> > The other is through switched_from() where it's far harder (but still
> possible afaict) to miss the update.
 
It was the former callchain.

> Now, neither case is really a fast path, but it would be good to try
> and avoid too many update_rq_clock() calls in the same rq-lock section.
> So I'm not entirely sure about the placement here.
> 
> But let me go stare at the actual debug framework thing first.. I think
> this patch is fallout/fixups from that.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 1/7] sched/fair: Update the rq clock before detaching tasks
  2016-10-03 14:37     ` Matt Fleming
@ 2016-10-03 14:42       ` Peter Zijlstra
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2016-10-03 14:42 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Ingo Molnar, Byungchul Park, Frederic Weisbecker, Luca Abeni,
	Rik van Riel, Thomas Gleixner, Wanpeng Li, Yuyang Du,
	Petr Mladek, Jan Kara, Sergey Senozhatsky, linux-kernel,
	Mel Gorman, Mike Galbraith

On Mon, Oct 03, 2016 at 03:37:45PM +0100, Matt Fleming wrote:
> On Mon, 03 Oct, at 02:49:07PM, Peter Zijlstra wrote:

> > The other is through switched_from() where it's far harder (but still
> > possible afaict) to miss the update.
>  
> It was the former callchain.

Yep, just found it ;-)

I seem to hit a few you didn't as well.. let me prod at this a wee bit
more before I add more asserts..


WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq+0x6fe/0x930
rq->clock_update_flags < RQCF_ACT_SKIP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-00637-g67223e2-dirty #553
Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
 ffffc900000cbc00 ffffffff816152c5 ffffc900000cbc50 0000000000000000
 ffffc900000cbc40 ffffffff810d5bab 0000031d00017b00 ffff88042f817b68
 ffff88042dbb0000 ffff88042f817b00 ffff88042dbb0000 ffffffff81c18900
Call Trace:
 [<ffffffff816152c5>] dump_stack+0x67/0x92
 [<ffffffff810d5bab>] __warn+0xcb/0xf0
 [<ffffffff810d5c1f>] warn_slowpath_fmt+0x4f/0x60
 [<ffffffff8110efae>] detach_task_cfs_rq+0x6fe/0x930
 [<ffffffff8110f1f1>] switched_from_fair+0x11/0x20
 [<ffffffff810fde77>] __sched_setscheduler+0x2a7/0xb40
 [<ffffffff810fe779>] _sched_setscheduler+0x69/0x70
 [<ffffffff810ff243>] sched_set_stop_task+0x53/0x90
 [<ffffffff81173703>] cpu_stop_create+0x23/0x30
 [<ffffffff810f90c0>] __smpboot_create_thread.part.2+0xb0/0x100
 [<ffffffff810f91ef>] smpboot_register_percpu_thread_cpumask+0xdf/0x140
 [<ffffffff823c24e7>] ? pid_namespaces_init+0x40/0x40
 [<ffffffff823c254b>] cpu_stop_init+0x64/0x9b
 [<ffffffff8100040d>] do_one_initcall+0x3d/0x150
 [<ffffffff8107763d>] ? print_cpu_info+0x7d/0xe0
 [<ffffffff823a0001>] kernel_init_freeable+0xcc/0x207
 [<ffffffff81a7d8d0>] ? rest_init+0x90/0x90
 [<ffffffff81a7d8de>] kernel_init+0xe/0x100
 [<ffffffff81a89bc7>] ret_from_fork+0x27/0x40
---[ end trace 90bea7c93d2289cb ]---

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [tip:sched/core] sched/core: Add wrappers for lockdep_(un)pin_lock()
  2016-09-21 13:38 ` [PATCH v2 4/7] sched: Add wrappers for lockdep_(un)pin_lock() Matt Fleming
@ 2017-01-14 12:40   ` tip-bot for Matt Fleming
  0 siblings, 0 replies; 44+ messages in thread
From: tip-bot for Matt Fleming @ 2017-01-14 12:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: efault, riel, peterz, pmladek, luca.abeni, tglx, torvalds,
	byungchul.park, jack, mingo, hpa, mgorman, fweisbec,
	sergey.senozhatsky.work, yuyang.du, umgwanakikbuti, matt,
	wanpeng.li, linux-kernel

Commit-ID:  d8ac897137a230ec351269f6378017f2decca512
Gitweb:     http://git.kernel.org/tip/d8ac897137a230ec351269f6378017f2decca512
Author:     Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate: Wed, 21 Sep 2016 14:38:10 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 11:29:30 +0100

sched/core: Add wrappers for lockdep_(un)pin_lock()

In preparation for adding diagnostic checks to catch missing calls to
update_rq_clock(), provide wrappers for (re)pinning and unpinning
rq->lock.

Because the pending diagnostic checks allow state to be maintained in
rq_flags across pin contexts, swap the 'struct pin_cookie' arguments
for 'struct rq_flags *'.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@unitn.it>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yuyang Du <yuyang.du@intel.com>
Link: http://lkml.kernel.org/r/20160921133813.31976-5-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c      | 80 ++++++++++++++++++++++++------------------------
 kernel/sched/deadline.c  | 10 +++---
 kernel/sched/fair.c      |  6 ++--
 kernel/sched/idle_task.c |  2 +-
 kernel/sched/rt.c        |  6 ++--
 kernel/sched/sched.h     | 31 ++++++++++++++-----
 kernel/sched/stop_task.c |  2 +-
 7 files changed, 76 insertions(+), 61 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c56fb57..41df935 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -185,7 +185,7 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 		rq = task_rq(p);
 		raw_spin_lock(&rq->lock);
 		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
-			rf->cookie = lockdep_pin_lock(&rq->lock);
+			rq_pin_lock(rq, rf);
 			return rq;
 		}
 		raw_spin_unlock(&rq->lock);
@@ -225,7 +225,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 		 * pair with the WMB to ensure we must then also see migrating.
 		 */
 		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
-			rf->cookie = lockdep_pin_lock(&rq->lock);
+			rq_pin_lock(rq, rf);
 			return rq;
 		}
 		raw_spin_unlock(&rq->lock);
@@ -1195,9 +1195,9 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		 * OK, since we're going to drop the lock immediately
 		 * afterwards anyway.
 		 */
-		lockdep_unpin_lock(&rq->lock, rf.cookie);
+		rq_unpin_lock(rq, &rf);
 		rq = move_queued_task(rq, p, dest_cpu);
-		lockdep_repin_lock(&rq->lock, rf.cookie);
+		rq_repin_lock(rq, &rf);
 	}
 out:
 	task_rq_unlock(rq, p, &rf);
@@ -1690,7 +1690,7 @@ static inline void ttwu_activate(struct rq *rq, struct task_struct *p, int en_fl
  * Mark the task runnable and perform wakeup-preemption.
  */
 static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
-			   struct pin_cookie cookie)
+			   struct rq_flags *rf)
 {
 	check_preempt_curr(rq, p, wake_flags);
 	p->state = TASK_RUNNING;
@@ -1702,9 +1702,9 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
 		 * Our task @p is fully woken up and running; so its safe to
 		 * drop the rq->lock, hereafter rq is only used for statistics.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		p->sched_class->task_woken(rq, p);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 	}
 
 	if (rq->idle_stamp) {
@@ -1723,7 +1723,7 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
 
 static void
 ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
-		 struct pin_cookie cookie)
+		 struct rq_flags *rf)
 {
 	int en_flags = ENQUEUE_WAKEUP;
 
@@ -1738,7 +1738,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 #endif
 
 	ttwu_activate(rq, p, en_flags);
-	ttwu_do_wakeup(rq, p, wake_flags, cookie);
+	ttwu_do_wakeup(rq, p, wake_flags, rf);
 }
 
 /*
@@ -1757,7 +1757,7 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 	if (task_on_rq_queued(p)) {
 		/* check_preempt_curr() may use rq clock */
 		update_rq_clock(rq);
-		ttwu_do_wakeup(rq, p, wake_flags, rf.cookie);
+		ttwu_do_wakeup(rq, p, wake_flags, &rf);
 		ret = 1;
 	}
 	__task_rq_unlock(rq, &rf);
@@ -1770,15 +1770,15 @@ void sched_ttwu_pending(void)
 {
 	struct rq *rq = this_rq();
 	struct llist_node *llist = llist_del_all(&rq->wake_list);
-	struct pin_cookie cookie;
 	struct task_struct *p;
 	unsigned long flags;
+	struct rq_flags rf;
 
 	if (!llist)
 		return;
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
-	cookie = lockdep_pin_lock(&rq->lock);
+	rq_pin_lock(rq, &rf);
 
 	while (llist) {
 		int wake_flags = 0;
@@ -1789,10 +1789,10 @@ void sched_ttwu_pending(void)
 		if (p->sched_remote_wakeup)
 			wake_flags = WF_MIGRATED;
 
-		ttwu_do_activate(rq, p, wake_flags, cookie);
+		ttwu_do_activate(rq, p, wake_flags, &rf);
 	}
 
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_unpin_lock(rq, &rf);
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
 
@@ -1881,7 +1881,7 @@ bool cpus_share_cache(int this_cpu, int that_cpu)
 static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
 {
 	struct rq *rq = cpu_rq(cpu);
-	struct pin_cookie cookie;
+	struct rq_flags rf;
 
 #if defined(CONFIG_SMP)
 	if (sched_feat(TTWU_QUEUE) && !cpus_share_cache(smp_processor_id(), cpu)) {
@@ -1892,9 +1892,9 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
 #endif
 
 	raw_spin_lock(&rq->lock);
-	cookie = lockdep_pin_lock(&rq->lock);
-	ttwu_do_activate(rq, p, wake_flags, cookie);
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_pin_lock(rq, &rf);
+	ttwu_do_activate(rq, p, wake_flags, &rf);
+	rq_unpin_lock(rq, &rf);
 	raw_spin_unlock(&rq->lock);
 }
 
@@ -2111,7 +2111,7 @@ out:
  * ensure that this_rq() is locked, @p is bound to this_rq() and not
  * the current task.
  */
-static void try_to_wake_up_local(struct task_struct *p, struct pin_cookie cookie)
+static void try_to_wake_up_local(struct task_struct *p, struct rq_flags *rf)
 {
 	struct rq *rq = task_rq(p);
 
@@ -2128,11 +2128,11 @@ static void try_to_wake_up_local(struct task_struct *p, struct pin_cookie cookie
 		 * disabled avoiding further scheduler activity on it and we've
 		 * not yet picked a replacement task.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		raw_spin_unlock(&rq->lock);
 		raw_spin_lock(&p->pi_lock);
 		raw_spin_lock(&rq->lock);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 	}
 
 	if (!(p->state & TASK_NORMAL))
@@ -2143,7 +2143,7 @@ static void try_to_wake_up_local(struct task_struct *p, struct pin_cookie cookie
 	if (!task_on_rq_queued(p))
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
 
-	ttwu_do_wakeup(rq, p, 0, cookie);
+	ttwu_do_wakeup(rq, p, 0, rf);
 	ttwu_stat(p, smp_processor_id(), 0);
 out:
 	raw_spin_unlock(&p->pi_lock);
@@ -2590,9 +2590,9 @@ void wake_up_new_task(struct task_struct *p)
 		 * Nothing relies on rq->lock after this, so its fine to
 		 * drop it.
 		 */
-		lockdep_unpin_lock(&rq->lock, rf.cookie);
+		rq_unpin_lock(rq, &rf);
 		p->sched_class->task_woken(rq, p);
-		lockdep_repin_lock(&rq->lock, rf.cookie);
+		rq_repin_lock(rq, &rf);
 	}
 #endif
 	task_rq_unlock(rq, p, &rf);
@@ -2861,7 +2861,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
  */
 static __always_inline struct rq *
 context_switch(struct rq *rq, struct task_struct *prev,
-	       struct task_struct *next, struct pin_cookie cookie)
+	       struct task_struct *next, struct rq_flags *rf)
 {
 	struct mm_struct *mm, *oldmm;
 
@@ -2893,7 +2893,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 * of the scheduler it's an obvious special-case), so we
 	 * do an early lockdep release here:
 	 */
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_unpin_lock(rq, rf);
 	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
 
 	/* Here we just switch the register state and the stack. */
@@ -3257,7 +3257,7 @@ static inline void schedule_debug(struct task_struct *prev)
  * Pick up the highest-prio task:
  */
 static inline struct task_struct *
-pick_next_task(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	const struct sched_class *class = &fair_sched_class;
 	struct task_struct *p;
@@ -3268,20 +3268,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie
 	 */
 	if (likely(prev->sched_class == class &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
-		p = fair_sched_class.pick_next_task(rq, prev, cookie);
+		p = fair_sched_class.pick_next_task(rq, prev, rf);
 		if (unlikely(p == RETRY_TASK))
 			goto again;
 
 		/* assumes fair_sched_class->next == idle_sched_class */
 		if (unlikely(!p))
-			p = idle_sched_class.pick_next_task(rq, prev, cookie);
+			p = idle_sched_class.pick_next_task(rq, prev, rf);
 
 		return p;
 	}
 
 again:
 	for_each_class(class) {
-		p = class->pick_next_task(rq, prev, cookie);
+		p = class->pick_next_task(rq, prev, rf);
 		if (p) {
 			if (unlikely(p == RETRY_TASK))
 				goto again;
@@ -3335,7 +3335,7 @@ static void __sched notrace __schedule(bool preempt)
 {
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
-	struct pin_cookie cookie;
+	struct rq_flags rf;
 	struct rq *rq;
 	int cpu;
 
@@ -3358,7 +3358,7 @@ static void __sched notrace __schedule(bool preempt)
 	 */
 	smp_mb__before_spinlock();
 	raw_spin_lock(&rq->lock);
-	cookie = lockdep_pin_lock(&rq->lock);
+	rq_pin_lock(rq, &rf);
 
 	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
 
@@ -3380,7 +3380,7 @@ static void __sched notrace __schedule(bool preempt)
 
 				to_wakeup = wq_worker_sleeping(prev);
 				if (to_wakeup)
-					try_to_wake_up_local(to_wakeup, cookie);
+					try_to_wake_up_local(to_wakeup, &rf);
 			}
 		}
 		switch_count = &prev->nvcsw;
@@ -3389,7 +3389,7 @@ static void __sched notrace __schedule(bool preempt)
 	if (task_on_rq_queued(prev))
 		update_rq_clock(rq);
 
-	next = pick_next_task(rq, prev, cookie);
+	next = pick_next_task(rq, prev, &rf);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
 	rq->clock_skip_update = 0;
@@ -3400,9 +3400,9 @@ static void __sched notrace __schedule(bool preempt)
 		++*switch_count;
 
 		trace_sched_switch(preempt, prev, next);
-		rq = context_switch(rq, prev, next, cookie); /* unlocks the rq */
+		rq = context_switch(rq, prev, next, &rf); /* unlocks the rq */
 	} else {
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock_irq(&rq->lock);
 	}
 
@@ -5521,7 +5521,7 @@ static void migrate_tasks(struct rq *dead_rq)
 {
 	struct rq *rq = dead_rq;
 	struct task_struct *next, *stop = rq->stop;
-	struct pin_cookie cookie;
+	struct rq_flags rf;
 	int dest_cpu;
 
 	/*
@@ -5553,8 +5553,8 @@ static void migrate_tasks(struct rq *dead_rq)
 		/*
 		 * pick_next_task assumes pinned rq->lock.
 		 */
-		cookie = lockdep_pin_lock(&rq->lock);
-		next = pick_next_task(rq, &fake_task, cookie);
+		rq_pin_lock(rq, &rf);
+		next = pick_next_task(rq, &fake_task, &rf);
 		BUG_ON(!next);
 		next->sched_class->put_prev_task(rq, next);
 
@@ -5567,7 +5567,7 @@ static void migrate_tasks(struct rq *dead_rq)
 		 * because !cpu_active at this point, which means load-balance
 		 * will not interfere. Also, stop-machine.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock(&rq->lock);
 		raw_spin_lock(&next->pi_lock);
 		raw_spin_lock(&rq->lock);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 70ef2b1..491ff66 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -663,9 +663,9 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 		 * Nothing relies on rq->lock after this, so its safe to drop
 		 * rq->lock.
 		 */
-		lockdep_unpin_lock(&rq->lock, rf.cookie);
+		rq_unpin_lock(rq, &rf);
 		push_dl_task(rq);
-		lockdep_repin_lock(&rq->lock, rf.cookie);
+		rq_repin_lock(rq, &rf);
 	}
 #endif
 
@@ -1118,7 +1118,7 @@ static struct sched_dl_entity *pick_next_dl_entity(struct rq *rq,
 }
 
 struct task_struct *
-pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct sched_dl_entity *dl_se;
 	struct task_struct *p;
@@ -1133,9 +1133,9 @@ pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct pin_cookie coo
 		 * disabled avoiding further scheduler activity on it and we're
 		 * being very careful to re-start the picking loop.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		pull_dl_task(rq);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 		/*
 		 * pull_dl_task() can drop (and re-acquire) rq->lock; this
 		 * means a stop task can slip in, in which case we need to
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6559d19..4904412 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6213,7 +6213,7 @@ preempt:
 }
 
 static struct task_struct *
-pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct cfs_rq *cfs_rq = &rq->cfs;
 	struct sched_entity *se;
@@ -6326,9 +6326,9 @@ idle:
 	 * further scheduler activity on it and we're being very careful to
 	 * re-start the picking loop.
 	 */
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_unpin_lock(rq, rf);
 	new_tasks = idle_balance(rq);
-	lockdep_repin_lock(&rq->lock, cookie);
+	rq_repin_lock(rq, rf);
 	/*
 	 * Because idle_balance() releases (and re-acquires) rq->lock, it is
 	 * possible for any higher priority task to appear. In that case we
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 5405d3f..0c00172 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -24,7 +24,7 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 }
 
 static struct task_struct *
-pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	put_prev_task(rq, prev);
 	update_idle_core(rq);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2516b8d..88254be 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1523,7 +1523,7 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
 }
 
 static struct task_struct *
-pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct task_struct *p;
 	struct rt_rq *rt_rq = &rq->rt;
@@ -1535,9 +1535,9 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct pin_cookie coo
 		 * disabled avoiding further scheduler activity on it and we're
 		 * being very careful to re-start the picking loop.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		pull_rt_task(rq);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 		/*
 		 * pull_rt_task() can drop (and re-acquire) rq->lock; this
 		 * means a dl or stop task can slip in, in which case we need
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7b34c78..98e7eee 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -792,6 +792,26 @@ static inline void rq_clock_skip_update(struct rq *rq, bool skip)
 		rq->clock_skip_update &= ~RQCF_REQ_SKIP;
 }
 
+struct rq_flags {
+	unsigned long flags;
+	struct pin_cookie cookie;
+};
+
+static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
+{
+	rf->cookie = lockdep_pin_lock(&rq->lock);
+}
+
+static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
+{
+	lockdep_unpin_lock(&rq->lock, rf->cookie);
+}
+
+static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
+{
+	lockdep_repin_lock(&rq->lock, rf->cookie);
+}
+
 #ifdef CONFIG_NUMA
 enum numa_topology_type {
 	NUMA_DIRECT,
@@ -1245,7 +1265,7 @@ struct sched_class {
 	 */
 	struct task_struct * (*pick_next_task) (struct rq *rq,
 						struct task_struct *prev,
-						struct pin_cookie cookie);
+						struct rq_flags *rf);
 	void (*put_prev_task) (struct rq *rq, struct task_struct *p);
 
 #ifdef CONFIG_SMP
@@ -1501,11 +1521,6 @@ static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
 static inline void sched_avg_update(struct rq *rq) { }
 #endif
 
-struct rq_flags {
-	unsigned long flags;
-	struct pin_cookie cookie;
-};
-
 struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 	__acquires(rq->lock);
 struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
@@ -1515,7 +1530,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
 	__releases(rq->lock)
 {
-	lockdep_unpin_lock(&rq->lock, rf->cookie);
+	rq_unpin_lock(rq, rf);
 	raw_spin_unlock(&rq->lock);
 }
 
@@ -1524,7 +1539,7 @@ task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 	__releases(rq->lock)
 	__releases(p->pi_lock)
 {
-	lockdep_unpin_lock(&rq->lock, rf->cookie);
+	rq_unpin_lock(rq, rf);
 	raw_spin_unlock(&rq->lock);
 	raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
 }
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 604297a..9f69fb6 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -24,7 +24,7 @@ check_preempt_curr_stop(struct rq *rq, struct task_struct *p, int flags)
 }
 
 static struct task_struct *
-pick_next_task_stop(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_stop(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct task_struct *stop = rq->stop;
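
For callers, the conversion is mechanical; a typical rq->lock section ends up
looking roughly like this (illustrative sketch, not a specific call site):

	struct rq_flags rf;
	struct rq *rq;

	rq = task_rq_lock(p, &rf);	/* pins rq->lock via rq_pin_lock()   */

	/* code that must drop the pin around a lock dance */
	rq_unpin_lock(rq, &rf);
	/* ... drop and retake rq->lock ... */
	rq_repin_lock(rq, &rf);

	task_rq_unlock(rq, p, &rf);	/* rq_unpin_lock() + unlock          */

The pin cookie is now carried inside struct rq_flags, which is what lets the
later debug patch stash extra state across the unpin/repin.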
 

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [tip:sched/core] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock
  2016-09-21 13:38 ` [PATCH v2 5/7] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock Matt Fleming
@ 2017-01-14 12:41   ` tip-bot for Matt Fleming
  0 siblings, 0 replies; 44+ messages in thread
From: tip-bot for Matt Fleming @ 2017-01-14 12:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, hpa, umgwanakikbuti, efault, yuyang.du,
	sergey.senozhatsky.work, peterz, fweisbec, pmladek, luca.abeni,
	mgorman, byungchul.park, linux-kernel, wanpeng.li, tglx, riel,
	jack, mingo, matt

Commit-ID:  92509b732baf14c59ca702307270cfaa3a585ae7
Gitweb:     http://git.kernel.org/tip/92509b732baf14c59ca702307270cfaa3a585ae7
Author:     Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate: Wed, 21 Sep 2016 14:38:11 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 11:29:31 +0100

sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock

rq_clock() is called from sched_info_{depart,arrive}() after resetting
RQCF_ACT_SKIP but prior to a call to update_rq_clock().

In preparation for pending patches that check whether the rq clock has
been updated inside of a pin context before rq_clock() is called, move
the reset of rq->clock_skip_update immediately before unpinning the rq
lock.

This will avoid the new warnings which check if update_rq_clock() is
being actively skipped.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@unitn.it>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yuyang Du <yuyang.du@intel.com>
Link: http://lkml.kernel.org/r/20160921133813.31976-6-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41df935..311460b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2887,6 +2887,9 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		prev->active_mm = NULL;
 		rq->prev_mm = oldmm;
 	}
+
+	rq->clock_skip_update = 0;
+
 	/*
 	 * Since the runqueue lock will be released by the next
 	 * task (which is an invalid locking op but in the case
@@ -3392,7 +3395,6 @@ static void __sched notrace __schedule(bool preempt)
 	next = pick_next_task(rq, prev, &rf);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
-	rq->clock_skip_update = 0;
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
@@ -3402,6 +3404,7 @@ static void __sched notrace __schedule(bool preempt)
 		trace_sched_switch(preempt, prev, next);
 		rq = context_switch(rq, prev, next, &rf); /* unlocks the rq */
 	} else {
+		rq->clock_skip_update = 0;
 		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock_irq(&rq->lock);
 	}
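
The ordering problem being avoided, sketched (simplified; assumes the usual
prepare_task_switch()/sched_info_switch() path inside context_switch()):

	old order (in __schedule()):
		rq->clock_skip_update = 0;		/* ACT_SKIP cleared */
		rq = context_switch(rq, prev, next, &rf);
		  -> sched_info_{depart,arrive}() -> rq_clock(rq)
		     /* no ACT_SKIP, no update_rq_clock(): pending assert fires */

	new order:
		context_switch() clears the flag only after the sched_info
		accounting, immediately before the rq lock is unpinned; the
		prev == next path clears it in __schedule() as before.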

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [tip:sched/core] sched/fair: Push rq lock pin/unpin into idle_balance()
  2016-09-21 13:38 ` [PATCH v2 6/7] sched/fair: Push rq lock pin/unpin into idle_balance() Matt Fleming
@ 2017-01-14 12:41   ` tip-bot for Matt Fleming
  0 siblings, 0 replies; 44+ messages in thread
From: tip-bot for Matt Fleming @ 2017-01-14 12:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: sergey.senozhatsky.work, yuyang.du, torvalds, pmladek,
	linux-kernel, wanpeng.li, riel, umgwanakikbuti, luca.abeni,
	fweisbec, byungchul.park, matt, jack, peterz, hpa, mgorman,
	mingo, efault, tglx

Commit-ID:  46f69fa33712ad12ccaa723e46ed5929ee93589b
Gitweb:     http://git.kernel.org/tip/46f69fa33712ad12ccaa723e46ed5929ee93589b
Author:     Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate: Wed, 21 Sep 2016 14:38:12 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 11:29:32 +0100

sched/fair: Push rq lock pin/unpin into idle_balance()

Future patches will emit warnings if rq_clock() is called before
update_rq_clock() inside a rq_pin_lock()/rq_unpin_lock() pair.

Since there is only one caller of idle_balance() we can push the
unpin/repin there.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@unitn.it>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yuyang Du <yuyang.du@intel.com>
Link: http://lkml.kernel.org/r/20160921133813.31976-7-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4904412..faf80e1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3424,7 +3424,7 @@ static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq)
 	return cfs_rq->avg.load_avg;
 }
 
-static int idle_balance(struct rq *this_rq);
+static int idle_balance(struct rq *this_rq, struct rq_flags *rf);
 
 #else /* CONFIG_SMP */
 
@@ -3453,7 +3453,7 @@ attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
 static inline void
 detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
 
-static inline int idle_balance(struct rq *rq)
+static inline int idle_balance(struct rq *rq, struct rq_flags *rf)
 {
 	return 0;
 }
@@ -6320,15 +6320,8 @@ simple:
 	return p;
 
 idle:
-	/*
-	 * This is OK, because current is on_cpu, which avoids it being picked
-	 * for load-balance and preemption/IRQs are still disabled avoiding
-	 * further scheduler activity on it and we're being very careful to
-	 * re-start the picking loop.
-	 */
-	rq_unpin_lock(rq, rf);
-	new_tasks = idle_balance(rq);
-	rq_repin_lock(rq, rf);
+	new_tasks = idle_balance(rq, rf);
+
 	/*
 	 * Because idle_balance() releases (and re-acquires) rq->lock, it is
 	 * possible for any higher priority task to appear. In that case we
@@ -8297,7 +8290,7 @@ update_next_balance(struct sched_domain *sd, unsigned long *next_balance)
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
  */
-static int idle_balance(struct rq *this_rq)
+static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
 {
 	unsigned long next_balance = jiffies + HZ;
 	int this_cpu = this_rq->cpu;
@@ -8311,6 +8304,14 @@ static int idle_balance(struct rq *this_rq)
 	 */
 	this_rq->idle_stamp = rq_clock(this_rq);
 
+	/*
+	 * This is OK, because current is on_cpu, which avoids it being picked
+	 * for load-balance and preemption/IRQs are still disabled avoiding
+	 * further scheduler activity on it and we're being very careful to
+	 * re-start the picking loop.
+	 */
+	rq_unpin_lock(this_rq, rf);
+
 	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
 	    !this_rq->rd->overload) {
 		rcu_read_lock();
@@ -8388,6 +8389,8 @@ out:
 	if (pulled_task)
 		this_rq->idle_stamp = 0;
 
+	rq_repin_lock(this_rq, rf);
+
 	return pulled_task;
 }
 

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2016-09-21 13:38 ` [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock() Matt Fleming
  2016-09-21 15:58   ` Petr Mladek
@ 2017-01-14 12:44   ` tip-bot for Matt Fleming
       [not found]     ` <87tw8gutp6.fsf@concordia.ellerman.id.au>
  1 sibling, 1 reply; 44+ messages in thread
From: tip-bot for Matt Fleming @ 2017-01-14 12:44 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: fweisbec, tglx, matt, pmladek, hpa, efault,
	sergey.senozhatsky.work, mgorman, peterz, wanpeng.li,
	umgwanakikbuti, byungchul.park, luca.abeni, riel, jack, mingo,
	torvalds, yuyang.du, linux-kernel

Commit-ID:  cb42c9a3ebbbb23448c3f9a25417fae6309b1a92
Gitweb:     http://git.kernel.org/tip/cb42c9a3ebbbb23448c3f9a25417fae6309b1a92
Author:     Matt Fleming <matt@codeblueprint.co.uk>
AuthorDate: Wed, 21 Sep 2016 14:38:13 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 11:29:35 +0100

sched/core: Add debugging code to catch missing update_rq_clock() calls

There's no diagnostic checks for figuring out when we've accidentally
missed update_rq_clock() calls. Let's add some by piggybacking on the
rq_*pin_lock() wrappers.

The idea behind the diagnostic checks is that upon pinning the rq lock the
rq clock should be updated, via update_rq_clock(), before anybody
reads the clock with rq_clock() or rq_clock_task().

The exception to this rule is when updates have explicitly been
disabled with the rq_clock_skip_update() optimisation.

There are some functions that only unpin the rq lock in order to grab
some other lock and avoid deadlock. In that case we don't need to
update the clock again and the previous diagnostic state can be
carried over in rq_repin_lock() by saving the state in the rq_flags
context.

Since this patch adds a new clock update flag and some already exist
in rq::clock_skip_update, that field has now been renamed. An attempt
has been made to keep the flag manipulation code small and fast since
it's used in the heart of the __schedule() fast path.

For the !CONFIG_SCHED_DEBUG case the only object code change (other
than addresses) is the following change to reset RQCF_ACT_SKIP inside
of __schedule(),

  -       c7 83 38 09 00 00 00    movl   $0x0,0x938(%rbx)
  -       00 00 00
  +       83 a3 38 09 00 00 fc    andl   $0xfffffffc,0x938(%rbx)

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@unitn.it>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yuyang Du <yuyang.du@intel.com>
Link: http://lkml.kernel.org/r/20160921133813.31976-8-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c  | 11 +++++---
 kernel/sched/sched.h | 74 +++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d233892..a129b34 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -102,9 +102,12 @@ void update_rq_clock(struct rq *rq)
 
 	lockdep_assert_held(&rq->lock);
 
-	if (rq->clock_skip_update & RQCF_ACT_SKIP)
+	if (rq->clock_update_flags & RQCF_ACT_SKIP)
 		return;
 
+#ifdef CONFIG_SCHED_DEBUG
+	rq->clock_update_flags |= RQCF_UPDATED;
+#endif
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
 	if (delta < 0)
 		return;
@@ -2889,7 +2892,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		rq->prev_mm = oldmm;
 	}
 
-	rq->clock_skip_update = 0;
+	rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
 
 	/*
 	 * Since the runqueue lock will be released by the next
@@ -3364,7 +3367,7 @@ static void __sched notrace __schedule(bool preempt)
 	raw_spin_lock(&rq->lock);
 	rq_pin_lock(rq, &rf);
 
-	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
+	rq->clock_update_flags <<= 1; /* promote REQ to ACT */
 
 	switch_count = &prev->nivcsw;
 	if (!preempt && prev->state) {
@@ -3405,7 +3408,7 @@ static void __sched notrace __schedule(bool preempt)
 		trace_sched_switch(preempt, prev, next);
 		rq = context_switch(rq, prev, next, &rf); /* unlocks the rq */
 	} else {
-		rq->clock_skip_update = 0;
+		rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
 		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock_irq(&rq->lock);
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 98e7eee..6eeae7e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -644,7 +644,7 @@ struct rq {
 	unsigned long next_balance;
 	struct mm_struct *prev_mm;
 
-	unsigned int clock_skip_update;
+	unsigned int clock_update_flags;
 	u64 clock;
 	u64 clock_task;
 
@@ -768,48 +768,110 @@ static inline u64 __rq_clock_broken(struct rq *rq)
 	return READ_ONCE(rq->clock);
 }
 
+/*
+ * rq::clock_update_flags bits
+ *
+ * %RQCF_REQ_SKIP - will request skipping of clock update on the next
+ *  call to __schedule(). This is an optimisation to avoid
+ *  neighbouring rq clock updates.
+ *
+ * %RQCF_ACT_SKIP - is set from inside of __schedule() when skipping is
+ *  in effect and calls to update_rq_clock() are being ignored.
+ *
+ * %RQCF_UPDATED - is a debug flag that indicates whether a call has been
+ *  made to update_rq_clock() since the last time rq::lock was pinned.
+ *
+ * If inside of __schedule(), clock_update_flags will have been
+ * shifted left (a left shift is a cheap operation for the fast path
+ * to promote %RQCF_REQ_SKIP to %RQCF_ACT_SKIP), so you must use,
+ *
+ *	if (rq->clock_update_flags >= RQCF_UPDATED)
+ *
+ * to check if %RQCF_UPDATED is set. It'll never be shifted more than
+ * one position though, because the next rq_unpin_lock() will shift it
+ * back.
+ */
+#define RQCF_REQ_SKIP	0x01
+#define RQCF_ACT_SKIP	0x02
+#define RQCF_UPDATED	0x04
+
+static inline void assert_clock_updated(struct rq *rq)
+{
+	/*
+	 * The only reason for not seeing a clock update since the
+	 * last rq_pin_lock() is if we're currently skipping updates.
+	 */
+	SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);
+}
+
 static inline u64 rq_clock(struct rq *rq)
 {
 	lockdep_assert_held(&rq->lock);
+	assert_clock_updated(rq);
+
 	return rq->clock;
 }
 
 static inline u64 rq_clock_task(struct rq *rq)
 {
 	lockdep_assert_held(&rq->lock);
+	assert_clock_updated(rq);
+
 	return rq->clock_task;
 }
 
-#define RQCF_REQ_SKIP	0x01
-#define RQCF_ACT_SKIP	0x02
-
 static inline void rq_clock_skip_update(struct rq *rq, bool skip)
 {
 	lockdep_assert_held(&rq->lock);
 	if (skip)
-		rq->clock_skip_update |= RQCF_REQ_SKIP;
+		rq->clock_update_flags |= RQCF_REQ_SKIP;
 	else
-		rq->clock_skip_update &= ~RQCF_REQ_SKIP;
+		rq->clock_update_flags &= ~RQCF_REQ_SKIP;
 }
 
 struct rq_flags {
 	unsigned long flags;
 	struct pin_cookie cookie;
+#ifdef CONFIG_SCHED_DEBUG
+	/*
+	 * A copy of (rq::clock_update_flags & RQCF_UPDATED) for the
+	 * current pin context is stashed here in case it needs to be
+	 * restored in rq_repin_lock().
+	 */
+	unsigned int clock_update_flags;
+#endif
 };
 
 static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	rf->cookie = lockdep_pin_lock(&rq->lock);
+
+#ifdef CONFIG_SCHED_DEBUG
+	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
+	rf->clock_update_flags = 0;
+#endif
 }
 
 static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
 {
+#ifdef CONFIG_SCHED_DEBUG
+	if (rq->clock_update_flags > RQCF_ACT_SKIP)
+		rf->clock_update_flags = RQCF_UPDATED;
+#endif
+
 	lockdep_unpin_lock(&rq->lock, rf->cookie);
 }
 
 static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	lockdep_repin_lock(&rq->lock, rf->cookie);
+
+#ifdef CONFIG_SCHED_DEBUG
+	/*
+	 * Restore the value we stashed in @rf for this pin context.
+	 */
+	rq->clock_update_flags |= rf->clock_update_flags;
+#endif
 }
 
 #ifdef CONFIG_NUMA
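
Taken together, the contract the new checks enforce on a typical rq->lock
section looks roughly like this (illustrative sketch, not a specific call
site):

	struct rq_flags rf;
	struct rq *rq;
	u64 now;

	rq = task_rq_lock(p, &rf);	/* rq_pin_lock(): RQCF_UPDATED cleared */

	update_rq_clock(rq);		/* sets RQCF_UPDATED under SCHED_DEBUG */

	now = rq_clock(rq);		/* assert_clock_updated() is satisfied */

	task_rq_unlock(rq, p, &rf);	/* rq_unpin_lock()                     */

Reading rq_clock() or rq_clock_task() before the update_rq_clock() call, and
without RQCF_ACT_SKIP in effect via rq_clock_skip_update(), is what produces
the "rq->clock_update_flags < RQCF_ACT_SKIP" warnings seen later in this
thread.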

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
       [not found]     ` <87tw8gutp6.fsf@concordia.ellerman.id.au>
@ 2017-01-30 21:34       ` Matt Fleming
  2017-01-31  8:35         ` Michael Ellerman
  2017-01-31 11:00         ` Sachin Sant
  0 siblings, 2 replies; 44+ messages in thread
From: Matt Fleming @ 2017-01-30 21:34 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: fweisbec, tglx, pmladek, hpa, efault, sergey.senozhatsky.work,
	peterz, mgorman, wanpeng.li, umgwanakikbuti, byungchul.park,
	jack, mingo, riel, luca.abeni, yuyang.du, torvalds, linux-kernel,
	linux-tip-commits, linuxppc-dev, linux-next

On Tue, 31 Jan, at 08:24:53AM, Michael Ellerman wrote:
> 
> I'm hitting this on multiple powerpc systems:
> 
> [   38.339126] rq->clock_update_flags < RQCF_ACT_SKIP
> [   38.339134] ------------[ cut here ]------------
> [   38.339142] WARNING: CPU: 2 PID: 1 at kernel/sched/sched.h:804 detach_task_cfs_rq+0xa0c/0xd10

[...]
 
> I assume I should be worried?

Thanks for the report. No need to worry, the bug has existed for a
while, this patch just turns on the warning ;-)

The following commit queued up in tip/sched/core should fix your
issues (assuming you see the same callstack on all your powerpc
machines):

  https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=1b1d62254df0fe42a711eb71948f915918987790

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-30 21:34       ` Matt Fleming
@ 2017-01-31  8:35         ` Michael Ellerman
  2017-01-31 11:00         ` Sachin Sant
  1 sibling, 0 replies; 44+ messages in thread
From: Michael Ellerman @ 2017-01-31  8:35 UTC (permalink / raw)
  To: Matt Fleming
  Cc: fweisbec, tglx, pmladek, hpa, efault, sergey.senozhatsky.work,
	peterz, mgorman, wanpeng.li, umgwanakikbuti, byungchul.park,
	jack, mingo, riel, luca.abeni, yuyang.du, torvalds, linux-kernel,
	linux-tip-commits, linuxppc-dev, linux-next

Matt Fleming <matt@codeblueprint.co.uk> writes:

> On Tue, 31 Jan, at 08:24:53AM, Michael Ellerman wrote:
>> 
>> I'm hitting this on multiple powerpc systems:
>> 
>> [   38.339126] rq->clock_update_flags < RQCF_ACT_SKIP
>> [   38.339134] ------------[ cut here ]------------
>> [   38.339142] WARNING: CPU: 2 PID: 1 at kernel/sched/sched.h:804 detach_task_cfs_rq+0xa0c/0xd10
>
> [...]
>  
>> I assume I should be worried?
>
> Thanks for the report. No need to worry, the bug has existed for a
> while, this patch just turns on the warning ;-)
>
> The following commit queued up in tip/sched/core should fix your
> issues (assuming you see the same callstack on all your powerpc
> machines):
>
>   https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=1b1d62254df0fe42a711eb71948f915918987790

Great thanks.

Looks like that commit is in today's linux-next, so hopefully I won't
see any oopses in my boot tests overnight. If I do I'll let you know.

cheers

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-30 21:34       ` Matt Fleming
  2017-01-31  8:35         ` Michael Ellerman
@ 2017-01-31 11:00         ` Sachin Sant
  2017-01-31 11:48           ` Mike Galbraith
  1 sibling, 1 reply; 44+ messages in thread
From: Sachin Sant @ 2017-01-31 11:00 UTC (permalink / raw)
  To: Matt Fleming, Michael Ellerman
  Cc: linuxppc-dev, peterz, linux-next, linux-kernel

Trimming the cc list.

>> I assume I should be worried?
> 
> Thanks for the report. No need to worry, the bug has existed for a
> while, this patch just turns on the warning ;-)
> 
> The following commit queued up in tip/sched/core should fix your
> issues (assuming you see the same callstack on all your powerpc
> machines):
> 
>  https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=1b1d62254df0fe42a711eb71948f915918987790

I still see this warning with today’s next running inside PowerVM LPAR
on a POWER8 box. The stack trace is different from what Michael had
reported.

Easiest way to recreate this is to Online/offline cpu’s.
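
For reference, a minimal way to exercise that path (assuming the standard
sysfs CPU hotplug interface):

	# repeatedly offline/online a secondary CPU
	for i in $(seq 1 50); do
		echo 0 > /sys/devices/system/cpu/cpu2/online
		echo 1 > /sys/devices/system/cpu/cpu2/online
	done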

[  114.795609] rq->clock_update_flags < RQCF_ACT_SKIP
[  114.795621] ------------[ cut here ]------------
[  114.795632] WARNING: CPU: 2 PID: 27 at kernel/sched/sched.h:804 set_next_entity+0xbc8/0xcc0
[  114.795634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc rpadlpar_io rpaphp kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm iw_cxgb3 ib_core ghash_generic xts gf128mul tpm_ibmvtpm tpm sg vmx_crypto pseries_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sr_mod sd_mod cdrom cxgb3 ibmvscsi ibmveth scsi_transport_srp mdio
[  114.795751]  dm_mirror dm_region_hash dm_log dm_mod
[  114.795762] CPU: 2 PID: 27 Comm: migration/2 Not tainted 4.10.0-rc6-next-20170131 #1
[  114.795765] task: c0000004fa2f8600 task.stack: c0000004fa49c000
[  114.795768] NIP: c000000000114ed8 LR: c000000000114ed4 CTR: c0000000004a8cf0
[  114.795771] REGS: c0000004fa49f6a0 TRAP: 0700   Not tainted  (4.10.0-rc6-next-20170131)
[  114.795773] MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>
[  114.795787]   CR: 28004022  XER: 00000000
[  114.795789] CFAR: c0000000008ec5c4 SOFTE: 0 
GPR00: c000000000114ed4 c0000004fa49f920 c00000000100dd00 0000000000000026 
GPR04: 0000000000000000 0000000000000006 6574616470755f6b c0000000011cdd00 
GPR08: 0000000000000000 c000000000c6edb0 000000015ef20000 d000000006488538 
GPR12: 0000000000004400 c00000000e801200 c0000000000ecc38 c0000004fe064300 
GPR16: 0000000000000000 0000000000000001 0000000000000000 c000000000f27e08 
GPR20: c000000000f277c5 0000000000000000 0000000000000004 0000000000000000 
GPR24: c00000015fba49f0 c000000000f27e08 c000000000ef9e80 c0000004fa49fb00 
GPR28: c00000015fba4980 c00000015fba49f0 c0000004f34c1000 c00000015fba49f0 
[  114.795850] NIP [c000000000114ed8] set_next_entity+0xbc8/0xcc0
[  114.795855] LR [c000000000114ed4] set_next_entity+0xbc4/0xcc0
[  114.795857] Call Trace:
[  114.795862] [c0000004fa49f920] [c000000000114ed4] set_next_entity+0xbc4/0xcc0 (unreliable)
[  114.795869] [c0000004fa49f9d0] [c000000000119f4c] pick_next_task_fair+0xfc/0x6f0
[  114.795874] [c0000004fa49fae0] [c000000000104820] sched_cpu_dying+0x3c0/0x450
[  114.795880] [c0000004fa49fb80] [c0000000000c1958] cpuhp_invoke_callback+0x148/0x5b0
[  114.795886] [c0000004fa49fbf0] [c0000000000c3340] take_cpu_down+0xb0/0x110
[  114.795893] [c0000004fa49fc50] [c0000000001a1e58] multi_cpu_stop+0x1a8/0x1e0
[  114.795899] [c0000004fa49fca0] [c0000000001a20c4] cpu_stopper_thread+0x104/0x1e0
[  114.795905] [c0000004fa49fd60] [c0000000000f2b90] smpboot_thread_fn+0x290/0x2a0
[  114.795911] [c0000004fa49fdc0] [c0000000000ecd7c] kthread+0x14c/0x190
[  114.795919] [c0000004fa49fe30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[  114.795921] Instruction dump:
[  114.795924] 0fe00000 4bfff884 3d02fff2 89289ac5 2f890000 40fef4ec 39200001 3c62ffac 
[  114.795936] 38633698 99289ac5 487d76b5 60000000 <0fe00000> 4bfff4cc eb9f0118 e93f0120 
[  114.795948] ---[ end trace 5c822f32f967fbc5 ]---
[  123.059141] nr_pdflush_threads exported in /proc is scheduled for removal

Thanks
-Sachin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-31 11:00         ` Sachin Sant
@ 2017-01-31 11:48           ` Mike Galbraith
  2017-01-31 17:22             ` Ross Zwisler
  0 siblings, 1 reply; 44+ messages in thread
From: Mike Galbraith @ 2017-01-31 11:48 UTC (permalink / raw)
  To: Sachin Sant, Matt Fleming, Michael Ellerman
  Cc: linuxppc-dev, peterz, linux-next, linux-kernel

On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> Trimming the cc list.
> 
> > > I assume I should be worried?
> > 
> > Thanks for the report. No need to worry, the bug has existed for a
> > while, this patch just turns on the warning ;-)
> > 
> > The following commit queued up in tip/sched/core should fix your
> > issues (assuming you see the same callstack on all your powerpc
> > machines):
> > 
> >  https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=1b1d62254df0fe42a711eb71948f915918987790
> 
> I still see this warning with today’s next running inside PowerVM LPAR
> on a POWER8 box. The stack trace is different from what Michael had
> reported.
> 
> Easiest way to recreate this is to Online/offline cpu’s.

(Ditto tip.today, x86_64 + hotplug stress)

[   94.804196] ------------[ cut here ]------------
[   94.804201] WARNING: CPU: 3 PID: 27 at kernel/sched/sched.h:804 set_next_entity+0x81c/0x910
[   94.804201] rq->clock_update_flags < RQCF_ACT_SKIP
[   94.804202] Modules linked in: ebtable_filter(E) ebtables(E) fuse(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) ip6t_REJECT(E) xt_tcpudp(E) nf_conntrack_ipv6(E) nf_defrag_ipv6(E) ip6table_raw(E) ipt_REJECT(E) iptable_raw(E) iptable_filter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) ip6table_filter(E) ip6_tables(E) x_tables(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) nls_iso8859_1(E) crc32c_intel(E) nls_cp437(E) snd_hda_codec_realtek(E) snd_hda_codec_hdmi(E) snd_hda_codec_generic(E) nfsd(E) aesni_intel(E) snd_hda_intel(E) snd_hda_codec(E) snd_hwdep(E) aes_x86_64(E) snd_hda_core(E) crypto_simd(E)
[   94.804220]  snd_pcm(E) auth_rpcgss(E) snd_timer(E) snd(E) iTCO_wdt(E) iTCO_vendor_support(E) joydev(E) nfs_acl(E) lpc_ich(E) cryptd(E) lockd(E) intel_smartconnect(E) mfd_core(E) i2c_i801(E) battery(E) glue_helper(E) mei_me(E) shpchp(E) mei(E) soundcore(E) grace(E) fan(E) thermal(E) tpm_infineon(E) pcspkr(E) sunrpc(E) efivarfs(E) sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_logitech_dj(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ahci(E) xhci_pci(E) ehci_pci(E) ttm(E) libahci(E) xhci_hcd(E) ehci_hcd(E) r8169(E) mii(E) libata(E) drm(E) usbcore(E) fjes(E) video(E) button(E) af_packet(E) sd_mod(E) vfat(E) fat(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mod(E) loop(E) sg(E) scsi_mod(E) autofs4(E)
[   94.804246] CPU: 3 PID: 27 Comm: migration/3 Tainted: G            E   4.10.0-tip #15
[   94.804247] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[   94.804247] Call Trace:
[   94.804251]  ? dump_stack+0x5c/0x7c
[   94.804253]  ? __warn+0xc4/0xe0
[   94.804255]  ? warn_slowpath_fmt+0x4f/0x60
[   94.804256]  ? set_next_entity+0x81c/0x910
[   94.804258]  ? pick_next_task_fair+0x20a/0xa20
[   94.804259]  ? sched_cpu_starting+0x50/0x50
[   94.804260]  ? sched_cpu_dying+0x237/0x280
[   94.804261]  ? sched_cpu_starting+0x50/0x50
[   94.804262]  ? cpuhp_invoke_callback+0x83/0x3e0
[   94.804263]  ? take_cpu_down+0x56/0x90
[   94.804266]  ? multi_cpu_stop+0xa9/0xd0
[   94.804267]  ? cpu_stop_queue_work+0xb0/0xb0
[   94.804268]  ? cpu_stopper_thread+0x81/0x110
[   94.804270]  ? smpboot_thread_fn+0xfe/0x150
[   94.804272]  ? kthread+0xf4/0x130
[   94.804273]  ? sort_range+0x20/0x20
[   94.804274]  ? kthread_park+0x80/0x80
[   94.804276]  ? ret_from_fork+0x26/0x40
[   94.804277] ---[ end trace b0a9e4aa1fb229bb ]---

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-31 11:48           ` Mike Galbraith
@ 2017-01-31 17:22             ` Ross Zwisler
  2017-02-02 15:55               ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Ross Zwisler @ 2017-01-31 17:22 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sachin Sant, Matt Fleming, Michael Ellerman, linuxppc-dev,
	peterz, linux-next, LKML

On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
> On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
>> Trimming the cc list.
>>
>> > > I assume I should be worried?
>> >
>> > Thanks for the report. No need to worry, the bug has existed for a
>> > while, this patch just turns on the warning ;-)
>> >
>> > The following commit queued up in tip/sched/core should fix your
>> > issues (assuming you see the same callstack on all your powerpc
>> > machines):
>> >
>> >  https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=1b1d62254df0fe42a711eb71948f915918987790
>>
>> I still see this warning with today’s next running inside PowerVM LPAR
>> on a POWER8 box. The stack trace is different from what Michael had
>> reported.
>>
>> Easiest way to recreate this is to Online/offline cpu’s.
>
> (Ditto tip.today, x86_64 + hotplug stress)
<>

I'm also seeing a splat in the mmots tree with
v4.10-rc5-mmots-2017-01-26-15-49, which pulled in this commit by
merging with next.  Just booting on an x86_64 VM gives me this:

[   13.090436] ------------[ cut here ]------------
[   13.090577] WARNING: CPU: 8 PID: 1 at kernel/sched/sched.h:804 update_load_avg+0x85b/0xb80
[   13.090577] rq->clock_update_flags < RQCF_ACT_SKIP
[   13.090578] Modules linked in: dax_pmem dax nd_pmem nd_btt nd_e820 libnvdimm
[   13.090582] CPU: 8 PID: 1 Comm: systemd Not tainted 4.10.0-rc5-mm1-00313-g5c0c3d7-dirty #10
[   13.090583] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[   13.090583] Call Trace:
[   13.090585]  dump_stack+0x86/0xc3
[   13.090586]  __warn+0xcb/0xf0
[   13.090588]  warn_slowpath_fmt+0x5f/0x80
[   13.090590]  update_load_avg+0x85b/0xb80
[   13.090591]  ? debug_smp_processor_id+0x17/0x20
[   13.090593]  detach_task_cfs_rq+0x3f/0x210
[   13.090594]  task_change_group_fair+0x24/0x100
[   13.090596]  sched_change_group+0x5f/0x110
[   13.090597]  sched_move_task+0x53/0x160
[   13.090598]  cpu_cgroup_attach+0x36/0x70
[   13.090600]  cgroup_migrate_execute+0x230/0x3f0
[   13.090602]  cgroup_migrate+0xce/0x140
[   13.090603]  ? cgroup_migrate+0x5/0x140
[   13.090604]  cgroup_attach_task+0x27f/0x3e0
[   13.090606]  ? cgroup_attach_task+0x9b/0x3e0
[   13.090608]  __cgroup_procs_write+0x30e/0x510
[   13.090608]  ? __cgroup_procs_write+0x70/0x510
[   13.090610]  cgroup_procs_write+0x14/0x20
[   13.090611]  cgroup_file_write+0x44/0x1e0
[   13.090613]  kernfs_fop_write+0x13c/0x1c0
[   13.090614]  __vfs_write+0x37/0x160
[   13.090615]  ? rcu_read_lock_sched_held+0x4a/0x80
[   13.090616]  ? rcu_sync_lockdep_assert+0x2f/0x60
[   13.090617]  ? __sb_start_write+0x10d/0x220
[   13.090618]  ? vfs_write+0x19b/0x1f0
[   13.090619]  ? security_file_permission+0x3b/0xc0
[   13.090620]  vfs_write+0xcb/0x1f0
[   13.090621]  SyS_write+0x58/0xc0
[   13.090623]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[   13.090623] RIP: 0033:0x7f8b7c1be210
[   13.090624] RSP: 002b:00007ffe73febfd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   13.090625] RAX: ffffffffffffffda RBX: 000055a84870a7e0 RCX: 00007f8b7c1be210
[   13.090625] RDX: 0000000000000004 RSI: 000055a84870aa10 RDI: 0000000000000033
[   13.090626] RBP: 0000000000000000 R08: 000055a84870a8c0 R09: 00007f8b7dbda900
[   13.090627] R10: 000055a84870aa10 R11: 0000000000000246 R12: 0000000000000000
[   13.090627] R13: 000055a848775360 R14: 000055a84870a7e0 R15: 0000000000000033
[   13.090629] ---[ end trace ba535936c2409043 ]---

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-01-31 17:22             ` Ross Zwisler
@ 2017-02-02 15:55               ` Peter Zijlstra
  2017-02-02 22:01                 ` Matt Fleming
                                   ` (5 more replies)
  0 siblings, 6 replies; 44+ messages in thread
From: Peter Zijlstra @ 2017-02-02 15:55 UTC (permalink / raw)
  To: Ross Zwisler
  Cc: Mike Galbraith, Sachin Sant, Matt Fleming, Michael Ellerman,
	linuxppc-dev, linux-next, LKML

On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
> On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
> > On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:


Could some of you test this? It seems to cure things in my (very)
limited testing.

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 96e4ccc..b773821 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5609,7 +5609,7 @@ static void migrate_tasks(struct rq *dead_rq)
 {
 	struct rq *rq = dead_rq;
 	struct task_struct *next, *stop = rq->stop;
-	struct rq_flags rf, old_rf;
+	struct rq_flags rf;
 	int dest_cpu;
 
 	/*
@@ -5628,7 +5628,9 @@ static void migrate_tasks(struct rq *dead_rq)
 	 * class method both need to have an up-to-date
 	 * value of rq->clock[_task]
 	 */
+	rq_pin_lock(rq, &rf);
 	update_rq_clock(rq);
+	rq_unpin_lock(rq, &rf);
 
 	for (;;) {
 		/*
@@ -5641,7 +5643,7 @@ static void migrate_tasks(struct rq *dead_rq)
 		/*
 		 * pick_next_task assumes pinned rq->lock.
 		 */
-		rq_pin_lock(rq, &rf);
+		rq_repin_lock(rq, &rf);
 		next = pick_next_task(rq, &fake_task, &rf);
 		BUG_ON(!next);
 		next->sched_class->put_prev_task(rq, next);
@@ -5670,13 +5672,6 @@ static void migrate_tasks(struct rq *dead_rq)
 			continue;
 		}
 
-		/*
-		 * __migrate_task() may return with a different
-		 * rq->lock held and a new cookie in 'rf', but we need
-		 * to preserve rf::clock_update_flags for 'dead_rq'.
-		 */
-		old_rf = rf;
-
 		/* Find suitable destination for @next, with force if needed. */
 		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
 
@@ -5685,7 +5680,6 @@ static void migrate_tasks(struct rq *dead_rq)
 			raw_spin_unlock(&rq->lock);
 			rq = dead_rq;
 			raw_spin_lock(&rq->lock);
-			rf = old_rf;
 		}
 		raw_spin_unlock(&next->pi_lock);
 	}
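
For readers following along, the helpers this patch relies on carry the clock-update state in struct rq_flags; roughly, paraphrased from the series' kernel/sched/sched.h (simplified, not guaranteed to match the merged code line for line):

static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
{
	rf->cookie = lockdep_pin_lock(&rq->lock);
#ifdef CONFIG_SCHED_DEBUG
	/* A new pin context starts out having seen no clock update. */
	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
	rf->clock_update_flags = 0;
#endif
}

static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
{
#ifdef CONFIG_SCHED_DEBUG
	/* Stash RQCF_UPDATED so a later rq_repin_lock() can restore it. */
	if (rq->clock_update_flags > RQCF_ACT_SKIP)
		rf->clock_update_flags = RQCF_UPDATED;
#endif
	lockdep_unpin_lock(&rq->lock, rf->cookie);
}

static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
{
	lockdep_repin_lock(&rq->lock, rf->cookie);
#ifdef CONFIG_SCHED_DEBUG
	/* Restore whatever rq_unpin_lock() stashed for this pin context. */
	rq->clock_update_flags |= rf->clock_update_flags;
#endif
}

So the pin/update/unpin sequence at the top of migrate_tasks() records RQCF_UPDATED once and stashes it in 'rf', and the rq_repin_lock() inside the loop restores it on every iteration, which is why the old_rf save/restore dance can go away.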

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-02 15:55               ` Peter Zijlstra
@ 2017-02-02 22:01                 ` Matt Fleming
  2017-02-03  3:05                 ` Mike Galbraith
                                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Matt Fleming @ 2017-02-02 22:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ross Zwisler, Mike Galbraith, Sachin Sant, Michael Ellerman,
	linuxppc-dev, linux-next, LKML

On Thu, 02 Feb, at 04:55:06PM, Peter Zijlstra wrote:
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
> > On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
> > > On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> 
> 
> Could some of you test this? It seems to cure things in my (very)
> limited testing.

I haven't tested it but this looks like the correct fix to me.

Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-02 15:55               ` Peter Zijlstra
  2017-02-02 22:01                 ` Matt Fleming
@ 2017-02-03  3:05                 ` Mike Galbraith
  2017-02-03  4:33                 ` Sachin Sant
                                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Mike Galbraith @ 2017-02-03  3:05 UTC (permalink / raw)
  To: Peter Zijlstra, Ross Zwisler
  Cc: Sachin Sant, Matt Fleming, Michael Ellerman, linuxppc-dev,
	linux-next, LKML

On Thu, 2017-02-02 at 16:55 +0100, Peter Zijlstra wrote:
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
> > On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de>
> > wrote:
> > > On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> 
> 
> Could some of you test this? It seems to cure things in my (very)
> limited testing.

Hotplug stress gripe is gone here.

	-Mike

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-02 15:55               ` Peter Zijlstra
  2017-02-02 22:01                 ` Matt Fleming
  2017-02-03  3:05                 ` Mike Galbraith
@ 2017-02-03  4:33                 ` Sachin Sant
  2017-02-03  8:53                   ` Peter Zijlstra
  2017-02-03 13:04                 ` Borislav Petkov
                                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 44+ messages in thread
From: Sachin Sant @ 2017-02-03  4:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ross Zwisler, Mike Galbraith, Matt Fleming, Michael Ellerman,
	linuxppc-dev, linux-next, LKML


> On 02-Feb-2017, at 9:25 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
>> On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
>>> On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> 
> 
> Could some of you test this? It seems to cure things in my (very)
> limited testing.
> 

I ran a few cycles of CPU hot(un)plug tests. In most cases it works, except one
where I ran into an RCU stall:

[  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
[  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
[  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
[  173.493482] Task dump for CPU 8:
[  173.493484] cpuhp/8         R  running task        0  3416      2 0x00000884
[  173.493489] Call Trace:
[  173.493492] [c0000004f7b834a0] [c0000004f7b83560] 0xc0000004f7b83560 (unreliable)
[  173.493498] [c0000004f7b83670] [c000000000008d28] alignment_common+0x128/0x130
[  173.493503] --- interrupt: 600 at _raw_spin_lock+0x2c/0xc0
[  173.493503]     LR = try_to_wake_up+0x204/0x5c0
[  173.493507] [c0000004f7b83960] [c0000004f4d8084c] 0xc0000004f4d8084c (unreliable)
[  173.493511] [c0000004f7b83990] [c0000000000fef54] try_to_wake_up+0x204/0x5c0
[  173.493515] [c0000004f7b83a10] [c0000000000e2b88] create_worker+0x148/0x250
[  173.493519] [c0000004f7b83ab0] [c0000000000e6e1c] alloc_unbound_pwq+0x3bc/0x4c0
[  173.493522] [c0000004f7b83b10] [c0000000000e7084] wq_update_unbound_numa+0x164/0x270
[  173.493526] [c0000004f7b83bb0] [c0000000000e8990] workqueue_online_cpu+0x250/0x3b0
[  173.493529] [c0000004f7b83c70] [c0000000000c2758] cpuhp_invoke_callback+0x148/0x5b0
[  173.493533] [c0000004f7b83ce0] [c0000000000c2df8] cpuhp_up_callbacks+0x48/0x140
[  173.493536] [c0000004f7b83d30] [c0000000000c3e98] cpuhp_thread_fun+0x148/0x180
[  173.493540] [c0000004f7b83d60] [c0000000000f3930] smpboot_thread_fn+0x290/0x2a0
[  173.493544] [c0000004f7b83dc0] [c0000000000edb3c] kthread+0x14c/0x190
[  173.493547] [c0000004f7b83e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[  243.913715] INFO: task kworker/0:2:380 blocked for more than 120 seconds.
[  243.913732]       Not tainted 4.10.0-rc6-next-20170202 #6
[  243.913735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.913738] kworker/0:2     D    0   380      2 0x00000800
[  243.913746] Workqueue: events vmstat_shepherd
[  243.913748] Call Trace:
[  243.913752] [c0000000ff07f820] [c00000000011135c] enqueue_entity+0x81c/0x1200 (unreliable)
[  243.913757] [c0000000ff07f9f0] [c00000000001a660] __switch_to+0x300/0x400
[  243.913762] [c0000000ff07fa50] [c0000000008df4f4] __schedule+0x314/0xb10
[  243.913766] [c0000000ff07fb20] [c0000000008dfd30] schedule+0x40/0xb0
[  243.913769] [c0000000ff07fb50] [c0000000008e02b8] schedule_preempt_disabled+0x18/0x30
[  243.913773] [c0000000ff07fb70] [c0000000008e1654] __mutex_lock.isra.6+0x1a4/0x660
[  243.913777] [c0000000ff07fc00] [c0000000000c3828] get_online_cpus+0x48/0x90
[  243.913780] [c0000000ff07fc30] [c00000000025fd78] vmstat_shepherd+0x38/0x150
[  243.913784] [c0000000ff07fc80] [c0000000000e5794] process_one_work+0x1a4/0x4d0
[  243.913788] [c0000000ff07fd20] [c0000000000e5b58] worker_thread+0x98/0x5a0
[  243.913791] [c0000000ff07fdc0] [c0000000000edb3c] kthread+0x14c/0x190
[  243.913795] [c0000000ff07fe30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[  243.913824] INFO: task drmgr:3413 blocked for more than 120 seconds.
[  243.913826]       Not tainted 4.10.0-rc6-next-20170202 #6
[  243.913829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.913831] drmgr           D    0  3413   3114 0x00040080
[  243.913834] Call Trace:
[  243.913836] [c000000257ff3380] [c000000257ff3440] 0xc000000257ff3440 (unreliable)
[  243.913840] [c000000257ff3550] [c00000000001a660] __switch_to+0x300/0x400
[  243.913844] [c000000257ff35b0] [c0000000008df4f4] __schedule+0x314/0xb10
[  243.913847] [c000000257ff3680] [c0000000008dfd30] schedule+0x40/0xb0
[  243.913851] [c000000257ff36b0] [c0000000008e4594] schedule_timeout+0x274/0x470
[  243.913855] [c000000257ff37b0] [c0000000008e0efc] wait_for_common+0x1ac/0x2c0
[  243.913858] [c000000257ff3830] [c0000000000c50e4] bringup_cpu+0x84/0xe0
[  243.913862] [c000000257ff3860] [c0000000000c2758] cpuhp_invoke_callback+0x148/0x5b0
[  243.913865] [c000000257ff38d0] [c0000000000c2df8] cpuhp_up_callbacks+0x48/0x140
[  243.913868] [c000000257ff3920] [c0000000000c5438] _cpu_up+0xe8/0x1c0
[  243.913872] [c000000257ff3980] [c0000000000c5630] do_cpu_up+0x120/0x150
[  243.913876] [c000000257ff3a00] [c0000000005c005c] cpu_subsys_online+0x5c/0xe0
[  243.913879] [c000000257ff3a50] [c0000000005b7d84] device_online+0xb4/0x120
[  243.913883] [c000000257ff3a90] [c000000000093424] dlpar_online_cpu+0x144/0x1e0
[  243.913887] [c000000257ff3b50] [c000000000093c08] dlpar_cpu_add+0x108/0x2f0
[  243.913891] [c000000257ff3be0] [c0000000000948dc] dlpar_cpu_probe+0x3c/0x80
[  243.913894] [c000000257ff3c20] [c0000000000207a8] arch_cpu_probe+0x38/0x60
[  243.913898] [c000000257ff3c40] [c0000000005c0880] cpu_probe_store+0x40/0x70
[  243.913902] [c000000257ff3c70] [c0000000005b2e94] dev_attr_store+0x34/0x60
[  243.913906] [c000000257ff3c90] [c0000000003b0fc4] sysfs_kf_write+0x64/0xa0
[  243.913910] [c000000257ff3cb0] [c0000000003afd10] kernfs_fop_write+0x170/0x250
[  243.913914] [c000000257ff3d00] [c0000000002fb0f0] __vfs_write+0x40/0x1c0
[  243.913917] [c000000257ff3d90] [c0000000002fcba8] vfs_write+0xc8/0x240
[  243.913921] [c000000257ff3de0] [c0000000002fe790] SyS_write+0x60/0x110
[  243.913924] [c000000257ff3e30] [c00000000000b184] system_call+0x38/0xe0
[  243.913929] INFO: task ppc64_cpu:3423 blocked for more than 120 seconds.
[  243.913931]       Not tainted 4.10.0-rc6-next-20170202 #6
[  243.913933] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Thanks
-Sachin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03  4:33                 ` Sachin Sant
@ 2017-02-03  8:53                   ` Peter Zijlstra
  2017-02-03 12:59                     ` Mike Galbraith
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2017-02-03  8:53 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Ross Zwisler, Mike Galbraith, Matt Fleming, Michael Ellerman,
	linuxppc-dev, linux-next, LKML

On Fri, Feb 03, 2017 at 10:03:14AM +0530, Sachin Sant wrote:
> 
> > On 02-Feb-2017, at 9:25 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
> >> On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
> >>> On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> > 
> > 
> > Could some of you test this? It seems to cure things in my (very)
> > limited testing.
> > 
> 
> I ran few cycles of cpu hot(un)plug tests. In most cases it works except one
> where I ran into rcu stall:
> 
> [  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
> [  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
> [  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)

Right, I actually saw that too, but I don't think that would be related
to my patch. I'll see if I can dig into this though, ought to get fixed
regardless.

Thanks for testing!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03  8:53                   ` Peter Zijlstra
@ 2017-02-03 12:59                     ` Mike Galbraith
  2017-02-03 13:37                       ` Peter Zijlstra
  0 siblings, 1 reply; 44+ messages in thread
From: Mike Galbraith @ 2017-02-03 12:59 UTC (permalink / raw)
  To: Peter Zijlstra, Sachin Sant
  Cc: Ross Zwisler, Matt Fleming, Michael Ellerman, linuxppc-dev,
	linux-next, LKML

On Fri, 2017-02-03 at 09:53 +0100, Peter Zijlstra wrote:
> On Fri, Feb 03, 2017 at 10:03:14AM +0530, Sachin Sant wrote:

> > I ran few cycles of cpu hot(un)plug tests. In most cases it works except one
> > where I ran into rcu stall:
> > 
> > [  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
> > [  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
> > [  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
> 
> Right, I actually saw that too, but I don't think that would be related
> to my patch. I'll see if I can dig into this though, ought to get fixed
> regardless.

FWIW, I'm not seeing stalls/hangs while beating hotplug up in tip. (so
next grew a wart?)

	-Mike

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-02 15:55               ` Peter Zijlstra
                                   ` (2 preceding siblings ...)
  2017-02-03  4:33                 ` Sachin Sant
@ 2017-02-03 13:04                 ` Borislav Petkov
  2017-02-22  9:03                 ` Wanpeng Li
  2017-02-24  9:16                 ` [tip:sched/urgent] sched/core: Fix update_rq_clock() splat on hotplug (and suspend/resume) tip-bot for Peter Zijlstra
  5 siblings, 0 replies; 44+ messages in thread
From: Borislav Petkov @ 2017-02-03 13:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ross Zwisler, Mike Galbraith, Sachin Sant, Matt Fleming,
	Michael Ellerman, linuxppc-dev, linux-next, LKML

On Thu, Feb 02, 2017 at 04:55:06PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
> > On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
> > > On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
> 
> 
> Could some of you test this? It seems to cure things in my (very)
> limited testing.

Tested-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03 12:59                     ` Mike Galbraith
@ 2017-02-03 13:37                       ` Peter Zijlstra
  2017-02-03 13:52                         ` Mike Galbraith
  2017-02-03 15:44                         ` Paul E. McKenney
  0 siblings, 2 replies; 44+ messages in thread
From: Peter Zijlstra @ 2017-02-03 13:37 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sachin Sant, Ross Zwisler, Matt Fleming, Michael Ellerman,
	linuxppc-dev, linux-next, LKML, Paul McKenney

On Fri, Feb 03, 2017 at 01:59:34PM +0100, Mike Galbraith wrote:
> On Fri, 2017-02-03 at 09:53 +0100, Peter Zijlstra wrote:
> > On Fri, Feb 03, 2017 at 10:03:14AM +0530, Sachin Sant wrote:
> 
> > > I ran few cycles of cpu hot(un)plug tests. In most cases it works except one
> > > where I ran into rcu stall:
> > > 
> > > [  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
> > > [  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
> > > [  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
> > 
> > Right, I actually saw that too, but I don't think that would be related
> > to my patch. I'll see if I can dig into this though, ought to get fixed
> > regardless.
> 
> FWIW, I'm not seeing stalls/hangs while beating hotplug up in tip. (so
> next grew a wart?)

I've seen it on tip. It looks like hot unplug goes really slow when
there's running tasks on the CPU being taken down.

What I did was something like:

  taskset -p $((1<<1)) $$
  for ((i=0; i<20; i++)) do while :; do :; done & done

  taskset -p $((1<<0)) $$
  echo 0 > /sys/devices/system/cpu/cpu1/online

And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
_really_ slow and the RCU stall triggers. What I suspect happens is that
hotplug stops participating in the RCU state machine early, but only
tells RCU about it really late, and in between it gets suspicious it
takes too long.

I've yet to dig through the RCU code to figure out the exact sequence of
events, but found the above to be fairly reliable in triggering the
issue.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03 13:37                       ` Peter Zijlstra
@ 2017-02-03 13:52                         ` Mike Galbraith
  2017-02-03 15:44                         ` Paul E. McKenney
  1 sibling, 0 replies; 44+ messages in thread
From: Mike Galbraith @ 2017-02-03 13:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sachin Sant, Ross Zwisler, Matt Fleming, Michael Ellerman,
	linuxppc-dev, linux-next, LKML, Paul McKenney

On Fri, 2017-02-03 at 14:37 +0100, Peter Zijlstra wrote:
> On Fri, Feb 03, 2017 at 01:59:34PM +0100, Mike Galbraith wrote:

> > FWIW, I'm not seeing stalls/hangs while beating hotplug up in tip. (so
> > next grew a wart?)
> 
> I've seen it on tip. It looks like hot unplug goes really slow when
> there's running tasks on the CPU being taken down.
> 
> What I did was something like:
> 
>   taskset -p $((1<<1)) $$
>   for ((i=0; i<20; i++)) do while :; do :; done & done
> 
>   taskset -p $((1<<0)) $$
>   echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> _really_ slow and the RCU stall triggers. What I suspect happens is that
> hotplug stops participating in the RCU state machine early, but only
> tells RCU about it really late, and in between it gets suspicious it
> takes too long.

Ah.  I wasn't doing a really hard pounding, just running a couple
instances of Steven's script.  To beat hell out of it, I add futextest,
stockfish and a small kbuild on a big box.

	-Mike

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03 13:37                       ` Peter Zijlstra
  2017-02-03 13:52                         ` Mike Galbraith
@ 2017-02-03 15:44                         ` Paul E. McKenney
  2017-02-03 15:54                           ` Paul E. McKenney
  1 sibling, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2017-02-03 15:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Sachin Sant, Ross Zwisler, Matt Fleming,
	Michael Ellerman, linuxppc-dev, linux-next, LKML

On Fri, Feb 03, 2017 at 02:37:48PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 03, 2017 at 01:59:34PM +0100, Mike Galbraith wrote:
> > On Fri, 2017-02-03 at 09:53 +0100, Peter Zijlstra wrote:
> > > On Fri, Feb 03, 2017 at 10:03:14AM +0530, Sachin Sant wrote:
> > 
> > > > I ran few cycles of cpu hot(un)plug tests. In most cases it works except one
> > > > where I ran into rcu stall:
> > > > 
> > > > [  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > [  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
> > > > [  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
> > > 
> > > Right, I actually saw that too, but I don't think that would be related
> > > to my patch. I'll see if I can dig into this though, ought to get fixed
> > > regardless.
> > 
> > FWIW, I'm not seeing stalls/hangs while beating hotplug up in tip. (so
> > next grew a wart?)
> 
> I've seen it on tip. It looks like hot unplug goes really slow when
> there's running tasks on the CPU being taken down.
> 
> What I did was something like:
> 
>   taskset -p $((1<<1)) $$
>   for ((i=0; i<20; i++)) do while :; do :; done & done
> 
>   taskset -p $((1<<0)) $$
>   echo 0 > /sys/devices/system/cpu/cpu1/online
> 
> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> _really_ slow and the RCU stall triggers. What I suspect happens is that
> hotplug stops participating in the RCU state machine early, but only
> tells RCU about it really late, and in between it gets suspicious it
> takes too long.
> 
> I've yet to dig through the RCU code to figure out the exact sequence of
> events, but found the above to be fairly reliable in triggering the
> issue.

If you send me the full splat from the dmesg and the RCU portions of
.config, I will take a look.  Is this new behavior, or a new test?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03 15:44                         ` Paul E. McKenney
@ 2017-02-03 15:54                           ` Paul E. McKenney
  2017-02-06  6:23                             ` Sachin Sant
  0 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2017-02-03 15:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mike Galbraith, Sachin Sant, Ross Zwisler, Matt Fleming,
	Michael Ellerman, linuxppc-dev, linux-next, LKML

On Fri, Feb 03, 2017 at 07:44:57AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 03, 2017 at 02:37:48PM +0100, Peter Zijlstra wrote:
> > On Fri, Feb 03, 2017 at 01:59:34PM +0100, Mike Galbraith wrote:
> > > On Fri, 2017-02-03 at 09:53 +0100, Peter Zijlstra wrote:
> > > > On Fri, Feb 03, 2017 at 10:03:14AM +0530, Sachin Sant wrote:
> > > 
> > > > > I ran few cycles of cpu hot(un)plug tests. In most cases it works except one
> > > > > where I ran into rcu stall:
> > > > > 
> > > > > [  173.493453] INFO: rcu_sched detected stalls on CPUs/tasks:
> > > > > [  173.493473] 	8-...: (2 GPs behind) idle=006/140000000000000/0 softirq=0/0 fqs=2996 
> > > > > [  173.493476] 	(detected by 0, t=6002 jiffies, g=885, c=884, q=6350)
> > > > 
> > > > Right, I actually saw that too, but I don't think that would be related
> > > > to my patch. I'll see if I can dig into this though, ought to get fixed
> > > > regardless.
> > > 
> > > FWIW, I'm not seeing stalls/hangs while beating hotplug up in tip. (so
> > > next grew a wart?)
> > 
> > I've seen it on tip. It looks like hot unplug goes really slow when
> > there's running tasks on the CPU being taken down.
> > 
> > What I did was something like:
> > 
> >   taskset -p $((1<<1)) $$
> >   for ((i=0; i<20; i++)) do while :; do :; done & done
> > 
> >   taskset -p $((1<<0)) $$
> >   echo 0 > /sys/devices/system/cpu/cpu1/online
> > 
> > And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> > _really_ slow and the RCU stall triggers. What I suspect happens is that
> > hotplug stops participating in the RCU state machine early, but only
> > tells RCU about it really late, and in between it gets suspicious it
> > takes too long.
> > 
> > I've yet to dig through the RCU code to figure out the exact sequence of
> > events, but found the above to be fairly reliable in triggering the
> > issue.

> If you send me the full splat from the dmesg and the RCU portions of
> .config, I will take a look.  Is this new behavior, or a new test?

If new behavior, I would be most suspicious of these commits in -rcu which
recently entered -tip:

19e4d983cda1 rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
913324b1364f rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
fcdcfefafa45 rcu: Pull rcu_qs_ctr into rcu_dynticks structure
0919a0b7e7a5 rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
caa7c8e34293 rcu: Make rcu_note_context_switch() do deferred NOCB wakeups
41e4b159d516 rcu: Make rcu_all_qs() do deferred NOCB wakeups
b457a3356a68 rcu: Make call_rcu() do deferred NOCB wakeups

Does reverting any of these help?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-03 15:54                           ` Paul E. McKenney
@ 2017-02-06  6:23                             ` Sachin Sant
  2017-02-06 15:10                               ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Sachin Sant @ 2017-02-06  6:23 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, Matt Fleming, linuxppc-dev, Mike Galbraith, LKML,
	linux-next, Ross Zwisler


>>> I've seen it on tip. It looks like hot unplug goes really slow when
>>> there's running tasks on the CPU being taken down.
>>> 
>>> What I did was something like:
>>> 
>>>  taskset -p $((1<<1)) $$
>>>  for ((i=0; i<20; i++)) do while :; do :; done & done
>>> 
>>>  taskset -p $((1<<0)) $$
>>>  echo 0 > /sys/devices/system/cpu/cpu1/online
>>> 
>>> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
>>> _really_ slow and the RCU stall triggers. What I suspect happens is that
>>> hotplug stops participating in the RCU state machine early, but only
>>> tells RCU about it really late, and in between it gets suspicious it
>>> takes too long.
>>> 
>>> I've yet to dig through the RCU code to figure out the exact sequence of
>>> events, but found the above to be fairly reliable in triggering the
>>> issue.
> 
>> If you send me the full splat from the dmesg and the RCU portions of
>> .config, I will take a look.  Is this new behavior, or a new test?
> 

I have sent the required files to you via separate email.

> If new behavior, I would be most suspicious of these commits in -rcu which
> recently entered -tip:
> 
> 19e4d983cda1 rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
> 913324b1364f rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
> fcdcfefafa45 rcu: Pull rcu_qs_ctr into rcu_dynticks structure
> 0919a0b7e7a5 rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
> caa7c8e34293 rcu: Make rcu_note_context_switch() do deferred NOCB wakeups
> 41e4b159d516 rcu: Make rcu_all_qs() do deferred NOCB wakeups
> b457a3356a68 rcu: Make call_rcu() do deferred NOCB wakeups
> 
> Does reverting any of these help?

I tried reverting the above commits. That does not help. I can still recreate the issue.

Thanks
-Sachin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-06  6:23                             ` Sachin Sant
@ 2017-02-06 15:10                               ` Paul E. McKenney
  2017-02-06 15:14                                 ` Paul E. McKenney
  0 siblings, 1 reply; 44+ messages in thread
From: Paul E. McKenney @ 2017-02-06 15:10 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Peter Zijlstra, Matt Fleming, linuxppc-dev, Mike Galbraith, LKML,
	linux-next, Ross Zwisler

On Mon, Feb 06, 2017 at 11:53:10AM +0530, Sachin Sant wrote:
> 
> >>> I've seen it on tip. It looks like hot unplug goes really slow when
> >>> there's running tasks on the CPU being taken down.
> >>> 
> >>> What I did was something like:
> >>> 
> >>>  taskset -p $((1<<1)) $$
> >>>  for ((i=0; i<20; i++)) do while :; do :; done & done
> >>> 
> >>>  taskset -p $((1<<0)) $$
> >>>  echo 0 > /sys/devices/system/cpu/cpu1/online
> >>> 
> >>> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> >>> _really_ slow and the RCU stall triggers. What I suspect happens is that
> >>> hotplug stops participating in the RCU state machine early, but only
> >>> tells RCU about it really late, and in between it gets suspicious it
> >>> takes too long.
> >>> 
> >>> I've yet to dig through the RCU code to figure out the exact sequence of
> >>> events, but found the above to be fairly reliable in triggering the
> >>> issue.
> > 
> >> If you send me the full splat from the dmesg and the RCU portions of
> >> .config, I will take a look.  Is this new behavior, or a new test?
> > 
> 
> I have sent the required files to you via separate email.
> 
> > If new behavior, I would be most suspicious of these commits in -rcu which
> > recently entered -tip:
> > 
> > 19e4d983cda1 rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
> > 913324b1364f rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
> > fcdcfefafa45 rcu: Pull rcu_qs_ctr into rcu_dynticks structure
> > 0919a0b7e7a5 rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
> > caa7c8e34293 rcu: Make rcu_note_context_switch() do deferred NOCB wakeups
> > 41e4b159d516 rcu: Make rcu_all_qs() do deferred NOCB wakeups
> > b457a3356a68 rcu: Make call_rcu() do deferred NOCB wakeups
> > 
> > Does reverting any of these help?
> 
> I tried reverting the above commits. That does not help. I can still recreate the issue.

Thank you for testing, Sachin!

Could you please try building and testing with CONFIG_RCU_BOOST=y?
You will need to enable CONFIG_RCU_EXPERT=y to see this Kconfig option.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-06 15:10                               ` Paul E. McKenney
@ 2017-02-06 15:14                                 ` Paul E. McKenney
  0 siblings, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2017-02-06 15:14 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Peter Zijlstra, Matt Fleming, linuxppc-dev, Mike Galbraith, LKML,
	linux-next, Ross Zwisler

On Mon, Feb 06, 2017 at 07:10:48AM -0800, Paul E. McKenney wrote:
> On Mon, Feb 06, 2017 at 11:53:10AM +0530, Sachin Sant wrote:
> > 
> > >>> I've seen it on tip. It looks like hot unplug goes really slow when
> > >>> there's running tasks on the CPU being taken down.
> > >>> 
> > >>> What I did was something like:
> > >>> 
> > >>>  taskset -p $((1<<1)) $$
> > >>>  for ((i=0; i<20; i++)) do while :; do :; done & done
> > >>> 
> > >>>  taskset -p $((1<<0)) $$
> > >>>  echo 0 > /sys/devices/system/cpu/cpu1/online
> > >>> 
> > >>> And with those 20 tasks stuck sucking cycles on CPU1, the unplug goes
> > >>> _really_ slow and the RCU stall triggers. What I suspect happens is that
> > >>> hotplug stops participating in the RCU state machine early, but only
> > >>> tells RCU about it really late, and in between it gets suspicious it
> > >>> takes too long.
> > >>> 
> > >>> I've yet to dig through the RCU code to figure out the exact sequence of
> > >>> events, but found the above to be fairly reliable in triggering the
> > >>> issue.
> > > 
> > >> If you send me the full splat from the dmesg and the RCU portions of
> > >> .config, I will take a look.  Is this new behavior, or a new test?
> > > 
> > 
> > I have sent the required files to you via separate email.
> > 
> > > If new behavior, I would be most suspicious of these commits in -rcu which
> > > recently entered -tip:
> > > 
> > > 19e4d983cda1 rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions
> > > 913324b1364f rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()
> > > fcdcfefafa45 rcu: Pull rcu_qs_ctr into rcu_dynticks structure
> > > 0919a0b7e7a5 rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure
> > > caa7c8e34293 rcu: Make rcu_note_context_switch() do deferred NOCB wakeups
> > > 41e4b159d516 rcu: Make rcu_all_qs() do deferred NOCB wakeups
> > > b457a3356a68 rcu: Make call_rcu() do deferred NOCB wakeups
> > > 
> > > Does reverting any of these help?
> > 
> > I tried reverting the above commits. That does not help. I can still recreate the issue.
> 
> Thank you for testing, Sachin!
> 
> Could you please try building and testing with CONFIG_RCU_BOOST=y?
> You will need to enable CONFIG_RCU_EXPERT=y to see this Kconfig option.

Ah, but looking ahead to your .config file, you have CONFIG_PREEMPT=n,
which means boosting would not help and is not available in any case.

So it looks like there is a very long loop within an RCU read-side
critical section, and that this critical section needs to be broken
up a bit -- 21 seconds in pretty much any kind of critical section is
a bit excessive, after all.
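
As a generic illustration of that kind of fix (not the code path in question; struct bucket and scan_one_bucket() are made up for the example): with CONFIG_PREEMPT=n the usual remedy is to bound the work done per read-side critical section and leave a scheduling point -- which is a quiescent state for rcu_sched -- between chunks:

static void scan_table(struct bucket *table, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		rcu_read_lock();
		scan_one_bucket(&table[i]);	/* bounded amount of work */
		rcu_read_unlock();

		cond_resched();		/* scheduling point == quiescent state */
	}
}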

							Thanx, Paul

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls
  2017-02-02 15:55               ` Peter Zijlstra
                                   ` (3 preceding siblings ...)
  2017-02-03 13:04                 ` Borislav Petkov
@ 2017-02-22  9:03                 ` Wanpeng Li
  2017-02-24  9:16                 ` [tip:sched/urgent] sched/core: Fix update_rq_clock() splat on hotplug (and suspend/resume) tip-bot for Peter Zijlstra
  5 siblings, 0 replies; 44+ messages in thread
From: Wanpeng Li @ 2017-02-22  9:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ross Zwisler, Mike Galbraith, Sachin Sant, Matt Fleming,
	Michael Ellerman, linuxppc-dev, linux-next, LKML

2017-02-02 23:55 GMT+08:00 Peter Zijlstra <peterz@infradead.org>:
> On Tue, Jan 31, 2017 at 10:22:47AM -0700, Ross Zwisler wrote:
>> On Tue, Jan 31, 2017 at 4:48 AM, Mike Galbraith <efault@gmx.de> wrote:
>> > On Tue, 2017-01-31 at 16:30 +0530, Sachin Sant wrote:
>
>
> Could some of you test this? It seems to cure things in my (very)
> limited testing.
>

Tested-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 96e4ccc..b773821 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5609,7 +5609,7 @@ static void migrate_tasks(struct rq *dead_rq)
>  {
>         struct rq *rq = dead_rq;
>         struct task_struct *next, *stop = rq->stop;
> -       struct rq_flags rf, old_rf;
> +       struct rq_flags rf;
>         int dest_cpu;
>
>         /*
> @@ -5628,7 +5628,9 @@ static void migrate_tasks(struct rq *dead_rq)
>          * class method both need to have an up-to-date
>          * value of rq->clock[_task]
>          */
> +       rq_pin_lock(rq, &rf);
>         update_rq_clock(rq);
> +       rq_unpin_lock(rq, &rf);
>
>         for (;;) {
>                 /*
> @@ -5641,7 +5643,7 @@ static void migrate_tasks(struct rq *dead_rq)
>                 /*
>                  * pick_next_task assumes pinned rq->lock.
>                  */
> -               rq_pin_lock(rq, &rf);
> +               rq_repin_lock(rq, &rf);
>                 next = pick_next_task(rq, &fake_task, &rf);
>                 BUG_ON(!next);
>                 next->sched_class->put_prev_task(rq, next);
> @@ -5670,13 +5672,6 @@ static void migrate_tasks(struct rq *dead_rq)
>                         continue;
>                 }
>
> -               /*
> -                * __migrate_task() may return with a different
> -                * rq->lock held and a new cookie in 'rf', but we need
> -                * to preserve rf::clock_update_flags for 'dead_rq'.
> -                */
> -               old_rf = rf;
> -
>                 /* Find suitable destination for @next, with force if needed. */
>                 dest_cpu = select_fallback_rq(dead_rq->cpu, next);
>
> @@ -5685,7 +5680,6 @@ static void migrate_tasks(struct rq *dead_rq)
>                         raw_spin_unlock(&rq->lock);
>                         rq = dead_rq;
>                         raw_spin_lock(&rq->lock);
> -                       rf = old_rf;
>                 }
>                 raw_spin_unlock(&next->pi_lock);
>         }

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [tip:sched/urgent] sched/core: Fix update_rq_clock() splat on hotplug (and suspend/resume)
  2017-02-02 15:55               ` Peter Zijlstra
                                   ` (4 preceding siblings ...)
  2017-02-22  9:03                 ` Wanpeng Li
@ 2017-02-24  9:16                 ` tip-bot for Peter Zijlstra
  5 siblings, 0 replies; 44+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-02-24  9:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, sachinp, mpe, bp, torvalds, matt, mingo, efault,
	hpa, zwisler, tglx, peterz

Commit-ID:  8cb68b343a66cf19834472012590490d34d31703
Gitweb:     http://git.kernel.org/tip/8cb68b343a66cf19834472012590490d34d31703
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 2 Feb 2017 16:55:06 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 24 Feb 2017 08:58:33 +0100

sched/core: Fix update_rq_clock() splat on hotplug (and suspend/resume)

The hotplug code still triggers the warning about using a stale
rq->clock value.

Fix things up to actually run update_rq_clock() in a place where we
record the 'UPDATED' flag, and then modify the annotation to retain
this flag over the rq->lock fiddling that happens as a result of
actually migrating all the tasks elsewhere.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Mike Galbraith <efault@gmx.de>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <zwisler@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4d25b35ea372 ("sched/fair: Restore previous rq_flags when migrating tasks in hotplug")
Link: http://lkml.kernel.org/r/20170202155506.GX6515@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 34e2291..2d6e828 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5557,7 +5557,7 @@ static void migrate_tasks(struct rq *dead_rq)
 {
 	struct rq *rq = dead_rq;
 	struct task_struct *next, *stop = rq->stop;
-	struct rq_flags rf, old_rf;
+	struct rq_flags rf;
 	int dest_cpu;
 
 	/*
@@ -5576,7 +5576,9 @@ static void migrate_tasks(struct rq *dead_rq)
 	 * class method both need to have an up-to-date
 	 * value of rq->clock[_task]
 	 */
+	rq_pin_lock(rq, &rf);
 	update_rq_clock(rq);
+	rq_unpin_lock(rq, &rf);
 
 	for (;;) {
 		/*
@@ -5589,7 +5591,7 @@ static void migrate_tasks(struct rq *dead_rq)
 		/*
 		 * pick_next_task() assumes pinned rq->lock:
 		 */
-		rq_pin_lock(rq, &rf);
+		rq_repin_lock(rq, &rf);
 		next = pick_next_task(rq, &fake_task, &rf);
 		BUG_ON(!next);
 		next->sched_class->put_prev_task(rq, next);
@@ -5618,13 +5620,6 @@ static void migrate_tasks(struct rq *dead_rq)
 			continue;
 		}
 
-		/*
-		 * __migrate_task() may return with a different
-		 * rq->lock held and a new cookie in 'rf', but we need
-		 * to preserve rf::clock_update_flags for 'dead_rq'.
-		 */
-		old_rf = rf;
-
 		/* Find suitable destination for @next, with force if needed. */
 		dest_cpu = select_fallback_rq(dead_rq->cpu, next);
 
@@ -5633,7 +5628,6 @@ static void migrate_tasks(struct rq *dead_rq)
 			raw_spin_unlock(&rq->lock);
 			rq = dead_rq;
 			raw_spin_lock(&rq->lock);
-			rf = old_rf;
 		}
 		raw_spin_unlock(&next->pi_lock);
 	}

^ permalink raw reply related	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2017-02-24  9:44 UTC | newest]

Thread overview: 44+ messages
2016-09-21 13:38 [PATCH v2 0/7] sched: Diagnostic checks for missing rq clock updates Matt Fleming
2016-09-21 13:38 ` [PATCH v2 1/7] sched/fair: Update the rq clock before detaching tasks Matt Fleming
2016-10-03 12:49   ` Peter Zijlstra
2016-10-03 14:37     ` Matt Fleming
2016-10-03 14:42       ` Peter Zijlstra
2016-09-21 13:38 ` [PATCH v2 2/7] sched/fair: Update rq clock before waking up new task Matt Fleming
2016-09-21 13:38 ` [PATCH v2 3/7] sched/fair: Update rq clock in task_hot() Matt Fleming
2016-09-21 13:38 ` [PATCH v2 4/7] sched: Add wrappers for lockdep_(un)pin_lock() Matt Fleming
2017-01-14 12:40   ` [tip:sched/core] sched/core: " tip-bot for Matt Fleming
2016-09-21 13:38 ` [PATCH v2 5/7] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock Matt Fleming
2017-01-14 12:41   ` [tip:sched/core] " tip-bot for Matt Fleming
2016-09-21 13:38 ` [PATCH v2 6/7] sched/fair: Push rq lock pin/unpin into idle_balance() Matt Fleming
2017-01-14 12:41   ` [tip:sched/core] " tip-bot for Matt Fleming
2016-09-21 13:38 ` [PATCH v2 7/7] sched/core: Add debug code to catch missing update_rq_clock() Matt Fleming
2016-09-21 15:58   ` Petr Mladek
2016-09-21 19:08     ` Matt Fleming
2016-09-21 19:46       ` Thomas Gleixner
2016-09-22  0:44       ` Sergey Senozhatsky
2016-09-22  8:04     ` Peter Zijlstra
2016-09-22  8:36       ` Jan Kara
2016-09-22  9:39         ` Peter Zijlstra
2016-09-22 10:17           ` Peter Zijlstra
2017-01-14 12:44   ` [tip:sched/core] sched/core: Add debugging code to catch missing update_rq_clock() calls tip-bot for Matt Fleming
     [not found]     ` <87tw8gutp6.fsf@concordia.ellerman.id.au>
2017-01-30 21:34       ` Matt Fleming
2017-01-31  8:35         ` Michael Ellerman
2017-01-31 11:00         ` Sachin Sant
2017-01-31 11:48           ` Mike Galbraith
2017-01-31 17:22             ` Ross Zwisler
2017-02-02 15:55               ` Peter Zijlstra
2017-02-02 22:01                 ` Matt Fleming
2017-02-03  3:05                 ` Mike Galbraith
2017-02-03  4:33                 ` Sachin Sant
2017-02-03  8:53                   ` Peter Zijlstra
2017-02-03 12:59                     ` Mike Galbraith
2017-02-03 13:37                       ` Peter Zijlstra
2017-02-03 13:52                         ` Mike Galbraith
2017-02-03 15:44                         ` Paul E. McKenney
2017-02-03 15:54                           ` Paul E. McKenney
2017-02-06  6:23                             ` Sachin Sant
2017-02-06 15:10                               ` Paul E. McKenney
2017-02-06 15:14                                 ` Paul E. McKenney
2017-02-03 13:04                 ` Borislav Petkov
2017-02-22  9:03                 ` Wanpeng Li
2017-02-24  9:16                 ` [tip:sched/urgent] sched/core: Fix update_rq_clock() splat on hotplug (and suspend/resume) tip-bot for Peter Zijlstra
