* [RFC][PATCH 0/5] sched: Diagnostic checks for missing rq clock updates
@ 2016-05-12 19:49 Matt Fleming
  2016-05-12 19:49 ` [RFC][PATCH 1/5] sched/fair: Update the rq clock before detaching tasks Matt Fleming
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Matt Fleming @ 2016-05-12 19:49 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Byungchul Park, Frederic Weisbecker, Luca Abeni,
	Rafael J . Wysocki, Rik van Riel, Thomas Gleixner, Wanpeng Li,
	Yuyang Du, Mel Gorman, Mike Galbraith, Matt Fleming

There are currently no runtime diagnostic checks for detecting when we
have inadvertently missed a call to update_rq_clock() before accessing
rq_clock().

The idea in these patches, which came from Peter, is to piggyback on
the rq->lock pin/unpin context to detect when we expected (and failed)
to see an update to the rq clock. They've already caught a couple of
bugs: see patch 1 and commit b52fad2db5d7 ("sched/fair: Update rq
clock before updating nohz CPU load") in tip/sched/core.

I'm not sure how palatable the s/pin_cookie/rq_flags/ changes will be
in patch 2, so I've marked this entire series as RFC.

All the diagnostic code is guarded by CONFIG_SCHED_DEBUG, but there
are minimal changes to __schedule() in patch 5 for the !SCHED_DEBUG
case.
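
Roughly, the detection works like this (a sketch only; the exact
masking and placement are in patch 5):

static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
{
	rf->cookie = lockdep_pin_lock(&rq->lock);
	rq->clock_update_flags &= ~RQCF_UPDATED;	/* forget old updates */
}

static inline u64 rq_clock(struct rq *rq)
{
	/* Warn unless the clock was updated or skipping is in effect */
	WARN_ON_ONCE(!(rq->clock_update_flags & (RQCF_UPDATED | RQCF_ACT_SKIP)));
	return rq->clock;
}

update_rq_clock() sets RQCF_UPDATED, so reading the clock through
rq_clock() without a preceding update (and without an active skip) in
the current pin context triggers the warning.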

Matt Fleming (5):
  sched/fair: Update the rq clock before detaching tasks
  sched: Add wrappers for lockdep_(un)pin_lock()
  sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock
  sched/fair: Push rq lock pin/unpin into idle_balance()
  sched/core: Add debug code to catch missing update_rq_clock()

 kernel/sched/core.c      |  94 +++++++++++++++++++++++--------------------
 kernel/sched/deadline.c  |  10 ++---
 kernel/sched/fair.c      |  31 +++++++++------
 kernel/sched/idle_task.c |   2 +-
 kernel/sched/rt.c        |   6 +--
 kernel/sched/sched.h     | 101 ++++++++++++++++++++++++++++++++++++++++-------
 kernel/sched/stop_task.c |   2 +-
 7 files changed, 166 insertions(+), 80 deletions(-)

-- 
2.7.3


* [RFC][PATCH 1/5] sched/fair: Update the rq clock before detaching tasks
  2016-05-12 19:49 [RFC][PATCH 0/5] sched: Diagnostic checks for missing rq clock updates Matt Fleming
@ 2016-05-12 19:49 ` Matt Fleming
  2016-05-12 19:49 ` [RFC][PATCH 2/5] sched: Add wrappers for lockdep_(un)pin_lock() Matt Fleming
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Matt Fleming @ 2016-05-12 19:49 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Byungchul Park, Frederic Weisbecker, Luca Abeni,
	Rafael J . Wysocki, Rik van Riel, Thomas Gleixner, Wanpeng Li,
	Yuyang Du, Mel Gorman, Mike Galbraith, Matt Fleming

detach_task_cfs_rq() may indirectly call rq_clock() when it informs the
cpufreq code that the rq utilisation has changed, so we need to update
the rq clock before detaching the task.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e83db73..02856647339d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8378,6 +8378,8 @@ static void detach_task_cfs_rq(struct task_struct *p)
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
+	update_rq_clock(task_rq(p));
+
 	if (!vruntime_normalized(p)) {
 		/*
 		 * Fix up our vruntime so that the current sleep doesn't
-- 
2.7.3


* [RFC][PATCH 2/5] sched: Add wrappers for lockdep_(un)pin_lock()
  2016-05-12 19:49 [RFC][PATCH 0/5] sched: Diagnostic checks for missing rq clock updates Matt Fleming
  2016-05-12 19:49 ` [RFC][PATCH 1/5] sched/fair: Update the rq clock before detaching tasks Matt Fleming
@ 2016-05-12 19:49 ` Matt Fleming
  2016-05-12 19:49 ` [RFC][PATCH 3/5] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock Matt Fleming
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Matt Fleming @ 2016-05-12 19:49 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Byungchul Park, Frederic Weisbecker, Luca Abeni,
	Rafael J . Wysocki, Rik van Riel, Thomas Gleixner, Wanpeng Li,
	Yuyang Du, Mel Gorman, Mike Galbraith, Matt Fleming

In preparation for adding diagnostic checks to catch missing calls to
update_rq_clock(), provide wrappers for (re)pinning and unpinning
rq->lock.

Because the pending diagnostic checks need to maintain state in
rq_flags across pin contexts, swap the 'struct pin_cookie' arguments
for 'struct rq_flags *'.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Luca Abeni <luca.abeni@unitn.it>
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/core.c      | 80 ++++++++++++++++++++++++------------------------
 kernel/sched/deadline.c  | 10 +++---
 kernel/sched/fair.c      |  6 ++--
 kernel/sched/idle_task.c |  2 +-
 kernel/sched/rt.c        |  6 ++--
 kernel/sched/sched.h     | 31 ++++++++++++++-----
 kernel/sched/stop_task.c |  2 +-
 7 files changed, 76 insertions(+), 61 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 404c0784b1fc..dfdd8c3c083b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -184,7 +184,7 @@ struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 		rq = task_rq(p);
 		raw_spin_lock(&rq->lock);
 		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
-			rf->cookie = lockdep_pin_lock(&rq->lock);
+			rq_pin_lock(rq, rf);
 			return rq;
 		}
 		raw_spin_unlock(&rq->lock);
@@ -224,7 +224,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 		 * pair with the WMB to ensure we must then also see migrating.
 		 */
 		if (likely(rq == task_rq(p) && !task_on_rq_migrating(p))) {
-			rf->cookie = lockdep_pin_lock(&rq->lock);
+			rq_pin_lock(rq, rf);
 			return rq;
 		}
 		raw_spin_unlock(&rq->lock);
@@ -1183,9 +1183,9 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		 * OK, since we're going to drop the lock immediately
 		 * afterwards anyway.
 		 */
-		lockdep_unpin_lock(&rq->lock, rf.cookie);
+		rq_unpin_lock(rq, &rf);
 		rq = move_queued_task(rq, p, dest_cpu);
-		lockdep_repin_lock(&rq->lock, rf.cookie);
+		rq_repin_lock(rq, &rf);
 	}
 out:
 	task_rq_unlock(rq, p, &rf);
@@ -1677,7 +1677,7 @@ static inline void ttwu_activate(struct rq *rq, struct task_struct *p, int en_fl
  * Mark the task runnable and perform wakeup-preemption.
  */
 static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
-			   struct pin_cookie cookie)
+			   struct rq_flags *rf)
 {
 	check_preempt_curr(rq, p, wake_flags);
 	p->state = TASK_RUNNING;
@@ -1689,9 +1689,9 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
 		 * Our task @p is fully woken up and running; so its safe to
 		 * drop the rq->lock, hereafter rq is only used for statistics.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		p->sched_class->task_woken(rq, p);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 	}
 
 	if (rq->idle_stamp) {
@@ -1710,7 +1710,7 @@ static void ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags,
 
 static void
 ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
-		 struct pin_cookie cookie)
+		 struct rq_flags *rf)
 {
 	int en_flags = ENQUEUE_WAKEUP;
 
@@ -1725,7 +1725,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 #endif
 
 	ttwu_activate(rq, p, en_flags);
-	ttwu_do_wakeup(rq, p, wake_flags, cookie);
+	ttwu_do_wakeup(rq, p, wake_flags, rf);
 }
 
 /*
@@ -1744,7 +1744,7 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)
 	if (task_on_rq_queued(p)) {
 		/* check_preempt_curr() may use rq clock */
 		update_rq_clock(rq);
-		ttwu_do_wakeup(rq, p, wake_flags, rf.cookie);
+		ttwu_do_wakeup(rq, p, wake_flags, &rf);
 		ret = 1;
 	}
 	__task_rq_unlock(rq, &rf);
@@ -1757,15 +1757,15 @@ void sched_ttwu_pending(void)
 {
 	struct rq *rq = this_rq();
 	struct llist_node *llist = llist_del_all(&rq->wake_list);
-	struct pin_cookie cookie;
 	struct task_struct *p;
 	unsigned long flags;
+	struct rq_flags rf;
 
 	if (!llist)
 		return;
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
-	cookie = lockdep_pin_lock(&rq->lock);
+	rq_pin_lock(rq, &rf);
 
 	while (llist) {
 		p = llist_entry(llist, struct task_struct, wake_entry);
@@ -1774,10 +1774,10 @@ void sched_ttwu_pending(void)
 		 * See ttwu_queue(); we only call ttwu_queue_remote() when
 		 * its a x-cpu wakeup.
 		 */
-		ttwu_do_activate(rq, p, WF_MIGRATED, cookie);
+		ttwu_do_activate(rq, p, WF_MIGRATED, &rf);
 	}
 
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_unpin_lock(rq, &rf);
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
 
@@ -1864,7 +1864,7 @@ bool cpus_share_cache(int this_cpu, int that_cpu)
 static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
 {
 	struct rq *rq = cpu_rq(cpu);
-	struct pin_cookie cookie;
+	struct rq_flags rf;
 
 #if defined(CONFIG_SMP)
 	if (sched_feat(TTWU_QUEUE) && !cpus_share_cache(smp_processor_id(), cpu)) {
@@ -1875,9 +1875,9 @@ static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
 #endif
 
 	raw_spin_lock(&rq->lock);
-	cookie = lockdep_pin_lock(&rq->lock);
-	ttwu_do_activate(rq, p, wake_flags, cookie);
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_pin_lock(rq, &rf);
+	ttwu_do_activate(rq, p, wake_flags, &rf);
+	rq_unpin_lock(rq, &rf);
 	raw_spin_unlock(&rq->lock);
 }
 
@@ -2071,7 +2071,7 @@ out:
  * ensure that this_rq() is locked, @p is bound to this_rq() and not
  * the current task.
  */
-static void try_to_wake_up_local(struct task_struct *p, struct pin_cookie cookie)
+static void try_to_wake_up_local(struct task_struct *p, struct rq_flags *rf)
 {
 	struct rq *rq = task_rq(p);
 
@@ -2088,11 +2088,11 @@ static void try_to_wake_up_local(struct task_struct *p, struct pin_cookie cookie
 		 * disabled avoiding further scheduler activity on it and we've
 		 * not yet picked a replacement task.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		raw_spin_unlock(&rq->lock);
 		raw_spin_lock(&p->pi_lock);
 		raw_spin_lock(&rq->lock);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 	}
 
 	if (!(p->state & TASK_NORMAL))
@@ -2103,7 +2103,7 @@ static void try_to_wake_up_local(struct task_struct *p, struct pin_cookie cookie
 	if (!task_on_rq_queued(p))
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
 
-	ttwu_do_wakeup(rq, p, 0, cookie);
+	ttwu_do_wakeup(rq, p, 0, rf);
 	if (schedstat_enabled())
 		ttwu_stat(p, smp_processor_id(), 0);
 out:
@@ -2531,9 +2531,9 @@ void wake_up_new_task(struct task_struct *p)
 		 * Nothing relies on rq->lock after this, so its fine to
 		 * drop it.
 		 */
-		lockdep_unpin_lock(&rq->lock, rf.cookie);
+		rq_unpin_lock(rq, &rf);
 		p->sched_class->task_woken(rq, p);
-		lockdep_repin_lock(&rq->lock, rf.cookie);
+		rq_repin_lock(rq, &rf);
 	}
 #endif
 	task_rq_unlock(rq, p, &rf);
@@ -2798,7 +2798,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
  */
 static __always_inline struct rq *
 context_switch(struct rq *rq, struct task_struct *prev,
-	       struct task_struct *next, struct pin_cookie cookie)
+	       struct task_struct *next, struct rq_flags *rf)
 {
 	struct mm_struct *mm, *oldmm;
 
@@ -2830,7 +2830,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 * of the scheduler it's an obvious special-case), so we
 	 * do an early lockdep release here:
 	 */
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_unpin_lock(rq, rf);
 	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
 
 	/* Here we just switch the register state and the stack. */
@@ -3170,7 +3170,7 @@ static inline void schedule_debug(struct task_struct *prev)
  * Pick up the highest-prio task:
  */
 static inline struct task_struct *
-pick_next_task(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	const struct sched_class *class = &fair_sched_class;
 	struct task_struct *p;
@@ -3181,20 +3181,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie
 	 */
 	if (likely(prev->sched_class == class &&
 		   rq->nr_running == rq->cfs.h_nr_running)) {
-		p = fair_sched_class.pick_next_task(rq, prev, cookie);
+		p = fair_sched_class.pick_next_task(rq, prev, rf);
 		if (unlikely(p == RETRY_TASK))
 			goto again;
 
 		/* assumes fair_sched_class->next == idle_sched_class */
 		if (unlikely(!p))
-			p = idle_sched_class.pick_next_task(rq, prev, cookie);
+			p = idle_sched_class.pick_next_task(rq, prev, rf);
 
 		return p;
 	}
 
 again:
 	for_each_class(class) {
-		p = class->pick_next_task(rq, prev, cookie);
+		p = class->pick_next_task(rq, prev, rf);
 		if (p) {
 			if (unlikely(p == RETRY_TASK))
 				goto again;
@@ -3248,7 +3248,7 @@ static void __sched notrace __schedule(bool preempt)
 {
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
-	struct pin_cookie cookie;
+	struct rq_flags rf;
 	struct rq *rq;
 	int cpu;
 
@@ -3282,7 +3282,7 @@ static void __sched notrace __schedule(bool preempt)
 	 */
 	smp_mb__before_spinlock();
 	raw_spin_lock(&rq->lock);
-	cookie = lockdep_pin_lock(&rq->lock);
+	rq_pin_lock(rq, &rf);
 
 	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
 
@@ -3304,7 +3304,7 @@ static void __sched notrace __schedule(bool preempt)
 
 				to_wakeup = wq_worker_sleeping(prev);
 				if (to_wakeup)
-					try_to_wake_up_local(to_wakeup, cookie);
+					try_to_wake_up_local(to_wakeup, &rf);
 			}
 		}
 		switch_count = &prev->nvcsw;
@@ -3313,7 +3313,7 @@ static void __sched notrace __schedule(bool preempt)
 	if (task_on_rq_queued(prev))
 		update_rq_clock(rq);
 
-	next = pick_next_task(rq, prev, cookie);
+	next = pick_next_task(rq, prev, &rf);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
 	rq->clock_skip_update = 0;
@@ -3324,9 +3324,9 @@ static void __sched notrace __schedule(bool preempt)
 		++*switch_count;
 
 		trace_sched_switch(preempt, prev, next);
-		rq = context_switch(rq, prev, next, cookie); /* unlocks the rq */
+		rq = context_switch(rq, prev, next, &rf); /* unlocks the rq */
 	} else {
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock_irq(&rq->lock);
 	}
 
@@ -5411,7 +5411,7 @@ static void migrate_tasks(struct rq *dead_rq)
 {
 	struct rq *rq = dead_rq;
 	struct task_struct *next, *stop = rq->stop;
-	struct pin_cookie cookie;
+	struct rq_flags rf;
 	int dest_cpu;
 
 	/*
@@ -5443,8 +5443,8 @@ static void migrate_tasks(struct rq *dead_rq)
 		/*
 		 * pick_next_task assumes pinned rq->lock.
 		 */
-		cookie = lockdep_pin_lock(&rq->lock);
-		next = pick_next_task(rq, &fake_task, cookie);
+		rq_pin_lock(rq, &rf);
+		next = pick_next_task(rq, &fake_task, &rf);
 		BUG_ON(!next);
 		next->sched_class->put_prev_task(rq, next);
 
@@ -5457,7 +5457,7 @@ static void migrate_tasks(struct rq *dead_rq)
 		 * because !cpu_active at this point, which means load-balance
 		 * will not interfere. Also, stop-machine.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock(&rq->lock);
 		raw_spin_lock(&next->pi_lock);
 		raw_spin_lock(&rq->lock);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fcb7f0217ff4..9d2894895745 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -670,9 +670,9 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 		 * Nothing relies on rq->lock after this, so its safe to drop
 		 * rq->lock.
 		 */
-		lockdep_unpin_lock(&rq->lock, rf.cookie);
+		rq_unpin_lock(rq, &rf);
 		push_dl_task(rq);
-		lockdep_repin_lock(&rq->lock, rf.cookie);
+		rq_repin_lock(rq, &rf);
 	}
 #endif
 
@@ -1126,7 +1126,7 @@ static struct sched_dl_entity *pick_next_dl_entity(struct rq *rq,
 }
 
 struct task_struct *
-pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct sched_dl_entity *dl_se;
 	struct task_struct *p;
@@ -1141,9 +1141,9 @@ pick_next_task_dl(struct rq *rq, struct task_struct *prev, struct pin_cookie coo
 		 * disabled avoiding further scheduler activity on it and we're
 		 * being very careful to re-start the picking loop.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		pull_dl_task(rq);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 		/*
 		 * pull_rt_task() can drop (and re-acquire) rq->lock; this
 		 * means a stop task can slip in, in which case we need to
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02856647339d..7f776a99bde0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5594,7 +5594,7 @@ preempt:
 }
 
 static struct task_struct *
-pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct cfs_rq *cfs_rq = &rq->cfs;
 	struct sched_entity *se;
@@ -5707,9 +5707,9 @@ idle:
 	 * further scheduler activity on it and we're being very careful to
 	 * re-start the picking loop.
 	 */
-	lockdep_unpin_lock(&rq->lock, cookie);
+	rq_unpin_lock(rq, rf);
 	new_tasks = idle_balance(rq);
-	lockdep_repin_lock(&rq->lock, cookie);
+	rq_repin_lock(rq, rf);
 	/*
 	 * Because idle_balance() releases (and re-acquires) rq->lock, it is
 	 * possible for any higher priority task to appear. In that case we
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index 2ce5458bbe1d..f86dd339a7c9 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -24,7 +24,7 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 }
 
 static struct task_struct *
-pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	put_prev_task(rq, prev);
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index d5690b722691..ab79812ee33e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1524,7 +1524,7 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
 }
 
 static struct task_struct *
-pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct task_struct *p;
 	struct rt_rq *rt_rq = &rq->rt;
@@ -1536,9 +1536,9 @@ pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct pin_cookie coo
 		 * disabled avoiding further scheduler activity on it and we're
 		 * being very careful to re-start the picking loop.
 		 */
-		lockdep_unpin_lock(&rq->lock, cookie);
+		rq_unpin_lock(rq, rf);
 		pull_rt_task(rq);
-		lockdep_repin_lock(&rq->lock, cookie);
+		rq_repin_lock(rq, rf);
 		/*
 		 * pull_rt_task() can drop (and re-acquire) rq->lock; this
 		 * means a dl or stop task can slip in, in which case we need
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e51145e76807..cbeb830c7ac6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -759,6 +759,26 @@ static inline void rq_clock_skip_update(struct rq *rq, bool skip)
 		rq->clock_skip_update &= ~RQCF_REQ_SKIP;
 }
 
+struct rq_flags {
+	unsigned long flags;
+	struct pin_cookie cookie;
+};
+
+static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
+{
+	rf->cookie = lockdep_pin_lock(&rq->lock);
+}
+
+static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
+{
+	lockdep_unpin_lock(&rq->lock, rf->cookie);
+}
+
+static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
+{
+	lockdep_repin_lock(&rq->lock, rf->cookie);
+}
+
 #ifdef CONFIG_NUMA
 enum numa_topology_type {
 	NUMA_DIRECT,
@@ -1210,7 +1230,7 @@ struct sched_class {
 	 */
 	struct task_struct * (*pick_next_task) (struct rq *rq,
 						struct task_struct *prev,
-						struct pin_cookie cookie);
+						struct rq_flags *rf);
 	void (*put_prev_task) (struct rq *rq, struct task_struct *p);
 
 #ifdef CONFIG_SMP
@@ -1458,11 +1478,6 @@ static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta) { }
 static inline void sched_avg_update(struct rq *rq) { }
 #endif
 
-struct rq_flags {
-	unsigned long flags;
-	struct pin_cookie cookie;
-};
-
 struct rq *__task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 	__acquires(rq->lock);
 struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
@@ -1472,7 +1487,7 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 static inline void __task_rq_unlock(struct rq *rq, struct rq_flags *rf)
 	__releases(rq->lock)
 {
-	lockdep_unpin_lock(&rq->lock, rf->cookie);
+	rq_unpin_lock(rq, rf);
 	raw_spin_unlock(&rq->lock);
 }
 
@@ -1481,7 +1496,7 @@ task_rq_unlock(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 	__releases(rq->lock)
 	__releases(p->pi_lock)
 {
-	lockdep_unpin_lock(&rq->lock, rf->cookie);
+	rq_unpin_lock(rq, rf);
 	raw_spin_unlock(&rq->lock);
 	raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);
 }
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 604297a08b3a..9f69fb630853 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -24,7 +24,7 @@ check_preempt_curr_stop(struct rq *rq, struct task_struct *p, int flags)
 }
 
 static struct task_struct *
-pick_next_task_stop(struct rq *rq, struct task_struct *prev, struct pin_cookie cookie)
+pick_next_task_stop(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
 	struct task_struct *stop = rq->stop;
 
-- 
2.7.3


* [RFC][PATCH 3/5] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock
  2016-05-12 19:49 [RFC][PATCH 0/5] sched: Diagnostic checks for missing rq clock updates Matt Fleming
  2016-05-12 19:49 ` [RFC][PATCH 1/5] sched/fair: Update the rq clock before detaching tasks Matt Fleming
  2016-05-12 19:49 ` [RFC][PATCH 2/5] sched: Add wrappers for lockdep_(un)pin_lock() Matt Fleming
@ 2016-05-12 19:49 ` Matt Fleming
  2016-05-12 19:49 ` [RFC][PATCH 4/5] sched/fair: Push rq lock pin/unpin into idle_balance() Matt Fleming
  2016-05-12 19:49 ` [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock() Matt Fleming
  4 siblings, 0 replies; 13+ messages in thread
From: Matt Fleming @ 2016-05-12 19:49 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Byungchul Park, Frederic Weisbecker, Luca Abeni,
	Rafael J . Wysocki, Rik van Riel, Thomas Gleixner, Wanpeng Li,
	Yuyang Du, Mel Gorman, Mike Galbraith, Matt Fleming

rq_clock() is called from sched_info_{depart,arrive}() after resetting
RQCF_ACT_SKIP but prior to a call to update_rq_clock().

In preparation for pending patches that check whether the rq clock has
been updated inside of a pin context before rq_clock() is called, move
the reset of rq->clock_skip_update immediately before unpinning the rq
lock.

This avoids triggering the new warnings from those rq_clock() calls,
since RQCF_ACT_SKIP will still be set when they run and the checks
treat an active skip as legitimate.
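
Schematically, the ordering that would otherwise warn looks like this
(a sketch of the call path as I read it, not literal code):

	/* __schedule(), before this patch: */
	rq->clock_skip_update = 0;		/* RQCF_ACT_SKIP cleared here */
	rq = context_switch(rq, prev, next, &rf);
		/* -> prepare_task_switch()
		 *     -> sched_info_switch()
		 *         -> sched_info_depart() -> rq_clock()
		 * with no update_rq_clock() since the clear above
		 */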

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dfdd8c3c083b..d2112c268fd1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2824,6 +2824,9 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		prev->active_mm = NULL;
 		rq->prev_mm = oldmm;
 	}
+
+	rq->clock_skip_update = 0;
+
 	/*
 	 * Since the runqueue lock will be released by the next
 	 * task (which is an invalid locking op but in the case
@@ -3316,7 +3319,6 @@ static void __sched notrace __schedule(bool preempt)
 	next = pick_next_task(rq, prev, &rf);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
-	rq->clock_skip_update = 0;
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
@@ -3326,6 +3328,7 @@ static void __sched notrace __schedule(bool preempt)
 		trace_sched_switch(preempt, prev, next);
 		rq = context_switch(rq, prev, next, &rf); /* unlocks the rq */
 	} else {
+		rq->clock_skip_update = 0;
 		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock_irq(&rq->lock);
 	}
-- 
2.7.3


* [RFC][PATCH 4/5] sched/fair: Push rq lock pin/unpin into idle_balance()
  2016-05-12 19:49 [RFC][PATCH 0/5] sched: Diagnostic checks for missing rq clock updates Matt Fleming
                   ` (2 preceding siblings ...)
  2016-05-12 19:49 ` [RFC][PATCH 3/5] sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock Matt Fleming
@ 2016-05-12 19:49 ` Matt Fleming
  2016-05-12 19:49 ` [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock() Matt Fleming
  4 siblings, 0 replies; 13+ messages in thread
From: Matt Fleming @ 2016-05-12 19:49 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Byungchul Park, Frederic Weisbecker, Luca Abeni,
	Rafael J . Wysocki, Rik van Riel, Thomas Gleixner, Wanpeng Li,
	Yuyang Du, Mel Gorman, Mike Galbraith, Matt Fleming

Future patches will emit warnings if rq_clock() is called before
update_rq_clock() inside a rq_pin_lock()/rq_unpin_lock() pair.

Since there is only one caller of idle_balance(), we can push the
unpin/repin of rq->lock down into idle_balance() itself.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/fair.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f776a99bde0..217e3a9d78db 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3095,7 +3095,7 @@ static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq)
 	return cfs_rq->avg.load_avg;
 }
 
-static int idle_balance(struct rq *this_rq);
+static int idle_balance(struct rq *this_rq, struct rq_flags *rf);
 
 #else /* CONFIG_SMP */
 
@@ -3118,7 +3118,7 @@ attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
 static inline void
 detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
 
-static inline int idle_balance(struct rq *rq)
+static inline int idle_balance(struct rq *rq, struct rq_flags *rf)
 {
 	return 0;
 }
@@ -5701,15 +5701,8 @@ simple:
 	return p;
 
 idle:
-	/*
-	 * This is OK, because current is on_cpu, which avoids it being picked
-	 * for load-balance and preemption/IRQs are still disabled avoiding
-	 * further scheduler activity on it and we're being very careful to
-	 * re-start the picking loop.
-	 */
-	rq_unpin_lock(rq, rf);
-	new_tasks = idle_balance(rq);
-	rq_repin_lock(rq, rf);
+	new_tasks = idle_balance(rq, rf);
+
 	/*
 	 * Because idle_balance() releases (and re-acquires) rq->lock, it is
 	 * possible for any higher priority task to appear. In that case we
@@ -7641,7 +7634,7 @@ update_next_balance(struct sched_domain *sd, int cpu_busy, unsigned long *next_b
  * idle_balance is called by schedule() if this_cpu is about to become
  * idle. Attempts to pull tasks from other CPUs.
  */
-static int idle_balance(struct rq *this_rq)
+static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
 {
 	unsigned long next_balance = jiffies + HZ;
 	int this_cpu = this_rq->cpu;
@@ -7655,6 +7648,14 @@ static int idle_balance(struct rq *this_rq)
 	 */
 	this_rq->idle_stamp = rq_clock(this_rq);
 
+	/*
+	 * This is OK, because current is on_cpu, which avoids it being picked
+	 * for load-balance and preemption/IRQs are still disabled avoiding
+	 * further scheduler activity on it and we're being very careful to
+	 * re-start the picking loop.
+	 */
+	rq_unpin_lock(this_rq, rf);
+
 	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
 	    !this_rq->rd->overload) {
 		rcu_read_lock();
@@ -7732,6 +7733,8 @@ out:
 	if (pulled_task)
 		this_rq->idle_stamp = 0;
 
+	rq_repin_lock(this_rq, rf);
+
 	return pulled_task;
 }
 
-- 
2.7.3


* [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
  2016-05-12 19:49 [RFC][PATCH 0/5] sched: Diagnostic checks for missing rq clock updates Matt Fleming
                   ` (3 preceding siblings ...)
  2016-05-12 19:49 ` [RFC][PATCH 4/5] sched/fair: Push rq lock pin/unpin into idle_balance() Matt Fleming
@ 2016-05-12 19:49 ` Matt Fleming
  2016-05-15  2:14   ` Yuyang Du
  4 siblings, 1 reply; 13+ messages in thread
From: Matt Fleming @ 2016-05-12 19:49 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Byungchul Park, Frederic Weisbecker, Luca Abeni,
	Rafael J . Wysocki, Rik van Riel, Thomas Gleixner, Wanpeng Li,
	Yuyang Du, Mel Gorman, Mike Galbraith, Matt Fleming

There are no diagnostic checks for figuring out when we've accidentally
missed update_rq_clock() calls. Let's add some by piggybacking on the
rq_*pin_lock() wrappers.

The idea behind the diagnostic checks is that once the rq lock is
pinned, the rq clock should be updated, via update_rq_clock(), before
anybody reads the clock with rq_clock(). The exception to this rule is
when updates have explicitly been disabled with the
rq_clock_skip_update() optimisation.

There are some functions that only unpin the rq lock in order to grab
some other lock and avoid deadlock. In that case we don't need to
update the clock again and the previous diagnostic state can be
carried over in rq_repin_lock() by saving the state in the rq_flags
context.

Since this patch adds a new clock update flag and some already exist
in rq::clock_skip_update, that field has been renamed to
rq::clock_update_flags. An attempt has been made to keep the flag
manipulation code small and fast since it's used in the heart of the
__schedule() fast path.
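
For example, the pattern the checks expect at a typical rq->lock pin
site looks like this (a sketch; rq_pin_lock()/rq_unpin_lock() come
from patch 2):

	struct rq_flags rf;
	u64 now;

	raw_spin_lock(&rq->lock);
	rq_pin_lock(rq, &rf);		/* RQCF_UPDATED cleared for this pin context */

	update_rq_clock(rq);		/* sets RQCF_UPDATED (unless skipping is active) */
	now = rq_clock(rq);		/* OK: no warning */

	rq_unpin_lock(rq, &rf);
	raw_spin_unlock(&rq->lock);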

For the !CONFIG_SCHED_DEBUG case the only object code change (other
than addresses) is the following, at the two places that reset
RQCF_ACT_SKIP inside __schedule():

  -       41 c7 84 24 f0 08 00    movl    $0x0,0x8f0(%r12)
  -       00 00 00 00 00
  +       41 83 b4 24 f0 08 00    xorl    $0x2,0x8f0(%r12)
  +       00 02

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
---
 kernel/sched/core.c  | 13 +++++++---
 kernel/sched/sched.h | 70 +++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 73 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d2112c268fd1..0999b3f23dde 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -101,9 +101,12 @@ void update_rq_clock(struct rq *rq)
 
 	lockdep_assert_held(&rq->lock);
 
-	if (rq->clock_skip_update & RQCF_ACT_SKIP)
+	if (rq->clock_update_flags & RQCF_ACT_SKIP)
 		return;
 
+#ifdef CONFIG_SCHED_DEBUG
+	rq->clock_update_flags |= RQCF_UPDATED;
+#endif
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
 	if (delta < 0)
 		return;
@@ -2825,7 +2828,8 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		rq->prev_mm = oldmm;
 	}
 
-	rq->clock_skip_update = 0;
+	/* Clear ACT, preserve everything else */
+	rq->clock_update_flags ^= RQCF_ACT_SKIP;
 
 	/*
 	 * Since the runqueue lock will be released by the next
@@ -3287,7 +3291,7 @@ static void __sched notrace __schedule(bool preempt)
 	raw_spin_lock(&rq->lock);
 	rq_pin_lock(rq, &rf);
 
-	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
+	rq->clock_update_flags <<= 1; /* promote REQ to ACT */
 
 	switch_count = &prev->nivcsw;
 	if (!preempt && prev->state) {
@@ -3328,7 +3332,8 @@ static void __sched notrace __schedule(bool preempt)
 		trace_sched_switch(preempt, prev, next);
 		rq = context_switch(rq, prev, next, &rf); /* unlocks the rq */
 	} else {
-		rq->clock_skip_update = 0;
+		/* Clear ACT, preserve everything else */
+		rq->clock_update_flags ^= RQCF_ACT_SKIP;
 		rq_unpin_lock(rq, &rf);
 		raw_spin_unlock_irq(&rq->lock);
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cbeb830c7ac6..072c020cd8e3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -628,7 +628,7 @@ struct rq {
 	unsigned long next_balance;
 	struct mm_struct *prev_mm;
 
-	unsigned int clock_skip_update;
+	unsigned int clock_update_flags;
 	u64 clock;
 	u64 clock_task;
 
@@ -735,9 +735,45 @@ static inline u64 __rq_clock_broken(struct rq *rq)
 	return READ_ONCE(rq->clock);
 }
 
+/*
+ * rq::clock_update_flags bits
+ *
+ * %RQCF_REQ_SKIP - will request skipping of clock update on the next
+ *  call to __schedule(). This is an optimisation to avoid
+ *  neighbouring rq clock updates.
+ *
+ * %RQCF_ACT_SKIP - is set from inside of __schedule() when skipping is
+ *  in effect and calls to update_rq_clock() are being ignored.
+ *
+ * %RQCF_UPDATED - is a debug flag that indicates whether a call has been
+ *  made to update_rq_clock() since the last time rq::lock was pinned.
+ *
+ * If inside of __schedule(), clock_update_flags will have been
+ * shifted left (a left shift is a cheap operation for the fast path
+ * to promote %RQCF_REQ_SKIP to %RQCF_ACT_SKIP), so you must use,
+ *
+ *	if (rq->clock_update_flags >= RQCF_UPDATED)
+ *
+ * to check if %RQCF_UPDATED is set. It'll never be shifted more than
+ * one position though, because the next rq_unpin_lock() will shift it
+ * back.
+ */
+#define RQCF_REQ_SKIP	0x01
+#define RQCF_ACT_SKIP	0x02
+#define RQCF_UPDATED	0x04
+
 static inline u64 rq_clock(struct rq *rq)
 {
 	lockdep_assert_held(&rq->lock);
+
+#ifdef CONFIG_SCHED_DEBUG
+	/*
+	 * The only reason for not seeing a clock update since the
+	 * last rq_pin_lock() is if we're currently skipping updates.
+	 */
+	WARN_ON_ONCE(rq->clock_update_flags < RQCF_ACT_SKIP);
+#endif
+
 	return rq->clock;
 }
 
@@ -747,36 +783,58 @@ static inline u64 rq_clock_task(struct rq *rq)
 	return rq->clock_task;
 }
 
-#define RQCF_REQ_SKIP	0x01
-#define RQCF_ACT_SKIP	0x02
-
 static inline void rq_clock_skip_update(struct rq *rq, bool skip)
 {
 	lockdep_assert_held(&rq->lock);
 	if (skip)
-		rq->clock_skip_update |= RQCF_REQ_SKIP;
+		rq->clock_update_flags |= RQCF_REQ_SKIP;
 	else
-		rq->clock_skip_update &= ~RQCF_REQ_SKIP;
+		rq->clock_update_flags &= ~RQCF_REQ_SKIP;
 }
 
 struct rq_flags {
 	unsigned long flags;
 	struct pin_cookie cookie;
+#ifdef CONFIG_SCHED_DEBUG
+	/*
+	 * A copy of (rq::clock_update_flags & RQCF_UPDATED) for the
+	 * current pin context is stashed here in case it needs to be
+	 * restored in rq_repin_lock().
+	 */
+	unsigned int clock_update_flags;
+#endif
 };
 
 static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	rf->cookie = lockdep_pin_lock(&rq->lock);
+
+#ifdef CONFIG_SCHED_DEBUG
+	rq->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
+	rf->clock_update_flags = 0;
+#endif
 }
 
 static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
 {
+#ifdef CONFIG_SCHED_DEBUG
+	if (rq->clock_update_flags > RQCF_ACT_SKIP)
+		rf->clock_update_flags = RQCF_UPDATED;
+#endif
+
 	lockdep_unpin_lock(&rq->lock, rf->cookie);
 }
 
 static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
 {
 	lockdep_repin_lock(&rq->lock, rf->cookie);
+
+#ifdef CONFIG_SCHED_DEBUG
+	/*
+	 * Restore the value we stashed in @rf for this pin context.
+	 */
+	rq->clock_update_flags |= rf->clock_update_flags;
+#endif
 }
 
 #ifdef CONFIG_NUMA
-- 
2.7.3


* Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
  2016-05-12 19:49 ` [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock() Matt Fleming
@ 2016-05-15  2:14   ` Yuyang Du
  2016-05-16  9:46     ` Matt Fleming
  0 siblings, 1 reply; 13+ messages in thread
From: Yuyang Du @ 2016-05-15  2:14 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rafael J . Wysocki,
	Rik van Riel, Thomas Gleixner, Wanpeng Li, Mel Gorman,
	Mike Galbraith

Hi Matt,

Thanks for Ccing me.

I am indeed interested, because I recently encountered an rq clock
issue: the clock jumps by about 200ms when I experiment with the
"flat util hierarchy" patches, which really annoyed me, and I had to
stop and figure out what is wrong (but I haven't figured it out yet ;))

First, this patchset does not solve my problem, but never mind, by
reviewing your patches, I have some comments:

On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote:
>  
> -	rq->clock_skip_update = 0;
> +	/* Clear ACT, preserve everything else */
> +	rq->clock_update_flags ^= RQCF_ACT_SKIP;

The comment says "Clear ACT", but this is really xor, and I am not sure
this is even what you want.

In addition, would it be simpler to do this?

update_rq_clock()
	if (flags & RQCF_ACT_SKIP)
		flags <<= 1; /* effective skip is an update */
		return;

	flags = RQCF_UPDATED;

Thanks,
Yuyang


* Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
  2016-05-15  2:14   ` Yuyang Du
@ 2016-05-16  9:46     ` Matt Fleming
  2016-05-16 20:11       ` Yuyang Du
  0 siblings, 1 reply; 13+ messages in thread
From: Matt Fleming @ 2016-05-16  9:46 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rafael J . Wysocki,
	Rik van Riel, Thomas Gleixner, Wanpeng Li, Mel Gorman,
	Mike Galbraith

On Sun, 15 May, at 10:14:39AM, Yuyang Du wrote:
> Hi Matt,
> 
> Thanks for Ccing me.
> 
> I am indeed interested, because I recently encountered an rq clock
> issue: the clock jumps by about 200ms when I experiment with the
> "flat util hierarchy" patches, which really annoyed me, and I had to
> stop and figure out what is wrong (but I haven't figured it out yet ;))
> 
> First, this patchset does not solve my problem, but never mind, by
> reviewing your patches, I have some comments:
 
Thanks for the review. One gap that this patch series doesn't address
is that some callers of update_rq_clock() do not pin rq->lock, which
makes the diagnostic checks useless in that case.

I plan on handling that next, but I wanted to get this series out as
soon as possible for review.

> On Thu, May 12, 2016 at 08:49:53PM +0100, Matt Fleming wrote:
> >  
> > -	rq->clock_skip_update = 0;
> > +	/* Clear ACT, preserve everything else */
> > +	rq->clock_update_flags ^= RQCF_ACT_SKIP;
> 
> The comment says "Clear ACT", but this is really xor, and I am not sure
> this is even what you want.
 
Urgh, you're right. I'm not sure what I was thinking when I wrote
that.

> In addition, would it be simpler to do this?
> 
> update_rq_clock()
> 	if (flags & RQCF_ACT_SKIP)
> 		flags <<= 1; /* effective skip is an update */
> 		return;
> 
> 	flags = RQCF_UPDATED;

No because if someone calls rq_clock() immediately after __schedule(),
or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we
should trigger a warning since the clock has not actually been
updated.


* Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
  2016-05-16  9:46     ` Matt Fleming
@ 2016-05-16 20:11       ` Yuyang Du
  2016-05-17 12:24         ` Matt Fleming
  0 siblings, 1 reply; 13+ messages in thread
From: Yuyang Du @ 2016-05-16 20:11 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rafael J . Wysocki,
	Rik van Riel, Thomas Gleixner, Wanpeng Li, Mel Gorman,
	Mike Galbraith

On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote:
> > >  
> > > -	rq->clock_skip_update = 0;
> > > +	/* Clear ACT, preserve everything else */
> > > +	rq->clock_update_flags ^= RQCF_ACT_SKIP;
> > 
> > The comment says "Clear ACT", but this is really xor, and I am not sure
> > this is even what you want.
>  
> Urgh, you're right. I'm not sure what I was thinking when I wrote
> that.

It happens, ;)
 
> > In addition, would it be simpler to do this?
> > 
> > update_rq_clock()
> > 	if (flags & RQCF_ACT_SKIP)
> > 		flags <<= 1; /* effective skip is an update */
> > 		return;
> > 
> > 	flags = RQCF_UPDATED;
> 
> No because if someone calls rq_clock() immediately after __schedule(),
> or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we
> should trigger a warning since the clock has not actually been
> updated.

Well, I don't know how concurrent it can be, but aren't both update
and read synchronized by rq->lock? So I don't understand the latter
case, and the former should be addressed (missing its own update?).


* Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
  2016-05-16 20:11       ` Yuyang Du
@ 2016-05-17 12:24         ` Matt Fleming
  2016-05-17 19:01           ` Yuyang Du
  0 siblings, 1 reply; 13+ messages in thread
From: Matt Fleming @ 2016-05-17 12:24 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rafael J . Wysocki,
	Rik van Riel, Thomas Gleixner, Wanpeng Li, Mel Gorman,
	Mike Galbraith

On Tue, 17 May, at 04:11:09AM, Yuyang Du wrote:
> On Mon, May 16, 2016 at 10:46:38AM +0100, Matt Fleming wrote:
> > 
> > No because if someone calls rq_clock() immediately after __schedule(),
> > or even immediately after we clear RQCF_ACT_SKIP in __schedule(), we
> > should trigger a warning since the clock has not actually been
> > updated.
> 
> Well, I don't know how concurrent it can be, but aren't both update
> and read synchronized by rq->lock? So I don't understand the latter
> case, and the former should be addressed (missing its own update?).

I'm not talking about concurrency; when I said "someone" above, I was
referring to code.

So, if the code looks like the following, either now or in the future,

static void __schedule(bool preempt)
{
	...
	/* Clear RQCF_ACT_SKIP */
	rq->clock_update_flags = 0;
	...
	delta = rq_clock();
}

I would expect to see a warning triggered, because we've read the rq
clock outside of the code area where we know it's safe to do so
without a clock update.

The solution for that bug may be as simple as rearranging the code,

	delta = rq_clock();
	...
	rq->clock_update_flags = 0;

but we definitely want to catch such bugs in the first instance.


* Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
  2016-05-17 12:24         ` Matt Fleming
@ 2016-05-17 19:01           ` Yuyang Du
  2016-05-18  8:41             ` Matt Fleming
  0 siblings, 1 reply; 13+ messages in thread
From: Yuyang Du @ 2016-05-17 19:01 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rafael J . Wysocki,
	Rik van Riel, Thomas Gleixner, Wanpeng Li, Mel Gorman,
	Mike Galbraith

On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> So, if the code looks like the following, either now or in the future,
> 
> static void __schedule(bool preempt)
> {
> 	...
> 	/* Clear RQCF_ACT_SKIP */
> 	rq->clock_update_flags = 0;
> 	...
> 	delta = rq_clock();
> }
 
Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it,
you clear everything. And if you clear the RQCF_UPDATE also (maybe you
shouldn't, but actually it does not matter), of course you will get
a warning...

In addition, it looks like multiple skips are possible, so:

update_rq_clock() {
        rq->clock_update_flags |= RQCF_UPDATE;

        ...
}

instead of clearing the skip flag there.


* Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
  2016-05-17 19:01           ` Yuyang Du
@ 2016-05-18  8:41             ` Matt Fleming
  2016-05-18 22:51               ` Yuyang Du
  0 siblings, 1 reply; 13+ messages in thread
From: Matt Fleming @ 2016-05-18  8:41 UTC (permalink / raw)
  To: Yuyang Du
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rafael J . Wysocki,
	Rik van Riel, Thomas Gleixner, Wanpeng Li, Mel Gorman,
	Mike Galbraith

On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote:
> On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> > So, if the code looks like the following, either now or in the future,
> > 
> > static void __schedule(bool preempt)
> > {
> > 	...
> > 	/* Clear RQCF_ACT_SKIP */
> > 	rq->clock_update_flags = 0;
> > 	...
> > 	delta = rq_clock();
> > }
>  
> Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it,
> you clear everything.

That was sloppy on my part but intentional because that's what the
code looks like in tip/sched/core today.

It was purely meant to demonstrate that setting RQCF_UPDATE just
because RQCF_ACT_SKIP is set does not make sense. You can replace the
clearing line with the correct bit masking operation.

But I get it, the pseudo-code was confusing. I'll send out a v2.

> And if you clear the RQCF_UPDATE also (maybe you shouldn't, but
> actually it does not matter), of course you will get a warning...

Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch.

> In addition, it looks like multiple skips are possible, so:
 
I'm not sure what you mean, could you elaborate?

> update_rq_clock() {
>         rq->clock_update_flags |= RQCF_UPDATE;
> 
>         ...
> }
> 
> instead of clearing the skip flag there.

Huh? RQCF_*_SKIP are not cleared in update_rq_clock().


* Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
  2016-05-18  8:41             ` Matt Fleming
@ 2016-05-18 22:51               ` Yuyang Du
  0 siblings, 0 replies; 13+ messages in thread
From: Yuyang Du @ 2016-05-18 22:51 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Byungchul Park,
	Frederic Weisbecker, Luca Abeni, Rafael J . Wysocki,
	Rik van Riel, Thomas Gleixner, Wanpeng Li, Mel Gorman,
	Mike Galbraith

On Wed, May 18, 2016 at 09:41:20AM +0100, Matt Fleming wrote:
> On Wed, 18 May, at 03:01:27AM, Yuyang Du wrote:
> > On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> > > So, if the code looks like the following, either now or in the future,
> > > 
> > > static void __schedule(bool preempt)
> > > {
> > > 	...
> > > 	/* Clear RQCF_ACT_SKIP */
> > > 	rq->clock_update_flags = 0;
> > > 	...
> > > 	delta = rq_clock();
> > > }
> >  
> > Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it,
> > you clear everything.
> 
> That was sloppy on my part but intentional because that's what the
> code looks like in tip/sched/core today.
 
Sure, rq->clock_update_flags = 0 is itself all right, just say what you do.

> It was purely meant to demonstrate that setting RQCF_UPDATE just
> because RQCF_ACT_SKIP is set does not make sense. You can replace the
> clearing line with the correct bit masking operation.
 
I don't understand how you demonstrated that it does not make sense.
Anyway, you sort it out.

> But I get it, the pseudo-code was confusing. I'll send out a v2.
>
> > And if you clear the RQCF_UPDATE also (maybe you shouldn't, but
> > actually it does not matter), of course you will get a warning...
> 
> Right, I wouldn't actually clear RQCF_UPDATE in v2 of this patch.
> 
> > In addition, it looks like multiple skips are possible, so:
>  
> I'm not sure what you mean, could you elaborate?
> 
> > update_rq_clock() {
> >         rq->clock_update_flags |= RQCF_UPDATE;
> > 
> >         ...
> > }
> > 
> > instead of clearing the skip flag there.
> 
> Huh? RQCF_*_SKIP are not cleared in update_rq_clock().

Yeah, I previously cleared the SKIP bit, which is not right, so I corrected it.

