linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/3] sched: skip_clock_update madness
@ 2015-01-05 10:18 Peter Zijlstra
  2015-01-05 10:18 ` [RFC][PATCH 1/3] sched: Validate rq_clock*() serialization Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Peter Zijlstra @ 2015-01-05 10:18 UTC (permalink / raw)
  To: mingo, umgwanakikbuti; +Cc: linux-kernel, torvalds, Peter Zijlstra

Hi,

So the big lockup thread got me looking at the skip_clock_update stuff again
and here's what I came up with.

It seems to build a kernel without generating lockdep splats.

Mike, can you see if it cures the wobblies you were seeing?



* [RFC][PATCH 1/3] sched: Validate rq_clock*() serialization
  2015-01-05 10:18 [RFC][PATCH 0/3] sched: skip_clock_update madness Peter Zijlstra
@ 2015-01-05 10:18 ` Peter Zijlstra
  2015-01-14 14:03   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
  2015-01-05 10:18 ` [RFC][PATCH 2/3] sched: Rework rq->clock update skips Peter Zijlstra
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2015-01-05 10:18 UTC (permalink / raw)
  To: mingo, umgwanakikbuti; +Cc: linux-kernel, torvalds, Peter Zijlstra


rq->clock{,_task} are serialized by rq->lock, verify this.

One immediate fail is the usage in scale_rt_capacity(), so 'annotate'
that for now; there's more 'funny' there. Maybe change rq->lock into a
raw_seqlock_t?

(Only 32-bit is affected)
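
For the unfamiliar, the 32-bit caveat is about torn reads: rq->clock is
a u64, so an unlocked read on a 32-bit machine takes two loads and can
observe a half-updated value. A minimal user-space sketch of the same
pattern (the struct and function names here are illustrative, not
kernel API):

	#include <stdint.h>
	#include <stdio.h>

	struct fake_rq {
		uint64_t clock;	/* advanced by the writer under a lock */
	};

	/* Locked reader: mirrors rq_clock(); the patch adds
	 * lockdep_assert_held() there to enforce this rule. */
	static uint64_t read_locked(struct fake_rq *rq)
	{
		/* caller must hold the rq lock here */
		return rq->clock;
	}

	/* Lockless reader: on 32-bit this u64 load is two 32-bit
	 * loads and may return a torn value; the patch marks the one
	 * known-racy site explicitly via __rq_clock_broken(). */
	static uint64_t read_broken(struct fake_rq *rq)
	{
		return *(volatile uint64_t *)&rq->clock; /* ~ ACCESS_ONCE() */
	}

	int main(void)
	{
		struct fake_rq rq = { .clock = 42 };

		/* single-threaded here, so both reads are trivially safe */
		printf("%llu %llu\n",
		       (unsigned long long)read_locked(&rq),
		       (unsigned long long)read_broken(&rq));
		return 0;
	}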

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/fair.c  |    2 +-
 kernel/sched/sched.h |    7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5945,8 +5945,8 @@ static unsigned long scale_rt_capacity(i
 	 */
 	age_stamp = ACCESS_ONCE(rq->age_stamp);
 	avg = ACCESS_ONCE(rq->rt_avg);
+	delta = __rq_clock_broken(rq) - age_stamp;
 
-	delta = rq_clock(rq) - age_stamp;
 	if (unlikely(delta < 0))
 		delta = 0;
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -687,13 +687,20 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define raw_rq()		raw_cpu_ptr(&runqueues)
 
+static inline u64 __rq_clock_broken(struct rq *rq)
+{
+	return ACCESS_ONCE(rq->clock);
+}
+
 static inline u64 rq_clock(struct rq *rq)
 {
+	lockdep_assert_held(&rq->lock);
 	return rq->clock;
 }
 
 static inline u64 rq_clock_task(struct rq *rq)
 {
+	lockdep_assert_held(&rq->lock);
 	return rq->clock_task;
 }
 




* [RFC][PATCH 2/3] sched: Rework rq->clock update skips
  2015-01-05 10:18 [RFC][PATCH 0/3] sched: skip_clock_update madness Peter Zijlstra
  2015-01-05 10:18 ` [RFC][PATCH 1/3] sched: Validate rq_clock*() serialization Peter Zijlstra
@ 2015-01-05 10:18 ` Peter Zijlstra
  2015-01-14 14:03   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
  2015-01-05 10:18 ` [RFC][PATCH 3/3] sched,debug: Print clock_task Peter Zijlstra
  2015-01-05 10:49 ` [RFC][PATCH 0/3] sched: skip_clock_update madness Mike Galbraith
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2015-01-05 10:18 UTC (permalink / raw)
  To: mingo, umgwanakikbuti; +Cc: linux-kernel, torvalds, Peter Zijlstra


The original purpose of rq::skip_clock_update was to avoid 'costly' clock
updates for back-to-back wakeup-preempt pairs. The big problem with it
has always been that the per-rq flag is unaware of the context and
causes indiscriminate clock skips.

Rework the entire thing and create a sense of context by only allowing
schedule() to skip clock updates. (XXX can we measure the cost of the
added store?)

By ensuring only schedule() can ever skip an update, we guarantee we're
never more than one tick behind on the update.
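
The new scheme fits in two flag bits plus one shift. A stand-alone
sketch of the state machine (the RQCF_* values match the patch below;
the driver code around them is illustrative only):

	#include <stdio.h>

	#define RQCF_REQ_SKIP	0x01	/* a clock-update skip was requested */
	#define RQCF_ACT_SKIP	0x02	/* schedule() activated the skip */

	static unsigned int clock_skip_update;

	/* ~ rq_clock_skip_update(): request (or revoke) a skip. */
	static void skip_request(int skip)
	{
		if (skip)
			clock_skip_update |= RQCF_REQ_SKIP;
		else
			clock_skip_update &= ~RQCF_REQ_SKIP;
	}

	/* ~ update_rq_clock(): honours a skip only once schedule()
	 * has activated it. */
	static void clock_update(void)
	{
		if (clock_skip_update & RQCF_ACT_SKIP)
			return;
		printf("clock advanced\n");
	}

	/* ~ __schedule(): the only place REQ can become ACT; the
	 * shift also pushes any stale ACT bit out of the mask. */
	static void schedule_entry(void)
	{
		clock_skip_update <<= 1;	/* promote REQ to ACT */
		clock_update();			/* skipped iff ACT is set */
		clock_skip_update = 0;		/* one-shot */
	}

	int main(void)
	{
		skip_request(1);	/* e.g. a wakeup-preempt pair */
		schedule_entry();	/* clock update skipped once */
		schedule_entry();	/* clock advances again */
		return 0;
	}

Because a request only becomes active at the next schedule() entry and
the mask is cleared before the context switch, any skip is a one-shot
confined to a single pass through schedule().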

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c  |   12 ++++++++----
 kernel/sched/fair.c  |    2 +-
 kernel/sched/rt.c    |    9 ++++++---
 kernel/sched/sched.h |   15 +++++++++++++--
 4 files changed, 28 insertions(+), 10 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -119,7 +119,9 @@ void update_rq_clock(struct rq *rq)
 {
 	s64 delta;
 
-	if (rq->skip_clock_update > 0)
+	lockdep_assert_held(&rq->lock);
+
+	if (rq->clock_skip_update & RQCF_ACT_SKIP)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -1046,7 +1048,7 @@ void check_preempt_curr(struct rq *rq, s
 	 * this case, we can save a useless back to back clock update.
 	 */
 	if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
-		rq->skip_clock_update = 1;
+		rq_clock_skip_update(rq, true);
 }
 
 #ifdef CONFIG_SMP
@@ -2776,6 +2778,8 @@ static void __sched __schedule(void)
 	smp_mb__before_spinlock();
 	raw_spin_lock_irq(&rq->lock);
 
+	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
+
 	switch_count = &prev->nivcsw;
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
 		if (unlikely(signal_pending_state(prev->state, prev))) {
@@ -2800,13 +2804,13 @@ static void __sched __schedule(void)
 		switch_count = &prev->nvcsw;
 	}
 
-	if (task_on_rq_queued(prev) || rq->skip_clock_update < 0)
+	if (task_on_rq_queued(prev))
 		update_rq_clock(rq);
 
 	next = pick_next_task(rq, prev);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
-	rq->skip_clock_update = 0;
+	rq->clock_skip_update = 0;
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5153,7 +5153,7 @@ static void yield_task_fair(struct rq *r
 		 * so we don't do microscopic update in schedule()
 		 * and double the fastpath cost.
 		 */
-		 rq->skip_clock_update = 1;
+		rq_clock_skip_update(rq, true);
 	}
 
 	set_skip_buddy(se);
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -831,11 +831,14 @@ static int do_sched_rt_period_timer(stru
 				enqueue = 1;
 
 				/*
-				 * Force a clock update if the CPU was idle,
-				 * lest wakeup -> unthrottle time accumulate.
+				 * When we're idle and a woken (rt) task is
+				 * throttled, check_preempt_curr() will set
+				 * skip_update, and the time between the wakeup
+				 * and this unthrottle will get accounted as
+				 * 'runtime'.
 				 */
 				if (rt_rq->rt_nr_running && rq->curr == rq->idle)
-					rq->skip_clock_update = -1;
+					rq_clock_skip_update(rq, false);
 			}
 			if (rt_rq->rt_time || rt_rq->rt_nr_running)
 				idle = 0;
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -558,8 +558,6 @@ struct rq {
 #ifdef CONFIG_NO_HZ_FULL
 	unsigned long last_sched_tick;
 #endif
-	int skip_clock_update;
-
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
 	unsigned long nr_load_updates;
@@ -588,6 +586,7 @@ struct rq {
 	unsigned long next_balance;
 	struct mm_struct *prev_mm;
 
+	unsigned int clock_skip_update;
 	u64 clock;
 	u64 clock_task;
 
@@ -704,6 +703,18 @@ static inline u64 rq_clock_task(struct r
 	return rq->clock_task;
 }
 
+#define RQCF_REQ_SKIP	0x01
+#define RQCF_ACT_SKIP	0x02
+
+static inline void rq_clock_skip_update(struct rq *rq, bool skip)
+{
+	lockdep_assert_held(&rq->lock);
+	if (skip)
+		rq->clock_skip_update |= RQCF_REQ_SKIP;
+	else
+		rq->clock_skip_update &= ~RQCF_REQ_SKIP;
+}
+
 #ifdef CONFIG_NUMA
 enum numa_topology_type {
 	NUMA_DIRECT,
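
An editorial note on the rt.c hunk above: the old code forced a clock
update with skip_clock_update = -1, while the new code simply revokes
any pending skip request. A hypothetical timeline on an idle CPU
showing why (illustrative only, reconstructed from the comment in the
hunk):

	/*
	 * t0: an rt task wakes but is throttled; check_preempt_curr()
	 *     sets the skip request (idle is curr and needs resched).
	 * t1: the CPU stays idle, possibly tickless; rq->clock goes
	 *     stale.
	 * t2: do_sched_rt_period_timer() unthrottles the task. If the
	 *     request from t0 survived, the schedule() that picks the
	 *     task would promote it to ACT and skip the clock update,
	 *     charging the whole t0..t2 gap to the task as rt
	 *     'runtime'. Clearing the request forces a fresh clock.
	 */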




* [RFC][PATCH 3/3] sched,debug: Print clock_task
  2015-01-05 10:18 [RFC][PATCH 0/3] sched: skip_clock_update madness Peter Zijlstra
  2015-01-05 10:18 ` [RFC][PATCH 1/3] sched: Validate rq_clock*() serialization Peter Zijlstra
  2015-01-05 10:18 ` [RFC][PATCH 2/3] sched: Rework rq->clock update skips Peter Zijlstra
@ 2015-01-05 10:18 ` Peter Zijlstra
  2015-01-14 14:03   ` [tip:sched/core] sched/debug: Print rq->clock_task tip-bot for Peter Zijlstra
  2015-01-05 10:49 ` [RFC][PATCH 0/3] sched: skip_clock_update madness Mike Galbraith
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2015-01-05 10:18 UTC (permalink / raw)
  To: mingo, umgwanakikbuti; +Cc: linux-kernel, torvalds, Peter Zijlstra


We seem to have forgotten to add it to the debug output, like
forever... do so now.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/debug.c |    1 +
 1 file changed, 1 insertion(+)

--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -305,6 +305,7 @@ do {									\
 	PN(next_balance);
 	SEQ_printf(m, "  .%-30s: %ld\n", "curr->pid", (long)(task_pid_nr(rq->curr)));
 	PN(clock);
+	PN(clock_task);
 	P(cpu_load[0]);
 	P(cpu_load[1]);
 	P(cpu_load[2]);
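
With this applied, /proc/sched_debug shows the task clock next to the
raw clock; roughly like so (field values and spacing illustrative):

	  .clock                         : 123456.789012
	  .clock_task                    : 123450.123456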




* Re: [RFC][PATCH 0/3] sched: skip_clock_update madness
  2015-01-05 10:18 [RFC][PATCH 0/3] sched: skip_clock_update madness Peter Zijlstra
                   ` (2 preceding siblings ...)
  2015-01-05 10:18 ` [RFC][PATCH 3/3] sched,debug: Print clock_task Peter Zijlstra
@ 2015-01-05 10:49 ` Mike Galbraith
  2015-01-05 11:50   ` Peter Zijlstra
  2015-01-05 13:59   ` Peter Zijlstra
  3 siblings, 2 replies; 14+ messages in thread
From: Mike Galbraith @ 2015-01-05 10:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, torvalds

On Mon, 2015-01-05 at 11:18 +0100, Peter Zijlstra wrote: 
> Hi,
> 
> So the big lockup thread got me looking at the skip_clock_update stuff again
> and here's what I came up with.
> 
> It seems to build a kernel without generating lockdep splats.
> 
> Mike, can you see if it cures the wobblies you were seeing?

Nope, those hiccups were production IO beasts from hell staying in the
kernel for ages at a time.  Watchdog being falsely credited with what is
usually a dinky wakeup -> switch delta throttled it effectively forever.

-Mike  




* Re: [RFC][PATCH 0/3] sched: skip_clock_update madness
  2015-01-05 10:49 ` [RFC][PATCH 0/3] sched: skip_clock_update madness Mike Galbraith
@ 2015-01-05 11:50   ` Peter Zijlstra
  2015-01-05 12:07     ` Mike Galbraith
  2015-01-05 13:59   ` Peter Zijlstra
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2015-01-05 11:50 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: mingo, linux-kernel, torvalds

On Mon, Jan 05, 2015 at 11:49:09AM +0100, Mike Galbraith wrote:
> On Mon, 2015-01-05 at 11:18 +0100, Peter Zijlstra wrote: 
> > Hi,
> > 
> > So the big lockup thread got me looking at the skip_clock_update stuff again
> > and here's what I came up with.
> > 
> > It seems to build a kernel without generating lockdep splats.
> > 
> > Mike, can you see if it cures the wobblies you were seeing?
> 
> Nope, those hiccups were production IO beasts from hell staying in the
> kernel for ages at a time.  Watchdog being falsely credited with what is
> usually a dinky wakeup -> switch delta throttled it effectively forever.

OK, it's Monday and I'm still trying to bootstrap my post-Xmas brain. Is
that: Nope - can't check, or: Nope - doesn't fix wobblies?


* Re: [RFC][PATCH 0/3] sched: skip_clock_update madness
  2015-01-05 11:50   ` Peter Zijlstra
@ 2015-01-05 12:07     ` Mike Galbraith
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Galbraith @ 2015-01-05 12:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, torvalds

On Mon, 2015-01-05 at 12:50 +0100, Peter Zijlstra wrote: 
> On Mon, Jan 05, 2015 at 11:49:09AM +0100, Mike Galbraith wrote:
> > On Mon, 2015-01-05 at 11:18 +0100, Peter Zijlstra wrote: 
> > > Hi,
> > > 
> > > So the big lockup thread got me looking at the skip_clock_update stuff again
> > > and here's what I came up with.
> > > 
> > > It seems to build a kernel without generating lockdep splats.
> > > 
> > > Mike, can you see if it cures the wobblies you were seeing?
> > 
> > Nope, those hiccups were production IO beasts from hell staying in the
> > kernel for ages at a time.  Watchdog being falsely credited with what is
> > usually a dinky wakeup -> switch delta throttled it effectively forever.
> 
> OK, it's Monday and I'm still trying to bootstrap my post-Xmas brain. Is
> that: Nope - can't check, or: Nope - doesn't fix wobblies?

That's: nope, the boxen that wobbled to death are not at my disposal.

-Mike



* Re: [RFC][PATCH 0/3] sched: skip_clock_update madness
  2015-01-05 10:49 ` [RFC][PATCH 0/3] sched: skip_clock_update madness Mike Galbraith
  2015-01-05 11:50   ` Peter Zijlstra
@ 2015-01-05 13:59   ` Peter Zijlstra
  2015-01-06  5:55     ` Mike Galbraith
  1 sibling, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2015-01-05 13:59 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: mingo, linux-kernel, torvalds

On Mon, Jan 05, 2015 at 11:49:09AM +0100, Mike Galbraith wrote:
> On Mon, 2015-01-05 at 11:18 +0100, Peter Zijlstra wrote: 
> > Hi,
> > 
> > So the big lockup thread got me looking at the skip_clock_update stuff again
> > and here's what I came up with.
> > 
> > It seems to build a kernel without generating lockdep splats.
> > 
> > Mike, can you see if it cures the wobblies you were seeing?
> 
> Nope, those hiccups were production IO beasts from hell staying in the
> kernel for ages at a time.  Watchdog being falsely credited with what is
> usually a dinky wakeup -> switch delta throttled it effectively forever.

OK, fair enough. Does the thing otherwise look sane to you?


* Re: [RFC][PATCH 0/3] sched: skip_clock_update madness
  2015-01-05 13:59   ` Peter Zijlstra
@ 2015-01-06  5:55     ` Mike Galbraith
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Galbraith @ 2015-01-06  5:55 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel, torvalds

On Mon, 2015-01-05 at 14:59 +0100, Peter Zijlstra wrote:

> OK, fair enough. Does the thing otherwise look sane to you?

Yup, should do the trick.

-Mike



* [tip:sched/core] sched/core: Validate rq_clock*() serialization
  2015-01-05 10:18 ` [RFC][PATCH 1/3] sched: Validate rq_clock*() serialization Peter Zijlstra
@ 2015-01-14 14:03   ` tip-bot for Peter Zijlstra
  2015-05-30  1:03     ` Sasha Levin
  0 siblings, 1 reply; 14+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-01-14 14:03 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: torvalds, linux-kernel, mingo, peterz, tglx, hpa

Commit-ID:  cebde6d681aa45f96111cfcffc1544cf2a0454ff
Gitweb:     http://git.kernel.org/tip/cebde6d681aa45f96111cfcffc1544cf2a0454ff
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 5 Jan 2015 11:18:10 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 14 Jan 2015 13:34:19 +0100

sched/core: Validate rq_clock*() serialization

rq->clock{,_task} are serialized by rq->lock, verify this.

One immediate fail is the usage in scale_rt_capacity(), so 'annotate'
that for now; there's more 'funny' there. Maybe change rq->lock into a
raw_seqlock_t?

(Only 32-bit is affected)

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150105103554.361872747@infradead.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: umgwanakikbuti@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c  | 2 +-
 kernel/sched/sched.h | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2a0b302..50ff902 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5948,8 +5948,8 @@ static unsigned long scale_rt_capacity(int cpu)
 	 */
 	age_stamp = ACCESS_ONCE(rq->age_stamp);
 	avg = ACCESS_ONCE(rq->rt_avg);
+	delta = __rq_clock_broken(rq) - age_stamp;
 
-	delta = rq_clock(rq) - age_stamp;
 	if (unlikely(delta < 0))
 		delta = 0;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9a2a45c..bd23732 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -687,13 +687,20 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
 #define cpu_curr(cpu)		(cpu_rq(cpu)->curr)
 #define raw_rq()		raw_cpu_ptr(&runqueues)
 
+static inline u64 __rq_clock_broken(struct rq *rq)
+{
+	return ACCESS_ONCE(rq->clock);
+}
+
 static inline u64 rq_clock(struct rq *rq)
 {
+	lockdep_assert_held(&rq->lock);
 	return rq->clock;
 }
 
 static inline u64 rq_clock_task(struct rq *rq)
 {
+	lockdep_assert_held(&rq->lock);
 	return rq->clock_task;
 }
 


* [tip:sched/core] sched/core: Rework rq->clock update skips
  2015-01-05 10:18 ` [RFC][PATCH 2/3] sched: Rework rq->clock update skips Peter Zijlstra
@ 2015-01-14 14:03   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-01-14 14:03 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: peterz, linux-kernel, torvalds, mingo, tglx, hpa

Commit-ID:  9edfbfed3f544a7830d99b341f0c175995a02950
Gitweb:     http://git.kernel.org/tip/9edfbfed3f544a7830d99b341f0c175995a02950
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 5 Jan 2015 11:18:11 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 14 Jan 2015 13:34:20 +0100

sched/core: Rework rq->clock update skips

The original purpose of rq::skip_clock_update was to avoid 'costly' clock
updates for back-to-back wakeup-preempt pairs. The big problem with it
has always been that the per-rq flag is unaware of the context and
causes indiscriminate clock skips.

Rework the entire thing and create a sense of context by only allowing
schedule() to skip clock updates. (XXX can we measure the cost of the
added store?)

By ensuring only schedule() can ever skip an update, we guarantee we're
never more than one tick behind on the update.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150105103554.432381549@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c  | 12 ++++++++----
 kernel/sched/fair.c  |  2 +-
 kernel/sched/rt.c    |  9 ++++++---
 kernel/sched/sched.h | 15 +++++++++++++--
 4 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 46a2345..b53cc85 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -119,7 +119,9 @@ void update_rq_clock(struct rq *rq)
 {
 	s64 delta;
 
-	if (rq->skip_clock_update > 0)
+	lockdep_assert_held(&rq->lock);
+
+	if (rq->clock_skip_update & RQCF_ACT_SKIP)
 		return;
 
 	delta = sched_clock_cpu(cpu_of(rq)) - rq->clock;
@@ -1046,7 +1048,7 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 	 * this case, we can save a useless back to back clock update.
 	 */
 	if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
-		rq->skip_clock_update = 1;
+		rq_clock_skip_update(rq, true);
 }
 
 #ifdef CONFIG_SMP
@@ -2779,6 +2781,8 @@ need_resched:
 	smp_mb__before_spinlock();
 	raw_spin_lock_irq(&rq->lock);
 
+	rq->clock_skip_update <<= 1; /* promote REQ to ACT */
+
 	switch_count = &prev->nivcsw;
 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
 		if (unlikely(signal_pending_state(prev->state, prev))) {
@@ -2803,13 +2807,13 @@ need_resched:
 		switch_count = &prev->nvcsw;
 	}
 
-	if (task_on_rq_queued(prev) || rq->skip_clock_update < 0)
+	if (task_on_rq_queued(prev))
 		update_rq_clock(rq);
 
 	next = pick_next_task(rq, prev);
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
-	rq->skip_clock_update = 0;
+	rq->clock_skip_update = 0;
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 50ff902..2ecf779 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5156,7 +5156,7 @@ static void yield_task_fair(struct rq *rq)
 		 * so we don't do microscopic update in schedule()
 		 * and double the fastpath cost.
 		 */
-		 rq->skip_clock_update = 1;
+		rq_clock_skip_update(rq, true);
 	}
 
 	set_skip_buddy(se);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ee15f5a..6725e3c 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -831,11 +831,14 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 				enqueue = 1;
 
 				/*
-				 * Force a clock update if the CPU was idle,
-				 * lest wakeup -> unthrottle time accumulate.
+				 * When we're idle and a woken (rt) task is
+				 * throttled, check_preempt_curr() will set
+				 * skip_update, and the time between the wakeup
+				 * and this unthrottle will get accounted as
+				 * 'runtime'.
 				 */
 				if (rt_rq->rt_nr_running && rq->curr == rq->idle)
-					rq->skip_clock_update = -1;
+					rq_clock_skip_update(rq, false);
 			}
 			if (rt_rq->rt_time || rt_rq->rt_nr_running)
 				idle = 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bd23732..0870db2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -558,8 +558,6 @@ struct rq {
 #ifdef CONFIG_NO_HZ_FULL
 	unsigned long last_sched_tick;
 #endif
-	int skip_clock_update;
-
 	/* capture load from *all* tasks on this cpu: */
 	struct load_weight load;
 	unsigned long nr_load_updates;
@@ -588,6 +586,7 @@ struct rq {
 	unsigned long next_balance;
 	struct mm_struct *prev_mm;
 
+	unsigned int clock_skip_update;
 	u64 clock;
 	u64 clock_task;
 
@@ -704,6 +703,18 @@ static inline u64 rq_clock_task(struct rq *rq)
 	return rq->clock_task;
 }
 
+#define RQCF_REQ_SKIP	0x01
+#define RQCF_ACT_SKIP	0x02
+
+static inline void rq_clock_skip_update(struct rq *rq, bool skip)
+{
+	lockdep_assert_held(&rq->lock);
+	if (skip)
+		rq->clock_skip_update |= RQCF_REQ_SKIP;
+	else
+		rq->clock_skip_update &= ~RQCF_REQ_SKIP;
+}
+
 #ifdef CONFIG_NUMA
 enum numa_topology_type {
 	NUMA_DIRECT,


* [tip:sched/core] sched/debug: Print rq->clock_task
  2015-01-05 10:18 ` [RFC][PATCH 3/3] sched,debug: Print clock_task Peter Zijlstra
@ 2015-01-14 14:03   ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot for Peter Zijlstra @ 2015-01-14 14:03 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: peterz, linux-kernel, hpa, mingo, torvalds, tglx

Commit-ID:  5a5375977b721503e4d6b37ab8982902cd2d10b3
Gitweb:     http://git.kernel.org/tip/5a5375977b721503e4d6b37ab8982902cd2d10b3
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 5 Jan 2015 11:18:12 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 14 Jan 2015 13:34:22 +0100

sched/debug: Print rq->clock_task

We seem to have forgotten to add it to the debug output, like
forever... do so now.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150105103554.495253233@infradead.org
Cc: umgwanakikbuti@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/debug.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 92cc520..8baaf85 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -305,6 +305,7 @@ do {									\
 	PN(next_balance);
 	SEQ_printf(m, "  .%-30s: %ld\n", "curr->pid", (long)(task_pid_nr(rq->curr)));
 	PN(clock);
+	PN(clock_task);
 	P(cpu_load[0]);
 	P(cpu_load[1]);
 	P(cpu_load[2]);


* Re: [tip:sched/core] sched/core: Validate rq_clock*() serialization
  2015-01-14 14:03   ` [tip:sched/core] sched/core: " tip-bot for Peter Zijlstra
@ 2015-05-30  1:03     ` Sasha Levin
  2015-06-04  2:28       ` Wanpeng Li
  0 siblings, 1 reply; 14+ messages in thread
From: Sasha Levin @ 2015-05-30  1:03 UTC (permalink / raw)
  To: linux-kernel, torvalds, tglx, peterz, hpa, mingo, linux-tip-commits

On 01/14/2015 09:03 AM, tip-bot for Peter Zijlstra wrote:
> Commit-ID:  cebde6d681aa45f96111cfcffc1544cf2a0454ff
> Gitweb:     http://git.kernel.org/tip/cebde6d681aa45f96111cfcffc1544cf2a0454ff
> Author:     Peter Zijlstra <peterz@infradead.org>
> AuthorDate: Mon, 5 Jan 2015 11:18:10 +0100
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Wed, 14 Jan 2015 13:34:19 +0100
> 
> sched/core: Validate rq_clock*() serialization
> 
> rq->clock{,_task} are serialized by rq->lock, verify this.
> 
> One immediate fail is the usage in scale_rt_capacity(), so 'annotate'
> that for now; there's more 'funny' there. Maybe change rq->lock into a
> raw_seqlock_t?
> 
> (Only 32-bit is affected)
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/20150105103554.361872747@infradead.org
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: umgwanakikbuti@gmail.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---

Hey Peter,

I'm seeing the following in the latest -next:

[ 2706.857941] ------------[ cut here ]------------
[ 2706.857963] WARNING: CPU: 6 PID: 14780 at kernel/sched/sched.h:728 update_curr+0x3db/0x550()
[ 2706.857971] Modules linked in:
[ 2706.857982] CPU: 6 PID: 14780 Comm: trinity-c2 Not tainted 4.1.0-rc5-next-20150529-sasha-00039-g7fd455d-dirty #2262
[ 2706.858000]  ffffffffa6a1426a 00000000c9f98986 ffff8802b6807b98 ffffffffa6a1426a
[ 2706.858015]  0000000000000000 0000000000000000 ffff8802b6807be8 ffffffff9d1e50c6
[ 2706.858029]  ffffffff9d1e52f5 ffffffff9d28a2fb ffff8800c3d0b068 ffff88050c1c3068
[ 2706.858033] Call Trace:
[ 2706.858054] <IRQ> ? dump_stack (lib/dump_stack.c:52)
[ 2706.858064] dump_stack (lib/dump_stack.c:52)
[ 2706.858077] warn_slowpath_common (kernel/panic.c:448)
[ 2706.858085] ? warn_slowpath_null (kernel/panic.c:479)
[ 2706.858092] ? update_curr (kernel/sched/sched.h:728 kernel/sched/fair.c:698)
[ 2706.858102] warn_slowpath_null (kernel/panic.c:482)
[ 2706.858110] update_curr (kernel/sched/sched.h:728 kernel/sched/fair.c:698)
[ 2706.858116] ? update_curr (kernel/sched/fair.c:697)
[ 2706.858128] task_tick_fair (include/linux/sched.h:3049 kernel/sched/fair.c:397 kernel/sched/fair.c:2795 kernel/sched/fair.c:3372 kernel/sched/fair.c:8000)
[ 2706.858141] ? sched_clock_cpu (kernel/sched/clock.c:315)
[ 2706.858151] ? task_tick_fair (kernel/sched/fair.c:7994)
[ 2706.858162] scheduler_tick (kernel/sched/core.c:2523)
[ 2706.858177] update_process_times (kernel/time/timer.c:1399)
[ 2706.858188] tick_sched_handle.isra.12 (kernel/time/tick-sched.c:152)
[ 2706.858196] ? tick_sched_handle.isra.12 (kernel/time/tick-sched.c:134)
[ 2706.858206] tick_sched_timer (kernel/time/tick-sched.c:1075)
[ 2706.858217] __hrtimer_run_queues (kernel/time/hrtimer.c:1138 kernel/time/hrtimer.c:1194)
[ 2706.858228] ? ftrace_call (arch/x86/kernel/mcount_64.S:158)
[ 2706.858237] ? tick_sched_do_timer (kernel/time/tick-sched.c:1059)
[ 2706.858246] ? hrtimer_fixup_init (kernel/time/hrtimer.c:1161)
[ 2706.858256] ? __hrtimer_run_queues (kernel/time/hrtimer.c:1161)
[ 2706.858267] hrtimer_interrupt (kernel/time/hrtimer.c:1231)
[ 2706.858281] ? preempt_schedule_context (kernel/sched/core.c:2941 (discriminator 1) include/linux/jump_label.h:125 (discriminator 1) include/linux/context_tracking_state.h:29 (discriminator 1) include/linux/context_tracking.h:34 (discriminator 1) kernel/sched/core.c:2947 (discriminator 1))
[ 2706.858293] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:891)
[ 2706.858301] ? local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:863)
[ 2706.858311] smp_apic_timer_interrupt (./arch/x86/include/asm/apic.h:655 arch/x86/kernel/apic/apic.c:915)
[ 2706.858323] apic_timer_interrupt (arch/x86/kernel/entry_64.S:911)
[ 2706.858332]  <EOI>
[ 2706.858333] ---[ end trace b7e04749c55c3d4e ]---


Thanks,
Sasha


* Re: [tip:sched/core] sched/core: Validate rq_clock*() serialization
  2015-05-30  1:03     ` Sasha Levin
@ 2015-06-04  2:28       ` Wanpeng Li
  0 siblings, 0 replies; 14+ messages in thread
From: Wanpeng Li @ 2015-06-04  2:28 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, torvalds, tglx, peterz, hpa, mingo,
	linux-tip-commits


On 5/30/15 9:03 AM, Sasha Levin wrote:
> On 01/14/2015 09:03 AM, tip-bot for Peter Zijlstra wrote:
>> Commit-ID:  cebde6d681aa45f96111cfcffc1544cf2a0454ff
>> Gitweb:     http://git.kernel.org/tip/cebde6d681aa45f96111cfcffc1544cf2a0454ff
>> Author:     Peter Zijlstra <peterz@infradead.org>
>> AuthorDate: Mon, 5 Jan 2015 11:18:10 +0100
>> Committer:  Ingo Molnar <mingo@kernel.org>
>> CommitDate: Wed, 14 Jan 2015 13:34:19 +0100
>>
>> sched/core: Validate rq_clock*() serialization
>>
>> rq->clock{,_task} are serialized by rq->lock, verify this.
>>
>> One immediate fail is the usage in scale_rt_capacity(), so 'annotate'
>> that for now; there's more 'funny' there. Maybe change rq->lock into a
>> raw_seqlock_t?
>>
>> (Only 32-bit is affected)
>>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Link: http://lkml.kernel.org/r/20150105103554.361872747@infradead.org
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Cc: umgwanakikbuti@gmail.com
>> Signed-off-by: Ingo Molnar <mingo@kernel.org>
>> ---
> Hey Peter,
>
> I'm seeing the following in the latest -next:

How do I reproduce this? I didn't see it with just a boot test.

Regards,
Wanpeng Li

>
> [ 2706.857941] ------------[ cut here ]------------
> [ 2706.857963] WARNING: CPU: 6 PID: 14780 at kernel/sched/sched.h:728 update_curr+0x3db/0x550()
> [ 2706.857971] Modules linked in:
> [ 2706.857982] CPU: 6 PID: 14780 Comm: trinity-c2 Not tainted 4.1.0-rc5-next-20150529-sasha-00039-g7fd455d-dirty #2262
> [ 2706.858000]  ffffffffa6a1426a 00000000c9f98986 ffff8802b6807b98 ffffffffa6a1426a
> [ 2706.858015]  0000000000000000 0000000000000000 ffff8802b6807be8 ffffffff9d1e50c6
> [ 2706.858029]  ffffffff9d1e52f5 ffffffff9d28a2fb ffff8800c3d0b068 ffff88050c1c3068
> [ 2706.858033] Call Trace:
> [ 2706.858054] <IRQ> ? dump_stack (lib/dump_stack.c:52)
> [ 2706.858064] dump_stack (lib/dump_stack.c:52)
> [ 2706.858077] warn_slowpath_common (kernel/panic.c:448)
> [ 2706.858085] ? warn_slowpath_null (kernel/panic.c:479)
> [ 2706.858092] ? update_curr (kernel/sched/sched.h:728 kernel/sched/fair.c:698)
> [ 2706.858102] warn_slowpath_null (kernel/panic.c:482)
> [ 2706.858110] update_curr (kernel/sched/sched.h:728 kernel/sched/fair.c:698)
> [ 2706.858116] ? update_curr (kernel/sched/fair.c:697)
> [ 2706.858128] task_tick_fair (include/linux/sched.h:3049 kernel/sched/fair.c:397 kernel/sched/fair.c:2795 kernel/sched/fair.c:3372 kernel/sched/fair.c:8000)
> [ 2706.858141] ? sched_clock_cpu (kernel/sched/clock.c:315)
> [ 2706.858151] ? task_tick_fair (kernel/sched/fair.c:7994)
> [ 2706.858162] scheduler_tick (kernel/sched/core.c:2523)
> [ 2706.858177] update_process_times (kernel/time/timer.c:1399)
> [ 2706.858188] tick_sched_handle.isra.12 (kernel/time/tick-sched.c:152)
> [ 2706.858196] ? tick_sched_handle.isra.12 (kernel/time/tick-sched.c:134)
> [ 2706.858206] tick_sched_timer (kernel/time/tick-sched.c:1075)
> [ 2706.858217] __hrtimer_run_queues (kernel/time/hrtimer.c:1138 kernel/time/hrtimer.c:1194)
> [ 2706.858228] ? ftrace_call (arch/x86/kernel/mcount_64.S:158)
> [ 2706.858237] ? tick_sched_do_timer (kernel/time/tick-sched.c:1059)
> [ 2706.858246] ? hrtimer_fixup_init (kernel/time/hrtimer.c:1161)
> [ 2706.858256] ? __hrtimer_run_queues (kernel/time/hrtimer.c:1161)
> [ 2706.858267] hrtimer_interrupt (kernel/time/hrtimer.c:1231)
> [ 2706.858281] ? preempt_schedule_context (kernel/sched/core.c:2941 (discriminator 1) include/linux/jump_label.h:125 (discriminator 1) include/linux/context_tracking_state.h:29 (discriminator 1) include/linux/context_tracking.h:34 (discriminator 1) kernel/sched/core.c:2947 (discriminator 1))
> [ 2706.858293] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:891)
> [ 2706.858301] ? local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:863)
> [ 2706.858311] smp_apic_timer_interrupt (./arch/x86/include/asm/apic.h:655 arch/x86/kernel/apic/apic.c:915)
> [ 2706.858323] apic_timer_interrupt (arch/x86/kernel/entry_64.S:911)
> [ 2706.858332]  <EOI>
> [ 2706.858333] ---[ end trace b7e04749c55c3d4e ]---
>
>
> Thanks,
> Sasha


