* [PATCH] sched: Buggy comparison in check_preempt_tick
@ 2010-12-25  0:26 Venkatesh Pallipadi
  2010-12-25  7:50 ` Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-25  0:26 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Ranjit Manomohan, Venkatesh Pallipadi

A preempt comparison line in check_preempt_tick has two bugs.
* It compares signed and unsigned quantities, which breaks when the signed
  quantity happens to be negative
* It compares runtime and vruntime, which breaks when there are niced tasks

The bugs were initially found by linsched[1]. The change here fixes both
problems.

On x86-64, the signed/unsigned compare results in tasks running _longer_
than their expected time slice, as a spurious resched_task() gets signalled
after 4 ticks (on the tick after the preceding sysctl_sched_min_granularity
check) and the currently running task gets picked again and runs for
another ideal_slice interval.
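
As a self-contained illustration of the signed/unsigned half of this (a
userspace sketch, not the kernel code; the delta and ideal_runtime values
are taken from the trace below):

#include <stdio.h>

typedef long long s64;

int main(void)
{
	s64 delta = -4897059;			/* negative vruntime delta */
	unsigned long ideal_runtime = 11994146;

	/*
	 * Buggy compare: the usual arithmetic conversions turn the
	 * negative s64 into a huge unsigned value, so the branch is
	 * wrongly taken.
	 */
	if (delta > ideal_runtime)
		printf("buggy compare: resched (wrong)\n");

	/* Fixed compare: both sides signed, negative stays negative. */
	if (delta > (s64)ideal_runtime)
		printf("fixed compare: resched\n");
	else
		printf("fixed compare: keep running (right)\n");

	return 0;
}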

With 2 busy loops on a single CPU, trace_printks inside this buggy check
(where it triggers resched_task) and in pick_next_task show this:

 [001]   510.524336: pick_next_task_fair: loop (5939)
 [001]   510.536326: pick_next_task_fair: loop (5883)
 [001]   510.540319: task_tick_fair: delta -4897059, ideal_runtime 11994146
 [001]   510.540321: pick_next_task_fair: loop (5883)
 [001]   510.544306: task_tick_fair: delta -906540, ideal_runtime 11994146
 [001]   510.544309: pick_next_task_fair: loop (5883)
 [001]   510.556306: pick_next_task_fair: loop (5939)
 [001]   510.560301: task_tick_fair: delta -7105824, ideal_runtime 11994146
 [001]   510.560304: pick_next_task_fair: loop (5939)
 [001]   510.564298: task_tick_fair: delta -3105461, ideal_runtime 11994146
 [001]   510.564300: pick_next_task_fair: loop (5939)
 [001]   510.576288: pick_next_task_fair: loop (5883)
 [001]   510.580282: task_tick_fair: delta -4897210, ideal_runtime 11994146
 [001]   510.580285: pick_next_task_fair: loop (5883)
 [001]   510.584278: task_tick_fair: delta -897348, ideal_runtime 11994146
 [001]   510.584281: pick_next_task_fair: loop (5883)
 [001]   510.596269: pick_next_task_fair: loop (5939)

That is a 20 ms slice for each task, with some redundant resched_tasks;
with the fix, ~12 ms slices are expected (on a 16 CPU system).
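
For context, the ~12 ms expectation falls out of sched_slice() dividing
the latency period among runnable tasks in proportion to load weight.  A
back-of-envelope sketch (the ~24 ms period is an assumption inferred from
the ideal_runtime values in the trace, not a quoted default):

#include <stdio.h>

#define NICE_0_LOAD 1024UL

int main(void)
{
	unsigned long period_ns = 24000000;	 /* assumed latency period */
	unsigned long w_curr  = NICE_0_LOAD;	 /* one nice-0 busy loop */
	unsigned long w_total = 2 * NICE_0_LOAD; /* two nice-0 busy loops */

	/* slice = period * se->load.weight / cfs_rq->load.weight */
	unsigned long slice = period_ns * w_curr / w_total;

	printf("ideal slice: %lu ns (~%lu ms)\n", slice, slice / 1000000);
	return 0;
}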

[1] - http://lwn.net/Articles/409680/

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
---
 kernel/sched_fair.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 00ebd76..fc5ffbd 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -871,8 +871,10 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 	if (cfs_rq->nr_running > 1) {
 		struct sched_entity *se = __pick_next_entity(cfs_rq);
 		s64 delta = curr->vruntime - se->vruntime;
+		unsigned long ideal_vruntime;
 
-		if (delta > ideal_runtime)
+		ideal_vruntime = calc_delta_fair(ideal_runtime, curr);
+		if (delta > (s64)ideal_vruntime)
 			resched_task(rq_of(cfs_rq)->curr);
 	}
 }
-- 
1.7.3.1



* Re: [PATCH] sched: Buggy comparison in check_preempt_tick
  2010-12-25  0:26 [PATCH] sched: Buggy comparison in check_preempt_tick Venkatesh Pallipadi
@ 2010-12-25  7:50 ` Mike Galbraith
  2010-12-26  0:05   ` Venkatesh Pallipadi
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2010-12-25  7:50 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Ranjit Manomohan

On Fri, 2010-12-24 at 16:26 -0800, Venkatesh Pallipadi wrote:
> A preempt comparison line in check_preempt_tick has two bugs.
> * It compares signed and unsigned quantities, which breaks when signed
>   quantity happens to be negative

Oops, that's a bug.

> * It compares runtime and vruntime, which breaks when there are niced tasks

This imho isn't.

vruntimes advance at different rates for differently weighted tasks, so
they're already weighted, as is ideal_runtime.

For wakeup preemption we use wakeup_gran() as the weighted ruler to
limit spread.  This test just uses a different weighted ruler for the
same purpose at tick time, one already computed.

If you're a heavily niced task, you were very likely booted before you
got this far.  If you haven't run for considerable wall time, you won't
get further, you keep on running.  You only get booted if giving you a
full wall slice will spread vruntimes too much, for which you'll pay if
you keep running, and leftmost will pay now if we don't boot you.

	-Mike



* Re: [PATCH] sched: Buggy comparison in check_preempt_tick
  2010-12-25  7:50 ` Mike Galbraith
@ 2010-12-26  0:05   ` Venkatesh Pallipadi
  2010-12-26  7:23     ` Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Venkatesh Pallipadi @ 2010-12-26  0:05 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Ranjit Manomohan

On Fri, Dec 24, 2010 at 11:50 PM, Mike Galbraith <efault@gmx.de> wrote:
> On Fri, 2010-12-24 at 16:26 -0800, Venkatesh Pallipadi wrote:
>> A preempt comparison line in check_preempt_tick has two bugs.
>> * It compares signed and unsigned quantities, which breaks when signed
>>   quantity happens to be negative
>
> Oops, that's a bug.
>
>> * It compares runtime and vruntime, which breaks when there are niced tasks
>
> This imho isn't.
>
> vruntimes advance at different rates for differently weighted tasks, so
> they're already weighted, as is ideal_runtime.
>
> For wakeup preemption we use wakeup_gran() as the weighted ruler to
> limit spread.  This test just uses a different weighted ruler for the
> same purpose at tick time, one already computed.
>
> If you're a heavily niced task, you were very likely booted before you
> got this far.  If you haven't run for considerable wall time, you won't
> get further, you keep on running.  You only get booted if giving you a
> full wall slice will spread vruntimes too much, for which you'll pay if
> you keep running, and leftmost will pay now if we don't boot you.
>

Maybe I am all confused.
- The se->vruntime update in __update_curr() is based on
delta_exec_weighted (i.e., calc_delta_fair(delta_exec, curr)), with
delta_exec being wall clock time.
- Earlier in check_preempt_tick(), ideal_runtime gets compared with
delta_exec, which is wall clock time.
- delta in check_preempt_tick is the se->vruntime difference, which should
be weighted time, and comparing that with ideal_runtime (which was
compared against wall clock earlier) seems wrong.

This is debug output with a trace_printk outside this if statement, and
with the if based on (s64)ideal_runtime.
Task 17807 is a nice-0 task and task 6363 is nice -2.
Ideal slice for Task 17807 is 8168151
Ideal slice for Task 6363 is 15798460
<...>-17807 [002] 15851.352247: task_tick_fair: delta 3538447 rt 8168151 vrt 10200199
<...>-17807 [002] 15851.353246: task_tick_fair: delta 4799848 rt 8168151 vrt 10200199
<...>-17807 [002] 15851.354245: task_tick_fair: delta 6036141 rt 8168151 vrt 10200199
<...>-17807 [002] 15851.355244: task_tick_fair: delta 7297361 rt 8168151 vrt 10200199
<...>-17807 [002] 15851.356244: task_tick_fair: delta 8546469 rt 8168151 vrt 10200199

Here delta is increasing faster than wall time due to this task's nice
value relative to the other task's, and we are comparing this delta with
ideal_slice, which is wall clock time; hence we preempt this task
earlier. No?
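
To make the units concrete, a minimal userspace sketch of the weighting
both sides are arguing from (the weights are the kernel's prio_to_weight[]
entries for nice 0 and nice -2; the exact rates in the trace above depend
on the tasks' actual weights):

#include <stdio.h>

#define NICE_0_LOAD 1024ULL

/* calc_delta_fair(): scale wall-clock runtime by NICE_0_LOAD/weight */
static unsigned long long weighted(unsigned long long delta_exec,
				   unsigned long long weight)
{
	return delta_exec * NICE_0_LOAD / weight;
}

int main(void)
{
	unsigned long long tick = 1000000;	/* 1 ms of wall time */

	/* nice 0 (weight 1024): vruntime advances at wall-clock rate */
	printf("nice  0: +%llu ns vruntime per 1 ms tick\n",
	       weighted(tick, 1024));
	/* nice -2 (weight 1586): vruntime advances more slowly */
	printf("nice -2: +%llu ns vruntime per 1 ms tick\n",
	       weighted(tick, 1586));
	return 0;
}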

Thanks,
Venki


* Re: [PATCH] sched: Buggy comparison in check_preempt_tick
  2010-12-26  0:05   ` Venkatesh Pallipadi
@ 2010-12-26  7:23     ` Mike Galbraith
  2010-12-28  5:48       ` Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2010-12-26  7:23 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Ranjit Manomohan

On Sat, 2010-12-25 at 16:05 -0800, Venkatesh Pallipadi wrote: 
> On Fri, Dec 24, 2010 at 11:50 PM, Mike Galbraith <efault@gmx.de> wrote:
> > On Fri, 2010-12-24 at 16:26 -0800, Venkatesh Pallipadi wrote:
> >> A preempt comparison line in check_preempt_tick has two bugs.
> >> * It compares signed and unsigned quantities, which breaks when signed
> >>   quantity happens to be negative
> >
> > Oops, that's a bug.
> >
> >> * It compares runtime and vruntime, which breaks when there are niced tasks
> >
> > This imho isn't.
> >
> > vruntimes advance at different rates for differently weighted tasks, so
> > they're already weighted, as is ideal_runtime.
> >
> > For wakeup preemption we use wakeup_gran() as the weighted ruler to
> > limit spread.  This test just uses a different weighted ruler for the
> > same purpose at tick time, one already computed.
> >
> > If you're a heavily niced task, you were very likely booted before you
> > got this far.  If you haven't run for considerable wall time, you won't
> > get further, you keep on running.  You only get booted if giving you a
> > full wall slice will spread vruntimes too much, for which you'll pay if
> > you keep running, and leftmost will pay now if we don't boot you.
> >
> 
> May be I am all confused.
> - se->vruntime update in __update_curr() is based on
> delta_exec_weighted ( i.e., calc_delta_fair(delta_exec, curr) with
> delta_exec being wall clock time.

Yeah, vruntime is weighted, delta exec is not.

> - We have earlier in check_preempt_tick(), ideal_runtime getting
> compared with delta exec, which is wall clock time

Yeah, you can't run uninterrupted longer than your (weighted!) portion
of the latency target.

[to meet that target, you must be using HRTICK, and even precise tick
(precise? slice can change many times between timer set/fire yes?) won't
help your _wakeup_ latency target.] 

> - delta in check_preempt_tick is se->vruntime difference, which should
> be the weighted time and comparing that with ideal_runtime (which is
> compared with wall clock earlier) seems wrong.

What is sched_wakeup_granularity, and why is it different than
sched_latency?  Why is it OK to scale sched_wakeup_granularity, and use
it as a caliper, and NOT OK to scale sched_latency (which is what
__sched_period() returns, and sched_slice() scales) to do the same?
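
For comparison, a rough sketch of the wakeup-path caliper being referred
to here, paraphrased from memory of the 2.6.3x code rather than quoted
verbatim (the 1 ms granularity and the scaling choice are approximations):

#include <stdio.h>

typedef long long s64;

#define NICE_0_LOAD 1024

struct sched_entity {
	unsigned long long vruntime;
	unsigned long weight;
};

/* convert the wall-clock granularity into the entity's vruntime units */
static s64 wakeup_gran(struct sched_entity *se)
{
	s64 gran = 1000000;	/* assumed sysctl_sched_wakeup_granularity */

	if (se->weight != NICE_0_LOAD)
		gran = gran * NICE_0_LOAD / se->weight;
	return gran;
}

/* should 'se' preempt 'curr'?  1: yes, 0: too close to call, -1: no */
static int wakeup_preempt_entity(struct sched_entity *curr,
				 struct sched_entity *se)
{
	s64 vdiff = curr->vruntime - se->vruntime;

	if (vdiff <= 0)
		return -1;
	if (vdiff > wakeup_gran(se))
		return 1;
	return 0;
}

int main(void)
{
	struct sched_entity curr = { .vruntime = 5000000, .weight = 1024 };
	struct sched_entity se   = { .vruntime = 2000000, .weight = 1586 };

	printf("preempt? %d\n", wakeup_preempt_entity(&curr, &se));
	return 0;
}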

> This is debug output with trace_printk outside this if statement and
> if based on (s64)ideal_runtime.
> Task 17807 is nice 0 task and task 6363 is nice -2.
> Ideal slice for Task 17807 is 8168151
> Ideal slice for Task 6363 is 15798460
> <...>-17807 [002] 15851.352247: task_tick_fair: delta 3538447 rt
> 8168151 vrt 10200199
> <...>-17807 [002] 15851.353246: task_tick_fair: delta 4799848 rt
> 8168151 vrt 10200199
> <...>-17807 [002] 15851.354245: task_tick_fair: delta 6036141 rt
> 8168151 vrt 10200199
> <...>-17807 [002] 15851.355244: task_tick_fair: delta 7297361 rt
> 8168151 vrt 10200199
> <...>-17807 [002] 15851.356244: task_tick_fair: delta 8546469 rt
> 8168151 vrt 10200199
> 
> Here delta is increasing faster than wall time due to relative lower
> nice value of this task and we are comparing this delta with
> ideal_slice, which is wall clock time and hence we preempt this task
> earlier. No?

Is preempting sooner a bad thing if there's large spread?  It'll keep
him from having to wait so long for his next bit of CPU.

What the thing is supposed to help is this: wakeup happens just after a
hog gets the CPU, the wakee doesn't quite have enough lag to preempt, so
has to wait for nearly a full slice to get service.  That increases
spread, inducing latency spikes.

What would make more sense to me would be to use the exact same criteria
for wakeup/tick preemption, and do away with some knobs, just treat the
tick as nothing but another preemption opportunity.  Set your spread
target knob, and that's it.

But anyway..

echo NO_WAKEUP_PREEMPT > sched_features
echo NO_TESTME > sched_features
two hogs running on isolcpu 3, pid 6890 at nice -2

while sleep 1; do  grep 'pert.*6890' /proc/sched_debug; done

runnable tasks:
            task   PID         tree-key  switches  prio
-------------------------------------------------------
R           pert  6890     50201.071851      7453   118
R           pert  6890     50596.171290      7513   118  +60
R           pert  6890     50991.265264      7572   118  +59
R           pert  6890     51383.781965      7631   118  +59
            pert  6890     51781.463129      7691   118  +60

echo TESTME > sched_features
            pert  6890    126881.306733     18977   118
R           pert  6890    127273.825719     19036   118  +59
R           pert  6890    127668.928218     19095   118  +59
R           pert  6890    128064.031372     19154   118  +59
R           pert  6890    128459.134339     19213   118  +59

...with a compute load, the thing should be a noop, and appears to be so
(with busted compare fixed anyway;).  You have to be well overloaded for
buddies to kick in these days, so it's probably pretty hard to get
enough spread for the thing to fire.

---
 kernel/sched_fair.c     |    4 ++--
 kernel/sched_features.h |    1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

Index: linux-2.6.37.git/kernel/sched_fair.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sched_fair.c
+++ linux-2.6.37.git/kernel/sched_fair.c
@@ -862,7 +862,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq
 	 * narrow margin doesn't have to wait for a full slice.
 	 * This also mitigates buddy induced latencies under load.
 	 */
-	if (!sched_feat(WAKEUP_PREEMPT))
+	if (!sched_feat(TESTME))
 		return;
 
 	if (delta_exec < sysctl_sched_min_granularity)
@@ -872,7 +872,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq
 		struct sched_entity *se = __pick_next_entity(cfs_rq);
 		s64 delta = curr->vruntime - se->vruntime;
 
-		if (delta > ideal_runtime)
+		if (delta > (s64)ideal_runtime)
 			resched_task(rq_of(cfs_rq)->curr);
 	}
 }
Index: linux-2.6.37.git/kernel/sched_features.h
===================================================================
--- linux-2.6.37.git.orig/kernel/sched_features.h
+++ linux-2.6.37.git/kernel/sched_features.h
@@ -66,3 +66,4 @@ SCHED_FEAT(OWNER_SPIN, 1)
  * Decrement CPU power based on irq activity
  */
 SCHED_FEAT(NONIRQ_POWER, 1)
+SCHED_FEAT(TESTME, 1)




* Re: [PATCH] sched: Buggy comparison in check_preempt_tick
  2010-12-26  7:23     ` Mike Galbraith
@ 2010-12-28  5:48       ` Mike Galbraith
  2011-01-05  4:41         ` [PATCH] " Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2010-12-28  5:48 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Ranjit Manomohan

On Sun, 2010-12-26 at 08:23 +0100, Mike Galbraith wrote:

> But anyway..
> 
> echo NO_WAKEUP_PREEMPT > sched_features
> echo NO_TESTME > sched_features
> two hogs running on isolcpu 3, pid 6890 at nice -2
> 
> while sleep 1; do  grep 'pert.*6890' /proc/sched_debug; done
> 
> runnable tasks:
>             task   PID         tree-key  switches  prio
> -------------------------------------------------------
> R           pert  6890     50201.071851      7453   118
> R           pert  6890     50596.171290      7513   118  +60
> R           pert  6890     50991.265264      7572   118  +59
> R           pert  6890     51383.781965      7631   118  +59
>             pert  6890     51781.463129      7691   118  +60
> 
> echo TESTME > sched_features
>             pert  6890    126881.306733     18977   118
> R           pert  6890    127273.825719     19036   118  +59
> R           pert  6890    127668.928218     19095   118  +59
> R           pert  6890    128064.031372     19154   118  +59
> R           pert  6890    128459.134339     19213   118  +59
> 
> ...with a compute load, the thing should be a noop, and appears to be so
> (with busted compare fixed anyway;).  You have to be well overloaded for
> buddies to kick in these days, so it's probably pretty hard to get
> enough spread for the thing to fire.

I did a bit more testing yesterday with wakeup loads.  There's enough
spread for the test to nudge things a few [0..4] times per second/core.

I'd either fix the comparison, and let it keep on nudging once in a
while, or whack the whole thing.

	-Mike



* [PATCH] Re: [PATCH] sched: Buggy comparison in check_preempt_tick
  2010-12-28  5:48       ` Mike Galbraith
@ 2011-01-05  4:41         ` Mike Galbraith
  2011-01-18 19:06           ` [tip:sched/urgent] sched: Fix signed unsigned comparison in check_preempt_tick() tip-bot for Mike Galbraith
  0 siblings, 1 reply; 7+ messages in thread
From: Mike Galbraith @ 2011-01-05  4:41 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Ranjit Manomohan

Going through my mailbox, I see this remains unaddressed.  I chose the
"keep it" option, but "whack it and revisit later" is also viable.

On Tue, 2010-12-28 at 06:48 +0100, Mike Galbraith wrote:
> On Sun, 2010-12-26 at 08:23 +0100, Mike Galbraith wrote:
> 
> > But anyway..
> > 
> > echo NO_WAKEUP_PREEMPT > sched_features
> > echo NO_TESTME > sched_features
> > two hogs running on isolcpu 3, pid 6890 at nice -2
> > 
> > while sleep 1; do  grep 'pert.*6890' /proc/sched_debug; done
> > 
> > runnable tasks:
> >             task   PID         tree-key  switches  prio
> > -------------------------------------------------------
> > R           pert  6890     50201.071851      7453   118
> > R           pert  6890     50596.171290      7513   118  +60
> > R           pert  6890     50991.265264      7572   118  +59
> > R           pert  6890     51383.781965      7631   118  +59
> >             pert  6890     51781.463129      7691   118  +60
> > 
> > echo TESTME > sched_features
> >             pert  6890    126881.306733     18977   118
> > R           pert  6890    127273.825719     19036   118  +59
> > R           pert  6890    127668.928218     19095   118  +59
> > R           pert  6890    128064.031372     19154   118  +59
> > R           pert  6890    128459.134339     19213   118  +59
> > 
> > ...with a compute load, the thing should be a noop, and appears to be so
> > (with busted compare fixed anyway;).  You have to be well overloaded for
> > buddies to kick in these days, so it's probably pretty hard to get
> > enough spread for the thing to fire.
> 
> I did a bit more testing yesterday with wakeup loads.  There's enough
> spread for the test to nudge things a few [0..4] times per second/core.
> 
> I'd either fix the comparison, and let it keep on nudging once in a
> while, or whack the whole thing.

sched: fix signed unsigned comparison in check_preempt_tick()

Signed/unsigned comparison may lead to a superfluous resched if leftmost
is right of the current task, wasting a few cycles, and inadvertently
_lengthening_ the current task's slice.

Reported-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>

---
 kernel/sched_fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.37.git/kernel/sched_fair.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sched_fair.c
+++ linux-2.6.37.git/kernel/sched_fair.c
@@ -872,7 +872,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq
 		struct sched_entity *se = __pick_next_entity(cfs_rq);
 		s64 delta = curr->vruntime - se->vruntime;
 
-		if (delta > ideal_runtime)
+		if (delta > (s64)ideal_runtime)
 			resched_task(rq_of(cfs_rq)->curr);
 	}
 }




* [tip:sched/urgent] sched: Fix signed unsigned comparison in check_preempt_tick()
  2011-01-05  4:41         ` [PATCH] " Mike Galbraith
@ 2011-01-18 19:06           ` tip-bot for Mike Galbraith
  0 siblings, 0 replies; 7+ messages in thread
From: tip-bot for Mike Galbraith @ 2011-01-18 19:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, efault, tglx, venki, mingo

Commit-ID:  d7d8294415f0ce4254827d4a2a5ee88b00be52a8
Gitweb:     http://git.kernel.org/tip/d7d8294415f0ce4254827d4a2a5ee88b00be52a8
Author:     Mike Galbraith <efault@gmx.de>
AuthorDate: Wed, 5 Jan 2011 05:41:17 +0100
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 18 Jan 2011 15:09:44 +0100

sched: Fix signed unsigned comparison in check_preempt_tick()

Signed/unsigned comparison may lead to a superfluous resched if leftmost
is right of the current task, wasting a few cycles, and inadvertently
_lengthening_ the current task's slice.

Reported-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294202477.9384.5.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched_fair.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 414145c..77e9166 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1062,6 +1062,9 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 		struct sched_entity *se = __pick_next_entity(cfs_rq);
 		s64 delta = curr->vruntime - se->vruntime;
 
+		if (delta < 0)
+			return;
+
 		if (delta > ideal_runtime)
 			resched_task(rq_of(cfs_rq)->curr);
 	}
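
For completeness, a userspace sketch (not kernel code) checking that the
merged early-return fix and the earlier (s64) cast make the same decision
for realistic values:

#include <stdio.h>

typedef long long s64;

/* merged fix: bail out on a negative delta, then compare */
static int merged_fix(s64 delta, unsigned long ideal_runtime)
{
	if (delta < 0)
		return 0;
	/* delta >= 0 here, so the implicit unsigned conversion is safe */
	return delta > ideal_runtime;
}

/* earlier proposal: force a signed compare */
static int cast_fix(s64 delta, unsigned long ideal_runtime)
{
	return delta > (s64)ideal_runtime;
}

int main(void)
{
	s64 deltas[] = { -4897059, 0, 11994146, 11994147 };
	unsigned long ideal = 11994146;
	int i;

	for (i = 0; i < 4; i++)
		printf("delta %lld: merged=%d cast=%d\n",
		       (long long)deltas[i],
		       merged_fix(deltas[i], ideal),
		       cast_fix(deltas[i], ideal));
	return 0;
}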

