linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched: recover sched_yield task running time increase
@ 2011-04-05 22:33 Alex Shi
  2011-04-06  5:07 ` Rik van Riel
  2011-04-06  8:04 ` Peter Zijlstra
  0 siblings, 2 replies; 13+ messages in thread
From: Alex Shi @ 2011-04-05 22:33 UTC (permalink / raw)
  To: riel; +Cc: linux-kernel, a.p.zijlstra, mingo, tim.c.chen, shaohua.li

commit ac53db596cc08ecb8040c removed the sched_yield task running
time increase, so the yielded task gets more opportunity to run
again soon. That may not be what the caller wants. It also causes a
50~80 percent performance drop in the volano benchmark on
core2/NHM/WSM machines. This patch restores the sched_yield task
vruntime increase.

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched_fair.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 3f7ec9e..04d58bb 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1956,7 +1956,7 @@ static void yield_task_fair(struct rq *rq)
 {
 	struct task_struct *curr = rq->curr;
 	struct cfs_rq *cfs_rq = task_cfs_rq(curr);
-	struct sched_entity *se = &curr->se;
+	struct sched_entity *se = &curr->se, *rightmost;
 
 	/*
 	 * Are we the only task in the tree?
@@ -1975,6 +1975,22 @@ static void yield_task_fair(struct rq *rq)
 	}
 
 	set_skip_buddy(se);
+	/*
+	 * Find the rightmost entry in the rbtree:
+	 */
+	rightmost = __pick_last_entity(cfs_rq);
+	/*
+	 * Already in the rightmost position?
+	 */
+	if (unlikely(!rightmost || entity_before(rightmost, se)))
+		return;
+
+	/*
+	 * Minimally necessary key value to be last in the tree:
+	 * Upon rescheduling, sched_class::put_prev_task() will place
+	 * 'current' within the tree based on its new key value.
+	 */
+	se->vruntime = rightmost->vruntime + 1;
 }
 
 static bool yield_to_task_fair(struct rq *rq, struct task_struct *p, bool preempt)
-- 
1.6.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] sched: recover sched_yield task running time increase
  2011-04-05 22:33 [PATCH] sched: recover sched_yield task running time increase Alex Shi
@ 2011-04-06  5:07 ` Rik van Riel
  2011-04-06  6:15   ` Alex,Shi
  2011-04-06  8:04 ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2011-04-06  5:07 UTC (permalink / raw)
  To: Alex Shi; +Cc: linux-kernel, a.p.zijlstra, mingo, tim.c.chen, shaohua.li

On 04/05/2011 06:33 PM, Alex Shi wrote:
> commit ac53db596cc08ecb8040c removed the sched_yield task running
> time increase, so the yielded task gets more opportunity to run
> again soon. That may not be what the caller wants. It also causes a
> 50~80 percent performance drop in the volano benchmark on
> core2/NHM/WSM machines. This patch restores the sched_yield task
> vruntime increase.
>
> Signed-off-by: alex.shi@intel.com

NACK

This was switched off by default and under
the sysctl sched_compat_yield for a reason.

Reintroducing it under that sysctl option
may be acceptable, but by default it would
be doing the wrong thing for other workloads.


* Re: [PATCH] sched: recover sched_yield task running time increase
  2011-04-06  5:07 ` Rik van Riel
@ 2011-04-06  6:15   ` Alex,Shi
  2011-04-06  7:01     ` Mike Galbraith
  0 siblings, 1 reply; 13+ messages in thread
From: Alex,Shi @ 2011-04-06  6:15 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, a.p.zijlstra, mingo, Chen, Tim C, Li, Shaohua

On Wed, 2011-04-06 at 13:07 +0800, Rik van Riel wrote:
> On 04/05/2011 06:33 PM, Alex Shi wrote:
> > commit ac53db596cc08ecb8040c removed the sched_yield task running
> > time increase, so the yielded task gets more opportunity to run
> > again soon. That may not be what the caller wants. It also causes a
> > 50~80 percent performance drop in the volano benchmark on
> > core2/NHM/WSM machines. This patch restores the sched_yield task
> > vruntime increase.
> >
> > Signed-off-by: alex.shi@intel.com
> 
> NACK
> 
> This was switched off by default and under
> the sysctl sched_compat_yield for a reason.
> 
> Reintroducing it under that sysctl option
> may be acceptable, but by default it would
> be doing the wrong thing for other workloads.

I can implement this as a sysctl option. But when I checked the man
page of sched_yield again, I have some concerns about this.

----
       int sched_yield(void);

DESCRIPTION
       A  process  can  relinquish  the processor voluntarily without blocking by calling sched_yield().
       The process will then be moved to the end of the queue for its static priority and a new  process
       gets to run.
----

If an application calls the sched_yield system call, most of the time
it does not want to be scheduled again right away. That is why the man
page says "the caller process will then be moved to the _end_ of the
queue..."



* Re: [PATCH] sched: recover sched_yield task running time increase
  2011-04-06  6:15   ` Alex,Shi
@ 2011-04-06  7:01     ` Mike Galbraith
  2011-04-06 13:28       ` Shi, Alex
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Galbraith @ 2011-04-06  7:01 UTC (permalink / raw)
  To: Alex,Shi
  Cc: Rik van Riel, linux-kernel, a.p.zijlstra, mingo, Chen, Tim C, Li,
	Shaohua

On Wed, 2011-04-06 at 14:15 +0800, Alex,Shi wrote:
> On Wed, 2011-04-06 at 13:07 +0800, Rik van Riel wrote:
> > On 04/05/2011 06:33 PM, Alex Shi wrote:
> > > commit ac53db596cc08ecb8040c removed the sched_yield task running
> > > time increase, so the yielded task gets more opportunity to run
> > > again soon. That may not be what the caller wants. It also causes a
> > > 50~80 percent performance drop in the volano benchmark on
> > > core2/NHM/WSM machines. This patch restores the sched_yield task
> > > vruntime increase.
> > >
> > > Signed-off-by: alex.shi@intel.com
> > 
> > NACK
> > 
> > This was switched off by default and under
> > the sysctl sched_compat_yield for a reason.
> > 
> > Reintroducing it under that sysctl option
> > may be acceptable, but by default it would
> > be doing the wrong thing for other workloads.
> 
> I can implement this as a sysctl option. But when I checked the man
> page of sched_yield again, I have some concerns about this.
> 
> ----
>        int sched_yield(void);
> 
> DESCRIPTION
>        A  process  can  relinquish  the processor voluntarily without blocking by calling sched_yield().
>        The process will then be moved to the end of the queue for its static priority and a new  process
>        gets to run.
> ----
> 
> If an application calls the sched_yield system call, most of the time
> it does not want to be scheduled again right away. That is why the man
> page says "the caller process will then be moved to the _end_ of the
> queue..."

Moving a yielding nice 0 task behind a SCHED_IDLE (or nice 19) task
could be incredibly painful.

	-Mike



* Re: [PATCH] sched: recover sched_yield task running time increase
  2011-04-05 22:33 [PATCH] sched: recover sched_yield task running time increase Alex Shi
  2011-04-06  5:07 ` Rik van Riel
@ 2011-04-06  8:04 ` Peter Zijlstra
  2011-04-06 14:42   ` Rik van Riel
  2011-04-07  3:08   ` Alex,Shi
  1 sibling, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2011-04-06  8:04 UTC (permalink / raw)
  To: Alex Shi; +Cc: riel, linux-kernel, mingo, tim.c.chen, shaohua.li

On Wed, 2011-04-06 at 06:33 +0800, Alex Shi wrote:
> commit ac53db596cc08ecb8040c removed the sched_yield task running
> time increase, so the yielded task gets more opportunity to run
> again soon. That may not be what the caller wants. It also causes a
> 50~80 percent performance drop in the volano benchmark on
> core2/NHM/WSM machines. This patch restores the sched_yield task
> vruntime increase.

You do know that any app that relies on sched_yield behaviour is more
than broken? Using sched_yield() for anything other than SCHED_FIFO
tasks is well outside spec.

Furthermore, apparently you used sysctl_sched_compat_yield, which was
bound to disappear some time, since with the default settings the yield
semantics didn't actually change.

So no, I'm not much inclined to accept this. The Java people have had
every opportunity to go fix their crap, them not doing so will
eventually (preferably now) stop being my problem.



* RE: [PATCH] sched: recover sched_yield task running time increase
  2011-04-06  7:01     ` Mike Galbraith
@ 2011-04-06 13:28       ` Shi, Alex
  2011-04-07  2:44         ` Mike Galbraith
  0 siblings, 1 reply; 13+ messages in thread
From: Shi, Alex @ 2011-04-06 13:28 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Rik van Riel, linux-kernel, a.p.zijlstra, mingo, Chen, Tim C, Li,
	Shaohua

>> > NACK
>> >
>> > This was switched off by default and under
>> > the sysctl sched_compat_yield for a reason.
>> >
>> > Reintroducing it under that sysctl option
>> > may be acceptable, but by default it would
>> > be doing the wrong thing for other workloads.
>>
>> I can implement this as a sysctl option. But when I checked the man
>> page of sched_yield again, I have some concerns about this.
>>
>> ----
>>        int sched_yield(void);
>>
>> DESCRIPTION
>>        A  process  can  relinquish  the processor voluntarily without blocking by calling sched_yield().
>>        The process will then be moved to the end of the queue for its static priority and a new  process
>>        gets to run.
>> ----
>>
>> If an application calls the sched_yield system call, most of the time
>> it does not want to be scheduled again right away. That is why the man
>> page says "the caller process will then be moved to the _end_ of the
>> queue..."
>
>Moving a yielding nice 0 task behind a SCHED_IDLE (or nice 19) task
>could be incredibly painful.

Good reminder! Do you have a more detailed idea on this?


* Re: [PATCH] sched: recover sched_yield task running time increase
  2011-04-06  8:04 ` Peter Zijlstra
@ 2011-04-06 14:42   ` Rik van Riel
  2011-04-06 15:25     ` Peter Zijlstra
  2011-04-07  3:08   ` Alex,Shi
  1 sibling, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2011-04-06 14:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Alex Shi, linux-kernel, mingo, tim.c.chen, shaohua.li

On 04/06/2011 04:04 AM, Peter Zijlstra wrote:
> On Wed, 2011-04-06 at 06:33 +0800, Alex Shi wrote:
>> commit ac53db596cc08ecb8040c removed the sched_yield task running
>> time increase, so the yielded task gets more opportunity to run
>> again soon. That may not be what the caller wants. It also causes a
>> 50~80 percent performance drop in the volano benchmark on
>> core2/NHM/WSM machines. This patch restores the sched_yield task
>> vruntime increase.
>
> You do know that any app that relies on sched_yield behaviour is more
> than broken? Using sched_yield() for anything other than SCHED_FIFO
> tasks is well outside spec.
>
> Furthermore, apparently you used sysctl_sched_compat_yield, which was
> bound to disappear some time, since with the default settings the yield
> semantics didn't actually change.
>
> So no, I'm not much inclined to accept this. The Java people have had
> every opportunity to go fix their crap, them not doing so will
> eventually (preferably now) stop being my problem.

It appears they might not have figured out how to fix
their stuff :)

Would you have any hints on what the Java folks should
replace their calls to sched_yield with?

Proper use of futexes from inside the JVM perhaps?

Or should we export yield_to to userspace and have
them use that? :)   *runs*


* Re: [PATCH] sched: recover sched_yield task running time increase
  2011-04-06 14:42   ` Rik van Riel
@ 2011-04-06 15:25     ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2011-04-06 15:25 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Alex Shi, linux-kernel, mingo, tim.c.chen, shaohua.li

On Wed, 2011-04-06 at 10:42 -0400, Rik van Riel wrote:

> It appears they might not have figured out how to fix
> their stuff :)

As far as I can tell they're totally not interested in fixing their
crap, it comes up every time we touch sched_yield() but nobody ever
steps up and fixes things.

> Would you have any hints on what the Java folks should
> replace their calls to sched_yield with?
> 
> Proper use of futexes from inside the JVM perhaps?

Yeah, Darren was working on making adaptive spinning futexes, but still
even without that, syscalls on x86 are so cheap there's really hardly
any point in actually spinning in userspace.

> Or should we export yield_to to userspace and have
> them use that? :)   *runs*

Hehe, no, yield_to() is an absolute abomination (much like yield itself
but worse), they have proper locks and should thus use proper
primitives.


* RE: [PATCH] sched: recover sched_yield task running time increase
  2011-04-06 13:28       ` Shi, Alex
@ 2011-04-07  2:44         ` Mike Galbraith
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Galbraith @ 2011-04-07  2:44 UTC (permalink / raw)
  To: Shi, Alex
  Cc: Rik van Riel, linux-kernel, a.p.zijlstra, mingo, Chen, Tim C, Li,
	Shaohua

On Wed, 2011-04-06 at 21:28 +0800, Shi, Alex wrote:
> >> > NACK
> >> >
> >> > This was switched off by default and under
> >> > the sysctl sched_compat_yield for a reason.
> >> >
> >> > Reintroducing it under that sysctl option
> >> > may be acceptable, but by default it would
> >> > be doing the wrong thing for other workloads.
> >>
> >> I can implement this as a sysctl option. But when I checked the man
> >> page of sched_yield again, I have some concerns about this.
> >>
> >> ----
> >>        int sched_yield(void);
> >>
> >> DESCRIPTION
> >>        A  process  can  relinquish  the processor voluntarily without blocking by calling sched_yield().
> >>        The process will then be moved to the end of the queue for its static priority and a new  process
> >>        gets to run.
> >> ----
> >>
> >> If an application calls the sched_yield system call, most of the time
> >> it does not want to be scheduled again right away. That is why the man
> >> page says "the caller process will then be moved to the _end_ of the
> >> queue..."
> >
> >Moving a yielding nice 0 task behind a SCHED_IDLE (or nice 19) task
> >could be incredibly painful.
> 
> Good reminder! Do you have a more detailed idea on this?

Other than 'don't do that'?  Nope.  sched_yield() semantics suck.

	-Mike



* Re: [PATCH] sched: recover sched_yield task running time increase
  2011-04-06  8:04 ` Peter Zijlstra
  2011-04-06 14:42   ` Rik van Riel
@ 2011-04-07  3:08   ` Alex,Shi
  2011-04-07  6:13     ` Rik van Riel
  1 sibling, 1 reply; 13+ messages in thread
From: Alex,Shi @ 2011-04-07  3:08 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: riel, linux-kernel, mingo, Chen, Tim C, Li, Shaohua

On Wed, 2011-04-06 at 16:04 +0800, Peter Zijlstra wrote:
> On Wed, 2011-04-06 at 06:33 +0800, Alex Shi wrote:
> > commit ac53db596cc08ecb8040c removed the sched_yield task running
> > time increase, so the yielded task gets more opportunity to run
> > again soon. That may not be what the caller wants. It also causes a
> > 50~80 percent performance drop in the volano benchmark on
> > core2/NHM/WSM machines. This patch restores the sched_yield task
> > vruntime increase.
> 
> You do know that any app that relies on sched_yield behaviour is more
> than broken? Using sched_yield() for anything other than SCHED_FIFO
> tasks is well outside spec.
> 
> Furthermore, apparently you used sysctl_sched_compat_yield, which was
> bound to disappear some time, since with the default settings the yield
> semantics didn't actually change.

Yes, I used sched_compat_yield; otherwise volano becomes extremely
slow in my single-machine testing. We may reconsider our test
setup.

On the other side, after the scheduler changed to CFS, task priority
was converted into load.weight and folded into vruntime, so the
original semantics of the sched_yield system call are very hard to
implement. Considering this, I understand your decision.
> 
> So no, I'm not much inclined to accept this. The Java people have had
> every opportunity to go fix their crap, them not doing so will
> eventually (preferably now) stop being my problem.
> 




* Re: [PATCH] sched: recover sched_yield task running time increase
  2011-04-07  3:08   ` Alex,Shi
@ 2011-04-07  6:13     ` Rik van Riel
  2011-04-07  6:43       ` Alex,Shi
  0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2011-04-07  6:13 UTC (permalink / raw)
  To: Alex,Shi; +Cc: Peter Zijlstra, linux-kernel, mingo, Chen, Tim C, Li, Shaohua

On 04/06/2011 11:08 PM, Alex,Shi wrote:
> On Wed, 2011-04-06 at 16:04 +0800, Peter Zijlstra wrote:
>> On Wed, 2011-04-06 at 06:33 +0800, Alex Shi wrote:

>> You do know that any app that relies on sched_yield behaviour is more
>> than broken? Using sched_yield() for anything other than SCHED_FIFO
>> tasks is well outside spec.
>>
>> Furthermore, apparently you used sysctl_sched_compat_yield, which was
>> bound to disappear some time, since with the default settings the yield
>> semantics didn't actually change.
>
> Yes, I used sched_compat_yield; otherwise volano becomes extremely
> slow in my single-machine testing. We may reconsider our test
> setup.

With what JVM is this happening?

Surely not every JVM uses user space spinlocks and
yield, when we have futexes available?





* Re: [PATCH] sched: recover sched_yield task running time increase
  2011-04-07  6:13     ` Rik van Riel
@ 2011-04-07  6:43       ` Alex,Shi
  2011-04-07  8:52         ` Ingo Molnar
  0 siblings, 1 reply; 13+ messages in thread
From: Alex,Shi @ 2011-04-07  6:43 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Peter Zijlstra, linux-kernel, mingo, Chen, Tim C, Li, Shaohua

On Thu, 2011-04-07 at 14:13 +0800, Rik van Riel wrote:
> On 04/06/2011 11:08 PM, Alex,Shi wrote:
> > On Wed, 2011-04-06 at 16:04 +0800, Peter Zijlstra wrote:
> >> On Wed, 2011-04-06 at 06:33 +0800, Alex Shi wrote:
> 
> >> You do know that any app that relies on sched_yield behaviour is more
> >> than broken? Using sched_yield() for anything other than SCHED_FIFO
> >> tasks is well outside spec.
> >>
> >> Furthermore, apparently you used sysctl_sched_compat_yield, which was
> >> bound to disappear some time, since with the default settings the yield
> >> semantics didn't actually change.
> >
> > Yes, I used sched_compat_yield; otherwise volano becomes extremely
> > slow in my single-machine testing. We may reconsider our test
> > setup.
> 
> With what JVM is this happening?

I used openjdk from Fedora 10~13 and jrockit-R27.3.1-jre1.5.0_11; both
of them use sched_yield heavily. And I just did a quick try with
jrockit-R27.4.0-jre1.6.0_02.x86_64; it had the same problem.

> 
> Surely not every JVM uses user space spinlocks and
> yield, when we have futexes available?

No, futex was called just 7 times during the test, so I think the JVM
did not use it.
> 
> 
> 




* Re: [PATCH] sched: recover sched_yield task running time increase
  2011-04-07  6:43       ` Alex,Shi
@ 2011-04-07  8:52         ` Ingo Molnar
  0 siblings, 0 replies; 13+ messages in thread
From: Ingo Molnar @ 2011-04-07  8:52 UTC (permalink / raw)
  To: Alex,Shi
  Cc: Rik van Riel, Peter Zijlstra, linux-kernel, Chen, Tim C, Li, Shaohua


* Alex,Shi <alex.shi@intel.com> wrote:

> On Thu, 2011-04-07 at 14:13 +0800, Rik van Riel wrote:
> > On 04/06/2011 11:08 PM, Alex,Shi wrote:
> > > On Wed, 2011-04-06 at 16:04 +0800, Peter Zijlstra wrote:
> > >> On Wed, 2011-04-06 at 06:33 +0800, Alex Shi wrote:
> > 
> > >> You do know that any app that relies on sched_yield behaviour is more
> > >> than broken? Using sched_yield() for anything other than SCHED_FIFO
> > >> tasks is well outside spec.
> > >>
> > >> Furthermore, apparently you used sysctl_sched_compat_yield, which was
> > >> bound to disappear some time, since with the default settings the yield
> > >> semantics didn't actually change.
> > >
> > > Yes, I used sched_compat_yield; otherwise volano becomes extremely
> > > slow in my single-machine testing. We may reconsider our test
> > > setup.
> > 
> > With what JVM is this happening?
> 
> I used openjdk from Fedora 10~13 and jrockit-R27.3.1-jre1.5.0_11; both
> of them use sched_yield heavily. And I just did a quick try with
> jrockit-R27.4.0-jre1.6.0_02.x86_64; it had the same problem.

Well, switching openjdk to futexes would be a nice performance optimization for 
sure - especially if you can show the speedups with VolanoMark.

It would also put all the performance claims to rest. If yield() is
indeed faster, then this would give us an opportunity to improve futex
performance to the point of (or beyond) yield()-based locks.

Thanks,

	Ingo


end of thread, other threads:[~2011-04-07  8:53 UTC | newest]

Thread overview: 13+ messages
2011-04-05 22:33 [PATCH] sched: recover sched_yield task running time increase Alex Shi
2011-04-06  5:07 ` Rik van Riel
2011-04-06  6:15   ` Alex,Shi
2011-04-06  7:01     ` Mike Galbraith
2011-04-06 13:28       ` Shi, Alex
2011-04-07  2:44         ` Mike Galbraith
2011-04-06  8:04 ` Peter Zijlstra
2011-04-06 14:42   ` Rik van Riel
2011-04-06 15:25     ` Peter Zijlstra
2011-04-07  3:08   ` Alex,Shi
2011-04-07  6:13     ` Rik van Riel
2011-04-07  6:43       ` Alex,Shi
2011-04-07  8:52         ` Ingo Molnar
