* [PATCH] Prevent immediate process rescheduling
@ 2009-09-18 19:49 Mark Langsdorf
  2009-09-18 19:54 ` Ingo Molnar
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Langsdorf @ 2009-09-18 19:49 UTC (permalink / raw)
  To: linux-kernel

Prevent the scheduler from immediately rescheduling a process
that just yielded if there is another process available.

Originally suggested by Mike Galbraith (efault@gmx.de).

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
---
 kernel/sched_fair.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 652e8bd..4fad08f 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *left = cfs_rq->rb_leftmost;
+	struct sched_entity *se, *curr;
 
 	if (!left)
 		return NULL;
 
-	return rb_entry(left, struct sched_entity, run_node);
+	se = rb_entry(left, struct sched_entity, run_node);
+	curr = &current->se;
+
+	/* 
+	 * Don't select the entity who just tried to schedule away
+	 * if there's another entity available.
+	 */
+	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
+		struct rb_node *next_node = rb_next(&curr->run_node);
+		if (next_node)
+			se = rb_entry(next_node, struct sched_entity, run_node);
+	}
+
+	return se;
 }
 
 static struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
-- 
1.6.0.2



* Re: [PATCH] Prevent immediate process rescheduling
  2009-09-18 19:49 [PATCH] Prevent immediate process rescheduling Mark Langsdorf
@ 2009-09-18 19:54 ` Ingo Molnar
  2009-09-18 20:00   ` Langsdorf, Mark
  2009-09-18 20:03   ` Peter Zijlstra
  0 siblings, 2 replies; 9+ messages in thread
From: Ingo Molnar @ 2009-09-18 19:54 UTC (permalink / raw)
  To: Mark Langsdorf, Peter Zijlstra, Mike Galbraith; +Cc: linux-kernel


(fixed the Cc: lines)

* Mark Langsdorf <mark.langsdorf@amd.com> wrote:

> Prevent the scheduler from immediately rescheduling a process
> that just yielded if there is another process available.
> 
> Originally suggested by Mike Galbraith (efault@gmx.de).
> 
> Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
> ---
>  kernel/sched_fair.c |   16 +++++++++++++++-
>  1 files changed, 15 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 652e8bd..4fad08f 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
>  {
>  	struct rb_node *left = cfs_rq->rb_leftmost;
> +	struct sched_entity *se, *curr;
>  
>  	if (!left)
>  		return NULL;
>  
> -	return rb_entry(left, struct sched_entity, run_node);
> +	se = rb_entry(left, struct sched_entity, run_node);
> +	curr = &current->se;
> +
> +	/* 
> +	 * Don't select the entity who just tried to schedule away
> +	 * if there's another entity available.
> +	 */
> +	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
> +		struct rb_node *next_node = rb_next(&curr->run_node);
> +		if (next_node)
> +			se = rb_entry(next_node, struct sched_entity, run_node);
> +	}
> +
> +	return se;
>  }

I suspect some real workload is the motivation of this - what is that 
workload?

	Ingo


* RE: [PATCH] Prevent immediate process rescheduling
  2009-09-18 19:54 ` Ingo Molnar
@ 2009-09-18 20:00   ` Langsdorf, Mark
  2009-09-18 20:03   ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Langsdorf, Mark @ 2009-09-18 20:00 UTC (permalink / raw)
  To: 'Ingo Molnar', Peter Zijlstra, Mike Galbraith; +Cc: linux-kernel

 

> -----Original Message-----
> From: Ingo Molnar [mailto:mingo@elte.hu] 
> Sent: Friday, September 18, 2009 2:55 PM
> To: Langsdorf, Mark; Peter Zijlstra; Mike Galbraith
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] Prevent immediate process rescheduling
> 
> 
> (fixed the Cc: lines)
> 
> * Mark Langsdorf <mark.langsdorf@amd.com> wrote:
> 
> > Prevent the scheduler from immediately rescheduling a process
> > that just yielded if there is another process available.
> > 
> > Originally suggested by Mike Galbraith (efault@gmx.de).
> > 
> > Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
> > ---
> >  kernel/sched_fair.c |   16 +++++++++++++++-
> >  1 files changed, 15 insertions(+), 1 deletions(-)
> > 
> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > index 652e8bd..4fad08f 100644
> > --- a/kernel/sched_fair.c
> > +++ b/kernel/sched_fair.c
> > @@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
> >  {
> >  	struct rb_node *left = cfs_rq->rb_leftmost;
> > +	struct sched_entity *se, *curr;
> >  
> >  	if (!left)
> >  		return NULL;
> >  
> > -	return rb_entry(left, struct sched_entity, run_node);
> > +	se = rb_entry(left, struct sched_entity, run_node);
> > +	curr = &current->se;
> > +
> > +	/* 
> > +	 * Don't select the entity who just tried to schedule away
> > +	 * if there's another entity available.
> > +	 */
> > +	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
> > +		struct rb_node *next_node = rb_next(&curr->run_node);
> > +		if (next_node)
> > +			se = rb_entry(next_node, struct sched_entity, run_node);
> > +	}
> > +
> > +	return se;
> >  }
> 
> I suspect some real workload is the motivation of this - what is that 
> workload?

Thanks for fixing the cc lines.

The next patch I submitted enables the Pause Filter feature
in the most recent AMD Opteron processors.  Pause Filter
lets us detect when a KVM guest VCPU is spinning uselessly on
a contended lock because the lockholder VCPU isn't scheduled.
Ideally, we'd like to switch to that VCPU thread, but at a
minimum, we'd like to meaningfully yield the contending VCPU.
This patch prevents schedule() from simply rescheduling the
thread that is trying to yield, and it showed no performance
regressions in my tests.
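
To make that concrete, the SVM side boils down to a PAUSE intercept
handler roughly like the following (a sketch with illustrative names,
not the literal patch text):

	static int pause_interception(struct vcpu_svm *svm)
	{
		/*
		 * The guest tripped the pause filter threshold, so one
		 * of its VCPUs is very likely spinning on a lock whose
		 * holder isn't running.  Give the CPU away instead of
		 * burning the rest of the timeslice spinning in guest
		 * mode.
		 */
		schedule();
		return 1;	/* exit handled, guest resumes later */
	}

Without the change above, that schedule() call can end up picking the
yielding VCPU thread right back, which is why the two patches go
together.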

-Mark Langsdorf
Operating System Research Center
AMD


* Re: [PATCH] Prevent immediate process rescheduling
  2009-09-18 19:54 ` Ingo Molnar
  2009-09-18 20:00   ` Langsdorf, Mark
@ 2009-09-18 20:03   ` Peter Zijlstra
  2009-09-19  2:16     ` Mike Galbraith
  2009-09-19  8:31     ` Avi Kivity
  1 sibling, 2 replies; 9+ messages in thread
From: Peter Zijlstra @ 2009-09-18 20:03 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Mark Langsdorf, Mike Galbraith, linux-kernel

On Fri, 2009-09-18 at 21:54 +0200, Ingo Molnar wrote:

> > diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> > index 652e8bd..4fad08f 100644
> > --- a/kernel/sched_fair.c
> > +++ b/kernel/sched_fair.c
> > @@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
> >  {
> >  	struct rb_node *left = cfs_rq->rb_leftmost;
> > +	struct sched_entity *se, *curr;
> >  
> >  	if (!left)
> >  		return NULL;
> >  
> > -	return rb_entry(left, struct sched_entity, run_node);
> > +	se = rb_entry(left, struct sched_entity, run_node);
> > +	curr = &current->se;
> > +
> > +	/* 
> > +	 * Don't select the entity who just tried to schedule away
> > +	 * if there's another entity available.
> > +	 */
> > +	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
> > +		struct rb_node *next_node = rb_next(&curr->run_node);
> > +		if (next_node)
> > +			se = rb_entry(next_node, struct sched_entity, run_node);
> > +	}
> > +
> > +	return se;
> >  }

Really hate this change though. Doesn't seem right to not pick the same
task again if it's runnable. Bad for cache footprint.

The scenario is quite common for stuff like:

CPU0                                      CPU1

set_task_state(TASK_INTERRUPTIBLE)

if (cond)
  goto out;
                                     <--- ttwu()
schedule();
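
Roughly, in code (just a sketch of the generic pattern, not any
particular call site):

	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (cond)
			break;
		/*
		 * The ttwu() from CPU1 can land right here, leaving us
		 * TASK_RUNNING by the time schedule() is called -- in
		 * which case picking us straight back is exactly what
		 * you want for cache footprint.
		 */
		schedule();
	}
	__set_current_state(TASK_RUNNING);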





* Re: [PATCH] Prevent immediate process rescheduling
  2009-09-18 20:03   ` Peter Zijlstra
@ 2009-09-19  2:16     ` Mike Galbraith
  2009-09-19  9:03       ` Mike Galbraith
  2009-09-19  8:31     ` Avi Kivity
  1 sibling, 1 reply; 9+ messages in thread
From: Mike Galbraith @ 2009-09-19  2:16 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Mark Langsdorf, linux-kernel

On Fri, 2009-09-18 at 22:03 +0200, Peter Zijlstra wrote:

> Really hate this change though. Doesn't seem right to not pick the same
> task again if it's runnable. Bad for cache footprint.
> 
> The scenario is quite common for stuff like:
> 
> CPU0                                      CPU1
> 
> set_task_state(TASK_INTERRUPTIBLE)
> 
> if (cond)
>   goto out;
>                                      <--- ttwu()
> schedule();

It also resists the scheduler's built-in need to close spread, too much
of which can seriously damage latency for others when the friendly
task later decides it wants to run hard.  OTOH, when it is a voluntary
schedule, not selecting another task is resisting the programmer.

Conundrum.

Another problem with that change is that it doesn't do diddly spit if
the top two are trying to yield; they'll just spin together.

Sounds like Mark's case really needs a gentle_yield() that moves the
task behind next, and slightly beyond if they are too close, but which
also ignores the request entirely if the gap is too large.
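
Back-of-the-envelope, something like this (untested, the names and the
threshold are placeholders):

	static void gentle_yield(struct cfs_rq *cfs_rq, struct sched_entity *se)
	{
		struct sched_entity *next = __pick_next_entity(cfs_rq);
		s64 gap;

		if (!next || next == se)
			return;

		gap = (s64)(next->vruntime - se->vruntime);

		/* gap too large: yielding would cost se too much, ignore it */
		if (gap > (s64)sysctl_sched_min_granularity)
			return;

		/* otherwise slot in just behind next, nudging past it if
		 * the two are (nearly) tied */
		se->vruntime = max_vruntime(se->vruntime, next->vruntime) + 1;

		/* the caller still has to requeue se to keep the rbtree sorted */
	}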

	-Mike



* Re: [PATCH] Prevent immediate process rescheduling
  2009-09-18 20:03   ` Peter Zijlstra
  2009-09-19  2:16     ` Mike Galbraith
@ 2009-09-19  8:31     ` Avi Kivity
  2009-09-20 21:03       ` Langsdorf, Mark
  1 sibling, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2009-09-19  8:31 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Mark Langsdorf, Mike Galbraith, linux-kernel

On 09/18/2009 11:03 PM, Peter Zijlstra wrote:
> On Fri, 2009-09-18 at 21:54 +0200, Ingo Molnar wrote:
>
>    
>>> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
>>> index 652e8bd..4fad08f 100644
>>> --- a/kernel/sched_fair.c
>>> +++ b/kernel/sched_fair.c
>>> @@ -353,11 +353,25 @@ static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
>>>   static struct sched_entity *__pick_next_entity(struct cfs_rq *cfs_rq)
>>>   {
>>>   	struct rb_node *left = cfs_rq->rb_leftmost;
>>> +	struct sched_entity *se, *curr;
>>>
>>>   	if (!left)
>>>   		return NULL;
>>>
>>> -	return rb_entry(left, struct sched_entity, run_node);
>>> +	se = rb_entry(left, struct sched_entity, run_node);
>>> +	curr = &current->se;
>>> +
>>> +	/*
>>> +	 * Don't select the entity who just tried to schedule away
>>> +	 * if there's another entity available.
>>> +	 */
>>> +	if (unlikely(se == curr && cfs_rq->nr_running > 1)) {
>>> +		struct rb_node *next_node = rb_next(&curr->run_node);
>>> +		if (next_node)
>>> +			se = rb_entry(next_node, struct sched_entity, run_node);
>>> +	}
>>> +
>>> +	return se;
>>>   }
>>>        
> Really hate this change though. Doesn't seem right to not pick the same
> task again if it's runnable. Bad for cache footprint.
>
> The scenario is quite common for stuff like:
>
> CPU0                                      CPU1
>
> set_task_state(TASK_INTERRUPTIBLE)
>
> if (cond)
>    goto out;
>                                       <--- ttwu()
> schedule();
>
>    

I agree, yielding should be explicitly requested.

Also, on a heavily overcommitted box an undirected yield might take 
quite a long time to find the thread that's holding the lock.  I think a 
yield_to() will be a lot better:

- we can pick one of the vcpus belonging to the same guest to improve 
the probability that the lock actually gets released
- we avoid an issue when the other vcpus are on different runqueues (in 
which case the current patch does nothing)
- we can fix the accounting by donating vruntime from current to the 
yielded-to vcpu
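
A very rough sketch of the accounting part (names invented; locking,
min_vruntime clamping and the cross-runqueue case are all glossed over):

	static void yield_to(struct task_struct *target)
	{
		struct sched_entity *se = &current->se;
		struct sched_entity *tse = &target->se;
		u64 donation = sysctl_sched_min_granularity;

		/* push the yielder back and pull the target forward by
		 * the same amount, so fairness within the group is
		 * preserved */
		se->vruntime += donation;
		tse->vruntime -= donation;

		/* then requeue both and resched so the target runs next */
	}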


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH] Prevent immediate process rescheduling
  2009-09-19  2:16     ` Mike Galbraith
@ 2009-09-19  9:03       ` Mike Galbraith
  0 siblings, 0 replies; 9+ messages in thread
From: Mike Galbraith @ 2009-09-19  9:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Mark Langsdorf, linux-kernel

On Sat, 2009-09-19 at 04:16 +0200, Mike Galbraith wrote:

> Sounds like Mark's case really needs a gentle_yield() that moves the
> task behind next, and slightly beyond if they are too close, but which
> also ignores the request entirely if the gap is too large.

Did that, it sucks. (not surprising really, yield _always_ sucked;)

	-Mike



* RE: [PATCH] Prevent immediate process rescheduling
  2009-09-19  8:31     ` Avi Kivity
@ 2009-09-20 21:03       ` Langsdorf, Mark
  2009-09-20 21:55         ` Avi Kivity
  0 siblings, 1 reply; 9+ messages in thread
From: Langsdorf, Mark @ 2009-09-20 21:03 UTC (permalink / raw)
  To: 'Avi Kivity', Peter Zijlstra
  Cc: Ingo Molnar, Mike Galbraith, linux-kernel

> > Really hate this change though. Doesn't seem right to not pick the
> > same task again if it's runnable. Bad for cache footprint.
> 
> I agree, yielding should be explicitly requested.
> 
> Also, on a heavily overcommitted box an undirected yield might take
> quite a long time to find the thread that's holding the lock.  I think
> a yield_to() will be a lot better:
> 
> - we can pick one of the vcpus belonging to the same guest to improve
> the probability that the lock actually gets released

Is there a way to find the other vcpus belonging to the
same guest?  I poked around at that, but couldn't find
one.

> - we avoid an issue when the other vcpus are on different 
> runqueues (in which case the current patch does nothing)
> - we can fix the accounting by donating vruntime from current to the 
> yielded-to vcpu

I may need someone to walk me through the vruntime donation,
but that's secondary to finding the other vcpus.

-Mark Langsdorf
Operating System Research Center
AMD


* Re: [PATCH] Prevent immediate process rescheduling
  2009-09-20 21:03       ` Langsdorf, Mark
@ 2009-09-20 21:55         ` Avi Kivity
  0 siblings, 0 replies; 9+ messages in thread
From: Avi Kivity @ 2009-09-20 21:55 UTC (permalink / raw)
  To: Langsdorf, Mark; +Cc: Peter Zijlstra, Ingo Molnar, Mike Galbraith, linux-kernel

On 09/21/2009 12:03 AM, Langsdorf, Mark wrote:
>> - we can pick one of the vcpus belonging to the same guest to improve
>> the probability that the lock actually gets released
>>      
> Is there a way to find the other vcpus belonging to the
> same guest?  I poked around at that, but couldn't find
> one.
>    

vcpu->kvm->vcpus[].  It doesn't give you the task_struct, but we can 
easily save it in vcpu_load() and release it in vcpu_put().
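
Something like this (untested sketch; vcpu->task would be a new
struct task_struct * pointer added to struct kvm_vcpu):

	/* in vcpu_load(), right after taking vcpu->mutex */
	vcpu->task = current;

	/* in vcpu_put(), just before dropping vcpu->mutex */
	vcpu->task = NULL;

The spinning vcpu's thread can then walk vcpu->kvm->vcpus[] and pick
one of the sibling tasks to yield to.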


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



end of thread

Thread overview: 9+ messages
2009-09-18 19:49 [PATCH] Prevent immediate process rescheduling Mark Langsdorf
2009-09-18 19:54 ` Ingo Molnar
2009-09-18 20:00   ` Langsdorf, Mark
2009-09-18 20:03   ` Peter Zijlstra
2009-09-19  2:16     ` Mike Galbraith
2009-09-19  9:03       ` Mike Galbraith
2009-09-19  8:31     ` Avi Kivity
2009-09-20 21:03       ` Langsdorf, Mark
2009-09-20 21:55         ` Avi Kivity
