linux-kernel.vger.kernel.org archive mirror
* sys_sched_yield fast path
@ 2001-03-10  0:47 Mike Kravetz
  2001-03-10 11:30 ` Davide Libenzi
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Kravetz @ 2001-03-10  0:47 UTC (permalink / raw)
  To: linux-kernel

Any thoughts about adding a 'fast path' to the SMP code in
sys_sched_yield?  Why not compare nr_pending to smp_num_cpus
before examining the aligned_data structures?  Something like:

if (nr_pending > smp_num_cpus)
	goto set_resched_now;

Where set_resched_now is a label placed just before the code
that sets the need_resched field of the current process.
This would eliminate touching all the aligned_data cache lines
in the case where nr_pending can never be decremented to zero.
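
For concreteness, here is a minimal sketch of where the check would
sit, against the 2.4-era sys_sched_yield() (the surrounding body is
abbreviated from memory and only illustrates the idea; it is not a
tested patch):

asmlinkage long sys_sched_yield(void)
{
	int nr_pending = nr_running;
#if CONFIG_SMP
	int i;

	/* proposed fast path: the loop below decrements nr_pending at
	 * most smp_num_cpus times, so if it starts out larger it can
	 * never reach zero -- skip touching the per-CPU aligned_data
	 * cache lines entirely */
	if (nr_pending > smp_num_cpus)
		goto set_resched_now;

	/* subtract non-idle processes currently running on a CPU */
	for (i = 0; i < smp_num_cpus; i++)
		if (aligned_data[i].schedule_data.curr !=
		    idle_task(cpu_logical_map(i)))
			nr_pending--;
set_resched_now:
#else
	/* on UP the current process is on the runqueue as well */
	nr_pending--;
#endif
	if (nr_pending)
		current->need_resched = 1;
	return 0;
}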

Also, would it make sense to stop decrementing nr_pending to
prevent it from going negative?  Or is the reasoning that in
these cases there is so much 'scheduling' activity that we
should force the reschedule?

-- 
Mike Kravetz                                 mkravetz@sequent.com
IBM Linux Technology Center


* RE: sys_sched_yield fast path
  2001-03-10  0:47 sys_sched_yield fast path Mike Kravetz
@ 2001-03-10 11:30 ` Davide Libenzi
  2001-03-10 16:59   ` Andi Kleen
  0 siblings, 1 reply; 10+ messages in thread
From: Davide Libenzi @ 2001-03-10 11:30 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: linux-kernel


On 10-Mar-2001 Mike Kravetz wrote:
> Any thoughts about adding a 'fast path' to the SMP code in
> sys_sched_yield?  Why not compare nr_pending to smp_num_cpus
> before examining the aligned_data structures?  Something like:
> 
> if (nr_pending > smp_num_cpus)
>       goto set_resched_now;
> 
> Where set_resched_now is a label placed just before the code
> that sets the need_resched field of the current process.
> This would eliminate touching all the aligned_data cache lines
> in the case where nr_pending can never be decremented to zero.
> 
> Also, would it make sense to stop decrementing nr_pending to
> prevent it from going negative?  Or is the reasoning that in
> these cases there is so much 'scheduling' activity that we
> should force the reschedule?

The rate at which sys_sched_yield() is called is probably not high enough
for the performance improvement to be measurable.
However, if you measure schedule() speed with a test program in which the
schedule() rate equals the sched_yield() rate, this change would keep the
sched_yield() overhead out of your schedule() measurement.




- Davide



* Re: sys_sched_yield fast path
  2001-03-10 11:30 ` Davide Libenzi
@ 2001-03-10 16:59   ` Andi Kleen
  2001-03-11 14:12     ` Davide Libenzi
  0 siblings, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2001-03-10 16:59 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: linux-kernel

Davide Libenzi <davidel@xmailserver.org> writes:


> The rate at which sys_sched_yield() is called is probably not high enough
> for the performance improvement to be measurable.

LinuxThreads mutexes call sched_yield() when a lock is already locked, so when
you have a multithreaded program with some lock contention it'll be called a lot.

-Andi


* Re: sys_sched_yield fast path
  2001-03-11 14:12     ` Davide Libenzi
@ 2001-03-11 13:54       ` Anton Blanchard
  2001-03-11 19:17         ` Dave Zarzycki
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Anton Blanchard @ 2001-03-11 13:54 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: Andi Kleen, linux-kernel

 
> This is the LinuxThreads spinlock acquire:
> 
> 
> static void __pthread_acquire(int * spinlock)
> {
>   int cnt = 0;
>   struct timespec tm;
> 
>   while (testandset(spinlock)) {
>     if (cnt < MAX_SPIN_COUNT) {
>       sched_yield();
>       cnt++;
>     } else {
>       tm.tv_sec = 0;
>       tm.tv_nsec = SPIN_SLEEP_DURATION;
>       nanosleep(&tm, NULL);
>       cnt = 0;
>     }
>   }
> }
> 
> 
> Yes, it calls sched_yield(), but this is not the standard wait for a mutex;
> it is for spinlocks that are held a very short time.  Real waits are
> implemented using signals.  Moreover, with the new implementation of
> sys_sched_yield() the task releases all of its time quantum, so even when a
> task repeatedly calls sched_yield() the call rate is not that high as long as
> there is at least one other process to run.  And if there is no task with
> goodness() > 0, nobody cares about sched_yield() performance.

The problem I found with sched_yield is that things break down with high
levels of contention.  If you have 3 processes and one has a lock, then
the other two can ping-pong doing sched_yield() until their priority drops
below that of the process with the lock, e.g. in a run I just did, where
task 2 has the lock:

1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
2

Perhaps we need something like sched_yield that takes off some of 
tsk->counter so the task with the spinlock will run earlier.
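
A rough sketch of that idea against the 2.4 scheduler (the syscall name
and the penalty amount are made up here; this is an illustration, not a
patch):

/* hypothetical yield variant: burn part of the remaining timeslice
 * so a lock holder with a lower counter overtakes the yielder in
 * goodness() sooner */
asmlinkage long sys_sched_yield_penalized(void)
{
	current->counter >>= 1;			/* give up half the quantum */
	current->policy |= SCHED_YIELD;		/* existing 2.4 yield hint */
	current->need_resched = 1;
	return 0;
}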

Anton


* Re: sys_sched_yield fast path
  2001-03-10 16:59   ` Andi Kleen
@ 2001-03-11 14:12     ` Davide Libenzi
  2001-03-11 13:54       ` Anton Blanchard
  0 siblings, 1 reply; 10+ messages in thread
From: Davide Libenzi @ 2001-03-11 14:12 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel


On 10-Mar-2001 Andi Kleen wrote:
> Davide Libenzi <davidel@xmailserver.org> writes:
> 
> 
>> The rate at which sys_sched_yield() is called is probably not high enough
>> for the performance improvement to be measurable.
> 
> LinuxThreads mutexes call sched_yield() when a lock is already locked, so
> when you have a multithreaded program with some lock contention it'll be
> called a lot.

This is the LinuxThreads spinlock acquire:


static void __pthread_acquire(int * spinlock)
{
  int cnt = 0;
  struct timespec tm;

  while (testandset(spinlock)) {
    if (cnt < MAX_SPIN_COUNT) {
      sched_yield();
      cnt++;
    } else {
      tm.tv_sec = 0;
      tm.tv_nsec = SPIN_SLEEP_DURATION;
      nanosleep(&tm, NULL);
      cnt = 0;
    }
  }
}


Yes, it calls sched_yield(), but this is not the standard wait for a mutex;
it is for spinlocks that are held a very short time.
Real waits are implemented using signals.
Moreover, with the new implementation of sys_sched_yield() the task releases
all of its time quantum, so even when a task repeatedly calls sched_yield()
the call rate is not that high as long as there is at least one other process
to run.
And if there is no task with goodness() > 0, nobody cares about
sched_yield() performance.
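
For contrast, a minimal sketch of the kind of signal-based real wait meant
here (purely illustrative: the signal number and function names are made up,
and this is not the actual LinuxThreads code):

#include <signal.h>
#include <unistd.h>

#define RESTART_SIG SIGUSR1		/* illustrative choice */

static void restart_handler(int sig)
{
	/* empty: its only job is to interrupt sigsuspend() */
}

static void init_restart(void)
{
	struct sigaction sa;

	sa.sa_handler = restart_handler;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sigaction(RESTART_SIG, &sa, NULL);
}

/* waiter: sleep until some other thread sends RESTART_SIG */
static void suspend_self(void)
{
	sigset_t mask, old;

	sigemptyset(&mask);
	sigaddset(&mask, RESTART_SIG);
	sigprocmask(SIG_BLOCK, &mask, &old);	/* close the missed-wakeup race */
	/* ... enqueue ourselves on the lock's waiter list here ... */
	sigsuspend(&old);			/* atomically unblock and sleep */
	sigprocmask(SIG_SETMASK, &old, NULL);
}

/* waker: restart one sleeping thread (LinuxThreads threads have pids) */
static void restart(pid_t waiter)
{
	kill(waiter, RESTART_SIG);
}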




- Davide



* Re: sys_sched_yield fast path
  2001-03-11 13:54       ` Anton Blanchard
@ 2001-03-11 19:17         ` Dave Zarzycki
  2001-03-12  0:18           ` Davide Libenzi
  2001-03-11 23:46         ` Davide Libenzi
  2001-03-12  0:10         ` Davide Libenzi
  2 siblings, 1 reply; 10+ messages in thread
From: Dave Zarzycki @ 2001-03-11 19:17 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: Davide Libenzi, Andi Kleen, linux-kernel

On Mon, 12 Mar 2001, Anton Blanchard wrote:

> Perhaps we need something like sched_yield that takes off some of
> tsk->counter so the task with the spinlock will run earlier.

Personally speaking, I wish the sched_yield() API were like so:

int sched_yield(pid_t pid);

The pid argument would be advisory, of course; the kernel wouldn't have to
honor it.

This would allow the thread wanting to acquire the spinlock to yield
specifically to the thread holding the lock (assuming the pid of the lock
holder was stored in the spinlock...)  In fact, the original lock owner
could in theory yield back to the thread wanting to acquire the lock.
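
A hypothetical sketch of a lock built on such a directed yield (sched_yield()
takes no argument today; struct plock and sched_yield_to() are invented for
illustration, and testandset() is the primitive from the LinuxThreads code
quoted earlier in the thread):

struct plock {
	volatile int lock;		/* 0 = free, 1 = held */
	volatile pid_t owner;		/* pid of the current holder */
};

static void plock_acquire(struct plock *l)
{
	while (testandset(&l->lock)) {
		pid_t holder = l->owner;	/* may briefly lag; advisory only */

		if (holder)
			sched_yield_to(holder);	/* hypothetical directed yield */
	}
	l->owner = getpid();
}

static void plock_release(struct plock *l)
{
	l->owner = 0;
	l->lock = 0;			/* real code would need a memory barrier */
}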

Feedback from the scheduling gurus would be appreciated.

Thanks,

davez

-- 
Dave Zarzycki
http://thor.sbay.org/~dave/




* Re: sys_sched_yield fast path
  2001-03-11 13:54       ` Anton Blanchard
  2001-03-11 19:17         ` Dave Zarzycki
@ 2001-03-11 23:46         ` Davide Libenzi
  2001-03-12  0:10         ` Davide Libenzi
  2 siblings, 0 replies; 10+ messages in thread
From: Davide Libenzi @ 2001-03-11 23:46 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linux-kernel, Andi Kleen


On 11-Mar-2001 Anton Blanchard wrote:
>  
>> This is the LinuxThreads spinlock acquire:
>> 
>> 
>> static void __pthread_acquire(int * spinlock)
>> {
>>   int cnt = 0;
>>   struct timespec tm;
>> 
>>   while (testandset(spinlock)) {
>>     if (cnt < MAX_SPIN_COUNT) {
>>       sched_yield();
>>       cnt++;
>>     } else {
>>       tm.tv_sec = 0;
>>       tm.tv_nsec = SPIN_SLEEP_DURATION;
>>       nanosleep(&tm, NULL);
>>       cnt = 0;
>>     }
>>   }
>> }
>> 
>> 
>> Yes, it calls sched_yield(), but this is not the standard wait for a
>> mutex; it is for spinlocks that are held a very short time.  Real waits
>> are implemented using signals.  Moreover, with the new implementation of
>> sys_sched_yield() the task releases all of its time quantum, so even when
>> a task repeatedly calls sched_yield() the call rate is not that high as
>> long as there is at least one other process to run.  And if there is no
>> task with goodness() > 0, nobody cares about sched_yield() performance.
> 
> The problem I found with sched_yield is that things break down with high
> levels of contention.  If you have 3 processes and one has a lock, then
> the other two can ping-pong doing sched_yield() until their priority drops
> below that of the process with the lock, e.g. in a run I just did, where
> task 2 has the lock:
> 
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 2
> 
> Perhaps we need something like sched_yield that takes off some of 
> tsk->counter so the task with the spinlock will run earlier.

Which kernel are you running?



- Davide



* Re: sys_sched_yield fast path
  2001-03-11 13:54       ` Anton Blanchard
  2001-03-11 19:17         ` Dave Zarzycki
  2001-03-11 23:46         ` Davide Libenzi
@ 2001-03-12  0:10         ` Davide Libenzi
  2001-03-12  1:24           ` Anton Blanchard
  2 siblings, 1 reply; 10+ messages in thread
From: Davide Libenzi @ 2001-03-12  0:10 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linux-kernel, Andi Kleen


On 11-Mar-2001 Anton Blanchard wrote:
>  
>> This is the LinuxThreads spinlock acquire:
>> 
>> 
>> static void __pthread_acquire(int * spinlock)
>> {
>>   int cnt = 0;
>>   struct timespec tm;
>> 
>>   while (testandset(spinlock)) {
>>     if (cnt < MAX_SPIN_COUNT) {
>>       sched_yield();
>>       cnt++;
>>     } else {
>>       tm.tv_sec = 0;
>>       tm.tv_nsec = SPIN_SLEEP_DURATION;
>>       nanosleep(&tm, NULL);
>>       cnt = 0;
>>     }
>>   }
>> }
>> 
>> 
>> Yes, it calls sched_yield(), but this is not the standard wait for a
>> mutex; it is for spinlocks that are held a very short time.  Real waits
>> are implemented using signals.  Moreover, with the new implementation of
>> sys_sched_yield() the task releases all of its time quantum, so even when
>> a task repeatedly calls sched_yield() the call rate is not that high as
>> long as there is at least one other process to run.  And if there is no
>> task with goodness() > 0, nobody cares about sched_yield() performance.
> 
> The problem I found with sched_yield is that things break down with high
> levels of contention.  If you have 3 processes and one has a lock, then
> the other two can ping-pong doing sched_yield() until their priority drops
> below that of the process with the lock, e.g. in a run I just did, where
> task 2 has the lock:
> 
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 1
> 0
> 2
> 
> Perhaps we need something like sched_yield that takes off some of 
> tsk->counter so the task with the spinlock will run earlier.

2.4.x changed the scheduler behaviour so that a task that calls
sched_yield() is not rescheduled by the incoming schedule().
A flag is set (under certain conditions in SMP) and the goodness()
calculation assigns the lowest value to the yielding task (the flag is
cleared in schedule_tail()).
This gives the task owning the lock the opportunity to complete the locked
code.
But yes, if the lock holder is rescheduled for some reason (timeslice or
I/O), the yielding task will run again.
That is a software design problem, not a sched_yield() one: if the path
between lock and unlock can be long, sched_yield() is not the best way to
wait.
A wait queue, or a user-space equivalent, is a better choice for that.
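
For reference, the relevant goodness() logic, roughly as it reads in 2.4
kernel/sched.c (abbreviated and quoted from memory, not verbatim):

static inline int goodness(struct task_struct *p, int this_cpu,
			   struct mm_struct *this_mm)
{
	int weight = -1;

	/* a yielding task gets the minimum weight, so any other
	 * runnable task -- e.g. the lock holder -- wins the pick */
	if (p->policy & SCHED_YIELD)
		goto out;

	/* ... normal counter / priority / cpu+mm affinity weighting ... */
out:
	return weight;
}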




- Davide



* Re: sys_sched_yield fast path
  2001-03-11 19:17         ` Dave Zarzycki
@ 2001-03-12  0:18           ` Davide Libenzi
  0 siblings, 0 replies; 10+ messages in thread
From: Davide Libenzi @ 2001-03-12  0:18 UTC (permalink / raw)
  To: Dave Zarzycki; +Cc: linux-kernel, Andi Kleen, Anton Blanchard


On 11-Mar-2001 Dave Zarzycki wrote:
> On Mon, 12 Mar 2001, Anton Blanchard wrote:
> 
>> Perhaps we need something like sched_yield that takes off some of
>> tsk->counter so the task with the spinlock will run earlier.
> 
> Personally speaking, I wish sched_yield() API was like so:
> 
> int sched_yield(pid_t pid);

Yes, you could make an API like this, but that is not the meaning of
sched_yield().


> This would allow the thread wanting to acquire the spinlock to yield
> specifically to the thread holding the lock (assuming the pid of the lock
> holder was stored in the spinlock...)  In fact, the original lock owner
> could in theory yield back to the thread wanting to acquire the lock.

Everything that happens inside a spinlock should be very fast; otherwise the
use of a spinlock should be avoided.




- Davide



* Re: sys_sched_yield fast path
  2001-03-12  0:10         ` Davide Libenzi
@ 2001-03-12  1:24           ` Anton Blanchard
  0 siblings, 0 replies; 10+ messages in thread
From: Anton Blanchard @ 2001-03-12  1:24 UTC (permalink / raw)
  To: Davide Libenzi; +Cc: linux-kernel, Andi Kleen

 
Hi,

> 2.4.x changed the scheduler behaviour so that a task that calls
> sched_yield() is not rescheduled by the incoming schedule().  A flag is
> set (under certain conditions in SMP) and the goodness() calculation
> assigns the lowest value to the yielding task (the flag is cleared in
> schedule_tail()).

The behaviour I am talking about is when there is a heavily contended
spinlock, and more than one task is trying to obtain it. Since SCHED_YIELD
only changes the goodness when we are trying to reschedule the task we
can bounce between two or more tasks doing sched_yield() for a while before
finally running the task that has the spinlock.

Of course with short-lived spinlocks you should rarely get the case where
a task grabs a spinlock just before its timeslice is up, so maybe the answer
is just to spin a few times on sched_yield() then back off to nanosleep()
like pthreads does.

Anton

