* Re: sched_yield() makes OpenLDAP slow
[not found] <4D8eT-4rg-31@gated-at.bofh.it>
@ 2005-08-20 3:20 ` Robert Hancock
2005-08-20 4:18 ` Nick Piggin
0 siblings, 1 reply; 49+ messages in thread
From: Robert Hancock @ 2005-08-20 3:20 UTC (permalink / raw)
To: linux-kernel
Howard Chu wrote:
> You assume that spinlocks are the only reason a developer may want to
> yield the processor. This assumption is unfounded. Case in point - the
> primary backend in OpenLDAP uses a transactional database with
> page-level locking of its data structures to provide high levels of
> concurrency. It is the nature of such a system to encounter deadlocks
> over the normal course of operations. When a deadlock is detected, some
> thread must be chosen (by one of a variety of algorithms) to abort its
> transaction, in order to allow other operations to proceed to
> completion. In this situation, the chosen thread must get control of the
> CPU long enough to clean itself up, and then it must yield the CPU in
> order to allow any other competing threads to complete their
> transaction. The thread with the aborted transaction relinquishes all of
> its locks and then waits to get another shot at the CPU to try
> everything over again. Again, this is all fundamental to the nature of
> transactional programming. If the 2.6 kernel makes this programming
> model unreasonably slow, then quite simply this kernel is not viable as
> a database platform.
I fail to see how sched_yield is going to be very helpful in this
situation. Since that call can sleep for anywhere from zero to a long
time, it's going to give unpredictable results.
It seems to me that this sort of thing is why we have POSIX pthread
synchronization primitives... sched_yield is basically there for a
process to indicate that "what I'm doing doesn't matter much, let other
stuff run". Any other use of it generally constitutes some kind of hack.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-20 3:20 ` sched_yield() makes OpenLDAP slow Robert Hancock
@ 2005-08-20 4:18 ` Nick Piggin
2005-08-20 18:38 ` Howard Chu
0 siblings, 1 reply; 49+ messages in thread
From: Nick Piggin @ 2005-08-20 4:18 UTC (permalink / raw)
To: Robert Hancock; +Cc: linux-kernel
Robert Hancock wrote:
>
> I fail to see how sched_yield is going to be very helpful in this
> situation. Since that call can sleep for anywhere from zero to a long
> time, it's going to give unpredictable results.
>
Well, not sleep technically, but yield the CPU for some undefined
amount of time.
> It seems to me that this sort of thing is why we have POSIX pthread
> synchronization primitives... sched_yield is basically there for a
> process to indicate that "what I'm doing doesn't matter much, let other
> stuff run". Any other use of it generally constitutes some kind of hack.
>
In SCHED_OTHER mode, you're right, sched_yield is basically meaningless.
In a realtime system, there is a very well defined and probably useful
behaviour.
E.g. if 2 SCHED_FIFO processes are running at the same priority, one can
call sched_yield to deterministically give the CPU to the other guy.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-20 4:18 ` Nick Piggin
@ 2005-08-20 18:38 ` Howard Chu
2005-08-20 20:57 ` Lee Revell
` (2 more replies)
0 siblings, 3 replies; 49+ messages in thread
From: Howard Chu @ 2005-08-20 18:38 UTC (permalink / raw)
To: Nick Piggin; +Cc: Robert Hancock, linux-kernel
Nick Piggin wrote:
> Robert Hancock wrote:
> > I fail to see how sched_yield is going to be very helpful in this
> > situation. Since that call can sleep for anywhere from zero to a
> > long time, it's going to give unpredictable results.
> Well, not sleep technically, but yield the CPU for some undefined
> amount of time.
Since the slapd server was not written to run in realtime, nor is it
commonly run on realtime operating systems, I don't believe predictable
timing here is a criterion we care about. One could say the same of
sigsuspend(), by the way - it can pause a process for anywhere from zero
to a long time. Should we tell application writers not
to use this function either, regardless of whether the developer thinks
they have a good reason to use it?
> > It seems to me that this sort of thing is why we have POSIX pthread
> > synchronization primitives... sched_yield is basically there for a
> > process to indicate that "what I'm doing doesn't matter much, let
> > other stuff run". Any other use of it generally constitutes some
> > kind of hack.
In terms of transaction recovery, we do an exponential backoff on the
retries, because our benchmarks showed that under heavy lock contention,
immediate retries only made things worse. In fact, having arbitrarily
long backoff delays here was shown to improve transaction throughput.
(We use select() with an increasing timeval in combination with the
yield() call. One way or another we get a longer delay as desired.)
sched_yield is there for a *thread* to indicate "what I'm doing doesn't
matter much, let other stuff run."
I suppose it may be a hack. But then so is TCP congestion control. In
both cases, empirical evidence indicates the hack is worthwhile. If you
haven't done the analysis then you're in no position to deny the value
of the approach.
> In SCHED_OTHER mode, you're right, sched_yield is basically
> meaningless.
> In a realtime system, there is a very well defined and probably
> useful behaviour.
> E.g. if 2 SCHED_FIFO processes are running at the same priority, one
> can call sched_yield to deterministically give the CPU to the other
> guy.
Well yes, the point of a realtime system is to provide deterministic
response times to unpredictable input.
I'll note that we removed a number of the yield calls (that were in
OpenLDAP 2.2) for the 2.3 release, because I found that they were
redundant and causing unnecessary delays. My own test system is running
on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64 distro),
and OpenLDAP 2.3 runs perfectly well here, now that those redundant
calls have been removed. But I also found that I needed to add a new
yield(), to work around yet another unexpected issue on this system - we
have a number of threads waiting on a condition variable, and the thread
holding the mutex signals the var, unlocks the mutex, and then
immediately relocks it. The expectation here is that upon unlocking the
mutex, the calling thread would block while some waiting thread (that
just got signaled) would get to run. In fact what happened is that the
calling thread unlocked and relocked the mutex without allowing any of
the waiting threads to run. In this case the only solution was to insert
a yield() after the mutex_unlock(). So again, for those of you claiming
"oh, all you need to do is use a condition variable or any of the other
POSIX synchronization primitives" - yes, that's a nice theory, but
reality says otherwise.
To say that sched_yield is basically meaningless is far overstating your
point.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-20 18:38 ` Howard Chu
@ 2005-08-20 20:57 ` Lee Revell
2005-08-20 21:24 ` Howard Chu
2005-08-20 21:50 ` Lee Revell
2005-08-21 1:04 ` Robert Hancock
2 siblings, 1 reply; 49+ messages in thread
From: Lee Revell @ 2005-08-20 20:57 UTC (permalink / raw)
To: Howard Chu; +Cc: Nick Piggin, Robert Hancock, linux-kernel
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> But I also found that I needed to add a new
> yield(), to work around yet another unexpected issue on this system -
> we have a number of threads waiting on a condition variable, and the
> thread holding the mutex signals the var, unlocks the mutex, and then
> immediately relocks it. The expectation here is that upon unlocking
> the mutex, the calling thread would block while some waiting thread
> (that just got signaled) would get to run. In fact what happened is
> that the calling thread unlocked and relocked the mutex without
> allowing any of the waiting threads to run. In this case the only
> solution was to insert a yield() after the mutex_unlock().
That's exactly the behavior I would expect. Why would you expect
unlocking a mutex to cause a reschedule, if the calling thread still has
timeslice left?
Lee
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-20 20:57 ` Lee Revell
@ 2005-08-20 21:24 ` Howard Chu
2005-08-21 0:36 ` Nick Piggin
2005-08-21 11:33 ` Nikita Danilov
0 siblings, 2 replies; 49+ messages in thread
From: Howard Chu @ 2005-08-20 21:24 UTC (permalink / raw)
To: Lee Revell; +Cc: Nick Piggin, Robert Hancock, linux-kernel
Lee Revell wrote:
> On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> > But I also found that I needed to add a new yield(), to work around
> > yet another unexpected issue on this system - we have a number of
> > threads waiting on a condition variable, and the thread holding the
> > mutex signals the var, unlocks the mutex, and then immediately
> > relocks it. The expectation here is that upon unlocking the mutex,
> > the calling thread would block while some waiting thread (that just
> > got signaled) would get to run. In fact what happened is that the
> > calling thread unlocked and relocked the mutex without allowing any
> > of the waiting threads to run. In this case the only solution was
> > to insert a yield() after the mutex_unlock().
>
> That's exactly the behavior I would expect. Why would you expect
> unlocking a mutex to cause a reschedule, if the calling thread still
> has timeslice left?
That's beside the point. Folks are making an assertion that
sched_yield() is meaningless; this example demonstrates that there are
cases where sched_yield() is essential.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-20 21:24 ` Howard Chu
@ 2005-08-21 0:36 ` Nick Piggin
2005-08-21 11:33 ` Nikita Danilov
1 sibling, 0 replies; 49+ messages in thread
From: Nick Piggin @ 2005-08-21 0:36 UTC (permalink / raw)
To: Howard Chu; +Cc: Lee Revell, Robert Hancock, linux-kernel
Howard Chu wrote:
> Lee Revell wrote:
>
>> On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
>> > But I also found that I needed to add a new yield(), to work around
>> > yet another unexpected issue on this system - we have a number of
>> > threads waiting on a condition variable, and the thread holding the
>> > mutex signals the var, unlocks the mutex, and then immediately
>> > relocks it. The expectation here is that upon unlocking the mutex,
>> > the calling thread would block while some waiting thread (that just
>> > got signaled) would get to run. In fact what happened is that the
>> > calling thread unlocked and relocked the mutex without allowing any
>> > of the waiting threads to run. In this case the only solution was
>> > to insert a yield() after the mutex_unlock().
>>
>> That's exactly the behavior I would expect. Why would you expect
>> unlocking a mutex to cause a reschedule, if the calling thread still
>> has timeslice left?
>
>
> That's beside the point. Folks are making an assertion that
> sched_yield() is meaningless; this example demonstrates that there are
> cases where sched_yield() is essential.
>
The point is, with SCHED_OTHER scheduling, sched_yield() need not
do anything. It may not let any other tasks run.
The fact that it does on Linux is because we do attempt to do
something expected... but the simple fact is that you can't rely
on it to do what you expect.
I'm not sure exactly how you would solve the above problem, but I'm
sure it can be achieved using mutexes (for example, you could have
a queue where every thread waits on its own private mutex)... but I
don't do much userspace C programming, sorry.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-20 21:24 ` Howard Chu
2005-08-21 0:36 ` Nick Piggin
@ 2005-08-21 11:33 ` Nikita Danilov
2005-08-22 8:06 ` Howard Chu
1 sibling, 1 reply; 49+ messages in thread
From: Nikita Danilov @ 2005-08-21 11:33 UTC (permalink / raw)
To: Howard Chu; +Cc: Nick Piggin, Robert Hancock, linux-kernel
Howard Chu writes:
> Lee Revell wrote:
> > On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> > > But I also found that I needed to add a new yield(), to work around
> > > yet another unexpected issue on this system - we have a number of
> > > threads waiting on a condition variable, and the thread holding the
> > > mutex signals the var, unlocks the mutex, and then immediately
> > > relocks it. The expectation here is that upon unlocking the mutex,
> > > the calling thread would block while some waiting thread (that just
> > > got signaled) would get to run. In fact what happened is that the
> > > calling thread unlocked and relocked the mutex without allowing any
> > > of the waiting threads to run. In this case the only solution was
> > > to insert a yield() after the mutex_unlock().
> >
> > That's exactly the behavior I would expect. Why would you expect
> > unlocking a mutex to cause a reschedule, if the calling thread still
> > has timeslice left?
>
> That's beside the point. Folks are making an assertion that
> sched_yield() is meaningless; this example demonstrates that there are
> cases where sched_yield() is essential.
It is not essential, it is non-portable.
The code you described is based on non-portable "expectations" about
thread scheduling. The Linux implementation of pthreads fails to satisfy
them. Perfectly reasonable. The code is then "fixed" by adding
sched_yield() calls, introducing more non-portable assumptions. Again,
there is no guarantee this would work on any compliant implementation.
While the "intuitive" semantics of sched_yield() is to yield the CPU and
give other runnable threads a chance to run, this is _not_ what the
standard prescribes (for non-RT threads).
>
> --
> -- Howard Chu
Nikita.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-21 11:33 ` Nikita Danilov
@ 2005-08-22 8:06 ` Howard Chu
0 siblings, 0 replies; 49+ messages in thread
From: Howard Chu @ 2005-08-22 8:06 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Nick Piggin, Robert Hancock, linux-kernel
Nikita Danilov wrote:
> Howard Chu writes:
>
> > That's beside the point. Folks are making an assertion that
> > sched_yield() is meaningless; this example demonstrates that there are
> > cases where sched_yield() is essential.
>
> It is not essential, it is non-portable.
>
> The code you described is based on non-portable "expectations" about
> thread scheduling. The Linux implementation of pthreads fails to
> satisfy them. Perfectly reasonable. The code is then "fixed" by adding
> sched_yield() calls, introducing more non-portable assumptions. Again,
> there is no guarantee this would work on any compliant implementation.
>
> While the "intuitive" semantics of sched_yield() is to yield the CPU
> and give other runnable threads a chance to run, this is _not_ what the
> standard prescribes (for non-RT threads).
>
Very well; it is not prescribed in the standard and it is non-portable.
Our code is broken and we will fix it.
But even Dave Butenhof, Mr. Pthreads himself, has said it is reasonable
to expect sched_yield to yield the CPU. That's what pthread_yield did in
Pthreads Draft 4 (DCE threads) and it is common knowledge that
sched_yield is a direct replacement for pthread_yield; i.e.,
pthread_yield() was deleted from the spec because sched_yield fulfilled
its purpose. Now you're saying "well, technically, sched_yield doesn't
have to do anything at all" and the letter of the spec supports your
position, but anybody who's been programming with pthreads since the DCE
days "knows" that is not the original intention. I wonder why nobody
has raised this issue with the IEEE/POSIX group and gotten them to issue
a correction/clarification in all this time, since the absence of
specification here really impairs the usefulness of the spec.
Likewise the fact that sched_yield() can now cause the current process
to be queued behind other processes seems suspect, unless we know for
sure that the threads are running with PTHREAD_SCOPE_SYSTEM. (I haven't
checked to see if PTHREAD_SCOPE_PROCESS is still supported in NPTL.)
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-20 18:38 ` Howard Chu
2005-08-20 20:57 ` Lee Revell
@ 2005-08-20 21:50 ` Lee Revell
2005-08-21 1:04 ` Robert Hancock
2 siblings, 0 replies; 49+ messages in thread
From: Lee Revell @ 2005-08-20 21:50 UTC (permalink / raw)
To: Howard Chu; +Cc: Nick Piggin, Robert Hancock, linux-kernel
On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> Nick Piggin wrote:
> > Robert Hancock wrote:
> > > I fail to see how sched_yield is going to be very helpful in this
> > > situation. Since that call can sleep for anywhere from zero to a
> > > long time, it's going to give unpredictable results.
>
> > Well, not sleep technically, but yield the CPU for some undefined
> > amount of time.
>
> Since the slapd server was not written to run in realtime, nor is it
> commonly run on realtime operating systems, I don't believe predictable
> timing here is a criterion we care about. One could say the same of
> sigsuspend(), by the way - it can pause a process for anywhere from
> zero to a long time. Should we tell application writers not
> to use this function either, regardless of whether the developer thinks
> they have a good reason to use it?
Of course not. We should tell them that if they use sigsuspend() they
cannot assume that the process will not wake up immediately.
Lee
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-20 18:38 ` Howard Chu
2005-08-20 20:57 ` Lee Revell
2005-08-20 21:50 ` Lee Revell
@ 2005-08-21 1:04 ` Robert Hancock
2005-08-22 11:44 ` linux-os (Dick Johnson)
2 siblings, 1 reply; 49+ messages in thread
From: Robert Hancock @ 2005-08-21 1:04 UTC (permalink / raw)
To: linux-kernel
Howard Chu wrote:
> I'll note that we removed a number of the yield calls (that were in
> OpenLDAP 2.2) for the 2.3 release, because I found that they were
> redundant and causing unnecessary delays. My own test system is running
> on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64 distro),
> and OpenLDAP 2.3 runs perfectly well here, now that those redundant
> calls have been removed. But I also found that I needed to add a new
> yield(), to work around yet another unexpected issue on this system - we
> have a number of threads waiting on a condition variable, and the thread
> holding the mutex signals the var, unlocks the mutex, and then
> immediately relocks it. The expectation here is that upon unlocking the
> mutex, the calling thread would block while some waiting thread (that
> just got signaled) would get to run. In fact what happened is that the
> calling thread unlocked and relocked the mutex without allowing any of
> the waiting threads to run. In this case the only solution was to insert
> a yield() after the mutex_unlock(). So again, for those of you claiming
> "oh, all you need to do is use a condition variable or any of the other
> POSIX synchronization primitives" - yes, that's a nice theory, but
> reality says otherwise.
I encountered a similar issue with some software that I wrote, and used
a similar workaround; however, that was basically because there wasn't
enough time available to redesign things to work properly.
The problem here is essentially caused by the fact that the mutex is
locked for an excessively large proportion of the time, not letting
other threads in. In the case I am thinking of, posting the messages to
the thread that was hogging the mutex via a signaling queue would have
been a better solution than using yield and having correct operation
depend on undefined parts of thread scheduling behavior.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-21 1:04 ` Robert Hancock
@ 2005-08-22 11:44 ` linux-os (Dick Johnson)
2005-08-22 14:26 ` Robert Hancock
0 siblings, 1 reply; 49+ messages in thread
From: linux-os (Dick Johnson) @ 2005-08-22 11:44 UTC (permalink / raw)
To: Robert Hancock; +Cc: linux-kernel
On Sat, 20 Aug 2005, Robert Hancock wrote:
> Howard Chu wrote:
>> I'll note that we removed a number of the yield calls (that were in
>> OpenLDAP 2.2) for the 2.3 release, because I found that they were
>> redundant and causing unnecessary delays. My own test system is running
>> on a Linux 2.6.12.3 kernel (installed over a SuSE 9.2 x86_64 distro),
>> and OpenLDAP 2.3 runs perfectly well here, now that those redundant
>> calls have been removed. But I also found that I needed to add a new
>> yield(), to work around yet another unexpected issue on this system - we
>> have a number of threads waiting on a condition variable, and the thread
>> holding the mutex signals the var, unlocks the mutex, and then
>> immediately relocks it. The expectation here is that upon unlocking the
>> mutex, the calling thread would block while some waiting thread (that
>> just got signaled) would get to run. In fact what happened is that the
>> calling thread unlocked and relocked the mutex without allowing any of
>> the waiting threads to run. In this case the only solution was to insert
>> a yield() after the mutex_unlock(). So again, for those of you claiming
>> "oh, all you need to do is use a condition variable or any of the other
>> POSIX synchronization primitives" - yes, that's a nice theory, but
>> reality says otherwise.
>
> I encountered a similar issue with some software that I wrote, and
> used a similar workaround; however, that was basically because there
> wasn't enough time available to redesign things to work properly.
> The problem here is essentially caused by the fact that the mutex is
> locked for an excessively large proportion of the time, not letting
> other threads in. In the case I am thinking of, posting the messages
> to the thread that was hogging the mutex via a signaling queue would
> have been a better solution than using yield and having correct
> operation depend on undefined parts of thread scheduling behavior.
>
> --
> Robert Hancock Saskatoon, SK, Canada
> To email, remove "nospam" from hancockr@nospamshaw.ca
> Home Page: http://www.roberthancock.com/
>
I reported that sched_yield() wasn't working (at least as expected)
back in March of 2004.
for(;;)
sched_yield();
... takes 100% CPU time as reported by `top`. It should take
practically 0. Somebody said that this was because `top` was
broken, others said that it was because I didn't know how to
code. Nevertheless, the problem was not fixed, even after
scheduler changes were made for the current version.
One can execute:
usleep(0);
... instead of:
sched_yield();
... and Linux then performs exactly like other Unixes when
code is waiting on mutexes.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
.
I apologize for the following. I tried to kill it with the above dot :
****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.
Thank you.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-22 11:44 ` linux-os (Dick Johnson)
@ 2005-08-22 14:26 ` Robert Hancock
2005-08-23 11:17 ` linux-os (Dick Johnson)
0 siblings, 1 reply; 49+ messages in thread
From: Robert Hancock @ 2005-08-22 14:26 UTC (permalink / raw)
To: linux-kernel
linux-os (Dick Johnson) wrote:
> I reported that sched_yield() wasn't working (at least as expected)
> back in March of 2004.
>
> for(;;)
> sched_yield();
>
> ... takes 100% CPU time as reported by `top`. It should take
> practically 0. Somebody said that this was because `top` was
> broken, others said that it was because I didn't know how to
> code. Nevertheless, the problem was not fixed, even after
> scheduler changes were made for the current version.
This is what I would expect if run on an otherwise idle machine.
sched_yield just puts you at the back of the line for runnable
processes; it doesn't magically cause you to go to sleep somehow.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-22 14:26 ` Robert Hancock
@ 2005-08-23 11:17 ` linux-os (Dick Johnson)
2005-08-23 12:07 ` Denis Vlasenko
0 siblings, 1 reply; 49+ messages in thread
From: linux-os (Dick Johnson) @ 2005-08-23 11:17 UTC (permalink / raw)
To: Robert Hancock; +Cc: linux-kernel
On Mon, 22 Aug 2005, Robert Hancock wrote:
> linux-os (Dick Johnson) wrote:
>> I reported that sched_yield() wasn't working (at least as expected)
>> back in March of 2004.
>>
>> for(;;)
>> sched_yield();
>>
>> ... takes 100% CPU time as reported by `top`. It should take
>> practically 0. Somebody said that this was because `top` was
>> broken, others said that it was because I didn't know how to
>> code. Nevertheless, the problem was not fixed, even after
>> scheduler changes were made for the current version.
>
> This is what I would expect if run on an otherwise idle machine.
> sched_yield just puts you at the back of the line for runnable
> processes; it doesn't magically cause you to go to sleep somehow.
>
When a kernel build is occurring??? Plus `top` itself... It damn well
should sleep while giving up the CPU. If it doesn't, it's broken.
> --
> Robert Hancock Saskatoon, SK, Canada
> To email, remove "nospam" from hancockr@nospamshaw.ca
> Home Page: http://www.roberthancock.com/
>
Cheers,
Dick Johnson
Penguin : Linux version 2.6.12.5 on an i686 machine (5537.79 BogoMips).
Warning : 98.36% of all statistics are fiction.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-23 11:17 ` linux-os (Dick Johnson)
@ 2005-08-23 12:07 ` Denis Vlasenko
2005-08-24 3:37 ` Lincoln Dale
0 siblings, 1 reply; 49+ messages in thread
From: Denis Vlasenko @ 2005-08-23 12:07 UTC (permalink / raw)
To: linux-os (Dick Johnson), Robert Hancock; +Cc: linux-kernel
On Tuesday 23 August 2005 14:17, linux-os \(Dick Johnson\) wrote:
>
> On Mon, 22 Aug 2005, Robert Hancock wrote:
>
> > linux-os (Dick Johnson) wrote:
> >> I reported that sched_yield() wasn't working (at least as expected)
> >> back in March of 2004.
> >>
> >> for(;;)
> >> sched_yield();
> >>
> >> ... takes 100% CPU time as reported by `top`. It should take
> >> practically 0. Somebody said that this was because `top` was
> >> broken, others said that it was because I didn't know how to
> >> code. Nevertheless, the problem was not fixed, even after
> >> scheduler changes were made for the current version.
> >
> > This is what I would expect if run on an otherwise idle machine.
> > sched_yield just puts you at the back of the line for runnable
> > processes; it doesn't magically cause you to go to sleep somehow.
> >
>
> When a kernel build is occurring??? Plus `top` itself... It damn well
> should sleep while giving up the CPU. If it doesn't, it's broken.
top doesn't run all the time:
# strace -o top.strace -tt top
14:52:19.407958 write(1, " 758 root 16 0 104 2"..., 79) = 79
14:52:19.408318 write(1, " 759 root 16 0 100 1"..., 79) = 79
14:52:19.408659 write(1, " 760 root 16 0 100 1"..., 79) = 79
14:52:19.409001 write(1, " 761 root 18 0 2604 39"..., 74) = 74
14:52:19.409342 write(1, " 763 daemon 17 0 108 1"..., 78) = 78
14:52:19.409672 write(1, " 773 root 16 0 104 2"..., 79) = 79
14:52:19.410010 write(1, " 774 root 16 0 104 2"..., 79) = 79
14:52:19.410362 write(1, " 775 root 16 0 100 1"..., 79) = 79
14:52:19.410692 write(1, " 776 root 16 0 104 2"..., 79) = 79
14:52:19.411136 write(1, " 777 daemon 17 0 108 1"..., 86) = 86
14:52:19.411505 select(1, [0], NULL, NULL, {5, 0}) = 0 (Timeout)
hrrr..... psssss.......
14:52:24.411744 time([1124797944]) = 1124797944
14:52:24.411883 lseek(4, 0, SEEK_SET) = 0
14:52:24.411957 read(4, "24822.01 18801.28\n", 1023) = 18
14:52:24.412082 access("/var/run/utmpx", F_OK) = -1 ENOENT (No such file or directory)
14:52:24.412224 open("/var/run/utmp", O_RDWR) = 8
14:52:24.412328 fcntl64(8, F_GETFD) = 0
14:52:24.412399 fcntl64(8, F_SETFD, FD_CLOEXEC) = 0
14:52:24.412467 _llseek(8, 0, [0], SEEK_SET) = 0
14:52:24.412556 alarm(0) = 0
14:52:24.412643 rt_sigaction(SIGALRM, {0x4015a57c, [], SA_RESTORER, 0x40094ae8}, {SIG_DFL}, 8) = 0
14:52:24.412747 alarm(1) = 0
However, a kernel compile shouldn't.
I suggest stracing a "for(;;) sched_yield();" test proggy with -tt, with
and without a kernel compile in parallel, and comparing the output...
Hmm... actually, knowing that you will argue to death instead...
# cat t.c
#include <sched.h>
int main() {
for(;;) sched_yield();
return 0;
}
# gcc t.c
# strace -tt ./a.out
...
15:03:41.211324 sched_yield() = 0
15:03:41.211673 sched_yield() = 0
15:03:41.212034 sched_yield() = 0
15:03:41.212400 sched_yield() = 0
15:03:41.212749 sched_yield() = 0
15:03:41.213126 sched_yield() = 0
15:03:41.213486 sched_yield() = 0
15:03:41.213835 sched_yield() = 0
15:03:41.214220 sched_yield() = 0
15:03:41.214577 sched_yield() = 0
15:03:41.214939 sched_yield() = 0
I start "while true; do true; done" on another console...
15:03:43.314645 sched_yield() = 0
15:03:43.847644 sched_yield() = 0
15:03:43.954635 sched_yield() = 0
15:03:44.063798 sched_yield() = 0
15:03:44.171596 sched_yield() = 0
15:03:44.282624 sched_yield() = 0
15:03:44.391632 sched_yield() = 0
15:03:44.498609 sched_yield() = 0
15:03:44.605584 sched_yield() = 0
15:03:44.712538 sched_yield() = 0
15:03:44.819557 sched_yield() = 0
15:03:44.928594 sched_yield() = 0
15:03:45.040603 sched_yield() = 0
15:03:45.148545 sched_yield() = 0
15:03:45.259311 sched_yield() = 0
15:03:45.368563 sched_yield() = 0
15:03:45.476482 sched_yield() = 0
15:03:45.583568 sched_yield() = 0
15:03:45.690491 sched_yield() = 0
15:03:45.797512 sched_yield() = 0
15:03:45.906534 sched_yield() = 0
15:03:46.013545 sched_yield() = 0
15:03:46.120505 sched_yield() = 0
Ctrl-C
# uname -a
Linux firebird 2.6.12-r4 #1 SMP Sun Jul 17 13:51:47 EEST 2005 i686 unknown unknown GNU/Linux
--
vda
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: sched_yield() makes OpenLDAP slow
2005-08-23 12:07 ` Denis Vlasenko
@ 2005-08-24 3:37 ` Lincoln Dale
0 siblings, 0 replies; 49+ messages in thread
From: Lincoln Dale @ 2005-08-24 3:37 UTC (permalink / raw)
To: Denis Vlasenko; +Cc: linux-os (Dick Johnson), Robert Hancock, linux-kernel
Denis Vlasenko wrote:
>>>This is what I would expect if run on an otherwise idle machine.
>>>sched_yield just puts you at the back of the line for runnable
>>>processes, it doesn't magically cause you to go to sleep somehow.
>>>
>>>
>>When a kernel build is occurring??? Plus `top` itself.... It damn
>>well should sleep while giving up the CPU. If it doesn't it's broken.
>>
>>
unless you have all of the kernel source in the buffer cache, a
concurrent kernel build will spend a fair bit of time in io_wait state ..
as such it's perfectly plausible that sched_yield keeps popping back to
the top of 'runnable' processes . . .
cheers,
lincoln.
* e100 oops on resume
@ 2006-01-24 22:59 Stefan Seyfried
2006-01-24 23:21 ` Mattia Dongili
0 siblings, 1 reply; 49+ messages in thread
From: Stefan Seyfried @ 2006-01-24 22:59 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: netdev
Hi,
since 2.6.16rc1-git3, e100 dies on resume (regardless of whether from disk, RAM or
runtime power management). Unfortunately I only have a bad photo of
the oops right now; it is available from
https://bugzilla.novell.com/attachment.cgi?id=64761&action=view
I have reproduced this on a second e100 machine and can get a serial
console log from this machine tomorrow if needed.
It did resume fine with 2.6.15-git12
--
Stefan Seyfried \ "I didn't want to write for pay. I
QA / R&D Team Mobile Devices \ wanted to be paid for what I write."
SUSE LINUX Products GmbH, Nürnberg \ -- Leonard Cohen
* Re: e100 oops on resume
2006-01-24 22:59 e100 oops on resume Stefan Seyfried
@ 2006-01-24 23:21 ` Mattia Dongili
2006-01-25 9:02 ` Olaf Kirch
0 siblings, 1 reply; 49+ messages in thread
From: Mattia Dongili @ 2006-01-24 23:21 UTC (permalink / raw)
To: Stefan Seyfried; +Cc: Linux Kernel Mailing List, netdev
On Tue, Jan 24, 2006 at 11:59:19PM +0100, Stefan Seyfried wrote:
> Hi,
> since 2.6.16rc1-git3, e100 dies on resume (regardless if from disk, ram or
> runtime powermanagement). Unfortunately i only have a bad photo of
> the oops right now, it is available from
> https://bugzilla.novell.com/attachment.cgi?id=64761&action=view
> I have reproduced this on a second e100 machine and can get a serial
> console log from this machine tomorrow if needed.
> It did resume fine with 2.6.15-git12
I experienced the same today, I was planning to get a photo tomorrow :)
I'm running 2.6.16-rc1-mm2 and the last working kernel was 2.6.15-mm4
(didn't try .16-rc1-mm1 being scared of the reiserfs breakage).
--
mattia
:wq!
* Re: e100 oops on resume
2006-01-24 23:21 ` Mattia Dongili
@ 2006-01-25 9:02 ` Olaf Kirch
2006-01-25 12:11 ` Olaf Kirch
0 siblings, 1 reply; 49+ messages in thread
From: Olaf Kirch @ 2006-01-25 9:02 UTC (permalink / raw)
To: Stefan Seyfried, Linux Kernel Mailing List, netdev
[-- Attachment #1: Type: text/plain, Size: 1478 bytes --]
On Wed, Jan 25, 2006 at 12:21:42AM +0100, Mattia Dongili wrote:
> I experienced the same today, I was planning to get a photo tomorrow :)
> I'm running 2.6.16-rc1-mm2 and the last working kernel was 2.6.15-mm4
> (didn't try .16-rc1-mm1 being scared of the reiserfs breakage).
I think that's because the latest driver version wants to wait for
the ucode download, and calls e100_exec_cb_wait before allocating any
control blocks.
static inline int e100_exec_cb_wait(struct nic *nic, struct sk_buff *skb,
	void (*cb_prepare)(struct nic *, struct cb *, struct sk_buff *))
{
	int err = 0, counter = 50;
	struct cb *cb = nic->cb_to_clean;

	if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode)))
		DPRINTK(PROBE, ERR, "ucode cmd failed with error %d\n", err);
	/* NOTE: the oops shows that e100_exec_cb fails with ENOMEM,
	 * which also means there are no cbs */

	/* ... other stuff ...
	 * and then we die here because cb is NULL: */
	while (!(cb->status & cpu_to_le16(cb_complete))) {
		msleep(10);
		if (!--counter) break;
	}
I'm not sure what the right fix would be. e100_resume would probably
have to call e100_alloc_cbs early on, while e100_up should avoid
calling it a second time if nic->cbs_avail != 0. A tentative patch
for testing is attached.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
[-- Attachment #2: e100-resume-fix --]
[-- Type: text/plain, Size: 1830 bytes --]
[PATCH] e100: allocate cbs early on when resuming
Signed-off-by: Olaf Kirch <okir@suse.de>
drivers/net/e100.c | 14 +++++++++++---
1 files changed, 11 insertions(+), 3 deletions(-)
Index: build/drivers/net/e100.c
===================================================================
--- build.orig/drivers/net/e100.c
+++ build/drivers/net/e100.c
@@ -1298,8 +1298,10 @@ static inline int e100_exec_cb_wait(stru
int err = 0, counter = 50;
struct cb *cb = nic->cb_to_clean;
- if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode)))
+ if ((err = e100_exec_cb(nic, NULL, e100_setup_ucode))) {
DPRINTK(PROBE,ERR, "ucode cmd failed with error %d\n", err);
+ return err;
+ }
/* must restart cuc */
nic->cuc_cmd = cuc_start;
@@ -1721,9 +1723,11 @@ static int e100_alloc_cbs(struct nic *ni
struct cb *cb;
unsigned int i, count = nic->params.cbs.count;
+ /* bail out if we've been here before */
+ if (nic->cbs_avail)
+ return 0;
+
nic->cuc_cmd = cuc_start;
- nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
- nic->cbs_avail = 0;
nic->cbs = pci_alloc_consistent(nic->pdev,
sizeof(struct cb) * count, &nic->cbs_dma_addr);
@@ -2578,6 +2582,8 @@ static int __devinit e100_probe(struct p
nic->pdev = pdev;
nic->msg_enable = (1 << debug) - 1;
pci_set_drvdata(pdev, netdev);
+ nic->cb_to_use = nic->cb_to_send = nic->cb_to_clean = NULL;
+ nic->cbs_avail = 0;
if((err = pci_enable_device(pdev))) {
DPRINTK(PROBE, ERR, "Cannot enable PCI device, aborting.\n");
@@ -2752,6 +2758,8 @@ static int e100_resume(struct pci_dev *p
retval = pci_enable_wake(pdev, 0, 0);
if (retval)
DPRINTK(PROBE,ERR, "Error clearing wake events\n");
+ if ((retval = e100_alloc_cbs(nic)))
+ DPRINTK(PROBE,ERR, "No memory for cbs\n");
if(e100_hw_init(nic))
DPRINTK(HW, ERR, "e100_hw_init failed\n");
* Re: e100 oops on resume
2006-01-25 9:02 ` Olaf Kirch
@ 2006-01-25 12:11 ` Olaf Kirch
2006-01-25 13:51 ` sched_yield() makes OpenLDAP slow Howard Chu
0 siblings, 1 reply; 49+ messages in thread
From: Olaf Kirch @ 2006-01-25 12:11 UTC (permalink / raw)
To: Stefan Seyfried, Linux Kernel Mailing List, netdev
On Wed, Jan 25, 2006 at 10:02:40AM +0100, Olaf Kirch wrote:
> I'm not sure what the right fix would be. e100_resume would probably
> have to call e100_alloc_cbs early on, while e100_up should avoid
> calling it a second time if nic->cbs_avail != 0. A tentative patch
> for testing is attached.
Reportedly, the patch fixes the crash on resume.
Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir@suse.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax
* Re: sched_yield() makes OpenLDAP slow
2006-01-25 12:11 ` Olaf Kirch
@ 2006-01-25 13:51 ` Howard Chu
2006-01-25 14:38 ` Robert Hancock
` (2 more replies)
0 siblings, 3 replies; 49+ messages in thread
From: Howard Chu @ 2006-01-25 13:51 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: hancockr
Robert Hancock wrote:
> Howard Chu wrote:
> > POSIX requires a reschedule to occur, as noted here:
> > http://blog.firetree.net/2005/06/22/thread-yield-after-mutex-unlock/
>
> No, it doesn't:
>
> >
> > The relevant SUSv3 text is here
> >
> > http://www.opengroup.org/onlinepubs/000095399/functions/pthread_mutex_unlock.html
>
> "If there are threads blocked on the mutex object referenced by mutex
> when pthread_mutex_unlock() is called, resulting in the mutex becoming
> available, the scheduling policy shall determine which thread shall
> acquire the mutex."
>
> This says nothing about requiring a reschedule. The "scheduling policy"
> can well decide that the thread which just released the mutex can
> re-acquire it.
No, because the thread that just released the mutex is obviously not one
of the threads blocked on the mutex. When a mutex is unlocked, one of
the *waiting* threads at the time of the unlock must acquire it, and the
scheduling policy can determine that. But the thread that released the
mutex is not one of the waiting threads, and is not eligible for
consideration.
> > I suppose if pthread_mutex_unlock() actually behaved correctly we could
> > remove the other sched_yield() hacks that didn't belong there in the
> > first place and go on our merry way.
>
> Generally, needing to implement hacks like this is a sign that there are
> problems with the synchronization design of the code (like a mutex which
> has excessive contention). Programs should not rely on the scheduling
> behavior of the kernel for proper operation when that behavior is not
> defined.
>
> --
> Robert Hancock Saskatoon, SK, Canada
> To email, remove "nospam" from hancockr@nospamshaw.ca
> Home Page: http://www.roberthancock.com/
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
* Re: sched_yield() makes OpenLDAP slow
2006-01-25 13:51 ` sched_yield() makes OpenLDAP slow Howard Chu
@ 2006-01-25 14:38 ` Robert Hancock
2006-01-25 17:49 ` Christopher Friesen
2006-01-26 1:07 ` David Schwartz
2 siblings, 0 replies; 49+ messages in thread
From: Robert Hancock @ 2006-01-25 14:38 UTC (permalink / raw)
To: Howard Chu; +Cc: Linux Kernel Mailing List
Howard Chu wrote:
> No, because the thread that just released the mutex is obviously not one
> of the threads blocked on the mutex. When a mutex is unlocked, one of
> the *waiting* threads at the time of the unlock must acquire it, and the
> scheduling policy can determine that. But the thread that released the
> mutex is not one of the waiting threads, and is not eligible for
> consideration.
That statement does not imply that any reschedule needs to happen at the
time of the mutex unlock at all, only that the other threads waiting on
the mutex can attempt to reacquire it when the scheduler allows them to.
In all likelihood, what tends to happen is that either the thread that
had the mutex previously still has time left in its timeslice and is
allowed to keep running and reacquire the mutex, or another thread is
woken up (perhaps on another CPU) but doesn't reacquire the mutex before
the original thread carries on and acquires it, and therefore goes back
to sleep.
Forcing the mutex to ping-pong between different threads would be quite
inefficient (especially on SMP machines), and is not something that
POSIX requires.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: sched_yield() makes OpenLDAP slow
2006-01-25 13:51 ` sched_yield() makes OpenLDAP slow Howard Chu
2006-01-25 14:38 ` Robert Hancock
@ 2006-01-25 17:49 ` Christopher Friesen
2006-01-26 1:07 ` David Schwartz
2 siblings, 0 replies; 49+ messages in thread
From: Christopher Friesen @ 2006-01-25 17:49 UTC (permalink / raw)
To: Howard Chu; +Cc: Linux Kernel Mailing List, hancockr
Howard Chu wrote:
>
> Robert Hancock wrote:
> > This says nothing about requiring a reschedule. The "scheduling policy"
> > can well decide that the thread which just released the mutex can
> > re-acquire it.
>
> No, because the thread that just released the mutex is obviously not one
> of the threads blocked on the mutex. When a mutex is unlocked, one of
> the *waiting* threads at the time of the unlock must acquire it, and the
> scheduling policy can determine that. But the thread that released the
> mutex is not one of the waiting threads, and is not eligible for
> consideration.
Is it *required* that the new owner of the mutex is determined at the
time of mutex release?
If the kernel doesn't actually determine the new owner of the mutex
until the currently running thread swaps out, it would be possible for
the currently running thread to re-acquire the mutex.
Chris
* RE: sched_yield() makes OpenLDAP slow
2006-01-25 13:51 ` sched_yield() makes OpenLDAP slow Howard Chu
2006-01-25 14:38 ` Robert Hancock
2006-01-25 17:49 ` Christopher Friesen
@ 2006-01-26 1:07 ` David Schwartz
2006-01-26 8:30 ` Helge Hafting
2 siblings, 1 reply; 49+ messages in thread
From: David Schwartz @ 2006-01-26 1:07 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: hancockr
> Robert Hancock wrote:
> > "If there are threads blocked on the mutex object referenced by mutex
> > when pthread_mutex_unlock() is called, resulting in the mutex becoming
> > available, the scheduling policy shall determine which thread shall
> > acquire the mutex."
> >
> > This says nothing about requiring a reschedule. The "scheduling policy"
> > can well decide that the thread which just released the mutex can
> > re-acquire it.
> No, because the thread that just released the mutex is obviously not one
> of the threads blocked on the mutex.
So what?
> When a mutex is unlocked, one of
> the *waiting* threads at the time of the unlock must acquire it, and the
> scheduling policy can determine that.
This is false and is nowhere found in the standard.
> But the thread that released the
> mutex is not one of the waiting threads, and is not eligible for
> consideration.
Where are you getting this from? Nothing requires the scheduler to schedule
any threads when the mutex is released.
All that must happen is that the mutex must be unlocked. The scheduler is
permitted to allow any thread it wants to run at that point, or no thread.
Nothing says the thread that released the mutex can't continue running and
nothing says that it can't call pthread_mutex_lock and re-acquire the mutex
before any other thread gets around to getting it.
In general, it is very bad karma for the scheduler to stop a thread before
its timeslice is up if it doesn't have to. Consider one CPU and two threads,
each needing to do 100 quick lock/unlock cycles. Why force 200 context
switches?
DS
* Re: sched_yield() makes OpenLDAP slow
2006-01-26 1:07 ` David Schwartz
@ 2006-01-26 8:30 ` Helge Hafting
2006-01-26 9:01 ` Nick Piggin
2006-01-26 10:50 ` Nikita Danilov
0 siblings, 2 replies; 49+ messages in thread
From: Helge Hafting @ 2006-01-26 8:30 UTC (permalink / raw)
To: davids; +Cc: Linux Kernel Mailing List, hancockr
David Schwartz wrote:
>>Robert Hancock wrote:
>>
>>
>>But the thread that released the
>>mutex is not one of the waiting threads, and is not eligible for
>>consideration.
>>
>>
>
> Where are you getting this from? Nothing requires the scheduler to schedule
>any threads when the mutex is released.
>
>
Correct.
> All that must happen is that the mutex must be unlocked. The scheduler is
>permitted to allow any thread it wants to run at that point, or no thread.
>Nothing says the thread that released the mutex can't continue running and
>
>
Correct. The releasing thread may keep running.
>nothing says that it can't call pthread_mutex_lock and re-acquire the mutex
>before any other thread gets around to getting it.
>
>
Wrong.
The spec says that the mutex must be given to a waiter (if any) at the
moment of release. The waiter doesn't have to be scheduled at that
point; it may keep sleeping with its freshly acquired mutex. So the
unlocking thread may continue - but if it tries to reacquire the mutex
it will find the mutex taken and go to sleep at that point. Then other
processes will schedule, and at some time the one now owning the mutex
will wake up and do its work.
> In general, it is very bad karma for the scheduler to stop a thread before
>its timeslice is up if it doesn't have to. Consider one CPU and two threads,
>each needing to do 100 quick lock/unlock cycles. Why force 200 context
>switches?
>
Good point, except it is a strange program that does this. Locking the
mutex once, doing 100 operations, then unlocking is the better way. :-)
Helge Hafting
* Re: sched_yield() makes OpenLDAP slow
2006-01-26 8:30 ` Helge Hafting
@ 2006-01-26 9:01 ` Nick Piggin
2006-01-26 10:50 ` Nikita Danilov
1 sibling, 0 replies; 49+ messages in thread
From: Nick Piggin @ 2006-01-26 9:01 UTC (permalink / raw)
To: Helge Hafting; +Cc: davids, Linux Kernel Mailing List, hancockr
Helge Hafting wrote:
> David Schwartz wrote:
>> nothing says that it can't call pthread_mutex_lock and re-acquire the
>> mutex
>> before any other thread gets around to getting it.
>>
>>
> Wrong.
> The spec says that the mutex must be given to a waiter (if any) at the
> moment of release.
Repeating myself here...
To me it says that the scheduling policy decides at the moment of release.
What if the scheduling policy decides *right then* to give the mutex to
the next running thread that tries to acquire it?
That would be the logical way for a scheduling policy to decide the next
owner of the mutex.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
* Re: sched_yield() makes OpenLDAP slow
2006-01-26 8:30 ` Helge Hafting
2006-01-26 9:01 ` Nick Piggin
@ 2006-01-26 10:50 ` Nikita Danilov
1 sibling, 0 replies; 49+ messages in thread
From: Nikita Danilov @ 2006-01-26 10:50 UTC (permalink / raw)
To: Helge Hafting; +Cc: Linux Kernel Mailing List, hancockr
Helge Hafting writes:
[...]
>
> >nothing says that it can't call pthread_mutex_lock and re-acquire the mutex
> >before any other thread gets around to getting it.
> >
> >
> Wrong.
> The spec says that the mutex must be given to a waiter (if any) at the
> moment of release. The waiter don't have to be scheduled at that
> point, it may keep sleeping with its freshly unlocked mutex. So the
> unlocking thread may continue - but if it tries to reaquire the mutex
> it will find the mutex taken and go to sleep at that point. Then other
You just described convoy formation: a phenomenon that all reasonable
mutex implementations try to avoid at all costs. If that's what the
standard prescribes, then the standard has to be amended.
>
> Helge Hafting
Nikita.
[parent not found: <5uZqb-4fo-15@gated-at.bofh.it>]
* Re: sched_yield() makes OpenLDAP slow
[not found] <5uZqb-4fo-15@gated-at.bofh.it>
@ 2006-01-14 22:47 ` Robert Hancock
0 siblings, 0 replies; 49+ messages in thread
From: Robert Hancock @ 2006-01-14 22:47 UTC (permalink / raw)
To: linux-kernel
Howard Chu wrote:
> POSIX requires a reschedule to occur, as noted here:
> http://blog.firetree.net/2005/06/22/thread-yield-after-mutex-unlock/
No, it doesn't:
>
> The relevant SUSv3 text is here
> http://www.opengroup.org/onlinepubs/000095399/functions/pthread_mutex_unlock.html
"If there are threads blocked on the mutex object referenced by mutex
when pthread_mutex_unlock() is called, resulting in the mutex becoming
available, the scheduling policy shall determine which thread shall
acquire the mutex."
This says nothing about requiring a reschedule. The "scheduling policy"
can well decide that the thread which just released the mutex can
re-acquire it.
> I suppose if pthread_mutex_unlock() actually behaved correctly we could
> remove the other sched_yield() hacks that didn't belong there in the
> first place and go on our merry way.
Generally, needing to implement hacks like this is a sign that there are
problems with the synchronization design of the code (like a mutex which
has excessive contention). Programs should not rely on the scheduling
behavior of the kernel for proper operation when that behavior is not
defined.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: sched_yield() makes OpenLDAP slow
@ 2006-01-14 19:29 Howard Chu
0 siblings, 0 replies; 49+ messages in thread
From: Howard Chu @ 2006-01-14 19:29 UTC (permalink / raw)
To: linux-kernel
Resurrecting a dead horse...
> From: Lee Revell
> Date: Sat Aug 20 2005 - 15:57:36 EST
>
> ------------------------------------------------------------------------
> On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote:
> > But I also found that I needed to add a new
> > yield(), to work around yet another unexpected issue on this system -
> > we have a number of threads waiting on a condition variable, and the
> > thread holding the mutex signals the var, unlocks the mutex, and then
> > immediately relocks it. The expectation here is that upon unlocking
> > the mutex, the calling thread would block while some waiting thread
> > (that just got signaled) would get to run. In fact what happened is
> > that the calling thread unlocked and relocked the mutex without
> > allowing any of the waiting threads to run. In this case the only
> > solution was to insert a yield() after the mutex_unlock().
>
> That's exactly the behavior I would expect. Why would you expect
> unlocking a mutex to cause a reschedule, if the calling thread still has
> timeslice left?
>
> Lee
POSIX requires a reschedule to occur, as noted here:
http://blog.firetree.net/2005/06/22/thread-yield-after-mutex-unlock/
The relevant SUSv3 text is here
http://www.opengroup.org/onlinepubs/000095399/functions/pthread_mutex_unlock.html
I suppose if pthread_mutex_unlock() actually behaved correctly we could
remove the other sched_yield() hacks that didn't belong there in the
first place and go on our merry way.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
[parent not found: <43057641.70700@symas.com.suse.lists.linux.kernel>]
* re: sched_yield() makes OpenLDAP slow
@ 2005-08-19 6:03 Howard Chu
2005-08-19 6:34 ` Nick Piggin
` (2 more replies)
0 siblings, 3 replies; 49+ messages in thread
From: Howard Chu @ 2005-08-19 6:03 UTC (permalink / raw)
To: linux-kernel
Hm, seems there's a great deal of misinformation in this thread.
> I also think OpenLDAP is wrong. First, it should be calling
> pthread_yield() because slapd is a multithreading process and it just
> wants to run the other threads. See:
...
> AFAIKS, sched_yield should only really be used by realtime
> applications that know exactly what they're doing.
pthread_yield() was deleted from the POSIX threads drafts years ago.
sched_yield() is the officially supported API, and OpenLDAP is using it
for the documented purpose. Anyone who says "applications shouldn't be
using sched_yield()" doesn't know what they're talking about.
> It's really more a feature than a bug that it breaks so easily
> because they should be really using futexes instead, which have much
> better behaviour than any sched_yield ever could (they will directly
> wake up another process waiting for the lock and avoid the thundering
> herd for contended locks)
You assume that spinlocks are the only reason a developer may want to
yield the processor. This assumption is unfounded. Case in point - the
primary backend in OpenLDAP uses a transactional database with
page-level locking of its data structures to provide high levels of
concurrency. It is the nature of such a system to encounter deadlocks
over the normal course of operations. When a deadlock is detected, some
thread must be chosen (by one of a variety of algorithms) to abort its
transaction, in order to allow other operations to proceed to
completion. In this situation, the chosen thread must get control of the
CPU long enough to clean itself up, and then it must yield the CPU in
order to allow any other competing threads to complete their
transaction. The thread with the aborted transaction relinquishes all of
its locks and then waits to get another shot at the CPU to try
everything over again. Again, this is all fundamental to the nature of
transactional programming. If the 2.6 kernel makes this programming
model unreasonably slow, then quite simply this kernel is not viable as
a database platform.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
* Re: sched_yield() makes OpenLDAP slow
2005-08-19 6:03 Howard Chu
@ 2005-08-19 6:34 ` Nick Piggin
2005-08-19 6:59 ` Chris Wedgwood
2005-08-19 10:21 ` Nikita Danilov
2 siblings, 0 replies; 49+ messages in thread
From: Nick Piggin @ 2005-08-19 6:34 UTC (permalink / raw)
To: Howard Chu; +Cc: linux-kernel
Hi Howard,
Thanks for joining the discussion. One request, if I may,
can you retain the CC list on posts please?
Howard Chu wrote:
>
>> AFAIKS, sched_yield should only really be used by realtime
>> applications that know exactly what they're doing.
>
>
> pthread_yield() was deleted from the POSIX threads drafts years ago.
> sched_yield() is the officially supported API, and OpenLDAP is using it
> for the documented purpose. Anyone who says "applications shouldn't be
> using sched_yield()" doesn't know what they're talking about.
>
Linux's SCHED_OTHER policy offers static priorities in the range [0..0].
I think anything else would be a bug, because from my reading of the
standards, a process with a higher static priority shall always preempt
a process with a lower priority.
And SCHED_OTHER simply doesn't work that way.
So sched_yield() from a SCHED_OTHER task is free to basically do anything
at all. Is that the kind of behaviour you had in mind?
>> It's really more a feature than a bug that it breaks so easily
>> because they should be really using futexes instead, which have much
>> better behaviour than any sched_yield ever could (they will directly
>> wake up another process waiting for the lock and avoid the thundering
>> herd for contended locks)
>
>
> You assume that spinlocks are the only reason a developer may want to
> yield the processor. This assumption is unfounded. Case in point - the
> primary backend in OpenLDAP uses a transactional database with
> page-level locking of its data structures to provide high levels of
> concurrency. It is the nature of such a system to encounter deadlocks
> over the normal course of operations. When a deadlock is detected, some
> thread must be chosen (by one of a variety of algorithms) to abort its
> transaction, in order to allow other operations to proceed to
> completion. In this situation, the chosen thread must get control of the
> CPU long enough to clean itself up, and then it must yield the CPU in
> order to allow any other competing threads to complete their
> transaction. The thread with the aborted transaction relinquishes all of
> its locks and then waits to get another shot at the CPU to try
> everything over again. Again, this is all fundamental to the nature of
You didn't explain why you can't use a mutex to do this. From
your brief description, it seems like a mutex might just do
the job nicely.
> transactional programming. If the 2.6 kernel makes this programming
> model unreasonably slow, then quite simply this kernel is not viable as
> a database platform.
>
Actually it should still be fast. It may yield excessive CPU to
other tasks (including those that are reniced). You didn't rely
on sched_yield providing some semantics about not doing such a
thing, did you?
* Re: sched_yield() makes OpenLDAP slow
2005-08-19 6:03 Howard Chu
2005-08-19 6:34 ` Nick Piggin
@ 2005-08-19 6:59 ` Chris Wedgwood
2005-08-19 22:45 ` Howard Chu
2005-08-19 10:21 ` Nikita Danilov
2 siblings, 1 reply; 49+ messages in thread
From: Chris Wedgwood @ 2005-08-19 6:59 UTC (permalink / raw)
To: Howard Chu; +Cc: linux-kernel
On Thu, Aug 18, 2005 at 11:03:45PM -0700, Howard Chu wrote:
> If the 2.6 kernel makes this programming model unreasonably slow,
> then quite simply this kernel is not viable as a database platform.
Pretty much everyone else manages to make it work.
* Re: sched_yield() makes OpenLDAP slow
2005-08-19 6:59 ` Chris Wedgwood
@ 2005-08-19 22:45 ` Howard Chu
0 siblings, 0 replies; 49+ messages in thread
From: Howard Chu @ 2005-08-19 22:45 UTC (permalink / raw)
To: Chris Wedgwood; +Cc: linux-kernel
Chris Wedgwood wrote:
> On Thu, Aug 18, 2005 at 11:03:45PM -0700, Howard Chu wrote:
> > If the 2.6 kernel makes this programming model unreasonably slow,
> > then quite simply this kernel is not viable as a database platform.
> Pretty much everyone else manages to make it work.
And this contributes to the discussion how? Pretty much every other
Unix-ish operating system manages to make scheduling with nice'd
processes work. If you really want to get into what "everyone else
manages to make work" you're in for a rough ride.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
* Re: sched_yield() makes OpenLDAP slow
2005-08-19 6:03 Howard Chu
2005-08-19 6:34 ` Nick Piggin
2005-08-19 6:59 ` Chris Wedgwood
@ 2005-08-19 10:21 ` Nikita Danilov
2005-08-19 23:10 ` Howard Chu
2 siblings, 1 reply; 49+ messages in thread
From: Nikita Danilov @ 2005-08-19 10:21 UTC (permalink / raw)
To: Howard Chu; +Cc: Linux Kernel Mailing List
Howard Chu <hyc@symas.com> writes:
[...]
> concurrency. It is the nature of such a system to encounter deadlocks
> over the normal course of operations. When a deadlock is detected, some
> thread must be chosen (by one of a variety of algorithms) to abort its
> transaction, in order to allow other operations to proceed to
> completion. In this situation, the chosen thread must get control of the
> CPU long enough to clean itself up,
What prevents transaction monitor from using, say, condition variables
to "yield cpu"? That would have an additional advantage of blocking
thread precisely until a specific event occurs, instead of blocking for
some vague, indeterminate, load- and platform-dependent amount of time.
> and then it must yield the CPU in
> order to allow any other competing threads to complete their
> transaction.
Again, this sounds like thing doable with standard POSIX synchronization
primitives.
>
> --
> -- Howard Chu
Nikita.
* Re: sched_yield() makes OpenLDAP slow
2005-08-19 10:21 ` Nikita Danilov
@ 2005-08-19 23:10 ` Howard Chu
2005-08-20 13:23 ` Nikita Danilov
0 siblings, 1 reply; 49+ messages in thread
From: Howard Chu @ 2005-08-19 23:10 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Linux Kernel Mailing List
Nikita Danilov wrote:
> Howard Chu <hyc@symas.com> writes:
> > concurrency. It is the nature of such a system to encounter
> > deadlocks over the normal course of operations. When a deadlock is
> > detected, some thread must be chosen (by one of a variety of
> > algorithms) to abort its transaction, in order to allow other
> > operations to proceed to completion. In this situation, the chosen
> > thread must get control of the CPU long enough to clean itself up,
> What prevents transaction monitor from using, say, condition
> variables to "yield cpu"? That would have an additional advantage of
> blocking thread precisely until specific event occurs, instead of
> blocking for some vague indeterminate load and platform dependent
> amount of time.
Condition variables offer no control over which thread is woken up.
We're wandering into the design of the Sleepycat BerkeleyDB library
here, and we don't exert any control over that either. BerkeleyDB
doesn't appear to use pthread condition variables; it seems to construct
its own synchronization mechanisms on top of mutexes (and yield calls).
In this specific example, we use whatever BerkeleyDB provides and we're
certainly not about to write our own transactional embedded database
engine just for this.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
* Re: sched_yield() makes OpenLDAP slow
2005-08-19 23:10 ` Howard Chu
@ 2005-08-20 13:23 ` Nikita Danilov
2005-08-20 19:49 ` Howard Chu
0 siblings, 1 reply; 49+ messages in thread
From: Nikita Danilov @ 2005-08-20 13:23 UTC (permalink / raw)
To: Howard Chu; +Cc: Linux Kernel Mailing List
Howard Chu writes:
> Nikita Danilov wrote:
[...]
>
> > What prevents transaction monitor from using, say, condition
> > variables to "yield cpu"? That would have an additional advantage of
> > blocking thread precisely until specific event occurs, instead of
> > blocking for some vague indeterminate load and platform dependent
> > amount of time.
>
> Condition variables offer no control over which thread is woken up.
When only one thread waits on a condition variable, which is exactly the
scenario involved (sorry if I wasn't clear enough), a condition signal
provides precise control over which thread is woken up.
> We're wandering into the design of the SleepyCat BerkeleyDB library
> here, and we don't exert any control over that either. BerkeleyDB
> doesn't appear to use pthread condition variables; it seems to construct
> its own synchronization mechanisms on top of mutexes (and yield calls).
That returns us to the core of the problem: sched_yield() is used to
implement a synchronization primitive, and non-portable assumptions are
made about its behavior. SUS defines that after sched_yield() the thread
ceases to run on the CPU "until it again becomes the head of its thread
list", and the "thread list" discipline is only defined for real-time
scheduling policies. E.g.,
int sched_yield(void)
{
        return 0;
}

and

int sched_yield(void)
{
        sleep(100);
        return 0;
}

are both valid sched_yield() implementations for non-RT (SCHED_OTHER)
threads.
Nikita.
* Re: sched_yield() makes OpenLDAP slow
2005-08-20 13:23 ` Nikita Danilov
@ 2005-08-20 19:49 ` Howard Chu
2005-08-20 22:08 ` Nikita Danilov
0 siblings, 1 reply; 49+ messages in thread
From: Howard Chu @ 2005-08-20 19:49 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Linux Kernel Mailing List
Nikita Danilov wrote:
> That returns us to the core of the problem: sched_yield() is used to
> implement a synchronization primitive and non-portable assumptions are
> made about its behavior: SUS defines that after sched_yield() thread
> ceases to run on the CPU "until it again becomes the head of its thread
> list", and "thread list" discipline is only defined for real-time
> scheduling policies. E.g.,
>
> int sched_yield(void)
> {
>         return 0;
> }
>
> and
>
> int sched_yield(void)
> {
>         sleep(100);
>         return 0;
> }
>
> are both valid sched_yield() implementations for non-RT (SCHED_OTHER)
> threads.
I think you're mistaken:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0d4eaf3703131e86/da051ebe58976b00#da051ebe58976b00
sched_yield() is required to be supported even if priority scheduling is
not supported, and it is required to cause the calling thread (not
process) to yield the processor.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc
OpenLDAP Core Team http://www.openldap.org/project/
* Re: sched_yield() makes OpenLDAP slow
2005-08-20 19:49 ` Howard Chu
@ 2005-08-20 22:08 ` Nikita Danilov
0 siblings, 0 replies; 49+ messages in thread
From: Nikita Danilov @ 2005-08-20 22:08 UTC (permalink / raw)
To: Howard Chu; +Cc: Linux Kernel Mailing List
Howard Chu writes:
> Nikita Danilov wrote:
> > That returns us to the core of the problem: sched_yield() is used to
> > implement a synchronization primitive and non-portable assumptions are
> > made about its behavior: SUS defines that after sched_yield() thread
> > ceases to run on the CPU "until it again becomes the head of its thread
> > list", and "thread list" discipline is only defined for real-time
> > scheduling policies. E.g.,
> >
> > int sched_yield(void)
> > {
> >         return 0;
> > }
> >
> > and
> >
> > int sched_yield(void)
> > {
> >         sleep(100);
> >         return 0;
> > }
> >
> > are both valid sched_yield() implementations for non-RT (SCHED_OTHER)
> > threads.
> I think you're mistaken:
> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/0d4eaf3703131e86/da051ebe58976b00#da051ebe58976b00
>
> sched_yield() is required to be supported even if priority scheduling is
> not supported, and it is required to cause the calling thread (not
> process) to yield the processor.
Of course sched_yield() is required to be supported; the question is for
how long the CPU is yielded. Here is the quote from the SUS (actually
the complete definition of sched_yield()):
The sched_yield() function shall force the running thread to
relinquish the processor until it again becomes the head of its
thread list.
As far as I can see, SUS doesn't specify how the "thread list" is
maintained for the non-RT scheduling policy, so an implementation that
immediately places a SCHED_OTHER thread that called sched_yield() back
at the head of its thread list is perfectly valid. Also valid is an
implementation that waits for 100 seconds and then places the
sched_yield() caller at the head of the list, etc. Basically, while the
semantics of sched_yield() are well defined for the RT scheduling
policies, for SCHED_OTHER the standard leaves them
implementation-defined.
>
> --
> -- Howard Chu
Nikita.
[parent not found: <4303DB48.8010902@develer.com.suse.lists.linux.kernel>]
* sched_yield() makes OpenLDAP slow
@ 2005-08-18 0:50 Bernardo Innocenti
2005-08-18 0:47 ` Con Kolivas
2005-08-18 1:07 ` Joseph Fannin
0 siblings, 2 replies; 49+ messages in thread
From: Bernardo Innocenti @ 2005-08-18 0:50 UTC (permalink / raw)
To: lkml, OpenLDAP-devel; +Cc: Giovanni Bajo, Simone Zinanni
Hello,
I've been investigating a performance problem on a
server using OpenLDAP 2.2.26 for nss resolution and
running kernel 2.6.12.
When a CPU bound process such as GCC is running in the
background (even at nice 10), many trivial commands such
as "su" or "groups" become extremely slow and take a few
seconds to complete.
strace revealed that data exchange over the slapd socket
was where most of the time was spent. Looking at the
slapd side, I see several calls to sched_yield() like this:
[pid 8780] 0.000033 stat64("gidNumber.dbb", 0xb7b3ebcc) = -1 EACCES (Permission denied)
[pid 8780] 0.000059 pread(20, "\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3"..., 4096, 4096) = 4096
[pid 8780] 0.000083 pread(20, "\0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0<\7\1\5\370"..., 4096, 16384) = 4096
[pid 8780] 0.000078 time(NULL) = 1124322520
[pid 8780] 0.000066 pread(11, "\0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\0000"..., 4096, 688128) = 4096
[pid 8780] 0.000241 write(19, "0e\2\1\3d`\4$cn=bernie,ou=group,dc=d"..., 103) = 103
[pid 8780] 0.000137 sched_yield( <unfinished ...>
[pid 8781] 0.050020 <... sched_yield resumed> ) = 0
[pid 8780] 0.000025 <... sched_yield resumed> ) = 0
[pid 8781] 0.000060 futex(0x925ab20, FUTEX_WAIT, 33, NULL <unfinished ...>
[pid 8780] 0.000026 write(19, "0\f\2\1\3e\7\n\1\0\4\0\4\0", 14) = 14
[pid 8774] 0.000774 <... select resumed> ) = 1 (in [19])
The relative timestamp reveals that slapd is spending 50ms
after yielding. Meanwhile, GCC is probably being scheduled
for a whole quantum.
Reading the man-page of sched_yield() it seems this isn't
the correct behavior:
Note: If the current process is the only process in the
highest priority list at that time, this process will
continue to run after a call to sched_yield.
I also think OpenLDAP is wrong. First, it should be calling
pthread_yield() because slapd is a multithreaded process
and it just wants to run the other threads. See:
int
ldap_pvt_thread_yield( void )
{
#if HAVE_THR_YIELD
        return thr_yield();
#elif HAVE_PTHREADS == 10
        return sched_yield();
#elif defined(_POSIX_THREAD_IS_GNU_PTH)
        sched_yield();
        return 0;
#elif HAVE_PTHREADS == 6
        pthread_yield(NULL);
        return 0;
#else
        pthread_yield();
        return 0;
#endif
}
--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/
* Re: sched_yield() makes OpenLDAP slow
2005-08-18 0:50 Bernardo Innocenti
@ 2005-08-18 0:47 ` Con Kolivas
2005-08-18 10:48 ` Maciej Soltysiak
2005-08-18 1:07 ` Joseph Fannin
1 sibling, 1 reply; 49+ messages in thread
From: Con Kolivas @ 2005-08-18 0:47 UTC (permalink / raw)
To: Bernardo Innocenti; +Cc: lkml, OpenLDAP-devel, Giovanni Bajo, Simone Zinanni
On Thu, 18 Aug 2005 10:50 am, Bernardo Innocenti wrote:
> Hello,
>
> I've been investigating a performance problem on a
> server using OpenLDAP 2.2.26 for nss resolution and
> running kernel 2.6.12.
>
> When a CPU bound process such as GCC is running in the
> background (even at nice 10), many trivial commands such
> as "su" or "groups" become extremely slow and take a few
> seconds to complete.
>
> strace revealed that data exchange over the slapd socket
> was where most of the time was spent. Looking at the
> slapd side, I see several calls to sched_yield() like this:
>
>
> [pid 8780] 0.000033 stat64("gidNumber.dbb", 0xb7b3ebcc) = -1 EACCES (Permission denied)
> [pid 8780] 0.000059 pread(20, "\0\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\2\0\344\17\2\3"..., 4096, 4096) = 4096
> [pid 8780] 0.000083 pread(20, "\0\0\0\0\1\0\0\0\4\0\0\0\3\0\0\0\0\0\0\0\222\0<\7\1\5\370"..., 4096, 16384) = 4096
> [pid 8780] 0.000078 time(NULL) = 1124322520
> [pid 8780] 0.000066 pread(11, "\0\0\0\0\1\0\0\0\250\0\0\0\231\0\0\0\235\0\0\0\16\0000"..., 4096, 688128) = 4096
> [pid 8780] 0.000241 write(19, "0e\2\1\3d`\4$cn=bernie,ou=group,dc=d"..., 103) = 103
> [pid 8780] 0.000137 sched_yield( <unfinished ...>
> [pid 8781] 0.050020 <... sched_yield resumed> ) = 0
> [pid 8780] 0.000025 <... sched_yield resumed> ) = 0
> [pid 8781] 0.000060 futex(0x925ab20, FUTEX_WAIT, 33, NULL <unfinished ...>
> [pid 8780] 0.000026 write(19, "0\f\2\1\3e\7\n\1\0\4\0\4\0", 14) = 14
> [pid 8774] 0.000774 <... select resumed> ) = 1 (in [19])
>
>
> The relative timestamp reveals that slapd is spending 50ms
> after yielding. Meanwhile, GCC is probably being scheduled
> for a whole quantum.
>
> Reading the man-page of sched_yield() it seems this isn't
> the correct behavior:
>
> Note: If the current process is the only process in the
> highest priority list at that time, this process will
> continue to run after a call to sched_yield.
>
> I also think OpenLDAP is wrong. First, it should be calling
> pthread_yield() because slapd is a multithreaded process
> and it just wants to run the other threads. See:
sched_yield behaviour changed in the 2.5 series more than 3 years ago,
and applications that use it as a locking primitive should be updated.
Cheers,
Con
* Re: sched_yield() makes OpenLDAP slow
2005-08-18 0:47 ` Con Kolivas
@ 2005-08-18 10:48 ` Maciej Soltysiak
0 siblings, 0 replies; 49+ messages in thread
From: Maciej Soltysiak @ 2005-08-18 10:48 UTC (permalink / raw)
To: lkml
Hello Con,
Thursday, August 18, 2005, 2:47:25 AM, you wrote:
> sched_yield behaviour changed in 2.5 series more than 3 years ago and
> applications that use this as a locking primitive should be updated.
I remember OpenOffice had a problem with excessive use of sched_yield()
during 2.5. I guess they changed it, but I have not checked.
Does anyone know?
Back then OO was having serious latency problems on 2.5.
Regards,
Maciej
* Re: sched_yield() makes OpenLDAP slow
2005-08-18 0:50 Bernardo Innocenti
2005-08-18 0:47 ` Con Kolivas
@ 2005-08-18 1:07 ` Joseph Fannin
2005-08-18 2:25 ` Bernardo Innocenti
2005-08-18 2:58 ` Nick Piggin
1 sibling, 2 replies; 49+ messages in thread
From: Joseph Fannin @ 2005-08-18 1:07 UTC (permalink / raw)
To: Bernardo Innocenti; +Cc: lkml, OpenLDAP-devel, Giovanni Bajo, Simone Zinanni
On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote:
> The relative timestamp reveals that slapd is spending 50ms
> after yielding. Meanwhile, GCC is probably being scheduled
> for a whole quantum.
>
> Reading the man-page of sched_yield() it seems this isn't
> the correct behavior:
>
> Note: If the current process is the only process in the
> highest priority list at that time, this process will
> continue to run after a call to sched_yield.
The behavior of sched_yield changed for 2.6. I suppose the man
page didn't get updated.
From linux/Documentation/post-halloween.txt:
| - The behavior of sched_yield() changed a lot. A task that uses
| this system call should now expect to sleep for possibly a very
| long time. Tasks that do not really desire to give up the
| processor for a while should probably not make heavy use of this
| function. Unfortunately, some GUI programs (like Open Office)
| do make excessive use of this call and under load their
| performance is poor. It seems this new 2.6 behavior is optimal
| but some user-space applications may need fixing.
This is pretty much all I know about it; I just thought I'd point
it out.
> I also think OpenLDAP is wrong. First, it should be calling
> pthread_yield() because slapd is a multithreading process
> and it just wants to run the other threads. See:
Is it possible that this problem has been noticed and fixed
already?
--
Joseph Fannin
jfannin@gmail.com
/* So there I am, in the middle of my `netfilter-is-wonderful'
talk in Sydney, and someone asks `What happens if you try
to enlarge a 64k packet here?'. I think I said something
eloquent like `fuck'. - RR */
* Re: sched_yield() makes OpenLDAP slow
2005-08-18 1:07 ` Joseph Fannin
@ 2005-08-18 2:25 ` Bernardo Innocenti
2005-08-18 2:58 ` Nick Piggin
1 sibling, 0 replies; 49+ messages in thread
From: Bernardo Innocenti @ 2005-08-18 2:25 UTC (permalink / raw)
To: Joseph Fannin; +Cc: lkml, OpenLDAP-devel, Giovanni Bajo, Simone Zinanni
Joseph Fannin wrote:
> The behavior of sched_yield changed for 2.6. I suppose the man
> page didn't get updated.
Now I remember reading about that on LWN or maybe KernelTraffic.
Thanks!
>>I also think OpenLDAP is wrong. First, it should be calling
>>pthread_yield() because slapd is a multithreaded process
>>and it just wants to run the other threads. See:
>
> Is it possible that this problem has been noticed and fixed
> already?
The OpenLDAP 2.3.5 source still looks like this.
I've filed a report in OpenLDAP's issue tracker:
http://www.openldap.org/its/index.cgi/Incoming?id=3950;page=2
--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/
* Re: sched_yield() makes OpenLDAP slow
2005-08-18 1:07 ` Joseph Fannin
2005-08-18 2:25 ` Bernardo Innocenti
@ 2005-08-18 2:58 ` Nick Piggin
2005-08-19 3:10 ` Bernardo Innocenti
1 sibling, 1 reply; 49+ messages in thread
From: Nick Piggin @ 2005-08-18 2:58 UTC (permalink / raw)
To: Joseph Fannin
Cc: Bernardo Innocenti, lkml, OpenLDAP-devel, Giovanni Bajo, Simone Zinanni
Joseph Fannin wrote:
>On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote:
>
>
>>The relative timestamp reveals that slapd is spending 50ms
>>after yielding. Meanwhile, GCC is probably being scheduled
>>for a whole quantum.
>>
>>Reading the man-page of sched_yield() it seems this isn't
>>the correct behavior:
>>
>> Note: If the current process is the only process in the
>> highest priority list at that time, this process will
>> continue to run after a call to sched_yield.
>>
>
> The behavior of sched_yield changed for 2.6. I suppose the man
>page didn't get updated.
>
>
We class the SCHED_OTHER policy as having a single priority, which
I believe is allowed (and even makes good sense, because dynamic
and even nice priorities aren't really well defined).
That also makes our sched_yield() behaviour correct.
AFAIKS, sched_yield should only really be used by realtime
applications that know exactly what they're doing.
* Re: sched_yield() makes OpenLDAP slow
2005-08-18 2:58 ` Nick Piggin
@ 2005-08-19 3:10 ` Bernardo Innocenti
0 siblings, 0 replies; 49+ messages in thread
From: Bernardo Innocenti @ 2005-08-19 3:10 UTC (permalink / raw)
To: Nick Piggin
Cc: Joseph Fannin, lkml, OpenLDAP-devel, Giovanni Bajo, Simone Zinanni
Nick Piggin wrote:
> We class the SCHED_OTHER policy as having a single priority, which
> I believe is allowed (and even makes good sense, because dynamic
> and even nice priorities aren't really well defined).
>
> That also makes our sched_yield() behaviour correct.
>
> AFAIKS, sched_yield should only really be used by realtime
> applications that know exactly what they're doing.
I'm pretty sure this has already been discussed in the
past, but I fail to see why this new behavior of
sched_yield() would be more correct.
In the OpenLDAP bug discussion, one of the developers
considers this a Linux quirk needing a workaround, not
a real bug in OpenLDAP.
As I understand it, the old behavior was to push the
yielding process to the end of the queue for processes
with the same niceness. This is somewhat closer to
the (vague) definition in the POSIX man pages:
The sched_yield() function shall force the running
thread to relinquish the processor until it again
becomes the head of its thread list. It takes no arguments.
Pushing the process far behind in the queue, even after
niced CPU crunchers, appears a bit extreme. It seems
most programs expect sched_yield() to only reschedule
the calling thread wrt its sibling threads, to be used
to implement do-it-yourself spinlocks and the like.
--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/
end of thread, other threads:[~2006-01-26 10:50 UTC | newest]
Thread overview: 49+ messages
-- links below jump to the message on this page --
[not found] <4D8eT-4rg-31@gated-at.bofh.it>
2005-08-20 3:20 ` sched_yield() makes OpenLDAP slow Robert Hancock
2005-08-20 4:18 ` Nick Piggin
2005-08-20 18:38 ` Howard Chu
2005-08-20 20:57 ` Lee Revell
2005-08-20 21:24 ` Howard Chu
2005-08-21 0:36 ` Nick Piggin
2005-08-21 11:33 ` Nikita Danilov
2005-08-22 8:06 ` Howard Chu
2005-08-20 21:50 ` Lee Revell
2005-08-21 1:04 ` Robert Hancock
2005-08-22 11:44 ` linux-os (Dick Johnson)
2005-08-22 14:26 ` Robert Hancock
2005-08-23 11:17 ` linux-os (Dick Johnson)
2005-08-23 12:07 ` Denis Vlasenko
2005-08-24 3:37 ` Lincoln Dale
2006-01-24 22:59 e100 oops on resume Stefan Seyfried
2006-01-24 23:21 ` Mattia Dongili
2006-01-25 9:02 ` Olaf Kirch
2006-01-25 12:11 ` Olaf Kirch
2006-01-25 13:51 ` sched_yield() makes OpenLDAP slow Howard Chu
2006-01-25 14:38 ` Robert Hancock
2006-01-25 17:49 ` Christopher Friesen
2006-01-26 1:07 ` David Schwartz
2006-01-26 8:30 ` Helge Hafting
2006-01-26 9:01 ` Nick Piggin
2006-01-26 10:50 ` Nikita Danilov
[not found] <5uZqb-4fo-15@gated-at.bofh.it>
2006-01-14 22:47 ` Robert Hancock
-- strict thread matches above, loose matches on Subject: below --
2006-01-14 19:29 Howard Chu
[not found] <43057641.70700@symas.com.suse.lists.linux.kernel>
[not found] ` <17157.45712.877795.437505@gargle.gargle.HOWL.suse.lists.linux.kernel>
[not found] ` <430666DB.70802@symas.com.suse.lists.linux.kernel>
2005-08-20 13:48 ` Andi Kleen
2005-08-21 19:47 ` Florian Weimer
2005-08-22 5:09 ` Howard Chu
2005-08-22 13:06 ` Andi Kleen
2005-08-22 18:47 ` Howard Chu
2005-08-22 13:20 ` Florian Weimer
2005-08-22 23:19 ` Howard Chu
2005-08-19 6:03 Howard Chu
2005-08-19 6:34 ` Nick Piggin
2005-08-19 6:59 ` Chris Wedgwood
2005-08-19 22:45 ` Howard Chu
2005-08-19 10:21 ` Nikita Danilov
2005-08-19 23:10 ` Howard Chu
2005-08-20 13:23 ` Nikita Danilov
2005-08-20 19:49 ` Howard Chu
2005-08-20 22:08 ` Nikita Danilov
[not found] <4303DB48.8010902@develer.com.suse.lists.linux.kernel>
[not found] ` <20050818010703.GA13127@nineveh.rivenstone.net.suse.lists.linux.kernel>
[not found] ` <4303F967.6000404@yahoo.com.au.suse.lists.linux.kernel>
[not found] ` <43054D9A.7090509@develer.com.suse.lists.linux.kernel>
2005-08-19 3:19 ` Andi Kleen
2005-08-19 3:30 ` Bernardo Innocenti
2005-08-18 0:50 Bernardo Innocenti
2005-08-18 0:47 ` Con Kolivas
2005-08-18 10:48 ` Maciej Soltysiak
2005-08-18 1:07 ` Joseph Fannin
2005-08-18 2:25 ` Bernardo Innocenti
2005-08-18 2:58 ` Nick Piggin
2005-08-19 3:10 ` Bernardo Innocenti