linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC][PATCH RT 0/2] futex priority based wakeup
@ 2006-05-10  9:26 Sébastien Dugué
  2006-05-10 10:08 ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Sébastien Dugué @ 2006-05-10  9:26 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Thomas Gleixner



  Hi,

  in the current futex implementation, tasks are woken up in FIFO order,
(i.e. in the order they were put to sleep). For realtime systems needing
system wide strict realtime priority scheduling, tasks should be woken up
in priority order.

  This patchset achieves this by changing the futex hash bucket list into
a plist. Tasks waiting on a futex are enqueued in this plist based on
their priority so that they can be woken up in priority order.

  The 1st patch is a small optimization to futex_requeue() which could
go into mainline and does not depend on the RT patch.

  The 2nd patch does the real job of converting the futex hash bucket list
into a plist.
  There is however a drawback with this patch in that it is not possible to use
CONFIG_DEBUG_PI_LIST as the plist debugging code expects the spinlock protecting
the plist to be a raw_spinlock while the futex hash bucket lock is a regular
spinlock (rt-mutex). Maybe the plist debugging checks could be made generic to
accept both types.

  Any thoughts?

  Comments welcomed.

  Sébastien.



-----------------------------------------------------

  Sébastien Dugué                BULL/FREC:B1-247
  phone: (+33) 476 29 77 70      Bullcom: 229-7770

  mailto:sebastien.dugue@bull.net

  Linux POSIX AIO: http://www.bullopensource.org/posix
                   http://sourceforge.net/projects/paiol
  
-----------------------------------------------------

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH RT 0/2] futex priority based wakeup
  2006-05-10  9:26 [RFC][PATCH RT 0/2] futex priority based wakeup Sébastien Dugué
@ 2006-05-10 10:08 ` Ingo Molnar
  2006-05-10 13:03   ` Sébastien Dugué
  2006-05-12 13:32   ` Pierre Peiffer
  0 siblings, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2006-05-10 10:08 UTC (permalink / raw)
  To: Sébastien Dugué; +Cc: LKML, Thomas Gleixner


* Sébastien Dugué <sebastien.dugue@bull.net> wrote:

>   in the current futex implementation, tasks are woken up in FIFO 
> order, (i.e. in the order they were put to sleep). For realtime 
> systems needing system wide strict realtime priority scheduling, tasks 
> should be woken up in priority order.
> 
>   This patchset achieves this by changing the futex hash bucket list 
> into a plist. Tasks waiting on a futex are enqueued in this plist 
> based on their priority so that they can be woken up in priority 
> order.

hm, i dont think this is enough. Basically, waking up in priority order 
is just the (easier) half of the story - what you want is to also 
propagate priorities when you block. We provided a complete solution via 
the PI-futex patchset (currently included in -mm).

In other words: as long as locking primitives go, i dont think real-time 
applications should use wakeup-priority-ordered futexes, they should use 
the real thing, PI futexes.

There is one exception: when a normal futex is used as a waitqueue 
without any contention properties. (for example a waitqueue for worker 
threads) But those are both rare, and typically dont muster tasks with 
different priorities - i.e. FIFO is good enough.

Also, there's a performance cost to this. Could you try to measure the 
impact to SCHED_OTHER tasks via some pthread locking benchmark?

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH RT 0/2] futex priority based wakeup
  2006-05-10 10:08 ` Ingo Molnar
@ 2006-05-10 13:03   ` Sébastien Dugué
  2006-05-10 14:32     ` Thomas Gleixner
  2006-05-12 13:32   ` Pierre Peiffer
  1 sibling, 1 reply; 10+ messages in thread
From: Sébastien Dugué @ 2006-05-10 13:03 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: LKML, Thomas Gleixner

On Wed, 2006-05-10 at 12:08 +0200, Ingo Molnar wrote:
> * Sébastien Dugué <sebastien.dugue@bull.net> wrote:
> 
> >   in the current futex implementation, tasks are woken up in FIFO 
> > order, (i.e. in the order they were put to sleep). For realtime 
> > systems needing system wide strict realtime priority scheduling, tasks 
> > should be woken up in priority order.
> > 
> >   This patchset achieves this by changing the futex hash bucket list 
> > into a plist. Tasks waiting on a futex are enqueued in this plist 
> > based on their priority so that they can be woken up in priority 
> > order.
> 
> hm, i dont think this is enough. Basically, waking up in priority order 
> is just the (easier) half of the story - what you want is to also 
> propagate priorities when you block. We provided a complete solution via 
> the PI-futex patchset (currently included in -mm).
> 
> In other words: as long as locking primitives go, i dont think real-time 
> applications should use wakeup-priority-ordered futexes, they should use 
> the real thing, PI futexes.

  Yeah, that's right as long as userland pthread_mutexes are used, but
currently, pthread_condvars do not use PI-futexes. Therefore when we
have some threads sleeping on a condvar and the condvar is broadcasted,
those blocked threads are woken up through futex_requeue() in the order
they were put to sleep and not in their priority order.

  Maybe the pthread_cond_*() function should be made to use the
PI-futexes support in glibc.

> 
> There is one exception: when a normal futex is used as a waitqueue 
> without any contention properties. (for example a waitqueue for worker 
> threads) But those are both rare, and typically dont muster tasks with 
> different priorities - i.e. FIFO is good enough.
> 
> Also, there's a performance cost to this. Could you try to measure the 
> impact to SCHED_OTHER tasks via some pthread locking benchmark?

  Right, I will try to quantify the impact for SCHED_OTHER tasks. Any
pointers to such a benchmark?

  Thanks,

  Sébastien.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH RT 0/2] futex priority based wakeup
  2006-05-10 13:03   ` Sébastien Dugué
@ 2006-05-10 14:32     ` Thomas Gleixner
  2006-05-10 15:01       ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2006-05-10 14:32 UTC (permalink / raw)
  To: Sébastien Dugué; +Cc: Ingo Molnar, LKML

On Wed, 2006-05-10 at 15:03 +0200, Sébastien Dugué wrote:
>   Maybe the pthread_cond_*() function should be made to use the
> PI-futexes support in glibc.

<hack_alert>

There is a simple way to do so. Just remove the assembler version of
pthread_cond_xx and use the generic c code implementation in glibc. That
allows you to use a PI mutex for the outer mutex.

</hack_alert>

	tglx



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH RT 0/2] futex priority based wakeup
  2006-05-10 14:32     ` Thomas Gleixner
@ 2006-05-10 15:01       ` Jakub Jelinek
  2006-05-10 17:02         ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2006-05-10 15:01 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Sebastien Dugue, Ingo Molnar, LKML, Ulrich Drepper

On Wed, May 10, 2006 at 04:32:14PM +0200, Thomas Gleixner wrote:
> On Wed, 2006-05-10 at 15:03 +0200, Sébastien Dugué wrote:
> >   Maybe the pthread_cond_*() function should be made to use the
> > PI-futexes support in glibc.
> 
> <hack_alert>
> 
> There is a simple way to do so. Just remove the assembler version of
> pthread_cond_xx and use the generic c code implementation in glibc. That
> allows you to use a PI mutex for the outer mutex.
> 
> </hack_alert>

I don't see how would that help.  Both asm and sysdeps/pthread/*.c
versions call __pthread_mutex_unlock_usercnt/__pthread_mutex_cond_lock,
which will DTRT for the mutex passed as second argument to
pthread_mutex_*wait (assuming you have Ulrich's/mine PI nptl patch).
But, there are 2 other futexes used by condvars - internal condvar's lock
and __data.__futex.  If the associated mutex uses PI protocol, then
I'm afraid the internal condvar lock needs to follow the same protocol
(i.e. use FUTEX_*LOCK_PI), otherwise a low priority task calling
pthread_cond_* and acquiring the internal lock, then scheduled away
indefinitely because of some middle-priority CPU hog could delay
some high priority task calling pthread_cond_* on the same condvar.
But, there is a problem here - pthread_cond_{signal,broadcast} don't
have any associated mutexes, so you often don't know which type
of protocol you want to use for the internal condvar lock.
Now, for the __data.__futex lock POSIX seems to be more clear,
all it says is that the wake up should happen according to the scheduling
policy (but, on the other side for pthread_mutex_unlock it says the same
and we still use FIFO there).

	Jakub

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH RT 0/2] futex priority based wakeup
  2006-05-10 15:01       ` Jakub Jelinek
@ 2006-05-10 17:02         ` Thomas Gleixner
  2006-05-11  8:56           ` Sébastien Dugué
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2006-05-10 17:02 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Sebastien Dugue, Ingo Molnar, LKML, Ulrich Drepper

On Wed, 2006-05-10 at 11:01 -0400, Jakub Jelinek wrote:
> But, there are 2 other futexes used by condvars - internal condvar's lock
> and __data.__futex.  If the associated mutex uses PI protocol, then
> I'm afraid the internal condvar lock needs to follow the same protocol
> (i.e. use FUTEX_*LOCK_PI), otherwise a low priority task calling
> pthread_cond_* and acquiring the internal lock, then scheduled away
> indefinitely because of some middle-priority CPU hog could delay
> some high priority task calling pthread_cond_* on the same condvar.

Did not think about __data.__lock

I said "hack_alert" explicitely as this was just a work around, so we
could use an PI futex for the outer one, which was used to protect other
things too. The assmebler code corrupted the lock field of the outer
mutex with 0/1/2.

> But, there is a problem here - pthread_cond_{signal,broadcast} don't
> have any associated mutexes, so you often don't know which type
> of protocol you want to use for the internal condvar lock.
> Now, for the __data.__futex lock POSIX seems to be more clear,
> all it says is that the wake up should happen according to the scheduling
> policy (but, on the other side for pthread_mutex_unlock it says the same
> and we still use FIFO there).

Sebastians patch might solve this.

Is there a way to (pre)set the policy of the inner lock or is there any
other solution in sight ?

	tglx



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH RT 0/2] futex priority based wakeup
  2006-05-10 17:02         ` Thomas Gleixner
@ 2006-05-11  8:56           ` Sébastien Dugué
  2006-05-16 10:36             ` Jakub Jelinek
  0 siblings, 1 reply; 10+ messages in thread
From: Sébastien Dugué @ 2006-05-11  8:56 UTC (permalink / raw)
  To: tglx; +Cc: Jakub Jelinek, Ingo Molnar, LKML, Ulrich Drepper

On Wed, 2006-05-10 at 19:02 +0200, Thomas Gleixner wrote:
> On Wed, 2006-05-10 at 11:01 -0400, Jakub Jelinek wrote:
> > But, there are 2 other futexes used by condvars - internal condvar's lock
> > and __data.__futex.  If the associated mutex uses PI protocol, then
> > I'm afraid the internal condvar lock needs to follow the same protocol
> > (i.e. use FUTEX_*LOCK_PI), otherwise a low priority task calling
> > pthread_cond_* and acquiring the internal lock, then scheduled away
> > indefinitely because of some middle-priority CPU hog could delay
> > some high priority task calling pthread_cond_* on the same condvar.
> 
> Did not think about __data.__lock
> 
> I said "hack_alert" explicitely as this was just a work around, so we
> could use an PI futex for the outer one, which was used to protect other
> things too. The assmebler code corrupted the lock field of the outer
> mutex with 0/1/2.

  Hm, I wonder what would be the effect of having the external mutex
and __data.__lock use a PI futex and __data.__futex use a regular
futex when the waiters on __data.__futex are requeued on the external
mutex during a broadcast.

> 
> > But, there is a problem here - pthread_cond_{signal,broadcast} don't
> > have any associated mutexes, so you often don't know which type
> > of protocol you want to use for the internal condvar lock.

  Just a wild guess here, but in the broadcast or signal path, couldn't
__data.__mutex be looked up to determine what protocol to use for the
__data.__futex?

> > Now, for the __data.__futex lock POSIX seems to be more clear,
> > all it says is that the wake up should happen according to the scheduling
> > policy (but, on the other side for pthread_mutex_unlock it says the same
> > and we still use FIFO there).
> 
> Sebastians patch might solve this.
> 
> Is there a way to (pre)set the policy of the inner lock or is there any
> other solution in sight ?

  As a corolary to the above couldn't the protocol used by the external
mutex dictates what the __data.__futex should use (PI or not)?

Maybe there is a need for something like a futex_requeue_pi() otherwise
we'll end up mixing PI and non-PI futexes or maybe I'm not understanding
the thing.

  Sébastien.





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH RT 0/2] futex priority based wakeup
  2006-05-10 10:08 ` Ingo Molnar
  2006-05-10 13:03   ` Sébastien Dugué
@ 2006-05-12 13:32   ` Pierre Peiffer
  1 sibling, 0 replies; 10+ messages in thread
From: Pierre Peiffer @ 2006-05-12 13:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Sébastien Dugué, LKML, Thomas Gleixner

Hi,

Just few comments for my understanding:

Ingo Molnar a écrit :
> * Sébastien Dugué <sebastien.dugue@bull.net> wrote:
> 
>>   in the current futex implementation, tasks are woken up in FIFO 
>> order, (i.e. in the order they were put to sleep). For realtime 
>> systems needing system wide strict realtime priority scheduling, tasks 
>> should be woken up in priority order.
>>
>>   This patchset achieves this by changing the futex hash bucket list 
>> into a plist. Tasks waiting on a futex are enqueued in this plist 
>> based on their priority so that they can be woken up in priority 
>> order.
> 
> hm, i dont think this is enough. Basically, waking up in priority order 
> is just the (easier) half of the story - what you want is to also 
> propagate priorities when you block. We provided a complete solution via 
> the PI-futex patchset (currently included in -mm).
> 
> In other words: as long as locking primitives go, i dont think real-time 
> applications should use wakeup-priority-ordered futexes, they should use 
> the real thing, PI futexes.

In fact, I agree with that for a lock (pthread_mutex, etc).

> 
> There is one exception: when a normal futex is used as a waitqueue 
> without any contention properties. (for example a waitqueue for worker 
> threads) But those are both rare, and typically dont muster tasks with 
> different priorities - i.e. FIFO is good enough.
> 

But here, I think this is what we have with the condvar, no ? When some 
threads are blocked on the condvar (pthread_cond_wait), they must be 
woken in priority order with pthread_broadcast, but there is no 
"lock-owner" to boost here.
Even if all threads but one are requeued on the second futex (i.e. the 
mutex used with the condvar), with the patch from Seb, they are requeued 
in priority order and thus get woken in priority order: we don't need 
any priority propagation here, I think.

So, I think that the PI-futexes are the right solution for the mutexes 
and rwlocks. But this patch seems to me correct for condvar 
(FUTEX_REQUEUE), I don't think that PI-futexes will add any benefit for 
condvar (?). But I may have missed something ?


> Also, there's a performance cost to this. Could you try to measure the 
> impact to SCHED_OTHER tasks via some pthread locking benchmark?
> 
> 	Ingo

-- 
Pierre

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH RT 0/2] futex priority based wakeup
  2006-05-11  8:56           ` Sébastien Dugué
@ 2006-05-16 10:36             ` Jakub Jelinek
  2006-05-18  8:51               ` Sébastien Dugué
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Jelinek @ 2006-05-16 10:36 UTC (permalink / raw)
  To: Sebastien Dugue; +Cc: tglx, Ingo Molnar, LKML, Ulrich Drepper

On Thu, May 11, 2006 at 10:56:57AM +0200, S?bastien Dugu? wrote:
>   Hm, I wonder what would be the effect of having the external mutex
> and __data.__lock use a PI futex and __data.__futex use a regular
> futex when the waiters on __data.__futex are requeued on the external
> mutex during a broadcast.

Well, either glibc can stop using requeue if the mutex is PI mutex (and use
the slower fallback), or kernel would need to handle requeueing from normal to
PI futex.

> > > But, there is a problem here - pthread_cond_{signal,broadcast} don't
> > > have any associated mutexes, so you often don't know which type
> > > of protocol you want to use for the internal condvar lock.
> 
>   Just a wild guess here, but in the broadcast or signal path, couldn't
> __data.__mutex be looked up to determine what protocol to use for the
> __data.__futex?

Not always.
Say if you do:
thread1	(low prio)	thread2 (very high prio)		thread3 (mid prio)
pthread_cond_signal (&cv)
# first use of cv in the program, no mutex has been ever used with this
# condvar
lll_mutex_lock (&cv->__data.__lock)
			pthread_cond_wait (&cv, &pi_mutex)
			lll_mutex_lock (&cv->__data.__lock)
								use_all_CPU
			# Then thread2 is stuck, waiting on thread1 which waits on thread3

At pthread_cond_signal enter time you don't know the type of associated
mutex, so you don't know which kind of internal lock to use.
It doesn't have to be the first use of cv in the program, similarly it can
be any pthread_cond_{signal,broadcast} called while there are no threads
in pthread_cond_*wait on that cv.

	Jakub

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [RFC][PATCH RT 0/2] futex priority based wakeup
  2006-05-16 10:36             ` Jakub Jelinek
@ 2006-05-18  8:51               ` Sébastien Dugué
  0 siblings, 0 replies; 10+ messages in thread
From: Sébastien Dugué @ 2006-05-18  8:51 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: tglx, Ingo Molnar, LKML, Ulrich Drepper

On Tue, 2006-05-16 at 06:36 -0400, Jakub Jelinek wrote:
> On Thu, May 11, 2006 at 10:56:57AM +0200, S?bastien Dugu? wrote:
> >   Hm, I wonder what would be the effect of having the external mutex
> > and __data.__lock use a PI futex and __data.__futex use a regular
> > futex when the waiters on __data.__futex are requeued on the external
> > mutex during a broadcast.
> 
> Well, either glibc can stop using requeue if the mutex is PI mutex (and use
> the slower fallback), or kernel would need to handle requeueing from normal to
> PI futex.

  Yes, or make the condvar use a PI futex and implement a 
futex_requeue_pi() and have glibc use that in the broadcast case. I 
think the requeue thing is a good idea and should not be dropped.

> 
> > > > But, there is a problem here - pthread_cond_{signal,broadcast} don't
> > > > have any associated mutexes, so you often don't know which type
> > > > of protocol you want to use for the internal condvar lock.
> > 
> >   Just a wild guess here, but in the broadcast or signal path, couldn't
> > __data.__mutex be looked up to determine what protocol to use for the
> > __data.__futex?
> 
> Not always.
> Say if you do:
> thread1 (low prio)	thread2 (very high prio)		thread3 (mid prio)
> pthread_cond_signal (&cv)
> # first use of cv in the program, no mutex has been ever used with this
> # condvar
> lll_mutex_lock (&cv->__data.__lock)
> 			pthread_cond_wait (&cv, &pi_mutex)
> 			lll_mutex_lock (&cv->__data.__lock)
> 								use_all_CPU
> 			# Then thread2 is stuck, waiting on thread1 which waits on thread3
> 
> At pthread_cond_signal enter time you don't know the type of associated
> mutex, so you don't know which kind of internal lock to use.
> It doesn't have to be the first use of cv in the program, similarly it can
> be any pthread_cond_{signal,broadcast} called while there are no threads
> in pthread_cond_*wait on that cv.

  OK, it makes sense now, thanks.

  Sébastien.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2006-05-18  8:46 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-10  9:26 [RFC][PATCH RT 0/2] futex priority based wakeup Sébastien Dugué
2006-05-10 10:08 ` Ingo Molnar
2006-05-10 13:03   ` Sébastien Dugué
2006-05-10 14:32     ` Thomas Gleixner
2006-05-10 15:01       ` Jakub Jelinek
2006-05-10 17:02         ` Thomas Gleixner
2006-05-11  8:56           ` Sébastien Dugué
2006-05-16 10:36             ` Jakub Jelinek
2006-05-18  8:51               ` Sébastien Dugué
2006-05-12 13:32   ` Pierre Peiffer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).