* [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
@ 2004-08-04  9:13 Perez-Gonzalez, Inaky
  2004-08-05  6:21 ` Andrew Morton
  2004-08-05 10:34 ` Ingo Molnar
  0 siblings, 2 replies; 22+ messages in thread
From: Perez-Gonzalez, Inaky @ 2004-08-04  9:13 UTC (permalink / raw)
  To: linux-kernel, robustmutexes

Hi All

Fusyn aims to provide primitives to solve a bunch of gaps in POSIX
compliance related to mutexes, conditional variables and semaphores,
POSIX Advanced real-time support as well as adding mutex robustness
(to dying owners) and deep deadlock checking.

All of these primitives are available to both kernel space and user
space (through a generalization of the mechanism used by futexes),
allowing for a fast path on most mutex operations.

We strive to solve the POSIX gap of scheduling-policy based
unlock/wakeup for mutexes, conditional variables, semaphores,
etc.; the gaps in Advanced Real-Time support (priority inversion
protection through priority inheritance and priority protection),
robust mutexes (mutex waiters no longer deadlock when a mutex
owner dies) and deep deadlock checking for mutexes.

The full description of the gaps we solve, rationales behind the
implementation and explanations on the need for the new features
is kind of long to fully explain here, so you can find it in the
linux/Documentation/fusyn-why.txt after applying our patch or at
our website, in:

http://developer.osdl.org/dev/robustmutexes/index.html#Documentation

High level changelog since release 2.3:

- Not much, ported to 2.6.7 vanilla.

However, this showed that we weren't properly locking when
obtaining pages from a 'struct address_mapping' [function
__vl_key_page_get_shared()]... so we fixed that.

Still to-do:

- Finally finish implementing priority protection; the core is
there, only the glue to use it is needed.

- Wipe out debug stuff

- Call fuqueue_waiter_cancel() from try_to_wake_up()?

The patch is split in the following parts:

1/11: documentation files
2/11: priority based O(1) lists
3/11: kernel fuqueues
4/11: kernel fulocks
5/11: user space/kernel space tracker
6/11: user space fuqueues
7/11: user space fulocks
8/11: Arch-specific support
9/11: Modifications to the core: basic
10/11: Modifications to the core: struct timeout
11/11: Modifications to the core: scheduler

We are not posting it to the list, as it has grown kind of
big. It can be obtained from our web site mentioned above, in the
Download section.

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)


* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-04  9:13 [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1 Perez-Gonzalez, Inaky
@ 2004-08-05  6:21 ` Andrew Morton
  2004-08-05  7:06   ` Ulrich Drepper
  2004-08-05 10:34 ` Ingo Molnar
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2004-08-05  6:21 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: linux-kernel, robustmutexes, Rusty Russell, Ingo Molnar,
	Jamie Lokier, Ulrich Drepper

"Perez-Gonzalez, Inaky" <inaky.perez-gonzalez@intel.com> wrote:
>
> Hi All
> 
> Fusyn aims to provide primitives to solve a bunch of gaps in POSIX
> compliance related to mutexes, conditional variables and semaphores,
> POSIX Advanced real-time support as well as adding mutex robustness
> (to dying owners) and deep deadlock checking.
> 
> All of these primitives are available to both kernel space and user
> space (through a generalization of the mechanism used by futexes),
> allowing for a fast path on most mutex operations.
> 
> We strive to solve the POSIX gap of scheduling-policy based
> unlock/wakeup for mutexes, conditional variables, semaphores,
> etc.; the gaps in Advanced Real-Time support (priority inversion
> protection through priority inheritance and priority protection),
> robust mutexes (mutex waiters no longer deadlock when a mutex
> owner dies) and deep deadlock checking for mutexes.
> 
> The full description of the gaps we solve, rationales behind the
> implementation and explanations on the need for the new features
> is kind of long to fully explain here, so you can find it in the
> linux/Documentation/fusyn-why.txt after applying our patch or at
> our website, in:
> 
> http://developer.osdl.org/dev/robustmutexes/index.html#Documentation

This fixes what appear to be some fairly significant shortcomings.  What do
the futex and NPTL people have to say about the gravity of the problems
which this solves, and the offered implementation?

> We are not posting it to the list, as it has grown kind of big.

Maybe you should...

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05  6:21 ` Andrew Morton
@ 2004-08-05  7:06   ` Ulrich Drepper
  2004-08-05  7:17     ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Ulrich Drepper @ 2004-08-05  7:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Perez-Gonzalez, Inaky, linux-kernel, robustmutexes,
	Rusty Russell, Ingo Molnar, Jamie Lokier

Andrew Morton wrote:

> This fixes what appear to be some fairly significant shortcomings.  What do
> the futex and NPTL people have to say about the gravity of the problems
> which this solves, and the offered implementation?

This code will not be supported by the glibc code.  Using these
primitives would mean significant slowdown of all operations and this
for problems which only a few people have.  I asked to get the useful
parts of the code to be made available using the current futex interface
(robust mutexes are useful) but Inaky and the rest never acted on this
and instead invented this completely incompatible interface.  IMO this
code should not go into the mainstream kernel.  Let those who want to do
realtime work bear the costs.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05  7:06   ` Ulrich Drepper
@ 2004-08-05  7:17     ` Andrew Morton
  2004-08-05  7:37       ` Ulrich Drepper
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2004-08-05  7:17 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: inaky.perez-gonzalez, linux-kernel, robustmutexes, rusty, mingo, jamie

Ulrich Drepper <drepper@redhat.com> wrote:
>
> Andrew Morton wrote:
> 
> > This fixes what appear to be some fairly significant shortcomings.  What do
> > the futex and NPTL people have to say about the gravity of the problems
> > which this solves, and the offered implementation?
> 
> This code will not be supported by the glibc code.  Using these
> primitives would mean significant slowdown of all operations and this
> for problems which only a few people have.

How large is the slowdown, and on what workloads?

>  I asked to get the useful
> parts of the code to be made available using the current futex interface
> (robust mutexes are useful)

Passing the lock to a non-rt task when there's an rt-task waiting for it
seems pretty poor form, too.


* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05  7:17     ` Andrew Morton
@ 2004-08-05  7:37       ` Ulrich Drepper
  2004-08-05  7:40         ` Andrew Morton
                           ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Ulrich Drepper @ 2004-08-05  7:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: inaky.perez-gonzalez, linux-kernel, robustmutexes, rusty, mingo, jamie

Andrew Morton wrote:

> How large is the slowdown, and on what workloads?

The fast path for all locking primitives etc in nptl today is entirely
at userlevel.  Normally just a single atomic operation with a dozen
other instructions.  With the fusyn stuff each and every locking
operation needs a system call to register/unregister the thread as it
locks/unlocks mutex/rwlocks/etc.  Go figure how well this works.  We are
talking about making the fast path of the locking primitives
two/three/four orders of magnitude more expensive.  And this for
absolutely no benefit for 99.999% of all the code which uses threads.
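
Some readers may not have seen the futex fast path Ulrich refers to. The following is a minimal sketch of the three-state futex mutex from his paper "Futexes Are Tricky". An uncontended lock or unlock costs a single atomic instruction, and the kernel is entered only when a thread has to sleep or be woken. The names fmutex, fmutex_lock and fmutex_unlock are hypothetical. The sketch assumes Linux and the GCC/Clang __atomic builtins, and it is not NPTL's actual code.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical mutex word: 0 = unlocked, 1 = locked, 2 = locked with possible waiters. */
typedef struct { int state; } fmutex;

static long futex(int *uaddr, int op, int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void fmutex_lock(fmutex *m)
{
	int c = 0;

	/* Fast path: one compare-and-exchange 0 -> 1, no system call. */
	if (__atomic_compare_exchange_n(&m->state, &c, 1, 0,
					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
		return;
	/* Slow path: mark the lock contended (2) and sleep until it is free. */
	if (c != 2)
		c = __atomic_exchange_n(&m->state, 2, __ATOMIC_ACQUIRE);
	while (c != 0) {
		futex(&m->state, FUTEX_WAIT_PRIVATE, 2);
		c = __atomic_exchange_n(&m->state, 2, __ATOMIC_ACQUIRE);
	}
}

static void fmutex_unlock(fmutex *m)
{
	/* Only a lock that was marked contended needs a wakeup system call. */
	if (__atomic_exchange_n(&m->state, 0, __ATOMIC_RELEASE) == 2)
		futex(&m->state, FUTEX_WAKE_PRIVATE, 1);
}
```

In this scheme the kernel never learns who owns the lock. That is what keeps the uncontended path cheap, and it is also the ownership information that Iñaky says robustness and priority inheritance require.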


> Passing the lock to a non-rt task when there's an rt-task waiting for it
> seems pretty poor form, too.

No no, that's not what is wanted.  Robust mutexes are a special kind of
mutex and not related to rt issues.  Lockers of robust mutexes have to
register with the kernel (i.e., the locking must actually be performed
by the kernel) so that in case the thread goes away or the entire
process dies, the mutex is unlocked and other waiters (other threads, in
the same or other processes) can get the lock.  This is very useful for
normal operations where mutexes are used inter-process.  This is the
part which is independent from rt but it also must not be the default
mode (i.e., normal pthread_mutex_t code must not be replaced) since it
is significantly slower.
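
The robust semantics described here were later standardized in POSIX.1-2008 as robust mutexes, through pthread_mutexattr_setrobust(), the EOWNERDEAD return value and pthread_mutex_consistent(). Below is a sketch of how an application opts in per mutex and handles owner death. It assumes a modern glibc, not the 2004 NPTL under discussion. The helper names and the repair callback are hypothetical; the callback stands in for whatever code re-validates the protected data.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>

/* Initialise a mutex that is handed over with EOWNERDEAD if its owner dies. */
static int init_robust_mutex(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int err = pthread_mutexattr_init(&attr);

	if (err)
		return err;
	err = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
	if (!err)
		err = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}

/* Lock m. If the previous owner died holding it, call the (hypothetical)
 * repair callback to fix up the shared data, then mark m consistent again.
 * Returns 0 on success with m held, or an errno value. */
static int robust_lock(pthread_mutex_t *m, void (*repair)(void))
{
	int err = pthread_mutex_lock(m);

	if (err == EOWNERDEAD) {
		if (repair)
			repair();
		err = pthread_mutex_consistent(m);
	}
	return err;
}
```

Only mutexes initialised with PTHREAD_MUTEX_ROBUST get these semantics. Default mutexes are unaffected, which is consistent with the point that robustness must not be the default mode.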


The rest of the extensions like all the priority handling is not of
general interest.  POSIX describes how a thread's priority would be
temporarily raised if it holds a mutex which has a higher-priority
waiter.  But this is all functionality of a realtime profile and generally
not part of normal implementations.
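
At the API level, an application requests the priority inheritance described above for each mutex individually, through the POSIX protocol attribute. The sketch below assumes a C library that implements PTHREAD_PRIO_INHERIT. The NPTL of this thread did not yet implement it, as the RT users in this discussion point out. init_pi_mutex is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <pthread.h>

/* Initialise m so that, while a higher-priority thread waits on it, its
 * owner runs at that waiter's priority (POSIX priority inheritance).
 * Returns 0 or an errno value such as ENOTSUP. */
static int init_pi_mutex(pthread_mutex_t *m)
{
	pthread_mutexattr_t attr;
	int err = pthread_mutexattr_init(&attr);

	if (err)
		return err;
	err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	if (!err)
		err = pthread_mutex_init(m, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}
```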

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05  7:37       ` Ulrich Drepper
@ 2004-08-05  7:40         ` Andrew Morton
  2004-08-05  8:22           ` Ulrich Drepper
  2004-08-05 10:42         ` Ingo Molnar
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2004-08-05  7:40 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: inaky.perez-gonzalez, linux-kernel, robustmutexes, rusty, mingo, jamie

Ulrich Drepper <drepper@redhat.com> wrote:
>
> Andrew Morton wrote:
> 
>  > How large is the slowdown, and on what workloads?
> 
>  The fast path for all locking primitives etc in nptl today is entirely
>  at userlevel.  Normally just a single atomic operation with a dozen
>  other instructions.  With the fusyn stuff each and every locking
>  operation needs a system call to register/unregister the thread as it
>  locks/unlocks mutex/rwlocks/etc.  Go figure how well this works.  We are
>  talking about making the fast path of the locking primitives
>  two/three/four orders of magnitude more expensive.  And this for
>  absolutely no benefit for 99.999% of all the code which uses threads.
> 

ouch, OK.  But doesn't the current futex code continue to work unchanged?

>  > Passing the lock to a non-rt task when there's an rt-task waiting for it
>  > seems pretty poor form, too.
> 
>  No no, that's not what is wanted.  Robust mutexes are a special kind of
>  mutex and not related to rt issues.

I was referring to "scheduling-policy based unlock/wakeup", actually.

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05  7:40         ` Andrew Morton
@ 2004-08-05  8:22           ` Ulrich Drepper
  0 siblings, 0 replies; 22+ messages in thread
From: Ulrich Drepper @ 2004-08-05  8:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: inaky.perez-gonzalez, linux-kernel, robustmutexes, rusty, mingo, jamie

Andrew Morton wrote:
> But doesn't the current futex code continue to work unchanged?

If the patches don't touch the existing futex code there is no risk of
breaking anything.  Futexes and these new objects have nothing to do
with each other.


> I was referring to "scheduling-policy based unlock/wakeup", actually.

We don't have anything like this for futexes.  It's not impossible to
do; we just didn't do it because it was not of much interest to us at
that time.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-04  9:13 [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1 Perez-Gonzalez, Inaky
  2004-08-05  6:21 ` Andrew Morton
@ 2004-08-05 10:34 ` Ingo Molnar
  2004-08-05 10:59   ` Ingo Molnar
  1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2004-08-05 10:34 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: linux-kernel, robustmutexes, Andrew Morton, Ulrich Drepper


* Perez-Gonzalez, Inaky <inaky.perez-gonzalez@intel.com> wrote:

> Fusyn aims to provide primitives to solve a bunch of gaps in POSIX
> compliance related to mutexes, conditional variables and semaphores,
> POSIX Advanced real-time support as well as adding mutex robustness
> (to dying owners) and deep deadlock checking.

the sched.c bits look clean enough.

i like the generic concept - keeping the userspace fast-path for
lock/unlock, like for futexes, and registering/unregistering a lock via
the kernel.

but, couldnt there be more sharing between futex.c and fusyn.c? In
particular on the API side, why arent all these ops done as an extension
to sys_futex()? That would keep the glibc part much simpler (and more
compatible) as well. You'd still get all the glory of implementing true
priority inheritance and advanced RT-locking for Linux :-)

or are the two interfaces way too different?

	Ingo

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05  7:37       ` Ulrich Drepper
  2004-08-05  7:40         ` Andrew Morton
@ 2004-08-05 10:42         ` Ingo Molnar
  2004-08-05 11:48         ` Rusty Russell
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2004-08-05 10:42 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Andrew Morton, inaky.perez-gonzalez, linux-kernel, robustmutexes,
	rusty, jamie


* Ulrich Drepper <drepper@redhat.com> wrote:

> Andrew Morton wrote:
> 
> > How large is the slowdown, and on what workloads?
> 
> The fast path for all locking primitives etc in nptl today is entirely
> at userlevel.  Normally just a single atomic operation with a dozen
> other instructions.  With the fusyn stuff each and every locking
> operation needs a system call to register/unregister the thread as it
> locks/unlocks mutex/rwlocks/etc. [...]

actually, the way i understand the patch it is not that bad: normally
(in non-KCO mode) the fastpath locks/unlocks (uncontended locks) are
userspace-only. Non-KCO mode still gives all the priority guarantees. 
(There's also KCO mode for guaranteed-unlock-on-thread-death and for
broken architectures - but it doesnt have any real advantage other than
the slowdown.)

there's overhead in the wakeup handling and in the registration need for
non-KCO locks, but once things are up and running it should be quite
comparable to current futex costs - it's pure userspace.

so i think what would be nice is an extension of sys_futex() to
incorporate the fusyn primitives, and then a nice glibc patch to
introduce a robust mode for all the sync objects.

and if fusyn.c is fast enough then we could even try to do normal
futexes via fusyn.c - but not doing the registration/unregistration
(hence losing the priority guarantee, but still sharing much of the
codepath). This would be the most robust internal design i believe.

	Ingo

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05 10:34 ` Ingo Molnar
@ 2004-08-05 10:59   ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2004-08-05 10:59 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: linux-kernel, robustmutexes, Andrew Morton, Ulrich Drepper


* Ingo Molnar <mingo@elte.hu> wrote:

> but, couldnt there be more sharing between futex.c and fusyn.c? In
> particular on the API side, why arent all these ops done as an
> extension to sys_futex()? That would keep the glibc part much simpler
> (and more compatible) as well. [...]

i believe the key to integration of this feature is to try to make it
used by normal (non-RT) apps as much as possible. I.e. try to make
current futexes a subset of fusyn.c and to merge the two APIs if
possible (essentially renaming your fusyn.c to futex.c and implementing
the futex API). Is this possible without noticeable performance overhead
(and without too many special-cases)?

such an approach would ensure that key portions of the code would be
triggered by everyday apps. Developers wouldnt break the feature every
other day, etc. Deadlock detection and priority boosting might not be
tested this way, but the basic locking/waking/VM-keying mechanism sure
could be.

	Ingo

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05  7:37       ` Ulrich Drepper
  2004-08-05  7:40         ` Andrew Morton
  2004-08-05 10:42         ` Ingo Molnar
@ 2004-08-05 11:48         ` Rusty Russell
  2004-08-05 13:23           ` Linh Dang
  2004-08-05 13:26         ` Linh Dang
  2004-08-05 14:02         ` Chris Friesen
  4 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2004-08-05 11:48 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Andrew Morton, inaky.perez-gonzalez, lkml - Kernel Mailing List,
	robustmutexes, Ingo Molnar, jamie

On Thu, 2004-08-05 at 17:37, Ulrich Drepper wrote:
> Andrew Morton wrote:
> > Passing the lock to a non-rt task when there's an rt-task waiting for it
> > seems pretty poor form, too.
> 
> No no, that's not what is wanted.  Robust mutexes are a special kind of
> mutex and not related to rt issues.  Lockers of robust mutexes have to
> register with the kernel (i.e., the locking must actually be performed
> by the kernel) so that in case the thread goes away or the entire
> process dies, the mutex is unlocked and other waiters (other threads, in
> the same or other processes) can get the lock.

I don't think this is necessarily true: I think that platforms with
64-bit compare-and-exchange can do the whole thing in userspace.  They
would set the mutex and stamp in the thread ID simultaneously, allowing
for "dead thread" detection (ie. I didn't get the lock, and it's a
robust mutex: check the holder is still alive).

W/o 64-bit compare-and-exchange a 100% robust solution may not be
possible though.
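
Here is a minimal userspace sketch of the scheme Rusty describes. It assumes Linux and the GCC/Clang __atomic builtins. A single compare-and-exchange sets the lock flag and the owner's TID together in one 64-bit word, so a contender whose exchange fails always sees a consistent owner. The word layout, the kill(tid, 0) liveness probe and all function names are assumptions made for illustration. The sketch ignores TID reuse and sleeping on contention, and a real implementation would have to handle both.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical layout: bit 32 = locked flag, low 32 bits = owner TID. */
#define LOCKED ((uint64_t)1 << 32)

static pid_t my_tid(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/* Treat the TID as alive if signal 0 can be (or could be, EPERM) delivered. */
static int tid_alive(pid_t tid)
{
	return kill(tid, 0) == 0 || errno == EPERM;
}

/* One attempt: 0 if we took a free lock, EOWNERDEAD if we took it over
 * from a dead owner, EBUSY if a live task holds it. */
static int robust_trylock(uint64_t *word)
{
	uint64_t expected = 0;
	uint64_t mine = LOCKED | (uint32_t)my_tid();

	if (__atomic_compare_exchange_n(word, &expected, mine, 0,
					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
		return 0;
	/* Failed: 'expected' now holds the flag and owner TID, written together. */
	if (!tid_alive((pid_t)(uint32_t)expected) &&
	    __atomic_compare_exchange_n(word, &expected, mine, 0,
					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
		return EOWNERDEAD;
	return EBUSY;
}

static void robust_unlock(uint64_t *word)
{
	__atomic_store_n(word, 0, __ATOMIC_RELEASE);
}
```

The takeover also goes through a compare-and-exchange against the exact value observed. As a result, two contenders that both see a dead owner cannot both succeed.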

Thoughts?
Rusty.
-- 
Anyone who quotes me in their signature is an idiot -- Rusty Russell


* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05 11:48         ` Rusty Russell
@ 2004-08-05 13:23           ` Linh Dang
  0 siblings, 0 replies; 22+ messages in thread
From: Linh Dang @ 2004-08-05 13:23 UTC (permalink / raw)
  To: linux-kernel

Rusty Russell <rusty@rustcorp.com.au> wrote:

> I don't think this is necessarily true: I think that platforms with
> 64-bit compare-and-exchange can do the whole thing in userspace.
> They would set the mutex and stamp in the thread ID simultaneously,
> allowing for "dead thread" detection (ie. I didn't get the lock, and
> it's a robust mutex: check the holder is still alive).

Or for priority inheritance: try to get the lock; if that fails, raise
the holder's priority to mine if necessary.

>
> W/o 64-bit compare-and-exchange a 100% robust solution may not be
> possible though.

PPC arch can do a lot of things in a pseudo-atomic way.

>
> Thoughts?
> Rusty.

-- 
Linh Dang

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05  7:37       ` Ulrich Drepper
                           ` (2 preceding siblings ...)
  2004-08-05 11:48         ` Rusty Russell
@ 2004-08-05 13:26         ` Linh Dang
  2004-08-05 14:02         ` Chris Friesen
  4 siblings, 0 replies; 22+ messages in thread
From: Linh Dang @ 2004-08-05 13:26 UTC (permalink / raw)
  To: linux-kernel

Ulrich Drepper <drepper@redhat.com> wrote:
> The fast path for all locking primitives etc in nptl today is
> entirely at userlevel.  Normally just a single atomic operation with
> a dozen other instructions.  With the fusyn stuff each and every
> locking operation needs a system call to register/unregister the
> thread as it locks/unlocks mutex/rwlocks/etc.  Go figure how well
> this works.  We are talking about making the fast path of the
> locking primitives two/three/four orders of magnitude more
> expensive.  And this for absolutely no benefit for 99.999% of all
> the code which uses threads.
>

Is there an EFFICIENT way to add priority inheritance to futexes? The
lack of priority inheritance is the biggest headache for RT applications
running on top of NPTL/kernel-2.6. And there are a LOT more of us (RT
users who want to use NPTL/kernel-2.6) than you might think. I guess
we're just not vocal.

-- 
Linh Dang

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
  2004-08-05  7:37       ` Ulrich Drepper
                           ` (3 preceding siblings ...)
  2004-08-05 13:26         ` Linh Dang
@ 2004-08-05 14:02         ` Chris Friesen
  4 siblings, 0 replies; 22+ messages in thread
From: Chris Friesen @ 2004-08-05 14:02 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Andrew Morton, inaky.perez-gonzalez, linux-kernel, robustmutexes,
	rusty, mingo, jamie

Ulrich Drepper wrote:
> Andrew Morton wrote:
> 
> 
>>How large is the slowdown, and on what workloads?
> 
> 
> The fast path for all locking primitives etc in nptl today is entirely
> at userlevel.  Normally just a single atomic operation with a dozen
> other instructions.  With the fusyn stuff each and every locking
> operation needs a system call to register/unregister the thread as it
> locks/unlocks mutex/rwlocks/etc.  Go figure how well this works.  We are
> talking about making the fast path of the locking primitives
> two/three/four orders of magnitude more expensive.  And this for
> absolutely no benefit for 99.999% of all the code which uses threads.

Just a small clarification.  (Rusty already touched on this briefly, but I think 
he made a mistake.)

If the arch has atomic compare-and-exchange, then the non-contended case is 
entirely userspace and no syscall is needed.  I don't think that the cmpxchg 
need be 64-bit.  From the OLS 2004 talk:

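int vfulock_lock(volatile unsigned *vfulock, unsigned flags, unsigned pid,
		 const struct timespec *timeout) {
	unsigned old = VFULOCK_UNLOCKED;
	/* Uncontended: swap UNLOCKED -> pid in userspace, no syscall. */
	if (cmpxchg(vfulock, old, pid) == old)
		return 0;
	/* Contended: let the kernel queue us on the lock. */
	return SYSCALL(ufulock_lock, 3, vfulock, flags, timeout);
}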

That looks like a 32-bit cmpxchg to me.

Also, Inaky reported general operation about 10% slower than NPTL, but said that 
he wanted to fix that if possible.

Chris

* RE: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
@ 2004-08-05 18:39 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 22+ messages in thread
From: Perez-Gonzalez, Inaky @ 2004-08-05 18:39 UTC (permalink / raw)
  To: Linh Dang, linux-kernel

> From: Linh Dang
> 
> Ulrich Drepper <drepper@redhat.com> wrote:
> > The fast path for all locking primitives etc in nptl today is
> > entirely at userlevel.  Normally just a single atomic operation with
> > a dozen other instructions.  With the fusyn stuff each and every
> > locking operation needs a system call to register/unregister the
> > thread as it locks/unlocks mutex/rwlocks/etc.  Go figure how well
> > this works.  We are talking about making the fast path of the
> > locking primitives two/three/four orders of magnitude more
> > expensive.  And this for absolutely no benefit for 99.999% of all
> > the code which uses threads.
> >
> 
> Is there an EFFICIENT way to add priority inheritance to futexes? The
> lack of priority inheritance is the biggest headache for RT applications
> running on top of NPTL/kernel-2.6. And there are a LOT more of us (RT
> users who want to use NPTL/kernel-2.6) than you might think. I guess
> we're just not vocal.

No.

You need the concept of ownership (who do I have to boost?). You need
spinlocks to be able to traverse the who-is-waiting-for-whom trees
(you might have three guys trying to lock from three different CPUs,
plus other guys preempting in the middle, and you need to protect those
trees).
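
A single-threaded model helps show why ownership matters. To boost anyone you must know who owns the lock. Following the owner -> blocked_on -> owner links is also how deep deadlock detection works. The structures and names below are hypothetical and are not fusyn's. As with the kernel's internal prio, a lower number means a higher priority. The model deliberately leaves out the hard part mentioned above, the locking that protects these chains across CPUs and preemption. It also omits undoing the boost on unlock.

```c
#include <stddef.h>

struct lock;

struct task {
	int base_prio;		 /* assigned priority; a deboost would restore this */
	int prio;		 /* effective, possibly boosted, priority */
	struct lock *blocked_on; /* lock this task waits for, or NULL */
};

struct lock {
	struct task *owner;	 /* ownership: who to boost */
};

static struct task *next_in_chain(struct task *t)
{
	return t->blocked_on ? t->blocked_on->owner : NULL;
}

/* 'waiter' blocks on 'l'. Returns -1 (and does not block) if the owner
 * chain leads back to the waiter, i.e. a deadlock; otherwise boosts every
 * owner along the chain to the waiter's priority and returns 0. */
static int pi_block_on(struct task *waiter, struct lock *l)
{
	struct task *t;

	for (t = l->owner; t; t = next_in_chain(t))
		if (t == waiter)
			return -1;
	waiter->blocked_on = l;
	for (t = l->owner; t; t = next_in_chain(t))
		if (waiter->prio < t->prio)
			t->prio = waiter->prio;
	return 0;
}
```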

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)

* RE: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
@ 2004-08-05 18:39 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 22+ messages in thread
From: Perez-Gonzalez, Inaky @ 2004-08-05 18:39 UTC (permalink / raw)
  To: Ingo Molnar, Ulrich Drepper
  Cc: Andrew Morton, linux-kernel, robustmutexes, rusty, jamie

> From: Ingo Molnar [mailto:mingo@elte.hu]
>
> and if fusyn.c is fast enough then we could even try to do normal
> futexes via fusyn.c - but not doing the registration/unregistration
> (hence losing the priority guarantee, but still sharing much of the
> codepath). This would be the most robust internal design i believe.

The priority guarantee or the robustness guarantee? I am guessing you
meant the second; you will always get the priority-based wakeup.

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)

* RE: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
@ 2004-08-05 18:37 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 22+ messages in thread
From: Perez-Gonzalez, Inaky @ 2004-08-05 18:37 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, robustmutexes, Andrew Morton, Ulrich Drepper

> From: Ingo Molnar [mailto:mingo@elte.hu]
> 
> * Perez-Gonzalez, Inaky <inaky.perez-gonzalez@intel.com> wrote:
> 
> > Fusyn aims to provide primitives to solve a bunch of gaps in POSIX
> > compliance related to mutexes, conditional variables and semaphores,
> > POSIX Advanced real-time support as well as adding mutex robustness
> > (to dying owners) and deep deadlock checking.
> 
> the sched.c bits look clean enough.

The bit that scares me is the hook into effective_prio() in the
scheduler fast-path for the prio boost. It is a minimal op, but I know
how cautious you guys are about poking there.

> but, couldnt there be more sharing between futex.c and fusyn.c? In
> particular on the API side, why arent all these ops done as an extension
> to sys_futex()? 

That's what I did initially and many people barked at me because it made
sys_futex even uglier. Somebody [I can't remember the thread and I cannot
find it] said that sys_futex() should have been split into a number of syscalls
from the beginning, not multiplexing--and that they should do that in 2.7
to ease up a transition.

As well, I need to take different arguments [flags for the fulocks, for example].
For the sys_ufuqueue_*(), it makes sense to do a redirection for simplification,
emulating the sys_futex() thingies, but for fulocks, they are completely 
different beasts. It makes little sense: it is a locking interface, not a
waitqueue interface.

> That would keep the glibc part much simpler (and more
> compatible) as well. You'd still get all the glory of implementing true

I'll work on the sys_futex() redirection to ufuqueues during today [let's
see if I can get something done before taking off on vacation] and as soon
as I come back, but for the ufulocks...the interface is too different. I
think it makes more sense to clean up the glibc implementation, make a truly
layered set of calls redirecting the lll_ stuff at compile time [and run
time where needed/desired] that would allow the user to select what he wants
to do (when in the know) and default to the best combination for the general
public.

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)

* RE: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
@ 2004-08-05 18:22 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 22+ messages in thread
From: Perez-Gonzalez, Inaky @ 2004-08-05 18:22 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, robustmutexes, Andrew Morton, Ulrich Drepper



Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)


> -----Original Message-----
> From: Ingo Molnar [mailto:mingo@elte.hu]
> Sent: Thursday, August 05, 2004 3:59 AM
> To: Perez-Gonzalez, Inaky
> Cc: linux-kernel@vger.kernel.org; robustmutexes@lists.osdl.org; Andrew Morton; Ulrich Drepper
> Subject: Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
> 
> 
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > but, couldnt there be more sharing between futex.c and fusyn.c? In
> > particular on the API side, why arent all these ops done as an
> > extension to sys_futex()? That would keep the glibc part much simpler
> > (and more compatible) as well. [...]
> 
> i believe the key to integration of this feature is to try to make it
> used by normal (non-RT) apps as much as possible. I.e. try to make
> current futexes a subset of fusyn.c and to merge the two APIs if
> possible (essentially renaming your fusyn.c to futex.c and implementing
> the futex API). Is this possible without noticeable performance overhead
> (and without too many special-cases)?

I mentioned it in some other answer... I think. Never mind. One of the fusyn
layers (ufuqueue) can emulate futexes completely (except for a few extra
errno codes, the scheduling-policy-based wakeup, the missing requeue
[easy to do] and FUTEX_FD, which only NGPT uses, afaik).

The interface is now through three system calls (sys_ufuqueue_{wait,wake,ctl}),
but it should be easy to redirect sys_futex().

> such an approach would ensure that key portions of the code would be
> triggered by everyday apps. Developers wouldnt break the feature every
> other day, etc. Deadlock detection and priority boosting might not be
> tested this way, but the basic locking/waking/VM-keying mechanism sure
> could be.

That makes sense. Performance-wise, the overhead would come only from the
extra spinlocks we take. I'll work on that redirection layer; I am going
on vacation tonight, but it should be ready in a couple of days, as soon
as I come back.

* RE: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
@ 2004-08-05 18:16 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 22+ messages in thread
From: Perez-Gonzalez, Inaky @ 2004-08-05 18:16 UTC (permalink / raw)
  To: Andrew Morton, Ulrich Drepper
  Cc: linux-kernel, robustmutexes, rusty, mingo, jamie

> From: Andrew Morton [mailto:akpm@osdl.org]
> 
> Ulrich Drepper <drepper@redhat.com> wrote:
> >
> > Andrew Morton wrote:
> >
> > > This fixes what appear to be some fairly significant shortcomings.  What do
> > > the futex and NPTL people have to say about the gravity of the problems
> > > which this solves, and the offered implementation?
> >
> > This code will not be supported by the glibc code.  Using these
> > primitives would mean significant slowdown of all operations and this
> > for problems which only a few people have.
> 
> How large is the slowdown, and on what workloads?

10% on VolanoMark, as well as in some other conditional variable stress
tests we have been conducting. For the conditional variable one we have
an initial proof-of-concept optimization that we'll try in a couple of weeks
[I am going on vacation next week].

> >  I asked to get the useful
> > parts of the code to be made available using the current futex interface
> > (robust mutexes are useful)
> 
> Passing the lock to a non-rt task when there's an rt-task waiting for it
> seems pretty poor form, too.

???? That never happens in fusyn [unless there is a bug]. When the lock
is passed on (or a waiter is woken up), the next task to get it is
always the highest-priority one.
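To illustrate that hand-off rule, here is a minimal user-space sketch, not the fusyn code. It assumes each waiter is recorded with a numeric priority where a larger value means more urgent, plus an arrival sequence number; `struct waiter` and `pick_next_owner()` are hypothetical names. On unlock it picks the highest-priority waiter and breaks ties in FIFO order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter record: larger prio = more urgent, seq = arrival order. */
struct waiter {
	int prio;
	unsigned long seq;
};

/*
 * Return the index of the waiter that should receive the lock next:
 * the highest priority wins; among equal priorities, the earliest
 * arrival (lowest seq) wins. Returns -1 when there are no waiters.
 */
static int pick_next_owner(const struct waiter *w, size_t n)
{
	int best = -1;

	assert(w != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (best < 0 ||
		    w[i].prio > w[best].prio ||
		    (w[i].prio == w[best].prio && w[i].seq < w[best].seq))
			best = (int)i;
	}
	return best;
}
```

A real implementation would more likely keep the wait list sorted by priority so the pick is O(1); the linear scan only keeps the sketch short.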

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)

* RE: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
@ 2004-08-05 18:16 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 22+ messages in thread
From: Perez-Gonzalez, Inaky @ 2004-08-05 18:16 UTC (permalink / raw)
  To: Ulrich Drepper, Andrew Morton
  Cc: linux-kernel, robustmutexes, Rusty Russell, Ingo Molnar, Jamie Lokier

> From: Ulrich Drepper [mailto:drepper@redhat.com]
>
> Andrew Morton wrote:
> 
> > This fixes what appear to be some fairly significant shortcomings.  What do
> > the futex and NPTL people have to say about the gravity of the problems
> > which this solves, and the offered implementation?
> 
> This code will not be supported by the glibc code.  Using these
> primitives would mean significant slowdown of all operations and this
> for problems which only a few people have.  I asked to get the useful
> parts of the code to be made available using the current futex interface
> (robust mutexes are useful) but Inaky and the rest never acted on this
> and instead invented this completely incompatible interface.  IMO this
> code should not go into the mainstream kernel.  Let those who want to do
> realtime work bear the costs.

But I have told you many times why it is not possible, and you keep
ignoring those reasons.

Read the paper, read the slides. In a nutshell [again]: robustness
requires a concept of ownership, and the kernel has to know about it.
Futexes are just wait queues, and because they are used all over the
code in glibc and other apps, the ownership concept doesn't work with
them.

[The same thing applies to the advanced real-time features, priority
inheritance and priority protection, as they require the kernel to know
about ownership.]
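As a rough illustration of why ownership matters for priority inheritance, the sketch below assumes a lock object that records its owner (unlike a bare futex word) and shows the boost step taken when a waiter blocks. `struct task`, `struct owned_lock`, `pi_block()` and `pi_release()` are hypothetical names. Chained (transitive) boosting and owners holding several PI locks are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task: larger prio = more urgent. */
struct task {
	int base_prio;   /* priority set by the scheduling policy */
	int eff_prio;    /* priority actually used, possibly boosted */
};

/* Hypothetical owned lock: unlike a bare futex word, it records its owner. */
struct owned_lock {
	struct task *owner;     /* NULL when unlocked */
	int top_waiter_prio;    /* highest priority among blocked waiters */
	int nr_waiters;
};

/*
 * Priority-inheritance step when 'waiter' blocks on 'l': record it and
 * raise the owner's effective priority to at least the top waiter's.
 * Without l->owner there would be nobody to boost.
 */
static void pi_block(struct owned_lock *l, const struct task *waiter)
{
	assert(l->owner != NULL);  /* blocking only happens on a held lock */
	if (l->nr_waiters == 0 || waiter->eff_prio > l->top_waiter_prio)
		l->top_waiter_prio = waiter->eff_prio;
	l->nr_waiters++;
	if (l->top_waiter_prio > l->owner->eff_prio)
		l->owner->eff_prio = l->top_waiter_prio;
}

/* On release the owner drops back to its base priority (assumes it holds
 * no other PI lock); choosing the next owner is not shown here. */
static void pi_release(struct owned_lock *l)
{
	l->owner->eff_prio = l->owner->base_prio;
	l->owner = NULL;
}
```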

With the current futex interface it cannot be done; I tried, and it
resulted in a mess [look up the rtfutex patches I sent last year].

On top of that, many people have said that the sys_futex() multiplexing
should have been unfolded a long time ago.

The fusyn patch provides exactly the same capabilities as futexes
through the fuqueues [except for the requeue bits and FUTEX_FD, which
should be quick to implement], albeit via different system calls.
It should not be hard to redirect sys_futex() to the sys_ufuqueue_*()
calls.

As people have pointed out already, there is a fast path; the slowdown
we see is caused by in-kernel overhead, and we are looking into that,
because even 10% is not acceptable.

[For the record, I believe it is caused by our need to use IRQ-safe
locks, plus a few more spinlocks in total, to do the lock/unlock
operations in the kernel; this is what we are profiling now.]

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)

* RE: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
@ 2004-08-05 18:16 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 22+ messages in thread
From: Perez-Gonzalez, Inaky @ 2004-08-05 18:16 UTC (permalink / raw)
  To: Ulrich Drepper, Andrew Morton
  Cc: linux-kernel, robustmutexes, rusty, mingo, jamie

> From: Ulrich Drepper [mailto:drepper@redhat.com]


> Andrew Morton wrote:
> 
> > How large is the slowdown, and on what workloads?
> 
> The fast path for all locking primitives etc in nptl today is entirely
> at userlevel.  Normally just a single atomic operation with a dozen
> other instructions.  With the fusyn stuff each and every locking
> operation needs a system call to register/unregister the thread as it
> locks/unlocks mutex/rwlocks/etc.  Go figure how well this works.  We are
> talking about making the fast path of the locking primitives
> two/three/four orders of magnitude more expensive.  And this for
> absolutely no benefit for 99.999% of all the code which uses threads.

Just for the record, Ulrich: this is not correct. There is a fast path,
and you only need to go into the kernel if

a) there is contention, or
b) your architecture doesn't have an atomic cmpxchg to implement the fast path.

Even in case b), if you want a fast path, you can use the ufuqueue calls
the way you would use futexes to build one, losing the robustness and
advanced real-time features [you still get scheduling-policy-based
unlock/wakeup].
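The cmpxchg fast path from case a) can be sketched in C11 as follows. This is an illustration under assumptions, not the vfulock code: the lock word holds 0 when unlocked or the owner's id when locked, and `ULOCK_WAITERS` is a hypothetical flag the kernel side would set when it queues a waiter. `ulock_t` and the function names are likewise hypothetical. A false return means "take the kernel slow path", which is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag the kernel would set once waiters are queued. */
#define ULOCK_WAITERS 0x80000000u

/* User-space lock word: 0 = unlocked, otherwise the owner's id. */
typedef struct {
	atomic_uint word;
} ulock_t;

/* Uncontended acquisition: a single compare-and-swap, no system call. */
static bool ulock_trylock_fast(ulock_t *l, unsigned int self_id)
{
	unsigned int expected = 0;

	assert(self_id != 0 && (self_id & ULOCK_WAITERS) == 0);
	return atomic_compare_exchange_strong(&l->word, &expected, self_id);
}

/* Uncontended release: succeeds only if we own the lock and no waiter
 * flag is set; otherwise the kernel must hand the lock over. */
static bool ulock_unlock_fast(ulock_t *l, unsigned int self_id)
{
	unsigned int expected = self_id;

	return atomic_compare_exchange_strong(&l->word, &expected, 0);
}
```

Storing the owner's id instead of a plain 0/1 flag is what gives the kernel something to look up once contention forces it to get involved.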

If one of those (b) arches wants robustness and the rest, it _needs_
to always go through the kernel [slow].

Of course you don't want to penalize normal users, so it is easy to
have a simple switch in user space that does it one way or the other
depending on the attributes of the mutex the user selects [we haven't
coded a proof of concept for this yet, but we want to do it this fall].
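Since that switch has not been coded yet, the following is only a guess at its shape: the locking path is chosen once, from the mutex attributes and whether the architecture has an atomic cmpxchg. The flag names, the `lock_path` enum and `choose_lock_path()` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-mutex attribute flags chosen at init time. */
enum mutex_kind_flags {
	MK_ROBUST   = 1 << 0,  /* survive owner death */
	MK_PRIO_INH = 1 << 1,  /* priority inheritance */
};

enum lock_path {
	PATH_USER_FAST,         /* futex-style: atomic op, kernel only on contention */
	PATH_FAST_THEN_KERNEL,  /* owner-aware cmpxchg fast path, kernel on contention */
	PATH_KERNEL_ALWAYS,     /* every lock/unlock goes through the kernel */
};

/* Plain mutexes never pay for robustness or RT features; robust/RT
 * mutexes keep a fast path only where cmpxchg is available. */
static enum lock_path choose_lock_path(unsigned int flags, bool arch_has_cmpxchg)
{
	if ((flags & (MK_ROBUST | MK_PRIO_INH)) == 0)
		return PATH_USER_FAST;
	return arch_has_cmpxchg ? PATH_FAST_THEN_KERNEL : PATH_KERNEL_ALWAYS;
}
```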


> > Passing the lock to a non-rt task when there's an rt-task waiting for it
> > seems pretty poor form, too.
> 
> No no, that's not what is wanted.  Robust mutexes are a special kind of
> mutex and not related to rt issues.  Lockers of robust mutexes have to
> register with the kernel (i.e., the locking must actually be performed
> by the kernel) so that in case the thread goes away or the entire

Small point here: this only happens when there are already waiters in
the kernel. If there is no contention, the locker fast-locks in user
space using only its PID [this is weak; read the paper for some
solutions to avoid PID-reuse collisions]. If the owner dies, the next
task that tries to lock sees the user-space word (vfulock) locked and
goes to the kernel. The kernel looks it up and, if the PID doesn't
exist, declares the mutex dead, assigns its ownership to 'current' and
returns to user space.
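The dead-owner check can be sketched in user space as below. It assumes the lock word simply holds the owner's PID; `pid_exists()` uses the real `kill(pid, 0)` existence probe (EPERM still means the process exists), while `robust_slow_lock()` is a hypothetical name, and borrowing EOWNERDEAD mirrors the POSIX robust-mutex convention rather than anything fusyn is confirmed to return. As noted above, PID reuse makes the liveness test weak.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>

/* Does a process with this PID exist? Signal 0 performs only the check. */
static bool pid_exists(pid_t pid)
{
	assert(pid > 0);  /* pid <= 0 would address process groups */
	return kill(pid, 0) == 0 || errno == EPERM;
}

/*
 * Decision taken after the fast path found the word holding a PID.
 * Returns EOWNERDEAD if the owner is gone and ownership moved to 'self'
 * (the caller must repair the protected data), or EBUSY if the owner is
 * alive and the caller has to queue and wait.
 */
static int robust_slow_lock(pid_t *word, pid_t self, bool owner_alive)
{
	assert(*word != 0);  /* an unlocked word means the fast path wins */
	if (!owner_alive) {
		/* A real implementation does this atomically, under the
		 * kernel's wait-queue lock. */
		*word = self;
		return EOWNERDEAD;
	}
	return EBUSY;
}
```

A caller would combine the two as `robust_slow_lock(&word, getpid(), pid_exists(word))`.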

 
> ... This is very useful for
> normal operations where mutexes are used inter-process.  This is the
> part which is independent from rt but it also must not be the default
> mode (i.e., normal pthread_mutex_t code must not be replaced) since it
> is significantly slower.

So even in the robust case there is a fast path; it is not significantly
slower.

> The rest of the extensions like all the priority handling is not of
> general interest.  POSIX describes how a thread's priority would be
> temporarily raised if it holds a mutex which has a higher-priority
> waiter.  But this is all functionality of a realtime profile and widely
> not part of the normal implementation.

This is not what I am hearing from embedded and enterprise guys. I just
wish they were more vocal instead of expressing themselves only in
private mail; I understand you need proof, though.

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own (and my fault)

* Re: [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1
@ 2004-08-05  8:39 Eric Valette
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Valette @ 2004-08-05  8:39 UTC (permalink / raw)
  To: drepper; +Cc: Andrew Morton, Linux Kernel Mailing List

Ulrich Drepper wrote:

>> How large is the slowdown, and on what workloads?
> 
> The fast path for all locking primitives etc in nptl today is entirely
> at userlevel. Normally just a single atomic operation with a dozen
> other instructions. With the fusyn stuff each and every locking
> operation needs a system call to register/unregister the thread as it
> locks/unlocks mutex/rwlocks/etc. Go figure how well this works. We are
> talking about making the fast path of the locking primitives
> two/three/four orders of magnitude more expensive. And this for
> absolutely no benefit for 99.999% of all the code which uses threads.

Frankly, there is no way to avoid calling the kernel to do anything
useful with respect to scheduling priorities, and of course it costs
more than a user-space-only approach. But correctly programming RT
applications when priority inversion due to locking can occur is almost
_impossible_. So most RT systems (including Open Source ones) have two
kinds of primitives: the priority-aware ones (e.g. Jaluna's
mutex/semaphore (hint hint) or RTEMS's various kinds of semaphores) and
the user-level-only ones. Most of the time people are ready to pay the
price for determinism.

>> Passing the lock to a non-rt task when there's an rt-task waiting for it
>> seems pretty poor form, too.

Exactly.

> No no, that's not what is wanted. Robust mutexes are a special kind of
> mutex and not related to rt issues. Lockers of robust mutexes have to
> register with the kernel (i.e., the locking must actually be performed
> by the kernel) so that in case the thread goes away or the entire
> process dies, the mutex is unlocked and other waiters (other threads, in
> the same or other processes) can get the lock. This is very useful for
> normal operations where mutexes are used inter-process. This is the
> part which is independent from rt but it also must not be the default
> mode (i.e., normal pthread_mutex_t code must not be replaced) since it
> is significantly slower.

Robust mutexes could then also be used for dealing with priority
inversion, handling thread priorities when dequeuing, and so on. But if
you cannot access this functionality without leaving the POSIX API,
that's a pity.

> The rest of the extensions like all the priority handling is not of
> general interest. POSIX describes how a thread's priority would be
> temporarily raised if it holds a mutex which has a higher-priority
> waiter. But this is all functionality of a realtime profile and widely
> not part of the normal implementation.

I guess you do not read LinuxDevices often enough: Linux is becoming a
major player in the embedded marketplace, and RT behavior is important
there. Given the work that has already gone into reducing scheduling
latency, and that continues with the voluntary-preemption patch trial,
I think it is time to at least make applications able to benefit from
these enhancements.

Question for Andrew: I have seen the IRQ handler -> IRQ thread handler
conversion patch, and from experience I think it will go nowhere, but I
wonder why nobody has proposed a way to define logical interrupt
priorities (e.g. by applying a mask on the 8259 rather than just masking
the current interrupt). More details at
<http://www.rtems.org/cgi-bin/viewcvs.cgi/rtems/c/src/lib/libbsp/i386/shared/irq/>


Defining a generic API is very complicated and has been given up by
almost every RTOS vendor, but defining priorities among interrupts is
important, and threads are simply too costly for interrupt-driven
applications :-)
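The masking idea can be sketched as computing, for the IRQ being serviced, the mask to load into the 8259 IMRs (bit set = line masked): every line whose logical priority is not strictly higher than the current one gets masked. This is loosely modeled on the approach behind the RTEMS link. The priority table and `irq_level_mask()` are hypothetical, and the port writes (low byte to the master IMR at 0x21, high byte to the slave at 0xA1) are left out.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PIC_IRQS 16
#define CASCADE_IRQ 2  /* the slave 8259 is chained to master input 2 */

/*
 * Mask to install while servicing 'irq': lines whose logical priority is
 * <= prio[irq] are masked, on top of base_mask, so only strictly
 * higher-priority interrupts can preempt the handler.
 */
static uint16_t irq_level_mask(const uint8_t prio[NR_PIC_IRQS],
                               unsigned int irq, uint16_t base_mask)
{
	uint16_t mask = base_mask;

	assert(irq < NR_PIC_IRQS);
	for (unsigned int i = 0; i < NR_PIC_IRQS; i++)
		if (prio[i] <= prio[irq])
			mask |= (uint16_t)(1u << i);

	/* Slave lines reach the CPU only through the cascade input, so keep
	 * it open while any slave line is unmasked. */
	if ((mask & 0xFF00u) != 0xFF00u)
		mask &= (uint16_t)~(1u << CASCADE_IRQ);
	return mask;
}
```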

-- 
    __
   /  `                   	Eric Valette
  /--   __  o _.          	6 rue Paul Le Flem
(___, / (_(_(__         	35740 Pace

Tel: +33 (0)2 99 85 26 76	Fax: +33 (0)2 99 85 26 76
E-mail: eric.valette@free.fr




end of thread, other threads:[~2004-08-05 19:14 UTC | newest]

Thread overview: 22+ messages
2004-08-04  9:13 [RFC/PATCH] FUSYN Realtime & robust mutexes for Linux, v2.3.1 Perez-Gonzalez, Inaky
2004-08-05  6:21 ` Andrew Morton
2004-08-05  7:06   ` Ulrich Drepper
2004-08-05  7:17     ` Andrew Morton
2004-08-05  7:37       ` Ulrich Drepper
2004-08-05  7:40         ` Andrew Morton
2004-08-05  8:22           ` Ulrich Drepper
2004-08-05 10:42         ` Ingo Molnar
2004-08-05 11:48         ` Rusty Russell
2004-08-05 13:23           ` Linh Dang
2004-08-05 13:26         ` Linh Dang
2004-08-05 14:02         ` Chris Friesen
2004-08-05 10:34 ` Ingo Molnar
2004-08-05 10:59   ` Ingo Molnar
2004-08-05  8:39 Eric Valette
2004-08-05 18:16 Perez-Gonzalez, Inaky
2004-08-05 18:16 Perez-Gonzalez, Inaky
2004-08-05 18:16 Perez-Gonzalez, Inaky
2004-08-05 18:22 Perez-Gonzalez, Inaky
2004-08-05 18:37 Perez-Gonzalez, Inaky
2004-08-05 18:39 Perez-Gonzalez, Inaky
2004-08-05 18:39 Perez-Gonzalez, Inaky
