* Re: futex(2) man page update help request
From: Thomas Gleixner @ 2014-05-15 14:14 UTC
To: Michael Kerrisk (man-pages)
Cc: Carlos O'Donell, Darren Hart, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Davidlohr Bueso,
Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API
On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
> And that universe would love to have your documentation of
> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
I give you almost the full treatment, but I leave REQUEUE_PI to Darren
and FUTEX_WAKE_OP to Jakub. :)
FUTEX_WAIT
< Existing blurb seems ok >
Related return values
[EFAULT] Kernel was unable to access the futex value at uaddr.
[EINVAL] The supplied uaddr argument does not point to a valid
object, i.e. pointer is not 4 byte aligned
[EINVAL] The supplied timeout argument is not normalized.
[EWOULDBLOCK] The atomic enqueueing failed. User space value
at uaddr is not equal to the val argument.
[ETIMEDOUT] timeout expired
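
A minimal sketch of how user space might see these FUTEX_WAIT return values, assuming Linux with the raw futex syscall reached through syscall(2). The helper names futex_wait() and futex_wait_errno() are hypothetical and only serve the illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: block while *uaddr == val, for at most rel_timeout
 * (NULL means no timeout). Returns 0 on wakeup, or -1 with errno set. */
static int futex_wait(uint32_t *uaddr, uint32_t val,
                      const struct timespec *rel_timeout)
{
    return (int)syscall(SYS_futex, uaddr, FUTEX_WAIT, val,
                        rel_timeout, NULL, 0);
}

/* Hypothetical helper: run one wait with a timeout given in nanoseconds
 * and report the errno it failed with (0 if it returned after a wakeup). */
static int futex_wait_errno(uint32_t *uaddr, uint32_t val, long timeout_ns)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = timeout_ns };

    if (futex_wait(uaddr, val, &ts) == 0)
        return 0;
    return errno;
}
```

On Linux EWOULDBLOCK and EAGAIN are the same value, so a caller that loops on a value mismatch can test for either name.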
FUTEX_WAKE
< Existing blurb seems ok >
Related return values
[EFAULT] Kernel was unable to access the futex value at uaddr.
[EINVAL] The supplied uaddr argument does not point to a valid
object, i.e. pointer is not 4 byte aligned
[EINVAL] The kernel detected inconsistent state between the
user space state at uaddr and the kernel state,
i.e. it detected a waiter which waits in
FUTEX_LOCK_PI
FUTEX_REQUEUE
Existing blurb seems ok, except for this:
The argument val contains the number of waiters on uaddr which
are immediately woken up.
The timeout argument is abused to transport the number of
waiters which are requeued to the futex at uaddr2. The pointer
is typecast to u32.
[EFAULT] Kernel was unable to access the futex value at uaddr or uaddr2
[EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
valid object, i.e. pointer is not 4 byte aligned
[EINVAL] The kernel detected inconsistent state between the
user space state at uaddr and the kernel state,
i.e. it detected a waiter which waits in
FUTEX_LOCK_PI on uaddr
[EINVAL] uaddr equals uaddr2. Requeue to same futex.
FUTEX_CMP_REQUEUE
Existing blurb seems ok, except for this:
The argument val contains the number of waiters on uaddr
which are immediately woken up.
The timeout argument is abused to transport the number of
waiters which are requeued to the futex at uaddr2. The pointer
is typecast to u32.
Related return values
[EFAULT] Kernel was unable to access the futex value at uaddr or uaddr2
[EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
valid object, i.e. pointer is not 4 byte aligned
[EINVAL] uaddr equals uaddr2. Requeue to same futex.
[EINVAL] The kernel detected inconsistent state between the
user space state at uaddr and the kernel state,
i.e. it detected a waiter which waits in
FUTEX_LOCK_PI on uaddr
[EAGAIN] The value read from uaddr is not equal to the compare
value in argument val3
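
To make the argument abuse concrete, here is a hedged sketch of calling the compare-and-requeue op (FUTEX_CMP_REQUEUE in <linux/futex.h>) through syscall(2). The wrapper name futex_cmp_requeue() and its parameter names are hypothetical. The requeue count travels in the slot that is otherwise the timeout pointer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: wake up to nr_wake waiters on uaddr and move up to
 * nr_requeue of the remaining ones to uaddr2, but only if *uaddr still
 * equals expected. Returns the syscall result, or -1 with errno set. */
static long futex_cmp_requeue(uint32_t *uaddr, uint32_t *uaddr2,
                              int nr_wake, uint32_t nr_requeue,
                              uint32_t expected)
{
    /* The fourth argument is declared as a timeout pointer, but for the
     * requeue ops the kernel reads it as an integer count. */
    return syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE, nr_wake,
                   (void *)(uintptr_t)nr_requeue, uaddr2, expected);
}
```

A caller that gets EAGAIN would typically re-read the value at uaddr and decide whether the requeue is still wanted.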
FUTEX_WAKE_OP
Jakub, can you please explain it? I'm lost :)
The argument val contains the maximum number of waiters on
uaddr which are immediately woken up.
The timeout argument is abused to transport the maximum
number of waiters on uaddr2 which are woken up. The pointer
is typecast to u32.
Related return values
[EFAULT] Kernel was unable to access the futex values at uaddr
or uaddr2
[EINVAL] The supplied uaddr or uaddr2 argument does not point
to a valid object, i.e. pointer is not 4 byte aligned
[EINVAL] The kernel detected inconsistent state between the
user space state at uaddr and the kernel state,
i.e. it detected a waiter which waits in
FUTEX_LOCK_PI on uaddr
FUTEX_WAIT_BITSET
The same as FUTEX_WAIT except that val3 is used to provide a
32-bit bitset to the kernel. This bitset is stored in the
kernel internal state of the waiter.
This futex op also allows the option bit FUTEX_CLOCK_REALTIME
to be set.
Related return values
[EFAULT] Kernel was unable to access the futex value at uaddr.
[EINVAL] The supplied uaddr argument does not point to a valid
object, i.e. pointer is not 4 byte aligned
[EINVAL] The supplied bitset is zero.
[EINVAL] The supplied timeout argument is not normalized.
[ETIMEDOUT] timeout expired
FUTEX_WAKE_BITSET
The same as FUTEX_WAKE except that val3 is used to provide a
32-bit bitset to the kernel. This bitset is used to select
waiters on the futex. The selection is done by a bitwise AND
of the bitset supplied by the wake side and the bitset which is
stored in the kernel internal state of each waiter. If the
result is nonzero, the waiter is woken, otherwise it is left
waiting.
[EFAULT] Kernel was unable to access the futex value at uaddr.
[EINVAL] The supplied uaddr argument does not point to a valid
object, i.e. pointer is not 4 byte aligned
[EINVAL] The supplied bitset is zero.
[EINVAL] The kernel detected inconsistent state between the
user space state at uaddr and the kernel state,
i.e. it detected a waiter which waits in
FUTEX_LOCK_PI
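
The bitset selection described above can be sketched as follows. This is only an illustration: futex_wait_bitset(), futex_wake_bitset() and bitset_selects() are hypothetical helper names, the wait is issued without a timeout, and bitset_selects() merely mirrors the bitwise AND rule in user space rather than being anything the kernel exports.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: wait on uaddr while *uaddr == val, registering
 * bitset with the kernel. No timeout is used in this sketch. */
static int futex_wait_bitset(uint32_t *uaddr, uint32_t val, uint32_t bitset)
{
    return (int)syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET, val,
                        NULL, NULL, bitset);
}

/* Hypothetical helper: wake up to nr_wake waiters on uaddr whose stored
 * bitset has at least one bit in common with bitset. */
static int futex_wake_bitset(uint32_t *uaddr, int nr_wake, uint32_t bitset)
{
    return (int)syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET, nr_wake,
                        NULL, NULL, bitset);
}

/* User space mirror of the selection rule: a waiter is selected when the
 * AND of its bitset and the wake side bitset is nonzero. */
static int bitset_selects(uint32_t waiter_bitset, uint32_t wake_bitset)
{
    return (waiter_bitset & wake_bitset) != 0;
}
```

One possible use is to give readers and writers of a lock different bits, so that a wake with only the writer bit leaves the readers waiting.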
FUTEX_LOCK_PI
This operation reads from the futex address provided by the
uaddr argument, which contains the namespace-specific TID of
the lock owner. If the TID is 0, the kernel tries to set the
waiter's TID atomically. If the TID is nonzero or the takeover
fails, the kernel atomically sets the FUTEX_WAITERS bit, which
signals to the owner that it cannot unlock the futex in user
space atomically by transitioning from TID to 0. After that,
the kernel tries to find the task which is associated with the
owner TID, creates or reuses kernel state on behalf of the
owner, and attaches the waiter to it. If more than one waiter
exists, the waiters are enqueued in descending priority order.
The owner inherits either the priority or the bandwidth of the
waiter. This inheritance follows the lock chain in the case of
nested locking and performs deadlock detection.
The timeout argument is handled as described in FUTEX_WAIT.
The arguments uaddr2, val, and val3 are ignored.
Related return values
[EFAULT] Kernel was unable to access the futex value at uaddr.
[ENOMEM] Kernel could not allocate state
[EINVAL] The supplied uaddr argument does not point to a valid
object, i.e. pointer is not 4 byte aligned
[EINVAL] The supplied timeout argument is not normalized.
[EINVAL] The kernel detected inconsistent state between the
user space state at uaddr and the kernel state. That's
either state corruption or it found a waiter on uaddr
which is waiting in FUTEX_WAIT[_BITSET]
[EPERM] Caller is not allowed to attach itself to the futex.
Can be a legitimate issue or a hint for state
corruption in user space
[ESRCH] The TID in the user space value does not exist
[EAGAIN] The futex owner TID is about to exit, but has not yet
handled the internal state cleanup. Try again.
[ETIMEDOUT] timeout expired
[EDEADLOCK] The futex is already locked by the caller or the kernel
detected a deadlock scenario in a nested lock chain
[EOWNERDIED] The owner of the futex died and the kernel made the
caller the new owner. The kernel sets the
FUTEX_OWNER_DIED bit in the futex userspace value.
Caller is responsible for cleanup
[ENOSYS] Not implemented on all architectures and not supported
on some CPU variants (runtime detection)
FUTEX_TRYLOCK_PI
This operation tries to acquire the futex at uaddr. It deals
with the situation where the TID value at uaddr is 0, but the
FUTEX_WAITERS bit is set. User space cannot handle this
situation race-free.
The arguments uaddr2, val, timeout and val3 are ignored.
Return values:
[EFAULT] Kernel was unable to access the futex value at uaddr.
[ENOMEM] Kernel could not allocate state
[EINVAL] The supplied uaddr argument does not point to a valid
object, i.e. pointer is not 4 byte aligned
[EINVAL] The kernel detected inconsistent state between the user
space state at uaddr and the kernel state
[EPERM] Caller is not allowed to attach itself to the futex.
Can be a legitimate issue or a hint for state
corruption in user space
[ESRCH] The TID in the user space value does not exist
[EAGAIN] The futex owner TID is about to exit, but has not yet
handled the internal state cleanup. Try again.
[EDEADLOCK] The futex is already locked by the caller.
[EOWNERDIED] The owner of the futex died and the kernel made the
caller the new owner. The kernel sets the
FUTEX_OWNER_DIED bit in the futex userspace value.
Caller is responsible for cleanup
[ENOSYS] Not implemented on all architectures and not supported
on some CPU variants (runtime detection)
FUTEX_UNLOCK_PI
This operation wakes the top priority waiter which is waiting
in FUTEX_LOCK_PI on the futex address provided by the uaddr
argument.
This is called when the user space value at uaddr cannot be
changed atomically from TID (of the owner) to 0.
The arguments uaddr2, val, timeout and val3 are ignored.
Related return values:
[EINVAL] The kernel detected inconsistent state between the
user space state at uaddr and the kernel state,
i.e. it detected a waiter which waits in
FUTEX_WAIT[_BITSET].
[EPERM] Caller does not own the futex.
[ENOSYS] Not implemented on all architectures and not supported
on some CPU variants (runtime detection)
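
The interplay between the user space fast path and FUTEX_LOCK_PI / FUTEX_UNLOCK_PI might look roughly like the sketch below. It assumes C11 atomics and the raw syscall interface. pi_lock() and pi_unlock() are hypothetical names, and error handling (EOWNERDIED, EDEADLOCK, EAGAIN retries) is deliberately left out.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: acquire a PI futex. Returns 0 on success, or -1
 * with errno set by the kernel slow path. */
static int pi_lock(_Atomic uint32_t *futex_word)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t)syscall(SYS_gettid);

    /* Fast path: uncontended, transition 0 -> TID in user space. */
    if (atomic_compare_exchange_strong(futex_word, &expected, tid))
        return 0;

    /* Slow path: the kernel sets FUTEX_WAITERS and enqueues the caller. */
    return (int)syscall(SYS_futex, futex_word, FUTEX_LOCK_PI,
                        0, NULL, NULL, 0);
}

/* Hypothetical helper: release a PI futex held by the caller. */
static int pi_unlock(_Atomic uint32_t *futex_word)
{
    uint32_t expected = (uint32_t)syscall(SYS_gettid);

    /* Fast path: no waiter bit set, transition TID -> 0 in user space. */
    if (atomic_compare_exchange_strong(futex_word, &expected, 0))
        return 0;

    /* Slow path: the value is not the bare TID, so the kernel has to
     * hand the futex over to the top priority waiter. */
    return (int)syscall(SYS_futex, futex_word, FUTEX_UNLOCK_PI,
                        0, NULL, NULL, 0);
}
```

The unlock fast path fails exactly when the futex word is no longer the bare owner TID, which is the case the FUTEX_WAITERS bit is meant to signal.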
FUTEX_WAIT_REQUEUE_PI
Wait operation to wait on a non-PI futex at uaddr and
potentially be requeued on a PI futex at uaddr2. The wait
operation on uaddr is the same as FUTEX_WAIT. The waiter can
be removed from the wait on uaddr via FUTEX_WAKE without
requeuing on uaddr2.
The timeout argument is handled as described in FUTEX_WAIT.
Darren, can you fill in the missing details?
Return values:
[EFAULT] Kernel was unable to access the futex value at uaddr
or uaddr2
[EINVAL] The supplied uaddr or uaddr2 argument does not point
to a valid object, i.e. pointer is not 4 byte aligned
[EINVAL] The supplied timeout argument is not normalized.
[EINVAL] The supplied bitset is zero.
[EWOULDBLOCK] The atomic enqueueing failed. User space value
at uaddr is not equal to the val argument.
[ETIMEDOUT] timeout expired
[EOWNERDIED] The owner of the PI futex at uaddr2 died and the
kernel made the caller the new owner. The kernel
sets the FUTEX_OWNER_DIED bit in the uaddr2 futex
userspace value. Caller is responsible for
cleanup
[ENOSYS] Not implemented on all architectures and not supported
on some CPU variants (runtime detection)
FUTEX_CMP_REQUEUE_PI
PI-aware variant of FUTEX_CMP_REQUEUE. The inner futex at uaddr is
a non-PI futex. The outer futex, to which waiters are requeued, is
a PI futex at uaddr2.
The waiters on uaddr must wait in FUTEX_WAIT_REQUEUE_PI.
The argument val contains the number of waiters on uaddr
which are immediately woken up. Must be 1 for this opcode.
The timeout argument is abused to transport the number of
waiters which are requeued onto the futex at uaddr2. The
pointer is typecast to u32.
Darren, can you fill in the missing details?
[EFAULT] Kernel was unable to access the futex value at uaddr
or uaddr2
[ENOMEM] Kernel could not allocate state
[EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
valid object, i.e. pointer is not 4 byte aligned
[EINVAL] uaddr equals uaddr2. Requeue to same futex.
[EINVAL] The kernel detected inconsistent state between the
user space state at uaddr and the kernel state,
i.e. it detected a waiter which waits in
FUTEX_LOCK_PI on uaddr
[EINVAL] The kernel detected inconsistent state between the
user space state at uaddr and the kernel state,
i.e. it detected a waiter which waits in
FUTEX_WAIT[_BITSET] on uaddr
[EINVAL] The kernel detected inconsistent state between the
user space state at uaddr2 and the kernel state,
i.e. it detected a waiter which waits in
FUTEX_WAIT on uaddr2.
[EINVAL] The supplied bitset is zero.
[EAGAIN] The value read from uaddr is not equal to the compare
value in argument val3
[EAGAIN] The futex owner TID of uaddr2 is about to exit, but
has not yet handled the internal state cleanup. Try
again.
[EPERM] Caller is not allowed to attach the waiter to the
futex at uaddr2. Can be a legitimate issue or a hint
for state corruption in user space
[ESRCH] The TID in the user space value at uaddr2 does not exist
[EDEADLOCK] The requeuing of a waiter to the kernel representation
of the PI futex at uaddr2 detected a deadlock scenario.
[ENOSYS] Not implemented on all architectures and not supported
on some CPU variants (runtime detection)
The various option bits seem to be undocumented as well
FUTEX_PRIVATE_FLAG
This option bit can be ORed into all futex ops.
It tells the kernel that the futex is process private and not
shared with another process. That allows the kernel to choose
the fast path for validating the user space address and to avoid
expensive VMA lookups, taking refcounts on the file backing store,
etc.
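
As a small illustration of ORing the flag into an op, here is a sketch assuming the constants from <linux/futex.h>. The helper names futex_op_private() and futex_wake_private() are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: mark any futex op as process private. */
static int futex_op_private(int op)
{
    return op | FUTEX_PRIVATE_FLAG;
}

/* Hypothetical helper: wake up to nr_wake waiters on a futex that is
 * never shared with another process. Returns the number of waiters
 * woken, or -1 with errno set. */
static long futex_wake_private(uint32_t *uaddr, int nr_wake)
{
    return syscall(SYS_futex, uaddr, futex_op_private(FUTEX_WAKE),
                   nr_wake, NULL, NULL, 0);
}
```

The header also provides ready-made combinations such as FUTEX_WAIT_PRIVATE and FUTEX_WAKE_PRIVATE, which are the same values spelled out.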
FUTEX_CLOCK_REALTIME
This option bit can be ORed into the futex ops FUTEX_WAIT_BITSET
and FUTEX_WAIT_REQUEUE_PI.
If set, the kernel treats the user space supplied timeout as
absolute time based on CLOCK_REALTIME.
If not set, the kernel treats the user space supplied timeout
as relative time.
If this is set on any op other than the supported ones, the kernel
returns ENOSYS!
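
A hedged sketch of waiting until an absolute CLOCK_REALTIME deadline with FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME. The helper names deadline_after_ms() and futex_wait_until() are hypothetical. FUTEX_BITSET_MATCH_ANY from <linux/futex.h> is used so that the bitset does not restrict which wakeups match.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: compute "now + ms" on CLOCK_REALTIME and keep the
 * result normalized (0 <= tv_nsec < 1e9), as the kernel requires. */
static void deadline_after_ms(struct timespec *ts, long ms)
{
    clock_gettime(CLOCK_REALTIME, ts);
    ts->tv_sec += ms / 1000;
    ts->tv_nsec += (ms % 1000) * 1000000L;
    if (ts->tv_nsec >= 1000000000L) {
        ts->tv_sec += 1;
        ts->tv_nsec -= 1000000000L;
    }
}

/* Hypothetical helper: wait while *uaddr == val until the absolute
 * CLOCK_REALTIME deadline passes. Returns 0 on wakeup, or -1 with errno
 * set (ETIMEDOUT once the deadline has expired). */
static int futex_wait_until(uint32_t *uaddr, uint32_t val,
                            const struct timespec *deadline)
{
    return (int)syscall(SYS_futex, uaddr,
                        FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME,
                        val, deadline, NULL, FUTEX_BITSET_MATCH_ANY);
}
```

Because the deadline is absolute, a caller that is interrupted and retries can pass the same timespec again without recomputing it.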
Thanks,
tglx
* Re: futex(2) man page update help request
From: Michael Kerrisk (man-pages) @ 2014-05-15 20:19 UTC
To: Thomas Gleixner
Cc: mtk.manpages, Carlos O'Donell, Darren Hart, Ingo Molnar,
Jakub Jelinek, linux-man, lkml, Davidlohr Bueso, Arnd Bergmann,
Steven Rostedt, Peter Zijlstra, Linux API
On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
> On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
>> And that universe would love to have your documentation of
>> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
>
> I give you almost the full treatment, but I leave REQUEUE_PI to Darren
> and FUTEX_WAKE_OP to Jakub. :)
Thanks Thomas--that's fantastic! Hopefully, Darren and Jakub fill in those
missing pieces...
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2014-05-15 20:19 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 145+ messages in thread
From: Michael Kerrisk (man-pages) @ 2014-05-15 20:19 UTC (permalink / raw)
To: Thomas Gleixner
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Carlos O'Donell,
Darren Hart, Ingo Molnar, Jakub Jelinek,
linux-man-u79uwXL29TY76Z2rM5mHXA, lkml, Davidlohr Bueso,
Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API
On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
> On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
>> And that universe would love to have your documentation of
>> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
>
> I give you almost the full treatment, but I leave REQUEUE_PI to Darren
> and FUTEX_WAKE_OP to Jakub. :)
Thanks Thomas--that's fantastic! Hopefully, Darren and Jakub fill in those
missing pieces...
Cheers,
Michael
> FUTEX_WAIT
>
> < Existing blurb seems ok >
>
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
>
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] The supplied timeout argument is not normalized.
>
> [EWOULDBLOCK] The atomic enqueueing failed. User space value
> at uaddr is not equal val argument.
>
> [ETIMEDOUT] timeout expired
>
>
> FUTEX_WAKE
>
> < Existing blurb seems ok >
>
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
>
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_LOCK_PI
>
> FUTEX_REQUEUE
>
> Existing blurb seems ok , except for this:
>
> The argument val contains the number of waiters on uaddr which
> are immediately woken up.
>
> The timeout argument is abused to transport the number of
> waiters which are requeued to the futex at uaddr2. The pointer
> is typecasted to u32.
>
>
> [EFAULT] Kernel was unable to access the futex value at uaddr or uaddr2
>
> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
> valid object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_LOCK_PI on uaddr
>
> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>
> FUTEX_REQUEUE_CMP
>
> Existing blurb seems ok , except for this:
>
> The argument val is contains the number of waiters on uaddr
> which are immediately woken up.
>
> The timeout argument is abused to transport the number of
> waiters which are requeued to the futex at uaddr2. The pointer
> is typecasted to u32.
>
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr or uaddr2
>
> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
> valid object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_LOCK_PI on uaddr
>
> [EAGAIN] uaddr1 readout is not equal the compare value in
> argument val3
>
> FUTEX_WAKE_OP
>
>
> Jakub, can you please explain it? I'm lost :)
>
>
> The argument val contains the maximum number of waiters on
> uaddr which are immediately woken up.
>
> The timeout argument is abused to transport the maximum
> number of waiters on uaddr2 which are woken up. The pointer
> is typecasted to u32.
>
> Related return values
>
> [EFAULT] Kernel was unable to access the futex values at uaddr
> or uaddr2
>
> [EINVAL] The supplied uaddr or uaddr2 argument does not point
> to a valid object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_LOCK_PI on uaddr
>
>
> FUTEX_WAIT_BITSET
>
> The same as FUTEX_WAIT except that val3 is used to provide a
> 32bit bitset to the kernel. This bitset is stored in the
> kernel internal state of the waiter.
>
> This futex op also allows to have the option bit
> FUTEX_CLOCK_REALTIME set.
>
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
>
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] The supplied bitset is zero.
>
> [EINVAL] The supplied timeout argument is not normalized.
>
> [ETIMEDOUT] timeout expired
>
>
> FUTEX_WAKE_BITSET
>
> The same as FUTEX_WAKE except that val3 is used to provide a
> 32-bit bitset to the kernel. This bitset is used to select
> waiters on the futex. The selection is done by a bitwise AND
> of the wake-side supplied bitset and the bitset which is
> stored in the kernel internal state of the waiters. If the
> result is nonzero, the waiter is woken; otherwise it is left
> waiting.
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
>
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] The supplied bitset is zero.
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_LOCK_PI
>
> FUTEX_LOCK_PI
>
> This operation reads from the futex address provided by the
> uaddr argument, which contains the namespace-specific TID of
> the lock owner. If the TID is 0, then the kernel tries to set
> the waiter's TID atomically. If the TID is nonzero or the
> takeover fails, the kernel atomically sets the FUTEX_WAITERS
> bit, which signals to the owner that it cannot unlock the futex
> in user space atomically by transitioning from TID to 0. After
> that the kernel tries to find the task which is associated with
> the owner TID, creates or reuses kernel state on behalf of the
> owner and attaches the waiter to it. The enqueueing of the
> waiter is in descending priority order if more than one waiter
> exists. The owner inherits either the priority or the
> bandwidth of the waiter. This inheritance follows the lock
> chain in the case of nested locking and performs deadlock
> detection.
>
> The timeout argument is handled as described in FUTEX_WAIT.
> The arguments uaddr2, val, and val3 are ignored.
>
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
>
> [ENOMEM] Kernel could not allocate state
>
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] The supplied timeout argument is not normalized.
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state. That is
> either state corruption or it found a waiter on uaddr
> which is waiting on FUTEX_WAIT[_BITSET]
>
> [EPERM] Caller is not allowed to attach itself to the futex.
> Can be a legitimate issue or a hint for state
> corruption in user space
>
> [ESRCH] The TID in the user space value does not exist
>
> [EAGAIN] The futex owner TID is about to exit, but has not yet
> handled the internal state cleanup. Try again.
>
> [ETIMEDOUT] timeout expired
>
> [EDEADLOCK] The futex is already locked by the caller or the kernel
> detected a deadlock scenario in a nested lock chain
>
> [EOWNERDEAD] The owner of the futex died and the kernel made the
> caller the new owner. The kernel sets the
> FUTEX_OWNER_DIED bit in the futex userspace value.
> Caller is responsible for cleanup
>
> [ENOSYS] Not implemented on all architectures and not supported
> on some CPU variants (runtime detection)
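One way a caller might act on the FUTEX_LOCK_PI errors listed above is to sort them into a handful of outcomes. The sketch below is an assumption about how user space could group them, not a description of what glibc does; the enum and the function name are hypothetical. It uses the errno constants EOWNERDEAD and EDEADLK for the owner-died and deadlock cases.

```c
#include <assert.h>
#include <errno.h>

enum lock_pi_action {
    LOCK_PI_ACQUIRED,        /* lock held, state consistent */
    LOCK_PI_ACQUIRED_DIRTY,  /* lock held, previous owner died: clean up */
    LOCK_PI_RETRY,           /* owner is exiting: call again */
    LOCK_PI_TIMED_OUT,       /* timeout expired, lock not held */
    LOCK_PI_DEADLOCK,        /* already owned by caller or lock-chain cycle */
    LOCK_PI_FATAL            /* bad arguments, corruption, or no support */
};

/* err is 0 on success or the positive errno value after a failed call. */
static enum lock_pi_action classify_lock_pi_result(int err)
{
    assert(err >= 0);

    switch (err) {
    case 0:
        return LOCK_PI_ACQUIRED;
    case EOWNERDEAD:
        return LOCK_PI_ACQUIRED_DIRTY;
    case EAGAIN:
        return LOCK_PI_RETRY;
    case ETIMEDOUT:
        return LOCK_PI_TIMED_OUT;
    case EDEADLK:
        return LOCK_PI_DEADLOCK;
    default:
        /* EFAULT, ENOMEM, EINVAL, EPERM, ESRCH, ENOSYS and anything else. */
        return LOCK_PI_FATAL;
    }
}
```

Treating EPERM and ESRCH as fatal is a conservative choice, since the list above says either can hint at state corruption in user space.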
>
> FUTEX_TRYLOCK_PI
>
> This operation tries to acquire the futex at uaddr. It deals
> with the situation where the TID value at uaddr is 0, but the
> FUTEX_WAITERS bit is set. User space cannot handle this
> race-free.
>
> The arguments uaddr2, val, timeout and val3 are ignored.
>
> Return values:
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
>
> [ENOMEM] Kernel could not allocate state
>
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state
>
> [EPERM] Caller is not allowed to attach itself to the futex.
> Can be a legitimate issue or a hint for state
> corruption in user space
>
> [ESRCH] The TID in the user space value does not exist
>
> [EAGAIN] The futex owner TID is about to exit, but has not yet
> handled the internal state cleanup. Try again.
>
> [EDEADLOCK] The futex is already locked by the caller.
>
> [EOWNERDEAD] The owner of the futex died and the kernel made the
> caller the new owner. The kernel sets the
> FUTEX_OWNER_DIED bit in the futex userspace value.
> Caller is responsible for cleanup
>
> [ENOSYS] Not implemented on all architectures and not supported
> on some CPU variants (runtime detection)
>
> FUTEX_UNLOCK_PI
>
> This operation wakes the top priority waiter which is waiting
> in FUTEX_LOCK_PI on the futex address provided by the uaddr
> argument.
>
> This is called when the user space value at uaddr cannot be
> changed atomically from TID (of the owner) to 0.
>
> The arguments uaddr2, val, timeout and val3 are ignored.
>
> Related return values:
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_WAIT[_BITSET].
>
> [EPERM] Caller does not own the futex.
>
> [ENOSYS] Not implemented on all architectures and not supported
> on some CPU variants (runtime detection)
>
> FUTEX_WAIT_REQUEUE_PI
>
> Wait operation to wait on a non pi futex at uaddr and
> potentially be requeued on a pi futex at uaddr2. The wait
> operation on uaddr is the same as FUTEX_WAIT. The waiter can
> be removed from the wait on uaddr via FUTEX_WAKE without
> requeuing on uaddr2.
>
> The timeout argument is handled as described in FUTEX_WAIT.
>
> Darren, can you fill in the missing details?
>
> Return values:
>
> [EFAULT] Kernel was unable to access the futex value at uaddr
> or uaddr2
>
> [EINVAL] The supplied uaddr or uaddr2 argument does not point
> to a valid object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] The supplied timeout argument is not normalized.
>
> [EINVAL] The supplied bitset is zero.
>
> [EWOULDBLOCK] The atomic enqueueing failed. User space value
> at uaddr is not equal val argument.
>
> [ETIMEDOUT] timeout expired
>
> [EOWNERDEAD] The owner of the PI futex at uaddr2 died and the
> kernel made the caller the new owner. The kernel
> sets the FUTEX_OWNER_DIED bit in the uaddr2 futex
> userspace value. Caller is responsible for
> cleanup
>
> [ENOSYS] Not implemented on all architectures and not supported
> on some CPU variants (runtime detection)
>
>
> FUTEX_CMP_REQUEUE_PI
>
> PI aware variant of FUTEX_CMP_REQUEUE. Inner futex at uaddr is
> a non PI futex. Outer futex to which is requeued is a PI futex
> at uaddr2.
>
> The waiters on uaddr must wait in FUTEX_WAIT_REQUEUE_PI.
>
> The argument val contains the number of waiters on uaddr
> which are immediately woken up. Must be 1 for this opcode.
>
> The timeout argument is abused to transport the number of
> waiters which are requeued on to the futex at uaddr2. The
> pointer is typecasted to u32.
>
> Darren, can you fill in the missing details?
>
> [EFAULT] Kernel was unable to access the futex value at uaddr
> or uaddr2
>
> [ENOMEM] Kernel could not allocate state
>
> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
> valid object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] uaddr equals uaddr2, i.e. a requeue to the same futex.
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_LOCK_PI on uaddr
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_WAIT[_BITSET] on uaddr
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr2 and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_WAIT on uaddr2.
>
> [EINVAL] The supplied bitset is zero.
>
> [EAGAIN] The value read from uaddr is not equal to the compare value in
> argument val3
>
> [EAGAIN] The futex owner TID of uaddr2 is about to exit, but
> has not yet handled the internal state cleanup. Try
> again.
>
> [EPERM] Caller is not allowed to attach the waiter to the
> futex at uaddr2. Can be a legitimate issue or a hint
> for state corruption in user space
>
> [ESRCH] The TID in the user space value at uaddr2 does not exist
>
> [EDEADLOCK] The requeuing of a waiter to the kernel representation
> of the PI futex at uaddr2 detected a deadlock scenario.
>
> [ENOSYS] Not implemented on all architectures and not supported
> on some CPU variants (runtime detection)
>
>
> The various option bits seem to be undocumented as well
>
> FUTEX_PRIVATE_FLAG
>
> This option bit can be ORed into all futex ops.
>
> It tells the kernel that the futex is process private and not
> shared with another process. That allows the kernel to choose
> the fast path for validating the user space address and avoid
> expensive VMA lookups, taking refcounts on file backing store
> etc.
>
> FUTEX_CLOCK_REALTIME
>
> This option bit can be ORed into the futex ops FUTEX_WAIT_BITSET
> and FUTEX_WAIT_REQUEUE_PI.
>
> If set, the kernel treats the user space supplied timeout as
> absolute time based on CLOCK_REALTIME.
>
> If not set, the kernel treats the user space supplied timeout
> as relative time.
>
> If this is set on any op other than the supported ones, the kernel
> returns ENOSYS!
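A small C sketch of how the two option bits combine with an opcode, following the rules as stated in this mail. The helper name futex_compose_op is hypothetical; the constants come from <linux/futex.h>. Kernels newer than this thread may accept FUTEX_CLOCK_REALTIME on further ops, so treat the check as a model of the text above rather than of any particular kernel.

```c
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>

/*
 * Build the futex_op argument from a bare command plus option bits.
 * Returns the combined value, or -ENOSYS when FUTEX_CLOCK_REALTIME is
 * requested for a command other than the two named above.
 */
static int futex_compose_op(int cmd, int is_private, int clock_realtime)
{
    /* The caller must pass a bare command without option bits. */
    assert((cmd & ~FUTEX_CMD_MASK) == 0);

    int op = cmd;

    if (is_private)
        op |= FUTEX_PRIVATE_FLAG;   /* valid on every op */

    if (clock_realtime) {
        if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
            return -ENOSYS;
        op |= FUTEX_CLOCK_REALTIME; /* timeout becomes absolute */
    }
    return op;
}
```

Masking the result with FUTEX_CMD_MASK recovers the bare command, which is how the option bits and the opcode share one argument.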
>
>
> Thanks,
>
> tglx
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2014-08-04 14:46 ` Carlos O'Donell
0 siblings, 0 replies; 145+ messages in thread
From: Carlos O'Donell @ 2014-08-04 14:46 UTC (permalink / raw)
To: Michael Kerrisk (man-pages), Thomas Gleixner
Cc: Darren Hart, Ingo Molnar, Jakub Jelinek, linux-man, lkml,
Davidlohr Bueso, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
Linux API
On 05/15/2014 04:19 PM, Michael Kerrisk (man-pages) wrote:
> On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
>> On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
>>> And that universe would love to have your documentation of
>>> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
>>
>> I give you almost the full treatment, but I leave REQUEUE_PI to Darren
>> and FUTEX_WAKE_OP to Jakub. :)
>
> Thanks Thomas--that's fantastic! Hopefully, Darren and Jakub fill in those
> missing pieces...
Michael,
Do you need any help getting these additional futex error codes
into the linux kernel man pages project? Thomas provided the
missing bits and Darren commented... what else do we need?
I'm asking because I want to point other Red Hat engineers at
these pages to say: "these are the canonical error codes."
We're trying to clean up the userspace side of things.
Cheers,
Carlos.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2014-05-15 20:35 ` Darren Hart
0 siblings, 0 replies; 145+ messages in thread
From: Darren Hart @ 2014-05-15 20:35 UTC (permalink / raw)
To: Thomas Gleixner, Michael Kerrisk (man-pages)
Cc: Carlos O'Donell, Ingo Molnar, Jakub Jelinek, linux-man, lkml,
Davidlohr Bueso, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
Linux API
On 5/15/14, 7:14, "Thomas Gleixner" <tglx@linutronix.de> wrote:
Wow Thomas, I planned to do exactly this and you beat me to it. Again.
Thanks for getting this started.
Michael, I imagine you want something more condensed, and I'll add to what
tglx posted (inline below) to try and get you that, but if you have
questions and need to fill in the gap, the paper I presented at RTLWS11 in
'09 covers this particularly nasty OPCODE in detail:
http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
I believe Michael is looking for some higher level documentation, like how
to use these and what they are intended for. Probably something more like
Ulrich's Futexes are Tricky paper - but let's start with getting the op
codes, arguments, and return codes fleshed out.
For all the PI opcodes, we should probably mention something about the
futex value scheme (TID), whereas the other opcodes do not require any
specific value scheme.
No Owner: 0
Owner: TID
Waiters: TID | FUTEX_WAITERS
This is the relevant section from the referenced paper:
The PI futex operations diverge from the others in that they
impose a policy describing how the futex value is to be used.
If the lock is unowned, the futex value shall be 0. If owned, it
shall be the thread id (tid) of the owning thread. If there are
threads contending for the lock, then the FUTEX_WAITERS flag is
set. With this policy in place, userspace can atomically acquire
an unowned lock or release an uncontended lock using an atomic
instruction and their own tid. A non-zero futex value will force
waiters into the kernel to lock. The FUTEX_WAITERS flag forces
the owner into the kernel to unlock. If the callers are forced
into the kernel, they then deal directly with an underlying
rt_mutex which implements the priority inheritance semantics.
After the rt_mutex is acquired, the futex value is updated
accordingly, before the calling thread returns to userspace.
It is important to note that the kernel will update the futex value prior
to returning to userspace. Unlike other futex op codes,
FUTEX_CMP_REQUEUE_PI (and FUTEX_WAIT_REQUEUE_PI, FUTEX_LOCK_PI) are designed
for the implementation of very specific IPC mechanisms.
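The value scheme above (0, TID, TID | FUTEX_WAITERS) is what lets user space skip the kernel in the uncontended case. Here is a minimal C sketch of that fast path using C11 atomics. The function names are hypothetical and the sketch leaves out FUTEX_OWNER_DIED handling, timeouts and EAGAIN retries, so it illustrates the policy and is not a usable mutex.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Decode helpers for the value scheme. */
static uint32_t pi_owner_tid(uint32_t val) { return val & FUTEX_TID_MASK; }
static int pi_has_waiters(uint32_t val) { return (val & FUTEX_WAITERS) != 0; }

static int pi_lock(_Atomic uint32_t *futex_word)
{
    assert(futex_word != NULL);

    uint32_t expected = 0;
    uint32_t tid = (uint32_t)syscall(SYS_gettid);

    /* Fast path: unowned (0) -> our TID, no kernel entry. */
    if (atomic_compare_exchange_strong(futex_word, &expected, tid))
        return 0;

    /* Slow path: a nonzero value forces the waiter into the kernel. */
    if (syscall(SYS_futex, futex_word, FUTEX_LOCK_PI, 0, NULL, NULL, 0) == -1)
        return -errno;
    return 0;
}

static int pi_unlock(_Atomic uint32_t *futex_word)
{
    assert(futex_word != NULL);

    uint32_t expected = (uint32_t)syscall(SYS_gettid);

    /* Fast path: TID -> 0 only succeeds while FUTEX_WAITERS is clear. */
    if (atomic_compare_exchange_strong(futex_word, &expected, 0))
        return 0;

    /* Slow path: FUTEX_WAITERS forces the owner into the kernel. */
    if (syscall(SYS_futex, futex_word, FUTEX_UNLOCK_PI, 0, NULL, NULL, 0) == -1)
        return -errno;
    return 0;
}
```

With a single thread, pi_lock() followed by pi_unlock() takes the word from 0 to the caller's TID and back to 0 without ever issuing a futex syscall.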
>FUTEX_CMP_REQUEUE_PI
>
> PI aware variant of FUTEX_CMP_REQUEUE. Inner futex at uaddr is
> a non PI futex. Outer futex to which is requeued is a PI futex
> at uaddr2.
Inner/outer terminology applies specifically to the glibc pthread
condition variable and mutex use case, but is overly specific for the man
page. Consider:
PI aware variant for FUTEX_CMP_REQUEUE. Requeue tasks blocked on uaddr via
FUTEX_WAIT_REQUEUE_PI from a non-PI source futex (uaddr) to a PI target
futex (uaddr2).
>
> The waiters on uaddr must wait in FUTEX_WAIT_REQUEUE_PI.
>
> The argument val contains the number of waiters on uaddr
> which are immediately woken up. Must be 1 for this opcode.
Because the point is to avoid the thundering herd in the first place, and
other nasty little races and faulting corner cases...
>
> The timeout argument is abused to transport the number of
> waiters which are requeued on to the futex at uaddr2. The
> pointer is typecasted to u32.
val3 contains the expected value of uaddr (same as
FUTEX_CMP_REQUEUE)
>
>Darren, can you fill in the missing details?
Yup...
>
> [EFAULT] Kernel was unable to access the futex value at uaddr
> or uaddr2
>
> [ENOMEM] Kernel could not allocate state
>
> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
> valid object, i.e. pointer is not 4 byte aligned
>
> [EINVAL] uaddr equals uaddr2, i.e. a requeue to the same futex.
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_LOCK_PI on uaddr
instead of FUTEX_WAIT_REQUEUE_PI.
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_WAIT[_BITSET] on uaddr
>
> [EINVAL] The kernel detected inconsistent state between the
> user space state at uaddr2 and the kernel state,
> i.e. it detected a waiter which waits in
> FUTEX_WAIT on uaddr2.
[EINVAL] The kernel detected the FUTEX_CMP_REQUEUE_PI call is
attempting to requeue a task to a futex other than that
specified by the matching FUTEX_WAIT_REQUEUE_PI call for
that task.
A number of these EINVALs can probably be combined into "Kernel detected
bad state" as far as the C library is concerned, but we can consolidate
later. But basically, EINVAL is returned if the non-pi to pi or op pairing
semantics are violated.
>
> [EINVAL] The supplied bitset is zero.
Bitset doesn't apply to FUTEX_CMP_REQUEUE_PI.
[EINVAL] nr_wake != 1
EAGAIN == EWOULDBLOCK. We use each in the kernel, but will just refer to
them here as EAGAIN.
> [EAGAIN] The value read from uaddr is not equal to the compare value in
> argument val3
>
> [EAGAIN] The futex owner TID of uaddr2 is about to exit, but
> has not yet handled the internal state cleanup. Try
> again.
>
> [EPERM] Caller is not allowed to attach the waiter to the
> futex at uaddr2. Can be a legitimate issue or a hint
> for state corruption in user space
>
> [ESRCH] The TID in the user space value at uaddr2 does not exist
Hrm, I'm missing ESRCH and EPERM in my state diagrams.... but yes, we can
get ESRCH when looking up PI state, and we can return that from
futex_requeue.... That needs some time to review...
I'm not seeing the EPERM path, where is that coming from?
>
> [EDEADLOCK] The requeuing of a waiter to the kernel representation
> of the PI futex at uaddr2 detected a deadlock scenario.
>
> [ENOSYS] Not implemented on all architectures and not supported
> on some CPU variants (runtime detection)
Return value >= 0 is successful, indicating the number of tasks
requeued or woken (3 requeued and 1 woken would return 4).
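Putting the argument conventions for FUTEX_CMP_REQUEUE_PI together, a raw call might look like the C sketch below. The wrapper name is hypothetical, and the two checks up front only mirror EINVAL cases listed above so that obvious misuse fails before entering the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Requeue waiters from the non-PI futex at uaddr to the PI futex at
 * uaddr2, provided *uaddr still equals expected (passed as val3).
 * Returns the number of tasks woken plus requeued, or a negative errno.
 */
static long futex_cmp_requeue_pi(uint32_t *uaddr, uint32_t *uaddr2,
                                 uint32_t nr_wake, uint32_t nr_requeue,
                                 uint32_t expected)
{
    assert(uaddr != NULL && uaddr2 != NULL);

    /* nr_wake must be 1, and source and target must differ. */
    if (nr_wake != 1 || uaddr == uaddr2)
        return -EINVAL;

    /* The timeout slot carries nr_requeue, cast through a pointer. */
    long ret = syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE_PI, nr_wake,
                       (void *)(uintptr_t)nr_requeue, uaddr2, expected);
    return ret == -1 ? -errno : ret;
}
```

A return of 4 from a call with nr_requeue of 3 would then mean one task woken and three requeued, matching the example above.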
Thanks,
--
Darren Hart Open Source Technology Center
darren.hart@intel.com Intel Corporation
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-15 15:12 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 145+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-15 15:12 UTC (permalink / raw)
To: Darren Hart, Thomas Gleixner
Cc: mtk.manpages, Carlos O'Donell, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Davidlohr Bueso, Arnd Bergmann, Steven Rostedt,
Peter Zijlstra, Linux API
Hello Darren,
I give you the same apology as to Thomas for the
long-delayed response to your mail.
And I repeat my note to Thomas:
In the next day or two, I hope to send out the new version
of the futex(2) page for review. The new draft is a bit
bigger (okay -- 4 x bigger) than the current page. And there
are quite a number of FIXMEs that I've placed in the page
for various points--some minor, but a few major--that need
to be checked or fixed. Would you have some time to review
that page?
In the meantime, I have a couple of questions, which, if
you could answer them, I would work some changes into the
page before sending.
1. In various places, a distinction is made between non-PI
futexes and PI futexes. But what determines that distinction?
From the kernel's perspective, what makes a futex one type
or another? I presume it is to do with the types of blocking
waiters on the futex, but it would be good to have a formal
definition.
2. Can you say something about the pairing requirements of
FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI?
What is the requirement and why do we need it?
Most of the rest of this mail is just a checklist noting
what I did with your comments. No response is needed
in most cases, but there is one that I have marked with
"???". If you could reply to that, I'd be grateful.
On 05/15/2014 10:35 PM, Darren Hart wrote:
> On 5/15/14, 7:14, "Thomas Gleixner" <tglx@linutronix.de> wrote:
>
> Wow Thomas, I planned to do exactly this and you beat me to it. Again.
> Thanks for getting this started.
>
> Michael, I imagine you want something more condensed, and I'll add to what
> tglx posted (inline below) to try and get you that, but if you have
> questions and need to fill in the gap, the paper I presented at RTLWS11 in
> '09 covers this particularly nasty OPCODE in detail:
>
> http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
>
> I believe Michael is looking for some higher level documentation, like how
> to use these and what they are intended for.
Yes, that would be good.
> Probably something more like
> Ulrich's Futexes are Tricky paper - but let's start with getting the op
> codes, arguments, and return codes fleshed out.
Okay.
> For all the PI opcodes, we should probably mention something about the
> futex value scheme (TID), whereas the other opcodes do not require any
> specific value scheme.
>
> No Owner: 0
> Owner: TID
> Waiters: TID | FUTEX_WAITERS
>
> This is the relevant section from the referenced paper:
>
> The PI futex operations diverge from the others in that they
> impose a policy describing how the futex value is to be used.
> If the lock is unowned, the futex value shall be 0. If owned, it
> shall be the thread id (tid) of the owning thread. If there are
> threads contending for the lock, then the FUTEX_WAITERS flag is
> set. With this policy in place, userspace can atomically acquire
> an unowned lock or release an uncontended lock using an atomic
> instruction and their own tid. A non-zero futex value will force
> waiters into the kernel to lock. The FUTEX_WAITERS flag forces
> the owner into the kernel to unlock. If the callers are forced
> into the kernel, they then deal directly with an underlying
> rt_mutex which implements the priority inheritance semantics.
> After the rt_mutex is acquired, the futex value is updated
> accordingly, before the calling thread returns to userspace.
>
> It is important to note that the kernel will update the futex value prior
> to returning to userspace. Unlike other futex op codes,
> FUTEX_CMP_REQUEUE_PI (and FUTEX_WAIT_REQUEUE_PI, FUTEX_LOCK_PI) are designed
> for the implementation of very specific IPC mechanisms.
??? Great text. May I presume that I can take this text
and freely adapt it for the man page? (Actually, this is a
request for forgiveness, rather than permission :-).)
>> FUTEX_CMP_REQUEUE_PI
>>
>> PI aware variant of FUTEX_CMP_REQUEUE. Inner futex at uaddr is
>> a non PI futex. Outer futex to which is requeued is a PI futex
>> at uaddr2.
>
> Inner/outer terminology applies specifically to the glibc pthread
> condition variable and mutex use case, but is overly specific for the man
> page. Consider:
>
> PI aware variant for FUTEX_CMP_REQUEUE. Requeue tasks blocked on uaddr via
> FUTEX_WAIT_REQUEUE_PI from a non-PI source futex (uaddr) to a PI target
> futex (uaddr2).
Thanks for that text. It is easier to grasp.
>>
>> The waiters on uaddr must wait in FUTEX_WAIT_REQUEUE_PI.
>>
>> The argument val contains the number of waiters on uaddr
>> which are immediately woken up. Must be 1 for this opcode.
>
> Because the point is to avoid the thundering herd in the first place, and
> other nasty little races and faulting corner cases...
I added the piece about "thundering herd".
>> The timeout argument is abused to transport the number of
>> waiters which are requeued on to the futex at uaddr2. The
>> pointer is typecasted to u32.
>
>
> val3 contains the expected value of uaddr (same as
> FUTEX_CMP_REQUEUE)
Yes. (The text now says that 'val3' has the same purpose as
for FUTEX_CMP_REQUEUE.)
>> Darren, can you fill in the missing details?
>
> Yup...
>
>>
>> [EFAULT] Kernel was unable to access the futex value at uaddr
>> or uaddr2
>>
>> [ENOMEM] Kernel could not allocate state
>>
>> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
>> valid object, i.e. pointer is not 4 byte aligned
>>
>> [EINVAL] uaddr equals uaddr2, i.e. a requeue to the same futex.
>>
>> [EINVAL] The kernel detected inconsistent state between the
>> user space state at uaddr and the kernel state,
>> i.e. it detected a waiter which waits in
>> FUTEX_LOCK_PI on uaddr
>
> instead of FUTEX_WAIT_REQUEUE_PI.
Thanks. I added that detail.
>> [EINVAL] The kernel detected inconsistent state between the
>> user space state at uaddr and the kernel state,
>> i.e. it detected a waiter which waits in
>> FUTEX_WAIT[_BITSET] on uaddr
>>
>> [EINVAL] The kernel detected inconsistent state between the
>> user space state at uaddr2 and the kernel state,
>> i.e. it detected a waiter which waits in
>> FUTEX_WAIT on uaddr2.
>
> [EINVAL] The kernel detected the FUTEX_CMP_REQUEUE_PI call is
> attempting to requeue a task to a futex other than that
> specified by the matching FUTEX_WAIT_REQUEUE_PI call for
> that task.
Thanks. Added.
> A number of these EINVALs can probably be combined into "Kernel detected
> bad state" as far as the C library is concerned, but we can consolidate
> later. But basically, EINVAL is returned if the non-pi to pi or op pairing
> semantics are violated.
I think the page probably needs some text to cover that point. I'll add
a FIXME for review.
>> [EINVAL] The supplied bitset is zero.
>
> Bitset doesn't apply to FUTEX_CMP_REQUEUE_PI.
Thanks.
> [EINVAL] nr_wake != 1
Thanks, I'd already spotted this, but it's good to have confirmation.
> EAGAIN == EWOULDBLOCK. We use each in the kernel, but will just refer to
> them here as EAGAIN.
Yes. And I've followed that convention now in the man page.
>> [EAGAIN] uaddr1 readout is not equal the compare value in
>> argument val3
>>
>> [EAGAIN] The futex owner TID of uaddr2 is about to exit, but
>> has not yet handled the internal state cleanup. Try
>> again.
>>
>> [EPERM] Caller is not allowed to attach the waiter to the
>> futex at uaddr2 Can be a legitimate issue or a hint
>> for state corruption in user space
>>
>> [ESRCH] The TID in the user space value at uaddr2 does not exist
>
> Hrm, I'm missing ESRCH and EPERM in my state diagrams.... but yes, we can
> get ESRCH when looking up PI state, and we can return that from
> futex_requeue.... That needs some time to review...
>
> I'm not seeing the EPERM path, where is that coming from?
Any further insight on the above?
>> [EDEADLOCK] The requeuing of a waiter to the kernel representation
>> of the PI futex at uaddr2 detected a deadlock scenario.
>>
>> [ENOSYS] Not implemented on all architectures and not supported
>> on some CPU variants (runtime detection)
>
> Return value >= 0 is successful, indicating the number of tasks
> requeued or woken (3 requeued and 1 woken would return 4).
Yes. Already noted.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-15 15:12 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 145+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-15 15:12 UTC (permalink / raw)
To: Darren Hart, Thomas Gleixner
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Carlos O'Donell,
Ingo Molnar, Jakub Jelinek, linux-man-u79uwXL29TY76Z2rM5mHXA,
lkml, Davidlohr Bueso, Arnd Bergmann, Steven Rostedt,
Peter Zijlstra, Linux API
Hello Darren,
I give you the same apology as to Thomas for the
long-delayed response to your mail.
And I repeat my note to Thomas:
In the next day or two, I hope to send out the new version
of the futex(2) page for review. The new draft is a bit
bigger (okay -- 4 x bigger) than the current page. And there
are quite a number of FIXMEs that I've placed in the page
for various points--some minor, but a few major--that need
to be checked or fixed. Would you have some time to review
that page?
In the meantime, I have a couple of questions, which, if
you could answer them, I would work some changes into the
page before sending.
1. In various places, distinction is made between non-PI
futexes and PI futexes. But what determines that distinction?
From the kernel's perspective, what makes a futex one type
or another? I presume it is to do with the types of blocking
waiters on the futex, but it would be good to have a formal
definition.
2. Can you say something about the pairing requirements of
FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
What is the requirement and why do we need it?
Most of the rest of this mail is just a checklist noting
what I did with your comments. No response is needed
in most cases, but there is one that I have marked with
"???". If you could reply to that, I'd be grateful.
On 05/15/2014 10:35 PM, Darren Hart wrote:
> On 5/15/14, 7:14, "Thomas Gleixner" <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> wrote:
>
> Wow Thomas, I planned to do exactly this and you beat me to it. Again.
> Thanks for getting this started.
>
> Michael, I imagine you want something more condensed, and I'll add to what
> tglx posted (inline below) to try and get you that, but if you have
> questions and need to fill in the gap, the paper I presented at RTLWS11 in
> '09 covers this particularly nasty OPCODE in detail:
>
> http://lwn.net/images/conf/rtlws11/papers/proc/p10.pdf
>
> I believe Michael is looking for some higher level documentation, like how
> to use these and what they are intended for.
Yes, that would be good.
> Probably something more like
> Ulrich's Futexes are Tricky paper - but let's start with getting the op
> codes, arguments, and return codes fleshed out.
Okay.
> For all the PI opcodes, we should probably mention something about the
> futex value scheme (TID), whereas the other opcodes do not require any
> specific value scheme.
>
> No Owner: 0
> Owner: TID
> Waiters: TID | FUTEX_WAITERS
>
> This is the relevant section from the referenced paper:
>
> The PI futex operations diverge from the others
> in that they impose a policy describing how
> the futex value is to be used. If the lock is
> unowned, the futex value shall be 0. If owned, it
> shall be the thread id (tid) of the owning thread.
> If there are threads contending for the lock, then
> the FUTEX_WAITERS flag is set. With this policy in
> place, userspace can atomically acquire an unowned
> lock or release an uncontended lock using an atomic
> instruction and their own tid. A non-zero futex
> value will force waiters into the kernel to lock. The
> FUTEX_WAITERS flag forces the owner into the kernel
> to unlock. If the callers are forced into the kernel,
> they then deal directly with an underlying rt_mutex
> which implements the priority inheritance semantics.
> After the rt_mutex is acquired, the futex value is
> updated accordingly, before the calling thread returns
> to userspace.
>
> It is important to note that the kernel will update the futex value prior
> to returning to userspace. Unlike other futex op codes,
> FUTEX_CMP_REQUEUE_PI (and FUTEX_WAIT_REQUEUE_PI, FUTEX_LOCK_PI) are designed
> for the implementation of very specific IPC mechanisms.
??? Great text. May I presume that I can take this text
and freely adapt it for the man page? (Actually, this is a
request for forgiveness, rather than permission :-).)
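For readers following along, here is a rough C sketch of the value policy quoted above. It is only an illustration for discussion, not text from the paper or from the kernel: the enum and the two helper names are invented for this example, while FUTEX_WAITERS and FUTEX_TID_MASK are the real constants from <linux/futex.h>.

```c
#include <stdint.h>
#include <linux/futex.h>   /* FUTEX_WAITERS, FUTEX_TID_MASK */

/* Hypothetical names for the three states of a PI futex word. */
enum pi_word_state { PI_UNOWNED, PI_OWNED, PI_CONTENDED };

/* Decode a snapshot of the futex word according to the PI value policy:
 * 0 means no owner, TID means owned, TID | FUTEX_WAITERS means contended. */
static inline enum pi_word_state pi_classify(uint32_t word)
{
    if (word == 0)
        return PI_UNOWNED;
    if (word & FUTEX_WAITERS)
        return PI_CONTENDED;
    return PI_OWNED;
}

/* Extract the owner TID, ignoring the flag bits above the TID field. */
static inline uint32_t pi_owner_tid(uint32_t word)
{
    return word & FUTEX_TID_MASK;
}
```

A caller would load the futex word atomically first and pass the snapshot to these helpers; the result can be stale by the time it is inspected, so it is only useful for diagnostics or for deciding whether to attempt a fast path.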
>> FUTEX_CMP_REQUEUE_PI
>>
>> PI aware variant of FUTEX_CMP_REQUEUE. Inner futex at uaddr is
>> a non PI futex. Outer futex to which is requeued is a PI futex
>> at uaddr2.
>
> Inner/outer terminology applies specifically to the glibc pthread
> condition variable and mutex use case, but is overly specific for the man
> page. Consider:
>
> PI aware variant for FUTEX_CMP_REQUEUE. Requeue tasks blocked on uaddr via
> FUTEX_WAIT_REQUEUE_PI from a non-PI source futex (uaddr) to a PI target
> futex (uaddr2).
Thanks for that text. It is easier to grasp.
>>
>> The waiters on uaddr must wait in FUTEX_WAIT_REQUEUE_PI.
>>
>> The argument val contains the number of waiters on uaddr
>> which are immediately woken up. Must be 1 for this opcode.
>
> Because the point is to avoid the thundering herd in the first place, and
> other nasty little races and faulting corner cases...
I added the piece about "thundering herd".
>> The timeout argument is abused to transport the number of
>> waiters which are requeued on to the futex at uaddr2. The
>> pointer is typecasted to u32.
>
>
> val3 contains the expected value of uaddr (same as
> FUTEX_CMP_REQUEUE)
Yes. (The text now says that 'val3' has the same purpose as
for FUTEX_CMP_REQUEUE.)
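To make the argument packing discussed above concrete, here is a minimal C sketch of a raw FUTEX_CMP_REQUEUE_PI call. The wrapper name and its "return -errno" convention are assumptions made for this example (glibc provides no futex wrapper), and it assumes an architecture that exposes SYS_futex. It only shows where each value goes: val is fixed at 1, the timeout slot carries the requeue count, and val3 carries the expected value at uaddr.

```c
#include <errno.h>
#include <stdint.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: returns the number of tasks woken plus requeued,
 * or a negative errno value on failure. */
static inline long futex_cmp_requeue_pi(uint32_t *uaddr, uint32_t *uaddr2,
                                        uint32_t nr_requeue, uint32_t expected)
{
    long rc = syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE_PI,
                      1,                              /* val: nr_wake, must be 1 */
                      (void *)(uintptr_t)nr_requeue,  /* timeout slot: nr_requeue */
                      uaddr2,                         /* PI target futex */
                      expected);                      /* val3: expected *uaddr */
    return rc == -1 ? -errno : rc;
}
```

Passing the same address for uaddr and uaddr2 is a cheap way to see the EINVAL case from the list below without blocking anything; on a kernel built without PI futex support the call is expected to fail with ENOSYS instead.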
>> Darren, can you fill in the missing details?
>
> Yup...
>
>>
>> [EFAULT] Kernel was unable to access the futex value at uaddr
>> or uaddr2
>>
>> [ENOMEM] Kernel could not allocate state
>>
>> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a
>> valid object, i.e. pointer is not 4 byte aligned
>>
>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>
>> [EINVAL] The kernel detected inconsistent state between the
>> user space state at uaddr and the kernel state,
>> i.e. it detected a waiter which waits in
>> FUTEX_LOCK_PI on uaddr
>
> instead of FUTEX_WAIT_REQUEUE_PI.
Thanks. I added that detail.
>> [EINVAL] The kernel detected inconsistent state between the
>> user space state at uaddr and the kernel state,
>> i.e. it detected a waiter which waits in
>> FUTEX_WAIT[_BITSET] on uaddr
>>
>> [EINVAL] The kernel detected inconsistent state between the
>> user space state at uaddr2 and the kernel state,
>> i.e. it detected a waiter which waits in
>> FUTEX_WAIT on uaddr2.
>
> [EINVAL] The kernel detected the FUTEX_CMP_REQUEUE_PI call is
> attempting to requeue a task to a futex other than that
> specified by the matching FUTEX_WAIT_REQUEUE_PI call for
> that task.
Thanks. Added.
> A number of these EINVALs can probably be combined into "Kernel detected
> bad state" as far as the C library is concerned, but we can consolidate
> later. But basically, EINVAL is returned if the non-pi to pi or op pairing
> semantics are violated.
I think the page probably needs some text to cover that point. I'll add
a FIXME for review.
>> [EINVAL] The supplied bitset is zero.
>
> Bitset doesn't apply to FUTEX_CMP_REQUEUE_PI.
Thanks.
> [EINVAL] nr_wake != 1
Thanks, I'd already spotted this, but it's good to have confirmation.
> EAGAIN == EWOULDBLOCK. We use each in the kernel, but will just refer to
> them here as EAGAIN.
Yes. And I've followed that convention now in the man page.
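As a small illustration of the EAGAIN convention, the following C sketch checks at compile time that the two names are the same value on Linux and shows the most common way to see the error: a FUTEX_WAIT whose expected value does not match the futex word. The helper name is invented for this example, and it assumes an architecture that exposes SYS_futex.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* On Linux the two names are aliases, so one errno entry covers both. */
_Static_assert(EAGAIN == EWOULDBLOCK, "EAGAIN and EWOULDBLOCK differ");

/* Hypothetical helper: returns 0 if the caller slept and was woken,
 * otherwise the errno reported by FUTEX_WAIT. */
static inline int futex_wait_errno(uint32_t *uaddr, uint32_t expected)
{
    long rc = syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
    return rc == -1 ? errno : 0;
}
```

Calling futex_wait_errno() on a word holding 1 with an expected value of 0 should return EAGAIN immediately, because the kernel refuses to enqueue the waiter when the compare fails. With a matching value and no timeout the call blocks until another thread issues FUTEX_WAKE.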
>> [EAGAIN] uaddr1 readout is not equal the compare value in
>> argument val3
>>
>> [EAGAIN] The futex owner TID of uaddr2 is about to exit, but
>> has not yet handled the internal state cleanup. Try
>> again.
>>
>> [EPERM] Caller is not allowed to attach the waiter to the
>> futex at uaddr2 Can be a legitimate issue or a hint
>> for state corruption in user space
>>
>> [ESRCH] The TID in the user space value at uaddr2 does not exist
>
> Hrm, I'm missing ESRCH and EPERM in my state diagrams.... but yes, we can
> get ESRCH when looking up PI state, and we can return that from
> futex_requeue.... That needs some time to review...
>
> I'm not seeing the EPERM path, where is that coming from?
Any further insight on the above?
>> [EDEADLOCK] The requeuing of a waiter to the kernel representation
>> of the PI futex at uaddr2 detected a deadlock scenario.
>>
>> [ENOSYS] Not implemented on all architectures and not supported
>> on some CPU variants (runtime detection)
>
> Return value >= 0 is successful, indicating the number of tasks
> requeued or woken (3 requeued and 1 woken would return 4).
Yes. Already noted.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-17 1:33 ` Darren Hart
0 siblings, 0 replies; 145+ messages in thread
From: Darren Hart @ 2015-01-17 1:33 UTC (permalink / raw)
To: Michael Kerrisk (man-pages), Thomas Gleixner
Cc: Carlos O'Donell, Ingo Molnar, Jakub Jelinek, linux-man, lkml,
Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API,
Davidlohr Bueso
Corrected Davidlohr's email address.
On 1/15/15, 7:12 AM, "Michael Kerrisk (man-pages)"
<mtk.manpages@gmail.com> wrote:
>Hello Darren,
>
>I give you the same apology as to Thomas for the
>long-delayed response to your mail.
>
>And I repeat my note to Thomas:
>In the next day or two, I hope to send out the new version
>of the futex(2) page for review. The new draft is a bit
>bigger (okay -- 4 x bigger) than the current page. And there
>are quite a number of FIXMEs that I've placed in the page
>for various points--some minor, but a few major--that need
>to be checked or fixed. Would you have some time to review
>that page?
I'll make the time for that. I've wanted to see this for a while, so thank
you for working on it!
>
>
>In the meantime, I have a couple of questions, which, if
>you could answer them, I would work some changes into the
>page before sending.
>
>1. In various places, distinction is made between non-PI
> futexes and PI futexes. But what determines that distinction?
> From the kernel's perspective, what makes a futex one type
> or another? I presume it is to do with the types of blocking
> waiters on the futex, but it would be good to have a formal
> definition.
You're right in that a uaddr is a uaddr is a uaddr. Also "there is no such
thing as a futex", it doesn't exist as any kind of identifiable object, so
these discussions can get rather confusing :-)
A "futex" becomes a PI futex when it is "created" via a PI futex op code.
At that point, the syscall will ensure a pi_state is populated for the
futex_q entry. See futex_lock_pi() for example. Before the locks are
taken, there is a call to refill_pi_state_cache() which preps a pi_state
for assignment later in futex_lock_pi_atomic(). This pi_state provides the
necessary linkage to perform the priority boosting in the event of a
priority inversion. This is handled externally from the futexes via the
rt_mutex construct.
Clear as mud?
>
>2. Can you say something about the pairing requirements of
> FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
> What is the requirement and why do we need it?
Briefly, these op codes exist to support a fairly specific use case:
support for PI aware pthread condvars (glibc patch acceptance STILL
PENDING FOR LOVE OF EVERYTHING HOLY WHY?!?!?!), which nevertheless ships
with various PREEMPT_RT enabled Linux systems. Because these calls are
paired, and more of the logic can happen on the kernel side (to preserve
ownership of an rt_mutex with waiters), we pre-specify the target of the
requeue in futex_wait_requeue_pi in order to ensure userspace and
kernelspace remain in sync. This also limits the attack surface by only
supporting exactly what it was meant to do. The corner cases get insane
otherwise.
We could walk through the various ways in which it would break if these
pairing restrictions were not in place, but I'll have to take some serious
time to page all those into working memory. Let me know if we need more
detail here and I will.
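For illustration only, here is how the waiter side of the pairing might look as a raw system call in C. The wrapper name and the "return -errno" convention are assumptions for this sketch, not anything glibc or the kernel provides, and it assumes an architecture that exposes SYS_futex. The point it tries to show is that the requeue target (uaddr2, the PI futex) is already named by the waiter, so the kernel can reject a later FUTEX_CMP_REQUEUE_PI that names a different target.

```c
#include <errno.h>
#include <stdint.h>
#include <time.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: wait on the non-PI futex `cond` and pre-specify the
 * PI futex `mutex` as the only valid requeue target. `abs_timeout` may be
 * NULL to wait without a timeout. Returns 0 or a negative errno value. */
static inline long futex_wait_requeue_pi(uint32_t *cond, uint32_t expected,
                                         const struct timespec *abs_timeout,
                                         uint32_t *mutex)
{
    long rc = syscall(SYS_futex, cond, FUTEX_WAIT_REQUEUE_PI,
                      expected,      /* val: expected value of *cond */
                      abs_timeout,   /* absolute timeout or NULL */
                      mutex,         /* uaddr2: requeue target, fixed up front */
                      0);            /* val3: unused by this operation */
    return rc == -1 ? -errno : rc;
}
```

In the condvar use case, `cond` would be the condition variable's futex word and `mutex` the PI mutex word that the signaller later passes as uaddr2 to FUTEX_CMP_REQUEUE_PI. Passing the same address for both is rejected with EINVAL (or ENOSYS where PI futexes are unavailable), which gives a non-blocking way to try the call.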
>
>Most of the rest of this mail is just a checklist noting
>what I did with your comments. No response is needed
>in most cases, but there is one that I have marked with
>"???". If you could reply to that, I'd be grateful.
...
>> For all the PI opcodes, we should probably mention something about the
>> futex value scheme (TID), whereas the other opcodes do not require any
>> specific value scheme.
>>
>> No Owner: 0
>> Owner: TID
>> Waiters: TID | FUTEX_WAITERS
>>
>> This is the relevant section from the referenced paper:
>>
>> The PI futex operations diverge from the others
>> in that they impose a policy describing how
>> the futex value is to be used. If the lock is
>> unowned, the futex value shall be 0. If owned, it
>> shall be the thread id (tid) of the owning thread.
>> If there are threads contending for the lock, then
>> the FUTEX_WAITERS flag is set. With this policy in
>> place, userspace can atomically acquire an unowned
>> lock or release an uncontended lock using an atomic
>> instruction and their own tid. A non-zero futex
>> value will force waiters into the kernel to lock. The
>> FUTEX_WAITERS flag forces the owner into the kernel
>> to unlock. If the callers are forced into the kernel,
>> they then deal directly with an underlying rt_mutex
>> which implements the priority inheritance semantics.
>> After the rt_mutex is acquired, the futex value is
>> updated accordingly, before the calling thread returns
>> to userspace.
>>
>> It is important to note that the kernel will update the futex value prior
>> to returning to userspace. Unlike other futex op codes,
>> FUTEX_CMP_REQUEUE_PI (and FUTEX_WAIT_REQUEUE_PI, FUTEX_LOCK_PI) are designed
>> for the implementation of very specific IPC mechanisms.
>
>??? Great text. May I presume that I can take this text
>and freely adapt it for the man page? (Actually, this is a
>request for forgiveness, rather than permission :-).)
Thanks, and no objection from me.
--
Darren Hart
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
2015-01-17 1:33 ` Darren Hart
(?)
@ 2015-01-17 9:16 ` Michael Kerrisk (man-pages)
2015-01-17 19:26 ` Darren Hart
-1 siblings, 1 reply; 145+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-17 9:16 UTC (permalink / raw)
To: Darren Hart, Thomas Gleixner
Cc: mtk.manpages, Carlos O'Donell, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
Linux API, Davidlohr Bueso, Jan Kiszka
Hello Darren,
On 01/17/2015 02:33 AM, Darren Hart wrote:
> Corrected Davidlohr's email address.
Thanks!
> On 1/15/15, 7:12 AM, "Michael Kerrisk (man-pages)"
> <mtk.manpages@gmail.com> wrote:
>
>> Hello Darren,
>>
>> I give you the same apology as to Thomas for the
>> long-delayed response to your mail.
>>
>> And I repeat my note to Thomas:
>> In the next day or two, I hope to send out the new version
>> of the futex(2) page for review. The new draft is a bit
>> bigger (okay -- 4 x bigger) than the current page. And there
>> are quite a number of FIXMEs that I've placed in the page
>> for various points--some minor, but a few major--that need
>> to be checked or fixed. Would you have some time to review
>> that page?
>
> I'll make the time for that. I've wanted to see this for a while, so thank
> you for working on it!
Great!
>> In the meantime, I have a couple of questions, which, if
>> you could answer them, I would work some changes into the
>> page before sending.
>>
>> 1. In various places, distinction is made between non-PI
>> futexes and PI futexes. But what determines that distinction?
>> From the kernel's perspective, what makes a futex one type
>> or another? I presume it is to do with the types of blocking
>> waiters on the futex, but it would be good to have a formal
>> definition.
>
> You're right in that a uaddr is a uaddr is a uaddr. Also "there is no such
> thing as a futex", it doesn't exist as any kind of identifiable object, so
> these discussions can get rather confusing :-)
So, I want to make sure that I am clear on what you mean when you say this.
You say "there is no such thing as a futex" because from the kernel's
perspective there is no visible entity in the uncontended case
(where everything can be dealt with in user space). And from user-space,
in the uncontended case all we're doing is memory operations. Right?
On the other hand, from a kernel perspective, we could say that a
futex "exists" in the contended phases, since the kernel has allocated
state associated with the uaddr. Right?
> A "futex" becomes a PI futex when it is "created" via a PI futex op code.
Precisely which PI op codes? Is it: FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, and
FUTEX_CMP_REQUEUE_PI, and not FUTEX_WAIT_REQUEUE_PI or FUTEX_UNLOCK_PI?
> At that point, the syscall will ensure a pi_state is populated for the
> futex_q entry. See futex_lock_pi() for example. Before the locks are
> taken, there is a call to refill_pi_state_cache() which preps a pi_state
> for assignment later in futex_lock_pi_atomic(). This pi_state provides the
> necessary linkage to perform the priority boosting in the event of a
> priority inversion. This is handled externally from the futexes via the
> rt_mutex construct.
>
> Clear as mud?
Not quite that bad, but... The thing is, still, the man page has text
such as the following (based on your wording):
FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
This operation is a PI-aware variant of FUTEX_CMP_REQUEUE.
It requeues waiters that are blocked via
FUTEX_WAIT_REQUEUE_PI on uaddr from a non-PI source futex
(uaddr) to a PI target futex (uaddr2).
And elsewhere you said
EINVAL is returned if the non-pi to pi or
op pairing semantics are violated.
When someone in user-land (e.g., me) reads pieces like that, they then
want to find somewhere in the man page a description of what makes a
futex a *PI futex* and probably some statements of the distinction
between PI and non-PI futexes. And those statements should be from a
perspective that is somewhat comprehensible to user-space. I'm not
yet confident that I can do that. Do you care to take a shot at it?
>> 2. Can you say something about the pairing requirements of
>> FUTEX_WAIT_REQUEUE_PI and FUTEX_CMP_REQUEUE_PI.
>> What is the requirement and why do we need it?
>
> Briefly, these op codes exist to support a fairly specific use case:
> support for PI aware pthread condvars (glibc patch acceptance STILL
> PENDING FOR LOVE OF EVERYTHING HOLY WHY?!?!?!
Yes, Jan Kiszka recently alerted me to the existence of
https://sourceware.org/bugzilla/show_bug.cgi?id=11588
and I still have some text that you proposed (mail titled
"Pthread Condition Variables and Priority Inversion")
quite a long time ago for the pthread_cond_timedwait() page.
One day, when that page exists, I'll try to remember to add it.
> But is shipped with various
> PREEMPT_RT enabled Linux systems. Because these calls are paired, and more
> of the logic can happen on the kernel side (to preserve ownership of an
> rt_mutex with waiters), so in order to ensure userspace and kernelspace
> remain in sync, we pre-specify the target of the requeue in
> futex_wait_requeue_pi. This also limits the attack surface by only
> supporting exactly what it was meant to do. The corner cases get insane
> otherwise.
Thanks. I've added some text on pairing, based on your text above.
> We could walk through the various ways in which it would break if these
> pairing restrictions were not in place, but I'll have to take some serious
> time to page all those into working memory. Let me know if we need more
> detail here and I will.
I don't think we need that level of detail.
>> Most of the rest of this mail is just a checklist noting
>> what I did with your comments. No response is needed
>> in most cases, but there is one that I have marked with
>> "???". If you could reply to that, I'd be grateful.
>
> ...
>
>>> For all the PI opcodes, we should probably mention something about the
>>> futex value scheme (TID), whereas the other opcodes do not require any
>>> specific value scheme.
>>>
>>> No Owner: 0
>>> Owner: TID
>>> Waiters: TID | FUTEX_WAITERS
>>>
>>> This is the relevant section from the referenced paper:
>>>
>>> The PI futex operations diverge from the others
>>> in that they impose a policy describing how
>>> the futex value is to be used. If the lock is
>>> unowned, the futex value shall be 0. If owned, it
>>> shall be the thread id (tid) of the owning thread.
>>> If there are threads contending for the lock, then
>>> the FUTEX_WAITERS flag is set. With this policy in
>>> place, userspace can atomically acquire an unowned
>>> lock or release an uncontended lock using an atomic
>>> instruction and their own tid. A non-zero futex
>>> value will force waiters into the kernel to lock. The
>>> FUTEX_WAITERS flag forces the owner into the kernel
>>> to unlock. If the callers are forced into the kernel,
>>> they then deal directly with an underlying rt_mutex
>>> which implements the priority inheritance semantics.
>>> After the rt_mutex is acquired, the futex value is
>>> updated accordingly, before the calling thread returns
>>> to userspace.
>>>
>>> It is important to note that the kernel will update the futex value prior
>>> to returning to userspace. Unlike other futex op codes,
>>> FUTEX_CMP_REQUEUE_PI (and FUTEX_WAIT_REQUEUE_PI, FUTEX_LOCK_PI) are designed
>>> for the implementation of very specific IPC mechanisms.
>>
>> ??? Great text. May I presume that I can take this text
>> and freely adapt it for the man page? (Actually, this is a
>> request for forgiveness, rather than permission :-).)
>
> Thanks, and no objection from me.
Thanks.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-17 19:26 ` Darren Hart
0 siblings, 0 replies; 145+ messages in thread
From: Darren Hart @ 2015-01-17 19:26 UTC (permalink / raw)
To: Michael Kerrisk (man-pages), Thomas Gleixner
Cc: Carlos O'Donell, Ingo Molnar, Jakub Jelinek, linux-man, lkml,
Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API,
Davidlohr Bueso, Jan Kiszka
On 1/17/15, 1:16 AM, "Michael Kerrisk (man-pages)"
<mtk.manpages@gmail.com> wrote:
>Hello Darren,
>
>On 01/17/2015 02:33 AM, Darren Hart wrote:
>> Corrected Davidlohr's email address.
>
>Thanks!
>
>> On 1/15/15, 7:12 AM, "Michael Kerrisk (man-pages)"
>> <mtk.manpages@gmail.com> wrote:
>>
>>> Hello Darren,
>>>
>>> I give you the same apology as to Thomas for the
>>> long-delayed response to your mail.
>>>
>>> And I repeat my note to Thomas:
>>> In the next day or two, I hope to send out the new version
>>> of the futex(2) page for review. The new draft is a bit
>>> bigger (okay -- 4 x bigger) than the current page. And there
>>> are quite a number of FIXMEs that I've placed in the page
>>> for various points--some minor, but a few major--that need
>>> to be checked or fixed. Would you have some time to review
>>> that page?
>>
>> I'll make the time for that. I've wanted to see this for a while, so thank
>> you for working on it!
>
>Great!
>
>>> In the meantime, I have a couple of questions, which, if
>>> you could answer them, I would work some changes into the
>>> page before sending.
>>>
>>> 1. In various places, distinction is made between non-PI
>>> futexes and PI futexes. But what determines that distinction?
>>> From the kernel's perspective, what makes a futex one type
>>> or another? I presume it is to do with the types of blocking
>>> waiters on the futex, but it would be good to have a formal
>>> definition.
>>
>> You're right in that a uaddr is a uaddr is a uaddr. Also "there is no such
>> thing as a futex", it doesn't exist as any kind of identifiable object, so
>> these discussions can get rather confusing :-)
>
>So, I want to make sure that I am clear on what you mean when you say this.
>You say "there is no such thing as a futex" because from the kernel's
>perspective there is no visible entity in the uncontended case
>(where everything can be dealt with in user space). And from user-space,
>in the uncontended case all we're doing is memory operations. Right?
>
>On the other hand, from a kernel perspective, we could say that a
>futex "exists" in the contended phases, since the kernel has allocated
>state associated with the uaddr. Right?
Sorry, this was more anecdotal, and probably more of a distraction than
constructive. I just meant that unlike other things which you can point to
a specific struct for (task, rt_mutex, etc.), a "futex" has its state
distributed across the backing store (uaddr), the queue (futex_q), the
pi_state, the rt_mutex, etc, and these span kernel space and userspace.
Your description above is correct.
>
>> A "futex" becomes a PI futex when it is "created" via a PI futex op
>> code.
>
>Precisely which PI op codes? Is it: FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, and
>FUTEX_CMP_REQUEUE_PI, and not FUTEX_WAIT_REQUEUE_PI or FUTEX_UNLOCK_PI?
Based on your wording below about taking a user POV on this, I'm going to
say "yes" here. These opcodes paired with the PI futex value policy
(described below) define a "futex" as PI aware. These were created very
specifically in support of PI pthread_mutexes, so it makes a lot more
sense to talk about a PI aware pthread_mutex, than a PI aware futex, since
there is a lot of policy and scaffolding that has to be built up around it
to use it properly (this is what a PI pthread_mutex is).
>> At that point, the syscall will ensure a pi_state is populated for the
>> futex_q entry. See futex_lock_pi() for example. Before the locks are
>> taken, there is a call to refill_pi_state_cache() which preps a pi_state
>> for assignment later in futex_lock_pi_atomic(). This pi_state provides
>> the necessary linkage to perform the priority boosting in the event of a
>> priority inversion. This is handled externally from the futexes via the
>> rt_mutex construct.
>>
>> Clear as mud?
>
>Not quite that bad, but... The thing is, still, the man page has text
>such as the following (based on your wording):
>
> FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
> This operation is a PI-aware variant of FUTEX_CMP_REQUEUE.
> It requeues waiters that are blocked via
> FUTEX_WAIT_REQUEUE_PI on uaddr from a non-PI source futex
> (uaddr) to a PI target futex (uaddr2).
>
>And elsewhere you said
>
> EINVAL is returned if the non-pi to pi or
> op pairing semantics are violated.
>
>When someone in user-land (e.g., me) reads pieces like that, they then
>want to find somewhere in the man page a description of what makes a
>futex a *PI futex* and probably some statements of the distinction
>between PI and non-PI futexes. And those statements should be from a
>perspective that is somewhat comprehensible to user-space. I'm not
>yet confident that I can do that. Do you care to take a shot at it?
Hrm, tricky indeed. From userspace, what makes a "futex" PI is the policy
agreement between kernel and userspace (which is the value of the futex:
0, TID, TID|WAITERS, and never just WAITERS), and the use of PI aware futex
op codes when making the futex syscalls.
For a longer discussion of this policy, see Documentation/pi-futex.txt.
Also note that this policy can be combined with that for robust futexes,
adding the OWNERDIED component.
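The value policy above can be made concrete with a small decoder for the futex word. This is only a sketch: FUTEX_WAITERS, FUTEX_OWNER_DIED, and FUTEX_TID_MASK come from <linux/futex.h>, while the enum and the function names pi_word_state() and pi_word_owner_died() are invented here for illustration and exist in neither the kernel nor glibc.

```c
#include <linux/futex.h>   /* FUTEX_WAITERS, FUTEX_OWNER_DIED, FUTEX_TID_MASK */
#include <stdint.h>

/* Hypothetical classification of a PI futex word (names are illustrative). */
enum pi_word_kind {
    PI_UNLOCKED,        /* 0: nobody owns the lock */
    PI_OWNED,           /* TID: owned, no waiters recorded */
    PI_OWNED_WAITERS,   /* TID | FUTEX_WAITERS: owner must unlock via the kernel */
    PI_WAITERS_ONLY     /* FUTEX_WAITERS with TID == 0: outside the value policy */
};

/* Classify the TID and FUTEX_WAITERS parts; FUTEX_OWNER_DIED is ignored here. */
static enum pi_word_kind pi_word_state(uint32_t word)
{
    uint32_t tid = word & FUTEX_TID_MASK;

    if (tid == 0)
        return (word & FUTEX_WAITERS) ? PI_WAITERS_ONLY : PI_UNLOCKED;
    return (word & FUTEX_WAITERS) ? PI_OWNED_WAITERS : PI_OWNED;
}

/* Robust futexes add one more bit on top of the same layout. */
static int pi_word_owner_died(uint32_t word)
{
    return (word & FUTEX_OWNER_DIED) != 0;
}
```

A word that decodes as PI_WAITERS_ONLY is one that user space should never store itself under this policy.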
--
Darren Hart
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-18 10:18 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 145+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-18 10:18 UTC (permalink / raw)
To: Darren Hart, Thomas Gleixner
Cc: mtk.manpages, Carlos O'Donell, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
Linux API, Davidlohr Bueso, Jan Kiszka
Hello Darren,
On 01/17/2015 08:26 PM, Darren Hart wrote:
>
> On 1/17/15, 1:16 AM, "Michael Kerrisk (man-pages)"
> <mtk.manpages@gmail.com> wrote:
[...]
>>>> In the meantime, I have a couple of questions, which, if
>>>> you could answer them, I would work some changes into the
>>>> page before sending.
>>>>
>>>> 1. In various places, distinction is made between non-PI
>>>> futexes and PI futexes. But what determines that distinction?
>>>> From the kernel's perspective, what makes a futex one type
>>>> or another? I presume it is to do with the types of blocking
>>>> waiters on the futex, but it would be good to have a formal
>>>> definition.
>>>
>>> You're right in that a uaddr is a uaddr is a uaddr. Also "there is no
>>> such thing as a futex", it doesn't exist as any kind of identifiable
>>> object, so these discussions can get rather confusing :-)
>>
>> So, I want to make sure that I am clear on what you mean when you say this.
>> You say "there is no such thing as a futex" because from the kernel's
>> perspective there is no visible entity in the uncontended case
>> (where everything can be dealt with in user space). And from user-space,
>> in the uncontended case all we're doing is memory operations. Right?
>>
>> On the other hand, from a kernel perspective, we could say that a
>> futex "exists" in the contended phases, since the kernel has allocated
>> state associated with the uaddr. Right?
>
>
> Sorry, this was more anecdotal, and probably more of a distraction than
> constructive. I just meant that unlike other things which you can point to
> a specific struct for (task, rt_mutex, etc.), a "futex" has its state
> distributed across the backing store (uaddr), the queue (futex_q), the
> pi_state, the rt_mutex, etc, and these span kernel space and userspace.
> Your description above is correct.
Okay. Thanks. I've added a few more words to the page noting that
the kernel maintains no state for a futex in the uncontended state.
>>> A "futex" becomes a PI futex when it is "created" via a PI futex op
>>> code.
>>
>> Precisely which PI op codes? Is it: FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, and
>> FUTEX_CMP_REQUEUE_PI, and not FUTEX_WAIT_REQUEUE_PI or FUTEX_UNLOCK_PI?
>
> Based on your wording below about taking a user POV on this, I'm going to
> say "yes" here. These opcodes paired with the PI futex value policy
> (described below) define a "futex" as PI aware. These were created very
> specifically in support of PI pthread_mutexes, so it makes a lot more
> sense to talk about a PI aware pthread_mutex, than a PI aware futex, since
> there is a lot of policy and scaffolding that has to be built up around it
> to use it properly (this is what a PI pthread_mutex is).
See below.
>>> At that point, the syscall will ensure a pi_state is populated for the
>>> futex_q entry. See futex_lock_pi() for example. Before the locks are
>>> taken, there is a call to refill_pi_state_cache() which preps a pi_state
>>> for assignment later in futex_lock_pi_atomic(). This pi_state provides
>>> the necessary linkage to perform the priority boosting in the event of a
>>> priority inversion. This is handled externally from the futexes via the
>>> rt_mutex construct.
>>>
>>> Clear as mud?
>>
>> Not quite that bad, but... The thing is, still, the man page has text
>> such as the following (based on your wording):
>>
>> FUTEX_CMP_REQUEUE_PI (since Linux 2.6.31)
>> This operation is a PI-aware variant of FUTEX_CMP_REQUEUE.
>> It requeues waiters that are blocked via
>> FUTEX_WAIT_REQUEUE_PI on uaddr from a non-PI source futex
>> (uaddr) to a PI target futex (uaddr2).
>>
>> And elsewhere you said
>>
>> EINVAL is returned if the non-pi to pi or
>> op pairing semantics are violated.
>>
>> When someone in user-land (e.g., me) reads pieces like that, they then
>> want to find somewhere in the man page a description of what makes a
>> futex a *PI futex* and probably some statements of the distinction
>> between PI and non-PI futexes. And those statements should be from a
>> perspective that is somewhat comprehensible to user-space. I'm not
>> yet confident that I can do that. Do you care to take a shot at it?
>
> Hrm, tricky indeed. From userspace, what makes a "futex" PI is the policy
> agreement between kernel and userspace (which is the value of the futex:
> 0, TID, TID|WAITERS, and never just WAITERS), and the use of PI aware futex
> op codes when making the futex syscalls.
Okay -- I've attempted to capture this in some text that I added to the
page.
> For a longer discussion of this policy, see Documentation/pi-futex.txt.
Sad to say, that document doesn't supply that much more detail, in
my reading of it, at least.
> Also note that this policy can be combined with that for robust futexes,
> adding the OWNERDIED component.
Now there are two other stories that have yet to be dealt with ;-).
I have a FIXME already in the page regarding OWNERDIED, and
get_robust_list(2) is another page that seems like it could do with
a fair bit of work, but that's a story for another day.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-15 15:10 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 145+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-15 15:10 UTC (permalink / raw)
To: Thomas Gleixner
Cc: mtk.manpages, Carlos O'Donell, Darren Hart, Ingo Molnar,
Jakub Jelinek, linux-man, lkml, Davidlohr Bueso, Arnd Bergmann,
Steven Rostedt, Peter Zijlstra, Linux API, Torvald Riegel,
Roland McGrath, Darren Hart, Anton Blanchard, Peter Zijlstra,
Petr Baudis, Eric Dumazet, bill o gallmeister, Jan Kiszka,
Daniel Wagner, Rich Felker
[Adding a few people to CC that have expressed interest in the
progress of the updates of this page, or who may be able to
provide review feedback. Eventually, you'll all get CCed on
the new draft of the page.]
Hello Thomas,
On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
> On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
>> And that universe would love to have your documentation of
>> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
>
> I give you almost the full treatment, but I leave REQUEUE_PI to
> Darren and FUTEX_WAKE_OP to Jakub. :)
Thank you for the great effort you put into compiling the
text below, and apologies for my long delay in following up.
I've integrated almost all of your suggestions into the
manual page. I will shortly send out a new draft of the
page that contains various FIXMEs for points that remain
unclear.
Most of the rest of this mail is just a checklist noting
what I did with your comments. No response is needed
in most cases, but there are a very few open questions in
this mail that, to help you find them, I have marked with
"???". If you (or someone else) could reply to those, I
would be grateful.
In the next day or two, I hope to send out the new version
of the futex(2) page for review. The new draft is a bit
bigger (okay -- 4 x bigger) than the current page. And there
are quite a number of FIXMEs that I've placed in the page
for various points--some minor, but a few major--that need
to be checked or fixed. Would you have some time to review
that page?
For that matter, if anyone else would have time for
reviewing the page, could they shout out now? It's perhaps
unlikely, but I am worried about getting a thundering herd
of comments, and bringing the page to the state where I have
it now has already been a fairly demanding task.
==========
> FUTEX_WAIT
>
> < Existing blurb seems ok >
>
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Added/reworked.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Added.
> [EINVAL] The supplied timeout argument is not normalized.
Added, but with more detail.
> [EWOULDBLOCK] The atomic enqueueing failed.
Added.
Note, however, that for consistency, I'll use EAGAIN throughout
the page.
> User space value at uaddr
> is not equal to the val argument.
Was already present.
> [ETIMEDOUT] timeout expired
Was present, but I have now added more detail.
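To make the FUTEX_WAIT return values above easier to try out, here is a minimal sketch. The helper names futex_wait() and wait_reason() are made up for this example (glibc ships no futex() wrapper, so the call goes through syscall(2)), and FUTEX_PRIVATE_FLAG is an assumption that the futex word is not shared between processes.

```c
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Illustrative wrapper around the raw system call. */
static long futex_wait(uint32_t *uaddr, uint32_t val,
                       const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                   val, timeout, NULL, 0);
}

/*
 * Wait with a relative timeout and report why the call returned:
 * 0 if woken, otherwise the errno value (EAGAIN, ETIMEDOUT, EINVAL, ...).
 */
static int wait_reason(uint32_t *uaddr, uint32_t expected, long sec, long nsec)
{
    struct timespec ts = { .tv_sec = sec, .tv_nsec = nsec };

    if (futex_wait(uaddr, expected, &ts) == 0)
        return 0;
    return errno;
}
```

With a word holding 1, wait_reason(&word, 0, 0, 1000000) fails the value check, wait_reason(&word, 1, 0, 1000000) runs into the one-millisecond timeout, and a tv_nsec of 1000000000 is an instance of a timeout that is not normalized.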
==========
> FUTEX_WAKE
>
> < Existing blurb seems ok >
>
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Added/reworked.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI
Added.
==========
> FUTEX_REQUEUE
>
> Existing blurb seems ok , except for this:
>
> The argument val contains the number of waiters on uaddr which are
> immediately woken up.
> The timeout argument is abused to transport the number of waiters
> which are requeued to the futex at uaddr2. The pointer is typecasted
> to u32.
What I've actually done with the main text for FUTEX_REQUEUE is defer
to the (now-expanded) discussion of FUTEX_CMP_REQUEUE.
> [EFAULT] Kernel was unable to access the futex value at uaddr or
> uaddr2
Added/reworked.
> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a valid
> object, i.e. pointer is not 4 byte aligned
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI on uaddr
Added.
> [EINVAL] uaddr equals uaddr2. Requeue to same futex.
??? I added this, but does this error not occur only for PI requeues?
==========
> FUTEX_CMP_REQUEUE
>
> Existing blurb seems ok , except for this:
[[
> The argument val contains the number of waiters on uaddr which are
> immediately woken up.
>
> The timeout argument is abused to transport the number of waiters
> which are requeued to the futex at uaddr2. The pointer is typecasted
> to u32.
]]
Covered now (in more detail).
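For readers who have not seen the "abused" timeout argument in practice, the following sketch shows one way the requeue count can be passed in the slot that is declared as a timespec pointer. The wrapper name futex_cmp_requeue() and its parameter names are invented for this example, and FUTEX_PRIVATE_FLAG is an assumption about how the futex words are used.

```c
#include <errno.h>          /* callers inspect errno on failure */
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Illustrative wrapper: wake up to nr_wake waiters on uaddr and requeue up
 * to nr_requeue further waiters to uaddr2, provided *uaddr == expected.
 */
static long futex_cmp_requeue(uint32_t *uaddr, uint32_t nr_wake,
                              uint32_t nr_requeue, uint32_t *uaddr2,
                              uint32_t expected)
{
    /*
     * The fourth argument occupies the timeout slot; for the requeue
     * operations the kernel reinterprets that pointer-sized value as an
     * integer count rather than dereferencing it.
     */
    return syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE | FUTEX_PRIVATE_FLAG,
                   nr_wake, (void *)(uintptr_t)nr_requeue, uaddr2, expected);
}
```

If the value at uaddr differs from expected (the val3 argument), the call fails with EAGAIN, matching the list of return values in this section.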
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr or
> uaddr2
Added/reworked.
> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a valid
> object, i.e. pointer is not 4 byte aligned
Added.
> [EINVAL] uaddr equals uaddr2. Requeue to same futex.
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI on uaddr
Added.
> [EAGAIN] The value read from uaddr is not equal to the compare value in
> argument val3
Was already present.
==========
> FUTEX_WAKE_OP
>
>
> Jakub, can you please explain it? I'm lost :)
I had a read of Ulrich Drepper's "Futexes are Tricky", and the source
code, and took a shot at it. I'd like to have someone check what
I wrote though. See the draft that I will soon send out.
> The argument val contains the maximum number of waiters on uaddr
> which are immediately woken up.
Covered in my new text.
> The timeout argument is abused to transport the maximum number of
> waiters on uaddr2 which are woken up. The pointer is typecasted to
> u32.
Covered in my new text.
> Related return values
>
> [EFAULT] Kernel was unable to access the futex values at uaddr or
> uaddr2
This point was covered already in ERRORS.
> [EINVAL] The supplied uaddr or uaddr2 argument does not point to a
> valid object, i.e. pointer is not 4 byte aligned
This point was covered already in ERRORS.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI on uaddr
I added this point.
==========
> FUTEX_WAIT_BITSET
>
> The same as FUTEX_WAIT except that val3 is used to provide a 32bit
> bitset to the kernel. This bitset is stored in the kernel internal
> state of the waiter.
Added.
> This futex op also allows the option bit FUTEX_CLOCK_REALTIME to be
> set.
Added.
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Already covered.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Already covered.
> [EINVAL] The supplied bitset is zero.
Added.
> [EINVAL] The supplied timeout argument is not normalized.
Already covered.
> [ETIMEDOUT] timeout expired
Already covered.
==========
> FUTEX_WAKE_BITSET
>
> The same as FUTEX_WAKE except that val3 is used to provide a 32bit
> bitset to the kernel. This bitset is used to select waiters on the
> futex. The selection is done by a bitwise AND of the wake side
> supplied bitset and the bitset which is stored in the kernel internal
> state of the waiters. If the result is non zero, the waiter is woken,
> otherwise left waiting.
Added (along with quite a bit of other detail).
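The selection rule is compact enough to write down directly. In this sketch, bitset_selects() is a user-space model of the rule and not kernel code, and futex_wake_bitset() is an illustrative wrapper; both names are invented for this example.

```c
#include <errno.h>          /* callers inspect errno on failure */
#include <linux/futex.h>    /* FUTEX_WAKE_BITSET, FUTEX_BITSET_MATCH_ANY */
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Model of the rule: a waiter is woken only if the two bitsets intersect. */
static int bitset_selects(uint32_t waiter_bitset, uint32_t wake_bitset)
{
    return (waiter_bitset & wake_bitset) != 0;
}

/* Illustrative wrapper: the wake-side bitset travels in val3. */
static long futex_wake_bitset(uint32_t *uaddr, int nr_wake, uint32_t bitset)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG,
                   nr_wake, NULL, NULL, bitset);
}
```

For example, a waiter that registered bitset 0x4 is skipped by a wake with bitset 0x3 but selected by a wake with FUTEX_BITSET_MATCH_ANY.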
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Covered already.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Covered already.
> [EINVAL] The supplied bitset is zero.
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI
Added.
==========
> FUTEX_LOCK_PI
>
> This operation reads from the futex address provided by the uaddr
> argument, which contains the namespace-specific TID of the lock
> owner. If the TID is 0, then the kernel tries to set the waiter's TID
> atomically. If the TID is nonzero or the takeover fails, the kernel
> atomically sets the FUTEX_WAITERS bit, which signals to the owner that
> it cannot unlock the futex in user space atomically by transitioning
> from TID to 0. After that the kernel tries to find the task which is
> associated with the owner TID, creates or reuses kernel state on behalf
> of the owner and attaches the waiter to it. The enqueuing of the
> waiter is in descending priority order if more than one waiter
> exists. The owner inherits either the priority or the bandwidth of
> the waiter. This inheritance follows the lock chain in the case of
> nested locking and performs deadlock detection.
Added.
> The timeout argument is handled as described in FUTEX_WAIT. The
> arguments uaddr2, val, and val3 are ignored.
Added. Note, though, that some crufty code gives the impression
that FUTEX_LOCK_PI uses 'val'. I'll send a patch separately.
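The division of labour between user space and the kernel for FUTEX_LOCK_PI can be sketched as follows. This is a simplified illustration and not how glibc implements PI mutexes: pi_lock() is a hypothetical helper, it passes a NULL timeout (block indefinitely), it does not handle robust or owner-died cases, and FUTEX_PRIVATE_FLAG is an assumption about the futex word.

```c
#include <errno.h>          /* callers inspect errno on failure */
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Illustrative helper: returns 0 on success, -1 with errno set on failure. */
static int pi_lock(_Atomic uint32_t *word)
{
    uint32_t expected = 0;
    uint32_t tid = (uint32_t)syscall(SYS_gettid);

    /* Fast path: an uncontended lock is a 0 -> TID transition in user space. */
    if (atomic_compare_exchange_strong(word, &expected, tid))
        return 0;

    /*
     * Slow path: the word was not 0, so ask the kernel. It sets
     * FUTEX_WAITERS, queues the caller and applies priority inheritance
     * to the owner. val, uaddr2 and val3 are ignored for this operation.
     */
    return (int)syscall(SYS_futex, word, FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG,
                        0, NULL, NULL, 0);
}
```

Calling pi_lock() a second time from the thread that already owns the word exercises the "already locked by the caller" deadlock error from the list of return values.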
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Already covered.
> [ENOMEM] Kernel could not allocate state
Added.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Already covered.
> [EINVAL] The supplied timeout argument is not normalized.
Already covered.
> [EINVAL]
> The kernel detected inconsistent state between the user space state
> at uaddr and the kernel state. That's either state corruption or it
> found a waiter on uaddr which is waiting on FUTEX_WAIT[_BITSET]
Added.
> [EPERM] Caller is not allowed to attach itself to the futex. Can be
> a legitimate issue or a hint for state corruption in user space
Added.
> [ESRCH] The TID in the user space value does not exist
Added.
> [EAGAIN] The futex owner TID is about to exit, but has not yet
> handled the internal state cleanup. Try again.
Added.
> [ETIMEDOUT] timeout expired
Already covered.
> [EDEADLOCK] The futex is already locked by the caller or the kernel
> detected a deadlock scenario in a nested lock chain
Added.
> [EOWNERDIED] The owner of the futex died and the kernel made the
> caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit in the
> futex userspace value. Caller is responsible for cleanup
There is no such thing as an EOWNERDIED error. I had a look
through the kernel source for the FUTEX_OWNER_DIED cases and didn't
see an obvious error associated with them. Can you clarify? (I think
the point is that this condition, which is described in
Documentation/robust-futexes.txt, is not an error as such. However, I'm
not yet sure of how to describe it in the man page.)
I will add this point as a FIXME in the new draft man page.
> [ENOSYS] Not implemented on all architectures and not supported on
> some CPU variants (runtime detection)
Added.
==========
> FUTEX_TRYLOCK_PI
>
> This operation tries to acquire the futex at uaddr. It deals with the
> situation where the TID value at uaddr is 0, but the FUTEX_WAITERS
> bit is set. User space cannot handle this race-free.
Added.
> The arguments uaddr2, val, timeout and val3 are ignored.
??? But the code reads:
    case FUTEX_TRYLOCK_PI:
        return futex_lock_pi(uaddr, flags, 0, timeout, 1);
which momentarily misleads one into thinking that 'timeout' is used.
And: it's not quite ignored, since in futex_lock_pi() a non-NULL
'timeout' is unconditionally dereferenced (meaning you could get
an EFAULT error for a bad 'timeout' pointer).
I'm confused....
Maybe the above code should be
case FUTEX_TRYLOCK_PI:
return futex_lock_pi(uaddr, flags, 0, NULL, 1);
?
> Return values:
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Already covered.
> [ENOMEM] Kernel could not allocate state
Added.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Already covered.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state
Added, but with the same text as for FUTEX_LOCK_PI above. I.e., the text
"That's either state corruption or it found a waiter on uaddr which is
waiting on FUTEX_WAIT[_BITSET]" is also included.
> [EPERM] Caller is not allowed to attach itself to the futex. Can be
> a legitimate issue or a hint for state corruption in user space
Added.
> [ESRCH] The TID in the user space value does not exist
Added.
> [EAGAIN] The futex owner TID is about to exit, but has not yet
> handled the internal state cleanup. Try again.
Added.
> [EDEADLOCK] The futex is already locked by the caller.
Added.
> [EOWNERDIED] The owner of the futex died and the kernel made the
> caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit in the
> futex userspace value. Caller is responsible for cleanup
See comment above concerning EOWNERDIED for FUTEX_LOCK_PI
> [ENOSYS] Not implemented on all architectures and not supported on
> some CPU variants (runtime detection)
Added.
==========
> FUTEX_UNLOCK_PI
>
> This operation wakes the top priority waiter which is waiting in
> FUTEX_LOCK_PI on the futex address provided by the uaddr argument.
>
> This is called when the user space value at uaddr cannot be changed
> atomically from TID (of the owner) to 0.
>
> The arguments uaddr2, val, timeout and val3 are ignored.
Added.
> Related return values:
> [EINVAL] The kernel detected inconsistent
> state between the user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in FUTEX_WAIT[_BITSET].
Added (but with a question in a FIXME).
> [EPERM] Caller does not own the futex.
Added.
> [ENOSYS] Not implemented on all architectures and not supported on
> some CPU variants (runtime detection)
Added.
==========
> FUTEX_WAIT_REQUEUE_PI
>
> Wait operation to wait on a non pi futex at uaddr and potentially be
> requeued on a pi futex at uaddr2. The wait operation on uaddr is the
> same as FUTEX_WAIT. The waiter can be removed from the wait on uaddr
> via FUTEX_WAKE without requeuing on uaddr2.
Added.
> The timeout argument is handled as described in FUTEX_WAIT.
The above seems not to be correct. I've written the discussion of
'timeout' up as I understand it, and added a FIXME to the draft page.
> Darren, can you fill in the missing details?
> Return values:
>
> [EFAULT] Kernel was unable to access the futex value at uaddr or
> uaddr2
Already covered.
> [EINVAL] The supplied uaddr or uaddr2 argument does not point to a
> valid object, i.e. pointer is not 4 byte aligned
Already covered.
> [EINVAL] The supplied timeout argument is not normalized.
Already covered.
> [EINVAL] The supplied bitset is zero.
??? I don't believe this can happen. 'val3' is internally set to
FUTEX_BITSET_MATCH_ANY. Can you confirm?
> [EWOULDBLOCK] The atomic enqueueing failed. User space value at uaddr
> is not equal val argument.
Added using the same text as FUTEX_WAIT:
EAGAIN (FUTEX_WAIT, FUTEX_WAIT_REQUEUE_PI) The value pointed to
by uaddr was not equal to the expected value val at the
time of the call.
> [ETIMEDOUT] timeout expired
Already covered.
> [EOWNERDIED] The owner of the PI futex at uaddr2 died and the kernel
> made the caller the new owner. The kernel sets the FUTEX_OWNER_DIED
> bit in the uaddr2 futex userspace value. Caller is responsible for
> cleanup
See comment above concerning EOWNERDIED for FUTEX_LOCK_PI
> [ENOSYS] Not implemented on all architectures and not supported on
> some CPU variants (runtime detection)
Added.
==========
> FUTEX_CMP_REQUEUE_PI
>
> PI aware variant of FUTEX_CMP_REQUEUE. Inner futex at uaddr is a non
> PI futex. Outer futex to which is requeued is a PI futex at uaddr2.
I instead used Darren's proposed text:
# PI aware variant for FUTEX_CMP_REQUEUE. Requeue tasks blocked on uaddr via
# FUTEX_WAIT_REQUEUE_PI from a non-PI source futex (uaddr) to a PI target
# futex (uaddr2).
> The waiters on uaddr must wait in FUTEX_WAIT_REQUEUE_PI.
Covered above.
> The argument val contains the number of waiters on uaddr which are
> immediately woken up. Must be 1 for this opcode.
Added.
> The timeout argument is abused to transport the number of waiters
> which are requeued on to the futex at uaddr2. The pointer is
> typecasted to u32.
Added.
> Darren, can you fill in the missing details?
>
> [EFAULT] Kernel was unable to access the futex value at uaddr or
> uaddr2
Already covered.
> [ENOMEM] Kernel could not allocate state
Added.
> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a valid
> object, i.e. pointer is not 4 byte aligned
Already covered.
> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI on uaddr
Added
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_WAIT[_BITSET] on uaddr
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr2 and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_WAIT on uaddr2.
Added.
> [EINVAL] The supplied bitset is zero.
Darren Hart noted: Bitset doesn't apply to FUTEX_CMP_REQUEUE_PI.
> [EAGAIN] uaddr1 readout is not equal the compare value in argument
> val3
Added.
> [EAGAIN] The futex owner TID of uaddr2 is about to exit, but has not
> yet handled the internal state cleanup. Try again.
Added.
> [EPERM] Caller is not allowed to attach the waiter to the futex at
> uaddr2 Can be a legitimate issue or a hint for state corruption in
> user space
Added.
> [ESRCH] The TID in the user space value at uaddr2 does not exist
Added.
> [EDEADLOCK] The requeuing of a waiter to the kernel representation of
> the PI futex at uaddr2 detected a deadlock scenario.
Added.
> [ENOSYS] Not implemented on all architectures and not supported on
> some CPU variants (runtime detection)
Added.
==========
> The various option bits seem to be undocumented as well
Yes, thanks for that.
> FUTEX_PRIVATE_FLAG
I've added this one, along with the detail "(since Linux 2.6.22)"
> This option bit can be ored on all futex ops.
>
> It tells the kernel that the futex is process private and not shared
> with another process. That allows the kernel to choose the fast path
> for validating the user space address and avoids expensive VMA
> lookup, taking refcounts on file backing store etc.
>
> FUTEX_CLOCK_REALTIME
I've added this one, along with the detail "(since Linux 2.6.28)"
> This option bit can be ored on the futex ops FUTEX_WAIT_BITSET and
> FUTEX_WAIT_REQUEUE_PI
>
> If set the kernel treats the user space supplied timeout as absolute
> time based on CLOCK_REALTIME.
>
> If not set the kernel treats the user space supplied timeout as
> relative time.
>
> If this is set on any other op than the supported ones, kernel
> returns ENOSYS!
The details in the preceding 4 paragraphs have been integrated.
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
* Re: futex(2) man page update help request
@ 2015-01-15 15:10 ` Michael Kerrisk (man-pages)
From: Michael Kerrisk (man-pages) @ 2015-01-15 15:10 UTC (permalink / raw)
To: Thomas Gleixner
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, Carlos O'Donell,
Darren Hart, Ingo Molnar, Jakub Jelinek,
linux-man-u79uwXL29TY76Z2rM5mHXA, lkml, Davidlohr Bueso,
Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API,
Torvald Riegel, Roland McGrath, Darren Hart, Anton Blanchard
[Adding a few people to CC that have expressed interest in the
progress of the updates of this page, or who may be able to
provide review feedback. Eventually, you'll all get CCed on
the new draft of the page.]
Hello Thomas,
On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
> On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
>> And that universe would love to have your documentation of
>> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
>
> I give you almost the full treatment, but I leave REQUEUE_PI to
> Darren and FUTEX_WAKE_OP to Jakub. :)
Thank you for the great effort you put into compiling the
text below, and apologies for my long delay in following up.
I've integrated almost all of your suggestions into the
manual page. I will shortly send out a new draft of the
page that contains various FIXMEs for points that remain
unclear.
Most of the rest of this mail is just a checklist noting
what I did with your comments. No response is needed
in most cases, but there are a very few open questions in
this mail that, to help you find them, I have marked with
"???". If you (or someone else) could reply to those, I
would be grateful.
In the next day or two, I hope to send out the new version
of the futex(2) page for review. The new draft is a bit
bigger (okay -- 4 x bigger) than the current page. And there
are quite a number of FIXMEs that I've placed in the page
for various points--some minor, but a few major--that need
to be checked or fixed. Would you have some time to review
that page?
For that matter, if anyone else would have time for
reviewing the page, could they shout out now. It's perhaps
unlikely, but I am worried about getting a thundering herd
of comments, and bringing the page to the state where I have
it now has already been a fairly demanding task.
==========
> FUTEX_WAIT
>
> < Existing blurb seems ok >
>
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Added/reworked.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Added.
> [EINVAL] The supplied timeout argument is not normalized.
Added, but with more detail.
> [EWOULDBLOCK] The atomic enqueueing failed.
Added.
Note, however, that for consistency, I'll use EAGAIN throughout
the page.
> User space value at uaddr
> is not equal val argument.
Was already present.
> [ETIMEDOUT] timeout expired
Was present, but I have now added more detail.
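For illustration, the two FUTEX_WAIT argument checks quoted above (4-byte alignment of uaddr, and a normalized timeout) can be sketched in user space. This is a sketch under assumptions: the helper names are hypothetical, and "normalized" is taken to mean non-negative seconds with nanoseconds in the range 0 to 999999999.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: true if ts is "normalized" in the assumed sense,
 * i.e. tv_sec is non-negative and tv_nsec lies in [0, 999999999]. */
static bool timeout_is_normalized(const struct timespec *ts)
{
    return ts->tv_sec >= 0 && ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L;
}

/* Hypothetical helper: true if uaddr is 4-byte aligned, as required
 * for a 32-bit futex word. */
static bool uaddr_is_aligned(const void *uaddr)
{
    return ((uintptr_t)uaddr % sizeof(uint32_t)) == 0;
}
```

A caller that fails either check would expect EINVAL from the kernel, per the list above.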
==========
> FUTEX_WAKE
>
> < Existing blurb seems ok >
>
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Added/reworked.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI
Added.
==========
> FUTEX_REQUEUE
>
> Existing blurb seems ok , except for this:
>
> The argument val contains the number of waiters on uaddr which are
> immediately woken up.
> The timeout argument is abused to transport the number of waiters
> which are requeued to the futex at uaddr2. The pointer is typecasted
> to u32.
What I've actually done with the main text for FUTEX_REQUEUE is defer
to the (now-expanded) discussion of FUTEX_CMP_REQUEUE.
> [EFAULT] Kernel was unable to access the futex value at uaddr or
> uaddr2
Added/reworked.
> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a valid
> object, i.e. pointer is not 4 byte aligned
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI on uaddr
Added.
> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
??? I added this, but does this error not occur only for PI requeues?
==========
> FUTEX_CMP_REQUEUE
>
> Existing blurb seems ok , except for this:
[[
> The argument val contains the number of waiters on uaddr which are
> immediately woken up.
>
> The timeout argument is abused to transport the number of waiters
> which are requeued to the futex at uaddr2. The pointer is typecasted
> to u32.
]]
Covered now (in more detail).
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr or
> uaddr2
Added/reworked.
> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a valid
> object, i.e. pointer is not 4 byte aligned
Added.
> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI on uaddr
Added
> [EAGAIN] uaddr1 readout is not equal the compare value in argument
> val3
Was already present.
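To make the argument overloading described above concrete, here is a minimal sketch of a user-space wrapper for FUTEX_CMP_REQUEUE. The wrapper name and parameter names are hypothetical (glibc exports no futex wrapper); the point is only that the slot declared as the timeout pointer carries an integer count.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper around the raw futex system call.
 * nr_wake    -> 'val':     waiters on uaddr to wake immediately
 * nr_requeue -> 'timeout': waiters to requeue to uaddr2 (passed as an
 *                          integer in the pointer-sized argument slot)
 * expected   -> 'val3':    value that *uaddr must still contain */
static long futex_cmp_requeue(uint32_t *uaddr, uint32_t nr_wake,
                              uint32_t nr_requeue, uint32_t *uaddr2,
                              uint32_t expected)
{
    return syscall(SYS_futex, uaddr, FUTEX_CMP_REQUEUE, nr_wake,
                   (unsigned long)nr_requeue, uaddr2, expected);
}
```

With no waiters queued, a call whose expected value matches *uaddr should return 0, while a mismatch should fail with EAGAIN, as listed above.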
==========
> FUTEX_WAKE_OP
>
>
> Jakub, can you please explain it? I'm lost :)
I had a read of Ulrich Drepper's "Futexes are Tricky", and the source
code, and took a shot at it. I'd like to have someone check what
I wrote though. See the draft that I will soon send out.
> The argument val contains the maximum number of waiters on uaddr
> which are immediately woken up.
Covered in my new text.
> The timeout argument is abused to transport the maximum number of
> waiters on uaddr2 which are woken up. The pointer is typecasted to
> u32.
Covered in my new text.
> Related return values
>
> [EFAULT] Kernel was unable to access the futex values at uaddr or
> uaddr2
This point was covered already in ERRORS.
> [EINVAL] The supplied uaddr or uaddr2 argument does not point to a
> valid object, i.e. pointer is not 4 byte aligned
This point was covered already in ERRORS.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI on uaddr
I added this point.
==========
> FUTEX_WAIT_BITSET
>
> The same as FUTEX_WAIT except that val3 is used to provide a 32bit
> bitset to the kernel. This bitset is stored in the kernel internal
> state of the waiter.
Added.
> This futex op also allows to have the option bit FUTEX_CLOCK_REALTIME
> set.
Added.
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Already covered.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Already covered.
> [EINVAL] The supplied bitset is zero.
Added.
> [EINVAL] The supplied timeout argument is not normalized.
Already covered.
> [ETIMEDOUT] timeout expired
Already covered.
==========
> FUTEX_WAKE_BITSET
>
> The same as FUTEX_WAKE except that val3 is used to provide a 32bit
> bitset to the kernel. This bitset is used to select waiters on the
> futex. The selection is done by a bitwise AND of the wake side
> supplied bitset and the bitset which is stored in the kernel internal
> state of the waiters. If the result is non zero, the waiter is woken,
> otherwise left waiting.
Added (along with quite a bit of other detail).
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Covered already.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Covered already.
> [EINVAL] The supplied bitset is zero.
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI
Added.
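The waiter selection rule quoted above for FUTEX_WAKE_BITSET reduces to a single bitwise AND. A minimal sketch follows; the function name is hypothetical, and FUTEX_BITSET_MATCH_ANY comes from <linux/futex.h>.

```c
#include <linux/futex.h>   /* FUTEX_BITSET_MATCH_ANY */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the described rule: a waiter is woken
 * if its stored bitset and the waker's bitset share at least one bit. */
static bool bitset_selects_waiter(uint32_t waiter_bitset, uint32_t wake_bitset)
{
    return (waiter_bitset & wake_bitset) != 0;
}
```

A wake-side bitset of FUTEX_BITSET_MATCH_ANY therefore selects every waiter, since a waiter's bitset can never be zero (a zero bitset is rejected with EINVAL).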
==========
> FUTEX_LOCK_PI
>
> This operation reads from the futex address provided by the uaddr
> argument, which contains the namespace specific TID of the lock
> owner. If the TID is 0, then the kernel tries to set the waiter's TID
> atomically. If the TID is nonzero or the takeover fails, the kernel
> atomically sets the FUTEX_WAITERS bit, which signals to the owner that
> it cannot unlock the futex in user space atomically by transitioning
> from TID to 0. After that the kernel tries to find the task which is
> associated with the owner TID, creates or reuses kernel state on behalf
> of the owner and attaches the waiter to it. The enqueueing of the
> waiter is in descending priority order if more than one waiter
> exists. The owner inherits either the priority or the bandwidth of
> the waiter. This inheritance follows the lock chain in the case of
> nested locking and performs deadlock detection.
Added.
> The timeout argument is handled as described in FUTEX_WAIT. The
> arguments uaddr2, val, and val3 are ignored.
Added. Note, though, that some crufty code gives the impression
that FUTEX_LOCK_PI uses 'val'. I'll send a patch separately.
> Related return values
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Already covered.
> [ENOMEM] Kernel could not allocate state
Added
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Already covered.
> [EINVAL] The supplied timeout argument is not normalized.
Already covered.
> [EINVAL]
> The kernel detected inconsistent state between the user space state
> at uaddr and the kernel state. That's either state corruption or it
> found a waiter on uaddr which is waiting on FUTEX_WAIT[_BITSET]
Added.
> [EPERM] Caller is not allowed to attach itself to the futex. Can be
> a legitimate issue or a hint for state corruption in user space
Added.
> [ESRCH] The TID in the user space value does not exist
Added.
> [EAGAIN] The futex owner TID is about to exit, but has not yet
> handled the internal state cleanup. Try again.
Added.
> [ETIMEDOUT] timeout expired
Already covered.
> [EDEADLOCK] The futex is already locked by the caller or the kernel
> detected a deadlock scenario in a nested lock chain
Added.
> [EOWNERDIED] The owner of the futex died and the kernel made the
> caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit in the
> futex userspace value. Caller is responsible for cleanup
There is no such thing as an EOWNERDIED error. I had a look
through the kernel source for the FUTEX_OWNER_DIED cases and didn't
see an obvious error associated with them. Can you clarify? (I think
the point is that this condition, which is described in
Documentation/robust-futexes.txt, is not an error as such. However, I'm
not yet sure of how to describe it in the man page.)
I will add this point as a FIXME in the new draft man page.
> [ENOSYS] Not implemented on all architectures and not supported on
> some CPU variants (runtime detection)
Added.
==========
> FUTEX_TRYLOCK_PI
>
> This operation tries to acquire the futex at uaddr. It deals with the
> situation where the TID value at uaddr is 0, but the FUTEX_HAS_WAITER
> bit is set. User space cannot handle this race free.
Added.
> The arguments uaddr2, val, timeout and val3 are ignored.
??? But the code reads:
case FUTEX_TRYLOCK_PI:
return futex_lock_pi(uaddr, flags, 0, timeout, 1);
which momentarily misleads one into thinking that 'timeout' is used.
And: it's not quite ignored, since in futex_lock_pi() a non-NULL
'timeout' is unconditionally dereferenced (meaning you could get
an EFAULT error for a bad 'timeout' pointer).
I'm confused....
Maybe the above code should be
case FUTEX_TRYLOCK_PI:
return futex_lock_pi(uaddr, flags, 0, NULL, 1);
?
> Return values:
>
> [EFAULT] Kernel was unable to access the futex value at uaddr.
Already covered.
> [ENOMEM] Kernel could not allocate state
Added.
> [EINVAL] The supplied uaddr argument does not point to a valid
> object, i.e. pointer is not 4 byte aligned
Already covered.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state
Added, but with the same text as for FUTEX_LOCK_PI above. I.e., the text
"That's either state corruption or it found a waiter on uaddr which is
waiting on FUTEX_WAIT[_BITSET]" is also included.
> [EPERM] Caller is not allowed to attach itself to the futex. Can be
> a legitimate issue or a hint for state corruption in user space
Added.
> [ESRCH] The TID in the user space value does not exist
Added.
> [EAGAIN] The futex owner TID is about to exit, but has not yet
> handled the internal state cleanup. Try again.
Added.
> [EDEADLOCK] The futex is already locked by the caller.
Added.
> [EOWNERDIED] The owner of the futex died and the kernel made the
> caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit in the
> futex userspace value. Caller is responsible for cleanup
See comment above concerning EOWNERDIED for FUTEX_LOCK_PI
> [ENOSYS] Not implemented on all architectures and not supported on
> some CPU variants (runtime detection)
Added.
==========
> FUTEX_UNLOCK_PI
>
> This operation wakes the top priority waiter which is waiting in
> FUTEX_LOCK_PI on the futex address provided by the uaddr argument.
>
> This is called when the user space value at uaddr cannot be changed
> atomically from TID (of the owner) to 0.
>
> The arguments uaddr2, val, timeout and val3 are ignored.
Added.
> Related return values:
> [EINVAL] The kernel detected inconsistent
> state between the user space state at uaddr and the kernel state,
> i.e. it detected a waiter which waits in FUTEX_WAIT[_BITSET].
Added (but with a question in a FIXME).
> [EPERM] Caller does not own the futex.
Added.
> [ENOSYS] Not implemented on all architectures and not supported on
> some CPU variants (runtime detection)
Added.
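The user-space side of the PI protocol described above (0 to TID on lock, TID to 0 on unlock, with the kernel needed once FUTEX_WAITERS is set) can be sketched with C11 atomics. The function names are hypothetical, and the sketch covers only the uncontended fast paths; the fallback system calls are deliberately left out.

```c
#include <linux/futex.h>   /* FUTEX_WAITERS, FUTEX_TID_MASK */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Fast-path acquire: succeeds only if the futex word is 0 (no owner,
 * no waiters). On failure the caller would fall back to FUTEX_LOCK_PI. */
static bool pi_trylock_fast(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong(word, &expected, tid);
}

/* Fast-path release: succeeds only if the word is exactly our TID.
 * If the kernel has set FUTEX_WAITERS, the compare-and-swap fails and
 * the caller would have to use FUTEX_UNLOCK_PI instead. */
static bool pi_unlock_fast(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = tid;
    return atomic_compare_exchange_strong(word, &expected, 0);
}
```

The design point is that the kernel is involved only under contention; an uncontended lock and unlock pair never leaves user space.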
==========
> FUTEX_WAIT_REQUEUE_PI
>
> Wait operation to wait on a non pi futex at uaddr and potentially be
> requeued on a pi futex at uaddr2. The wait operation on uaddr is the
> same as FUTEX_WAIT. The waiter can be removed from the wait on uaddr
> via FUTEX_WAKE without requeuing on uaddr2.
Added.
> The timeout argument is handled as described in FUTEX_WAIT.
The above seems not to be correct. I've written the discussion of
'timeout' up as I understand it, and added a FIXME to the draft page.
> Darren, can you fill in the missing details?
> Return values:
>
> [EFAULT] Kernel was unable to access the futex value at uaddr or
> uaddr2
Already covered.
> [EINVAL] The supplied uaddr or uaddr2 argument does not point to a
> valid object, i.e. pointer is not 4 byte aligned
Already covered.
> [EINVAL] The supplied timeout argument is not normalized.
Already covered.
> [EINVAL] The supplied bitset is zero.
??? I don't believe this can happen. 'val3' is internally set to
FUTEX_BITSET_MATCH_ANY. Can you confirm?
> [EWOULDBLOCK] The atomic enqueueing failed. User space value at uaddr
> is not equal val argument.
Added using the same text as FUTEX_WAIT:
EAGAIN (FUTEX_WAIT, FUTEX_WAIT_REQUEUE_PI) The value pointed to
by uaddr was not equal to the expected value val at the
time of the call.
> [ETIMEDOUT] timeout expired
Already covered.
> [EOWNERDIED] The owner of the PI futex at uaddr2 died and the kernel
> made the caller the new owner. The kernel sets the FUTEX_OWNER_DIED
> bit in the uaddr2 futex userspace value. Caller is responsible for
> cleanup
See comment above concerning EOWNERDIED for FUTEX_LOCK_PI
> [ENOSYS] Not implemented on all architectures and not supported on
> some CPU variants (runtime detection)
Added.
==========
> FUTEX_CMP_REQUEUE_PI
>
> PI aware variant of FUTEX_CMP_REQUEUE. Inner futex at uaddr is a non
> PI futex. Outer futex to which is requeued is a PI futex at uaddr2.
I instead used Darren's proposed text:
# PI aware variant for FUTEX_CMP_REQUEUE. Requeue tasks blocked on uaddr via
# FUTEX_WAIT_REQUEUE_PI from a non-PI source futex (uaddr) to a PI target
# futex (uaddr2).
> The waiters on uaddr must wait in FUTEX_WAIT_REQUEUE_PI.
Covered above.
> The argument val contains the number of waiters on uaddr which are
> immediately woken up. Must be 1 for this opcode.
Added.
> The timeout argument is abused to transport the number of waiters
> which are requeued on to the futex at uaddr2. The pointer is
> typecasted to u32.
Added.
> Darren, can you fill in the missing details?
>
> [EFAULT] Kernel was unable to access the futex value at uaddr or
> uaddr2
Already covered.
> [ENOMEM] Kernel could not allocate state
Added.
> [EINVAL] The supplied uaddr/uaddr2 arguments do not point to a valid
> object, i.e. pointer is not 4 byte aligned
Already covered.
> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_LOCK_PI on uaddr
Added
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_WAIT[_BITSET] on uaddr
Added.
> [EINVAL] The kernel detected inconsistent state between the user
> space state at uaddr2 and the kernel state, i.e. it detected a waiter
> which waits in FUTEX_WAIT on uaddr2.
Added.
> [EINVAL] The supplied bitset is zero.
Darren Hart noted: Bitset doesn't apply to FUTEX_CMP_REQUEUE_PI.
> [EAGAIN] uaddr1 readout is not equal the compare value in argument
> val3
Added.
> [EAGAIN] The futex owner TID of uaddr2 is about to exit, but has not
> yet handled the internal state cleanup. Try again.
Added.
> [EPERM] Caller is not allowed to attach the waiter to the futex at
> uaddr2 Can be a legitimate issue or a hint for state corruption in
> user space
Added.
> [ESRCH] The TID in the user space value at uaddr2 does not exist
Added.
> [EDEADLOCK] The requeuing of a waiter to the kernel representation of
> the PI futex at uaddr2 detected a deadlock scenario.
Added.
> [ENOSYS] Not implemented on all architectures and not supported on
> some CPU variants (runtime detection)
Added.
==========
> The various option bits seem to be undocumented as well
Yes, thanks for that.
> FUTEX_PRIVATE_FLAG
I've added this one, along with the detail "(since Linux 2.6.22)"
> This option bit can be ored on all futex ops.
>
> It tells the kernel that the futex is process private and not shared
> with another process. That allows the kernel to choose the fast path
> for validating the user space address and avoids expensive VMA
> lookup, taking refcounts on file backing store etc.
>
> FUTEX_CLOCK_REALTIME
I've added this one, along with the detail "(since Linux 2.6.28)"
> This option bit can be ored on the futex ops FUTEX_WAIT_BITSET and
> FUTEX_WAIT_REQUEUE_PI
>
> If set the kernel treats the user space supplied timeout as absolute
> time based on CLOCK_REALTIME.
>
> If not set the kernel treats the user space supplied timeout as
> relative time.
>
> If this is set on any other op than the supported ones, kernel
> returns ENOSYS!
The details in the preceding 4 paragraphs have been integrated.
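As an illustration of how the option bits combine with an operation, the sketch below splits an op value into command and flags and applies the FUTEX_CLOCK_REALTIME rule exactly as stated above. The function name is hypothetical, the constants are from <linux/futex.h>, and the set of operations accepting the flag is taken from this thread; other kernel versions may differ.

```c
#include <errno.h>
#include <linux/futex.h>

/* Hypothetical check: returns 0 if FUTEX_CLOCK_REALTIME is acceptable
 * for the command encoded in op, or -ENOSYS otherwise. Per the text
 * above, only FUTEX_WAIT_BITSET and FUTEX_WAIT_REQUEUE_PI accept it. */
static int check_clock_realtime(int op)
{
    int cmd = op & FUTEX_CMD_MASK;   /* strip the option bits */

    if ((op & FUTEX_CLOCK_REALTIME) &&
        cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
        return -ENOSYS;
    return 0;
}
```

For example, FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME requests a process-private wait with an absolute CLOCK_REALTIME timeout.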
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
* Re: futex(2) man page update help request
@ 2015-01-15 22:23 ` Thomas Gleixner
From: Thomas Gleixner @ 2015-01-15 22:23 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Carlos O'Donell, Darren Hart, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Davidlohr Bueso, Arnd Bergmann, Steven Rostedt,
Peter Zijlstra, Linux API, Torvald Riegel, Roland McGrath,
Darren Hart, Anton Blanchard, Peter Zijlstra, Petr Baudis,
Eric Dumazet, bill o gallmeister, Jan Kiszka, Daniel Wagner,
Rich Felker
On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
> > [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>
> ??? I added this, but does this error not occur only for PI requeues?
It's equally wrong for normal futexes. And it's actually the same code
checking for this for all variants.
> > [EDEADLOCK] The futex is already locked by the caller or the kernel
> > detected a deadlock scenario in a nested lock chain
>
> Added.
It's actually EDEADLK
>
> > [EOWNERDIED] The owner of the futex died and the kernel made the
> > caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit in the
> > futex userspace value. Caller is responsible for cleanup
>
> There is no such thing as an EOWNERDIED error. I had a look
> through the kernel source for the FUTEX_OWNER_DIED cases and didn't
> see an obvious error associated with them. Can you clarify? (I think
> the point is that this condition, which is described in
> Documentation/robust-futexes.txt, is not an error as such. However, I'm
> not yet sure of how to describe it in the man page.)
> I will add this point as a FIXME in the new draft man page.
Oops. My bad. That's not what the kernel does. The kernel merely
marks it in the futex itself with FUTEX_OWNER_DIED. User space needs
to deal with that and the POSIX users return EOWNERDEAD (not
EOWNERDIED), so it's not part of the futex call itself.
We had discussions about returning EOWNERDEAD in that case, but then
glibc with its sophisticated error handling prevented that ....
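A minimal sketch of the user-space handling described here: after a successful lock operation, the caller inspects the futex word and reports EOWNERDEAD itself if the kernel has set FUTEX_OWNER_DIED. The helper names are hypothetical; the bit masks are the ones defined in <linux/futex.h>.

```c
#include <errno.h>
#include <linux/futex.h>   /* FUTEX_OWNER_DIED, FUTEX_TID_MASK */
#include <stdint.h>

/* Hypothetical helper: status a lock wrapper might report after it
 * has acquired the futex, based purely on the futex word. */
static int lock_status_from_word(uint32_t word)
{
    return (word & FUTEX_OWNER_DIED) ? EOWNERDEAD : 0;
}

/* Hypothetical helper: extract the owner TID from the futex word. */
static uint32_t owner_tid(uint32_t word)
{
    return word & FUTEX_TID_MASK;
}
```

The futex call itself still returns success in this case; only the bit in the futex word tells the new owner that cleanup is needed.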
> > FUTEX_TRYLOCK_PI
> >
> > This operation tries to acquire the futex at uaddr. It deals with the
> > situation where the TID value at uaddr is 0, but the FUTEX_HAS_WAITER
> > bit is set. User space cannot handle this race free.
>
> Added.
>
> > The arguments uaddr2, val, timeout and val3 are ignored.
>
> ??? But the code reads:
>
> case FUTEX_TRYLOCK_PI:
> return futex_lock_pi(uaddr, flags, 0, timeout, 1);
>
> which momentarily misleads one into thinking that 'timeout' is used.
> And: it's not quite ignored, since in futex_lock_pi() a non-NULL
> 'timeout' is unconditionally dereferenced (meaning you could get
> an EFAULT error for a bad 'timeout' pointer).
> I'm confused....
Indeed. That's just wrong.
> Maybe the above code should be
>
> case FUTEX_TRYLOCK_PI:
> return futex_lock_pi(uaddr, flags, 0, NULL, 1);
> ?
Care to send a patch?
> > FUTEX_WAIT_REQUEUE_PI
> >
> > Wait operation to wait on a non pi futex at uaddr and potentially be
> > requeued on a pi futex at uaddr2. The wait operation on uaddr is the
> > same as FUTEX_WAIT. The waiter can be removed from the wait on uaddr
> > via FUTEX_WAKE without requeuing on uaddr2.
>
> Added.
>
> > The timeout argument is handled as described in FUTEX_WAIT.
>
> The above seems not to be correct. I've written the discussion of
> 'timeout' up as I understand it, and added a FIXME to the draft page.
>
> > Darren, can you fill in the missing details?
>
> > Return values:
> >
> > [EFAULT] Kernel was unable to access the futex value at uaddr or
> > uaddr2
>
> Already covered.
>
> > [EINVAL] The supplied uaddr or uaddr2 argument does not point to a
> > valid object, i.e. pointer is not 4 byte aligned
>
> Already covered.
>
> > [EINVAL] The supplied timeout argument is not normalized.
>
> Already covered.
>
> > [EINVAL] The supplied bitset is zero.
>
> ??? I don't believe this can happen. 'val3' is internally set to
> FUTEX_BITSET_MATCH_ANY. Can you confirm?
Right. We don't support that bitset stuff in requeue_pi ATM.
Thanks,
tglx
* Re: futex(2) man page update help request
@ 2015-01-15 22:23 ` Thomas Gleixner
From: Thomas Gleixner @ 2015-01-15 22:23 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Carlos O'Donell, Darren Hart, Ingo Molnar, Jakub Jelinek,
linux-man-u79uwXL29TY76Z2rM5mHXA, lkml, Davidlohr Bueso,
Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API,
Torvald Riegel, Roland McGrath, Darren Hart, Anton Blanchard
On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
> > [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>
> ??? I added this, but does this error not occur only for PI requeues?
It's equally wrong for normal futexes. And it's actually the same code
checking for this for all variants.
> > [EDEADLOCK] The futex is already locked by the caller or the kernel
> > detected a deadlock scenario in a nested lock chain
>
> Added.
It's actually EDEADLK
>
> > [EOWNERDIED] The owner of the futex died and the kernel made the
> > caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit in the
> > futex userspace value. Caller is responsible for cleanup
>
> There is no such thing as an EOWNERDIED error. I had a look
> through the kernel source for the FUTEX_OWNER_DIED cases and didn't
> see an obvious error associated with them. Can you clarify? (I think
> the point is that this condition, which is described in
> Documentation/robust-futexes.txt, is not an error as such. However, I'm
> not yet sure of how to describe it in the man page.)
> I will add this point as a FIXME in the new draft man page.
Oops. My bad. That's not the what the kernel does. The kernel merily
marks it in the futex itself with FUTEX_OWNER_DIED. User space needs
to deal with that and the posix users return EOWNERDEAD (not
EOWNERDIED], so it's not part of the futex call itself.
We had discussions about returning EOWNERDEAD in that case, but then
glibc with its sophisticated error handling prevented that ....
> > FUTEX_TRYLOCK_PI
> >
> > This operation tries to acquire the futex at uaddr. It deals with the
> > situation where the TID value at uaddr is 0, but the FUTEX_HAS_WAITER
> > bit is set. User space cannot handle this race free.
>
> Added.
>
> > The arguments uaddr2, val, timeout and val3 are ignored.
>
> ??? But the code reads:
>
> case FUTEX_TRYLOCK_PI:
> return futex_lock_pi(uaddr, flags, 0, timeout, 1);
>
> which momentarily misleads one into thinking that 'timeout' is used.
> And: it's not quite ignored, since in futex_lock_pi() a non-NULL
> 'timeout' is unconditionally dereferenced (meaning you could get
> an EFAULT error for a bad 'timeout' pointer).
> I'm confused....
Indeed. That's just wrong.
> Maybe the above code should be
>
> case FUTEX_TRYLOCK_PI:
> return futex_lock_pi(uaddr, flags, 0, NULL, 1);
> ?
Care to send a patch?
> > FUTEX_WAIT_REQUEUE_PI
> >
> > Wait operation to wait on a non pi futex at uaddr and potentially be
> > requeued on a pi futex at uaddr2. The wait operation on uaddr is the
> > same as FUTEX_WAIT. The waiter can be removed from the wait on uaddr
> > via FUTEX_WAKE without requeuing on uaddr2.
>
> Added.
>
> > The timeout argument is handled as described in FUTEX_WAIT.
>
> The above seems not to be correct. I've written the discussion of
> 'timeout' up as I understand it, and added a FIXME to the draft page.
>
> > Darren, can you fill in the missing details?
>
> > Return values:
> >
> > [EFAULT] Kernel was unable to access the futex value at uaddr or
> > uaddr2
>
> Already covered.
>
> > [EINVAL] The supplied uaddr or uaddr2 argument does not point to a
> > valid object, i.e. pointer is not 4 byte aligned
>
> Already covered.
>
> > [EINVAL] The supplied timeout argument is not normalized.
>
> Already covered.
>
> > [EINVAL] The supplied bitset is zero.
>
> ??? I don't believe this can happen. 'val3' is internally set to
> FUTEX_BITSET_MATCH_ANY. Can you confirm?
Right. We dont support that bitset stuff in requeue_pi ATM.
Thanks,
tglx
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
2015-01-15 22:23 ` Thomas Gleixner
@ 2015-01-16 15:17 ` Michael Kerrisk (man-pages)
-1 siblings, 0 replies; 145+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-16 15:17 UTC (permalink / raw)
To: Thomas Gleixner
Cc: mtk.manpages, Carlos O'Donell, Darren Hart, Ingo Molnar,
Jakub Jelinek, linux-man, lkml, Davidlohr Bueso, Arnd Bergmann,
Steven Rostedt, Peter Zijlstra, Linux API, Torvald Riegel,
Roland McGrath, Darren Hart, Anton Blanchard, Petr Baudis,
Eric Dumazet, bill o gallmeister, Jan Kiszka, Daniel Wagner,
Rich Felker
Hello Thomas,
On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>
>> ??? I added this, but does this error not occur only for PI requeues?
>
> It's equally wrong for normal futexes. And it's actually the same code
> checking for this for all variants.
I don't understand "equally wrong" in your reply, I'm sorry. Do you
mean:
a) This error text should be there for both normal and PI requeues
OR
b) This error text should be there for neither normal nor PI requeues
>>> [EDEADLOCK] The futex is already locked by the caller or the kernel
>>> detected a deadlock scenario in a nested lock chain
>>
>> Added.
>
> It's actually EDEADLK
Yes, sorry -- I should have said that I already found and fixed
that problem.
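For reference, a minimal sketch of the EDEADLK case, assuming a Linux system with PI futex support. The helper names futex_lock_pi() and relock_errno() are made up for illustration and are not part of any library. The idea is that a thread which already owns a PI futex and tries to lock it again should see the call fail with EDEADLK:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: raw FUTEX_LOCK_PI with no timeout.
 * Returns 0 on success, -1 with errno set on failure. */
int futex_lock_pi(uint32_t *uaddr)
{
    return (int)syscall(SYS_futex, uaddr, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}

/* Hypothetical helper: lock an unlocked PI futex, then lock it again
 * from the same thread. Returns the errno of the second attempt,
 * 0 if the second attempt unexpectedly succeeded, or -1 if even the
 * first lock failed (for example, no PI futex support). */
int relock_errno(void)
{
    uint32_t word = 0;   /* 0 means "unlocked" for a PI futex */

    if (futex_lock_pi(&word) != 0)
        return -1;
    if (futex_lock_pi(&word) == 0)
        return 0;
    return errno;        /* expected to be EDEADLK */
}
```

The futex word lives on the stack here only to keep the sketch short; a real lock would be shared between threads.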
>>> [EOWNERDIED] The owner of the futex died and the kernel made the
>>> caller the new owner. The kernel sets the FUTEX_OWNER_DIED bit in the
>>> futex userspace value. Caller is responsible for cleanup
>>
>> There is no such thing as an EOWNERDIED error. I had a look
>> through the kernel source for the FUTEX_OWNER_DIED cases and didn't
>> see an obvious error associated with them. Can you clarify? (I think
>> the point is that this condition, which is described in
>> Documentation/robust-futexes.txt, is not an error as such. However, I'm
>> not yet sure of how to describe it in the man page.)
>> I will add this point as a FIXME in the new draft man page.
>
> Oops. My bad. That's not what the kernel does. The kernel merely
> marks it in the futex itself with FUTEX_OWNER_DIED. User space needs
> to deal with that and the posix users return EOWNERDEAD (not
> EOWNERDIED), so it's not part of the futex call itself.
>
> We had discussions about returning EOWNERDEAD in that case, but then
> glibc with its sophisticated error handling prevented that ....
Okay. I'll add a FIXME to the draft page, to see if we get some good
text together to describe FUTEX_OWNER_DIED and how it is used.
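As a starting point for that text, here is a rough sketch of the user-space side, assuming the caller has just acquired the lock and read back the futex word. The function names are hypothetical; only the FUTEX_OWNER_DIED, FUTEX_WAITERS and FUTEX_TID_MASK constants come from <linux/futex.h>. It shows that the futex call itself reports nothing special and that the library inspects the word and maps the bit to EOWNERDEAD:

```c
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>

/* Hypothetical helper: given the futex word observed after a
 * successful acquisition, return what a POSIX-style robust mutex
 * lock function would hand back to its caller. */
int robust_lock_result(uint32_t futex_word)
{
    if (futex_word & FUTEX_OWNER_DIED)
        return EOWNERDEAD;   /* previous owner died; caller must clean up */
    return 0;
}

/* Hypothetical helper: extract the owner TID, ignoring the
 * FUTEX_WAITERS and FUTEX_OWNER_DIED flag bits. */
uint32_t futex_owner_tid(uint32_t futex_word)
{
    return futex_word & FUTEX_TID_MASK;
}
```

A real implementation would also have to decide when to clear the FUTEX_OWNER_DIED bit again; that is left out here.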
>>> FUTEX_TRYLOCK_PI
>>>
>>> This operation tries to acquire the futex at uaddr. It deals with the
>>> situation where the TID value at uaddr is 0, but the FUTEX_HAS_WAITER
>>> bit is set. User space cannot handle this race free.
>>
>> Added.
>>
>>> The arguments uaddr2, val, timeout and val3 are ignored.
>>
>> ??? But the code reads:
>>
>> case FUTEX_TRYLOCK_PI:
>> return futex_lock_pi(uaddr, flags, 0, timeout, 1);
>>
>> which momentarily misleads one into thinking that 'timeout' is used.
>> And: it's not quite ignored, since in futex_lock_pi() a non-NULL
>> 'timeout' is unconditionally dereferenced (meaning you could get
>> an EFAULT error for a bad 'timeout' pointer).
>> I'm confused....
>
> Indeed. That's just wrong.
>
>> Maybe the above code should be
>>
>> case FUTEX_TRYLOCK_PI:
>> return futex_lock_pi(uaddr, flags, 0, NULL, 1);
>> ?
>
> Care to send a patch?
Will do.
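In the meantime, a caller can sidestep the question by always passing NULL for 'timeout'. A minimal sketch, assuming a Linux system with PI futex support; the wrapper name futex_trylock_pi() is made up for illustration:

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: only uaddr is meaningful for FUTEX_TRYLOCK_PI.
 * The timeout slot is passed as NULL, so there is no user pointer the
 * kernel could fault on, whatever a given kernel version does with it.
 * Returns 0 if the lock was acquired, -1 with errno set otherwise. */
int futex_trylock_pi(uint32_t *uaddr)
{
    return (int)syscall(SYS_futex, uaddr, FUTEX_TRYLOCK_PI, 0, NULL, NULL, 0);
}
```

On an unlocked futex (word value 0), a successful call should leave the caller's TID in the futex word.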
[...]
>> ??? I don't believe this can happen. 'val3' is internally set to
>> FUTEX_BITSET_MATCH_ANY. Can you confirm?
>
> Right. We don't support that bitset stuff in requeue_pi ATM.
Thanks for the confirmation.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-16 15:20 ` Thomas Gleixner
0 siblings, 0 replies; 145+ messages in thread
From: Thomas Gleixner @ 2015-01-16 15:20 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Carlos O'Donell, Darren Hart, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Davidlohr Bueso, Arnd Bergmann, Steven Rostedt,
Peter Zijlstra, Linux API, Torvald Riegel, Roland McGrath,
Darren Hart, Anton Blanchard, Petr Baudis, Eric Dumazet,
bill o gallmeister, Jan Kiszka, Daniel Wagner, Rich Felker
On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
> Hello Thomas,
>
> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
> > On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
> >>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
> >>
> >> ??? I added this, but does this error not occur only for PI requeues?
> >
> > It's equally wrong for normal futexes. And it's actually the same code
> > checking for this for all variants.
>
> I don't understand "equally wrong" in your reply, I'm sorry. Do you
> mean:
>
> a) This error text should be there for both normal and PI requeues
It is there for both. The requeue code has that check independent of
the requeue type (normal/pi). It never makes sense to requeue
something to itself whether normal or pi futex. We added this for PI,
because there it is harmful, but we did not special case it. So normal
futexes get the same treatment.
Thanks,
tglx
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
2015-01-16 15:20 ` Thomas Gleixner
@ 2015-01-16 20:54 ` Michael Kerrisk (man-pages)
-1 siblings, 0 replies; 145+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-16 20:54 UTC (permalink / raw)
To: Thomas Gleixner
Cc: mtk.manpages, Carlos O'Donell, Darren Hart, Ingo Molnar,
Jakub Jelinek, linux-man, lkml, Davidlohr Bueso, Arnd Bergmann,
Steven Rostedt, Peter Zijlstra, Linux API, Torvald Riegel,
Roland McGrath, Darren Hart, Anton Blanchard, Petr Baudis,
Eric Dumazet, bill o gallmeister, Jan Kiszka, Daniel Wagner,
Rich Felker
On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
> On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
>
>> Hello Thomas,
>>
>> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
>>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>>>
>>>> ??? I added this, but does this error not occur only for PI requeues?
>>>
>>> It's equally wrong for normal futexes. And it's actually the same code
>>> checking for this for all variants.
>>
>> I don't understand "equally wrong" in your reply, I'm sorry. Do you
>> mean:
>>
>> a) This error text should be there for both normal and PI requeues
>
> It is there for both. The requeue code has that check independent of
> the requeue type (normal/pi). It never makes sense to requeue
> something to itself whether normal or pi futex. We added this for PI,
> because there it is harmful, but we did not special case it. So normal
> futexes get the same treatment.
Hello Thomas,
Color me stupid, but I can't see this in futex_requeue(). Where is that
check that is "independent of the requeue type (normal/pi)"?
When I look through futex_requeue(), all the likely looking sources
of EINVAL are governed by a check on the 'requeue_pi' argument.
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-17 0:46 ` Darren Hart
0 siblings, 0 replies; 145+ messages in thread
From: Darren Hart @ 2015-01-17 0:46 UTC (permalink / raw)
To: Michael Kerrisk (man-pages), Thomas Gleixner
Cc: Carlos O'Donell, Ingo Molnar, Jakub Jelinek, linux-man, lkml,
Davidlohr Bueso, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
Linux API, Torvald Riegel, Roland McGrath, Darren Hart,
Anton Blanchard, Petr Baudis, Eric Dumazet, bill o gallmeister,
Jan Kiszka, Daniel Wagner, Rich Felker
On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
<mtk.manpages@gmail.com> wrote:
>On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
>> On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>
>>> Hello Thomas,
>>>
>>> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
>>>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>>>>
>>>>> ??? I added this, but does this error not occur only for PI requeues?
>>>>
>>>> It's equally wrong for normal futexes. And it's actually the same code
>>>> checking for this for all variants.
>>>
>>> I don't understand "equally wrong" in your reply, I'm sorry. Do you
>>> mean:
>>>
>>> a) This error text should be there for both normal and PI requeues
>>
>> It is there for both. The requeue code has that check independent of
>> the requeue type (normal/pi). It never makes sense to requeue
>> something to itself whether normal or pi futex. We added this for PI,
>> because there it is harmful, but we did not special case it. So normal
>> futexes get the same treatment.
>
>Hello Thomas,
>
>Color me stupid, but I can't see this in futex_requeue(). Where is that
>check that is "independent of the requeue type (normal/pi)"?
>
>When I look through futex_requeue(), all the likely looking sources
>of EINVAL are governed by a check on the 'requeue_pi' argument.
Right, in the non-PI case, I believe there are valid use cases: move to
the back of the FIFO, for example (OK, maybe the only example?). Both
tests ensuring uaddr1 != uaddr2 are under the requeue_pi conditional
block. The second compares the keys in case they are not FUTEX_PRIVATE
(uaddrs would be different, but still the same backing store).
Thomas, am I missing a test for this someplace else?
--
Darren Hart
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-19 10:45 ` Thomas Gleixner
0 siblings, 0 replies; 145+ messages in thread
From: Thomas Gleixner @ 2015-01-19 10:45 UTC (permalink / raw)
To: Darren Hart
Cc: Michael Kerrisk (man-pages),
Carlos O'Donell, Ingo Molnar, Jakub Jelinek, linux-man, lkml,
Davidlohr Bueso, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
Linux API, Torvald Riegel, Roland McGrath, Darren Hart,
Anton Blanchard, Petr Baudis, Eric Dumazet, bill o gallmeister,
Jan Kiszka, Daniel Wagner, Rich Felker
On Fri, 16 Jan 2015, Darren Hart wrote:
> On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> <mtk.manpages@gmail.com> wrote:
>
> >On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
> >> On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
> >>
> >>> Hello Thomas,
> >>>
> >>> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
> >>>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
> >>>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
> >>>>>
> >>>>> ??? I added this, but does this error not occur only for PI requeues?
> >>>>
> >>>> It's equally wrong for normal futexes. And it's actually the same code
> >>>> checking for this for all variants.
> >>>
> >>> I don't understand "equally wrong" in your reply, I'm sorry. Do you
> >>> mean:
> >>>
> >>> a) This error text should be there for both normal and PI requeues
> >>
> >> It is there for both. The requeue code has that check independent of
> >> the requeue type (normal/pi). It never makes sense to requeue
> >> something to itself whether normal or pi futex. We added this for PI,
> >> because there it is harmful, but we did not special case it. So normal
> >> futexes get the same treatment.
> >
> >Hello Thomas,
> >
> >Color me stupid, but I can't see this in futex_requeue(). Where is that
> >check that is "independent of the requeue type (normal/pi)"?
> >
> >When I look through futex_requeue(), all the likely looking sources
> >of EINVAL are governed by a check on the 'requeue_pi' argument.
>
>
> Right, in the non-PI case, I believe there are valid use cases: move to
> the back of the FIFO, for example (OK, maybe the only example?). Both
> tests ensuring uaddr1 != uaddr2 are under the requeue_pi conditional
> block. The second compares the keys in case they are not FUTEX_PRIVATE
> (uaddrs would be different, but still the same backing store).
>
> Thomas, am I missing a test for this someplace else?
No, I had a short look at the code and misread it. So, yes, it's a valid
operation for the non PI case. Sorry for the confusion.
Thanks,
tglx
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
2015-01-19 10:45 ` Thomas Gleixner
(?)
@ 2015-01-19 14:07 ` Michael Kerrisk (man-pages)
-1 siblings, 0 replies; 145+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-19 14:07 UTC (permalink / raw)
To: Thomas Gleixner, Darren Hart
Cc: mtk.manpages, Carlos O'Donell, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Davidlohr Bueso, Arnd Bergmann, Steven Rostedt,
Peter Zijlstra, Linux API, Torvald Riegel, Roland McGrath,
Darren Hart, Anton Blanchard, Petr Baudis, Eric Dumazet,
bill o gallmeister, Jan Kiszka, Daniel Wagner, Rich Felker
On 01/19/2015 11:45 AM, Thomas Gleixner wrote:
> On Fri, 16 Jan 2015, Darren Hart wrote:
>> On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
>> <mtk.manpages@gmail.com> wrote:
>>
>>> On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
>>>> On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>>
>>>>> Hello Thomas,
>>>>>
>>>>> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
>>>>>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>>>>>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>>>>>>>
>>>>>>> ??? I added this, but does this error not occur only for PI requeues?
>>>>>>
>>>>>> It's equally wrong for normal futexes. And it's actually the same code
>>>>>> checking for this for all variants.
>>>>>
>>>>> I don't understand "equally wrong" in your reply, I'm sorry. Do you
>>>>> mean:
>>>>>
>>>>> a) This error text should be there for both normal and PI requeues
>>>>
>>>> It is there for both. The requeue code has that check independent of
>>>> the requeue type (normal/pi). It never makes sense to requeue
>>>> something to itself whether normal or pi futex. We added this for PI,
>>>> because there it is harmful, but we did not special case it. So normal
>>>> futexes get the same treatment.
>>>
>>> Hello Thomas,
>>>
>>> Color me stupid, but I can't see this in futex_requeue(). Where is that
>>> check that is "independent of the requeue type (normal/pi)"?
>>>
>>> When I look through futex_requeue(), all the likely looking sources
>>> of EINVAL are governed by a check on the 'requeue_pi' argument.
>>
>>
>> Right, in the non-PI case, I believe there are valid use cases: move to
>> the back of the FIFO, for example (OK, maybe the only example?). Both
>> tests ensuring uaddr1 != uaddr2 are under the requeue_pi conditional
>> block. The second compares the keys in case they are not FUTEX_PRIVATE
>> (uaddrs would be different, but still the same backing store).
>>
>> Thomas, am I missing a test for this someplace else?
>
> No, I had a short look at the code and misread it. So, yes, it's a valid
> operation for the non PI case. Sorry for the confusion.
Thanks for the confirmation, Thomas. Page updated to remove
FUTEX_CMP_REQUEUE from that error case.
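For anyone who wants to check the behaviour from user space, here is a small sketch, assuming a Linux kernel that has the uaddr == uaddr2 check in the requeue_pi path and PI futex support. The helper name requeue_to_self() is made up for illustration. It issues a requeue from a futex word onto itself, with no waiters queued, and reports the resulting error:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: requeue from a futex word onto the same word
 * using 'op' (FUTEX_CMP_REQUEUE or FUTEX_CMP_REQUEUE_PI).
 * val = 1 wakes at most one waiter; the 'timeout' slot carries the
 * requeue count (also 1); val3 is the expected futex value.
 * Returns 0 on success or the errno value on failure. */
int requeue_to_self(int op)
{
    uint32_t word = 0;   /* nobody is waiting on this word */
    long ret = syscall(SYS_futex, &word, op, 1, (void *)1UL, &word, word);

    return ret < 0 ? errno : 0;
}
```

With no waiters queued, FUTEX_CMP_REQUEUE should succeed (waking and requeuing nothing), while FUTEX_CMP_REQUEUE_PI should fail with EINVAL.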
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-23 18:19 ` Torvald Riegel
0 siblings, 0 replies; 145+ messages in thread
From: Torvald Riegel @ 2015-01-23 18:19 UTC (permalink / raw)
To: Darren Hart
Cc: Michael Kerrisk (man-pages),
Thomas Gleixner, Carlos O'Donell, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Davidlohr Bueso, Arnd Bergmann, Steven Rostedt,
Peter Zijlstra, Linux API, Darren Hart, Anton Blanchard,
Petr Baudis, Eric Dumazet, bill o gallmeister, Jan Kiszka,
Daniel Wagner, Rich Felker
On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> <mtk.manpages@gmail.com> wrote:
>
> >Color me stupid, but I can't see this in futex_requeue(). Where is that
> >check that is "independent of the requeue type (normal/pi)"?
> >
> >When I look through futex_requeue(), all the likely looking sources
> >of EINVAL are governed by a check on the 'requeue_pi' argument.
>
>
> Right, in the non-PI case, I believe there are valid use cases: move to
> the back of the FIFO, for example (OK, maybe the only example?).
But we never guarantee a futex is a FIFO, or do we? If we don't, then
such a requeue could be implemented as a no-op by the kernel, which
would sort of invalidate the use case.
(And I guess we don't want to guarantee FIFO behavior for futexes.)
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
2015-01-23 18:19 ` Torvald Riegel
(?)
@ 2015-01-24 10:05 ` Thomas Gleixner
2015-01-24 12:58 ` Torvald Riegel
-1 siblings, 1 reply; 145+ messages in thread
From: Thomas Gleixner @ 2015-01-24 10:05 UTC (permalink / raw)
To: Torvald Riegel
Cc: Darren Hart, Michael Kerrisk (man-pages),
Carlos O'Donell, Ingo Molnar, Jakub Jelinek, linux-man, lkml,
Davidlohr Bueso, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
Linux API, Darren Hart, Anton Blanchard, Petr Baudis,
Eric Dumazet, bill o gallmeister, Jan Kiszka, Daniel Wagner,
Rich Felker
On Fri, 23 Jan 2015, Torvald Riegel wrote:
> On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> > On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> > <mtk.manpages@gmail.com> wrote:
> >
> > >Color me stupid, but I can't see this in futex_requeue(). Where is that
> > >check that is "independent of the requeue type (normal/pi)"?
> > >
> > >When I look through futex_requeue(), all the likely looking sources
> > >of EINVAL are governed by a check on the 'requeue_pi' argument.
> >
> >
> > Right, in the non-PI case, I believe there are valid use cases: move to
> > the back of the FIFO, for example (OK, maybe the only example?).
>
> But we never guarantee a futex is a FIFO, or do we? If we don't, then
> such a requeue could be implemented as a no-op by the kernel, which
> would sort of invalidate the use case.
>
> (And I guess we don't want to guarantee FIFO behavior for futexes.)
The (current) behaviour is:
real-time threads: FIFO per priority level
sched-other threads: FIFO independent of nice level
The wakeup is priority ordered. Highest priority level first.
Thanks,
tglx
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
2015-01-24 10:05 ` Thomas Gleixner
@ 2015-01-24 12:58 ` Torvald Riegel
2015-01-24 16:25 ` Thomas Gleixner
0 siblings, 1 reply; 145+ messages in thread
From: Torvald Riegel @ 2015-01-24 12:58 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Darren Hart, Michael Kerrisk (man-pages),
Carlos O'Donell, Ingo Molnar, Jakub Jelinek, linux-man, lkml,
Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API,
Darren Hart, Anton Blanchard, Eric Dumazet, bill o gallmeister,
Jan Kiszka, Daniel Wagner, Rich Felker
On Sat, 2015-01-24 at 11:05 +0100, Thomas Gleixner wrote:
> On Fri, 23 Jan 2015, Torvald Riegel wrote:
>
> > On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> > > On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> > > <mtk.manpages@gmail.com> wrote:
> > >
> > > >Color me stupid, but I can't see this in futex_requeue(). Where is that
> > > >check that is "independent of the requeue type (normal/pi)"?
> > > >
> > > >When I look through futex_requeue(), all the likely looking sources
> > > >of EINVAL are governed by a check on the 'requeue_pi' argument.
> > >
> > >
> > > Right, in the non-PI case, I believe there are valid use cases: move to
> > > the back of the FIFO, for example (OK, maybe the only example?).
> >
> > But we never guarantee a futex is a FIFO, or do we? If we don't, then
> > such a requeue could be implemented as a no-op by the kernel, which
> > would sort of invalidate the use case.
> >
> > (And I guess we don't want to guarantee FIFO behavior for futexes.)
>
> The (current) behaviour is:
>
> real-time threads: FIFO per priority level
> sched-other threads: FIFO independent of nice level
>
> The wakeup is priority ordered. Highest priority level first.
OK.
But, just to be clear, do I correctly understand that you do not want to
guarantee FIFO behavior in the specified futex semantics? I think there
are cases where being able to *rely* on FIFO (now and on all future
kernels) would be helpful for users (e.g., on POSIX/C++11 condvars and I
assume in other ordered-wakeup cases too) -- but at the same time, this
would constrain future futex implementations.
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-24 16:25 ` Thomas Gleixner
0 siblings, 0 replies; 145+ messages in thread
From: Thomas Gleixner @ 2015-01-24 16:25 UTC (permalink / raw)
To: Torvald Riegel
Cc: Darren Hart, Michael Kerrisk (man-pages),
Carlos O'Donell, Ingo Molnar, Jakub Jelinek, linux-man, lkml,
Arnd Bergmann, Steven Rostedt, Peter Zijlstra, Linux API,
Darren Hart, Anton Blanchard, Eric Dumazet, bill o gallmeister,
Jan Kiszka, Daniel Wagner, Rich Felker
On Sat, 24 Jan 2015, Torvald Riegel wrote:
> On Sat, 2015-01-24 at 11:05 +0100, Thomas Gleixner wrote:
> > On Fri, 23 Jan 2015, Torvald Riegel wrote:
> >
> > > On Fri, 2015-01-16 at 16:46 -0800, Darren Hart wrote:
> > > > On 1/16/15, 12:54 PM, "Michael Kerrisk (man-pages)"
> > > > <mtk.manpages@gmail.com> wrote:
> > > >
> > > > >Color me stupid, but I can't see this in futex_requeue(). Where is that
> > > > >check that is "independent of the requeue type (normal/pi)"?
> > > > >
> > > > >When I look through futex_requeue(), all the likely looking sources
> > > > >of EINVAL are governed by a check on the 'requeue_pi' argument.
> > > >
> > > >
> > > > Right, in the non-PI case, I believe there are valid use cases: move to
> > > > the back of the FIFO, for example (OK, maybe the only example?).
> > >
> > > But we never guarantee a futex is a FIFO, or do we? If we don't, then
> > > such a requeue could be implemented as a no-op by the kernel, which
> > > would sort of invalidate the use case.
> > >
> > > (And I guess we don't want to guarantee FIFO behavior for futexes.)
> >
> > The (current) behaviour is:
> >
> > real-time threads: FIFO per priority level
> > sched-other threads: FIFO independent of nice level
> >
> > The wakeup is priority ordered. Highest priority level first.
>
> OK.
>
> But, just to be clear, do I correctly understand that you do not want to
> guarantee FIFO behavior in the specified futex semantics? I think there
> are cases where being able to *rely* on FIFO (now and on all future
> kernels) would be helpful for users (e.g., on POSIX/C++11 condvars and I
> assume in other ordered-wakeup cases too) -- but at the same time, this
> would constrain future futex implementations.
It would be a constraint, but I don't think it would be a horrible
one. Though I have my doubts that we can actually guarantee it under
all circumstances.
One thing comes to my mind right away: spurious wakeups. There is no
way that we can guarantee FIFO ordering in the context of those. And
we cannot prevent them either.
Thanks,
tglx
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
2015-01-16 20:54 ` Michael Kerrisk (man-pages)
(?)
(?)
@ 2015-01-17 0:56 ` Davidlohr Bueso
2015-01-17 1:11 ` Darren Hart
-1 siblings, 1 reply; 145+ messages in thread
From: Davidlohr Bueso @ 2015-01-17 0:56 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Thomas Gleixner, Carlos O'Donell, Darren Hart, Ingo Molnar,
Jakub Jelinek, linux-man, lkml, Arnd Bergmann, Steven Rostedt,
Peter Zijlstra, Linux API, Torvald Riegel, Roland McGrath,
Darren Hart, Anton Blanchard, Petr Baudis, Eric Dumazet,
bill o gallmeister, Jan Kiszka, Daniel Wagner, Rich Felker
On Fri, 2015-01-16 at 21:54 +0100, Michael Kerrisk (man-pages) wrote:
> On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
> > On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
> >
> >> Hello Thomas,
> >>
> >> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
> >>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
> >>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
> >>>>
> >>>> ??? I added this, but does this error not occur only for PI requeues?
> >>>
> >>> It's equally wrong for normal futexes. And its actually the same code
> >>> checking for this for all variants.
> >>
> >> I don't understand "equally wrong" in your reply, I'm sorry. Do you
> >> mean:
> >>
> >> a) This error text should be there for both normal and PI requeues
> >
> > It is there for both. The requeue code has that check independent of
> > the requeue type (normal/pi). It never makes sense to requeue
> > something to itself whether normal or pi futex. We added this for PI,
> > because there it is harmful, but we did not special case it. So normal
> > futexes get the same treatment.
>
> Hello Thomas,
>
> Color me stupid, but I can't see this in futex_requeue(). Where is that
> check that is "independent of the requeue type (normal/pi)"?
>
> When I look through futex_requeue(), all the likely looking sources
> of EINVAL are governed by a check on the 'requeue_pi' argument.
Yeah, it's not very straightforward, I was also scratching my head. First
we do:
	if (requeue_pi) {
		/*
		 * Requeue PI only works on two distinct uaddrs. This
		 * check is only valid for private futexes. See below.
		 */
		if (uaddr1 == uaddr2)
			return -EINVAL;
Then:
	/*
	 * The check above which compares uaddrs is not sufficient for
	 * shared futexes. We need to compare the keys:
	 */
	if (requeue_pi && match_futex(&key1, &key2)) {
		ret = -EINVAL;
		goto out_put_keys;
	}
I wonder why we're checking for requeue_pi again, when, at least
according to the comments, it should be for shared. I guess it would
make sense depending on the mappings as the keys are the only true way
of determining if both futexes are the same, so perhaps:
if ((requeue_pi || (flags & FLAGS_SHARED)) && match_futex())
That would also align with the retry labels.
Thanks,
Davidlohr
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-17 1:11 ` Darren Hart
0 siblings, 0 replies; 145+ messages in thread
From: Darren Hart @ 2015-01-17 1:11 UTC (permalink / raw)
To: Davidlohr Bueso, Michael Kerrisk (man-pages)
Cc: Thomas Gleixner, Carlos O'Donell, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
Linux API, Torvald Riegel, Roland McGrath, Darren Hart,
Anton Blanchard, Petr Baudis, Eric Dumazet, bill o gallmeister,
Jan Kiszka, Daniel Wagner, Rich Felker
On 1/16/15, 4:56 PM, "Davidlohr Bueso" <dave@stgolabs.net> wrote:
>On Fri, 2015-01-16 at 21:54 +0100, Michael Kerrisk (man-pages) wrote:
>> On 01/16/2015 04:20 PM, Thomas Gleixner wrote:
>> > On Fri, 16 Jan 2015, Michael Kerrisk (man-pages) wrote:
>> >
>> >> Hello Thomas,
>> >>
>> >> On 01/15/2015 11:23 PM, Thomas Gleixner wrote:
>> >>> On Thu, 15 Jan 2015, Michael Kerrisk (man-pages) wrote:
>> >>>>> [EINVAL] uaddr equal uaddr2. Requeue to same futex.
>> >>>>
>> >>>> ??? I added this, but does this error not occur only for PI
>>requeues?
>> >>>
>> >>> It's equally wrong for normal futexes. And its actually the same
>>code
>> >>> checking for this for all variants.
>> >>
>> >> I don't understand "equally wrong" in your reply, I'm sorry. Do you
>> >> mean:
>> >>
>> >> a) This error text should be there for both normal and PI requeues
>> >
>> > It is there for both. The requeue code has that check independent of
>> > the requeue type (normal/pi). It never makes sense to requeue
>> > something to itself whether normal or pi futex. We added this for PI,
>> > because there it is harmful, but we did not special case it. So normal
>> > futexes get the same treatment.
>>
>> Hello Thomas,
>>
>> Color me stupid, but I can't see this in futex_requeue(). Where is that
>> check that is "independent of the requeue type (normal/pi)"?
>>
>> When I look through futex_requeue(), all the likely looking sources
>> of EINVAL are governed by a check on the 'requeue_pi' argument.
>
>Yeah, its not very straightforward, I was also scratching my head. First
>we do:
>
> if (requeue_pi) {
> /*
> * Requeue PI only works on two distinct uaddrs. This
> * check is only valid for private futexes. See below.
> */
> if (uaddr1 == uaddr2)
> return -EINVAL;
We check here to abort as early as possible for the usual security reasons.
>
>Then:
>
> /*
> * The check above which compares uaddrs is not sufficient for
> * shared futexes. We need to compare the keys:
> */
> if (requeue_pi && match_futex(&key1, &key2)) {
> ret = -EINVAL;
> goto out_put_keys;
> }
>
>I wonder why we're checking for requeue_pi again, when, at least
>according to the comments, it should be for shared. I guess it would
>make sense depending on the mappings as the keys are the only true way
>of determining if both futexes are the same, so perhaps:
>
> if ((requeue_pi || (flags & FLAGS_SHARED)) && match_futex())
No, the rule only applies to requeue_pi. This check is the for-sure
version of the uaddr comparison above. We could add if flags &
FLAGS_SHARED, but I'm not sure it's worth it.
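Why the uaddr comparison alone cannot be the for-sure check for shared futexes can be seen from user space. The following is a minimal sketch, assuming a Linux system with a writable /tmp; the helper name map_same_word_twice and the scratch-file template are hypothetical and exist only for illustration. It maps the same file page twice, which yields two different virtual addresses for one and the same futex word:

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one page of the same file twice with MAP_SHARED. On success *a and *b
 * are distinct virtual addresses backed by the same memory, so a pointer
 * comparison (uaddr1 == uaddr2) cannot tell that both name one futex word.
 * Returns 0 on success, -1 on failure.
 */
static int map_same_word_twice(uint32_t **a, uint32_t **b)
{
	char path[] = "/tmp/futex-alias-XXXXXX";	/* hypothetical scratch file */
	long page = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);			/* the mappings keep the file alive */
	if (ftruncate(fd, page) < 0) {
		close(fd);
		return -1;
	}

	void *p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	void *q = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);

	if (p == MAP_FAILED || q == MAP_FAILED) {
		if (p != MAP_FAILED)
			munmap(p, page);
		if (q != MAP_FAILED)
			munmap(q, page);
		return -1;
	}
	*a = p;
	*b = q;
	return 0;
}
```

A store through one pointer is visible through the other even though the pointers differ, which is the situation the key comparison is there to catch.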
--
Darren Hart
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-23 18:29 ` Torvald Riegel
0 siblings, 0 replies; 145+ messages in thread
From: Torvald Riegel @ 2015-01-23 18:29 UTC (permalink / raw)
To: Michael Kerrisk (man-pages)
Cc: Thomas Gleixner, Carlos O'Donell, Darren Hart, Ingo Molnar,
Jakub Jelinek, linux-man, lkml, Davidlohr Bueso, Arnd Bergmann,
Steven Rostedt, Peter Zijlstra, Linux API, Darren Hart,
Anton Blanchard, Petr Baudis, Eric Dumazet, bill o gallmeister,
Jan Kiszka, Daniel Wagner, Rich Felker
On Thu, 2015-01-15 at 16:10 +0100, Michael Kerrisk (man-pages) wrote:
> [Adding a few people to CC that have expressed interest in the
> progress of the updates of this page, or who may be able to
> provide review feedback. Eventually, you'll all get CCed on
> the new draft of the page.]
>
> Hello Thomas,
>
> On 05/15/2014 04:14 PM, Thomas Gleixner wrote:
> > On Thu, 15 May 2014, Michael Kerrisk (man-pages) wrote:
> >> And that universe would love to have your documentation of
> >> FUTEX_WAKE_BITSET and FUTEX_WAIT_BITSET ;-),
> >
> > I give you almost the full treatment, but I leave REQUEUE_PI to
> > Darren and FUTEX_WAKE_OP to Jakub. :)
>
> Thank you for the great effort you put into compiling the
> text below, and apologies for my long delay in following up.
>
> I've integrated almost all of your suggestions into the
> manual page. I will shortly send out a new draft of the
> page that contains various FIXMEs for points that remain
> unclear.
Michael, thanks for working on the draft! I'll review the draft closely
once you've sent it (or have I missed it?).
There are a few things that I'd like to see covered.
First, we should discuss that users, unless they control all code in the
respective process, need to expect futexes to be affected by spurious
futex_wake calls; see https://lkml.org/lkml/2014/11/27/472 for
background and Linus' choice (AFAIU) to just document this.
So, for example, a return code of 0 for FUTEX_WAIT can mean either being
woken up by a FUTEX_WAKE intended for this futex, or a stale one
intended for another futex used by, for example, glibc internally.
(Note that as explained in this thread, this isn't just a glibc
artifact, but a result of the general futex design mixed with
destruction requirements for mutexes and other constructs in C++11 and
POSIX.)
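A minimal sketch of what this means for callers, assuming a Linux system with <linux/futex.h>; the helper names futex_op and wait_while_equal are hypothetical and not part of any library. The waiter treats a 0 return from FUTEX_WAIT as a hint only and re-reads the futex word before it proceeds:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc exports no futex() wrapper, so the call goes through syscall(2). */
static long futex_op(_Atomic uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/*
 * Block while *word == blocked and return the first differing value seen.
 * The word is re-read after every return from FUTEX_WAIT, so a stale
 * FUTEX_WAKE aimed at an earlier user of the same address only costs one
 * extra loop iteration. On an unexpected error the loop ends, 'blocked' is
 * returned and errno describes the failure.
 */
static uint32_t wait_while_equal(_Atomic uint32_t *word, uint32_t blocked)
{
	uint32_t cur;

	while ((cur = atomic_load(word)) == blocked) {
		long r = futex_op(word, FUTEX_WAIT_PRIVATE, blocked);

		/* EAGAIN: the word already changed; EINTR: a signal arrived. */
		if (r == -1 && errno != EAGAIN && errno != EINTR)
			break;
	}
	return cur;
}
```

The sketch uses the default sequentially consistent load so that it does not depend on any particular ordering guarantee from the kernel side.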
It might also be necessary to further consider this when documenting the
errors, because it does affect how to handle them. See this for a glibc
perspective:
https://sourceware.org/ml/libc-alpha/2014-09/msg00381.html
Second, the current documentation for EINTR is that it can happen due to
receiving a signal *or* due to a spurious wake-up. This is difficult to
handle when implementing POSIX semaphores, because they require that
EINTR is returned from sem_wait if and only if the interruption was due
to a signal. Thus, if FUTEX_WAIT returns EINTR, the semaphore
implementation can't return EINTR from sem_wait; see this for more
comments, including some discussion why use cases relying on the POSIX
requirement around EINTR are borderline timing-dependent:
https://sourceware.org/git/?p=glibc.git;a=blob;f=nptl/sem_waitcommon.c;h=96848d7ac5b6f8f1f3099b422deacc09323c796a;hb=HEAD#l282
Others have commented that aio_suspend has a similar issue; if EINTR
were in fact never returned spuriously, the POSIX implementation side
would get easier.
Third, I think it would be useful to -- somewhere -- explain which
behavior the futex operations would have conceptually when expressed by
C11 code. We currently say that they wake up, sleep, etc, and which
values they return. But we never say how to properly synchronize with
them on the userspace side. The C11 memory model is probably the best
model to use on the userspace side, so that's why I'm arguing for this.
Basically, I think we need to (1) tell people that they should use
memory_order_relaxed accesses to the futex variable (ie, the memory
location associated with the whole futex construct on the kernel side --
or do we have another name for this?), and (2) give some conceptual
guarantees for the kernel-side synchronization so that one can use this to
derive how to use them correctly in userspace.
The man pages might not be the right place for this, and maybe we just
need a revision of "Futexes are tricky". If you have other suggestions
for where to document this, or on the content, let me know. (I'm also
willing to spend time on this :) ).
Torvald
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-24 11:35 ` Thomas Gleixner
0 siblings, 0 replies; 145+ messages in thread
From: Thomas Gleixner @ 2015-01-24 11:35 UTC (permalink / raw)
To: Torvald Riegel
Cc: Michael Kerrisk (man-pages),
Carlos O'Donell, Darren Hart, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Davidlohr Bueso, Arnd Bergmann, Steven Rostedt,
Peter Zijlstra, Linux API, Darren Hart, Anton Blanchard,
Petr Baudis, Eric Dumazet, bill o gallmeister, Jan Kiszka,
Daniel Wagner, Rich Felker
On Fri, 23 Jan 2015, Torvald Riegel wrote:
> Second, the current documentation for EINTR is that it can happen due to
> receiving a signal *or* due to a spurious wake-up. This is difficult to
I don't think so. I went through all callchains again with a fine comb.
futex_wait()
retry:
	ret = futex_wait_setup();
	if (ret) {
		/*
		 * Possible return codes related to uaddr:
		 *	-EINVAL:	Not u32 aligned uaddr
		 *	-EFAULT:	No mapping, no RW
		 *	-ENOMEM:	Paging ran out of memory
		 *	-EHWPOISON:	Memory hardware error
		 *
		 * Others:
		 *	-EWOULDBLOCK:	value at uaddr has changed
		 */
		return ret;
	}
	futex_wait_queue_me();
	if (woken by futex_wake/requeue)
		return 0;
	if (timeout)
		return -ETIMEDOUT;
	/*
	 * Spurious wakeup, i.e. no signal pending
	 */
	if (!signal_pending())
		goto retry;
	/* Handled in the low level syscall exit code */
	if (!timed_wait)
		return -ERESTARTSYS;
	else
		return -ERESTART_RESTARTBLOCK;
Now in the low level syscall exit we try to deliver the signal
    if (!signal_delivered())
        restart_syscall();
    if (sigaction->flags & SA_RESTART)
        restart_syscall();
    ret_to_userspace -EINTR;
So we should never see -EINTR in the case of a spurious wakeup here.
But, here is the not so good news:
I did some archaeology. The restart handling of futex_wait() got
introduced in kernel 2.6.22, so anything older than that will have
the spurious -EINTR issues.
futex_wait_pi() always had the restart handling and glibc folks back
then (2006) requested that it should never return -EINTR, so it
unconditionally restarts the syscall whether a signal had been
delivered or not.
So kernels >= 2.6.22 should never return -EINTR spuriously. If that
happens it's a bug and needs to be fixed.
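For illustration only (this sketch is not code from this thread or from
glibc): a userspace caller that must also run on kernels older than
2.6.22 can stay correct by treating every return from FUTEX_WAIT as a
hint and re-checking the futex word. The helper names futex_call() and
wait_while_equal() are hypothetical, and casting an _Atomic uint32_t
pointer to a plain uint32_t pointer assumes the usual Linux/GCC layout.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Raw futex(2) call through syscall(2); hypothetical helper name. */
static long futex_call(uint32_t *uaddr, int op, uint32_t val,
                       const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, NULL, 0);
}

/*
 * Hypothetical helper: block until *uaddr no longer holds `expected`.
 * Returns 0 once the value differs, or -1 with errno set on an
 * unexpected error.
 */
static int wait_while_equal(_Atomic uint32_t *uaddr, uint32_t expected)
{
    assert(uaddr != NULL);

    while (atomic_load_explicit(uaddr, memory_order_acquire) == expected) {
        long rc = futex_call((uint32_t *)uaddr, FUTEX_WAIT_PRIVATE,
                             expected, NULL);
        if (rc == 0)
            continue;   /* woken by FUTEX_WAKE/requeue: re-check value */
        if (errno == EAGAIN)
            continue;   /* EWOULDBLOCK: value changed before we slept */
        if (errno == EINTR)
            continue;   /* signal, or spurious on pre-2.6.22 kernels */
        return -1;      /* EFAULT, EINVAL, ...: report to the caller */
    }
    return 0;
}
```

Because the loop condition re-reads the word after every return, the
caller does not need to know whether an EINTR was caused by a signal or
by a spurious wakeup.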
> Third, I think it would be useful to -- somewhere -- explain which
> behavior the futex operations would have conceptually when expressed by
> C11 code. We currently say that they wake up, sleep, etc, and which
> values they return. But we never say how to properly synchronize with
> them on the userspace side. The C11 memory model is probably the best
> model to use on the userspace side, so that's why I'm arguing for this.
> Basically, I think we need to (1) tell people that they should use
> memory_order_relaxed accesses to the futex variable (ie, the memory
> location associated with the whole futex construct on the kernel side --
> or do we have another name for this?), and (2) give some conceptual
> guarantees for the kernel-side synchronization so that one can use this to
> derive how to use them correctly in userspace.
>
> The man pages might not be the right place for this, and maybe we just
> need a revision of "Futexes are tricky". If you have other suggestions
> for where to document this, or on the content, let me know. (I'm also
> willing to spend time on this :) ).
The current futex code in the kernel has gained documentation about
the required memory ordering recently. That should be a good starting
point.
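To make the userspace side of the question concrete, here is a minimal
sketch (an illustration, not something this thread settles). It takes
the conservative route of release/acquire accesses to the futex word, so
that publishing the payload does not depend on any ordering guarantee
from the kernel. Whether relaxed accesses would suffice is exactly the
open question above. The struct mailbox type and both function names are
hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

struct mailbox {
    int payload;            /* plain data, published through `ready` */
    _Atomic uint32_t ready; /* futex word: 0 = empty, 1 = published */
};

/* Writer: store the payload, then publish it and wake all waiters. */
static void mailbox_publish(struct mailbox *m, int value)
{
    assert(m != NULL);
    m->payload = value;
    /* Release store: the payload write is ordered before the flag. */
    atomic_store_explicit(&m->ready, 1, memory_order_release);
    syscall(SYS_futex, (uint32_t *)&m->ready, FUTEX_WAKE_PRIVATE,
            INT_MAX, NULL, NULL, 0);
}

/* Reader: sleep while the flag is 0, then read the payload. */
static int mailbox_take(struct mailbox *m)
{
    assert(m != NULL);
    /* Acquire load pairs with the release store in mailbox_publish(). */
    while (atomic_load_explicit(&m->ready, memory_order_acquire) == 0) {
        /* Return value ignored on purpose: the loop re-checks the flag. */
        syscall(SYS_futex, (uint32_t *)&m->ready, FUTEX_WAIT_PRIVATE,
                0, NULL, NULL, 0);
    }
    return m->payload;
}
```

In this sketch the futex call is used only to sleep and wake; the
happens-before edge between writer and reader comes from the C11 atomics
alone.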
Thanks,
tglx
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
2015-01-24 11:35 ` Thomas Gleixner
@ 2015-01-24 13:12 ` Torvald Riegel
-1 siblings, 0 replies; 145+ messages in thread
From: Torvald Riegel @ 2015-01-24 13:12 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Michael Kerrisk (man-pages),
Carlos O'Donell, Darren Hart, Ingo Molnar, Jakub Jelinek,
linux-man, lkml, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
Linux API, Darren Hart, Anton Blanchard, Eric Dumazet,
bill o gallmeister, Jan Kiszka, Daniel Wagner, Rich Felker
On Sat, 2015-01-24 at 12:35 +0100, Thomas Gleixner wrote:
> So we should never see -EINTR in the case of a spurious wakeup here.
>
> But, here is the not so good news:
>
> I did some archaeology. The restart handling of futex_wait() got
> introduced in kernel 2.6.22, so anything older than that will have
> the spurious -EINTR issues.
>
> futex_wait_pi() always had the restart handling and glibc folks back
> then (2006) requested that it should never return -EINTR, so it
> unconditionally restarts the syscall whether a signal had been
> delivered or not.
>
> So kernels >= 2.6.22 should never return -EINTR spuriously. If that
> happens it's a bug and needs to be fixed.
Thanks for looking into this.
Michael, can you include the above in the documentation please? This is
useful for userspace code like glibc that expects a minimum kernel
version. Thanks!
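As a rough illustration of a minimum-kernel-version check (a sketch
only; glibc fixes its minimum kernel at build time and does not use this
code), a program could compare the running release string against
2.6.22. Both function names below are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Returns 1 if a release string such as "2.6.22-rc1" is at least
 * major.minor.patch, 0 if it is older, -1 if it cannot be parsed.
 */
static int release_at_least(const char *release,
                            int major, int minor, int patch)
{
    int a = 0, b = 0, c = 0;

    assert(release != NULL);
    /* Accept "a.b" as well as "a.b.c"; a missing patch counts as 0. */
    if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
        return -1;
    if (a != major)
        return a > major;
    if (b != minor)
        return b > minor;
    return c >= patch;
}

/* 1 if the running kernel is >= 2.6.22, 0 if older, -1 on error. */
static int running_kernel_has_futex_wait_restart(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return release_at_least(u.release, 2, 6, 22);
}
```

A result of 0 would mean the caller has to expect spurious EINTR from
FUTEX_WAIT, as described in the quoted text.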
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
@ 2015-01-27 7:48 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 145+ messages in thread
From: Michael Kerrisk (man-pages) @ 2015-01-27 7:48 UTC (permalink / raw)
To: Torvald Riegel, Thomas Gleixner
Cc: mtk.manpages, Carlos O'Donell, Darren Hart, Ingo Molnar,
Jakub Jelinek, linux-man, lkml, Arnd Bergmann, Steven Rostedt,
Peter Zijlstra, Linux API, Darren Hart, Anton Blanchard,
Eric Dumazet, bill o gallmeister, Jan Kiszka, Daniel Wagner,
Rich Felker
Hello Torvald,
On 01/24/2015 02:12 PM, Torvald Riegel wrote:
> On Sat, 2015-01-24 at 12:35 +0100, Thomas Gleixner wrote:
>> So we should never see -EINTR in the case of a spurious wakeup here.
>>
>> But, here is the not so good news:
>>
>> I did some archaeology. The restart handling of futex_wait() got
>> introduced in kernel 2.6.22, so anything older than that will have
>> the spurious -EINTR issues.
>>
>> futex_wait_pi() always had the restart handling and glibc folks back
>> then (2006) requested that it should never return -EINTR, so it
>> unconditionally restarts the syscall whether a signal had been
>> delivered or not.
>>
>> So kernels >= 2.6.22 should never return -EINTR spuriously. If that
>> happens it's a bug and needs to be fixed.
>
> Thanks for looking into this.
>
> Michael, can you include the above in the documentation please? This is
> useful for userspace code like glibc that expects a minimum kernel
> version. Thanks!
I've added some text to my draft to cover this point.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
^ permalink raw reply [flat|nested] 145+ messages in thread
* Re: futex(2) man page update help request
2015-01-24 11:35 ` Thomas Gleixner
(?)
(?)
@ 2015-02-05 19:57 ` Darren Hart
-1 siblings, 0 replies; 145+ messages in thread
From: Darren Hart @ 2015-02-05 19:57 UTC (permalink / raw)
To: Thomas Gleixner, Torvald Riegel
Cc: Michael Kerrisk (man-pages),
Carlos O'Donell, Ingo Molnar, Jakub Jelinek, linux-man, lkml,
Davidlohr Bueso, Arnd Bergmann, Steven Rostedt, Peter Zijlstra,
Linux API, Darren Hart, Anton Blanchard, Petr Baudis,
Eric Dumazet, bill o gallmeister, Jan Kiszka, Daniel Wagner,
Rich Felker
On 1/24/15, 3:35 AM, "Thomas Gleixner" <tglx@linutronix.de> wrote:
>On Fri, 23 Jan 2015, Torvald Riegel wrote:
>> Second, the current documentation for EINTR is that it can happen due to
>> receiving a signal *or* due to a spurious wake-up. This is difficult to
>
>I don't think so. I went through all callchains again with a fine comb.
>
>futex_wait()
>retry:
> ret = futex_wait_setup();
> if (ret) {
> /*
> * Possible return codes related to uaddr:
> * -EINVAL: Not u32 aligned uaddr
> * -EFAULT: No mapping, no RW
> * -ENOMEM: Paging ran out of memory
> * -EHWPOISON: Memory hardware error
> *
> * Others:
> * -EWOULDBLOCK: value at uaddr has changed
> */
> return ret;
> }
>
> futex_wait_queue_me();
>
> if (woken by futex_wake/requeue)
> return 0;
>
> if (timeout)
> return -ETIMEDOUT;
>
> /*
> * Spurious wakeup, i.e. no signal pending
> */
> if (!signal_pending())
> goto retry;
>
> /* Handled in the low level syscall exit code */
> if (!timed_wait)
> return -ERESTARTSYS;
> else
> return -ERESTARTBLOCK;
>
>Now in the low level syscall exit we try to deliver the signal
>
> if (!signal_delivered())
> restart_syscall();
>
> if (sigaction->flags & SA_RESTART)
> restart_syscall();
>
> ret_to_userspace -EINTR;
>
>So we should never see -EINTR in the case of a spurious wakeup here.
>
>But, here is the not so good news:
>
> I did some archaeology. The restart handling of futex_wait() got
> introduced in kernel 2.6.22, so anything older than that will have
> the spurious -EINTR issues.
>
>futex_wait_pi() always had the restart handling and glibc folks back
>then (2006) requested that it should never return -EINTR, so it
>unconditionally restarts the syscall whether a signal had been
>delivered or not.
>
>So kernels >= 2.6.22 should never return -EINTR spuriously. If that
>happens it's a bug and needs to be fixed.
>
>> Third, I think it would be useful to -- somewhere -- explain which
>> behavior the futex operations would have conceptually when expressed by
>> C11 code. We currently say that they wake up, sleep, etc, and which
>> values they return. But we never say how to properly synchronize with
>> them on the userspace side. The C11 memory model is probably the best
>> model to use on the userspace side, so that's why I'm arguing for this.
>> Basically, I think we need to (1) tell people that they should use
>> memory_order_relaxed accesses to the futex variable (ie, the memory
>> location associated with the whole futex construct on the kernel side --
>> or do we have another name for this?), and (2) give some conceptual
>> guarantees for the kernel-side synchronization so that one can use this to
>> derive how to use them correctly in userspace.
>>
>> The man pages might not be the right place for this, and maybe we just
>> need a revision of "Futexes are tricky". If you have other suggestions
>> for where to document this, or on the content, let me know. (I'm also
>> willing to spend time on this :) ).
>
>The current futex code in the kernel has gained documentation about
>the required memory ordering recently. That should be a good starting
>point.
Lots of paging in here... If I recall correctly there was something about
not being able to return to userspace in these events without owning the
lock (waiters but no owner, breaking pi chains and promotion, etc.), so
restarting was the preferable path.
--
Darren Hart
Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 145+ messages in thread