From: Andrey Semashev <andrey.semashev@gmail.com>
To: LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 00/13] Add futex2 syscalls
Date: Tue, 16 Feb 2021 15:13:09 +0300
Message-ID: <9557a62c-ab64-495b-36bd-6d8db426ddce@gmail.com>

Sorry for replying out of thread; I just subscribed to the list to 
reply to a message that was posted before I subscribed.

André Almeida wrote:

> ** "And what's about FUTEX_64?"
> 
>  By supporting 64 bit futexes, the kernel structure for futex would
>  need to have a 64 bit field for the value, and that could defeat one of
>  the purposes of having different sized futexes in the first place:
>  supporting smaller ones to decrease memory usage. This might be
>  something that could be disabled for 32bit archs (and even for
>  CONFIG_BASE_SMALL).
> 
>  Which use case would benefit from FUTEX_64? Is it worth the trade-offs?

I strongly believe that 64-bit futexes must be supported. I have a few 
use cases in mind:

1. Cooperative robust futexes.

I have a real-world case where multiple processes need to communicate 
via shared memory and synchronize via a futex. The processes run under a 
supervisor parent process, which can detect termination of its children 
and also has access to the shared memory. To make the communication more 
or less safe in the face of one of the child processes crashing, the 
futex currently contains a portion of the PID of the process that locked 
it. The parent supervisor is then able to tell that the crashed child 
was holding the futex locked, mark the futex as "broken", and notify any 
other threads blocked on it.

Given that a PID can be up to 32 bits in size, and we also need some 
bits in the futex to implement its logic (i.e. at least "locked" and 
"broken" bits, some bits for the ABA counter, etc.), the PID has to be 
truncated, which breaks the above logic. In the real application, only 
15 bits are left for the PID, which is already less than the actual PID 
range on the system.
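
To make this concrete, here is a minimal sketch of the lock word layout 
I have in mind with a 64-bit futex. The exact field split is my own 
illustration, not something from the futex2 proposal; waiting and waking 
on this word is precisely what needs the 64-bit futex operation:

  #include <cstdint>

  // Illustrative layout of a 64-bit lock word:
  //   bits  0..31 - TID of the current owner, untruncated
  //   bit  32     - LOCKED flag
  //   bit  33     - BROKEN flag, set by the supervisor on owner crash
  //   bits 34..63 - generation counter to avoid ABA problems
  constexpr std::uint64_t tid_mask   = 0xFFFFFFFFull;
  constexpr std::uint64_t locked_bit = std::uint64_t(1) << 32;
  constexpr std::uint64_t broken_bit = std::uint64_t(1) << 33;
  constexpr unsigned int  gen_shift  = 34;

  constexpr std::uint64_t make_locked(std::uint32_t tid, std::uint64_t gen)
  {
      return (gen << gen_shift) | locked_bit | tid;
  }

  constexpr std::uint32_t owner_tid(std::uint64_t word)
  {
      return static_cast<std::uint32_t>(word & tid_mask);
  }

With a 32-bit word, all of these fields have to be carved out of the 
same 32 bits as the PID, which is how we end up with only 15 bits for it.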

Note: We're not using the proper pthread robust mutexes because we also 
need a condition variable, and pthread condition variables contain a 
non-robust mutex internally, which basically nullifies the robustness. 
One could argue that pthread should be fixed instead, but I view that as 
a more difficult task, since the pthread interface is standardized. We 
would rather use futexes directly anyway, for the extra flexibility and 
lower performance overhead.
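
For context, the robustness we are after looks like this with the plain 
pthread API; this minimal sketch shows the guarantee that breaks down 
once a condition variable's internal non-robust mutex enters the picture:

  #include <pthread.h>
  #include <errno.h>

  void init_robust(pthread_mutex_t* m)
  {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
      pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
      pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
      pthread_mutex_init(m, &attr);
      pthread_mutexattr_destroy(&attr);
  }

  void lock_robust(pthread_mutex_t* m)
  {
      int rc = pthread_mutex_lock(m);
      if (rc == EOWNERDEAD)
      {
          // The previous owner died while holding the mutex.
          // Repair the protected shared state here, then mark
          // the mutex consistent so it is usable again.
          pthread_mutex_consistent(m);
      }
  }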

2. Parity with WaitOnAddress[1] on Windows.

WaitOnAddress is explicitly documented to support 8-byte states, and its 
interface allows for further extension. I'm not a Wine developer, but I 
would guess that having matching 8-byte futex support would be useful 
there.
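
For reference, WaitOnAddress takes the operand size as an explicit 
parameter, so an 8-byte wait is nothing special there (a minimal sketch; 
link against Synchronization.lib):

  #include <windows.h>
  #include <cstdint>

  volatile std::uint64_t state; // 8-byte shared state

  void wait_for_change(std::uint64_t observed)
  {
      // Blocks until the value at &state differs from 'observed';
      // re-check after each wakeup since wakeups can be spurious.
      while (state == observed)
          WaitOnAddress(&state, &observed, sizeof(state), INFINITE);
  }

  void publish(std::uint64_t value)
  {
      state = value;
      WakeByAddressSingle(const_cast<std::uint64_t*>(&state));
  }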

Besides Wine, a 64-bit futex would be important for std::atomic[2] and 
Boost.Atomic in C++, which support waiting and notifying operations 
(introduced for std::atomic in C++20). Waiting and notifying operations 
are normally implemented using the futex API on Linux and WaitOnAddress 
on Windows, and can be emulated with a process-wide global mutex pool if 
no such API is available for a given atomic size on the target platform. 
This means that waiting on 64-bit atomics on Linux currently must be 
implemented with a lock and therefore cannot be used in process-shared 
memory, while there is no such limitation on Windows.
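
In C++20 terms, the operations in question are simply the following; 
with a 64-bit futex, the wait() below could map to a single futex call, 
just as it maps to WaitOnAddress on Windows:

  #include <atomic>
  #include <cstdint>

  std::atomic<std::uint64_t> seq{0};

  void consumer(std::uint64_t last_seen)
  {
      // Blocks while seq still equals last_seen. Without a 64-bit
      // futex this has to fall back to lock-based emulation, which
      // does not work across processes.
      seq.wait(last_seen);
  }

  void producer()
  {
      seq.fetch_add(1, std::memory_order_release);
      seq.notify_all();
  }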


I'm not sure how much memory is saved by not having a 64-bit state field 
in the kernel futex structures, but it doesn't look like a huge deal on 
modern systems, whether server, desktop, or mobile. It may matter on 
extremely low-memory embedded systems, and for those targets the support 
could be disabled with a config switch; in fact, such systems would 
probably not support 64-bit atomics anyway. On any other target I would 
prefer 64-bit futexes to be available by default.

My main issue with 64-bit support being optional, though, is that 
applications and libraries like Boost.Atomic would like (or even need) 
to know whether the feature is available at compile time rather than at 
run time. std::atomic, for example, is supposed to be a thin abstraction 
over atomic instructions and OS primitives like futex, so runtime 
detection of the kernel's available features would be detrimental there. 
I'm not sure whether this is possible in the current kernel 
infrastructure, but it would be best if the lack of 64-bit futexes in 
the kernel were detectable through the kernel headers (e.g. by a macro 
for 64-bit futexes being left undefined, or something like that), which 
means the headers would have to be generated at kernel configuration 
time.
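
Something along these lines is what I mean; FUTEX_64 is a hypothetical 
macro name here, used purely for illustration:

  #include <linux/futex.h>

  // FUTEX_64 is hypothetical; the point is only that the decision
  // can be made by the preprocessor at compile time.
  #if defined(FUTEX_64)
  #define LIB_HAS_NATIVE_WAIT64 1 // wait directly on the 64-bit futex
  #else
  #define LIB_HAS_NATIVE_WAIT64 0 // fall back to lock pool emulation
  #endif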

[1]: https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress
[2]: https://en.cppreference.com/w/cpp/atomic/atomic
