LKML Archive on
From: Zebediah Figura <>
To: Thomas Gleixner <>
Cc: Peter Zijlstra <>,
	Gabriel Krisman Bertazi <>,
	Steven Noonan <>,
	"Pierre-Loup A . Griffais" <>
Subject: Re: [PATCH RFC 2/2] futex: Implement mechanism to wait on any of several futexes
Date: Wed, 31 Jul 2019 18:02:25 -0500
Message-ID: <>
In-Reply-To: <>

On 7/31/19 5:39 PM, Thomas Gleixner wrote:
> On Wed, 31 Jul 2019, Zebediah Figura wrote:
>> On 7/31/19 7:06 AM, Peter Zijlstra wrote:
>>> On Tue, Jul 30, 2019 at 06:06:02PM -0400, Gabriel Krisman Bertazi wrote:
>>>> This is a new futex operation, called FUTEX_WAIT_MULTIPLE, which allows
>>>> a thread to wait on several futexes at the same time, and be awoken by
>>>> any of them.  In a sense, it implements one of the features that was
>>>> supported by polling on the old FUTEX_FD interface.
>>>> My use case for this operation lies in Wine, where we want to implement
>>>> a similar interface available in Windows, used mainly for event
>>>> handling.  The wine folks have an implementation that uses eventfd, but
>>>> it suffers from FD exhaustion (I was told they have applications that go
>>>> to the order of multi-million FDs), and higher CPU utilization.
>>> So is multi-million the range we expect for @count ?
>> Not in Wine's case; in fact Wine has a hard limit of 64 synchronization
>> primitives that can be waited on at once (which, with the current user-side
>> code, translates into 65 futexes). The exhaustion just had to do with the
>> number of primitives created; some programs seem to leak them badly.
> And how is the futex approach better suited to 'fix' resource leaks?

The crucial constraints for implementing Windows synchronization 
primitives in Wine are that (a) it must be possible to access them from 
multiple processes and (b) it must be possible to wait on more than one 
at a time.

The current best solution for this, performance-wise, backs each Windows 
synchronization primitive with an eventfd(2) descriptor and uses poll(2)
to wait on them. Windows programs can create an apparently unbounded
number of synchronization objects, though they can only wait on up to 64 
at a time. However, on Linux the NOFILE limit causes problems; some 
distributions have it as low as 4096 by default, which is too low even 
for some modern programs that don't leak objects.
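
A minimal sketch of the eventfd-based scheme described above, for
illustration only; it is not Wine's actual code. The helper names
(object_create, object_signal, wait_any) and the MAX_WAIT_OBJECTS
constant are hypothetical. Every object costs one file descriptor,
which is what runs into the NOFILE limit.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_WAIT_OBJECTS 64 /* Windows limit on objects per wait */

/* Hypothetical helper: one object is backed by one eventfd descriptor. */
static int object_create(void)
{
    return eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
}

/* Hypothetical helper: signal an object by adding 1 to the eventfd
 * counter, which makes the descriptor readable. */
static int object_signal(int fd)
{
    uint64_t one = 1;

    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Hypothetical helper: wait until any of the objects is signaled.
 * Returns the index of the first signaled object, or -1 on timeout or
 * error. The counter is not consumed, so the object stays signaled. */
static int wait_any(const int *fds, int count, int timeout_ms)
{
    struct pollfd pfds[MAX_WAIT_OBJECTS];
    int i;

    assert(fds != NULL);
    if (count < 1 || count > MAX_WAIT_OBJECTS)
        return -1;

    for (i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)count, timeout_ms) <= 0)
        return -1;

    for (i = 0; i < count; i++)
        if (pfds[i].revents & POLLIN)
            return i;

    return -1;
}
```

Note that both checking and waiting go through poll(2) here, so even a
wait on an already signaled object costs a system call.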

The approach we are developing, which relies on this patch, backs each 
object with a single futex whose value represents its signaled state. 
Therefore the only resource we are at risk of running out of is 
available memory, which exists in far greater quantities than available 
descriptors. [Presumably Windows synchronization primitives require at 
least some kernel memory to be allocated per object as well, so this 
puts us essentially at parity, for whatever that's worth.]
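
A rough sketch of the "one futex per object" idea, under stated
assumptions: the struct layout, the 0/1 encoding of the signaled state,
and the helper names are all hypothetical. It uses only the existing
FUTEX_WAIT and FUTEX_WAKE operations on a single object; waiting on
several objects at once is what the proposed FUTEX_WAIT_MULTIPLE would
add, and its interface is deliberately not shown here. The object lives
in shared anonymous memory so that a forked process can reach it too.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed encoding: 0 = unsignaled, 1 = signaled. */
struct sync_object {
    _Atomic uint32_t state;
};

/* A futex word must be exactly 32 bits wide. */
static_assert(sizeof(_Atomic uint32_t) == sizeof(uint32_t),
              "futex word must be 32 bits");

/* Thin wrapper; glibc does not export a futex() function. */
static long futex(void *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Allocate an object in shared memory; the only resource consumed is
 * memory, not a file descriptor. mmap() zero-fills, so the object
 * starts out unsignaled. */
static struct sync_object *object_alloc(void)
{
    void *p = mmap(NULL, sizeof(struct sync_object),
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Mark the object signaled and wake every thread sleeping on it. */
static void object_set(struct sync_object *obj)
{
    atomic_store(&obj->state, 1);
    futex(&obj->state, FUTEX_WAKE, INT_MAX);
}

/* Block until the object is signaled. FUTEX_WAIT only sleeps if the
 * word still holds 0; on EAGAIN or EINTR we simply re-check. */
static void object_wait(struct sync_object *obj)
{
    while (atomic_load(&obj->state) == 0)
        futex(&obj->state, FUTEX_WAIT, 0);
}
```

The non-private futex operations are used on purpose, since
FUTEX_PRIVATE_FLAG would restrict the futex to a single process.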

To be clear, I think the primary impetus for developing the futex-based 
approach was performance; it lets us avoid some system calls in hot 
paths (e.g. waiting on an already signaled object, or resetting the 
state of an object to unsignaled). In that respect we're trying to get 
ahead of Windows, I guess. But we have still been encountering 
occasional grief due to NOFILE limits that are too low, so this is 
another helpful benefit.
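
The two hot paths mentioned above could look roughly like the following
sketch. The struct layout, the 0/1 state encoding, and the function
names are assumptions made for illustration, not the actual
implementation. The point is only that both operations are plain atomic
memory accesses with no system call.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: 0 = unsignaled, 1 = signaled. */
struct sync_object {
    _Atomic uint32_t state;
};

/* Fast path for a wait: if the object is already signaled, report
 * success without entering the kernel. A caller would fall back to a
 * futex wait only when this returns false. */
static bool object_try_wait(struct sync_object *obj)
{
    return atomic_load_explicit(&obj->state, memory_order_acquire) != 0;
}

/* Variant for objects that reset themselves when a wait succeeds
 * (as Windows auto-reset events do): atomically flip 1 -> 0 so that
 * only one waiter wins. */
static bool object_try_consume(struct sync_object *obj)
{
    uint32_t expected = 1;

    return atomic_compare_exchange_strong(&obj->state, &expected, 0);
}

/* Reset to unsignaled: a single store. No waiter needs to be woken
 * when an object becomes unsignaled, so no futex call is made. */
static void object_reset(struct sync_object *obj)
{
    atomic_store_explicit(&obj->state, 0, memory_order_release);
}
```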

Thread overview: 16+ messages
2019-07-30 22:06 [PATCH RFC 1/2] futex: Split key setup from key queue locking and read Gabriel Krisman Bertazi
2019-07-30 22:06 ` [PATCH RFC 2/2] futex: Implement mechanism to wait on any of several futexes Gabriel Krisman Bertazi
2019-07-31 12:06   ` Peter Zijlstra
2019-07-31 15:15     ` Zebediah Figura
2019-07-31 22:39       ` Thomas Gleixner
2019-07-31 23:02         ` Zebediah Figura [this message]
2019-08-06  6:26     ` Gabriel Krisman Bertazi
2019-08-06 10:13       ` Peter Zijlstra
2019-08-01  0:45   ` Thomas Gleixner
2019-08-01  1:22     ` Zebediah Figura
2019-08-01  1:32       ` Zebediah Figura
2019-08-01  1:42         ` Pierre-Loup A. Griffais
2019-07-31 23:33 ` [PATCH RFC 1/2] futex: Split key setup from key queue locking and read Thomas Gleixner
2019-08-01  0:07   ` Gabriel Krisman Bertazi
2019-08-01  0:22     ` Gabriel Krisman Bertazi
2019-08-01  0:41     ` Thomas Gleixner
