Linux-kselftest Archive on lore.kernel.org
From: Peter Zijlstra <peterz@infradead.org>
To: "Pierre-Loup A. Griffais" <pgriffais@valvesoftware.com>
Cc: "Thomas Gleixner" <tglx@linutronix.de>,
	"André Almeida" <andrealmeid@collabora.com>,
	linux-kernel@vger.kernel.org, kernel@collabora.com,
	krisman@collabora.com, shuah@kernel.org,
	linux-kselftest@vger.kernel.org, rostedt@goodmis.org,
	ryao@gentoo.org, dvhart@infradead.org, mingo@redhat.com,
	z.figura12@gmail.com, steven@valvesoftware.com,
	steven@liquorix.net, malteskarupke@web.de, carlos@redhat.com,
	adhemerval.zanella@linaro.org, fweimer@redhat.com,
	libc-alpha@sourceware.org
Subject: 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes]
Date: Tue, 3 Mar 2020 13:00:50 +0100
Message-ID: <20200303120050.GC2596@hirez.programming.kicks-ass.net>
In-Reply-To: <beb82055-96fa-cb64-a06e-9d7a0946587b@valvesoftware.com>

Hi All,

Added some people harvested from glibc.git and added libc-alpha.

We currently have 2 big new futex features proposed, and still have the
whole NUMA thing on the table.

The proposed features are:

 - a vectored FUTEX_WAIT (as per the parent thread); allows userspace to
   wait on up to 128 futex values.

 - multi-size (8,16,32) futexes (WAIT,WAKE,CMP_REQUEUE).

Both these features are specific to the 'simple' futex interfaces, that
is, they exclude all the PI / robust stuff.

As is, the vectored WAIT doesn't interact nicely with the multi-size
proposal (or, for that matter, with the already existing PRIVATE flag),
because it does not allow specifying flags per WAIT instance, but this
should be fixable with some small changes to the proposed ABI.

The much bigger sticking point, as the multi-size patches already
noticed, is that the current ABI is a limiting factor: the giant
horrible syscall.

Now, we have a whole bunch of futex ops that are already gone (FD) or
are fundamentally broken (REQUEUE) or partially weird (WAIT_BITSET has
CLOCK selection where WAIT does not) or unused (per glibc, WAKE_OP,
WAKE_BITSET, WAIT_BITSET (except for that CLOCK crud)).

So how about we introduce new syscalls:

  sys_futex_wait(void *uaddr, unsigned long val, unsigned long flags, ktime_t *timo);

  struct futex_wait {
	void *uaddr;
	unsigned long val;
	unsigned long flags;
  };
  sys_futex_waitv(struct futex_wait *waiters, unsigned int nr_waiters,
		  unsigned long flags, ktime_t *timo);

  sys_futex_wake(void *uaddr, unsigned int nr, unsigned long flags);

  sys_futex_cmp_requeue(void *uaddr1, void *uaddr2, unsigned int nr_wake,
			unsigned int nr_requeue, unsigned long cmpval, unsigned long flags);

Where flags:

  - has 2 bits for size: 8,16,32,64
  - has 2 more bits for size (requeue) ??
  - has ... bits for clocks
  - has private/shared
  - has numa


This does not provide BITSET functionality, as I found no use in glibc.
Both wait and wake have arguments left, do we need this?

For NUMA I propose that when NUMA_FLAG is set, uaddr-4 will be 'int
node_id', with the following semantics:

 - on WAIT, node_id is read and when 0 <= node_id < nr_nodes, is
   directly used to index into per-node hash-tables. When -1, it is
   replaced by the current node_id and an smp_mb() is issued before we
   load and compare the @uaddr.

 - on WAKE/REQUEUE, it is an immediate index.

Any invalid value will result in EINVAL.


Then later, we can look at doing sys_futex_{,un}lock_{,pi}(), which have
all the mind-meld associated with robust and PI and possibly optimistic
spinning etc.

Opinions?


Thread overview: 21+ messages
2020-02-13 21:45 [PATCH v3 0/4] Implement FUTEX_WAIT_MULTIPLE operation André Almeida
2020-02-13 21:45 ` [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes André Almeida
2020-02-28 19:07   ` Peter Zijlstra
2020-02-28 19:49     ` Peter Zijlstra
2020-02-28 21:25       ` Thomas Gleixner
2020-02-29  0:29         ` Pierre-Loup A. Griffais
2020-02-29 10:27           ` Thomas Gleixner
2020-03-03  2:47             ` Pierre-Loup A. Griffais
2020-03-03 12:00               ` Peter Zijlstra [this message]
2020-03-03 13:00                 ` 'simple' futex interface [Was: [PATCH v3 1/4] futex: Implement mechanism to wait on any of several futexes] Florian Weimer
2020-03-03 13:21                   ` Peter Zijlstra
2020-03-03 13:47                     ` Florian Weimer
2020-03-03 15:01                       ` Peter Zijlstra
2020-03-05 16:14                         ` André Almeida
2020-03-05 16:25                           ` Florian Weimer
2020-03-05 18:51                           ` Peter Zijlstra
2020-03-06 16:57                             ` David Laight
2020-02-13 21:45 ` [PATCH v3 2/4] selftests: futex: Add FUTEX_WAIT_MULTIPLE timeout test André Almeida
2020-02-13 21:45 ` [PATCH v3 3/4] selftests: futex: Add FUTEX_WAIT_MULTIPLE wouldblock test André Almeida
2020-02-13 21:45 ` [PATCH v3 4/4] selftests: futex: Add FUTEX_WAIT_MULTIPLE wake up test André Almeida
2020-02-19 16:27 ` [PATCH v3 0/4] Implement FUTEX_WAIT_MULTIPLE operation shuah
