From: Andrey Semashev <>
To: "André Almeida" <>,
	"Thomas Gleixner" <>,
	"Ingo Molnar" <>,
	"Peter Zijlstra" <>,
	"Darren Hart" <>,
Cc:,,, Davidlohr Bueso <>,
	Steven Rostedt <>,
	Sebastian Andrzej Siewior <>
Subject: Re: [RFC] futex2: add NUMA awareness
Date: Thu, 14 Jul 2022 14:01:04 +0300
Message-ID: <>
In-Reply-To: <>

On 7/14/22 06:18, André Almeida wrote:
> Hi,
> futex2 is an ongoing project with the goal of creating a new interface for
> futex that solves existing issues with the current syscall.
> One of these problems is the lack of NUMA awareness for futex operations.
> This RFC aims to gather feedback on a NUMA interface proposal.
>  * The problem
> futex has a single, global hash table that stores information about current
> waiters, to be queried by wakers. On non-uniform (NUMA) machines, this hash
> table is stored in a single node. This means that a process running on
> another node will have some overhead when using futex, given that it will
> need to access the table on a different node.
>  * A solution
> On NUMA machines, a table would be allocated per node. Processes
> would then be able to use their local table to avoid sharing data with
> other nodes.
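A minimal sketch of the difference between the two designs, under a deliberately simplified model: the kernel hashes a futex key rather than the raw address, and every name here (`struct futex_bucket`, `NR_NODES`, `futex_hash()`, `global_bucket()`, `node_bucket()`) is hypothetical. With one global table, every node hashes into the same memory; with per-node tables, the node index selects the table first and the address then selects the bucket.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES   4                 /* hypothetical number of NUMA nodes */
#define HASH_BITS  8                 /* hypothetical: 256 buckets per table */
#define NR_BUCKETS (1u << HASH_BITS)

/* Hypothetical stand-in for a hash bucket; a real one holds a lock and a waiter list. */
struct futex_bucket {
	unsigned int nr_waiters;
};

/* Current design: a single table shared by every node. */
static struct futex_bucket global_table[NR_BUCKETS];

/* Proposed design: one table per node, ideally allocated in that node's memory. */
static struct futex_bucket node_tables[NR_NODES][NR_BUCKETS];

/* Multiplicative (Fibonacci) hash of the futex address into a bucket index. */
unsigned int futex_hash(uintptr_t uaddr)
{
	uint64_t h = (uint64_t)uaddr * 0x9E3779B97F4A7C15ull;
	return (unsigned int)(h >> (64 - HASH_BITS));
}

/* Every caller, on any node, lands in the same (possibly remote) table. */
struct futex_bucket *global_bucket(uintptr_t uaddr)
{
	return &global_table[futex_hash(uaddr)];
}

/* The node index picks the table first, then the address picks the bucket. */
struct futex_bucket *node_bucket(uintptr_t uaddr, int node)
{
	assert(node >= 0 && node < NR_NODES);
	return &node_tables[node][futex_hash(uaddr)];
}
```

In this model a waker and a waiter only meet if they hash into the same node's table: `node_bucket(addr, 1)` and `node_bucket(addr, 2)` are different buckets even for the same futex address.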
>  * The interface
> Userspace needs to specify which node it would like to use to store/query
> the futex table. The common case would be to operate on the current
> node, but some cases could require operating on another one.
> Before getting to the NUMA part, a quick recap of the syscalls interface
> of futex2:
> futex_wait(void *uaddr, unsigned int val, unsigned int flags,
>            struct timespec *timo)
> futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)
> struct futex_requeue {
> 	void *uaddr;
> 	unsigned int flags;
> };
> futex_requeue(struct futex_requeue *rq1, struct futex_requeue *rq2,
> 	      unsigned int nr_wake, unsigned int nr_requeue,
> 	      u64 cmpval, unsigned int flags)
> As requeue already has 6 arguments (the maximum for a Linux syscall), we
> can't add another argument for the node ID; we need to pack it in a struct.
> So then we have:
> struct futexX_numa {
>         __uX value;
>         __sX hint;
> };
> Where X can be 8, 16, 32 or 64 (futex2 supports variable sized futexes).
> `value` is the futex value and `hint` can be -1 for the current node, or
> [0, MAX_NUMA_NODES) to specify a node. Example:
> struct futex32_numa f = {.value = 0, .hint = -1};
> ...
> futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
> Then &f would be used as the futex address, as expected, and the current
> node's table would be used. If an app expects calls from different
> nodes, then it should specify a node explicitly, for instance:
> struct futex32_numa f = {.value = 0, .hint = 2};
> For non-NUMA apps, a call without the FUTEX_NUMA flag would just use the
> first node by default.
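A minimal sketch of how the 32-bit variant and its hint could be interpreted, assuming the semantics described above. `futex_pick_node()`, `NR_NODES` and the numeric value of `FUTEX_NUMA` are hypothetical, and `uint32_t`/`int32_t` stand in for the kernel's `__u32`/`__s32`. Because `value` is the first member, `&f` is also the address of the futex word, and the hint sits immediately after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FUTEX_NUMA 0x04u  /* hypothetical flag value, for illustration only */
#define NR_NODES   4      /* hypothetical stand-in for MAX_NUMA_NODES */

/* 32-bit instance of the proposed struct futexX_numa. */
struct futex32_numa {
	uint32_t value;  /* the futex word; &f is what gets passed as uaddr */
	int32_t  hint;   /* -1 = current node, or 0 .. NR_NODES - 1 */
};

/* value is the first member, so the hint starts right after the futex word. */
static_assert(offsetof(struct futex32_numa, hint) == sizeof(uint32_t),
	      "hint must immediately follow value");

/*
 * Hypothetical helper: choose which node's table an operation uses.
 * Returns the node index, or -1 for an out-of-range hint.
 */
int futex_pick_node(unsigned int flags, int32_t hint, int current_node)
{
	if (!(flags & FUTEX_NUMA))
		return 0;             /* no FUTEX_NUMA: first node by default */
	if (hint == -1)
		return current_node;  /* -1 selects the caller's node */
	if (hint < 0 || hint >= NR_NODES)
		return -1;            /* reject invalid hints */
	return hint;
}
```

With this reading, two threads on different nodes that both pass `.hint = -1` would resolve to different tables, which is why a fixed hint such as `.hint = 2` is needed when callers are spread across nodes.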
> Feedback? Who else should I CC?

Just a few questions:

Do I understand correctly that notifiers won't be able to wake up
waiters unless they know which node the waiters are waiting on?

Is it possible to wait on a futex on different nodes?

Is it possible to wake waiters on a futex on all nodes? When a single
(or N, where N is not "all") waiter is woken, which node is selected? Is
there a rotation of nodes, so that nodes are not skewed in terms of
notified waiters?
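To make the last question concrete, here is one hypothetical policy; nothing in the RFC specifies it. Per-node waiter counts stand in for per-node wait queues, and a partial wake takes one waiter from each node in turn, resuming where the previous wake stopped. `struct numa_waiters`, `wake_round_robin()` and `NR_NODES` are illustrative names only.

```c
#include <assert.h>

#define NR_NODES 4  /* hypothetical number of NUMA nodes */

/* Hypothetical per-futex state: waiters queued in each node's table. */
struct numa_waiters {
	unsigned int count[NR_NODES];
	unsigned int next;  /* node to start from on the next wake */
};

/*
 * Wake up to nr_wake waiters, taking one from each node in turn so that
 * repeated partial wakes are not skewed toward a single node.
 * Returns the number of waiters woken.
 */
unsigned int wake_round_robin(struct numa_waiters *w, unsigned int nr_wake)
{
	unsigned int woken = 0, idle = 0;
	unsigned int node;

	assert(w->next < NR_NODES);
	node = w->next;

	/* Stop once enough waiters are woken or a full pass finds no waiters. */
	while (woken < nr_wake && idle < NR_NODES) {
		if (w->count[node] > 0) {
			w->count[node]--;
			woken++;
			idle = 0;
		} else {
			idle++;
		}
		node = (node + 1) % NR_NODES;
	}
	w->next = node;  /* resume after the last node visited */
	return woken;
}
```

Waking "all" is then just a large `nr_wake`; the rotation only matters when fewer waiters are woken than are queued.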

Thread overview: 5+ messages
2022-07-14  3:18 [RFC] futex2: add NUMA awareness André Almeida
2022-07-14 11:01 ` Andrey Semashev [this message]
2022-07-14 15:00   ` André Almeida
2022-07-22 16:42     ` Andrey Semashev
2022-07-27 17:19       ` André Almeida
