From: "André Almeida" <andrealmeid@igalia.com>
To: Andrey Semashev <andrey.semashev@gmail.com>
Cc: linux-api@vger.kernel.org, fweimer@redhat.com,
	linux-kernel@vger.kernel.org, Darren Hart <dvhart@infradead.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	libc-alpha@sourceware.org, Davidlohr Bueso <dave@stgolabs.net>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [RFC] futex2: add NUMA awareness
Date: Wed, 27 Jul 2022 14:19:22 -0300
Message-ID: <9ef24c18-775b-000a-5a03-4e4fe0f1c83c@igalia.com>
In-Reply-To: <3995754e-064b-6091-ccb0-224c3e698af2@gmail.com>

On 7/22/22 13:42, Andrey Semashev wrote:
> On 7/14/22 18:00, André Almeida wrote:
>> Hi Andrey,
>>
>> Thanks for the feedback.
>>
>> On 7/14/22 08:01, Andrey Semashev wrote:
>>> On 7/14/22 06:18, André Almeida wrote:
>> [...]
>>>>
>>>> Feedback? Who else should I CC?
>>>
>>> Just a few questions:
>>>
>>> Do I understand correctly that notifiers won't be able to wake up
>>> waiters unless they know on which node they are waiting?
>>>
>>
>> If userspace is using NUMA_FLAG, yes. Otherwise all futexes would be
>> located in the default node, and userspace doesn't need to know which
>> one is the default.
>>
>>> Is it possible to wait on a futex on different nodes?
>>
>> Yes, given that you specify `.hint = id` with the proper node id.
> 
> So any given futex_wake(FUTEX_NUMA) operates only within its node, right?
> 
>>> Is it possible to wake waiters on a futex on all nodes? When a single
>>> (or N, where N is not "all") waiter is woken, which node is selected? Is
>>> there a rotation of nodes, so that nodes are not skewed in terms of
>>> notified waiters?
>>
>> Regardless of which node the waiter process is running on, what matters
>> is which node the futex hash table is on. So for instance if we have:
>>
>> 	struct futex32_numa f = {.value = 0, .hint = 2};
>>
>> And now we add some waiters for this futex:
>>
>> Thread 1, running on node 3:
>>
>> 	futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>>
>> Thread 2, running on node 0:
>>
>> 	futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>>
>> Thread 3, running on node 2:
>>
>> 	futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
>>
>> And then, Thread 4, running on node 3:
>>
>> 	futex_wake(&f, 2, FUTEX_NUMA | FUTEX_32);
>>
>> Now two waiters would wake up (e.g. T1 and T3, running on nodes 3 and
>> 2), and they are running on different nodes. futex_wake() doesn't
>> guarantee which waiter will be selected, so I can't say which node
>> would be selected.
> 
> In this example, T1, T2 and T3 are all blocking on node 2 (since all of
> them presumably specify hint == 2), right? In this sense, it doesn't
> matter which node they are running on, what matters is what node they
> block on.

Yes.

> 
> What I'm asking is can I wake all threads blocked on all nodes on the
> same futex? That is, is the following possible?
> 
>   // I'm using hint == -1 to indicate the current node
>   // of the calling thread for waiters and all nodes for notifiers
>   struct futex32_numa f = {.value = 0, .hint = -1};
> 
>   Thread 1, running on node 3, blocks on node 3:
> 
>   futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
> 
>   Thread 2, running on node 0, blocks on node 0:
> 
>   futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
> 
>   Thread 3, running on node 2, blocks on node 2:
> 
>   futex_wait(&f, 0, FUTEX_NUMA | FUTEX_32, NULL);
> 
>   And then, Thread 4, running on whatever node:
> 
>   futex_wake(&f, -1, FUTEX_NUMA | FUTEX_32);

This futex_wake() will wake all waiters blocked on the node from which
futex_wake() was called, so it wakes only one waiter in this example.
They are __not__ the same futex: if they are on different nodes, they
have different information inside the kernel.

If you want to wake them all with the same futex_wake(), they need to be
waiting on the same node.
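
To make the per-node behavior concrete, here is a toy userspace model.
It is not the kernel code and not the proposed syscall API; it only
assumes that waiters are queued per node and that a wake scans the queue
of a single node. The names model_wait(), model_wake(), MODEL_NODES and
MODEL_MAX_WAITERS are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the futex2 implementation. */
#define MODEL_NODES 4
#define MODEL_MAX_WAITERS 8

struct model_waiter {
	const void *uaddr;	/* futex address the waiter blocks on */
	int tid;		/* illustrative thread id */
	int active;		/* 1 while the waiter is queued */
};

/* One waiter queue per node; a wake never looks at another node. */
struct model_waiter model_nodes[MODEL_NODES][MODEL_MAX_WAITERS];

/* Queue a waiter on `node`. Returns 0 on success, -1 if the queue is full. */
int model_wait(const void *uaddr, int node, int tid)
{
	assert(node >= 0 && node < MODEL_NODES);
	for (size_t i = 0; i < MODEL_MAX_WAITERS; i++) {
		struct model_waiter *w = &model_nodes[node][i];

		if (!w->active) {
			w->uaddr = uaddr;
			w->tid = tid;
			w->active = 1;
			return 0;
		}
	}
	return -1;
}

/* Wake up to `nr` waiters on `uaddr`, looking only at `node`.
 * Returns the number of waiters woken. */
int model_wake(const void *uaddr, int node, int nr)
{
	int woken = 0;

	assert(node >= 0 && node < MODEL_NODES);
	for (size_t i = 0; i < MODEL_MAX_WAITERS && woken < nr; i++) {
		struct model_waiter *w = &model_nodes[node][i];

		if (w->active && w->uaddr == uaddr) {
			w->active = 0;
			woken++;
		}
	}
	return woken;
}
```

In this model, with T1, T2 and T3 queued on nodes 3, 0 and 2, a
model_wake(&f, 3, 8) returns 1, because only the node 3 queue is
scanned. With all three queued on node 2, model_wake(&f, 2, 2) returns 2.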

> 
> Here, futex_wake would wake T1, T2 and T3. Or:
> 
>   futex_wake(&f, 1, FUTEX_NUMA | FUTEX_32);

This would behave exactly like the futex_wake() above.

> 
> Here, futex_wake would wake any one of T1, T2 or T3.
> 
>> There's no policy for fairness/starvation for futex_wake(). Do
>> you think this would be important for the NUMA case?
> 
> I'm not sure yet. If there isn't a cross-node behavior like in my
> example above then, I suppose, it falls to the userspace to ensure fair
> rotation of the wakeups on different nodes. If there is functionality
> like this, I imagine, some sort of fairness would be desired.
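
As a rough sketch of what such a userspace rotation could look like,
assuming userspace tracks per-node waiter counts on its own: the helper
below picks the node for the next single wake-up in round-robin order.
struct rr_state, rr_pick_node() and RR_NODES are hypothetical names, and
nothing here is part of the RFC.

```c
#include <assert.h>

#define RR_NODES 4	/* hypothetical node count */

struct rr_state {
	int waiters[RR_NODES];	/* waiter counts tracked by userspace */
	int next;		/* node to try first on the next wake */
};

/* Pick the node for the next single wake-up, or -1 if nobody waits.
 * Starts at s->next and advances past the chosen node, so that
 * consecutive wake-ups rotate across nodes that have waiters. */
int rr_pick_node(struct rr_state *s)
{
	assert(s->next >= 0 && s->next < RR_NODES);
	for (int i = 0; i < RR_NODES; i++) {
		int node = (s->next + i) % RR_NODES;

		if (s->waiters[node] > 0) {
			s->waiters[node]--;
			s->next = (node + 1) % RR_NODES;
			return node;
		}
	}
	return -1;
}
```

The caller would then issue one wake on the returned node. Keeping the
counts accurate (for example when a waiter times out) is left out here.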
