All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: "Andr� Almeida" <andrealmeid@collabora.com>,
	"Davidlohr Bueso" <dave@stgolabs.net>
Cc: acme@kernel.org, Andrey Semashev <andrey.semashev@gmail.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	corbet@lwn.net, Darren Hart <dvhart@infradead.org>,
	fweimer@redhat.com, joel@joelfernandes.org, kernel@collabora.com,
	krisman@collabora.com, libc-alpha@sourceware.org,
	linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, malteskarupke@fastmail.fm,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	pgriffais@valvesoftware.com, Peter Oskolkov <posk@posk.io>,
	Steven Rostedt <rostedt@goodmis.org>,
	shuah@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	z.figura12@gmail.com
Subject: Re: [PATCH v4 00/15] Add futex2 syscalls
Date: Tue, 08 Jun 2021 14:45:31 +1000	[thread overview]
Message-ID: <1623125574.l2jo2w0x2v.astroid@bobo.none> (raw)
In-Reply-To: <20210608023302.34yzrm5ktf3qvxhq@offworld>

Excerpts from Davidlohr Bueso's message of June 8, 2021 12:33 pm:
> On Mon, 07 Jun 2021, Andr� Almeida wrote:
> 
>>Às 22:09 de 04/06/21, Nicholas Piggin escreveu:
>>> Actually one other scalability thing while I remember it:
>>>
>>> futex_wait currently requires that the lock word is tested under the
>>> queue spin lock (to avoid consuming a wakeup). The problem with this is
>>> that the lock word can be a very hot cache line if you have a lot of
>>> concurrency, so accessing it under the queue lock can increase queue
>>> lock hold time.
>>>
>>> I would prefer if the new API was relaxed to avoid this restriction
>>> (e.g., any wait call may consume a wakeup so it's up to userspace to
>>> avoid that if it is a problem).
>>
>>Maybe I'm wrong, but AFAIK the goal of checking the lock word inside the
>>spin lock is to avoid sleeping forever (in other words, wrongly assuming
>>that the lock is taken and missing a wakeup call), not to avoid
>>consuming wakeups. Or at least this is my interpretation of this long
>>comment in futex.c:
>>
>>https://elixir.bootlin.com/linux/v5.12.9/source/kernel/futex.c#L51
> 
> I think what Nick is referring to is that futex_wait() could return 0
> instead of EAGAIN upon a uval != val condition if the check is done
> without the hb lock. The value could have changed between when userspace
> did the condition check and called into futex(2) to block in the slowpath.

I just mean the check could be done after queueing ourselves on the wait 
queue (and unlocking the waitqueue lock, not checking while holding the
lock). That is the standard pattern used everywhere else by the kernel:

  prepare_to_wait() /* -> lock; add_wait_queue; unlock; */
  check_again();
  schedule();

It can still return EAGAIN if there is a reasonable use for it, but I'd
be wary about user code that cares about this -- it's racy you could 
arrive right before the value changes or right after it changes, so any
user code checking this I would be suspicious of (I'm willing to see a
use case that really cares).

> 
> But such spurious scenarios should be pretty rare, and while I agree that
> the cacheline can be hot, I'm not sure how much of a performance issue this
> really is(?),

It's not a spurious anything. The problem is the contention on the
lock word cacheline means it can take a relatively long time just to perform
that one load instruction. Mandating that it must be done while holding the
lock translates to increased lock hold times.

This matters particularly in situations that have lock stealing, 
optimistic spinning, reader-writer locks or more exotic kind of things 
that allow some common types of critical section to go through while others
are blocking. And partiuclarly when such things hash collide on other
futexes that share the same hash lock.

> compared to other issues, certainly not to govern futex2
> design. Changing such semantics would be a _huge_ difference between futex1
> and futex2.

futex1 behaviour should not govern futex2 design. That's the only nice 
thing you get with an API change, so we should take full advantage of 
it. I'm not saying make changes for no reason, but I gave a reason, so 
that should be countered with a better reason to not change.

Thanks,
Nick

> 
> At least compared, for example, to the hb collisions serializing independent
> futexes, affecting both performance and determinism. And I agree that a new
> interface should address this problem - albeit most of the workloads I have
> seen in production use but a handful of futexes and larger thread counts.
> One thing that crossed my mind (but have not actually sat down to look at)
> would be to use rlhastables for the dynamic resizing, but of course that would
> probably add a decent amount of overhead to the simple hashing we currently have.

  reply	other threads:[~2021-06-08  4:46 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-03 19:59 [PATCH v4 00/15] Add futex2 syscalls André Almeida
2021-06-03 19:59 ` [PATCH v4 01/15] futex2: Implement wait and wake functions André Almeida
2021-06-03 20:35   ` kernel test robot
2021-06-03 22:57   ` kernel test robot
2021-06-04  7:45   ` kernel test robot
2021-06-03 19:59 ` [PATCH v4 02/15] futex2: Add support for shared futexes André Almeida
2021-06-03 19:59 ` [PATCH v4 03/15] futex2: Implement vectorized wait André Almeida
2021-06-03 20:57   ` kernel test robot
2021-06-04  0:58   ` kernel test robot
2021-06-03 19:59 ` [PATCH v4 04/15] futex2: Implement requeue operation André Almeida
2021-06-03 21:07   ` kernel test robot
2021-06-03 19:59 ` [PATCH v4 05/15] futex2: Implement support for different futex sizes André Almeida
2021-06-04  0:23   ` kernel test robot
2021-06-04  0:23     ` kernel test robot
2021-06-06 19:12   ` Davidlohr Bueso
2021-06-06 23:01     ` Andrey Semashev
2021-06-03 19:59 ` [PATCH v4 06/15] futex2: Add compatibility entry point for x86_x32 ABI André Almeida
2021-06-03 19:59 ` [PATCH v4 07/15] docs: locking: futex2: Add documentation André Almeida
2021-06-06 19:23   ` Davidlohr Bueso
2021-06-03 19:59 ` [PATCH v4 08/15] selftests: futex2: Add wake/wait test André Almeida
2021-06-03 19:59 ` [PATCH v4 09/15] selftests: futex2: Add timeout test André Almeida
2021-06-03 19:59 ` [PATCH v4 10/15] selftests: futex2: Add wouldblock test André Almeida
2021-06-03 19:59 ` [PATCH v4 11/15] selftests: futex2: Add waitv test André Almeida
2021-06-03 19:59 ` [PATCH v4 12/15] selftests: futex2: Add requeue test André Almeida
2021-06-03 19:59 ` [PATCH v4 13/15] selftests: futex2: Add futex sizes test André Almeida
2021-06-03 19:59 ` [PATCH v4 14/15] perf bench: Add futex2 benchmark tests André Almeida
2021-06-03 19:59 ` [PATCH v4 15/15] kernel: Enable waitpid() for futex2 André Almeida
2021-06-04  4:51 ` [PATCH v4 00/15] Add futex2 syscalls Zebediah Figura
2021-06-04 17:04   ` André Almeida
2021-06-04 11:36 ` Nicholas Piggin
2021-06-04 20:01   ` André Almeida
2021-06-05  1:09     ` Nicholas Piggin
2021-06-05  8:56       ` Andrey Semashev
2021-06-06 11:57         ` Nicholas Piggin
2021-06-06 13:15           ` Andrey Semashev
2021-06-08  1:25             ` Nicholas Piggin
2021-06-08 11:03               ` Andrey Semashev
2021-06-08 11:13                 ` Greg KH
2021-06-08 11:44                   ` Peter Zijlstra
2021-06-08 14:31                     ` Davidlohr Bueso
2021-06-08 12:06                   ` Andrey Semashev
2021-06-08 12:33                     ` Greg KH
2021-06-08 12:35                     ` Greg KH
2021-06-08 13:18                       ` Andrey Semashev
2021-06-08 13:27                         ` Greg KH
2021-06-08 13:41                           ` Andrey Semashev
2021-06-08 17:06                         ` Zebediah Figura
2021-06-08 14:14                   ` André Almeida
2021-06-07 15:40       ` André Almeida
2021-06-08  1:31         ` Nicholas Piggin
2021-06-08  2:33         ` Davidlohr Bueso
2021-06-08  4:45           ` Nicholas Piggin [this message]
2021-06-08 12:26         ` Sebastian Andrzej Siewior
2021-06-08 14:23           ` Peter Zijlstra
2021-06-08 14:57             ` Sebastian Andrzej Siewior
2021-06-08 15:04             ` André Almeida
2021-06-08 18:08             ` Adhemerval Zanella
2021-06-08 18:19               ` Florian Weimer
2021-06-08 18:22                 ` Adhemerval Zanella
2021-06-09 16:26             ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1623125574.l2jo2w0x2v.astroid@bobo.none \
    --to=npiggin@gmail.com \
    --cc=acme@kernel.org \
    --cc=andrealmeid@collabora.com \
    --cc=andrey.semashev@gmail.com \
    --cc=bigeasy@linutronix.de \
    --cc=corbet@lwn.net \
    --cc=dave@stgolabs.net \
    --cc=dvhart@infradead.org \
    --cc=fweimer@redhat.com \
    --cc=joel@joelfernandes.org \
    --cc=kernel@collabora.com \
    --cc=krisman@collabora.com \
    --cc=libc-alpha@sourceware.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=malteskarupke@fastmail.fm \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pgriffais@valvesoftware.com \
    --cc=posk@posk.io \
    --cc=rostedt@goodmis.org \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=z.figura12@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.