From: "André Almeida" <andrealmeid@collabora.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>,
	acme@kernel.org, Andrey Semashev <andrey.semashev@gmail.com>,
	corbet@lwn.net, Davidlohr Bueso <dave@stgolabs.net>,
	Darren Hart <dvhart@infradead.org>,
	fweimer@redhat.com, joel@joelfernandes.org, kernel@collabora.com,
	krisman@collabora.com, libc-alpha@sourceware.org,
	linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, malteskarupke@fastmail.fm,
	Ingo Molnar <mingo@redhat.com>,
	pgriffais@valvesoftware.com, Peter Oskolkov <posk@posk.io>,
	Steven Rostedt <rostedt@goodmis.org>,
	shuah@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	z.figura12@gmail.com,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [PATCH v4 00/15] Add futex2 syscalls
Date: Tue, 8 Jun 2021 12:04:18 -0300
Message-ID: <7ab1a38e-5ba6-843d-9fa8-7480914c3d15@collabora.com>
In-Reply-To: <YL99cR0H+7xgU8L1@hirez.programming.kicks-ass.net>

On 08/06/21 at 11:23, Peter Zijlstra wrote:
> On Tue, Jun 08, 2021 at 02:26:22PM +0200, Sebastian Andrzej Siewior wrote:
>> On 2021-06-07 12:40:54 [-0300], André Almeida wrote:
>>>
>>> When I first read Thomas' proposal for a per-process table, I thought that
>>> the main goal there was to solve NUMA locality issues, not RT latency,
>>> but I think you are right. However, re-reading the thread at [0], it
>>> seems that the RT problems were not completely solved in that
>>> interface, maybe the people involved with that patchset can help to shed
>>> some light on it.
>>>
>>> Otherwise, this same proposal could be integrated in futex2, given that
>>> we would only need to provide to userland some extra flags and add some
>>> `if`s around the hash table code (in a very similar way the NUMA code
>>> will be implemented in futex2).
>>
>> There are slides at [0] describing some attempts and the kernel tree [1]
>> from that time.
>>
>> The process-table solves the problem to some degree that two random
>> processes don't collide on the same hash bucket. But as Peter Zijlstra
>> pointed out back then two threads from the same task could collide on
>> the same hash bucket (and with ASLR not always). So the collision is
>> there but limited and this was not perfect.
>>
>> All the attempts with API extensions didn't go well because glibc did
>> not want to change a bit. This starts with a mutex that has a static
>> initializer which has to work (I don't remember why the first
>> pthread_mutex_lock() could not fail with -ENOMEM but there was
>> something) and ends with glibc's struct mutex which is full and has no
>> room for additional data storage.
>>
>> The additional data in user's struct mutex + init would have the benefit
>> that instead of uaddr (which is hashed for the in-kernel lookup) a cookie
>> could be used for the hash-less lookup (and NUMA pointer where memory
>> should be stored).
>>
>> So. We couldn't change a thing back then so nothing did happen. We
>> didn't want to create a new interface and a library implementing it plus
>> all the functionality around it (like pthread_cond, pthread_barrier, …).
>> Not to mention that if glibc continues to use the "old" locking
>> internally then the application is still affected by the hash-collision
>> locking (or the NUMA problem) should it block on the lock.
> 
> There's more futex users than glibc, and some of them are really hurting
> because of the NUMA issue. Oracle used to (I've no idea what they do or
> do not do these days) use sysvsem because the futex hash table was a
> massive bottleneck for them.
> 
> And as Nick said, other vendors are having the same problems.

Since we're talking about NUMA, which userspace communities would be
able to provide feedback about the futex2() NUMA-aware feature, to check
if this interface would help solve those issues?

> 
> And if you don't extend the futex to store the nid you put the waiter in
> (see all the problems above) you will have to do wakeups on all nodes,
> which is both slower than it is today, and scales possibly even worse.
> 
> The whole numa-aware qspinlock saga is in part because of futex.
> 
> 
> That said; if we're going to do the whole futex-vector thing, we really
> do need a new interface, because the futex multiplex monster is about to
> crumble (see the fun wrt timeouts for example).
> 
> And if we're going to do a new interface, we ought to make one that can
> solve all these problems. Now, ideally glibc will bring forth some
> opinions, but if they don't want to play, we'll go back to the good old
> days of non-standard locking libraries.. we're halfway there already due
> to glibc not wanting to break with POSIX where we know POSIX was just
> dead wrong broken.
> 
> See: https://github.com/dvhart/librtpi
> 
> 

