linux-kernel.vger.kernel.org archive mirror
From: Jens Axboe <axboe@kernel.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: io-uring <io-uring@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>
Subject: [GIT PULL] io_uring futex support
Date: Mon, 30 Oct 2023 19:07:01 -0600
Message-ID: <49ec1791-f353-48f2-a39a-378b5463db42@kernel.dk>

Hi Linus,

I was holding off on sending this one until the tip locking/core branch
with the futex2 changes had been merged, as this uses the same API. This
sits on top of both that and the main for-6.7/io_uring branch.

This pull request adds support for using futexes through io_uring -
first futex wake and wait, and then the vectored variant of waiting,
futex waitv.

For wait, wake, and waitv alike, we support the bitset variant, as the
"normal" variants can easily be implemented on top of that.

PI and requeue are not supported through io_uring, just the parts
mentioned above. This may change in the future, but in the spirit
of keeping this small (and based on what people have been asking for),
this is what we currently have.
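The point that the "normal" wait/wake variants sit on top of the bitset ones can be illustrated with the plain futex(2) syscall rather than io_uring: FUTEX_WAIT and FUTEX_WAKE behave like FUTEX_WAIT_BITSET and FUTEX_WAKE_BITSET with a mask of FUTEX_BITSET_MATCH_ANY (with no timeout, as here). This is a minimal sketch under that assumption, Linux-only and using pthreads; the helpers futex_wait_any(), futex_wake_any() and bitset_demo() are hypothetical names for illustration, not part of this pull request or of the io_uring API.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: plain FUTEX_WAIT expressed as FUTEX_WAIT_BITSET with a
 * match-any mask. Sleeps only while *uaddr == expected; returns 0 after a
 * wakeup, or -1 with errno set (EAGAIN if the value had already changed). */
static long futex_wait_any(uint32_t *uaddr, uint32_t expected)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG,
		       expected, NULL, NULL, FUTEX_BITSET_MATCH_ANY);
}

/* Hypothetical helper: plain FUTEX_WAKE expressed as FUTEX_WAKE_BITSET with a
 * match-any mask. Returns the number of waiters woken (at most nr). */
static long futex_wake_any(uint32_t *uaddr, int nr)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG,
		       nr, NULL, NULL, FUTEX_BITSET_MATCH_ANY);
}

static uint32_t flag; /* futex word shared by the demo threads */

static void *waiter(void *arg)
{
	(void)arg;
	/* Re-check after every return: wakeups may be spurious, and EAGAIN
	 * just means the store happened before we went to sleep. */
	while (__atomic_load_n(&flag, __ATOMIC_ACQUIRE) == 0)
		futex_wait_any(&flag, 0);
	return NULL;
}

/* Hypothetical demo: one thread sleeps on 'flag'; the caller publishes a new
 * value and wakes it. Returns pthread_join()'s result (0 on success). */
static int bitset_demo(void)
{
	pthread_t t;

	__atomic_store_n(&flag, 0, __ATOMIC_RELEASE);
	if (pthread_create(&t, NULL, waiter, NULL) != 0)
		return -1;
	__atomic_store_n(&flag, 1, __ATOMIC_RELEASE);
	futex_wake_any(&flag, 1);
	return pthread_join(t, NULL);
}
```

With a mask other than FUTEX_BITSET_MATCH_ANY, a wake only reaches waiters whose mask shares at least one bit with it, which is why the bitset form is the more general of the two and the only one that needs to be wired up.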

Wake support is pretty straightforward; most of the thought has gone
into the wait side, to avoid needing to offload wait operations to a
blocking context. Instead, we rely on the usual callbacks to retry and
post a completion event when appropriate.

As far as I can recall, the first request for futex support with
io_uring came from Andres Freund, working on postgres. His aio rework
of postgres was one of the early adopters of io_uring, and futex
support was a natural extension for that. This is relevant both from
a usability point of view and for efficiency and performance.
In Andres's words, for the former:

"Futex wait support in io_uring makes it a lot easier to avoid deadlocks
in concurrent programs that have their own buffer pool: Obviously pages in
the application buffer pool have to be locked during IO. If the initiator
of IO A needs to wait for a held lock B, the holder of lock B might wait
for the IO A to complete.  The ability to wait for a lock and IO
completions at the same time provides an efficient way to avoid such
deadlocks."

and in terms of efficiency, even without unlocking its full potential yet,
Andres says:

"Futex wake support in io_uring is useful because it allows for more
efficient directed wakeups.  For some "locks" postgres has queues
implemented in userspace, with wakeup logic that cannot easily be
implemented with FUTEX_WAKE_BITSET on a single "futex word" (imagine
waiting for journal flushes to have completed up to a certain point). Thus
a "lock release" sometimes needs to wake up many processes in a row.  A
quick-and-dirty conversion to doing these wakeups via io_uring led to a
3% throughput increase, with 12% fewer context switches, albeit in a
fairly extreme workload."

Please pull!


The following changes since commit 93b8cc60c37b9d17732b7a297e5dca29b50a990d:

  io_uring: cancelable uring_cmd (2023-09-28 07:36:00 -0600)

are available in the Git repository at:

  git://git.kernel.dk/linux.git tags/io_uring-futex-2023-10-30

for you to fetch changes up to 8f350194d5cfd7016d4cd44e433df0faa4d4a703:

  io_uring: add support for vectored futex waits (2023-09-29 02:37:08 -0600)

----------------------------------------------------------------
io_uring-futex-2023-10-30

----------------------------------------------------------------
Jens Axboe (10):
      Merge branch 'for-6.7/io_uring' into io_uring-futex
      Merge branch 'locking/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into io_uring-futex
      futex: move FUTEX2_VALID_MASK to futex.h
      futex: factor out the futex wake handling
      futex: abstract out a __futex_wake_mark() helper
      io_uring: add support for futex wake and wait
      futex: add wake_data to struct futex_q
      futex: make futex_parse_waitv() available as a helper
      futex: make the vectored futex operations available
      io_uring: add support for vectored futex waits

 include/linux/io_uring_types.h |   5 +
 include/uapi/linux/io_uring.h  |   4 +
 io_uring/Makefile              |   1 +
 io_uring/cancel.c              |   5 +
 io_uring/cancel.h              |   4 +
 io_uring/futex.c               | 386 +++++++++++++++++++++++++++++++++
 io_uring/futex.h               |  36 +++
 io_uring/io_uring.c            |   7 +
 io_uring/opdef.c               |  34 +++
 kernel/futex/futex.h           |  20 ++
 kernel/futex/requeue.c         |   3 +-
 kernel/futex/syscalls.c        |  18 +-
 kernel/futex/waitwake.c        |  49 +++--
 13 files changed, 545 insertions(+), 27 deletions(-)

-- 
Jens Axboe



Thread overview: 2+ messages
2023-10-31  1:07 Jens Axboe [this message]
2023-11-01 22:47 ` [GIT PULL] io_uring futex support pr-tracker-bot
