linux-kernel.vger.kernel.org archive mirror
From: Andres Freund <andres@anarazel.de>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: io-uring@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Darren Hart <dvhart@infradead.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC 0/4] futex request support
Date: Thu, 3 Jun 2021 11:59:43 -0700
Message-ID: <20210603185943.eeav4sfkrxyuhytp@alap3.anarazel.de>
In-Reply-To: <cover.1622558659.git.asml.silence@gmail.com>

Hi,

On 2021-06-01 15:58:25 +0100, Pavel Begunkov wrote:
> Should be interesting for a bunch of people, so we should first
> outline API and capabilities it should give. As I almost never
> had to deal with futexes myself, would especially love to hear
> use case, what might be lacking and other blind spots.

I did chat with Jens about how useful futex support would be in io_uring, so I
should outline our / my needs. I'm off work this week though, so I don't think
I'll have much time to experiment.

For postgres's AIO support (which I am working on) there are two largely
independent use cases for wanting futex support in io_uring.

The first is the ability to wait for locks (queued r/w locks, blocking
implemented via futexes) and IO at the same time, within one task. Quickly and
efficiently processing IO completions can improve whole-system latency and
throughput substantially in some cases (journalling, indexes and other
high-contention areas - which often have a low queue depth). This is true
*especially* when there also is lock contention, which tends to make efficient
IO scheduling harder.

The second use case is the ability to efficiently wait in several tasks for
one IO to be processed. The prototypical example here is group commit/journal
flush, where each task can only continue once the journal flush has
completed. Typically one of the waiters has to do a small amount of work with
the completion (updating a few shared memory variables) before the other
waiters can be released. It is hard to implement this efficiently and
race-free with io_uring right now without adding locking around *waiting* on
the completion side (instead of just around consumption of completions). One
cannot just wait on the io_uring, because a) there is the obvious race that
another process could reap all completions between check and wait, and b)
there is no good way to wake up other waiters once the userspace portion of
IO completion is through.
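
To make the group commit pattern concrete, here is a minimal sketch using
plain futex(2) calls, outside of io_uring. It is only an illustration, not
how postgres does it: struct flush_state, flush_gen, wait_for_flush() and
publish_flush() are hypothetical names, and the sketch assumes the futex word
lives in memory shared by all waiters. The point is that FUTEX_WAIT rechecks
the word in the kernel before sleeping, which closes the check-then-wait
race, and that the task finishing the userspace part of the completion can
release everyone else with one FUTEX_WAKE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* A futex word is always 32 bits wide. */
static_assert(sizeof(_Atomic uint32_t) == sizeof(uint32_t),
              "futex word must be 32 bits");

/* Hypothetical shared state, e.g. placed in shared memory. */
struct flush_state {
    _Atomic uint32_t flush_gen; /* bumped once per completed journal flush */
};

/* Thin wrapper; glibc does not export a futex() function. */
static long futex(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Waiter: sleep until flush_gen has moved on from the value in 'seen'. */
static void wait_for_flush(struct flush_state *st, uint32_t seen)
{
    while (atomic_load(&st->flush_gen) == seen) {
        /* The kernel only sleeps if the word still equals 'seen'. */
        long rc = futex((uint32_t *)&st->flush_gen, FUTEX_WAIT, seen);

        /* EAGAIN: word already changed; EINTR: signal. Both recheck. */
        if (rc == -1 && errno != EAGAIN && errno != EINTR)
            break;
    }
}

/*
 * Leader: call after reaping the flush completion and updating the shared
 * variables. Returns the number of waiters woken, or -1 on error.
 */
static long publish_flush(struct flush_state *st)
{
    atomic_fetch_add(&st->flush_gen, 1);
    return futex((uint32_t *)&st->flush_gen, FUTEX_WAKE, INT_MAX);
}
```

A waiter would read flush_gen before checking whether its flush is already
done and then pass that value to wait_for_flush(). What this cannot do today
is sleep on the futex and on io_uring completions in the same wait, which is
the gap futex requests would fill.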


All answers for postgres:

> 1) Do we need PI?

Not right now.

Not related to io_uring: I do wish there were a lower overhead (and lower
guarantees) version of PI futexes. Not for correctness reasons, but
performance. Granting the waiter's timeslice to the lock holder would improve
common contention scenarios with more runnable tasks than cores.


> 2) Do we need requeue? Anything else?

I can see requeue being useful, but I haven't thought it through fully.

Do the wake/wait ops as you have them right now support bitsets?
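
For reference, the bitset variants in question are FUTEX_WAIT_BITSET and
FUTEX_WAKE_BITSET from futex(2). Below is a minimal sketch of how they are
called directly; the helper names and the WAITER_READER / WAITER_WRITER masks
are made up for illustration and say nothing about what the RFC patches
implement. A waiter registers with a mask, and a wake only affects waiters
whose mask intersects the waker's mask.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical waiter classes sharing one futex word. */
#define WAITER_READER 0x1u
#define WAITER_WRITER 0x2u

/*
 * Sleep while *uaddr == expected, registering 'mask' with the kernel.
 * Returns 0 when woken, or -1 with errno set (EAGAIN if the word did not
 * match). A NULL timeout means wait without a deadline.
 */
static long futex_wait_bitset(uint32_t *uaddr, uint32_t expected,
                              uint32_t mask)
{
    long rc;

    assert(mask != 0); /* the kernel rejects an empty mask with EINVAL */
    do {
        rc = syscall(SYS_futex, uaddr, FUTEX_WAIT_BITSET, expected,
                     NULL, NULL, mask);
    } while (rc == -1 && errno == EINTR); /* retry after a signal */
    return rc;
}

/*
 * Wake at most 'nr' waiters whose registered mask overlaps 'mask'.
 * Returns the number of waiters woken, or -1 on error.
 */
static long futex_wake_bitset(uint32_t *uaddr, int nr, uint32_t mask)
{
    assert(mask != 0);
    return syscall(SYS_futex, uaddr, FUTEX_WAKE_BITSET, nr,
                   NULL, NULL, mask);
}
```

With this, a lock release could for example call
futex_wake_bitset(&word, 1, WAITER_WRITER) to wake a single writer without
disturbing readers queued on the same word.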


> 3) How hot waits are? May be done fully async avoiding io-wq, but
> apparently requires more changes in futex code.

The waits can be quite hot, most prominently on low-latency storage, but not
only there.

Greetings,

Andres Freund
