IO-Uring Archive on
 help / color / Atom feed
From: Jens Axboe <>
To: Jann Horn <>, Pavel Begunkov <>
Cc: io-uring <>
Subject: Re: [PATCH 3/7] io_uring: don't wait when under-submitting
Date: Tue, 17 Dec 2019 17:06:13 -0700
Message-ID: <> (raw)
In-Reply-To: <>

On 12/17/19 4:55 PM, Jann Horn wrote:
> On Tue, Dec 17, 2019 at 11:54 PM Jens Axboe <> wrote:
>> There is no reliable way to submit and wait in a single syscall, as
>> io_submit_sqes() may under-consume sqes (in case of an early error).
>> Then it will wait for not-yet-submitted requests, deadlocking the user
>> in most cases.
>> In such cases adjust min_complete, so it won't wait for more than
>> what have been submitted in the current io_uring_enter() call. It
>> may be less than total in-flight, but that up to a user to handle.
>> Signed-off-by: Pavel Begunkov <>
>> Signed-off-by: Jens Axboe <>
> [...]
>>         if (flags & IORING_ENTER_GETEVENTS) {
>>                 unsigned nr_events = 0;
>>                 min_complete = min(min_complete, ctx->cq_entries);
>> +               if (submitted != to_submit)
>> +                       min_complete = min(min_complete, (u32)submitted);
> Hm. Let's say someone submits two requests, first an ACCEPT request
> that might stall indefinitely and then a WRITE to a file on disk that
> is expected to complete quickly; and the caller uses min_complete=1
> because they want to wait for the WRITE op. But now the submission of
> the WRITE fails, io_uring_enter() computes min_complete=min(1, 1)=1,
> and it blocks on the ACCEPT op. That would be bad, right?
> If the usecase I described is valid, I think it might make more sense
> to do something like this:
> u32 missing_submissions = to_submit - submitted;
> min_complete = min(min_complete, ctx->cq_entries);
> if ((flags & IORING_ENTER_GETEVENTS) && missing_submissions < min_complete) {
>   min_complete -= missing_submissions;
>   [...]
> }
> In other words: If we do a partially successful submission, only wait
> as long as we know that userspace definitely wants us to wait for one
> of the pending requests; and once we can't tell whether userspace
> intended to wait longer, return to userspace and let the user decide.
> Or it might make sense to just ignore IORING_ENTER_GETEVENTS
> completely in the partial submission case, in case userspace wants to
> immediately react to the failed request by writing out an error
> message to a socket or whatever. This case probably isn't
> performance-critical, right? And it would simplify things a bit.

That's a good point, and Pavel's first patch actually did that. I
didn't consider the different request type case, which might be
uncommon but definitely valid.

Probably the safest bet here is just to not wait at all if we fail
submitting all of them. This isn't a fast path, there was an error
somehow which meant we didn't submit it all. So just return the
submit count (including 0, not -EAGAIN) if we fail submitting,
and ignore IORING_ENTER_GETEVENTS for that case.

Pavel, care to submit a new one? I'll drop this one now.

Jens Axboe

  reply index

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-17 22:54 [PATCHSET] io_uring fixes for 5.5 Jens Axboe
2019-12-17 22:54 ` [PATCH 1/7] io_uring: fix stale comment and a few typos Jens Axboe
2019-12-17 22:54 ` [PATCH 2/7] io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG Jens Axboe
2019-12-17 22:54 ` [PATCH 3/7] io_uring: don't wait when under-submitting Jens Axboe
2019-12-17 23:55   ` Jann Horn
2019-12-18  0:06     ` Jens Axboe [this message]
2019-12-18  9:38       ` Pavel Begunkov
2019-12-18 13:02         ` Jens Axboe
2019-12-18 13:09           ` Pavel Begunkov
2019-12-17 22:54 ` [PATCH 4/7] io_uring: fix pre-prepped issue with force_nonblock == true Jens Axboe
2019-12-17 22:54 ` [PATCH 5/7] io_uring: remove 'sqe' parameter to the OP helpers that take it Jens Axboe
2019-12-17 22:54 ` [PATCH 6/7] io_uring: any deferred command must have stable sqe data Jens Axboe
2019-12-17 22:54 ` [PATCH 7/7] io_uring: make HARDLINK imply LINK Jens Axboe

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

IO-Uring Archive on

Archives are clonable:
	git clone --mirror io-uring/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 io-uring io-uring/ \
	public-inbox-index io-uring

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone