io-uring.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Pavel Begunkov <asml.silence@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, Jann Horn <jannh@google.com>
Cc: io-uring <io-uring@vger.kernel.org>
Subject: Re: [PATCH 3/7] io_uring: don't wait when under-submitting
Date: Wed, 18 Dec 2019 16:09:42 +0300	[thread overview]
Message-ID: <4fce5f02-6263-24b5-adbc-c496a13b0aa0@gmail.com> (raw)
In-Reply-To: <78c65b96-d5ad-c2f4-862b-5fb839895fc1@kernel.dk>

On 12/18/2019 4:02 PM, Jens Axboe wrote:
> On 12/18/19 2:38 AM, Pavel Begunkov wrote:
>> On 12/18/2019 3:06 AM, Jens Axboe wrote:
>>> On 12/17/19 4:55 PM, Jann Horn wrote:
>>>> On Tue, Dec 17, 2019 at 11:54 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>> There is no reliable way to submit and wait in a single syscall, as
>>>>> io_submit_sqes() may under-consume sqes (in case of an early error).
>>>>> Then it will wait for not-yet-submitted requests, deadlocking the user
>>>>> in most cases.
>>>>>
>>>>> In such cases adjust min_complete, so it won't wait for more than
>>>>> what have been submitted in the current io_uring_enter() call. It
>>>>> may be less than total in-flight, but that up to a user to handle.
>>>>>
>>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>> [...]
>>>>>         if (flags & IORING_ENTER_GETEVENTS) {
>>>>>                 unsigned nr_events = 0;
>>>>>
>>>>>                 min_complete = min(min_complete, ctx->cq_entries);
>>>>> +               if (submitted != to_submit)
>>>>> +                       min_complete = min(min_complete, (u32)submitted);
>>>>
>>>> Hm. Let's say someone submits two requests, first an ACCEPT request
>>>> that might stall indefinitely and then a WRITE to a file on disk that
>>>> is expected to complete quickly; and the caller uses min_complete=1
>>>> because they want to wait for the WRITE op. But now the submission of
>>>> the WRITE fails, io_uring_enter() computes min_complete=min(1, 1)=1,
>>>> and it blocks on the ACCEPT op. That would be bad, right?
>>>>
>>>> If the usecase I described is valid, I think it might make more sense
>>>> to do something like this:
>>>>
>>>> u32 missing_submissions = to_submit - submitted;
>>>> min_complete = min(min_complete, ctx->cq_entries);
>>>> if ((flags & IORING_ENTER_GETEVENTS) && missing_submissions < min_complete) {
>>>>   min_complete -= missing_submissions;
>>>>   [...]
>>>> }
>>>>
>>>> In other words: If we do a partially successful submission, only wait
>>>> as long as we know that userspace definitely wants us to wait for one
>>>> of the pending requests; and once we can't tell whether userspace
>>>> intended to wait longer, return to userspace and let the user decide.
>>>>
>>>> Or it might make sense to just ignore IORING_ENTER_GETEVENTS
>>>> completely in the partial submission case, in case userspace wants to
>>>> immediately react to the failed request by writing out an error
>>>> message to a socket or whatever. This case probably isn't
>>>> performance-critical, right? And it would simplify things a bit.
>>>
>>> That's a good point, and Pavel's first patch actually did that. I
>>> didn't consider the different request type case, which might be
>>> uncommon but definitely valid.
>>>
>>> Probably the safest bet here is just to not wait at all if we fail
>>> submitting all of them. This isn't a fast path, there was an error
>>> somehow which meant we didn't submit it all. So just return the
>>> submit count (including 0, not -EAGAIN) if we fail submitting,
>>> and ignore IORING_ENTER_GETEVENTS for that case.
>>>
>> I see nothing wrong with -EAGAIN, it's returned only if it can't
>> allocate memory for the first request. If so, can you then just take the
>> v1? It will probably be applied cleanly.
> 
> -EAGAIN for request alloc is fine, but your v1 also returned -EAGAIN if
> someone asked to submit 0, which is a bug. We must return zero for that
> case.
> 
Now I see which -EAGAIN you meant. You're right, I'll resend

> So your v1 without that would work, something ala:
> 
> if (submitted != to_submit)
> 	goto out;
> 
> without the turning 0 into -EAGAIN unconditionally, io_submit_sqes() does
> the right thing there as it is.
> 

-- 
Pavel Begunkov

  reply	other threads:[~2019-12-18 13:09 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-17 22:54 [PATCHSET] io_uring fixes for 5.5 Jens Axboe
2019-12-17 22:54 ` [PATCH 1/7] io_uring: fix stale comment and a few typos Jens Axboe
2019-12-17 22:54 ` [PATCH 2/7] io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG Jens Axboe
2019-12-17 22:54 ` [PATCH 3/7] io_uring: don't wait when under-submitting Jens Axboe
2019-12-17 23:55   ` Jann Horn
2019-12-18  0:06     ` Jens Axboe
2019-12-18  9:38       ` Pavel Begunkov
2019-12-18 13:02         ` Jens Axboe
2019-12-18 13:09           ` Pavel Begunkov [this message]
2019-12-17 22:54 ` [PATCH 4/7] io_uring: fix pre-prepped issue with force_nonblock == true Jens Axboe
2019-12-17 22:54 ` [PATCH 5/7] io_uring: remove 'sqe' parameter to the OP helpers that take it Jens Axboe
2019-12-17 22:54 ` [PATCH 6/7] io_uring: any deferred command must have stable sqe data Jens Axboe
2019-12-17 22:54 ` [PATCH 7/7] io_uring: make HARDLINK imply LINK Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4fce5f02-6263-24b5-adbc-c496a13b0aa0@gmail.com \
    --to=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=jannh@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).