From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>, Jann Horn <jannh@google.com>
Cc: io-uring <io-uring@vger.kernel.org>
Subject: Re: [PATCH 3/7] io_uring: don't wait when under-submitting
Date: Wed, 18 Dec 2019 06:02:27 -0700
Message-ID: <78c65b96-d5ad-c2f4-862b-5fb839895fc1@kernel.dk>
In-Reply-To: <711479c4-9aee-667c-590d-480fbee64c96@gmail.com>

On 12/18/19 2:38 AM, Pavel Begunkov wrote:
> On 12/18/2019 3:06 AM, Jens Axboe wrote:
>> On 12/17/19 4:55 PM, Jann Horn wrote:
>>> On Tue, Dec 17, 2019 at 11:54 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> There is no reliable way to submit and wait in a single syscall, as
>>>> io_submit_sqes() may under-consume sqes (in case of an early error).
>>>> Then it will wait for not-yet-submitted requests, deadlocking the user
>>>> in most cases.
>>>>
>>>> In such cases, adjust min_complete so it won't wait for more than
>>>> what has been submitted in the current io_uring_enter() call. That
>>>> may be less than the total in flight, but that's up to the user to
>>>> handle.
>>>>
>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>> [...]
>>>>         if (flags & IORING_ENTER_GETEVENTS) {
>>>>                 unsigned nr_events = 0;
>>>>
>>>>                 min_complete = min(min_complete, ctx->cq_entries);
>>>> +               if (submitted != to_submit)
>>>> +                       min_complete = min(min_complete, (u32)submitted);
>>>
>>> Hm. Let's say someone submits two requests, first an ACCEPT request
>>> that might stall indefinitely and then a WRITE to a file on disk that
>>> is expected to complete quickly; and the caller uses min_complete=1
>>> because they want to wait for the WRITE op. But now the submission of
>>> the WRITE fails, io_uring_enter() computes min_complete=min(1, 1)=1,
>>> and it blocks on the ACCEPT op. That would be bad, right?
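
(For concreteness, a minimal userspace sketch of that pattern, assuming
liburing; the fds, buffer, and function name are placeholders and error
handling is omitted:)

#include <liburing.h>
#include <sys/uio.h>

static void accept_then_write(int listen_fd, int file_fd, struct iovec *iov)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;

	io_uring_queue_init(8, &ring, 0);

	/* Request 1: an accept that may never complete. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);

	/* Request 2: a file write expected to complete quickly. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_writev(sqe, file_fd, iov, 1, 0);

	/*
	 * One syscall: io_uring_enter(to_submit=2, min_complete=1,
	 * IORING_ENTER_GETEVENTS).  If the write fails to submit,
	 * clamping min_complete to min(1, 1) == 1 still leaves this
	 * call blocked on the accept.
	 */
	io_uring_submit_and_wait(&ring, 1);

	io_uring_queue_exit(&ring, 0);
}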
>>>
>>> If the usecase I described is valid, I think it might make more sense
>>> to do something like this:
>>>
>>> u32 missing_submissions = to_submit - submitted;
>>> min_complete = min(min_complete, ctx->cq_entries);
>>> if ((flags & IORING_ENTER_GETEVENTS) && missing_submissions < min_complete) {
>>>   min_complete -= missing_submissions;
>>>   [...]
>>> }
>>>
>>> In other words: If we do a partially successful submission, only wait
>>> as long as we know that userspace definitely wants us to wait for one
>>> of the pending requests; and once we can't tell whether userspace
>>> intended to wait longer, return to userspace and let the user decide.
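
(Working the ACCEPT/WRITE example above through that snippet: to_submit=2,
submitted=1, min_complete=1 gives missing_submissions = 2 - 1 = 1; since
1 < 1 is false, the kernel skips the wait and returns immediately instead
of blocking on the ACCEPT. With, say, min_complete=3 and one failed
submission out of three, it would still wait, but only for the two
requests that actually went in.)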
>>>
>>> Or it might make sense to just ignore IORING_ENTER_GETEVENTS
>>> completely in the partial submission case, in case userspace wants to
>>> immediately react to the failed request by writing out an error
>>> message to a socket or whatever. This case probably isn't
>>> performance-critical, right? And it would simplify things a bit.
>>
>> That's a good point, and Pavel's first patch actually did that. I
>> didn't consider the different request type case, which might be
>> uncommon but definitely valid.
>>
>> Probably the safest bet here is just to not wait at all if we fail to
>> submit all of them. This isn't a fast path; there was an error somewhere
>> which meant we didn't submit everything. So just return the submit
>> count (including 0, not -EAGAIN) if we fail submitting, and ignore
>> IORING_ENTER_GETEVENTS for that case.
>>
> I see nothing wrong with -EAGAIN; it's returned only if it can't
> allocate memory for the first request. If so, can you then just take
> v1? It should still apply cleanly.

-EAGAIN for request alloc is fine, but your v1 also returned -EAGAIN if
someone asked to submit 0, which is a bug. We must return zero for that
case.

So your v1 without that would work, something à la:

if (submitted != to_submit)
	goto out;

without unconditionally turning 0 into -EAGAIN; io_submit_sqes() does
the right thing there as it is.
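
(Sketched in context, the tail of io_uring_enter() would then read roughly
as below; this is only a simplified sketch of the idea, with arguments
elided, not the actual patch:)

	submitted = io_submit_sqes(ctx, to_submit, ...);
	if (submitted != to_submit)
		goto out;		/* partial submit: skip the wait */

	if (flags & IORING_ENTER_GETEVENTS) {
		unsigned nr_events = 0;

		min_complete = min(min_complete, ctx->cq_entries);
		/* ... poll or wait for up to min_complete completions ... */
	}
out:
	/*
	 * Return the submit count, including 0; only a failure to allocate
	 * the very first request yields -EAGAIN from io_submit_sqes().
	 */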

-- 
Jens Axboe


Thread overview: 13+ messages
2019-12-17 22:54 [PATCHSET] io_uring fixes for 5.5 Jens Axboe
2019-12-17 22:54 ` [PATCH 1/7] io_uring: fix stale comment and a few typos Jens Axboe
2019-12-17 22:54 ` [PATCH 2/7] io_uring: fix sporadic -EFAULT from IORING_OP_RECVMSG Jens Axboe
2019-12-17 22:54 ` [PATCH 3/7] io_uring: don't wait when under-submitting Jens Axboe
2019-12-17 23:55   ` Jann Horn
2019-12-18  0:06     ` Jens Axboe
2019-12-18  9:38       ` Pavel Begunkov
2019-12-18 13:02         ` Jens Axboe [this message]
2019-12-18 13:09           ` Pavel Begunkov
2019-12-17 22:54 ` [PATCH 4/7] io_uring: fix pre-prepped issue with force_nonblock == true Jens Axboe
2019-12-17 22:54 ` [PATCH 5/7] io_uring: remove 'sqe' parameter to the OP helpers that take it Jens Axboe
2019-12-17 22:54 ` [PATCH 6/7] io_uring: any deferred command must have stable sqe data Jens Axboe
2019-12-17 22:54 ` [PATCH 7/7] io_uring: make HARDLINK imply LINK Jens Axboe
