Re: [PATCH RFC] io_uring: limit inflight IO

From: Pavel Begunkov <asml.silence@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, io-uring@vger.kernel.org
Subject: Re: [PATCH RFC] io_uring: limit inflight IO
Date: Sat, 9 Nov 2019 13:33:00 +0300	[thread overview]
Message-ID: <e3c1f9a8-bd6a-ea08-1450-0d7a55a843a6@gmail.com> (raw)
In-Reply-To: <cc6b368b-10e3-504c-4895-feefe6311004@kernel.dk>

[-- Attachment #1.1: Type: text/plain, Size: 3718 bytes --]

On 08/11/2019 17:05, Jens Axboe wrote:
> On 11/8/19 2:56 AM, Pavel Begunkov wrote:
>> On 08/11/2019 03:19, Jens Axboe wrote:
>>> On 11/7/19 4:21 PM, Jens Axboe wrote:
>>>> I'd like some feedback on this one. Even tith the overflow backpressure
>>>> patch, we still have a potentially large gap where applications can
>>>> submit IO before we get any dropped events in the CQ ring. This is
>>>> especially true if the execution time of those requests are long
>>>> (unbounded).
>>>>
>>>> This adds IORING_SETUP_INFLIGHT, which if set, will return -EBUSY if we
>>>> have more IO pending than we can feasibly support. This is normally the
>>>> CQ ring size, but of IORING_SETUP_CQ_NODROP is enabled, then it's twice
>>>> the CQ ring size.
>>>>
>>>> This helps manage the pending queue size instead of letting it grow
>>>> indefinitely.
>>>>
>>>> Note that we could potentially just make this the default behavior -
>>>> applications need to handle -EAGAIN returns already, in case we run out
>>>> of memory, and if we change this to return -EAGAIN as well, then it
>>>> doesn't introduce any new failure cases. I'm tempted to do that...
>>>>
>>>> Anyway, comments solicited!
>> What's wrong with giving away overflow handling to the userspace? It
>> knows its inflight count, and it shouldn't be a problem to allocate
>> large enough rings. The same goes for the backpressure patch. Do you
>> have a particular requirement/user for this?
> 
> There are basically two things at play here:
> 
> - The backpressure patch helps with users that can't easily size the
>   ring. This could be library use cases, or just normal use cases that
>   don't necessarily know how big the ring needs to be. Hence they need
>   to size for the worst case, which is wasteful.
> 
> - For this case, it's different. I just want to avoid having the
>   application having potential tons of requests in flight. Before the
>   backpressure patch, if you had more than the CQ ring size inflight,
>   you'd get dropped events. Once you get dropped events, it's game over,
>   the application has totally lost track. Hence I think it's safe to
>   unconditionally return -EBUSY if we exceed that limit.
> 
> The first one helps use cases that couldn't really io_uring before, the
> latter helps protect against use cases that would use a lot of memory.
> If the request for the latter use case takes a long time to run, the
> problem is even worse.

I see, thanks. I like the point with having an upper-bound for inflight
memory. Seems, the patch doesn't keep it strict and we can get more
than specified. Is that so?

As an RFC, we could think about using static request pool, e.g. 
with sbitmap as in blk-mq. Would also be able to get rid of some
refcounting. However, struggle to estimate performance difference.

> 
>> Awhile something could be done {efficiently,securely,etc} in the
>> userspace, I would prefer to keep the kernel part simpler.
> 
> For this particular patch, the majority of issues will just read the sq
> head and mask, which we will just now anyway. The extra cost is
> basically a branch and cmp instruction. There's no way you can do that
> in userspace that efficiently, and it helps protect the kernel.
> 
I'm more concern about complexity, and bugs as consequences. The second
concern is edge cases and an absence of formalised guarantees for that.

For example, it could fetch and submit only half of a link because of
failed io_get_req(). Shouldn't it be kind of atomic? From the
userspace perspective I'd prefer so.

> I'm not a fan of adding stuff we don't need either, but I do believe we
> need this one.
> 

-- 
Pavel Begunkov

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]