From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>, io-uring@vger.kernel.org
Subject: Re: [PATCH RFC] io_uring: limit inflight IO
Date: Sat, 9 Nov 2019 07:12:27 -0700
Message-ID: <0b482581-50a6-e309-e6ff-01cd81ff7931@kernel.dk>
In-Reply-To: <e3c1f9a8-bd6a-ea08-1450-0d7a55a843a6@gmail.com>

On 11/9/19 3:33 AM, Pavel Begunkov wrote:
> On 08/11/2019 17:05, Jens Axboe wrote:
>> On 11/8/19 2:56 AM, Pavel Begunkov wrote:
>>> On 08/11/2019 03:19, Jens Axboe wrote:
>>>> On 11/7/19 4:21 PM, Jens Axboe wrote:
>>>>> I'd like some feedback on this one. Even with the overflow backpressure
>>>>> patch, we still have a potentially large gap where applications can
>>>>> submit IO before we get any dropped events in the CQ ring. This is
>>>>> especially true if the execution time of those requests is long
>>>>> (unbounded).
>>>>>
>>>>> This adds IORING_SETUP_INFLIGHT, which, if set, will return -EBUSY if we
>>>>> have more IO pending than we can feasibly support. This is normally the
>>>>> CQ ring size, but if IORING_SETUP_CQ_NODROP is enabled, then it's twice
>>>>> the CQ ring size.
>>>>>
>>>>> This helps manage the pending queue size instead of letting it grow
>>>>> indefinitely.
>>>>>
>>>>> Note that we could potentially just make this the default behavior -
>>>>> applications need to handle -EAGAIN returns already, in case we run out
>>>>> of memory, and if we change this to return -EAGAIN as well, then it
>>>>> doesn't introduce any new failure cases. I'm tempted to do that...
>>>>>
>>>>> Anyway, comments solicited!
>>> What's wrong with giving away overflow handling to the userspace? It
>>> knows its inflight count, and it shouldn't be a problem to allocate
>>> large enough rings. The same goes for the backpressure patch. Do you
>>> have a particular requirement/user for this?
>>
>> There are basically two things at play here:
>>
>> - The backpressure patch helps with users that can't easily size the
>>    ring. This could be library use cases, or just normal use cases that
>>    don't necessarily know how big the ring needs to be. Hence they need
>>    to size for the worst case, which is wasteful.
>>
>> - For this case, it's different. I just want to avoid the application
>>    having potentially tons of requests in flight. Before the
>>    backpressure patch, if you had more than the CQ ring size inflight,
>>    you'd get dropped events. Once you get dropped events, it's game over,
>>    the application has totally lost track. Hence I think it's safe to
>>    unconditionally return -EBUSY if we exceed that limit.
>>
>> The first one helps use cases that couldn't really use io_uring before, the
>> latter helps protect against use cases that would use a lot of memory.
>> If the request for the latter use case takes a long time to run, the
>> problem is even worse.
> 
> I see, thanks. I like the point about having an upper bound for inflight
> memory. It seems the patch doesn't keep it strict, though, and we can get
> more than specified. Is that so?

Right, the important part is providing some upper bound, not an exact
one.
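
Roughly, the check amounts to something like the below. This is just a
sketch to illustrate the bound, with made-up names (->inflight,
io_check_inflight()), not the actual patch:

/*
 * Sketch only -- field and helper names are made up, this is not the
 * actual patch. Refuse new submissions with -EBUSY once inflight IO
 * exceeds a soft bound derived from the CQ ring size.
 */
static int io_check_inflight(struct io_ring_ctx *ctx)
{
	unsigned int limit = ctx->cq_entries;

	/* CQ_NODROP buffers overflowed CQEs, so allow twice the ring */
	if (ctx->flags & IORING_SETUP_CQ_NODROP)
		limit *= 2;

	/* Soft bound -- we may briefly exceed it, e.g. to finish a link */
	if (atomic_read(&ctx->inflight) >= limit)
		return -EBUSY;
	return 0;
}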

> As an RFC, we could think about using a static request pool, e.g.
> with sbitmap as in blk-mq. We would also be able to get rid of some
> refcounting. However, I struggle to estimate the performance difference.

That would be fine if we needed exact limits, but for this case it would
just be a waste of CPU with no real win. So I would not advocate that, and
it's not like we need a tagged identifier for it either.
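
To be clear on what I'm arguing against -- a blk-mq style static pool
would look roughly like this (a sketch with made-up names, not code
from either patch):

/* Preallocated request pool indexed by an sbitmap tag. */
struct io_req_pool {
	struct sbitmap_queue tags;	/* tracks free slots */
	struct io_kiocb *reqs;		/* depth preallocated requests */
};

static struct io_kiocb *io_req_pool_get(struct io_req_pool *pool)
{
	int tag = __sbitmap_queue_get(&pool->tags);

	if (tag < 0)
		return NULL;	/* pool exhausted: a hard inflight limit */
	return &pool->reqs[tag];
}

static void io_req_pool_put(struct io_req_pool *pool, struct io_kiocb *req)
{
	sbitmap_queue_clear(&pool->tags, req - pool->reqs,
			    raw_smp_processor_id());
}

That gives you an exact bound and a per-request tag, but we need
neither here, and the sbitmap ops cost more than the single compare
the patch adds.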

>>> While something could be done {efficiently, securely, etc.} in
>>> userspace, I would prefer to keep the kernel part simpler.
>>
>> For this particular patch, the majority of issues will just read the sq
>> head and mask, which we read now anyway. The extra cost is
>> basically a branch and a cmp instruction. There's no way you can do that
>> in userspace as efficiently, and it helps protect the kernel.
>>
> I'm more concerned about complexity, and about bugs as a consequence. The
> second concern is edge cases and the absence of formalised guarantees for
> them.
> 
> For example, it could fetch and submit only half of a link because of a
> failed io_get_req(). Shouldn't that be kind of atomic? From the
> userspace perspective I'd prefer so.

If we start a sequence, we should let it finish, agreed. I'll post an
update.
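
FWIW, on the application side I'd expect handling this to be the usual
reap-then-retry loop. A liburing-flavored sketch (assuming the stock
liburing API; the -EBUSY return is the one proposed in this RFC):

#include <liburing.h>
#include <errno.h>

/*
 * Submit, and if the kernel reports too much inflight IO, reap at
 * least one completion to make room before retrying.
 */
static int submit_with_backoff(struct io_uring *ring)
{
	int ret;

	while ((ret = io_uring_submit(ring)) == -EBUSY || ret == -EAGAIN) {
		struct io_uring_cqe *cqe;

		ret = io_uring_wait_cqe(ring, &cqe);
		if (ret < 0)
			return ret;
		/* process cqe->res here, then mark it seen */
		io_uring_cqe_seen(ring, cqe);
	}
	return ret;
}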


-- 
Jens Axboe


Thread overview: 11+ messages
2019-11-07 23:21 [PATCH RFC] io_uring: limit inflight IO Jens Axboe
2019-11-08  0:19 ` Jens Axboe
2019-11-08  9:56   ` Pavel Begunkov
2019-11-08 14:05     ` Jens Axboe
2019-11-08 17:45       ` Jens Axboe
2019-11-09 11:16         ` Pavel Begunkov
2019-11-09 14:23           ` Jens Axboe
2019-11-09 15:15             ` Jens Axboe
2019-11-09 19:24             ` Pavel Begunkov
2019-11-09 10:33       ` Pavel Begunkov
2019-11-09 14:12         ` Jens Axboe [this message]
