archive mirror
 help / color / mirror / Atom feed
From: Pavel Begunkov <>
To: "Jens Axboe" <>, "Carter Li 李通洲" <>
Cc: io-uring <>
Subject: Re: [RFC] single cqe per link
Date: Wed, 26 Feb 2020 00:13:01 +0300	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 25/02/2020 23:20, Jens Axboe wrote:
> On 2/25/20 3:12 AM, Pavel Begunkov wrote:
>> Flexible, but not performant. The existence of drain is already makes
>> io_uring to do a lot of extra stuff, and even worse when it's actually used.
> Yeah I agree, that's assuming we can make the drain more efficient. Just
> hand waving on possible use cases :-)

I don't even know what to do with sequences and drains when we get to in-kernel
sqe generation. And the current linear numbering won't be the case at all.

E.g. req1 -> DRAIN, and req1 infinitely generates req2, req3, etc. Should they
go before DRAIN? or at any time? What would be performance burden for it?..

I'd rather forbid them for using with some new features. And that's the reason
behind the question about wideness of its use.

>> That's a different thing. Knowing how requests behave (e.g. if
>> nbytes!=res, then fail link), one would want to get cqe for the last
>> executed sqe, whether it's an error or a success for the last one.
>> It makes a link to be handled as a single entity. I don't see a way to
>> emulate similar behaviour with the unconditional masking. Probably, we
>> will need them both.
> But you can easily do that with IOSQE_NO_CQE, in fact that's what I did
> to test this. The chain will have IOSQE_NO_CQE | IOSQE_IO_LINK set on
> all but the last request.

It's fine if you don't expect it to fail. Otherwise, there will be only
-ECANCELELED for the last one, so you don't know error code nor failed
req/user_data. Forcing IOSQE_NO_CQE to emit in case of an error is not really

I know, it's hard to judge base on performance-testing-only patch, but the whole
idea is to greatly simplify userspace cqe handling, including errors. And I'd
like to find something better/faster and doing the same favor.

> My box with the optane2 is out of commission, apparently, cannot get it
> going today. So I had to make do with my laptop, which does about ~600K
> random read IOPS. I don't see any difference there, using polled IO,
> using 4 link deep chains (so 1/4th the CQEs). Both run at around
> 611-613K IOPS.

Pavel Begunkov

  reply	other threads:[~2020-02-25 21:13 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-25  0:39 [RFC] single cqe per link Pavel Begunkov
2020-02-25  2:14 ` Carter Li 李通洲
2020-02-25  2:36   ` Jens Axboe
2020-02-25  3:13     ` Jens Axboe
2020-02-25 10:12       ` Pavel Begunkov
2020-02-25 20:20         ` Jens Axboe
2020-02-25 21:13           ` Pavel Begunkov [this message]
2020-08-21  5:17             ` Questions about IORING_OP_ASYNC_CANCEL usage Carter Li 李通洲
2020-08-21  5:20               ` Carter Li 李通洲
2020-02-25  2:24 ` [RFC] single cqe per link Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).