linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: yangerkun <yangerkun@huawei.com>
To: Jens Axboe <axboe@kernel.dk>, Bob Liu <bob.liu@oracle.com>,
	<linux-block@vger.kernel.org>
Cc: <houtao1@huawei.com>, <yi.zhang@huawei.com>
Subject: Re: [RFC] io_uring: stop only support read/write for ctx with IORING_SETUP_IOPOLL
Date: Thu, 7 Nov 2019 09:27:51 +0800	[thread overview]
Message-ID: <0f3419a4-e49f-4056-48cf-0e50431b8062@huawei.com> (raw)
In-Reply-To: <8194c20d-4523-8885-2b80-b9849d37e890@kernel.dk>



On 2019/11/6 22:14, Jens Axboe wrote:
> On 11/6/19 5:00 AM, yangerkun wrote:
>>
>>
>> On 2019/11/4 22:01, Jens Axboe wrote:
>>> On 11/4/19 4:46 AM, yangerkun wrote:
>>>>
>>>>
>>>> On 2019/11/4 18:09, Bob Liu wrote:
>>>>> On 11/4/19 4:56 PM, yangerkun wrote:
>>>>>> There is no problem to support other type request for the ctx with
>>>>>> IORING_SETUP_IOPOLL.
>>>>>
>>>>> Could you describe the benefit of doing this?
>>>>
>>>> Hi,
>>>>
>>>> I am trying to replace libaio with io_uring in InnoDB/MariaDB(which
>>>> build on xfs/nvme). And in order to simulate the timeout mechanism
>>>> like io_getevents, firstly, to use the poll function of io_uring's fd
>>>> has been selected and it really did work. But while trying to enable
>>>> IORING_SETUP_IOPOLL since xfs has iopoll function interface, the
>>>> mechanism will fail since io_uring_poll does check the cq.head between
>>>> cached_cq_tail, which will not update until we call io_uring_enter and
>>>> do the poll. So, instead, I decide to use timeout requests in
>>>> io_uring but will return -EINVAL since we enable IORING_SETUP_IOPOLL
>>>> at the same time. I think this combination is a normal scene so as
>>>> the other combination descibed in this patch. I am not sure does it a
>>>> good solution for this problem, and maybe there exists some better way.
>>>
>>> I think we can support timeouts pretty easily with SETUP_IOPOLL, but we
>>> can't mix and match everything. Pretty sure I've written at length about
>>> that before, but the tldr is that for purely polled commands, we won't
>>> have an IRQ event. That's the case for nvme if it's misconfigured, but
>>> for an optimal setup where nvme is loaded with poll queues, there will
>>> never be an interrupt for the command. This means that we can never wait
>>> in io_cqring_wait(), we must always call the iopoll poller, because if
>>> we wait we might very well be waiting for events that will never happen
>>> unless we actively poll for them.
>>>
>>> This could be supported if we accounted requests, but I don't want to
>>> add that kind of overhead. Same with the lock+irqdisable you had to add
>>> for this, it's not acceptable overhead.
>>>
>>> Sounds like you just need timeout support for polling? If so, then that
>>
>> Hi,
>>
>> Yeah, the thing I need add is the timeout support for polling.
>>
>>> is supportable as we know that these events will trigger an async event
>>> when they happen. Either that, or it triggers when we poll for
>>> completions. So it's safe to support, and we can definitely do that.
>>
>> I am a little confuse. What you describe is once enable SETUP_IOPOLL
>> and there is a async call of io_timeout_fn, we should not call
>> io_cqring_fill_event directly, but leave io_iopoll_check to do this?
>> Or, the parallel running for io_iopoll_check may trigger some problem
>> since they can going to io_cqring_fill_event.
> 
> Maybe that wasn't quite clear, what I'm trying to say is that there are
> two outcomes with IORING_OP_TIMEOUT:
> 
> 1) The number of events specified in the timeout is met. This happens
>     through the normal poll command checks when we complete commands.
> 2) The timer fires. When this happens, we just increment ->cq_timeouts.
>     You'd have to make a note of that in the poll loop just like we do in
>     cqring_wait() to know to abort if that value is different from when
>     we started the loop.
> 
> All that's needed to support timeouts with IORING_SETUP_IOPOLL is to
> have that ->cq_timeouts check in place. With that, the restriction could
> be lifted.
> 


Thank you for your detailed explanation! I will try the solution you
suggest instead of this patch!


      reply	other threads:[~2019-11-07  1:28 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-04  8:56 [RFC] io_uring: stop only support read/write for ctx with IORING_SETUP_IOPOLL yangerkun
2019-11-04 10:09 ` Bob Liu
2019-11-04 11:46   ` yangerkun
2019-11-04 14:01     ` Jens Axboe
2019-11-06 12:00       ` yangerkun
2019-11-06 14:14         ` Jens Axboe
2019-11-07  1:27           ` yangerkun [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0f3419a4-e49f-4056-48cf-0e50431b8062@huawei.com \
    --to=yangerkun@huawei.com \
    --cc=axboe@kernel.dk \
    --cc=bob.liu@oracle.com \
    --cc=houtao1@huawei.com \
    --cc=linux-block@vger.kernel.org \
    --cc=yi.zhang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).