* [FEATURE REQUEST] Specify a sqe won't generate a cqe
From: Carter Li 李通洲 @ 2020-02-14  8:29 UTC
  To: io-uring

To implement io_uring_wait_cqe_timeout, we introduce a magic number
called `LIBURING_UDATA_TIMEOUT`. The problem is that we must not only
make sure users never set sqe->user_data to LIBURING_UDATA_TIMEOUT,
but also add extra complexity to filter out TIMEOUT cqes.

Former discussion: https://github.com/axboe/liburing/issues/53
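
For reference, a simplified sketch of the kind of filtering the magic
value forces on anything that reaps cqes directly (not the verbatim
liburing source; wait_user_cqe is just an illustrative name):

#include <liburing.h>

static int wait_user_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr)
{
    struct io_uring_cqe *cqe;
    int ret;

    for (;;) {
        ret = io_uring_wait_cqe(ring, &cqe);
        if (ret < 0)
            return ret;
        if (cqe->user_data == LIBURING_UDATA_TIMEOUT) {
            /* internal timeout completion: swallow it */
            io_uring_cqe_seen(ring, cqe);
            continue;
        }
        *cqe_ptr = cqe;  /* a real, application-owned completion */
        return 0;
    }
}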

I’m suggesting introducing a new SQE flag called IOSQE_IGNORE_CQE
to solve this problem.

An sqe tagged with the IOSQE_IGNORE_CQE flag won't generate a cqe on
completion, so IORING_OP_TIMEOUT completions can be filtered out on
the kernel side.
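
A sketch of how this could look from userspace. IOSQE_IGNORE_CQE does
not exist; the bit value below is made up for illustration, and error
handling is omitted:

#include <liburing.h>

/* hypothetical: the proposed flag, bit value chosen arbitrarily */
#define IOSQE_IGNORE_CQE  (1U << 5)

static void queue_silent_timeout(struct io_uring *ring,
                                 struct __kernel_timespec *ts)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    io_uring_prep_timeout(sqe, ts, 0, 0);
    /* with the proposed flag this timeout completes without posting
     * a cqe, so no LIBURING_UDATA_TIMEOUT filtering is needed */
    sqe->flags |= IOSQE_IGNORE_CQE;
}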

In addition, `IOSQE_IGNORE_CQE` can be used to save cq space.

For example, in a `POLL_ADD(POLLIN)->READ/RECV` link chain, people
usually don't care what the result of `POLL_ADD` is (since it will
always be POLLIN), so `IOSQE_IGNORE_CQE` can be set on `POLL_ADD` to
save a lot of cq space.
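
A sketch of that chain, reusing the hypothetical IOSQE_IGNORE_CQE bit
from the previous example:

#include <poll.h>
#include <liburing.h>

#define IOSQE_IGNORE_CQE  (1U << 5)  /* hypothetical, as above */

static void queue_poll_then_read(struct io_uring *ring, int fd,
                                 struct iovec *iov)
{
    struct io_uring_sqe *sqe;

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_poll_add(sqe, fd, POLLIN);
    /* the result is known to be POLLIN on success: suppress its cqe */
    sqe->flags |= IOSQE_IO_LINK | IOSQE_IGNORE_CQE;

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_readv(sqe, fd, iov, 1, 0);
    io_uring_sqe_set_data(sqe, iov);  /* only this step posts a cqe */

    io_uring_submit(ring);
}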

Besides POLL_ADD, people usually don't care about the results of
POLL_REMOVE/TIMEOUT_REMOVE/ASYNC_CANCEL/CLOSE, so these operations
can also be tagged with IOSQE_IGNORE_CQE.

Thoughts?


* Re: [FEATURE REQUEST] Specify a sqe won't generate a cqe
From: Pavel Begunkov @ 2020-02-14 10:34 UTC
  To: Carter Li 李通洲, io-uring

On 2/14/2020 11:29 AM, Carter Li 李通洲 wrote:
> To implement io_uring_wait_cqe_timeout, we introduce a magic number
> called `LIBURING_UDATA_TIMEOUT`. The problem is that we must not only
> make sure users never set sqe->user_data to LIBURING_UDATA_TIMEOUT,
> but also add extra complexity to filter out TIMEOUT cqes.
> 
> Former discussion: https://github.com/axboe/liburing/issues/53
> 
> I’m suggesting introducing a new SQE flag called IOSQE_IGNORE_CQE
> to solve this problem.
> 
> An sqe tagged with the IOSQE_IGNORE_CQE flag won't generate a cqe on
> completion, so IORING_OP_TIMEOUT completions can be filtered out on
> the kernel side.
> 
> In addition, `IOSQE_IGNORE_CQE` can be used to save cq space.
> 
> For example, in a `POLL_ADD(POLLIN)->READ/RECV` link chain, people
> usually don't care what the result of `POLL_ADD` is (since it will
> always be POLLIN), so `IOSQE_IGNORE_CQE` can be set on `POLL_ADD` to
> save a lot of cq space.
> 
> Besides POLL_ADD, people usually don't care about the results of
> POLL_REMOVE/TIMEOUT_REMOVE/ASYNC_CANCEL/CLOSE, so these operations
> can also be tagged with IOSQE_IGNORE_CQE.
> 
> Thoughts?
> 

I like the idea! And that's one of my TODOs for the eBPF plans.
Let me list my use cases, so we can think how to extend it a bit.

1. When a link fails, we need to reap all the -ECANCELED cqes,
analyse them and resubmit the rest. It's quite inconvenient. We may
want to get CQEs only for requests that weren't cancelled (see the
sketch after this list).

2. When a chain succeeds, in most cases you already know the results
of all intermediate CQEs, but you still need to reap and match them.
I'd prefer to have only one CQE per link: either for the first failed
request or for the last request in the chain.

These two could shed a lot of processing overhead from userspace.

3. If we generate requests from eBPF, even the notion of a
per-request event may break.
- eBPF programs creating new requests would also need to specify
  user_data, and this may be problematic from the user's perspective.
- we may want to not generate CQEs automatically, but let eBPF emit
  them.
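
Point 1, concretely. Today a failed link leaves userspace doing
something like the sketch below (resubmit() is a hypothetical
application helper, not a real liburing call):

#include <errno.h>
#include <liburing.h>

void resubmit(struct io_uring *ring, void *req);  /* hypothetical */

static void drain_failed_link(struct io_uring *ring, unsigned chain_len)
{
    struct io_uring_cqe *cqe;
    unsigned i;

    for (i = 0; i < chain_len; i++) {
        if (io_uring_wait_cqe(ring, &cqe) < 0)
            break;
        /* every cancelled tail request still posts a cqe that must
         * be reaped, matched against the chain and resubmitted */
        if (cqe->res == -ECANCELED)
            resubmit(ring, io_uring_cqe_get_data(cqe));
        io_uring_cqe_seen(ring, cqe);
    }
}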

-- 
Pavel Begunkov


* Re: [FEATURE REQUEST] Specify a sqe won't generate a cqe
From: Carter Li 李通洲 @ 2020-02-14 11:27 UTC
  To: Pavel Begunkov; +Cc: io-uring


> On Feb 14, 2020, at 6:34 PM, Pavel Begunkov <asml.silence@gmail.com> wrote:
> 
> On 2/14/2020 11:29 AM, Carter Li 李通洲 wrote:
>> To implement io_uring_wait_cqe_timeout, we introduce a magic number
>> called `LIBURING_UDATA_TIMEOUT`. The problem is that we must not only
>> make sure users never set sqe->user_data to LIBURING_UDATA_TIMEOUT,
>> but also add extra complexity to filter out TIMEOUT cqes.
>> 
>> Former discussion: https://github.com/axboe/liburing/issues/53
>> 
>> I’m suggesting introducing a new SQE flag called IOSQE_IGNORE_CQE
>> to solve this problem.
>> 
>> An sqe tagged with the IOSQE_IGNORE_CQE flag won't generate a cqe on
>> completion, so IORING_OP_TIMEOUT completions can be filtered out on
>> the kernel side.
>> 
>> In addition, `IOSQE_IGNORE_CQE` can be used to save cq space.
>> 
>> For example, in a `POLL_ADD(POLLIN)->READ/RECV` link chain, people
>> usually don't care what the result of `POLL_ADD` is (since it will
>> always be POLLIN), so `IOSQE_IGNORE_CQE` can be set on `POLL_ADD` to
>> save a lot of cq space.
>> 
>> Besides POLL_ADD, people usually don't care about the results of
>> POLL_REMOVE/TIMEOUT_REMOVE/ASYNC_CANCEL/CLOSE, so these operations
>> can also be tagged with IOSQE_IGNORE_CQE.
>> 
>> Thoughts?
>> 
> 
> I like the idea! And that's one of my TODOs for the eBPF plans.
> Let me list my use cases, so we can think how to extend it a bit.
> 
> 1. When a link fails, we need to reap all the -ECANCELED cqes,
> analyse them and resubmit the rest. It's quite inconvenient. We may
> want to get CQEs only for requests that weren't cancelled.
> 
> 2. When a chain succeeds, in most cases you already know the results
> of all intermediate CQEs, but you still need to reap and match them.
> I'd prefer to have only one CQE per link: either for the first failed
> request or for the last request in the chain.
> 
> These two could shed a lot of processing overhead from userspace.

I couldn't agree more!

Another problem is that the task blocked in io_uring_enter will be
woken up on completion of every operation in a link, which results in
unnecessary context switches. When woken, users have nothing to do but
issue another io_uring_enter syscall to wait for the completion of the
entire link chain.
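
As a sketch, for a chain of chain_len operations the waiting side
today looks something like this; the waiter is woken once per step
even though only the final result matters:

#include <liburing.h>

static int wait_for_chain(struct io_uring *ring, unsigned chain_len)
{
    struct io_uring_cqe *cqe;
    int ret, last_res = 0;
    unsigned i;

    for (i = 0; i < chain_len; i++) {
        ret = io_uring_wait_cqe(ring, &cqe);  /* one wakeup per step */
        if (ret < 0)
            return ret;
        last_res = cqe->res;  /* intermediate result: already known */
        io_uring_cqe_seen(ring, cqe);
    }
    return last_res;
}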

> 
> 3. If we generate requests from eBPF, even the notion of a
> per-request event may break.
> - eBPF programs creating new requests would also need to specify
>   user_data, and this may be problematic from the user's perspective.
> - we may want to not generate CQEs automatically, but let eBPF emit
>   them.
> 
> -- 
> Pavel Begunkov



* Re: [FEATURE REQUEST] Specify a sqe won't generate a cqe
From: Pavel Begunkov @ 2020-02-14 12:52 UTC
  To: Carter Li 李通洲; +Cc: io-uring

On 2/14/2020 2:27 PM, Carter Li 李通洲 wrote:
> 
>> On Feb 14, 2020, at 6:34 PM, Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 2/14/2020 11:29 AM, Carter Li 李通洲 wrote:
>>> To implement io_uring_wait_cqe_timeout, we introduce a magic number
>>> called `LIBURING_UDATA_TIMEOUT`. The problem is that we must not only
>>> make sure users never set sqe->user_data to LIBURING_UDATA_TIMEOUT,
>>> but also add extra complexity to filter out TIMEOUT cqes.
>>>
>>> Former discussion: https://github.com/axboe/liburing/issues/53
>>>
>>> I’m suggesting introducing a new SQE flag called IOSQE_IGNORE_CQE
>>> to solve this problem.
>>>
>>> An sqe tagged with the IOSQE_IGNORE_CQE flag won't generate a cqe on
>>> completion, so IORING_OP_TIMEOUT completions can be filtered out on
>>> the kernel side.
>>> 
>>> In addition, `IOSQE_IGNORE_CQE` can be used to save cq space.
>>> 
>>> For example, in a `POLL_ADD(POLLIN)->READ/RECV` link chain, people
>>> usually don't care what the result of `POLL_ADD` is (since it will
>>> always be POLLIN), so `IOSQE_IGNORE_CQE` can be set on `POLL_ADD` to
>>> save a lot of cq space.
>>> 
>>> Besides POLL_ADD, people usually don't care about the results of
>>> POLL_REMOVE/TIMEOUT_REMOVE/ASYNC_CANCEL/CLOSE, so these operations
>>> can also be tagged with IOSQE_IGNORE_CQE.
>>>
>>> Thoughts?
>>>
>>
>> I like the idea! And that's one of my TODOs for the eBPF plans.
>> Let me list my use cases, so we can think how to extend it a bit.
>>
>> 1. When a link fails, we need to reap all the -ECANCELED cqes,
>> analyse them and resubmit the rest. It's quite inconvenient. We may
>> want to get CQEs only for requests that weren't cancelled.
>> 
>> 2. When a chain succeeds, in most cases you already know the results
>> of all intermediate CQEs, but you still need to reap and match them.
>> I'd prefer to have only one CQE per link: either for the first failed
>> request or for the last request in the chain.
>> 
>> These two could shed a lot of processing overhead from userspace.
> 
> I couldn't agree more!
> 
> Another problem is that the task blocked in io_uring_enter will be
> woken up on completion of every operation in a link, which results in
> unnecessary context switches. When woken, users have nothing to do but
> issue another io_uring_enter syscall to wait for the completion of the
> entire link chain.

Good point. Sounds like I have one more thing to do :)
Would the behaviour described in (2) cover all your needs?

There is a nuisance with linked timeouts, but I think it's reasonable
that for REQ->LINKED_TIMEOUT, where the timeout didn't fire, we
notify only for REQ.
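
As a sketch (speculative, not current kernel behaviour): with the
behaviour above, only the readv's cqe would be posted when the
timeout doesn't fire.

#include <liburing.h>

static void queue_read_with_timeout(struct io_uring *ring, int fd,
                                    struct iovec *iov,
                                    struct __kernel_timespec *ts)
{
    struct io_uring_sqe *sqe;

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_readv(sqe, fd, iov, 1, 0);
    sqe->flags |= IOSQE_IO_LINK;  /* arms the linked timeout below */
    io_uring_sqe_set_data(sqe, iov);

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_link_timeout(sqe, ts, 0);
    /* today this posts its own cqe too (-ECANCELED if the read
     * finishes first); ideally it wouldn't */

    io_uring_submit(ring);
}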


>>
>> 3. If we generate requests from eBPF, even the notion of a
>> per-request event may break.
>> - eBPF programs creating new requests would also need to specify
>>   user_data, and this may be problematic from the user's perspective.
>> - we may want to not generate CQEs automatically, but let eBPF emit
>>   them.
>>
>> -- 
>> Pavel Begunkov
> 

-- 
Pavel Begunkov


* Re: [FEATURE REQUEST] Specify a sqe won't generate a cqe
From: Carter Li 李通洲 @ 2020-02-14 13:27 UTC
  To: Pavel Begunkov; +Cc: io-uring



> On Feb 14, 2020, at 8:52 PM, Pavel Begunkov <asml.silence@gmail.com> wrote:
> 
> On 2/14/2020 2:27 PM, Carter Li 李通洲 wrote:
>> 
>>> On Feb 14, 2020, at 6:34 PM, Pavel Begunkov <asml.silence@gmail.com> wrote:
>>> 
>>> On 2/14/2020 11:29 AM, Carter Li 李通洲 wrote:
>>>> To implement io_uring_wait_cqe_timeout, we introduce a magic number
>>>> called `LIBURING_UDATA_TIMEOUT`. The problem is that we must not only
>>>> make sure users never set sqe->user_data to LIBURING_UDATA_TIMEOUT,
>>>> but also add extra complexity to filter out TIMEOUT cqes.
>>>> 
>>>> Former discussion: https://github.com/axboe/liburing/issues/53
>>>> 
>>>> I’m suggesting introducing a new SQE flag called IOSQE_IGNORE_CQE
>>>> to solve this problem.
>>>> 
>>>> An sqe tagged with the IOSQE_IGNORE_CQE flag won't generate a cqe on
>>>> completion, so IORING_OP_TIMEOUT completions can be filtered out on
>>>> the kernel side.
>>>> 
>>>> In addition, `IOSQE_IGNORE_CQE` can be used to save cq space.
>>>> 
>>>> For example, in a `POLL_ADD(POLLIN)->READ/RECV` link chain, people
>>>> usually don't care what the result of `POLL_ADD` is (since it will
>>>> always be POLLIN), so `IOSQE_IGNORE_CQE` can be set on `POLL_ADD` to
>>>> save a lot of cq space.
>>>> 
>>>> Besides POLL_ADD, people usually don't care about the results of
>>>> POLL_REMOVE/TIMEOUT_REMOVE/ASYNC_CANCEL/CLOSE, so these operations
>>>> can also be tagged with IOSQE_IGNORE_CQE.
>>>> 
>>>> Thoughts?
>>>> 
>>> 
>>> I like the idea! And that's one of my TODOs for the eBPF plans.
>>> Let me list my use cases, so we can think how to extend it a bit.
>>> 
>>> 1. When a link fails, we need to reap all the -ECANCELED cqes,
>>> analyse them and resubmit the rest. It's quite inconvenient. We may
>>> want to get CQEs only for requests that weren't cancelled.
>>> 
>>> 2. When a chain succeeds, in most cases you already know the results
>>> of all intermediate CQEs, but you still need to reap and match them.
>>> I'd prefer to have only one CQE per link: either for the first failed
>>> request or for the last request in the chain.
>>> 
>>> These two could shed a lot of processing overhead from userspace.
>> 
>> I couldn't agree more!
>> 
>> Another problem is that the task blocked in io_uring_enter will be
>> woken up on completion of every operation in a link, which results in
>> unnecessary context switches. When woken, users have nothing to do but
>> issue another io_uring_enter syscall to wait for the completion of the
>> entire link chain.
> 
> Good point. Sounds like I have one more thing to do :)
> Would the behaviour described in (2) cover all your needs?

(2) should cover most cases for me. For cases it can't cover (if any),
I can still use normal sqes.

> 
> There is a nuisance with linked timeouts, but I think it's reasonable
> that for REQ->LINKED_TIMEOUT, where the timeout didn't fire, we notify
> only for REQ.
> 
>>> 
>>> 3. If we generate requests from eBPF, even the notion of a
>>> per-request event may break.
>>> - eBPF programs creating new requests would also need to specify
>>>   user_data, and this may be problematic from the user's perspective.
>>> - we may want to not generate CQEs automatically, but let eBPF emit
>>>   them.
>>> 
>>> -- 
>>> Pavel Begunkov
>> 
> 
> -- 
> Pavel Begunkov



* Re: [FEATURE REQUEST] Specify a sqe won't generate a cqe
From: Pavel Begunkov @ 2020-02-14 14:16 UTC
  To: Carter Li 李通洲; +Cc: io-uring

On 2/14/2020 4:27 PM, Carter Li 李通洲 wrote:
>> On Feb 14, 2020, at 8:52 PM, Pavel Begunkov <asml.silence@gmail.com> wrote:
>> On 2/14/2020 2:27 PM, Carter Li 李通洲 wrote:
>>>> On Feb 14, 2020, at 6:34 PM, Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>> On 2/14/2020 11:29 AM, Carter Li 李通洲 wrote:
>>>>> To implement io_uring_wait_cqe_timeout, we introduce a magic number
>>>>> called `LIBURING_UDATA_TIMEOUT`. The problem is that we must not only
>>>>> make sure users never set sqe->user_data to LIBURING_UDATA_TIMEOUT,
>>>>> but also add extra complexity to filter out TIMEOUT cqes.
>>>>>
>>>>> Former discussion: https://github.com/axboe/liburing/issues/53
>>>>>
>>>>> I’m suggesting introducing a new SQE flag called IOSQE_IGNORE_CQE
>>>>> to solve this problem.
>>>>>
>>>>> An sqe tagged with the IOSQE_IGNORE_CQE flag won't generate a cqe on
>>>>> completion, so IORING_OP_TIMEOUT completions can be filtered out on
>>>>> the kernel side.
>>>>> 
>>>>> In addition, `IOSQE_IGNORE_CQE` can be used to save cq space.
>>>>> 
>>>>> For example, in a `POLL_ADD(POLLIN)->READ/RECV` link chain, people
>>>>> usually don't care what the result of `POLL_ADD` is (since it will
>>>>> always be POLLIN), so `IOSQE_IGNORE_CQE` can be set on `POLL_ADD` to
>>>>> save a lot of cq space.
>>>>> 
>>>>> Besides POLL_ADD, people usually don't care about the results of
>>>>> POLL_REMOVE/TIMEOUT_REMOVE/ASYNC_CANCEL/CLOSE, so these operations
>>>>> can also be tagged with IOSQE_IGNORE_CQE.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>
>>>> I like the idea! And that's one of my TODOs for the eBPF plans.
>>>> Let me list my use cases, so we can think how to extend it a bit.
>>>>
>>>> 1. When a link fails, we need to reap all the -ECANCELED cqes,
>>>> analyse them and resubmit the rest. It's quite inconvenient. We may
>>>> want to get CQEs only for requests that weren't cancelled.
>>>> 
>>>> 2. When a chain succeeds, in most cases you already know the results
>>>> of all intermediate CQEs, but you still need to reap and match them.
>>>> I'd prefer to have only one CQE per link: either for the first failed
>>>> request or for the last request in the chain.
>>>> 
>>>> These two could shed a lot of processing overhead from userspace.
>>>
>>> I couldn't agree more!
>>>
>>> Another problem is that the task blocked in io_uring_enter will be
>>> woken up on completion of every operation in a link, which results in
>>> unnecessary context switches. When woken, users have nothing to do but
>>> issue another io_uring_enter syscall to wait for the completion of the
>>> entire link chain.
>>
>> Good point. Sounds like I have one more thing to do :)
>> Would the behaviour described in (2) cover all your needs?
> 
> (2) should cover most cases for me. For cases it can't cover (if any),
> I can still use normal sqes.
> 

Great! I need to give some thought to what I may need for the
eBPF-steering stuff, but it sounds like a plan.

>>
>> There is a nuisance with linked timeouts, but I think it's reasonable
>> that for REQ->LINKED_TIMEOUT, where the timeout didn't fire, we notify
>> only for REQ.
>>
>>>>
>>>> 3. If we generate requests from eBPF, even the notion of a
>>>> per-request event may break.
>>>> - eBPF programs creating new requests would also need to specify
>>>>   user_data, and this may be problematic from the user's perspective.
>>>> - we may want to not generate CQEs automatically, but let eBPF emit
>>>>   them.
>>>>

-- 
Pavel Begunkov

