All of lore.kernel.org
 help / color / mirror / Atom feed
* bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
@ 2021-02-02  5:36 Victor Stewart
  2021-02-02 11:05 ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Victor Stewart @ 2021-02-02  5:36 UTC (permalink / raw)
  To: io-uring

started experimenting with sqpoll and my fastpoll accepts started
failing. was banging my head against the wall for a few hours... wrote
this test case below....

basically fastpoll accept only works without sqpoll, and without
adding IOSQE_FIXED_FILE to the sqe. fails with both, fails with
either. these must be bugs?

I'm running Clear Linux 5.10.10-1017.native.

i hope no one here is allergic to C++, haha. compilation command
commented in the gist, just replace the two paths. and I can fold
these checks if needed into a liburing PR later.

https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02  5:36 bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE Victor Stewart
@ 2021-02-02 11:05 ` Pavel Begunkov
  2021-02-02 11:23   ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2021-02-02 11:05 UTC (permalink / raw)
  To: Victor Stewart, io-uring

On 02/02/2021 05:36, Victor Stewart wrote:
> started experimenting with sqpoll and my fastpoll accepts started
> failing. was banging my head against the wall for a few hours... wrote
> this test case below....
> 
> basically fastpoll accept only works without sqpoll, and without
> adding IOSQE_FIXED_FILE to the sqe. fails with both, fails with
> either. these must be bugs?
> 
> I'm running Clear Linux 5.10.10-1017.native.
> 
> i hope no one here is allergic to C++, haha. compilation command
> commented in the gist, just replace the two paths. and I can fold
> these checks if needed into a liburing PR later.
> 
> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56

Please don't forget about checking error codes. At least fixed
files don't work for you because of

int fds[10];
memset(fds, -1, 10); // 10 bytes, not 10 ints

So io_uring_register_files() silently fails.


For me, all two "with SQPOLL" tests spit SUCCESS, then it hangs.
But need to test it with upstream to be sure.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 11:05 ` Pavel Begunkov
@ 2021-02-02 11:23   ` Pavel Begunkov
  2021-02-02 16:18     ` Victor Stewart
  0 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2021-02-02 11:23 UTC (permalink / raw)
  To: Victor Stewart, io-uring

On 02/02/2021 11:05, Pavel Begunkov wrote:
> On 02/02/2021 05:36, Victor Stewart wrote:
>> started experimenting with sqpoll and my fastpoll accepts started
>> failing. was banging my head against the wall for a few hours... wrote
>> this test case below....
>>
>> basically fastpoll accept only works without sqpoll, and without
>> adding IOSQE_FIXED_FILE to the sqe. fails with both, fails with
>> either. these must be bugs?
>>
>> I'm running Clear Linux 5.10.10-1017.native.
>>
>> i hope no one here is allergic to C++, haha. compilation command
>> commented in the gist, just replace the two paths. and I can fold
>> these checks if needed into a liburing PR later.
>>
>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
> 
> Please don't forget about checking error codes. At least fixed
> files don't work for you because of
> 
> int fds[10];
> memset(fds, -1, 10); // 10 bytes, not 10 ints
> 
> So io_uring_register_files() silently fails.
> 
> 
> For me, all two "with SQPOLL" tests spit SUCCESS, then it hangs.
> But need to test it with upstream to be sure.

Also you forget to submit, all works with these 2 changes.

When you don't do io_uring_submit(), apparently it gets live-locked
in liburing's _io_uring_get_cqe(), that's a bug.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 11:23   ` Pavel Begunkov
@ 2021-02-02 16:18     ` Victor Stewart
  2021-02-02 16:30       ` Victor Stewart
  2021-02-02 17:21       ` Pavel Begunkov
  0 siblings, 2 replies; 24+ messages in thread
From: Victor Stewart @ 2021-02-02 16:18 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring

> > Please don't forget about checking error codes. At least fixed
> > files don't work for you because of
> >
> > int fds[10];
> > memset(fds, -1, 10); // 10 bytes, not 10 ints
> >
> > So io_uring_register_files() silently fails.

well i'm glad this one was my own careless error. fresh eyes are
everything. you're right, bad habit of ignoring return values lol.

> Also you forget to submit, all works with these 2 changes.
>
> When you don't do io_uring_submit(), apparently it gets live-locked
> in liburing's _io_uring_get_cqe(), that's a bug.

in the comments above io_uring_wait_cqe_timeout it says it submits for
you, that's why i didn't submit here. but i guess great if that
exposed the _io_uring_get_cqe bug.

thanks so much for taking a look at this Pavel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 16:18     ` Victor Stewart
@ 2021-02-02 16:30       ` Victor Stewart
  2021-02-02 16:44         ` Jens Axboe
  2021-02-02 17:21       ` Pavel Begunkov
  1 sibling, 1 reply; 24+ messages in thread
From: Victor Stewart @ 2021-02-02 16:30 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring

i just reran it with my memset bug fixed, and now get these results...
still fails with SQPOLL. (i ommited the SQPOLL without fixed flag
result since i'm working with a 5.10 kernel). I tried with and without
submitting, no difference.

with SQPOLL      with      FIXED FLAG -> FAILURE: failed with error = -22
without SQPOLL with      FIXED FLAG -> SUCCESS: timeout as expected
without SQPOLL without FIXED FLAG -> SUCCESS: timeout as expected

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 16:30       ` Victor Stewart
@ 2021-02-02 16:44         ` Jens Axboe
  2021-02-02 17:10           ` Victor Stewart
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2021-02-02 16:44 UTC (permalink / raw)
  To: Victor Stewart, Pavel Begunkov; +Cc: io-uring

On 2/2/21 9:30 AM, Victor Stewart wrote:
> i just reran it with my memset bug fixed, and now get these results...
> still fails with SQPOLL. (i ommited the SQPOLL without fixed flag
> result since i'm working with a 5.10 kernel). I tried with and without
> submitting, no difference.
> 
> with SQPOLL      with      FIXED FLAG -> FAILURE: failed with error = -22
> without SQPOLL with      FIXED FLAG -> SUCCESS: timeout as expected
> without SQPOLL without FIXED FLAG -> SUCCESS: timeout as expected

Can you send the updated test app?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 16:44         ` Jens Axboe
@ 2021-02-02 17:10           ` Victor Stewart
  2021-02-02 17:24             ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Victor Stewart @ 2021-02-02 17:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring

> Can you send the updated test app?

https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56

same link i just updated the same gist

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 16:18     ` Victor Stewart
  2021-02-02 16:30       ` Victor Stewart
@ 2021-02-02 17:21       ` Pavel Begunkov
  2021-02-02 17:30         ` Victor Stewart
  1 sibling, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2021-02-02 17:21 UTC (permalink / raw)
  To: Victor Stewart; +Cc: io-uring

On 02/02/2021 16:18, Victor Stewart wrote:
>>> Please don't forget about checking error codes. At least fixed
>>> files don't work for you because of
>>>
>>> int fds[10];
>>> memset(fds, -1, 10); // 10 bytes, not 10 ints
>>>
>>> So io_uring_register_files() silently fails.
> 
> well i'm glad this one was my own careless error. fresh eyes are
> everything. you're right, bad habit of ignoring return values lol.
> 
>> Also you forget to submit, all works with these 2 changes.
>>
>> When you don't do io_uring_submit(), apparently it gets live-locked
>> in liburing's _io_uring_get_cqe(), that's a bug.
> 
> in the comments above io_uring_wait_cqe_timeout it says it submits for
> you, that's why i didn't submit here. but i guess great if that

There is a change of behaviour, if IORING_FEAT_EXT_ARG is set it
won't submit (IIRC, since 5.12) -- it's pretty important for some
multi-threaded cases.

So... where in particular does it say that? In case your liburing
is up to date and we forgot to remove such a comment.

> exposed the _io_uring_get_cqe bug.

It's great for sure!

> 
> thanks so much for taking a look at this Pavel
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 17:10           ` Victor Stewart
@ 2021-02-02 17:24             ` Jens Axboe
  2021-02-02 17:41               ` Pavel Begunkov
                                 ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Jens Axboe @ 2021-02-02 17:24 UTC (permalink / raw)
  To: Victor Stewart; +Cc: Pavel Begunkov, io-uring

On 2/2/21 10:10 AM, Victor Stewart wrote:
>> Can you send the updated test app?
> 
> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
> 
> same link i just updated the same gist

And how are you running it?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 17:21       ` Pavel Begunkov
@ 2021-02-02 17:30         ` Victor Stewart
  2021-02-02 17:45           ` Victor Stewart
  0 siblings, 1 reply; 24+ messages in thread
From: Victor Stewart @ 2021-02-02 17:30 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring

> There is a change of behaviour, if IORING_FEAT_EXT_ARG is set it
> won't submit (IIRC, since 5.12) -- it's pretty important for some
> multi-threaded cases.
>
> So... where in particular does it say that? In case your liburing
> is up to date and we forgot to remove such a comment.

https://github.com/axboe/liburing/blob/c96202546bd9d7420d97bc05e73c7144d0924e8a/src/queue.c#L269

"For kernels without IORING_FEAT_EXT_ARG (5.10 and older), if 'ts' is
specified, the application need not call io_uring_submit() before
calling this function, as we will do that on its behalf."

and I'm using the latest Clear Linux kernel which is 5.10, so that's
why I wasn't submitting.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 17:24             ` Jens Axboe
@ 2021-02-02 17:41               ` Pavel Begunkov
  2021-02-02 20:34                 ` Pavel Begunkov
  2021-02-02 17:46               ` Jens Axboe
  2021-02-02 17:46               ` Victor Stewart
  2 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2021-02-02 17:41 UTC (permalink / raw)
  To: Jens Axboe, Victor Stewart; +Cc: io-uring

On 02/02/2021 17:24, Jens Axboe wrote:
> On 2/2/21 10:10 AM, Victor Stewart wrote:
>>> Can you send the updated test app?
>>
>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
>>
>> same link i just updated the same gist
> 
> And how are you running it?

with SQPOLL    with    FIXED FLAG -> FAILURE: failed with error = ???
	-> io_uring_wait_cqe_timeout() strangely returns -1, (-EPERM??)

Others work fine. I'll check what's with it later.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 17:30         ` Victor Stewart
@ 2021-02-02 17:45           ` Victor Stewart
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Stewart @ 2021-02-02 17:45 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring

side note question on optimal sqpoll topology, assume a single
threaded application, if you're going to pin the application thread
and the sqpoll thread to the same logical cpu... is that stupid? maybe
they'll just back and forth preempt each other? because either your
application is doing work, or you're waiting on io_uring in the kernel
to deliver it more work to do, not a scenario where there's idleness
to be exploited with parallelism.

the only advantage that comes to my mind would be, take a network
server example, reducing the latency of submitted packets getting out
by time sharing the work of pushing those packets out with application
work constructing the next ones.

On Tue, Feb 2, 2021 at 12:30 PM Victor Stewart <v@nametag.social> wrote:
>
> > There is a change of behaviour, if IORING_FEAT_EXT_ARG is set it
> > won't submit (IIRC, since 5.12) -- it's pretty important for some
> > multi-threaded cases.
> >
> > So... where in particular does it say that? In case your liburing
> > is up to date and we forgot to remove such a comment.
>
> https://github.com/axboe/liburing/blob/c96202546bd9d7420d97bc05e73c7144d0924e8a/src/queue.c#L269
>
> "For kernels without IORING_FEAT_EXT_ARG (5.10 and older), if 'ts' is
> specified, the application need not call io_uring_submit() before
> calling this function, as we will do that on its behalf."
>
> and I'm using the latest Clear Linux kernel which is 5.10, so that's
> why I wasn't submitting.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 17:24             ` Jens Axboe
  2021-02-02 17:41               ` Pavel Begunkov
@ 2021-02-02 17:46               ` Jens Axboe
  2021-02-02 17:50                 ` Victor Stewart
  2021-02-02 17:46               ` Victor Stewart
  2 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2021-02-02 17:46 UTC (permalink / raw)
  To: Victor Stewart; +Cc: Pavel Begunkov, io-uring

On 2/2/21 10:24 AM, Jens Axboe wrote:
> On 2/2/21 10:10 AM, Victor Stewart wrote:
>>> Can you send the updated test app?
>>
>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
>>
>> same link i just updated the same gist
> 
> And how are you running it?

Alright, so you're doing ACCEPT with 5.10 and you cannot do ACCEPT
with SQPOLL as it needs access to the file table. That works in 5.11
and newer, NOT in 5.10. Same goes or eg open/close and those kinds
of requests.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 17:24             ` Jens Axboe
  2021-02-02 17:41               ` Pavel Begunkov
  2021-02-02 17:46               ` Jens Axboe
@ 2021-02-02 17:46               ` Victor Stewart
  2 siblings, 0 replies; 24+ messages in thread
From: Victor Stewart @ 2021-02-02 17:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring

> And how are you running it?

compiling the file locally with this command...

clang++ -fuse-ld=lld --std=gnu++2a -I/folder/with/liburing
/path/to/liburing.a -o accept.test accept.test.cpp -g

and then just doing ./accept.test

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 17:46               ` Jens Axboe
@ 2021-02-02 17:50                 ` Victor Stewart
  2021-02-02 17:57                   ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Victor Stewart @ 2021-02-02 17:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring

> Alright, so you're doing ACCEPT with 5.10 and you cannot do ACCEPT
> with SQPOLL as it needs access to the file table. That works in 5.11
> and newer, NOT in 5.10. Same goes or eg open/close and those kinds
> of requests.

okay i must have missed that point somewhere. perfect i'll just avoid
sqpoll until using 5.11. at least this exercise exposed some other
issue pavel wanted to look into!

> --
> Jens Axboe
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 17:50                 ` Victor Stewart
@ 2021-02-02 17:57                   ` Jens Axboe
  0 siblings, 0 replies; 24+ messages in thread
From: Jens Axboe @ 2021-02-02 17:57 UTC (permalink / raw)
  To: Victor Stewart; +Cc: Pavel Begunkov, io-uring

On 2/2/21 10:50 AM, Victor Stewart wrote:
>> Alright, so you're doing ACCEPT with 5.10 and you cannot do ACCEPT
>> with SQPOLL as it needs access to the file table. That works in 5.11
>> and newer, NOT in 5.10. Same goes or eg open/close and those kinds
>> of requests.
> 
> okay i must have missed that point somewhere. perfect i'll just avoid
> sqpoll until using 5.11. at least this exercise exposed some other
> issue pavel wanted to look into!

Yep, there's an issue with the newer timeouts and not breaking out of
the loop as it should.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 17:41               ` Pavel Begunkov
@ 2021-02-02 20:34                 ` Pavel Begunkov
  2021-02-02 20:48                   ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2021-02-02 20:34 UTC (permalink / raw)
  To: Jens Axboe, Victor Stewart; +Cc: io-uring

On 02/02/2021 17:41, Pavel Begunkov wrote:
> On 02/02/2021 17:24, Jens Axboe wrote:
>> On 2/2/21 10:10 AM, Victor Stewart wrote:
>>>> Can you send the updated test app?
>>>
>>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
>>>
>>> same link i just updated the same gist
>>
>> And how are you running it?
> 
> with SQPOLL    with    FIXED FLAG -> FAILURE: failed with error = ???
> 	-> io_uring_wait_cqe_timeout() strangely returns -1, (-EPERM??)

Ok, _io_uring_get_cqe() is just screwed twice

TL;DR
we enter into it with submit=0, do an iteration, which decrements it,
then a second iteration passes submit=-1, which is returned back by
the kernel as a result and propagated back from liburing...

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 20:34                 ` Pavel Begunkov
@ 2021-02-02 20:48                   ` Jens Axboe
  2021-02-02 20:56                     ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2021-02-02 20:48 UTC (permalink / raw)
  To: Pavel Begunkov, Victor Stewart; +Cc: io-uring

On 2/2/21 1:34 PM, Pavel Begunkov wrote:
> On 02/02/2021 17:41, Pavel Begunkov wrote:
>> On 02/02/2021 17:24, Jens Axboe wrote:
>>> On 2/2/21 10:10 AM, Victor Stewart wrote:
>>>>> Can you send the updated test app?
>>>>
>>>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
>>>>
>>>> same link i just updated the same gist
>>>
>>> And how are you running it?
>>
>> with SQPOLL    with    FIXED FLAG -> FAILURE: failed with error = ???
>> 	-> io_uring_wait_cqe_timeout() strangely returns -1, (-EPERM??)
> 
> Ok, _io_uring_get_cqe() is just screwed twice
> 
> TL;DR
> we enter into it with submit=0, do an iteration, which decrements it,
> then a second iteration passes submit=-1, which is returned back by
> the kernel as a result and propagated back from liburing...

Yep, that's what I came up with too. We really just need a clear way
of knowing when to break out, and when to keep going. Eg if we've
done a loop and don't end up calling the system call, then there's
no point in continuing.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 20:48                   ` Jens Axboe
@ 2021-02-02 20:56                     ` Pavel Begunkov
  2021-02-03 11:49                       ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2021-02-02 20:56 UTC (permalink / raw)
  To: Jens Axboe, Victor Stewart; +Cc: io-uring

On 02/02/2021 20:48, Jens Axboe wrote:
> On 2/2/21 1:34 PM, Pavel Begunkov wrote:
>> On 02/02/2021 17:41, Pavel Begunkov wrote:
>>> On 02/02/2021 17:24, Jens Axboe wrote:
>>>> On 2/2/21 10:10 AM, Victor Stewart wrote:
>>>>>> Can you send the updated test app?
>>>>>
>>>>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
>>>>>
>>>>> same link i just updated the same gist
>>>>
>>>> And how are you running it?
>>>
>>> with SQPOLL    with    FIXED FLAG -> FAILURE: failed with error = ???
>>> 	-> io_uring_wait_cqe_timeout() strangely returns -1, (-EPERM??)
>>
>> Ok, _io_uring_get_cqe() is just screwed twice
>>
>> TL;DR
>> we enter into it with submit=0, do an iteration, which decrements it,
>> then a second iteration passes submit=-1, which is returned back by
>> the kernel as a result and propagated back from liburing...
> 
> Yep, that's what I came up with too. We really just need a clear way
> of knowing when to break out, and when to keep going. Eg if we've
> done a loop and don't end up calling the system call, then there's
> no point in continuing.

We can bodge something up (and forget about it), and do much cleaner
for IORING_FEAT_EXT_ARG, because we don't have LIBURING_UDATA_TIMEOUT
reqs for it and so can remove peek and so on.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-02 20:56                     ` Pavel Begunkov
@ 2021-02-03 11:49                       ` Pavel Begunkov
  2021-02-04 16:50                         ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2021-02-03 11:49 UTC (permalink / raw)
  To: Jens Axboe, Victor Stewart; +Cc: io-uring

On 02/02/2021 20:56, Pavel Begunkov wrote:
> On 02/02/2021 20:48, Jens Axboe wrote:
>> On 2/2/21 1:34 PM, Pavel Begunkov wrote:
>>> On 02/02/2021 17:41, Pavel Begunkov wrote:
>>>> On 02/02/2021 17:24, Jens Axboe wrote:
>>>>> On 2/2/21 10:10 AM, Victor Stewart wrote:
>>>>>>> Can you send the updated test app?
>>>>>>
>>>>>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
>>>>>>
>>>>>> same link i just updated the same gist
>>>>>
>>>>> And how are you running it?
>>>>
>>>> with SQPOLL    with    FIXED FLAG -> FAILURE: failed with error = ???
>>>> 	-> io_uring_wait_cqe_timeout() strangely returns -1, (-EPERM??)
>>>
>>> Ok, _io_uring_get_cqe() is just screwed twice
>>>
>>> TL;DR
>>> we enter into it with submit=0, do an iteration, which decrements it,
>>> then a second iteration passes submit=-1, which is returned back by
>>> the kernel as a result and propagated back from liburing...
>>
>> Yep, that's what I came up with too. We really just need a clear way
>> of knowing when to break out, and when to keep going. Eg if we've
>> done a loop and don't end up calling the system call, then there's
>> no point in continuing.
> 
> We can bodge something up (and forget about it), and do much cleaner
> for IORING_FEAT_EXT_ARG, because we don't have LIBURING_UDATA_TIMEOUT
> reqs for it and so can remove peek and so on.

This version looks reasonably simple, and even passes tests and all
issues found by Victor's test. Didn't test it yet, but should behave
similarly in regard of internal timeouts (pre IORING_FEAT_EXT_ARG).

static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr,
			     struct get_data *data)
{
	struct io_uring_cqe *cqe = NULL;
	int ret = 0, err;

	do {
		unsigned flags = 0;
		unsigned nr_available;
		bool enter = false;

		err = __io_uring_peek_cqe(ring, &cqe, &nr_available);
		if (err)
			break;

		/* IOPOLL won't proceed when there're not reaped CQEs */
		if (cqe && (ring->flags & IORING_SETUP_IOPOLL))
			data->wait_nr = 0;

		if (data->wait_nr > nr_available || cq_ring_needs_flush(ring)) {
			flags = IORING_ENTER_GETEVENTS | data->get_flags;
			enter = true;
		}
		if (data->submit) {
			sq_ring_needs_enter(ring, &flags);
			enter = true;
		}
		if (!enter)
			break;

		ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
					    data->wait_nr, flags, data->arg,
					    data->sz);
		if (ret < 0) {
			err = -errno;
			break;
		}
		data->submit -= ret;
	} while (1);

	*cqe_ptr = cqe;
	return err;
}

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-03 11:49                       ` Pavel Begunkov
@ 2021-02-04 16:50                         ` Jens Axboe
  2021-02-05 12:46                           ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2021-02-04 16:50 UTC (permalink / raw)
  To: Pavel Begunkov, Victor Stewart; +Cc: io-uring

On 2/3/21 4:49 AM, Pavel Begunkov wrote:
> On 02/02/2021 20:56, Pavel Begunkov wrote:
>> On 02/02/2021 20:48, Jens Axboe wrote:
>>> On 2/2/21 1:34 PM, Pavel Begunkov wrote:
>>>> On 02/02/2021 17:41, Pavel Begunkov wrote:
>>>>> On 02/02/2021 17:24, Jens Axboe wrote:
>>>>>> On 2/2/21 10:10 AM, Victor Stewart wrote:
>>>>>>>> Can you send the updated test app?
>>>>>>>
>>>>>>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
>>>>>>>
>>>>>>> same link i just updated the same gist
>>>>>>
>>>>>> And how are you running it?
>>>>>
>>>>> with SQPOLL    with    FIXED FLAG -> FAILURE: failed with error = ???
>>>>> 	-> io_uring_wait_cqe_timeout() strangely returns -1, (-EPERM??)
>>>>
>>>> Ok, _io_uring_get_cqe() is just screwed twice
>>>>
>>>> TL;DR
>>>> we enter into it with submit=0, do an iteration, which decrements it,
>>>> then a second iteration passes submit=-1, which is returned back by
>>>> the kernel as a result and propagated back from liburing...
>>>
>>> Yep, that's what I came up with too. We really just need a clear way
>>> of knowing when to break out, and when to keep going. Eg if we've
>>> done a loop and don't end up calling the system call, then there's
>>> no point in continuing.
>>
>> We can bodge something up (and forget about it), and do much cleaner
>> for IORING_FEAT_EXT_ARG, because we don't have LIBURING_UDATA_TIMEOUT
>> reqs for it and so can remove peek and so on.
> 
> This version looks reasonably simple, and even passes tests and all
> issues found by Victor's test. Didn't test it yet, but should behave
> similarly in regard of internal timeouts (pre IORING_FEAT_EXT_ARG).
> 
> static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr,
> 			     struct get_data *data)
> {
> 	struct io_uring_cqe *cqe = NULL;
> 	int ret = 0, err;
> 
> 	do {
> 		unsigned flags = 0;
> 		unsigned nr_available;
> 		bool enter = false;
> 
> 		err = __io_uring_peek_cqe(ring, &cqe, &nr_available);
> 		if (err)
> 			break;
> 
> 		/* IOPOLL won't proceed when there're not reaped CQEs */
> 		if (cqe && (ring->flags & IORING_SETUP_IOPOLL))
> 			data->wait_nr = 0;
> 
> 		if (data->wait_nr > nr_available || cq_ring_needs_flush(ring)) {
> 			flags = IORING_ENTER_GETEVENTS | data->get_flags;
> 			enter = true;
> 		}
> 		if (data->submit) {
> 			sq_ring_needs_enter(ring, &flags);
> 			enter = true;
> 		}
> 		if (!enter)
> 			break;
> 
> 		ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
> 					    data->wait_nr, flags, data->arg,
> 					    data->sz);
> 		if (ret < 0) {
> 			err = -errno;
> 			break;
> 		}
> 		data->submit -= ret;
> 	} while (1);
> 
> 	*cqe_ptr = cqe;
> 	return err;
> }

So here's my take on this - any rewrite of _io_uring_get_cqe() is going
to end up adding special cases, that's unfortunately just the nature of
the game. And since we're going to be doing a new liburing release very
shortly, this isn't a great time to add a rewrite of it. It'll certainly
introduce more bugs than it solves, and hence regressions, no matter how
careful we are.

Hence my suggestion is to just patch this in a trivial kind of fashion,
even if it doesn't really make the function any prettier. But it'll be
safer for a release, and then we can rework the function after.

With that in mind, here's my suggestion. The premise is if we go through
the loop and don't do io_uring_enter(), then there's no point in
continuing. That's the trivial fix.


diff --git a/src/queue.c b/src/queue.c
index 94f791e..4161aa7 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -89,12 +89,13 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
 {
 	struct io_uring_cqe *cqe = NULL;
 	const int to_wait = data->wait_nr;
-	int ret = 0, err;
+	int err;
 
 	do {
 		bool cq_overflow_flush = false;
 		unsigned flags = 0;
 		unsigned nr_available;
+		int ret = -2;
 
 		err = __io_uring_peek_cqe(ring, &cqe, &nr_available);
 		if (err)
@@ -117,7 +118,9 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
 			ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
 					data->wait_nr, flags, data->arg,
 					data->sz);
-		if (ret < 0) {
+		if (ret == -2) {
+			break;
+		} else if (ret < 0) {
 			err = -errno;
 		} else if (ret == (int)data->submit) {
 			data->submit = 0;

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-04 16:50                         ` Jens Axboe
@ 2021-02-05 12:46                           ` Pavel Begunkov
  2021-02-05 14:42                             ` Jens Axboe
  0 siblings, 1 reply; 24+ messages in thread
From: Pavel Begunkov @ 2021-02-05 12:46 UTC (permalink / raw)
  To: Jens Axboe, Victor Stewart; +Cc: io-uring

On 04/02/2021 16:50, Jens Axboe wrote:
> On 2/3/21 4:49 AM, Pavel Begunkov wrote:
>> On 02/02/2021 20:56, Pavel Begunkov wrote:
>>> On 02/02/2021 20:48, Jens Axboe wrote:
>>>> On 2/2/21 1:34 PM, Pavel Begunkov wrote:
>>>>> On 02/02/2021 17:41, Pavel Begunkov wrote:
>>>>>> On 02/02/2021 17:24, Jens Axboe wrote:
>>>>>>> On 2/2/21 10:10 AM, Victor Stewart wrote:
>>>>>>>>> Can you send the updated test app?
>>>>>>>>
>>>>>>>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
>>>>>>>>
>>>>>>>> same link i just updated the same gist
>>>>>>>
>>>>>>> And how are you running it?
>>>>>>
>>>>>> with SQPOLL    with    FIXED FLAG -> FAILURE: failed with error = ???
>>>>>> 	-> io_uring_wait_cqe_timeout() strangely returns -1, (-EPERM??)
>>>>>
>>>>> Ok, _io_uring_get_cqe() is just screwed twice
>>>>>
>>>>> TL;DR
>>>>> we enter into it with submit=0, do an iteration, which decrements it,
>>>>> then a second iteration passes submit=-1, which is returned back by
>>>>> the kernel as a result and propagated back from liburing...
>>>>
>>>> Yep, that's what I came up with too. We really just need a clear way
>>>> of knowing when to break out, and when to keep going. Eg if we've
>>>> done a loop and don't end up calling the system call, then there's
>>>> no point in continuing.
>>>
>>> We can bodge something up (and forget about it), and do much cleaner
>>> for IORING_FEAT_EXT_ARG, because we don't have LIBURING_UDATA_TIMEOUT
>>> reqs for it and so can remove peek and so on.
>>
>> This version looks reasonably simple, and even passes tests and all
>> issues found by Victor's test. Didn't test it yet, but should behave
>> similarly in regard of internal timeouts (pre IORING_FEAT_EXT_ARG).
>>
>> static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr,
>> 			     struct get_data *data)
>> {
>> 	struct io_uring_cqe *cqe = NULL;
>> 	int ret = 0, err;
>>
>> 	do {
>> 		unsigned flags = 0;
>> 		unsigned nr_available;
>> 		bool enter = false;
>>
>> 		err = __io_uring_peek_cqe(ring, &cqe, &nr_available);
>> 		if (err)
>> 			break;
>>
>> 		/* IOPOLL won't proceed when there're not reaped CQEs */
>> 		if (cqe && (ring->flags & IORING_SETUP_IOPOLL))
>> 			data->wait_nr = 0;
>>
>> 		if (data->wait_nr > nr_available || cq_ring_needs_flush(ring)) {
>> 			flags = IORING_ENTER_GETEVENTS | data->get_flags;
>> 			enter = true;
>> 		}
>> 		if (data->submit) {
>> 			sq_ring_needs_enter(ring, &flags);
>> 			enter = true;
>> 		}
>> 		if (!enter)
>> 			break;
>>
>> 		ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
>> 					    data->wait_nr, flags, data->arg,
>> 					    data->sz);
>> 		if (ret < 0) {
>> 			err = -errno;
>> 			break;
>> 		}
>> 		data->submit -= ret;
>> 	} while (1);
>>
>> 	*cqe_ptr = cqe;
>> 	return err;
>> }
> 
> So here's my take on this - any rewrite of _io_uring_get_cqe() is going
> to end up adding special cases, that's unfortunately just the nature of
> the game. And since we're going to be doing a new liburing release very
> shortly, this isn't a great time to add a rewrite of it. It'll certainly
> introduce more bugs than it solves, and hence regressions, no matter how
> careful we are.
> 
> Hence my suggestion is to just patch this in a trivial kind of fashion,
> even if it doesn't really make the function any prettier. But it'll be
> safer for a release, and then we can rework the function after.
> 
> With that in mind, here's my suggestion. The premise is if we go through
> the loop and don't do io_uring_enter(), then there's no point in
> continuing. That's the trivial fix.

Your idea but imho cleaner below.
+1 comment inline

> diff --git a/src/queue.c b/src/queue.c
> index 94f791e..4161aa7 100644
> --- a/src/queue.c
> +++ b/src/queue.c
> @@ -89,12 +89,13 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
>  {
>  	struct io_uring_cqe *cqe = NULL;
>  	const int to_wait = data->wait_nr;
> -	int ret = 0, err;
> +	int err;
>  
>  	do {
>  		bool cq_overflow_flush = false;
>  		unsigned flags = 0;
>  		unsigned nr_available;
> +		int ret = -2;
>  
>  		err = __io_uring_peek_cqe(ring, &cqe, &nr_available);
>  		if (err)
> @@ -117,7 +118,9 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
>  			ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
>  					data->wait_nr, flags, data->arg,
>  					data->sz);
> -		if (ret < 0) {
> +		if (ret == -2) {
> +			break;

peek/wait_cqe expect that cqe_ptr is filled on return=0. Looks we need
to return an error or hack up those functions.

> +		} else if (ret < 0) {
>  			err = -errno;
>  		} else if (ret == (int)data->submit) {
>  			data->submit = 0;
> 


diff --git a/src/queue.c b/src/queue.c
index 94f791e..7d6f31d 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -112,11 +112,15 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
 			flags = IORING_ENTER_GETEVENTS | data->get_flags;
 		if (data->submit)
 			sq_ring_needs_enter(ring, &flags);
-		if (data->wait_nr > nr_available || data->submit ||
-		    cq_overflow_flush)
-			ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
-					data->wait_nr, flags, data->arg,
-					data->sz);
+
+		if (data->wait_nr <= nr_available && !data->submit &&
+		    !cq_overflow_flush) {
+			err = ?;
+			break;
+		}
+		ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
+				data->wait_nr, flags, data->arg,
+				data->sz);
 		if (ret < 0) {
 			err = -errno;
 		} else if (ret == (int)data->submit) {

-- 
Pavel Begunkov

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-05 12:46                           ` Pavel Begunkov
@ 2021-02-05 14:42                             ` Jens Axboe
  2021-02-05 14:49                               ` Pavel Begunkov
  0 siblings, 1 reply; 24+ messages in thread
From: Jens Axboe @ 2021-02-05 14:42 UTC (permalink / raw)
  To: Pavel Begunkov, Victor Stewart; +Cc: io-uring

On 2/5/21 5:46 AM, Pavel Begunkov wrote:
> On 04/02/2021 16:50, Jens Axboe wrote:
>> On 2/3/21 4:49 AM, Pavel Begunkov wrote:
>>> On 02/02/2021 20:56, Pavel Begunkov wrote:
>>>> On 02/02/2021 20:48, Jens Axboe wrote:
>>>>> On 2/2/21 1:34 PM, Pavel Begunkov wrote:
>>>>>> On 02/02/2021 17:41, Pavel Begunkov wrote:
>>>>>>> On 02/02/2021 17:24, Jens Axboe wrote:
>>>>>>>> On 2/2/21 10:10 AM, Victor Stewart wrote:
>>>>>>>>>> Can you send the updated test app?
>>>>>>>>>
>>>>>>>>> https://gist.github.com/victorstewart/98814b65ed702c33480487c05b40eb56
>>>>>>>>>
>>>>>>>>> same link i just updated the same gist
>>>>>>>>
>>>>>>>> And how are you running it?
>>>>>>>
>>>>>>> with SQPOLL    with    FIXED FLAG -> FAILURE: failed with error = ???
>>>>>>> 	-> io_uring_wait_cqe_timeout() strangely returns -1, (-EPERM??)
>>>>>>
>>>>>> Ok, _io_uring_get_cqe() is just screwed twice
>>>>>>
>>>>>> TL;DR
>>>>>> we enter into it with submit=0, do an iteration, which decrements it,
>>>>>> then a second iteration passes submit=-1, which is returned back by
>>>>>> the kernel as a result and propagated back from liburing...
>>>>>
>>>>> Yep, that's what I came up with too. We really just need a clear way
>>>>> of knowing when to break out, and when to keep going. Eg if we've
>>>>> done a loop and don't end up calling the system call, then there's
>>>>> no point in continuing.
>>>>
>>>> We can bodge something up (and forget about it), and do much cleaner
>>>> for IORING_FEAT_EXT_ARG, because we don't have LIBURING_UDATA_TIMEOUT
>>>> reqs for it and so can remove peek and so on.
>>>
>>> This version looks reasonably simple, and even passes tests and all
>>> issues found by Victor's test. Didn't test it yet, but should behave
>>> similarly in regard of internal timeouts (pre IORING_FEAT_EXT_ARG).
>>>
>>> static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr,
>>> 			     struct get_data *data)
>>> {
>>> 	struct io_uring_cqe *cqe = NULL;
>>> 	int ret = 0, err;
>>>
>>> 	do {
>>> 		unsigned flags = 0;
>>> 		unsigned nr_available;
>>> 		bool enter = false;
>>>
>>> 		err = __io_uring_peek_cqe(ring, &cqe, &nr_available);
>>> 		if (err)
>>> 			break;
>>>
>>> 		/* IOPOLL won't proceed when there're not reaped CQEs */
>>> 		if (cqe && (ring->flags & IORING_SETUP_IOPOLL))
>>> 			data->wait_nr = 0;
>>>
>>> 		if (data->wait_nr > nr_available || cq_ring_needs_flush(ring)) {
>>> 			flags = IORING_ENTER_GETEVENTS | data->get_flags;
>>> 			enter = true;
>>> 		}
>>> 		if (data->submit) {
>>> 			sq_ring_needs_enter(ring, &flags);
>>> 			enter = true;
>>> 		}
>>> 		if (!enter)
>>> 			break;
>>>
>>> 		ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
>>> 					    data->wait_nr, flags, data->arg,
>>> 					    data->sz);
>>> 		if (ret < 0) {
>>> 			err = -errno;
>>> 			break;
>>> 		}
>>> 		data->submit -= ret;
>>> 	} while (1);
>>>
>>> 	*cqe_ptr = cqe;
>>> 	return err;
>>> }
>>
>> So here's my take on this - any rewrite of _io_uring_get_cqe() is going
>> to end up adding special cases, that's unfortunately just the nature of
>> the game. And since we're going to be doing a new liburing release very
>> shortly, this isn't a great time to add a rewrite of it. It'll certainly
>> introduce more bugs than it solves, and hence regressions, no matter how
>> careful we are.
>>
>> Hence my suggestion is to just patch this in a trivial kind of fashion,
>> even if it doesn't really make the function any prettier. But it'll be
>> safer for a release, and then we can rework the function after.
>>
>> With that in mind, here's my suggestion. The premise is if we go through
>> the loop and don't do io_uring_enter(), then there's no point in
>> continuing. That's the trivial fix.
> 
> Your idea but imho cleaner below.
> +1 comment inline

Shouldn't be hard, it was just a quick hack :-)

>> diff --git a/src/queue.c b/src/queue.c
>> index 94f791e..4161aa7 100644
>> --- a/src/queue.c
>> +++ b/src/queue.c
>> @@ -89,12 +89,13 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
>>  {
>>  	struct io_uring_cqe *cqe = NULL;
>>  	const int to_wait = data->wait_nr;
>> -	int ret = 0, err;
>> +	int err;
>>  
>>  	do {
>>  		bool cq_overflow_flush = false;
>>  		unsigned flags = 0;
>>  		unsigned nr_available;
>> +		int ret = -2;
>>  
>>  		err = __io_uring_peek_cqe(ring, &cqe, &nr_available);
>>  		if (err)
>> @@ -117,7 +118,9 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
>>  			ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
>>  					data->wait_nr, flags, data->arg,
>>  					data->sz);
>> -		if (ret < 0) {
>> +		if (ret == -2) {
>> +			break;
> 
> peek/wait_cqe expect that cqe_ptr is filled on return=0. Looks we need
> to return an error or hack up those functions.

Right good point, we'd need -EAGAIN.

>> +		} else if (ret < 0) {
>>  			err = -errno;
>>  		} else if (ret == (int)data->submit) {
>>  			data->submit = 0;
>>
> 
> 
> diff --git a/src/queue.c b/src/queue.c
> index 94f791e..7d6f31d 100644
> --- a/src/queue.c
> +++ b/src/queue.c
> @@ -112,11 +112,15 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
>  			flags = IORING_ENTER_GETEVENTS | data->get_flags;
>  		if (data->submit)
>  			sq_ring_needs_enter(ring, &flags);
> -		if (data->wait_nr > nr_available || data->submit ||
> -		    cq_overflow_flush)
> -			ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
> -					data->wait_nr, flags, data->arg,
> -					data->sz);
> +
> +		if (data->wait_nr <= nr_available && !data->submit &&
> +		    !cq_overflow_flush) {
> +			err = ?;

which I guess is the actual error missing from here?

> +			break;
> +		}
> +		ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
> +				data->wait_nr, flags, data->arg,
> +				data->sz);
>  		if (ret < 0) {
>  			err = -errno;
>  		} else if (ret == (int)data->submit) {
> 


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE
  2021-02-05 14:42                             ` Jens Axboe
@ 2021-02-05 14:49                               ` Pavel Begunkov
  0 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2021-02-05 14:49 UTC (permalink / raw)
  To: Jens Axboe, Victor Stewart; +Cc: io-uring

On 05/02/2021 14:42, Jens Axboe wrote:
> On 2/5/21 5:46 AM, Pavel Begunkov wrote:
>> On 04/02/2021 16:50, Jens Axboe wrote:
[...]
>>> Hence my suggestion is to just patch this in a trivial kind of fashion,
>>> even if it doesn't really make the function any prettier. But it'll be
>>> safer for a release, and then we can rework the function after.
>>>	
>>> With that in mind, here's my suggestion. The premise is if we go through
>>> the loop and don't do io_uring_enter(), then there's no point in
>>> continuing. That's the trivial fix.
>>
>> Your idea but imho cleaner below.
>> +1 comment inline
> 
> Shouldn't be hard, it was just a quick hack :-)

Yes, hopefully. That comment came straight from my ever failing
attempts to clean it up :) We will need to test well the final
version -- with and without IORING_FEAT_EXT_ARG.

[...]
>> diff --git a/src/queue.c b/src/queue.c
>> index 94f791e..7d6f31d 100644
>> --- a/src/queue.c
>> +++ b/src/queue.c
>> @@ -112,11 +112,15 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
>>  			flags = IORING_ENTER_GETEVENTS | data->get_flags;
>>  		if (data->submit)
>>  			sq_ring_needs_enter(ring, &flags);
>> -		if (data->wait_nr > nr_available || data->submit ||
>> -		    cq_overflow_flush)
>> -			ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
>> -					data->wait_nr, flags, data->arg,
>> -					data->sz);
>> +
>> +		if (data->wait_nr <= nr_available && !data->submit &&
>> +		    !cq_overflow_flush) {
>> +			err = ?;
> 
> which I guess is the actual error missing from here?

As a way to say "not tested at all". I just believe it's not all to that.

e.g. user calls wait/peek(nr=1, cqe);
__io_uring_peek_cqe() well succeeds, then

if (data->wait_nr && cqe)
	data->wait_nr--;

That might make us to skip enter, and we return -EAGAIN. 


> 
>> +			break;
>> +		}
>> +		ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
>> +				data->wait_nr, flags, data->arg,
>> +				data->sz);
>>  		if (ret < 0) {
>>  			err = -errno;
>>  		} else if (ret == (int)data->submit) {
>>

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-02-05 22:56 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-02  5:36 bug with fastpoll accept and sqpoll + IOSQE_FIXED_FILE Victor Stewart
2021-02-02 11:05 ` Pavel Begunkov
2021-02-02 11:23   ` Pavel Begunkov
2021-02-02 16:18     ` Victor Stewart
2021-02-02 16:30       ` Victor Stewart
2021-02-02 16:44         ` Jens Axboe
2021-02-02 17:10           ` Victor Stewart
2021-02-02 17:24             ` Jens Axboe
2021-02-02 17:41               ` Pavel Begunkov
2021-02-02 20:34                 ` Pavel Begunkov
2021-02-02 20:48                   ` Jens Axboe
2021-02-02 20:56                     ` Pavel Begunkov
2021-02-03 11:49                       ` Pavel Begunkov
2021-02-04 16:50                         ` Jens Axboe
2021-02-05 12:46                           ` Pavel Begunkov
2021-02-05 14:42                             ` Jens Axboe
2021-02-05 14:49                               ` Pavel Begunkov
2021-02-02 17:46               ` Jens Axboe
2021-02-02 17:50                 ` Victor Stewart
2021-02-02 17:57                   ` Jens Axboe
2021-02-02 17:46               ` Victor Stewart
2021-02-02 17:21       ` Pavel Begunkov
2021-02-02 17:30         ` Victor Stewart
2021-02-02 17:45           ` Victor Stewart

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.