io-uring.vger.kernel.org archive mirror
* regression: fixed file hang
@ 2020-05-13 18:45 Jens Axboe
  2020-05-13 19:04 ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2020-05-13 18:45 UTC (permalink / raw)
  To: io-uring, Xiaoguang Wang

Hi Xiaoguang,

Was doing some other testing today, and noticed a hang with fixed files.
I did a bit of poor man's bisecting, and came up with this one:

commit 0558955373023b08f638c9ede36741b0e4200f58
Author: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Date:   Tue Mar 31 14:05:18 2020 +0800

    io_uring: refactor file register/unregister/update handling

If I revert this one, the test completes fine.

The test case is pretty simple, just run t/io_uring from the fio
repo, default settings:

[ fio] # t/io_uring /dev/nvme0n1p2
Added file /dev/nvme0n1p2
sq_ring ptr = 0x0x7fe1cb81f000
sqes ptr    = 0x0x7fe1cb81d000
cq_ring ptr = 0x0x7fe1cb81b000
polled=1, fixedbufs=1, buffered=0 QD=128, sq_ring=128, cq_ring=256
submitter=345
IOPS=240096, IOS/call=32/31, inflight=91 (91)
IOPS=249696, IOS/call=32/31, inflight=99 (99)
^CExiting on signal 2

and ctrl-c it after a second or so. You'll then notice a kworker that
is stuck in io_sqe_files_unregister(), here:

	/* wait for all refs nodes to complete */
	wait_for_completion(&data->done);

I'll try and debug this a bit, and for some reason it doesn't trigger
with the liburing fixed file setup. Just wanted to throw this out there,
so if you have cycles, please do take a look at it.
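
To illustrate why a wait like the one in io_sqe_files_unregister() can
block forever, here is a minimal user-space model in C. It is a sketch
under assumptions, not the kernel code: the struct and every function
name below are hypothetical, and the thread does not say which
reference goes missing. It only models the general pattern in which a
completion fires when the last reference is dropped, so a single
reference that is never put leaves the waiter stuck.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcount-gated completion (not kernel code). */
struct files_data_model {
	int refs;	/* initial ref plus one per in-flight request */
	bool done;	/* stands in for the completion being signalled */
};

static void model_init(struct files_data_model *d)
{
	d->refs = 1;	/* the initial reference, dropped at unregister */
	d->done = false;
}

static void model_get(struct files_data_model *d)
{
	assert(!d->done);
	d->refs++;
}

static void model_put(struct files_data_model *d)
{
	assert(d->refs > 0);
	if (--d->refs == 0)
		d->done = true;	/* analogous to complete(&data->done) */
}

/* A wait on the completion returns only once 'done' is set. */
static bool model_wait_would_block(const struct files_data_model *d)
{
	return !d->done;
}

/*
 * Take 'submitted' references, drop 'completed' of them, then drop the
 * initial reference as an unregister would. Returns true if the
 * subsequent wait would never return.
 */
static bool unregister_would_hang(int submitted, int completed)
{
	struct files_data_model d;
	int i;

	assert(completed <= submitted);
	model_init(&d);
	for (i = 0; i < submitted; i++)
		model_get(&d);
	for (i = 0; i < completed; i++)
		model_put(&d);
	model_put(&d);	/* unregister drops the initial reference */
	return model_wait_would_block(&d);
}
```

In this model, unregister_would_hang(91, 91) reports no hang, while
unregister_would_hang(91, 90) reports one: a single request whose
reference is never dropped is enough to keep the waiter blocked.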

-- 
Jens Axboe



* Re: regression: fixed file hang
  2020-05-13 18:45 regression: fixed file hang Jens Axboe
@ 2020-05-13 19:04 ` Jens Axboe
  2020-05-14  5:33   ` Xiaoguang Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2020-05-13 19:04 UTC (permalink / raw)
  To: io-uring, Xiaoguang Wang

On 5/13/20 12:45 PM, Jens Axboe wrote:
> I'll try and debug this a bit, and for some reason it doesn't trigger
> with the liburing fixed file setup. Just wanted to throw this out there,
> so if you have cycles, please do take a look at it.

https://lore.kernel.org/io-uring/015659db-626c-5a78-6746-081a45175f45@kernel.dk/T/#u


-- 
Jens Axboe



* Re: regression: fixed file hang
  2020-05-13 19:04 ` Jens Axboe
@ 2020-05-14  5:33   ` Xiaoguang Wang
  2020-05-14 15:43     ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Xiaoguang Wang @ 2020-05-14  5:33 UTC (permalink / raw)
  To: Jens Axboe, io-uring

hi,

> On 5/13/20 12:45 PM, Jens Axboe wrote:
>> I'll try and debug this a bit, and for some reason it doesn't trigger
>> with the liburing fixed file setup. Just wanted to throw this out there,
>> so if you have cycles, please do take a look at it.
> 
> https://lore.kernel.org/io-uring/015659db-626c-5a78-6746-081a45175f45@kernel.dk/T/#u
Thanks for this fix, and sorry, my bad: I didn't cover this case when
sending the patches. Could you share the test cases or test method you
use when developing io_uring? I usually just run the test cases under
liburing/test, but it seems that's not enough.

Regards,
Xiaoguang Wang

* Re: regression: fixed file hang
  2020-05-14  5:33   ` Xiaoguang Wang
@ 2020-05-14 15:43     ` Jens Axboe
  0 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2020-05-14 15:43 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring

On 5/13/20 11:33 PM, Xiaoguang Wang wrote:
> hi,
> 
>> On 5/13/20 12:45 PM, Jens Axboe wrote:
>>> I'll try and debug this a bit, and for some reason it doesn't trigger
>>> with the liburing fixed file setup. Just wanted to throw this out there,
>>> so if you have cycles, please do take a look at it.
>>
>> https://lore.kernel.org/io-uring/015659db-626c-5a78-6746-081a45175f45@kernel.dk/T/#u
> Thanks for this fix, and sorry, my bad: I didn't cover this case when
> sending the patches. Could you share the test cases or test method you
> use when developing io_uring? I usually just run the test cases under
> liburing/test, but it seems that's not enough.

It really should be enough. The case that triggered this issue is the
combination of fixed files and polled IO. I'll need to add that to e.g.
test/read-write.c, which already covers a lot of combinations.

The only issue is that polled IO only works on some files / devices.
I've been meaning to add a config file to the liburing regression tests,
so you can configure a device to use for testing. With an NVMe device in
there, we should be able to have full coverage.
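
As a rough sketch of that idea, the C fragment below enumerates feature
combinations and skips the polled ones unless a test device has been
configured. Everything in it is an assumption made for illustration:
struct rw_combo, build_combos(), covers_fixed_polled() and the
polled_dev parameter are hypothetical names, and it does not reproduce
the actual parameters of test/read-write.c or any existing liburing
config format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one test combination. */
struct rw_combo {
	bool fixed_files;
	bool fixed_bufs;
	bool polled;
};

/*
 * Fill 'out' (up to 'cap' entries) with the combinations that can run.
 * 'polled_dev' is a device path taken from a (hypothetical) test config,
 * or NULL if none is configured; polled combinations are skipped then.
 * Returns the number of runnable combinations.
 */
static size_t build_combos(const char *polled_dev, struct rw_combo *out,
			   size_t cap)
{
	size_t n = 0;
	int mask;

	for (mask = 0; mask < 8; mask++) {
		struct rw_combo c = {
			.fixed_files = (mask & 1) != 0,
			.fixed_bufs = (mask & 2) != 0,
			.polled = (mask & 4) != 0,
		};

		if (c.polled && polled_dev == NULL)
			continue;	/* polled IO needs a suitable device */
		if (n < cap)
			out[n] = c;
		n++;
	}
	return n;
}

/* True if the list contains a fixed files plus polled IO combination. */
static bool covers_fixed_polled(const struct rw_combo *combos, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (combos[i].fixed_files && combos[i].polled)
			return true;
	return false;
}
```

Without a configured device this yields only the four non-polled
combinations, so the fixed files plus polled IO case is never
exercised; with a device path supplied, all eight are listed.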

-- 
Jens Axboe

