linux-kernel.vger.kernel.org archive mirror
* "blk-mq: fix tag_get wait task can't be awakened" causes hangs
       [not found] <1643040870.3bwvk3sis4.none.ref@localhost>
@ 2022-01-24 16:24 ` Alex Xu (Hello71)
  2022-01-25  4:08   ` QiuLaibin
  2022-01-25 11:38   ` Daniel Palmer
  0 siblings, 2 replies; 6+ messages in thread
From: Alex Xu (Hello71) @ 2022-01-24 16:24 UTC (permalink / raw)
  To: Laibin Qiu, axboe, Andy Shevchenko, linux-block
  Cc: john.garry, ming.lei, martin.petersen, hare, akpm, bvanassche,
	linux-kernel

Hi,

Recently on torvalds master, I/O on USB flash drives started hanging 
here:

task:systemd-udevd   state:D stack:    0 pid:  374 ppid:   347 flags:0x00004000
Call Trace:
 <TASK>
 ? __schedule+0x319/0x4a0
 ? schedule+0x77/0xa0
 ? io_schedule+0x43/0x60
 ? blk_mq_get_tag+0x175/0x2b0
 ? mempool_alloc+0x33/0x170
 ? init_wait_entry+0x30/0x30
 ? __blk_mq_alloc_requests+0x1b4/0x220
 ? blk_mq_submit_bio+0x213/0x490
 ? submit_bio_noacct+0x22c/0x270
 ? xa_load+0x48/0x80
 ? mpage_readahead+0x114/0x130
 ? blkdev_fallocate+0x170/0x170
 ? read_pages+0x48/0x1d0
 ? page_cache_ra_unbounded+0xee/0x1f0
 ? force_page_cache_ra+0x68/0xc0
 ? filemap_read+0x18c/0x9a0
 ? blkdev_read_iter+0x4e/0x120
 ? vfs_read+0x265/0x2d0
 ? ksys_read+0x50/0xa0
 ? do_syscall_64+0x62/0x90
 ? do_user_addr_fault+0x271/0x3c0
 ? asm_exc_page_fault+0x8/0x30
 ? exc_page_fault+0x58/0x80
 ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 </TASK>

mount(8) hangs with a similar backtrace, making the device effectively 
unusable. It does not seem to affect NVMe or SATA attached drives. The 
affected drive does not support UAS. I don't currently have UAS drives 
to test with. The default I/O scheduler is set to noop.

I found that reverting 180dccb0dba4 ("blk-mq: fix tag_get wait 
task can't be awakened") appears to resolve the issue.

Let me know what other information is needed.

Cheers,
Alex.


* Re: "blk-mq: fix tag_get wait task can't be awakened" causes hangs
  2022-01-24 16:24 ` "blk-mq: fix tag_get wait task can't be awakened" causes hangs Alex Xu (Hello71)
@ 2022-01-25  4:08   ` QiuLaibin
  2022-01-25 15:15     ` Alex Xu (Hello71)
  2022-01-26 13:38     ` Jens Axboe
  2022-01-25 11:38   ` Daniel Palmer
  1 sibling, 2 replies; 6+ messages in thread
From: QiuLaibin @ 2022-01-25  4:08 UTC (permalink / raw)
  To: Alex Xu (Hello71), axboe, Andy Shevchenko, linux-block
  Cc: john.garry, ming.lei, martin.petersen, hare, akpm, bvanassche,
	linux-kernel


Hi Alex

1、Please help to import this structure:

blk_mq_tags <= request_queue->blk_mq_hw_ctx->blk_mq_tags

If there is no kernel dump, please check the value of

cat /sys/block/sda/mq/0/nr_tags
                __ <= Change it to the problem device

Also, how many block devices does lsblk show in total?

2、Please describe in detail how to reproduce the issue.

What type of USB device is it?

3、Please try the attached patch and see whether the issue still reproduces.

Thanks.


BR
Laibin

[-- Attachment #2: fix_hang.patch --]
[-- Type: text/plain, Size: 708 bytes --]

diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 6220fa67fb7e..09d293c30fd2 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -488,9 +488,13 @@ void sbitmap_queue_recalculate_wake_batch(struct sbitmap_queue *sbq,
 					    unsigned int users)
 {
 	unsigned int wake_batch;
+	unsigned int min_batch;
+	unsigned int depth = (sbq->sb.depth + users - 1) / users;
 
-	wake_batch = clamp_val((sbq->sb.depth + users - 1) /
-			users, 4, SBQ_WAKE_BATCH);
+	min_batch = sbq->sb.depth >= (4 * SBQ_WAIT_QUEUES) ? 4 : 1;
+
+	wake_batch = clamp_val(depth / SBQ_WAIT_QUEUES,
+			min_batch, SBQ_WAKE_BATCH);
 	__sbitmap_queue_update_wake_batch(sbq, wake_batch);
 }
 EXPORT_SYMBOL_GPL(sbitmap_queue_recalculate_wake_batch);
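
The arithmetic in this patch can be checked with a minimal user-space sketch, which is not kernel code. It assumes SBQ_WAIT_QUEUES and SBQ_WAKE_BATCH are both 8, as in include/linux/sbitmap.h around the time of this thread. The helpers clamp_uint, wake_batch_before and wake_batch_after are hypothetical names introduced only for illustration.

```c
#include <assert.h>

/* Assumed values, taken from include/linux/sbitmap.h of that period. */
#define SBQ_WAIT_QUEUES 8
#define SBQ_WAKE_BATCH  8

/* Stand-in for the kernel's clamp_val() on unsigned ints. */
static unsigned int clamp_uint(unsigned int v, unsigned int lo, unsigned int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Expression removed by the patch: lower bound is always 4. */
static unsigned int wake_batch_before(unsigned int depth, unsigned int users)
{
	assert(users > 0);
	return clamp_uint((depth + users - 1) / users, 4, SBQ_WAKE_BATCH);
}

/* Expression added by the patch: lower bound drops to 1 for shallow queues. */
static unsigned int wake_batch_after(unsigned int depth, unsigned int users)
{
	unsigned int min_batch = depth >= 4 * SBQ_WAIT_QUEUES ? 4 : 1;
	unsigned int per_user;

	assert(users > 0);
	per_user = (depth + users - 1) / users;
	return clamp_uint(per_user / SBQ_WAIT_QUEUES, min_batch, SBQ_WAKE_BATCH);
}
```

Under these assumptions, a queue depth of 1 with one user gives a wake batch of 4 before the patch and 1 after it, while a depth of 256 with one user gives 8 in both cases.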


* Re: "blk-mq: fix tag_get wait task can't be awakened" causes hangs
  2022-01-24 16:24 ` "blk-mq: fix tag_get wait task can't be awakened" causes hangs Alex Xu (Hello71)
  2022-01-25  4:08   ` QiuLaibin
@ 2022-01-25 11:38   ` Daniel Palmer
  1 sibling, 0 replies; 6+ messages in thread
From: Daniel Palmer @ 2022-01-25 11:38 UTC (permalink / raw)
  To: Alex Xu (Hello71)
  Cc: Laibin Qiu, axboe, Andy Shevchenko, linux-block, john.garry,
	ming.lei, martin.petersen, hare, Andrew Morton, bvanassche,
	Linux Kernel Mailing List

Hi Alex,

Same issue here. I just spent an hour bisecting the issue and hit the
same commit you did.

If I try to dd from a USB card reader to null I can see a few commands
wake up the thread that usb storage uses and then nothing more.
The user process then blocks forever waiting for the data it asked for.

Cheers,

Daniel


* Re: "blk-mq: fix tag_get wait task can't be awakened" causes hangs
  2022-01-25  4:08   ` QiuLaibin
@ 2022-01-25 15:15     ` Alex Xu (Hello71)
  2022-01-26 13:38     ` Jens Axboe
  1 sibling, 0 replies; 6+ messages in thread
From: Alex Xu (Hello71) @ 2022-01-25 15:15 UTC (permalink / raw)
  To: Andy Shevchenko, axboe, linux-block, QiuLaibin
  Cc: akpm, bvanassche, hare, john.garry, linux-kernel,
	martin.petersen, ming.lei, Daniel Palmer

Excerpts from QiuLaibin's message of January 24, 2022 11:08 pm:
> Hi Alex
> 
> 1、Please help to import this structure:
> 
> blk_mq_tags <= request_queue->blk_mq_hw_ctx->blk_mq_tags

I don't understand what you mean.

> If there is no kernel dump, help to see the value of
> 
> cat /sys/block/sda/mq/0/nr_tags
>                 __ <= Change it to the problem device

The affected device returns 1. My understanding is that mq does not work 
with legacy non-UAS devices.
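
The same sysfs value could also be read from a small C program. This is only a sketch under assumptions: nr_tags_path and read_uint_file are hypothetical helpers, the disk name and hardware-queue index are supplied by the caller, and error handling is reduced to returning -1.

```c
#include <stdio.h>

/*
 * Build "/sys/block/<disk>/mq/<hctx>/nr_tags" into buf.
 * Returns the path length, or -1 if buf is too small.
 */
static int nr_tags_path(char *buf, size_t len, const char *disk,
			unsigned int hctx)
{
	int n = snprintf(buf, len, "/sys/block/%s/mq/%u/nr_tags", disk, hctx);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Read a single unsigned decimal from a file; returns -1 on any failure. */
static long read_uint_file(const char *path)
{
	FILE *f = fopen(path, "r");
	unsigned int v;
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%u", &v) == 1;
	fclose(f);
	return ok ? (long)v : -1;
}
```

Building the path with nr_tags_path(buf, sizeof buf, "sdc", 0) and passing buf to read_uint_file would mirror the cat command quoted above for the device sdc.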

> And how many block devices in total by lsblk.

My device topology roughly looks like:

NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda           8:0    0 [snip]  0 disk
├─sda1        8:1    0 [snip]  0 part
├─sda2        8:2    0 [snip]  0 part
└─sda3        8:3    0 [snip]  0 part
sdb           8:16   0 [snip]  0 disk
├─sdb1        8:17   0 [snip]  0 part
├─sdb2        8:18   0 [snip]  0 part
├─sdb3        8:19   0 [snip]  0 part
└─sdb4        8:20   0 [snip]  0 part
sdc           8:32   1 [snip]  0 disk
├─sdc1        8:33   1 [snip]  0 part
└─sdc2        8:34   1 [snip]  0 part
nvme0n1     259:0    0 [snip]  0 disk
├─nvme0n1p1 259:1    0 [snip]  0 part
└─nvme0n1p2 259:2    0 [snip]  0 part
  └─root    254:0    0 [snip]  0 crypt /

> 2、Please describe in detail how to reproduce the issue,

1. Plug in the device.
2. Execute Show Blocked Tasks. udev is stuck.

> And what type of USB device?

It is a cheap unbranded USB flash drive.

> 3、Please help to try the attachment patch and see if it can be reproduced.

From a quick test, it appears to resolve the issue.

> Thanks.

Cheers,
Alex.


* Re: "blk-mq: fix tag_get wait task can't be awakened" causes hangs
  2022-01-25  4:08   ` QiuLaibin
  2022-01-25 15:15     ` Alex Xu (Hello71)
@ 2022-01-26 13:38     ` Jens Axboe
  2022-01-27  1:24       ` QiuLaibin
  1 sibling, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2022-01-26 13:38 UTC (permalink / raw)
  To: QiuLaibin, Alex Xu (Hello71), Andy Shevchenko, linux-block
  Cc: john.garry, ming.lei, martin.petersen, hare, akpm, bvanassche,
	linux-kernel

On 1/24/22 9:08 PM, QiuLaibin wrote:
> Hi Alex
> 
> 1、Please help to import this structure:
> 
> blk_mq_tags <= request_queue->blk_mq_hw_ctx->blk_mq_tags
> 
> If there is no kernel dump, help to see the value of
> 
> cat /sys/block/sda/mq/0/nr_tags
>                 __ <= Change it to the problem device
> 
> And how many block devices in total by lsblk.
> 
> 2、Please describe in detail how to reproduce the issue,
> 
> And what type of USB device?
> 
> 3、Please help to try the attachment patch and see if it can be reproduced.

Any progress on this? I strongly suspect that any QD=1 setup would
trivially show the issue, based on the reports.
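
One way to picture why a queue depth of 1 could stall is the toy model below. It is a simplification, not the real sbitmap logic: it assumes a single wait queue whose counter starts at wake_batch, is decremented once per completed request, and triggers a wake-up only when it reaches zero, with no new submissions while the waiter sleeps. The name waiter_is_woken is hypothetical.

```c
#include <stdbool.h>

/*
 * Toy model only. All `depth` tags are in flight and one task sleeps
 * waiting for a tag. Each completion decrements a counter that starts
 * at wake_batch (assumed to be at least 1). The sleeper is woken only
 * if the counter reaches zero before the in-flight requests run out.
 */
static bool waiter_is_woken(unsigned int depth, unsigned int wake_batch)
{
	unsigned int wait_cnt = wake_batch;
	unsigned int completions;

	for (completions = 0; completions < depth; completions++) {
		if (--wait_cnt == 0)
			return true;
	}
	return false; /* every request completed, counter never hit zero */
}
```

In this model, a depth of 1 with a wake batch of 4 never wakes the waiter, whereas a depth of 1 with a wake batch of 1 does.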

-- 
Jens Axboe



* Re: "blk-mq: fix tag_get wait task can't be awakened" causes hangs
  2022-01-26 13:38     ` Jens Axboe
@ 2022-01-27  1:24       ` QiuLaibin
  0 siblings, 0 replies; 6+ messages in thread
From: QiuLaibin @ 2022-01-27  1:24 UTC (permalink / raw)
  To: Jens Axboe, Alex Xu (Hello71), Andy Shevchenko, linux-block
  Cc: john.garry, ming.lei, martin.petersen, hare, akpm, bvanassche,
	linux-kernel


Hi

On 2022/1/26 21:38, Jens Axboe wrote:
> On 1/24/22 9:08 PM, QiuLaibin wrote:
>> Hi Alex
>>
>> 1、Please help to import this structure:
>>
>> blk_mq_tags <= request_queue->blk_mq_hw_ctx->blk_mq_tags
>>
>> If there is no kernel dump, help to see the value of
>>
>> cat /sys/block/sda/mq/0/nr_tags
>>                  __ <= Change it to the problem device
>>
>> And how many block devices in total by lsblk.
>>
>> 2、Please describe in detail how to reproduce the issue,
>>
>> And what type of USB device?
>>
>> 3、Please help to try the attachment patch and see if it can be reproduced.
> 
> Any progress on this? I strongly suspect that any QD=1 setup would
> trivially show the issue, based on the reports.
Yes, QD = 1 in Alex Xu's environment, where the issue reproduces every
time. I'm trying to set up a reliable reproducer locally, and I will
submit a fixed patch as soon as possible.

> 

BR
Laibin


