* [PATCH 0/2] fix crash by reading /proc/diskstats @ 2022-01-20 10:52 Daniel Wagner 2022-01-20 10:52 ` [PATCH 1/2] block: remove commented out code from diskstats_show Daniel Wagner 2022-01-20 10:52 ` [PATCH 2/2] block: hold queue lock while iterating in diskstats_show Daniel Wagner 0 siblings, 2 replies; 17+ messages in thread From: Daniel Wagner @ 2022-01-20 10:52 UTC (permalink / raw) To: linux-block; +Cc: linux-kernel, Jens Axboe, Hannes Reinecke, Daniel Wagner During nvme-tcp failover tests in an HA setup a crash occurred. iostat was running in the background while the test was executed. I am not sure if the fix is correct, and in particular whether using blk_queue_enter() is the right thing. When staring at the code snippet I saw that older versions used blk_queue_enter(), but it was eventually replaced by rcu_read_lock(), as far as I understand at the moment... So any tip here would be highly appreciated. Daniel Daniel Wagner (2): block: remove commented out code from diskstats_show block: hold queue lock while iterating in diskstats_show block/genhd.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) -- 2.29.2 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/2] block: remove commented out code from diskstats_show 2022-01-20 10:52 [PATCH 0/2] fix crash by reading /proc/diskstats Daniel Wagner @ 2022-01-20 10:52 ` Daniel Wagner 2022-01-20 11:02 ` Hannes Reinecke 2022-01-20 10:52 ` [PATCH 2/2] block: hold queue lock while iterating in diskstats_show Daniel Wagner 1 sibling, 1 reply; 17+ messages in thread From: Daniel Wagner @ 2022-01-20 10:52 UTC (permalink / raw) To: linux-block; +Cc: linux-kernel, Jens Axboe, Hannes Reinecke, Daniel Wagner The diskstats format is ABI, so we just can't add a new header. The code snippet has been commented out since at least v2.6.12-rc2. Remove it. Signed-off-by: Daniel Wagner <dwagner@suse.de> --- block/genhd.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 626c8406f21a..c9d4386dd177 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1162,14 +1162,6 @@ static int diskstats_show(struct seq_file *seqf, void *v) struct disk_stats stat; unsigned long idx; - /* - if (&disk_to_dev(gp)->kobj.entry == block_class.devices.next) - seq_puts(seqf, "major minor name" - " rio rmerge rsect ruse wio wmerge " - "wsect wuse running use aveq" - "\n\n"); - */ - rcu_read_lock(); xa_for_each(&gp->part_tbl, idx, hd) { if (bdev_is_partition(hd) && !bdev_nr_sectors(hd)) -- 2.29.2 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] block: remove commented out code from diskstats_show 2022-01-20 10:52 ` [PATCH 1/2] block: remove commented out code from diskstats_show Daniel Wagner @ 2022-01-20 11:02 ` Hannes Reinecke 2022-03-04 15:38 ` Daniel Wagner 0 siblings, 1 reply; 17+ messages in thread From: Hannes Reinecke @ 2022-01-20 11:02 UTC (permalink / raw) To: Daniel Wagner, linux-block; +Cc: linux-kernel, Jens Axboe On 1/20/22 11:52 AM, Daniel Wagner wrote: > The diskstats format is ABI, so we just can't add a new header. The > code snippet has been commented out since at least v2.6.12-rc2. > Remove it. > > Signed-off-by: Daniel Wagner <dwagner@suse.de> > --- > block/genhd.c | 8 -------- > 1 file changed, 8 deletions(-) > > diff --git a/block/genhd.c b/block/genhd.c > index 626c8406f21a..c9d4386dd177 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -1162,14 +1162,6 @@ static int diskstats_show(struct seq_file *seqf, void *v) > struct disk_stats stat; > unsigned long idx; > > - /* > - if (&disk_to_dev(gp)->kobj.entry == block_class.devices.next) > - seq_puts(seqf, "major minor name" > - " rio rmerge rsect ruse wio wmerge " > - "wsect wuse running use aveq" > - "\n\n"); > - */ > - > rcu_read_lock(); > xa_for_each(&gp->part_tbl, idx, hd) { > if (bdev_is_partition(hd) && !bdev_nr_sectors(hd)) > Reviewed-by: Hannes Reinecke <hare@suse.de> Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] block: remove commented out code from diskstats_show 2022-01-20 11:02 ` Hannes Reinecke @ 2022-03-04 15:38 ` Daniel Wagner 0 siblings, 0 replies; 17+ messages in thread From: Daniel Wagner @ 2022-03-04 15:38 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, linux-kernel, Hannes Reinecke On Thu, Jan 20, 2022 at 12:02:49PM +0100, Hannes Reinecke wrote: > On 1/20/22 11:52 AM, Daniel Wagner wrote: > > The diskstats format is ABI, so we just can't add a new header. The > > code snippet has been commented out since at least v2.6.12-rc2. > > Remove it. > > > Reviewed-by: Hannes Reinecke <hare@suse.de> Any chance to queue this patch up? ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 2/2] block: hold queue lock while iterating in diskstats_show 2022-01-20 10:52 [PATCH 0/2] fix crash by reading /proc/diskstats Daniel Wagner 2022-01-20 10:52 ` [PATCH 1/2] block: remove commented out code from diskstats_show Daniel Wagner @ 2022-01-20 10:52 ` Daniel Wagner 2022-01-20 11:18 ` Hannes Reinecke 2022-01-20 12:56 ` Ming Lei 1 sibling, 2 replies; 17+ messages in thread From: Daniel Wagner @ 2022-01-20 10:52 UTC (permalink / raw) To: linux-block; +Cc: linux-kernel, Jens Axboe, Hannes Reinecke, Daniel Wagner The request queues can be freed while we operate on them. Make sure we hold a reference when using blk_mq_queue_tag_busy_iter. RIP: blk_mq_queue_tag_busy_iter Call Trace: ? blk_mq_hctx_mark_pending ? diskstats_show ? blk_mq_hctx_mark_pending blk_mq_in_flight diskstats_show ? klist_next seq_read proc_reg_read vfs_read ksys_read do_syscall_64 entry_SYSCALL_64_after_hwframe Signed-off-by: Daniel Wagner <dwagner@suse.de> --- block/genhd.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/block/genhd.c b/block/genhd.c index c9d4386dd177..0e163055a4e6 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1167,10 +1167,13 @@ static int diskstats_show(struct seq_file *seqf, void *v) if (bdev_is_partition(hd) && !bdev_nr_sectors(hd)) continue; part_stat_read_all(hd, &stat); + if (blk_queue_enter(gp->queue, BLK_MQ_REQ_NOWAIT)) + continue; if (queue_is_mq(gp->queue)) inflight = blk_mq_in_flight(gp->queue, hd); else inflight = part_in_flight(hd); + blk_queue_exit(gp->queue); seq_printf(seqf, "%4d %7d %pg " "%lu %lu %lu %u " -- 2.29.2 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show 2022-01-20 10:52 ` [PATCH 2/2] block: hold queue lock while iterating in diskstats_show Daniel Wagner @ 2022-01-20 11:18 ` Hannes Reinecke 2022-01-20 12:41 ` Daniel Wagner 2022-01-20 12:56 ` Ming Lei 1 sibling, 1 reply; 17+ messages in thread From: Hannes Reinecke @ 2022-01-20 11:18 UTC (permalink / raw) To: Daniel Wagner, linux-block; +Cc: linux-kernel, Jens Axboe On 1/20/22 11:52 AM, Daniel Wagner wrote: > The request queues can be freed while we operate on them. Make sure we > hold a reference when using blk_mq_queue_tag_busy_iter. > > RIP: blk_mq_queue_tag_busy_iter > Call Trace: > ? blk_mq_hctx_mark_pending > ? diskstats_show > ? blk_mq_hctx_mark_pending > blk_mq_in_flight > diskstats_show > ? klist_next > seq_read > proc_reg_read > vfs_read > ksys_read > do_syscall_64 > entry_SYSCALL_64_after_hwframe > > Signed-off-by: Daniel Wagner <dwagner@suse.de> > --- > block/genhd.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/block/genhd.c b/block/genhd.c > index c9d4386dd177..0e163055a4e6 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -1167,10 +1167,13 @@ static int diskstats_show(struct seq_file *seqf, void *v) > if (bdev_is_partition(hd) && !bdev_nr_sectors(hd)) > continue; > part_stat_read_all(hd, &stat); > + if (blk_queue_enter(gp->queue, BLK_MQ_REQ_NOWAIT)) > + continue; > if (queue_is_mq(gp->queue)) > inflight = blk_mq_in_flight(gp->queue, hd); > else > inflight = part_in_flight(hd); > + blk_queue_exit(gp->queue); > > seq_printf(seqf, "%4d %7d %pg " > "%lu %lu %lu %u " > While this looks good, it seems to have been cut against an earlier branch; the code in question looks different in my copy. Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), GF: Felix Imendörffer ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show 2022-01-20 11:18 ` Hannes Reinecke @ 2022-01-20 12:41 ` Daniel Wagner 0 siblings, 0 replies; 17+ messages in thread From: Daniel Wagner @ 2022-01-20 12:41 UTC (permalink / raw) To: Hannes Reinecke; +Cc: linux-block, linux-kernel, Jens Axboe On Thu, Jan 20, 2022 at 12:18:24PM +0100, Hannes Reinecke wrote: > On 1/20/22 11:52 AM, Daniel Wagner wrote: > While this looks good, it seems to have been cut against an earlier > branch; the code in question looks different in my copy. This was cut against linux-block/for-next. Let me know which tree/branch I should use. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show 2022-01-20 10:52 ` [PATCH 2/2] block: hold queue lock while iterating in diskstats_show Daniel Wagner 2022-01-20 11:18 ` Hannes Reinecke @ 2022-01-20 12:56 ` Ming Lei 2022-01-20 13:19 ` Daniel Wagner 1 sibling, 1 reply; 17+ messages in thread From: Ming Lei @ 2022-01-20 12:56 UTC (permalink / raw) To: Daniel Wagner; +Cc: linux-block, linux-kernel, Jens Axboe, Hannes Reinecke On Thu, Jan 20, 2022 at 11:52:48AM +0100, Daniel Wagner wrote: > The request queues can be freed while we operate on them. Make sure we It shouldn't be freed here. > hold a reference when using blk_mq_queue_tag_busy_iter. > > RIP: blk_mq_queue_tag_busy_iter Any chance to post the whole stack trace? > Call Trace: > ? blk_mq_hctx_mark_pending > ? diskstats_show > ? blk_mq_hctx_mark_pending > blk_mq_in_flight > diskstats_show > ? klist_next > seq_read > proc_reg_read > vfs_read > ksys_read > do_syscall_64 > entry_SYSCALL_64_after_hwframe > > Signed-off-by: Daniel Wagner <dwagner@suse.de> > --- > block/genhd.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/block/genhd.c b/block/genhd.c > index c9d4386dd177..0e163055a4e6 100644 > --- a/block/genhd.c > +++ b/block/genhd.c > @@ -1167,10 +1167,13 @@ static int diskstats_show(struct seq_file *seqf, void *v) > if (bdev_is_partition(hd) && !bdev_nr_sectors(hd)) > continue; > part_stat_read_all(hd, &stat); > + if (blk_queue_enter(gp->queue, BLK_MQ_REQ_NOWAIT)) > + continue; The disk device's refcnt is supposed to be grabbed by class device iterator code, so queue shouldn't be released in diskstats_show(). Not to mention blk_queue_enter() won't prevent disk from being released... thanks, Ming ^ permalink raw reply [flat|nested] 17+ messages in thread
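The lifetime rule under debate here can be sketched in a few lines of userspace C. This is a toy model, not kernel code: `toy_queue`, its `usage` counter, and the `frozen` flag are invented stand-ins for a request queue, its `q_usage_counter`, and a queue being frozen for teardown. It only illustrates what a NOWAIT-style enter guard buys you — a failed enter instead of touching a draining queue — and, as Ming points out, such a guard says nothing about the disk object's own refcount.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a request queue with a usage-counter guard. */
struct toy_queue {
	int  usage;    /* stand-in for q->q_usage_counter */
	bool frozen;   /* stand-in for a queue frozen for teardown */
};

/* Mirrors the blk_queue_enter(q, BLK_MQ_REQ_NOWAIT) idea from the patch:
 * fail immediately instead of waiting for an unfreeze. */
static int toy_queue_enter_nowait(struct toy_queue *q)
{
	if (q->frozen)
		return -1;	/* caller must skip this queue */
	q->usage++;
	return 0;
}

/* Mirrors blk_queue_exit(): drop the reference taken above. */
static void toy_queue_exit(struct toy_queue *q)
{
	q->usage--;
}
```

A caller would wrap the in-flight accounting in an enter/exit pair and `continue` on failure, exactly as the hunk in patch 2/2 does.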
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show 2022-01-20 12:56 ` Ming Lei @ 2022-01-20 13:19 ` Daniel Wagner 2022-01-20 13:51 ` Ming Lei 0 siblings, 1 reply; 17+ messages in thread From: Daniel Wagner @ 2022-01-20 13:19 UTC (permalink / raw) To: Ming Lei; +Cc: linux-block, linux-kernel, Jens Axboe, Hannes Reinecke Hi Ming, On Thu, Jan 20, 2022 at 08:56:03PM +0800, Ming Lei wrote: > On Thu, Jan 20, 2022 at 11:52:48AM +0100, Daniel Wagner wrote: > > The request queues can be freed while we operate on them. Make sure we > > It shouldn't be freed here. No, that is not what I tried to say. We somehow end up here operating on pointers which are invalid. > > hold a reference when using blk_mq_queue_tag_busy_iter. > > > > RIP: blk_mq_queue_tag_busy_iter > > Any chance to post the whole stack trace? [323465.820735] nvme nvme3: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 100.102.253.10:8009 [323465.821654] nvme nvme3: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery" [323465.850291] nvme nvme3: creating 80 I/O queues. [323465.974196] nvme nvme3: mapped 80/0/0 default/read/poll queues. [323466.031107] nvme nvme3: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:a2ee06317d1845EE5227", addr 100.102.251.11:4420 [323466.093520] nvme nvme4: creating 80 I/O queues. [323466.131740] nvme nvme4: mapped 80/0/0 default/read/poll queues. [323466.194742] nvme nvme4: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:a2ee06317d1845EE5227", addr 100.102.252.12:4420 [323466.207338] nvme nvme5: creating 80 I/O queues. [323466.219473] nvme nvme5: mapped 80/0/0 default/read/poll queues. [323466.256877] nvme nvme5: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:a2ee06317d1845EE5227", addr 100.102.253.12:4420 [323466.295990] nvme nvme6: creating 80 I/O queues. [323466.308171] nvme nvme6: mapped 80/0/0 default/read/poll queues. 
[323466.359055] nvme nvme6: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:a2ee06317d1845EE5227", addr 100.102.251.12:4420 [323466.366912] nvme nvme7: creating 80 I/O queues. [323466.379132] nvme nvme7: mapped 80/0/0 default/read/poll queues. [323466.432957] nvme nvme7: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:a2ee06317d1845EE5227", addr 100.102.252.11:4420 [323466.458548] nvme nvme8: creating 80 I/O queues. [323466.474414] nvme nvme8: mapped 80/0/0 default/read/poll queues. [323466.548254] nvme nvme8: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:a2ee06317d1845EE5227", addr 100.102.253.11:4420 [323467.255527] BUG: kernel NULL pointer dereference, address: 0000000000000000 [323467.255538] #PF: supervisor read access in kernel mode [323467.255541] #PF: error_code(0x0000) - not-present page [323467.255544] PGD 0 P4D 0 [323467.255550] Oops: 0000 [#1] SMP PTI [323467.255555] CPU: 13 PID: 17640 Comm: iostat Kdump: loaded Tainted: G IOE X 5.3.18-59.27-default #1 SLE15-SP3 [323467.255559] Hardware name: Dell Inc. 
PowerEdge R940xa/08XR9M, BIOS 2.12.2 07/12/2021 [323467.255569] RIP: 0010:blk_mq_queue_tag_busy_iter+0x1e4/0x2e0 [323467.255573] Code: 89 44 24 48 41 8b 1f 85 db 0f 84 d1 fe ff ff 4c 89 6c 24 20 4c 89 74 24 28 89 e8 44 29 e3 48 8d 04 40 48 c1 e0 06 49 03 47 10 <48> 8b 10 39 d3 0f 43 da 48 8b 90 80 00 00 00 41 01 dc 48 f7 d2 48 [323467.255583] RSP: 0018:ffffb701aefb7c20 EFLAGS: 00010246 [323467.255587] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 [323467.255592] RDX: ffff954f43c14c00 RSI: ffffffffbaa442c0 RDI: ffff954fb3749010 [323467.255597] RBP: 0000000000000000 R08: 0000000800000000 R09: 00000008ffffffff [323467.255602] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [323467.255607] R13: ffff9567b2200000 R14: ffff954798c09bc0 R15: ffff954798c09c20 [323467.255612] FS: 00007f6032444580(0000) GS:ffff956fbf6c0000(0000) knlGS:0000000000000000 [323467.255616] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [323467.255618] CR2: 0000000000000000 CR3: 00000037bc7c0001 CR4: 00000000007726e0 [323467.255621] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [323467.255624] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [323467.255626] PKRU: 55555554 [323467.255628] Call Trace: [323467.255641] ? blk_mq_hctx_mark_pending+0x70/0x70 [323467.255647] ? diskstats_show+0x207/0x2a0 [323467.255651] ? blk_mq_hctx_mark_pending+0x70/0x70 [323467.255654] blk_mq_in_flight+0x35/0x60 [323467.255659] diskstats_show+0x6d/0x2a0 [323467.255669] ? klist_next+0x88/0x110 [323467.255676] seq_read+0x2a6/0x3f0 [323467.255682] proc_reg_read+0x39/0x60 [323467.255690] vfs_read+0x89/0x140 [323467.255693] ksys_read+0xa1/0xe0 [323467.255702] do_syscall_64+0x5b/0x1e0 [323467.255709] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > The disk device's refcnt is supposed to be grabbed by class device iterator code, so queue shouldn't be released in diskstats_show(). Okay, but it's not the disk device which is invalid as far as I understand. 
It's the request_queue which is bogus. > Not to mention blk_queue_enter() won't prevent disk from being released... Understood. Daniel ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show 2022-01-20 13:19 ` Daniel Wagner @ 2022-01-20 13:51 ` Ming Lei 2022-01-20 14:01 ` Daniel Wagner 0 siblings, 1 reply; 17+ messages in thread From: Ming Lei @ 2022-01-20 13:51 UTC (permalink / raw) To: Daniel Wagner; +Cc: linux-block, linux-kernel, Jens Axboe, Hannes Reinecke On Thu, Jan 20, 2022 at 02:19:36PM +0100, Daniel Wagner wrote: > Hi Ming, > > On Thu, Jan 20, 2022 at 08:56:03PM +0800, Ming Lei wrote: > [323467.255527] BUG: kernel NULL pointer dereference, address: 0000000000000000 > [323467.255538] #PF: supervisor read access in kernel mode > [323467.255541] #PF: error_code(0x0000) - not-present page > [323467.255544] PGD 0 P4D 0 > [323467.255550] Oops: 0000 [#1] SMP PTI > [323467.255555] CPU: 13 PID: 17640 Comm: iostat Kdump: loaded Tainted: G IOE X 5.3.18-59.27-default #1 SLE15-SP3 > [323467.255559] Hardware name: Dell Inc. PowerEdge R940xa/08XR9M, BIOS 2.12.2 07/12/2021 > [323467.255569] RIP: 0010:blk_mq_queue_tag_busy_iter+0x1e4/0x2e0 Then Can you figure out where blk_mq_queue_tag_busy_iter+0x1e4 points to in source code? And what is NULL pointer? With this kind of info, we may know the root cause. Thanks, Ming ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show 2022-01-20 13:51 ` Ming Lei @ 2022-01-20 14:01 ` Daniel Wagner 2022-01-20 14:06 ` Daniel Wagner ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Daniel Wagner @ 2022-01-20 14:01 UTC (permalink / raw) To: Ming Lei; +Cc: linux-block, linux-kernel, Jens Axboe, Hannes Reinecke [-- Attachment #1: Type: text/plain, Size: 1120 bytes --] On Thu, Jan 20, 2022 at 09:51:18PM +0800, Ming Lei wrote: > Then Can you figure out where blk_mq_queue_tag_busy_iter+0x1e4 points to > in source code? And what is NULL pointer? Here we go: /usr/src/debug/kernel-default-5.3.18-59.27.1.x86_64/linux-5.3/linux-obj/../include/linux/sbitmap.h: 249 0xffffffffb244a254 <blk_mq_queue_tag_busy_iter+484>: mov (%rax),%rdx 234static inline void __sbitmap_for_each_set(struct sbitmap *sb, 235 unsigned int start, 236 sb_for_each_fn fn, void *data) 237{ 238 unsigned int index; 239 unsigned int nr; 240 unsigned int scanned = 0; 241 242 if (start >= sb->depth) 243 start = 0; 244 index = SB_NR_TO_INDEX(sb, start); 245 nr = SB_NR_TO_BIT(sb, start); 246 247 while (scanned < sb->depth) { 248 unsigned long word; 249 unsigned int depth = min_t(unsigned int, 250 sb->map[index].depth - nr, 251 sb->depth - scanned); [-- Attachment #2: blk_mq_queue_tag_busy_iter.dis --] [-- Type: application/vnd.mobius.dis, Size: 15457 bytes --] ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show 2022-01-20 14:01 ` Daniel Wagner @ 2022-01-20 14:06 ` Daniel Wagner 2022-01-20 14:21 ` Christoph Hellwig 2022-01-20 15:37 ` Ming Lei 2 siblings, 0 replies; 17+ messages in thread From: Daniel Wagner @ 2022-01-20 14:06 UTC (permalink / raw) To: Ming Lei; +Cc: linux-block, linux-kernel, Jens Axboe, Hannes Reinecke [-- Attachment #1: Type: text/plain, Size: 3708 bytes --] On Thu, Jan 20, 2022 at 03:01:27PM +0100, Daniel Wagner wrote: > On Thu, Jan 20, 2022 at 09:51:18PM +0800, Ming Lei wrote: > > Then Can you figure out where blk_mq_queue_tag_busy_iter+0x1e4 points to > > in source code? And what is NULL pointer? Yes %rax is NULL > Here we go: > > /usr/src/debug/kernel-default-5.3.18-59.27.1.x86_64/linux-5.3/linux-obj/../include/linux/sbitmap.h: 249 > 0xffffffffb244a254 <blk_mq_queue_tag_busy_iter+484>: mov (%rax),%rdx > > 234static inline void __sbitmap_for_each_set(struct sbitmap *sb, > 235 unsigned int start, > 236 sb_for_each_fn fn, void *data) > 237{ > 238 unsigned int index; > 239 unsigned int nr; > 240 unsigned int scanned = 0; > 241 > 242 if (start >= sb->depth) > 243 start = 0; > 244 index = SB_NR_TO_INDEX(sb, start); > 245 nr = SB_NR_TO_BIT(sb, start); > 246 > 247 while (scanned < sb->depth) { > 248 unsigned long word; > 249 unsigned int depth = min_t(unsigned int, > 250 sb->map[index].depth - nr, > 251 sb->depth - scanned); forgot to add this crash> bt PID: 17640 TASK: ffff956f4a468000 CPU: 13 COMMAND: "iostat" #0 [ffffb701aefb7980] machine_kexec at ffffffffba66fb91 #1 [ffffb701aefb79d8] __crash_kexec at ffffffffba75927d #2 [ffffb701aefb7aa0] crash_kexec at ffffffffba75a13d #3 [ffffb701aefb7ab8] oops_end at ffffffffba636cdf #4 [ffffb701aefb7ad8] no_context at ffffffffba682baf #5 [ffffb701aefb7b40] do_page_fault at ffffffffba683e30 #6 [ffffb701aefb7b70] page_fault at ffffffffbb0012fe [exception RIP: blk_mq_queue_tag_busy_iter+484] RIP: ffffffffbaa4a254 RSP: ffffb701aefb7c20 RFLAGS: 
00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff954f43c14c00 RSI: ffffffffbaa442c0 RDI: ffff954fb3749010 RBP: 0000000000000000 R8: 0000000800000000 R9: 00000008ffffffff R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff9567b2200000 R14: ffff954798c09bc0 R15: ffff954798c09c20 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffb701aefb7cb0] blk_mq_in_flight at ffffffffbaa461e5 #8 [ffffb701aefb7cd0] diskstats_show at ffffffffbaa4f00d #9 [ffffb701aefb7e50] seq_read at ffffffffba90df16 #10 [ffffb701aefb7eb0] proc_reg_read at ffffffffba96d789 #11 [ffffb701aefb7ec8] vfs_read at ffffffffba8e4c39 #12 [ffffb701aefb7ef8] ksys_read at ffffffffba8e4fc1 #13 [ffffb701aefb7f38] do_syscall_64 at ffffffffba60538b #14 [ffffb701aefb7f50] entry_SYSCALL_64_after_hwframe at ffffffffbb00008c RIP: 00007f6031f4eb5e RSP: 00007ffd187a7e88 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00005577a698c2c0 RCX: 00007f6031f4eb5e RDX: 0000000000000400 RSI: 00005577a698f480 RDI: 0000000000000003 RBP: 00007f603222e300 R8: 0000000000000000 R9: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000a R13: 0000000000000d68 R14: 00007f603222d700 R15: 0000000000000d68 ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b I've tried to figure out the request_pointer from the registers and I think the pointer is still in %rdi struct request_queue { last_merge = 0x0, elevator = 0x0, stats = 0xffff956f45a9bec0, rq_qos = 0xffff954f54c57558, mq_ops = 0xffffffffc0c27140 <nvme_tcp_mq_ops>, queue_ctx = 0x4151cf2266c0, queue_depth = 0x0, queue_hw_ctx = 0xffff954f43c14c00, nr_hw_queues = 0x50, backing_dev_info = 0xffff953fae3ae800, queuedata = 0xffff953622282800, queue_flags = 0x5041d0, [-- Attachment #2: request_queue.log --] [-- Type: text/plain, Size: 15849 bytes --] struct request_queue { last_merge = 0x0, elevator = 0x0, stats = 0xffff956f45a9bec0, rq_qos = 0xffff954f54c57558, mq_ops = 0xffffffffc0c27140 <nvme_tcp_mq_ops>, queue_ctx = 
0x4151cf2266c0, queue_depth = 0x0, queue_hw_ctx = 0xffff954f43c14c00, nr_hw_queues = 0x50, backing_dev_info = 0xffff953fae3ae800, queuedata = 0xffff953622282800, queue_flags = 0x5041d0, pm_only = { counter = 0x0 }, id = 0x61, bounce_gfp = 0x0, queue_lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, kobj = { name = 0xffffffffbb7846be "queue", entry = { next = 0xffff954fb3749088, prev = 0xffff954fb3749088 }, parent = 0xffff95af629a58a0, kset = 0x0, ktype = 0xffffffffbbb56540, sd = 0xffff956563ecd580, kref = { refcount = { refs = { counter = 0x2 } } }, state_initialized = 0x1, state_in_sysfs = 0x1, state_add_uevent_sent = 0x0, state_remove_uevent_sent = 0x0, uevent_suppress = 0x0 }, mq_kobj = 0xffff95720ca05660, integrity = { profile = 0x0, flags = 0x0, tuple_size = 0x0, interval_exp = 0x0, tag_size = 0x0 }, dev = 0x0, rpm_status = RPM_ACTIVE, nr_pending = 0x0, nr_requests = 0x80, dma_pad_mask = 0x0, dma_alignment = 0x7, ksm = 0x0, rq_timeout = 0x1d4c, poll_nsec = 0xffffffff, poll_cb = 0xffff953443cafa00, poll_stat = {{ mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 
0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }, { mean = 0x0, min = 0x0, max = 0x0, nr_samples = 0x0, batch = 0x0 }}, timeout = { entry = { next = 0x0, pprev = 0x0 }, expires = 0x0, function = 0xffffffffbaa39440, flags = 0x2c }, timeout_work = { data = { counter = 0xfffffffe0 }, entry = { next = 0xffff954fb37493c0, prev = 0xffff954fb37493c0 }, func = 0xffffffffbaa454f0 }, icq_list = { next = 0xffff954fb37493d8, prev = 0xffff954fb37493d8 }, blkcg_pols = {0x3}, root_blkg = 0xffff953622281c00, blkg_list = { next = 0xffff953622281c08, prev = 0xffff953622281c08 }, limits = { bounce_pfn = 0xfffffffffffff, seg_boundary_mask = 0xffffffff, virt_boundary_mask = 0xfff, max_hw_sectors = 0x800, max_dev_sectors = 0x0, chunk_sectors = 0x0, max_sectors = 0x800, max_segment_size = 0xffffffff, physical_block_size = 0x200, logical_block_size = 0x200, alignment_offset = 0x0, io_min = 0x200, io_opt = 0x0, max_discard_sectors = 0xffffffff, max_hw_discard_sectors = 0xffffffff, max_write_same_sectors = 0x0, max_write_zeroes_sectors = 0x801, max_zone_append_sectors = 0x0, discard_granularity = 0x200, discard_alignment = 0x0, max_segments = 0x101, max_integrity_segments = 0x0, max_discard_segments = 0x100, misaligned = 0x0, discard_misaligned = 0x0, raid_partial_stripes_expensive = 0x0, zoned = BLK_ZONED_NONE }, required_elevator_features = 0x0, nr_zones = 0x0, conv_zones_bitmap = 0x0, seq_zones_wlock = 0x0, sg_timeout = 0x0, sg_reserved_size = 0x7fffffff, node = 0xffffffff, blk_trace = 0x0, blk_trace_mutex = { owner = { counter = 0x0 }, wait_lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, osq = { tail = { counter = 0x0 } }, wait_list = { next = 0xffff954fb37494b8, prev = 0xffff954fb37494b8 } }, fq = 0x0, requeue_list = { next = 0xffff954fb37494d0, prev = 0xffff954fb37494d0 }, requeue_lock 
= { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, requeue_work = { work = { data = { counter = 0xfffffffe0 }, entry = { next = 0xffff954fb37494f0, prev = 0xffff954fb37494f0 }, func = 0xffffffffbaa474d0 }, timer = { entry = { next = 0x0, pprev = 0x0 }, expires = 0x0, function = 0xffffffffba6bb050, flags = 0x200040 }, wq = 0x0, cpu = 0x0 }, sysfs_lock = { owner = { counter = 0x0 }, wait_lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, osq = { tail = { counter = 0x0 } }, wait_list = { next = 0xffff954fb3749550, prev = 0xffff954fb3749550 } }, sysfs_dir_lock = { owner = { counter = 0x0 }, wait_lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, osq = { tail = { counter = 0x0 } }, wait_list = { next = 0xffff954fb3749570, prev = 0xffff954fb3749570 } }, unused_hctx_list = { next = 0xffff954fb3749580, prev = 0xffff954fb3749580 }, unused_hctx_lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, mq_freeze_depth = 0x0, bsg_dev = { class_dev = 0x0, minor = 0x0, queue = 0x0, ops = 0x0 }, td = 0xffff954f43c12800, callback_head = { next = 0x0, func = 0x0 }, mq_freeze_wq = { lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, head = { next = 0xffff954fb37495d8, prev = 0xffff954fb37495d8 } }, mq_freeze_lock = { owner = { counter = 0x0 }, wait_lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, osq = { tail = { counter = 0x0 } }, wait_list = { next = 0xffff954fb37495f8, prev = 0xffff954fb37495f8 } }, q_usage_counter = { count = { counter = 
0x8000000000000001 }, percpu_count_ptr = 0x4151cda18fe8, release = 0xffffffffbaa38da0, confirm_switch = 0x0, force_atomic = 0x0, allow_reinit = 0x1, rcu = { next = 0x0, func = 0x0 } }, tag_set = 0xffff958f7fe56008, tag_set_list = { next = 0xffff958f7fe56118, prev = 0xffff956f479e6ea0 }, bio_split = { bio_slab = 0xffff9530c7c20e00, front_pad = 0x0, bio_pool = { lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, min_nr = 0x2, curr_nr = 0x2, elements = 0xffff954fbdb45180, pool_data = 0xffff9530c7c20e00, alloc = 0xffffffffba832460, free = 0xffffffffba832480, wait = { lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, head = { next = 0xffff954fb37496a0, prev = 0xffff954fb37496a0 } } }, bvec_pool = { lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, min_nr = 0x2, curr_nr = 0x2, elements = 0xffff954fbdb453f0, pool_data = 0xffff956fbe9a9180, alloc = 0xffffffffba832460, free = 0xffffffffba832480, wait = { lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, head = { next = 0xffff954fb37496e8, prev = 0xffff954fb37496e8 } } }, bio_integrity_pool = { lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, min_nr = 0x0, curr_nr = 0x0, elements = 0x0, pool_data = 0x0, alloc = 0x0, free = 0x0, wait = { lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, head = { next = 0x0, prev = 0x0 } } }, bvec_integrity_pool = { lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } 
}, min_nr = 0x0, curr_nr = 0x0, elements = 0x0, pool_data = 0x0, alloc = 0x0, free = 0x0, wait = { lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, head = { next = 0x0, prev = 0x0 } } }, rescue_lock = { { rlock = { raw_lock = { { val = { counter = 0x0 }, { locked = 0x0, pending = 0x0 }, { locked_pending = 0x0, tail = 0x0 } } } } } }, rescue_list = { head = 0x0, tail = 0x0 }, rescue_work = { data = { counter = 0xfffffffe0 }, entry = { next = 0xffff954fb37497a8, prev = 0xffff954fb37497a8 }, func = 0xffffffffbaa33ec0 }, rescue_workqueue = 0x0 }, debugfs_dir = 0xffff956f64d2efc0, sched_debugfs_dir = 0x0, rqos_debugfs_dir = 0xffff953fae3a6d80, mq_sysfs_init_done = 0x1, cmd_size = 0x0, write_hints = {0x0, 0x0, 0x0, 0x0, 0x0} } ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show 2022-01-20 14:01 ` Daniel Wagner 2022-01-20 14:06 ` Daniel Wagner @ 2022-01-20 14:21 ` Christoph Hellwig 2022-01-20 14:34 ` Daniel Wagner 2022-01-20 15:37 ` Ming Lei 2 siblings, 1 reply; 17+ messages in thread From: Christoph Hellwig @ 2022-01-20 14:21 UTC (permalink / raw) To: Daniel Wagner Cc: Ming Lei, linux-block, linux-kernel, Jens Axboe, Hannes Reinecke On Thu, Jan 20, 2022 at 03:01:26PM +0100, Daniel Wagner wrote: > On Thu, Jan 20, 2022 at 09:51:18PM +0800, Ming Lei wrote: > > Then Can you figure out where blk_mq_queue_tag_busy_iter+0x1e4 points to > > in source code? And what is NULL pointer? > > Here we go: > > /usr/src/debug/kernel-default-5.3.18-59.27.1.x86_64/linux-5.3/linux-obj/../include/linux/sbitmap.h: 249 That looks quite old. How does this look on mainline? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show
  2022-01-20 14:21 ` Christoph Hellwig
@ 2022-01-20 14:34 ` Daniel Wagner
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Wagner @ 2022-01-20 14:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Ming Lei, linux-block, linux-kernel, Jens Axboe, Hannes Reinecke

On Thu, Jan 20, 2022 at 06:21:26AM -0800, Christoph Hellwig wrote:
> That looks quite old. How does this look on mainline?

I am still trying to reproduce it locally (on our SUSE kernels and
mainline). So far 'no luck' using nvme-fc. I'll try now with nvme-tcp;
maybe it's a transport thing.

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show
  2022-01-20 14:01 ` Daniel Wagner
  2022-01-20 14:06 ` Daniel Wagner
  2022-01-20 14:21 ` Christoph Hellwig
@ 2022-01-20 15:37 ` Ming Lei
  2022-01-20 16:29 ` Daniel Wagner
  2 siblings, 1 reply; 17+ messages in thread
From: Ming Lei @ 2022-01-20 15:37 UTC (permalink / raw)
To: Daniel Wagner; +Cc: linux-block, linux-kernel, Jens Axboe, Hannes Reinecke

On Thu, Jan 20, 2022 at 03:01:26PM +0100, Daniel Wagner wrote:
> On Thu, Jan 20, 2022 at 09:51:18PM +0800, Ming Lei wrote:
> > Then can you figure out where blk_mq_queue_tag_busy_iter+0x1e4 points to
> > in the source code? And what is the NULL pointer?
>
> Here we go:
>
> /usr/src/debug/kernel-default-5.3.18-59.27.1.x86_64/linux-5.3/linux-obj/../include/linux/sbitmap.h: 249
> 0xffffffffb244a254 <blk_mq_queue_tag_busy_iter+484>: mov (%rax),%rdx
>
> 234 static inline void __sbitmap_for_each_set(struct sbitmap *sb,
> 235                                           unsigned int start,
> 236                                           sb_for_each_fn fn, void *data)
> 237 {
> 238         unsigned int index;
> 239         unsigned int nr;
> 240         unsigned int scanned = 0;
> 241
> 242         if (start >= sb->depth)
> 243                 start = 0;
> 244         index = SB_NR_TO_INDEX(sb, start);
> 245         nr = SB_NR_TO_BIT(sb, start);
> 246
> 247         while (scanned < sb->depth) {
> 248                 unsigned long word;
> 249                 unsigned int depth = min_t(unsigned int,
> 250                                            sb->map[index].depth - nr,
> 251                                            sb->depth - scanned);

Seems more likely an nvme recovery issue: tags are changed even though
->q_usage_counter is grabbed in blk_mq_queue_tag_busy_iter(), which was
added in v4.19.

You may check the dmesg log and see if there is any nvme error recovery
info.

Thanks,
Ming

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show
  2022-01-20 15:37 ` Ming Lei
@ 2022-01-20 16:29 ` Daniel Wagner
  2022-03-04 14:53 ` Daniel Wagner
  0 siblings, 1 reply; 17+ messages in thread
From: Daniel Wagner @ 2022-01-20 16:29 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block, linux-kernel, Jens Axboe, Hannes Reinecke

On Thu, Jan 20, 2022 at 11:37:00PM +0800, Ming Lei wrote:
> Seems more likely an nvme recovery issue: tags are changed even though
> ->q_usage_counter is grabbed in blk_mq_queue_tag_busy_iter(), which was
> added in v4.19.
>
> You may check the dmesg log and see if there is any nvme error recovery info.

Thanks for the tip. It's likely that a recovery is running although
there is no entry in the logs. At least I have an idea what could be the
problem here.

^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2/2] block: hold queue lock while iterating in diskstats_show
  2022-01-20 16:29 ` Daniel Wagner
@ 2022-03-04 14:53 ` Daniel Wagner
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Wagner @ 2022-03-04 14:53 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block, linux-kernel, Jens Axboe, Hannes Reinecke

On Thu, Jan 20, 2022 at 05:29:50PM +0100, Daniel Wagner wrote:
> On Thu, Jan 20, 2022 at 11:37:00PM +0800, Ming Lei wrote:
> > Seems more likely an nvme recovery issue: tags are changed even though
> > ->q_usage_counter is grabbed in blk_mq_queue_tag_busy_iter(), which was
> > added in v4.19.
> >
> > You may check the dmesg log and see if there is any nvme error recovery info.
>
> Thanks for the tip. It's likely that a recovery is running although
> there is no entry in the logs. At least I have an idea what could be the
> problem here.

FWIW, a846a8e6c9a5 ("blk-mq: don't free tags if the tag_set is used by
other device in queue initialztion") was missing from our tree.

^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2022-03-04 15:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-20 10:52 [PATCH 0/2] fix crash by reading /proc/diskstats Daniel Wagner
2022-01-20 10:52 ` [PATCH 1/2] block: remove commented out code from diskstats_show Daniel Wagner
2022-01-20 11:02   ` Hannes Reinecke
2022-03-04 15:38   ` Daniel Wagner
2022-01-20 10:52 ` [PATCH 2/2] block: hold queue lock while iterating in diskstats_show Daniel Wagner
2022-01-20 11:18   ` Hannes Reinecke
2022-01-20 12:41     ` Daniel Wagner
2022-01-20 12:56   ` Ming Lei
2022-01-20 13:19     ` Daniel Wagner
2022-01-20 13:51       ` Ming Lei
2022-01-20 14:01         ` Daniel Wagner
2022-01-20 14:06           ` Daniel Wagner
2022-01-20 14:21           ` Christoph Hellwig
2022-01-20 14:34             ` Daniel Wagner
2022-01-20 15:37           ` Ming Lei
2022-01-20 16:29             ` Daniel Wagner
2022-03-04 14:53               ` Daniel Wagner