All of lore.kernel.org
* [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue
@ 2020-05-07 13:03 Weiping Zhang
  2020-05-07 13:03 ` [PATCH v6 1/5] block: free both rq_map and request Weiping Zhang
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Weiping Zhang @ 2020-05-07 13:03 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch; +Cc: linux-block

Hi,

This series mainly fixes a kernel panic that can occur when increasing
the hardware queue count, and also fixes some other misc issues.

Memleak 1:

__blk_mq_alloc_rq_maps
	__blk_mq_alloc_rq_map

on failure (out_unwind):
	blk_mq_free_rq_map

Actually, __blk_mq_alloc_rq_map allocates both the map and the requests,
so the error path also needs to free the requests.
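The required pairing can be sketched with a small userspace model. All the mock_* names below are hypothetical stand-ins for the kernel helpers; the point is only that the free helper must undo everything the allocation helper did, pages included:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mock of the tag-map layout: the map arrays are allocated
 * first, then a separate backing page for the requests (as
 * blk_mq_alloc_rqs does with alloc_pages_node). */
struct mock_tags {
	void **rqs;        /* models tags->rqs */
	void **static_rqs; /* models tags->static_rqs */
	void *page;        /* models the page from alloc_pages_node() */
};

static struct mock_tags *mock_alloc_map_and_rqs(size_t nr)
{
	struct mock_tags *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	t->rqs = calloc(nr, sizeof(void *));
	t->static_rqs = calloc(nr, sizeof(void *));
	t->page = malloc(4096);          /* models blk_mq_alloc_rqs() */
	return t;
}

/* models blk_mq_free_rq_map(): frees the map arrays but NOT the page,
 * which is exactly the leak described above */
static void mock_free_rq_map(struct mock_tags *t)
{
	free(t->rqs);
	free(t->static_rqs);
	free(t);
}

/* models blk_mq_free_map_and_requests(): release the page first,
 * then the map */
static void mock_free_map_and_rqs(struct mock_tags *t)
{
	free(t->page);
	mock_free_rq_map(t);
}
```

Calling mock_free_rq_map() alone on the result of mock_alloc_map_and_rqs() leaks the page, mirroring why the error path has to use the combined helper.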


Patch 1: fix Memleak 1.
Patch 2: fix the prev_nr_hw_queues issue; it needs to be saved before the change.
Patch 3: from Ming, fix a potential kernel panic when increasing the hardware queue count.
Patches 4~5: rename two functions, since they allocate both the map and the
requests, to keep them paired with blk_mq_free_map_and_request(s).

Changes since V5:
 * fix 80 char per line for patch-4

Changes since V4:
 * use another approach (patch from Ming) to fix the kernel panic when
   increasing the hardware queue count

Changes since V3:
 * reorder the patchset: fix the issues first, then rename
 * rename the function to blk_mq_alloc_map_and_request

Changes since V2:
 * rename some functions and fix a memleak when freeing the map and requests
 * do not free the newly allocated map and requests; they will be released when the tagset is gone

Changes since V1:
 * Add a fix for a potential kernel panic when increasing the hardware queue count

Ming Lei (1):
  block: alloc map and request for new hardware queue

Weiping Zhang (4):
  block: free both rq_map and request
  block: save previous hardware queue count before update
  block: rename __blk_mq_alloc_rq_map
  block: rename blk_mq_alloc_rq_maps

 block/blk-mq.c | 37 +++++++++++++++++++------------------
 1 file changed, 19 insertions(+), 18 deletions(-)

-- 
2.18.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v6 1/5] block: free both rq_map and request
  2020-05-07 13:03 [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Weiping Zhang
@ 2020-05-07 13:03 ` Weiping Zhang
  2020-05-07 15:09   ` Hannes Reinecke
  2020-05-07 13:03 ` [PATCH v6 2/5] block: save previous hardware queue count before update Weiping Zhang
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Weiping Zhang @ 2020-05-07 13:03 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch; +Cc: linux-block

Allocation:

__blk_mq_alloc_rq_map
	blk_mq_alloc_rq_map
		tags = blk_mq_init_tags : kzalloc_node
		tags->rqs = kcalloc_node
		tags->static_rqs = kcalloc_node
	blk_mq_alloc_rqs
		p = alloc_pages_node
		tags->static_rqs[i] = p + offset;

Free:

blk_mq_free_rq_map
	kfree(tags->rqs);
	kfree(tags->static_rqs);
	blk_mq_free_tags
		kfree(tags);

The pages allocated in blk_mq_alloc_rqs are never released by
blk_mq_free_rq_map, so we should use blk_mq_free_map_and_requests here.

blk_mq_free_map_and_requests
	blk_mq_free_rqs
		__free_pages : cleanup for blk_mq_alloc_rqs
	blk_mq_free_rq_map : cleanup for blk_mq_alloc_rq_map
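The out_unwind change can be modeled with a small userspace sketch. The mock_* names are hypothetical stand-ins: mock_alloc_one() plays the role of __blk_mq_alloc_rq_map() (map plus requests), mock_free_one() plays blk_mq_free_map_and_requests(), and a counter stands in for the allocations so a leak would be observable:

```c
#include <errno.h>

/* Hypothetical model of __blk_mq_alloc_rq_maps()'s unwind loop. */
struct mock_set {
	int outstanding;   /* live (map + request) allocations */
	int fail_at;       /* index at which allocation fails */
};

static int mock_alloc_one(struct mock_set *set, int i)
{
	if (i == set->fail_at)
		return 0;          /* allocation failure */
	set->outstanding++;
	return 1;
}

/* must undo *everything* mock_alloc_one() did for index i */
static void mock_free_one(struct mock_set *set, int i)
{
	(void)i;
	set->outstanding--;
}

static int mock_alloc_all(struct mock_set *set, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (!mock_alloc_one(set, i))
			goto out_unwind;
	return 0;

out_unwind:
	while (--i >= 0)
		mock_free_one(set, i);
	return -ENOMEM;
}
```

If mock_free_one() only undid part of what mock_alloc_one() did (as blk_mq_free_rq_map did for the request pages), the outstanding count would stay non-zero after the unwind, which is exactly the leak this patch closes.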

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a7785df2c944..f789b3e1b3ab 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2995,7 +2995,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 
 out_unwind:
 	while (--i >= 0)
-		blk_mq_free_rq_map(set->tags[i]);
+		blk_mq_free_map_and_requests(set, i);
 
 	return -ENOMEM;
 }
-- 
2.18.2



* [PATCH v6 2/5] block: save previous hardware queue count before update
  2020-05-07 13:03 [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Weiping Zhang
  2020-05-07 13:03 ` [PATCH v6 1/5] block: free both rq_map and request Weiping Zhang
@ 2020-05-07 13:03 ` Weiping Zhang
  2020-05-07 15:09   ` Hannes Reinecke
  2020-05-07 13:04 ` [PATCH v6 3/5] block: alloc map and request for new hardware queue Weiping Zhang
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Weiping Zhang @ 2020-05-07 13:03 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch; +Cc: linux-block

blk_mq_realloc_tag_set_tags will update set->nr_hw_queues, so
save the old set->nr_hw_queues before calling this function.
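The ordering bug can be reduced to a minimal userspace model. The mock_* names are hypothetical; the only assumption carried over from the kernel code is that the realloc helper overwrites set->nr_hw_queues as a side effect, so the previous value must be captured before the call:

```c
/* Hypothetical model of the save-before-mutate ordering. */
struct mock_tag_set {
	int nr_hw_queues;
};

/* models blk_mq_realloc_tag_set_tags(): updates the count on success */
static int mock_realloc_tag_set_tags(struct mock_tag_set *set, int new_nr)
{
	set->nr_hw_queues = new_nr;   /* side effect */
	return 0;                     /* 0 on success */
}

/* correct order: capture the previous count first, then realloc */
static int mock_update_nr_hw_queues(struct mock_tag_set *set, int new_nr,
				    int *prev_nr_hw_queues)
{
	*prev_nr_hw_queues = set->nr_hw_queues;        /* save before */
	return mock_realloc_tag_set_tags(set, new_nr); /* mutates count */
}
```

Reading set->nr_hw_queues after the realloc call would always yield the new count, leaving prev_nr_hw_queues useless for the fallback path.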

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f789b3e1b3ab..a79afbe60ca6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3347,11 +3347,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 		blk_mq_sysfs_unregister(q);
 	}
 
+	prev_nr_hw_queues = set->nr_hw_queues;
 	if (blk_mq_realloc_tag_set_tags(set, set->nr_hw_queues, nr_hw_queues) <
 	    0)
 		goto reregister;
 
-	prev_nr_hw_queues = set->nr_hw_queues;
 	set->nr_hw_queues = nr_hw_queues;
 	blk_mq_update_queue_map(set);
 fallback:
-- 
2.18.2



* [PATCH v6 3/5] block: alloc map and request for new hardware queue
  2020-05-07 13:03 [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Weiping Zhang
  2020-05-07 13:03 ` [PATCH v6 1/5] block: free both rq_map and request Weiping Zhang
  2020-05-07 13:03 ` [PATCH v6 2/5] block: save previous hardware queue count before update Weiping Zhang
@ 2020-05-07 13:04 ` Weiping Zhang
  2020-05-07 15:11   ` Hannes Reinecke
  2020-05-07 13:04 ` [PATCH v6 4/5] block: rename __blk_mq_alloc_rq_map Weiping Zhang
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Weiping Zhang @ 2020-05-07 13:04 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch; +Cc: linux-block

From: Ming Lei <ming.lei@redhat.com>

Allocate a new map and requests for each new hardware queue when
increasing the hardware queue count. Before this patch, a warning was
shown for each new hardware queue, but that is not enough: these hctxs
have no map and no requests, so when a bio is mapped to one of these
hardware queues, the kernel panics while getting a request from the
hctx.

Test environment:
 * A NVMe disk supports 128 io queues
 * 96 cpus in system

A corner case can always trigger this panic: there are 96 io queues
allocated for the HCTX_TYPE_DEFAULT type, with the corresponding kernel
log: nvme nvme0: 96/0/0 default/read/poll queues. Now we set nvme write
queues to 96, so nvme allocates another 32 queues for read, but
blk_mq_update_nr_hw_queues does not allocate map and requests for these
newly added io queues. So when a process reads the nvme disk, the
kernel panics while getting a request from these hardware contexts.

Reproduce script:

nr=$(expr `cat /sys/block/nvme0n1/device/queue_count` - 1)
echo $nr > /sys/module/nvme/parameters/write_queues
echo 1 > /sys/block/nvme0n1/device/reset_controller
dd if=/dev/nvme0n1 of=/dev/null bs=4K count=1
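The remap fallback that this patch extends to every map type can be modeled in userspace. All mock_* names are hypothetical stand-ins: a cpu whose map entry points at an hctx without allocated tags must be remapped to hctx 0 (which always has tags), otherwise a later tag lookup dereferences a NULL tags pointer, which is the oops shown below:

```c
#include <stddef.h>

/* Hypothetical model of the blk_mq_map_swqueue() fallback. */
#define MOCK_NR_CPUS 4
#define MOCK_NR_HCTX 4

struct mock_tags {
	int depth;
};

struct mock_map {
	int mq_map[MOCK_NR_CPUS];   /* cpu -> hctx index */
};

static int mock_map_swqueue(struct mock_map *map,
			    struct mock_tags *tags[MOCK_NR_HCTX], int cpu)
{
	int hctx_idx = map->mq_map[cpu];

	/* hctx without tags: no map/requests were ever allocated for it */
	if (!tags[hctx_idx])
		map->mq_map[cpu] = 0;   /* remap to hctx 0 */
	return map->mq_map[cpu];
}
```

Before this patch the check ran only for HCTX_TYPE_DEFAULT, so a read or poll map entry could still point at a tag-less hctx and crash on the first request.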

[ 8040.805626] ------------[ cut here ]------------
[ 8040.805627] WARNING: CPU: 82 PID: 12921 at block/blk-mq.c:2578 blk_mq_map_swqueue+0x2b6/0x2c0
[ 8040.805627] Modules linked in: nvme nvme_core nf_conntrack_netlink xt_addrtype br_netfilter overlay xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nf_conntrack_tftp nft_masq nf_tables_set nft_fib_inet nft_f
ib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack tun bridge nf_defrag_ipv6 nf_defrag_ipv4 stp llc ip6_tables ip_tables nft_compat rfkill ip_set nf_tables nfne
tlink sunrpc intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel intel_
cstate intel_uncore raid0 joydev intel_rapl_perf ipmi_si pcspkr mei_me ioatdma sg ipmi_devintf mei i2c_i801 dca lpc_ich ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c sd_mod ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm d
rm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 8040.805637]  ahci drm i40e libahci crc32c_intel libata t10_pi wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nvme_core]
[ 8040.805640] CPU: 82 PID: 12921 Comm: kworker/u194:2 Kdump: loaded Tainted: G        W         5.6.0-rc5.78317c+ #2
[ 8040.805640] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019
[ 8040.805641] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 8040.805642] RIP: 0010:blk_mq_map_swqueue+0x2b6/0x2c0
[ 8040.805643] Code: 00 00 00 00 00 41 83 c5 01 44 39 6d 50 77 b8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b bb 98 00 00 00 89 d6 e8 8c 81 03 00 eb 83 <0f> 0b e9 52 ff ff ff 0f 1f 00 0f 1f 44 00 00 41 57 48 89 f1 41 56
[ 8040.805643] RSP: 0018:ffffba590d2e7d48 EFLAGS: 00010246
[ 8040.805643] RAX: 0000000000000000 RBX: ffff9f013e1ba800 RCX: 000000000000003d
[ 8040.805644] RDX: ffff9f00ffff6000 RSI: 0000000000000003 RDI: ffff9ed200246d90
[ 8040.805644] RBP: ffff9f00f6a79860 R08: 0000000000000000 R09: 000000000000003d
[ 8040.805645] R10: 0000000000000001 R11: ffff9f0138c3d000 R12: ffff9f00fb3a9008
[ 8040.805645] R13: 000000000000007f R14: ffffffff96822660 R15: 000000000000005f
[ 8040.805645] FS:  0000000000000000(0000) GS:ffff9f013fa80000(0000) knlGS:0000000000000000
[ 8040.805646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8040.805646] CR2: 00007f7f397fa6f8 CR3: 0000003d8240a002 CR4: 00000000007606e0
[ 8040.805647] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8040.805647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8040.805647] PKRU: 55555554
[ 8040.805647] Call Trace:
[ 8040.805649]  blk_mq_update_nr_hw_queues+0x31b/0x390
[ 8040.805650]  nvme_reset_work+0xb4b/0xeab [nvme]
[ 8040.805651]  process_one_work+0x1a7/0x370
[ 8040.805652]  worker_thread+0x1c9/0x380
[ 8040.805653]  ? max_active_store+0x80/0x80
[ 8040.805655]  kthread+0x112/0x130
[ 8040.805656]  ? __kthread_parkme+0x70/0x70
[ 8040.805657]  ret_from_fork+0x35/0x40
[ 8040.805658] ---[ end trace b5f13b1e73ccb5d3 ]---
[ 8229.365135] BUG: kernel NULL pointer dereference, address: 0000000000000004
[ 8229.365165] #PF: supervisor read access in kernel mode
[ 8229.365178] #PF: error_code(0x0000) - not-present page
[ 8229.365191] PGD 0 P4D 0
[ 8229.365201] Oops: 0000 [#1] SMP PTI
[ 8229.365212] CPU: 77 PID: 13024 Comm: dd Kdump: loaded Tainted: G        W         5.6.0-rc5.78317c+ #2
[ 8229.365232] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019
[ 8229.365253] RIP: 0010:blk_mq_get_tag+0x227/0x250
[ 8229.365265] Code: 44 24 04 44 01 e0 48 8b 74 24 38 65 48 33 34 25 28 00 00 00 75 33 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e c3 48 8d 68 10 4c 89 ef <44> 8b 60 04 48 89 ee e8 dd f9 ff ff 83 f8 ff 75 c8 e9 67 fe ff ff
[ 8229.365304] RSP: 0018:ffffba590e977970 EFLAGS: 00010246
[ 8229.365317] RAX: 0000000000000000 RBX: ffff9f00f6a79860 RCX: ffffba590e977998
[ 8229.365333] RDX: 0000000000000000 RSI: ffff9f012039b140 RDI: ffffba590e977a38
[ 8229.365349] RBP: 0000000000000010 R08: ffffda58ff94e190 R09: ffffda58ff94e198
[ 8229.365365] R10: 0000000000000011 R11: ffff9f00f6a79860 R12: 0000000000000000
[ 8229.365381] R13: ffffba590e977a38 R14: ffff9f012039b140 R15: 0000000000000001
[ 8229.365397] FS:  00007f481c230580(0000) GS:ffff9f013f940000(0000) knlGS:0000000000000000
[ 8229.365415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8229.365428] CR2: 0000000000000004 CR3: 0000005f35e26004 CR4: 00000000007606e0
[ 8229.365444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8229.365460] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8229.365476] PKRU: 55555554
[ 8229.365484] Call Trace:
[ 8229.365498]  ? finish_wait+0x80/0x80
[ 8229.365512]  blk_mq_get_request+0xcb/0x3f0
[ 8229.365525]  blk_mq_make_request+0x143/0x5d0
[ 8229.365538]  generic_make_request+0xcf/0x310
[ 8229.365553]  ? scan_shadow_nodes+0x30/0x30
[ 8229.365564]  submit_bio+0x3c/0x150
[ 8229.365576]  mpage_readpages+0x163/0x1a0
[ 8229.365588]  ? blkdev_direct_IO+0x490/0x490
[ 8229.365601]  read_pages+0x6b/0x190
[ 8229.365612]  __do_page_cache_readahead+0x1c1/0x1e0
[ 8229.365626]  ondemand_readahead+0x182/0x2f0
[ 8229.365639]  generic_file_buffered_read+0x590/0xab0
[ 8229.365655]  new_sync_read+0x12a/0x1c0
[ 8229.365666]  vfs_read+0x8a/0x140
[ 8229.365676]  ksys_read+0x59/0xd0
[ 8229.365688]  do_syscall_64+0x55/0x1d0
[ 8229.365700]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Tested-by: Weiping Zhang <zhangweiping@didiglobal.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a79afbe60ca6..c6ba94cba17d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2521,18 +2521,6 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	 * If the cpu isn't present, the cpu is mapped to first hctx.
 	 */
 	for_each_possible_cpu(i) {
-		hctx_idx = set->map[HCTX_TYPE_DEFAULT].mq_map[i];
-		/* unmapped hw queue can be remapped after CPU topo changed */
-		if (!set->tags[hctx_idx] &&
-		    !__blk_mq_alloc_rq_map(set, hctx_idx)) {
-			/*
-			 * If tags initialization fail for some hctx,
-			 * that hctx won't be brought online.  In this
-			 * case, remap the current ctx to hctx[0] which
-			 * is guaranteed to always have tags allocated
-			 */
-			set->map[HCTX_TYPE_DEFAULT].mq_map[i] = 0;
-		}
 
 		ctx = per_cpu_ptr(q->queue_ctx, i);
 		for (j = 0; j < set->nr_maps; j++) {
@@ -2541,6 +2529,18 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 						HCTX_TYPE_DEFAULT, i);
 				continue;
 			}
+			hctx_idx = set->map[j].mq_map[i];
+			/* unmapped hw queue can be remapped after CPU topo changed */
+			if (!set->tags[hctx_idx] &&
+			    !__blk_mq_alloc_rq_map(set, hctx_idx)) {
+				/*
+				 * If tags initialization fail for some hctx,
+				 * that hctx won't be brought online.  In this
+				 * case, remap the current ctx to hctx[0] which
+				 * is guaranteed to always have tags allocated
+				 */
+				set->map[j].mq_map[i] = 0;
+			}
 
 			hctx = blk_mq_map_queue_type(q, j, i);
 			ctx->hctxs[j] = hctx;
-- 
2.18.2



* [PATCH v6 4/5] block: rename __blk_mq_alloc_rq_map
  2020-05-07 13:03 [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Weiping Zhang
                   ` (2 preceding siblings ...)
  2020-05-07 13:04 ` [PATCH v6 3/5] block: alloc map and request for new hardware queue Weiping Zhang
@ 2020-05-07 13:04 ` Weiping Zhang
  2020-05-07 15:12   ` Hannes Reinecke
  2020-05-07 13:04 ` [PATCH v6 5/5] block: rename blk_mq_alloc_rq_maps Weiping Zhang
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Weiping Zhang @ 2020-05-07 13:04 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch; +Cc: linux-block

Rename __blk_mq_alloc_rq_map to __blk_mq_alloc_map_and_request:
it actually allocates both the map and the requests, so make the
function name match what the function does.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c6ba94cba17d..0a4f7fdd2248 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2473,7 +2473,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 	}
 }
 
-static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
+static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
+					int hctx_idx)
 {
 	int ret = 0;
 
@@ -2532,7 +2533,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 			hctx_idx = set->map[j].mq_map[i];
 			/* unmapped hw queue can be remapped after CPU topo changed */
 			if (!set->tags[hctx_idx] &&
-			    !__blk_mq_alloc_rq_map(set, hctx_idx)) {
+			    !__blk_mq_alloc_map_and_request(set, hctx_idx)) {
 				/*
 				 * If tags initialization fail for some hctx,
 				 * that hctx won't be brought online.  In this
@@ -2988,7 +2989,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 	int i;
 
 	for (i = 0; i < set->nr_hw_queues; i++)
-		if (!__blk_mq_alloc_rq_map(set, i))
+		if (!__blk_mq_alloc_map_and_request(set, i))
 			goto out_unwind;
 
 	return 0;
-- 
2.18.2



* [PATCH v6 5/5] block: rename blk_mq_alloc_rq_maps
  2020-05-07 13:03 [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Weiping Zhang
                   ` (3 preceding siblings ...)
  2020-05-07 13:04 ` [PATCH v6 4/5] block: rename __blk_mq_alloc_rq_map Weiping Zhang
@ 2020-05-07 13:04 ` Weiping Zhang
  2020-05-07 15:12   ` Hannes Reinecke
  2020-05-07 13:43 ` [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Weiping Zhang @ 2020-05-07 13:04 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch; +Cc: linux-block

Rename blk_mq_alloc_rq_maps to blk_mq_alloc_map_and_requests:
this function allocates both the map and the requests, so make the
function name match what the function does.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0a4f7fdd2248..649a8abdd742 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3006,7 +3006,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
  * may reduce the depth asked for, if memory is tight. set->queue_depth
  * will be updated to reflect the allocated depth.
  */
-static int blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
+static int blk_mq_alloc_map_and_requests(struct blk_mq_tag_set *set)
 {
 	unsigned int depth;
 	int err;
@@ -3166,7 +3166,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
-	ret = blk_mq_alloc_rq_maps(set);
+	ret = blk_mq_alloc_map_and_requests(set);
 	if (ret)
 		goto out_free_mq_map;
 
-- 
2.18.2



* Re: [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue
  2020-05-07 13:03 [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Weiping Zhang
                   ` (4 preceding siblings ...)
  2020-05-07 13:04 ` [PATCH v6 5/5] block: rename blk_mq_alloc_rq_maps Weiping Zhang
@ 2020-05-07 13:43 ` Christoph Hellwig
  2020-05-07 18:31 ` Jens Axboe
  2020-05-12  1:30 ` Bart Van Assche
  7 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2020-05-07 13:43 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch, linux-block

The whole series looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v6 1/5] block: free both rq_map and request
  2020-05-07 13:03 ` [PATCH v6 1/5] block: free both rq_map and request Weiping Zhang
@ 2020-05-07 15:09   ` Hannes Reinecke
  0 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2020-05-07 15:09 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch, linux-block

On 5/7/20 3:03 PM, Weiping Zhang wrote:
> Allocation:
> 
> __blk_mq_alloc_rq_map
> 	blk_mq_alloc_rq_map
> 		tags = blk_mq_init_tags : kzalloc_node
> 		tags->rqs = kcalloc_node
> 		tags->static_rqs = kcalloc_node
> 	blk_mq_alloc_rqs
> 		p = alloc_pages_node
> 		tags->static_rqs[i] = p + offset;
> 
> Free:
> 
> blk_mq_free_rq_map
> 	kfree(tags->rqs);
> 	kfree(tags->static_rqs);
> 	blk_mq_free_tags
> 		kfree(tags);
> 
> The pages allocated in blk_mq_alloc_rqs are never released by
> blk_mq_free_rq_map, so we should use blk_mq_free_map_and_requests here.
> 
> blk_mq_free_map_and_requests
> 	blk_mq_free_rqs
> 		__free_pages : cleanup for blk_mq_alloc_rqs
> 	blk_mq_free_rq_map : cleanup for blk_mq_alloc_rq_map
> 
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>   block/blk-mq.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a7785df2c944..f789b3e1b3ab 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2995,7 +2995,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>   
>   out_unwind:
>   	while (--i >= 0)
> -		blk_mq_free_rq_map(set->tags[i]);
> +		blk_mq_free_map_and_requests(set, i);
>   
>   	return -ENOMEM;
>   }
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH v6 2/5] block: save previous hardware queue count before update
  2020-05-07 13:03 ` [PATCH v6 2/5] block: save previous hardware queue count before update Weiping Zhang
@ 2020-05-07 15:09   ` Hannes Reinecke
  0 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2020-05-07 15:09 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch, linux-block

On 5/7/20 3:03 PM, Weiping Zhang wrote:
> blk_mq_realloc_tag_set_tags will update set->nr_hw_queues, so
> save the old set->nr_hw_queues before calling this function.
> 
> Reviewed-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>   block/blk-mq.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f789b3e1b3ab..a79afbe60ca6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3347,11 +3347,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>   		blk_mq_sysfs_unregister(q);
>   	}
>   
> +	prev_nr_hw_queues = set->nr_hw_queues;
>   	if (blk_mq_realloc_tag_set_tags(set, set->nr_hw_queues, nr_hw_queues) <
>   	    0)
>   		goto reregister;
>   
> -	prev_nr_hw_queues = set->nr_hw_queues;
>   	set->nr_hw_queues = nr_hw_queues;
>   	blk_mq_update_queue_map(set);
>   fallback:
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH v6 3/5] block: alloc map and request for new hardware queue
  2020-05-07 13:04 ` [PATCH v6 3/5] block: alloc map and request for new hardware queue Weiping Zhang
@ 2020-05-07 15:11   ` Hannes Reinecke
  0 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2020-05-07 15:11 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch, linux-block

On 5/7/20 3:04 PM, Weiping Zhang wrote:
> From: Ming Lei <ming.lei@redhat.com>
> 
> Allocate a new map and requests for each new hardware queue when
> increasing the hardware queue count. Before this patch, a warning was
> shown for each new hardware queue, but that is not enough: these hctxs
> have no map and no requests, so when a bio is mapped to one of these
> hardware queues, the kernel panics while getting a request from the
> hctx.
> 
> Test environment:
>   * A NVMe disk supports 128 io queues
>   * 96 cpus in system
> 
> A corner case can always trigger this panic: there are 96 io queues
> allocated for the HCTX_TYPE_DEFAULT type, with the corresponding kernel
> log: nvme nvme0: 96/0/0 default/read/poll queues. Now we set nvme write
> queues to 96, so nvme allocates another 32 queues for read, but
> blk_mq_update_nr_hw_queues does not allocate map and requests for these
> newly added io queues. So when a process reads the nvme disk, the
> kernel panics while getting a request from these hardware contexts.
> 
> Reproduce script:
> 
> nr=$(expr `cat /sys/block/nvme0n1/device/queue_count` - 1)
> echo $nr > /sys/module/nvme/parameters/write_queues
> echo 1 > /sys/block/nvme0n1/device/reset_controller
> dd if=/dev/nvme0n1 of=/dev/null bs=4K count=1
> 
> [ ... full warning and oops log snipped; identical to the log in the
> patch being replied to ... ]
> 
> Tested-by: Weiping Zhang <zhangweiping@didiglobal.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>   block/blk-mq.c | 24 ++++++++++++------------
>   1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a79afbe60ca6..c6ba94cba17d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2521,18 +2521,6 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>   	 * If the cpu isn't present, the cpu is mapped to first hctx.
>   	 */
>   	for_each_possible_cpu(i) {
> -		hctx_idx = set->map[HCTX_TYPE_DEFAULT].mq_map[i];
> -		/* unmapped hw queue can be remapped after CPU topo changed */
> -		if (!set->tags[hctx_idx] &&
> -		    !__blk_mq_alloc_rq_map(set, hctx_idx)) {
> -			/*
> -			 * If tags initialization fail for some hctx,
> -			 * that hctx won't be brought online.  In this
> -			 * case, remap the current ctx to hctx[0] which
> -			 * is guaranteed to always have tags allocated
> -			 */
> -			set->map[HCTX_TYPE_DEFAULT].mq_map[i] = 0;
> -		}
>   
>   		ctx = per_cpu_ptr(q->queue_ctx, i);
>   		for (j = 0; j < set->nr_maps; j++) {
> @@ -2541,6 +2529,18 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>   						HCTX_TYPE_DEFAULT, i);
>   				continue;
>   			}
> +			hctx_idx = set->map[j].mq_map[i];
> +			/* unmapped hw queue can be remapped after CPU topo changed */
> +			if (!set->tags[hctx_idx] &&
> +			    !__blk_mq_alloc_rq_map(set, hctx_idx)) {
> +				/*
> +				 * If tags initialization fail for some hctx,
> +				 * that hctx won't be brought online.  In this
> +				 * case, remap the current ctx to hctx[0] which
> +				 * is guaranteed to always have tags allocated
> +				 */
> +				set->map[j].mq_map[i] = 0;
> +			}
>   
>   			hctx = blk_mq_map_queue_type(q, j, i);
>   			ctx->hctxs[j] = hctx;
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v6 4/5] block: rename __blk_mq_alloc_rq_map
  2020-05-07 13:04 ` [PATCH v6 4/5] block: rename __blk_mq_alloc_rq_map Weiping Zhang
@ 2020-05-07 15:12   ` Hannes Reinecke
  0 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2020-05-07 15:12 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch, linux-block

On 5/7/20 3:04 PM, Weiping Zhang wrote:
> Rename __blk_mq_alloc_rq_map to __blk_mq_alloc_map_and_request;
> it actually allocates both the map and the requests, so make the
> function name match what the function does.
> 
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>   block/blk-mq.c | 7 ++++---
>   1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c6ba94cba17d..0a4f7fdd2248 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2473,7 +2473,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
>   	}
>   }
>   
> -static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
> +static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
> +					int hctx_idx)
>   {
>   	int ret = 0;
>   
> @@ -2532,7 +2533,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>   			hctx_idx = set->map[j].mq_map[i];
>   			/* unmapped hw queue can be remapped after CPU topo changed */
>   			if (!set->tags[hctx_idx] &&
> -			    !__blk_mq_alloc_rq_map(set, hctx_idx)) {
> +			    !__blk_mq_alloc_map_and_request(set, hctx_idx)) {
>   				/*
>   				 * If tags initialization fail for some hctx,
>   				 * that hctx won't be brought online.  In this
> @@ -2988,7 +2989,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>   	int i;
>   
>   	for (i = 0; i < set->nr_hw_queues; i++)
> -		if (!__blk_mq_alloc_rq_map(set, i))
> +		if (!__blk_mq_alloc_map_and_request(set, i))
>   			goto out_unwind;
>   
>   	return 0;
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH v6 5/5] block: rename blk_mq_alloc_rq_maps
  2020-05-07 13:04 ` [PATCH v6 5/5] block: rename blk_mq_alloc_rq_maps Weiping Zhang
@ 2020-05-07 15:12   ` Hannes Reinecke
  0 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2020-05-07 15:12 UTC (permalink / raw)
  To: axboe, tom.leiming, bvanassche, hch, linux-block

On 5/7/20 3:04 PM, Weiping Zhang wrote:
> Rename blk_mq_alloc_rq_maps to blk_mq_alloc_map_and_requests;
> this function allocates both the maps and the requests, so make the
> function name match what the function does.
> 
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>   block/blk-mq.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 0a4f7fdd2248..649a8abdd742 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3006,7 +3006,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>    * may reduce the depth asked for, if memory is tight. set->queue_depth
>    * will be updated to reflect the allocated depth.
>    */
> -static int blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> +static int blk_mq_alloc_map_and_requests(struct blk_mq_tag_set *set)
>   {
>   	unsigned int depth;
>   	int err;
> @@ -3166,7 +3166,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>   	if (ret)
>   		goto out_free_mq_map;
>   
> -	ret = blk_mq_alloc_rq_maps(set);
> +	ret = blk_mq_alloc_map_and_requests(set);
>   	if (ret)
>   		goto out_free_mq_map;
>   
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue
  2020-05-07 13:03 [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Weiping Zhang
                   ` (5 preceding siblings ...)
  2020-05-07 13:43 ` [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Christoph Hellwig
@ 2020-05-07 18:31 ` Jens Axboe
  2020-05-12  1:30 ` Bart Van Assche
  7 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2020-05-07 18:31 UTC (permalink / raw)
  To: tom.leiming, bvanassche, hch, linux-block

On 5/7/20 7:03 AM, Weiping Zhang wrote:
> Hi,
> 
> This series mainly fixes a kernel panic when increasing the hardware
> queue count, and also fixes some other misc issues.
> 
> Memleak 1:
> 
> __blk_mq_alloc_rq_maps
> 	__blk_mq_alloc_rq_map
> 
> if fail
> 	blk_mq_free_rq_map
> 
> Actually, __blk_mq_alloc_rq_map allocates both the map and the
> requests, so the requests also need to be freed here.

Applied for 5.8, thanks.

-- 
Jens Axboe



* Re: [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue
  2020-05-07 13:03 [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Weiping Zhang
                   ` (6 preceding siblings ...)
  2020-05-07 18:31 ` Jens Axboe
@ 2020-05-12  1:30 ` Bart Van Assche
  2020-05-12 12:09   ` Weiping Zhang
  7 siblings, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2020-05-12  1:30 UTC (permalink / raw)
  To: axboe, tom.leiming, hch, linux-block, Weiping Zhang

On 2020-05-07 06:03, Weiping Zhang wrote:
> This series mainly fixes a kernel panic when increasing the hardware
> queue count, and also fixes some other misc issues.

Does this patch series survive blktests? I'm asking this because
blktests triggers the crash shown below for Jens' block-for-next branch.
I think this report is the result of a recent change.

run blktests block/030

null_blk: module loaded
Increasing nr_hw_queues to 8 fails, fallback to 1
==================================================================
BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x2f2/0x830
Read of size 8 at addr 0000000000000128 by task nproc/8541

CPU: 5 PID: 8541 Comm: nproc Not tainted 5.7.0-rc4-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
Call Trace:
 dump_stack+0xa5/0xe6
 __kasan_report.cold+0x65/0xbb
 kasan_report+0x45/0x60
 check_memory_region+0x15e/0x1c0
 __kasan_check_read+0x15/0x20
 blk_mq_map_swqueue+0x2f2/0x830
 __blk_mq_update_nr_hw_queues+0x3df/0x690
 blk_mq_update_nr_hw_queues+0x32/0x50
 nullb_device_submit_queues_store+0xde/0x160 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x14b/0x2d0
 ksys_write+0xdd/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x310
 entry_SYSCALL_64_after_hwframe+0x49/0xb3

Thanks,

Bart.


* Re: [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue
  2020-05-12  1:30 ` Bart Van Assche
@ 2020-05-12 12:09   ` Weiping Zhang
  2020-05-12 12:20     ` Weiping Zhang
  0 siblings, 1 reply; 18+ messages in thread
From: Weiping Zhang @ 2020-05-12 12:09 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, tom.leiming, Christoph Hellwig, linux-block, Weiping Zhang

On Tue, May 12, 2020 at 9:31 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 2020-05-07 06:03, Weiping Zhang wrote:
> > This series mainly fixes a kernel panic when increasing the hardware
> > queue count, and also fixes some other misc issues.
>
> Does this patch series survive blktests? I'm asking this because
> blktests triggers the crash shown below for Jens' block-for-next branch.
> I think this report is the result of a recent change.
>
> run blktests block/030
>
> null_blk: module loaded
> Increasing nr_hw_queues to 8 fails, fallback to 1
> ==================================================================
> BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x2f2/0x830
> Read of size 8 at addr 0000000000000128 by task nproc/8541
>
> CPU: 5 PID: 8541 Comm: nproc Not tainted 5.7.0-rc4-dbg+ #3
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
> Call Trace:
>  dump_stack+0xa5/0xe6
>  __kasan_report.cold+0x65/0xbb
>  kasan_report+0x45/0x60
>  check_memory_region+0x15e/0x1c0
>  __kasan_check_read+0x15/0x20
>  blk_mq_map_swqueue+0x2f2/0x830
>  __blk_mq_update_nr_hw_queues+0x3df/0x690
>  blk_mq_update_nr_hw_queues+0x32/0x50
>  nullb_device_submit_queues_store+0xde/0x160 [null_blk]
>  configfs_write_file+0x1c4/0x250 [configfs]
>  __vfs_write+0x4c/0x90
>  vfs_write+0x14b/0x2d0
>  ksys_write+0xdd/0x180
>  __x64_sys_write+0x47/0x50
>  do_syscall_64+0x6f/0x310
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3
>
> Thanks,
>

Hi Bart,

I didn't test block/030, since I don't pull blktests very often.

It's a different problem: the mapping isn't reset on the fallback
path, so CPUs >= 1 will still point to a hctx != 0.

It should be fixed by:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bc34d6b572b6..d82cefb0474f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3365,8 +3365,8 @@ static void __blk_mq_update_nr_hw_queues(struct
blk_mq_tag_set *set,
                goto reregister;

        set->nr_hw_queues = nr_hw_queues;
-       blk_mq_update_queue_map(set);
 fallback:
+       blk_mq_update_queue_map(set);
        list_for_each_entry(q, &set->tag_list, tag_set_list) {
                blk_mq_realloc_hw_ctxs(set, q);
                if (q->nr_hw_queues != set->nr_hw_queues) {
> Bart.


* Re: [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue
  2020-05-12 12:09   ` Weiping Zhang
@ 2020-05-12 12:20     ` Weiping Zhang
  2020-05-12 23:08       ` Bart Van Assche
  0 siblings, 1 reply; 18+ messages in thread
From: Weiping Zhang @ 2020-05-12 12:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, tom.leiming, Christoph Hellwig, linux-block, Weiping Zhang

On Tue, May 12, 2020 at 8:09 PM Weiping Zhang <zwp10758@gmail.com> wrote:
>
> On Tue, May 12, 2020 at 9:31 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On 2020-05-07 06:03, Weiping Zhang wrote:
> > > This series mainly fixes a kernel panic when increasing the hardware
> > > queue count, and also fixes some other misc issues.
> >
> > Does this patch series survive blktests? I'm asking this because
> > blktests triggers the crash shown below for Jens' block-for-next branch.
> > I think this report is the result of a recent change.
> >
> > run blktests block/030
> >
> > null_blk: module loaded
> > Increasing nr_hw_queues to 8 fails, fallback to 1
> > ==================================================================
> > BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x2f2/0x830
> > Read of size 8 at addr 0000000000000128 by task nproc/8541
> >
> > CPU: 5 PID: 8541 Comm: nproc Not tainted 5.7.0-rc4-dbg+ #3
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
> > Call Trace:
> >  dump_stack+0xa5/0xe6
> >  __kasan_report.cold+0x65/0xbb
> >  kasan_report+0x45/0x60
> >  check_memory_region+0x15e/0x1c0
> >  __kasan_check_read+0x15/0x20
> >  blk_mq_map_swqueue+0x2f2/0x830
> >  __blk_mq_update_nr_hw_queues+0x3df/0x690
> >  blk_mq_update_nr_hw_queues+0x32/0x50
> >  nullb_device_submit_queues_store+0xde/0x160 [null_blk]
> >  configfs_write_file+0x1c4/0x250 [configfs]
> >  __vfs_write+0x4c/0x90
> >  vfs_write+0x14b/0x2d0
> >  ksys_write+0xdd/0x180
> >  __x64_sys_write+0x47/0x50
> >  do_syscall_64+0x6f/0x310
> >  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> >
> > Thanks,
> >
>
> Hi Bart,
>
> I don't test block/030, since I don't pull blktest very often.
>
> It's a different problem: the mapping isn't reset on the fallback
> path, so CPUs >= 1 will still point to a hctx != 0.
>
> It should be fixed by:
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index bc34d6b572b6..d82cefb0474f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3365,8 +3365,8 @@ static void __blk_mq_update_nr_hw_queues(struct
> blk_mq_tag_set *set,
>                 goto reregister;
>
>         set->nr_hw_queues = nr_hw_queues;
> -       blk_mq_update_queue_map(set);
>  fallback:
> +       blk_mq_update_queue_map(set);
>         list_for_each_entry(q, &set->tag_list, tag_set_list) {
>                 blk_mq_realloc_hw_ctxs(set, q);
>                 if (q->nr_hw_queues != set->nr_hw_queues) {

And should block/030 also be improved?

 35         # Since older null_blk versions do not allow "submit_queues" to be
 36         # modified, check first whether that configs attribute is writeable.
 37         # Each iteration of the loop below triggers $(nproc) + 1
 38         # null_init_hctx() calls. Since <interval>=$(nproc), all possible
 39         # blk_mq_realloc_hw_ctxs() error paths will be triggered. Whether or
 40         # not this test succeeds depends on whether or not _check_dmesg()
 41         # detects a kernel warning.
 42         if { echo "$(<"$sq")" >$sq; } 2>/dev/null; then
 43                 for ((i = 0; i < 100; i++)); do
 44                         echo 1 > $sq
 45                         nproc > $sq  # this line outputs lots of
"nproc: write error: Cannot allocate memory" messages
 46                 done
 47         else
 48                 SKIP_REASON="Skipping test because $sq cannot be modified"
 49         fi


The test result shows this test case as [failed], but it actually
[passed]: no warning is detected in the kernel log if the above patch
is applied.

block/030 (trigger the blk_mq_realloc_hw_ctxs() error path)  [failed]
    runtime  1.999s  ...  2.115s
    --- tests/block/030.out 2020-05-12 10:42:26.345782849 +0800
    +++ /data1/zwp/src/blktests/results/nodev/block/030.out.bad
2020-05-12 20:14:59.878915218 +0800
    @@ -1 +1,51 @@
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    +nproc: write error: Cannot allocate memory
    ...
    (Run 'diff -u tests/block/030.out
/data1/zwp/src/blktests/results/nodev/block/030.out.bad' to see the
entire diff)

Thanks
Weiping


* Re: [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue
  2020-05-12 12:20     ` Weiping Zhang
@ 2020-05-12 23:08       ` Bart Van Assche
  2020-05-13  0:43         ` Weiping Zhang
  0 siblings, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2020-05-12 23:08 UTC (permalink / raw)
  To: Weiping Zhang
  Cc: Jens Axboe, tom.leiming, Christoph Hellwig, linux-block, Weiping Zhang

On 2020-05-12 05:20, Weiping Zhang wrote:
> On Tue, May 12, 2020 at 8:09 PM Weiping Zhang <zwp10758@gmail.com> wrote:
>> I didn't test block/030, since I don't pull blktests very often.

That's unfortunate ...

>> It's a different problem: the mapping isn't reset on the fallback
>> path, so CPUs >= 1 will still point to a hctx != 0.
>>
>> It should be fixed by:
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index bc34d6b572b6..d82cefb0474f 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -3365,8 +3365,8 @@ static void __blk_mq_update_nr_hw_queues(struct
>> blk_mq_tag_set *set,
>>                 goto reregister;
>>
>>         set->nr_hw_queues = nr_hw_queues;
>> -       blk_mq_update_queue_map(set);
>>  fallback:
>> +       blk_mq_update_queue_map(set);
>>         list_for_each_entry(q, &set->tag_list, tag_set_list) {
>>                 blk_mq_realloc_hw_ctxs(set, q);
>>                 if (q->nr_hw_queues != set->nr_hw_queues) {

If this is posted as a patch, feel free to add:

Tested-by: Bart van Assche <bvanassche@acm.org>

> And block/030 should also be improved ?
> 
>  35         # Since older null_blk versions do not allow "submit_queues" to be
>  36         # modified, check first whether that configs attribute is writeable.
>  37         # Each iteration of the loop below triggers $(nproc) + 1
>  38         # null_init_hctx() calls. Since <interval>=$(nproc), all possible
>  39         # blk_mq_realloc_hw_ctxs() error paths will be triggered. Whether or
>  40         # not this test succeeds depends on whether or not _check_dmesg()
>  41         # detects a kernel warning.
>  42         if { echo "$(<"$sq")" >$sq; } 2>/dev/null; then
>  43                 for ((i = 0; i < 100; i++)); do
>  44                         echo 1 > $sq
>  45                         nproc > $sq  # this line output lots
> "nproc: write error: Cannot allocate memory"
>  46                 done
>  47         else
>  48                 SKIP_REASON="Skipping test because $sq cannot be modified"
>  49         fi
> 
> 
> > The test result shows this test case as [failed], but it actually
> > [passed]: no warning is detected in the kernel log if the above
> > patch is applied.
> 
> block/030 (trigger the blk_mq_realloc_hw_ctxs() error path)  [failed]
>     runtime  1.999s  ...  2.115s
>     --- tests/block/030.out 2020-05-12 10:42:26.345782849 +0800
>     +++ /data1/zwp/src/blktests/results/nodev/block/030.out.bad
> 2020-05-12 20:14:59.878915218 +0800
>     @@ -1 +1,51 @@
>     +nproc: write error: Cannot allocate memory
>     +nproc: write error: Cannot allocate memory
>     +nproc: write error: Cannot allocate memory
>     +nproc: write error: Cannot allocate memory
>     +nproc: write error: Cannot allocate memory
>     +nproc: write error: Cannot allocate memory
>     +nproc: write error: Cannot allocate memory
>     ...
>     (Run 'diff -u tests/block/030.out
> /data1/zwp/src/blktests/results/nodev/block/030.out.bad' to see the
> entire diff)

That's weird. I have not yet encountered this. Test block/030 passes on
my setup.

Thanks,

Bart.


* Re: [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue
  2020-05-12 23:08       ` Bart Van Assche
@ 2020-05-13  0:43         ` Weiping Zhang
  0 siblings, 0 replies; 18+ messages in thread
From: Weiping Zhang @ 2020-05-13  0:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, tom.leiming, Christoph Hellwig, linux-block, Weiping Zhang

On Wed, May 13, 2020 at 7:08 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 2020-05-12 05:20, Weiping Zhang wrote:
> > On Tue, May 12, 2020 at 8:09 PM Weiping Zhang <zwp10758@gmail.com> wrote:
> >> I didn't test block/030, since I don't pull blktests very often.
>
> That's unfortunate ...
>
> >> It's a different problem: the mapping isn't reset on the fallback
> >> path, so CPUs >= 1 will still point to a hctx != 0.
> >>
> >> It should be fixed by:
> >>
> >> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >> index bc34d6b572b6..d82cefb0474f 100644
> >> --- a/block/blk-mq.c
> >> +++ b/block/blk-mq.c
> >> @@ -3365,8 +3365,8 @@ static void __blk_mq_update_nr_hw_queues(struct
> >> blk_mq_tag_set *set,
> >>                 goto reregister;
> >>
> >>         set->nr_hw_queues = nr_hw_queues;
> >> -       blk_mq_update_queue_map(set);
> >>  fallback:
> >> +       blk_mq_update_queue_map(set);
> >>         list_for_each_entry(q, &set->tag_list, tag_set_list) {
> >>                 blk_mq_realloc_hw_ctxs(set, q);
> >>                 if (q->nr_hw_queues != set->nr_hw_queues) {
>
> If this is posted as a patch, feel free to add:
>
> Tested-by: Bart van Assche <bvanassche@acm.org>
>
I'll post it later, thank you.

> > And block/030 should also be improved ?
> >
> >  35         # Since older null_blk versions do not allow "submit_queues" to be
> >  36         # modified, check first whether that configs attribute is writeable.
> >  37         # Each iteration of the loop below triggers $(nproc) + 1
> >  38         # null_init_hctx() calls. Since <interval>=$(nproc), all possible
> >  39         # blk_mq_realloc_hw_ctxs() error paths will be triggered. Whether or
> >  40         # not this test succeeds depends on whether or not _check_dmesg()
> >  41         # detects a kernel warning.
> >  42         if { echo "$(<"$sq")" >$sq; } 2>/dev/null; then
> >  43                 for ((i = 0; i < 100; i++)); do
> >  44                         echo 1 > $sq
> >  45                 nproc > $sq  # this line outputs lots of
> > "nproc: write error: Cannot allocate memory" messages
> >  46                 done
> >  47         else
> >  48                 SKIP_REASON="Skipping test because $sq cannot be modified"
> >  49         fi
> >
> >
> > The test result shows this test case as [failed], but it actually
> > [passed]: no warning is detected in the kernel log if the above
> > patch is applied.
> >
> > block/030 (trigger the blk_mq_realloc_hw_ctxs() error path)  [failed]
> >     runtime  1.999s  ...  2.115s
> >     --- tests/block/030.out 2020-05-12 10:42:26.345782849 +0800
> >     +++ /data1/zwp/src/blktests/results/nodev/block/030.out.bad
> > 2020-05-12 20:14:59.878915218 +0800
> >     @@ -1 +1,51 @@
> >     +nproc: write error: Cannot allocate memory
> >     +nproc: write error: Cannot allocate memory
> >     +nproc: write error: Cannot allocate memory
> >     +nproc: write error: Cannot allocate memory
> >     +nproc: write error: Cannot allocate memory
> >     +nproc: write error: Cannot allocate memory
> >     +nproc: write error: Cannot allocate memory
> >     ...
> >     (Run 'diff -u tests/block/030.out
> > /data1/zwp/src/blktests/results/nodev/block/030.out.bad' to see the
> > entire diff)
>
> That's weird. I have not yet encountered this. Test block/030 passes on
> my setup.
>
> Thanks,
>
> Bart.


end of thread, other threads:[~2020-05-13  0:43 UTC | newest]

Thread overview: 18+ messages
2020-05-07 13:03 [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Weiping Zhang
2020-05-07 13:03 ` [PATCH v6 1/5] block: free both rq_map and request Weiping Zhang
2020-05-07 15:09   ` Hannes Reinecke
2020-05-07 13:03 ` [PATCH v6 2/5] block: save previous hardware queue count before udpate Weiping Zhang
2020-05-07 15:09   ` Hannes Reinecke
2020-05-07 13:04 ` [PATCH v6 3/5] block: alloc map and request for new hardware queue Weiping Zhang
2020-05-07 15:11   ` Hannes Reinecke
2020-05-07 13:04 ` [PATCH v6 4/5] block: rename __blk_mq_alloc_rq_map Weiping Zhang
2020-05-07 15:12   ` Hannes Reinecke
2020-05-07 13:04 ` [PATCH v6 5/5] block: rename blk_mq_alloc_rq_maps Weiping Zhang
2020-05-07 15:12   ` Hannes Reinecke
2020-05-07 13:43 ` [PATCH v6 0/5] Fix potential kernel panic when increase hardware queue Christoph Hellwig
2020-05-07 18:31 ` Jens Axboe
2020-05-12  1:30 ` Bart Van Assche
2020-05-12 12:09   ` Weiping Zhang
2020-05-12 12:20     ` Weiping Zhang
2020-05-12 23:08       ` Bart Van Assche
2020-05-13  0:43         ` Weiping Zhang
