* [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue
@ 2020-04-06 19:35 Weiping Zhang
  2020-04-06 19:36 ` [PATCH v3 1/7] block: rename __blk_mq_alloc_rq_map Weiping Zhang
                   ` (7 more replies)
  0 siblings, 8 replies; 21+ messages in thread
From: Weiping Zhang @ 2020-04-06 19:35 UTC (permalink / raw)
  To: axboe, bvanassche; +Cc: linux-block

Hi,

This series makes some improvements on top of V2 and also fixes two
memory leaks. V2 introduced a regression in blktest/block/test/029; that
test case increases and decreases the hardware queue count quickly.


Memleak 1:

__blk_mq_alloc_rq_maps
	__blk_mq_alloc_rq_map

on failure:
	blk_mq_free_rq_map

__blk_mq_alloc_rq_map() allocates both the map and the requests, so the
error path also needs to free the requests.
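
For illustration only, a minimal user-space sketch of that pattern (the
demo_* names are hypothetical; this is not the blk-mq code): the
allocation step creates both the map and the backing request memory, but
an unwind that frees only the map leaks the request memory.

/* Hypothetical sketch of the leak pattern, not kernel code. */
#include <stdlib.h>

struct demo_tags {
	void **static_rqs;	/* the "map": pointers to the requests */
	void *rq_pages;		/* backing memory for the requests */
};

static struct demo_tags *demo_alloc_map_and_requests(unsigned int depth)
{
	struct demo_tags *tags = calloc(1, sizeof(*tags));

	if (!tags)
		return NULL;
	tags->static_rqs = calloc(depth, sizeof(void *));
	tags->rq_pages = malloc(depth * 256);
	if (!tags->static_rqs || !tags->rq_pages) {
		free(tags->static_rqs);
		free(tags->rq_pages);
		free(tags);
		return NULL;
	}
	return tags;
}

/* Buggy unwind: frees only the map, like calling blk_mq_free_rq_map alone. */
static void demo_free_map_only(struct demo_tags *tags)
{
	free(tags->static_rqs);
	free(tags);		/* tags->rq_pages is leaked */
}

/* Correct unwind: free the requests, then the map. */
static void demo_free_map_and_requests(struct demo_tags *tags)
{
	free(tags->rq_pages);
	free(tags->static_rqs);
	free(tags);
}

int main(void)
{
	struct demo_tags *a = demo_alloc_map_and_requests(32);
	struct demo_tags *b = demo_alloc_map_and_requests(32);

	if (a)
		demo_free_map_only(a);		/* leaks a->rq_pages */
	if (b)
		demo_free_map_and_requests(b);	/* no leak */
	return 0;
}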

Memleak 2:
When a driver decreases the hardware queue count, set->nr_hw_queues is
changed first, in blk_mq_realloc_tag_set_tags or
__blk_mq_update_nr_hw_queues. After that, blk_mq_realloc_hw_ctxs,
blk_mq_map_swqueue and even blk_mq_free_tag_set have no chance to free
the resources of the dropped hardware queues, because they all iterate
over the hardware queues with
for (i = 0; i < set->nr_hw_queues; i++).
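
As a simplified illustration of why the tail entries are missed (the
demo_* names are hypothetical; this is not the blk-mq code): once the
count has been lowered, a cleanup loop bounded by that count can no
longer reach the extra entries.

/* Hypothetical sketch of the decrease-then-free leak, not kernel code. */
#include <stdlib.h>

struct demo_set {
	unsigned int nr_hw_queues;	/* mirrors set->nr_hw_queues */
	void *tags[8];			/* per-hctx allocations */
};

/* Mirrors the cleanup style above: iterates only up to nr_hw_queues. */
void demo_free_all(struct demo_set *set)
{
	unsigned int i;

	for (i = 0; i < set->nr_hw_queues; i++) {
		free(set->tags[i]);
		set->tags[i] = NULL;
	}
}

int main(void)
{
	struct demo_set set = { .nr_hw_queues = 4 };
	unsigned int i;

	for (i = 0; i < set.nr_hw_queues; i++)
		set.tags[i] = malloc(64);

	/* The driver decreases the queue count before anything is freed... */
	set.nr_hw_queues = 2;

	/* ...so tags[2] and tags[3] are never freed here: that is the leak. */
	demo_free_all(&set);
	return 0;
}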

Patch 1~3: rename some functions, no functional change.
Patch 4: fix the first memory leak.
Patch 5: fix the prev_nr_hw_queues issue; it needs to be saved before the change.
Patch 6: allocate maps and requests to fix a potential kernel panic.


Changes since V2:
 * rename some functions and fix a memory leak when freeing maps and requests
 * do not free newly allocated maps and requests; they will be released
   when the tagset is gone

Changes since V1:
 * Add a fix for a potential kernel panic when increasing the hardware
   queue count

Weiping Zhang (7):
  block: rename __blk_mq_alloc_rq_map
  block: rename __blk_mq_alloc_rq_maps
  block: rename blk_mq_alloc_rq_maps
  block: free both map and request
  block: save previous hardware queue count before update
  block: refactor __blk_mq_alloc_rq_map_and_requests
  block: alloc map and request for new hardware queue

 block/blk-mq.c         | 49 ++++++++++++++++++++++++++++++------------
 include/linux/blk-mq.h |  1 +
 2 files changed, 36 insertions(+), 14 deletions(-)

-- 
2.18.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 1/7] block: rename __blk_mq_alloc_rq_map
  2020-04-06 19:35 [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue Weiping Zhang
@ 2020-04-06 19:36 ` Weiping Zhang
  2020-04-20 20:41   ` Bart Van Assche
  2020-04-06 19:36 ` [PATCH v3 2/7] block: rename __blk_mq_alloc_rq_maps Weiping Zhang
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Weiping Zhang @ 2020-04-06 19:36 UTC (permalink / raw)
  To: axboe, bvanassche; +Cc: linux-block

Rename __blk_mq_alloc_rq_map to __blk_mq_alloc_rq_map_and_request: it
actually allocates both the map and the requests, so make the function
name match what the function does.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f6291ceedee4..3a482ce7ed28 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2468,7 +2468,7 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
 	}
 }
 
-static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
+static bool __blk_mq_alloc_rq_map_and_request(struct blk_mq_tag_set *set, int hctx_idx)
 {
 	int ret = 0;
 
@@ -2519,7 +2519,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 		hctx_idx = set->map[HCTX_TYPE_DEFAULT].mq_map[i];
 		/* unmapped hw queue can be remapped after CPU topo changed */
 		if (!set->tags[hctx_idx] &&
-		    !__blk_mq_alloc_rq_map(set, hctx_idx)) {
+		    !__blk_mq_alloc_rq_map_and_request(set, hctx_idx)) {
 			/*
 			 * If tags initialization fail for some hctx,
 			 * that hctx won't be brought online.  In this
@@ -2983,7 +2983,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 	int i;
 
 	for (i = 0; i < set->nr_hw_queues; i++)
-		if (!__blk_mq_alloc_rq_map(set, i))
+		if (!__blk_mq_alloc_rq_map_and_request(set, i))
 			goto out_unwind;
 
 	return 0;
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 2/7] block: rename __blk_mq_alloc_rq_maps
  2020-04-06 19:35 [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue Weiping Zhang
  2020-04-06 19:36 ` [PATCH v3 1/7] block: rename __blk_mq_alloc_rq_map Weiping Zhang
@ 2020-04-06 19:36 ` Weiping Zhang
  2020-04-20 20:46   ` Bart Van Assche
  2020-04-06 19:36 ` [PATCH v3 3/7] block: rename blk_mq_alloc_rq_maps Weiping Zhang
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Weiping Zhang @ 2020-04-06 19:36 UTC (permalink / raw)
  To: axboe; +Cc: linux-block

Rename __blk_mq_alloc_rq_maps to __blk_mq_alloc_rq_map_and_requests:
this function allocates both the maps and the requests, so make the
function name match what the function does.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3a482ce7ed28..5a322130aaf2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2978,7 +2978,7 @@ void blk_mq_exit_queue(struct request_queue *q)
 	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 }
 
-static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
+static int __blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
 {
 	int i;
 
@@ -3007,7 +3007,7 @@ static int blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
 
 	depth = set->queue_depth;
 	do {
-		err = __blk_mq_alloc_rq_maps(set);
+		err = __blk_mq_alloc_rq_map_and_requests(set);
 		if (!err)
 			break;
 
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 3/7] block: rename blk_mq_alloc_rq_maps
  2020-04-06 19:35 [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue Weiping Zhang
  2020-04-06 19:36 ` [PATCH v3 1/7] block: rename __blk_mq_alloc_rq_map Weiping Zhang
  2020-04-06 19:36 ` [PATCH v3 2/7] block: rename __blk_mq_alloc_rq_maps Weiping Zhang
@ 2020-04-06 19:36 ` Weiping Zhang
  2020-04-20 20:47   ` Bart Van Assche
  2020-04-06 19:36 ` [PATCH v3 4/7] block: free both map and request Weiping Zhang
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Weiping Zhang @ 2020-04-06 19:36 UTC (permalink / raw)
  To: axboe, bvanassche; +Cc: linux-block

Rename blk_mq_alloc_rq_maps to blk_mq_alloc_rq_map_and_requests: this
function allocates both the maps and the requests, so make the function
name match what the function does.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5a322130aaf2..4692e8232699 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3000,7 +3000,7 @@ static int __blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
  * may reduce the depth asked for, if memory is tight. set->queue_depth
  * will be updated to reflect the allocated depth.
  */
-static int blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
+static int blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
 {
 	unsigned int depth;
 	int err;
@@ -3160,7 +3160,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (ret)
 		goto out_free_mq_map;
 
-	ret = blk_mq_alloc_rq_maps(set);
+	ret = blk_mq_alloc_rq_map_and_requests(set);
 	if (ret)
 		goto out_free_mq_map;
 
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 4/7] block: free both map and request
  2020-04-06 19:35 [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue Weiping Zhang
                   ` (2 preceding siblings ...)
  2020-04-06 19:36 ` [PATCH v3 3/7] block: rename blk_mq_alloc_rq_maps Weiping Zhang
@ 2020-04-06 19:36 ` Weiping Zhang
  2020-04-20 20:58   ` Bart Van Assche
  2020-04-06 19:37 ` [PATCH v3 5/7] block: save previous hardware queue count before update Weiping Zhang
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Weiping Zhang @ 2020-04-06 19:36 UTC (permalink / raw)
  To: axboe, bvanassche; +Cc: linux-block

In this error path both the map and the requests should be freed,
otherwise a memory leak occurs.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4692e8232699..406df9ce9b55 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2990,7 +2990,7 @@ static int __blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
 
 out_unwind:
 	while (--i >= 0)
-		blk_mq_free_rq_map(set->tags[i]);
+		blk_mq_free_map_and_requests(set, i);
 
 	return -ENOMEM;
 }
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 5/7] block: save previous hardware queue count before update
  2020-04-06 19:35 [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue Weiping Zhang
                   ` (3 preceding siblings ...)
  2020-04-06 19:36 ` [PATCH v3 4/7] block: free both map and request Weiping Zhang
@ 2020-04-06 19:37 ` Weiping Zhang
  2020-04-20 21:00   ` Bart Van Assche
  2020-04-06 19:37 ` [PATCH v3 6/7] block: refactor __blk_mq_alloc_rq_map_and_requests Weiping Zhang
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 21+ messages in thread
From: Weiping Zhang @ 2020-04-06 19:37 UTC (permalink / raw)
  To: axboe, bvanassche; +Cc: linux-block

blk_mq_realloc_tag_set_tags will update set->nr_hw_queues, so save the
old set->nr_hw_queues before calling this function.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 406df9ce9b55..df243c19a158 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3342,11 +3342,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 		blk_mq_sysfs_unregister(q);
 	}
 
+	prev_nr_hw_queues = set->nr_hw_queues;
 	if (blk_mq_realloc_tag_set_tags(set, set->nr_hw_queues, nr_hw_queues) <
 	    0)
 		goto reregister;
 
-	prev_nr_hw_queues = set->nr_hw_queues;
 	set->nr_hw_queues = nr_hw_queues;
 	blk_mq_update_queue_map(set);
 fallback:
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 6/7] block: refactor __blk_mq_alloc_rq_map_and_requests
  2020-04-06 19:35 [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue Weiping Zhang
                   ` (4 preceding siblings ...)
  2020-04-06 19:37 ` [PATCH v3 5/7] block: save previous hardware queue count before update Weiping Zhang
@ 2020-04-06 19:37 ` Weiping Zhang
  2020-04-20 21:07   ` Bart Van Assche
  2020-04-06 19:37 ` [PATCH v3 7/7] block: alloc map and request for new hardware queue Weiping Zhang
  2020-04-08 12:25 ` [PATCH v3 0/7] Fix potential kernel panic when increase " Weiping Zhang
  7 siblings, 1 reply; 21+ messages in thread
From: Weiping Zhang @ 2020-04-06 19:37 UTC (permalink / raw)
  To: axboe, bvanassche; +Cc: linux-block

Add a new member, nr_allocated_map_rqs, to struct blk_mq_tag_set to
record the number of maps and requests that have been allocated for
this tagset.

Currently, when the hardware queue count is increased, no maps and
requests are allocated for the newly added hardware queues; that will
be fixed in the next patch.

Since requests need a lot of memory, it is not easy to allocate that
much memory dynamically, especially when the system is under memory
pressure.

This patch allows nr_hw_queues to differ from nr_allocated_map_rqs, to
avoid allocating and freeing memory every time the hardware queue count
changes.

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c         | 28 +++++++++++++++++++++-------
 include/linux/blk-mq.h |  1 +
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index df243c19a158..15f6a811122a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2479,8 +2479,10 @@ static bool __blk_mq_alloc_rq_map_and_request(struct blk_mq_tag_set *set, int hc
 
 	ret = blk_mq_alloc_rqs(set, set->tags[hctx_idx], hctx_idx,
 				set->queue_depth);
-	if (!ret)
+	if (!ret) {
+		set->nr_allocated_map_rqs++;
 		return true;
+	}
 
 	blk_mq_free_rq_map(set->tags[hctx_idx]);
 	set->tags[hctx_idx] = NULL;
@@ -2494,6 +2496,7 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
 		blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
 		blk_mq_free_rq_map(set->tags[hctx_idx]);
 		set->tags[hctx_idx] = NULL;
+		set->nr_allocated_map_rqs--;
 	}
 }
 
@@ -2978,18 +2981,28 @@ void blk_mq_exit_queue(struct request_queue *q)
 	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
 }
 
-static int __blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
+/*
+ * Only append new maps and requests if new > now; all of these maps and
+ * requests will be released when the whole tag set is cleaned up. Because
+ * requests cost a lot of memory, it is not easy to allocate that much when
+ * the system is under memory pressure.
+ */
+static int blk_mq_realloc_rq_map_and_requests(struct blk_mq_tag_set *set,
+						int new)
 {
-	int i;
+	int i, now = set->nr_allocated_map_rqs;
+
+	if (new <= now)
+		return 0;
 
-	for (i = 0; i < set->nr_hw_queues; i++)
+	for (i = now; i < new; i++)
 		if (!__blk_mq_alloc_rq_map_and_request(set, i))
 			goto out_unwind;
 
 	return 0;
 
 out_unwind:
-	while (--i >= 0)
+	while (--i >= now)
 		blk_mq_free_map_and_requests(set, i);
 
 	return -ENOMEM;
@@ -3007,7 +3020,8 @@ static int blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
 
 	depth = set->queue_depth;
 	do {
-		err = __blk_mq_alloc_rq_map_and_requests(set);
+		err = blk_mq_realloc_rq_map_and_requests(set,
+						set->nr_hw_queues);
 		if (!err)
 			break;
 
@@ -3184,7 +3198,7 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 {
 	int i, j;
 
-	for (i = 0; i < set->nr_hw_queues; i++)
+	for (i = 0; i < set->nr_allocated_map_rqs; i++)
 		blk_mq_free_map_and_requests(set, i);
 
 	for (j = 0; j < set->nr_maps; j++) {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f389d7c724bd..d950435cd3c6 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -240,6 +240,7 @@ struct blk_mq_tag_set {
 	unsigned int		nr_maps;
 	const struct blk_mq_ops	*ops;
 	unsigned int		nr_hw_queues;
+	unsigned int		nr_allocated_map_rqs;
 	unsigned int		queue_depth;
 	unsigned int		reserved_tags;
 	unsigned int		cmd_size;
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 7/7] block: alloc map and request for new hardware queue
  2020-04-06 19:35 [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue Weiping Zhang
                   ` (5 preceding siblings ...)
  2020-04-06 19:37 ` [PATCH v3 6/7] block: refactor __blk_mq_alloc_rq_map_and_requests Weiping Zhang
@ 2020-04-06 19:37 ` Weiping Zhang
  2020-04-08 12:25 ` [PATCH v3 0/7] Fix potential kernel panic when increase " Weiping Zhang
  7 siblings, 0 replies; 21+ messages in thread
From: Weiping Zhang @ 2020-04-06 19:37 UTC (permalink / raw)
  To: axboe, bvanassche; +Cc: linux-block

Allocate a new map and requests for each new hardware queue when the
hardware queue count is increased. Before this patch a warning was
printed for each new hardware queue, but that is not enough: these
hctxs have no maps and requests, so when a bio is mapped to one of
these hardware queues, the kernel panics while getting a request from
that hctx.

Test environment:
 * an NVMe disk that supports 128 I/O queues
 * 96 CPUs in the system

A corner case can always trigger this panic: 96 I/O queues are
allocated for the HCTX_TYPE_DEFAULT type, with the corresponding kernel
log: nvme nvme0: 96/0/0 default/read/poll queues. Now set the nvme write
queues to 96; nvme then allocates the other (32) queues for read, but
blk_mq_update_nr_hw_queues does not allocate maps and requests for these
newly added I/O queues. So when a process reads the nvme disk, the
kernel panics while getting a request from these hardware contexts.

Reproduce script:

nr=$(expr `cat /sys/block/nvme0n1/device/queue_count` - 1)
echo $nr > /sys/module/nvme/parameters/write_queues
echo 1 > /sys/block/nvme0n1/device/reset_controller
dd if=/dev/nvme0n1 of=/dev/null bs=4K count=1

[ 8040.805626] ------------[ cut here ]------------
[ 8040.805627] WARNING: CPU: 82 PID: 12921 at block/blk-mq.c:2578 blk_mq_map_swqueue+0x2b6/0x2c0
[ 8040.805627] Modules linked in: nvme nvme_core nf_conntrack_netlink xt_addrtype br_netfilter overlay xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nft_counter nf_nat_tftp nf_conntrack_tftp nft_masq nf_tables_set nft_fib_inet nft_f
ib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack tun bridge nf_defrag_ipv6 nf_defrag_ipv4 stp llc ip6_tables ip_tables nft_compat rfkill ip_set nf_tables nfne
tlink sunrpc intel_rapl_msr intel_rapl_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass ipmi_ssif crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel intel_
cstate intel_uncore raid0 joydev intel_rapl_perf ipmi_si pcspkr mei_me ioatdma sg ipmi_devintf mei i2c_i801 dca lpc_ich ipmi_msghandler acpi_power_meter acpi_pad xfs libcrc32c sd_mod ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm d
rm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 8040.805637]  ahci drm i40e libahci crc32c_intel libata t10_pi wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nvme_core]
[ 8040.805640] CPU: 82 PID: 12921 Comm: kworker/u194:2 Kdump: loaded Tainted: G        W         5.6.0-rc5.78317c+ #2
[ 8040.805640] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019
[ 8040.805641] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 8040.805642] RIP: 0010:blk_mq_map_swqueue+0x2b6/0x2c0
[ 8040.805643] Code: 00 00 00 00 00 41 83 c5 01 44 39 6d 50 77 b8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b bb 98 00 00 00 89 d6 e8 8c 81 03 00 eb 83 <0f> 0b e9 52 ff ff ff 0f 1f 00 0f 1f 44 00 00 41 57 48 89 f1 41 56
[ 8040.805643] RSP: 0018:ffffba590d2e7d48 EFLAGS: 00010246
[ 8040.805643] RAX: 0000000000000000 RBX: ffff9f013e1ba800 RCX: 000000000000003d
[ 8040.805644] RDX: ffff9f00ffff6000 RSI: 0000000000000003 RDI: ffff9ed200246d90
[ 8040.805644] RBP: ffff9f00f6a79860 R08: 0000000000000000 R09: 000000000000003d
[ 8040.805645] R10: 0000000000000001 R11: ffff9f0138c3d000 R12: ffff9f00fb3a9008
[ 8040.805645] R13: 000000000000007f R14: ffffffff96822660 R15: 000000000000005f
[ 8040.805645] FS:  0000000000000000(0000) GS:ffff9f013fa80000(0000) knlGS:0000000000000000
[ 8040.805646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8040.805646] CR2: 00007f7f397fa6f8 CR3: 0000003d8240a002 CR4: 00000000007606e0
[ 8040.805647] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8040.805647] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8040.805647] PKRU: 55555554
[ 8040.805647] Call Trace:
[ 8040.805649]  blk_mq_update_nr_hw_queues+0x31b/0x390
[ 8040.805650]  nvme_reset_work+0xb4b/0xeab [nvme]
[ 8040.805651]  process_one_work+0x1a7/0x370
[ 8040.805652]  worker_thread+0x1c9/0x380
[ 8040.805653]  ? max_active_store+0x80/0x80
[ 8040.805655]  kthread+0x112/0x130
[ 8040.805656]  ? __kthread_parkme+0x70/0x70
[ 8040.805657]  ret_from_fork+0x35/0x40
[ 8040.805658] ---[ end trace b5f13b1e73ccb5d3 ]---
[ 8229.365135] BUG: kernel NULL pointer dereference, address: 0000000000000004
[ 8229.365165] #PF: supervisor read access in kernel mode
[ 8229.365178] #PF: error_code(0x0000) - not-present page
[ 8229.365191] PGD 0 P4D 0
[ 8229.365201] Oops: 0000 [#1] SMP PTI
[ 8229.365212] CPU: 77 PID: 13024 Comm: dd Kdump: loaded Tainted: G        W         5.6.0-rc5.78317c+ #2
[ 8229.365232] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019
[ 8229.365253] RIP: 0010:blk_mq_get_tag+0x227/0x250
[ 8229.365265] Code: 44 24 04 44 01 e0 48 8b 74 24 38 65 48 33 34 25 28 00 00 00 75 33 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e c3 48 8d 68 10 4c 89 ef <44> 8b 60 04 48 89 ee e8 dd f9 ff ff 83 f8 ff 75 c8 e9 67 fe ff ff
[ 8229.365304] RSP: 0018:ffffba590e977970 EFLAGS: 00010246
[ 8229.365317] RAX: 0000000000000000 RBX: ffff9f00f6a79860 RCX: ffffba590e977998
[ 8229.365333] RDX: 0000000000000000 RSI: ffff9f012039b140 RDI: ffffba590e977a38
[ 8229.365349] RBP: 0000000000000010 R08: ffffda58ff94e190 R09: ffffda58ff94e198
[ 8229.365365] R10: 0000000000000011 R11: ffff9f00f6a79860 R12: 0000000000000000
[ 8229.365381] R13: ffffba590e977a38 R14: ffff9f012039b140 R15: 0000000000000001
[ 8229.365397] FS:  00007f481c230580(0000) GS:ffff9f013f940000(0000) knlGS:0000000000000000
[ 8229.365415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8229.365428] CR2: 0000000000000004 CR3: 0000005f35e26004 CR4: 00000000007606e0
[ 8229.365444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8229.365460] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8229.365476] PKRU: 55555554
[ 8229.365484] Call Trace:
[ 8229.365498]  ? finish_wait+0x80/0x80
[ 8229.365512]  blk_mq_get_request+0xcb/0x3f0
[ 8229.365525]  blk_mq_make_request+0x143/0x5d0
[ 8229.365538]  generic_make_request+0xcf/0x310
[ 8229.365553]  ? scan_shadow_nodes+0x30/0x30
[ 8229.365564]  submit_bio+0x3c/0x150
[ 8229.365576]  mpage_readpages+0x163/0x1a0
[ 8229.365588]  ? blkdev_direct_IO+0x490/0x490
[ 8229.365601]  read_pages+0x6b/0x190
[ 8229.365612]  __do_page_cache_readahead+0x1c1/0x1e0
[ 8229.365626]  ondemand_readahead+0x182/0x2f0
[ 8229.365639]  generic_file_buffered_read+0x590/0xab0
[ 8229.365655]  new_sync_read+0x12a/0x1c0
[ 8229.365666]  vfs_read+0x8a/0x140
[ 8229.365676]  ksys_read+0x59/0xd0
[ 8229.365688]  do_syscall_64+0x55/0x1d0
[ 8229.365700]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
---
 block/blk-mq.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 15f6a811122a..5676b5654d8d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3363,6 +3363,13 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 
 	set->nr_hw_queues = nr_hw_queues;
 	blk_mq_update_queue_map(set);
+
+	if (blk_mq_realloc_rq_map_and_requests(set, set->nr_hw_queues) < 0) {
+		pr_warn("Updating nr_hw_queues to %d fails, fallback to %d\n",
+			nr_hw_queues, prev_nr_hw_queues);
+		goto reregister;
+	}
+
 fallback:
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_realloc_hw_ctxs(set, q);
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue
  2020-04-06 19:35 [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue Weiping Zhang
                   ` (6 preceding siblings ...)
  2020-04-06 19:37 ` [PATCH v3 7/7] block: alloc map and request for new hardware queue Weiping Zhang
@ 2020-04-08 12:25 ` Weiping Zhang
  2020-04-20 11:15   ` Weiping Zhang
  7 siblings, 1 reply; 21+ messages in thread
From: Weiping Zhang @ 2020-04-08 12:25 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, linux-block

Weiping Zhang <zhangweiping@didiglobal.com> wrote on Tue, Apr 7, 2020 at 3:36 AM:
>
> Hi,
>
> This series do some improvement base on V2, and also fix two memleaks.
> And V2 import a regression for blktest/block/test/029, this test case
> will increase and decrease hardware queue count quickly.
>
>
> Memleak 1:
>
> __blk_mq_alloc_rq_maps
>         __blk_mq_alloc_rq_map
>
> if fail
>         blk_mq_free_rq_map
>
> Actually, __blk_mq_alloc_rq_map alloc both map and request, here
> also need free request.
>
> Memleak 2:
> When driver decrease hardware queue, set->nr_hw_queues will be changed
> firstly in blk_mq_realloc_tag_set_tags or __blk_mq_update_nr_hw_queues,
> then blk_mq_realloc_hw_ctxs and blk_mq_map_swqueue, even
> blk_mq_free_tag_set have no chance to free these hardware queue resource,
> because they iterate hardware queue by
> for (i = 0; i < set->nr_hw_queues; i++).
>
> Patch1~3: rename some function name, no function change.
> Patch4: fix first memleak.
> Patch5: fix prev_nr_hw_queues issue, need be saved before change.
> Patch6: alloc map and request to fix potential kernel panic.
>
Updated patch descriptions:
Patch 1~3: rename some functions, no functional change.
Patch 4: fix the first memory leak.
Patch 5: fix the prev_nr_hw_queues issue; it needs to be saved before the change.
Patch 6: add nr_allocated_map_rqs to struct blk_mq_tag_set to record how
many requests and maps were allocated for this tag set, and also fix the
second memory leak.

Patch 7: fix a kernel panic when the hardware queue count is updated to
more than the CPU count while multiple maps are in use. Patch 7's commit
message has more detailed information about this issue.

>
> Changes since V2:
>  * rename some functions name and fix memleak when free map and requests
>  * Not free new allocated map and request, they will be relased when tagset gone
>
> Changes since V1:
>  * Add fix for potential kernel panic when increase hardware queue
>
> Weiping Zhang (7):
>   block: rename __blk_mq_alloc_rq_map
>   block: rename __blk_mq_alloc_rq_maps
>   block: rename blk_mq_alloc_rq_maps
>   block: free both map and request
>   block: save previous hardware queue count before update
>   block: refactor __blk_mq_alloc_rq_map_and_requests
>   block: alloc map and request for new hardware queue
>
>  block/blk-mq.c         | 49 ++++++++++++++++++++++++++++++------------
>  include/linux/blk-mq.h |  1 +
>  2 files changed, 36 insertions(+), 14 deletions(-)
>
> --
> 2.18.1
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue
  2020-04-08 12:25 ` [PATCH v3 0/7] Fix potential kernel panic when increase " Weiping Zhang
@ 2020-04-20 11:15   ` Weiping Zhang
  2020-04-20 19:30     ` Jens Axboe
  0 siblings, 1 reply; 21+ messages in thread
From: Weiping Zhang @ 2020-04-20 11:15 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, linux-block

Hi Jens,

Ping

On Wed, Apr 8, 2020 at 8:25 PM Weiping Zhang <zwp10758@gmail.com> wrote:
>
> Weiping Zhang <zhangweiping@didiglobal.com> wrote on Tue, Apr 7, 2020 at 3:36 AM:
> >
> > Hi,
> >
> > This series do some improvement base on V2, and also fix two memleaks.
> > And V2 import a regression for blktest/block/test/029, this test case
> > will increase and decrease hardware queue count quickly.
> >
> >
> > Memleak 1:
> >
> > __blk_mq_alloc_rq_maps
> >         __blk_mq_alloc_rq_map
> >
> > if fail
> >         blk_mq_free_rq_map
> >
> > Actually, __blk_mq_alloc_rq_map alloc both map and request, here
> > also need free request.
> >
> > Memleak 2:
> > When driver decrease hardware queue, set->nr_hw_queues will be changed
> > firstly in blk_mq_realloc_tag_set_tags or __blk_mq_update_nr_hw_queues,
> > then blk_mq_realloc_hw_ctxs and blk_mq_map_swqueue, even
> > blk_mq_free_tag_set have no chance to free these hardware queue resource,
> > because they iterate hardware queue by
> > for (i = 0; i < set->nr_hw_queues; i++).
> >
> > Patch1~3: rename some function name, no function change.
> > Patch4: fix first memleak.
> > Patch5: fix prev_nr_hw_queues issue, need be saved before change.
> > Patch6: alloc map and request to fix potential kernel panic.
> >
> Update patch description:
> Patch1~3: rename some function name, no function change.
> Patch4: fix first memleak.
> Patch5: fix prev_nr_hw_queues issue, need be saved before change.
> Patch6: add nr_allocated_map_rqs to struct blk_mq_tag_set to record how
> may rq and maps were allocated for this tag set, and also fix memleak2.
>
> Patch7: fix kernel panic when update hardware queue count > cpu count,
> when use multiple maps. Patch7's commit message has more detail information
> about this issue.
>
> >
> > Changes since V2:
> >  * rename some functions name and fix memleak when free map and requests
> >  * Not free new allocated map and request, they will be relased when tagset gone
> >
> > Changes since V1:
> >  * Add fix for potential kernel panic when increase hardware queue
> >
> > Weiping Zhang (7):
> >   block: rename __blk_mq_alloc_rq_map
> >   block: rename __blk_mq_alloc_rq_maps
> >   block: rename blk_mq_alloc_rq_maps
> >   block: free both map and request
> >   block: save previous hardware queue count before update
> >   block: refactor __blk_mq_alloc_rq_map_and_requests
> >   block: alloc map and request for new hardware queue
> >
> >  block/blk-mq.c         | 49 ++++++++++++++++++++++++++++++------------
> >  include/linux/blk-mq.h |  1 +
> >  2 files changed, 36 insertions(+), 14 deletions(-)
> >
> > --
> > 2.18.1
> >

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue
  2020-04-20 11:15   ` Weiping Zhang
@ 2020-04-20 19:30     ` Jens Axboe
  0 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2020-04-20 19:30 UTC (permalink / raw)
  To: Weiping Zhang, Bart Van Assche, linux-block, Ming Lei

On 4/20/20 5:15 AM, Weiping Zhang wrote:
> Hi Jens,
> 
> Ping

I'd really like for Bart/Ming to weigh in here.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/7] block: rename __blk_mq_alloc_rq_map
  2020-04-06 19:36 ` [PATCH v3 1/7] block: rename __blk_mq_alloc_rq_map Weiping Zhang
@ 2020-04-20 20:41   ` Bart Van Assche
  2020-04-21  1:25     ` Ming Lei
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2020-04-20 20:41 UTC (permalink / raw)
  To: axboe, linux-block, Weiping Zhang

On 4/6/20 12:36 PM, Weiping Zhang wrote:
> rename __blk_mq_alloc_rq_map to __blk_mq_alloc_rq_map_and_request,
> actually it alloc both map and request, make function name
> align with function.
> 
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>   block/blk-mq.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f6291ceedee4..3a482ce7ed28 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2468,7 +2468,7 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
>   	}
>   }
>   
> -static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
> +static bool __blk_mq_alloc_rq_map_and_request(struct blk_mq_tag_set *set, int hctx_idx)
>   {
>   	int ret = 0;
>   
> @@ -2519,7 +2519,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>   		hctx_idx = set->map[HCTX_TYPE_DEFAULT].mq_map[i];
>   		/* unmapped hw queue can be remapped after CPU topo changed */
>   		if (!set->tags[hctx_idx] &&
> -		    !__blk_mq_alloc_rq_map(set, hctx_idx)) {
> +		    !__blk_mq_alloc_rq_map_and_request(set, hctx_idx)) {
>   			/*
>   			 * If tags initialization fail for some hctx,
>   			 * that hctx won't be brought online.  In this
> @@ -2983,7 +2983,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>   	int i;
>   
>   	for (i = 0; i < set->nr_hw_queues; i++)
> -		if (!__blk_mq_alloc_rq_map(set, i))
> +		if (!__blk_mq_alloc_rq_map_and_request(set, i))
>   			goto out_unwind;
>   
>   	return 0;

What the __blk_mq_alloc_rq_map() function allocates is a request map and 
requests. The new name is misleading because it suggests that only a 
single request is allocated instead of multiple. The name 
__blk_mq_alloc_rq_map_and_requests() is probably a better choice than 
__blk_mq_alloc_rq_map_and_request().

My opinion is that the old name is clear enough. I prefer the current name.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/7] block: rename __blk_mq_alloc_rq_maps
  2020-04-06 19:36 ` [PATCH v3 2/7] block: rename __blk_mq_alloc_rq_maps Weiping Zhang
@ 2020-04-20 20:46   ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2020-04-20 20:46 UTC (permalink / raw)
  To: axboe, linux-block, Weiping Zhang

On 4/6/20 12:36 PM, Weiping Zhang wrote:
> rename __blk_mq_alloc_rq_maps to __blk_mq_alloc_rq_map_and_requests,
> this function allocs both map and request, make function name align
> with funtion.
> 
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>   block/blk-mq.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3a482ce7ed28..5a322130aaf2 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2978,7 +2978,7 @@ void blk_mq_exit_queue(struct request_queue *q)
>   	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
>   }
>   
> -static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> +static int __blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
>   {
>   	int i;
>   
> @@ -3007,7 +3007,7 @@ static int blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
>   
>   	depth = set->queue_depth;
>   	do {
> -		err = __blk_mq_alloc_rq_maps(set);
> +		err = __blk_mq_alloc_rq_map_and_requests(set);
>   		if (!err)
>   			break;

Just like for the previous patch, I prefer not to rename 
__blk_mq_alloc_rq_maps().

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/7] block: rename blk_mq_alloc_rq_maps
  2020-04-06 19:36 ` [PATCH v3 3/7] block: rename blk_mq_alloc_rq_maps Weiping Zhang
@ 2020-04-20 20:47   ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2020-04-20 20:47 UTC (permalink / raw)
  To: axboe, linux-block, Weiping Zhang

On 4/6/20 12:36 PM, Weiping Zhang wrote:
> rename blk_mq_alloc_rq_maps to blk_mq_alloc_rq_map_and_requests,
> this function alloc both map and request, make function name
> align with function.
> 
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>   block/blk-mq.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 5a322130aaf2..4692e8232699 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3000,7 +3000,7 @@ static int __blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
>    * may reduce the depth asked for, if memory is tight. set->queue_depth
>    * will be updated to reflect the allocated depth.
>    */
> -static int blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> +static int blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
>   {
>   	unsigned int depth;
>   	int err;
> @@ -3160,7 +3160,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>   	if (ret)
>   		goto out_free_mq_map;
>   
> -	ret = blk_mq_alloc_rq_maps(set);
> +	ret = blk_mq_alloc_rq_map_and_requests(set);
>   	if (ret)
>   		goto out_free_mq_map;

Personally I'm fine with the current name of this function. However, 
others may have another opinion.

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 4/7] block: free both map and request
  2020-04-06 19:36 ` [PATCH v3 4/7] block: free both map and request Weiping Zhang
@ 2020-04-20 20:58   ` Bart Van Assche
  2020-04-21 13:15     ` weiping zhang
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2020-04-20 20:58 UTC (permalink / raw)
  To: axboe, linux-block, Weiping Zhang

On 4/6/20 12:36 PM, Weiping Zhang wrote:
> For this error handle, it should free both map and request,
> otherwise memleak occur.
             ^^^^^^^
Please expand this into "a memory leak".

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4692e8232699..406df9ce9b55 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2990,7 +2990,7 @@ static int __blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
>   
>   out_unwind:
>   	while (--i >= 0)
> -		blk_mq_free_rq_map(set->tags[i]);
> +		blk_mq_free_map_and_requests(set, i);
>   
>   	return -ENOMEM;
>   }

The current upstream implementation is as follows:

static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
{
	int i;

	for (i = 0; i < set->nr_hw_queues; i++)
		if (!__blk_mq_alloc_rq_map(set, i))
			goto out_unwind;

	return 0;

out_unwind:
	while (--i >= 0)
		blk_mq_free_rq_map(set->tags[i]);

	return -ENOMEM;
}

The pointers to the memory allocated by __blk_mq_alloc_rq_map() are 
stored in set->tags[i] and set->tags[i]->static_rqs. 
blk_mq_free_rq_map() frees set->tags[i]->static_rqs and set->tags[i]. In 
other words, I don't see which memory is leaked. What did I miss?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 5/7] block: save previous hardware queue count before update
  2020-04-06 19:37 ` [PATCH v3 5/7] block: save previous hardware queue count before update Weiping Zhang
@ 2020-04-20 21:00   ` Bart Van Assche
  0 siblings, 0 replies; 21+ messages in thread
From: Bart Van Assche @ 2020-04-20 21:00 UTC (permalink / raw)
  To: axboe, linux-block, Weiping Zhang

On 4/6/20 12:37 PM, Weiping Zhang wrote:
> blk_mq_realloc_tag_set_tags will update set->nr_hw_queues, so
> save old set->nr_hw_queues before call this function.
> 
> Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> ---
>   block/blk-mq.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 406df9ce9b55..df243c19a158 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3342,11 +3342,11 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>   		blk_mq_sysfs_unregister(q);
>   	}
>   
> +	prev_nr_hw_queues = set->nr_hw_queues;
>   	if (blk_mq_realloc_tag_set_tags(set, set->nr_hw_queues, nr_hw_queues) <
>   	    0)
>   		goto reregister;
>   
> -	prev_nr_hw_queues = set->nr_hw_queues;
>   	set->nr_hw_queues = nr_hw_queues;
>   	blk_mq_update_queue_map(set);
>   fallback:

How about adding Fixes: and Cc: stable tags? Anyway:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 6/7] block: refactor __blk_mq_alloc_rq_map_and_requests
  2020-04-06 19:37 ` [PATCH v3 6/7] block: refactor __blk_mq_alloc_rq_map_and_requests Weiping Zhang
@ 2020-04-20 21:07   ` Bart Van Assche
  2020-04-21 13:24     ` weiping zhang
  0 siblings, 1 reply; 21+ messages in thread
From: Bart Van Assche @ 2020-04-20 21:07 UTC (permalink / raw)
  To: axboe, linux-block, Weiping Zhang

On 4/6/20 12:37 PM, Weiping Zhang wrote:
> This patch add a new member nr_allocated_map_rqs to the
> struct blk_mq_tag_set to record the number of maps and requests have
> been allocated for this tagset.
> 
> Now there is a problem when we increase hardware queue count, we do not
> allocate maps and request for the new allocated hardware queue, it will
> be fixed in the next patch.
> 
> Since request needs lots of memory, it's not easy alloc so many memory
> dynamically, espeicially when system is under memory pressure.
> 
> This patch allow nr_hw_queues does not equal to the nr_allocated_map_rqs,
> to avoid alloc/free memory when change hardware queue count.

It seems to me that patches 6 and 7 combined fix a single issue. How 
about combining these two patches into a single patch?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/7] block: rename __blk_mq_alloc_rq_map
  2020-04-20 20:41   ` Bart Van Assche
@ 2020-04-21  1:25     ` Ming Lei
  2020-04-21 13:17       ` weiping zhang
  0 siblings, 1 reply; 21+ messages in thread
From: Ming Lei @ 2020-04-21  1:25 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, linux-block, Weiping Zhang

On Tue, Apr 21, 2020 at 4:42 AM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 4/6/20 12:36 PM, Weiping Zhang wrote:
> > rename __blk_mq_alloc_rq_map to __blk_mq_alloc_rq_map_and_request,
> > actually it alloc both map and request, make function name
> > align with function.
> >
> > Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> > ---
> >   block/blk-mq.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index f6291ceedee4..3a482ce7ed28 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2468,7 +2468,7 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
> >       }
> >   }
> >
> > -static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
> > +static bool __blk_mq_alloc_rq_map_and_request(struct blk_mq_tag_set *set, int hctx_idx)
> >   {
> >       int ret = 0;
> >
> > @@ -2519,7 +2519,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
> >               hctx_idx = set->map[HCTX_TYPE_DEFAULT].mq_map[i];
> >               /* unmapped hw queue can be remapped after CPU topo changed */
> >               if (!set->tags[hctx_idx] &&
> > -                 !__blk_mq_alloc_rq_map(set, hctx_idx)) {
> > +                 !__blk_mq_alloc_rq_map_and_request(set, hctx_idx)) {
> >                       /*
> >                        * If tags initialization fail for some hctx,
> >                        * that hctx won't be brought online.  In this
> > @@ -2983,7 +2983,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> >       int i;
> >
> >       for (i = 0; i < set->nr_hw_queues; i++)
> > -             if (!__blk_mq_alloc_rq_map(set, i))
> > +             if (!__blk_mq_alloc_rq_map_and_request(set, i))
> >                       goto out_unwind;
> >
> >       return 0;
>
> What the __blk_mq_alloc_rq_map() function allocates is a request map and
> requests. The new name is misleading because it suggests that only a
> single request is allocated instead of multiple. The name
> __blk_mq_alloc_rq_map_and_requests() is probably a better choice than
> __blk_mq_alloc_rq_map_and_request().
>
> My opinion is that the old name is clear enough. I prefer the current name.

Also, putting the renaming patches before the actual fix patches makes it
harder to backport the fixes to the stable tree.

So please re-organize the patches so that the fixes come first, followed
by the rename patches.

Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 4/7] block: free both map and request
  2020-04-20 20:58   ` Bart Van Assche
@ 2020-04-21 13:15     ` weiping zhang
  0 siblings, 0 replies; 21+ messages in thread
From: weiping zhang @ 2020-04-21 13:15 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: axboe, linux-block, Weiping Zhang

On Mon, Apr 20, 2020 at 01:58:54PM -0700, Bart Van Assche wrote:
> On 4/6/20 12:36 PM, Weiping Zhang wrote:
> >For this error handle, it should free both map and request,
> >otherwise memleak occur.
>             ^^^^^^^
> Please expand this into "a memory leak".
> 
> >diff --git a/block/blk-mq.c b/block/blk-mq.c
> >index 4692e8232699..406df9ce9b55 100644
> >--- a/block/blk-mq.c
> >+++ b/block/blk-mq.c
> >@@ -2990,7 +2990,7 @@ static int __blk_mq_alloc_rq_map_and_requests(struct blk_mq_tag_set *set)
> >  out_unwind:
> >  	while (--i >= 0)
> >-		blk_mq_free_rq_map(set->tags[i]);
> >+		blk_mq_free_map_and_requests(set, i);
> >  	return -ENOMEM;
> >  }
> 
> The current upstream implementation is as follows:
> 
> static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> {
> 	int i;
> 
> 	for (i = 0; i < set->nr_hw_queues; i++)
> 		if (!__blk_mq_alloc_rq_map(set, i))
> 			goto out_unwind;
> 
> 	return 0;
> 
> out_unwind:
> 	while (--i >= 0)
> 		blk_mq_free_rq_map(set->tags[i]);
> 
> 	return -ENOMEM;
> }
> 
> The pointers to the memory allocated by __blk_mq_alloc_rq_map() are
> stored in set->tags[i] and set->tags[i]->static_rqs.
> blk_mq_free_rq_map() frees set->tags[i]->static_rqs and
> set->tags[i]. In other words, I don't see which memory is leaked.
> What did I miss?

Hi Bart,

__blk_mq_alloc_rq_map
	blk_mq_alloc_rq_map
		blk_mq_alloc_rq_map
			tags = blk_mq_init_tags : kzalloc_node:
			tags->rqs = kcalloc_node
			tags->static_rqs = kcalloc_node
	blk_mq_alloc_rqs
		p = alloc_pages_node
		tags->static_rqs[i] = p + offset;

but

blk_mq_free_rq_map
	kfree(tags->rqs);
	kfree(tags->static_rqs);
	blk_mq_free_tags
		kfree(tags);

The pages allocated in blk_mq_alloc_rqs cannot be released this way,
so we should use blk_mq_free_map_and_requests here:

blk_mq_free_map_and_requests
	blk_mq_free_rqs
		__free_pages : cleanup for blk_mq_alloc_rqs
	blk_mq_free_rq_map : cleanup for blk_mq_alloc_rq_map
		...

The reason why I renamed some functions in the previous three patches is
that these functions allocate not only the rq_map but also the requests.

How about renaming them like this:
__blk_mq_alloc_rq_map	=> __blk_mq_alloc_map_and_request
__blk_mq_alloc_rq_maps	=> __blk_mq_alloc_map_and_requests,
blk_mq_alloc_rq_maps	=> blk_mq_alloc_map_and_requests

Thanks

> 
> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/7] block: rename __blk_mq_alloc_rq_map
  2020-04-21  1:25     ` Ming Lei
@ 2020-04-21 13:17       ` weiping zhang
  0 siblings, 0 replies; 21+ messages in thread
From: weiping zhang @ 2020-04-21 13:17 UTC (permalink / raw)
  To: Ming Lei; +Cc: Bart Van Assche, Jens Axboe, linux-block, Weiping Zhang

On Tue, Apr 21, 2020 at 09:25:49AM +0800, Ming Lei wrote:
> On Tue, Apr 21, 2020 at 4:42 AM Bart Van Assche <bvanassche@acm.org> wrote:
> >
> > On 4/6/20 12:36 PM, Weiping Zhang wrote:
> > > rename __blk_mq_alloc_rq_map to __blk_mq_alloc_rq_map_and_request,
> > > actually it alloc both map and request, make function name
> > > align with function.
> > >
> > > Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com>
> > > ---
> > >   block/blk-mq.c | 6 +++---
> > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index f6291ceedee4..3a482ce7ed28 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -2468,7 +2468,7 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
> > >       }
> > >   }
> > >
> > > -static bool __blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, int hctx_idx)
> > > +static bool __blk_mq_alloc_rq_map_and_request(struct blk_mq_tag_set *set, int hctx_idx)
> > >   {
> > >       int ret = 0;
> > >
> > > @@ -2519,7 +2519,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
> > >               hctx_idx = set->map[HCTX_TYPE_DEFAULT].mq_map[i];
> > >               /* unmapped hw queue can be remapped after CPU topo changed */
> > >               if (!set->tags[hctx_idx] &&
> > > -                 !__blk_mq_alloc_rq_map(set, hctx_idx)) {
> > > +                 !__blk_mq_alloc_rq_map_and_request(set, hctx_idx)) {
> > >                       /*
> > >                        * If tags initialization fail for some hctx,
> > >                        * that hctx won't be brought online.  In this
> > > @@ -2983,7 +2983,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
> > >       int i;
> > >
> > >       for (i = 0; i < set->nr_hw_queues; i++)
> > > -             if (!__blk_mq_alloc_rq_map(set, i))
> > > +             if (!__blk_mq_alloc_rq_map_and_request(set, i))
> > >                       goto out_unwind;
> > >
> > >       return 0;
> >
> > What the __blk_mq_alloc_rq_map() function allocates is a request map and
> > requests. The new name is misleading because it suggests that only a
> > single request is allocated instead of multiple. The name
> > __blk_mq_alloc_rq_map_and_requests() is probably a better choice than
> > __blk_mq_alloc_rq_map_and_request().
> >
> > My opinion is that the old name is clear enough. I prefer the current name.
> 
> Also putting renaming patches before actual fix patches does make more trouble
> for backporting the fix to stable tree.
> 
> So please re-organize patches by fixing issues first, then following rename
> stuff.
OK, I will re-order them.

Thanks
> 
> Thanks,
> Ming Lei

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 6/7] block: refactor __blk_mq_alloc_rq_map_and_requests
  2020-04-20 21:07   ` Bart Van Assche
@ 2020-04-21 13:24     ` weiping zhang
  0 siblings, 0 replies; 21+ messages in thread
From: weiping zhang @ 2020-04-21 13:24 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: axboe, linux-block, Weiping Zhang

On Mon, Apr 20, 2020 at 02:07:13PM -0700, Bart Van Assche wrote:
> On 4/6/20 12:37 PM, Weiping Zhang wrote:
> >This patch add a new member nr_allocated_map_rqs to the
> >struct blk_mq_tag_set to record the number of maps and requests have
> >been allocated for this tagset.
> >
> >Now there is a problem when we increase hardware queue count, we do not
> >allocate maps and request for the new allocated hardware queue, it will
> >be fixed in the next patch.
> >
> >Since request needs lots of memory, it's not easy alloc so many memory
> >dynamically, espeicially when system is under memory pressure.
> >
> >This patch allow nr_hw_queues does not equal to the nr_allocated_map_rqs,
> >to avoid alloc/free memory when change hardware queue count.
> 
> It seems to me that patches 6 and 7 combined fix a single issue. How
> about combining these two patches into a single patch?
> 
It's OK with me. The reason I split this into two patches is to make
patch review easier, since patch 6 does not actually change the original
logic. If you think combining them is better, I can merge these two
patches.

Thanks
> Thanks,
> 
> Bart.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread

Thread overview: 21+ messages
2020-04-06 19:35 [PATCH v3 0/7] Fix potential kernel panic when increase hardware queue Weiping Zhang
2020-04-06 19:36 ` [PATCH v3 1/7] block: rename __blk_mq_alloc_rq_map Weiping Zhang
2020-04-20 20:41   ` Bart Van Assche
2020-04-21  1:25     ` Ming Lei
2020-04-21 13:17       ` weiping zhang
2020-04-06 19:36 ` [PATCH v3 2/7] block: rename __blk_mq_alloc_rq_maps Weiping Zhang
2020-04-20 20:46   ` Bart Van Assche
2020-04-06 19:36 ` [PATCH v3 3/7] block: rename blk_mq_alloc_rq_maps Weiping Zhang
2020-04-20 20:47   ` Bart Van Assche
2020-04-06 19:36 ` [PATCH v3 4/7] block: free both map and request Weiping Zhang
2020-04-20 20:58   ` Bart Van Assche
2020-04-21 13:15     ` weiping zhang
2020-04-06 19:37 ` [PATCH v3 5/7] block: save previous hardware queue count before update Weiping Zhang
2020-04-20 21:00   ` Bart Van Assche
2020-04-06 19:37 ` [PATCH v3 6/7] block: refactor __blk_mq_alloc_rq_map_and_requests Weiping Zhang
2020-04-20 21:07   ` Bart Van Assche
2020-04-21 13:24     ` weiping zhang
2020-04-06 19:37 ` [PATCH v3 7/7] block: alloc map and request for new hardware queue Weiping Zhang
2020-04-08 12:25 ` [PATCH v3 0/7] Fix potential kernel panic when increase " Weiping Zhang
2020-04-20 11:15   ` Weiping Zhang
2020-04-20 19:30     ` Jens Axboe
