All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/14] blk-mq: Minor fixes and cleanups
@ 2016-09-18  7:37 Alexander Gordeev
  2016-09-18  7:37 ` [PATCH 01/14] blk-mq: Fix memory leaks on queue cleanup Alexander Gordeev
                   ` (13 more replies)
  0 siblings, 14 replies; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

Hi Jens!

The series is against:

  git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-4.9/block-irq

Thanks!

CC: linux-block@vger.kernel.org

Alexander Gordeev (14):
  blk-mq: Fix memory leaks on queue cleanup
  blk-mq: Fix a potential NULL pointer assignment to hctx tags
  block: Get rid of unused request_queue::nr_queues member
  blk-mq: Do not limit number of queues to 'nr_cpu_ids' in allocations
  blk-mq: Remove a redundant assignment
  blk-mq: Fix hardware context data node selection
  blk-mq: Cleanup a loop exit condition
  blk-mq: Get rid of unnecessary blk_mq_free_hw_queues()
  blk-mq: Move duplicating code to blk_mq_exit_hctx()
  blk-mq: Uninit hardware context in order reverse to init
  blk-mq: Move hardware context init code into single location
  blk-mq: Rework blk_mq_init_hctx() function
  blk-mq: Pair blk_mq_hctx_kobj_init() with blk_mq_hctx_kobj_put()
  blk-mq: Set flush_start_tag to BLK_MQ_MAX_DEPTH

 block/blk-mq-sysfs.c   |   5 ++
 block/blk-mq.c         | 166 ++++++++++++++++++++++---------------------------
 block/blk-mq.h         |   1 +
 include/linux/blkdev.h |   1 -
 4 files changed, 79 insertions(+), 94 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 01/14] blk-mq: Fix memory leaks on queue cleanup
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-19 13:51   ` Christoph Hellwig
  2016-09-18  7:37 ` [PATCH 02/14] blk-mq: Fix a potential NULL pointer assignment to hctx tags Alexander Gordeev
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

Some data are leaked when blk_cleanup_queue() interface
is called.

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f1c5263..66505af7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1699,8 +1699,13 @@ static void blk_mq_free_hw_queues(struct request_queue *q,
 	struct blk_mq_hw_ctx *hctx;
 	unsigned int i;
 
-	queue_for_each_hw_ctx(q, hctx, i)
+	queue_for_each_hw_ctx(q, hctx, i) {
 		free_cpumask_var(hctx->cpumask);
+		kfree(hctx->ctxs);
+		kfree(hctx);
+	}
+
+	q->nr_hw_queues = 0;
 }
 
 static int blk_mq_init_hctx(struct request_queue *q,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 02/14] blk-mq: Fix a potential NULL pointer assignment to hctx tags
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
  2016-09-18  7:37 ` [PATCH 01/14] blk-mq: Fix memory leaks on queue cleanup Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-19 18:34   ` Omar Sandoval
  2016-09-18  7:37 ` [PATCH 03/14] block: Get rid of unused request_queue::nr_queues member Alexander Gordeev
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

If number of used hardware queues is dynamically decreased
then tags corresponding to the newly unused queues are freed.

If previously unused hardware queues are then reused again
they will start referring the previously freed tags.

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 66505af7..7fa58fe 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1995,6 +1995,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 
 		if (hctxs[i])
 			continue;
+		if (!set->tags[i])
+			break;
 
 		node = blk_mq_hw_queue_to_node(q->mq_map, i);
 		hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx),
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 03/14] block: Get rid of unused request_queue::nr_queues member
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
  2016-09-18  7:37 ` [PATCH 01/14] blk-mq: Fix memory leaks on queue cleanup Alexander Gordeev
  2016-09-18  7:37 ` [PATCH 02/14] blk-mq: Fix a potential NULL pointer assignment to hctx tags Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-19 18:37   ` Omar Sandoval
  2016-09-18  7:37 ` [PATCH 04/14] blk-mq: Do not limit number of queues to 'nr_cpu_ids' in allocations Alexander Gordeev
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c         | 2 --
 include/linux/blkdev.h | 1 -
 2 files changed, 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7fa58fe..276ec7b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2068,8 +2068,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	INIT_WORK(&q->timeout_work, blk_mq_timeout_work);
 	blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ);
 
-	q->nr_queues = nr_cpu_ids;
-
 	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
 
 	if (!(set->flags & BLK_MQ_F_SG_MERGE))
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c47c358..0f1bf55 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -325,7 +325,6 @@ struct request_queue {
 
 	/* sw queues */
 	struct blk_mq_ctx __percpu	*queue_ctx;
-	unsigned int		nr_queues;
 
 	/* hw dispatch queues */
 	struct blk_mq_hw_ctx	**queue_hw_ctx;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 04/14] blk-mq: Do not limit number of queues to 'nr_cpu_ids' in allocations
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (2 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 03/14] block: Get rid of unused request_queue::nr_queues member Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-19 17:48   ` Omar Sandoval
  2016-09-18  7:37 ` [PATCH 05/14] blk-mq: Remove a redundant assignment Alexander Gordeev
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

Currently maximum number of used hardware queues is limited to
number of CPUs in the system. However, using 'nr_cpu_ids' as
the limit for (de-)allocations of data structures instead of
existing data structures' counters (a) worsens readability and
(b) leads to unused memory when number of hardware queues is
less than number of CPUs.

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 276ec7b..2c77b68 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2054,8 +2054,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	if (!q->queue_ctx)
 		goto err_exit;
 
-	q->queue_hw_ctx = kzalloc_node(nr_cpu_ids * sizeof(*(q->queue_hw_ctx)),
-						GFP_KERNEL, set->numa_node);
+	q->queue_hw_ctx = kzalloc_node(set->nr_hw_queues *
+			sizeof(*(q->queue_hw_ctx)), GFP_KERNEL, set->numa_node);
 	if (!q->queue_hw_ctx)
 		goto err_percpu;
 
@@ -2319,7 +2319,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (set->nr_hw_queues > nr_cpu_ids)
 		set->nr_hw_queues = nr_cpu_ids;
 
-	set->tags = kzalloc_node(nr_cpu_ids * sizeof(struct blk_mq_tags *),
+	set->tags = kzalloc_node(set->nr_hw_queues * sizeof(*set->tags),
 				 GFP_KERNEL, set->numa_node);
 	if (!set->tags)
 		return -ENOMEM;
@@ -2360,7 +2360,7 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 {
 	int i;
 
-	for (i = 0; i < nr_cpu_ids; i++) {
+	for (i = 0; i < set->nr_hw_queues; i++) {
 		if (set->tags[i])
 			blk_mq_free_rq_map(set, set->tags[i], i);
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 05/14] blk-mq: Remove a redundant assignment
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (3 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 04/14] blk-mq: Do not limit number of queues to 'nr_cpu_ids' in allocations Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-19 18:38   ` Omar Sandoval
  2016-09-18  7:37 ` [PATCH 06/14] blk-mq: Fix hardware context data node selection Alexander Gordeev
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

blk_mq_hw_ctx::queue_num is initialized in blk_mq_init_hctx()
function.

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2c77b68..a38fd2e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2013,7 +2013,6 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 
 		atomic_set(&hctxs[i]->nr_active, 0);
 		hctxs[i]->numa_node = node;
-		hctxs[i]->queue_num = i;
 
 		if (blk_mq_init_hctx(q, set, hctxs[i], i)) {
 			free_cpumask_var(hctxs[i]->cpumask);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 06/14] blk-mq: Fix hardware context data node selection
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (4 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 05/14] blk-mq: Remove a redundant assignment Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-19 18:54   ` Omar Sandoval
  2016-09-18  7:37 ` [PATCH 07/14] blk-mq: Cleanup a loop exit condition Alexander Gordeev
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a38fd2e..9fbfe31 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1712,13 +1712,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
 {
-	int node;
+	int node = hctx->numa_node;
 	unsigned flush_start_tag = set->queue_depth;
 
-	node = hctx->numa_node;
-	if (node == NUMA_NO_NODE)
-		node = hctx->numa_node = set->numa_node;
-
 	INIT_WORK(&hctx->run_work, blk_mq_run_work_fn);
 	INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
 	spin_lock_init(&hctx->lock);
@@ -1999,6 +1995,9 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 			break;
 
 		node = blk_mq_hw_queue_to_node(q->mq_map, i);
+		if (node == NUMA_NO_NODE)
+			node = set->numa_node;
+
 		hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx),
 					GFP_KERNEL, node);
 		if (!hctxs[i])
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 07/14] blk-mq: Cleanup a loop exit condition
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (5 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 06/14] blk-mq: Fix hardware context data node selection Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-19 19:00   ` Omar Sandoval
  2016-09-18  7:37 ` [PATCH 08/14] blk-mq: Get rid of unnecessary blk_mq_free_hw_queues() Alexander Gordeev
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9fbfe31..b2ef8f5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1681,16 +1681,13 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 }
 
 static void blk_mq_exit_hw_queues(struct request_queue *q,
-		struct blk_mq_tag_set *set, int nr_queue)
+		struct blk_mq_tag_set *set)
 {
 	struct blk_mq_hw_ctx *hctx;
 	unsigned int i;
 
-	queue_for_each_hw_ctx(q, hctx, i) {
-		if (i == nr_queue)
-			break;
+	queue_for_each_hw_ctx(q, hctx, i)
 		blk_mq_exit_hctx(q, set, hctx, i);
-	}
 }
 
 static void blk_mq_free_hw_queues(struct request_queue *q,
@@ -2124,7 +2121,7 @@ void blk_mq_free_queue(struct request_queue *q)
 
 	blk_mq_del_queue_tag_set(q);
 
-	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
+	blk_mq_exit_hw_queues(q, set);
 	blk_mq_free_hw_queues(q, set);
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 08/14] blk-mq: Get rid of unnecessary blk_mq_free_hw_queues()
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (6 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 07/14] blk-mq: Cleanup a loop exit condition Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-18  7:37 ` [PATCH 09/14] blk-mq: Move duplicating code to blk_mq_exit_hctx() Alexander Gordeev
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b2ef8f5..3efb700 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1686,17 +1686,8 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
 	struct blk_mq_hw_ctx *hctx;
 	unsigned int i;
 
-	queue_for_each_hw_ctx(q, hctx, i)
-		blk_mq_exit_hctx(q, set, hctx, i);
-}
-
-static void blk_mq_free_hw_queues(struct request_queue *q,
-		struct blk_mq_tag_set *set)
-{
-	struct blk_mq_hw_ctx *hctx;
-	unsigned int i;
-
 	queue_for_each_hw_ctx(q, hctx, i) {
+		blk_mq_exit_hctx(q, set, hctx, i);
 		free_cpumask_var(hctx->cpumask);
 		kfree(hctx->ctxs);
 		kfree(hctx);
@@ -2120,9 +2111,7 @@ void blk_mq_free_queue(struct request_queue *q)
 	mutex_unlock(&all_q_mutex);
 
 	blk_mq_del_queue_tag_set(q);
-
 	blk_mq_exit_hw_queues(q, set);
-	blk_mq_free_hw_queues(q, set);
 }
 
 /* Basically redo blk_mq_init_queue with queue frozen */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 09/14] blk-mq: Move duplicating code to blk_mq_exit_hctx()
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (7 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 08/14] blk-mq: Get rid of unnecessary blk_mq_free_hw_queues() Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-19 17:57   ` Omar Sandoval
  2016-09-18  7:37 ` [PATCH 10/14] blk-mq: Uninit hardware context in order reverse to init Alexander Gordeev
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3efb700..cd32a08 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1678,6 +1678,10 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 	blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
 	blk_free_flush_queue(hctx->fq);
 	blk_mq_free_bitmap(&hctx->ctx_map);
+
+	free_cpumask_var(hctx->cpumask);
+	kfree(hctx->ctxs);
+	kfree(hctx);
 }
 
 static void blk_mq_exit_hw_queues(struct request_queue *q,
@@ -1686,12 +1690,8 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
 	struct blk_mq_hw_ctx *hctx;
 	unsigned int i;
 
-	queue_for_each_hw_ctx(q, hctx, i) {
+	queue_for_each_hw_ctx(q, hctx, i)
 		blk_mq_exit_hctx(q, set, hctx, i);
-		free_cpumask_var(hctx->cpumask);
-		kfree(hctx->ctxs);
-		kfree(hctx);
-	}
 
 	q->nr_hw_queues = 0;
 }
@@ -2018,12 +2018,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 				set->tags[j] = NULL;
 			}
 			blk_mq_exit_hctx(q, set, hctx, j);
-			free_cpumask_var(hctx->cpumask);
 			kobject_put(&hctx->kobj);
-			kfree(hctx->ctxs);
-			kfree(hctx);
 			hctxs[j] = NULL;
-
 		}
 	}
 	q->nr_hw_queues = i;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 10/14] blk-mq: Uninit hardware context in order reverse to init
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (8 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 09/14] blk-mq: Move duplicating code to blk_mq_exit_hctx() Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-19 17:59   ` Omar Sandoval
  2016-09-18  7:37 ` [PATCH 11/14] blk-mq: Move hardware context init code into single location Alexander Gordeev
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index cd32a08..c589096 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2013,12 +2013,13 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx = hctxs[j];
 
 		if (hctx) {
+			kobject_put(&hctx->kobj);
+
 			if (hctx->tags) {
 				blk_mq_free_rq_map(set, hctx->tags, j);
 				set->tags[j] = NULL;
 			}
 			blk_mq_exit_hctx(q, set, hctx, j);
-			kobject_put(&hctx->kobj);
 			hctxs[j] = NULL;
 		}
 	}
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 11/14] blk-mq: Move hardware context init code into single location
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (9 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 10/14] blk-mq: Uninit hardware context in order reverse to init Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-18  7:37 ` [PATCH 12/14] blk-mq: Rework blk_mq_init_hctx() function Alexander Gordeev
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

Move scattered hardware context initialization code into
a single function destined to do that, blk_mq_init_hctx()

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 81 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 40 insertions(+), 41 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c589096..af6d049 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1696,17 +1696,30 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
 	q->nr_hw_queues = 0;
 }
 
-static int blk_mq_init_hctx(struct request_queue *q,
-		struct blk_mq_tag_set *set,
-		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
+static struct blk_mq_hw_ctx *blk_mq_init_hctx(struct request_queue *q,
+		struct blk_mq_tag_set *set, unsigned hctx_idx)
 {
-	int node = hctx->numa_node;
 	unsigned flush_start_tag = set->queue_depth;
+	struct blk_mq_hw_ctx *hctx;
+	int node;
+
+	node = blk_mq_hw_queue_to_node(q->mq_map, hctx_idx);
+	if (node == NUMA_NO_NODE)
+		node = set->numa_node;
+
+	hctx = kzalloc_node(sizeof(*hctx), GFP_KERNEL, node);
+	if (!hctx)
+		return NULL;
+
+	if (!zalloc_cpumask_var_node(&hctx->cpumask, GFP_KERNEL, node))
+		goto free_hctx;
 
 	INIT_WORK(&hctx->run_work, blk_mq_run_work_fn);
 	INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
 	spin_lock_init(&hctx->lock);
 	INIT_LIST_HEAD(&hctx->dispatch);
+	atomic_set(&hctx->nr_active, 0);
+	hctx->numa_node = node;
 	hctx->queue = q;
 	hctx->queue_num = hctx_idx;
 	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
@@ -1745,7 +1758,7 @@ static int blk_mq_init_hctx(struct request_queue *q,
 				   flush_start_tag + hctx_idx, node))
 		goto free_fq;
 
-	return 0;
+	return hctx;
 
  free_fq:
 	kfree(hctx->fq);
@@ -1758,8 +1771,11 @@ static int blk_mq_init_hctx(struct request_queue *q,
 	kfree(hctx->ctxs);
  unregister_cpu_notifier:
 	blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
-
-	return -1;
+	free_cpumask_var(hctx->cpumask);
+ free_hctx:
+	kfree(hctx);
+	
+	return NULL;
 }
 
 static void blk_mq_init_cpu_queues(struct request_queue *q,
@@ -1971,57 +1987,40 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 						struct request_queue *q)
 {
 	int i, j;
+	struct blk_mq_hw_ctx *hctx;
 	struct blk_mq_hw_ctx **hctxs = q->queue_hw_ctx;
 
 	blk_mq_sysfs_unregister(q);
 	for (i = 0; i < set->nr_hw_queues; i++) {
-		int node;
-
 		if (hctxs[i])
 			continue;
 		if (!set->tags[i])
 			break;
 
-		node = blk_mq_hw_queue_to_node(q->mq_map, i);
-		if (node == NUMA_NO_NODE)
-			node = set->numa_node;
-
-		hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx),
-					GFP_KERNEL, node);
-		if (!hctxs[i])
-			break;
-
-		if (!zalloc_cpumask_var_node(&hctxs[i]->cpumask, GFP_KERNEL,
-						node)) {
-			kfree(hctxs[i]);
-			hctxs[i] = NULL;
+		hctx = blk_mq_init_hctx(q, set, i);
+		if (!hctx)
 			break;
-		}
 
-		atomic_set(&hctxs[i]->nr_active, 0);
-		hctxs[i]->numa_node = node;
+		blk_mq_hctx_kobj_init(hctx);
 
-		if (blk_mq_init_hctx(q, set, hctxs[i], i)) {
-			free_cpumask_var(hctxs[i]->cpumask);
-			kfree(hctxs[i]);
-			hctxs[i] = NULL;
-			break;
-		}
-		blk_mq_hctx_kobj_init(hctxs[i]);
+		hctxs[i] = hctx;
 	}
 	for (j = i; j < q->nr_hw_queues; j++) {
-		struct blk_mq_hw_ctx *hctx = hctxs[j];
+		hctx = hctxs[j];
 
-		if (hctx) {
-			kobject_put(&hctx->kobj);
+		if (!hctx)
+			continue;
 
-			if (hctx->tags) {
-				blk_mq_free_rq_map(set, hctx->tags, j);
-				set->tags[j] = NULL;
-			}
-			blk_mq_exit_hctx(q, set, hctx, j);
-			hctxs[j] = NULL;
+		kobject_put(&hctx->kobj);
+
+		if (hctx->tags) {
+			blk_mq_free_rq_map(set, hctx->tags, j);
+			set->tags[j] = NULL;
 		}
+
+		blk_mq_exit_hctx(q, set, hctx, j);
+
+		hctxs[j] = NULL;
 	}
 	q->nr_hw_queues = i;
 	blk_mq_sysfs_register(q);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 12/14] blk-mq: Rework blk_mq_init_hctx() function
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (10 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 11/14] blk-mq: Move hardware context init code into single location Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-18  7:37 ` [PATCH 13/14] blk-mq: Pair blk_mq_hctx_kobj_init() with blk_mq_hctx_kobj_put() Alexander Gordeev
  2016-09-18  7:37 ` [PATCH 14/14] blk-mq: Set flush_start_tag to BLK_MQ_MAX_DEPTH Alexander Gordeev
  13 siblings, 0 replies; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

Rework blk_mq_init_hctx() function so all memory allocations
are done before data initialization and callbacks invocation.
As result, the latter is avoided in tight memory conditions.

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 50 ++++++++++++++++++++++++--------------------------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index af6d049..5ecbb5f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1714,6 +1714,22 @@ static struct blk_mq_hw_ctx *blk_mq_init_hctx(struct request_queue *q,
 	if (!zalloc_cpumask_var_node(&hctx->cpumask, GFP_KERNEL, node))
 		goto free_hctx;
 
+	/*
+	 * Allocate space for all possible cpus to avoid allocation at
+	 * runtime
+	 */
+	hctx->ctxs = kmalloc_node(nr_cpu_ids * sizeof(void *),
+					GFP_KERNEL, node);
+	if (!hctx->ctxs)
+		goto free_cpumask;
+
+	if (blk_mq_alloc_bitmap(&hctx->ctx_map, node))
+		goto free_ctxs;
+
+	hctx->fq = blk_alloc_flush_queue(q, hctx->numa_node, set->cmd_size);
+	if (!hctx->fq)
+		goto free_bitmap;
+
 	INIT_WORK(&hctx->run_work, blk_mq_run_work_fn);
 	INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
 	spin_lock_init(&hctx->lock);
@@ -1722,55 +1738,37 @@ static struct blk_mq_hw_ctx *blk_mq_init_hctx(struct request_queue *q,
 	hctx->numa_node = node;
 	hctx->queue = q;
 	hctx->queue_num = hctx_idx;
+	hctx->nr_ctx = 0;
 	hctx->flags = set->flags & ~BLK_MQ_F_TAG_SHARED;
+	hctx->tags = set->tags[hctx_idx];
 
 	blk_mq_init_cpu_notifier(&hctx->cpu_notifier,
 					blk_mq_hctx_notify, hctx);
 	blk_mq_register_cpu_notifier(&hctx->cpu_notifier);
 
-	hctx->tags = set->tags[hctx_idx];
-
-	/*
-	 * Allocate space for all possible cpus to avoid allocation at
-	 * runtime
-	 */
-	hctx->ctxs = kmalloc_node(nr_cpu_ids * sizeof(void *),
-					GFP_KERNEL, node);
-	if (!hctx->ctxs)
-		goto unregister_cpu_notifier;
-
-	if (blk_mq_alloc_bitmap(&hctx->ctx_map, node))
-		goto free_ctxs;
-
-	hctx->nr_ctx = 0;
-
 	if (set->ops->init_hctx &&
 	    set->ops->init_hctx(hctx, set->driver_data, hctx_idx))
-		goto free_bitmap;
-
-	hctx->fq = blk_alloc_flush_queue(q, hctx->numa_node, set->cmd_size);
-	if (!hctx->fq)
-		goto exit_hctx;
+		goto unregister_cpu_notifier;
 
 	if (set->ops->init_request &&
 	    set->ops->init_request(set->driver_data,
 				   hctx->fq->flush_rq, hctx_idx,
 				   flush_start_tag + hctx_idx, node))
-		goto free_fq;
+		goto exit_hctx;
 
 	return hctx;
 
- free_fq:
-	kfree(hctx->fq);
  exit_hctx:
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
+ unregister_cpu_notifier:
+	blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
+	kfree(hctx->fq);
  free_bitmap:
 	blk_mq_free_bitmap(&hctx->ctx_map);
  free_ctxs:
 	kfree(hctx->ctxs);
- unregister_cpu_notifier:
-	blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
+ free_cpumask:
 	free_cpumask_var(hctx->cpumask);
  free_hctx:
 	kfree(hctx);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 13/14] blk-mq: Pair blk_mq_hctx_kobj_init() with blk_mq_hctx_kobj_put()
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (11 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 12/14] blk-mq: Rework blk_mq_init_hctx() function Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  2016-09-18  7:37 ` [PATCH 14/14] blk-mq: Set flush_start_tag to BLK_MQ_MAX_DEPTH Alexander Gordeev
  13 siblings, 0 replies; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq-sysfs.c | 5 +++++
 block/blk-mq.c       | 2 +-
 block/blk-mq.h       | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index ac5160e..81288a2 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -428,6 +428,11 @@ void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx)
 	kobject_init(&hctx->kobj, &blk_mq_hw_ktype);
 }
 
+void blk_mq_hctx_kobj_put(struct blk_mq_hw_ctx *hctx)
+{
+	kobject_put(&hctx->kobj);
+}
+
 static void blk_mq_sysfs_init(struct request_queue *q)
 {
 	struct blk_mq_ctx *ctx;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5ecbb5f..c617002 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2009,7 +2009,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 		if (!hctx)
 			continue;
 
-		kobject_put(&hctx->kobj);
+		blk_mq_hctx_kobj_put(hctx);
 
 		if (hctx->tags) {
 			blk_mq_free_rq_map(set, hctx->tags, j);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index c92bb7d..1caedc3 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -62,6 +62,7 @@ static inline struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *q,
 extern int blk_mq_sysfs_register(struct request_queue *q);
 extern void blk_mq_sysfs_unregister(struct request_queue *q);
 extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
+extern void blk_mq_hctx_kobj_put(struct blk_mq_hw_ctx *hctx);
 
 extern void blk_mq_rq_timed_out(struct request *req, bool reserved);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 14/14] blk-mq: Set flush_start_tag to BLK_MQ_MAX_DEPTH
  2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
                   ` (12 preceding siblings ...)
  2016-09-18  7:37 ` [PATCH 13/14] blk-mq: Pair blk_mq_hctx_kobj_init() with blk_mq_hctx_kobj_put() Alexander Gordeev
@ 2016-09-18  7:37 ` Alexander Gordeev
  13 siblings, 0 replies; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-18  7:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alexander Gordeev, linux-block

We need flush tags unique across hardware contexts and do
not overlap with normal tags. BLK_MQ_MAX_DEPTH as a base
number seems better choice than a queue's depth.

CC: linux-block@vger.kernel.org
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
---
 block/blk-mq.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c617002..d1f5c2e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1663,14 +1663,12 @@ static void blk_mq_exit_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
 {
-	unsigned flush_start_tag = set->queue_depth;
-
 	blk_mq_tag_idle(hctx);
 
 	if (set->ops->exit_request)
 		set->ops->exit_request(set->driver_data,
 				       hctx->fq->flush_rq, hctx_idx,
-				       flush_start_tag + hctx_idx);
+				       BLK_MQ_MAX_DEPTH + hctx_idx);
 
 	if (set->ops->exit_hctx)
 		set->ops->exit_hctx(hctx, hctx_idx);
@@ -1699,7 +1697,6 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
 static struct blk_mq_hw_ctx *blk_mq_init_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set, unsigned hctx_idx)
 {
-	unsigned flush_start_tag = set->queue_depth;
 	struct blk_mq_hw_ctx *hctx;
 	int node;
 
@@ -1753,7 +1750,7 @@ static struct blk_mq_hw_ctx *blk_mq_init_hctx(struct request_queue *q,
 	if (set->ops->init_request &&
 	    set->ops->init_request(set->driver_data,
 				   hctx->fq->flush_rq, hctx_idx,
-				   flush_start_tag + hctx_idx, node))
+				   BLK_MQ_MAX_DEPTH + hctx_idx, node))
 		goto exit_hctx;
 
 	return hctx;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 01/14] blk-mq: Fix memory leaks on queue cleanup
  2016-09-18  7:37 ` [PATCH 01/14] blk-mq: Fix memory leaks on queue cleanup Alexander Gordeev
@ 2016-09-19 13:51   ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2016-09-19 13:51 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: linux-kernel, linux-block

On Sun, Sep 18, 2016 at 09:37:11AM +0200, Alexander Gordeev wrote:
> Some data are leaked when blk_cleanup_queue() interface
> is called.

I don't think this is correct - we free them in blk_mq_release.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 04/14] blk-mq: Do not limit number of queues to 'nr_cpu_ids' in allocations
  2016-09-18  7:37 ` [PATCH 04/14] blk-mq: Do not limit number of queues to 'nr_cpu_ids' in allocations Alexander Gordeev
@ 2016-09-19 17:48   ` Omar Sandoval
  2016-09-20 11:44     ` Alexander Gordeev
  0 siblings, 1 reply; 27+ messages in thread
From: Omar Sandoval @ 2016-09-19 17:48 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: linux-kernel, linux-block

On Sun, Sep 18, 2016 at 09:37:14AM +0200, Alexander Gordeev wrote:
> Currently maximum number of used hardware queues is limited to
> number of CPUs in the system. However, using 'nr_cpu_ids' as
> the limit for (de-)allocations of data structures instead of
> existing data structures' counters (a) worsens readability and
> (b) leads to unused memory when number of hardware queues is
> less than number of CPUs.
> 
> CC: linux-block@vger.kernel.org
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---
>  block/blk-mq.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 276ec7b..2c77b68 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2054,8 +2054,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>  	if (!q->queue_ctx)
>  		goto err_exit;
>  
> -	q->queue_hw_ctx = kzalloc_node(nr_cpu_ids * sizeof(*(q->queue_hw_ctx)),
> -						GFP_KERNEL, set->numa_node);
> +	q->queue_hw_ctx = kzalloc_node(set->nr_hw_queues *
> +			sizeof(*(q->queue_hw_ctx)), GFP_KERNEL, set->numa_node);
>  	if (!q->queue_hw_ctx)
>  		goto err_percpu;
>  
> @@ -2319,7 +2319,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
>  	if (set->nr_hw_queues > nr_cpu_ids)
>  		set->nr_hw_queues = nr_cpu_ids;
>  
> -	set->tags = kzalloc_node(nr_cpu_ids * sizeof(struct blk_mq_tags *),
> +	set->tags = kzalloc_node(set->nr_hw_queues * sizeof(*set->tags),
>  				 GFP_KERNEL, set->numa_node);
>  	if (!set->tags)
>  		return -ENOMEM;
> @@ -2360,7 +2360,7 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
>  {
>  	int i;
>  
> -	for (i = 0; i < nr_cpu_ids; i++) {
> +	for (i = 0; i < set->nr_hw_queues; i++) {
>  		if (set->tags[i])
>  			blk_mq_free_rq_map(set, set->tags[i], i);
>  	}

I don't think this is safe since we might increase the number of
hardware queues (blk_mq_update_nr_hw_queues()).

-- 
Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/14] blk-mq: Move duplicating code to blk_mq_exit_hctx()
  2016-09-18  7:37 ` [PATCH 09/14] blk-mq: Move duplicating code to blk_mq_exit_hctx() Alexander Gordeev
@ 2016-09-19 17:57   ` Omar Sandoval
  0 siblings, 0 replies; 27+ messages in thread
From: Omar Sandoval @ 2016-09-19 17:57 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: linux-kernel, linux-block

On Sun, Sep 18, 2016 at 09:37:19AM +0200, Alexander Gordeev wrote:
> CC: linux-block@vger.kernel.org
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---
>  block/blk-mq.c | 14 +++++---------
>  1 file changed, 5 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3efb700..cd32a08 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1678,6 +1678,10 @@ static void blk_mq_exit_hctx(struct request_queue *q,
>  	blk_mq_unregister_cpu_notifier(&hctx->cpu_notifier);
>  	blk_free_flush_queue(hctx->fq);
>  	blk_mq_free_bitmap(&hctx->ctx_map);
> +
> +	free_cpumask_var(hctx->cpumask);
> +	kfree(hctx->ctxs);
> +	kfree(hctx);
>  }
>  
>  static void blk_mq_exit_hw_queues(struct request_queue *q,
> @@ -1686,12 +1690,8 @@ static void blk_mq_exit_hw_queues(struct request_queue *q,
>  	struct blk_mq_hw_ctx *hctx;
>  	unsigned int i;
>  
> -	queue_for_each_hw_ctx(q, hctx, i) {
> +	queue_for_each_hw_ctx(q, hctx, i)
>  		blk_mq_exit_hctx(q, set, hctx, i);
> -		free_cpumask_var(hctx->cpumask);
> -		kfree(hctx->ctxs);
> -		kfree(hctx);
> -	}
>  
>  	q->nr_hw_queues = 0;
>  }
> @@ -2018,12 +2018,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
>  				set->tags[j] = NULL;
>  			}
>  			blk_mq_exit_hctx(q, set, hctx, j);
> -			free_cpumask_var(hctx->cpumask);
>  			kobject_put(&hctx->kobj);

Now this hctx->kobj will be a use-after-free since we just kfreed hctx
in blk_mq_exit_hctx().

> -			kfree(hctx->ctxs);
> -			kfree(hctx);
>  			hctxs[j] = NULL;
> -
>  		}
>  	}
>  	q->nr_hw_queues = i;
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 10/14] blk-mq: Uninit hardware context in order reverse to init
  2016-09-18  7:37 ` [PATCH 10/14] blk-mq: Uninit hardware context in order reverse to init Alexander Gordeev
@ 2016-09-19 17:59   ` Omar Sandoval
  0 siblings, 0 replies; 27+ messages in thread
From: Omar Sandoval @ 2016-09-19 17:59 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: linux-kernel, linux-block

On Sun, Sep 18, 2016 at 09:37:20AM +0200, Alexander Gordeev wrote:
> CC: linux-block@vger.kernel.org
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---
>  block/blk-mq.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index cd32a08..c589096 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2013,12 +2013,13 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
>  		struct blk_mq_hw_ctx *hctx = hctxs[j];
>  
>  		if (hctx) {
> +			kobject_put(&hctx->kobj);
> +
>  			if (hctx->tags) {
>  				blk_mq_free_rq_map(set, hctx->tags, j);
>  				set->tags[j] = NULL;
>  			}
>  			blk_mq_exit_hctx(q, set, hctx, j);
> -			kobject_put(&hctx->kobj);
>  			hctxs[j] = NULL;
>  		}
>  	}

Oh, this fixes my comment from the last one, but it should be folded in
to be bisect-safe.

-- 
Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 02/14] blk-mq: Fix a potential NULL pointer assignment to hctx tags
  2016-09-18  7:37 ` [PATCH 02/14] blk-mq: Fix a potential NULL pointer assignment to hctx tags Alexander Gordeev
@ 2016-09-19 18:34   ` Omar Sandoval
  0 siblings, 0 replies; 27+ messages in thread
From: Omar Sandoval @ 2016-09-19 18:34 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: linux-kernel, linux-block

On Sun, Sep 18, 2016 at 09:37:12AM +0200, Alexander Gordeev wrote:
> If number of used hardware queues is dynamically decreased
> then tags corresponding to the newly unused queues are freed.
> 
> If previously unused hardware queues are then reused again
> they will start referring the previously freed tags.
> 
> CC: linux-block@vger.kernel.org
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> ---
>  block/blk-mq.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 66505af7..7fa58fe 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1995,6 +1995,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
>  
>  		if (hctxs[i])
>  			continue;
> +		if (!set->tags[i])
> +			break;
>  
>  		node = blk_mq_hw_queue_to_node(q->mq_map, i);
>  		hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx),

In blk_mq_map_swqueue(), we have:

		/* unmapped hw queue can be remapped after CPU topo changed */
		if (!set->tags[i])
			set->tags[i] = blk_mq_init_rq_map(set, i);
		hctx->tags = set->tags[i];
		WARN_ON(!hctx->tags);

blk_mq_map_swqueue() is called from blk_mq_queue_reinit(), which we call
from blk_mq_update_nr_hw_queues(). Is that not enough? This
initialization/resizing is a bit of a twisty maze and it's hard to
convince myself that it's all correct, so cleanup here is probably
valuable.

-- 
Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 03/14] block: Get rid of unused request_queue::nr_queues member
  2016-09-18  7:37 ` [PATCH 03/14] block: Get rid of unused request_queue::nr_queues member Alexander Gordeev
@ 2016-09-19 18:37   ` Omar Sandoval
  0 siblings, 0 replies; 27+ messages in thread
From: Omar Sandoval @ 2016-09-19 18:37 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: linux-kernel, linux-block

On Sun, Sep 18, 2016 at 09:37:13AM +0200, Alexander Gordeev wrote:
> CC: linux-block@vger.kernel.org
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>

Reviewed-by: Omar Sandoval <osandov@fb.com>

> ---
>  block/blk-mq.c         | 2 --
>  include/linux/blkdev.h | 1 -
>  2 files changed, 3 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 7fa58fe..276ec7b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2068,8 +2068,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>  	INIT_WORK(&q->timeout_work, blk_mq_timeout_work);
>  	blk_queue_rq_timeout(q, set->timeout ? set->timeout : 30 * HZ);
>  
> -	q->nr_queues = nr_cpu_ids;
> -
>  	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
>  
>  	if (!(set->flags & BLK_MQ_F_SG_MERGE))
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c47c358..0f1bf55 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -325,7 +325,6 @@ struct request_queue {
>  
>  	/* sw queues */
>  	struct blk_mq_ctx __percpu	*queue_ctx;
> -	unsigned int		nr_queues;
>  
>  	/* hw dispatch queues */
>  	struct blk_mq_hw_ctx	**queue_hw_ctx;
> -- 
> 1.8.3.1
> 

-- 
Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 05/14] blk-mq: Remove a redundant assignment
  2016-09-18  7:37 ` [PATCH 05/14] blk-mq: Remove a redundant assignment Alexander Gordeev
@ 2016-09-19 18:38   ` Omar Sandoval
  0 siblings, 0 replies; 27+ messages in thread
From: Omar Sandoval @ 2016-09-19 18:38 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: linux-kernel, linux-block

On Sun, Sep 18, 2016 at 09:37:15AM +0200, Alexander Gordeev wrote:
> blk_mq_hw_ctx::queue_num is initialized in blk_mq_init_hctx()
> function.
> 
> CC: linux-block@vger.kernel.org
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>

Reviewed-by: Omar Sandoval <osandov@fb.com>

> ---
>  block/blk-mq.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 2c77b68..a38fd2e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2013,7 +2013,6 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
>  
>  		atomic_set(&hctxs[i]->nr_active, 0);
>  		hctxs[i]->numa_node = node;
> -		hctxs[i]->queue_num = i;
>  
>  		if (blk_mq_init_hctx(q, set, hctxs[i], i)) {
>  			free_cpumask_var(hctxs[i]->cpumask);
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 06/14] blk-mq: Fix hardware context data node selection
  2016-09-18  7:37 ` [PATCH 06/14] blk-mq: Fix hardware context data node selection Alexander Gordeev
@ 2016-09-19 18:54   ` Omar Sandoval
  0 siblings, 0 replies; 27+ messages in thread
From: Omar Sandoval @ 2016-09-19 18:54 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: linux-kernel, linux-block

On Sun, Sep 18, 2016 at 09:37:16AM +0200, Alexander Gordeev wrote:
> CC: linux-block@vger.kernel.org
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>

Makes sense.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> ---
>  block/blk-mq.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a38fd2e..9fbfe31 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1712,13 +1712,9 @@ static int blk_mq_init_hctx(struct request_queue *q,
>  		struct blk_mq_tag_set *set,
>  		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
>  {
> -	int node;
> +	int node = hctx->numa_node;
>  	unsigned flush_start_tag = set->queue_depth;
>  
> -	node = hctx->numa_node;
> -	if (node == NUMA_NO_NODE)
> -		node = hctx->numa_node = set->numa_node;
> -
>  	INIT_WORK(&hctx->run_work, blk_mq_run_work_fn);
>  	INIT_DELAYED_WORK(&hctx->delay_work, blk_mq_delay_work_fn);
>  	spin_lock_init(&hctx->lock);
> @@ -1999,6 +1995,9 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
>  			break;
>  
>  		node = blk_mq_hw_queue_to_node(q->mq_map, i);
> +		if (node == NUMA_NO_NODE)
> +			node = set->numa_node;
> +
>  		hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx),
>  					GFP_KERNEL, node);
>  		if (!hctxs[i])
> -- 
> 1.8.3.1
> 

-- 
Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 07/14] blk-mq: Cleanup a loop exit condition
  2016-09-18  7:37 ` [PATCH 07/14] blk-mq: Cleanup a loop exit condition Alexander Gordeev
@ 2016-09-19 19:00   ` Omar Sandoval
  2016-09-20 11:33     ` Alexander Gordeev
  0 siblings, 1 reply; 27+ messages in thread
From: Omar Sandoval @ 2016-09-19 19:00 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: linux-kernel, linux-block

On Sun, Sep 18, 2016 at 09:37:17AM +0200, Alexander Gordeev wrote:
> CC: linux-block@vger.kernel.org
> Signed-off-by: Alexander Gordeev <agordeev@redhat.com>

So set->nr_hw_queues is always >= q->nr_hw_queues, right?

Reviewed-by: Omar Sandoval <osandov@fb.com>

> ---
>  block/blk-mq.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 9fbfe31..b2ef8f5 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1681,16 +1681,13 @@ static void blk_mq_exit_hctx(struct request_queue *q,
>  }
>  
>  static void blk_mq_exit_hw_queues(struct request_queue *q,
> -		struct blk_mq_tag_set *set, int nr_queue)
> +		struct blk_mq_tag_set *set)
>  {
>  	struct blk_mq_hw_ctx *hctx;
>  	unsigned int i;
>  
> -	queue_for_each_hw_ctx(q, hctx, i) {
> -		if (i == nr_queue)
> -			break;
> +	queue_for_each_hw_ctx(q, hctx, i)
>  		blk_mq_exit_hctx(q, set, hctx, i);
> -	}
>  }
>  
>  static void blk_mq_free_hw_queues(struct request_queue *q,
> @@ -2124,7 +2121,7 @@ void blk_mq_free_queue(struct request_queue *q)
>  
>  	blk_mq_del_queue_tag_set(q);
>  
> -	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
> +	blk_mq_exit_hw_queues(q, set);
>  	blk_mq_free_hw_queues(q, set);
>  }
>  
> -- 
> 1.8.3.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 07/14] blk-mq: Cleanup a loop exit condition
  2016-09-19 19:00   ` Omar Sandoval
@ 2016-09-20 11:33     ` Alexander Gordeev
  0 siblings, 0 replies; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-20 11:33 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-kernel, linux-block

On Mon, Sep 19, 2016 at 12:00:28PM -0700, Omar Sandoval wrote:
> On Sun, Sep 18, 2016 at 09:37:17AM +0200, Alexander Gordeev wrote:
> > CC: linux-block@vger.kernel.org
> > Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> 
> So set->nr_hw_queues is always >= q->nr_hw_queues, right?

That is what I conclude form blk_mq_realloc_hw_ctxs() code:

	for (i = 0; i < set->nr_hw_queues; i++) {
		...
	}
	...
	q->nr_hw_queues = i;

> Reviewed-by: Omar Sandoval <osandov@fb.com>
> 
> > ---
> >  block/blk-mq.c | 9 +++------
> >  1 file changed, 3 insertions(+), 6 deletions(-)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 9fbfe31..b2ef8f5 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1681,16 +1681,13 @@ static void blk_mq_exit_hctx(struct request_queue *q,
> >  }
> >  
> >  static void blk_mq_exit_hw_queues(struct request_queue *q,
> > -		struct blk_mq_tag_set *set, int nr_queue)
> > +		struct blk_mq_tag_set *set)
> >  {
> >  	struct blk_mq_hw_ctx *hctx;
> >  	unsigned int i;
> >  
> > -	queue_for_each_hw_ctx(q, hctx, i) {
> > -		if (i == nr_queue)
> > -			break;
> > +	queue_for_each_hw_ctx(q, hctx, i)
> >  		blk_mq_exit_hctx(q, set, hctx, i);
> > -	}
> >  }
> >  
> >  static void blk_mq_free_hw_queues(struct request_queue *q,
> > @@ -2124,7 +2121,7 @@ void blk_mq_free_queue(struct request_queue *q)
> >  
> >  	blk_mq_del_queue_tag_set(q);
> >  
> > -	blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
> > +	blk_mq_exit_hw_queues(q, set);
> >  	blk_mq_free_hw_queues(q, set);
> >  }
> >  
> > -- 
> > 1.8.3.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-block" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -- 
> Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 04/14] blk-mq: Do not limit number of queues to 'nr_cpu_ids' in allocations
  2016-09-19 17:48   ` Omar Sandoval
@ 2016-09-20 11:44     ` Alexander Gordeev
  2016-09-20 17:26       ` Omar Sandoval
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander Gordeev @ 2016-09-20 11:44 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: linux-kernel, linux-block

On Mon, Sep 19, 2016 at 10:48:49AM -0700, Omar Sandoval wrote:
> On Sun, Sep 18, 2016 at 09:37:14AM +0200, Alexander Gordeev wrote:
> > Currently maximum number of used hardware queues is limited to
> > number of CPUs in the system. However, using 'nr_cpu_ids' as
> > the limit for (de-)allocations of data structures instead of
> > existing data structures' counters (a) worsens readability and
> > (b) leads to unused memory when number of hardware queues is
> > less than number of CPUs.
> > 
> > CC: linux-block@vger.kernel.org
> > Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> > ---
> >  block/blk-mq.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 276ec7b..2c77b68 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2054,8 +2054,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
> >  	if (!q->queue_ctx)
> >  		goto err_exit;
> >  
> > -	q->queue_hw_ctx = kzalloc_node(nr_cpu_ids * sizeof(*(q->queue_hw_ctx)),
> > -						GFP_KERNEL, set->numa_node);
> > +	q->queue_hw_ctx = kzalloc_node(set->nr_hw_queues *
> > +			sizeof(*(q->queue_hw_ctx)), GFP_KERNEL, set->numa_node);
> >  	if (!q->queue_hw_ctx)
> >  		goto err_percpu;
> >  
> > @@ -2319,7 +2319,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
> >  	if (set->nr_hw_queues > nr_cpu_ids)
> >  		set->nr_hw_queues = nr_cpu_ids;
> >  
> > -	set->tags = kzalloc_node(nr_cpu_ids * sizeof(struct blk_mq_tags *),
> > +	set->tags = kzalloc_node(set->nr_hw_queues * sizeof(*set->tags),
> >  				 GFP_KERNEL, set->numa_node);
> >  	if (!set->tags)
> >  		return -ENOMEM;
> > @@ -2360,7 +2360,7 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
> >  {
> >  	int i;
> >  
> > -	for (i = 0; i < nr_cpu_ids; i++) {
> > +	for (i = 0; i < set->nr_hw_queues; i++) {
> >  		if (set->tags[i])
> >  			blk_mq_free_rq_map(set, set->tags[i], i);
> >  	}
> 
> I don't think this is safe since we might increase the number of
> hardware queues (blk_mq_update_nr_hw_queues()).

It is safe, because blk_mq_update_nr_hw_queues() limits number of
hardware queues to nr_cpu_ids. But still using nr_cpu_ids is wrong,
because (a) set->nr_hw_queues could be less than nr_cpu_ids and
(b) it is set->nr_hw_queues counter that tracks size of the array.

> -- 
> Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 04/14] blk-mq: Do not limit number of queues to 'nr_cpu_ids' in allocations
  2016-09-20 11:44     ` Alexander Gordeev
@ 2016-09-20 17:26       ` Omar Sandoval
  0 siblings, 0 replies; 27+ messages in thread
From: Omar Sandoval @ 2016-09-20 17:26 UTC (permalink / raw)
  To: Alexander Gordeev; +Cc: linux-kernel, linux-block

On Tue, Sep 20, 2016 at 01:44:36PM +0200, Alexander Gordeev wrote:
> On Mon, Sep 19, 2016 at 10:48:49AM -0700, Omar Sandoval wrote:
> > On Sun, Sep 18, 2016 at 09:37:14AM +0200, Alexander Gordeev wrote:
> > > Currently maximum number of used hardware queues is limited to
> > > number of CPUs in the system. However, using 'nr_cpu_ids' as
> > > the limit for (de-)allocations of data structures instead of
> > > existing data structures' counters (a) worsens readability and
> > > (b) leads to unused memory when number of hardware queues is
> > > less than number of CPUs.
> > > 
> > > CC: linux-block@vger.kernel.org
> > > Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> > > ---
> > >  block/blk-mq.c | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 276ec7b..2c77b68 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -2054,8 +2054,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
> > >  	if (!q->queue_ctx)
> > >  		goto err_exit;
> > >  
> > > -	q->queue_hw_ctx = kzalloc_node(nr_cpu_ids * sizeof(*(q->queue_hw_ctx)),
> > > -						GFP_KERNEL, set->numa_node);
> > > +	q->queue_hw_ctx = kzalloc_node(set->nr_hw_queues *
> > > +			sizeof(*(q->queue_hw_ctx)), GFP_KERNEL, set->numa_node);
> > >  	if (!q->queue_hw_ctx)
> > >  		goto err_percpu;
> > >  
> > > @@ -2319,7 +2319,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
> > >  	if (set->nr_hw_queues > nr_cpu_ids)
> > >  		set->nr_hw_queues = nr_cpu_ids;
> > >  
> > > -	set->tags = kzalloc_node(nr_cpu_ids * sizeof(struct blk_mq_tags *),
> > > +	set->tags = kzalloc_node(set->nr_hw_queues * sizeof(*set->tags),
> > >  				 GFP_KERNEL, set->numa_node);
> > >  	if (!set->tags)
> > >  		return -ENOMEM;
> > > @@ -2360,7 +2360,7 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
> > >  {
> > >  	int i;
> > >  
> > > -	for (i = 0; i < nr_cpu_ids; i++) {
> > > +	for (i = 0; i < set->nr_hw_queues; i++) {
> > >  		if (set->tags[i])
> > >  			blk_mq_free_rq_map(set, set->tags[i], i);
> > >  	}
> > 
> > I don't think this is safe since we might increase the number of
> > hardware queues (blk_mq_update_nr_hw_queues()).
> 
> It is safe, because blk_mq_update_nr_hw_queues() limits number of
> hardware queues to nr_cpu_ids. But still using nr_cpu_ids is wrong,
> because (a) set->nr_hw_queues could be less than nr_cpu_ids and
> (b) it is set->nr_hw_queues counter that tracks size of the array.

But say we originally call blk_mq_init_allocated_queue() with
set->nr_hw_queues = 1 and later we call blk_mq_update_nr_hw_queues()
with nr_hw_queues = nr_cpu_ids (say, 4). Then we update (line numbers
from v4.8-rc7):

   2406         set->nr_hw_queues = nr_hw_queues;

So, when we call blk_mq_realloc_hw_ctxs(), we do:

   1998         for (i = 0; i < set->nr_hw_queues; i++) {
   1999                 int node;
   2000
   2001                 if (hctxs[i])
   2002                         continue;
   2003
   2004                 node = blk_mq_hw_queue_to_node(q->mq_map, i);
   2005                 hctxs[i] = kzalloc_node(sizeof(struct blk_mq_hw_ctx),
   2006                                         GFP_KERNEL, node);

Now set->nr_hw_queues = 4 but we only allocated 1 hardware queue, so
we'll scribble off the end of the array.

Similar problem in blk_mq_map_swqueue():

   1850         queue_for_each_hw_ctx(q, hctx, i) {
   1851                 struct blk_mq_ctxmap *map = &hctx->ctx_map;
   1852
   1853                 /*
   1854                  * If no software queues are mapped to this hardware queue,
   1855                  * disable it and free the request entries.
   1856                  */
   1857                 if (!hctx->nr_ctx) {
   1858                         if (set->tags[i]) {
   1859                                 blk_mq_free_rq_map(set, set->tags[i], i);
   1860                                 set->tags[i] = NULL;
   1861                         }
   1862                         hctx->tags = NULL;
   1863                         continue;
   1864                 }
   1865
   1866                 /* unmapped hw queue can be remapped after CPU topo changed */
   1867                 if (!set->tags[i])
   1868                         set->tags[i] = blk_mq_init_rq_map(set, i);

q->nr_hw_queues would now be 4, but we only allocated one element for
set->tags.

Allocating nr_cpu_ids upfront is the easiest way to handle adding
hardware queues up to nr_cpu_ids as needed. Both of these are just
arrays of pointers, so it's not like it's a huge waste of memory.

-- 
Omar

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-09-20 17:26 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-18  7:37 [PATCH 00/14] blk-mq: Minor fixes and cleanups Alexander Gordeev
2016-09-18  7:37 ` [PATCH 01/14] blk-mq: Fix memory leaks on queue cleanup Alexander Gordeev
2016-09-19 13:51   ` Christoph Hellwig
2016-09-18  7:37 ` [PATCH 02/14] blk-mq: Fix a potential NULL pointer assignment to hctx tags Alexander Gordeev
2016-09-19 18:34   ` Omar Sandoval
2016-09-18  7:37 ` [PATCH 03/14] block: Get rid of unused request_queue::nr_queues member Alexander Gordeev
2016-09-19 18:37   ` Omar Sandoval
2016-09-18  7:37 ` [PATCH 04/14] blk-mq: Do not limit number of queues to 'nr_cpu_ids' in allocations Alexander Gordeev
2016-09-19 17:48   ` Omar Sandoval
2016-09-20 11:44     ` Alexander Gordeev
2016-09-20 17:26       ` Omar Sandoval
2016-09-18  7:37 ` [PATCH 05/14] blk-mq: Remove a redundant assignment Alexander Gordeev
2016-09-19 18:38   ` Omar Sandoval
2016-09-18  7:37 ` [PATCH 06/14] blk-mq: Fix hardware context data node selection Alexander Gordeev
2016-09-19 18:54   ` Omar Sandoval
2016-09-18  7:37 ` [PATCH 07/14] blk-mq: Cleanup a loop exit condition Alexander Gordeev
2016-09-19 19:00   ` Omar Sandoval
2016-09-20 11:33     ` Alexander Gordeev
2016-09-18  7:37 ` [PATCH 08/14] blk-mq: Get rid of unnecessary blk_mq_free_hw_queues() Alexander Gordeev
2016-09-18  7:37 ` [PATCH 09/14] blk-mq: Move duplicating code to blk_mq_exit_hctx() Alexander Gordeev
2016-09-19 17:57   ` Omar Sandoval
2016-09-18  7:37 ` [PATCH 10/14] blk-mq: Uninit hardware context in order reverse to init Alexander Gordeev
2016-09-19 17:59   ` Omar Sandoval
2016-09-18  7:37 ` [PATCH 11/14] blk-mq: Move hardware context init code into single location Alexander Gordeev
2016-09-18  7:37 ` [PATCH 12/14] blk-mq: Rework blk_mq_init_hctx() function Alexander Gordeev
2016-09-18  7:37 ` [PATCH 13/14] blk-mq: Pair blk_mq_hctx_kobj_init() with blk_mq_hctx_kobj_put() Alexander Gordeev
2016-09-18  7:37 ` [PATCH 14/14] blk-mq: Set flush_start_tag to BLK_MQ_MAX_DEPTH Alexander Gordeev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.