* [PATCH 0/4] Reduce the amount of memory required for request queues
@ 2019-10-21 22:42 Bart Van Assche
  2019-10-21 22:42 ` [PATCH 1/4] block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues() Bart Van Assche
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Bart Van Assche @ 2019-10-21 22:42 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche

Hi Jens,

This patch series reduces the amount of memory required for request queues.
It also includes two patches that resulted from reviewing code related to
the modified code. Please consider these patches for kernel version v5.5.

Thanks,

Bart.

Bart Van Assche (4):
  block: Remove the synchronize_rcu() call from
    __blk_mq_update_nr_hw_queues()
  block: Fix a race between blk_poll() and blk_mq_update_nr_hw_queues()
  block: Reduce the amount of memory required per request queue
  block: Reduce the amount of memory used for tag sets

 block/blk-mq.c | 113 +++++++++++++++++++++++++++++++------------------
 1 file changed, 72 insertions(+), 41 deletions(-)

-- 
2.23.0.866.gb869b98d4c-goog


* [PATCH 1/4] block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues()
  2019-10-21 22:42 [PATCH 0/4] Reduce the amount of memory required for request queues Bart Van Assche
@ 2019-10-21 22:42 ` Bart Van Assche
  2019-10-22  9:32   ` Ming Lei
  2019-10-21 22:42 ` [PATCH 2/4] block: Fix a race between blk_poll() and blk_mq_update_nr_hw_queues() Bart Van Assche
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2019-10-21 22:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Johannes Thumshirn

Since the blk_mq_{,un}freeze_queue() calls in __blk_mq_update_nr_hw_queues()
already serialize __blk_mq_update_nr_hw_queues() against
blk_mq_queue_tag_busy_iter(), the synchronize_rcu() call in
__blk_mq_update_nr_hw_queues() is not necessary. Hence remove it.

Note: the synchronize_rcu() call in __blk_mq_update_nr_hw_queues() was
introduced by commit f5bbbbe4d635 ("blk-mq: sync the update nr_hw_queues with
blk_mq_queue_tag_busy_iter"). Commit 530ca2c9bd69 ("blk-mq: Allow blocking
queue tag iter callbacks") removed the rcu_read_{,un}lock() calls that
correspond to the synchronize_rcu() call in __blk_mq_update_nr_hw_queues().
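
To illustrate the serialization (a simplified sketch of the reasoning above,
not verbatim kernel code): blk_mq_queue_tag_busy_iter() only runs while it
holds a reference on q->q_usage_counter, and blk_mq_freeze_queue() waits
until that counter is drained, so once every queue in the tag set is frozen
no iteration is in flight and none can start:

	static void tag_busy_iter_sketch(struct request_queue *q)
	{
		/* Fails while the queue is frozen or being frozen. */
		if (!percpu_ref_tryget(&q->q_usage_counter))
			return;

		/* ... walk the busy tags and invoke the callback ... */

		blk_queue_exit(q);	/* drops the q_usage_counter reference */
	}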

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8538dc415499..7528678ef41f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3242,10 +3242,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
 		blk_mq_freeze_queue(q);
-	/*
-	 * Sync with blk_mq_queue_tag_busy_iter.
-	 */
-	synchronize_rcu();
 	/*
 	 * Switch IO scheduler to 'none', cleaning up the data associated
 	 * with the previous scheduler. We will switch back once we are done
-- 
2.23.0.866.gb869b98d4c-goog


* [PATCH 2/4] block: Fix a race between blk_poll() and blk_mq_update_nr_hw_queues()
  2019-10-21 22:42 [PATCH 0/4] Reduce the amount of memory required for request queues Bart Van Assche
  2019-10-21 22:42 ` [PATCH 1/4] block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues() Bart Van Assche
@ 2019-10-21 22:42 ` Bart Van Assche
  2019-10-22  9:41   ` Ming Lei
  2019-10-21 22:42 ` [PATCH 3/4] block: Reduce the amount of memory required per request queue Bart Van Assche
  2019-10-21 22:42 ` [PATCH 4/4] block: Reduce the amount of memory used for tag sets Bart Van Assche
  3 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2019-10-21 22:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Johannes Thumshirn

If blk_poll() is called while no requests are in progress, it may happen that
blk_mq_update_nr_hw_queues() modifies the data structures used by blk_poll(),
e.g. q->queue_hw_ctx[]. Fix this race by serializing blk_poll() against
blk_mq_update_nr_hw_queues().
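
The race being closed can be sketched as follows (hypothetical interleaving,
simplified for illustration):

	/*
	 *  CPU 0: blk_poll()                CPU 1: blk_mq_update_nr_hw_queues()
	 *
	 *  hctx = q->queue_hw_ctx[idx];
	 *                                   blk_mq_realloc_hw_ctxs() updates
	 *                                   q->queue_hw_ctx[] and tears down
	 *                                   the old hctx structures
	 *  polls on the stale hctx
	 */

With the q_usage_counter reference introduced below, the tryget fails while
the queues are frozen, so blk_poll() returns 0 instead of touching
q->queue_hw_ctx[] during the update.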

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7528678ef41f..ea64d951f411 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3439,19 +3439,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
 	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
 }
 
-/**
- * blk_poll - poll for IO completions
- * @q:  the queue
- * @cookie: cookie passed back at IO submission time
- * @spin: whether to spin for completions
- *
- * Description:
- *    Poll for completions on the passed in queue. Returns number of
- *    completed entries found. If @spin is true, then blk_poll will continue
- *    looping until at least one completion is found, unless the task is
- *    otherwise marked running (or we need to reschedule).
- */
-int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+static int __blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 {
 	struct blk_mq_hw_ctx *hctx;
 	long state;
@@ -3503,6 +3491,30 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
 	__set_current_state(TASK_RUNNING);
 	return 0;
 }
+
+/**
+ * blk_poll - poll for IO completions
+ * @q:  the queue
+ * @cookie: cookie passed back at IO submission time
+ * @spin: whether to spin for completions
+ *
+ * Description:
+ *    Poll for completions on the passed in queue. Returns number of
+ *    completed entries found. If @spin is true, then blk_poll will continue
+ *    looping until at least one completion is found, unless the task is
+ *    otherwise marked running (or we need to reschedule).
+ */
+int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
+{
+	int ret;
+
+	if (!percpu_ref_tryget(&q->q_usage_counter))
+		return 0;
+	ret = __blk_poll(q, cookie, spin);
+	blk_queue_exit(q);
+
+	return ret;
+}
 EXPORT_SYMBOL_GPL(blk_poll);
 
 unsigned int blk_mq_rq_cpu(struct request *rq)
-- 
2.23.0.866.gb869b98d4c-goog


* [PATCH 3/4] block: Reduce the amount of memory required per request queue
  2019-10-21 22:42 [PATCH 0/4] Reduce the amount of memory required for request queues Bart Van Assche
  2019-10-21 22:42 ` [PATCH 1/4] block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues() Bart Van Assche
  2019-10-21 22:42 ` [PATCH 2/4] block: Fix a race between blk_poll() and blk_mq_update_nr_hw_queues() Bart Van Assche
@ 2019-10-21 22:42 ` Bart Van Assche
  2019-10-21 22:42 ` [PATCH 4/4] block: Reduce the amount of memory used for tag sets Bart Van Assche
  3 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2019-10-21 22:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Keith Busch,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Johannes Thumshirn

Instead of always allocating at least nr_cpu_ids hardware queues per request
queue, reallocate q->queue_hw_ctx if it has to grow. This patch improves
behavior that was introduced by commit 868f2f0b7206 ("blk-mq: dynamic h/w
context count").

Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ea64d951f411..86f6852130fc 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2761,6 +2761,23 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 	int i, j, end;
 	struct blk_mq_hw_ctx **hctxs = q->queue_hw_ctx;
 
+	if (q->nr_hw_queues < set->nr_hw_queues) {
+		struct blk_mq_hw_ctx **new_hctxs;
+
+		new_hctxs = kcalloc_node(set->nr_hw_queues,
+				       sizeof(*new_hctxs), GFP_KERNEL,
+				       set->numa_node);
+		if (!new_hctxs)
+			return;
+		if (hctxs)
+			memcpy(new_hctxs, hctxs, q->nr_hw_queues *
+			       sizeof(*hctxs));
+		q->queue_hw_ctx = new_hctxs;
+		q->nr_hw_queues = set->nr_hw_queues;
+		kfree(hctxs);
+		hctxs = new_hctxs;
+	}
+
 	/* protect against switching io scheduler  */
 	mutex_lock(&q->sysfs_lock);
 	for (i = 0; i < set->nr_hw_queues; i++) {
@@ -2848,12 +2865,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	/* init q->mq_kobj and sw queues' kobjects */
 	blk_mq_sysfs_init(q);
 
-	q->queue_hw_ctx = kcalloc_node(nr_hw_queues(set),
-				       sizeof(*(q->queue_hw_ctx)), GFP_KERNEL,
-				       set->numa_node);
-	if (!q->queue_hw_ctx)
-		goto err_sys_init;
-
 	INIT_LIST_HEAD(&q->unused_hctx_list);
 	spin_lock_init(&q->unused_hctx_lock);
 
@@ -2901,7 +2912,6 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 err_hctxs:
 	kfree(q->queue_hw_ctx);
 	q->nr_hw_queues = 0;
-err_sys_init:
 	blk_mq_sysfs_deinit(q);
 err_poll:
 	blk_stat_free_callback(q->poll_cb);
-- 
2.23.0.866.gb869b98d4c-goog


* [PATCH 4/4] block: Reduce the amount of memory used for tag sets
  2019-10-21 22:42 [PATCH 0/4] Reduce the amount of memory required for request queues Bart Van Assche
                   ` (2 preceding siblings ...)
  2019-10-21 22:42 ` [PATCH 3/4] block: Reduce the amount of memory required per request queue Bart Van Assche
@ 2019-10-21 22:42 ` Bart Van Assche
  3 siblings, 0 replies; 9+ messages in thread
From: Bart Van Assche @ 2019-10-21 22:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Keith Busch,
	Christoph Hellwig, Ming Lei, Hannes Reinecke, Johannes Thumshirn

Instead of allocating an array of size nr_cpu_ids for set->tags, allocate
an array of size set->nr_hw_queues. This patch improves behavior that was
introduced by commit 868f2f0b7206 ("blk-mq: dynamic h/w context count").

Reallocating tag sets from inside __blk_mq_update_nr_hw_queues() is safe
because:
- All request queues that share the tag sets are frozen before the tag sets
  are reallocated.
- blk_mq_queue_tag_busy_iter() holds q->q_usage_counter while active and
  hence is serialized against __blk_mq_update_nr_hw_queues().
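
For reference, a sketch of how the new helper is used (both call sites
appear in the diff below):

	/* Initial allocation in blk_mq_alloc_tag_set(): */
	if (blk_mq_realloc_tag_set_tags(set, 0, set->nr_hw_queues) < 0)
		return -ENOMEM;

	/* Grow-on-demand in __blk_mq_update_nr_hw_queues(): */
	if (blk_mq_realloc_tag_set_tags(set, set->nr_hw_queues, nr_hw_queues) < 0)
		goto reregister;

When nr_hw_queues shrinks, the helper returns early and keeps the existing
(larger) array; the caller just updates set->nr_hw_queues.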

Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 47 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 86f6852130fc..1279db579fa2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2833,19 +2833,6 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
 	mutex_unlock(&q->sysfs_lock);
 }
 
-/*
- * Maximum number of hardware queues we support. For single sets, we'll never
- * have more than the CPUs (software queues). For multiple sets, the tag_set
- * user may have set ->nr_hw_queues larger.
- */
-static unsigned int nr_hw_queues(struct blk_mq_tag_set *set)
-{
-	if (set->nr_maps == 1)
-		return nr_cpu_ids;
-
-	return max(set->nr_hw_queues, nr_cpu_ids);
-}
-
 struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 						  struct request_queue *q,
 						  bool elevator_init)
@@ -3012,6 +2999,29 @@ static int blk_mq_update_queue_map(struct blk_mq_tag_set *set)
 	}
 }
 
+static int blk_mq_realloc_tag_set_tags(struct blk_mq_tag_set *set,
+				  int cur_nr_hw_queues, int new_nr_hw_queues)
+{
+	struct blk_mq_tags **new_tags;
+
+	if (cur_nr_hw_queues >= new_nr_hw_queues)
+		return 0;
+
+	new_tags = kcalloc_node(new_nr_hw_queues, sizeof(struct blk_mq_tags *),
+				GFP_KERNEL, set->numa_node);
+	if (!new_tags)
+		return -ENOMEM;
+
+	if (set->tags)
+		memcpy(new_tags, set->tags, cur_nr_hw_queues *
+		       sizeof(*set->tags));
+	kfree(set->tags);
+	set->tags = new_tags;
+	set->nr_hw_queues = new_nr_hw_queues;
+
+	return 0;
+}
+
 /*
  * Alloc a tag set to be associated with one or more request queues.
  * May fail with EINVAL for various error conditions. May adjust the
@@ -3065,9 +3075,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 	if (set->nr_maps == 1 && set->nr_hw_queues > nr_cpu_ids)
 		set->nr_hw_queues = nr_cpu_ids;
 
-	set->tags = kcalloc_node(nr_hw_queues(set), sizeof(struct blk_mq_tags *),
-				 GFP_KERNEL, set->numa_node);
-	if (!set->tags)
+	if (blk_mq_realloc_tag_set_tags(set, 0, set->nr_hw_queues) < 0)
 		return -ENOMEM;
 
 	ret = -ENOMEM;
@@ -3108,7 +3116,7 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
 {
 	int i, j;
 
-	for (i = 0; i < nr_hw_queues(set); i++)
+	for (i = 0; i < set->nr_hw_queues; i++)
 		blk_mq_free_map_and_requests(set, i);
 
 	for (j = 0; j < set->nr_maps; j++) {
@@ -3266,6 +3274,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 		blk_mq_sysfs_unregister(q);
 	}
 
+	if (blk_mq_realloc_tag_set_tags(set, set->nr_hw_queues, nr_hw_queues) <
+	    0)
+		goto reregister;
+
 	prev_nr_hw_queues = set->nr_hw_queues;
 	set->nr_hw_queues = nr_hw_queues;
 	blk_mq_update_queue_map(set);
@@ -3282,6 +3294,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 		blk_mq_map_swqueue(q);
 	}
 
+reregister:
 	list_for_each_entry(q, &set->tag_list, tag_set_list) {
 		blk_mq_sysfs_register(q);
 		blk_mq_debugfs_register_hctxs(q);
-- 
2.23.0.866.gb869b98d4c-goog


* Re: [PATCH 1/4] block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues()
  2019-10-21 22:42 ` [PATCH 1/4] block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues() Bart Van Assche
@ 2019-10-22  9:32   ` Ming Lei
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2019-10-22  9:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Christoph Hellwig,
	Hannes Reinecke, Johannes Thumshirn

On Mon, Oct 21, 2019 at 03:42:56PM -0700, Bart Van Assche wrote:
> Since the blk_mq_{,un}freeze_queue() calls in __blk_mq_update_nr_hw_queues()
> already serialize __blk_mq_update_nr_hw_queues() against
> blk_mq_queue_tag_busy_iter(), the synchronize_rcu() call in
> __blk_mq_update_nr_hw_queues() is not necessary. Hence remove it.
> 
> Note: the synchronize_rcu() call in __blk_mq_update_nr_hw_queues() was
> introduced by commit f5bbbbe4d635 ("blk-mq: sync the update nr_hw_queues with
> blk_mq_queue_tag_busy_iter"). Commit 530ca2c9bd69 ("blk-mq: Allow blocking
> queue tag iter callbacks") removed the rcu_read_{,un}lock() calls that
> correspond to the synchronize_rcu() call in __blk_mq_update_nr_hw_queues().
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8538dc415499..7528678ef41f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3242,10 +3242,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  
>  	list_for_each_entry(q, &set->tag_list, tag_set_list)
>  		blk_mq_freeze_queue(q);
> -	/*
> -	 * Sync with blk_mq_queue_tag_busy_iter.
> -	 */
> -	synchronize_rcu();
>  	/*
>  	 * Switch IO scheduler to 'none', cleaning up the data associated
>  	 * with the previous scheduler. We will switch back once we are done
> -- 
> 2.23.0.866.gb869b98d4c-goog
> 

Reviewed-by: Ming Lei <ming.lei@redhat.com>

-- 
Ming


* Re: [PATCH 2/4] block: Fix a race between blk_poll() and blk_mq_update_nr_hw_queues()
  2019-10-21 22:42 ` [PATCH 2/4] block: Fix a race between blk_poll() and blk_mq_update_nr_hw_queues() Bart Van Assche
@ 2019-10-22  9:41   ` Ming Lei
  2019-10-23 20:58     ` Bart Van Assche
  0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2019-10-22  9:41 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Christoph Hellwig,
	Hannes Reinecke, Johannes Thumshirn

On Mon, Oct 21, 2019 at 03:42:57PM -0700, Bart Van Assche wrote:
> If blk_poll() is called while no requests are in progress, it may happen that
> blk_mq_update_nr_hw_queues() modifies the data structures used by blk_poll(),
> e.g. q->queue_hw_ctx[]. Fix this race by serializing blk_poll() against
> blk_mq_update_nr_hw_queues().
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c | 38 +++++++++++++++++++++++++-------------
>  1 file changed, 25 insertions(+), 13 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 7528678ef41f..ea64d951f411 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3439,19 +3439,7 @@ static bool blk_mq_poll_hybrid(struct request_queue *q,
>  	return blk_mq_poll_hybrid_sleep(q, hctx, rq);
>  }
>  
> -/**
> - * blk_poll - poll for IO completions
> - * @q:  the queue
> - * @cookie: cookie passed back at IO submission time
> - * @spin: whether to spin for completions
> - *
> - * Description:
> - *    Poll for completions on the passed in queue. Returns number of
> - *    completed entries found. If @spin is true, then blk_poll will continue
> - *    looping until at least one completion is found, unless the task is
> - *    otherwise marked running (or we need to reschedule).
> - */
> -int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
> +static int __blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
>  {
>  	struct blk_mq_hw_ctx *hctx;
>  	long state;
> @@ -3503,6 +3491,30 @@ int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
>  	__set_current_state(TASK_RUNNING);
>  	return 0;
>  }
> +
> +/**
> + * blk_poll - poll for IO completions
> + * @q:  the queue
> + * @cookie: cookie passed back at IO submission time
> + * @spin: whether to spin for completions
> + *
> + * Description:
> + *    Poll for completions on the passed in queue. Returns number of
> + *    completed entries found. If @spin is true, then blk_poll will continue
> + *    looping until at least one completion is found, unless the task is
> + *    otherwise marked running (or we need to reschedule).
> + */
> +int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
> +{
> +	int ret;
> +
> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> +		return 0;
> +	ret = __blk_poll(q, cookie, spin);
> +	blk_queue_exit(q);
> +
> +	return ret;
> +}

IMO, this change isn't required. The caller of blk_poll() is supposed to
hold a refcount on the request queue, so the related hctx data structures
won't go away. When an hctx is in a transient state, there can't be any IO
to poll, and it is safe to call into the IO path.

BTW, .poll is very much a fast path; we should be careful about adding code
to it.

Thanks,
Ming


* Re: [PATCH 2/4] block: Fix a race between blk_poll() and blk_mq_update_nr_hw_queues()
  2019-10-22  9:41   ` Ming Lei
@ 2019-10-23 20:58     ` Bart Van Assche
  2019-10-24  0:33       ` Ming Lei
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2019-10-23 20:58 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Christoph Hellwig,
	Hannes Reinecke, Johannes Thumshirn

On 2019-10-22 02:41, Ming Lei wrote:
> On Mon, Oct 21, 2019 at 03:42:57PM -0700, Bart Van Assche wrote:
>> +int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
>> +{
>> +	int ret;
>> +
>> +	if (!percpu_ref_tryget(&q->q_usage_counter))
>> +		return 0;
>> +	ret = __blk_poll(q, cookie, spin);
>> +	blk_queue_exit(q);
>> +
>> +	return ret;
>> +}
> 
> IMO, this change isn't required. The caller of blk_poll() is supposed to
> hold a refcount on the request queue, so the related hctx data structures
> won't go away. When an hctx is in a transient state, there can't be any IO
> to poll, and it is safe to call into the IO path.
> 
> BTW, .poll is very much a fast path; we should be careful about adding code
> to it.

Hi Ming,

I'm not sure whether all blk_poll() callers really hold a refcount on
the request queue. Anyway, I will convert this code change into a
comment that explains that blk_poll() callers must hold a request queue
reference.

Bart.

* Re: [PATCH 2/4] block: Fix a race between blk_poll() and blk_mq_update_nr_hw_queues()
  2019-10-23 20:58     ` Bart Van Assche
@ 2019-10-24  0:33       ` Ming Lei
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2019-10-24  0:33 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Christoph Hellwig,
	Hannes Reinecke, Johannes Thumshirn

On Wed, Oct 23, 2019 at 01:58:24PM -0700, Bart Van Assche wrote:
> On 2019-10-22 02:41, Ming Lei wrote:
> > On Mon, Oct 21, 2019 at 03:42:57PM -0700, Bart Van Assche wrote:
> >> +int blk_poll(struct request_queue *q, blk_qc_t cookie, bool spin)
> >> +{
> >> +	int ret;
> >> +
> >> +	if (!percpu_ref_tryget(&q->q_usage_counter))
> >> +		return 0;
> >> +	ret = __blk_poll(q, cookie, spin);
> >> +	blk_queue_exit(q);
> >> +
> >> +	return ret;
> >> +}
> > 
> > IMO, this change isn't required. The caller of blk_poll() is supposed to
> > hold a refcount on the request queue, so the related hctx data structures
> > won't go away. When an hctx is in a transient state, there can't be any IO
> > to poll, and it is safe to call into the IO path.
> > 
> > BTW, .poll is very much a fast path; we should be careful about adding code
> > to it.
> 
> Hi Ming,
> 
> I'm not sure whether all blk_poll() callers really hold a refcount on
> the request queue.

Please see __device_add_disk(), which takes an extra queue reference that
won't be dropped until the disk is released.

Meanwhile, blkdev_get() holds a gendisk reference via bdev_get_gendisk(),
and blkdev_get() is called by blkdev_open().

Together these guarantee that the request queue won't go away while IO is
submitted to the queue via the blkdev fs inode.
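
Roughly, the reference chain described above (simplified; paraphrasing the
calls named in this thread rather than quoting the kernel source):

	blkdev_open()
	  -> blkdev_get()
	       -> bdev_get_gendisk()	/* holds a gendisk reference */

	__device_add_disk()
	  -> takes an extra request_queue reference that is only dropped
	     when the disk is released

So as long as IO is submitted through an open block device, the gendisk,
and hence the request queue, stays around.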

> Anyway, I will convert this code change into a
> comment that explains that blk_poll() callers must hold a request queue
> reference.

Good point.


Thanks,
Ming

