All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block: Fix a race between tag iteration and hardware queue changes
@ 2019-04-02 22:32 Bart Van Assche
  2019-04-03  1:02 ` Dongli Zhang
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Bart Van Assche @ 2019-04-02 22:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Christoph Hellwig, Hannes Reinecke, James Smart, Ming Lei,
	Jianchao Wang, Keith Busch, Dongli Zhang, stable

Since the callback function called by blk_mq_queue_tag_busy_iter()
may sleep calling synchronize_rcu() from __blk_mq_update_nr_hw_queues()
is not sufficient to wait until blk_mq_queue_tag_busy_iter() has
finished. Instead of making __blk_mq_update_nr_hw_queues() wait until
q->q_usage_counter == 0 is globally visible, do not iterate over tags
if the request queue is frozen.

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: <stable@vger.kernel.org>
Fixes: 530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter callbacks") # v4.19.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq-tag.c | 10 +++++-----
 block/blk-mq.c     |  5 +----
 2 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index a4931fc7be8a..89f479634b4d 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -383,14 +383,13 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 
 	/*
 	 * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
-	 * while the queue is frozen. So we can use q_usage_counter to avoid
-	 * racing with it. __blk_mq_update_nr_hw_queues() uses
-	 * synchronize_rcu() to ensure this function left the critical section
-	 * below.
+	 * while the queue is frozen. Hold q_usage_counter to serialize
+	 * __blk_mq_update_nr_hw_queues() against this function.
 	 */
 	if (!percpu_ref_tryget(&q->q_usage_counter))
 		return;
-
+	if (atomic_read(&q->mq_freeze_depth))
+		goto out;
 	queue_for_each_hw_ctx(q, hctx, i) {
 		struct blk_mq_tags *tags = hctx->tags;
 
@@ -405,6 +404,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
 			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
 		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
 	}
+out:
 	blk_queue_exit(q);
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 652d0c6d5945..f9fc1536408d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3226,10 +3226,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
 
 	list_for_each_entry(q, &set->tag_list, tag_set_list)
 		blk_mq_freeze_queue(q);
-	/*
-	 * Sync with blk_mq_queue_tag_busy_iter.
-	 */
-	synchronize_rcu();
+
 	/*
 	 * Switch IO scheduler to 'none', cleaning up the data associated
 	 * with the previous scheduler. We will switch back once we are done
-- 
2.21.0.196.g041f5ea1cf98


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] block: Fix a race between tag iteration and hardware queue changes
  2019-04-02 22:32 [PATCH] block: Fix a race between tag iteration and hardware queue changes Bart Van Assche
@ 2019-04-03  1:02 ` Dongli Zhang
  2019-04-03  1:05   ` Bart Van Assche
  2019-04-03  1:37 ` Dongli Zhang
  2019-04-03  3:26 ` jianchao.wang
  2 siblings, 1 reply; 6+ messages in thread
From: Dongli Zhang @ 2019-04-03  1:02 UTC (permalink / raw)
  To: Bart Van Assche, Jianchao Wang
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Christoph Hellwig,
	Hannes Reinecke, James Smart, Ming Lei, Keith Busch, stable



On 4/3/19 6:32 AM, Bart Van Assche wrote:
> Since the callback function called by blk_mq_queue_tag_busy_iter()
> may sleep calling synchronize_rcu() from __blk_mq_update_nr_hw_queues()
> is not sufficient to wait until blk_mq_queue_tag_busy_iter() has
> finished. Instead of making __blk_mq_update_nr_hw_queues() wait until
> q->q_usage_counter == 0 is globally visible, do not iterate over tags
> if the request queue is frozen.
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: James Smart <james.smart@broadcom.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: <stable@vger.kernel.org>
> Fixes: 530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter callbacks") # v4.19.
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq-tag.c | 10 +++++-----
>  block/blk-mq.c     |  5 +----
>  2 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index a4931fc7be8a..89f479634b4d 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -383,14 +383,13 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  
>  	/*
>  	 * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
> -	 * while the queue is frozen. So we can use q_usage_counter to avoid
> -	 * racing with it. __blk_mq_update_nr_hw_queues() uses
> -	 * synchronize_rcu() to ensure this function left the critical section
> -	 * below.
> +	 * while the queue is frozen. Hold q_usage_counter to serialize
> +	 * __blk_mq_update_nr_hw_queues() against this function.
>  	 */
>  	if (!percpu_ref_tryget(&q->q_usage_counter))
>  		return;
> -
> +	if (atomic_read(&q->mq_freeze_depth))
> +		goto out;
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		struct blk_mq_tags *tags = hctx->tags;
>  
> @@ -405,6 +404,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
>  		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
>  	}
> +out:
>  	blk_queue_exit(q);
>  }
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 652d0c6d5945..f9fc1536408d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3226,10 +3226,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  
>  	list_for_each_entry(q, &set->tag_list, tag_set_list)
>  		blk_mq_freeze_queue(q);
> -	/*
> -	 * Sync with blk_mq_queue_tag_busy_iter.
> -	 */
> -	synchronize_rcu();

I used to see the removal of above in one of Jianchao's patchset.

https://lore.kernel.org/patchwork/patch/1051047/

Dongli Zhang

> +
>  	/*
>  	 * Switch IO scheduler to 'none', cleaning up the data associated
>  	 * with the previous scheduler. We will switch back once we are done
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block: Fix a race between tag iteration and hardware queue changes
  2019-04-03  1:02 ` Dongli Zhang
@ 2019-04-03  1:05   ` Bart Van Assche
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2019-04-03  1:05 UTC (permalink / raw)
  To: Dongli Zhang, Jianchao Wang
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Christoph Hellwig,
	Hannes Reinecke, James Smart, Ming Lei, Keith Busch, stable

On 4/2/19 6:02 PM, Dongli Zhang wrote:
> I used to see the removal of above in one of Jianchao's patchset.
> 
> https://lore.kernel.org/patchwork/patch/1051047/

Thanks for the feedback. But it seems like the q->mq_freeze_depth test 
is missing from Jianchao's patch ...

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block: Fix a race between tag iteration and hardware queue changes
  2019-04-02 22:32 [PATCH] block: Fix a race between tag iteration and hardware queue changes Bart Van Assche
  2019-04-03  1:02 ` Dongli Zhang
@ 2019-04-03  1:37 ` Dongli Zhang
  2019-04-03  3:26 ` jianchao.wang
  2 siblings, 0 replies; 6+ messages in thread
From: Dongli Zhang @ 2019-04-03  1:37 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Christoph Hellwig,
	Hannes Reinecke, James Smart, Ming Lei, Jianchao Wang,
	Keith Busch, stable

Hi Bart,

On 4/3/19 6:32 AM, Bart Van Assche wrote:
> Since the callback function called by blk_mq_queue_tag_busy_iter()
> may sleep calling synchronize_rcu() from __blk_mq_update_nr_hw_queues()
> is not sufficient to wait until blk_mq_queue_tag_busy_iter() has
> finished. Instead of making __blk_mq_update_nr_hw_queues() wait until
> q->q_usage_counter == 0 is globally visible, do not iterate over tags
> if the request queue is frozen.>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: James Smart <james.smart@broadcom.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: <stable@vger.kernel.org>
> Fixes: 530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter callbacks") # v4.19.
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq-tag.c | 10 +++++-----
>  block/blk-mq.c     |  5 +----
>  2 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index a4931fc7be8a..89f479634b4d 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -383,14 +383,13 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  
>  	/*
>  	 * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
> -	 * while the queue is frozen. So we can use q_usage_counter to avoid
> -	 * racing with it. __blk_mq_update_nr_hw_queues() uses
> -	 * synchronize_rcu() to ensure this function left the critical section
> -	 * below.
> +	 * while the queue is frozen. Hold q_usage_counter to serialize
> +	 * __blk_mq_update_nr_hw_queues() against this function.
>  	 */
>  	if (!percpu_ref_tryget(&q->q_usage_counter))
>  		return;
> -
> +	if (atomic_read(&q->mq_freeze_depth))
> +		goto out;
>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		struct blk_mq_tags *tags = hctx->tags;
>  
> @@ -405,6 +404,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
>  		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
>  	}
> +out:
>  	blk_queue_exit(q);

If callback function goes to sleep, we would not be able to reach at
blk_queue_exit() to dec q->q_usage_counter.

On the other side of __blk_mq_update_nr_hw_queues(), once blk_mq_freeze_queue()
is passed, __PERCPU_REF_DEAD is set and q->q_usage_counter should always be 0.
Otherwise, blk_mq_freeze_queue_wait() would not move forward.

I think we would even not be able to call the callback (which might sleep) if
blk_mq_freeze_queue() is already passed?

Why we need extra check of q->mq_freeze_depth? Is it an optimization?

Please let me know if my understanding is incorrect.

Thank you very much!

>  }
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 652d0c6d5945..f9fc1536408d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3226,10 +3226,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  
>  	list_for_each_entry(q, &set->tag_list, tag_set_list)
>  		blk_mq_freeze_queue(q);
> -	/*
> -	 * Sync with blk_mq_queue_tag_busy_iter.
> -	 */
> -	synchronize_rcu();
> +
>  	/*
>  	 * Switch IO scheduler to 'none', cleaning up the data associated
>  	 * with the previous scheduler. We will switch back once we are done
> 

Dongli Zhang

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block: Fix a race between tag iteration and hardware queue changes
  2019-04-02 22:32 [PATCH] block: Fix a race between tag iteration and hardware queue changes Bart Van Assche
  2019-04-03  1:02 ` Dongli Zhang
  2019-04-03  1:37 ` Dongli Zhang
@ 2019-04-03  3:26 ` jianchao.wang
  2019-04-03 15:30   ` Bart Van Assche
  2 siblings, 1 reply; 6+ messages in thread
From: jianchao.wang @ 2019-04-03  3:26 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Christoph Hellwig,
	Hannes Reinecke, James Smart, Ming Lei, Keith Busch,
	Dongli Zhang, stable

Hi Bart

On 4/3/19 6:32 AM, Bart Van Assche wrote:
> Since the callback function called by blk_mq_queue_tag_busy_iter()
> may sleep calling synchronize_rcu() from __blk_mq_update_nr_hw_queues()
> is not sufficient to wait until blk_mq_queue_tag_busy_iter() has
> finished. Instead of making __blk_mq_update_nr_hw_queues() wait until
> q->q_usage_counter == 0 is globally visible, do not iterate over tags
> if the request queue is frozen.
> 
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: James Smart <james.smart@broadcom.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
> Cc: Keith Busch <keith.busch@intel.com>
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> Cc: <stable@vger.kernel.org>
> Fixes: 530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter callbacks") # v4.19.
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq-tag.c | 10 +++++-----
>  block/blk-mq.c     |  5 +----
>  2 files changed, 6 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index a4931fc7be8a..89f479634b4d 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -383,14 +383,13 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  
>  	/*
>  	 * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
> -	 * while the queue is frozen. So we can use q_usage_counter to avoid
> -	 * racing with it. __blk_mq_update_nr_hw_queues() uses
> -	 * synchronize_rcu() to ensure this function left the critical section
> -	 * below.
> +	 * while the queue is frozen. Hold q_usage_counter to serialize
> +	 * __blk_mq_update_nr_hw_queues() against this function.
>  	 */
>  	if (!percpu_ref_tryget(&q->q_usage_counter))
>  		return;
> -
> +	if (atomic_read(&q->mq_freeze_depth))
> +		goto out;

percpu_ref_tryget should be enough here.

If the refcount is not zero, get it and the process of updating nr_hw_queues will wait, and we will
be safe.

If the refcount has been zeroed, it says that there is no any requests and blk_mq_queue_tag_busy_iter
needn't to do anything, so quit is OK.

Add a atomic_read(&q->mq_freeze_depth) here is unnecessary and may cause deadlock.

For example, the recovery process want to freeze and drain the queue, but the device has been dead.
We have to depend on timeout work to finish the in-flight requests with blk_mq_queue_tag_busy_iter.
But now, it cannot do nothing because this mq_freeze_depth checking.

The synchronize_rcu is indeed unnecessary after we introduce percpu_ref_tryget here.

Thanks
Jianchao


>  	queue_for_each_hw_ctx(q, hctx, i) {
>  		struct blk_mq_tags *tags = hctx->tags;
>  
> @@ -405,6 +404,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>  			bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
>  		bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
>  	}
> +out:
>  	blk_queue_exit(q);
>  }
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 652d0c6d5945..f9fc1536408d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3226,10 +3226,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>  
>  	list_for_each_entry(q, &set->tag_list, tag_set_list)
>  		blk_mq_freeze_queue(q);
> -	/*
> -	 * Sync with blk_mq_queue_tag_busy_iter.
> -	 */
> -	synchronize_rcu();
> +
>  	/*
>  	 * Switch IO scheduler to 'none', cleaning up the data associated
>  	 * with the previous scheduler. We will switch back once we are done
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] block: Fix a race between tag iteration and hardware queue changes
  2019-04-03  3:26 ` jianchao.wang
@ 2019-04-03 15:30   ` Bart Van Assche
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2019-04-03 15:30 UTC (permalink / raw)
  To: jianchao.wang, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Christoph Hellwig,
	Hannes Reinecke, James Smart, Ming Lei, Keith Busch,
	Dongli Zhang, stable

On Wed, 2019-04-03 at 11:26 +0800, jianchao.wang wrote:
> percpu_ref_tryget should be enough here.

Having thought further about this, I think you are right. I will drop this
patch.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-04-03 15:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-02 22:32 [PATCH] block: Fix a race between tag iteration and hardware queue changes Bart Van Assche
2019-04-03  1:02 ` Dongli Zhang
2019-04-03  1:05   ` Bart Van Assche
2019-04-03  1:37 ` Dongli Zhang
2019-04-03  3:26 ` jianchao.wang
2019-04-03 15:30   ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.