linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* tear down file system I/O in del_gendisk
@ 2021-09-20 11:24 Christoph Hellwig
  2021-09-20 11:24 ` [PATCH 1/4] block: factor out a blk_try_enter_queue helper Christoph Hellwig
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Christoph Hellwig @ 2021-09-20 11:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

Ming reported that for SCSI we have a lifetime problem now that
the BDI moved from the request_queue to the disk as del_gendisk
doesn't finish all outstanding file system I/O.  It turns out
this actually is an older problem, although the case where it could
be hit before was very unusual (unbinding of a SCSI upper driver
while the scsi_device stays around).  This series fixes this by
draining all I/O in del_gendisk.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/4] block: factor out a blk_try_enter_queue helper
  2021-09-20 11:24 tear down file system I/O in del_gendisk Christoph Hellwig
@ 2021-09-20 11:24 ` Christoph Hellwig
  2021-09-21  3:17   ` Bart Van Assche
  2021-09-20 11:24 ` [PATCH 2/4] block: split bio_queue_enter from blk_queue_enter Christoph Hellwig
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2021-09-20 11:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

Factor out the code to try to get q_usage_counter without blocking into
a separate helper.  Both to improve code readability and to prepare for
splitting bio_queue_enter from blk_queue_enter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c | 60 ++++++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5454db2fa263b..4c078681f39b8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -416,6 +416,30 @@ void blk_cleanup_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_cleanup_queue);
 
+static bool blk_try_enter_queue(struct request_queue *q, bool pm)
+{
+	rcu_read_lock();
+	if (!percpu_ref_tryget_live(&q->q_usage_counter))
+		goto fail;
+
+	/*
+	 * The code that increments the pm_only counter must ensure that the
+	 * counter is globally visible before the queue is unfrozen.
+	 */
+	if (blk_queue_pm_only(q) &&
+	    (!pm || queue_rpm_status(q) == RPM_SUSPENDED))
+		goto fail_put;
+
+	rcu_read_unlock();
+	return true;
+
+fail_put:
+	percpu_ref_put(&q->q_usage_counter);
+fail:
+	rcu_read_unlock();
+	return false;
+}
+
 /**
  * blk_queue_enter() - try to increase q->q_usage_counter
  * @q: request queue pointer
@@ -425,40 +449,18 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
 	const bool pm = flags & BLK_MQ_REQ_PM;
 
-	while (true) {
-		bool success = false;
-
-		rcu_read_lock();
-		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
-			/*
-			 * The code that increments the pm_only counter is
-			 * responsible for ensuring that that counter is
-			 * globally visible before the queue is unfrozen.
-			 */
-			if ((pm && queue_rpm_status(q) != RPM_SUSPENDED) ||
-			    !blk_queue_pm_only(q)) {
-				success = true;
-			} else {
-				percpu_ref_put(&q->q_usage_counter);
-			}
-		}
-		rcu_read_unlock();
-
-		if (success)
-			return 0;
-
+	while (!blk_try_enter_queue(q, pm)) {
 		if (flags & BLK_MQ_REQ_NOWAIT)
 			return -EBUSY;
 
 		/*
-		 * read pair of barrier in blk_freeze_queue_start(),
-		 * we need to order reading __PERCPU_REF_DEAD flag of
-		 * .q_usage_counter and reading .mq_freeze_depth or
-		 * queue dying flag, otherwise the following wait may
-		 * never return if the two reads are reordered.
+		 * read pair of barrier in blk_freeze_queue_start(), we need to
+		 * order reading __PERCPU_REF_DEAD flag of .q_usage_counter and
+		 * reading .mq_freeze_depth or queue dying flag, otherwise the
+		 * following wait may never return if the two reads are
+		 * reordered.
 		 */
 		smp_rmb();
-
 		wait_event(q->mq_freeze_wq,
 			   (!q->mq_freeze_depth &&
 			    blk_pm_resume_queue(pm, q)) ||
@@ -466,6 +468,8 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		if (blk_queue_dying(q))
 			return -ENODEV;
 	}
+
+	return 0;
 }
 
 static inline int bio_queue_enter(struct bio *bio)
-- 
2.30.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 2/4] block: split bio_queue_enter from blk_queue_enter
  2021-09-20 11:24 tear down file system I/O in del_gendisk Christoph Hellwig
  2021-09-20 11:24 ` [PATCH 1/4] block: factor out a blk_try_enter_queue helper Christoph Hellwig
@ 2021-09-20 11:24 ` Christoph Hellwig
  2021-09-21  3:25   ` Bart Van Assche
  2021-09-20 11:24 ` [PATCH 3/4] block: drain file system I/O on del_gendisk Christoph Hellwig
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2021-09-20 11:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

To prepare for fixing a gendisk shutdown race, open code the
blk_queue_enter logic in bio_queue_enter.  This also remove the
pointless flags translation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 4c078681f39b8..be7cd1819b605 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -475,18 +475,32 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 static inline int bio_queue_enter(struct bio *bio)
 {
 	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
-	bool nowait = bio->bi_opf & REQ_NOWAIT;
-	int ret;
 
-	ret = blk_queue_enter(q, nowait ? BLK_MQ_REQ_NOWAIT : 0);
-	if (unlikely(ret)) {
-		if (nowait && !blk_queue_dying(q))
+	while (!blk_try_enter_queue(q, false)) {
+		if (bio->bi_opf & REQ_NOWAIT) {
 			bio_wouldblock_error(bio);
-		else
+			return -EBUSY;
+		}
+
+		/*
+		 * read pair of barrier in blk_freeze_queue_start(), we need to
+		 * order reading __PERCPU_REF_DEAD flag of .q_usage_counter and
+		 * reading .mq_freeze_depth or queue dying flag, otherwise the
+		 * following wait may never return if the two reads are
+		 * reordered.
+		 */
+		smp_rmb();
+		wait_event(q->mq_freeze_wq,
+			   (!q->mq_freeze_depth &&
+			    blk_pm_resume_queue(false, q)) ||
+			   blk_queue_dying(q));
+		if (blk_queue_dying(q)) {
 			bio_io_error(bio);
+			return -ENODEV;
+		}
 	}
 
-	return ret;
+	return 0;
 }
 
 void blk_queue_exit(struct request_queue *q)
-- 
2.30.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/4] block: drain file system I/O on del_gendisk
  2021-09-20 11:24 tear down file system I/O in del_gendisk Christoph Hellwig
  2021-09-20 11:24 ` [PATCH 1/4] block: factor out a blk_try_enter_queue helper Christoph Hellwig
  2021-09-20 11:24 ` [PATCH 2/4] block: split bio_queue_enter from blk_queue_enter Christoph Hellwig
@ 2021-09-20 11:24 ` Christoph Hellwig
  2021-09-20 11:24 ` [PATCH 4/4] block: keep q_usage_counter in atomic mode after del_gendisk Christoph Hellwig
  2021-09-21  3:38 ` tear down file system I/O in del_gendisk Bart Van Assche
  4 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2021-09-20 11:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

Instead of delaying draining of file system I/O related items like the
blk-qos queues, the integrity read workqueue and timeouts only when the
request_queue is removed, do that when del_gendisk is called.  This is
important for SCSI where the upper level drivers that control the gendisk
are separate entities, and the disk can be freed much earlier than the
request_queue, or can even be unbound without tearing down the queue.

Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c      | 25 +++++++++++--------------
 block/blk.h           |  1 +
 block/genhd.c         | 21 +++++++++++++++++++++
 include/linux/genhd.h |  1 +
 4 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index be7cd1819b605..0b03f050a95cd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -49,7 +49,6 @@
 #include "blk-mq.h"
 #include "blk-mq-sched.h"
 #include "blk-pm.h"
-#include "blk-rq-qos.h"
 
 struct dentry *blk_debugfs_root;
 
@@ -337,23 +336,25 @@ void blk_put_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_put_queue);
 
-void blk_set_queue_dying(struct request_queue *q)
+void blk_queue_start_drain(struct request_queue *q)
 {
-	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
-
 	/*
 	 * When queue DYING flag is set, we need to block new req
 	 * entering queue, so we call blk_freeze_queue_start() to
 	 * prevent I/O from crossing blk_queue_enter().
 	 */
 	blk_freeze_queue_start(q);
-
 	if (queue_is_mq(q))
 		blk_mq_wake_waiters(q);
-
 	/* Make blk_queue_enter() reexamine the DYING flag. */
 	wake_up_all(&q->mq_freeze_wq);
 }
+
+void blk_set_queue_dying(struct request_queue *q)
+{
+	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
+	blk_queue_start_drain(q);
+}
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
 /**
@@ -385,13 +386,8 @@ void blk_cleanup_queue(struct request_queue *q)
 	 */
 	blk_freeze_queue(q);
 
-	rq_qos_exit(q);
-
 	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
 
-	/* for synchronous bio-based driver finish in-flight integrity i/o */
-	blk_flush_integrity();
-
 	blk_sync_queue(q);
 	if (queue_is_mq(q))
 		blk_mq_exit_queue(q);
@@ -474,7 +470,8 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 static inline int bio_queue_enter(struct bio *bio)
 {
-	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+	struct gendisk *disk = bio->bi_bdev->bd_disk;
+	struct request_queue *q = disk->queue;
 
 	while (!blk_try_enter_queue(q, false)) {
 		if (bio->bi_opf & REQ_NOWAIT) {
@@ -493,8 +490,8 @@ static inline int bio_queue_enter(struct bio *bio)
 		wait_event(q->mq_freeze_wq,
 			   (!q->mq_freeze_depth &&
 			    blk_pm_resume_queue(false, q)) ||
-			   blk_queue_dying(q));
-		if (blk_queue_dying(q)) {
+			   test_bit(GD_DEAD, &disk->state));
+		if (test_bit(GD_DEAD, &disk->state)) {
 			bio_io_error(bio);
 			return -ENODEV;
 		}
diff --git a/block/blk.h b/block/blk.h
index 7d2a0ba7ed21d..e2ed2257709ae 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -51,6 +51,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 void blk_free_flush_queue(struct blk_flush_queue *q);
 
 void blk_freeze_queue(struct request_queue *q);
+void blk_queue_start_drain(struct request_queue *q);
 
 #define BIO_INLINE_VECS 4
 struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
diff --git a/block/genhd.c b/block/genhd.c
index 7b6e5e1cf9564..b3c33495d7208 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -26,6 +26,7 @@
 #include <linux/badblocks.h>
 
 #include "blk.h"
+#include "blk-rq-qos.h"
 
 static struct kobject *block_depr;
 
@@ -559,6 +560,8 @@ EXPORT_SYMBOL(device_add_disk);
  */
 void del_gendisk(struct gendisk *disk)
 {
+	struct request_queue *q = disk->queue;
+
 	might_sleep();
 
 	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
@@ -575,8 +578,26 @@ void del_gendisk(struct gendisk *disk)
 	fsync_bdev(disk->part0);
 	__invalidate_device(disk->part0, true);
 
+	/*
+	 * Fail any new I/O.
+	 */
+	set_bit(GD_DEAD, &disk->state);
 	set_capacity(disk, 0);
 
+	/*
+	 * Prevent new I/O from crossing bio_queue_enter().
+	 */
+	blk_queue_start_drain(q);
+	blk_mq_freeze_queue_wait(q);
+
+	rq_qos_exit(q);
+	blk_sync_queue(q);
+	blk_flush_integrity();
+	/*
+	 * Allow using passthrough request again after the queue is torn down.
+	 */
+	blk_mq_unfreeze_queue(q);
+
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
 
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index c68d83c87f83f..0f5315c2b5a34 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -149,6 +149,7 @@ struct gendisk {
 	unsigned long state;
 #define GD_NEED_PART_SCAN		0
 #define GD_READ_ONLY			1
+#define GD_DEAD				2
 
 	struct mutex open_mutex;	/* open/close mutex */
 	unsigned open_partitions;	/* number of open partitions */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 4/4] block: keep q_usage_counter in atomic mode after del_gendisk
  2021-09-20 11:24 tear down file system I/O in del_gendisk Christoph Hellwig
                   ` (2 preceding siblings ...)
  2021-09-20 11:24 ` [PATCH 3/4] block: drain file system I/O on del_gendisk Christoph Hellwig
@ 2021-09-20 11:24 ` Christoph Hellwig
  2021-09-21  3:29   ` Bart Van Assche
  2021-09-21  3:38 ` tear down file system I/O in del_gendisk Bart Van Assche
  4 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2021-09-20 11:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

Don't switch back to percpu mode to avoid the double RCU grace period
when tearing down SCSI devices.  After removing the disk only passthrough
commands can be send anyway.

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c | 9 ++++++++-
 block/blk.h    | 1 +
 block/genhd.c  | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 108a352051be5..bc026372de439 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -188,9 +188,11 @@ void blk_mq_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
 
-void blk_mq_unfreeze_queue(struct request_queue *q)
+void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic)
 {
 	mutex_lock(&q->mq_freeze_lock);
+	if (force_atomic)
+		q->q_usage_counter.data->force_atomic = true;
 	q->mq_freeze_depth--;
 	WARN_ON_ONCE(q->mq_freeze_depth < 0);
 	if (!q->mq_freeze_depth) {
@@ -199,6 +201,11 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 	}
 	mutex_unlock(&q->mq_freeze_lock);
 }
+
+void blk_mq_unfreeze_queue(struct request_queue *q)
+{
+	__blk_mq_unfreeze_queue(q, false);
+}
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
 /*
diff --git a/block/blk.h b/block/blk.h
index e2ed2257709ae..6c3c00a8fe19d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -51,6 +51,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 void blk_free_flush_queue(struct blk_flush_queue *q);
 
 void blk_freeze_queue(struct request_queue *q);
+void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic);
 void blk_queue_start_drain(struct request_queue *q);
 
 #define BIO_INLINE_VECS 4
diff --git a/block/genhd.c b/block/genhd.c
index b3c33495d7208..fe23085ddddd6 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -596,6 +596,7 @@ void del_gendisk(struct gendisk *disk)
 	/*
 	 * Allow using passthrough request again after the queue is torn down.
 	 */
+	blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
 	blk_mq_unfreeze_queue(q);
 
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
-- 
2.30.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/4] block: factor out a blk_try_enter_queue helper
  2021-09-20 11:24 ` [PATCH 1/4] block: factor out a blk_try_enter_queue helper Christoph Hellwig
@ 2021-09-21  3:17   ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2021-09-21  3:17 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

On 9/20/21 04:24, Christoph Hellwig wrote:
>   		smp_rmb();
> -
>   		wait_event(q->mq_freeze_wq,
>   			   (!q->mq_freeze_depth &&
>   			    blk_pm_resume_queue(pm, q)) ||

I prefer to keep this blank line. Otherwise this patch looks good to me.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/4] block: split bio_queue_enter from blk_queue_enter
  2021-09-20 11:24 ` [PATCH 2/4] block: split bio_queue_enter from blk_queue_enter Christoph Hellwig
@ 2021-09-21  3:25   ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2021-09-21  3:25 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

On 9/20/21 04:24, Christoph Hellwig wrote:
> To prepare for fixing a gendisk shutdown race, open code the
> blk_queue_enter logic in bio_queue_enter.  This also remove the
                                                        ^^^^^^
                                                        removes
> pointless flags translation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-core.c | 28 +++++++++++++++++++++-------
>   1 file changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 4c078681f39b8..be7cd1819b605 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -475,18 +475,32 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>   static inline int bio_queue_enter(struct bio *bio)
>   {
>   	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> -	bool nowait = bio->bi_opf & REQ_NOWAIT;
> -	int ret;
>   
> -	ret = blk_queue_enter(q, nowait ? BLK_MQ_REQ_NOWAIT : 0);
> -	if (unlikely(ret)) {
> -		if (nowait && !blk_queue_dying(q))
> +	while (!blk_try_enter_queue(q, false)) {
> +		if (bio->bi_opf & REQ_NOWAIT) {
>   			bio_wouldblock_error(bio);
> -		else
> +			return -EBUSY;
> +		}
> +
> +		/*
> +		 * read pair of barrier in blk_freeze_queue_start(), we need to
> +		 * order reading __PERCPU_REF_DEAD flag of .q_usage_counter and
> +		 * reading .mq_freeze_depth or queue dying flag, otherwise the
> +		 * following wait may never return if the two reads are
> +		 * reordered.
> +		 */
> +		smp_rmb();
> +		wait_event(q->mq_freeze_wq,
> +			   (!q->mq_freeze_depth &&
> +			    blk_pm_resume_queue(false, q)) ||
> +			   blk_queue_dying(q));
> +		if (blk_queue_dying(q)) {
>   			bio_io_error(bio);
> +			return -ENODEV;
> +		}
>   	}
>   
> -	return ret;
> +	return 0;

This patch changes the behavior of bio_queue_enter(). I think that
preserving the existing behavior implies testing blk_queue_dying(q)
before checking the REQ_NOWAIT flag.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] block: keep q_usage_counter in atomic mode after del_gendisk
  2021-09-20 11:24 ` [PATCH 4/4] block: keep q_usage_counter in atomic mode after del_gendisk Christoph Hellwig
@ 2021-09-21  3:29   ` Bart Van Assche
  2021-09-21  9:07     ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2021-09-21  3:29 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

On 9/20/21 04:24, Christoph Hellwig wrote:
> Don't switch back to percpu mode to avoid the double RCU grace period
> when tearing down SCSI devices.  After removing the disk only passthrough
> commands can be send anyway.
                   ^^^^
                   sent?

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 108a352051be5..bc026372de439 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -188,9 +188,11 @@ void blk_mq_freeze_queue(struct request_queue *q)
>   }
>   EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
>   
> -void blk_mq_unfreeze_queue(struct request_queue *q)
> +void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic)
>   {
>   	mutex_lock(&q->mq_freeze_lock);
> +	if (force_atomic)
> +		q->q_usage_counter.data->force_atomic = true;
>   	q->mq_freeze_depth--;
>   	WARN_ON_ONCE(q->mq_freeze_depth < 0);
>   	if (!q->mq_freeze_depth) {
> @@ -199,6 +201,11 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>   	}
>   	mutex_unlock(&q->mq_freeze_lock);
>   }
> +
> +void blk_mq_unfreeze_queue(struct request_queue *q)
> +{
> +	__blk_mq_unfreeze_queue(q, false);
> +}
>   EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
>   
>   /*
> diff --git a/block/blk.h b/block/blk.h
> index e2ed2257709ae..6c3c00a8fe19d 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -51,6 +51,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
>   void blk_free_flush_queue(struct blk_flush_queue *q);
>   
>   void blk_freeze_queue(struct request_queue *q);
> +void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic);
>   void blk_queue_start_drain(struct request_queue *q);
>   
>   #define BIO_INLINE_VECS 4
> diff --git a/block/genhd.c b/block/genhd.c
> index b3c33495d7208..fe23085ddddd6 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -596,6 +596,7 @@ void del_gendisk(struct gendisk *disk)
>   	/*
>   	 * Allow using passthrough request again after the queue is torn down.
>   	 */
> +	blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
>   	blk_mq_unfreeze_queue(q);
>   
>   	if (!(disk->flags & GENHD_FL_HIDDEN)) {

I don't see any code that passes 'true' as second argument to
__blk_mq_unfreeze_queue()? Did I perhaps overlook something?

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: tear down file system I/O in del_gendisk
  2021-09-20 11:24 tear down file system I/O in del_gendisk Christoph Hellwig
                   ` (3 preceding siblings ...)
  2021-09-20 11:24 ` [PATCH 4/4] block: keep q_usage_counter in atomic mode after del_gendisk Christoph Hellwig
@ 2021-09-21  3:38 ` Bart Van Assche
  2021-09-21  9:08   ` Christoph Hellwig
  4 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2021-09-21  3:38 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

On 9/20/21 04:24, Christoph Hellwig wrote:
> Ming reported that for SCSI we have a lifetime problem now that
> the BDI moved from the request_queue to the disk as del_gendisk
> doesn't finish all outstanding file system I/O.  It turns out
> this actually is an older problem, although the case where it could
> be hit before was very unusual (unbinding of a SCSI upper driver
> while the scsi_device stays around).  This series fixes this by
> draining all I/O in del_gendisk.

Several failures are reported when running blktests against Jens' for-next
branch if KASAN and lockdep are enabled. Is this patch series sufficient
to make blktests pass again?

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/4] block: keep q_usage_counter in atomic mode after del_gendisk
  2021-09-21  3:29   ` Bart Van Assche
@ 2021-09-21  9:07     ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2021-09-21  9:07 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Tejun Heo, linux-block, Ming Lei

On Mon, Sep 20, 2021 at 08:29:49PM -0700, Bart Van Assche wrote:
>>   	blk_mq_unfreeze_queue(q);
>>     	if (!(disk->flags & GENHD_FL_HIDDEN)) {
>
> I don't see any code that passes 'true' as second argument to
> __blk_mq_unfreeze_queue()? Did I perhaps overlook something?

No, I did not finish actually implementing the suggestion from
Ming properly, sorry.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: tear down file system I/O in del_gendisk
  2021-09-21  3:38 ` tear down file system I/O in del_gendisk Bart Van Assche
@ 2021-09-21  9:08   ` Christoph Hellwig
  2021-09-21 14:24     ` Bart Van Assche
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2021-09-21  9:08 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Tejun Heo, linux-block, Ming Lei

On Mon, Sep 20, 2021 at 08:38:31PM -0700, Bart Van Assche wrote:
> On 9/20/21 04:24, Christoph Hellwig wrote:
>> Ming reported that for SCSI we have a lifetime problem now that
>> the BDI moved from the request_queue to the disk as del_gendisk
>> doesn't finish all outstanding file system I/O.  It turns out
>> this actually is an older problem, although the case where it could
>> be hit before was very unusual (unbinding of a SCSI upper driver
>> while the scsi_device stays around).  This series fixes this by
>> draining all I/O in del_gendisk.
>
> Several failures are reported when running blktests against Jens' for-next
> branch if KASAN and lockdep are enabled. Is this patch series sufficient
> to make blktests pass again?

I don't see any new failures (I have a few consistent ones due to the
fact that blktests is completly fucked up and wants to load modules
everywhere which doesn't exactly work with builtin drivers).  Care
to post your issues?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: tear down file system I/O in del_gendisk
  2021-09-21  9:08   ` Christoph Hellwig
@ 2021-09-21 14:24     ` Bart Van Assche
  2021-09-21 22:33       ` Bart Van Assche
  0 siblings, 1 reply; 20+ messages in thread
From: Bart Van Assche @ 2021-09-21 14:24 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Tejun Heo, linux-block, Ming Lei

On 9/21/21 2:08 AM, Christoph Hellwig wrote:
> On Mon, Sep 20, 2021 at 08:38:31PM -0700, Bart Van Assche wrote:
>> On 9/20/21 04:24, Christoph Hellwig wrote:
>>> Ming reported that for SCSI we have a lifetime problem now that
>>> the BDI moved from the request_queue to the disk as del_gendisk
>>> doesn't finish all outstanding file system I/O.  It turns out
>>> this actually is an older problem, although the case where it could
>>> be hit before was very unusual (unbinding of a SCSI upper driver
>>> while the scsi_device stays around).  This series fixes this by
>>> draining all I/O in del_gendisk.
>>
>> Several failures are reported when running blktests against Jens' for-next
>> branch if KASAN and lockdep are enabled. Is this patch series sufficient
>> to make blktests pass again?
> 
> I don't see any new failures (I have a few consistent ones due to the
> fact that blktests is completly fucked up and wants to load modules
> everywhere which doesn't exactly work with builtin drivers).  Care
> to post your issues?

This is the first complaint I run into with KASAN enabled (this failure prevents
running more tests) for Jens' for-next branch merged with Linus' master branch:

root[4270]: run blktests block/010
null_blk: module loaded
==================================================================
BUG: KASAN: null-ptr-deref in null_map_queues+0x131/0x1a0 [null_blk]
Read of size 8 at addr 0000000000000000 by task modprobe/4320

CPU: 9 PID: 4320 Comm: modprobe Tainted: G            E     5.15.0-rc2-dbg+ #2
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
  show_stack+0x52/0x58
  dump_stack_lvl+0x49/0x5e
  kasan_report.cold+0x64/0xdb
  __asan_load8+0x69/0x90
  null_map_queues+0x131/0x1a0 [null_blk]
  blk_mq_update_queue_map+0x122/0x1a0
  blk_mq_alloc_tag_set+0x1e8/0x570
  null_init_tag_set+0x197/0x220 [null_blk]
  null_init+0x1dc/0x1000 [null_blk]
  do_one_initcall+0xc7/0x440
  do_init_module+0x10a/0x3d0
  load_module+0x115c/0x1220
  __do_sys_finit_module+0x124/0x1a0
  __x64_sys_finit_module+0x42/0x50
  do_syscall_64+0x35/0xb0
  entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fa873062d6d
Code: c9 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 80 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007fff2aa058b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa873062d6d
RDX: 0000000000000000 RSI: 00005613ba37f750 RDI: 0000000000000003
RBP: 00005613ba37f540 R08: 0000000000000000 R09: 0000000000000050
R10: 0000000000000003 R11: 0000000000000246 R12: 00005613ba37f750
R13: 00005613ba37f7a0 R14: 00005613ba37f750 R15: 00005613ba37f460
==================================================================

This regression may have been introduced by commit 5f7acddf706c ("null_blk: poll
queue support").

Thanks,

Bart.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: tear down file system I/O in del_gendisk
  2021-09-21 14:24     ` Bart Van Assche
@ 2021-09-21 22:33       ` Bart Van Assche
  0 siblings, 0 replies; 20+ messages in thread
From: Bart Van Assche @ 2021-09-21 22:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Tejun Heo, linux-block, Ming Lei

On 9/21/21 7:24 AM, Bart Van Assche wrote:
> This regression may have been introduced by commit 5f7acddf706c ("null_blk: poll
> queue support").

(replying to my own email)

All the blktests tests that I ran pass if make the following changes:
- Revert the commit mentioned above.
- Use the soft-iWARP driver instead of the soft-RoCE driver (export use_siw=1).

Bart.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] block: drain file system I/O on del_gendisk
  2021-09-27 12:04         ` Christoph Hellwig
@ 2021-09-27 14:32           ` Ming Lei
  0 siblings, 0 replies; 20+ messages in thread
From: Ming Lei @ 2021-09-27 14:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Tejun Heo, linux-block

On Mon, Sep 27, 2021 at 02:04:41PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 23, 2021 at 02:39:59PM +0800, Ming Lei wrote:
> > > > After blk_mq_unfreeze_queue() returns, blk_try_enter_queue() will return
> > > > true, so new FS I/O from opened bdev still won't be blocked, right?
> > > 
> > > It won't be blocked by blk_mq_unfreeze_queue, but because the capacity
> > > is set to 0 it still won't make it to the driver.
> > 
> > One bio could be made & checked before set_capacity(0), then is
> > submitted after blk_mq_unfreeze_queue() returns.
> > 
> > blk_mq_freeze_queue_wait() doesn't always imply RCU grace period, for
> > example, the .q_usage_counter may have been in atomic mode before
> > calling blk_queue_start_drain() & blk_mq_freeze_queue_wait().
> 
> True, but this isn't really a new issue as we never had proper quiesce
> behavior.  I have a bunch of ideas of how to make this actually solid,
> but none of them looks like 5.15 material.

It is new issue since edb0872f44ec ("block: move the bdi from the
request_queue to the gendisk").

Previously bdi has longer lifetime than request queue, but now it becomes
not, then either queue_to_disk() or queue->disk may cause panic.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] block: drain file system I/O on del_gendisk
  2021-09-23  6:39       ` Ming Lei
@ 2021-09-27 12:04         ` Christoph Hellwig
  2021-09-27 14:32           ` Ming Lei
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2021-09-27 12:04 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, Tejun Heo, linux-block

On Thu, Sep 23, 2021 at 02:39:59PM +0800, Ming Lei wrote:
> > > After blk_mq_unfreeze_queue() returns, blk_try_enter_queue() will return
> > > true, so new FS I/O from opened bdev still won't be blocked, right?
> > 
> > It won't be blocked by blk_mq_unfreeze_queue, but because the capacity
> > is set to 0 it still won't make it to the driver.
> 
> One bio could be made & checked before set_capacity(0), then is
> submitted after blk_mq_unfreeze_queue() returns.
> 
> blk_mq_freeze_queue_wait() doesn't always imply RCU grace period, for
> example, the .q_usage_counter may have been in atomic mode before
> calling blk_queue_start_drain() & blk_mq_freeze_queue_wait().

True, but this isn't really a new issue as we never had proper quiesce
behavior.  I have a bunch of ideas of how to make this actually solid,
but none of them looks like 5.15 material.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] block: drain file system I/O on del_gendisk
  2021-09-23  5:27     ` Christoph Hellwig
@ 2021-09-23  6:39       ` Ming Lei
  2021-09-27 12:04         ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Ming Lei @ 2021-09-23  6:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Tejun Heo, linux-block

On Thu, Sep 23, 2021 at 07:27:05AM +0200, Christoph Hellwig wrote:
> On Thu, Sep 23, 2021 at 09:34:03AM +0800, Ming Lei wrote:
> > > +	set_bit(GD_DEAD, &disk->state);
> > >  	set_capacity(disk, 0);
> > >  
> > > +	/*
> > > +	 * Prevent new I/O from crossing bio_queue_enter().
> > > +	 */
> > > +	blk_queue_start_drain(q);
> > > +	blk_mq_freeze_queue_wait(q);
> > > +
> > > +	rq_qos_exit(q);
> > > +	blk_sync_queue(q);
> > > +	blk_flush_integrity();
> > > +	/*
> > > +	 * Allow using passthrough request again after the queue is torn down.
> > > +	 */
> > > +	blk_mq_unfreeze_queue(q);
> > 
> > After blk_mq_unfreeze_queue() returns, blk_try_enter_queue() will return
> > true, so new FS I/O from opened bdev still won't be blocked, right?
> 
> It won't be blocked by blk_mq_unfreeze_queue, but because the capacity
> is set to 0 it still won't make it to the driver.

One bio could be made & checked before set_capacity(0), then is
submitted after blk_mq_unfreeze_queue() returns.

blk_mq_freeze_queue_wait() doesn't always imply RCU grace period, for
example, the .q_usage_counter may have been in atomic mode before
calling blk_queue_start_drain() & blk_mq_freeze_queue_wait().


Thanks,
Ming


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] block: drain file system I/O on del_gendisk
  2021-09-22 17:22 ` [PATCH 3/4] block: drain file system I/O on del_gendisk Christoph Hellwig
  2021-09-23  1:34   ` Ming Lei
@ 2021-09-23  6:21   ` Hannes Reinecke
  1 sibling, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2021-09-23  6:21 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

On 9/22/21 7:22 PM, Christoph Hellwig wrote:
> Instead of delaying draining of file system I/O related items like the
> blk-qos queues, the integrity read workqueue and timeouts only when the
> request_queue is removed, do that when del_gendisk is called.  This is
> important for SCSI where the upper level drivers that control the gendisk
> are separate entities, and the disk can be freed much earlier than the
> request_queue, or can even be unbound without tearing down the queue.
> 
> Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
> Reported-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-core.c      | 27 ++++++++++++---------------
>   block/blk.h           |  1 +
>   block/genhd.c         | 21 +++++++++++++++++++++
>   include/linux/genhd.h |  1 +
>   4 files changed, 35 insertions(+), 15 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index e951378855a02..d150d829a53c4 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -49,7 +49,6 @@
>   #include "blk-mq.h"
>   #include "blk-mq-sched.h"
>   #include "blk-pm.h"
> -#include "blk-rq-qos.h"
>   
>   struct dentry *blk_debugfs_root;
>   
> @@ -337,23 +336,25 @@ void blk_put_queue(struct request_queue *q)
>   }
>   EXPORT_SYMBOL(blk_put_queue);
>   
> -void blk_set_queue_dying(struct request_queue *q)
> +void blk_queue_start_drain(struct request_queue *q)
>   {
> -	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
> -
>   	/*
>   	 * When queue DYING flag is set, we need to block new req
>   	 * entering queue, so we call blk_freeze_queue_start() to
>   	 * prevent I/O from crossing blk_queue_enter().
>   	 */
>   	blk_freeze_queue_start(q);
> -
>   	if (queue_is_mq(q))
>   		blk_mq_wake_waiters(q);
> -
>   	/* Make blk_queue_enter() reexamine the DYING flag. */
>   	wake_up_all(&q->mq_freeze_wq);
>   }
> +
> +void blk_set_queue_dying(struct request_queue *q)
> +{
> +	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
> +	blk_queue_start_drain(q);
> +}
>   EXPORT_SYMBOL_GPL(blk_set_queue_dying);
>   
>   /**
> @@ -385,13 +386,8 @@ void blk_cleanup_queue(struct request_queue *q)
>   	 */
>   	blk_freeze_queue(q);
>   
> -	rq_qos_exit(q);
> -
>   	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>   
> -	/* for synchronous bio-based driver finish in-flight integrity i/o */
> -	blk_flush_integrity();
> -
>   	blk_sync_queue(q);
>   	if (queue_is_mq(q))
>   		blk_mq_exit_queue(q);
> @@ -474,11 +470,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>   
>   static inline int bio_queue_enter(struct bio *bio)
>   {
> -	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> +	struct gendisk *disk = bio->bi_bdev->bd_disk;
> +	struct request_queue *q = disk->queue;
>   
>   	while (!blk_try_enter_queue(q, false)) {
>   		if (bio->bi_opf & REQ_NOWAIT) {
> -			if (blk_queue_dying(q))
> +			if (test_bit(GD_DEAD, &disk->state))
>   				goto dead;
>   			bio_wouldblock_error(bio);
>   			return -EBUSY;
> @@ -495,8 +492,8 @@ static inline int bio_queue_enter(struct bio *bio)
>   		wait_event(q->mq_freeze_wq,
>   			   (!q->mq_freeze_depth &&
>   			    blk_pm_resume_queue(false, q)) ||
> -			   blk_queue_dying(q));
> -		if (blk_queue_dying(q))
> +			   test_bit(GD_DEAD, &disk->state));
> +		if (test_bit(GD_DEAD, &disk->state))
>   			goto dead;
>   	}
>   
> diff --git a/block/blk.h b/block/blk.h
> index 7d2a0ba7ed21d..e2ed2257709ae 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -51,6 +51,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
>   void blk_free_flush_queue(struct blk_flush_queue *q);
>   
>   void blk_freeze_queue(struct request_queue *q);
> +void blk_queue_start_drain(struct request_queue *q);
>   
>   #define BIO_INLINE_VECS 4
>   struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
> diff --git a/block/genhd.c b/block/genhd.c
> index 7b6e5e1cf9564..b3c33495d7208 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -26,6 +26,7 @@
>   #include <linux/badblocks.h>
>   
>   #include "blk.h"
> +#include "blk-rq-qos.h"
>   
>   static struct kobject *block_depr;
>   
> @@ -559,6 +560,8 @@ EXPORT_SYMBOL(device_add_disk);
>    */
>   void del_gendisk(struct gendisk *disk)
>   {
> +	struct request_queue *q = disk->queue;
> +
>   	might_sleep();
>   
>   	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
> @@ -575,8 +578,26 @@ void del_gendisk(struct gendisk *disk)
>   	fsync_bdev(disk->part0);
>   	__invalidate_device(disk->part0, true);
>   
> +	/*
> +	 * Fail any new I/O.
> +	 */
> +	set_bit(GD_DEAD, &disk->state);
>   	set_capacity(disk, 0);
>   

I always tend to become a bit nervous about simple 'set_bit' statements.
On the one side, it _might_ be already set (unlikely, I know).
But more importantly, 'set_bit' implies no memory barrier, so we have an 
inherent race condition.
So don't you need a barrier here to synchronize the 'state' variable 
update to all CPUs?
Or can't we just make it a

if (test_and_set_bit(GD_DEAD, &disk->state))
	WARN()

Hmm?

> +	/*
> +	 * Prevent new I/O from crossing bio_queue_enter().
> +	 */
> +	blk_queue_start_drain(q);
> +	blk_mq_freeze_queue_wait(q);
> +
> +	rq_qos_exit(q);
> +	blk_sync_queue(q);
> +	blk_flush_integrity();
> +	/*
> +	 * Allow using passthrough request again after the queue is torn down.
> +	 */
> +	blk_mq_unfreeze_queue(q);
> +
>   	if (!(disk->flags & GENHD_FL_HIDDEN)) {
>   		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
>   
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> index c68d83c87f83f..0f5315c2b5a34 100644
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -149,6 +149,7 @@ struct gendisk {
>   	unsigned long state;
>   #define GD_NEED_PART_SCAN		0
>   #define GD_READ_ONLY			1
> +#define GD_DEAD				2
>   
>   	struct mutex open_mutex;	/* open/close mutex */
>   	unsigned open_partitions;	/* number of open partitions */
> 

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] block: drain file system I/O on del_gendisk
  2021-09-23  1:34   ` Ming Lei
@ 2021-09-23  5:27     ` Christoph Hellwig
  2021-09-23  6:39       ` Ming Lei
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2021-09-23  5:27 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, Tejun Heo, linux-block

On Thu, Sep 23, 2021 at 09:34:03AM +0800, Ming Lei wrote:
> > +	set_bit(GD_DEAD, &disk->state);
> >  	set_capacity(disk, 0);
> >  
> > +	/*
> > +	 * Prevent new I/O from crossing bio_queue_enter().
> > +	 */
> > +	blk_queue_start_drain(q);
> > +	blk_mq_freeze_queue_wait(q);
> > +
> > +	rq_qos_exit(q);
> > +	blk_sync_queue(q);
> > +	blk_flush_integrity();
> > +	/*
> > +	 * Allow using passthrough request again after the queue is torn down.
> > +	 */
> > +	blk_mq_unfreeze_queue(q);
> 
> After blk_mq_unfreeze_queue() returns, blk_try_enter_queue() will return
> true, so new FS I/O from opened bdev still won't be blocked, right?

It won't be blocked by blk_mq_unfreeze_queue, but because the capacity
is set to 0 it still won't make it to the driver.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/4] block: drain file system I/O on del_gendisk
  2021-09-22 17:22 ` [PATCH 3/4] block: drain file system I/O on del_gendisk Christoph Hellwig
@ 2021-09-23  1:34   ` Ming Lei
  2021-09-23  5:27     ` Christoph Hellwig
  2021-09-23  6:21   ` Hannes Reinecke
  1 sibling, 1 reply; 20+ messages in thread
From: Ming Lei @ 2021-09-23  1:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Tejun Heo, linux-block

On Wed, Sep 22, 2021 at 07:22:21PM +0200, Christoph Hellwig wrote:
> Instead of delaying draining of file system I/O related items like the
> blk-qos queues, the integrity read workqueue and timeouts only when the
> request_queue is removed, do that when del_gendisk is called.  This is
> important for SCSI where the upper level drivers that control the gendisk
> are separate entities, and the disk can be freed much earlier than the
> request_queue, or can even be unbound without tearing down the queue.
> 
> Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
> Reported-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-core.c      | 27 ++++++++++++---------------
>  block/blk.h           |  1 +
>  block/genhd.c         | 21 +++++++++++++++++++++
>  include/linux/genhd.h |  1 +
>  4 files changed, 35 insertions(+), 15 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index e951378855a02..d150d829a53c4 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -49,7 +49,6 @@
>  #include "blk-mq.h"
>  #include "blk-mq-sched.h"
>  #include "blk-pm.h"
> -#include "blk-rq-qos.h"
>  
>  struct dentry *blk_debugfs_root;
>  
> @@ -337,23 +336,25 @@ void blk_put_queue(struct request_queue *q)
>  }
>  EXPORT_SYMBOL(blk_put_queue);
>  
> -void blk_set_queue_dying(struct request_queue *q)
> +void blk_queue_start_drain(struct request_queue *q)
>  {
> -	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
> -
>  	/*
>  	 * When queue DYING flag is set, we need to block new req
>  	 * entering queue, so we call blk_freeze_queue_start() to
>  	 * prevent I/O from crossing blk_queue_enter().
>  	 */
>  	blk_freeze_queue_start(q);
> -
>  	if (queue_is_mq(q))
>  		blk_mq_wake_waiters(q);
> -
>  	/* Make blk_queue_enter() reexamine the DYING flag. */
>  	wake_up_all(&q->mq_freeze_wq);
>  }
> +
> +void blk_set_queue_dying(struct request_queue *q)
> +{
> +	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
> +	blk_queue_start_drain(q);
> +}
>  EXPORT_SYMBOL_GPL(blk_set_queue_dying);
>  
>  /**
> @@ -385,13 +386,8 @@ void blk_cleanup_queue(struct request_queue *q)
>  	 */
>  	blk_freeze_queue(q);
>  
> -	rq_qos_exit(q);
> -
>  	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>  
> -	/* for synchronous bio-based driver finish in-flight integrity i/o */
> -	blk_flush_integrity();
> -
>  	blk_sync_queue(q);
>  	if (queue_is_mq(q))
>  		blk_mq_exit_queue(q);
> @@ -474,11 +470,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>  
>  static inline int bio_queue_enter(struct bio *bio)
>  {
> -	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> +	struct gendisk *disk = bio->bi_bdev->bd_disk;
> +	struct request_queue *q = disk->queue;
>  
>  	while (!blk_try_enter_queue(q, false)) {
>  		if (bio->bi_opf & REQ_NOWAIT) {
> -			if (blk_queue_dying(q))
> +			if (test_bit(GD_DEAD, &disk->state))
>  				goto dead;
>  			bio_wouldblock_error(bio);
>  			return -EBUSY;
> @@ -495,8 +492,8 @@ static inline int bio_queue_enter(struct bio *bio)
>  		wait_event(q->mq_freeze_wq,
>  			   (!q->mq_freeze_depth &&
>  			    blk_pm_resume_queue(false, q)) ||
> -			   blk_queue_dying(q));
> -		if (blk_queue_dying(q))
> +			   test_bit(GD_DEAD, &disk->state));
> +		if (test_bit(GD_DEAD, &disk->state))
>  			goto dead;
>  	}
>  
> diff --git a/block/blk.h b/block/blk.h
> index 7d2a0ba7ed21d..e2ed2257709ae 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -51,6 +51,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
>  void blk_free_flush_queue(struct blk_flush_queue *q);
>  
>  void blk_freeze_queue(struct request_queue *q);
> +void blk_queue_start_drain(struct request_queue *q);
>  
>  #define BIO_INLINE_VECS 4
>  struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
> diff --git a/block/genhd.c b/block/genhd.c
> index 7b6e5e1cf9564..b3c33495d7208 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -26,6 +26,7 @@
>  #include <linux/badblocks.h>
>  
>  #include "blk.h"
> +#include "blk-rq-qos.h"
>  
>  static struct kobject *block_depr;
>  
> @@ -559,6 +560,8 @@ EXPORT_SYMBOL(device_add_disk);
>   */
>  void del_gendisk(struct gendisk *disk)
>  {
> +	struct request_queue *q = disk->queue;
> +
>  	might_sleep();
>  
>  	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
> @@ -575,8 +578,26 @@ void del_gendisk(struct gendisk *disk)
>  	fsync_bdev(disk->part0);
>  	__invalidate_device(disk->part0, true);
>  
> +	/*
> +	 * Fail any new I/O.
> +	 */
> +	set_bit(GD_DEAD, &disk->state);
>  	set_capacity(disk, 0);
>  
> +	/*
> +	 * Prevent new I/O from crossing bio_queue_enter().
> +	 */
> +	blk_queue_start_drain(q);
> +	blk_mq_freeze_queue_wait(q);
> +
> +	rq_qos_exit(q);
> +	blk_sync_queue(q);
> +	blk_flush_integrity();
> +	/*
> +	 * Allow using passthrough request again after the queue is torn down.
> +	 */
> +	blk_mq_unfreeze_queue(q);

After blk_mq_unfreeze_queue() returns, blk_try_enter_queue() will return
true, so new FS I/O from opened bdev still won't be blocked, right?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 3/4] block: drain file system I/O on del_gendisk
  2021-09-22 17:22 tear down file system I/O in del_gendisk v2 Christoph Hellwig
@ 2021-09-22 17:22 ` Christoph Hellwig
  2021-09-23  1:34   ` Ming Lei
  2021-09-23  6:21   ` Hannes Reinecke
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2021-09-22 17:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

Instead of delaying draining of file system I/O related items like the
blk-qos queues, the integrity read workqueue and timeouts only when the
request_queue is removed, do that when del_gendisk is called.  This is
important for SCSI where the upper level drivers that control the gendisk
are separate entities, and the disk can be freed much earlier than the
request_queue, or can even be unbound without tearing down the queue.

Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c      | 27 ++++++++++++---------------
 block/blk.h           |  1 +
 block/genhd.c         | 21 +++++++++++++++++++++
 include/linux/genhd.h |  1 +
 4 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e951378855a02..d150d829a53c4 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -49,7 +49,6 @@
 #include "blk-mq.h"
 #include "blk-mq-sched.h"
 #include "blk-pm.h"
-#include "blk-rq-qos.h"
 
 struct dentry *blk_debugfs_root;
 
@@ -337,23 +336,25 @@ void blk_put_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_put_queue);
 
-void blk_set_queue_dying(struct request_queue *q)
+void blk_queue_start_drain(struct request_queue *q)
 {
-	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
-
 	/*
 	 * When queue DYING flag is set, we need to block new req
 	 * entering queue, so we call blk_freeze_queue_start() to
 	 * prevent I/O from crossing blk_queue_enter().
 	 */
 	blk_freeze_queue_start(q);
-
 	if (queue_is_mq(q))
 		blk_mq_wake_waiters(q);
-
 	/* Make blk_queue_enter() reexamine the DYING flag. */
 	wake_up_all(&q->mq_freeze_wq);
 }
+
+void blk_set_queue_dying(struct request_queue *q)
+{
+	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
+	blk_queue_start_drain(q);
+}
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
 /**
@@ -385,13 +386,8 @@ void blk_cleanup_queue(struct request_queue *q)
 	 */
 	blk_freeze_queue(q);
 
-	rq_qos_exit(q);
-
 	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
 
-	/* for synchronous bio-based driver finish in-flight integrity i/o */
-	blk_flush_integrity();
-
 	blk_sync_queue(q);
 	if (queue_is_mq(q))
 		blk_mq_exit_queue(q);
@@ -474,11 +470,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 static inline int bio_queue_enter(struct bio *bio)
 {
-	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+	struct gendisk *disk = bio->bi_bdev->bd_disk;
+	struct request_queue *q = disk->queue;
 
 	while (!blk_try_enter_queue(q, false)) {
 		if (bio->bi_opf & REQ_NOWAIT) {
-			if (blk_queue_dying(q))
+			if (test_bit(GD_DEAD, &disk->state))
 				goto dead;
 			bio_wouldblock_error(bio);
 			return -EBUSY;
@@ -495,8 +492,8 @@ static inline int bio_queue_enter(struct bio *bio)
 		wait_event(q->mq_freeze_wq,
 			   (!q->mq_freeze_depth &&
 			    blk_pm_resume_queue(false, q)) ||
-			   blk_queue_dying(q));
-		if (blk_queue_dying(q))
+			   test_bit(GD_DEAD, &disk->state));
+		if (test_bit(GD_DEAD, &disk->state))
 			goto dead;
 	}
 
diff --git a/block/blk.h b/block/blk.h
index 7d2a0ba7ed21d..e2ed2257709ae 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -51,6 +51,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 void blk_free_flush_queue(struct blk_flush_queue *q);
 
 void blk_freeze_queue(struct request_queue *q);
+void blk_queue_start_drain(struct request_queue *q);
 
 #define BIO_INLINE_VECS 4
 struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
diff --git a/block/genhd.c b/block/genhd.c
index 7b6e5e1cf9564..b3c33495d7208 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -26,6 +26,7 @@
 #include <linux/badblocks.h>
 
 #include "blk.h"
+#include "blk-rq-qos.h"
 
 static struct kobject *block_depr;
 
@@ -559,6 +560,8 @@ EXPORT_SYMBOL(device_add_disk);
  */
 void del_gendisk(struct gendisk *disk)
 {
+	struct request_queue *q = disk->queue;
+
 	might_sleep();
 
 	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
@@ -575,8 +578,26 @@ void del_gendisk(struct gendisk *disk)
 	fsync_bdev(disk->part0);
 	__invalidate_device(disk->part0, true);
 
+	/*
+	 * Fail any new I/O.
+	 */
+	set_bit(GD_DEAD, &disk->state);
 	set_capacity(disk, 0);
 
+	/*
+	 * Prevent new I/O from crossing bio_queue_enter().
+	 */
+	blk_queue_start_drain(q);
+	blk_mq_freeze_queue_wait(q);
+
+	rq_qos_exit(q);
+	blk_sync_queue(q);
+	blk_flush_integrity();
+	/*
+	 * Allow using passthrough request again after the queue is torn down.
+	 */
+	blk_mq_unfreeze_queue(q);
+
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
 
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index c68d83c87f83f..0f5315c2b5a34 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -149,6 +149,7 @@ struct gendisk {
 	unsigned long state;
 #define GD_NEED_PART_SCAN		0
 #define GD_READ_ONLY			1
+#define GD_DEAD				2
 
 	struct mutex open_mutex;	/* open/close mutex */
 	unsigned open_partitions;	/* number of open partitions */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-09-27 14:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-20 11:24 tear down file system I/O in del_gendisk Christoph Hellwig
2021-09-20 11:24 ` [PATCH 1/4] block: factor out a blk_try_enter_queue helper Christoph Hellwig
2021-09-21  3:17   ` Bart Van Assche
2021-09-20 11:24 ` [PATCH 2/4] block: split bio_queue_enter from blk_queue_enter Christoph Hellwig
2021-09-21  3:25   ` Bart Van Assche
2021-09-20 11:24 ` [PATCH 3/4] block: drain file system I/O on del_gendisk Christoph Hellwig
2021-09-20 11:24 ` [PATCH 4/4] block: keep q_usage_counter in atomic mode after del_gendisk Christoph Hellwig
2021-09-21  3:29   ` Bart Van Assche
2021-09-21  9:07     ` Christoph Hellwig
2021-09-21  3:38 ` tear down file system I/O in del_gendisk Bart Van Assche
2021-09-21  9:08   ` Christoph Hellwig
2021-09-21 14:24     ` Bart Van Assche
2021-09-21 22:33       ` Bart Van Assche
2021-09-22 17:22 tear down file system I/O in del_gendisk v2 Christoph Hellwig
2021-09-22 17:22 ` [PATCH 3/4] block: drain file system I/O on del_gendisk Christoph Hellwig
2021-09-23  1:34   ` Ming Lei
2021-09-23  5:27     ` Christoph Hellwig
2021-09-23  6:39       ` Ming Lei
2021-09-27 12:04         ` Christoph Hellwig
2021-09-27 14:32           ` Ming Lei
2021-09-23  6:21   ` Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).