linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* tear down file system I/O in del_gendisk v3
@ 2021-09-29  7:12 Christoph Hellwig
  2021-09-29  7:12 ` [PATCH 1/5] block: call submit_bio_checks under q_usage_counter Christoph Hellwig
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-29  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

Ming reported that for SCSI we have a lifetime problem now that
the BDI moved from the request_queue to the disk as del_gendisk
doesn't finish all outstanding file system I/O.  It turns out
this actually is an older problem, although the case where it could
be hit before was very unusual (unbinding of a SCSI upper driver
while the scsi_device stays around).  This series fixes this by
draining all I/O in del_gendisk.

Changes since v2:
 - move the call to submit_bio_checks into freeze protection

Changes since v1:
 - fix a commit log typo
 - keep the existing nowait vs queue dying semantics in bio_queue_enter 
 - actually keep q_usage_counter in atomic mode after del_gendisk

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] block: call submit_bio_checks under q_usage_counter
  2021-09-29  7:12 tear down file system I/O in del_gendisk v3 Christoph Hellwig
@ 2021-09-29  7:12 ` Christoph Hellwig
  2021-10-17  1:14   ` Jens Axboe
  2021-09-29  7:12 ` [PATCH 2/5] block: factor out a blk_try_enter_queue helper Christoph Hellwig
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-29  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei

Ensure all bios check the current values of the queue under freeze
protection, i.e. to make sure the zero capacity set by del_gendisk
is actually seen before dispatching to the driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 5454db2fa263b..c071f1a90b104 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -899,11 +899,18 @@ static blk_qc_t __submit_bio(struct bio *bio)
 	struct gendisk *disk = bio->bi_bdev->bd_disk;
 	blk_qc_t ret = BLK_QC_T_NONE;
 
-	if (blk_crypto_bio_prep(&bio)) {
-		if (!disk->fops->submit_bio)
-			return blk_mq_submit_bio(bio);
+	if (unlikely(bio_queue_enter(bio) != 0))
+		return BLK_QC_T_NONE;
+
+	if (!submit_bio_checks(bio) || !blk_crypto_bio_prep(&bio))
+		goto queue_exit;
+	if (disk->fops->submit_bio) {
 		ret = disk->fops->submit_bio(bio);
+		goto queue_exit;
 	}
+	return blk_mq_submit_bio(bio);
+
+queue_exit:
 	blk_queue_exit(disk->queue);
 	return ret;
 }
@@ -941,9 +948,6 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 		struct request_queue *q = bio->bi_bdev->bd_disk->queue;
 		struct bio_list lower, same;
 
-		if (unlikely(bio_queue_enter(bio) != 0))
-			continue;
-
 		/*
 		 * Create a fresh bio_list for all subordinate requests.
 		 */
@@ -979,23 +983,12 @@ static blk_qc_t __submit_bio_noacct(struct bio *bio)
 static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
 {
 	struct bio_list bio_list[2] = { };
-	blk_qc_t ret = BLK_QC_T_NONE;
+	blk_qc_t ret;
 
 	current->bio_list = bio_list;
 
 	do {
-		struct gendisk *disk = bio->bi_bdev->bd_disk;
-
-		if (unlikely(bio_queue_enter(bio) != 0))
-			continue;
-
-		if (!blk_crypto_bio_prep(&bio)) {
-			blk_queue_exit(disk->queue);
-			ret = BLK_QC_T_NONE;
-			continue;
-		}
-
-		ret = blk_mq_submit_bio(bio);
+		ret = __submit_bio(bio);
 	} while ((bio = bio_list_pop(&bio_list[0])));
 
 	current->bio_list = NULL;
@@ -1013,9 +1006,6 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
  */
 blk_qc_t submit_bio_noacct(struct bio *bio)
 {
-	if (!submit_bio_checks(bio))
-		return BLK_QC_T_NONE;
-
 	/*
 	 * We only want one ->submit_bio to be active at a time, else stack
 	 * usage with stacked devices could be a problem.  Use current->bio_list
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] block: factor out a blk_try_enter_queue helper
  2021-09-29  7:12 tear down file system I/O in del_gendisk v3 Christoph Hellwig
  2021-09-29  7:12 ` [PATCH 1/5] block: call submit_bio_checks under q_usage_counter Christoph Hellwig
@ 2021-09-29  7:12 ` Christoph Hellwig
  2021-09-29  7:12 ` [PATCH 3/5] block: split bio_queue_enter from blk_queue_enter Christoph Hellwig
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-29  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei, Darrick J . Wong

Factor out the code to try to get q_usage_counter without blocking into
a separate helper.  Both to improve code readability and to prepare for
splitting bio_queue_enter from blk_queue_enter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
 block/blk-core.c | 60 ++++++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c071f1a90b104..7e9eadacf2dea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -416,6 +416,30 @@ void blk_cleanup_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_cleanup_queue);
 
+static bool blk_try_enter_queue(struct request_queue *q, bool pm)
+{
+	rcu_read_lock();
+	if (!percpu_ref_tryget_live(&q->q_usage_counter))
+		goto fail;
+
+	/*
+	 * The code that increments the pm_only counter must ensure that the
+	 * counter is globally visible before the queue is unfrozen.
+	 */
+	if (blk_queue_pm_only(q) &&
+	    (!pm || queue_rpm_status(q) == RPM_SUSPENDED))
+		goto fail_put;
+
+	rcu_read_unlock();
+	return true;
+
+fail_put:
+	percpu_ref_put(&q->q_usage_counter);
+fail:
+	rcu_read_unlock();
+	return false;
+}
+
 /**
  * blk_queue_enter() - try to increase q->q_usage_counter
  * @q: request queue pointer
@@ -425,40 +449,18 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 {
 	const bool pm = flags & BLK_MQ_REQ_PM;
 
-	while (true) {
-		bool success = false;
-
-		rcu_read_lock();
-		if (percpu_ref_tryget_live(&q->q_usage_counter)) {
-			/*
-			 * The code that increments the pm_only counter is
-			 * responsible for ensuring that that counter is
-			 * globally visible before the queue is unfrozen.
-			 */
-			if ((pm && queue_rpm_status(q) != RPM_SUSPENDED) ||
-			    !blk_queue_pm_only(q)) {
-				success = true;
-			} else {
-				percpu_ref_put(&q->q_usage_counter);
-			}
-		}
-		rcu_read_unlock();
-
-		if (success)
-			return 0;
-
+	while (!blk_try_enter_queue(q, pm)) {
 		if (flags & BLK_MQ_REQ_NOWAIT)
 			return -EBUSY;
 
 		/*
-		 * read pair of barrier in blk_freeze_queue_start(),
-		 * we need to order reading __PERCPU_REF_DEAD flag of
-		 * .q_usage_counter and reading .mq_freeze_depth or
-		 * queue dying flag, otherwise the following wait may
-		 * never return if the two reads are reordered.
+		 * read pair of barrier in blk_freeze_queue_start(), we need to
+		 * order reading __PERCPU_REF_DEAD flag of .q_usage_counter and
+		 * reading .mq_freeze_depth or queue dying flag, otherwise the
+		 * following wait may never return if the two reads are
+		 * reordered.
 		 */
 		smp_rmb();
-
 		wait_event(q->mq_freeze_wq,
 			   (!q->mq_freeze_depth &&
 			    blk_pm_resume_queue(pm, q)) ||
@@ -466,6 +468,8 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 		if (blk_queue_dying(q))
 			return -ENODEV;
 	}
+
+	return 0;
 }
 
 static inline int bio_queue_enter(struct bio *bio)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] block: split bio_queue_enter from blk_queue_enter
  2021-09-29  7:12 tear down file system I/O in del_gendisk v3 Christoph Hellwig
  2021-09-29  7:12 ` [PATCH 1/5] block: call submit_bio_checks under q_usage_counter Christoph Hellwig
  2021-09-29  7:12 ` [PATCH 2/5] block: factor out a blk_try_enter_queue helper Christoph Hellwig
@ 2021-09-29  7:12 ` Christoph Hellwig
  2021-09-29  7:12 ` [PATCH 4/5] block: drain file system I/O on del_gendisk Christoph Hellwig
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-29  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei, Darrick J . Wong

To prepare for fixing a gendisk shutdown race, open code the
blk_queue_enter logic in bio_queue_enter.  This also removes the
pointless flags translation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
 block/blk-core.c | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 7e9eadacf2dea..43f5da707d8e3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -475,18 +475,35 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 static inline int bio_queue_enter(struct bio *bio)
 {
 	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
-	bool nowait = bio->bi_opf & REQ_NOWAIT;
-	int ret;
 
-	ret = blk_queue_enter(q, nowait ? BLK_MQ_REQ_NOWAIT : 0);
-	if (unlikely(ret)) {
-		if (nowait && !blk_queue_dying(q))
+	while (!blk_try_enter_queue(q, false)) {
+		if (bio->bi_opf & REQ_NOWAIT) {
+			if (blk_queue_dying(q))
+				goto dead;
 			bio_wouldblock_error(bio);
-		else
-			bio_io_error(bio);
+			return -EBUSY;
+		}
+
+		/*
+		 * read pair of barrier in blk_freeze_queue_start(), we need to
+		 * order reading __PERCPU_REF_DEAD flag of .q_usage_counter and
+		 * reading .mq_freeze_depth or queue dying flag, otherwise the
+		 * following wait may never return if the two reads are
+		 * reordered.
+		 */
+		smp_rmb();
+		wait_event(q->mq_freeze_wq,
+			   (!q->mq_freeze_depth &&
+			    blk_pm_resume_queue(false, q)) ||
+			   blk_queue_dying(q));
+		if (blk_queue_dying(q))
+			goto dead;
 	}
 
-	return ret;
+	return 0;
+dead:
+	bio_io_error(bio);
+	return -ENODEV;
 }
 
 void blk_queue_exit(struct request_queue *q)
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] block: drain file system I/O on del_gendisk
  2021-09-29  7:12 tear down file system I/O in del_gendisk v3 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2021-09-29  7:12 ` [PATCH 3/5] block: split bio_queue_enter from blk_queue_enter Christoph Hellwig
@ 2021-09-29  7:12 ` Christoph Hellwig
  2021-09-29  8:17   ` Ming Lei
  2021-09-29  7:12 ` [PATCH 5/5] block: keep q_usage_counter in atomic mode after del_gendisk Christoph Hellwig
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-29  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei, Darrick J . Wong

Instead of delaying draining of file system I/O related items like the
blk-qos queues, the integrity read workqueue and timeouts only when the
request_queue is removed, do that when del_gendisk is called.  This is
important for SCSI where the upper level drivers that control the gendisk
are separate entities, and the disk can be freed much earlier than the
request_queue, or can even be unbound without tearing down the queue.

Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
 block/blk-core.c      | 27 ++++++++++++---------------
 block/blk.h           |  1 +
 block/genhd.c         | 21 +++++++++++++++++++++
 include/linux/genhd.h |  1 +
 4 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 43f5da707d8e3..4d8f5fe915887 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -49,7 +49,6 @@
 #include "blk-mq.h"
 #include "blk-mq-sched.h"
 #include "blk-pm.h"
-#include "blk-rq-qos.h"
 
 struct dentry *blk_debugfs_root;
 
@@ -337,23 +336,25 @@ void blk_put_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_put_queue);
 
-void blk_set_queue_dying(struct request_queue *q)
+void blk_queue_start_drain(struct request_queue *q)
 {
-	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
-
 	/*
 	 * When queue DYING flag is set, we need to block new req
 	 * entering queue, so we call blk_freeze_queue_start() to
 	 * prevent I/O from crossing blk_queue_enter().
 	 */
 	blk_freeze_queue_start(q);
-
 	if (queue_is_mq(q))
 		blk_mq_wake_waiters(q);
-
 	/* Make blk_queue_enter() reexamine the DYING flag. */
 	wake_up_all(&q->mq_freeze_wq);
 }
+
+void blk_set_queue_dying(struct request_queue *q)
+{
+	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
+	blk_queue_start_drain(q);
+}
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
 /**
@@ -385,13 +386,8 @@ void blk_cleanup_queue(struct request_queue *q)
 	 */
 	blk_freeze_queue(q);
 
-	rq_qos_exit(q);
-
 	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
 
-	/* for synchronous bio-based driver finish in-flight integrity i/o */
-	blk_flush_integrity();
-
 	blk_sync_queue(q);
 	if (queue_is_mq(q))
 		blk_mq_exit_queue(q);
@@ -474,11 +470,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 
 static inline int bio_queue_enter(struct bio *bio)
 {
-	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
+	struct gendisk *disk = bio->bi_bdev->bd_disk;
+	struct request_queue *q = disk->queue;
 
 	while (!blk_try_enter_queue(q, false)) {
 		if (bio->bi_opf & REQ_NOWAIT) {
-			if (blk_queue_dying(q))
+			if (test_bit(GD_DEAD, &disk->state))
 				goto dead;
 			bio_wouldblock_error(bio);
 			return -EBUSY;
@@ -495,8 +492,8 @@ static inline int bio_queue_enter(struct bio *bio)
 		wait_event(q->mq_freeze_wq,
 			   (!q->mq_freeze_depth &&
 			    blk_pm_resume_queue(false, q)) ||
-			   blk_queue_dying(q));
-		if (blk_queue_dying(q))
+			   test_bit(GD_DEAD, &disk->state));
+		if (test_bit(GD_DEAD, &disk->state))
 			goto dead;
 	}
 
diff --git a/block/blk.h b/block/blk.h
index 7d2a0ba7ed21d..e2ed2257709ae 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -51,6 +51,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 void blk_free_flush_queue(struct blk_flush_queue *q);
 
 void blk_freeze_queue(struct request_queue *q);
+void blk_queue_start_drain(struct request_queue *q);
 
 #define BIO_INLINE_VECS 4
 struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
diff --git a/block/genhd.c b/block/genhd.c
index 7b6e5e1cf9564..b3c33495d7208 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -26,6 +26,7 @@
 #include <linux/badblocks.h>
 
 #include "blk.h"
+#include "blk-rq-qos.h"
 
 static struct kobject *block_depr;
 
@@ -559,6 +560,8 @@ EXPORT_SYMBOL(device_add_disk);
  */
 void del_gendisk(struct gendisk *disk)
 {
+	struct request_queue *q = disk->queue;
+
 	might_sleep();
 
 	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
@@ -575,8 +578,26 @@ void del_gendisk(struct gendisk *disk)
 	fsync_bdev(disk->part0);
 	__invalidate_device(disk->part0, true);
 
+	/*
+	 * Fail any new I/O.
+	 */
+	set_bit(GD_DEAD, &disk->state);
 	set_capacity(disk, 0);
 
+	/*
+	 * Prevent new I/O from crossing bio_queue_enter().
+	 */
+	blk_queue_start_drain(q);
+	blk_mq_freeze_queue_wait(q);
+
+	rq_qos_exit(q);
+	blk_sync_queue(q);
+	blk_flush_integrity();
+	/*
+	 * Allow using passthrough request again after the queue is torn down.
+	 */
+	blk_mq_unfreeze_queue(q);
+
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
 
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index c68d83c87f83f..0f5315c2b5a34 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -149,6 +149,7 @@ struct gendisk {
 	unsigned long state;
 #define GD_NEED_PART_SCAN		0
 #define GD_READ_ONLY			1
+#define GD_DEAD				2
 
 	struct mutex open_mutex;	/* open/close mutex */
 	unsigned open_partitions;	/* number of open partitions */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/5] block: keep q_usage_counter in atomic mode after del_gendisk
  2021-09-29  7:12 tear down file system I/O in del_gendisk v3 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2021-09-29  7:12 ` [PATCH 4/5] block: drain file system I/O on del_gendisk Christoph Hellwig
@ 2021-09-29  7:12 ` Christoph Hellwig
  2021-10-12  9:33 ` [PATCH 6/5] kyber: avoid q->disk dereferences in trace points Christoph Hellwig
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2021-09-29  7:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei, Darrick J . Wong

Don't switch back to percpu mode to avoid the double RCU grace period
when tearing down SCSI devices.  After removing the disk only passthrough
commands can be send anyway.

Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
---
 block/blk-mq.c | 9 ++++++++-
 block/blk.h    | 1 +
 block/genhd.c  | 3 ++-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 108a352051be5..bc026372de439 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -188,9 +188,11 @@ void blk_mq_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
 
-void blk_mq_unfreeze_queue(struct request_queue *q)
+void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic)
 {
 	mutex_lock(&q->mq_freeze_lock);
+	if (force_atomic)
+		q->q_usage_counter.data->force_atomic = true;
 	q->mq_freeze_depth--;
 	WARN_ON_ONCE(q->mq_freeze_depth < 0);
 	if (!q->mq_freeze_depth) {
@@ -199,6 +201,11 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 	}
 	mutex_unlock(&q->mq_freeze_lock);
 }
+
+void blk_mq_unfreeze_queue(struct request_queue *q)
+{
+	__blk_mq_unfreeze_queue(q, false);
+}
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
 /*
diff --git a/block/blk.h b/block/blk.h
index e2ed2257709ae..6c3c00a8fe19d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -51,6 +51,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 void blk_free_flush_queue(struct blk_flush_queue *q);
 
 void blk_freeze_queue(struct request_queue *q);
+void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic);
 void blk_queue_start_drain(struct request_queue *q);
 
 #define BIO_INLINE_VECS 4
diff --git a/block/genhd.c b/block/genhd.c
index b3c33495d7208..edd013a820879 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -596,7 +596,8 @@ void del_gendisk(struct gendisk *disk)
 	/*
 	 * Allow using passthrough request again after the queue is torn down.
 	 */
-	blk_mq_unfreeze_queue(q);
+	blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
+	__blk_mq_unfreeze_queue(q, true);
 
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] block: drain file system I/O on del_gendisk
  2021-09-29  7:12 ` [PATCH 4/5] block: drain file system I/O on del_gendisk Christoph Hellwig
@ 2021-09-29  8:17   ` Ming Lei
  2021-10-01  4:13     ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2021-09-29  8:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Tejun Heo, linux-block, Darrick J . Wong

On Wed, Sep 29, 2021 at 09:12:40AM +0200, Christoph Hellwig wrote:
> Instead of delaying draining of file system I/O related items like the
> blk-qos queues, the integrity read workqueue and timeouts only when the
> request_queue is removed, do that when del_gendisk is called.  This is
> important for SCSI where the upper level drivers that control the gendisk
> are separate entities, and the disk can be freed much earlier than the
> request_queue, or can even be unbound without tearing down the queue.
> 
> Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
> Reported-by: Ming Lei <ming.lei@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  block/blk-core.c      | 27 ++++++++++++---------------
>  block/blk.h           |  1 +
>  block/genhd.c         | 21 +++++++++++++++++++++
>  include/linux/genhd.h |  1 +
>  4 files changed, 35 insertions(+), 15 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 43f5da707d8e3..4d8f5fe915887 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -49,7 +49,6 @@
>  #include "blk-mq.h"
>  #include "blk-mq-sched.h"
>  #include "blk-pm.h"
> -#include "blk-rq-qos.h"
>  
>  struct dentry *blk_debugfs_root;
>  
> @@ -337,23 +336,25 @@ void blk_put_queue(struct request_queue *q)
>  }
>  EXPORT_SYMBOL(blk_put_queue);
>  
> -void blk_set_queue_dying(struct request_queue *q)
> +void blk_queue_start_drain(struct request_queue *q)
>  {
> -	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
> -
>  	/*
>  	 * When queue DYING flag is set, we need to block new req
>  	 * entering queue, so we call blk_freeze_queue_start() to
>  	 * prevent I/O from crossing blk_queue_enter().
>  	 */
>  	blk_freeze_queue_start(q);
> -
>  	if (queue_is_mq(q))
>  		blk_mq_wake_waiters(q);
> -
>  	/* Make blk_queue_enter() reexamine the DYING flag. */
>  	wake_up_all(&q->mq_freeze_wq);
>  }
> +
> +void blk_set_queue_dying(struct request_queue *q)
> +{
> +	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
> +	blk_queue_start_drain(q);
> +}
>  EXPORT_SYMBOL_GPL(blk_set_queue_dying);
>  
>  /**
> @@ -385,13 +386,8 @@ void blk_cleanup_queue(struct request_queue *q)
>  	 */
>  	blk_freeze_queue(q);
>  
> -	rq_qos_exit(q);
> -
>  	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>  
> -	/* for synchronous bio-based driver finish in-flight integrity i/o */
> -	blk_flush_integrity();
> -
>  	blk_sync_queue(q);
>  	if (queue_is_mq(q))
>  		blk_mq_exit_queue(q);
> @@ -474,11 +470,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
>  
>  static inline int bio_queue_enter(struct bio *bio)
>  {
> -	struct request_queue *q = bio->bi_bdev->bd_disk->queue;
> +	struct gendisk *disk = bio->bi_bdev->bd_disk;
> +	struct request_queue *q = disk->queue;
>  
>  	while (!blk_try_enter_queue(q, false)) {
>  		if (bio->bi_opf & REQ_NOWAIT) {
> -			if (blk_queue_dying(q))
> +			if (test_bit(GD_DEAD, &disk->state))
>  				goto dead;
>  			bio_wouldblock_error(bio);
>  			return -EBUSY;
> @@ -495,8 +492,8 @@ static inline int bio_queue_enter(struct bio *bio)
>  		wait_event(q->mq_freeze_wq,
>  			   (!q->mq_freeze_depth &&
>  			    blk_pm_resume_queue(false, q)) ||
> -			   blk_queue_dying(q));
> -		if (blk_queue_dying(q))
> +			   test_bit(GD_DEAD, &disk->state));
> +		if (test_bit(GD_DEAD, &disk->state))
>  			goto dead;
>  	}
>  
> diff --git a/block/blk.h b/block/blk.h
> index 7d2a0ba7ed21d..e2ed2257709ae 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -51,6 +51,7 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
>  void blk_free_flush_queue(struct blk_flush_queue *q);
>  
>  void blk_freeze_queue(struct request_queue *q);
> +void blk_queue_start_drain(struct request_queue *q);
>  
>  #define BIO_INLINE_VECS 4
>  struct bio_vec *bvec_alloc(mempool_t *pool, unsigned short *nr_vecs,
> diff --git a/block/genhd.c b/block/genhd.c
> index 7b6e5e1cf9564..b3c33495d7208 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -26,6 +26,7 @@
>  #include <linux/badblocks.h>
>  
>  #include "blk.h"
> +#include "blk-rq-qos.h"
>  
>  static struct kobject *block_depr;
>  
> @@ -559,6 +560,8 @@ EXPORT_SYMBOL(device_add_disk);
>   */
>  void del_gendisk(struct gendisk *disk)
>  {
> +	struct request_queue *q = disk->queue;
> +
>  	might_sleep();
>  
>  	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
> @@ -575,8 +578,26 @@ void del_gendisk(struct gendisk *disk)
>  	fsync_bdev(disk->part0);
>  	__invalidate_device(disk->part0, true);
>  
> +	/*
> +	 * Fail any new I/O.
> +	 */
> +	set_bit(GD_DEAD, &disk->state);
>  	set_capacity(disk, 0);
>  
> +	/*
> +	 * Prevent new I/O from crossing bio_queue_enter().
> +	 */
> +	blk_queue_start_drain(q);
> +	blk_mq_freeze_queue_wait(q);

Draining request won't fix the problem completely:

1) blk-mq dispatch code may still be in-progress after q_usage_counter
becomes zero, see the story in 662156641bc4 ("block: don't drain in-progress dispatch in
blk_cleanup_queue()")

2) elevator code / blkcg code may still be called after blk_cleanup_queue(), such
as kyber, trace_kyber_latency()(q->disk is referred) is called in kyber's timer
handler, and the timer is deleted via del_timer_sync() via kyber_exit_sched()
from blk_release_queue().

> +
> +	rq_qos_exit(q);
> +	blk_sync_queue(q);
> +	blk_flush_integrity();
> +	/*
> +	 * Allow using passthrough request again after the queue is torn down.
> +	 */
> +	blk_mq_unfreeze_queue(q);

Again, one FS bio is still possible to enter queue now: submit_bio_checks()
is done before set_capacity(0), and submitted after blk_mq_unfreeze_queue()
returns.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] block: drain file system I/O on del_gendisk
  2021-09-29  8:17   ` Ming Lei
@ 2021-10-01  4:13     ` Christoph Hellwig
  2021-10-05  2:15       ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-10-01  4:13 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Jens Axboe, Tejun Heo, linux-block, Darrick J . Wong

On Wed, Sep 29, 2021 at 04:17:01PM +0800, Ming Lei wrote:

[full quote deleted]

> Draining request won't fix the problem completely:
> 
> 1) blk-mq dispatch code may still be in-progress after q_usage_counter
> becomes zero, see the story in 662156641bc4 ("block: don't drain in-progress dispatch in
> blk_cleanup_queue()")

That commit does not have a good explanation on what it actually fixed.

> 2) elevator code / blkcg code may still be called after blk_cleanup_queue(), such
> as kyber, trace_kyber_latency()(q->disk is referred) is called in kyber's timer
> handler, and the timer is deleted via del_timer_sync() via kyber_exit_sched()
> from blk_release_queue().

Yes.  There's two things we can do here:

 - stop using the dev_t in tracing a request_queue
 - exit the I/O schedules in del_gendisk, because they are only used
   for file system I/O that requires the gendisk anyway

we'll probably want both eventually.

> 
> > +
> > +	rq_qos_exit(q);
> > +	blk_sync_queue(q);
> > +	blk_flush_integrity();
> > +	/*
> > +	 * Allow using passthrough request again after the queue is torn down.
> > +	 */
> > +	blk_mq_unfreeze_queue(q);
> 
> Again, one FS bio is still possible to enter queue now: submit_bio_checks()
> is done before set_capacity(0), and submitted after blk_mq_unfreeze_queue()
> returns.

Not with the new patch 1 in this series.

Jens - can you take a look at the series that fixes the crashes people
are sending while I'm looking at the rest of the corner cases?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/5] block: drain file system I/O on del_gendisk
  2021-10-01  4:13     ` Christoph Hellwig
@ 2021-10-05  2:15       ` Ming Lei
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2021-10-05  2:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Tejun Heo, linux-block, Darrick J . Wong

On Fri, Oct 01, 2021 at 06:13:48AM +0200, Christoph Hellwig wrote:
> On Wed, Sep 29, 2021 at 04:17:01PM +0800, Ming Lei wrote:
> 
> [full quote deleted]
> 
> > Draining request won't fix the problem completely:
> > 
> > 1) blk-mq dispatch code may still be in-progress after q_usage_counter
> > becomes zero, see the story in 662156641bc4 ("block: don't drain in-progress dispatch in
> > blk_cleanup_queue()")
> 
> That commit does not have a good explanation on what it actually fixed.

The commit log mentions that:

    Now freeing hw queue resource is moved to hctx's release handler,
    we don't need to worry about the race between blk_cleanup_queue and
    run queue any more.

So the hctx queue resource is fine to be referred during cleaning up
queue, since freeing hctx related resources are moved to queue release
handler.

Or you can see the point in commit c2856ae2f315 ("blk-mq: quiesce queue before
freeing queue"), which explains the issue in enough details. That said
even though after .q_usage_counter is zero, the block layer code is
still running and queue or disk can still be referred.

> 
> > 2) elevator code / blkcg code may still be called after blk_cleanup_queue(), such
> > as kyber, trace_kyber_latency()(q->disk is referred) is called in kyber's timer
> > handler, and the timer is deleted via del_timer_sync() via kyber_exit_sched()
> > from blk_release_queue().
> 
> Yes.  There's two things we can do here:
> 
>  - stop using the dev_t in tracing a request_queue
>  - exit the I/O schedules in del_gendisk, because they are only used
>    for file system I/O that requires the gendisk anyway
> 
> we'll probably want both eventually.

Not sure if it is easy to do, one thing is that we can't slow down
blk_cleanup_queue() too much. Usually we added quiesing queue in
blk_cleanup_queue(), but people complained and the change is removed.

> 
> > 
> > > +
> > > +	rq_qos_exit(q);
> > > +	blk_sync_queue(q);
> > > +	blk_flush_integrity();
> > > +	/*
> > > +	 * Allow using passthrough request again after the queue is torn down.
> > > +	 */
> > > +	blk_mq_unfreeze_queue(q);
> > 
> > Again, one FS bio is still possible to enter queue now: submit_bio_checks()
> > is done before set_capacity(0), and submitted after blk_mq_unfreeze_queue()
> > returns.
> 
> Not with the new patch 1 in this series.

OK, looks you get the .q_usage_counter before checking bio.

> 
> Jens - can you take a look at the series that fixes the crashes people
> are sending while I'm looking at the rest of the corner cases?

But draining .q_usage_counter can't address this kind of issue completely, can
it? Not mention the io scheduler code can still refer to q->disk after
blk_cleanup_queue().


Thanks,
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 6/5] kyber: avoid q->disk dereferences in trace points
  2021-09-29  7:12 tear down file system I/O in del_gendisk v3 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2021-09-29  7:12 ` [PATCH 5/5] block: keep q_usage_counter in atomic mode after del_gendisk Christoph Hellwig
@ 2021-10-12  9:33 ` Christoph Hellwig
  2021-10-14 12:57   ` Ming Lei
  2021-10-14  9:23 ` tear down file system I/O in del_gendisk v3 Yi Zhang
  2021-10-17  1:14 ` Jens Axboe
  7 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2021-10-12  9:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-block, Ming Lei


q->disk becomes invalid after the gendisk is removed.  Work around this
by caching the dev_t for the tracepoints.  The real fix would be to
properly tear down the I/O schedulers with the gendisk, but that is
a much more invasive change.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/kyber-iosched.c        | 10 ++++++----
 include/trace/events/kyber.h | 19 +++++++++----------
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 15a8be57203d6..a0ffbabfac2c6 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -151,6 +151,7 @@ struct kyber_ctx_queue {
 
 struct kyber_queue_data {
 	struct request_queue *q;
+	dev_t dev;
 
 	/*
 	 * Each scheduling domain has a limited number of in-flight requests
@@ -257,7 +258,7 @@ static int calculate_percentile(struct kyber_queue_data *kqd,
 	}
 	memset(buckets, 0, sizeof(kqd->latency_buckets[sched_domain][type]));
 
-	trace_kyber_latency(kqd->q, kyber_domain_names[sched_domain],
+	trace_kyber_latency(kqd->dev, kyber_domain_names[sched_domain],
 			    kyber_latency_type_names[type], percentile,
 			    bucket + 1, 1 << KYBER_LATENCY_SHIFT, samples);
 
@@ -270,7 +271,7 @@ static void kyber_resize_domain(struct kyber_queue_data *kqd,
 	depth = clamp(depth, 1U, kyber_depth[sched_domain]);
 	if (depth != kqd->domain_tokens[sched_domain].sb.depth) {
 		sbitmap_queue_resize(&kqd->domain_tokens[sched_domain], depth);
-		trace_kyber_adjust(kqd->q, kyber_domain_names[sched_domain],
+		trace_kyber_adjust(kqd->dev, kyber_domain_names[sched_domain],
 				   depth);
 	}
 }
@@ -366,6 +367,7 @@ static struct kyber_queue_data *kyber_queue_data_alloc(struct request_queue *q)
 		goto err;
 
 	kqd->q = q;
+	kqd->dev = disk_devt(q->disk);
 
 	kqd->cpu_latency = alloc_percpu_gfp(struct kyber_cpu_latency,
 					    GFP_KERNEL | __GFP_ZERO);
@@ -774,7 +776,7 @@ kyber_dispatch_cur_domain(struct kyber_queue_data *kqd,
 			list_del_init(&rq->queuelist);
 			return rq;
 		} else {
-			trace_kyber_throttled(kqd->q,
+			trace_kyber_throttled(kqd->dev,
 					      kyber_domain_names[khd->cur_domain]);
 		}
 	} else if (sbitmap_any_bit_set(&khd->kcq_map[khd->cur_domain])) {
@@ -787,7 +789,7 @@ kyber_dispatch_cur_domain(struct kyber_queue_data *kqd,
 			list_del_init(&rq->queuelist);
 			return rq;
 		} else {
-			trace_kyber_throttled(kqd->q,
+			trace_kyber_throttled(kqd->dev,
 					      kyber_domain_names[khd->cur_domain]);
 		}
 	}
diff --git a/include/trace/events/kyber.h b/include/trace/events/kyber.h
index 491098a0d8ed9..bf7533f171ff9 100644
--- a/include/trace/events/kyber.h
+++ b/include/trace/events/kyber.h
@@ -13,11 +13,11 @@
 
 TRACE_EVENT(kyber_latency,
 
-	TP_PROTO(struct request_queue *q, const char *domain, const char *type,
+	TP_PROTO(dev_t dev, const char *domain, const char *type,
 		 unsigned int percentile, unsigned int numerator,
 		 unsigned int denominator, unsigned int samples),
 
-	TP_ARGS(q, domain, type, percentile, numerator, denominator, samples),
+	TP_ARGS(dev, domain, type, percentile, numerator, denominator, samples),
 
 	TP_STRUCT__entry(
 		__field(	dev_t,	dev				)
@@ -30,7 +30,7 @@ TRACE_EVENT(kyber_latency,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= disk_devt(q->disk);
+		__entry->dev		= dev;
 		strlcpy(__entry->domain, domain, sizeof(__entry->domain));
 		strlcpy(__entry->type, type, sizeof(__entry->type));
 		__entry->percentile	= percentile;
@@ -47,10 +47,9 @@ TRACE_EVENT(kyber_latency,
 
 TRACE_EVENT(kyber_adjust,
 
-	TP_PROTO(struct request_queue *q, const char *domain,
-		 unsigned int depth),
+	TP_PROTO(dev_t dev, const char *domain, unsigned int depth),
 
-	TP_ARGS(q, domain, depth),
+	TP_ARGS(dev, domain, depth),
 
 	TP_STRUCT__entry(
 		__field(	dev_t,	dev			)
@@ -59,7 +58,7 @@ TRACE_EVENT(kyber_adjust,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= disk_devt(q->disk);
+		__entry->dev		= dev;
 		strlcpy(__entry->domain, domain, sizeof(__entry->domain));
 		__entry->depth		= depth;
 	),
@@ -71,9 +70,9 @@ TRACE_EVENT(kyber_adjust,
 
 TRACE_EVENT(kyber_throttled,
 
-	TP_PROTO(struct request_queue *q, const char *domain),
+	TP_PROTO(dev_t dev, const char *domain),
 
-	TP_ARGS(q, domain),
+	TP_ARGS(dev, domain),
 
 	TP_STRUCT__entry(
 		__field(	dev_t,	dev			)
@@ -81,7 +80,7 @@ TRACE_EVENT(kyber_throttled,
 	),
 
 	TP_fast_assign(
-		__entry->dev		= disk_devt(q->disk);
+		__entry->dev		= dev;
 		strlcpy(__entry->domain, domain, sizeof(__entry->domain));
 	),
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: tear down file system I/O in del_gendisk v3
  2021-09-29  7:12 tear down file system I/O in del_gendisk v3 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2021-10-12  9:33 ` [PATCH 6/5] kyber: avoid q->disk dereferences in trace points Christoph Hellwig
@ 2021-10-14  9:23 ` Yi Zhang
  2021-10-17  1:14 ` Jens Axboe
  7 siblings, 0 replies; 14+ messages in thread
From: Yi Zhang @ 2021-10-14  9:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Tejun Heo, linux-block, Ming Lei

This patchset fixed the NULL pointer issue triggered by blktests block/025
https://lore.kernel.org/linux-block/YWfJ06KX3HT1nANX@infradead.org/T/#mbf26cd34c88660ce0221dd7933b294a0b0298319

Tested-by: Yi Zhang <yi.zhang@redhat.com>

On Wed, Sep 29, 2021 at 3:14 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Ming reported that for SCSI we have a lifetime problem now that
> the BDI moved from the request_queue to the disk as del_gendisk
> doesn't finish all outstanding file system I/O.  It turns out
> this actually is an older problem, although the case where it could
> be hit before was very unusual (unbinding of a SCSI upper driver
> while the scsi_device stays around).  This series fixes this by
> draining all I/O in del_gendisk.
>
> Changes since v2:
>  - move the call to submit_bio_checks into freeze protection
>
> Changes since v1:
>  - fix a commit log typo
>  - keep the existing nowait vs queue dying semantics in bio_queue_enter
>  - actually keep q_usage_counter in atomic mode after del_gendisk
>


-- 
Best Regards,
  Yi Zhang


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 6/5] kyber: avoid q->disk dereferences in trace points
  2021-10-12  9:33 ` [PATCH 6/5] kyber: avoid q->disk dereferences in trace points Christoph Hellwig
@ 2021-10-14 12:57   ` Ming Lei
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2021-10-14 12:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Tejun Heo, linux-block

Hello Christoph,

On Tue, Oct 12, 2021 at 11:33:01AM +0200, Christoph Hellwig wrote:
> 
> q->disk becomes invalid after the gendisk is removed.  Work around this
> by caching the dev_t for the tracepoints.  The real fix would be to
> properly tear down the I/O schedulers with the gendisk, but that is
> a much more invasive change.

Except for kyber, blkcg code refers q->disk too.

Such as blkg_dev_name() used in showing blk cgroup files, see
tg_print_limit(), but blkcg_exit_queue() is called in queue's release
handler, so q->disk can be referred when reading blkcg file of cgroup
fs from userspace after deleting gendisk.

I think it is fine to move blkcg_exit_queue() into del_gendisk(), but
bfq_exit_queue() is still run from queue's release handler, and bfq's
blkcg policy is deactivated there. That said the referring in reading
bfq's cgroup file still can come after deleting disk, especially kernel
panic could be caused when q->disk is cleared between the check and
calling to bdi_dev_name() in blkg_dev_name().


Thanks,
Ming


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/5] block: call submit_bio_checks under q_usage_counter
  2021-09-29  7:12 ` [PATCH 1/5] block: call submit_bio_checks under q_usage_counter Christoph Hellwig
@ 2021-10-17  1:14   ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2021-10-17  1:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Ming Lei, Tejun Heo, linux-block

On Wed, 29 Sep 2021 09:12:37 +0200, Christoph Hellwig wrote:
> Ensure all bios check the current values of the queue under freeze
> protection, i.e. to make sure the zero capacity set by del_gendisk
> is actually seen before dispatching to the driver.
> 
> 

Applied, thanks!

[1/5] block: call submit_bio_checks under q_usage_counter
      (no commit info)
[2/5] block: factor out a blk_try_enter_queue helper
      (no commit info)
[3/5] block: split bio_queue_enter from blk_queue_enter
      (no commit info)
[4/5] block: drain file system I/O on del_gendisk
      (no commit info)
[5/5] block: keep q_usage_counter in atomic mode after del_gendisk
      (no commit info)
[6/6] kyber: avoid q->disk dereferences in trace points
      (no commit info)

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: tear down file system I/O in del_gendisk v3
  2021-09-29  7:12 tear down file system I/O in del_gendisk v3 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2021-10-14  9:23 ` tear down file system I/O in del_gendisk v3 Yi Zhang
@ 2021-10-17  1:14 ` Jens Axboe
  7 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2021-10-17  1:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Tejun Heo, linux-block, Ming Lei

On 9/29/21 1:12 AM, Christoph Hellwig wrote:
> Ming reported that for SCSI we have a lifetime problem now that
> the BDI moved from the request_queue to the disk as del_gendisk
> doesn't finish all outstanding file system I/O.  It turns out
> this actually is an older problem, although the case where it could
> be hit before was very unusual (unbinding of a SCSI upper driver
> while the scsi_device stays around).  This series fixes this by
> draining all I/O in del_gendisk.
> 
> Changes since v2:
>  - move the call to submit_bio_checks into freeze protection
> 
> Changes since v1:
>  - fix a commit log typo
>  - keep the existing nowait vs queue dying semantics in bio_queue_enter 
>  - actually keep q_usage_counter in atomic mode after del_gendisk

Looks like there's no other way than going down this path, even though
it's not the most exciting right now... I have applied this one for
5.15.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-10-17  1:14 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-29  7:12 tear down file system I/O in del_gendisk v3 Christoph Hellwig
2021-09-29  7:12 ` [PATCH 1/5] block: call submit_bio_checks under q_usage_counter Christoph Hellwig
2021-10-17  1:14   ` Jens Axboe
2021-09-29  7:12 ` [PATCH 2/5] block: factor out a blk_try_enter_queue helper Christoph Hellwig
2021-09-29  7:12 ` [PATCH 3/5] block: split bio_queue_enter from blk_queue_enter Christoph Hellwig
2021-09-29  7:12 ` [PATCH 4/5] block: drain file system I/O on del_gendisk Christoph Hellwig
2021-09-29  8:17   ` Ming Lei
2021-10-01  4:13     ` Christoph Hellwig
2021-10-05  2:15       ` Ming Lei
2021-09-29  7:12 ` [PATCH 5/5] block: keep q_usage_counter in atomic mode after del_gendisk Christoph Hellwig
2021-10-12  9:33 ` [PATCH 6/5] kyber: avoid q->disk dereferences in trace points Christoph Hellwig
2021-10-14 12:57   ` Ming Lei
2021-10-14  9:23 ` tear down file system I/O in del_gendisk v3 Yi Zhang
2021-10-17  1:14 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).