* [PATCH 0/3] block: don't drain file system I/O on del_gendisk
@ 2022-01-16  4:18 Ming Lei
  2022-01-16  4:18 ` [PATCH 1/3] block: move freeing disk into queue's release handler Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Ming Lei @ 2022-01-16  4:18 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ming Lei

Hello,

Draining FS I/O in del_gendisk() was added just to avoid referring to the
recently added q->disk in the I/O path, and it isn't actually needed.

Now we can move freeing the disk into the queue's release handler (see
patch 1), so there is no need to drain FS I/O in del_gendisk().

Draining FS I/O in del_gendisk() isn't reliable anyway, as the following
cases show, so revert this behavior:

1) queue freezing can't drain FS I/O for bio-based drivers

2) it isn't easy to move the elevator/cgroup/throttle shutdown into
del_gendisk(), and q->disk can still be referenced in those code paths

3) the added GD_DEAD flag may not be observed reliably in
__bio_queue_enter(), because queue freezing does not imply an RCU grace
period.


Ming Lei (3):
  block: move freeing disk into queue's release handler
  block: revert aec89dc5d421 block: keep q_usage_counter in atomic mode
    after del_gendisk
  block: revert 8e141f9eb803 block: drain file system I/O on del_gendisk

 block/blk-core.c      | 24 ++++++++++++------------
 block/blk-mq.c        |  9 +--------
 block/blk-sysfs.c     | 13 +++++++++++++
 block/blk.h           |  2 --
 block/genhd.c         | 31 +++++--------------------------
 include/linux/genhd.h |  1 -
 6 files changed, 31 insertions(+), 49 deletions(-)

-- 
2.31.1



* [PATCH 1/3] block: move freeing disk into queue's release handler
  2022-01-16  4:18 [PATCH 0/3] block: don't drain file system I/O on del_gendisk Ming Lei
@ 2022-01-16  4:18 ` Ming Lei
  2022-01-18  8:22   ` Christoph Hellwig
  2022-01-16  4:18 ` [PATCH 2/3] block: revert aec89dc5d421 block: keep q_usage_counter in atomic mode after del_gendisk Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2022-01-16  4:18 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ming Lei

So far the block_device (for the disk), the gendisk and the request queue
are supposed to be killed at the same time: the request queue can be
retrieved directly from both block_device->bd_queue and gendisk->queue,
and the disk is associated with the request queue via q->disk. The request
queue's refcount is grabbed when the disk is allocated, and released in
the disk's release handler.

However, we currently put the request queue and clear queue->disk in
disk_release(), so the disk can be released before blk_cleanup_queue()
runs. This isn't reliable:

- some block core code (blk-cgroup, I/O scheduler, ...) usually deals with
the request queue only, and sometimes needs to retrieve disk info via
q->disk, but both the I/O scheduler and blk-cgroup are actually shut down
in blk_release_queue(), and q->disk can be cleared before the queue is
released.

- q->disk is referenced in the fast I/O path, such as the I/O accounting
code, but q->disk can be cleared while the queue is still active, since
the queue is still needed for handling passthrough/private requests after
the disk is deleted.

Move freeing the disk into the queue's release handler, so that the
block_device (for the disk), the gendisk and the request queue are removed
at essentially the same time, and q->disk can be used reliably. This is
reasonable, too, since the request queue effectively becomes dying in
del_gendisk(); see commit 8e141f9eb803 ("block: drain file system I/O on
del_gendisk").

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-sysfs.c | 13 +++++++++++++
 block/genhd.c     |  7 +++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index e20eadfcf5c8..dc8af443b29b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -12,6 +12,7 @@
 #include <linux/blk-mq.h>
 #include <linux/blk-cgroup.h>
 #include <linux/debugfs.h>
+#include <linux/fs.h>
 
 #include "blk.h"
 #include "blk-mq.h"
@@ -811,6 +812,18 @@ static void blk_release_queue(struct kobject *kobj)
 
 	bioset_exit(&q->bio_split);
 
+	/*
+	 * Free the associated disk now, if there is one.
+	 *
+	 * Cases in which a request queue has no disk:
+	 *
+	 * - no active LUN was probed for a SCSI host
+	 *
+	 * - nvme's admin queue
+	 */
+	if (q->disk)
+		iput(q->disk->part0->bd_inode);
+
 	ida_simple_remove(&blk_queue_ida, q->id);
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
diff --git a/block/genhd.c b/block/genhd.c
index 626c8406f21a..6357cab37eef 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1112,9 +1112,12 @@ static void disk_release(struct device *dev)
 	disk_release_events(disk);
 	kfree(disk->random);
 	xa_destroy(&disk->part_tbl);
-	disk->queue->disk = NULL;
+
+	/*
+	 * Delay freeing the disk and its bdev into the request queue's
+	 * release handler, so that all of them can be killed at the same time.
+	 */
 	blk_put_queue(disk->queue);
-	iput(disk->part0->bd_inode);	/* frees the disk */
 }
 
 static int block_uevent(struct device *dev, struct kobj_uevent_env *env)
-- 
2.31.1



* [PATCH 2/3] block: revert aec89dc5d421 block: keep q_usage_counter in atomic mode after del_gendisk
  2022-01-16  4:18 [PATCH 0/3] block: don't drain file system I/O on del_gendisk Ming Lei
  2022-01-16  4:18 ` [PATCH 1/3] block: move freeing disk into queue's release handler Ming Lei
@ 2022-01-16  4:18 ` Ming Lei
  2022-01-16  4:18 ` [PATCH 3/3] block: revert 8e141f9eb803 block: drain file system I/O on del_gendisk Ming Lei
  2022-01-17  8:13 ` [PATCH 0/3] block: don't " Christoph Hellwig
  3 siblings, 0 replies; 12+ messages in thread
From: Ming Lei @ 2022-01-16  4:18 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ming Lei

Prepare for reverting commit 8e141f9eb803 ("block: drain file system I/O
on del_gendisk").

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 9 +--------
 block/blk.h    | 1 -
 block/genhd.c  | 3 +--
 3 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a6d4780580fc..cb979dcf7986 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -215,11 +215,9 @@ void blk_mq_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
 
-void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic)
+void blk_mq_unfreeze_queue(struct request_queue *q)
 {
 	mutex_lock(&q->mq_freeze_lock);
-	if (force_atomic)
-		q->q_usage_counter.data->force_atomic = true;
 	q->mq_freeze_depth--;
 	WARN_ON_ONCE(q->mq_freeze_depth < 0);
 	if (!q->mq_freeze_depth) {
@@ -228,11 +226,6 @@ void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic)
 	}
 	mutex_unlock(&q->mq_freeze_lock);
 }
-
-void blk_mq_unfreeze_queue(struct request_queue *q)
-{
-	__blk_mq_unfreeze_queue(q, false);
-}
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
 /*
diff --git a/block/blk.h b/block/blk.h
index 8bd43b3ad33d..9ee7ab1c5572 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -43,7 +43,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 void blk_free_flush_queue(struct blk_flush_queue *q);
 
 void blk_freeze_queue(struct request_queue *q);
-void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic);
 void blk_queue_start_drain(struct request_queue *q);
 int __bio_queue_enter(struct request_queue *q, struct bio *bio);
 bool submit_bio_checks(struct bio *bio);
diff --git a/block/genhd.c b/block/genhd.c
index 6357cab37eef..9842371904d6 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -628,8 +628,7 @@ void del_gendisk(struct gendisk *disk)
 	/*
 	 * Allow using passthrough request again after the queue is torn down.
 	 */
-	blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
-	__blk_mq_unfreeze_queue(q, true);
+	blk_mq_unfreeze_queue(q);
 
 }
 EXPORT_SYMBOL(del_gendisk);
-- 
2.31.1



* [PATCH 3/3] block: revert 8e141f9eb803 block: drain file system I/O on del_gendisk
  2022-01-16  4:18 [PATCH 0/3] block: don't drain file system I/O on del_gendisk Ming Lei
  2022-01-16  4:18 ` [PATCH 1/3] block: move freeing disk into queue's release handler Ming Lei
  2022-01-16  4:18 ` [PATCH 2/3] block: revert aec89dc5d421 block: keep q_usage_counter in atomic mode after del_gendisk Ming Lei
@ 2022-01-16  4:18 ` Ming Lei
  2022-01-17  8:13 ` [PATCH 0/3] block: don't " Christoph Hellwig
  3 siblings, 0 replies; 12+ messages in thread
From: Ming Lei @ 2022-01-16  4:18 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Ming Lei

Commit 8e141f9eb803 ("block: drain file system I/O on del_gendisk") was
added to address the issue that the disk can be released before
blk_cleanup_queue(), but it has several problems:

1) queue freezing can't drain FS I/O for bio-based drivers

2) it isn't easy to move the elevator/cgroup/throttle shutdown into
del_gendisk(), and q->disk can still be referenced in those code paths

3) the added GD_DEAD flag may not be observed reliably in
__bio_queue_enter(), because queue freezing does not imply an RCU grace
period.

We have moved freeing the disk and its block_device into the request
queue's release handler, and it is now safe to refer to q->disk in the
I/O path, so revert this commit.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-core.c      | 24 ++++++++++++------------
 block/blk.h           |  1 -
 block/genhd.c         | 23 -----------------------
 include/linux/genhd.h |  1 -
 4 files changed, 12 insertions(+), 37 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 97f8bc8d3a79..a16fb551ac03 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -50,6 +50,7 @@
 #include "blk-mq-sched.h"
 #include "blk-pm.h"
 #include "blk-throttle.h"
+#include "blk-rq-qos.h"
 
 struct dentry *blk_debugfs_root;
 
@@ -270,8 +271,10 @@ void blk_put_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_put_queue);
 
-void blk_queue_start_drain(struct request_queue *q)
+void blk_set_queue_dying(struct request_queue *q)
 {
+	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
+
 	/*
 	 * When queue DYING flag is set, we need to block new req
 	 * entering queue, so we call blk_freeze_queue_start() to
@@ -283,12 +286,6 @@ void blk_queue_start_drain(struct request_queue *q)
 	/* Make blk_queue_enter() reexamine the DYING flag. */
 	wake_up_all(&q->mq_freeze_wq);
 }
-
-void blk_set_queue_dying(struct request_queue *q)
-{
-	blk_queue_flag_set(QUEUE_FLAG_DYING, q);
-	blk_queue_start_drain(q);
-}
 EXPORT_SYMBOL_GPL(blk_set_queue_dying);
 
 /**
@@ -320,8 +317,13 @@ void blk_cleanup_queue(struct request_queue *q)
 	 */
 	blk_freeze_queue(q);
 
+	rq_qos_exit(q);
+
 	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
 
+	/* for synchronous bio-based driver finish in-flight integrity i/o */
+	blk_flush_integrity();
+
 	blk_sync_queue(q);
 	if (queue_is_mq(q)) {
 		blk_mq_cancel_work_sync(q);
@@ -383,10 +385,8 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
 int __bio_queue_enter(struct request_queue *q, struct bio *bio)
 {
 	while (!blk_try_enter_queue(q, false)) {
-		struct gendisk *disk = bio->bi_bdev->bd_disk;
-
 		if (bio->bi_opf & REQ_NOWAIT) {
-			if (test_bit(GD_DEAD, &disk->state))
+			if (blk_queue_dying(q))
 				goto dead;
 			bio_wouldblock_error(bio);
 			return -EBUSY;
@@ -403,8 +403,8 @@ int __bio_queue_enter(struct request_queue *q, struct bio *bio)
 		wait_event(q->mq_freeze_wq,
 			   (!q->mq_freeze_depth &&
 			    blk_pm_resume_queue(false, q)) ||
-			   test_bit(GD_DEAD, &disk->state));
-		if (test_bit(GD_DEAD, &disk->state))
+			   blk_queue_dying(q));
+		if (blk_queue_dying(q))
 			goto dead;
 	}
 
diff --git a/block/blk.h b/block/blk.h
index 9ee7ab1c5572..4293c8e0a082 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -43,7 +43,6 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 void blk_free_flush_queue(struct blk_flush_queue *q);
 
 void blk_freeze_queue(struct request_queue *q);
-void blk_queue_start_drain(struct request_queue *q);
 int __bio_queue_enter(struct request_queue *q, struct bio *bio);
 bool submit_bio_checks(struct bio *bio);
 
diff --git a/block/genhd.c b/block/genhd.c
index 9842371904d6..f7577dde18fc 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -29,7 +29,6 @@
 
 #include "blk.h"
 #include "blk-mq-sched.h"
-#include "blk-rq-qos.h"
 
 static struct kobject *block_depr;
 
@@ -569,8 +568,6 @@ EXPORT_SYMBOL(device_add_disk);
  */
 void del_gendisk(struct gendisk *disk)
 {
-	struct request_queue *q = disk->queue;
-
 	might_sleep();
 
 	if (WARN_ON_ONCE(!disk_live(disk) && !(disk->flags & GENHD_FL_HIDDEN)))
@@ -587,17 +584,8 @@ void del_gendisk(struct gendisk *disk)
 	fsync_bdev(disk->part0);
 	__invalidate_device(disk->part0, true);
 
-	/*
-	 * Fail any new I/O.
-	 */
-	set_bit(GD_DEAD, &disk->state);
 	set_capacity(disk, 0);
 
-	/*
-	 * Prevent new I/O from crossing bio_queue_enter().
-	 */
-	blk_queue_start_drain(q);
-
 	if (!(disk->flags & GENHD_FL_HIDDEN)) {
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
 
@@ -619,17 +607,6 @@ void del_gendisk(struct gendisk *disk)
 		sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
 	pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
 	device_del(disk_to_dev(disk));
-
-	blk_mq_freeze_queue_wait(q);
-
-	rq_qos_exit(q);
-	blk_sync_queue(q);
-	blk_flush_integrity();
-	/*
-	 * Allow using passthrough request again after the queue is torn down.
-	 */
-	blk_mq_unfreeze_queue(q);
-
 }
 EXPORT_SYMBOL(del_gendisk);
 
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 6906a45bc761..3e9f234495e4 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -108,7 +108,6 @@ struct gendisk {
 	unsigned long state;
 #define GD_NEED_PART_SCAN		0
 #define GD_READ_ONLY			1
-#define GD_DEAD				2
 #define GD_NATIVE_CAPACITY		3
 
 	struct mutex open_mutex;	/* open/close mutex */
-- 
2.31.1



* Re: [PATCH 0/3] block: don't drain file system I/O on del_gendisk
  2022-01-16  4:18 [PATCH 0/3] block: don't drain file system I/O on del_gendisk Ming Lei
                   ` (2 preceding siblings ...)
  2022-01-16  4:18 ` [PATCH 3/3] block: revert 8e141f9eb803 block: drain file system I/O on del_gendisk Ming Lei
@ 2022-01-17  8:13 ` Christoph Hellwig
  2022-01-17  9:08   ` Ming Lei
  3 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-01-17  8:13 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Christoph Hellwig

On Sun, Jan 16, 2022 at 12:18:12PM +0800, Ming Lei wrote:
> Hello,
> 
> Draining FS I/O on del_gendisk() is added for just avoiding to refer to
> recently added q->disk in IO path, and it isn't actually needed.

We need it to have proper lifetimes in the block layer.  Everything only
needed for file system I/O and not blk-mq specific should slowly move
from the request_queue to the gendisk, and I have patches going in
that direction.  In the end only the SCSI discovery code and the case
of /dev/sg without a SCSI ULP will ever do passthrough I/O purely on the
gendisk.

So I think this series is moving in the wrong direction.  If you care
about not doing two freeze cycles, the right thing to do is to record
whether we ever did non-disk based passthrough I/O on a request_queue and,
if not, simplify the request_queue cleanup.  Doing this is on my TODO
list but I haven't looked into the details yet.

> 1) queue freezing can't drain FS I/O for bio based driver

This is something I've started looking into.

> 2) it isn't easy to move elevator/cgroup/throttle shutdown during
> del_gendisk, and q->disk can still be referred in these code paths

I've also done some prep work that should land this cycle, as all that
code is only used for FS I/O.


* Re: [PATCH 0/3] block: don't drain file system I/O on del_gendisk
  2022-01-17  8:13 ` [PATCH 0/3] block: don't " Christoph Hellwig
@ 2022-01-17  9:08   ` Ming Lei
  2022-01-18  8:25     ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2022-01-17  9:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Mon, Jan 17, 2022 at 09:13:21AM +0100, Christoph Hellwig wrote:
> On Sun, Jan 16, 2022 at 12:18:12PM +0800, Ming Lei wrote:
> > Hello,
> > 
> > Draining FS I/O on del_gendisk() is added for just avoiding to refer to
> > recently added q->disk in IO path, and it isn't actually needed.
> 
> We need it to have proper life times in the block layer.  Everything only
> needed for file system I/O and not blk-mq specific should slowly move
> from the request_queue to the gendisk and I have patches going in
> that direction.  In the end only the SCSI discovery code and the case
> of /dev/sg without SCSI ULP will ever do passthrough I/O purely on the
> gendisk.
> 
> So I think this series is moving in the wrong direction.  If you care
> about no doing two freeze cycles the right thing to do is to record

I just think that the extra draining point in del_gendisk() isn't useful;
can you share any use case for this change?

> if we ever did non-disk based passthrough I/O on a requeue_queue and
> if not simplify the request_queue cleanup.  Doing this is on my TODO
> list but I haven't look into the details yet.
> 
> > 1) queue freezing can't drain FS I/O for bio based driver
> 
> This is something I've started looking into it.

But that is one big problem, and I'm not sure you can solve it in a short
time. I'm also not sure it is useful, because the FS already guarantees
that every I/O is drained before the disk is released, and I/Os in the
submitting task are drained when the task exits.

> 
> > 2) it isn't easy to move elevator/cgroup/throttle shutdown during
> > del_gendisk, and q->disk can still be referred in these code paths
> 
> I've also done some prep work to land this cycle here, as all that
> code is only used for FS I/O.

IMO,

Firstly, the FS layer has already guaranteed that every FS I/O is done
before the disk is released, so there is no need to take so much effort,
and make the code more fragile, just to add one extra FS I/O draining
point in del_gendisk().

Also, the above two things aren't trivial to solve in a short time, so
can we delay the FS draining in del_gendisk() until both are done?

Thanks,
Ming



* Re: [PATCH 1/3] block: move freeing disk into queue's release handler
  2022-01-16  4:18 ` [PATCH 1/3] block: move freeing disk into queue's release handler Ming Lei
@ 2022-01-18  8:22   ` Christoph Hellwig
  2022-01-18 15:47     ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-01-18  8:22 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block, Christoph Hellwig

How does this work for SCSI where we can detach the disk from the
request queue, reattach it and then maybe later free them both?


* Re: [PATCH 0/3] block: don't drain file system I/O on del_gendisk
  2022-01-17  9:08   ` Ming Lei
@ 2022-01-18  8:25     ` Christoph Hellwig
  2022-01-19  9:02       ` Ming Lei
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-01-18  8:25 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, linux-block

On Mon, Jan 17, 2022 at 05:08:23PM +0800, Ming Lei wrote:
> > We need it to have proper life times in the block layer.  Everything only
> > needed for file system I/O and not blk-mq specific should slowly move
> > from the request_queue to the gendisk and I have patches going in
> > that direction.  In the end only the SCSI discovery code and the case
> > of /dev/sg without SCSI ULP will ever do passthrough I/O purely on the
> > gendisk.
> > 
> > So I think this series is moving in the wrong direction.  If you care
> > about no doing two freeze cycles the right thing to do is to record
> 
> I just think that the extra draining point in del_gendisk() isn't useful,
> can you share any use case with this change?

SCSI disk detach, for example, is a place where we need it.

> > if we ever did non-disk based passthrough I/O on a requeue_queue and
> > if not simplify the request_queue cleanup.  Doing this is on my TODO
> > list but I haven't look into the details yet.
> > 
> > > 1) queue freezing can't drain FS I/O for bio based driver
> > 
> > This is something I've started looking into it.
> 
> But that is one big problem, not sure you can solve it in short time,
> also not sure if it is useful, cause FS already guaranteed that every
> IO is drained before releasing disk, or IOs in the submission task are
> drained when exiting the task.

Think of a hot unplug.  The device gets a removal event, but the file
system still lives on.

> Firstly, FS layer has already guaranteed that every FS IO is done before
> releasing disk, so no need to take so much effort and make code more
> fragile to add one extra FS IO draining point in del_gendisk().

In the hot removal case the file system is still alive when del_gendisk
is called.

> Also the above two things aren't trivial enough to solve in short time, so
> can we delay the FS draining in del_gendisk() until the two are done?

We already have the draining.  What are you trying to fix by removing it?


* Re: [PATCH 1/3] block: move freeing disk into queue's release handler
  2022-01-18  8:22   ` Christoph Hellwig
@ 2022-01-18 15:47     ` Ming Lei
  2022-01-18 15:59       ` Ming Lei
  2022-01-18 16:19       ` Christoph Hellwig
  0 siblings, 2 replies; 12+ messages in thread
From: Ming Lei @ 2022-01-18 15:47 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Tue, Jan 18, 2022 at 09:22:59AM +0100, Christoph Hellwig wrote:
> How does this work for SCSI where we can detach the disk from the
> request queue, reattach it and then maybe later free them both?

Commit 8e141f9eb803 ("block: drain file system I/O on del_gendisk") has
marked the queue as dying, so how can the above case work, given that no
code clears the queue's dying flag?

Can you share the steps for this test case? It shouldn't be hard to
extend this patch to support it.

Thanks, 
Ming



* Re: [PATCH 1/3] block: move freeing disk into queue's release handler
  2022-01-18 15:47     ` Ming Lei
@ 2022-01-18 15:59       ` Ming Lei
  2022-01-18 16:19       ` Christoph Hellwig
  1 sibling, 0 replies; 12+ messages in thread
From: Ming Lei @ 2022-01-18 15:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Tue, Jan 18, 2022 at 11:47:33PM +0800, Ming Lei wrote:
> On Tue, Jan 18, 2022 at 09:22:59AM +0100, Christoph Hellwig wrote:
> > How does this work for SCSI where we can detach the disk from the
> > request queue, reattach it and then maybe later free them both?
> 
> Commit 8e141f9eb803 ("block: drain file system I/O on del_gendisk") has
> marked queue as dying, so how can the above case work given no any code
> clears the queue's dying flag?

Oops, my fault: blk_queue_start_drain() actually doesn't set
QUEUE_FLAG_DYING, so the reattach case works with commit 8e141f9eb803.


Thanks,
Ming



* Re: [PATCH 1/3] block: move freeing disk into queue's release handler
  2022-01-18 15:47     ` Ming Lei
  2022-01-18 15:59       ` Ming Lei
@ 2022-01-18 16:19       ` Christoph Hellwig
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2022-01-18 16:19 UTC (permalink / raw)
  To: Ming Lei; +Cc: Christoph Hellwig, Jens Axboe, linux-block

On Tue, Jan 18, 2022 at 11:47:33PM +0800, Ming Lei wrote:
> Can you share steps for this test case? And it shouldn't hard to extend this
> patch for supporting it.

Just use the sysfs unbind/bind files to detach and attach the ULP.


* Re: [PATCH 0/3] block: don't drain file system I/O on del_gendisk
  2022-01-18  8:25     ` Christoph Hellwig
@ 2022-01-19  9:02       ` Ming Lei
  0 siblings, 0 replies; 12+ messages in thread
From: Ming Lei @ 2022-01-19  9:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, linux-block

On Tue, Jan 18, 2022 at 09:25:08AM +0100, Christoph Hellwig wrote:
> On Mon, Jan 17, 2022 at 05:08:23PM +0800, Ming Lei wrote:
> > > We need it to have proper life times in the block layer.  Everything only
> > > needed for file system I/O and not blk-mq specific should slowly move
> > > from the request_queue to the gendisk and I have patches going in
> > > that direction.  In the end only the SCSI discovery code and the case
> > > of /dev/sg without SCSI ULP will ever do passthrough I/O purely on the
> > > gendisk.
> > > 
> > > So I think this series is moving in the wrong direction.  If you care
> > > about no doing two freeze cycles the right thing to do is to record
> > 
> > I just think that the extra draining point in del_gendisk() isn't useful,
> > can you share any use case with this change?
> 
> SCSI disk detach for example is a place where we need it.

SCSI disk detach doesn't need it: commit 8e141f9eb803 ("block: drain file
system I/O on del_gendisk") was only added in Sept. 2021 to fix q->disk,
while sd detach has been there for dozens of years.

del_gendisk() starts to prevent new I/O from being submitted to the
queue, and syncs and flushes dirty pages, but other in-flight I/Os are
still fine, since disk release will wait for all of them.

> 
> > > if we ever did non-disk based passthrough I/O on a requeue_queue and
> > > if not simplify the request_queue cleanup.  Doing this is on my TODO
> > > list but I haven't look into the details yet.
> > > 
> > > > 1) queue freezing can't drain FS I/O for bio based driver
> > > 
> > > This is something I've started looking into it.
> > 
> > But that is one big problem, not sure you can solve it in short time,
> > also not sure if it is useful, cause FS already guaranteed that every
> > IO is drained before releasing disk, or IOs in the submission task are
> > drained when exiting the task.
> 
> Think of a hot unplug.  The device gets a removal even, but the file
> system still lives on.

No need for the extra drain point; same as above.

> 
> > Firstly, FS layer has already guaranteed that every FS IO is done before
> > releasing disk, so no need to take so much effort and make code more
> > fragile to add one extra FS IO draining point in del_gendisk().
> 
> In the hot removal case the file system is still alive when del_gendisk
> is called.

Yeah, but draining I/O in del_gendisk() doesn't make a difference from
the FS viewpoint, does it?

> 
> > Also the above two things aren't trivial enough to solve in short time, so
> > can we delay the FS draining in del_gendisk() until the two are done?
> 
> We already have the draining.  What are you trying to fix by removing it?

It was just added months ago; that doesn't mean it is reasonable or necessary.

Now I am thinking of handling this in the following way:

1) move blk-cgroup and rq-qos allocation into disk allocation, and move
their destruction into the disk release handler; in the long term, both
can be moved into the gendisk, as you mentioned.

2) only run I/O accounting on passthrough requests from user space, such
as SG I/O and nvme ns I/O; one RQF_USER_IO flag may help to do that. For
other private kinds of commands, we needn't account them, and q->disk may
not be available for them.

3) once the above two are done, we can still remove the added I/O
draining in del_gendisk().


Thanks,
Ming


