* [PATCH 0/4] nvme: fix two kinds of IO hang from removing NSs
@ 2023-06-15 14:32 Ming Lei
  2023-06-15 14:32 ` [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Ming Lei @ 2023-06-15 14:32 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme
  Cc: Yi Zhang, linux-block, Chunguang Xu, Ming Lei

Hello,

The first three patches fix an IO hang that occurs when controller removal
interrupts error recovery and leaves the queues frozen.

The fourth patch fixes an IO hang that occurs when the controller is left
quiesced.
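The series deals with two distinct blocking states, which can be sketched as a userspace toy model (not kernel code; the struct and function names here are invented for illustration): "frozen" blocks new IO from entering a queue, while "quiesced" stops already-queued IO from being dispatched.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model (not the kernel implementation) of the two
 * blocking states this series addresses. Names are invented. */
struct mock_ns_queue {
	bool frozen;	/* patches 1-3: cleared by a forced unfreeze */
	bool quiesced;	/* patch 4: cleared by unquiescing on removal */
};

/* New IO can enter the queue only when it is not frozen. */
bool mock_can_enter(const struct mock_ns_queue *q)
{
	return !q->frozen;
}

/* Queued IO is dispatched to the driver only when not quiesced. */
bool mock_can_dispatch(const struct mock_ns_queue *q)
{
	return !q->quiesced;
}
```

Interrupted error recovery can leave both states set; the series clears the frozen state first (so pending writeback can enter) and then the quiesced state (so it can actually run).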

Ming Lei (4):
  blk-mq: add API of blk_mq_unfreeze_queue_force
  nvme: add nvme_unfreeze_force()
  nvme: unfreeze queues before removing namespaces
  nvme: unquiesce io queues when removing namespaces

 block/blk-mq.c           | 25 ++++++++++++++++++++++---
 block/blk.h              |  3 ++-
 block/genhd.c            |  2 +-
 drivers/nvme/host/core.c | 34 ++++++++++++++++++++++++++++------
 drivers/nvme/host/nvme.h |  1 +
 include/linux/blk-mq.h   |  1 +
 6 files changed, 55 insertions(+), 11 deletions(-)

-- 
2.40.1



* [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force
  2023-06-15 14:32 [PATCH 0/4] nvme: fix two kinds of IO hang from removing NSs Ming Lei
@ 2023-06-15 14:32 ` Ming Lei
  2023-06-15 15:16   ` Keith Busch
  2023-06-15 14:32 ` [PATCH 2/4] nvme: add nvme_unfreeze_force() Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2023-06-15 14:32 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme
  Cc: Yi Zhang, linux-block, Chunguang Xu, Ming Lei

NVMe calls freeze/unfreeze in different contexts, and controller removal
may interrupt in-progress error recovery, leaving the queues in a frozen
state. This causes an IO hang in del_gendisk() because pending writeback
IOs are still waiting in bio_queue_enter().

Prepare for fixing this issue by adding blk_mq_unfreeze_queue_force(),
to be called when removing the device.
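The accounting change can be sketched as a userspace toy model (not kernel code; names like mock_queue are invented for illustration, and the real blk-mq pairs mq_freeze_depth with a percpu reference count under mq_freeze_lock):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the freeze-depth accounting this patch changes. */
struct mock_queue {
	int mq_freeze_depth;	/* > 0 means frozen: new IO waits to enter */
};

void mock_freeze(struct mock_queue *q)
{
	q->mq_freeze_depth++;
}

/* With force set, all nested freezes are dropped at once, mirroring
 * how blk_mq_unfreeze_queue_force() resets the depth to zero. */
void mock_unfreeze(struct mock_queue *q, bool force)
{
	if (force)
		q->mq_freeze_depth = 0;
	else
		q->mq_freeze_depth--;
}
```

Under balanced accounting, a freeze whose matching unfreeze is lost (because removal interrupted error recovery) keeps the depth above zero forever, so waiters in bio_queue_enter() never proceed; the forced variant clears the depth instead.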

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c         | 25 ++++++++++++++++++++++---
 block/blk.h            |  3 ++-
 block/genhd.c          |  2 +-
 include/linux/blk-mq.h |  1 +
 4 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 24dc8fe0a9d2..6ac58dc9e648 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -185,12 +185,16 @@ void blk_mq_freeze_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
 
-void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic)
+void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic,
+		bool force)
 {
 	mutex_lock(&q->mq_freeze_lock);
 	if (force_atomic)
 		q->q_usage_counter.data->force_atomic = true;
-	q->mq_freeze_depth--;
+	if (force)
+		q->mq_freeze_depth = 0;
+	else
+		q->mq_freeze_depth--;
 	WARN_ON_ONCE(q->mq_freeze_depth < 0);
 	if (!q->mq_freeze_depth) {
 		percpu_ref_resurrect(&q->q_usage_counter);
@@ -201,10 +205,25 @@ void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic)
 
 void blk_mq_unfreeze_queue(struct request_queue *q)
 {
-	__blk_mq_unfreeze_queue(q, false);
+	__blk_mq_unfreeze_queue(q, false, false);
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
 
+/*
+ * Forcibly unfreeze a queue.
+ *
+ * Be careful: this API should only be used to avoid an IO hang in
+ * bio_queue_enter() when removing a disk that needs to drain pending
+ * writeback IO.
+ *
+ * Please don't use it for other cases.
+ */
+void blk_mq_unfreeze_queue_force(struct request_queue *q)
+{
+	__blk_mq_unfreeze_queue(q, false, true);
+}
+EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue_force);
+
 /*
  * FIXME: replace the scsi_internal_device_*block_nowait() calls in the
  * mpt3sas driver such that this function can be removed.
diff --git a/block/blk.h b/block/blk.h
index 768852a84fef..5c9f99051837 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -33,7 +33,8 @@ struct blk_flush_queue *blk_alloc_flush_queue(int node, int cmd_size,
 void blk_free_flush_queue(struct blk_flush_queue *q);
 
 void blk_freeze_queue(struct request_queue *q);
-void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic);
+void __blk_mq_unfreeze_queue(struct request_queue *q, bool force_atomic, bool
+		force);
 void blk_queue_start_drain(struct request_queue *q);
 int __bio_queue_enter(struct request_queue *q, struct bio *bio);
 void submit_bio_noacct_nocheck(struct bio *bio);
diff --git a/block/genhd.c b/block/genhd.c
index f71f82991434..184aa968b453 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -708,7 +708,7 @@ void del_gendisk(struct gendisk *disk)
 	 */
 	if (!test_bit(GD_OWNS_QUEUE, &disk->state)) {
 		blk_queue_flag_clear(QUEUE_FLAG_INIT_DONE, q);
-		__blk_mq_unfreeze_queue(q, true);
+		__blk_mq_unfreeze_queue(q, true, false);
 	} else {
 		if (queue_is_mq(q))
 			blk_mq_exit_queue(q);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f401067ac03a..fa265e85d753 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -890,6 +890,7 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 void blk_mq_tagset_wait_completed_request(struct blk_mq_tag_set *tagset);
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
+void blk_mq_unfreeze_queue_force(struct request_queue *q);
 void blk_freeze_queue_start(struct request_queue *q);
 void blk_mq_freeze_queue_wait(struct request_queue *q);
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
-- 
2.40.1



* [PATCH 2/4] nvme: add nvme_unfreeze_force()
  2023-06-15 14:32 [PATCH 0/4] nvme: fix two kinds of IO hang from removing NSs Ming Lei
  2023-06-15 14:32 ` [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force Ming Lei
@ 2023-06-15 14:32 ` Ming Lei
  2023-06-15 14:32 ` [PATCH 3/4] nvme: unfreeze queues before removing namespaces Ming Lei
  2023-06-15 14:32 ` [PATCH 4/4] nvme: unquiesce io queues when " Ming Lei
  3 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2023-06-15 14:32 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme
  Cc: Yi Zhang, linux-block, Chunguang Xu, Ming Lei

Add nvme_unfreeze_force() to fix an IO hang when namespaces are removed
while error recovery is in progress.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 21 ++++++++++++++++++---
 drivers/nvme/host/nvme.h |  1 +
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c3d72fc677f7..96785913845b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -5220,17 +5220,32 @@ void nvme_mark_namespaces_dead(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_mark_namespaces_dead);
 
-void nvme_unfreeze(struct nvme_ctrl *ctrl)
+static void __nvme_unfreeze(struct nvme_ctrl *ctrl, bool force)
 {
 	struct nvme_ns *ns;
 
 	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list)
-		blk_mq_unfreeze_queue(ns->queue);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (force)
+			blk_mq_unfreeze_queue_force(ns->queue);
+		else
+			blk_mq_unfreeze_queue(ns->queue);
+	}
 	up_read(&ctrl->namespaces_rwsem);
 }
+
+void nvme_unfreeze(struct nvme_ctrl *ctrl)
+{
+	__nvme_unfreeze(ctrl, false);
+}
 EXPORT_SYMBOL_GPL(nvme_unfreeze);
 
+void nvme_unfreeze_force(struct nvme_ctrl *ctrl)
+{
+	__nvme_unfreeze(ctrl, true);
+}
+EXPORT_SYMBOL_GPL(nvme_unfreeze_force);
+
 int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 953e59f56139..33b740808e7b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -764,6 +764,7 @@ void nvme_mark_namespaces_dead(struct nvme_ctrl *ctrl);
 void nvme_sync_queues(struct nvme_ctrl *ctrl);
 void nvme_sync_io_queues(struct nvme_ctrl *ctrl);
 void nvme_unfreeze(struct nvme_ctrl *ctrl);
+void nvme_unfreeze_force(struct nvme_ctrl *ctrl);
 void nvme_wait_freeze(struct nvme_ctrl *ctrl);
 int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
 void nvme_start_freeze(struct nvme_ctrl *ctrl);
-- 
2.40.1



* [PATCH 3/4] nvme: unfreeze queues before removing namespaces
  2023-06-15 14:32 [PATCH 0/4] nvme: fix two kinds of IO hang from removing NSs Ming Lei
  2023-06-15 14:32 ` [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force Ming Lei
  2023-06-15 14:32 ` [PATCH 2/4] nvme: add nvme_unfreeze_force() Ming Lei
@ 2023-06-15 14:32 ` Ming Lei
  2023-06-15 14:32 ` [PATCH 4/4] nvme: unquiesce io queues when " Ming Lei
  3 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2023-06-15 14:32 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme
  Cc: Yi Zhang, linux-block, Chunguang Xu, Ming Lei

If removal interrupts error recovery, queues may be left frozen with
IOs pending in bio_queue_enter(), and the following del_gendisk() may
wait for these IOs, especially those from writeback.

A similar IO hang exists when flushing scan work if there are IOs
pending in the scan work context.

Fix this kind of issue by unfreezing the queues before removing
namespaces, so that all pending IOs can be handled.

Reported-by: Chunguang Xu <brookxu.cn@gmail.com>
Closes: https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@shopee.com/
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 96785913845b..ec7bd33b7e5f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4645,6 +4645,9 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	 */
 	nvme_mpath_clear_ctrl_paths(ctrl);
 
+	/* unfreeze queues which may be frozen from error recovery */
+	nvme_unfreeze_force(ctrl);
+
 	/* prevent racing with ns scanning */
 	flush_work(&ctrl->scan_work);
 
-- 
2.40.1



* [PATCH 4/4] nvme: unquiesce io queues when removing namespaces
  2023-06-15 14:32 [PATCH 0/4] nvme: fix two kinds of IO hang from removing NSs Ming Lei
                   ` (2 preceding siblings ...)
  2023-06-15 14:32 ` [PATCH 3/4] nvme: unfreeze queues before removing namespaces Ming Lei
@ 2023-06-15 14:32 ` Ming Lei
  3 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2023-06-15 14:32 UTC (permalink / raw)
  To: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme
  Cc: Yi Zhang, linux-block, Chunguang Xu, Ming Lei

Error recovery can be interrupted by controller removal, leaving the
controller quiesced and causing an IO hang.

Fix the issue by unconditionally unquiescing the controller when
removing namespaces.

Reported-by: Chunguang Xu <brookxu.cn@gmail.com>
Closes: https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@shopee.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/core.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ec7bd33b7e5f..6d58b30ea835 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4648,6 +4648,12 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	/* unfreeze queues which may be frozen from error recovery */
 	nvme_unfreeze_force(ctrl);
 
+	/*
+	 * Unquiesce io queues so any pending IO won't hang, especially
+	 * those submitted from scan work
+	 */
+	nvme_unquiesce_io_queues(ctrl);
+
 	/* prevent racing with ns scanning */
 	flush_work(&ctrl->scan_work);
 
@@ -4657,10 +4663,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	 * removing the namespaces' disks; fail all the queues now to avoid
 	 * potentially having to clean up the failed sync later.
 	 */
-	if (ctrl->state == NVME_CTRL_DEAD) {
+	if (ctrl->state == NVME_CTRL_DEAD)
 		nvme_mark_namespaces_dead(ctrl);
-		nvme_unquiesce_io_queues(ctrl);
-	}
 
 	/* this is a no-op when called from the controller reset handler */
 	nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING_NOIO);
-- 
2.40.1



* Re: [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force
  2023-06-15 14:32 ` [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force Ming Lei
@ 2023-06-15 15:16   ` Keith Busch
  2023-06-15 15:43     ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Keith Busch @ 2023-06-15 15:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Yi Zhang, linux-block, Chunguang Xu

On Thu, Jun 15, 2023 at 10:32:33PM +0800, Ming Lei wrote:
> NVMe calls freeze/unfreeze in different contexts, and controller removal
> may interrupt in-progress error recovery, leaving the queues in a frozen
> state. This causes an IO hang in del_gendisk() because pending writeback
> IOs are still waiting in bio_queue_enter().

Shouldn't those writebacks be unblocked by the existing check in
bio_queue_enter, test_bit(GD_DEAD, &disk->state))? Or are we missing a
disk state update or wakeup on this condition?


* Re: [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force
  2023-06-15 15:16   ` Keith Busch
@ 2023-06-15 15:43     ` Ming Lei
  2023-06-16  5:48       ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2023-06-15 15:43 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme,
	Yi Zhang, linux-block, Chunguang Xu

On Thu, Jun 15, 2023 at 09:16:27AM -0600, Keith Busch wrote:
> On Thu, Jun 15, 2023 at 10:32:33PM +0800, Ming Lei wrote:
> > NVMe calls freeze/unfreeze in different contexts, and controller removal
> > may interrupt in-progress error recovery, leaving the queues in a frozen
> > state. This causes an IO hang in del_gendisk() because pending writeback
> > IOs are still waiting in bio_queue_enter().
> 
> Shouldn't those writebacks be unblocked by the existing check in
> bio_queue_enter, test_bit(GD_DEAD, &disk->state))? Or are we missing a
> disk state update or wakeup on this condition?

GD_DEAD is only set if the device is really dead, then all pending IO
will be failed.

We need to try to handle these IOs first, if the device hasn't been
marked dead via blk_mark_disk_dead().
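The distinction can be sketched as a simplified userspace model (not the kernel implementation; struct and function names are invented for illustration) of when a waiter in bio_queue_enter() stops blocking:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a waiter is released when the queue is unfrozen or the
 * disk is marked dead (GD_DEAD); only the former lets IO complete. */
struct mock_disk {
	int  freeze_depth;
	bool gd_dead;
};

/* A pending bio stops waiting if either condition holds... */
bool mock_may_proceed(const struct mock_disk *d)
{
	return d->freeze_depth == 0 || d->gd_dead;
}

/* ...but it only succeeds when the queue was actually unfrozen. */
bool mock_io_succeeds(const struct mock_disk *d)
{
	return d->freeze_depth == 0 && !d->gd_dead;
}
```

Marking the disk dead unblocks the waiters but fails their IO, whereas unfreezing lets the pending writeback complete, which is the point of handling these IOs before giving up on the device.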

Thanks,
Ming



* Re: [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force
  2023-06-15 15:43     ` Ming Lei
@ 2023-06-16  5:48       ` Christoph Hellwig
  2023-06-16  7:20         ` Ming Lei
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2023-06-16  5:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
	linux-nvme, Yi Zhang, linux-block, Chunguang Xu

On Thu, Jun 15, 2023 at 11:43:51PM +0800, Ming Lei wrote:
> On Thu, Jun 15, 2023 at 09:16:27AM -0600, Keith Busch wrote:
> > On Thu, Jun 15, 2023 at 10:32:33PM +0800, Ming Lei wrote:
> > > NVMe calls freeze/unfreeze in different contexts, and controller removal
> > > may interrupt in-progress error recovery, leaving the queues in a frozen
> > > state. This causes an IO hang in del_gendisk() because pending writeback
> > > IOs are still waiting in bio_queue_enter().
> > 
> > Shouldn't those writebacks be unblocked by the existing check in
> > bio_queue_enter, test_bit(GD_DEAD, &disk->state))? Or are we missing a
> > disk state update or wakeup on this condition?
> 
> GD_DEAD is only set if the device is really dead, then all pending IO
> will be failed.

del_gendisk also sets GD_DEAD early on.


* Re: [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force
  2023-06-16  5:48       ` Christoph Hellwig
@ 2023-06-16  7:20         ` Ming Lei
  2023-06-16  7:27           ` Christoph Hellwig
  2023-06-20  5:41           ` Christoph Hellwig
  0 siblings, 2 replies; 14+ messages in thread
From: Ming Lei @ 2023-06-16  7:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, Yi Zhang,
	linux-block, Chunguang Xu

On Fri, Jun 16, 2023 at 07:48:00AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 15, 2023 at 11:43:51PM +0800, Ming Lei wrote:
> > On Thu, Jun 15, 2023 at 09:16:27AM -0600, Keith Busch wrote:
> > > On Thu, Jun 15, 2023 at 10:32:33PM +0800, Ming Lei wrote:
> > > > NVMe calls freeze/unfreeze in different contexts, and controller removal
> > > > may interrupt in-progress error recovery, leaving the queues in a frozen
> > > > state. This causes an IO hang in del_gendisk() because pending writeback
> > > > IOs are still waiting in bio_queue_enter().
> > > 
> > > Shouldn't those writebacks be unblocked by the existing check in
> > > bio_queue_enter, test_bit(GD_DEAD, &disk->state))? Or are we missing a
> > > disk state update or wakeup on this condition?
> > 
> > GD_DEAD is only set if the device is really dead, then all pending IO
> > will be failed.
> 
> del_gendisk also sets GD_DEAD early on.

No.

The hang happens in fsync_bdev() of del_gendisk(), and there are IOs pending on
bio_queue_enter().

Thanks,
Ming



* Re: [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force
  2023-06-16  7:20         ` Ming Lei
@ 2023-06-16  7:27           ` Christoph Hellwig
  2023-06-16  7:33             ` Ming Lei
  2023-06-20  5:41           ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2023-06-16  7:27 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Keith Busch, Jens Axboe, Sagi Grimberg,
	linux-nvme, Yi Zhang, linux-block, Chunguang Xu

On Fri, Jun 16, 2023 at 03:20:38PM +0800, Ming Lei wrote:
> > > > Shouldn't those writebacks be unblocked by the existing check in
> > > > bio_queue_enter, test_bit(GD_DEAD, &disk->state))? Or are we missing a
> > > > disk state update or wakeup on this condition?
> > > 
> > > GD_DEAD is only set if the device is really dead, then all pending IO
> > > will be failed.
> > 
> > del_gendisk also sets GD_DEAD early on.
> 
> No.
> 
> The hang happens in fsync_bdev() of del_gendisk(), and there are IOs pending on
> bio_queue_enter().

What is the workload here?  If del_gendisk is called to remove a disk
that is in a perfectly fine state and can do I/O, fsync_bdev should write
back data, which is what it exists for.  If the disk is dead, we should
have called blk_mark_disk_dead before.


* Re: [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force
  2023-06-16  7:27           ` Christoph Hellwig
@ 2023-06-16  7:33             ` Ming Lei
  0 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2023-06-16  7:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, Yi Zhang,
	linux-block, Chunguang Xu

On Fri, Jun 16, 2023 at 09:27:21AM +0200, Christoph Hellwig wrote:
> On Fri, Jun 16, 2023 at 03:20:38PM +0800, Ming Lei wrote:
> > > > > Shouldn't those writebacks be unblocked by the existing check in
> > > > > bio_queue_enter, test_bit(GD_DEAD, &disk->state))? Or are we missing a
> > > > > disk state update or wakeup on this condition?
> > > > 
> > > > GD_DEAD is only set if the device is really dead, then all pending IO
> > > > will be failed.
> > > 
> > > del_gendisk also sets GD_DEAD early on.
> > 
> > No.
> > 
> > The hang happens in fsync_bdev() of del_gendisk(), and there are IOs pending on
> > bio_queue_enter().
> 
> What is the workload here?  If del_gendisk is called to remove a disk
> that is in perfectly fine state and can do I/O, fsync_bdev should write
> back data, which is what is exists for.  If the disk is dead, we should
> have called blk_mark_disk_dead before.

It is basically that removing the ctrl breaks in-progress error recovery,
leaving the queues quiesced and frozen.

https://lore.kernel.org/linux-nvme/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#u

https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@shopee.com/

Thanks, 
Ming



* Re: [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force
  2023-06-16  7:20         ` Ming Lei
  2023-06-16  7:27           ` Christoph Hellwig
@ 2023-06-20  5:41           ` Christoph Hellwig
  2023-06-20 10:01             ` Sagi Grimberg
  2023-06-20 13:15             ` Ming Lei
  1 sibling, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2023-06-20  5:41 UTC (permalink / raw)
  To: Ming Lei
  Cc: Christoph Hellwig, Keith Busch, Jens Axboe, Sagi Grimberg,
	linux-nvme, Yi Zhang, linux-block, Chunguang Xu

On Fri, Jun 16, 2023 at 03:20:38PM +0800, Ming Lei wrote:
> > > GD_DEAD is only set if the device is really dead, then all pending IO
> > > will be failed.
> > 
> > del_gendisk also sets GD_DEAD early on.
> 
> No.
> 
> The hang happens in fsync_bdev() of del_gendisk(), and there are IOs pending on
> bio_queue_enter().

IFF we know we can't do I/O by the time del_gendisk is called, we
need to call mark_disk_dead first and not paper over the problem.

An API that force unfreezes is just broken and will leave us with
freezecount mismatches.


* Re: [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force
  2023-06-20  5:41           ` Christoph Hellwig
@ 2023-06-20 10:01             ` Sagi Grimberg
  2023-06-20 13:15             ` Ming Lei
  1 sibling, 0 replies; 14+ messages in thread
From: Sagi Grimberg @ 2023-06-20 10:01 UTC (permalink / raw)
  To: Christoph Hellwig, Ming Lei
  Cc: Keith Busch, Jens Axboe, linux-nvme, Yi Zhang, linux-block, Chunguang Xu


>>>> GD_DEAD is only set if the device is really dead, then all pending IO
>>>> will be failed.
>>>
>>> del_gendisk also sets GD_DEAD early on.
>>
>> No.
>>
>> The hang happens in fsync_bdev() of del_gendisk(), and there are IOs pending on
>> bio_queue_enter().
> 
> IFF we know we can't do I/O by the time del_gendisk is called, we
> need to call mark_disk_dead first and not paper over the problem.
> 
> An API that force unfreezes is just broken and will leaves us with
> freezecount mismatches.

I absolutely agree here.


* Re: [PATCH 1/4] blk-mq: add API of blk_mq_unfreeze_queue_force
  2023-06-20  5:41           ` Christoph Hellwig
  2023-06-20 10:01             ` Sagi Grimberg
@ 2023-06-20 13:15             ` Ming Lei
  1 sibling, 0 replies; 14+ messages in thread
From: Ming Lei @ 2023-06-20 13:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme, Yi Zhang,
	linux-block, Chunguang Xu

On Tue, Jun 20, 2023 at 07:41:41AM +0200, Christoph Hellwig wrote:
> On Fri, Jun 16, 2023 at 03:20:38PM +0800, Ming Lei wrote:
> > > > GD_DEAD is only set if the device is really dead, then all pending IO
> > > > will be failed.
> > > 
> > > del_gendisk also sets GD_DEAD early on.
> > 
> > No.
> > 
> > The hang happens in fsync_bdev() of del_gendisk(), and there are IOs pending on
> > bio_queue_enter().
> 
> IFF we know we can't do I/O by the time del_gendisk is called, we
> need to call mark_disk_dead first and not paper over the problem.

In theory, device removal can happen at any time, when it isn't clear
whether the controller has recovered; that is why this API is added, to
avoid failing IO unnecessarily.

However, maybe it is fine to just mark the controller as dead when
removal breaks the current error recovery, given that the nvme driver's
error handling isn't very fine-grained, so how about something like
the following:

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c3d72fc677f7..120d98f348de 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -558,6 +558,7 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
        }

        if (changed) {
+               ctrl->old_state = ctrl->state;
                ctrl->state = new_state;
                wake_up_all(&ctrl->state_wq);
        }
@@ -4654,7 +4655,7 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
         * removing the namespaces' disks; fail all the queues now to avoid
         * potentially having to clean up the failed sync later.
         */
-       if (ctrl->state == NVME_CTRL_DEAD) {
+       if (ctrl->state == NVME_CTRL_DEAD || ctrl->old_state != NVME_CTRL_LIVE) {
                nvme_mark_namespaces_dead(ctrl);
                nvme_unquiesce_io_queues(ctrl);
        }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 953e59f56139..7da53cc76f11 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -246,8 +246,9 @@ enum nvme_ctrl_flags {

 struct nvme_ctrl {
        bool comp_seen;
-       enum nvme_ctrl_state state;
        bool identified;
+       enum nvme_ctrl_state old_state;
+       enum nvme_ctrl_state state;
        spinlock_t lock;
        struct mutex scan_lock;
        const struct nvme_ctrl_ops *ops;

> 
> An API that force unfreezes is just broken and will leaves us with
> freezecount mismatches.

The freezecount mismatch problem is actually in the nvme driver; please
see the previous patch[1] I posted.


[1] https://lore.kernel.org/linux-block/20230613005847.1762378-1-ming.lei@redhat.com/T/#m17ac1aa71056b6517f8aefbae58c301f296f0a73


Thanks,
Ming


