* [PATCHv2 1/2] blk-mq: Export blk_mq_freeze_queue_wait
From: Keith Busch @ 2017-02-24  0:36 UTC (permalink / raw)
  To: linux-block, linux-nvme, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Marc MERLIN, Keith Busch

Drivers can start a queue freeze, so this provides a way to wait until the queue is frozen.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.c         | 3 ++-
 include/linux/blk-mq.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9e6b064..d5b9a15 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -75,10 +75,11 @@ void blk_mq_freeze_queue_start(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
 
-static void blk_mq_freeze_queue_wait(struct request_queue *q)
+void blk_mq_freeze_queue_wait(struct request_queue *q)
 {
 	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
 }
+EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
 /*
  * Guarantee no request is in use, so we can change any data structure of
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 001d30d..8dacf68 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -245,6 +245,7 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
 void blk_mq_freeze_queue_start(struct request_queue *q);
+void blk_mq_freeze_queue_wait(struct request_queue *q);
 int blk_mq_reinit_tagset(struct blk_mq_tag_set *set);
 
 int blk_mq_map_queues(struct blk_mq_tag_set *set);
-- 
2.5.5
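
A minimal usage sketch for the newly exported helper (the function below is
hypothetical, not part of this series; it only shows how a driver would pair
it with the existing start/unfreeze calls):

#include <linux/blk-mq.h>

/* Hypothetical teardown path: block new submissions, drain, then reopen. */
static void example_drain_queue(struct request_queue *q)
{
	blk_mq_freeze_queue_start(q);	/* new requests can no longer enter */

	/* ... complete or fail whatever has already entered the queue ... */

	blk_mq_freeze_queue_wait(q);	/* returns once q_usage_counter is zero */
	blk_mq_unfreeze_queue(q);	/* allow submissions again */
}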


* [PATCHv2 2/2] nvme: Complete all stuck requests
From: Keith Busch @ 2017-02-24  0:36 UTC (permalink / raw)
  To: linux-block, linux-nvme, Jens Axboe, Christoph Hellwig, Sagi Grimberg
  Cc: Marc MERLIN, Keith Busch

If the block layer has entered requests and gets a CPU hot plug event
prior to the resume event, it will wait for those requests to exit. If
the nvme driver is shutting down, it will not start the queues back up,
preventing forward progress.

To fix that, this patch freezes the request queues when the driver intends
to shut down the controller so that no new requests may enter.  After the
controller has been disabled, the queues will be restarted to force all
entered requests to end in failure so that blk-mq's hot cpu notifier may
progress. To ensure the queue usage count is 0 on a shutdown, the driver
waits for freeze to complete before completing the controller shutdown.

On resume, the driver will unfreeze the queue for new requests to enter
once the hardware contexts are reinitialized.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
v1 -> v2:
  Simplified the freeze and waiting, using the new blk API.

 drivers/nvme/host/core.c | 33 +++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  3 +++
 drivers/nvme/host/pci.c  | 14 ++++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 25ec4e5..9e99b94 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2344,6 +2344,39 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
 
+void nvme_unfreeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_unfreeze_queue(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_unfreeze);
+
+void nvme_wait_freeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_freeze_queue_wait(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_wait_freeze);
+
+void nvme_start_freeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_freeze_queue_start(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_start_freeze);
+
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a3da1e9..62af901 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -294,6 +294,9 @@ void nvme_queue_async_events(struct nvme_ctrl *ctrl);
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
+void nvme_unfreeze(struct nvme_ctrl *ctrl);
+void nvme_wait_freeze(struct nvme_ctrl *ctrl);
+void nvme_start_freeze(struct nvme_ctrl *ctrl);
 
 #define NVME_QID_ANY -1
 struct request *nvme_alloc_request(struct request_queue *q,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 57a1af5..ce80cb5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1676,6 +1676,8 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	del_timer_sync(&dev->watchdog_timer);
 
 	mutex_lock(&dev->shutdown_lock);
+	if (shutdown)
+		nvme_start_freeze(&dev->ctrl);
 	if (pci_is_enabled(to_pci_dev(dev->dev))) {
 		nvme_stop_queues(&dev->ctrl);
 		csts = readl(dev->bar + NVME_REG_CSTS);
@@ -1700,6 +1702,16 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 
 	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
 	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
+
+	/*
+	 * The driver will not be starting up queues again if shutting down so
+	 * must flush all entered requests to their failed completion to avoid
+	 * deadlocking blk-mq hot-cpu notifier.
+	 */
+	if (shutdown) {
+		nvme_start_queues(&dev->ctrl);
+		nvme_wait_freeze(&dev->ctrl);
+	}
 	mutex_unlock(&dev->shutdown_lock);
 }
 
@@ -1823,6 +1835,8 @@ static void nvme_reset_work(struct work_struct *work)
 	} else {
 		nvme_start_queues(&dev->ctrl);
 		nvme_dev_add(dev);
+		if (was_suspended)
+			nvme_unfreeze(&dev->ctrl);
 	}
 
 	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE)) {
-- 
2.5.5


* Re: [PATCHv2 1/2] blk-mq: Export blk_mq_freeze_queue_wait
From: Christoph Hellwig @ 2017-02-24  6:49 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Sagi Grimberg, Marc MERLIN, linux-nvme, linux-block,
	Christoph Hellwig

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCHv2 2/2] nvme: Complete all stuck requests
From: Christoph Hellwig @ 2017-02-24  6:50 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Sagi Grimberg, Marc MERLIN, linux-nvme, linux-block,
	Christoph Hellwig

Thanks Keith,

this looks much simpler so that even I can understand it :)

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCHv2 1/2] blk-mq: Export blk_mq_freeze_queue_wait
From: Sagi Grimberg @ 2017-02-27 13:38 UTC (permalink / raw)
  To: Keith Busch, linux-block, linux-nvme, Jens Axboe, Christoph Hellwig
  Cc: Marc MERLIN

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


* Re: [PATCHv2 2/2] nvme: Complete all stuck requests
From: Sagi Grimberg @ 2017-02-27 13:46 UTC (permalink / raw)
  To: Keith Busch, linux-block, linux-nvme, Jens Axboe, Christoph Hellwig
  Cc: Marc MERLIN



On 24/02/17 02:36, Keith Busch wrote:
> If the block layer has entered requests and gets a CPU hot plug event
> prior to the resume event, it will wait for those requests to exit. If
> the nvme driver is shutting down, it will not start the queues back up,
> preventing forward progress.
>
> To fix that, this patch freezes the request queues when the driver intends
> to shut down the controller so that no new requests may enter.  After the
> controller has been disabled, the queues will be restarted to force all
> entered requests to end in failure so that blk-mq's hot cpu notifier may
> progress. To ensure the queue usage count is 0 on a shutdown, the driver
> waits for freeze to complete before completing the controller shutdown.

Keith, can you explain (again) why the freeze_wait must happen
after the controller has been disabled, instead of starting the queues
and waiting right after the freeze starts?


* Re: [PATCHv2 2/2] nvme: Complete all stuck requests
From: Keith Busch @ 2017-02-27 15:01 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-block, Jens Axboe, Marc MERLIN, Christoph Hellwig, linux-nvme

On Mon, Feb 27, 2017 at 03:46:09PM +0200, Sagi Grimberg wrote:
> On 24/02/17 02:36, Keith Busch wrote:
> > If the block layer has entered requests and gets a CPU hot plug event
> > prior to the resume event, it will wait for those requests to exit. If
> > the nvme driver is shutting down, it will not start the queues back up,
> > preventing forward progress.
> > 
> > To fix that, this patch freezes the request queues when the driver intends
> > to shut down the controller so that no new requests may enter.  After the
> > controller has been disabled, the queues will be restarted to force all
> > entered requests to end in failure so that blk-mq's hot cpu notifier may
> > progress. To ensure the queue usage count is 0 on a shutdown, the driver
> > waits for freeze to complete before completing the controller shutdown.
> 
> Keith, can you explain (again) why the freeze_wait must happen
> after the controller has been disabled, instead of starting the queues
> and waiting right after the freeze starts?

Yeah, the driver needs to make forward progress even if the controller
isn't functioning. If we do the freeze wait before disabling the
controller, there's no way to reclaim missing completions. If the
controller is working perfectly, it'd be okay, but the driver would be
stuck if there's a problem.
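
Condensed, the ordering patch 2/2 adds to nvme_dev_disable() is the sketch
below (abbreviated; the middle comment stands in for the existing controller
shutdown and request cancellation code):

	if (shutdown)
		nvme_start_freeze(&dev->ctrl);	/* no new requests may enter */

	nvme_stop_queues(&dev->ctrl);		/* quiesce the queues */
	/* ... disable the controller, cancel outstanding requests ... */

	if (shutdown) {
		nvme_start_queues(&dev->ctrl);	/* flush entered requests to completion */
		nvme_wait_freeze(&dev->ctrl);	/* usage counts drop to zero */
	}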


* Re: [PATCHv2 2/2] nvme: Complete all stuck requests
From: Sagi Grimberg @ 2017-02-27 17:27 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-block, linux-nvme, Jens Axboe, Christoph Hellwig, Marc MERLIN


>>> If the block layer has entered requests and gets a CPU hot plug event
>>> prior to the resume event, it will wait for those requests to exit. If
>>> the nvme driver is shutting down, it will not start the queues back up,
>>> preventing forward progress.
>>>
>>> To fix that, this patch freezes the request queues when the driver intends
>>> to shut down the controller so that no new requests may enter.  After the
>>> controller has been disabled, the queues will be restarted to force all
>>> entered requests to end in failure so that blk-mq's hot cpu notifier may
>>> progress. To ensure the queue usage count is 0 on a shutdown, the driver
>>> waits for freeze to complete before completing the controller shutdown.
>>
>> Keith, can you explain (again) why the freeze_wait must happen
>> after the controller has been disabled, instead of starting the queues
>> and waiting right after the freeze starts?
>
> Yeah, the driver needs to make forward progress even if the controller
> isn't functioning. If we do the freeze wait before disabling the
> controller, there's no way to reclaim missing completions. If the
> controller is working perfectly, it'd be okay, but the driver would be
> stuck if there's a problem.

OK, I think we can get it for fabrics too, need to figure out how to
handle it there too.

Do you have a reproducer?


* Re: [PATCHv2 2/2] nvme: Complete all stuck requests
From: Keith Busch @ 2017-02-27 19:15 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: linux-block, Jens Axboe, Marc MERLIN, Christoph Hellwig, linux-nvme

On Mon, Feb 27, 2017 at 07:27:51PM +0200, Sagi Grimberg wrote:
> OK, I think we can get it for fabrics too, need to figure out how to
> handle it there too.
> 
> Do you have a reproducer?

To repro, I have to run a buffered writer workload then put the system into S3.

This fio job seems to reproduce for me:

  fio --name=global --filename=/dev/nvme0n1 --bsrange=4k-128k --rw=randwrite --ioengine=libaio --iodepth=8 --numjobs=8 --name=foobar

I use rtcwake to test suspend/resume:

  rtcwake -m mem -s 10

Without the patch we'll get stuck after "Disabling non-boot CPUs ..."
when blk-mq waits to freeze some entered queues after nvme was disabled.


* Re: [PATCHv2 2/2] nvme: Complete all stuck requests
From: Artur Paszkiewicz @ 2017-02-28  7:42 UTC (permalink / raw)
  To: Keith Busch, Sagi Grimberg
  Cc: linux-block, Jens Axboe, Marc MERLIN, Christoph Hellwig, linux-nvme

On 02/27/2017 08:15 PM, Keith Busch wrote:
> On Mon, Feb 27, 2017 at 07:27:51PM +0200, Sagi Grimberg wrote:
>> OK, I think we can get it for fabrics too, need to figure out how to
>> handle it there too.
>>
>> Do you have a reproducer?
> 
> To repro, I have to run a buffered writer workload then put the system into S3.
> 
> This fio job seems to reproduce for me:
> 
>   fio --name=global --filename=/dev/nvme0n1 --bsrange=4k-128k --rw=randwrite --ioengine=libaio --iodepth=8 --numjobs=8 --name=foobar
> 
> I use rtcwake to test suspend/resume:
> 
>   rtcwake -m mem -s 10
> 
> Without the patch we'll get stuck after "Disabling non-boot CPUs ..."
> when blk-mq waits to freeze some entered queues after nvme was disabled.

I'm observing the same thing when hibernating during mdraid resync on
nvme - it hangs in blk_mq_freeze_queue_wait() after "Disabling non-boot
CPUs ...". This patch did not help but when I put nvme_wait_freeze()
right after nvme_start_freeze() it appeared to be working. Maybe the
difference here is that requests are submitted from a non-freezable
kernel thread (md sync_thread)?

Thanks,
Artur


* Re: [PATCHv2 2/2] nvme: Complete all stuck requests
From: Sagi Grimberg @ 2017-02-28 11:45 UTC (permalink / raw)
  To: Artur Paszkiewicz, Keith Busch
  Cc: linux-block, Jens Axboe, Marc MERLIN, Christoph Hellwig, linux-nvme


>>> OK, I think we can get it for fabrics too, need to figure out how to
>>> handle it there too.
>>>
>>> Do you have a reproducer?
>>
>> To repro, I have to run a buffered writer workload then put the system into S3.
>>
>> This fio job seems to reproduce for me:
>>
>>   fio --name=global --filename=/dev/nvme0n1 --bsrange=4k-128k --rw=randwrite --ioengine=libaio --iodepth=8 --numjobs=8 --name=foobar
>>
>> I use rtcwake to test suspend/resume:
>>
>>   rtcwake -m mem -s 10
>>
>> Without the patch we'll get stuck after "Disabling non-boot CPUs ..."
>> when blk-mq waits to freeze some entered queues after nvme was disabled.
>
> I'm observing the same thing when hibernating during mdraid resync on
> nvme - it hangs in blk_mq_freeze_queue_wait() after "Disabling non-boot
> CPUs ...". This patch did not help but when I put nvme_wait_freeze()
> right after nvme_start_freeze() it appeared to be working.

Interesting. Did the nvme device succeed in shutting down at all?

> Maybe the
> difference here is that requests are submitted from a non-freezable
> kernel thread (md sync_thread)?

Don't think it's related...


* Re: [PATCHv2 2/2] nvme: Complete all stuck requests
From: Keith Busch @ 2017-02-28 16:57 UTC (permalink / raw)
  To: Artur Paszkiewicz
  Cc: Jens Axboe, Sagi Grimberg, Marc MERLIN, linux-nvme, linux-block,
	Christoph Hellwig

On Tue, Feb 28, 2017 at 08:42:19AM +0100, Artur Paszkiewicz wrote:
> 
> I'm observing the same thing when hibernating during mdraid resync on
> nvme - it hangs in blk_mq_freeze_queue_wait() after "Disabling non-boot
> CPUs ...".

The patch guarantees forward progress for blk-mq's hot-cpu notifier on
nvme request queues by failing all entered requests. It sounds like some
part of your setup needs those requests to succeed in order to hibernate.

If your mdraid uses a stacking request_queue that submits retries while
its request queue is entered, that may explain how you remain stuck
at blk_mq_freeze_queue_wait.

> This patch did not help but when I put nvme_wait_freeze()
> right after nvme_start_freeze() it appeared to be working. Maybe the
> difference here is that requests are submitted from a non-freezable
> kernel thread (md sync_thread)?

Waiting for the freeze prior to quiescing the queue is OK when the controller is
functioning, but it'd be impossible to complete a reset if the controller
is in a failed or degraded state.

We probably want to give those requests a chance to succeed, and I think
we'd need to be able to timeout the freeze wait. Below are two patches
I tested. Prior to these, the fio test would report IO errors from some
of its jobs; no errors with these.

---
 block/blk-mq.c         | 9 +++++++++
 include/linux/blk-mq.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8da2c04..a5e66a7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -81,6 +81,15 @@ void blk_mq_freeze_queue_wait(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
+int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
+				     unsigned long timeout)
+{
+	return wait_event_timeout(q->mq_freeze_wq,
+					percpu_ref_is_zero(&q->q_usage_counter),
+					timeout);
+}
+EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);
+
 /*
  * Guarantee no request is in use, so we can change any data structure of
  * the queue afterward.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8dacf68..b296a90 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -246,6 +246,8 @@ void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
 void blk_mq_freeze_queue_start(struct request_queue *q);
 void blk_mq_freeze_queue_wait(struct request_queue *q);
+int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
+				     unsigned long timeout);
 int blk_mq_reinit_tagset(struct blk_mq_tag_set *set);
 
 int blk_mq_map_queues(struct blk_mq_tag_set *set);
-- 

---
 drivers/nvme/host/core.c | 14 ++++++++++++++
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  | 18 ++++++++++++++----
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9e99b94..9b3b57f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2355,6 +2355,20 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_unfreeze);
 
+void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
+		if (timeout <= 0)
+			break;
+	}
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_wait_freeze_timeout);
+
 void nvme_wait_freeze(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 62af901..2aa20e3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -296,6 +296,7 @@ void nvme_start_queues(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
 void nvme_unfreeze(struct nvme_ctrl *ctrl);
 void nvme_wait_freeze(struct nvme_ctrl *ctrl);
+void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
 void nvme_start_freeze(struct nvme_ctrl *ctrl);
 
 #define NVME_QID_ANY -1
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b266fb9..a7a423f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1691,22 +1691,32 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 {
 	int i, queues;
 	u32 csts = -1;
+	bool dead;
+	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
 	del_timer_sync(&dev->watchdog_timer);
 
 	mutex_lock(&dev->shutdown_lock);
 	if (shutdown)
 		nvme_start_freeze(&dev->ctrl);
-	if (pci_is_enabled(to_pci_dev(dev->dev))) {
-		nvme_stop_queues(&dev->ctrl);
+	if (pci_is_enabled(pdev))
 		csts = readl(dev->bar + NVME_REG_CSTS);
-	}
+	dead = (csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
+		pdev->error_state  != pci_channel_io_normal;
+
+	/*
+	 * Give the controller a chance to complete all entered requests if
+	 * doing a safe shutdown.
+	 */
+	if (!dead && shutdown)
+		nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+	nvme_stop_queues(&dev->ctrl);
 
 	queues = dev->online_queues - 1;
 	for (i = dev->queue_count - 1; i > 0; i--)
 		nvme_suspend_queue(dev->queues[i]);
 
-	if (csts & NVME_CSTS_CFS || !(csts & NVME_CSTS_RDY)) {
+	if (dead) {
 		/* A device might become IO incapable very soon during
 		 * probe, before the admin queue is configured. Thus,
 		 * queue_count can be 0 here.
-- 
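
One note on the helper above: wait_event_timeout() returns 0 if the timeout
expired with the condition still false, and otherwise the number of jiffies
remaining, which is what lets nvme_wait_freeze_timeout() thread one overall
budget across every namespace. An illustrative restatement of that loop:

	long budget = NVME_IO_TIMEOUT;

	list_for_each_entry(ns, &ctrl->namespaces, list) {
		budget = blk_mq_freeze_queue_wait_timeout(ns->queue, budget);
		if (budget <= 0)
			break;	/* budget exhausted; stop waiting, keep shutting down */
	}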


* Re: [PATCHv2 2/2] nvme: Complete all stuck requests
From: Artur Paszkiewicz @ 2017-03-01  8:54 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Sagi Grimberg, Marc MERLIN, linux-nvme, linux-block,
	Christoph Hellwig

On 02/28/2017 05:57 PM, Keith Busch wrote:
> On Tue, Feb 28, 2017 at 08:42:19AM +0100, Artur Paszkiewicz wrote:
>>
>> I'm observing the same thing when hibernating during mdraid resync on
>> nvme - it hangs in blk_mq_freeze_queue_wait() after "Disabling non-boot
>> CPUs ...".
> 
> The patch guarantees forward progress for blk-mq's hot-cpu notifier on
> nvme request queues by failing all entered requests. It sounds like some
> part of your setup needs those requests to succeed in order to hibernate.
> 
> If your mdraid uses a stacking request_queue that submits retries while
> its request queue is entered, that may explain how you remain stuck
> at blk_mq_freeze_queue_wait.
> 
>> This patch did not help but when I put nvme_wait_freeze()
>> right after nvme_start_freeze() it appeared to be working. Maybe the
>> difference here is that requests are submitted from a non-freezable
>> kernel thread (md sync_thread)?
> 
> Waiting for the freeze prior to quiescing the queue is OK when the controller is
> functioning, but it'd be impossible to complete a reset if the controller
> is in a failed or degraded state.
> 
> We probably want to give those requests a chance to succeed, and I think
> we'd need to be able to timeout the freeze wait. Below are two patches
> I tested. Prior to these, the fio test would report IO errors from some
> of its jobs; no errors with these.

With these patches it works fine. I tested multiple iterations on 2
platforms and they were able to hibernate and resume without issues.

Thanks,
Artur

