linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] nvme suspend/resume fix
@ 2017-03-01 19:22 Keith Busch
  2017-03-01 19:22 ` [PATCH 1/3] blk-mq: Export blk_mq_freeze_queue_wait Keith Busch
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Keith Busch @ 2017-03-01 19:22 UTC (permalink / raw)
  To: Jens Axboe, Sagi Grimberg, Christoph Hellwig, linux-nvme, linux-block
  Cc: Marc MERLIN, Jens Axboe, Keith Busch

Hi Jens,

This is hopefully the last version to fix nvme stopping blk-mq's CPU
event from making forward progress. The solution requires a couple new
blk-mq exports so the nvme driver can properly sync with queue states.

Since this depends on the blk-mq parts, and if you approve of the
proposal, I think it'd be easiest if you can take this directly into
linux-block/for-linus. Otherwise, we can send you a pull request if you
Ack the blk-mq parts.

The difference from the previous patch is an update that Artur
confirmed passes hibernate on a stacked request queue. Personally,
I tested this for several hours with fio running buffered writes
in the back-ground and rtcwake running suspend/resume at intervals.
This succeeded with no fio errors.

Keith Busch (3):
  blk-mq: Export blk_mq_freeze_queue_wait
  blk-mq: Provide queue freeze wait timeout
  nvme: Complete all stuck requests

 block/blk-mq.c           | 12 +++++++++++-
 drivers/nvme/host/core.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  4 ++++
 drivers/nvme/host/pci.c  | 33 ++++++++++++++++++++++++++++-----
 include/linux/blk-mq.h   |  3 +++
 5 files changed, 93 insertions(+), 6 deletions(-)

-- 
2.5.5

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] blk-mq: Export blk_mq_freeze_queue_wait
  2017-03-01 19:22 [PATCH 0/3] nvme suspend/resume fix Keith Busch
@ 2017-03-01 19:22 ` Keith Busch
  2017-03-01 19:22 ` [PATCH 2/3] blk-mq: Provide freeze queue timeout Keith Busch
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Keith Busch @ 2017-03-01 19:22 UTC (permalink / raw)
  To: Jens Axboe, Sagi Grimberg, Christoph Hellwig, linux-nvme, linux-block
  Cc: Marc MERLIN, Jens Axboe, Keith Busch

Drivers can start a freeze, so this provides a way to wait for frozen.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 block/blk-mq.c         | 3 ++-
 include/linux/blk-mq.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 94593c6..8da2c04 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -75,10 +75,11 @@ void blk_mq_freeze_queue_start(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
 
-static void blk_mq_freeze_queue_wait(struct request_queue *q)
+void blk_mq_freeze_queue_wait(struct request_queue *q)
 {
 	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
 }
+EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
 /*
  * Guarantee no request is in use, so we can change any data structure of
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 001d30d..8dacf68 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -245,6 +245,7 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
 void blk_mq_freeze_queue_start(struct request_queue *q);
+void blk_mq_freeze_queue_wait(struct request_queue *q);
 int blk_mq_reinit_tagset(struct blk_mq_tag_set *set);
 
 int blk_mq_map_queues(struct blk_mq_tag_set *set);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] blk-mq: Provide freeze queue timeout
  2017-03-01 19:22 [PATCH 0/3] nvme suspend/resume fix Keith Busch
  2017-03-01 19:22 ` [PATCH 1/3] blk-mq: Export blk_mq_freeze_queue_wait Keith Busch
@ 2017-03-01 19:22 ` Keith Busch
  2017-03-02  0:14   ` Christoph Hellwig
  2017-03-01 19:22 ` [PATCH 3/3] nvme: Complete all stuck requests Keith Busch
  2017-03-01 23:29 ` [PATCH 0/3] nvme suspend/resume fix Jens Axboe
  3 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2017-03-01 19:22 UTC (permalink / raw)
  To: Jens Axboe, Sagi Grimberg, Christoph Hellwig, linux-nvme, linux-block
  Cc: Marc MERLIN, Jens Axboe, Keith Busch

A driver may wish to take corrective action if queued requests do not
complete within a set time.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.c         | 9 +++++++++
 include/linux/blk-mq.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8da2c04..a5e66a7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -81,6 +81,15 @@ void blk_mq_freeze_queue_wait(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
+int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
+				     unsigned long timeout)
+{
+	return wait_event_timeout(q->mq_freeze_wq,
+					percpu_ref_is_zero(&q->q_usage_counter),
+					timeout);
+}
+EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);
+
 /*
  * Guarantee no request is in use, so we can change any data structure of
  * the queue afterward.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8dacf68..b296a90 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -246,6 +246,8 @@ void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
 void blk_mq_freeze_queue_start(struct request_queue *q);
 void blk_mq_freeze_queue_wait(struct request_queue *q);
+int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
+				     unsigned long timeout);
 int blk_mq_reinit_tagset(struct blk_mq_tag_set *set);
 
 int blk_mq_map_queues(struct blk_mq_tag_set *set);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] nvme: Complete all stuck requests
  2017-03-01 19:22 [PATCH 0/3] nvme suspend/resume fix Keith Busch
  2017-03-01 19:22 ` [PATCH 1/3] blk-mq: Export blk_mq_freeze_queue_wait Keith Busch
  2017-03-01 19:22 ` [PATCH 2/3] blk-mq: Provide freeze queue timeout Keith Busch
@ 2017-03-01 19:22 ` Keith Busch
  2017-03-02  7:15   ` Sagi Grimberg
  2017-03-01 23:29 ` [PATCH 0/3] nvme suspend/resume fix Jens Axboe
  3 siblings, 1 reply; 7+ messages in thread
From: Keith Busch @ 2017-03-01 19:22 UTC (permalink / raw)
  To: Jens Axboe, Sagi Grimberg, Christoph Hellwig, linux-nvme, linux-block
  Cc: Marc MERLIN, Jens Axboe, Keith Busch

If the nvme driver is shutting down its controller, the drievr will not
start the queues up again, preventing blk-mq's hot CPU notifier from
making forward progress.

To fix that, this patch starts a request_queue freeze when the driver
resets a controller so no new requests may enter. The driver will wait
for frozen after IO queues are restarted to ensure the queue reference
can be reinitialized when nvme requests to unfreeze the queues.

If the driver is doing a safe shutdown, the driver will wait for the
controller to successfully complete all inflight requests so that we
don't unnecessarily fail them. Once the controller has been disabled,
the queues will be restarted to force remaining entered requests to end
in failure so that blk-mq's hot cpu notifier may progress.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/core.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  4 ++++
 drivers/nvme/host/pci.c  | 33 ++++++++++++++++++++++++++++-----
 3 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 25ec4e5..9b3b57f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2344,6 +2344,53 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
 
+void nvme_unfreeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_unfreeze_queue(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_unfreeze);
+
+void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
+		if (timeout <= 0)
+			break;
+	}
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_wait_freeze_timeout);
+
+void nvme_wait_freeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_freeze_queue_wait(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_wait_freeze);
+
+void nvme_start_freeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_freeze_queue_start(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_start_freeze);
+
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a3da1e9..2aa20e3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -294,6 +294,10 @@ void nvme_queue_async_events(struct nvme_ctrl *ctrl);
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
+void nvme_unfreeze(struct nvme_ctrl *ctrl);
+void nvme_wait_freeze(struct nvme_ctrl *ctrl);
+void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
+void nvme_start_freeze(struct nvme_ctrl *ctrl);
 
 #define NVME_QID_ANY -1
 struct request *nvme_alloc_request(struct request_queue *q,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index eee8f84..26a5fd0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1675,21 +1675,34 @@ static void nvme_pci_disable(struct nvme_dev *dev)
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 {
 	int i, queues;
-	u32 csts = -1;
+	bool dead = true;
+	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
 	del_timer_sync(&dev->watchdog_timer);
 
 	mutex_lock(&dev->shutdown_lock);
-	if (pci_is_enabled(to_pci_dev(dev->dev))) {
-		nvme_stop_queues(&dev->ctrl);
-		csts = readl(dev->bar + NVME_REG_CSTS);
+	if (pci_is_enabled(pdev)) {
+		u32 csts = readl(dev->bar + NVME_REG_CSTS);
+
+		if (dev->ctrl.state == NVME_CTRL_LIVE)
+			nvme_start_freeze(&dev->ctrl);
+		dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
+			pdev->error_state  != pci_channel_io_normal);
 	}
 
+	/*
+	 * Give the controller a chance to complete all entered requests if
+	 * doing a safe shutdown.
+	 */
+	if (!dead && shutdown)
+		nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+	nvme_stop_queues(&dev->ctrl);
+
 	queues = dev->online_queues - 1;
 	for (i = dev->queue_count - 1; i > 0; i--)
 		nvme_suspend_queue(dev->queues[i]);
 
-	if (csts & NVME_CSTS_CFS || !(csts & NVME_CSTS_RDY)) {
+	if (dead) {
 		/* A device might become IO incapable very soon during
 		 * probe, before the admin queue is configured. Thus,
 		 * queue_count can be 0 here.
@@ -1704,6 +1717,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 
 	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
 	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
+
+	/*
+	 * The driver will not be starting up queues again if shutting down so
+	 * must flush all entered requests to their failed completion to avoid
+	 * deadlocking blk-mq hot-cpu notifier.
+	 */
+	if (shutdown)
+		nvme_start_queues(&dev->ctrl);
 	mutex_unlock(&dev->shutdown_lock);
 }
 
@@ -1826,7 +1847,9 @@ static void nvme_reset_work(struct work_struct *work)
 		nvme_remove_namespaces(&dev->ctrl);
 	} else {
 		nvme_start_queues(&dev->ctrl);
+		nvme_wait_freeze(&dev->ctrl);
 		nvme_dev_add(dev);
+		nvme_unfreeze(&dev->ctrl);
 	}
 
 	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE)) {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] nvme suspend/resume fix
  2017-03-01 19:22 [PATCH 0/3] nvme suspend/resume fix Keith Busch
                   ` (2 preceding siblings ...)
  2017-03-01 19:22 ` [PATCH 3/3] nvme: Complete all stuck requests Keith Busch
@ 2017-03-01 23:29 ` Jens Axboe
  3 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2017-03-01 23:29 UTC (permalink / raw)
  To: Keith Busch, Sagi Grimberg, Christoph Hellwig, linux-nvme, linux-block
  Cc: Marc MERLIN, Jens Axboe

On 03/01/2017 12:22 PM, Keith Busch wrote:
> Hi Jens,
> 
> This is hopefully the last version to fix nvme stopping blk-mq's CPU
> event from making forward progress. The solution requires a couple new
> blk-mq exports so the nvme driver can properly sync with queue states.
> 
> Since this depends on the blk-mq parts, and if you approve of the
> proposal, I think it'd be easiest if you can take this directly into
> linux-block/for-linus. Otherwise, we can send you a pull request if you
> Ack the blk-mq parts.
> 
> The difference from the previous patch is an update that Artur
> confirmed passes hibernate on a stacked request queue. Personally,
> I tested this for several hours with fio running buffered writes
> in the back-ground and rtcwake running suspend/resume at intervals.
> This succeeded with no fio errors.

I've queued it up for this series, thanks Keith.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/3] blk-mq: Provide freeze queue timeout
  2017-03-01 19:22 ` [PATCH 2/3] blk-mq: Provide freeze queue timeout Keith Busch
@ 2017-03-02  0:14   ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2017-03-02  0:14 UTC (permalink / raw)
  To: Keith Busch
  Cc: Jens Axboe, Sagi Grimberg, Christoph Hellwig, linux-nvme,
	linux-block, Marc MERLIN, Jens Axboe

> +int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
> +				     unsigned long timeout)
> +{
> +	return wait_event_timeout(q->mq_freeze_wq,
> +					percpu_ref_is_zero(&q->q_usage_counter),
> +					timeout);
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);

Can you just add the timeout argument to blk_mq_freeze_queue_wait?
Existing callers can pass 0, which is interpreted as no timeout by
the low-level wait code.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 3/3] nvme: Complete all stuck requests
  2017-03-01 19:22 ` [PATCH 3/3] nvme: Complete all stuck requests Keith Busch
@ 2017-03-02  7:15   ` Sagi Grimberg
  0 siblings, 0 replies; 7+ messages in thread
From: Sagi Grimberg @ 2017-03-02  7:15 UTC (permalink / raw)
  To: Keith Busch, Jens Axboe, Christoph Hellwig, linux-nvme, linux-block
  Cc: Marc MERLIN, Jens Axboe

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-03-02  7:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-01 19:22 [PATCH 0/3] nvme suspend/resume fix Keith Busch
2017-03-01 19:22 ` [PATCH 1/3] blk-mq: Export blk_mq_freeze_queue_wait Keith Busch
2017-03-01 19:22 ` [PATCH 2/3] blk-mq: Provide freeze queue timeout Keith Busch
2017-03-02  0:14   ` Christoph Hellwig
2017-03-01 19:22 ` [PATCH 3/3] nvme: Complete all stuck requests Keith Busch
2017-03-02  7:15   ` Sagi Grimberg
2017-03-01 23:29 ` [PATCH 0/3] nvme suspend/resume fix Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).