* [PATCH 0/5] NVMe pci fixes, for-4.11
@ 2017-02-10 23:15 Keith Busch
  2017-02-10 23:15 ` [PATCH 1/5] nvme/pci: Disable on removal when disconnected Keith Busch
                   ` (5 more replies)
  0 siblings, 6 replies; 35+ messages in thread
From: Keith Busch @ 2017-02-10 23:15 UTC (permalink / raw)


These are some fixes for some more device and CPU hot plug scenarios. The
end result should fix up a long-standing deadlock caused by requests
entering a quiesced queue that block blk_mq_freeze_queue_wait forever. I'd
previously tried to fix that here:

  http://lists.infradead.org/pipermail/linux-nvme/2017-January/007620.html

But that was overly complex, and it was overkill anyway in ending requests
that could have been successful. This attempt is much simpler, isolated to
the nvme driver, and less likely to fail good IO.

This is based off linux-block/for-next because I needed to make use
of the "was_suspend" bool that was recently added with the Opal patch
set. I also merged in the nvme git tree for final testing, and noticed
there is a conflict between the following two commits:

  https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/commit/?h=for-next&id=06406d81a2d7cfb8abcc4fa6cdfeb8e5897007c5
  http://git.infradead.org/nvme.git/commitdiff/73494e87734f2c2cd4f9e91e98700cd4fd5f3e05

It's a trivial fix, but I think we gotta sort out our merging flow among
these trees.

Anyway, these are addressing real issues I'll be happy to see fixed for
4.11, but rare enough that I don't think any qualify for stable.

Keith Busch (5):
  nvme/pci: Disable on removal when disconnected
  nvme/pci: Cancel work after watchdog disabled
  nvme/core: Fix race kicking freed request_queue
  nvme/pci: No special case for queue busy on IO
  nvme/pci: Complete all stuck requests

 drivers/nvme/host/core.c | 39 ++++++++++++++++++++++++++++++++++++---
 drivers/nvme/host/nvme.h |  3 +++
 drivers/nvme/host/pci.c  | 45 ++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 77 insertions(+), 10 deletions(-)

-- 
1.8.3.1


* [PATCH 1/5] nvme/pci: Disable on removal when disconnected
  2017-02-10 23:15 [PATCH 0/5] NVMe pci fixes, for-4.11 Keith Busch
@ 2017-02-10 23:15 ` Keith Busch
  2017-02-13 10:18   ` Johannes Thumshirn
  2017-02-13 13:51   ` Christoph Hellwig
  2017-02-10 23:15 ` [PATCH 2/5] nvme/pci: Cancel work after watchdog disabled Keith Busch
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 35+ messages in thread
From: Keith Busch @ 2017-02-10 23:15 UTC (permalink / raw)


If the device is not present, the driver should disable the queues
immediately. Prior to this, the driver was relying on the watchdog timer
to kill the queues if requests were outstanding to the device, and that
just delays removal up to one second.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e25d632..9126637 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1997,8 +1997,10 @@ static void nvme_remove(struct pci_dev *pdev)
 
 	pci_set_drvdata(pdev, NULL);
 
-	if (!pci_device_is_present(pdev))
+	if (!pci_device_is_present(pdev)) {
 		nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DEAD);
+		nvme_dev_disable(dev, false);
+	}
 
 	flush_work(&dev->reset_work);
 	nvme_uninit_ctrl(&dev->ctrl);
-- 
1.8.3.1


* [PATCH 2/5] nvme/pci: Cancel work after watchdog disabled
  2017-02-10 23:15 [PATCH 0/5] NVMe pci fixes, for-4.11 Keith Busch
  2017-02-10 23:15 ` [PATCH 1/5] nvme/pci: Disable on removal when disconnected Keith Busch
@ 2017-02-10 23:15 ` Keith Busch
  2017-02-13 10:25   ` Johannes Thumshirn
  2017-02-13 13:51   ` Christoph Hellwig
  2017-02-10 23:15 ` [PATCH 3/5] nvme/core: Fix race kicking freed request_queue Keith Busch
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 35+ messages in thread
From: Keith Busch @ 2017-02-10 23:15 UTC (permalink / raw)


The driver had been flushing the work prior to uninitializing the
controller, hoping the work would not get restarted. If controller
failure or IO timeout occurs when tearing down the request queues, the
watchdog timer may queue another reset.

We want to make sure that reset work is not running to prevent use after
free errors accessing the device being torn down, so this patch cancels
the reset work only after we know it can never be started again.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9126637..52cca9f 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1677,6 +1677,7 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	u32 csts = -1;
 
 	del_timer_sync(&dev->watchdog_timer);
+	cancel_work_sync(&dev->reset_work);
 
 	mutex_lock(&dev->shutdown_lock);
 	if (pci_is_enabled(to_pci_dev(dev->dev))) {
@@ -2002,7 +2003,6 @@ static void nvme_remove(struct pci_dev *pdev)
 		nvme_dev_disable(dev, false);
 	}
 
-	flush_work(&dev->reset_work);
 	nvme_uninit_ctrl(&dev->ctrl);
 	nvme_dev_disable(dev, true);
 	nvme_dev_remove_admin(dev);
-- 
1.8.3.1


* [PATCH 3/5] nvme/core: Fix race kicking freed request_queue
  2017-02-10 23:15 [PATCH 0/5] NVMe pci fixes, for-4.11 Keith Busch
  2017-02-10 23:15 ` [PATCH 1/5] nvme/pci: Disable on removal when disconnected Keith Busch
  2017-02-10 23:15 ` [PATCH 2/5] nvme/pci: Cancel work after watchdog disabled Keith Busch
@ 2017-02-10 23:15 ` Keith Busch
  2017-02-13 10:33   ` Johannes Thumshirn
  2017-02-13 13:53   ` Christoph Hellwig
  2017-02-10 23:15 ` [PATCH 4/5] nvme/pci: No special case for queue busy on IO Keith Busch
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 35+ messages in thread
From: Keith Busch @ 2017-02-10 23:15 UTC (permalink / raw)


If a namespace has already been marked dead, we don't want to kick the
request_queue again since we may have just freed it from another thread.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index de80a84..c302270 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2114,9 +2114,9 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 		 * Revalidating a dead namespace sets capacity to 0. This will
 		 * end buffered writers dirtying pages that can't be synced.
 		 */
-		if (ns->disk && !test_and_set_bit(NVME_NS_DEAD, &ns->flags))
-			revalidate_disk(ns->disk);
-
+		if (!ns->disk || test_and_set_bit(NVME_NS_DEAD, &ns->flags))
+			continue;
+		revalidate_disk(ns->disk);
 		blk_set_queue_dying(ns->queue);
 		blk_mq_abort_requeue_list(ns->queue);
 		blk_mq_start_stopped_hw_queues(ns->queue, true);
-- 
1.8.3.1


* [PATCH 4/5] nvme/pci: No special case for queue busy on IO
  2017-02-10 23:15 [PATCH 0/5] NVMe pci fixes, for-4.11 Keith Busch
                   ` (2 preceding siblings ...)
  2017-02-10 23:15 ` [PATCH 3/5] nvme/core: Fix race kicking freed request_queue Keith Busch
@ 2017-02-10 23:15 ` Keith Busch
  2017-02-13 13:53   ` Christoph Hellwig
  2017-02-10 23:15 ` [PATCH 5/5] nvme/pci: Complete all stuck requests Keith Busch
  2017-02-15  9:40 ` [PATCH 0/5] NVMe pci fixes, for-4.11 Sagi Grimberg
  5 siblings, 1 reply; 35+ messages in thread
From: Keith Busch @ 2017-02-10 23:15 UTC (permalink / raw)


The driver previously required a special check for IO submitted
to nvme IO queues that are temporarily suspended. That is no longer
necessary since blk-mq provides a quiesce, so any IO that actually gets
submitted to such a queue must be ended since the queue isn't going to
start back up.

This is fixing a condition where we have fewer IO queues after a
controller reset. This may happen if the number of CPUs has changed,
or controller firmware update changed the queue count, for example.

While it may be possible to complete the IO on a different queue, the
block layer does not provide a way to resubmit a request on a different
hardware context once the request has entered the queue. We don't want
these requests to be stuck indefinitely either, so ending them in error
is our only option at the moment.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 52cca9f..92010fd 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -613,10 +613,7 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	spin_lock_irq(&nvmeq->q_lock);
 	if (unlikely(nvmeq->cq_vector < 0)) {
-		if (ns && !test_bit(NVME_NS_DEAD, &ns->flags))
-			ret = BLK_MQ_RQ_QUEUE_BUSY;
-		else
-			ret = BLK_MQ_RQ_QUEUE_ERROR;
+		ret = BLK_MQ_RQ_QUEUE_ERROR;
 		spin_unlock_irq(&nvmeq->q_lock);
 		goto out_cleanup_iod;
 	}
-- 
1.8.3.1


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-10 23:15 [PATCH 0/5] NVMe pci fixes, for-4.11 Keith Busch
                   ` (3 preceding siblings ...)
  2017-02-10 23:15 ` [PATCH 4/5] nvme/pci: No special case for queue busy on IO Keith Busch
@ 2017-02-10 23:15 ` Keith Busch
  2017-02-15  9:50   ` Sagi Grimberg
                     ` (2 more replies)
  2017-02-15  9:40 ` [PATCH 0/5] NVMe pci fixes, for-4.11 Sagi Grimberg
  5 siblings, 3 replies; 35+ messages in thread
From: Keith Busch @ 2017-02-10 23:15 UTC (permalink / raw)


If the nvme driver is shutting down, it will not start the queues back
up until asked to resume. If the block layer has entered requests and
gets a CPU hot plug event prior to the resume event, it will wait for
those requests to exit. Those requests will never exit since the NVMe
driver is quiesced, creating a deadlock.

This patch fixes that by freezing the queue and flushing all entered
requests either to their natural completion or to a forced failure. We
only need to do this when requested to shut down the controller since
we will not be starting the IO queues back up again.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/core.c | 33 +++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  3 +++
 drivers/nvme/host/pci.c  | 34 +++++++++++++++++++++++++++++++++-
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c302270..1888451 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2125,6 +2125,39 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
 
+void nvme_unfreeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_unfreeze_queue(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_unfreeze);
+
+void nvme_wait_freeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_freeze_queue(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_wait_freeze);
+
+void nvme_start_freeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_freeze_queue_start(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_start_freeze);
+
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 569cba1..7408373 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -289,6 +289,9 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
+void nvme_unfreeze(struct nvme_ctrl *ctrl);
+void nvme_wait_freeze(struct nvme_ctrl *ctrl);
+void nvme_start_freeze(struct nvme_ctrl *ctrl);
 
 #define NVME_QID_ANY -1
 struct request *nvme_alloc_request(struct request_queue *q,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 92010fd..b6451d8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1672,12 +1672,15 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 {
 	int i, queues;
 	u32 csts = -1;
+	bool drain_queue = pci_is_enabled(to_pci_dev(dev->dev));
 
 	del_timer_sync(&dev->watchdog_timer);
 	cancel_work_sync(&dev->reset_work);
 
 	mutex_lock(&dev->shutdown_lock);
-	if (pci_is_enabled(to_pci_dev(dev->dev))) {
+	if (drain_queue) {
+		if (shutdown)
+			nvme_start_freeze(&dev->ctrl);
 		nvme_stop_queues(&dev->ctrl);
 		csts = readl(dev->bar + NVME_REG_CSTS);
 	}
@@ -1701,6 +1704,25 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 
 	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
 	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
+
+	/*
+	 * If shutting down, the driver will not be starting up queues again,
+	 * so must drain all entered requests to their demise to avoid
+	 * deadlocking blk-mq hot-cpu notifier.
+	 */
+	if (drain_queue && shutdown) {
+		nvme_start_queues(&dev->ctrl);
+		/*
+		 * Waiting for frozen increases the freeze depth. Since we
+		 * already start the freeze earlier in this function to stop
+		 * incoming requests, we have to unfreeze after froze to get
+		 * the depth back to the desired.
+		 */
+		nvme_wait_freeze(&dev->ctrl);
+		nvme_unfreeze(&dev->ctrl);
+		nvme_stop_queues(&dev->ctrl);
+	}
+
 	mutex_unlock(&dev->shutdown_lock);
 }
 
@@ -1817,6 +1839,16 @@ static void nvme_reset_work(struct work_struct *work)
 	} else {
 		nvme_start_queues(&dev->ctrl);
 		nvme_dev_add(dev);
+
+		/*
+		 * If we are resuming from suspend, the queue was set to freeze
+		 * to prevent blk-mq's hot CPU notifier from getting stuck on
+		 * requests that entered the queue that NVMe had quiesced. Now
+		 * that we are resuming and have notified blk-mq of the new h/w
+		 * context queue count, it is safe to unfreeze the queues.
+		 */
+		if (was_suspend)
+			nvme_unfreeze(&dev->ctrl);
 	}
 
 	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE)) {
-- 
1.8.3.1


* [PATCH 1/5] nvme/pci: Disable on removal when disconnected
  2017-02-10 23:15 ` [PATCH 1/5] nvme/pci: Disable on removal when disconnected Keith Busch
@ 2017-02-13 10:18   ` Johannes Thumshirn
  2017-02-13 13:51   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Johannes Thumshirn @ 2017-02-13 10:18 UTC (permalink / raw)


On 02/11/2017 12:15 AM, Keith Busch wrote:
> If the device is not present, the driver should disable the queues
> immediately. Prior to this, the driver was relying on the watchdog timer
> to kill the queues if requests were outstanding to the device, and that
> just delays removal up to one second.
> 
> Signed-off-by: Keith Busch <keith.busch at intel.com>
> ---

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>


-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* [PATCH 2/5] nvme/pci: Cancel work after watchdog disabled
  2017-02-10 23:15 ` [PATCH 2/5] nvme/pci: Cancel work after watchdog disabled Keith Busch
@ 2017-02-13 10:25   ` Johannes Thumshirn
  2017-02-13 13:51   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Johannes Thumshirn @ 2017-02-13 10:25 UTC (permalink / raw)


On 02/11/2017 12:15 AM, Keith Busch wrote:
> The driver had been flushing the work prior to uninitializing the
> controller, hoping the work would not get restarted. If controller
> failure or IO timeout occurs when tearing down the request queues, the
> watchdog timer may queue another reset.
> 
> We want to make sure that reset work is not running to prevent use after
> free errors accessing the device being torn down, so this patch cancels
> the reset work only after we know it can never be started again.
> 
> Signed-off-by: Keith Busch <keith.busch at intel.com>
> ---

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>


-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* [PATCH 3/5] nvme/core: Fix race kicking freed request_queue
  2017-02-10 23:15 ` [PATCH 3/5] nvme/core: Fix race kicking freed request_queue Keith Busch
@ 2017-02-13 10:33   ` Johannes Thumshirn
  2017-02-13 13:53   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Johannes Thumshirn @ 2017-02-13 10:33 UTC (permalink / raw)


On 02/11/2017 12:15 AM, Keith Busch wrote:
> If a namespace has already been marked dead, we don't want to kick the
> request_queue again since we may have just freed it from another thread.
> 
> Signed-off-by: Keith Busch <keith.busch at intel.com>
> ---

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>


-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* [PATCH 1/5] nvme/pci: Disable on removal when disconnected
  2017-02-10 23:15 ` [PATCH 1/5] nvme/pci: Disable on removal when disconnected Keith Busch
  2017-02-13 10:18   ` Johannes Thumshirn
@ 2017-02-13 13:51   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2017-02-13 13:51 UTC (permalink / raw)


On Fri, Feb 10, 2017@06:15:49PM -0500, Keith Busch wrote:
> If the device is not present, the driver should disable the queues
> immediately. Prior to this, the driver was relying on the watchdog timer
> to kill the queues if requests were outstanding to the device, and that
> just delays removal up to one second.
> 
> Signed-off-by: Keith Busch <keith.busch at intel.com>

Looks fine,

Reviewed-by: Christoph Hellwig <hch at lst.de>


* [PATCH 2/5] nvme/pci: Cancel work after watchdog disabled
  2017-02-10 23:15 ` [PATCH 2/5] nvme/pci: Cancel work after watchdog disabled Keith Busch
  2017-02-13 10:25   ` Johannes Thumshirn
@ 2017-02-13 13:51   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2017-02-13 13:51 UTC (permalink / raw)


Looks fine,

Reviewed-by: Christoph Hellwig <hch at lst.de>


* [PATCH 3/5] nvme/core: Fix race kicking freed request_queue
  2017-02-10 23:15 ` [PATCH 3/5] nvme/core: Fix race kicking freed request_queue Keith Busch
  2017-02-13 10:33   ` Johannes Thumshirn
@ 2017-02-13 13:53   ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2017-02-13 13:53 UTC (permalink / raw)


On Fri, Feb 10, 2017@06:15:51PM -0500, Keith Busch wrote:
> If a namespace has already been marked dead, we don't want to kick the
> request_queue again since we may have just freed it from another thread.
> 
> Signed-off-by: Keith Busch <keith.busch at intel.com>
> ---
>  drivers/nvme/host/core.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index de80a84..c302270 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2114,9 +2114,9 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>  		 * Revalidating a dead namespace sets capacity to 0. This will
>  		 * end buffered writers dirtying pages that can't be synced.
>  		 */
> -		if (ns->disk && !test_and_set_bit(NVME_NS_DEAD, &ns->flags))
> -			revalidate_disk(ns->disk);
> -
> +		if (!ns->disk || test_and_set_bit(NVME_NS_DEAD, &ns->flags))
> +			continue;

I think the comment above needs to be modified or removed.  Except
for this the patch looks fine to me:

Reviewed-by: Christoph Hellwig <hch at lst.de>


* [PATCH 4/5] nvme/pci: No special case for queue busy on IO
  2017-02-10 23:15 ` [PATCH 4/5] nvme/pci: No special case for queue busy on IO Keith Busch
@ 2017-02-13 13:53   ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2017-02-13 13:53 UTC (permalink / raw)


Looks fine,

Reviewed-by: Christoph Hellwig <hch at lst.de>


* [PATCH 0/5] NVMe pci fixes, for-4.11
  2017-02-10 23:15 [PATCH 0/5] NVMe pci fixes, for-4.11 Keith Busch
                   ` (4 preceding siblings ...)
  2017-02-10 23:15 ` [PATCH 5/5] nvme/pci: Complete all stuck requests Keith Busch
@ 2017-02-15  9:40 ` Sagi Grimberg
  5 siblings, 0 replies; 35+ messages in thread
From: Sagi Grimberg @ 2017-02-15  9:40 UTC (permalink / raw)


Hi Keith,

> These are some fixes for some more device and CPU hot plug scenarios. The
> end result should fix up a long-standing deadlock caused by requests
> entering a quiesced queue that block blk_mq_freeze_queue_wait forever. I'd
> previously tried to fix that here:
>
>   http://lists.infradead.org/pipermail/linux-nvme/2017-January/007620.html
>
> But that was overly complex, and it was overkill anyway in ending requests
> that could have been successful. This attempt is much simpler, isolated to
> the nvme driver, and less likely to fail good IO.

Umm, I have some questions on 5/5, I think we can go ahead and queue
1-4 though.

> This is based off linux-block/for-next because I needed to make use
> of the "was_suspend" bool that was recently added with the Opal patch
> set. I also merged in the nvme git tree for final testing, and noticed
> there is a conflict between the following two commits:
>
>   https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/commit/?h=for-next&id=06406d81a2d7cfb8abcc4fa6cdfeb8e5897007c5
>   http://git.infradead.org/nvme.git/commitdiff/73494e87734f2c2cd4f9e91e98700cd4fd5f3e05
>
> It's a trivial fix, but I think we gotta sort out our merging flow among
> these trees.

Where is this? I didn't see this set on nvme-4.11?


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-10 23:15 ` [PATCH 5/5] nvme/pci: Complete all stuck requests Keith Busch
@ 2017-02-15  9:50   ` Sagi Grimberg
  2017-02-15 15:46     ` Keith Busch
  2017-02-15 18:14   ` Marc MERLIN
  2017-02-17 15:27   ` Christoph Hellwig
  2 siblings, 1 reply; 35+ messages in thread
From: Sagi Grimberg @ 2017-02-15  9:50 UTC (permalink / raw)



> If the nvme driver is shutting down, it will not start the queues back
> up until asked to resume. If the block layer has entered requests and
> gets a CPU hot plug event prior to the resume event, it will wait for
> those requests to exit. Those requests will never exit since the NVMe
> driver is quiesced, creating a deadlock.
>
> This patch fixes that by freezing the queue and flushing all entered
> requests either to their natural completion or to a forced failure. We
> only need to do this when requested to shut down the controller since
> we will not be starting the IO queues back up again.

How is this something specific to nvme? What prevents this
for other multi-queue devices that shutdown during live IO?

Can you please describe the race in specific? Is it stuck on
nvme_ns_remove (blk_cleanup_queue)? If so, then I think we
might want to fix blk_cleanup_queue to start/drain/wait
instead?

I think it's acceptable to have drivers make their own use
of freeze_start and freeze_wait, but if this is not
nvme specific perhaps we want to move it to block instead?


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-15  9:50   ` Sagi Grimberg
@ 2017-02-15 15:46     ` Keith Busch
  2017-02-15 16:04       ` Marc MERLIN
  0 siblings, 1 reply; 35+ messages in thread
From: Keith Busch @ 2017-02-15 15:46 UTC (permalink / raw)


On Wed, Feb 15, 2017@11:50:15AM +0200, Sagi Grimberg wrote:
> How is this something specific to nvme? What prevents this
> for other multi-queue devices that shutdown during live IO?
> 
> Can you please describe the race in specific? Is it stuck on
> nvme_ns_remove (blk_cleanup_queue)? If so, then I think we
> might want to fix blk_cleanup_queue to start/drain/wait
> instead?
> 
> I think it's acceptable to have drivers make their own use
> of freeze_start and freeze_wait, but if this is not
> nvme specific perhaps we want to move it to block instead?

There are many sequences that can get a request queue stuck forever, but
the one that was initially raised is on a system suspend. It could look
something like this:

  CPU A                       CPU B
  -----                       -----
  nvme_suspend
   nvme_dev_disable           generic_make_request
    nvme_stop_queues           blk_queue_enter
     blk_queue_quiesce_queue    blk_mq_alloc_request
                                 blk_mq_map_request
                                  blk_mq_enter_live
                                 blk_mq_run_hw_queue <-- the hctx is stopped,
                                                         request is stuck until
                                                         restarted.

Shortly after that, suspend takes a CPU offline:

 blk_mq_queue_reinit_dead
  blk_mq_queue_reinit_work
   blk_mq_free_queue_wait

Now we're stuck forever waiting for that queue to freeze because a request
entered a stopped hctx that we're not going to bring back online. The
driver was told to suspend, and suspend must complete before resume
can start.

The problem is not specific to pci nvme, but control needs to pass back to
the device specific driver: after halting new queue entering by starting
the queue freeze, the driver needs a chance to complete everything that
was submitted. Only after the driver finishes its specific cleanup tasks
can it flush all the entered requests to a failed completion.
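
To put the blk-mq side of that in code form, here is a minimal sketch
(assuming the current blk-mq freeze interface; illustration only, not code
from this series) of why the freeze wait can never finish once a request
has entered a stopped hctx:

  /*
   * Sketch only: a request that has entered the queue holds a
   * q_usage_counter reference until it completes, so a freeze can only
   * finish after every entered request has been dispatched and completed.
   */
  static void freeze_example(struct request_queue *q)
  {
          blk_mq_freeze_queue_start(q);   /* block new requests from entering */

          /*
           * If an entered request is parked on a stopped (quiesced) hctx,
           * it never completes, q_usage_counter never reaches zero, and
           * this wait blocks forever; that is the deadlock described above.
           * (blk_mq_freeze_queue_wait is not exported at the time of this
           * posting; blk_mq_freeze_queue is freeze_start plus this wait.)
           */
          blk_mq_freeze_queue_wait(q);

          blk_mq_unfreeze_queue(q);       /* allow new requests to enter again */
  }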


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-15 15:46     ` Keith Busch
@ 2017-02-15 16:04       ` Marc MERLIN
  2017-02-15 17:36         ` J Freyensee
  2017-02-16  9:12         ` Sagi Grimberg
  0 siblings, 2 replies; 35+ messages in thread
From: Marc MERLIN @ 2017-02-15 16:04 UTC (permalink / raw)


On Wed, Feb 15, 2017@10:46:49AM -0500, Keith Busch wrote:
> On Wed, Feb 15, 2017@11:50:15AM +0200, Sagi Grimberg wrote:
> > How is this something specific to nvme? What prevents this
> > for other multi-queue devices that shutdown during live IO?
> > 
> > Can you please describe the race in specific? Is it stuck on
> > nvme_ns_remove (blk_cleanup_queue)? If so, then I think we
> > might want to fix blk_cleanup_queue to start/drain/wait
> > instead?
> > 
> > I think it's acceptable to have drivers make their own use
> > of freeze_start and freeze_wait, but if this is not
> > nvme specific perhaps we want to move it to block instead?
> 
> There are many sequences that can get a request queue stuck forever, but
> the one that was initially raised is on a system suspend. It could look
> something like this:
> 
>   CPU A                       CPU B
>   -----                       -----
>   nvme_suspend
>    nvme_dev_disable           generic_make_request
>     nvme_stop_queues           blk_queue_enter
>      blk_queue_quiesce_queue    blk_mq_alloc_request
>                                  blk_mq_map_request
>                                   blk_mq_enter_live
>                                  blk_mq_run_hw_queue <-- the hctx is stopped,
>                                                          request is stuck until
>                                                          restarted.

Howdy,

Let me chime in here about how the stuck request thing is not just
theory, or made up :)

I first reported this in Aug 2016: https://patchwork.kernel.org/patch/9265695/

Long story short, I have been unable to upgrade to any kernel past 4.4
due to my M2 NVME drive. No matter what I did, S3 suspend would not
succeed or resume (as in ever, not just sometimes).

It's only with the last patch I got from Keith applied to 4.10
linux-block/for-next that I can _finally_ upgrade to a kernel past 4.4
and that suspend/resume works.

So while I don't have the knowledge to say whether Keith's patch is the
best way to fix my problem, it is the only thing I've seen that works in
the last 9 months, and has taken me from 100% failure to 0% failure so
far.

As a result, a big thanks to Keith again and thumbs up from me.

Hope this helps.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-15 16:04       ` Marc MERLIN
@ 2017-02-15 17:36         ` J Freyensee
  2017-02-16  9:12         ` Sagi Grimberg
  1 sibling, 0 replies; 35+ messages in thread
From: J Freyensee @ 2017-02-15 17:36 UTC (permalink / raw)



> So while I don't have the knowledge to say whether Keith's patch is the
> best way to fix my problem, it is the only thing I've seen that works in
> the last 9 months, and has taken me from 100% failure to 0% failure so
> far.
> 
> As a result, a big thanks to Keith again and thumbs up from me.

You should put a "Tested-by:" tag on the patch to help get this patch accepted
:-).



* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-10 23:15 ` [PATCH 5/5] nvme/pci: Complete all stuck requests Keith Busch
  2017-02-15  9:50   ` Sagi Grimberg
@ 2017-02-15 18:14   ` Marc MERLIN
  2017-12-14  3:36     ` Marc MERLIN
  2017-02-17 15:27   ` Christoph Hellwig
  2 siblings, 1 reply; 35+ messages in thread
From: Marc MERLIN @ 2017-02-15 18:14 UTC (permalink / raw)


On Fri, Feb 10, 2017@06:15:53PM -0500, Keith Busch wrote:
> If the nvme driver is shutting down, it will not start the queues back
> up until asked to resume. If the block layer has entered requests and
> gets a CPU hot plug event prior to the resume event, it will wait for
> those requests to exit. Those requests will never exit since the NVMe
> driver is quiesced, creating a deadlock.
> 
> This patch fixes that by freezing the queue and flushing all entered
> requests either to their natural completion or to a forced failure. We
> only need to do this when requested to shut down the controller since
> we will not be starting the IO queues back up again.
> 
> Signed-off-by: Keith Busch <keith.busch at intel.com>

Tested-by: Marc MERLIN <marc at merlins.org>

> ---
>  drivers/nvme/host/core.c | 33 +++++++++++++++++++++++++++++++++
>  drivers/nvme/host/nvme.h |  3 +++
>  drivers/nvme/host/pci.c  | 34 +++++++++++++++++++++++++++++++++-
>  3 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index c302270..1888451 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2125,6 +2125,39 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
>  }
>  EXPORT_SYMBOL_GPL(nvme_kill_queues);
>  
> +void nvme_unfreeze(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_ns *ns;
> +
> +	mutex_lock(&ctrl->namespaces_mutex);
> +	list_for_each_entry(ns, &ctrl->namespaces, list)
> +		blk_mq_unfreeze_queue(ns->queue);
> +	mutex_unlock(&ctrl->namespaces_mutex);
> +}
> +EXPORT_SYMBOL_GPL(nvme_unfreeze);
> +
> +void nvme_wait_freeze(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_ns *ns;
> +
> +	mutex_lock(&ctrl->namespaces_mutex);
> +	list_for_each_entry(ns, &ctrl->namespaces, list)
> +		blk_mq_freeze_queue(ns->queue);
> +	mutex_unlock(&ctrl->namespaces_mutex);
> +}
> +EXPORT_SYMBOL_GPL(nvme_wait_freeze);
> +
> +void nvme_start_freeze(struct nvme_ctrl *ctrl)
> +{
> +	struct nvme_ns *ns;
> +
> +	mutex_lock(&ctrl->namespaces_mutex);
> +	list_for_each_entry(ns, &ctrl->namespaces, list)
> +		blk_mq_freeze_queue_start(ns->queue);
> +	mutex_unlock(&ctrl->namespaces_mutex);
> +}
> +EXPORT_SYMBOL_GPL(nvme_start_freeze);
> +
>  void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  {
>  	struct nvme_ns *ns;
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 569cba1..7408373 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -289,6 +289,9 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
>  void nvme_stop_queues(struct nvme_ctrl *ctrl);
>  void nvme_start_queues(struct nvme_ctrl *ctrl);
>  void nvme_kill_queues(struct nvme_ctrl *ctrl);
> +void nvme_unfreeze(struct nvme_ctrl *ctrl);
> +void nvme_wait_freeze(struct nvme_ctrl *ctrl);
> +void nvme_start_freeze(struct nvme_ctrl *ctrl);
>  
>  #define NVME_QID_ANY -1
>  struct request *nvme_alloc_request(struct request_queue *q,
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 92010fd..b6451d8 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1672,12 +1672,15 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>  {
>  	int i, queues;
>  	u32 csts = -1;
> +	bool drain_queue = pci_is_enabled(to_pci_dev(dev->dev));
>  
>  	del_timer_sync(&dev->watchdog_timer);
>  	cancel_work_sync(&dev->reset_work);
>  
>  	mutex_lock(&dev->shutdown_lock);
> -	if (pci_is_enabled(to_pci_dev(dev->dev))) {
> +	if (drain_queue) {
> +		if (shutdown)
> +			nvme_start_freeze(&dev->ctrl);
>  		nvme_stop_queues(&dev->ctrl);
>  		csts = readl(dev->bar + NVME_REG_CSTS);
>  	}
> @@ -1701,6 +1704,25 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>  
>  	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
>  	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
> +
> +	/*
> +	 * If shutting down, the driver will not be starting up queues again,
> +	 * so must drain all entered requests to their demise to avoid
> +	 * deadlocking blk-mq hot-cpu notifier.
> +	 */
> +	if (drain_queue && shutdown) {
> +		nvme_start_queues(&dev->ctrl);
> +		/*
> +		 * Waiting for frozen increases the freeze depth. Since we
> +		 * already start the freeze earlier in this function to stop
> +		 * incoming requests, we have to unfreeze after froze to get
> +		 * the depth back to the desired.
> +		 */
> +		nvme_wait_freeze(&dev->ctrl);
> +		nvme_unfreeze(&dev->ctrl);
> +		nvme_stop_queues(&dev->ctrl);
> +	}
> +
>  	mutex_unlock(&dev->shutdown_lock);
>  }
>  
> @@ -1817,6 +1839,16 @@ static void nvme_reset_work(struct work_struct *work)
>  	} else {
>  		nvme_start_queues(&dev->ctrl);
>  		nvme_dev_add(dev);
> +
> +		/*
> +		 * If we are resuming from suspend, the queue was set to freeze
> +		 * to prevent blk-mq's hot CPU notifier from getting stuck on
> +		 * requests that entered the queue that NVMe had quiesced. Now
> +		 * that we are resuming and have notified blk-mq of the new h/w
> +		 * context queue count, it is safe to unfreeze the queues.
> +		 */
> +		if (was_suspend)
> +			nvme_unfreeze(&dev->ctrl);
>  	}
>  
>  	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE)) {
> -- 
> 1.8.3.1
> 
> 

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-15 16:04       ` Marc MERLIN
  2017-02-15 17:36         ` J Freyensee
@ 2017-02-16  9:12         ` Sagi Grimberg
  2017-02-16 22:51           ` Keith Busch
  1 sibling, 1 reply; 35+ messages in thread
From: Sagi Grimberg @ 2017-02-16  9:12 UTC (permalink / raw)



> Howdy,

Hey Marc,

> Let me chime in here about how the stuck request thing is not just
> theory, or made up :)
>
> I first reported this in Aug 2016: https://patchwork.kernel.org/patch/9265695/
>
> Long story short, I have been unable to upgrade to any kernel past 4.4
> due to my M2 NVME drive. No matter what I did, S3 suspend would not
> succeed or resume (as in ever, not just sometimes).
>
> It's only with the last patch I got from Keith applied to 4.10
> linux-block/for-next that I can _finally_ upgrade to a kernel past 4.4
> and that suspend/resume works.
>
> So while I don't have the knowledge to say whether Keith's patch is the
> best way to fix my problem, it is the only thing I've seen that works in
> the last 9 months, and has taken me from 100% failure to 0% failure so
> far.
>
> As a result, a big thanks to Keith again and thumbs up from me.

I have no doubt this patch is useful, I'm just wondering if we can
place the functionality in a more generic location so that more
drivers can enjoy it.


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-16  9:12         ` Sagi Grimberg
@ 2017-02-16 22:51           ` Keith Busch
  2017-02-17  8:25             ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Keith Busch @ 2017-02-16 22:51 UTC (permalink / raw)


On Thu, Feb 16, 2017@11:12:53AM +0200, Sagi Grimberg wrote:
> 
> I have no doubt this patch is useful, I'm just wondering if we can
> place the functionality in a more generic location so that more
> drivers can enjoy it.

What this is doing is too driver specific to make generic.

I think we actually want the ability to back out a request and re-enter
a different hardware context, but that's too hard. :)


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-16 22:51           ` Keith Busch
@ 2017-02-17  8:25             ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2017-02-17  8:25 UTC (permalink / raw)


On Thu, Feb 16, 2017@05:51:42PM -0500, Keith Busch wrote:
> On Thu, Feb 16, 2017@11:12:53AM +0200, Sagi Grimberg wrote:
> > 
> > I have no doubt this patch is useful, I'm just wondering if we can
> > place the functionality in a more generic location so that more
> > drivers can enjoy it.
> 
> What this is doing is too driver specific to make generic.
> 
> I think we actually want the ability to back out a request and re-enter
> a different hardware context, but that's too hard. :)

I don't think it's all that bad.  But I'll need to take another look
at your patch - I'd really like to see the issue fixed.  But it's just
been too complicated to easily understand so it'll take some time.


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-10 23:15 ` [PATCH 5/5] nvme/pci: Complete all stuck requests Keith Busch
  2017-02-15  9:50   ` Sagi Grimberg
  2017-02-15 18:14   ` Marc MERLIN
@ 2017-02-17 15:27   ` Christoph Hellwig
  2017-02-17 16:33     ` Keith Busch
  2 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2017-02-17 15:27 UTC (permalink / raw)


>  	u32 csts = -1;
> +	bool drain_queue = pci_is_enabled(to_pci_dev(dev->dev));
>  
>  	del_timer_sync(&dev->watchdog_timer);
>  	cancel_work_sync(&dev->reset_work);
>  
>  	mutex_lock(&dev->shutdown_lock);
> -	if (pci_is_enabled(to_pci_dev(dev->dev))) {
> +	if (drain_queue) {
> +		if (shutdown)
> +			nvme_start_freeze(&dev->ctrl);

So if the device is enabled and we are going to shut the device
down we're going to freeze all I/O queues here.

Question 1:  why skip the freeze if we are not shutting down?

>  		nvme_stop_queues(&dev->ctrl);

Especially as we're now going to wait for all I/O to finish here in
all shutdown cases.

>  		csts = readl(dev->bar + NVME_REG_CSTS);
>  	}
> @@ -1701,6 +1704,25 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>  
>  	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
>  	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);

And kill all busy requests down here.

> +
> +	/*
> +	 * If shutting down, the driver will not be starting up queues again,
> +	 * so must drain all entered requests to their demise to avoid
> +	 * deadlocking blk-mq hot-cpu notifier.
> +	 */
> +	if (drain_queue && shutdown) {
> +		nvme_start_queues(&dev->ctrl);
> +		/*
> +		 * Waiting for frozen increases the freeze depth. Since we
> +		 * already start the freeze earlier in this function to stop
> +		 * incoming requests, we have to unfreeze after froze to get
> +		 * the depth back to the desired.
> +		 */
> +		nvme_wait_freeze(&dev->ctrl);
> +		nvme_unfreeze(&dev->ctrl);
> +		nvme_stop_queues(&dev->ctrl);

And all this (just like the start_freeze + quiesce sequence above)
really sounds like something we'd need to move to the core.

> +		/*
> +		 * If we are resuming from suspend, the queue was set to freeze
> +		 * to prevent blk-mq's hot CPU notifier from getting stuck on
> +		 * requests that entered the queue that NVMe had quiesced. Now
> +		 * that we are resuming and have notified blk-mq of the new h/w
> +		 * context queue count, it is safe to unfreeze the queues.
> +		 */
> +		if (was_suspend)
> +			nvme_unfreeze(&dev->ctrl);

And this change I don't understand at all.  It doesn't seem to pair
up with anything else in the patch.


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-17 15:27   ` Christoph Hellwig
@ 2017-02-17 16:33     ` Keith Busch
  2017-02-20 10:05       ` Christoph Hellwig
  2017-02-21 21:55       ` Sagi Grimberg
  0 siblings, 2 replies; 35+ messages in thread
From: Keith Busch @ 2017-02-17 16:33 UTC (permalink / raw)


On Fri, Feb 17, 2017@04:27:13PM +0100, Christoph Hellwig wrote:
> >  	u32 csts = -1;
> > +	bool drain_queue = pci_is_enabled(to_pci_dev(dev->dev));
> >  
> >  	del_timer_sync(&dev->watchdog_timer);
> >  	cancel_work_sync(&dev->reset_work);
> >  
> >  	mutex_lock(&dev->shutdown_lock);
> > -	if (pci_is_enabled(to_pci_dev(dev->dev))) {
> > +	if (drain_queue) {
> > +		if (shutdown)
> > +			nvme_start_freeze(&dev->ctrl);
> 
> So if the device is enabled and we are going to shut the device
> down we're going to freeze all I/O queues here.
> 
> Question 1:  why skip the freeze if we are not shutting down?

That is a great question!

If we are not shutting down, we are in one of two scenarios: a simple
reset, or we are killing this controller's request queues.

If the former, we don't want to freeze and flush because we are about
to bring the hctx back online, and anything queued may continue as normal.

For the latter, we will flush all entered requests to their failed
completion through nvme_kill_queues.

> >  		nvme_stop_queues(&dev->ctrl);
> 
> Especially as we're now going to wait for all I/O to finish here in
> all shutdown cases.

nvme_stop_queues only quiesces. This doesn't actually wait for any IO to
complete.
 
> >  		csts = readl(dev->bar + NVME_REG_CSTS);
> >  	}
> > @@ -1701,6 +1704,25 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
> >  
> >  	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
> >  	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
> 
> And kill all busy requests down here.

This only completes requests that were submitted to the controller but have
not been returned, and those requests are probably going to get requeued on
q->requeue_list, leaving the entered reference non-zero.

There may also be requests that entered the queue but are stuck there
after the quiesce.
 
> > +
> > +	/*
> > +	 * If shutting down, the driver will not be starting up queues again,
> > +	 * so must drain all entered requests to their demise to avoid
> > +	 * deadlocking blk-mq hot-cpu notifier.
> > +	 */
> > +	if (drain_queue && shutdown) {
> > +		nvme_start_queues(&dev->ctrl);
> > +		/*
> > +		 * Waiting for frozen increases the freeze depth. Since we
> > +		 * already start the freeze earlier in this function to stop
> > +		 * incoming requests, we have to unfreeze after froze to get
> > +		 * the depth back to the desired.
> > +		 */
> > +		nvme_wait_freeze(&dev->ctrl);
> > +		nvme_unfreeze(&dev->ctrl);
> > +		nvme_stop_queues(&dev->ctrl);
> 
> And all this (just like the start_freeze + quiesce sequence above)
> really sounds like something we'd need to move to the core.

Maybe. I'm okay with moving it to the core and documenting the intended
usage, but the sequence in between initiating the freeze and waiting for
frozen is specific to the driver, as is knowing when it needs to
be done. The above could be moved to core, but it only makes sense to
call it if the request to start the freeze was done prior to
reclaiming controller-owned IO.
 
> > +		/*
> > +		 * If we are resuming from suspend, the queue was set to freeze
> > +		 * to prevent blk-mq's hot CPU notifier from getting stuck on
> > +		 * requests that entered the queue that NVMe had quiesced. Now
> > +		 * that we are resuming and have notified blk-mq of the new h/w
> > +		 * context queue count, it is safe to unfreeze the queues.
> > +		 */
> > +		if (was_suspend)
> > +			nvme_unfreeze(&dev->ctrl);
> 
> And this change I don't understand at all.  It doesn't seem to pair
> up with anything else in the patch.

If we had done a controller shutdown, as would happen on a system suspend,
the resume needs to restore the queue freeze depth. That's all this
is doing.


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-17 16:33     ` Keith Busch
@ 2017-02-20 10:05       ` Christoph Hellwig
  2017-02-21 15:57         ` Keith Busch
  2017-02-21 21:55       ` Sagi Grimberg
  1 sibling, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2017-02-20 10:05 UTC (permalink / raw)


> > > +		 * If we are resuming from suspend, the queue was set to freeze
> > > +		 * to prevent blk-mq's hot CPU notifier from getting stuck on
> > > +		 * requests that entered the queue that NVMe had quiesced. Now
> > > +		 * that we are resuming and have notified blk-mq of the new h/w
> > > +		 * context queue count, it is safe to unfreeze the queues.
> > > +		 */
> > > +		if (was_suspend)
> > > +			nvme_unfreeze(&dev->ctrl);
> > 
> > And this change I don't understand at all.  It doesn't seem to pair
> > up with anything else in the patch.
> 
> If we had done a controller shutdown, as would happen on a system suspend,
> the resume needs to restore the queue freeze depth. That's all this
> is doing.

I've spent tons of time trying to understand this, but still fail
to.  Where is the nvme_start_freeze / nvme_wait_freeze that this
pairs with?


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-20 10:05       ` Christoph Hellwig
@ 2017-02-21 15:57         ` Keith Busch
  2017-02-22  7:17           ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Keith Busch @ 2017-02-21 15:57 UTC (permalink / raw)


On Mon, Feb 20, 2017@11:05:15AM +0100, Christoph Hellwig wrote:
> > > > +		 * If we are resuming from suspend, the queue was set to freeze
> > > > +		 * to prevent blk-mq's hot CPU notifier from getting stuck on
> > > > +		 * requests that entered the queue that NVMe had quiesced. Now
> > > > +		 * that we are resuming and have notified blk-mq of the new h/w
> > > > +		 * context queue count, it is safe to unfreeze the queues.
> > > > +		 */
> > > > +		if (was_suspend)
> > > > +			nvme_unfreeze(&dev->ctrl);
> > > 
> > > And this change I don't understand at all.  It doesn't seem to pair
> > > up with anything else in the patch.
> > 
> > If we had done a controller shutdown, as would happen on a system suspend,
> > the resume needs to restore the queue freeze depth. That's all this
> > is doing.
> 
> I've spent tons of time trying to understand this, but still fail
> to.  Where is the nvme_start_freeze / nvme_wait_freeze that this
> pairs with?

This is for suspend/resume. The freeze start is done during the suspend
phase, and the unfreeze on resume. Power management calls nvme_suspend,
which calls nvme_dev_disable with 'suspend == true', and that sets the
freeze depth. Power management later calls nvme_resume, which queues
the reset work that will observe 'was_suspend == true', and that pairs
the unfreeze.


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-17 16:33     ` Keith Busch
  2017-02-20 10:05       ` Christoph Hellwig
@ 2017-02-21 21:55       ` Sagi Grimberg
  2017-02-21 23:26         ` Keith Busch
  1 sibling, 1 reply; 35+ messages in thread
From: Sagi Grimberg @ 2017-02-21 21:55 UTC (permalink / raw)



>>> +
>>> +	/*
>>> +	 * If shutting down, the driver will not be starting up queues again,
>>> +	 * so must drain all entered requests to their demise to avoid
>>> +	 * deadlocking blk-mq hot-cpu notifier.
>>> +	 */
>>> +	if (drain_queue && shutdown) {
>>> +		nvme_start_queues(&dev->ctrl);
>>> +		/*
>>> +		 * Waiting for frozen increases the freeze depth. Since we
>>> +		 * already start the freeze earlier in this function to stop
>>> +		 * incoming requests, we have to unfreeze after froze to get
>>> +		 * the depth back to the desired.
>>> +		 */
>>> +		nvme_wait_freeze(&dev->ctrl);
>>> +		nvme_unfreeze(&dev->ctrl);
>>> +		nvme_stop_queues(&dev->ctrl);
>>
>> And all this (just like the start_freeze + quiesce sequence above)
>> really sounds like something we'd need to move to the core.
>
> Maybe. I'm okay with moving it to the core and documenting the intended
> usage, but the sequence in between initiating the freeze and waiting for
> frozen is specific to the driver, as is knowing when it needs to
> be done. The above could be moved to core, but it only makes sense to
> call it if the request to start the freeze was done prior to
> reclaiming controller-owned IO.

What if we pass a flag to blk_mq_quiesce_queue to indicate that we want
it to flush (and freeze) all entered requests?


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-21 21:55       ` Sagi Grimberg
@ 2017-02-21 23:26         ` Keith Busch
  0 siblings, 0 replies; 35+ messages in thread
From: Keith Busch @ 2017-02-21 23:26 UTC (permalink / raw)


On Tue, Feb 21, 2017@11:55:10PM +0200, Sagi Grimberg wrote:
> > 
> > Maybe. I'm okay with moving it to the core and documenting the intended
> > usage, but the sequence in between initiating the freeze and waiting for
> > frozen is specific to the driver, as is knowing when it needs to
> > be done. The above could be moved to core, but it only makes sense to
> > call it if the request to start the freeze was done prior to
> > reclaiming controller-owned IO.
> 
> What if we pass a flag to blk_mq_quiesce_queue to indicate that we want
> it to flush (and freeze) all entered requests?

Sounds like that's shuffling the setup for no particular gain, and has
the potential for getting the freeze depth off through misuse. Control still
has to pass back to the driver to do driver specific tasks to reclaim
IO and set up to fail requests that have entered but not issued prior
to temporarily restarting the hctx's and waiting for frozen.


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-21 15:57         ` Keith Busch
@ 2017-02-22  7:17           ` Christoph Hellwig
  2017-02-22 14:45             ` Keith Busch
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2017-02-22  7:17 UTC (permalink / raw)


On Tue, Feb 21, 2017@10:57:04AM -0500, Keith Busch wrote:
> This is for suspend/resume. The freeze start is done during the suspend
> phase, and the unfreeze on resume. Power management calls nvme_suspend,
> which calls nvme_dev_disable with 'suspend == true',

I guess I'm missing something in my tree then.  I have a local tree
with your five patches applied and there is no suspend argument anywhere
in pci.c.


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-22  7:17           ` Christoph Hellwig
@ 2017-02-22 14:45             ` Keith Busch
  2017-02-23 15:06               ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Keith Busch @ 2017-02-22 14:45 UTC (permalink / raw)


On Wed, Feb 22, 2017@08:17:54AM +0100, Christoph Hellwig wrote:
> On Tue, Feb 21, 2017@10:57:04AM -0500, Keith Busch wrote:
> > This is for suspend/resume. The freeze start is done during the suspend
> > phase, and the unfreeze on resume. Power management calls nvme_suspend,
> > which calls nvme_dev_disable with 'suspend == true',
> 
> I guess I'm missing something in my tree then.  I have a local tree
> with your five patches applied and there is no suspend argument anywhere
> in pci.c.

Err, I meant 'shutdown == true' for nvme_dev_disable, not suspend. That's
in the nvme tree. The part that isn't in the tree is the 'was_suspend' in
nvme_reset_work. That was part of the opal series directly in Jens' tree.
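
In other words, roughly (a simplified sketch of the call flow, not the
literal code in either tree; bodies are trimmed down to the relevant calls):

  /* System suspend: shutdown == true starts the freeze. */
  static int nvme_suspend(struct device *dev)
  {
          struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));

          nvme_dev_disable(ndev, true);   /* shutdown path calls nvme_start_freeze() */
          return 0;
  }

  /* System resume: queue the reset work, which pairs the unfreeze. */
  static int nvme_resume(struct device *dev)
  {
          struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));

          /*
           * Queues dev->reset_work; nvme_reset_work() then sees
           * was_suspend == true and calls nvme_unfreeze().
           */
          nvme_reset(ndev);
          return 0;
  }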


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-22 14:45             ` Keith Busch
@ 2017-02-23 15:06               ` Christoph Hellwig
  2017-02-23 15:21                 ` Keith Busch
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2017-02-23 15:06 UTC (permalink / raw)


On Wed, Feb 22, 2017@09:45:11AM -0500, Keith Busch wrote:
> > with your five patches applied and there is no suspend argument anywhere
> > in pci.c.
> 
> Err, I meant 'shutdown == true' for nvme_dev_disable, not suspend. That's
> in the nvme tree. The part that isn't in the tree is the 'was_suspend' in
> nvme_reset_work. That was part of the opal series directly in Jens' tree.

I still don't understand it.  nvme_dev_disable has no early return,
and it does the nvme_start_freeze, nvme_wait_freeze and nvme_unfreeze
calls under exactly the same conditionals:

	if (drain_queue) {
		if (shutdown)
			nvme_start_freeze(&dev->ctrl);
		nvme_stop_queues(&dev->ctrl);
		...
	}

	..

	if (drain_queue && shutdown) {
		nvme_start_queues(&dev->ctrl);
		nvme_wait_freeze(&dev->ctrl);
		nvme_unfreeze(&dev->ctrl);
		nvme_stop_queues(&dev->ctrl);
	}

so where is the pairing for the unfreeze in nvme_reset_work
coming from?


* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-23 15:21                 ` Keith Busch
@ 2017-02-23 15:16                   ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2017-02-23 15:16 UTC (permalink / raw)


On Thu, Feb 23, 2017@10:21:40AM -0500, Keith Busch wrote:
> I thought this would be non-obvious, so I put this detailed comment just
> before the unfreeze:
> 
> 	/*
> 	 * Waiting for frozen increases the freeze depth. Since we
> 	 * already start the freeze earlier in this function to stop
> 	 * incoming requests, we have to unfreeze after froze to get
> 	 * the depth back to the desired.
> 	 */
> 
> Assuming we are starting with a freeze depth of 0, the nvme_start_freeze
> gets us to 1. Then nvme_wait_freeze increases the freeze depth to 2
> (blk_mq_freeze_queue_wait is not exported),

Oooh.  I didn't spot that nvme_wait_freeze does not actually call
blk_mq_freeze_queue_wait.  Let's start by exporting that and beating
some sense into the sequence.
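
Roughly, and assuming blk_mq_freeze_queue_wait keeps its current shape in
block/blk-mq.c, the export would just be dropping the static and adding:

	void blk_mq_freeze_queue_wait(struct request_queue *q)
	{
		wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
	}
	EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);

plus a declaration in include/linux/blk-mq.h, so nvme_wait_freeze could wait
directly instead of taking an extra freeze reference it has to undo later.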

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-23 15:06               ` Christoph Hellwig
@ 2017-02-23 15:21                 ` Keith Busch
  2017-02-23 15:16                   ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Keith Busch @ 2017-02-23 15:21 UTC (permalink / raw)


On Thu, Feb 23, 2017@04:06:51PM +0100, Christoph Hellwig wrote:
> I still don't understand it.  nvme_dev_disable has no early return,
> and it does the nvme_start_freeze, nvme_wait_freeze and nvme_unfreeze
> calls under exactly the same conditionals:
> 
> 	if (drain_queue) {
> 		if (shutdown)
> 			nvme_start_freeze(&dev->ctrl);
> 		nvme_stop_queues(&dev->ctrl);
> 		...
> 	}
> 
> 	..
> 
> 	if (drain_queue && shutdown) {
> 		nvme_start_queues(&dev->ctrl);
> 		nvme_wait_freeze(&dev->ctrl);
> 		nvme_unfreeze(&dev->ctrl);
> 		nvme_stop_queues(&dev->ctrl);
> 	}
> 
> so where is the pairing for the unfreeze in nvme_reset_work
> coming from?

I thought this would be non-obvious, so I put this detailed comment just
before the unfreeze:

	/*
	 * Waiting for frozen increases the freeze depth. Since we
	 * already start the freeze earlier in this function to stop
	 * incoming requests, we have to unfreeze after the wait to get
	 * the depth back to the desired level.
	 */

Assuming we are starting with a freeze depth of 0, the nvme_start_freeze
gets us to 1. Then nvme_wait_freeze increases the freeze depth to 2
(blk_mq_freeze_queue_wait is not exported), so we need to unfreeze after the
queues are frozen to get us back to 1. Then nvme_reset_work does the final
unfreeze to get the depth back to 0 so new requests may enter.
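
Spelling that out as a sequence (a sketch only, assuming nvme_wait_freeze is
currently built on blk_mq_freeze_queue rather than a bare wait, which is why
the depth bumps to 2):

	nvme_start_freeze(&dev->ctrl);	/* depth 0 -> 1, new requests now blocked */
	nvme_wait_freeze(&dev->ctrl);	/* freezes again while waiting: depth 1 -> 2 */
	nvme_unfreeze(&dev->ctrl);	/* drop the extra reference: depth 2 -> 1 */

	/* later, in nvme_reset_work: */
	nvme_unfreeze(&dev->ctrl);	/* depth 1 -> 0, new requests may enter */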

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-02-15 18:14   ` Marc MERLIN
@ 2017-12-14  3:36     ` Marc MERLIN
  2018-02-28  2:22       ` Marc MERLIN
  0 siblings, 1 reply; 35+ messages in thread
From: Marc MERLIN @ 2017-12-14  3:36 UTC (permalink / raw)


On Mon, Mar 13, 2017@08:33:19AM -0700, Marc MERLIN wrote:
> On Mon, Mar 13, 2017@10:36:50AM -0400, Keith Busch wrote:
> > On Fri, Mar 10, 2017@06:26:12PM -0800, Marc MERLIN wrote:
> > > Hi Keith,
> > > 
> > > From what I think I've seen, your patches did get in. Thank you again for
> > > getting them through.
> > > 
> > > I'm assuming that they were too late for 4.10 but would be in 4.11, correct?
> > > 
> > > The 4.10 branch you gave me to test sadly has multiple other bugs that make
> > > it not very desirable on my laptop for everyday use, so I'm looking at
> > > switching to a standard released kernel as soon as practical.
> > > I'm also happy to install your patches on top of a release kernel, if they
> > > do apply until they make it to the next mainline release (but don't do extra
> > > rebase for me if that's needed, I'm ok waiting if they don't apply as is)
> > 
> > No problem, thanks for reporting the issue and sticking with it through
> > all the delays.
> > 
> > This was too late for 4.10, but applied with the 4.11 merge. The commit
> > is here:
> > 
> >   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=302ad8cc09339ea261eef58a8d5f4a116a8ffda5
>  
> Thank you.
> 
> > Sorry to hear about the other bugs. I think this should be readily
> > portable to the 4.9 longterm stable branch if we get the OK to apply the
> > new blk-mq API's. I'll do some testing this week and check with the
> > maintainers.
> 
> No worries, I didn't expect a random in-between release kernel to be
> fully stable. If I just need to wait for 4.11, I can certainly do that.

So, I was all happy and all with 4.11, and it still works fine.
S3 sleep just works, hibernate just works.
And then 4.12 came, and S3 sleep would not work reliably
4.13 came, same thing
4.14, again, same thing :(

Currently when I put the laptop to sleep, it sometimes, but just
sometimes, keeps the ThinkPad light slowly pulsing as if it
were still in sleep mode, and just never wakes up.

Sadly, this is crappy to bisect since it's not reliably reproducible.

Before I go through that pain, are there any suggestions or ideas I
should try?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 5/5] nvme/pci: Complete all stuck requests
  2017-12-14  3:36     ` Marc MERLIN
@ 2018-02-28  2:22       ` Marc MERLIN
  0 siblings, 0 replies; 35+ messages in thread
From: Marc MERLIN @ 2018-02-28  2:22 UTC (permalink / raw)


On Wed, Dec 13, 2017@07:36:33PM -0800, Marc MERLIN wrote:
> So, I was all happy and all with 4.11, and it still works fine.
> S3 sleep just works, hibernate just works.
> And then 4.12 came, and S3 sleep would not work reliably
> 4.13 came, same thing
> 4.14, again, same thing :(
> 
> Currently when I put the laptop to sleep, it sometimes, but just
> sometimes, keeps the ThinkPad light slowly pulsing as if it
> were still in sleep mode, and just never wakes up.
> 
> Sadly, this is crappy to bisect since it's not reliably reproducible.
> 
> Before I go through that pain, are there any suggestions or ideas I
> should try?

Well, this is kind of embarrassing, but here's what happened.

I got tired of my laptop locking up when recovering from S3 sleep
(sometimes) with 4.12, and 100% of the time with 4.13 and above all the
way to 4.15.
So, I bought a 1TB SATA M.2 drive to replace my NVMe one.
I spent time setting it up, and then... it failed just like with NVMe.

So, NVMe was innocent.
A lot of trial and error later, I found out that the iwlagn driver
actually broke between 4.11 and 4.12/4.13 and causes hangs on S3 resume.
Sigh..
My suspend script now unloads the wireless driver before going to sleep,
and all is well again.

Sorry about the wrong report on NVMe.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2018-02-28  2:22 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-10 23:15 [PATCH 0/5] NVMe pci fixes, for-4.11 Keith Busch
2017-02-10 23:15 ` [PATCH 1/5] nvme/pci: Disable on removal when disconnected Keith Busch
2017-02-13 10:18   ` Johannes Thumshirn
2017-02-13 13:51   ` Christoph Hellwig
2017-02-10 23:15 ` [PATCH 2/5] nvme/pci: Cancel work after watchdog disabled Keith Busch
2017-02-13 10:25   ` Johannes Thumshirn
2017-02-13 13:51   ` Christoph Hellwig
2017-02-10 23:15 ` [PATCH 3/5] nvme/core: Fix race kicking freed request_queue Keith Busch
2017-02-13 10:33   ` Johannes Thumshirn
2017-02-13 13:53   ` Christoph Hellwig
2017-02-10 23:15 ` [PATCH 4/5] nvme/pci: No special case for queue busy on IO Keith Busch
2017-02-13 13:53   ` Christoph Hellwig
2017-02-10 23:15 ` [PATCH 5/5] nvme/pci: Complete all stuck requests Keith Busch
2017-02-15  9:50   ` Sagi Grimberg
2017-02-15 15:46     ` Keith Busch
2017-02-15 16:04       ` Marc MERLIN
2017-02-15 17:36         ` J Freyensee
2017-02-16  9:12         ` Sagi Grimberg
2017-02-16 22:51           ` Keith Busch
2017-02-17  8:25             ` Christoph Hellwig
2017-02-15 18:14   ` Marc MERLIN
2017-12-14  3:36     ` Marc MERLIN
2018-02-28  2:22       ` Marc MERLIN
2017-02-17 15:27   ` Christoph Hellwig
2017-02-17 16:33     ` Keith Busch
2017-02-20 10:05       ` Christoph Hellwig
2017-02-21 15:57         ` Keith Busch
2017-02-22  7:17           ` Christoph Hellwig
2017-02-22 14:45             ` Keith Busch
2017-02-23 15:06               ` Christoph Hellwig
2017-02-23 15:21                 ` Keith Busch
2017-02-23 15:16                   ` Christoph Hellwig
2017-02-21 21:55       ` Sagi Grimberg
2017-02-21 23:26         ` Keith Busch
2017-02-15  9:40 ` [PATCH 0/5] NVMe pci fixes, for-4.11 Sagi Grimberg
     [not found] <20170313153319.fmy6ww72fjtx74xq@merlins.org>
     [not found] ` <20170313143649.GC6994@localhost.localdomain>
