* [PATCHv2-4.5 00/10] NVMe fixes
@ 2016-02-11 20:05 Keith Busch
  2016-02-11 20:05 ` [PATCHv2-4.5 01/10] blk-mq: End unstarted requests on dying queue Keith Busch
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


Thank you everyone for the earlier review comments.

The entire series is here, though most of the code is unchanged from
the previous version.

I'll gladly handle merging this with the 4.6 branch if/when this goes
through.

v1 -> v2:

  Added all received "Reviewed-by" tags.

  Dropped patch for NVMe's max_dev_sectors. This is fixed in the block
  layer with a different and generic patch external to this series.

  Dropped patch removing nvme_workq's WQ_MEM_RECLAIM. The flush sync
  warning that patch was trying to address is fixed by moving the
  namespace cleanup from the nvme_workq to an existing nvme callback
  scheduled on the system workqueue.

  Implemented Sagi's suggestion to move "nvme_kill_ns_queues" from
  pci to core. There was no real dependency on PCI after a very minor
  modification. Also added code comments to explain the function's usage.

  Combined the "NS_DEAD" state patch with the namespace queue cleanup
  handling in the reset_work. The first depends on the second to be
  useful, so they are now a single commit.

  Fixed an error code to be negative, pointed out in review.

  Commented on nvme_remove()'s requirement to handle a partially
  initialized device.

  Made nvme's "Cancelling I/O ..." message debug level, in addition to
  rate limited.

Keith Busch (10):
  blk-mq: End unstarted requests on dying queue
  NVMe: Fix io incapable return values
  NVMe: Allow request merges
  NVMe: Fix namespace removal deadlock
  NVMe: Requeue requests on suspended queues
  NVMe: Poll device while still active during remove
  NVMe: Simplify device reset failure
  NVMe: Move error handling to failed reset handler
  NVMe: Mark queues as dead on degraded controller
  NVMe: Rate limit nvme IO warnings

 block/blk-mq.c           |   6 ++-
 drivers/nvme/host/core.c | 105 ++++++++++++++++++++++++++++++++---------------
 drivers/nvme/host/nvme.h |  13 +++++-
 drivers/nvme/host/pci.c  |  66 ++++++++++++++---------------
 4 files changed, 121 insertions(+), 69 deletions(-)

-- 
2.6.2.307.g37023ba

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 01/10] blk-mq: End unstarted requests on dying queue
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
@ 2016-02-11 20:05 ` Keith Busch
  2016-02-11 20:05 ` [PATCHv2-4.5 02/10] NVMe: Fix io incapable return values Keith Busch
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


Go directly to ending a request if it wasn't started. Previously, completing a
request could invoke a driver callback for a request the driver didn't initialize.

Signed-off-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Sagi Grimberg <sagig at mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Acked-by: Christoph Hellwig <hch at lst.de>
---
 block/blk-mq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4c0622f..56c0a72 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -599,8 +599,10 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 		 * If a request wasn't started before the queue was
 		 * marked dying, kill it here or it'll go unnoticed.
 		 */
-		if (unlikely(blk_queue_dying(rq->q)))
-			blk_mq_complete_request(rq, -EIO);
+		if (unlikely(blk_queue_dying(rq->q))) {
+			rq->errors = -EIO;
+			blk_mq_end_request(rq, rq->errors);
+		}
 		return;
 	}
 
-- 
2.6.2.307.g37023ba


* [PATCHv2-4.5 02/10] NVMe: Fix io incapable return values
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
  2016-02-11 20:05 ` [PATCHv2-4.5 01/10] blk-mq: End unstarted requests on dying queue Keith Busch
@ 2016-02-11 20:05 ` Keith Busch
  2016-02-11 20:05 ` [PATCHv2-4.5 03/10] NVMe: Allow request merges Keith Busch
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


The function should return true when the controller can't handle IO;
the error paths previously returned false.

Signed-off-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Sagi Grimberg <sagig at mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Acked-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/nvme.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 4fb5bb7..9664d07 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -139,9 +139,9 @@ static inline bool nvme_io_incapable(struct nvme_ctrl *ctrl)
 	u32 val = 0;
 
 	if (ctrl->ops->io_incapable(ctrl))
-		return false;
+		return true;
 	if (ctrl->ops->reg_read32(ctrl, NVME_REG_CSTS, &val))
-		return false;
+		return true;
 	return val & NVME_CSTS_CFS;
 }
 
-- 
2.6.2.307.g37023ba


* [PATCHv2-4.5 03/10] NVMe: Allow request merges
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
  2016-02-11 20:05 ` [PATCHv2-4.5 01/10] blk-mq: End unstarted requests on dying queue Keith Busch
  2016-02-11 20:05 ` [PATCHv2-4.5 02/10] NVMe: Fix io incapable return values Keith Busch
@ 2016-02-11 20:05 ` Keith Busch
  2016-02-11 20:05 ` [PATCHv2-4.5 04/10] NVMe: Fix namespace removal deadlock Keith Busch
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


It is generally more efficient to submit larger IO.

Signed-off-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Reviewed-by: Sagi Grimberg <sagig at mellanox.com>
---
 drivers/nvme/host/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c5bf001..3cd921e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1121,7 +1121,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	ns->queue = blk_mq_init_queue(ctrl->tagset);
 	if (IS_ERR(ns->queue))
 		goto out_free_ns;
-	queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, ns->queue);
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, ns->queue);
 	ns->queue->queuedata = ns;
 	ns->ctrl = ctrl;
-- 
2.6.2.307.g37023ba


* [PATCHv2-4.5 04/10] NVMe: Fix namespace removal deadlock
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
                   ` (2 preceding siblings ...)
  2016-02-11 20:05 ` [PATCHv2-4.5 03/10] NVMe: Allow request merges Keith Busch
@ 2016-02-11 20:05 ` Keith Busch
  2016-02-13  9:44   ` Christoph Hellwig
  2016-02-11 20:05 ` [PATCHv2-4.5 05/10] NVMe: Requeue requests on suspended queues Keith Busch
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


Namespace removal holds the namespace list mutex. If the controller
fails during namespace removal, the reset work would deadlock trying to
acquire the mutex.

This patch completes the potentially blocking parts of namespace
removal in a controller work queue so that the reset work can clean up
a failed controller.

Signed-off-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/host/core.c | 81 ++++++++++++++++++++++++++++++------------------
 drivers/nvme/host/nvme.h |  6 ++++
 2 files changed, 57 insertions(+), 30 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3cd921e..ceea7f0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -48,6 +48,10 @@ static void nvme_free_ns(struct kref *kref)
 {
 	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
 
+	mutex_lock(&ns->ctrl->namespaces_mutex);
+	list_del_init(&ns->list);
+	mutex_unlock(&ns->ctrl->namespaces_mutex);
+
 	if (ns->type == NVME_NS_LIGHTNVM)
 		nvme_nvm_unregister(ns->queue, ns->disk->disk_name);
 
@@ -1106,6 +1110,37 @@ static struct nvme_ns *nvme_find_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	return NULL;
 }
 
+static void nvme_ns_remove_work(struct work_struct *work)
+{
+	struct nvme_ns *ns = container_of(work, struct nvme_ns, remove_work);
+	bool kill = nvme_io_incapable(ns->ctrl) &&
+			!blk_queue_dying(ns->queue);
+
+	if (kill) {
+		blk_set_queue_dying(ns->queue);
+
+		/*
+		 * The controller was shutdown first if we got here through
+		 * device removal. The shutdown may requeue outstanding
+		 * requests. These need to be aborted immediately so
+		 * del_gendisk doesn't block indefinitely for their completion.
+		 */
+		blk_mq_abort_requeue_list(ns->queue);
+	}
+	if (ns->disk->flags & GENHD_FL_UP) {
+		if (blk_get_integrity(ns->disk))
+			blk_integrity_unregister(ns->disk);
+		sysfs_remove_group(&disk_to_dev(ns->disk)->kobj,
+					&nvme_ns_attr_group);
+		del_gendisk(ns->disk);
+	}
+	if (kill || !blk_queue_dying(ns->queue)) {
+		blk_mq_abort_requeue_list(ns->queue);
+		blk_cleanup_queue(ns->queue);
+	}
+	nvme_put_ns(ns);
+}
+
 static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns;
@@ -1160,6 +1195,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 	list_add_tail(&ns->list, &ctrl->namespaces);
 	kref_get(&ctrl->kref);
+	INIT_WORK(&ns->remove_work, nvme_ns_remove_work);
 	if (ns->type == NVME_NS_LIGHTNVM)
 		return;
 
@@ -1179,35 +1215,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 static void nvme_ns_remove(struct nvme_ns *ns)
 {
-	bool kill = nvme_io_incapable(ns->ctrl) &&
-			!blk_queue_dying(ns->queue);
-
-	lockdep_assert_held(&ns->ctrl->namespaces_mutex);
-
-	if (kill) {
-		blk_set_queue_dying(ns->queue);
-
-		/*
-		 * The controller was shutdown first if we got here through
-		 * device removal. The shutdown may requeue outstanding
-		 * requests. These need to be aborted immediately so
-		 * del_gendisk doesn't block indefinitely for their completion.
-		 */
-		blk_mq_abort_requeue_list(ns->queue);
-	}
-	if (ns->disk->flags & GENHD_FL_UP) {
-		if (blk_get_integrity(ns->disk))
-			blk_integrity_unregister(ns->disk);
-		sysfs_remove_group(&disk_to_dev(ns->disk)->kobj,
-					&nvme_ns_attr_group);
-		del_gendisk(ns->disk);
-	}
-	if (kill || !blk_queue_dying(ns->queue)) {
-		blk_mq_abort_requeue_list(ns->queue);
-		blk_cleanup_queue(ns->queue);
-	}
-	list_del_init(&ns->list);
-	nvme_put_ns(ns);
+	if (!test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
+		queue_work(ns->ctrl->ns_workq, &ns->remove_work);
 }
 
 static void nvme_validate_ns(struct nvme_ctrl *ctrl, unsigned nsid)
@@ -1304,6 +1313,8 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list)
 		nvme_ns_remove(ns);
 	mutex_unlock(&ctrl->namespaces_mutex);
+
+	flush_workqueue(ctrl->ns_workq);
 }
 
 static DEFINE_IDA(nvme_instance_ida);
@@ -1336,12 +1347,14 @@ static void nvme_release_instance(struct nvme_ctrl *ctrl)
 }
 
 void nvme_uninit_ctrl(struct nvme_ctrl *ctrl)
- {
+{
 	device_destroy(nvme_class, MKDEV(nvme_char_major, ctrl->instance));
 
 	spin_lock(&dev_list_lock);
 	list_del(&ctrl->node);
 	spin_unlock(&dev_list_lock);
+
+	destroy_workqueue(ctrl->ns_workq);
 }
 
 static void nvme_free_ctrl(struct kref *kref)
@@ -1391,11 +1404,19 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	get_device(ctrl->device);
 	dev_set_drvdata(ctrl->device, ctrl);
 
+	ctrl->ns_workq = alloc_workqueue("nvme-ns", WQ_UNBOUND, 0);
+	if (!ctrl->ns_workq) {
+		ret = -ENOMEM;
+		goto out_destroy_device;
+	}
+
 	spin_lock(&dev_list_lock);
 	list_add_tail(&ctrl->node, &nvme_ctrl_list);
 	spin_unlock(&dev_list_lock);
 
 	return 0;
+out_destroy_device:
+	device_destroy(nvme_class, MKDEV(nvme_char_major, ctrl->instance));
 out_release_instance:
 	nvme_release_instance(ctrl);
 out:
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9664d07..d330512 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -72,6 +72,7 @@ struct nvme_ctrl {
 	struct mutex namespaces_mutex;
 	struct device *device;	/* char device */
 	struct list_head node;
+	struct workqueue_struct *ns_workq;
 
 	char name[12];
 	char serial[20];
@@ -112,6 +113,11 @@ struct nvme_ns {
 	bool ext;
 	u8 pi_type;
 	int type;
+	struct work_struct remove_work;
+	unsigned long flags;
+
+#define NVME_NS_REMOVING 0
+
 	u64 mode_select_num_blocks;
 	u32 mode_select_block_len;
 };
-- 
2.6.2.307.g37023ba


* [PATCHv2-4.5 05/10] NVMe: Requeue requests on suspended queues
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
                   ` (3 preceding siblings ...)
  2016-02-11 20:05 ` [PATCHv2-4.5 04/10] NVMe: Fix namespace removal deadlock Keith Busch
@ 2016-02-11 20:05 ` Keith Busch
  2016-02-11 20:05 ` [PATCHv2-4.5 06/10] NVMe: Poll device while still active during remove Keith Busch
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


It's possible a request may get to the driver after the nvme queue was
disabled. This patch requeues the request if that happens.

Note the request is still "started" by the driver, but requeuing will
clear the start state for timeout handling.

Signed-off-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Reviewed-by: Sagi Grimberg <sagig at mellanox.com>
---
 drivers/nvme/host/pci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 72ef832..e5c2bea 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -678,6 +678,11 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 	blk_mq_start_request(req);
 
 	spin_lock_irq(&nvmeq->q_lock);
+	if (unlikely(nvmeq->cq_vector < 0)) {
+		ret = BLK_MQ_RQ_QUEUE_BUSY;
+		spin_unlock_irq(&nvmeq->q_lock);
+		goto out;
+	}
 	__nvme_submit_cmd(nvmeq, &cmnd);
 	nvme_process_cq(nvmeq);
 	spin_unlock_irq(&nvmeq->q_lock);
-- 
2.6.2.307.g37023ba


* [PATCHv2-4.5 06/10] NVMe: Poll device while still active during remove
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
                   ` (4 preceding siblings ...)
  2016-02-11 20:05 ` [PATCHv2-4.5 05/10] NVMe: Requeue requests on suspended queues Keith Busch
@ 2016-02-11 20:05 ` Keith Busch
  2016-02-11 20:05 ` [PATCHv2-4.5 07/10] NVMe: Simplify device reset failure Keith Busch
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


A device failure or link down wouldn't have been detected during namespace
removal. This patch keeps the device in the polled list so that the polling
thread may see such a failure and initiate a reset. The device is removed
from the list after disable, so we can safely flush the reset work since
it can't be requeued once disable completes.

Signed-off-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
Reviewed-by: Sagi Grimberg <sagig at mellanox.com>
---
 drivers/nvme/host/pci.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e5c2bea..09cc4da 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2116,16 +2116,12 @@ static void nvme_remove(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
 
-	spin_lock(&dev_list_lock);
-	list_del_init(&dev->node);
-	spin_unlock(&dev_list_lock);
-
 	pci_set_drvdata(pdev, NULL);
-	flush_work(&dev->reset_work);
 	flush_work(&dev->scan_work);
 	nvme_remove_namespaces(&dev->ctrl);
 	nvme_uninit_ctrl(&dev->ctrl);
 	nvme_dev_disable(dev, true);
+	flush_work(&dev->reset_work);
 	nvme_dev_remove_admin(dev);
 	nvme_free_queues(dev, 0);
 	nvme_release_cmb(dev);
-- 
2.6.2.307.g37023ba


* [PATCHv2-4.5 07/10] NVMe: Simplify device reset failure
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
                   ` (5 preceding siblings ...)
  2016-02-11 20:05 ` [PATCHv2-4.5 06/10] NVMe: Poll device while still active during remove Keith Busch
@ 2016-02-11 20:05 ` Keith Busch
  2016-02-13  9:21   ` Christoph Hellwig
  2016-02-11 20:05 ` [PATCHv2-4.5 08/10] NVMe: Move error handling to failed reset handler Keith Busch
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


A reset failure schedules the device to unbind from the driver through
the pci driver's 'remove'. This already cleans up all initialization
from a partial or completed state, so there is no need to duplicate
cleanup that could potentially race with a pci hot removal.

To help understand why a reset failed, the status is logged with the
existing warning message.

Signed-off-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/host/pci.c | 48 +++++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 09cc4da..9e1be57 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -86,7 +86,6 @@ struct nvme_queue;
 
 static int nvme_reset(struct nvme_dev *dev);
 static void nvme_process_cq(struct nvme_queue *nvmeq);
-static void nvme_remove_dead_ctrl(struct nvme_dev *dev);
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown);
 
 /*
@@ -1899,10 +1898,19 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 	kfree(dev);
 }
 
+static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
+{
+	dev_warn(dev->dev, "Removing after probe failure status: %d\n", status);
+
+	kref_get(&dev->ctrl.kref);
+	if (!schedule_work(&dev->remove_work))
+		nvme_put_ctrl(&dev->ctrl);
+}
+
 static void nvme_reset_work(struct work_struct *work)
 {
 	struct nvme_dev *dev = container_of(work, struct nvme_dev, reset_work);
-	int result;
+	int result = -ENODEV;
 
 	if (WARN_ON(test_bit(NVME_CTRL_RESETTING, &dev->flags)))
 		goto out;
@@ -1922,26 +1930,26 @@ static void nvme_reset_work(struct work_struct *work)
 
 	result = nvme_configure_admin_queue(dev);
 	if (result)
-		goto unmap;
+		goto out;
 
 	nvme_init_queue(dev->queues[0], 0);
 	result = nvme_alloc_admin_tags(dev);
 	if (result)
-		goto disable;
+		goto out;
 
 	result = nvme_init_identify(&dev->ctrl);
 	if (result)
-		goto free_tags;
+		goto out;
 
 	result = nvme_setup_io_queues(dev);
 	if (result)
-		goto free_tags;
+		goto out;
 
 	dev->ctrl.event_limit = NVME_NR_AEN_COMMANDS;
 
 	result = nvme_dev_list_add(dev);
 	if (result)
-		goto remove;
+		goto out;
 
 	/*
 	 * Keep the controller around but remove all namespaces if we don't have
@@ -1958,19 +1966,8 @@ static void nvme_reset_work(struct work_struct *work)
 	clear_bit(NVME_CTRL_RESETTING, &dev->flags);
 	return;
 
- remove:
-	nvme_dev_list_remove(dev);
- free_tags:
-	nvme_dev_remove_admin(dev);
-	blk_put_queue(dev->ctrl.admin_q);
-	dev->ctrl.admin_q = NULL;
-	dev->queues[0]->tags = NULL;
- disable:
-	nvme_disable_admin_queue(dev, false);
- unmap:
-	nvme_dev_unmap(dev);
  out:
-	nvme_remove_dead_ctrl(dev);
+	nvme_remove_dead_ctrl(dev, result);
 }
 
 static void nvme_remove_dead_ctrl_work(struct work_struct *work)
@@ -1983,14 +1980,6 @@ static void nvme_remove_dead_ctrl_work(struct work_struct *work)
 	nvme_put_ctrl(&dev->ctrl);
 }
 
-static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
-{
-	dev_warn(dev->dev, "Removing after probe failure\n");
-	kref_get(&dev->ctrl.kref);
-	if (!schedule_work(&dev->remove_work))
-		nvme_put_ctrl(&dev->ctrl);
-}
-
 static int nvme_reset(struct nvme_dev *dev)
 {
 	if (!dev->ctrl.admin_q || blk_queue_dying(dev->ctrl.admin_q))
@@ -2112,6 +2101,11 @@ static void nvme_shutdown(struct pci_dev *pdev)
 	nvme_dev_disable(dev, true);
 }
 
+/*
+ * The driver's remove may be called on a device in a partially initialized
+ * state. This function must not have any dependencies on the init state in
+ * order to proceed, and may even execute while initialization is in progress.
+ */
 static void nvme_remove(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-- 
2.6.2.307.g37023ba


* [PATCHv2-4.5 08/10] NVMe: Move error handling to failed reset handler
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
                   ` (6 preceding siblings ...)
  2016-02-11 20:05 ` [PATCHv2-4.5 07/10] NVMe: Simplify device reset failure Keith Busch
@ 2016-02-11 20:05 ` Keith Busch
  2016-02-13  9:46   ` Christoph Hellwig
  2016-02-11 20:05 ` [PATCHv2-4.5 09/10] NVMe: Mark queues as dead on degraded controller Keith Busch
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


This moves failed queue handling out of the namespace removal path and into
the reset failure path, fixing a deadlock if the controller fails or the
link goes down during del_gendisk. Previously the driver had to see the
controller as degraded prior to calling del_gendisk to set up the queues
to fail. If the controller happened to fail after that point, there was
no task left to end requests on the request_queue.

On failure, all namespace states are set to 'dead'. This revalidates
capacity to 0 and ends all new requests with error status.

Signed-off-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/host/core.c | 51 ++++++++++++++++++++++++++++++++++--------------
 drivers/nvme/host/nvme.h |  2 ++
 drivers/nvme/host/pci.c  |  6 +++++-
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ceea7f0..5ff1e77 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -560,6 +560,10 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 	u16 old_ms;
 	unsigned short bs;
 
+	if (test_bit(NVME_NS_DEAD, &ns->flags)) {
+		set_capacity(disk, 0);
+		return -ENODEV;
+	}
 	if (nvme_identify_ns(ns->ctrl, ns->ns_id, &id)) {
 		dev_warn(ns->ctrl->dev, "%s: Identify failure nvme%dn%d\n",
 				__func__, ns->ctrl->instance, ns->ns_id);
@@ -1113,34 +1117,51 @@ static struct nvme_ns *nvme_find_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 static void nvme_ns_remove_work(struct work_struct *work)
 {
 	struct nvme_ns *ns = container_of(work, struct nvme_ns, remove_work);
-	bool kill = nvme_io_incapable(ns->ctrl) &&
-			!blk_queue_dying(ns->queue);
-
-	if (kill) {
-		blk_set_queue_dying(ns->queue);
 
-		/*
-		 * The controller was shutdown first if we got here through
-		 * device removal. The shutdown may requeue outstanding
-		 * requests. These need to be aborted immediately so
-		 * del_gendisk doesn't block indefinitely for their completion.
-		 */
-		blk_mq_abort_requeue_list(ns->queue);
-	}
 	if (ns->disk->flags & GENHD_FL_UP) {
 		if (blk_get_integrity(ns->disk))
 			blk_integrity_unregister(ns->disk);
 		sysfs_remove_group(&disk_to_dev(ns->disk)->kobj,
 					&nvme_ns_attr_group);
 		del_gendisk(ns->disk);
-	}
-	if (kill || !blk_queue_dying(ns->queue)) {
 		blk_mq_abort_requeue_list(ns->queue);
 		blk_cleanup_queue(ns->queue);
 	}
 	nvme_put_ns(ns);
 }
 
+/**
+ * nvme_kill_ns_queues(): Ends all namespace queues
+ * @ctrl: the dead controller whose namespace queues need to be ended
+ *
+ * Call this function when the driver determines it is unable to get the
+ * controller in a state capable of servicing IO.
+ */
+void nvme_kill_ns_queues(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (!kref_get_unless_zero(&ns->kref))
+			continue;
+
+		/*
+		 * Revalidating a dead namespace sets capacity to 0. This will
+		 * end buffered writers dirtying pages that can't be synced.
+		 */
+		if (!test_and_set_bit(NVME_NS_DEAD, &ns->flags))
+			revalidate_disk(ns->disk);
+
+		blk_set_queue_dying(ns->queue);
+		blk_mq_abort_requeue_list(ns->queue);
+		blk_mq_start_stopped_hw_queues(ns->queue, true);
+
+		nvme_put_ns(ns);
+	}
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+
 static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index d330512..8fa5ceb 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -117,6 +117,7 @@ struct nvme_ns {
 	unsigned long flags;
 
 #define NVME_NS_REMOVING 0
+#define NVME_NS_DEAD     1
 
 	u64 mode_select_num_blocks;
 	u32 mode_select_block_len;
@@ -270,6 +271,7 @@ int nvme_get_features(struct nvme_ctrl *dev, unsigned fid, unsigned nsid,
 int nvme_set_features(struct nvme_ctrl *dev, unsigned fid, unsigned dword11,
 			dma_addr_t dma_addr, u32 *result);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
+void nvme_kill_ns_queues(struct nvme_ctrl *ctrl);
 
 extern spinlock_t dev_list_lock;
 
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9e1be57..32286e9 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -678,7 +678,9 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	spin_lock_irq(&nvmeq->q_lock);
 	if (unlikely(nvmeq->cq_vector < 0)) {
-		ret = BLK_MQ_RQ_QUEUE_BUSY;
+		ret = test_bit(NVME_NS_DEAD, &ns->flags) ?
+					BLK_MQ_RQ_QUEUE_ERROR :
+					BLK_MQ_RQ_QUEUE_BUSY;
 		spin_unlock_irq(&nvmeq->q_lock);
 		goto out;
 	}
@@ -1903,6 +1905,7 @@ static void nvme_remove_dead_ctrl(struct nvme_dev *dev, int status)
 	dev_warn(dev->dev, "Removing after probe failure status: %d\n", status);
 
 	kref_get(&dev->ctrl.kref);
+	nvme_dev_disable(dev, false);
 	if (!schedule_work(&dev->remove_work))
 		nvme_put_ctrl(&dev->ctrl);
 }
@@ -1975,6 +1978,7 @@ static void nvme_remove_dead_ctrl_work(struct work_struct *work)
 	struct nvme_dev *dev = container_of(work, struct nvme_dev, remove_work);
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
+	nvme_kill_ns_queues(&dev->ctrl);
 	if (pci_get_drvdata(pdev))
 		pci_stop_and_remove_bus_device_locked(pdev);
 	nvme_put_ctrl(&dev->ctrl);
-- 
2.6.2.307.g37023ba


* [PATCHv2-4.5 09/10] NVMe: Mark queues as dead on degraded controller
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
                   ` (7 preceding siblings ...)
  2016-02-11 20:05 ` [PATCHv2-4.5 08/10] NVMe: Move error handling to failed reset handler Keith Busch
@ 2016-02-11 20:05 ` Keith Busch
  2016-02-11 20:05 ` [PATCHv2-4.5 10/10] NVMe: Rate limit nvme IO warnings Keith Busch
  2016-02-11 22:28 ` [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
  10 siblings, 0 replies; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


If a previously optimal controller goes degraded, the IO queues will
stop working, but the controller is still manageable. Prior to removing
namespaces, we need to fail the queues so queued and new IO will fail,
allowing namespace removal to progress.

Signed-off-by: Keith Busch <keith.busch at intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/host/pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 32286e9..4de17ae 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1960,6 +1960,7 @@ static void nvme_reset_work(struct work_struct *work)
 	 */
 	if (dev->online_queues < 2) {
 		dev_warn(dev->dev, "IO queues not created\n");
+		nvme_kill_ns_queues(&dev->ctrl);
 		nvme_remove_namespaces(&dev->ctrl);
 	} else {
 		nvme_start_queues(&dev->ctrl);
-- 
2.6.2.307.g37023ba


* [PATCHv2-4.5 10/10] NVMe: Rate limit nvme IO warnings
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
                   ` (8 preceding siblings ...)
  2016-02-11 20:05 ` [PATCHv2-4.5 09/10] NVMe: Mark queues as dead on degraded controller Keith Busch
@ 2016-02-11 20:05 ` Keith Busch
  2016-02-12  8:16   ` Johannes Thumshirn
  2016-02-11 22:28 ` [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
  10 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2016-02-11 20:05 UTC (permalink / raw)


We don't need to spam the kernel logs with thousands of IO cancelling
messages. We can infer all IOs are being cancelled with fewer messages, or
even none at all. This patch rate limits the message and uses the debug
log level, as it is mainly useful for testing purposes.

Signed-off-by: Keith Busch <keith.busch at intel.com>
---
 drivers/nvme/host/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4de17ae..1b0e447 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1005,7 +1005,7 @@ static void nvme_cancel_queue_ios(struct request *req, void *data, bool reserved
 	if (!blk_mq_request_started(req))
 		return;
 
-	dev_warn(nvmeq->q_dmadev,
+	dev_dbg_ratelimited(nvmeq->q_dmadev,
 		 "Cancelling I/O %d QID %d\n", req->tag, nvmeq->qid);
 
 	status = NVME_SC_ABORT_REQ;
-- 
2.6.2.307.g37023ba


* [PATCHv2-4.5 00/10] NVMe fixes
  2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
                   ` (9 preceding siblings ...)
  2016-02-11 20:05 ` [PATCHv2-4.5 10/10] NVMe: Rate limit nvme IO warnings Keith Busch
@ 2016-02-11 22:28 ` Keith Busch
  2016-02-11 22:38   ` Jens Axboe
  10 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2016-02-11 22:28 UTC (permalink / raw)


Just want to point out a couple of issues. Both are easy fixes, and I will
send them in a few minutes.

First, admin queues don't have namespaces, so the new namespace state
check isn't always correct. I haven't been able to synthesize a test that
gets an admin command down this path in the disabled queue condition,
but it's clearly not correct.

The second issue potentially existed all along, but it's a little easier
to hit now: we need to lock out shutdown from manipulating registers
while they're being set up. This was hit by forcing controller reset from
the shell in a loop, then "rmmod nvme" at the same time.
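
The first issue can be addressed by guarding the namespace-state check
against a NULL namespace, since admin requests carry none. A rough
userspace sketch of the intended guard follows, with the kernel types and
block-layer return codes stubbed out; disabled_queue_status is a
hypothetical helper for illustration, not the driver's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel definitions; the real check lives inline in
 * nvme_queue_rq() in drivers/nvme/host/pci.c. */
struct nvme_ns { unsigned long flags; };
#define NVME_NS_DEAD 1

enum { BLK_MQ_RQ_QUEUE_BUSY = 1, BLK_MQ_RQ_QUEUE_ERROR = 2 };

/*
 * Admin commands have no namespace, so ns may be NULL here. Only a
 * namespace marked dead should see a hard error; any other request
 * arriving on a disabled queue is requeued via the BUSY status.
 */
static int disabled_queue_status(const struct nvme_ns *ns)
{
	if (ns && (ns->flags & (1UL << NVME_NS_DEAD)))
		return BLK_MQ_RQ_QUEUE_ERROR;
	return BLK_MQ_RQ_QUEUE_BUSY;
}
```

With this shape, an admin command on a disabled queue falls through to
BUSY and gets requeued instead of dereferencing a NULL namespace.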


* [PATCHv2-4.5 00/10] NVMe fixes
  2016-02-11 22:28 ` [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
@ 2016-02-11 22:38   ` Jens Axboe
  2016-02-12  8:35     ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2016-02-11 22:38 UTC (permalink / raw)


On 02/11/2016 03:28 PM, Keith Busch wrote:
> Just want to point out a couple of issues. Both are easy fixes, and I will
> send them in a few minutes.
>
> First, admin queues don't have namespaces, so the new namespace state
> check isn't always correct. I haven't been able to synthesize a test that
> gets an admin command down this path in the disabled queue condition,
> but it's clearly not correct.
>
> The second issue potentially existed all along, but it's a little easier
> to hit now: we need to lock out shutdown from manipulating registers
> while they're being set up. This was hit by forcing controller reset from
> the shell in a loop, then "rmmod nvme" at the same time.

I already applied these for 4.5, just didn't send the ack before I had 
done some testing. JFYI.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 10/10] NVMe: Rate limit nvme IO warnings
  2016-02-11 20:05 ` [PATCHv2-4.5 10/10] NVMe: Rate limit nvme IO warnings Keith Busch
@ 2016-02-12  8:16   ` Johannes Thumshirn
  0 siblings, 0 replies; 23+ messages in thread
From: Johannes Thumshirn @ 2016-02-12  8:16 UTC (permalink / raw)


On Thu, Feb 11, 2016@01:05:47PM -0700, Keith Busch wrote:
> We don't need to spam the kernel logs with thousands of IO cancelling
> messages. We can infer all IO's are being cancelled with fewer, or
> even none at all. This patch rate limits the message and uses the debug
> log level as it is mainly used for testing purposes.
> 
> Signed-off-by: Keith Busch <keith.busch at intel.com>

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
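The rate limiting described in the patch above caps how often a message
may be emitted rather than logging on every cancelled I/O. A hedged
userspace sketch of the idea (a simple window-and-burst counter; this is
not the kernel's actual `___ratelimit` implementation):

```c
#include <assert.h>
#include <time.h>

/* Allow at most 'burst' messages per 'interval' seconds. */
struct ratelimit {
	time_t interval;
	int burst;
	time_t window_start;
	int printed;
};

static int ratelimit_ok(struct ratelimit *rl, time_t now)
{
	/* A new window opens: reset the counter. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;	/* caller may log this message */
	}
	return 0;		/* suppressed */
}
```

With thousands of I/Os being cancelled at once, only the first few
messages per window get through, which is enough to infer that the rest
were cancelled too.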

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 00/10] NVMe fixes
  2016-02-11 22:38   ` Jens Axboe
@ 2016-02-12  8:35     ` Christoph Hellwig
  2016-02-12 15:09       ` Jens Axboe
  2016-02-12 15:24       ` Keith Busch
  0 siblings, 2 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-12  8:35 UTC (permalink / raw)


On Thu, Feb 11, 2016@03:38:02PM -0700, Jens Axboe wrote:
> I already applied these for 4.5, just didn't send the ack before I had done
> some testing. JFYI.

Argg - some of these including the namespace removal changes absolutely
move into the wrong direction.  I've just been busy reviewing the easier
parts first before coming up with better ways for those.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 00/10] NVMe fixes
  2016-02-12  8:35     ` Christoph Hellwig
@ 2016-02-12 15:09       ` Jens Axboe
  2016-02-12 15:24       ` Keith Busch
  1 sibling, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2016-02-12 15:09 UTC (permalink / raw)


On 02/12/2016 01:35 AM, Christoph Hellwig wrote:
> On Thu, Feb 11, 2016@03:38:02PM -0700, Jens Axboe wrote:
>> I already applied these for 4.5, just didn't send the ack before I had done
>> some testing. JFYI.
>
> Argg - some of these including the namespace removal changes absolutely
> move into the wrong direction.  I've just been busy reviewing the easier
> parts first before coming up with better ways for those.

Let's just kill the series, it's not in the 4.6 branch yet either. I'll 
rebase for-linus.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 00/10] NVMe fixes
  2016-02-12  8:35     ` Christoph Hellwig
  2016-02-12 15:09       ` Jens Axboe
@ 2016-02-12 15:24       ` Keith Busch
  2016-02-13  9:49         ` Christoph Hellwig
  1 sibling, 1 reply; 23+ messages in thread
From: Keith Busch @ 2016-02-12 15:24 UTC (permalink / raw)


On Fri, Feb 12, 2016@12:35:08AM -0800, Christoph Hellwig wrote:
> Argg - some of these including the namespace removal changes absolutely
> move into the wrong direction.  I've just been busy reviewing the easier
> parts first before coming up with better ways for those.

Fair enough. I know you're busy in many other areas as well, so I
appreciate the time you've provided.

To the task at hand, this started out to fix a hot-plug regression (is
there any other kind?). More poking revealed other cracks, some of which
are pretty obscure and probably never worked, but I'm still scrambling
to fix them before the release anyway.

I'm not trying to significantly alter the design, but I think you're
saying these bolted on fixes are not helping long term. More focus is
moving to Fabrics, so I really want to get this one PCI feature working
before that happens. If you have something in the right direction that
will also help the current cycle, I can wait for when you're ready to
share that.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 07/10] NVMe: Simplify device reset failure
  2016-02-11 20:05 ` [PATCHv2-4.5 07/10] NVMe: Simplify device reset failure Keith Busch
@ 2016-02-13  9:21   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-13  9:21 UTC (permalink / raw)


On Thu, Feb 11, 2016@01:05:44PM -0700, Keith Busch wrote:
> A reset failure schedules the device to unbind from the driver through
> the pci driver's 'remove'. This already cleans up all initialization
> from a partial or completed state, so there is no need to duplicate the
> cleanup that could potentially race with a pci hot removal.
> 
> To help understand why a reset failed, the status is logged with the
> existing warning message.
> 
> Signed-off-by: Keith Busch <keith.busch at intel.com>
> Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>

Looks good,

Reviewed-by: Christoph Hellwig <hch at lst.de>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 04/10] NVMe: Fix namespace removal deadlock
  2016-02-11 20:05 ` [PATCHv2-4.5 04/10] NVMe: Fix namespace removal deadlock Keith Busch
@ 2016-02-13  9:44   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-13  9:44 UTC (permalink / raw)


So, this one is the thing I don't like.  The real issue here is the
broad scope of namespaces_mutex.  I think we should have a small
scope lock protecting just the list manipulations, which is not
held over I/O, and the issue would sort itself out.

Note that we already have a high level serialization for scanning
by always executing it in the scan_work work_item, so there is no
need for the high level serialization anyway.
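The small-scope locking being suggested — hold the lock only over the
list manipulation itself, never across I/O or teardown — can be
illustrated with a hedged userspace sketch (a pthread mutex stands in
for namespaces_mutex; the struct and function names are illustrative,
not the driver's actual API):

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Illustrative stand-ins for the driver's namespace list. */
struct ns_node {
	int id;
	struct ns_node *next;
};

static struct ns_node *ns_list;		/* protected by ns_list_lock */
static pthread_mutex_t ns_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Small-scope locking: only the list splice runs under the lock. */
static void ns_add(struct ns_node *ns)
{
	pthread_mutex_lock(&ns_list_lock);
	ns->next = ns_list;
	ns_list = ns;
	pthread_mutex_unlock(&ns_list_lock);
}

static struct ns_node *ns_remove(int id)
{
	struct ns_node **p, *found = NULL;

	pthread_mutex_lock(&ns_list_lock);
	for (p = &ns_list; *p; p = &(*p)->next) {
		if ((*p)->id == id) {
			found = *p;
			*p = found->next;
			break;
		}
	}
	pthread_mutex_unlock(&ns_list_lock);

	/*
	 * Anything that can block on I/O (del_gendisk, flushes) would
	 * happen here, outside the lock, so I/O paths cannot deadlock
	 * against the list lock.
	 */
	return found;
}
```

Because the lock is dropped before any blocking work, an I/O path that
needs the list lock can always make progress, which is the deadlock this
subthread is about.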

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 08/10] NVMe: Move error handling to failed reset handler
  2016-02-11 20:05 ` [PATCHv2-4.5 08/10] NVMe: Move error handling to failed reset handler Keith Busch
@ 2016-02-13  9:46   ` Christoph Hellwig
  2016-02-16 21:57     ` Keith Busch
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-13  9:46 UTC (permalink / raw)


On Thu, Feb 11, 2016@01:05:45PM -0700, Keith Busch wrote:
> This moves failed queue handling out of the namespace removal path and into
> the reset failure path, fixing a deadlock condition if the controller
> fails or the link goes down during del_gendisk. Previously the driver had to see
> the controller as degraded prior to calling del_gendisk to setup the
> queues to fail. If the controller happened to fail after this though,
> there was no task to end the request_queue.
> 
> On failure, all namespace states are set to 'dead'. This revalidates
> the capacity to 0 and ends all new requests with error status.

I like this a lot, but a few comments below:

> +/**
> + * nvme_kill_ns_queues(): Ends all namespace queues
> + * @ctrl: the dead controller that needs to end
> + *
> + * Call this function when the driver determines it is unable to get the
> + * controller in a state capable of servicing IO.
> + */
> +void nvme_kill_ns_queues(struct nvme_ctrl *ctrl)

Why do we have a separate function for this instead of driving this
from nvme_remove_namespaces?

> @@ -678,7 +678,9 @@ static int nvme_queue_rq(struct blk_mq_hw_ctx *hctx,
>  
>  	spin_lock_irq(&nvmeq->q_lock);
>  	if (unlikely(nvmeq->cq_vector < 0)) {
> -		ret = BLK_MQ_RQ_QUEUE_BUSY;
> +		ret = test_bit(NVME_NS_DEAD, &ns->flags) ?
> +					BLK_MQ_RQ_QUEUE_ERROR :
> +					BLK_MQ_RQ_QUEUE_BUSY;

This would be easier to read with a good old 'if'.
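The readability point can be seen in a standalone sketch (the constants
and the dead-namespace test are simplified stand-ins here, not the
kernel's definitions):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the block-layer return codes. */
enum { BLK_MQ_RQ_QUEUE_BUSY = 1, BLK_MQ_RQ_QUEUE_ERROR = 2 };

/* Ternary form, as posted in the patch. */
static int queue_rq_status_ternary(bool ns_dead)
{
	return ns_dead ? BLK_MQ_RQ_QUEUE_ERROR : BLK_MQ_RQ_QUEUE_BUSY;
}

/* The equivalent plain-'if' form being asked for. */
static int queue_rq_status_if(bool ns_dead)
{
	if (ns_dead)
		return BLK_MQ_RQ_QUEUE_ERROR;
	return BLK_MQ_RQ_QUEUE_BUSY;
}
```

Both forms behave identically; the 'if' variant just keeps each return
value on its own line.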

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 00/10] NVMe fixes
  2016-02-12 15:24       ` Keith Busch
@ 2016-02-13  9:49         ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-13  9:49 UTC (permalink / raw)


On Fri, Feb 12, 2016@03:24:48PM +0000, Keith Busch wrote:
> Fair enough. I know you're busy in many other areas as well, so I
> appreciate the time you've provided.

Sorry for the delay - I planned to get to it thursday afternoon, but
the common cold knocked me out unexpectedly just around that time and
it took a day to recover into a state where I could carefully
review complicated changes.

I'm actually much happier now with the remaining patches than I initially
thought - my main issue is the namespace removal work item which should
not be necessary if we apply good locking practices by reducing the
namespace_mutex scope.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 08/10] NVMe: Move error handling to failed reset handler
  2016-02-13  9:46   ` Christoph Hellwig
@ 2016-02-16 21:57     ` Keith Busch
  2016-02-17  8:06       ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Keith Busch @ 2016-02-16 21:57 UTC (permalink / raw)


On Sat, Feb 13, 2016@01:46:34AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 11, 2016@01:05:45PM -0700, Keith Busch wrote:
> > +/**
> > + * nvme_kill_ns_queues(): Ends all namespace queues
> > + * @ctrl: the dead controller whose namespace queues need to be ended
> > + *
> > + * Call this function when the driver determines it is unable to get the
> > + * controller in a state capable of servicing IO.
> > + */
> > +void nvme_kill_ns_queues(struct nvme_ctrl *ctrl)
> 
> Why do we have a separate function for this instead of driving this
> from nvme_remove_namespaces?

nvme_remove_namespaces attempts to remove namespaces assuming the
controller is still functional. For example, the user pushes the
"Attention" removal button and waits for the acknowledgement.

This new API assumes the controller is not functional and never will be.

If your question is why nvme_remove_namespaces doesn't handle both cases,
it's because the controller could break/link-down only after calling
del_gendisk, at which point it's too late for the driver to kill the
queue in that path.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCHv2-4.5 08/10] NVMe: Move error handling to failed reset handler
  2016-02-16 21:57     ` Keith Busch
@ 2016-02-17  8:06       ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2016-02-17  8:06 UTC (permalink / raw)


On Tue, Feb 16, 2016@09:57:47PM +0000, Keith Busch wrote:
> > 
> > Why do we have a separate function for this instead of driving this
> > from nvme_remove_namespaces?
> 
> nvme_remove_namespaces attempts to remove namespaces assuming the
> controller is still functional. For example, the user pushes the "Attention"
> removal button and waits for the acknowledgement.

But we'll always call it before removing a controller anyway.

> This new API assumes the controller is not functional and never will be.
> 
> If your question is why nvme_remove_namespaces doesn't handle both cases,
> it's because the controller could break/link-down only after calling
> del_gendisk, at which point it's too late for the driver to kill the
> queue in that path.

My idea was more to have a kill flag for nvme_remove_namespaces given
that we basically always follow up with a nvme_remove_namespaces when
calling nvme_kill_ns_queues.  But looking at the code this might
actually end up more convoluted, so I'll take my comment back.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2016-02-17  8:06 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-11 20:05 [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
2016-02-11 20:05 ` [PATCHv2-4.5 01/10] blk-mq: End unstarted requests on dying queue Keith Busch
2016-02-11 20:05 ` [PATCHv2-4.5 02/10] NVMe: Fix io incapable return values Keith Busch
2016-02-11 20:05 ` [PATCHv2-4.5 03/10] NVMe: Allow request merges Keith Busch
2016-02-11 20:05 ` [PATCHv2-4.5 04/10] NVMe: Fix namespace removal deadlock Keith Busch
2016-02-13  9:44   ` Christoph Hellwig
2016-02-11 20:05 ` [PATCHv2-4.5 05/10] NVMe: Requeue requests on suspended queues Keith Busch
2016-02-11 20:05 ` [PATCHv2-4.5 06/10] NVMe: Poll device while still active during remove Keith Busch
2016-02-11 20:05 ` [PATCHv2-4.5 07/10] NVMe: Simplify device reset failure Keith Busch
2016-02-13  9:21   ` Christoph Hellwig
2016-02-11 20:05 ` [PATCHv2-4.5 08/10] NVMe: Move error handling to failed reset handler Keith Busch
2016-02-13  9:46   ` Christoph Hellwig
2016-02-16 21:57     ` Keith Busch
2016-02-17  8:06       ` Christoph Hellwig
2016-02-11 20:05 ` [PATCHv2-4.5 09/10] NVMe: Mark queues as dead on degraded controller Keith Busch
2016-02-11 20:05 ` [PATCHv2-4.5 10/10] NVMe: Rate limit nvme IO warnings Keith Busch
2016-02-12  8:16   ` Johannes Thumshirn
2016-02-11 22:28 ` [PATCHv2-4.5 00/10] NVMe fixes Keith Busch
2016-02-11 22:38   ` Jens Axboe
2016-02-12  8:35     ` Christoph Hellwig
2016-02-12 15:09       ` Jens Axboe
2016-02-12 15:24       ` Keith Busch
2016-02-13  9:49         ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.