Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 1/5] nvme: Restart request timers in resetting state
@ 2019-09-05 14:26 Keith Busch
  2019-09-05 14:26 ` [PATCH 2/5] nvme: Prevent resets during paused states Keith Busch
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Keith Busch @ 2019-09-05 14:26 UTC (permalink / raw)
  To: Christoph Hellwig, linux-nvme, Sagi Grimberg; +Cc: Keith Busch, Edmund Nadolski

A controller in the resetting state has not yet completed its recovery
actions. Don't attempt additional recovery in this state and just restart
the timer instead.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/rdma.c | 8 ++++++++
 drivers/nvme/host/tcp.c  | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 5e30bcf3fe37..0ac4460a3de3 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1688,6 +1688,14 @@ nvme_rdma_timeout(struct request *rq, bool reserved)
 	dev_warn(ctrl->ctrl.device, "I/O %d QID %d timeout\n",
 		 rq->tag, nvme_rdma_queue_idx(queue));
 
+	/*
+	 * Restart the timer if a controller reset is already scheduled. Any
+	 * timed out commands would be handled before entering the connecting
+	 * state.
+	 */
+	if (ctrl->ctrl.state == NVME_CTRL_RESETTING)
+		return BLK_EH_RESET_TIMER;
+
 	if (ctrl->ctrl.state != NVME_CTRL_LIVE) {
 		/*
 		 * Teardown immediately if controller times out while starting
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 0a0263a364f2..994a2c3caaca 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2044,6 +2044,14 @@ nvme_tcp_timeout(struct request *rq, bool reserved)
 	struct nvme_tcp_ctrl *ctrl = req->queue->ctrl;
 	struct nvme_tcp_cmd_pdu *pdu = req->pdu;
 
+	/*
+	 * Restart the timer if a controller reset is already scheduled. Any
+	 * timed out commands would be handled before entering the connecting
+	 * state.
+	 */
+	if (ctrl->ctrl.state == NVME_CTRL_RESETTING)
+		return BLK_EH_RESET_TIMER;
+
 	dev_warn(ctrl->ctrl.device,
 		"queue %d: timeout request %#x type %d\n",
 		nvme_tcp_queue_id(req->queue), rq->tag, pdu->hdr.type);
-- 
2.14.5


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/5] nvme: Prevent resets during paused states
  2019-09-05 14:26 [PATCH 1/5] nvme: Restart request timers in resetting state Keith Busch
@ 2019-09-05 14:26 ` Keith Busch
  2019-09-05 20:23   ` Sagi Grimberg
  2019-09-05 14:26 ` [PATCH 3/5] nvme-pci: Free tagset if no IO queues Keith Busch
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Keith Busch @ 2019-09-05 14:26 UTC (permalink / raw)
  To: Christoph Hellwig, linux-nvme, Sagi Grimberg; +Cc: Keith Busch, Edmund Nadolski

A paused controller is doing critical internal activation work. Don't
allow a reset to occur by setting it to the resetting state, preventing
any future reset from occuring during this time.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 91b1f0e57715..d42167d7594b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3705,20 +3705,23 @@ static void nvme_fw_act_work(struct work_struct *work)
 		fw_act_timeout = jiffies +
 				msecs_to_jiffies(admin_timeout * 1000);
 
+	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
+		return;
+
 	nvme_stop_queues(ctrl);
 	while (nvme_ctrl_pp_status(ctrl)) {
 		if (time_after(jiffies, fw_act_timeout)) {
 			dev_warn(ctrl->device,
 				"Fw activation timeout, reset controller\n");
-			nvme_reset_ctrl(ctrl);
+			if (nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
+				nvme_reset_ctrl(ctrl);
 			break;
 		}
 		msleep(100);
 	}
 
-	if (ctrl->state != NVME_CTRL_LIVE)
+	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
 		return;
-
 	nvme_start_queues(ctrl);
 	/* read FW slot information to clear the AER */
 	nvme_get_fw_slot_info(ctrl);
-- 
2.14.5


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/5] nvme-pci: Free tagset if no IO queues
  2019-09-05 14:26 [PATCH 1/5] nvme: Restart request timers in resetting state Keith Busch
  2019-09-05 14:26 ` [PATCH 2/5] nvme: Prevent resets during paused states Keith Busch
@ 2019-09-05 14:26 ` Keith Busch
  2019-09-05 20:24   ` Sagi Grimberg
  2019-09-05 14:26 ` [PATCH 4/5] nvme: Remove ADMIN_ONLY state Keith Busch
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Keith Busch @ 2019-09-05 14:26 UTC (permalink / raw)
  To: Christoph Hellwig, linux-nvme, Sagi Grimberg; +Cc: Keith Busch, Edmund Nadolski

If a controller becomes degraded after a reset, we will not be able to
perform any IO. We currently teardown all previously created request
queues and namespaces, but we had been keeping an unusable tagset. Free
it after all queues using it have been released.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/pci.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5c3732fd02bc..ef6c31c20571 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2489,14 +2489,20 @@ static void nvme_release_prp_pools(struct nvme_dev *dev)
 	dma_pool_destroy(dev->prp_small_pool);
 }
 
+static void nvme_free_tagset(struct nvme_dev *dev)
+{
+	if (dev->tagset.tags)
+		blk_mq_free_tag_set(&dev->tagset);
+	dev->ctrl.tagset = NULL;
+}
+
 static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 {
 	struct nvme_dev *dev = to_nvme_dev(ctrl);
 
 	nvme_dbbuf_dma_free(dev);
 	put_device(dev->dev);
-	if (dev->tagset.tags)
-		blk_mq_free_tag_set(&dev->tagset);
+	nvme_free_tagset(dev);
 	if (dev->ctrl.admin_q)
 		blk_put_queue(dev->ctrl.admin_q);
 	kfree(dev->queues);
@@ -2615,6 +2621,7 @@ static void nvme_reset_work(struct work_struct *work)
 		nvme_kill_queues(&dev->ctrl);
 		nvme_remove_namespaces(&dev->ctrl);
 		new_state = NVME_CTRL_ADMIN_ONLY;
+		nvme_free_tagset(dev);
 	} else {
 		nvme_start_queues(&dev->ctrl);
 		nvme_wait_freeze(&dev->ctrl);
-- 
2.14.5


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 4/5] nvme: Remove ADMIN_ONLY state
  2019-09-05 14:26 [PATCH 1/5] nvme: Restart request timers in resetting state Keith Busch
  2019-09-05 14:26 ` [PATCH 2/5] nvme: Prevent resets during paused states Keith Busch
  2019-09-05 14:26 ` [PATCH 3/5] nvme-pci: Free tagset if no IO queues Keith Busch
@ 2019-09-05 14:26 ` Keith Busch
  2019-09-05 14:26 ` [PATCH 5/5] nvme: Wait for reset state when required Keith Busch
  2019-09-05 20:13 ` [PATCH 1/5] nvme: Restart request timers in resetting state Sagi Grimberg
  4 siblings, 0 replies; 18+ messages in thread
From: Keith Busch @ 2019-09-05 14:26 UTC (permalink / raw)
  To: Christoph Hellwig, linux-nvme, Sagi Grimberg; +Cc: Keith Busch, Edmund Nadolski

The admin only state was intended to fence off actions that don't
apply to a non-IO capable controller. The only actual user of this is
the scan_work, and pci was the only transport to ever set this state.
The consequence of having this state is placing an additional burden on
every other action that applies to both live and admin only controllers.

Remove the admin only state and place the admin only burden on the only
place that actually cares: scan_work.

This also prepares to make it easier to temporarily pause a LIVE state
so that we don't need to remember which state the controller had been in
prior to the pause.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c    | 22 +++-------------------
 drivers/nvme/host/fabrics.h |  3 +--
 drivers/nvme/host/nvme.h    |  1 -
 drivers/nvme/host/pci.c     | 21 ++++++---------------
 4 files changed, 10 insertions(+), 37 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d42167d7594b..d3920c1e54b9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -114,7 +114,7 @@ static void nvme_queue_scan(struct nvme_ctrl *ctrl)
 	/*
 	 * Only new queue scan work when admin and IO queues are both alive
 	 */
-	if (ctrl->state == NVME_CTRL_LIVE)
+	if (ctrl->state == NVME_CTRL_LIVE && ctrl->tagset)
 		queue_work(nvme_wq, &ctrl->scan_work);
 }
 
@@ -135,8 +135,7 @@ int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl)
 	ret = nvme_reset_ctrl(ctrl);
 	if (!ret) {
 		flush_work(&ctrl->reset_work);
-		if (ctrl->state != NVME_CTRL_LIVE &&
-		    ctrl->state != NVME_CTRL_ADMIN_ONLY)
+		if (ctrl->state != NVME_CTRL_LIVE)
 			ret = -ENETRESET;
 	}
 
@@ -313,15 +312,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 
 	old_state = ctrl->state;
 	switch (new_state) {
-	case NVME_CTRL_ADMIN_ONLY:
-		switch (old_state) {
-		case NVME_CTRL_CONNECTING:
-			changed = true;
-			/* FALLTHRU */
-		default:
-			break;
-		}
-		break;
 	case NVME_CTRL_LIVE:
 		switch (old_state) {
 		case NVME_CTRL_NEW:
@@ -337,7 +327,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		switch (old_state) {
 		case NVME_CTRL_NEW:
 		case NVME_CTRL_LIVE:
-		case NVME_CTRL_ADMIN_ONLY:
 			changed = true;
 			/* FALLTHRU */
 		default:
@@ -357,7 +346,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 	case NVME_CTRL_DELETING:
 		switch (old_state) {
 		case NVME_CTRL_LIVE:
-		case NVME_CTRL_ADMIN_ONLY:
 		case NVME_CTRL_RESETTING:
 		case NVME_CTRL_CONNECTING:
 			changed = true;
@@ -2775,7 +2763,6 @@ static int nvme_dev_open(struct inode *inode, struct file *file)
 
 	switch (ctrl->state) {
 	case NVME_CTRL_LIVE:
-	case NVME_CTRL_ADMIN_ONLY:
 		break;
 	default:
 		return -EWOULDBLOCK;
@@ -3065,7 +3052,6 @@ static ssize_t nvme_sysfs_show_state(struct device *dev,
 	static const char *const state_name[] = {
 		[NVME_CTRL_NEW]		= "new",
 		[NVME_CTRL_LIVE]	= "live",
-		[NVME_CTRL_ADMIN_ONLY]	= "only-admin",
 		[NVME_CTRL_RESETTING]	= "resetting",
 		[NVME_CTRL_CONNECTING]	= "connecting",
 		[NVME_CTRL_DELETING]	= "deleting",
@@ -3579,11 +3565,9 @@ static void nvme_scan_work(struct work_struct *work)
 	struct nvme_id_ctrl *id;
 	unsigned nn;
 
-	if (ctrl->state != NVME_CTRL_LIVE)
+	if (ctrl->state != NVME_CTRL_LIVE || !ctrl->tagset)
 		return;
 
-	WARN_ON_ONCE(!ctrl->tagset);
-
 	if (test_and_clear_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events)) {
 		dev_info(ctrl->device, "rescanning namespaces.\n");
 		nvme_clear_changed_ns_log(ctrl);
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 93f08d77c896..a0ec40ab62ee 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -182,8 +182,7 @@ bool nvmf_ip_options_match(struct nvme_ctrl *ctrl,
 static inline bool nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 		bool queue_live)
 {
-	if (likely(ctrl->state == NVME_CTRL_LIVE ||
-		   ctrl->state == NVME_CTRL_ADMIN_ONLY))
+	if (likely(ctrl->state == NVME_CTRL_LIVE))
 		return true;
 	return __nvmf_check_ready(ctrl, rq, queue_live);
 }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a818313a1f15..a9fc25c3b64d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -156,7 +156,6 @@ static inline u16 nvme_req_qid(struct request *req)
 enum nvme_ctrl_state {
 	NVME_CTRL_NEW,
 	NVME_CTRL_LIVE,
-	NVME_CTRL_ADMIN_ONLY,    /* Only admin queue live */
 	NVME_CTRL_RESETTING,
 	NVME_CTRL_CONNECTING,
 	NVME_CTRL_DELETING,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ef6c31c20571..95411f53f159 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2262,10 +2262,7 @@ static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode)
 	return true;
 }
 
-/*
- * return error value only when tagset allocation failed
- */
-static int nvme_dev_add(struct nvme_dev *dev)
+static void nvme_dev_add(struct nvme_dev *dev)
 {
 	int ret;
 
@@ -2295,7 +2292,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
 		if (ret) {
 			dev_warn(dev->ctrl.device,
 				"IO queues tagset allocation failed %d\n", ret);
-			return ret;
+			return;
 		}
 		dev->ctrl.tagset = &dev->tagset;
 	} else {
@@ -2306,7 +2303,6 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	}
 
 	nvme_dbbuf_set(dev);
-	return 0;
 }
 
 static int nvme_pci_enable(struct nvme_dev *dev)
@@ -2526,7 +2522,6 @@ static void nvme_reset_work(struct work_struct *work)
 		container_of(work, struct nvme_dev, ctrl.reset_work);
 	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
 	int result;
-	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
 
 	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING)) {
 		result = -ENODEV;
@@ -2620,14 +2615,11 @@ static void nvme_reset_work(struct work_struct *work)
 		dev_warn(dev->ctrl.device, "IO queues not created\n");
 		nvme_kill_queues(&dev->ctrl);
 		nvme_remove_namespaces(&dev->ctrl);
-		new_state = NVME_CTRL_ADMIN_ONLY;
 		nvme_free_tagset(dev);
 	} else {
 		nvme_start_queues(&dev->ctrl);
 		nvme_wait_freeze(&dev->ctrl);
-		/* hit this only when allocate tagset fails */
-		if (nvme_dev_add(dev))
-			new_state = NVME_CTRL_ADMIN_ONLY;
+		nvme_dev_add(dev);
 		nvme_unfreeze(&dev->ctrl);
 	}
 
@@ -2635,9 +2627,9 @@ static void nvme_reset_work(struct work_struct *work)
 	 * If only admin queue live, keep it to do further investigation or
 	 * recovery.
 	 */
-	if (!nvme_change_ctrl_state(&dev->ctrl, new_state)) {
+	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE)) {
 		dev_warn(dev->ctrl.device,
-			"failed to mark controller state %d\n", new_state);
+			"failed to mark controller live state\n");
 		result = -ENODEV;
 		goto out;
 	}
@@ -2934,8 +2926,7 @@ static int nvme_suspend(struct device *dev)
 	nvme_wait_freeze(ctrl);
 	nvme_sync_queues(ctrl);
 
-	if (ctrl->state != NVME_CTRL_LIVE &&
-	    ctrl->state != NVME_CTRL_ADMIN_ONLY)
+	if (ctrl->state != NVME_CTRL_LIVE)
 		goto unfreeze;
 
 	ndev->last_ps = 0;
-- 
2.14.5


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 5/5] nvme: Wait for reset state when required
  2019-09-05 14:26 [PATCH 1/5] nvme: Restart request timers in resetting state Keith Busch
                   ` (2 preceding siblings ...)
  2019-09-05 14:26 ` [PATCH 4/5] nvme: Remove ADMIN_ONLY state Keith Busch
@ 2019-09-05 14:26 ` Keith Busch
  2019-09-05 15:57   ` James Smart
  2019-09-05 20:47   ` Sagi Grimberg
  2019-09-05 20:13 ` [PATCH 1/5] nvme: Restart request timers in resetting state Sagi Grimberg
  4 siblings, 2 replies; 18+ messages in thread
From: Keith Busch @ 2019-09-05 14:26 UTC (permalink / raw)
  To: Christoph Hellwig, linux-nvme, Sagi Grimberg; +Cc: Keith Busch, Edmund Nadolski

Disabling a controller in a path that resets it in a different
context needs to fence off other resets from occuring between these
actions. Provide a method to block until we've successfully entered a
reset state prior to disabling so that we can perform the controller
disabling without racing with another reset. Otherwise, that other reset
may either undo our teardown, or our teardown may undo the in-progress
reset's bringup.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 36 +++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h |  3 +++
 drivers/nvme/host/pci.c  | 43 ++++++++++++++++++++++++++++++++-----------
 3 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d3920c1e54b9..995daa20a410 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -367,8 +367,10 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		break;
 	}
 
-	if (changed)
+	if (changed) {
 		ctrl->state = new_state;
+		wake_up_all(&ctrl->state_wq);
+	}
 
 	spin_unlock_irqrestore(&ctrl->lock, flags);
 	if (changed && ctrl->state == NVME_CTRL_LIVE)
@@ -377,6 +379,37 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 }
 EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
 
+/*
+ * Returns false for sink states that can't ever transition back to live
+ */
+static bool nvme_state_transient(struct nvme_ctrl *ctrl)
+{
+	switch (ctrl->state) {
+	case NVME_CTRL_NEW:
+	case NVME_CTRL_LIVE:
+	case NVME_CTRL_RESETTING:
+	case NVME_CTRL_CONNECTING:
+		return true;;
+	case NVME_CTRL_DELETING:
+	case NVME_CTRL_DEAD:
+	default:
+		return false;
+	}
+}
+
+/*
+ * Waits for the controller state to be reset, or returns false if it is not
+ * possible to ever transition to reset.
+ */
+bool nvme_wait_reset(struct nvme_ctrl *ctrl)
+{
+	wait_event(ctrl->state_wq,
+		   nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING) ||
+		   !nvme_state_transient(ctrl));
+	return ctrl->state == NVME_CTRL_RESETTING;
+}
+EXPORT_SYMBOL_GPL(nvme_wait_reset);
+
 static void nvme_free_ns_head(struct kref *ref)
 {
 	struct nvme_ns_head *head =
@@ -3841,6 +3874,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	INIT_WORK(&ctrl->async_event_work, nvme_async_event_work);
 	INIT_WORK(&ctrl->fw_act_work, nvme_fw_act_work);
 	INIT_WORK(&ctrl->delete_work, nvme_delete_ctrl_work);
+	init_waitqueue_head(&ctrl->state_wq);
 
 	INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
 	memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a9fc25c3b64d..dc1c95008748 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -15,6 +15,7 @@
 #include <linux/sed-opal.h>
 #include <linux/fault-inject.h>
 #include <linux/rcupdate.h>
+#include <linux/wait.h>
 
 #include <trace/events/block.h>
 
@@ -193,6 +194,7 @@ struct nvme_ctrl {
 	struct cdev cdev;
 	struct work_struct reset_work;
 	struct work_struct delete_work;
+	wait_queue_head_t state_wq;
 
 	struct nvme_subsystem *subsys;
 	struct list_head subsys_entry;
@@ -443,6 +445,7 @@ void nvme_complete_rq(struct request *req);
 bool nvme_cancel_request(struct request *req, void *data, bool reserved);
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state);
+bool nvme_wait_reset(struct nvme_ctrl *ctrl);
 int nvme_disable_ctrl(struct nvme_ctrl *ctrl);
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
 int nvme_shutdown_ctrl(struct nvme_ctrl *ctrl);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 95411f53f159..b3a89ca37ead 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2462,6 +2462,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	mutex_unlock(&dev->shutdown_lock);
 }
 
+static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
+{
+	if (!nvme_wait_reset(&dev->ctrl))
+		return -EBUSY;
+	nvme_dev_disable(dev, shutdown);
+	return 0;
+}
+
 static int nvme_setup_prp_pools(struct nvme_dev *dev)
 {
 	dev->prp_page_pool = dma_pool_create("prp list page", dev->dev,
@@ -2509,6 +2517,11 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
 {
+	/*
+	 * Set state to deleting now to avoid blocking nvme_wait_reset(), which
+	 * may be holding this pci_dev's device lock.
+	 */
+	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
 	nvme_get_ctrl(&dev->ctrl);
 	nvme_dev_disable(dev, false);
 	nvme_kill_queues(&dev->ctrl);
@@ -2833,19 +2846,27 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static void nvme_reset_prepare(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_disable(dev, false);
+
+	/*
+	 * We don't need to check the return value from waiting for the reset
+	 * state as pci_dev device lock is held, making it impossible to race
+	 * with ->remove().
+	 */
+	nvme_disable_prepare_reset(dev, false);
+	nvme_sync_queues(&dev->ctrl);
 }
 
 static void nvme_reset_done(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
+	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE);
 	nvme_reset_ctrl_sync(&dev->ctrl);
 }
 
 static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_disable(dev, true);
+	nvme_disable_prepare_reset(dev, true);
 }
 
 /*
@@ -2897,8 +2918,10 @@ static int nvme_resume(struct device *dev)
 	struct nvme_ctrl *ctrl = &ndev->ctrl;
 
 	if (pm_resume_via_firmware() || !ctrl->npss ||
-	    nvme_set_power_state(ctrl, ndev->last_ps) != 0)
-		nvme_reset_ctrl(ctrl);
+	    nvme_set_power_state(ctrl, ndev->last_ps) != 0) {
+		if (nvme_change_ctrl_state(&ndev->ctrl, NVME_CTRL_LIVE))
+			nvme_reset_ctrl(ctrl);
+	}
 	return 0;
 }
 
@@ -2917,10 +2940,8 @@ static int nvme_suspend(struct device *dev)
 	 * device does not support any non-default power states, shut down the
 	 * device fully.
 	 */
-	if (pm_suspend_via_firmware() || !ctrl->npss) {
-		nvme_dev_disable(ndev, true);
-		return 0;
-	}
+	if (pm_suspend_via_firmware() || !ctrl->npss)
+		return nvme_disable_prepare_reset(ndev, true);
 
 	nvme_start_freeze(ctrl);
 	nvme_wait_freeze(ctrl);
@@ -2943,9 +2964,8 @@ static int nvme_suspend(struct device *dev)
 		 * Clearing npss forces a controller reset on resume. The
 		 * correct value will be resdicovered then.
 		 */
-		nvme_dev_disable(ndev, true);
+		ret = nvme_disable_prepare_reset(ndev, true);
 		ctrl->npss = 0;
-		ret = 0;
 		goto unfreeze;
 	}
 	/*
@@ -2963,7 +2983,7 @@ static int nvme_simple_suspend(struct device *dev)
 {
 	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
 
-	nvme_dev_disable(ndev, true);
+	nvme_disable_prepare_reset(ndev, true);
 	return 0;
 }
 
@@ -2972,6 +2992,7 @@ static int nvme_simple_resume(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
+	nvme_change_ctrl_state(&ndev->ctrl, NVME_CTRL_LIVE);
 	nvme_reset_ctrl(&ndev->ctrl);
 	return 0;
 }
-- 
2.14.5


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] nvme: Wait for reset state when required
  2019-09-05 14:26 ` [PATCH 5/5] nvme: Wait for reset state when required Keith Busch
@ 2019-09-05 15:57   ` James Smart
  2019-09-05 20:47   ` Sagi Grimberg
  1 sibling, 0 replies; 18+ messages in thread
From: James Smart @ 2019-09-05 15:57 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig, linux-nvme, Sagi Grimberg; +Cc: Edmund Nadolski

On 9/5/2019 7:26 AM, Keith Busch wrote:
> Disabling a controller in a path that resets it in a different
> context needs to fence off other resets from occuring between these
> actions. Provide a method to block until we've successfully entered a
> reset state prior to disabling so that we can perform the controller
> disabling without racing with another reset. Otherwise, that other reset
> may either undo our teardown, or our teardown may undo the in-progress
> reset's bringup.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/nvme/host/core.c | 36 +++++++++++++++++++++++++++++++++++-
>   drivers/nvme/host/nvme.h |  3 +++
>   drivers/nvme/host/pci.c  | 43 ++++++++++++++++++++++++++++++++-----------
>   3 files changed, 70 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d3920c1e54b9..995daa20a410 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -367,8 +367,10 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>   		break;
>   	}
>   
> -	if (changed)
> +	if (changed) {
>   		ctrl->state = new_state;
> +		wake_up_all(&ctrl->state_wq);
> +	}
>   
>   	spin_unlock_irqrestore(&ctrl->lock, flags);
>   	if (changed && ctrl->state == NVME_CTRL_LIVE)
> @@ -377,6 +379,37 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>   }
>   EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
>   
> +/*
> + * Returns false for sink states that can't ever transition back to live
> + */
> +static bool nvme_state_transient(struct nvme_ctrl *ctrl)
> +{
> +	switch (ctrl->state) {
> +	case NVME_CTRL_NEW:
> +	case NVME_CTRL_LIVE:
> +	case NVME_CTRL_RESETTING:
> +	case NVME_CTRL_CONNECTING:
> +		return true;;
> +	case NVME_CTRL_DELETING:
> +	case NVME_CTRL_DEAD:
> +	default:
> +		return false;
> +	}
> +}
> +
> +/*
> + * Waits for the controller state to be reset, or returns false if it is not
> + * possible to ever transition to reset.
> + */
> +bool nvme_wait_reset(struct nvme_ctrl *ctrl)
> +{
> +	wait_event(ctrl->state_wq,
> +		   nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING) ||
> +		   !nvme_state_transient(ctrl));
> +	return ctrl->state == NVME_CTRL_RESETTING;
> +}
> +EXPORT_SYMBOL_GPL(nvme_wait_reset);
> +
>   static void nvme_free_ns_head(struct kref *ref)
>   {
>   	struct nvme_ns_head *head =
> @@ -3841,6 +3874,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>   	INIT_WORK(&ctrl->async_event_work, nvme_async_event_work);
>   	INIT_WORK(&ctrl->fw_act_work, nvme_fw_act_work);
>   	INIT_WORK(&ctrl->delete_work, nvme_delete_ctrl_work);
> +	init_waitqueue_head(&ctrl->state_wq);
>   
>   	INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
>   	memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index a9fc25c3b64d..dc1c95008748 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -15,6 +15,7 @@
>   #include <linux/sed-opal.h>
>   #include <linux/fault-inject.h>
>   #include <linux/rcupdate.h>
> +#include <linux/wait.h>
>   
>   #include <trace/events/block.h>
>   
> @@ -193,6 +194,7 @@ struct nvme_ctrl {
>   	struct cdev cdev;
>   	struct work_struct reset_work;
>   	struct work_struct delete_work;
> +	wait_queue_head_t state_wq;
>   
>   	struct nvme_subsystem *subsys;
>   	struct list_head subsys_entry;
> @@ -443,6 +445,7 @@ void nvme_complete_rq(struct request *req);
>   bool nvme_cancel_request(struct request *req, void *data, bool reserved);
>   bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>   		enum nvme_ctrl_state new_state);
> +bool nvme_wait_reset(struct nvme_ctrl *ctrl);
>   int nvme_disable_ctrl(struct nvme_ctrl *ctrl);
>   int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
>   int nvme_shutdown_ctrl(struct nvme_ctrl *ctrl);
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 95411f53f159..b3a89ca37ead 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2462,6 +2462,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>   	mutex_unlock(&dev->shutdown_lock);
>   }
>   
> +static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
> +{
> +	if (!nvme_wait_reset(&dev->ctrl))
> +		return -EBUSY;
> +	nvme_dev_disable(dev, shutdown);
> +	return 0;
> +}
> +
>   static int nvme_setup_prp_pools(struct nvme_dev *dev)
>   {
>   	dev->prp_page_pool = dma_pool_create("prp list page", dev->dev,
> @@ -2509,6 +2517,11 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
>   
>   static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
>   {
> +	/*
> +	 * Set state to deleting now to avoid blocking nvme_wait_reset(), which
> +	 * may be holding this pci_dev's device lock.
> +	 */
> +	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
>   	nvme_get_ctrl(&dev->ctrl);
>   	nvme_dev_disable(dev, false);
>   	nvme_kill_queues(&dev->ctrl);
> @@ -2833,19 +2846,27 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   static void nvme_reset_prepare(struct pci_dev *pdev)
>   {
>   	struct nvme_dev *dev = pci_get_drvdata(pdev);
> -	nvme_dev_disable(dev, false);
> +
> +	/*
> +	 * We don't need to check the return value from waiting for the reset
> +	 * state as pci_dev device lock is held, making it impossible to race
> +	 * with ->remove().
> +	 */
> +	nvme_disable_prepare_reset(dev, false);
> +	nvme_sync_queues(&dev->ctrl);
>   }
>   
>   static void nvme_reset_done(struct pci_dev *pdev)
>   {
>   	struct nvme_dev *dev = pci_get_drvdata(pdev);
> +	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE);
>   	nvme_reset_ctrl_sync(&dev->ctrl);
>   }
>   
>   static void nvme_shutdown(struct pci_dev *pdev)
>   {
>   	struct nvme_dev *dev = pci_get_drvdata(pdev);
> -	nvme_dev_disable(dev, true);
> +	nvme_disable_prepare_reset(dev, true);
>   }
>   
>   /*
> @@ -2897,8 +2918,10 @@ static int nvme_resume(struct device *dev)
>   	struct nvme_ctrl *ctrl = &ndev->ctrl;
>   
>   	if (pm_resume_via_firmware() || !ctrl->npss ||
> -	    nvme_set_power_state(ctrl, ndev->last_ps) != 0)
> -		nvme_reset_ctrl(ctrl);
> +	    nvme_set_power_state(ctrl, ndev->last_ps) != 0) {
> +		if (nvme_change_ctrl_state(&ndev->ctrl, NVME_CTRL_LIVE))
> +			nvme_reset_ctrl(ctrl);
> +	}
>   	return 0;
>   }
>   
> @@ -2917,10 +2940,8 @@ static int nvme_suspend(struct device *dev)
>   	 * device does not support any non-default power states, shut down the
>   	 * device fully.
>   	 */
> -	if (pm_suspend_via_firmware() || !ctrl->npss) {
> -		nvme_dev_disable(ndev, true);
> -		return 0;
> -	}
> +	if (pm_suspend_via_firmware() || !ctrl->npss)
> +		return nvme_disable_prepare_reset(ndev, true);
>   
>   	nvme_start_freeze(ctrl);
>   	nvme_wait_freeze(ctrl);
> @@ -2943,9 +2964,8 @@ static int nvme_suspend(struct device *dev)
>   		 * Clearing npss forces a controller reset on resume. The
>   		 * correct value will be resdicovered then.
>   		 */
> -		nvme_dev_disable(ndev, true);
> +		ret = nvme_disable_prepare_reset(ndev, true);
>   		ctrl->npss = 0;
> -		ret = 0;
>   		goto unfreeze;
>   	}
>   	/*
> @@ -2963,7 +2983,7 @@ static int nvme_simple_suspend(struct device *dev)
>   {
>   	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
>   
> -	nvme_dev_disable(ndev, true);
> +	nvme_disable_prepare_reset(ndev, true);
>   	return 0;
>   }
>   
> @@ -2972,6 +2992,7 @@ static int nvme_simple_resume(struct device *dev)
>   	struct pci_dev *pdev = to_pci_dev(dev);
>   	struct nvme_dev *ndev = pci_get_drvdata(pdev);
>   
> +	nvme_change_ctrl_state(&ndev->ctrl, NVME_CTRL_LIVE);
>   	nvme_reset_ctrl(&ndev->ctrl);
>   	return 0;
>   }

patches 1-4 look fine, and excluding fc on patch 1 was valid.

This patch overall looks fine too. I do get nervous that the new state 
transitions in some of these functions may make new state dependencies 
that show up in the common layers, especially around reset. Based on the 
pci vs core division and no actual change to the state transitions, it 
doesn't look like it bleeds out.

Reviewed-by: James Smart <james.smart@broadcom.com>

-- james


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] nvme: Restart request timers in resetting state
  2019-09-05 14:26 [PATCH 1/5] nvme: Restart request timers in resetting state Keith Busch
                   ` (3 preceding siblings ...)
  2019-09-05 14:26 ` [PATCH 5/5] nvme: Wait for reset state when required Keith Busch
@ 2019-09-05 20:13 ` Sagi Grimberg
  2019-09-05 20:25   ` Keith Busch
  4 siblings, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2019-09-05 20:13 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig, linux-nvme; +Cc: Edmund Nadolski


> +	/*
> +	 * Restart the timer if a controller reset is already scheduled. Any
> +	 * timed out commands would be handled before entering the connecting
> +	 * state.
> +	 */
> +	if (ctrl->ctrl.state == NVME_CTRL_RESETTING)
> +		return BLK_EH_RESET_TIMER;
> +
>   	if (ctrl->ctrl.state != NVME_CTRL_LIVE) {

Not sure I understand what exactly is this solving? if the controller is
not responding to a connect command, we just added another ADMIN_TIMEOUT
to failing it.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] nvme: Prevent resets during paused states
  2019-09-05 14:26 ` [PATCH 2/5] nvme: Prevent resets during paused states Keith Busch
@ 2019-09-05 20:23   ` Sagi Grimberg
  2019-09-05 20:35     ` Keith Busch
  0 siblings, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2019-09-05 20:23 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig, linux-nvme; +Cc: Edmund Nadolski


> A paused controller is doing critical internal activation work. Don't
> allow a reset to occur by setting it to the resetting state, preventing
> any future reset from occuring during this time.

Is there a reproducible bug actually being addressed here?

Also, seems a bit "acrobatic" to set the state to RESETTING without
really resetting it (and then change it back to LIVE before you do
actually resetting it).

Would it make sense to look at nvme_ctrl_pp_status when
scheduling a reset in nvme_reset_ctrl? Just a thought..

> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/nvme/host/core.c | 9 ++++++---
>   1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 91b1f0e57715..d42167d7594b 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -3705,20 +3705,23 @@ static void nvme_fw_act_work(struct work_struct *work)
>   		fw_act_timeout = jiffies +
>   				msecs_to_jiffies(admin_timeout * 1000);
>   
> +	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
> +		return;
> +
>   	nvme_stop_queues(ctrl);
>   	while (nvme_ctrl_pp_status(ctrl)) {
>   		if (time_after(jiffies, fw_act_timeout)) {
>   			dev_warn(ctrl->device,
>   				"Fw activation timeout, reset controller\n");

Would be good if the print will reflect if it resetting or not..

> -			nvme_reset_ctrl(ctrl);
> +			if (nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
> +				nvme_reset_ctrl(ctrl);

How can this state change not succeed? ctrl removal?

>   			break;
>   		}
>   		msleep(100);
>   	}
>   
> -	if (ctrl->state != NVME_CTRL_LIVE)
> +	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
>   		return;

In what scenario this will not succeed? if the reset did it?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] nvme-pci: Free tagset if no IO queues
  2019-09-05 14:26 ` [PATCH 3/5] nvme-pci: Free tagset if no IO queues Keith Busch
@ 2019-09-05 20:24   ` Sagi Grimberg
  2019-09-05 20:40     ` Keith Busch
  0 siblings, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2019-09-05 20:24 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig, linux-nvme; +Cc: Edmund Nadolski


> If a controller becomes degraded after a reset, we will not be able to
> perform any IO. We currently teardown all previously created request
> queues and namespaces, but we had been keeping an unusable tagset. Free
> it after all queues using it have been released.

Is this a resource thing? or is this causing actual troubles?

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] nvme: Restart request timers in resetting state
  2019-09-05 20:13 ` [PATCH 1/5] nvme: Restart request timers in resetting state Sagi Grimberg
@ 2019-09-05 20:25   ` Keith Busch
  2019-09-05 20:39     ` Sagi Grimberg
  0 siblings, 1 reply; 18+ messages in thread
From: Keith Busch @ 2019-09-05 20:25 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Edmund Nadolski, Christoph Hellwig, linux-nvme

On Thu, Sep 05, 2019 at 01:13:43PM -0700, Sagi Grimberg wrote:
> 
> > +	/*
> > +	 * Restart the timer if a controller reset is already scheduled. Any
> > +	 * timed out commands would be handled before entering the connecting
> > +	 * state.
> > +	 */
> > +	if (ctrl->ctrl.state == NVME_CTRL_RESETTING)
> > +		return BLK_EH_RESET_TIMER;
> > +
> >   	if (ctrl->ctrl.state != NVME_CTRL_LIVE) {
> 
> Not sure I understand what exactly is this solving? if the controller is
> not responding to a connect command, we just added another ADMIN_TIMEOUT
> to failing it.

The connect command is sent in the CONNECTING state, no? So that's not
affected by this. The point of this patch is to not do anything in a
reset state because whatever set the reset state is responsible for
clearing any commands prior to exiting that state.

But the motivation for why I'm bringing this up now is that it also
prepares for PATCH 2/5. That one uses the RESETTING state when the
controller reports CSTS.PP. We do not want to schedule a reset when the
controller is in that state, and we also expect any IO dispatched prior to
seeing CSTS.PP may time out. Any IO should complete once CSTS.PP clears,
so not escalating recovery is the correct action during that window,
and we use the state machine to coordinate that.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] nvme: Prevent resets during paused states
  2019-09-05 20:23   ` Sagi Grimberg
@ 2019-09-05 20:35     ` Keith Busch
  2019-09-05 20:42       ` Sagi Grimberg
  0 siblings, 1 reply; 18+ messages in thread
From: Keith Busch @ 2019-09-05 20:35 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Edmund Nadolski, Christoph Hellwig, linux-nvme

On Thu, Sep 05, 2019 at 01:23:53PM -0700, Sagi Grimberg wrote:
> 
> > A paused controller is doing critical internal activation work. Don't
> > allow a reset to occur by setting it to the resetting state, preventing
> > any future reset from occuring during this time.
> 
> Is there a reproducible bug actually being addressed here?

Yes, IO timeouts happen during CSTS.PP, which is normal, and esaclating
such errors to reset the controller while it is activating firmware is
not a good idea.

Further, we do not want to a user to manaully trigger a reset (via sysfs
or other means), so this properly blocks such actions.
 
> Also, seems a bit "acrobatic" to set the state to RESETTING without
> really resetting it (and then change it back to LIVE before you do
> actually resetting it).

We can think of a CSTS.PP as the device internally resetting itself to
activate firmware.

> Would it make sense to look at nvme_ctrl_pp_status when
> scheduling a reset in nvme_reset_ctrl? Just a thought..

We have to be able to reset if we decide CSTS.PP is stuck, fw activation
timeout.

> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > ---
> >   drivers/nvme/host/core.c | 9 ++++++---
> >   1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 91b1f0e57715..d42167d7594b 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -3705,20 +3705,23 @@ static void nvme_fw_act_work(struct work_struct *work)
> >   		fw_act_timeout = jiffies +
> >   				msecs_to_jiffies(admin_timeout * 1000);
> > +	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
> > +		return;
> > +
> >   	nvme_stop_queues(ctrl);
> >   	while (nvme_ctrl_pp_status(ctrl)) {
> >   		if (time_after(jiffies, fw_act_timeout)) {
> >   			dev_warn(ctrl->device,
> >   				"Fw activation timeout, reset controller\n");
> 
> Would be good if the print will reflect if it resetting or not..
>
> > -			nvme_reset_ctrl(ctrl);
> > +			if (nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
> > +				nvme_reset_ctrl(ctrl);
> 
> How can this state change not succeed? ctrl removal?

Right, we can't prevent a transition to a deleting state.

> >   			break;
> >   		}
> >   		msleep(100);
> >   	}
> > -	if (ctrl->state != NVME_CTRL_LIVE)
> > +	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
> >   		return;
> 
> In what scenario this will not succeed? if the reset did it?

Controller deletion should be the only reason here.

I see now the "break" for a failed activation ought to be a return,
so I can fix that if you're okay with the rest.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] nvme: Restart request timers in resetting state
  2019-09-05 20:25   ` Keith Busch
@ 2019-09-05 20:39     ` Sagi Grimberg
  2019-09-05 21:36       ` James Smart
  0 siblings, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2019-09-05 20:39 UTC (permalink / raw)
  To: Keith Busch; +Cc: Edmund Nadolski, Christoph Hellwig, linux-nvme


>>> +	/*
>>> +	 * Restart the timer if a controller reset is already scheduled. Any
>>> +	 * timed out commands would be handled before entering the connecting
>>> +	 * state.
>>> +	 */
>>> +	if (ctrl->ctrl.state == NVME_CTRL_RESETTING)
>>> +		return BLK_EH_RESET_TIMER;
>>> +
>>>    	if (ctrl->ctrl.state != NVME_CTRL_LIVE) {
>>
>> Not sure I understand what exactly is this solving? if the controller is
>> not responding to a connect command, we just added another ADMIN_TIMEOUT
>> to failing it.
> 
> The connect command is sent in the CONNECTING state, no? So that's not
> affected by this.

Right. but if the channel failed in connect, we will not transition to
RESTTING and handle the failure in the timeout handler. I will need to
test what is the effect here.

> The point of this patch is to not do anything in a
> reset state because whatever set the reset state is responsible for
> clearing any commands prior to exiting that state.

OK, we might need a bit more to make this correct in the fabrics
drivers.

> But the motivation for why I'm bringing this up now is that it also
> prepares for PATCH 2/5. That one uses the RESETTING state when the
> controller reports CSTS.PP. We do not want to schedule a reset when the
> controller is in that state, and we also expect any IO dispatched prior to
> seeing CSTS.PP may time out. Any IO should complete once CSTS.PP clears,
> so not escalating recovery is the correct action during that window,
> and we use the state machine to coordinate that.

I think that patch 2/5 needs some more in-code documentation to explain
why its mangling with the controller state machine without going through
the normal state changing operations.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] nvme-pci: Free tagset if no IO queues
  2019-09-05 20:24   ` Sagi Grimberg
@ 2019-09-05 20:40     ` Keith Busch
  2019-09-05 20:43       ` Sagi Grimberg
  0 siblings, 1 reply; 18+ messages in thread
From: Keith Busch @ 2019-09-05 20:40 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Edmund Nadolski, Christoph Hellwig, linux-nvme

On Thu, Sep 05, 2019 at 01:24:43PM -0700, Sagi Grimberg wrote:
> 
> > If a controller becomes degraded after a reset, we will not be able to
> > perform any IO. We currently teardown all previously created request
> > queues and namespaces, but we had been keeping an unusable tagset. Free
> > it after all queues using it have been released.
> 
> Is this a resource thing? or is this causing actual troubles?

Yes, this is freeing an unusable resource so that we don't need to
unbind the driver in order to reclaim it.

I only noticed this though because PATCH 4/5 needs to observe the cleared
tagset to operate correctly.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] nvme: Prevent resets during paused states
  2019-09-05 20:35     ` Keith Busch
@ 2019-09-05 20:42       ` Sagi Grimberg
  0 siblings, 0 replies; 18+ messages in thread
From: Sagi Grimberg @ 2019-09-05 20:42 UTC (permalink / raw)
  To: Keith Busch; +Cc: Edmund Nadolski, Christoph Hellwig, linux-nvme


>> Also, seems a bit "acrobatic" to set the state to RESETTING without
>> really resetting it (and then change it back to LIVE before you do
>> actually resetting it).
> 
> We can think of a CSTS.PP as the device internally resetting itself to
> activate firmware.

Can the reset flush the fw activation work? I'm trying to understand if
this state games are the best solution here...

>> Would it make sense to look at nvme_ctrl_pp_status when
>> scheduling a reset in nvme_reset_ctrl? Just a thought..
> 
> We have to be able to reset if we decide CSTS.PP is stuck, fw activation
> timeout.

See above.

>>> -			nvme_reset_ctrl(ctrl);
>>> +			if (nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
>>> +				nvme_reset_ctrl(ctrl);
>>
>> How can this state change not succeed? ctrl removal?
> 
> Right, we can't prevent a transition to a deleting state.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] nvme-pci: Free tagset if no IO queues
  2019-09-05 20:40     ` Keith Busch
@ 2019-09-05 20:43       ` Sagi Grimberg
  0 siblings, 0 replies; 18+ messages in thread
From: Sagi Grimberg @ 2019-09-05 20:43 UTC (permalink / raw)
  To: Keith Busch; +Cc: Edmund Nadolski, Christoph Hellwig, linux-nvme


>>> If a controller becomes degraded after a reset, we will not be able to
>>> perform any IO. We currently teardown all previously created request
>>> queues and namespaces, but we had been keeping an unusable tagset. Free
>>> it after all queues using it have been released.
>>
>> Is this a resource thing? or is this causing actual troubles?
> 
> Yes, this is freeing an unusable resource so that we don't need to
> unbind the driver in order to reclaim it.
> 
> I only noticed this though because PATCH 4/5 needs to observe the cleared
> tagset to operate correctly.

I think that it would be easier to understand with a cover-letter
explaining the steps and why they are taken.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] nvme: Wait for reset state when required
  2019-09-05 14:26 ` [PATCH 5/5] nvme: Wait for reset state when required Keith Busch
  2019-09-05 15:57   ` James Smart
@ 2019-09-05 20:47   ` Sagi Grimberg
  2019-09-05 20:55     ` Keith Busch
  1 sibling, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2019-09-05 20:47 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig, linux-nvme; +Cc: Edmund Nadolski


> Disabling a controller in a path that resets it in a different
> context needs to fence off other resets from occuring between these
> actions. Provide a method to block until we've successfully entered a
> reset state prior to disabling so that we can perform the controller
> disabling without racing with another reset. Otherwise, that other reset
> may either undo our teardown, or our teardown may undo the in-progress
> reset's bringup.

Why not simply flushing the reset work like nvme_reset_ctrl_sync?

I might be out of context here...

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] nvme: Wait for reset state when required
  2019-09-05 20:47   ` Sagi Grimberg
@ 2019-09-05 20:55     ` Keith Busch
  0 siblings, 0 replies; 18+ messages in thread
From: Keith Busch @ 2019-09-05 20:55 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Edmund Nadolski, Christoph Hellwig, linux-nvme

On Thu, Sep 05, 2019 at 01:47:26PM -0700, Sagi Grimberg wrote:
> 
> > Disabling a controller in a path that resets it in a different
> > context needs to fence off other resets from occuring between these
> > actions. Provide a method to block until we've successfully entered a
> > reset state prior to disabling so that we can perform the controller
> > disabling without racing with another reset. Otherwise, that other reset
> > may either undo our teardown, or our teardown may undo the in-progress
> > reset's bringup.
> 
> Why not simply flushing the reset work like nvme_reset_ctrl_sync?
> 
> I might be out of context here...

It's not enough to know a current reset completes. We have to know that
another won't schedule, which the RESETTING state guarantees.

More context:

This is for when we can't re-enable the controller in the same context
as when we have to disable it.

For example, PCIe FLR occurs in two stages. Suspend/resume is another
example. An IO timeout occuring concurrently with either gets the
wrong end result because the timeout tries to incorrectly re-enable the
device. This patch appropriatly synchronizes everyone through the state
machine to prevent these things from running concurrently.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] nvme: Restart request timers in resetting state
  2019-09-05 20:39     ` Sagi Grimberg
@ 2019-09-05 21:36       ` James Smart
  0 siblings, 0 replies; 18+ messages in thread
From: James Smart @ 2019-09-05 21:36 UTC (permalink / raw)
  To: Sagi Grimberg, Keith Busch; +Cc: Edmund Nadolski, Christoph Hellwig, linux-nvme



On 9/5/2019 1:39 PM, Sagi Grimberg wrote:
>
>> The point of this patch is to not do anything in a
>> reset state because whatever set the reset state is responsible for
>> clearing any commands prior to exiting that state.
>
> OK, we might need a bit more to make this correct in the fabrics
> drivers.

We certainly should be meeting this bar:

rdma/tcp uses blk_mq_tagset_busy_iter() with nvme_cancel_request, which 
is immediately completion of anything outstanding.

FC uses blk_mq_tagset_busy_iter() with an abort path then waits for the 
io aborts.

These coupled with the fabrics check-ready logic to not accept new io 
should mean we meet the bar.

-- james


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, back to index

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-05 14:26 [PATCH 1/5] nvme: Restart request timers in resetting state Keith Busch
2019-09-05 14:26 ` [PATCH 2/5] nvme: Prevent resets during paused states Keith Busch
2019-09-05 20:23   ` Sagi Grimberg
2019-09-05 20:35     ` Keith Busch
2019-09-05 20:42       ` Sagi Grimberg
2019-09-05 14:26 ` [PATCH 3/5] nvme-pci: Free tagset if no IO queues Keith Busch
2019-09-05 20:24   ` Sagi Grimberg
2019-09-05 20:40     ` Keith Busch
2019-09-05 20:43       ` Sagi Grimberg
2019-09-05 14:26 ` [PATCH 4/5] nvme: Remove ADMIN_ONLY state Keith Busch
2019-09-05 14:26 ` [PATCH 5/5] nvme: Wait for reset state when required Keith Busch
2019-09-05 15:57   ` James Smart
2019-09-05 20:47   ` Sagi Grimberg
2019-09-05 20:55     ` Keith Busch
2019-09-05 20:13 ` [PATCH 1/5] nvme: Restart request timers in resetting state Sagi Grimberg
2019-09-05 20:25   ` Keith Busch
2019-09-05 20:39     ` Sagi Grimberg
2019-09-05 21:36       ` James Smart

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org linux-nvme@archiver.kernel.org
	public-inbox-index linux-nvme


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/ public-inbox