linux-nvme.lists.infradead.org archive mirror
* [PATCHv4 0/5] nvme: double reset prevention
@ 2019-10-10 16:57 Keith Busch
  2019-10-10 16:57 ` [PATCHv4 1/5] nvme-pci: Free tagset if no IO queues Keith Busch
                   ` (7 more replies)
  0 siblings, 8 replies; 13+ messages in thread
From: Keith Busch @ 2019-10-10 16:57 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Keith Busch, Judy Brock, Edmund Nadolski, James Smart

The main objective of this series is to prevent double resets. This sort
of thing is known to have happened if a timeout occurs at roughly the
same time as a user-initiated reset, such as through PCIe's FLR.

The double reset could happen because the controller disabling had been
occurring outside of the RESETTING state, at a point where we can't yet
schedule the reset_work that is to run later. When another reset is
scheduled in between these events, the controller ends up in the wrong
state.

The end result of this series is simply to block subsequent resets by
initializing the controller state to RESETTING without actually scheduling
the reset_work.
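
To illustrate the idea outside the kernel, here is a minimal user-space
C sketch of this gating pattern, with a mutex and condition variable
standing in for ctrl->lock and the series' new ctrl->state_wq. The names
and the pthread-based modeling are illustrative assumptions only, not
the kernel implementation:

/* build: cc -pthread reset_gate.c -o reset_gate */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_DELETING, CTRL_DEAD };

struct ctrl {
	enum ctrl_state state;
	pthread_mutex_t lock;		/* models ctrl->lock */
	pthread_cond_t state_wq;	/* models ctrl->state_wq */
};

/* Sink states that can never transition back to LIVE. */
static bool state_terminal(enum ctrl_state s)
{
	return s == CTRL_DELETING || s == CTRL_DEAD;
}

/*
 * Models nvme_wait_reset(): sleep until this caller wins the
 * LIVE -> RESETTING transition, or give up on a terminal state.
 * Only the winner may go on to disable the controller.
 */
static bool wait_reset(struct ctrl *c)
{
	bool won = false;

	pthread_mutex_lock(&c->lock);
	for (;;) {
		if (c->state == CTRL_LIVE) {
			c->state = CTRL_RESETTING;	/* we own the reset */
			won = true;
			break;
		}
		if (state_terminal(c->state))
			break;
		/* another reset path owns the controller; wait for a change */
		pthread_cond_wait(&c->state_wq, &c->lock);
	}
	pthread_mutex_unlock(&c->lock);
	return won;
}

/* Models the end of reset_work: back to LIVE and wake all waiters. */
static void finish_reset(struct ctrl *c)
{
	pthread_mutex_lock(&c->lock);
	c->state = CTRL_LIVE;
	pthread_cond_broadcast(&c->state_wq);	/* wake_up_all() */
	pthread_mutex_unlock(&c->lock);
}

static void *reset_path(void *arg)
{
	struct ctrl *c = arg;

	if (!wait_reset(c))
		return NULL;
	printf("thread %lu: disabling controller\n",
	       (unsigned long)pthread_self());
	usleep(1000);	/* the uninterruptible disable/enable section */
	finish_reset(c);
	return NULL;
}

int main(void)
{
	struct ctrl c = {
		.state = CTRL_LIVE,
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.state_wq = PTHREAD_COND_INITIALIZER,
	};
	pthread_t a, b;

	/* Two racing resets: the loser blocks until the winner finishes. */
	pthread_create(&a, NULL, reset_path, &c);
	pthread_create(&b, NULL, reset_path, &c);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

In this model, whichever thread loses the LIVE -> RESETTING race simply
sleeps until the winner finishes, which is the serialization the series
enforces with nvme_wait_reset().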

v3 -> v4:

  Renamed nvme_reset_schedule() to nvme_try_sched_reset() and
  documented when it may fail (hot remove). I'm open to suggestions for
  a better name.

Keith Busch (5):
  nvme-pci: Free tagset if no IO queues
  nvme: Remove ADMIN_ONLY state
  nvme: Restart request timers in resetting state
  nvme: Prevent resets during paused controller state
  nvme: Wait for reset state when required

 drivers/nvme/host/core.c    | 91 +++++++++++++++++++++++++++----------
 drivers/nvme/host/fabrics.h |  3 +-
 drivers/nvme/host/nvme.h    |  5 +-
 drivers/nvme/host/pci.c     | 78 ++++++++++++++++++-------------
 drivers/nvme/host/rdma.c    |  8 ++++
 drivers/nvme/host/tcp.c     |  8 ++++
 6 files changed, 134 insertions(+), 59 deletions(-)

-- 
2.21.0



* [PATCHv4 1/5] nvme-pci: Free tagset if no IO queues
  2019-10-10 16:57 [PATCHv4 0/5] nvme: double reset prevention Keith Busch
@ 2019-10-10 16:57 ` Keith Busch
  2019-10-10 16:57 ` [PATCHv4 2/5] nvme: Remove ADMIN_ONLY state Keith Busch
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Keith Busch @ 2019-10-10 16:57 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Keith Busch, Judy Brock, Edmund Nadolski, James Smart

If a controller becomes degraded after a reset, we will not be able to
perform any IO. We currently tear down previously created request
queues and namespaces, but we kept the unusable tagset. Free it after
all queues using it have been released.

Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/pci.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index bb88681f4dc3..fe24a02aa213 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2490,14 +2490,20 @@ static void nvme_release_prp_pools(struct nvme_dev *dev)
 	dma_pool_destroy(dev->prp_small_pool);
 }
 
+static void nvme_free_tagset(struct nvme_dev *dev)
+{
+	if (dev->tagset.tags)
+		blk_mq_free_tag_set(&dev->tagset);
+	dev->ctrl.tagset = NULL;
+}
+
 static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 {
 	struct nvme_dev *dev = to_nvme_dev(ctrl);
 
 	nvme_dbbuf_dma_free(dev);
 	put_device(dev->dev);
-	if (dev->tagset.tags)
-		blk_mq_free_tag_set(&dev->tagset);
+	nvme_free_tagset(dev);
 	if (dev->ctrl.admin_q)
 		blk_put_queue(dev->ctrl.admin_q);
 	kfree(dev->queues);
@@ -2616,6 +2622,7 @@ static void nvme_reset_work(struct work_struct *work)
 		nvme_kill_queues(&dev->ctrl);
 		nvme_remove_namespaces(&dev->ctrl);
 		new_state = NVME_CTRL_ADMIN_ONLY;
+		nvme_free_tagset(dev);
 	} else {
 		nvme_start_queues(&dev->ctrl);
 		nvme_wait_freeze(&dev->ctrl);
-- 
2.21.0



* [PATCHv4 2/5] nvme: Remove ADMIN_ONLY state
  2019-10-10 16:57 [PATCHv4 0/5] nvme: double reset prevention Keith Busch
  2019-10-10 16:57 ` [PATCHv4 1/5] nvme-pci: Free tagset if no IO queues Keith Busch
@ 2019-10-10 16:57 ` Keith Busch
  2019-10-10 16:57 ` [PATCHv4 3/5] nvme: Restart request timers in resetting state Keith Busch
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Keith Busch @ 2019-10-10 16:57 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Keith Busch, Judy Brock, Edmund Nadolski, James Smart

The admin-only state was intended to fence off actions that don't
apply to a non-IO-capable controller. The only actual user of this is
the scan_work, and pci was the only transport to ever set this state.
The consequence of having this state is placing an additional burden on
every other action that applies to both live and admin-only controllers.

Remove the admin-only state and place the admin-only burden on the one
place that actually cares: scan_work.

This also prepares to make it easier to temporarily pause a LIVE
controller, so that we don't need to remember which state it had been
in prior to the pause.

Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c    | 23 ++++-------------------
 drivers/nvme/host/fabrics.h |  3 +--
 drivers/nvme/host/nvme.h    |  1 -
 drivers/nvme/host/pci.c     | 21 ++++++---------------
 4 files changed, 11 insertions(+), 37 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fd7dea36c3b6..7a164b708863 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -116,7 +116,7 @@ static void nvme_queue_scan(struct nvme_ctrl *ctrl)
 	/*
 	 * Only new queue scan work when admin and IO queues are both alive
 	 */
-	if (ctrl->state == NVME_CTRL_LIVE)
+	if (ctrl->state == NVME_CTRL_LIVE && ctrl->tagset)
 		queue_work(nvme_wq, &ctrl->scan_work);
 }
 
@@ -137,8 +137,7 @@ int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl)
 	ret = nvme_reset_ctrl(ctrl);
 	if (!ret) {
 		flush_work(&ctrl->reset_work);
-		if (ctrl->state != NVME_CTRL_LIVE &&
-		    ctrl->state != NVME_CTRL_ADMIN_ONLY)
+		if (ctrl->state != NVME_CTRL_LIVE)
 			ret = -ENETRESET;
 	}
 
@@ -315,15 +314,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 
 	old_state = ctrl->state;
 	switch (new_state) {
-	case NVME_CTRL_ADMIN_ONLY:
-		switch (old_state) {
-		case NVME_CTRL_CONNECTING:
-			changed = true;
-			/* FALLTHRU */
-		default:
-			break;
-		}
-		break;
 	case NVME_CTRL_LIVE:
 		switch (old_state) {
 		case NVME_CTRL_NEW:
@@ -339,7 +329,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		switch (old_state) {
 		case NVME_CTRL_NEW:
 		case NVME_CTRL_LIVE:
-		case NVME_CTRL_ADMIN_ONLY:
 			changed = true;
 			/* FALLTHRU */
 		default:
@@ -359,7 +348,6 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 	case NVME_CTRL_DELETING:
 		switch (old_state) {
 		case NVME_CTRL_LIVE:
-		case NVME_CTRL_ADMIN_ONLY:
 		case NVME_CTRL_RESETTING:
 		case NVME_CTRL_CONNECTING:
 			changed = true;
@@ -2874,7 +2862,6 @@ static int nvme_dev_open(struct inode *inode, struct file *file)
 
 	switch (ctrl->state) {
 	case NVME_CTRL_LIVE:
-	case NVME_CTRL_ADMIN_ONLY:
 		break;
 	default:
 		return -EWOULDBLOCK;
@@ -3168,7 +3155,6 @@ static ssize_t nvme_sysfs_show_state(struct device *dev,
 	static const char *const state_name[] = {
 		[NVME_CTRL_NEW]		= "new",
 		[NVME_CTRL_LIVE]	= "live",
-		[NVME_CTRL_ADMIN_ONLY]	= "only-admin",
 		[NVME_CTRL_RESETTING]	= "resetting",
 		[NVME_CTRL_CONNECTING]	= "connecting",
 		[NVME_CTRL_DELETING]	= "deleting",
@@ -3679,11 +3665,10 @@ static void nvme_scan_work(struct work_struct *work)
 	struct nvme_id_ctrl *id;
 	unsigned nn;
 
-	if (ctrl->state != NVME_CTRL_LIVE)
+	/* No tagset on a live ctrl means IO queues could not be created */
+	if (ctrl->state != NVME_CTRL_LIVE || !ctrl->tagset)
 		return;
 
-	WARN_ON_ONCE(!ctrl->tagset);
-
 	if (test_and_clear_bit(NVME_AER_NOTICE_NS_CHANGED, &ctrl->events)) {
 		dev_info(ctrl->device, "rescanning namespaces.\n");
 		nvme_clear_changed_ns_log(ctrl);
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 93f08d77c896..a0ec40ab62ee 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -182,8 +182,7 @@ bool nvmf_ip_options_match(struct nvme_ctrl *ctrl,
 static inline bool nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 		bool queue_live)
 {
-	if (likely(ctrl->state == NVME_CTRL_LIVE ||
-		   ctrl->state == NVME_CTRL_ADMIN_ONLY))
+	if (likely(ctrl->state == NVME_CTRL_LIVE))
 		return true;
 	return __nvmf_check_ready(ctrl, rq, queue_live);
 }
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 38a83ef5bcd3..2ba577271ada 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -161,7 +161,6 @@ static inline u16 nvme_req_qid(struct request *req)
 enum nvme_ctrl_state {
 	NVME_CTRL_NEW,
 	NVME_CTRL_LIVE,
-	NVME_CTRL_ADMIN_ONLY,    /* Only admin queue live */
 	NVME_CTRL_RESETTING,
 	NVME_CTRL_CONNECTING,
 	NVME_CTRL_DELETING,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fe24a02aa213..7d0de87d733d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2263,10 +2263,7 @@ static bool __nvme_disable_io_queues(struct nvme_dev *dev, u8 opcode)
 	return true;
 }
 
-/*
- * return error value only when tagset allocation failed
- */
-static int nvme_dev_add(struct nvme_dev *dev)
+static void nvme_dev_add(struct nvme_dev *dev)
 {
 	int ret;
 
@@ -2296,7 +2293,7 @@ static int nvme_dev_add(struct nvme_dev *dev)
 		if (ret) {
 			dev_warn(dev->ctrl.device,
 				"IO queues tagset allocation failed %d\n", ret);
-			return ret;
+			return;
 		}
 		dev->ctrl.tagset = &dev->tagset;
 	} else {
@@ -2307,7 +2304,6 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	}
 
 	nvme_dbbuf_set(dev);
-	return 0;
 }
 
 static int nvme_pci_enable(struct nvme_dev *dev)
@@ -2527,7 +2523,6 @@ static void nvme_reset_work(struct work_struct *work)
 		container_of(work, struct nvme_dev, ctrl.reset_work);
 	bool was_suspend = !!(dev->ctrl.ctrl_config & NVME_CC_SHN_NORMAL);
 	int result;
-	enum nvme_ctrl_state new_state = NVME_CTRL_LIVE;
 
 	if (WARN_ON(dev->ctrl.state != NVME_CTRL_RESETTING)) {
 		result = -ENODEV;
@@ -2621,14 +2616,11 @@ static void nvme_reset_work(struct work_struct *work)
 		dev_warn(dev->ctrl.device, "IO queues not created\n");
 		nvme_kill_queues(&dev->ctrl);
 		nvme_remove_namespaces(&dev->ctrl);
-		new_state = NVME_CTRL_ADMIN_ONLY;
 		nvme_free_tagset(dev);
 	} else {
 		nvme_start_queues(&dev->ctrl);
 		nvme_wait_freeze(&dev->ctrl);
-		/* hit this only when allocate tagset fails */
-		if (nvme_dev_add(dev))
-			new_state = NVME_CTRL_ADMIN_ONLY;
+		nvme_dev_add(dev);
 		nvme_unfreeze(&dev->ctrl);
 	}
 
@@ -2636,9 +2628,9 @@ static void nvme_reset_work(struct work_struct *work)
 	 * If only admin queue live, keep it to do further investigation or
 	 * recovery.
 	 */
-	if (!nvme_change_ctrl_state(&dev->ctrl, new_state)) {
+	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE)) {
 		dev_warn(dev->ctrl.device,
-			"failed to mark controller state %d\n", new_state);
+			"failed to mark controller live state\n");
 		result = -ENODEV;
 		goto out;
 	}
@@ -2945,8 +2937,7 @@ static int nvme_suspend(struct device *dev)
 	nvme_wait_freeze(ctrl);
 	nvme_sync_queues(ctrl);
 
-	if (ctrl->state != NVME_CTRL_LIVE &&
-	    ctrl->state != NVME_CTRL_ADMIN_ONLY)
+	if (ctrl->state != NVME_CTRL_LIVE)
 		goto unfreeze;
 
 	ret = nvme_get_power_state(ctrl, &ndev->last_ps);
-- 
2.21.0



* [PATCHv4 3/5] nvme: Restart request timers in resetting state
  2019-10-10 16:57 [PATCHv4 0/5] nvme: double reset prevention Keith Busch
  2019-10-10 16:57 ` [PATCHv4 1/5] nvme-pci: Free tagset if no IO queues Keith Busch
  2019-10-10 16:57 ` [PATCHv4 2/5] nvme: Remove ADMIN_ONLY state Keith Busch
@ 2019-10-10 16:57 ` Keith Busch
  2019-10-10 16:57 ` [PATCHv4 4/5] nvme: Prevent resets during paused controller state Keith Busch
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Keith Busch @ 2019-10-10 16:57 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Keith Busch, Judy Brock, Edmund Nadolski, James Smart

A controller in the resetting state has not yet completed its recovery
actions. The pci and fc transports were already handling this, so update
the remaining transports to not attempt additional recovery in this
state. Instead, just restart the request timer.

Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/rdma.c | 8 ++++++++
 drivers/nvme/host/tcp.c  | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 4d280160dd3f..f19a28b4e997 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1701,6 +1701,14 @@ nvme_rdma_timeout(struct request *rq, bool reserved)
 	dev_warn(ctrl->ctrl.device, "I/O %d QID %d timeout\n",
 		 rq->tag, nvme_rdma_queue_idx(queue));
 
+	/*
+	 * Restart the timer if a controller reset is already scheduled. Any
+	 * timed out commands would be handled before entering the connecting
+	 * state.
+	 */
+	if (ctrl->ctrl.state == NVME_CTRL_RESETTING)
+		return BLK_EH_RESET_TIMER;
+
 	if (ctrl->ctrl.state != NVME_CTRL_LIVE) {
 		/*
 		 * Teardown immediately if controller times out while starting
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 385a5212c10f..33de2fddfbb2 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2044,6 +2044,14 @@ nvme_tcp_timeout(struct request *rq, bool reserved)
 	struct nvme_tcp_ctrl *ctrl = req->queue->ctrl;
 	struct nvme_tcp_cmd_pdu *pdu = req->pdu;
 
+	/*
+	 * Restart the timer if a controller reset is already scheduled. Any
+	 * timed out commands would be handled before entering the connecting
+	 * state.
+	 */
+	if (ctrl->ctrl.state == NVME_CTRL_RESETTING)
+		return BLK_EH_RESET_TIMER;
+
 	dev_warn(ctrl->ctrl.device,
 		"queue %d: timeout request %#x type %d\n",
 		nvme_tcp_queue_id(req->queue), rq->tag, pdu->hdr.type);
-- 
2.21.0



* [PATCHv4 4/5] nvme: Prevent resets during paused controller state
  2019-10-10 16:57 [PATCHv4 0/5] nvme: double reset prevention Keith Busch
                   ` (2 preceding siblings ...)
  2019-10-10 16:57 ` [PATCHv4 3/5] nvme: Restart request timers in resetting state Keith Busch
@ 2019-10-10 16:57 ` Keith Busch
  2019-10-10 16:57 ` [PATCHv4 5/5] nvme: Wait for reset state when required Keith Busch
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 13+ messages in thread
From: Keith Busch @ 2019-10-10 16:57 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Keith Busch, Judy Brock, Edmund Nadolski, James Smart

A paused controller is doing critical internal activation work in the
background. Prevent subsequent controller resets from occurring during
this period by setting the controller state to RESETTING first. A helper
function, nvme_try_sched_reset(), is introduced for these paths so
they may continue with scheduling the reset_work after they've completed
their uninterruptible critical section.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7a164b708863..0f5a82b7d3a4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -120,6 +120,21 @@ static void nvme_queue_scan(struct nvme_ctrl *ctrl)
 		queue_work(nvme_wq, &ctrl->scan_work);
 }
 
+/*
+ * Use this function to proceed with scheduling reset_work for a controller
+ * that had previously been set to the resetting state. This is intended for
+ * code paths that can't be interrupted by other reset attempts. A hot removal
+ * may prevent this from succeeding.
+ */
+static int nvme_try_sched_reset(struct nvme_ctrl *ctrl)
+{
+	if (ctrl->state != NVME_CTRL_RESETTING)
+		return -EBUSY;
+	if (!queue_work(nvme_reset_wq, &ctrl->reset_work))
+		return -EBUSY;
+	return 0;
+}
+
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
 {
 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
@@ -3829,13 +3844,13 @@ static void nvme_fw_act_work(struct work_struct *work)
 		if (time_after(jiffies, fw_act_timeout)) {
 			dev_warn(ctrl->device,
 				"Fw activation timeout, reset controller\n");
-			nvme_reset_ctrl(ctrl);
-			break;
+			nvme_try_sched_reset(ctrl);
+			return;
 		}
 		msleep(100);
 	}
 
-	if (ctrl->state != NVME_CTRL_LIVE)
+	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_LIVE))
 		return;
 
 	nvme_start_queues(ctrl);
@@ -3855,7 +3870,13 @@ static void nvme_handle_aen_notice(struct nvme_ctrl *ctrl, u32 result)
 		nvme_queue_scan(ctrl);
 		break;
 	case NVME_AER_NOTICE_FW_ACT_STARTING:
-		queue_work(nvme_wq, &ctrl->fw_act_work);
+		/*
+		 * We are (ab)using the RESETTING state to prevent subsequent
+		 * recovery actions from interfering with the controller's
+		 * firmware activation.
+		 */
+		if (nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING))
+			queue_work(nvme_wq, &ctrl->fw_act_work);
 		break;
 #ifdef CONFIG_NVME_MULTIPATH
 	case NVME_AER_NOTICE_ANA:
-- 
2.21.0



* [PATCHv4 5/5] nvme: Wait for reset state when required
  2019-10-10 16:57 [PATCHv4 0/5] nvme: double reset prevention Keith Busch
                   ` (3 preceding siblings ...)
  2019-10-10 16:57 ` [PATCHv4 4/5] nvme: Prevent resets during paused controller state Keith Busch
@ 2019-10-10 16:57 ` Keith Busch
  2019-10-16  0:56   ` Balbir Singh
  2019-10-10 17:09 ` [PATCHv4 0/5] nvme: double reset prevention Nadolski, Edmund
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2019-10-10 16:57 UTC (permalink / raw)
  To: linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Keith Busch, Judy Brock, Edmund Nadolski, James Smart

Prevent simultaneous controller disabling/enabling tasks from interfering
with each other by waiting until the task has successfully transitioned
the controller to the RESETTING state. This ensures disabling the
controller will not be interrupted by another reset path; otherwise a
concurrent reset would leave the controller in the wrong state.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 41 +++++++++++++++++++++++++++++++++--
 drivers/nvme/host/nvme.h |  4 ++++
 drivers/nvme/host/pci.c  | 46 +++++++++++++++++++++++++++-------------
 3 files changed, 74 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0f5a82b7d3a4..7442c99a5ed7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -126,7 +126,7 @@ static void nvme_queue_scan(struct nvme_ctrl *ctrl)
  * code paths that can't be interrupted by other reset attempts. A hot removal
  * may prevent this from succeeding.
  */
-static int nvme_try_sched_reset(struct nvme_ctrl *ctrl)
+int nvme_try_sched_reset(struct nvme_ctrl *ctrl)
 {
 	if (ctrl->state != NVME_CTRL_RESETTING)
 		return -EBUSY;
@@ -134,6 +134,7 @@ static int nvme_try_sched_reset(struct nvme_ctrl *ctrl)
 		return -EBUSY;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(nvme_try_sched_reset);
 
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
 {
@@ -384,8 +385,10 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		break;
 	}
 
-	if (changed)
+	if (changed) {
 		ctrl->state = new_state;
+		wake_up_all(&ctrl->state_wq);
+	}
 
 	spin_unlock_irqrestore(&ctrl->lock, flags);
 	if (changed && ctrl->state == NVME_CTRL_LIVE)
@@ -394,6 +397,39 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 }
 EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
 
+/*
+ * Returns true for sink states that can't ever transition back to live.
+ */
+static bool nvme_state_terminal(struct nvme_ctrl *ctrl)
+{
+	switch (ctrl->state) {
+	case NVME_CTRL_NEW:
+	case NVME_CTRL_LIVE:
+	case NVME_CTRL_RESETTING:
+	case NVME_CTRL_CONNECTING:
+		return false;
+	case NVME_CTRL_DELETING:
+	case NVME_CTRL_DEAD:
+		return true;
+	default:
+		WARN_ONCE(1, "Unhandled ctrl state:%d", ctrl->state);
+		return true;
+	}
+}
+
+/*
+ * Waits for the controller state to be resetting, or returns false if it is
+ * not possible to ever transition to that state.
+ */
+bool nvme_wait_reset(struct nvme_ctrl *ctrl)
+{
+	wait_event(ctrl->state_wq,
+		   nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING) ||
+		   nvme_state_terminal(ctrl));
+	return ctrl->state == NVME_CTRL_RESETTING;
+}
+EXPORT_SYMBOL_GPL(nvme_wait_reset);
+
 static void nvme_free_ns_head(struct kref *ref)
 {
 	struct nvme_ns_head *head =
@@ -3999,6 +4035,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
 	INIT_WORK(&ctrl->async_event_work, nvme_async_event_work);
 	INIT_WORK(&ctrl->fw_act_work, nvme_fw_act_work);
 	INIT_WORK(&ctrl->delete_work, nvme_delete_ctrl_work);
+	init_waitqueue_head(&ctrl->state_wq);
 
 	INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
 	memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2ba577271ada..22e8401352c2 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -15,6 +15,7 @@
 #include <linux/sed-opal.h>
 #include <linux/fault-inject.h>
 #include <linux/rcupdate.h>
+#include <linux/wait.h>
 
 #include <trace/events/block.h>
 
@@ -198,6 +199,7 @@ struct nvme_ctrl {
 	struct cdev cdev;
 	struct work_struct reset_work;
 	struct work_struct delete_work;
+	wait_queue_head_t state_wq;
 
 	struct nvme_subsystem *subsys;
 	struct list_head subsys_entry;
@@ -448,6 +450,7 @@ void nvme_complete_rq(struct request *req);
 bool nvme_cancel_request(struct request *req, void *data, bool reserved);
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state);
+bool nvme_wait_reset(struct nvme_ctrl *ctrl);
 int nvme_disable_ctrl(struct nvme_ctrl *ctrl);
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
 int nvme_shutdown_ctrl(struct nvme_ctrl *ctrl);
@@ -498,6 +501,7 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl);
+int nvme_try_sched_reset(struct nvme_ctrl *ctrl);
 int nvme_delete_ctrl(struct nvme_ctrl *ctrl);
 
 int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 7d0de87d733d..4fcfd01f350d 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2463,6 +2463,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	mutex_unlock(&dev->shutdown_lock);
 }
 
+static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
+{
+	if (!nvme_wait_reset(&dev->ctrl))
+		return -EBUSY;
+	nvme_dev_disable(dev, shutdown);
+	return 0;
+}
+
 static int nvme_setup_prp_pools(struct nvme_dev *dev)
 {
 	dev->prp_page_pool = dma_pool_create("prp list page", dev->dev,
@@ -2510,6 +2518,11 @@ static void nvme_pci_free_ctrl(struct nvme_ctrl *ctrl)
 
 static void nvme_remove_dead_ctrl(struct nvme_dev *dev)
 {
+	/*
+	 * Set state to deleting now to avoid blocking nvme_wait_reset(), which
+	 * may be holding this pci_dev's device lock.
+	 */
+	nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_DELETING);
 	nvme_get_ctrl(&dev->ctrl);
 	nvme_dev_disable(dev, false);
 	nvme_kill_queues(&dev->ctrl);
@@ -2835,19 +2848,28 @@ static int nvme_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 static void nvme_reset_prepare(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_disable(dev, false);
+
+	/*
+	 * We don't need to check the return value from waiting for the reset
+	 * state as pci_dev device lock is held, making it impossible to race
+	 * with ->remove().
+	 */
+	nvme_disable_prepare_reset(dev, false);
+	nvme_sync_queues(&dev->ctrl);
 }
 
 static void nvme_reset_done(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_reset_ctrl_sync(&dev->ctrl);
+
+	if (!nvme_try_sched_reset(&dev->ctrl))
+		flush_work(&dev->ctrl.reset_work);
 }
 
 static void nvme_shutdown(struct pci_dev *pdev)
 {
 	struct nvme_dev *dev = pci_get_drvdata(pdev);
-	nvme_dev_disable(dev, true);
+	nvme_disable_prepare_reset(dev, true);
 }
 
 /*
@@ -2900,7 +2922,7 @@ static int nvme_resume(struct device *dev)
 
 	if (ndev->last_ps == U32_MAX ||
 	    nvme_set_power_state(ctrl, ndev->last_ps) != 0)
-		nvme_reset_ctrl(ctrl);
+		return nvme_try_sched_reset(&ndev->ctrl);
 	return 0;
 }
 
@@ -2928,10 +2950,8 @@ static int nvme_suspend(struct device *dev)
 	 */
 	if (pm_suspend_via_firmware() || !ctrl->npss ||
 	    !pcie_aspm_enabled(pdev) ||
-	    (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND)) {
-		nvme_dev_disable(ndev, true);
-		return 0;
-	}
+	    (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
+		return nvme_disable_prepare_reset(ndev, true);
 
 	nvme_start_freeze(ctrl);
 	nvme_wait_freeze(ctrl);
@@ -2963,9 +2983,8 @@ static int nvme_suspend(struct device *dev)
 		 * Clearing npss forces a controller reset on resume. The
 		 * correct value will be rediscovered then.
 		 */
-		nvme_dev_disable(ndev, true);
+		ret = nvme_disable_prepare_reset(ndev, true);
 		ctrl->npss = 0;
-		ret = 0;
 	}
 unfreeze:
 	nvme_unfreeze(ctrl);
@@ -2975,9 +2994,7 @@ static int nvme_suspend(struct device *dev)
 static int nvme_simple_suspend(struct device *dev)
 {
 	struct nvme_dev *ndev = pci_get_drvdata(to_pci_dev(dev));
-
-	nvme_dev_disable(ndev, true);
-	return 0;
+	return nvme_disable_prepare_reset(ndev, true);
 }
 
 static int nvme_simple_resume(struct device *dev)
@@ -2985,8 +3002,7 @@ static int nvme_simple_resume(struct device *dev)
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct nvme_dev *ndev = pci_get_drvdata(pdev);
 
-	nvme_reset_ctrl(&ndev->ctrl);
-	return 0;
+	return nvme_try_sched_reset(&ndev->ctrl);
 }
 
 static const struct dev_pm_ops nvme_dev_pm_ops = {
-- 
2.21.0



* Re: [PATCHv4 0/5] nvme: double reset prevention
  2019-10-10 16:57 [PATCHv4 0/5] nvme: double reset prevention Keith Busch
                   ` (4 preceding siblings ...)
  2019-10-10 16:57 ` [PATCHv4 5/5] nvme: Wait for reset state when required Keith Busch
@ 2019-10-10 17:09 ` Nadolski, Edmund
  2019-10-11 15:58 ` Nadolski, Edmund
  2019-10-14  7:14 ` Christoph Hellwig
  7 siblings, 0 replies; 13+ messages in thread
From: Nadolski, Edmund @ 2019-10-10 17:09 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Judy Brock, James Smart

On 10/10/2019 10:57 AM, Keith Busch wrote:
> The main objective of this series is to prevent double resets. This sort
> of thing is known to have happened if a timeout occurs at roughly the
> same time as a user-initiated reset, such as through PCIe's FLR.
> 
> The double reset could happen because the controller disabling had been
> occurring outside of the RESETTING state, at a point where we can't yet
> schedule the reset_work that is to run later. When another reset is
> scheduled in between these events, the controller ends up in the wrong
> state.
> 
> The end result of this series is simply to block subsequent resets by
> initializing the controller state to RESETTING without actually scheduling
> the reset_work.
> 
> v3 -> v4:
> 
>    Renamed nvme_reset_schedule() to nvme_try_sched_reset() and
>    documented when it may fail (hot remove). I'm open to suggestions for
>    a better name.

nvme_queue_reset()
nvme_atomic_reset()
nvme_queue_atomic_reset()

Ed



* Re: [PATCHv4 0/5] nvme: double reset prevention
  2019-10-10 16:57 [PATCHv4 0/5] nvme: double reset prevention Keith Busch
                   ` (5 preceding siblings ...)
  2019-10-10 17:09 ` [PATCHv4 0/5] nvme: double reset prevention Nadolski, Edmund
@ 2019-10-11 15:58 ` Nadolski, Edmund
  2019-10-11 16:27   ` Keith Busch
  2019-10-14  7:14 ` Christoph Hellwig
  7 siblings, 1 reply; 13+ messages in thread
From: Nadolski, Edmund @ 2019-10-11 15:58 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, Christoph Hellwig, Sagi Grimberg
  Cc: Judy Brock, James Smart

On 10/10/2019 10:57 AM, Keith Busch wrote:
> The main objective of this series is to prevent double resets. This sort
> of thing is known to have happened if a timeout occurs at roughly the
> same time as a user-initiated reset, such as through PCIe's FLR.
> 
> The double reset could happen because the controller disabling had been
> occurring outside of the RESETTING state, at a point where we can't yet
> schedule the reset_work that is to run later. When another reset is
> scheduled in between these events, the controller ends up in the wrong
> state.
> 
> The end result of this series is simply to block subsequent resets by
> initializing the controller state to RESETTING without actually scheduling
> the reset_work.
> 
> v3 -> v4:
> 
>    Renamed nvme_reset_schedule() to nvme_try_sched_reset() and
>    documented when it may fail (hot remove). I'm open to suggestions for
>    a better name.
> 
> Keith Busch (5):
>    nvme-pci: Free tagset if no IO queues
>    nvme: Remove ADMIN_ONLY state
>    nvme: Restart request timers in resetting state
>    nvme: Prevent resets during paused controller state
>    nvme: Wait for reset state when required
> 
>   drivers/nvme/host/core.c    | 91 +++++++++++++++++++++++++++----------
>   drivers/nvme/host/fabrics.h |  3 +-
>   drivers/nvme/host/nvme.h    |  5 +-
>   drivers/nvme/host/pci.c     | 78 ++++++++++++++++++-------------
>   drivers/nvme/host/rdma.c    |  8 ++++
>   drivers/nvme/host/tcp.c     |  8 ++++
>   6 files changed, 134 insertions(+), 59 deletions(-)
> 

Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>


* Re: [PATCHv4 0/5] nvme: double reset prevention
  2019-10-11 15:58 ` Nadolski, Edmund
@ 2019-10-11 16:27   ` Keith Busch
  2019-10-14  7:15     ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Keith Busch @ 2019-10-11 16:27 UTC (permalink / raw)
  To: Nadolski, Edmund
  Cc: James Smart, Judy Brock, Christoph Hellwig, linux-nvme, Sagi Grimberg

On Fri, Oct 11, 2019 at 09:58:52AM -0600, Nadolski, Edmund wrote:
> On 10/10/2019 10:57 AM, Keith Busch wrote:
> > The main objective of this series is to prevent double resets. This sort
> > of thing is known to have happened if a timeout occurs at roughly the
> > same time as a user-initiated reset, such as through PCIe's FLR.
> > 
> > The double reset could happen because the controller disabling had been
> > occurring outside of the RESETTING state, at a point where we can't yet
> > schedule the reset_work that is to run later. When another reset is
> > scheduled in between these events, the controller ends up in the wrong
> > state.
> > 
> > The end result of this series is simply to block subsequent resets by
> > initializing the controller state to RESETTING without actually scheduling
> > the reset_work.
> > 
> > v3 -> v4:
> > 
> >    Renamed nvme_reset_schedule() to nvme_try_sched_reset() and
> >    documented when it may fail (hot remove). I'm open to suggestions for
> >    a better name.
> > 
> > Keith Busch (5):
> >    nvme-pci: Free tagset if no IO queues
> >    nvme: Remove ADMIN_ONLY state
> >    nvme: Restart request timers in resetting state
> >    nvme: Prevent resets during paused controller state
> >    nvme: Wait for reset state when required
> > 
> >   drivers/nvme/host/core.c    | 91 +++++++++++++++++++++++++++----------
> >   drivers/nvme/host/fabrics.h |  3 +-
> >   drivers/nvme/host/nvme.h    |  5 +-
> >   drivers/nvme/host/pci.c     | 78 ++++++++++++++++++-------------
> >   drivers/nvme/host/rdma.c    |  8 ++++
> >   drivers/nvme/host/tcp.c     |  8 ++++
> >   6 files changed, 134 insertions(+), 59 deletions(-)
> > 
> 
> Tested-by: Edmund Nadolski <edmund.nadolski@intel.com>

Thanks, Ed.

Christoph, any remaining concerns on this? If no, do you prefer I push
this to 5.5 or 5.4? This fixes reported bugs, but it ended up a bit
larger and later than I originally hoped, so I'm okay either way.


* Re: [PATCHv4 0/5] nvme: double reset prevention
  2019-10-10 16:57 [PATCHv4 0/5] nvme: double reset prevention Keith Busch
                   ` (6 preceding siblings ...)
  2019-10-11 15:58 ` Nadolski, Edmund
@ 2019-10-14  7:14 ` Christoph Hellwig
  7 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2019-10-14  7:14 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, James Smart, linux-nvme, Edmund Nadolski,
	Judy Brock, Christoph Hellwig

For all the remaining patches:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCHv4 0/5] nvme: double reset prevention
  2019-10-11 16:27   ` Keith Busch
@ 2019-10-14  7:15     ` Christoph Hellwig
  2019-10-14 15:06       ` Keith Busch
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2019-10-14  7:15 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, James Smart, linux-nvme, Nadolski, Edmund,
	Judy Brock, Christoph Hellwig

On Fri, Oct 11, 2019 at 10:27:14AM -0600, Keith Busch wrote:
> Christoph, any remaining concerns on this? If no, do you prefer I push
> this to 5.5 or 5.4? This fixes reported bugs, but it ended up a bit
> larger and later than I originally hoped, so I'm okay either way.

We're still relatively in the 5.4 cycle, so I think we should aim
for that.


* Re: [PATCHv4 0/5] nvme: double reset prevention
  2019-10-14  7:15     ` Christoph Hellwig
@ 2019-10-14 15:06       ` Keith Busch
  0 siblings, 0 replies; 13+ messages in thread
From: Keith Busch @ 2019-10-14 15:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Judy Brock, Nadolski, Edmund, Sagi Grimberg, linux-nvme, James Smart

On Mon, Oct 14, 2019 at 09:15:36AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 11, 2019 at 10:27:14AM -0600, Keith Busch wrote:
> > Christoph, any remaining concerns on this? If no, do you prefer I push
> > this to 5.5 or 5.4? This fixes reported bugs, but it ended up a bit
> > larger and later than I originally hoped, so I'm okay either way.
> 
> We're still relatively in the 5.4 cycle, so I think we should aim
> for that.

Thanks, applied to nvme-5.4. Will send the PR mid-week.


* Re: [PATCHv4 5/5] nvme: Wait for reset state when required
  2019-10-10 16:57 ` [PATCHv4 5/5] nvme: Wait for reset state when required Keith Busch
@ 2019-10-16  0:56   ` Balbir Singh
  0 siblings, 0 replies; 13+ messages in thread
From: Balbir Singh @ 2019-10-16  0:56 UTC (permalink / raw)
  To: Keith Busch
  Cc: Sagi Grimberg, James Smart, linux-nvme, Edmund Nadolski,
	Judy Brock, Christoph Hellwig

On Fri, Oct 11, 2019 at 01:57:36AM +0900, Keith Busch wrote:
> Prevent simultaneous controller disabling/enabling tasks from interfering
> with each other by waiting until the task has successfully transitioned
> the controller to the RESETTING state. This ensures disabling the
> controller will not be interrupted by another reset path; otherwise a
> concurrent reset would leave the controller in the wrong state.
> 

Yes, I've run into this when nvme_core.io_timeout is smaller than the
time required to wait on reset (CSTS.RDY) and multiple resets occur one
after the other.

> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/nvme/host/core.c | 41 +++++++++++++++++++++++++++++++++--
>  drivers/nvme/host/nvme.h |  4 ++++
>  drivers/nvme/host/pci.c  | 46 +++++++++++++++++++++++++++-------------
>  3 files changed, 74 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 0f5a82b7d3a4..7442c99a5ed7 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -126,7 +126,7 @@ static void nvme_queue_scan(struct nvme_ctrl *ctrl)
>   * code paths that can't be interrupted by other reset attempts. A hot removal
>   * may prevent this from succeeding.
>   */
> -static int nvme_try_sched_reset(struct nvme_ctrl *ctrl)
> +int nvme_try_sched_reset(struct nvme_ctrl *ctrl)
>  {
>  	if (ctrl->state != NVME_CTRL_RESETTING)
>  		return -EBUSY;
> @@ -134,6 +134,7 @@ static int nvme_try_sched_reset(struct nvme_ctrl *ctrl)
>  		return -EBUSY;
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(nvme_try_sched_reset);
>  
>  int nvme_reset_ctrl(struct nvme_ctrl *ctrl)
>  {
> @@ -384,8 +385,10 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>  		break;
>  	}
>  
> -	if (changed)
> +	if (changed) {
>  		ctrl->state = new_state;
> +		wake_up_all(&ctrl->state_wq);
> +	}

Shouldn't we wake up only if the new_state is terminal or if the
new_state is NVME_CTRL_RESETTING?

>  
>  	spin_unlock_irqrestore(&ctrl->lock, flags);
>  	if (changed && ctrl->state == NVME_CTRL_LIVE)
> @@ -394,6 +397,39 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>  }
>  EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
>  
> +/*
> + * Returns true for sink states that can't ever transition back to live.
> + */
> +static bool nvme_state_terminal(struct nvme_ctrl *ctrl)
> +{
> +	switch (ctrl->state) {
> +	case NVME_CTRL_NEW:
> +	case NVME_CTRL_LIVE:
> +	case NVME_CTRL_RESETTING:
> +	case NVME_CTRL_CONNECTING:
> +		return false;
> +	case NVME_CTRL_DELETING:
> +	case NVME_CTRL_DEAD:
> +		return true;
> +	default:
> +		WARN_ONCE(1, "Unhandled ctrl state:%d", ctrl->state);
> +		return true;
> +	}
> +}
> +
> +/*
> + * Waits for the controller state to be resetting, or returns false if it is
> + * not possible to ever transition to that state.
> + */
> +bool nvme_wait_reset(struct nvme_ctrl *ctrl)
> +{
> +	wait_event(ctrl->state_wq,
> +		   nvme_change_ctrl_state(ctrl, NVME_CTRL_RESETTING) ||

Would we expect this nvme_change_ctrl_state to cause a wake_up_all()
right away?

> +		   nvme_state_terminal(ctrl));
> +	return ctrl->state == NVME_CTRL_RESETTING;
> +}
> +EXPORT_SYMBOL_GPL(nvme_wait_reset);
> +
>  static void nvme_free_ns_head(struct kref *ref)
>  {
>  	struct nvme_ns_head *head =
> @@ -3999,6 +4035,7 @@ int nvme_init_ctrl(struct nvme_ctrl *ctrl, struct device *dev,
>  	INIT_WORK(&ctrl->async_event_work, nvme_async_event_work);
>  	INIT_WORK(&ctrl->fw_act_work, nvme_fw_act_work);
>  	INIT_WORK(&ctrl->delete_work, nvme_delete_ctrl_work);
> +	init_waitqueue_head(&ctrl->state_wq);
>  
>  	INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
>  	memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 2ba577271ada..22e8401352c2 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -15,6 +15,7 @@
>  #include <linux/sed-opal.h>
>  #include <linux/fault-inject.h>
>  #include <linux/rcupdate.h>
> +#include <linux/wait.h>
>  
>  #include <trace/events/block.h>
>  
> @@ -198,6 +199,7 @@ struct nvme_ctrl {
>  	struct cdev cdev;
>  	struct work_struct reset_work;
>  	struct work_struct delete_work;
> +	wait_queue_head_t state_wq;
>  
>  	struct nvme_subsystem *subsys;
>  	struct list_head subsys_entry;
> @@ -448,6 +450,7 @@ void nvme_complete_rq(struct request *req);
>  bool nvme_cancel_request(struct request *req, void *data, bool reserved);
>  bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>  		enum nvme_ctrl_state new_state);
> +bool nvme_wait_reset(struct nvme_ctrl *ctrl);
>  int nvme_disable_ctrl(struct nvme_ctrl *ctrl);
>  int nvme_enable_ctrl(struct nvme_ctrl *ctrl);
>  int nvme_shutdown_ctrl(struct nvme_ctrl *ctrl);
> @@ -498,6 +501,7 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
>  void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
>  int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
>  int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl);
> +int nvme_try_sched_reset(struct nvme_ctrl *ctrl);
>  int nvme_delete_ctrl(struct nvme_ctrl *ctrl);
>  
>  int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp,
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 7d0de87d733d..4fcfd01f350d 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2463,6 +2463,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
>  	mutex_unlock(&dev->shutdown_lock);
>  }
>  
> +static int nvme_disable_prepare_reset(struct nvme_dev *dev, bool shutdown)
> +{

This sounds like we should disable the prepare reset, but I think it
really means: in nvme, disable and prepare for reset. Should we call it
nvme_disable_and_prepare_reset()?

I've not checked the state machine by hand, but

Acked-by: Balbir Singh <bsingharora@gmail.com>


Thread overview: 13+ messages
2019-10-10 16:57 [PATCHv4 0/5] nvme: double reset prevention Keith Busch
2019-10-10 16:57 ` [PATCHv4 1/5] nvme-pci: Free tagset if no IO queues Keith Busch
2019-10-10 16:57 ` [PATCHv4 2/5] nvme: Remove ADMIN_ONLY state Keith Busch
2019-10-10 16:57 ` [PATCHv4 3/5] nvme: Restart request timers in resetting state Keith Busch
2019-10-10 16:57 ` [PATCHv4 4/5] nvme: Prevent resets during paused controller state Keith Busch
2019-10-10 16:57 ` [PATCHv4 5/5] nvme: Wait for reset state when required Keith Busch
2019-10-16  0:56   ` Balbir Singh
2019-10-10 17:09 ` [PATCHv4 0/5] nvme: double reset prevention Nadolski, Edmund
2019-10-11 15:58 ` Nadolski, Edmund
2019-10-11 16:27   ` Keith Busch
2019-10-14  7:15     ` Christoph Hellwig
2019-10-14 15:06       ` Keith Busch
2019-10-14  7:14 ` Christoph Hellwig
