* [PATCH 0/4v3] nvme-fc: track state change failures
@ 2019-05-20  6:36 Hannes Reinecke
  2019-05-20  6:36 ` [PATCH 1/4] nvme: separate out nvme_ctrl_state_name() Hannes Reinecke
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Hannes Reinecke @ 2019-05-20  6:36 UTC (permalink / raw)


Hi all,

This patchset improves logging in the nvme-fc driver when expected
state changes fail, and fixes up one case where a failed state change
might lead to a stuck controller.

Changes since v2:
- Tear down association on failure as suggested by James Smart
- Align nvme_fc_delete_association() with the exit path
Changes since v1:
- simplified accessor function as suggested by Minwoo Im
- Included suggestions from James Smart

Hannes Reinecke (4):
  nvme: separate out nvme_ctrl_state_name()
  nvme-fc: track state change failures during reconnect
  nvme-fc: fail reconnect if state change fails
  nvme-fc: align nvme_fc_delete_association() with exit path

 drivers/nvme/host/core.c | 37 ++++++++++++++++++++++---------------
 drivers/nvme/host/fc.c   | 38 +++++++++++++++++++++++++++++++-------
 drivers/nvme/host/nvme.h |  1 +
 3 files changed, 54 insertions(+), 22 deletions(-)

-- 
2.16.4


* [PATCH 1/4] nvme: separate out nvme_ctrl_state_name()
  2019-05-20  6:36 [PATCH 0/4v3] nvme-fc: track state change failures Hannes Reinecke
@ 2019-05-20  6:36 ` Hannes Reinecke
  2019-05-21  6:46   ` Christoph Hellwig
                     ` (2 more replies)
  2019-05-20  6:36 ` [PATCH 2/4] nvme-fc: track state change failures during reconnect Hannes Reinecke
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 20+ messages in thread
From: Hannes Reinecke @ 2019-05-20  6:36 UTC (permalink / raw)


Separate out nvme_ctrl_state_name() to return the controller state
as a string.
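
For illustration only (not part of this patch), a caller could then print
the textual state in its error paths along these lines; the dev_warn()
call and the message text are just an example:

	dev_warn(ctrl->device, "controller in state %s, aborting reset\n",
		 nvme_ctrl_state_name(ctrl));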

Signed-off-by: Hannes Reinecke <hare at suse.com>
Reviewed-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Minwoo Im <minwoo.im.dev at gmail.com>
---
 drivers/nvme/host/core.c | 37 ++++++++++++++++++++++---------------
 drivers/nvme/host/nvme.h |  1 +
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c2e4fa694f79..bd1bc7fcbcde 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -380,6 +380,25 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 }
 EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
 
+static const char *const nvme_ctrl_state_names[] = {
+	[NVME_CTRL_NEW]		= "new",
+	[NVME_CTRL_LIVE]	= "live",
+	[NVME_CTRL_ADMIN_ONLY]	= "only-admin",
+	[NVME_CTRL_RESETTING]	= "resetting",
+	[NVME_CTRL_CONNECTING]	= "connecting",
+	[NVME_CTRL_DELETING]	= "deleting",
+	[NVME_CTRL_DEAD]	= "dead",
+};
+
+const char *nvme_ctrl_state_name(struct nvme_ctrl *ctrl)
+{
+	if ((unsigned)ctrl->state < ARRAY_SIZE(nvme_ctrl_state_names) &&
+	    nvme_ctrl_state_names[ctrl->state])
+		return nvme_ctrl_state_names[ctrl->state];
+	return "unknown state";
+}
+EXPORT_SYMBOL_GPL(nvme_ctrl_state_name);
+
 static void nvme_free_ns_head(struct kref *ref)
 {
 	struct nvme_ns_head *head =
@@ -2989,21 +3008,9 @@ static ssize_t nvme_sysfs_show_state(struct device *dev,
 				     char *buf)
 {
 	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
-	static const char *const state_name[] = {
-		[NVME_CTRL_NEW]		= "new",
-		[NVME_CTRL_LIVE]	= "live",
-		[NVME_CTRL_ADMIN_ONLY]	= "only-admin",
-		[NVME_CTRL_RESETTING]	= "resetting",
-		[NVME_CTRL_CONNECTING]	= "connecting",
-		[NVME_CTRL_DELETING]	= "deleting",
-		[NVME_CTRL_DEAD]	= "dead",
-	};
-
-	if ((unsigned)ctrl->state < ARRAY_SIZE(state_name) &&
-	    state_name[ctrl->state])
-		return sprintf(buf, "%s\n", state_name[ctrl->state]);
-
-	return sprintf(buf, "unknown state\n");
+	const char *state_name = nvme_ctrl_state_name(ctrl);
+
+	return sprintf(buf, "%s\n", state_name);
 }
 
 static DEVICE_ATTR(state, S_IRUGO, nvme_sysfs_show_state, NULL);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 5ee75b5ff83f..b3b13e465dc6 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -419,6 +419,7 @@ void nvme_complete_rq(struct request *req);
 bool nvme_cancel_request(struct request *req, void *data, bool reserved);
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state);
+const char *nvme_ctrl_state_name(struct nvme_ctrl *ctrl);
 int nvme_disable_ctrl(struct nvme_ctrl *ctrl, u64 cap);
 int nvme_enable_ctrl(struct nvme_ctrl *ctrl, u64 cap);
 int nvme_shutdown_ctrl(struct nvme_ctrl *ctrl);
-- 
2.16.4


* [PATCH 2/4] nvme-fc: track state change failures during reconnect
  2019-05-20  6:36 [PATCH 0/4v3] nvme-fc: track state change failures Hannes Reinecke
  2019-05-20  6:36 ` [PATCH 1/4] nvme: separate out nvme_ctrl_state_name() Hannes Reinecke
@ 2019-05-20  6:36 ` Hannes Reinecke
  2019-05-21  6:47   ` Christoph Hellwig
  2019-05-24  6:51   ` Sagi Grimberg
  2019-05-20  6:36 ` [PATCH 3/4] nvme-fc: fail reconnect if state change fails Hannes Reinecke
  2019-05-20  6:36 ` [PATCH 4/4] nvme-fc: align nvme_fc_delete_association() with exit path Hannes Reinecke
  3 siblings, 2 replies; 20+ messages in thread
From: Hannes Reinecke @ 2019-05-20  6:36 UTC (permalink / raw)


The nvme-fc driver has several situations in which an expected
state transition fails, but doesn't print out any message when
this happens.
The patch adds logging for these situations.

Signed-off-by: Hannes Reinecke <hare at suse.com>
Reviewed-by: James Smart <james.smart at broadcom.com>
---
 drivers/nvme/host/fc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 0c9e036afd09..e5c81ba2b7a1 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2867,8 +2867,12 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
 	unsigned long recon_delay = ctrl->ctrl.opts->reconnect_delay * HZ;
 	bool recon = true;
 
-	if (ctrl->ctrl.state != NVME_CTRL_CONNECTING)
+	if (ctrl->ctrl.state != NVME_CTRL_CONNECTING) {
+		dev_info(ctrl->ctrl.device,
+			 "NVME-FC{%d}: couldn't reconnect in state %s\n",
+			 ctrl->cnum, nvme_ctrl_state_name(&ctrl->ctrl));
 		return;
+	}
 
 	if (portptr->port_state == FC_OBJSTATE_ONLINE)
 		dev_info(ctrl->ctrl.device,
@@ -2914,7 +2918,8 @@ __nvme_fc_terminate_io(struct nvme_fc_ctrl *ctrl)
 	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING))
 		dev_err(ctrl->ctrl.device,
 			"NVME-FC{%d}: error_recovery: Couldn't change state "
-			"to CONNECTING\n", ctrl->cnum);
+			"from %s to CONNECTING\n", ctrl->cnum,
+			nvme_ctrl_state_name(&ctrl->ctrl));
 }
 
 static void
-- 
2.16.4


* [PATCH 3/4] nvme-fc: fail reconnect if state change fails
  2019-05-20  6:36 [PATCH 0/4v3] nvme-fc: track state change failures Hannes Reinecke
  2019-05-20  6:36 ` [PATCH 1/4] nvme: separate out nvme_ctrl_state_name() Hannes Reinecke
  2019-05-20  6:36 ` [PATCH 2/4] nvme-fc: track state change failures during reconnect Hannes Reinecke
@ 2019-05-20  6:36 ` Hannes Reinecke
  2019-05-21 16:18   ` James Smart
  2019-05-22 17:43   ` Arun Easi
  2019-05-20  6:36 ` [PATCH 4/4] nvme-fc: align nvme_fc_delete_association() with exit path Hannes Reinecke
  3 siblings, 2 replies; 20+ messages in thread
From: Hannes Reinecke @ 2019-05-20  6:36 UTC (permalink / raw)


If the final state change to LIVE in nvme_fc_create_association()
fails, the controller is not operational as no I/O is possible.
So we should be returning an error here to reschedule reconnect.
Additionally it should only be called while in CONNECTING state, so
add a check for this, too.
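
For context, a simplified sketch of the connect work handler this feeds
into (the function name below is made up and details are omitted; the
real handler in fc.c is similar in spirit). The point is that a non-zero
return value is what lets the reconnect machinery retry instead of
leaving the controller stuck:

	static void example_connect_ctrl_work(struct work_struct *work)
	{
		struct nvme_fc_ctrl *ctrl =
			container_of(to_delayed_work(work), struct nvme_fc_ctrl,
				     connect_work);
		int ret;

		ret = nvme_fc_create_association(ctrl);
		if (ret)
			/* failed: let nvme_fc_reconnect_or_delete() retry or give up */
			nvme_fc_reconnect_or_delete(ctrl, ret);
	}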

Signed-off-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/fc.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index e5c81ba2b7a1..71b22139e78b 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2624,6 +2624,14 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 
 	++ctrl->ctrl.nr_reconnects;
 
+	if (ctrl->ctrl.state != NVME_CTRL_CONNECTING) {
+		dev_info(ctrl->ctrl.device,
+			 "NVME-FC{%d}: state %s cancelled new "
+			 "association attempt\n",
+			 ctrl->cnum, nvme_ctrl_state_name(&ctrl->ctrl));
+		return -ENODEV;
+	}
+
 	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
 		return -ENODEV;
 
@@ -2726,6 +2734,14 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	}
 
 	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
+	if (!changed && ctrl->ctrl.state != NVME_CTRL_DELETING) {
+		dev_err(ctrl->ctrl.device,
+			"NVME-FC{%d}: error_recovery: Couldn't change "
+			"state from %s to LIVE\n", ctrl->cnum,
+			nvme_ctrl_state_name(&ctrl->ctrl));
+		ret = -EAGAIN;
+		goto out_destroy_queues;
+	}
 
 	ctrl->ctrl.nr_reconnects = 0;
 
@@ -2734,6 +2750,9 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 
 	return 0;	/* Success */
 
+out_destroy_queues:
+	nvme_fc_delete_hw_io_queues(ctrl);
+	nvme_fc_free_io_queues(ctrl);
 out_term_aen_ops:
 	nvme_fc_term_aen_ops(ctrl);
 out_disconnect_admin_queue:
-- 
2.16.4


* [PATCH 4/4] nvme-fc: align nvme_fc_delete_association() with exit path
  2019-05-20  6:36 [PATCH 0/4v3] nvme-fc: track state change failures Hannes Reinecke
                   ` (2 preceding siblings ...)
  2019-05-20  6:36 ` [PATCH 3/4] nvme-fc: fail reconnect if state change fails Hannes Reinecke
@ 2019-05-20  6:36 ` Hannes Reinecke
  2019-05-21 16:25   ` James Smart
  3 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2019-05-20  6:36 UTC (permalink / raw)


nvme_fc_delete_association() should align with the exit path in
nvme_fc_create_association() to ensure we are able to handle the
failure case properly.

Signed-off-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/fc.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 71b22139e78b..ed9f4affbe10 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2836,6 +2836,11 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
 	ctrl->flags &= ~FCCTRL_TERMIO;
 	spin_unlock_irq(&ctrl->lock);
 
+	if (ctrl->ctrl.tagset) {
+		nvme_fc_delete_hw_io_queues(ctrl);
+		nvme_fc_free_io_queues(ctrl);
+	}
+
 	nvme_fc_term_aen_ops(ctrl);
 
 	/*
@@ -2847,11 +2852,6 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
 	if (ctrl->association_id)
 		nvme_fc_xmt_disconnect_assoc(ctrl);
 
-	if (ctrl->ctrl.tagset) {
-		nvme_fc_delete_hw_io_queues(ctrl);
-		nvme_fc_free_io_queues(ctrl);
-	}
-
 	__nvme_fc_delete_hw_queue(ctrl, &ctrl->queues[0], 0);
 	nvme_fc_free_queue(&ctrl->queues[0]);
 
-- 
2.16.4


* [PATCH 1/4] nvme: separate out nvme_ctrl_state_name()
  2019-05-20  6:36 ` [PATCH 1/4] nvme: separate out nvme_ctrl_state_name() Hannes Reinecke
@ 2019-05-21  6:46   ` Christoph Hellwig
  2019-05-24  6:46   ` Sagi Grimberg
  2019-05-24 20:34   ` Keith Busch
  2 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2019-05-21  6:46 UTC (permalink / raw)


> +const char *nvme_ctrl_state_name(struct nvme_ctrl *ctrl)
> +{
> +	if ((unsigned)ctrl->state < ARRAY_SIZE(nvme_ctrl_state_names) &&

Can we please drop this odd cast, even if it was there in the old code?

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <hch at lst.de>


* [PATCH 2/4] nvme-fc: track state change failures during reconnect
  2019-05-20  6:36 ` [PATCH 2/4] nvme-fc: track state change failures during reconnect Hannes Reinecke
@ 2019-05-21  6:47   ` Christoph Hellwig
  2019-05-24  6:51   ` Sagi Grimberg
  1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2019-05-21  6:47 UTC (permalink / raw)


On Mon, May 20, 2019 at 08:36:22AM +0200, Hannes Reinecke wrote:
> The nvme-fc driver has several situations in which an expected
> state transition fails, but doesn't print out any message when
> this happens.
> The patch adds logging for these situations.
> 
> Signed-off-by: Hannes Reinecke <hare at suse.com>
> Reviewed-by: James Smart <james.smart at broadcom.com>

Looks good,

Reviewed-by: Christoph Hellwig <hch at lst.de>


* [PATCH 3/4] nvme-fc: fail reconnect if state change fails
  2019-05-20  6:36 ` [PATCH 3/4] nvme-fc: fail reconnect if state change fails Hannes Reinecke
@ 2019-05-21 16:18   ` James Smart
  2019-05-22 17:43   ` Arun Easi
  1 sibling, 0 replies; 20+ messages in thread
From: James Smart @ 2019-05-21 16:18 UTC (permalink / raw)



On 5/19/2019 11:36 PM, Hannes Reinecke wrote:
> If the final state change to LIVE in nvme_fc_create_association()
> fails, the controller is not operational as no I/O is possible.
> So we should be returning an error here to reschedule reconnect.
> Additionally it should only be called while in CONNECTING state, so
> add a check for this, too.
>
> Signed-off-by: Hannes Reinecke <hare at suse.com>
> ---

Reviewed-by: James Smart <james.smart at broadcom.com>


* [PATCH 4/4] nvme-fc: align nvme_fc_delete_association() with exit path
  2019-05-20  6:36 ` [PATCH 4/4] nvme-fc: align nvme_fc_delete_association() with exit path Hannes Reinecke
@ 2019-05-21 16:25   ` James Smart
  2019-05-22 14:00     ` Hannes Reinecke
  0 siblings, 1 reply; 20+ messages in thread
From: James Smart @ 2019-05-21 16:25 UTC (permalink / raw)




On 5/19/2019 11:36 PM, Hannes Reinecke wrote:
> nvme_fc_delete_association() should align with the exit path in
> nvme_fc_create_association() to ensure we are able to handle the
> failure case properly.
>
> Signed-off-by: Hannes Reinecke <hare at suse.com>
> ---
>   drivers/nvme/host/fc.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 71b22139e78b..ed9f4affbe10 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2836,6 +2836,11 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
>   	ctrl->flags &= ~FCCTRL_TERMIO;
>   	spin_unlock_irq(&ctrl->lock);
>   
> +	if (ctrl->ctrl.tagset) {
> +		nvme_fc_delete_hw_io_queues(ctrl);
> +		nvme_fc_free_io_queues(ctrl);
> +	}
> +
>   	nvme_fc_term_aen_ops(ctrl);
>   
>   	/*
> @@ -2847,11 +2852,6 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
>   	if (ctrl->association_id)
>   		nvme_fc_xmt_disconnect_assoc(ctrl);
>   
> -	if (ctrl->ctrl.tagset) {
> -		nvme_fc_delete_hw_io_queues(ctrl);
> -		nvme_fc_free_io_queues(ctrl);
> -	}
> -
>   	__nvme_fc_delete_hw_queue(ctrl, &ctrl->queues[0], 0);
>   	nvme_fc_free_queue(&ctrl->queues[0]);
>   

No. This is removing the underlying queues while blk-mq is still trying 
to submit to them - causing yet a different set of issues as the driver 
will have released contexts but the calldowns are still happening. Yet 
another different set of issues would likely appear. No need for this 
reorg.

-- james


* [PATCH 4/4] nvme-fc: align nvme_fc_delete_association() with exit path
  2019-05-21 16:25   ` James Smart
@ 2019-05-22 14:00     ` Hannes Reinecke
  2019-05-23 16:01       ` James Smart
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2019-05-22 14:00 UTC (permalink / raw)


On 5/21/19 6:25 PM, James Smart wrote:
> 
> 
> On 5/19/2019 11:36 PM, Hannes Reinecke wrote:
>> nvme_fc_delete_association() should align with the exit path in
>> nvme_fc_create_association() to ensure we are able to handle the
>> failure case properly.
>>
>> Signed-off-by: Hannes Reinecke <hare at suse.com>
>> ---
>>  drivers/nvme/host/fc.c | 10 +++++-----
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
>> index 71b22139e78b..ed9f4affbe10 100644
>> --- a/drivers/nvme/host/fc.c
>> +++ b/drivers/nvme/host/fc.c
>> @@ -2836,6 +2836,11 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
>>  	ctrl->flags &= ~FCCTRL_TERMIO;
>>  	spin_unlock_irq(&ctrl->lock);
>> +	if (ctrl->ctrl.tagset) {
>> +		nvme_fc_delete_hw_io_queues(ctrl);
>> +		nvme_fc_free_io_queues(ctrl);
>> +	}
>> +
>>  	nvme_fc_term_aen_ops(ctrl);
>>  	/*
>> @@ -2847,11 +2852,6 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
>>  	if (ctrl->association_id)
>>  		nvme_fc_xmt_disconnect_assoc(ctrl);
>> -	if (ctrl->ctrl.tagset) {
>> -		nvme_fc_delete_hw_io_queues(ctrl);
>> -		nvme_fc_free_io_queues(ctrl);
>> -	}
>> -
>>  	__nvme_fc_delete_hw_queue(ctrl, &ctrl->queues[0], 0);
>>  	nvme_fc_free_queue(&ctrl->queues[0]);
> 
> No. This is removing the underlying queues while blk-mq is still trying 
> to submit to them - causing yet a different set of issues as the driver 
> will have released contexts but the calldowns are still happening. Yet 
> another different set of issues would likely appear. No need for this 
> reorg.
> 
Ah. Hmm. Well, I thought by having both paths identical we can ensure 
that the error path in the first would actually work.
Now it's quite hard to test, let alone validate.

But if you have objections we can surely drop this patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare at suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


* [PATCH 3/4] nvme-fc: fail reconnect if state change fails
  2019-05-20  6:36 ` [PATCH 3/4] nvme-fc: fail reconnect if state change fails Hannes Reinecke
  2019-05-21 16:18   ` James Smart
@ 2019-05-22 17:43   ` Arun Easi
  2019-05-23  5:33     ` Hannes Reinecke
  1 sibling, 1 reply; 20+ messages in thread
From: Arun Easi @ 2019-05-22 17:43 UTC (permalink / raw)


On Sun, 19 May 2019, 11:36pm, Hannes Reinecke wrote:

> If the final state change to LIVE in nvme_fc_create_association()
> fails, the controller is not operational as no I/O is possible.
> So we should be returning an error here to reschedule reconnect.
> Additionally it should only be called while in CONNECTING state, so
> add a check for this, too.
> 
> Signed-off-by: Hannes Reinecke <hare at suse.com>
> ---
>  drivers/nvme/host/fc.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index e5c81ba2b7a1..71b22139e78b 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2624,6 +2624,14 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>  
>  	++ctrl->ctrl.nr_reconnects;
>  
> +	if (ctrl->ctrl.state != NVME_CTRL_CONNECTING) {

Question: There is a path in nvme_fc_resume_controller() that schedules 
connect_work when controller state is NVME_CTRL_NEW. I am not quite sure 
when this is exercised; but if it does, it conflicts with this 
expectation.

Otherwise changes look good.

Regards,
-Arun

> +		dev_info(ctrl->ctrl.device,
> +			 "NVME-FC{%d}: state %s cancelled new "
> +			 "association attempt\n",
> +			 ctrl->cnum, nvme_ctrl_state_name(&ctrl->ctrl));
> +		return -ENODEV;
> +	}
> +
>  	if (ctrl->rport->remoteport.port_state != FC_OBJSTATE_ONLINE)
>  		return -ENODEV;
>  
> @@ -2726,6 +2734,14 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>  	}
>  
>  	changed = nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE);
> +	if (!changed && ctrl->ctrl.state != NVME_CTRL_DELETING) {
> +		dev_err(ctrl->ctrl.device,
> +			"NVME-FC{%d}: error_recovery: Couldn't change "
> +			"state from %s to LIVE\n", ctrl->cnum,
> +			nvme_ctrl_state_name(&ctrl->ctrl));
> +		ret = -EAGAIN;
> +		goto out_destroy_queues;
> +	}
>  
>  	ctrl->ctrl.nr_reconnects = 0;
>  
> @@ -2734,6 +2750,9 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>  
>  	return 0;	/* Success */
>  
> +out_destroy_queues:
> +	nvme_fc_delete_hw_io_queues(ctrl);
> +	nvme_fc_free_io_queues(ctrl);
>  out_term_aen_ops:
>  	nvme_fc_term_aen_ops(ctrl);
>  out_disconnect_admin_queue:
> 


* [PATCH 3/4] nvme-fc: fail reconnect if state change fails
  2019-05-22 17:43   ` Arun Easi
@ 2019-05-23  5:33     ` Hannes Reinecke
  2019-05-23 15:46       ` James Smart
  0 siblings, 1 reply; 20+ messages in thread
From: Hannes Reinecke @ 2019-05-23  5:33 UTC (permalink / raw)


On 5/22/19 7:43 PM, Arun Easi wrote:
> On Sun, 19 May 2019, 11:36pm, Hannes Reinecke wrote:
> 
>> If the final state change to LIVE in nvme_fc_create_association()
>> fails, the controller is not operational as no I/O is possible.
>> So we should be returning an error here to reschedule reconnect.
>> Additionally it should only be called while in CONNECTING state, so
>> add a check for this, too.
>>
>> Signed-off-by: Hannes Reinecke <hare at suse.com>
>> ---
>>  drivers/nvme/host/fc.c | 19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
>> index e5c81ba2b7a1..71b22139e78b 100644
>> --- a/drivers/nvme/host/fc.c
>> +++ b/drivers/nvme/host/fc.c
>> @@ -2624,6 +2624,14 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
>>  
>>  	++ctrl->ctrl.nr_reconnects;
>>  
>> +	if (ctrl->ctrl.state != NVME_CTRL_CONNECTING) {
> 
> Question: There is a path in nvme_fc_resume_controller() that schedules 
> connect_work when controller state is NVME_CTRL_NEW. I am not quite sure 
> when this is exercised; but if it does, it conflicts with this 
> expectation.
> 
To my understanding, the transition from NEW to LIVE is only valid for
PCI; NVMe-oF implementations should always move to CONNECTING first, i.e.
they should have a transition NEW -> CONNECTING -> LIVE.
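
In pseudo-code, the rule being discussed boils down to something like
this (the helper name is invented; the real checks live in
nvme_change_ctrl_state() and cover more states):

	static bool example_can_enter_live(enum nvme_ctrl_state old_state)
	{
		switch (old_state) {
		case NVME_CTRL_NEW:		/* PCI: NEW -> LIVE directly */
		case NVME_CTRL_CONNECTING:	/* fabrics: NEW -> CONNECTING -> LIVE */
			return true;
		default:
			return false;
		}
	}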

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare at suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* [PATCH 3/4] nvme-fc: fail reconnect if state change fails
  2019-05-23  5:33     ` Hannes Reinecke
@ 2019-05-23 15:46       ` James Smart
  2019-05-23 15:52         ` [EXT] " Arun Easi
  0 siblings, 1 reply; 20+ messages in thread
From: James Smart @ 2019-05-23 15:46 UTC (permalink / raw)


On 5/22/2019 10:33 PM, Hannes Reinecke wrote:
> On 5/22/19 7:43 PM, Arun Easi wrote:
>>
>> Question: There is a path in nvme_fc_resume_controller() that schedules
>> connect_work when controller state is NVME_CTRL_NEW. I am not quite sure
>> when this is exercised; but if it does, it conflicts with this
>> expectation.
>>
> To my understanding, the transition from NEW to LIVE is only valid for
> PCI; NVMe-oF implementations should always move to CONNECTING first, ie
> they should have a transition NEW -> CONNECTING -> LIVE.
>
> Cheers,
>
> Hannes

I confirm what Hannes says - when the fc transport moved to an 
asynchronous connect using the reconnect path, as part of controller 
initial allocation, it moves from NEW to (temporarily) RESETTING to 
CONNECTING then schedules the reconnect which should go from there to 
LIVE in the normal path. That NEW scheduling of connect_work, although 
still present, no longer occurs. It was from the pre-async connect.

-- james


* [EXT] Re: [PATCH 3/4] nvme-fc: fail reconnect if state change fails
  2019-05-23 15:46       ` James Smart
@ 2019-05-23 15:52         ` Arun Easi
  0 siblings, 0 replies; 20+ messages in thread
From: Arun Easi @ 2019-05-23 15:52 UTC (permalink / raw)


On Thu, 23 May 2019, 8:46am, James Smart wrote:

> On 5/22/2019 10:33 PM, Hannes Reinecke wrote:
> > On 5/22/19 7:43 PM, Arun Easi wrote:
> > > 
> > > Question: There is a path in nvme_fc_resume_controller() that schedules
> > > connect_work when controller state is NVME_CTRL_NEW. I am not quite sure
> > > when this is exercised; but if it does, it conflicts with this
> > > expectation.
> > > 
> > To my understanding, the transition from NEW to LIVE is only valid for
> > PCI; NVMe-oF implementations should always move to CONNECTING first, ie
> > they should have a transition NEW -> CONNECTING -> LIVE.
> > 
> > Cheers,
> > 
> > Hannes
> 
> I confirm what Hannes says - when the fc transport moved to an asynchronous
> connect using the reconnect path, as part of controller initial allocation, it
> moves from NEW to (temporarily) RESETTING to CONNECTING then schedules the
> reconnect which should go from there to LIVE in the normal path.? That NEW
> scheduling connect_work, although still present no long occurs. It was from
> the pre-async connect.
> 

Thanks for the history and clarification. Perhaps that could be added in 
the next cleanup round.

--Arun


* [PATCH 4/4] nvme-fc: align nvme_fc_delete_association() with exit path
  2019-05-22 14:00     ` Hannes Reinecke
@ 2019-05-23 16:01       ` James Smart
  0 siblings, 0 replies; 20+ messages in thread
From: James Smart @ 2019-05-23 16:01 UTC (permalink / raw)




On 5/22/2019 7:00 AM, Hannes Reinecke wrote:
> On 5/21/19 6:25 PM, James Smart wrote:
>>
>> No. This is removing the underlying queues while blk-mq is still 
>> trying to submit to them - causing yet a different set of issues as 
>> the driver will have released contexts but the calldowns are still 
>> happening. Yet another different set of issues would likely appear. 
>> No need for this reorg.
>>
> Ah. Hmm. Well, I thought by having both paths identical we can ensure 
> that the error path in the first would actually work.
> Now it's quite hard to test, let alone validate.
>
> But if you have objections we can surely drop this patch.

I do believe the failure case that you're adding is different - we 
haven't called start controller, so although the io queues and their 
lldd hw queues are live, we haven't done ns scanning and release of 
their queues to push things to the controller - so there won't be live 
ios in the lld that could be referencing the hw queues, nor any path 
that allows pushing of ios to anything other than the admin queue. The 
nvmf_ready checks should filter out normal i/o, especially as our state 
is not yet LIVE. So that's the main difference with the generic change 
in delete association.

Well, if we really wanted to test it right - I would either: a) avoid 
the partial teardowns in the error path for this last state change check 
and just call the delete association routine; or b) do that, but make 
the error path always call delete association and have delete association 
know how to tear down the partially built up association (vs. expecting 
a fully built up one, as it does now).

But that seems to be adding quite a bit of work for very little return. 
To be honest, the existing code, which doesn't fail out if LIVE can't be 
transitioned to, isn't bad. There is no re-connect scheduled as it 
completes successfully, so the association is fully built up. If the 
state change failed, the only state that could have been entered from 
CONNECTING and then fail the move to LIVE is DELETING, and it's valid 
to finish out successfully and let the delete path do its thing.

My preference is to leave the LIVE transition handling the way it is. 
But I'm ok if you want to add the LIVE-transition patch you proposed; 
this patch to delete association isn't needed.

-- james


* [PATCH 1/4] nvme: separate out nvme_ctrl_state_name()
  2019-05-20  6:36 ` [PATCH 1/4] nvme: separate out nvme_ctrl_state_name() Hannes Reinecke
  2019-05-21  6:46   ` Christoph Hellwig
@ 2019-05-24  6:46   ` Sagi Grimberg
  2019-05-24  7:33     ` Hannes Reinecke
  2019-05-24 20:34   ` Keith Busch
  2 siblings, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2019-05-24  6:46 UTC (permalink / raw)



> +const char *nvme_ctrl_state_name(struct nvme_ctrl *ctrl)
> +{
> +	if ((unsigned)ctrl->state < ARRAY_SIZE(nvme_ctrl_state_names) &&
> +	    nvme_ctrl_state_names[ctrl->state])
> +		return nvme_ctrl_state_names[ctrl->state];
> +	return "unknown state";

Maybe just "unknown"?

Otherwise looks fine..


* [PATCH 2/4] nvme-fc: track state change failures during reconnect
  2019-05-20  6:36 ` [PATCH 2/4] nvme-fc: track state change failures during reconnect Hannes Reinecke
  2019-05-21  6:47   ` Christoph Hellwig
@ 2019-05-24  6:51   ` Sagi Grimberg
  2019-05-24  7:35     ` Hannes Reinecke
  1 sibling, 1 reply; 20+ messages in thread
From: Sagi Grimberg @ 2019-05-24  6:51 UTC (permalink / raw)



> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> index 0c9e036afd09..e5c81ba2b7a1 100644
> --- a/drivers/nvme/host/fc.c
> +++ b/drivers/nvme/host/fc.c
> @@ -2867,8 +2867,12 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
>   	unsigned long recon_delay = ctrl->ctrl.opts->reconnect_delay * HZ;
>   	bool recon = true;
>   
> -	if (ctrl->ctrl.state != NVME_CTRL_CONNECTING)
> +	if (ctrl->ctrl.state != NVME_CTRL_CONNECTING) {
> +		dev_info(ctrl->ctrl.device,
> +			 "NVME-FC{%d}: couldn't reconnect in state %s\n",
> +			 ctrl->cnum, nvme_ctrl_state_name(&ctrl->ctrl));

Is this always "couldn't"? Is this expected?

>   		return;
> +	}
>   
>   	if (portptr->port_state == FC_OBJSTATE_ONLINE)
>   		dev_info(ctrl->ctrl.device,
> @@ -2914,7 +2918,8 @@ __nvme_fc_terminate_io(struct nvme_fc_ctrl *ctrl)
>   	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING))
>   		dev_err(ctrl->ctrl.device,
>   			"NVME-FC{%d}: error_recovery: Couldn't change state "
> -			"to CONNECTING\n", ctrl->cnum);
> +			"from %s to CONNECTING\n", ctrl->cnum,
> +			nvme_ctrl_state_name(&ctrl->ctrl));

Doesn't this present an unnecessary combination of a lower-case state
and all-caps CONNECTING?


* [PATCH 1/4] nvme: separate out nvme_ctrl_state_name()
  2019-05-24  6:46   ` Sagi Grimberg
@ 2019-05-24  7:33     ` Hannes Reinecke
  0 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2019-05-24  7:33 UTC (permalink / raw)


On 5/24/19 8:46 AM, Sagi Grimberg wrote:
> 
>> +const char *nvme_ctrl_state_name(struct nvme_ctrl *ctrl)
>> +{
>> +	if ((unsigned)ctrl->state < ARRAY_SIZE(nvme_ctrl_state_names) &&
>> +	    nvme_ctrl_state_names[ctrl->state])
>> +		return nvme_ctrl_state_names[ctrl->state];
>> +	return "unknown state";
> 
> Maybe just "unknown"?
> 
> Otherwise looks fine..

Well, the original code had 'unknown state'.
So I thought to keep this behaviour.
But yeah, I like the 'unknown' better, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare at suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


* [PATCH 2/4] nvme-fc: track state change failures during reconnect
  2019-05-24  6:51   ` Sagi Grimberg
@ 2019-05-24  7:35     ` Hannes Reinecke
  0 siblings, 0 replies; 20+ messages in thread
From: Hannes Reinecke @ 2019-05-24  7:35 UTC (permalink / raw)


On 5/24/19 8:51 AM, Sagi Grimberg wrote:
> 
>> diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
>> index 0c9e036afd09..e5c81ba2b7a1 100644
>> --- a/drivers/nvme/host/fc.c
>> +++ b/drivers/nvme/host/fc.c
>> @@ -2867,8 +2867,12 @@ nvme_fc_reconnect_or_delete(struct nvme_fc_ctrl *ctrl, int status)
>>  	unsigned long recon_delay = ctrl->ctrl.opts->reconnect_delay * HZ;
>>  	bool recon = true;
>> -	if (ctrl->ctrl.state != NVME_CTRL_CONNECTING)
>> +	if (ctrl->ctrl.state != NVME_CTRL_CONNECTING) {
>> +		dev_info(ctrl->ctrl.device,
>> +			 "NVME-FC{%d}: couldn't reconnect in state %s\n",
>> +			 ctrl->cnum, nvme_ctrl_state_name(&ctrl->ctrl));
> 
> Is this always "couldn't"? Is this expected?
> 
Surely not expected; this is pretty much the error path.

>>  		return;
>> +	}
>>  	if (portptr->port_state == FC_OBJSTATE_ONLINE)
>>  		dev_info(ctrl->ctrl.device,
>> @@ -2914,7 +2918,8 @@ __nvme_fc_terminate_io(struct nvme_fc_ctrl *ctrl)
>>  	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING))
>>  		dev_err(ctrl->ctrl.device,
>>  			"NVME-FC{%d}: error_recovery: Couldn't change state "
>> -			"to CONNECTING\n", ctrl->cnum);
>> +			"from %s to CONNECTING\n", ctrl->cnum,
>> +			nvme_ctrl_state_name(&ctrl->ctrl));
> 
> Doesn't this present an unnecessary combination of a lower-case state
> and all-caps CONNECTING?
Hmm. Indeed. I'll be using the same notation for the next version.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare at suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)


* [PATCH 1/4] nvme: separate out nvme_ctrl_state_name()
  2019-05-20  6:36 ` [PATCH 1/4] nvme: separate out nvme_ctrl_state_name() Hannes Reinecke
  2019-05-21  6:46   ` Christoph Hellwig
  2019-05-24  6:46   ` Sagi Grimberg
@ 2019-05-24 20:34   ` Keith Busch
  2 siblings, 0 replies; 20+ messages in thread
From: Keith Busch @ 2019-05-24 20:34 UTC (permalink / raw)


On Sun, May 19, 2019 at 11:36:21PM -0700, Hannes Reinecke wrote:
> Separate out nvme_ctrl_state_name() to return the controller state
> as a string.
> 
> Signed-off-by: Hannes Reinecke <hare at suse.com>
> Reviewed-by: James Smart <james.smart at broadcom.com>
> Reviewed-by: Minwoo Im <minwoo.im.dev at gmail.com>

LGTM

Reviewed-by: Keith Busch <kbusch at kernel.org>


Thread overview: 20 messages
2019-05-20  6:36 [PATCH 0/4v3] nvme-fc: track state change failures Hannes Reinecke
2019-05-20  6:36 ` [PATCH 1/4] nvme: separate out nvme_ctrl_state_name() Hannes Reinecke
2019-05-21  6:46   ` Christoph Hellwig
2019-05-24  6:46   ` Sagi Grimberg
2019-05-24  7:33     ` Hannes Reinecke
2019-05-24 20:34   ` Keith Busch
2019-05-20  6:36 ` [PATCH 2/4] nvme-fc: track state change failures during reconnect Hannes Reinecke
2019-05-21  6:47   ` Christoph Hellwig
2019-05-24  6:51   ` Sagi Grimberg
2019-05-24  7:35     ` Hannes Reinecke
2019-05-20  6:36 ` [PATCH 3/4] nvme-fc: fail reconnect if state change fails Hannes Reinecke
2019-05-21 16:18   ` James Smart
2019-05-22 17:43   ` Arun Easi
2019-05-23  5:33     ` Hannes Reinecke
2019-05-23 15:46       ` James Smart
2019-05-23 15:52         ` [EXT] " Arun Easi
2019-05-20  6:36 ` [PATCH 4/4] nvme-fc: align nvme_fc_delete_association() with exit path Hannes Reinecke
2019-05-21 16:25   ` James Smart
2019-05-22 14:00     ` Hannes Reinecke
2019-05-23 16:01       ` James Smart
