* [PATCH v2 0/7] nvme_fc: asynchronous controller create and simple discovery
From: James Smart @ 2018-05-12  0:50 UTC


This patch set modifies the fc transport so that create_ctrl performs
only the OS-side controller creation and no longer connects to the
device on the wire inline with the create_ctrl call. Instead,
create_ctrl immediately schedules the background reconnect thread to
perform the initial association connect. The patch set also contains
several other cleanups found while testing the asynchronous connect.

There are two main reasons for making controller create asynchronous:
1) It simplifies error handling and retries. The initial controller
   connect attempts can be disrupted by errors just as easily as
   connects attempted after the controller is initially created. As
   the code currently stands, handling errors during the initial
   connect would require special retry logic and prohibitions around
   state changes, which the code today does not have. With this
   patch set, the initial connection uses the same path as a
   reconnect, and any error handling uses the same paths as if the
   errors occurred after the initial connect.
2) As the create_ctrl() call is now very fast, simplistic udev rules
   (examples are included in patch 5) can be used for auto-connect
   rather than involving systemd to work around long initial connect
   times, especially if errors occur during the initial connect.

However, several hurdles in the common infrastructure had to be
addressed to make this work. The initial controller creation used to
expect the controller to be fully connected and live on the wire
before returning to the cli, which gave plenty of time for the udev
event to be generated and serviced to create the corresponding /dev
file. The cli must now be prepared to find that the /dev file does
not yet exist when add_ctrl returns. There is also a check in the
fabrics layer that validates that the controller subnqn matches what
the connect request asked for. With this patch set, FC returns before
the initial connect is complete, so the controller field being
checked is not yet set. There is no reason it wouldn't match, as the
fabric connect requests are built from the same options, so this
check of the nqn should be removed. Additionally, operations such as
connect-all may occur while there is a connectivity disturbance with
the discovery controller, in which case the discovery log read may
fail. Rather than giving up and not connecting, the cli needs to
retry the discovery log reads.

Therefore, this patch set is dependent on the following modifications
that have been made to the cli:
  nvme-cli: Wait for device file if not present after successful add_ctrl
     github repo commit id: bb2d87d7f386
  nvme-cli: Add ioctl retry support for "connect-all"
     github repo commit id: d8f949c21e9d
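
For illustration, a conceptual sketch of the first cli change (this
is not the actual nvme-cli code; the poll interval and retry count
are assumptions):

   #include <errno.h>
   #include <fcntl.h>
   #include <unistd.h>

   /* Sketch: poll for the /dev node, as add_ctrl now returns before
    * the udev event that creates the device file has been serviced.
    */
   static int wait_for_device(const char *devpath) /* e.g. "/dev/nvme0" */
   {
           int i, fd = -1;

           for (i = 0; i < 20; i++) {      /* retry count assumed */
                   fd = open(devpath, O_RDONLY);
                   if (fd >= 0 || errno != ENOENT)
                           break;
                   usleep(100000);         /* 100ms between retries */
           }
           return fd;
   }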


As the patch set now allows simplistic udev scripting, it also
changes the transport to regenerate udev discovery events in case the
system missed the earlier events (such as in boot scenarios).

v2:
  added Reviewed-by's
  Reworked the 2nd patch to change nvme_set_queue_count() behavior
    rather than adding additional validation in the fc transport.


James Smart (7):
  nvme: remove unnecessary controller subnqn validation
  nvme: revise nvme_set_queue_count to return error on some nvme status
    codes
  nvme_fc: remove setting DNR on exception conditions
  nvme_fc: remove reinit_request routine
  nvme_fc: change controllers first connect to use reconnect path
  nvme_fc: fix nulling of queue data on reconnect
  nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device

 drivers/nvme/host/core.c    |   3 +-
 drivers/nvme/host/fabrics.c |  10 --
 drivers/nvme/host/fc.c      | 257 ++++++++++++++++++++++++++------------------
 3 files changed, 157 insertions(+), 113 deletions(-)

-- 
2.13.1

* [PATCH v2 1/7] nvme: remove unnecessary controller subnqn validation
From: James Smart @ 2018-05-12  0:50 UTC


After creating the nvme controller, nvmf_create_ctrl() validates the
newly created controller's subsysnqn against the one specified by the
options.

In general, this is an unnecessary check as the Connect message
should implicitly ensure this value matches.
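
For context, the Connect command's data is built directly from those
same options and carries the requested NQN on the wire (per the
fabrics definitions in the kernel's include/linux/nvme.h):

   struct nvmf_connect_data {
           uuid_t  hostid;
           __le16  cntlid;
           char    resv4[238];
           char    subsysnqn[NVMF_NQN_FIELD_LEN];  /* from opts->subsysnqn */
           char    hostnqn[NVMF_NQN_FIELD_LEN];
           char    resv5[256];
   };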

With the change to the FC transport to do an asynchronous connect
for the first association create, the transport will return to
nvmf_create_ctrl() before that first association has been established,
thus the subnqn will not yet be set.

Remove the unnecessary validation.

Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/fabrics.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 7ae732a77fe8..6fdf499d16b4 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -983,16 +983,6 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
 		goto out_module_put;
 	}
 
-	if (strcmp(ctrl->subsys->subnqn, opts->subsysnqn)) {
-		dev_warn(ctrl->device,
-			"controller returned incorrect NQN: \"%s\".\n",
-			ctrl->subsys->subnqn);
-		module_put(ops->module);
-		up_read(&nvmf_transports_rwsem);
-		nvme_delete_ctrl_sync(ctrl);
-		return ERR_PTR(-EINVAL);
-	}
-
 	module_put(ops->module);
 	up_read(&nvmf_transports_rwsem);
 	return ctrl;
-- 
2.13.1

* [PATCH v2 2/7] nvme: revise nvme_set_queue_count to return error on some nvme status codes
From: James Smart @ 2018-05-12  0:50 UTC


If the SET_FEATURES command fails with only an nvme completion
status, the current code ignores the failure and lets the controller
continue to initialize, but with a degraded (zero) io queue count.

In cases where the failure may have been due to the transport
detecting an error condition or a loss of connectivity, the nvme
status was manufactured by the transport and is not a valid
representation of the controller's state. It is better to fail the
command so that a subsequent (re)connect can resolve the state.

The status codes that transports may generate are NVME_SC_ABORT_REQ
and NVME_SC_INTERNAL. In those cases, return a positive error value
with the value being the status code.
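
A hedged sketch of the effect on a transport caller (illustrative
only; the variable names and error mapping are assumptions):

   ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
   if (ret < 0)
           return ret;     /* submission/transport failure */
   if (ret > 0)
           return -EIO;    /* manufactured nvme status (e.g.
                            * NVME_SC_ABORT_REQ): fail the connect so a
                            * subsequent (re)connect resolves the state */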

Signed-off-by: James Smart <james.smart at broadcom.com>

---
v2:
  replaced the logic in fc.c that checked for an io queue count of
  zero and recognized failure, with a change to nvme_set_queue_count()
  to return an error on the transport error completion values

 drivers/nvme/host/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f3779f350769..7a39ce8d9d5e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1007,7 +1007,8 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
 
 	status = nvme_set_features(ctrl, NVME_FEAT_NUM_QUEUES, q_count, NULL, 0,
 			&result);
-	if (status < 0)
+	if (status < 0 ||
+	    status == NVME_SC_ABORT_REQ || status == NVME_SC_INTERNAL)
 		return status;
 
 	/*
-- 
2.13.1

* [PATCH v2 3/7] nvme_fc: remove setting DNR on exception conditions
From: James Smart @ 2018-05-12  0:50 UTC


Current code will set DNR if the controller is deleting or if there
is an error during controller init. None of this is necessary.

Remove the code that sets DNR.

Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/fc.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 6cb26bcf6ec0..7e64fe69c945 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1686,16 +1686,6 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 		goto check_error;
 	}
 
-	/*
-	 * Force failures of commands if we're killing the controller
-	 * or have an error on a command used to create an new association
-	 */
-	if (status &&
-	    (blk_queue_dying(rq->q) ||
-	     ctrl->ctrl.state == NVME_CTRL_NEW ||
-	     ctrl->ctrl.state == NVME_CTRL_CONNECTING))
-		status |= cpu_to_le16(NVME_SC_DNR << 1);
-
 	__nvme_fc_fcpop_chk_teardowns(ctrl, op, opstate);
 	nvme_end_request(rq, status, result);
 
-- 
2.13.1

* [PATCH v2 4/7] nvme_fc: remove reinit_request routine
From: James Smart @ 2018-05-12  0:50 UTC


The reinit_request routine is not necessary. Remove support for the
op callback.

As all that nvme_reinit_tagset() does is iterate and call the reinit
routine, it too has no purpose. Remove the call.

Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/fc.c | 20 --------------------
 1 file changed, 20 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 7e64fe69c945..a9db9cdccc9c 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1470,21 +1470,6 @@ nvme_fc_xmt_disconnect_assoc(struct nvme_fc_ctrl *ctrl)
 
 static void nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg);
 
-static int
-nvme_fc_reinit_request(void *data, struct request *rq)
-{
-	struct nvme_fc_fcp_op *op = blk_mq_rq_to_pdu(rq);
-	struct nvme_fc_cmd_iu *cmdiu = &op->cmd_iu;
-
-	memset(cmdiu, 0, sizeof(*cmdiu));
-	cmdiu->scsi_id = NVME_CMD_SCSI_ID;
-	cmdiu->fc_id = NVME_CMD_FC_ID;
-	cmdiu->iu_len = cpu_to_be16(sizeof(*cmdiu) / sizeof(u32));
-	memset(&op->rsp_iu, 0, sizeof(op->rsp_iu));
-
-	return 0;
-}
-
 static void
 __nvme_fc_exit_request(struct nvme_fc_ctrl *ctrl,
 		struct nvme_fc_fcp_op *op)
@@ -2505,10 +2490,6 @@ nvme_fc_reinit_io_queues(struct nvme_fc_ctrl *ctrl)
 
 	nvme_fc_init_io_queues(ctrl);
 
-	ret = nvme_reinit_tagset(&ctrl->ctrl, ctrl->ctrl.tagset);
-	if (ret)
-		goto out_free_io_queues;
-
 	ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
 	if (ret)
 		goto out_free_io_queues;
@@ -2920,7 +2901,6 @@ static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
 	.submit_async_event	= nvme_fc_submit_async_event,
 	.delete_ctrl		= nvme_fc_delete_ctrl,
 	.get_address		= nvmf_get_address,
-	.reinit_request		= nvme_fc_reinit_request,
 };
 
 static void
-- 
2.13.1

* [PATCH v2 5/7] nvme_fc: change controllers first connect to use reconnect path
From: James Smart @ 2018-05-12  0:50 UTC


Current code follows the framework that has been in the transports
from the beginning, where the initial link-side controller connect
occurs as part of "creating the controller". That first connect fully
talks to the controller and obtains values that can then be used for
blk-mq setup, etc. It also means that everything about the controller
is fully known before the "create controller" call returns.

This has several weaknesses:
- The initial create_ctrl call made by the cli will block for a long
  time as wire transactions are performed synchronously. This delay
  becomes longer if errors occur or connectivity is lost and retries
  need to be performed.
- Code-wise, it means there is a separate connect path for the
  initial controller connect vs the (same) steps used in the
  reconnect path.
- As there are separate paths, there is separate error handling and
  retry logic. It also plays havoc with the NEW state (which should
  be transitioned out of after a successful initial connect) vs the
  RESETTING and CONNECTING (reconnect) states that want to be
  transitioned to on error.
- To recover from errors and disruptions, separate recovery/retry
  paths are required as well, which can severely convolute the
  controller state.
- Given the duration of the initial create_ctrl() call, automation
  of dynamic discovery can't use simplistic udev rules. Instead,
  more convoluted use of systemd is required.

This patch reworks the fc transport to use the same connect path for
the initial connection as it uses for reconnect. This makes a single
path for error recovery and handling. As the initial connect now runs
asynchronously on a background work thread, create_ctrl() returns
very quickly, allowing simplified auto-connect scripting.

This patch:
- Removes the driving of the initial connect and replaces it with a
  state transition to CONNECTING and the scheduling of the reconnect
  thread. A dummy state transition to RESETTING had to be traversed,
  as a direct transition of NEW->CONNECTING is not allowed. Given
  that the controller is "new", the RESETTING transition is a simple
  no-op. Once in the reconnect thread, the normal behaviors of
  ctrl_loss_tmo (max_retries * connect_delay) and dev_loss_tmo will
  apply before the controller is torn down (see the flow sketch
  after this list).
- Only if the state transitions can't be traversed and the reconnect
  thread can't be scheduled will the controller be torn down while
  still in create_ctrl.
- The prior code used the controller state of NEW to indicate
  whether request queues had been initialized or not. For the admin
  queue, the request queue is always created, so there's no need to
  check a state. For IO queues, change to tracking whether a
  successful io request queue create has occurred (e.g. 1st
  successful connect).
- The initial controller id is initialized to the dynamic controller
  id used in the initial connect message. It will be overwritten by
  the real controller id once the controller is connected on the
  wire.
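
The resulting flow on an initial create (a sketch; state names are
from the core controller state machine):

   create_ctrl()
       NEW -> RESETTING (no-op) -> CONNECTING
       queue connect_work with zero delay, return to the cli
   connect_work
       nvme_fc_create_association()
           success: CONNECTING -> LIVE ("controller connect complete")
           failure: nvme_fc_reconnect_or_delete() retries within
                    ctrl_loss_tmo, else the controller is torn down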

Note: this patch is dependent on these changes that have been
added to the nvme-cli utility:
- nvme-cli: Wait for device file if not present after successful add_ctrl
- nvme-cli: Add ioctl retry support for "connect-all"

With this patch, a simplistic udev rule such as one of the following
options may now be used to support auto connectivity (a usage sketch
follows the rules):

option A:
   ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \
        ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \
        RUN+="/usr/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR}"

option B:
   SUBSYSTEM!="fc", GOTO="nvme_autoconnect_end"
   ACTION!="change", GOTO="nvme_autoconnect_end"
   ENV{FC_EVENT}!="nvmediscovery", GOTO="nvme_autoconnect_end"
   ENV{NVMEFC_HOST_TRADDR}!="?*", GOTO="nvme_autoconnect_end"
   ENV{NVMEFC_TRADDR}!="?*", GOTO="nvme_autoconnect_end"
   RUN+="/usr/sbin/nvme connect-all -t fc --host-traddr=$env{NVMEFC_HOST_TRADDR} -a $env{NVMEFC_TRADDR}"
   LABEL="nvme_autoconnect_end"
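
For example, the rule could be exercised as follows (the rule file
name and the FC addresses are illustrative assumptions):

   # installed e.g. as /etc/udev/rules.d/70-nvmefc-autoconnect.rules
   udevadm control --reload-rules

   # manual equivalent of what the rule runs on an nvmediscovery event:
   nvme connect-all --transport=fc \
        --host-traddr=nn-0x20000090fae0b5f5:pn-0x10000090fae0b5f5 \
        --traddr=nn-0x200400a098cbcac6:pn-0x204700a098cbcac6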

Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/fc.c | 102 ++++++++++++++++++++++---------------------------
 1 file changed, 45 insertions(+), 57 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index a9db9cdccc9c..e5e8b4325349 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -142,6 +142,7 @@ struct nvme_fc_ctrl {
 	struct nvme_fc_rport	*rport;
 	u32			cnum;
 
+	bool			ioq_live;
 	bool			assoc_active;
 	u64			association_id;
 
@@ -2451,6 +2452,8 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
 	if (ret)
 		goto out_delete_hw_queues;
 
+	ctrl->ioq_live = true;
+
 	return 0;
 
 out_delete_hw_queues:
@@ -2599,8 +2602,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	if (ret)
 		goto out_delete_hw_queue;
 
-	if (ctrl->ctrl.state != NVME_CTRL_NEW)
-		blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
+	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
 
 	ret = nvmf_connect_admin_queue(&ctrl->ctrl);
 	if (ret)
@@ -2673,7 +2675,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	 */
 
 	if (ctrl->ctrl.queue_count > 1) {
-		if (ctrl->ctrl.state == NVME_CTRL_NEW)
+		if (!ctrl->ioq_live)
 			ret = nvme_fc_create_io_queues(ctrl);
 		else
 			ret = nvme_fc_reinit_io_queues(ctrl);
@@ -2760,8 +2762,7 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
 	 * use blk_mq_tagset_busy_itr() and the transport routine to
 	 * terminate the exchanges.
 	 */
-	if (ctrl->ctrl.state != NVME_CTRL_NEW)
-		blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+	blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
 	blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
 				nvme_fc_terminate_exchange, &ctrl->ctrl);
 
@@ -2917,7 +2918,7 @@ nvme_fc_connect_ctrl_work(struct work_struct *work)
 		nvme_fc_reconnect_or_delete(ctrl, ret);
 	else
 		dev_info(ctrl->ctrl.device,
-			"NVME-FC{%d}: controller reconnect complete\n",
+			"NVME-FC{%d}: controller connect complete\n",
 			ctrl->cnum);
 }
 
@@ -2965,7 +2966,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 {
 	struct nvme_fc_ctrl *ctrl;
 	unsigned long flags;
-	int ret, idx, retry;
+	int ret, idx;
 
 	if (!(rport->remoteport.port_role &
 	    (FC_PORT_ROLE_NVME_DISCOVERY | FC_PORT_ROLE_NVME_TARGET))) {
@@ -2992,11 +2993,13 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 	}
 
 	ctrl->ctrl.opts = opts;
+	ctrl->ctrl.nr_reconnects = 0;
 	INIT_LIST_HEAD(&ctrl->ctrl_list);
 	ctrl->lport = lport;
 	ctrl->rport = rport;
 	ctrl->dev = lport->dev;
 	ctrl->cnum = idx;
+	ctrl->ioq_live = false;
 	ctrl->assoc_active = false;
 	init_waitqueue_head(&ctrl->ioabort_wait);
 
@@ -3015,6 +3018,7 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 
 	ctrl->ctrl.sqsize = opts->queue_size - 1;
 	ctrl->ctrl.kato = opts->kato;
+	ctrl->ctrl.cntlid = 0xffff;
 
 	ret = -ENOMEM;
 	ctrl->queues = kcalloc(ctrl->ctrl.queue_count,
@@ -3064,68 +3068,52 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 	list_add_tail(&ctrl->ctrl_list, &rport->ctrl_list);
 	spin_unlock_irqrestore(&rport->lock, flags);
 
-	/*
-	 * It's possible that transactions used to create the association
-	 * may fail. Examples: CreateAssociation LS or CreateIOConnection
-	 * LS gets dropped/corrupted/fails; or a frame gets dropped or a
-	 * command times out for one of the actions to init the controller
-	 * (Connect, Get/Set_Property, Set_Features, etc). Many of these
-	 * transport errors (frame drop, LS failure) inherently must kill
-	 * the association. The transport is coded so that any command used
-	 * to create the association (prior to a LIVE state transition
-	 * while NEW or CONNECTING) will fail if it completes in error or
-	 * times out.
-	 *
-	 * As such: as the connect request was mostly likely due to a
-	 * udev event that discovered the remote port, meaning there is
-	 * not an admin or script there to restart if the connect
-	 * request fails, retry the initial connection creation up to
-	 * three times before giving up and declaring failure.
-	 */
-	for (retry = 0; retry < 3; retry++) {
-		ret = nvme_fc_create_association(ctrl);
-		if (!ret)
-			break;
+	if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_RESETTING) ||
+	    !nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_CONNECTING)) {
+		dev_err(ctrl->ctrl.device,
+			"NVME-FC{%d}: failed to init ctrl state\n", ctrl->cnum);
+		goto fail_ctrl;
 	}
 
-	if (ret) {
-		nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING);
-		cancel_work_sync(&ctrl->ctrl.reset_work);
-		cancel_delayed_work_sync(&ctrl->connect_work);
+	nvme_get_ctrl(&ctrl->ctrl);
 
-		/* couldn't schedule retry - fail out */
+	if (!queue_delayed_work(nvme_wq, &ctrl->connect_work, 0)) {
+		nvme_put_ctrl(&ctrl->ctrl);
 		dev_err(ctrl->ctrl.device,
-			"NVME-FC{%d}: Connect retry failed\n", ctrl->cnum);
+			"NVME-FC{%d}: failed to schedule initial connect\n",
+			ctrl->cnum);
+		goto fail_ctrl;
+	}
 
-		ctrl->ctrl.opts = NULL;
+	dev_info(ctrl->ctrl.device,
+		"NVME-FC{%d}: new ctrl: NQN \"%s\"\n",
+		ctrl->cnum, ctrl->ctrl.opts->subsysnqn);
 
-		/* initiate nvme ctrl ref counting teardown */
-		nvme_uninit_ctrl(&ctrl->ctrl);
+	return &ctrl->ctrl;
 
-		/* Remove core ctrl ref. */
-		nvme_put_ctrl(&ctrl->ctrl);
+fail_ctrl:
+	nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_DELETING);
+	cancel_work_sync(&ctrl->ctrl.reset_work);
+	cancel_delayed_work_sync(&ctrl->connect_work);
 
-		/* as we're past the point where we transition to the ref
-		 * counting teardown path, if we return a bad pointer here,
-		 * the calling routine, thinking it's prior to the
-		 * transition, will do an rport put. Since the teardown
-		 * path also does a rport put, we do an extra get here to
-		 * so proper order/teardown happens.
-		 */
-		nvme_fc_rport_get(rport);
+	ctrl->ctrl.opts = NULL;
 
-		if (ret > 0)
-			ret = -EIO;
-		return ERR_PTR(ret);
-	}
+	/* initiate nvme ctrl ref counting teardown */
+	nvme_uninit_ctrl(&ctrl->ctrl);
 
-	nvme_get_ctrl(&ctrl->ctrl);
+	/* Remove core ctrl ref. */
+	nvme_put_ctrl(&ctrl->ctrl);
 
-	dev_info(ctrl->ctrl.device,
-		"NVME-FC{%d}: new ctrl: NQN \"%s\"\n",
-		ctrl->cnum, ctrl->ctrl.opts->subsysnqn);
+	/* as we're past the point where we transition to the ref
+	 * counting teardown path, if we return a bad pointer here,
+	 * the calling routine, thinking it's prior to the
+	 * transition, will do an rport put. Since the teardown
+	 * path also does a rport put, we do an extra get here to
+	 * so proper order/teardown happens.
+	 */
+	nvme_fc_rport_get(rport);
 
-	return &ctrl->ctrl;
+	return ERR_PTR(-EIO);
 
 out_cleanup_admin_q:
 	blk_cleanup_queue(ctrl->ctrl.admin_q);
-- 
2.13.1

* [PATCH v2 6/7] nvme_fc: fix nulling of queue data on reconnect
From: James Smart @ 2018-05-12  0:50 UTC


The reconnect path calls the init routines to clear a queue
structure. But the queue structure has state that needs to persist
for as long as the controller is live.

Remove the nvme_fc_init_queue() calls on reconnect.
The nvme_fc_free_queue() calls will clear state bits and reset any
relevant queue state for a new connection.

Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/fc.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index e5e8b4325349..72bf01d5435a 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1879,6 +1879,7 @@ nvme_fc_free_queue(struct nvme_fc_queue *queue)
 	 */
 
 	queue->connection_id = 0;
+	atomic_set(&queue->csn, 1);
 }
 
 static void
@@ -2471,7 +2472,7 @@ nvme_fc_create_io_queues(struct nvme_fc_ctrl *ctrl)
 }
 
 static int
-nvme_fc_reinit_io_queues(struct nvme_fc_ctrl *ctrl)
+nvme_fc_recreate_io_queues(struct nvme_fc_ctrl *ctrl)
 {
 	struct nvmf_ctrl_options *opts = ctrl->ctrl.opts;
 	unsigned int nr_io_queues;
@@ -2491,8 +2492,6 @@ nvme_fc_reinit_io_queues(struct nvme_fc_ctrl *ctrl)
 	if (ctrl->ctrl.queue_count == 1)
 		return 0;
 
-	nvme_fc_init_io_queues(ctrl);
-
 	ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
 	if (ret)
 		goto out_free_io_queues;
@@ -2590,8 +2589,6 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 	 * Create the admin queue
 	 */
 
-	nvme_fc_init_queue(ctrl, 0);
-
 	ret = __nvme_fc_create_hw_queue(ctrl, &ctrl->queues[0], 0,
 				NVME_AQ_DEPTH);
 	if (ret)
@@ -2678,7 +2675,7 @@ nvme_fc_create_association(struct nvme_fc_ctrl *ctrl)
 		if (!ctrl->ioq_live)
 			ret = nvme_fc_create_io_queues(ctrl);
 		else
-			ret = nvme_fc_reinit_io_queues(ctrl);
+			ret = nvme_fc_recreate_io_queues(ctrl);
 		if (ret)
 			goto out_term_aen_ops;
 	}
@@ -3026,6 +3023,8 @@ nvme_fc_init_ctrl(struct device *dev, struct nvmf_ctrl_options *opts,
 	if (!ctrl->queues)
 		goto out_free_ida;
 
+	nvme_fc_init_queue(ctrl, 0);
+
 	memset(&ctrl->admin_tag_set, 0, sizeof(ctrl->admin_tag_set));
 	ctrl->admin_tag_set.ops = &nvme_fc_admin_mq_ops;
 	ctrl->admin_tag_set.queue_depth = NVME_AQ_MQ_TAG_DEPTH;
-- 
2.13.1

* [PATCH v2 7/7] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device
From: James Smart @ 2018-05-12  0:50 UTC


The fc transport device should allow for a rediscovery, as userspace
might have lost the events. An example is udev events not being
handled during system startup.

This patch adds a sysfs entry, 'nvme_discovery', on the fc class that
replays udev discovery events for all local port/remote port address
pairs.
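
Usage is a simple write to the attribute; the store routine ignores
the value written, so any write triggers the replay:

   echo 1 > /sys/class/fc/fc_udev_device/nvme_discovery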

Signed-off-by: James Smart <james.smart at broadcom.com>
Reviewed-by: Hannes Reinecke <hare at suse.com>
Cc: Ewan Milne <emilne at redhat.com>
Cc: Johannes Thumshirn <jthumshirn at suse.com>

---
This is actually v4 - based on Hannes' v3 of the same patch:
[PATCHv3] nvme/fc: add 'discovery' sysfs attribute to fc transport device
http://lists.infradead.org/pipermail/linux-nvme/2017-December/014334.html

However, refcounting in v3 was still messed up when an rport_get() fails.
Modified the patch to add the unwind routine to correct it.

Also renamed the "discovery" attribute to "nvme_discovery" so that there
will not be collisions if/when we move to a shared fc transport and have
scsi there as well.

Note: this patch alone doesn't enable the simple udev script. The
patches that move the fc transport's initial connect to the
asynchronous reconnect path are what enable it. This patch simply
allows the system to catch up on connectivity if the udev events
weren't handled.
---
 drivers/nvme/host/fc.c | 114 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 105 insertions(+), 9 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 72bf01d5435a..ec68dd8821bf 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -122,6 +122,7 @@ struct nvme_fc_rport {
 	struct list_head		endp_list; /* for lport->endp_list */
 	struct list_head		ctrl_list;
 	struct list_head		ls_req_list;
+	struct list_head		disc_list;
 	struct device			*dev;	/* physical device for dma */
 	struct nvme_fc_lport		*lport;
 	spinlock_t			lock;
@@ -210,7 +211,6 @@ static DEFINE_IDA(nvme_fc_ctrl_cnt);
  * These items are short-term. They will eventually be moved into
  * a generic FC class. See comments in module init.
  */
-static struct class *fc_class;
 static struct device *fc_udev_device;
 
 
@@ -507,6 +507,7 @@ nvme_fc_free_rport(struct kref *ref)
 	list_del(&rport->endp_list);
 	spin_unlock_irqrestore(&nvme_fc_lock, flags);
 
+	WARN_ON(!list_empty(&rport->disc_list));
 	ida_simple_remove(&lport->endp_cnt, rport->remoteport.port_num);
 
 	kfree(rport);
@@ -694,6 +695,7 @@ nvme_fc_register_remoteport(struct nvme_fc_local_port *localport,
 	INIT_LIST_HEAD(&newrec->endp_list);
 	INIT_LIST_HEAD(&newrec->ctrl_list);
 	INIT_LIST_HEAD(&newrec->ls_req_list);
+	INIT_LIST_HEAD(&newrec->disc_list);
 	kref_init(&newrec->ref);
 	atomic_set(&newrec->act_ctrl_cnt, 0);
 	spin_lock_init(&newrec->lock);
@@ -3253,6 +3255,100 @@ static struct nvmf_transport_ops nvme_fc_transport = {
 	.create_ctrl	= nvme_fc_create_ctrl,
 };
 
+static void nvme_fc_discovery_unwind(struct list_head *lheadp)
+{
+	unsigned long flags;
+	struct nvme_fc_lport *lport;
+	struct nvme_fc_rport *rport, *tmp_rport;
+
+	list_for_each_entry_safe(rport, tmp_rport,
+				 lheadp, disc_list) {
+		spin_lock_irqsave(&nvme_fc_lock, flags);
+		list_del_init(&rport->disc_list);
+		spin_unlock_irqrestore(&nvme_fc_lock, flags);
+		lport = rport->lport;
+		/* signal discovery. Won't hurt if it repeats */
+		nvme_fc_signal_discovery_scan(lport, rport);
+		nvme_fc_rport_put(rport);
+		nvme_fc_lport_put(lport);
+	}
+}
+
+/* Arbitrary successive failures max. With lots of subsystems could be high */
+#define DISCOVERY_MAX_FAIL	20
+
+static ssize_t nvme_fc_nvme_discovery_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t count)
+{
+	unsigned long flags;
+	LIST_HEAD(nvme_fc_disc_list);
+	struct nvme_fc_lport *lport;
+	struct nvme_fc_rport *rport, *tmp_rport;
+	int failcnt = 0;
+
+restart:
+	spin_lock_irqsave(&nvme_fc_lock, flags);
+	list_for_each_entry(lport, &nvme_fc_lport_list, port_list) {
+		list_for_each_entry(rport, &lport->endp_list, endp_list) {
+			if (!nvme_fc_lport_get(lport))
+				continue;
+			if (!nvme_fc_rport_get(rport)) {
+				/*
+				 * This is a temporary condition, so upon
+				 * restart this node will be gone from the
+				 * list.
+				 */
+				spin_unlock_irqrestore(&nvme_fc_lock, flags);
+				nvme_fc_lport_put(lport);
+				nvme_fc_discovery_unwind(&nvme_fc_disc_list);
+				if (failcnt++ < DISCOVERY_MAX_FAIL)
+					goto restart;
+				pr_err("nvme_discovery: too many reference "
+				       "failures\n");
+				return 0;
+			}
+			if (list_empty(&rport->disc_list))
+				list_add_tail(&rport->disc_list,
+					      &nvme_fc_disc_list);
+		}
+	}
+	spin_unlock_irqrestore(&nvme_fc_lock, flags);
+
+	list_for_each_entry_safe(rport, tmp_rport,
+				 &nvme_fc_disc_list, disc_list) {
+		spin_lock_irqsave(&nvme_fc_lock, flags);
+		list_del_init(&rport->disc_list);
+		spin_unlock_irqrestore(&nvme_fc_lock, flags);
+		lport = rport->lport;
+		nvme_fc_signal_discovery_scan(lport, rport);
+		nvme_fc_rport_put(rport);
+		nvme_fc_lport_put(lport);
+	}
+	return count;
+}
+static DEVICE_ATTR(nvme_discovery, 0200, NULL, nvme_fc_nvme_discovery_store);
+
+static struct attribute *nvme_fc_attrs[] = {
+	&dev_attr_nvme_discovery.attr,
+	NULL
+};
+
+static struct attribute_group nvme_fc_attr_group = {
+	.attrs = nvme_fc_attrs,
+};
+
+static const struct attribute_group *nvme_fc_attr_groups[] = {
+	&nvme_fc_attr_group,
+	NULL
+};
+
+static struct class fc_class = {
+	.name = "fc",
+	.dev_groups = nvme_fc_attr_groups,
+	.owner = THIS_MODULE,
+};
+
 static int __init nvme_fc_init_module(void)
 {
 	int ret;
@@ -3271,16 +3367,16 @@ static int __init nvme_fc_init_module(void)
 	 * put in place, this code will move to a more generic
 	 * location for the class.
 	 */
-	fc_class = class_create(THIS_MODULE, "fc");
-	if (IS_ERR(fc_class)) {
+	ret = class_register(&fc_class);
+	if (ret) {
 		pr_err("couldn't register class fc\n");
-		return PTR_ERR(fc_class);
+		return ret;
 	}
 
 	/*
 	 * Create a device for the FC-centric udev events
 	 */
-	fc_udev_device = device_create(fc_class, NULL, MKDEV(0, 0), NULL,
+	fc_udev_device = device_create(&fc_class, NULL, MKDEV(0, 0), NULL,
 				"fc_udev_device");
 	if (IS_ERR(fc_udev_device)) {
 		pr_err("couldn't create fc_udev device!\n");
@@ -3295,9 +3391,9 @@ static int __init nvme_fc_init_module(void)
 	return 0;
 
 out_destroy_device:
-	device_destroy(fc_class, MKDEV(0, 0));
+	device_destroy(&fc_class, MKDEV(0, 0));
 out_destroy_class:
-	class_destroy(fc_class);
+	class_unregister(&fc_class);
 	return ret;
 }
 
@@ -3312,8 +3408,8 @@ static void __exit nvme_fc_exit_module(void)
 	ida_destroy(&nvme_fc_local_port_cnt);
 	ida_destroy(&nvme_fc_ctrl_cnt);
 
-	device_destroy(fc_class, MKDEV(0, 0));
-	class_destroy(fc_class);
+	device_destroy(&fc_class, MKDEV(0, 0));
+	class_unregister(&fc_class);
 }
 
 module_init(nvme_fc_init_module);
-- 
2.13.1

* [PATCH v2 2/7] nvme: revise nvme_set_queue_count to return error on some nvme status codes
From: Christoph Hellwig @ 2018-05-12 13:34 UTC


On Fri, May 11, 2018 at 05:50:23PM -0700, James Smart wrote:
> The status codes that transports may generate are NVME_SC_ABORT_REQ and
> NVME_SC_INTERNAL. In those cases, return a positive error value with the
> value being the status code.

No, your transport should not come up with fake nvme errors, sorry.

And we need this code to ignore errors to bring up various
PCIe controllers in degraded conditions.

* [PATCH v2 4/7] nvme_fc: remove reinit_request routine
From: Christoph Hellwig @ 2018-05-12 13:35 UTC


After this, the reinit_request callbacks in the nvme core and block
layer should also be removed.

* [PATCH v2 1/7] nvme: remove unnecessary controller subnqn validation
From: Christoph Hellwig @ 2018-05-12 13:36 UTC


Looks ok:

Reviewed-by: Christoph Hellwig <hch at lst.de>

* [PATCH v2 2/7] nvme: revise nvme_set_queue_count to return error on some nvme status codes
From: James Smart @ 2018-05-14 15:08 UTC


On 5/12/2018 6:34 AM, Christoph Hellwig wrote:
> On Fri, May 11, 2018 at 05:50:23PM -0700, James Smart wrote:
>> The status codes that transports may generate are NVME_SC_ABORT_REQ and
>> NVME_SC_INTERNAL. In those cases, return a positive error value with the
>> value being the status code.
> No, your transport should not come up with fake nvme errors, sorry.
>
> And we need this code to ignore errors to bring up various
> PCIe controllers in degraded conditions.

It's not making up fake errors, per se. All transports, when they
terminate io for an association that fails, kick the io back with at
least the aborted status. FC does have the internal value as well for
transport-specific errors. What is happening here: the controller
initialization starts, but then connectivity is lost before it can
complete. In this case, it occurred right as the SET_PROPERTY for io
queues was outstanding. I wish the core code wasn't so expectant of
the happy path, as even if we get past this error, it's very chatty on
errors and can scare an admin even when the issues are fully recoverable.

Looking closer at the issue, there are two concerns:
a) the setting of the io queue count to zero persists in future 
connection attempts;
b) if the transport intercepts and kills the io, and the controller 
continues to come live in the os, is there any way the transport won't 
kill the association and retry? Because otherwise, it is stuck in a
"bogus" degraded mode.

for (a): the transport recalculates the io queue count on each new
association attempt and ignores previous values, except that the prior
count was used to size the hw_queues on the io request queue when it
was initially created. I assume the blk_mq_update_nr_hw_queues() call
is sufficient to resize it on the next association.

for (b): I can't come up with a scenario where the transport would not 
be taking the association down due to the error that generated the 
failure on the command.

As such, I guess I can pull this patch and drop it. It's not necessary.

-- james

* [PATCH v2 1/7] nvme: remove unnecessary controller subnqn validation
From: Christoph Hellwig @ 2018-05-25  9:11 UTC


Thanks,

applied to nvme-4.18-2

* [PATCH v2 3/7] nvme_fc: remove setting DNR on exception conditions
From: Christoph Hellwig @ 2018-05-25  9:11 UTC


Thanks,

applied to nvme-4.18-2

Thread overview: 14 messages
2018-05-12  0:50 [PATCH v2 0/7] nvme_fc: asynchronous controller create and simple discovery James Smart
2018-05-12  0:50 ` [PATCH v2 1/7] nvme: remove unnecessary controller subnqn validation James Smart
2018-05-12 13:36   ` Christoph Hellwig
2018-05-25  9:11   ` Christoph Hellwig
2018-05-12  0:50 ` [PATCH v2 2/7] nvme: revise nvme_set_queue_count to return error on some nvme status codes James Smart
2018-05-12 13:34   ` Christoph Hellwig
2018-05-14 15:08     ` James Smart
2018-05-12  0:50 ` [PATCH v2 3/7] nvme_fc: remove setting DNR on exception conditions James Smart
2018-05-25  9:11   ` Christoph Hellwig
2018-05-12  0:50 ` [PATCH v2 4/7] nvme_fc: remove reinit_request routine James Smart
2018-05-12 13:35   ` Christoph Hellwig
2018-05-12  0:50 ` [PATCH v2 5/7] nvme_fc: change controllers first connect to use reconnect path James Smart
2018-05-12  0:50 ` [PATCH v2 6/7] nvme_fc: fix nulling of queue data on reconnect James Smart
2018-05-12  0:50 ` [PATCH v2 7/7] nvme_fc: add 'nvme_discovery' sysfs attribute to fc transport device James Smart
