linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] nvme-fabrics: fix un-expected behaviour related to hostnqn and hostid
@ 2023-05-12 15:41 Max Gurtovoy
  2023-05-12 15:41 ` [PATCH v3 1/3] nvme-fabrics: unify common code in admin and io queue connect Max Gurtovoy
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Max Gurtovoy @ 2023-05-12 15:41 UTC (permalink / raw)
  To: hch, sagi, kbusch, linux-nvme
  Cc: hare, axboe, oren, ngottlieb, israelr, Max Gurtovoy

Hi Christoph/Sagi/Keith,
I've noticed that we are having some un-expected behaviour in some of
the scenarios of connection establisment related to hostnqn and hostid.
This is specially around overriding hostid for a host.

For example:
-------------
1. Connect to a target with default hostnqn and hostid (after removing
the files from /etc/nvme):
    - # nvme connect -t tcp -n testsubsystem_0 -a 1.1.1.1 -s 4420
2. Check hostnqn and hostid
    - #cat /sys/class/nvme/nvme0/host*
       1791bca2-af3c-4e17-a310-6fbdde3b59cb
       nqn.2014-08.org.nvmexpress:uuid:fc4e5383-ce5c-470d-9889-4adbc2dbe969
3. Connect to a target with given hostid from cmdline:
    - # nvme connect -t tcp -n testsubsystem_0 -a 1.1.1.1 -s 4420 -I 77720af2-ce59-418b-82ff-7c9df0cea35e
        Failed to write to /dev/nvme-fabrics: Operation already in progress <--- failed on duplicate connection check
4. Check hostnqn and hostid of first controller
    - #cat /sys/class/nvme/nvme0/host*
       77720af2-ce59-418b-82ff-7c9df0cea35e <--- hostid was changed !!!!
       nqn.2014-08.org.nvmexpress:uuid:fc4e5383-ce5c-470d-9889-4adbc2dbe969
5. Connect to a target with given hostid from cmdline and allow duplicate connections:
    - # nvme connect -t tcp -n testsubsystem_0 -a 1.1.1.1 -s 4420 -I 77720af2-ce59-418b-82ff-7c9df0cea3bb -D
6. Check hostnqn and hostid of first controller again
    - #cat /sys/class/nvme/nvme0/host*
       77720af2-ce59-418b-82ff-7c9df0cea3bb  <--- hostid was changed again !!!!
       nqn.2014-08.org.nvmexpress:uuid:fc4e5383-ce5c-470d-9889-4adbc2dbe969

This seems like a wrong behaviour to me since in case we try to
reconnect the target will see that a different values after
reconnections and this change the expected behaviour to keep hostid.

According to the spec:

"This feature allows the host to register a Host Identifier with the
controller. The Host Identifier is used by the controller to determine
whether other controllers in the NVM subsystem are associated with the same
host.
"

Also:

"
A host that uses a single Host NQN may employ multiple Host Identifiers
to designate elements of the host
that access an NVM subsystem independently of each other (e.g., physical
or logical partitions of the host)
"

We've decided to implement 1:1 mapping for hostnqn and hostid to have
unambiguous host identification even though the fabrics spec allows it.

These patchset fixes the current behaviour.

The first 2 patches are some preparations and cleanups and patch 3/3 is
the actual fix.

Max Gurtovoy (3):
  nvme-fabrics: unify common code in admin and io queue connect
  nvme-fabrics: check hostid using uuid_equal
  nvme-fabrics: prevent overriding of existing host

 drivers/nvme/host/fabrics.c | 172 ++++++++++++++++++++++++------------
 drivers/nvme/host/fabrics.h |   2 +-
 2 files changed, 115 insertions(+), 59 deletions(-)

-- 
2.18.1



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 1/3] nvme-fabrics: unify common code in admin and io queue connect
  2023-05-12 15:41 [PATCH v3 0/3] nvme-fabrics: fix un-expected behaviour related to hostnqn and hostid Max Gurtovoy
@ 2023-05-12 15:41 ` Max Gurtovoy
  2023-05-12 15:41 ` [PATCH v3 2/3] nvme-fabrics: check hostid using uuid_equal Max Gurtovoy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Max Gurtovoy @ 2023-05-12 15:41 UTC (permalink / raw)
  To: hch, sagi, kbusch, linux-nvme
  Cc: hare, axboe, oren, ngottlieb, israelr, Max Gurtovoy

To simplify code maintenance, it is recommended to avoid duplicating
code.

Tested-by: Noam Gottlieb <ngottlieb@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---

changes from v2:
- collected Reviewed-by signatures (Hannes & Christoph)

changes from v1:
- address comments from Christoph

---
 drivers/nvme/host/fabrics.c | 74 +++++++++++++++++++++----------------
 1 file changed, 43 insertions(+), 31 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 0069ebff85df..0c0172bdc4dc 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -349,6 +349,45 @@ static void nvmf_log_connect_error(struct nvme_ctrl *ctrl,
 	}
 }
 
+static struct nvmf_connect_data *nvmf_connect_data_prep(struct nvme_ctrl *ctrl,
+		u16 cntlid)
+{
+	struct nvmf_connect_data *data;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return NULL;
+
+	uuid_copy(&data->hostid, &ctrl->opts->host->id);
+	data->cntlid = cpu_to_le16(cntlid);
+	strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE);
+	strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE);
+
+	return data;
+}
+
+static void nvmf_connect_cmd_prep(struct nvme_ctrl *ctrl, u16 qid,
+		struct nvme_command *cmd)
+{
+	cmd->connect.opcode = nvme_fabrics_command;
+	cmd->connect.fctype = nvme_fabrics_type_connect;
+	cmd->connect.qid = cpu_to_le16(qid);
+
+	if (qid) {
+		cmd->connect.sqsize = cpu_to_le16(ctrl->sqsize);
+	} else {
+		cmd->connect.sqsize = cpu_to_le16(NVME_AQ_DEPTH - 1);
+
+		/*
+		 * set keep-alive timeout in seconds granularity (ms * 1000)
+		 */
+		cmd->connect.kato = cpu_to_le32(ctrl->kato * 1000);
+	}
+
+	if (ctrl->opts->disable_sqflow)
+		cmd->connect.cattr |= NVME_CONNECT_DISABLE_SQFLOW;
+}
+
 /**
  * nvmf_connect_admin_queue() - NVMe Fabrics Admin Queue "Connect"
  *				API function.
@@ -377,28 +416,12 @@ int nvmf_connect_admin_queue(struct nvme_ctrl *ctrl)
 	int ret;
 	u32 result;
 
-	cmd.connect.opcode = nvme_fabrics_command;
-	cmd.connect.fctype = nvme_fabrics_type_connect;
-	cmd.connect.qid = 0;
-	cmd.connect.sqsize = cpu_to_le16(NVME_AQ_DEPTH - 1);
-
-	/*
-	 * Set keep-alive timeout in seconds granularity (ms * 1000)
-	 */
-	cmd.connect.kato = cpu_to_le32(ctrl->kato * 1000);
-
-	if (ctrl->opts->disable_sqflow)
-		cmd.connect.cattr |= NVME_CONNECT_DISABLE_SQFLOW;
+	nvmf_connect_cmd_prep(ctrl, 0, &cmd);
 
-	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	data = nvmf_connect_data_prep(ctrl, 0xffff);
 	if (!data)
 		return -ENOMEM;
 
-	uuid_copy(&data->hostid, &ctrl->opts->host->id);
-	data->cntlid = cpu_to_le16(0xffff);
-	strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE);
-	strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE);
-
 	ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, &res,
 			data, sizeof(*data), NVME_QID_ANY, 1,
 			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
@@ -468,23 +491,12 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid)
 	int ret;
 	u32 result;
 
-	cmd.connect.opcode = nvme_fabrics_command;
-	cmd.connect.fctype = nvme_fabrics_type_connect;
-	cmd.connect.qid = cpu_to_le16(qid);
-	cmd.connect.sqsize = cpu_to_le16(ctrl->sqsize);
+	nvmf_connect_cmd_prep(ctrl, qid, &cmd);
 
-	if (ctrl->opts->disable_sqflow)
-		cmd.connect.cattr |= NVME_CONNECT_DISABLE_SQFLOW;
-
-	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	data = nvmf_connect_data_prep(ctrl, ctrl->cntlid);
 	if (!data)
 		return -ENOMEM;
 
-	uuid_copy(&data->hostid, &ctrl->opts->host->id);
-	data->cntlid = cpu_to_le16(ctrl->cntlid);
-	strncpy(data->subsysnqn, ctrl->opts->subsysnqn, NVMF_NQN_SIZE);
-	strncpy(data->hostnqn, ctrl->opts->host->nqn, NVMF_NQN_SIZE);
-
 	ret = __nvme_submit_sync_cmd(ctrl->connect_q, &cmd, &res,
 			data, sizeof(*data), qid, 1,
 			BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
-- 
2.18.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 2/3] nvme-fabrics: check hostid using uuid_equal
  2023-05-12 15:41 [PATCH v3 0/3] nvme-fabrics: fix un-expected behaviour related to hostnqn and hostid Max Gurtovoy
  2023-05-12 15:41 ` [PATCH v3 1/3] nvme-fabrics: unify common code in admin and io queue connect Max Gurtovoy
@ 2023-05-12 15:41 ` Max Gurtovoy
  2023-05-12 15:41 ` [PATCH v3 3/3] nvme-fabrics: prevent overriding of existing host Max Gurtovoy
  2023-05-15 17:15 ` [PATCH v3 0/3] nvme-fabrics: fix un-expected behaviour related to hostnqn and hostid Keith Busch
  3 siblings, 0 replies; 6+ messages in thread
From: Max Gurtovoy @ 2023-05-12 15:41 UTC (permalink / raw)
  To: hch, sagi, kbusch, linux-nvme
  Cc: hare, axboe, oren, ngottlieb, israelr, Max Gurtovoy

Use a dedicated function to match uuids instead of duplicating it.

Tested-by: Noam Gottlieb <ngottlieb@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---

changes from v2:
- collected Reviewed-by signatures (Hannes & Christoph)

changes from v1:
- address comments from Christoph

---
 drivers/nvme/host/fabrics.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index dcac3df8a5f7..862bc4e5e3d9 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -181,7 +181,7 @@ nvmf_ctlr_matches_baseopts(struct nvme_ctrl *ctrl,
 	    ctrl->state == NVME_CTRL_DEAD ||
 	    strcmp(opts->subsysnqn, ctrl->opts->subsysnqn) ||
 	    strcmp(opts->host->nqn, ctrl->opts->host->nqn) ||
-	    memcmp(&opts->host->id, &ctrl->opts->host->id, sizeof(uuid_t)))
+	    !uuid_equal(&opts->host->id, &ctrl->opts->host->id))
 		return false;
 
 	return true;
-- 
2.18.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 3/3] nvme-fabrics: prevent overriding of existing host
  2023-05-12 15:41 [PATCH v3 0/3] nvme-fabrics: fix un-expected behaviour related to hostnqn and hostid Max Gurtovoy
  2023-05-12 15:41 ` [PATCH v3 1/3] nvme-fabrics: unify common code in admin and io queue connect Max Gurtovoy
  2023-05-12 15:41 ` [PATCH v3 2/3] nvme-fabrics: check hostid using uuid_equal Max Gurtovoy
@ 2023-05-12 15:41 ` Max Gurtovoy
  2023-05-12 20:27   ` Christoph Hellwig
  2023-05-15 17:15 ` [PATCH v3 0/3] nvme-fabrics: fix un-expected behaviour related to hostnqn and hostid Keith Busch
  3 siblings, 1 reply; 6+ messages in thread
From: Max Gurtovoy @ 2023-05-12 15:41 UTC (permalink / raw)
  To: hch, sagi, kbusch, linux-nvme
  Cc: hare, axboe, oren, ngottlieb, israelr, Max Gurtovoy

When first connecting a target using the "default" host parameters,
setting the hostid from the command line during a subsequent connection
establishment would override the "default" hostid parameter. This would
cause an existing connection that is already using the host definitions
to lose its hostid.

To address this issue, the code has been modified to allow only 1:1
mapping between hostnqn and hostid. This will maintain unambiguous host
identification. Any non 1:1 mapping will be rejected during connection
establishment.

Tested-by: Noam Gottlieb <ngottlieb@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
---

changes from v2:
- structured the code and removed comments that redundant now

changes from v1:
- address comments from Christoph
- allow only 1:1 mapping between hostnqn and hostid
- add error prints in case we fail a connection for user to understand
  the reason of the failure

---
 drivers/nvme/host/fabrics.c | 98 +++++++++++++++++++++++++++----------
 1 file changed, 71 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 0c0172bdc4dc..fc069c41260e 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -21,35 +21,79 @@ static DEFINE_MUTEX(nvmf_hosts_mutex);
 
 static struct nvmf_host *nvmf_default_host;
 
-static struct nvmf_host *__nvmf_host_find(const char *hostnqn)
+/**
+ * __nvmf_host_find() - Find a matching to a previously created host
+ * @hostnqn: Host NQN to match
+ * @id: Host ID to match
+ *
+ * We have defined a host as how it is perceived by the target.
+ * Therefore, we don't allow different Host NQNs with the same Host ID.
+ * Similarly, we do not allow the usage of the same Host NQN with different
+ * Host IDs. This will maintain unambiguous host identification.
+ *
+ * Return: Returns host pointer on success, NULL in case of no match or
+ *         ERR_PTR(-EINVAL) in case of error match.
+ */
+static struct nvmf_host *__nvmf_host_find(const char *hostnqn, uuid_t *id)
 {
 	struct nvmf_host *host;
 
+	lockdep_assert_held(&nvmf_hosts_mutex);
+
 	list_for_each_entry(host, &nvmf_hosts, list) {
-		if (!strcmp(host->nqn, hostnqn))
+		bool same_hostnqn = !strcmp(host->nqn, hostnqn);
+		bool same_hostid = uuid_equal(&host->id, id);
+
+		if (same_hostnqn && same_hostid)
 			return host;
+
+		if (same_hostnqn) {
+			pr_err("found same hostnqn %s but different hostid %pUb\n",
+			       hostnqn, id);
+			return ERR_PTR(-EINVAL);
+		}
+		if (same_hostid) {
+			pr_err("found same hostid %pUb but different hostnqn %s\n",
+			       id, hostnqn);
+			return ERR_PTR(-EINVAL);
+
+		}
 	}
 
 	return NULL;
 }
 
-static struct nvmf_host *nvmf_host_add(const char *hostnqn)
+static struct nvmf_host *nvmf_host_alloc(const char *hostnqn, uuid_t *id)
+{
+	struct nvmf_host *host;
+
+	host = kmalloc(sizeof(*host), GFP_KERNEL);
+	if (!host)
+		return NULL;
+
+	kref_init(&host->ref);
+	uuid_copy(&host->id, id);
+	strscpy(host->nqn, hostnqn, NVMF_NQN_SIZE);
+
+	return host;
+}
+
+static struct nvmf_host *nvmf_host_add(const char *hostnqn, uuid_t *id)
 {
 	struct nvmf_host *host;
 
 	mutex_lock(&nvmf_hosts_mutex);
-	host = __nvmf_host_find(hostnqn);
-	if (host) {
+	host = __nvmf_host_find(hostnqn, id);
+	if (IS_ERR(host)) {
+		goto out_unlock;
+	} else if (host) {
 		kref_get(&host->ref);
 		goto out_unlock;
 	}
 
-	host = kmalloc(sizeof(*host), GFP_KERNEL);
+	host = nvmf_host_alloc(hostnqn, id);
 	if (!host)
-		goto out_unlock;
-
-	kref_init(&host->ref);
-	strscpy(host->nqn, hostnqn, NVMF_NQN_SIZE);
+		return ERR_PTR(-ENOMEM);
 
 	list_add_tail(&host->list, &nvmf_hosts);
 out_unlock:
@@ -60,16 +104,17 @@ static struct nvmf_host *nvmf_host_add(const char *hostnqn)
 static struct nvmf_host *nvmf_host_default(void)
 {
 	struct nvmf_host *host;
+	char nqn[NVMF_NQN_SIZE];
+	uuid_t id;
 
-	host = kmalloc(sizeof(*host), GFP_KERNEL);
+	uuid_gen(&id);
+	snprintf(nqn, NVMF_NQN_SIZE,
+		"nqn.2014-08.org.nvmexpress:uuid:%pUb", &id);
+
+	host = nvmf_host_alloc(nqn, &id);
 	if (!host)
 		return NULL;
 
-	kref_init(&host->ref);
-	uuid_gen(&host->id);
-	snprintf(host->nqn, NVMF_NQN_SIZE,
-		"nqn.2014-08.org.nvmexpress:uuid:%pUb", &host->id);
-
 	mutex_lock(&nvmf_hosts_mutex);
 	list_add_tail(&host->list, &nvmf_hosts);
 	mutex_unlock(&nvmf_hosts_mutex);
@@ -633,6 +678,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	size_t nqnlen  = 0;
 	int ctrl_loss_tmo = NVMF_DEF_CTRL_LOSS_TMO;
 	uuid_t hostid;
+	char hostnqn[NVMF_NQN_SIZE];
 
 	/* Set defaults */
 	opts->queue_size = NVMF_DEF_QUEUE_SIZE;
@@ -649,7 +695,9 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	if (!options)
 		return -ENOMEM;
 
-	uuid_gen(&hostid);
+	/* use default host if not given by user space */
+	uuid_copy(&hostid, &nvmf_default_host->id);
+	strscpy(hostnqn, nvmf_default_host->nqn, NVMF_NQN_SIZE);
 
 	while ((p = strsep(&o, ",\n")) != NULL) {
 		if (!*p)
@@ -795,12 +843,8 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 				ret = -EINVAL;
 				goto out;
 			}
-			opts->host = nvmf_host_add(p);
+			strscpy(hostnqn, p, NVMF_NQN_SIZE);
 			kfree(p);
-			if (!opts->host) {
-				ret = -ENOMEM;
-				goto out;
-			}
 			break;
 		case NVMF_OPT_RECONNECT_DELAY:
 			if (match_int(args, &token)) {
@@ -957,13 +1001,13 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 				opts->fast_io_fail_tmo, ctrl_loss_tmo);
 	}
 
-	if (!opts->host) {
-		kref_get(&nvmf_default_host->ref);
-		opts->host = nvmf_default_host;
+	opts->host = nvmf_host_add(hostnqn, &hostid);
+	if (IS_ERR(opts->host)) {
+		ret = PTR_ERR(opts->host);
+		opts->host = NULL;
+		goto out;
 	}
 
-	uuid_copy(&opts->host->id, &hostid);
-
 out:
 	kfree(options);
 	return ret;
-- 
2.18.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 3/3] nvme-fabrics: prevent overriding of existing host
  2023-05-12 15:41 ` [PATCH v3 3/3] nvme-fabrics: prevent overriding of existing host Max Gurtovoy
@ 2023-05-12 20:27   ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2023-05-12 20:27 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: hch, sagi, kbusch, linux-nvme, hare, axboe, oren, ngottlieb, israelr

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 0/3] nvme-fabrics: fix un-expected behaviour related to hostnqn and hostid
  2023-05-12 15:41 [PATCH v3 0/3] nvme-fabrics: fix un-expected behaviour related to hostnqn and hostid Max Gurtovoy
                   ` (2 preceding siblings ...)
  2023-05-12 15:41 ` [PATCH v3 3/3] nvme-fabrics: prevent overriding of existing host Max Gurtovoy
@ 2023-05-15 17:15 ` Keith Busch
  3 siblings, 0 replies; 6+ messages in thread
From: Keith Busch @ 2023-05-15 17:15 UTC (permalink / raw)
  To: Max Gurtovoy; +Cc: hch, sagi, linux-nvme, hare, axboe, oren, ngottlieb, israelr

Thanks, applied to nvme-6.5.


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-05-15 17:15 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-12 15:41 [PATCH v3 0/3] nvme-fabrics: fix un-expected behaviour related to hostnqn and hostid Max Gurtovoy
2023-05-12 15:41 ` [PATCH v3 1/3] nvme-fabrics: unify common code in admin and io queue connect Max Gurtovoy
2023-05-12 15:41 ` [PATCH v3 2/3] nvme-fabrics: check hostid using uuid_equal Max Gurtovoy
2023-05-12 15:41 ` [PATCH v3 3/3] nvme-fabrics: prevent overriding of existing host Max Gurtovoy
2023-05-12 20:27   ` Christoph Hellwig
2023-05-15 17:15 ` [PATCH v3 0/3] nvme-fabrics: fix un-expected behaviour related to hostnqn and hostid Keith Busch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).