* track subsystem relationships and shared namespaces V2
@ 2017-06-19  9:57 Christoph Hellwig
  2017-06-19  9:57 ` [PATCH 1/6] nvme: remove a misleading comment on struct nvme_ns Christoph Hellwig
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19  9:57 UTC (permalink / raw)


Hi all,

this set adds the first bits of the infrastructure required for proper
multipath support in the NVMe driver.  It tracks whether multiple
controllers belong to the same subsystem, and if they do, which namespaces
on those controllers refer to the same data.  As part of that it validates
a number of the restrictions the NVMe spec places on these relationships.

Btw, qemu is a perfect way to create controllers with the same NQN that
fail these checks: it allows you to specify the same serial number for
multiple controllers, and it never sets the CMIC bit required for
multi-controller subsystems.
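
For example (an illustrative invocation; the image file names below are
made up, but the nvme device's serial= property is real), two emulated
controllers that end up generating the same NQN look like this:

  qemu-system-x86_64 ... \
    -drive file=nvme0.img,if=none,format=raw,id=d0 \
    -device nvme,drive=d0,serial=samesn \
    -drive file=nvme1.img,if=none,format=raw,id=d1 \
    -device nvme,drive=d1,serial=samesn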

Changes since V1:
 - don't hardcode offsets in nvme_init_subnqn
 - rename __nvme_find_subsystem to __nvme_find_get_subsystem
 - rename nvme_ns_link_siblings and move the list_add out of it

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/6] nvme: remove a misleading comment on struct nvme_ns
  2017-06-19  9:57 track subsystem relationships and shared namespaces V2 Christoph Hellwig
@ 2017-06-19  9:57 ` Christoph Hellwig
  2017-06-19 16:49   ` Keith Busch
  2017-06-19  9:57 ` [PATCH 2/6] nvme: read the subsystem NQN from Identify Controller Christoph Hellwig
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19  9:57 UTC (permalink / raw)


While an NVMe Namespace is somewhat similar to a SCSI Logical Unit (and not
a Logical Unit Number, anyway) there are subtle differences.  Remove the
misleading comment.

Signed-off-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/host/nvme.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ec8c7363934d..5985dbb25a90 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -179,9 +179,6 @@ struct nvme_ctrl {
 	struct nvmf_ctrl_options *opts;
 };
 
-/*
- * An NVM Express namespace is equivalent to a SCSI LUN
- */
 struct nvme_ns {
 	struct list_head list;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/6] nvme: read the subsystem NQN from Identify Controller
  2017-06-19  9:57 track subsystem relationships and shared namespaces V2 Christoph Hellwig
  2017-06-19  9:57 ` [PATCH 1/6] nvme: remove a misleading comment on struct nvme_ns Christoph Hellwig
@ 2017-06-19  9:57 ` Christoph Hellwig
  2017-06-19 11:44   ` Johannes Thumshirn
  2017-06-19 16:50   ` Keith Busch
  2017-06-19  9:57 ` [PATCH 3/6] nvme: simplify nvme_dev_attrs_are_visible Christoph Hellwig
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19  9:57 UTC (permalink / raw)


NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
Identify controller data structures.  Use this NQN for the subsysnqn
sysfs attribute by storing it in the nvme_ctrl structure after verifying
it.  For older controllers we generate a "fake" NQN per non-normative
text in the NVMe 1.3 spec.
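
For illustration, the generated string is just a fixed-width concatenation
of the vendor IDs and the raw serial and model fields (the layout below is
schematic, not taken from a real controller):

  nqn.2014.08.org.nvmexpress:<vid:%4x><ssvid:%4x><sn, 20 bytes><mn, 40 bytes>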

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c    | 31 ++++++++++++++++++++++++++++---
 drivers/nvme/host/fabrics.c | 10 ----------
 drivers/nvme/host/fabrics.h |  1 -
 drivers/nvme/host/fc.c      |  1 -
 drivers/nvme/host/nvme.h    |  2 +-
 drivers/nvme/host/rdma.c    |  1 -
 drivers/nvme/target/loop.c  |  1 -
 7 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index aee37b73231d..32cdad8afa2b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1642,6 +1642,31 @@ static bool quirk_matches(const struct nvme_id_ctrl *id,
 		string_matches(id->fr, q->fr, sizeof(id->fr));
 }
 
+static void nvme_init_subnqn(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
+{
+	size_t nqnlen;
+	int off;
+
+	nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
+	if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
+		strcpy(ctrl->subnqn, id->subnqn);
+		return;
+	}
+
+	if (ctrl->vs >= NVME_VS(1, 2, 1))
+		dev_warn(ctrl->device, "missing or invalid SUBNQN field.\n");
+
+	/* Generate a "fake" NQN per Figure 254 in NVMe 1.3 + ECN 001 */
+	off = snprintf(ctrl->subnqn, NVMF_NQN_SIZE,
+			"nqn.2014.08.org.nvmexpress:%4x%4x",
+			le16_to_cpu(id->vid), le16_to_cpu(id->ssvid));
+	memcpy(ctrl->subnqn + off, id->sn, sizeof(id->sn));
+	off += sizeof(id->sn);
+	memcpy(ctrl->subnqn + off, id->mn, sizeof(id->mn));
+	off += sizeof(id->mn);
+	memset(ctrl->subnqn + off, 0, sizeof(ctrl->subnqn) - off);
+}
+
 /*
  * Initialize the cached copies of the Identify data and various controller
  * register in our nvme_ctrl structure.  This should be called as soon as
@@ -1677,6 +1702,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 		return -EIO;
 	}
 
+	nvme_init_subnqn(ctrl, id);
+
 	if (!ctrl->identified) {
 		/*
 		 * Check for quirks.  Quirk can depend on firmware version,
@@ -2070,8 +2097,7 @@ static ssize_t nvme_sysfs_show_subsysnqn(struct device *dev,
 {
 	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n",
-			ctrl->ops->get_subsysnqn(ctrl));
+	return snprintf(buf, PAGE_SIZE, "%s\n", ctrl->subnqn);
 }
 static DEVICE_ATTR(subsysnqn, S_IRUGO, nvme_sysfs_show_subsysnqn, NULL);
 
@@ -2116,7 +2142,6 @@ static umode_t nvme_dev_attrs_are_visible(struct kobject *kobj,
 			return 0;
 	}
 
-	CHECK_ATTR(ctrl, a, subsysnqn);
 	CHECK_ATTR(ctrl, a, address);
 
 	return a->mode;
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 6e6864516ce6..4798a8a425e0 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -128,16 +128,6 @@ int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
 EXPORT_SYMBOL_GPL(nvmf_get_address);
 
 /**
- * nvmf_get_subsysnqn() - Get subsystem NQN
- * @ctrl:	Host NVMe controller instance which we got the NQN
- */
-const char *nvmf_get_subsysnqn(struct nvme_ctrl *ctrl)
-{
-	return ctrl->opts->subsysnqn;
-}
-EXPORT_SYMBOL_GPL(nvmf_get_subsysnqn);
-
-/**
  * nvmf_reg_read32() -  NVMe Fabrics "Property Get" API function.
  * @ctrl:	Host NVMe controller instance maintaining the admin
  *		queue used to submit the property read command to
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index f1c9bd7ae7ff..694d024707aa 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -138,7 +138,6 @@ int nvmf_connect_io_queue(struct nvme_ctrl *ctrl, u16 qid);
 int nvmf_register_transport(struct nvmf_transport_ops *ops);
 void nvmf_unregister_transport(struct nvmf_transport_ops *ops);
 void nvmf_free_options(struct nvmf_ctrl_options *opts);
-const char *nvmf_get_subsysnqn(struct nvme_ctrl *ctrl);
 int nvmf_get_address(struct nvme_ctrl *ctrl, char *buf, int size);
 bool nvmf_should_reconnect(struct nvme_ctrl *ctrl);
 
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index 5165007e86a6..e443a88e121f 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2631,7 +2631,6 @@ static const struct nvme_ctrl_ops nvme_fc_ctrl_ops = {
 	.free_ctrl		= nvme_fc_nvme_ctrl_freed,
 	.submit_async_event	= nvme_fc_submit_async_event,
 	.delete_ctrl		= nvme_fc_del_nvme_ctrl,
-	.get_subsysnqn		= nvmf_get_subsysnqn,
 	.get_address		= nvmf_get_address,
 };
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 5985dbb25a90..e1d5e775afed 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -138,6 +138,7 @@ struct nvme_ctrl {
 	char serial[20];
 	char model[40];
 	char firmware_rev[8];
+	char subnqn[NVMF_NQN_SIZE];
 	u16 cntlid;
 
 	u32 ctrl_config;
@@ -220,7 +221,6 @@ struct nvme_ctrl_ops {
 	void (*free_ctrl)(struct nvme_ctrl *ctrl);
 	void (*submit_async_event)(struct nvme_ctrl *ctrl, int aer_idx);
 	int (*delete_ctrl)(struct nvme_ctrl *ctrl);
-	const char *(*get_subsysnqn)(struct nvme_ctrl *ctrl);
 	int (*get_address)(struct nvme_ctrl *ctrl, char *buf, int size);
 };
 
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 01dc723e6acf..f190271bf522 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1757,7 +1757,6 @@ static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
 	.free_ctrl		= nvme_rdma_free_ctrl,
 	.submit_async_event	= nvme_rdma_submit_async_event,
 	.delete_ctrl		= nvme_rdma_del_ctrl,
-	.get_subsysnqn		= nvmf_get_subsysnqn,
 	.get_address		= nvmf_get_address,
 };
 
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index f67606523724..f33d027df86b 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -542,7 +542,6 @@ static const struct nvme_ctrl_ops nvme_loop_ctrl_ops = {
 	.free_ctrl		= nvme_loop_free_ctrl,
 	.submit_async_event	= nvme_loop_submit_async_event,
 	.delete_ctrl		= nvme_loop_del_ctrl,
-	.get_subsysnqn		= nvmf_get_subsysnqn,
 };
 
 static int nvme_loop_create_io_queues(struct nvme_loop_ctrl *ctrl)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/6] nvme: simplify nvme_dev_attrs_are_visible
  2017-06-19  9:57 track subsystem relationships and shared namespaces V2 Christoph Hellwig
  2017-06-19  9:57 ` [PATCH 1/6] nvme: remove a misleading comment on struct nvme_ns Christoph Hellwig
  2017-06-19  9:57 ` [PATCH 2/6] nvme: read the subsystem NQN from Identify Controller Christoph Hellwig
@ 2017-06-19  9:57 ` Christoph Hellwig
  2017-06-19 16:50   ` Keith Busch
  2017-06-19  9:57 ` [PATCH 4/6] nvme-fabrics: verify that a controller returns the correct NQN Christoph Hellwig
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19  9:57 UTC (permalink / raw)


Signed-off-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/host/core.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 32cdad8afa2b..4342ec9bc528 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2126,23 +2126,16 @@ static struct attribute *nvme_dev_attrs[] = {
 	NULL
 };
 
-#define CHECK_ATTR(ctrl, a, name)		\
-	if ((a) == &dev_attr_##name.attr &&	\
-	    !(ctrl)->ops->get_##name)		\
-		return 0
-
 static umode_t nvme_dev_attrs_are_visible(struct kobject *kobj,
 		struct attribute *a, int n)
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
 
-	if (a == &dev_attr_delete_controller.attr) {
-		if (!ctrl->ops->delete_ctrl)
-			return 0;
-	}
-
-	CHECK_ATTR(ctrl, a, address);
+	if (a == &dev_attr_delete_controller.attr && !ctrl->ops->delete_ctrl)
+		return 0;
+	if (a == &dev_attr_address.attr && !ctrl->ops->get_address)
+		return 0;
 
 	return a->mode;
 }
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/6] nvme-fabrics: verify that a controller returns the correct NQN
  2017-06-19  9:57 track subsystem relationships and shared namespaces V2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2017-06-19  9:57 ` [PATCH 3/6] nvme: simplify nvme_dev_attrs_are_visible Christoph Hellwig
@ 2017-06-19  9:57 ` Christoph Hellwig
  2017-06-19  9:57 ` [PATCH 5/6] nvme: track subsystems Christoph Hellwig
  2017-06-19  9:57 ` [PATCH 6/6] nvme: track shared namespaces in a siblings list Christoph Hellwig
  5 siblings, 0 replies; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19  9:57 UTC (permalink / raw)


Signed-off-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
---
 drivers/nvme/host/fabrics.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 4798a8a425e0..87038a017fb9 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -862,6 +862,15 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
 		goto out_unlock;
 	}
 
+	if (strcmp(ctrl->subnqn, opts->subsysnqn)) {
+		dev_warn(ctrl->device,
+			"controller returned incorrect NQN: \"%s\".\n",
+			ctrl->subnqn);
+		mutex_unlock(&nvmf_transports_mutex);
+		ctrl->ops->delete_ctrl(ctrl);
+		return ERR_PTR(-EINVAL);
+	}
+
 	mutex_unlock(&nvmf_transports_mutex);
 	return ctrl;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/6] nvme: track subsystems
  2017-06-19  9:57 track subsystem relationships and shared namespaces V2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2017-06-19  9:57 ` [PATCH 4/6] nvme-fabrics: verify that a controller returns the correct NQN Christoph Hellwig
@ 2017-06-19  9:57 ` Christoph Hellwig
  2017-06-19 11:47   ` Johannes Thumshirn
  2017-06-19 16:52   ` Keith Busch
  2017-06-19  9:57 ` [PATCH 6/6] nvme: track shared namespaces in a siblings list Christoph Hellwig
  5 siblings, 2 replies; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19  9:57 UTC (permalink / raw)


This adds a new nvme_subsystem structure so that we can track multiple
controllers that belong to a single subsystem.  For now we only use it
to store the NQN, and to check that we don't have duplicate NQNs unless
the involved subsystems support multiple controllers.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c    | 111 ++++++++++++++++++++++++++++++++++++++++----
 drivers/nvme/host/fabrics.c |   4 +-
 drivers/nvme/host/nvme.h    |  12 ++++-
 3 files changed, 116 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4342ec9bc528..d88cd6a4a549 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -68,6 +68,9 @@ MODULE_PARM_DESC(force_apst, "allow APST for newly enumerated devices even if qu
 struct workqueue_struct *nvme_wq;
 EXPORT_SYMBOL_GPL(nvme_wq);
 
+static LIST_HEAD(nvme_subsystems);
+static DEFINE_MUTEX(nvme_subsystems_lock);
+
 static LIST_HEAD(nvme_ctrl_list);
 static DEFINE_SPINLOCK(dev_list_lock);
 
@@ -1642,14 +1645,15 @@ static bool quirk_matches(const struct nvme_id_ctrl *id,
 		string_matches(id->fr, q->fr, sizeof(id->fr));
 }
 
-static void nvme_init_subnqn(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
+static void nvme_init_subnqn(struct nvme_subsystem *subsys, struct nvme_ctrl *ctrl,
+		struct nvme_id_ctrl *id)
 {
 	size_t nqnlen;
 	int off;
 
 	nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
 	if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
-		strcpy(ctrl->subnqn, id->subnqn);
+		strcpy(subsys->subnqn, id->subnqn);
 		return;
 	}
 
@@ -1657,14 +1661,91 @@ static void nvme_init_subnqn(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 		dev_warn(ctrl->device, "missing or invalid SUBNQN field.\n");
 
 	/* Generate a "fake" NQN per Figure 254 in NVMe 1.3 + ECN 001 */
-	off = snprintf(ctrl->subnqn, NVMF_NQN_SIZE,
+	off = snprintf(subsys->subnqn, NVMF_NQN_SIZE,
 			"nqn.2014.08.org.nvmexpress:%4x%4x",
 			le16_to_cpu(id->vid), le16_to_cpu(id->ssvid));
-	memcpy(ctrl->subnqn + off, id->sn, sizeof(id->sn));
+	memcpy(subsys->subnqn + off, id->sn, sizeof(id->sn));
 	off += sizeof(id->sn);
-	memcpy(ctrl->subnqn + off, id->mn, sizeof(id->mn));
+	memcpy(subsys->subnqn + off, id->mn, sizeof(id->mn));
 	off += sizeof(id->mn);
-	memset(ctrl->subnqn + off, 0, sizeof(ctrl->subnqn) - off);
+	memset(subsys->subnqn + off, 0, sizeof(subsys->subnqn) - off);
+}
+
+static void nvme_destroy_subsystem(struct kref *ref)
+{
+	struct nvme_subsystem *subsys =
+			container_of(ref, struct nvme_subsystem, ref);
+
+	mutex_lock(&nvme_subsystems_lock);
+	list_del(&subsys->entry);
+	mutex_unlock(&nvme_subsystems_lock);
+
+	kfree(ref);
+}
+
+static void nvme_put_subsystem(struct nvme_subsystem *subsys)
+{
+	kref_put(&subsys->ref, nvme_destroy_subsystem);
+}
+
+static struct nvme_subsystem *__nvme_find_get_subsystem(const char *subsysnqn)
+{
+	struct nvme_subsystem *subsys;
+
+	lockdep_assert_held(&nvme_subsystems_lock);
+
+	list_for_each_entry(subsys, &nvme_subsystems, entry) {
+		if (strcmp(subsys->subnqn, subsysnqn))
+			continue;
+		if (!kref_get_unless_zero(&subsys->ref))
+			continue;
+		return subsys;
+	}
+
+	return NULL;
+}
+
+static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
+{
+	struct nvme_subsystem *subsys, *found;
+
+	subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
+	if (!subsys)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&subsys->ctrls);
+	kref_init(&subsys->ref);
+	nvme_init_subnqn(subsys, ctrl, id);
+	mutex_init(&subsys->lock);
+
+	mutex_lock(&nvme_subsystems_lock);
+	found = __nvme_find_get_subsystem(subsys->subnqn);
+	if (found) {
+		/*
+		 * Verify that the subsystem actually supports multiple
+		 * controllers, else bail out.
+		 */
+		kfree(subsys);
+		if (!(id->cmic & (1 << 1))) {
+			dev_err(ctrl->device,
+				"ignoring ctrl due to duplicate subnqn (%s).\n",
+				found->subnqn);
+			mutex_unlock(&nvme_subsystems_lock);
+			return -EINVAL;
+		}
+
+		subsys = found;
+	} else {
+		list_add_tail(&subsys->entry, &nvme_subsystems);
+	}
+
+	ctrl->subsys = subsys;
+	mutex_unlock(&nvme_subsystems_lock);
+
+	mutex_lock(&subsys->lock);
+	list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
+	mutex_unlock(&subsys->lock);
+
+	return 0;
 }
 
 /*
@@ -1702,7 +1783,11 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 		return -EIO;
 	}
 
-	nvme_init_subnqn(ctrl, id);
+	ret = nvme_init_subsystem(ctrl, id);
+	if (ret) {
+		kfree(id);
+		return ret;
+	}
 
 	if (!ctrl->identified) {
 		/*
@@ -2097,7 +2182,7 @@ static ssize_t nvme_sysfs_show_subsysnqn(struct device *dev,
 {
 	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n", ctrl->subnqn);
+	return snprintf(buf, PAGE_SIZE, "%s\n", ctrl->subsys->subnqn);
 }
 static DEVICE_ATTR(subsysnqn, S_IRUGO, nvme_sysfs_show_subsysnqn, NULL);
 
@@ -2516,12 +2601,22 @@ EXPORT_SYMBOL_GPL(nvme_uninit_ctrl);
 static void nvme_free_ctrl(struct kref *kref)
 {
 	struct nvme_ctrl *ctrl = container_of(kref, struct nvme_ctrl, kref);
+	struct nvme_subsystem *subsys = ctrl->subsys;
 
 	put_device(ctrl->device);
 	nvme_release_instance(ctrl);
 	ida_destroy(&ctrl->ns_ida);
 
+	if (subsys) {
+		mutex_lock(&subsys->lock);
+		list_del(&ctrl->subsys_entry);
+		mutex_unlock(&subsys->lock);
+	}
+
 	ctrl->ops->free_ctrl(ctrl);
+
+	if (subsys)
+		nvme_put_subsystem(subsys);
 }
 
 void nvme_put_ctrl(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 87038a017fb9..403f3146c1bd 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -862,10 +862,10 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
 		goto out_unlock;
 	}
 
-	if (strcmp(ctrl->subnqn, opts->subsysnqn)) {
+	if (strcmp(ctrl->subsys->subnqn, opts->subsysnqn)) {
 		dev_warn(ctrl->device,
 			"controller returned incorrect NQN: \"%s\".\n",
-			ctrl->subnqn);
+			ctrl->subsys->subnqn);
 		mutex_unlock(&nvmf_transports_mutex);
 		ctrl->ops->delete_ctrl(ctrl);
 		return ERR_PTR(-EINVAL);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index e1d5e775afed..fe0d4ec32207 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -132,13 +132,15 @@ struct nvme_ctrl {
 	struct ida ns_ida;
 	struct work_struct reset_work;
 
+	struct nvme_subsystem *subsys;
+	struct list_head subsys_entry;
+
 	struct opal_dev *opal_dev;
 
 	char name[12];
 	char serial[20];
 	char model[40];
 	char firmware_rev[8];
-	char subnqn[NVMF_NQN_SIZE];
 	u16 cntlid;
 
 	u32 ctrl_config;
@@ -180,6 +182,14 @@ struct nvme_ctrl {
 	struct nvmf_ctrl_options *opts;
 };
 
+struct nvme_subsystem {
+	struct list_head	entry;
+	struct mutex		lock;
+	struct list_head	ctrls;
+	struct kref		ref;
+	char			subnqn[NVMF_NQN_SIZE];
+};
+
 struct nvme_ns {
 	struct list_head list;
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-19  9:57 track subsystem relationships and shared namespaces V2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2017-06-19  9:57 ` [PATCH 5/6] nvme: track subsystems Christoph Hellwig
@ 2017-06-19  9:57 ` Christoph Hellwig
  2017-06-19 10:12   ` Sagi Grimberg
  5 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19  9:57 UTC (permalink / raw)


Verify that our IDs match for shared namespaces in a subsystem, and link
them up in a headless queue so that we can find siblings easily.
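
To illustrate what the headless ring buys us (a sketch only, not part of
this patch; use_path() is a hypothetical consumer), later code can visit
every path to the same data starting from whichever nvme_ns it holds:

	struct nvme_ns *cur = ns;

	do {
		/* every entry on the ring refers to the same namespace data */
		use_path(cur);
		cur = list_next_entry(cur, siblings);
	} while (cur != ns);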

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  1 +
 2 files changed, 100 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d88cd6a4a549..8a44d68a3f98 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2261,6 +2261,100 @@ static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	return ret;
 }
 
+/*
+ * Check if the two nvme_ns structures refer to the same namespace.
+ *
+ * Returns 1 if they refer to the same namespace, 0 if not, or a negative
+ * errno value on error.
+ */
+static int nvme_ns_compare_ids(struct nvme_ns *ns, struct nvme_id_ns *id,
+		struct nvme_ns *cur)
+{
+	bool is_shared = id->nmic & (1 << 0);
+	bool have_id = false;
+
+	if (!uuid_is_null(&ns->uuid)) {
+		have_id = true;
+		if (!uuid_equal(&ns->uuid, &cur->uuid))
+			return 0;
+	}
+	if (memchr_inv(ns->nguid, 0, sizeof(ns->nguid))) {
+		have_id = true;
+		if (memcmp(&ns->nguid, &cur->nguid, sizeof(ns->nguid)))
+			return 0;
+	}
+	if (memchr_inv(ns->eui, 0, sizeof(ns->eui))) {
+		have_id = true;
+		if (memcmp(&ns->eui, &cur->eui, sizeof(ns->eui)))
+			return 0;
+	}
+
+	/*
+	 * Note that the unique identifiers are only a "should" condition in
+	 * the specification.  But we require them for multipath operations
+	 * due to safety concerns.
+	 */
+	if (!have_id && is_shared) {
+		dev_err(ns->ctrl->device,
+			"shared namespace %u without unique ID.\n",
+			ns->ns_id);
+		return -EINVAL;
+	}
+
+	if (have_id && !is_shared) {
+		dev_err(ns->ctrl->device,
+			"private namespace %u with duplicate ID.\n",
+			ns->ns_id);
+		return -EINVAL;
+	}
+
+	/*
+	 * As of NVMe 1.3 (Section 6.1) this is only strictly required if namespace
+	 * management is supported, but sort of implied for shared namespaces
+	 * by omitting the definition.  We'll need an ECN for this language
+	 * eventually.
+	 */
+	if (is_shared && ns->ns_id != cur->ns_id) {
+		dev_err(ns->ctrl->device,
+			"non-matching nsids (%u vs %u) for shared namespaces\n",
+			ns->ns_id, cur->ns_id);
+		return -EINVAL;
+	}
+
+	return 1;
+}
+
+static int nvme_ns_find_siblings(struct nvme_ns *ns, struct nvme_id_ns *id)
+{
+	struct nvme_subsystem *subsys = ns->ctrl->subsys;
+	struct nvme_ctrl *ctrl;
+	struct nvme_ns *cur;
+	int err = 0, ret;
+
+	mutex_lock(&subsys->lock);
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (ctrl == ns->ctrl)
+			continue;
+
+		mutex_lock(&ctrl->namespaces_mutex);
+		list_for_each_entry(cur, &ctrl->namespaces, list) {
+			ret = nvme_ns_compare_ids(ns, id, cur);
+			if (ret < 0) {
+				err = ret;
+				break;
+			}
+			if (ret > 0)
+				list_add(&ns->siblings, &cur->siblings);
+		}
+		mutex_unlock(&ctrl->namespaces_mutex);
+
+		if (err)
+			break;
+	}
+	mutex_unlock(&subsys->lock);
+	return err;
+}
+
 static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns;
@@ -2283,6 +2377,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, ns->queue);
 	ns->queue->queuedata = ns;
 	ns->ctrl = ctrl;
+	INIT_LIST_HEAD(&ns->siblings);
 
 	kref_init(&ns->kref);
 	ns->ns_id = nsid;
@@ -2296,6 +2391,9 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (nvme_revalidate_ns(ns, &id))
 		goto out_free_queue;
 
+	if (nvme_ns_find_siblings(ns, id))
+		goto out_free_id;
+
 	if (nvme_nvm_ns_supported(ns, id) &&
 				nvme_nvm_register(ns, disk_name, node)) {
 		dev_warn(ctrl->device, "%s: LightNVM init failure\n", __func__);
@@ -2360,6 +2458,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
 	mutex_lock(&ns->ctrl->namespaces_mutex);
 	list_del_init(&ns->list);
+	list_del_init(&ns->siblings);
 	mutex_unlock(&ns->ctrl->namespaces_mutex);
 
 	nvme_put_ns(ns);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index fe0d4ec32207..ab147b76cf28 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -196,6 +196,7 @@ struct nvme_ns {
 	struct nvme_ctrl *ctrl;
 	struct request_queue *queue;
 	struct gendisk *disk;
+	struct list_head siblings;
 	struct nvm_dev *ndev;
 	struct kref kref;
 	int instance;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-19  9:57 ` [PATCH 6/6] nvme: track shared namespaces in a siblings list Christoph Hellwig
@ 2017-06-19 10:12   ` Sagi Grimberg
  2017-06-19 10:19     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Sagi Grimberg @ 2017-06-19 10:12 UTC (permalink / raw)




On 19/06/17 12:57, Christoph Hellwig wrote:
> Verify that our IDs match for shared namespaces in a subsystem, and link
> them up in a headless queue so that we can find siblings easily.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/core.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++
>   drivers/nvme/host/nvme.h |  1 +
>   2 files changed, 100 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d88cd6a4a549..8a44d68a3f98 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2261,6 +2261,100 @@ static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>   	return ret;
>   }
>   
> +/*
> + * Check if the two nvme_ns structures refer to the same namespace.
> + *
> + * Returns 1 if they refer to the same namespace, 0 if not, or a negative
> + * errno value on error.
> + */
> +static int nvme_ns_compare_ids(struct nvme_ns *ns, struct nvme_id_ns *id,
> +		struct nvme_ns *cur)
> +{
> +	bool is_shared = id->nmic & (1 << 0);
> +	bool have_id = false;
> +
> +	if (!uuid_is_null(&ns->uuid)) {
> +		have_id = true;
> +		if (!uuid_equal(&ns->uuid, &cur->uuid))
> +			return 0;
> +	}
> +	if (memchr_inv(ns->nguid, 0, sizeof(ns->nguid))) {
> +		have_id = true;
> +		if (memcmp(&ns->nguid, &cur->nguid, sizeof(ns->nguid)))
> +			return 0;
> +	}
> +	if (memchr_inv(ns->eui, 0, sizeof(ns->eui))) {
> +		have_id = true;
> +		if (memcmp(&ns->eui, &cur->eui, sizeof(ns->eui)))
> +			return 0;
> +	}
> +
> +	/*
> +	 * Note that the unique identifiers are only a "should" condition in
> +	 * the specification.  But we require them for multipath operations
> +	 * due to safety concerns.
> +	 */
> +	if (!have_id && is_shared) {
> +		dev_err(ns->ctrl->device,
> +			"shared namespace %u without unique ID.\n",
> +			ns->ns_id);
> +		return -EINVAL;
> +	}
> +
> +	if (have_id && !is_shared) {
> +		dev_err(ns->ctrl->device,
> +			"private namespace %u with duplicate ID.\n",
> +			ns->ns_id);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * As of NVMe 1.3 (Section 6.1) this is only strictly required if namespace
> +	 * management is supported, but sort of implied for shared namespaces
> +	 * by omitting the definition.  We'll need an ECN for this language
> +	 * eventually.
> +	 */
> +	if (is_shared && ns->ns_id != cur->ns_id) {
> +		dev_err(ns->ctrl->device,
> +			"non-matching nsids (%u vs %u) for shared namespaces\n",
> +			ns->ns_id, cur->ns_id);
> +		return -EINVAL;
> +	}
> +
> +	return 1;
> +}
> +
> +static int nvme_ns_find_siblings(struct nvme_ns *ns, struct nvme_id_ns *id)
> +{
> +	struct nvme_subsystem *subsys = ns->ctrl->subsys;
> +	struct nvme_ctrl *ctrl;
> +	struct nvme_ns *cur;
> +	int err = 0, ret;
> +
> +	mutex_lock(&subsys->lock);
> +	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
> +		if (ctrl == ns->ctrl)
> +			continue;
> +
> +		mutex_lock(&ctrl->namespaces_mutex);
> +		list_for_each_entry(cur, &ctrl->namespaces, list) {
> +			ret = nvme_ns_compare_ids(ns, id, cur);
> +			if (ret < 0) {
> +				err = ret;
> +				break;
> +			}
> +			if (ret > 0)

			} else if (ret > 0) {
				list_add(&ns->siblings, &cur->siblings);
			}

> +				list_add(&ns->siblings, &cur->siblings);
> +		}
> +		mutex_unlock(&ctrl->namespaces_mutex);
> +
> +		if (err)
> +			break;
> +	}
> +	mutex_unlock(&subsys->lock);
> +	return err;
> +}
> +

General question, Would it make sense to invoke this search from the
block layer as part of adding a new disk (a new block_device operation)?
And should the sibling association maybe move up to struct block_device?
(we could cache nvme_id_ns in nvme_ns for that to happen)?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-19 10:12   ` Sagi Grimberg
@ 2017-06-19 10:19     ` Christoph Hellwig
  2017-06-19 10:39       ` Sagi Grimberg
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19 10:19 UTC (permalink / raw)


On Mon, Jun 19, 2017@01:12:54PM +0300, Sagi Grimberg wrote:
> General question, Would it make sense to invoke this search from the
> block layer as part of adding a new disk (a new block_device operation)?
> And should the sibling association maybe move up to struct block_device?
> (we could cache nvme_id_ns in nvme_ns for that to happen)?

For the block/nvme multipath code I need to expose the relationship
to the block layer.  But the locking is already complicated enough as
is, so I want to keep the NVMe list as well with that.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-19 10:19     ` Christoph Hellwig
@ 2017-06-19 10:39       ` Sagi Grimberg
  2017-06-19 10:42         ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Sagi Grimberg @ 2017-06-19 10:39 UTC (permalink / raw)



>> General question, Would it make sense to invoke this search from the
>> block layer as part of adding a new disk (a new block_device operation)?
>> And should the sibling association maybe move up to struct block_device?
>> (we could cache nvme_id_ns in nvme_ns for that to happen)?
> 
> For the block/nvme multipath code I need to expose the relationship
> to the block layer.  But the locking is already complicated enough as
> is, so I want to keep the NVMe list as well with that.

Not sure what you mean, but I didn't mean to change any locking scheme,
just have the sibling list exist in struct block_device and call
nvme_ns_find_siblings as a fops callout when the bdev is created, driver
stays responsible for locking.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-19 10:39       ` Sagi Grimberg
@ 2017-06-19 10:42         ` Christoph Hellwig
  2017-06-19 10:57           ` Sagi Grimberg
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19 10:42 UTC (permalink / raw)


On Mon, Jun 19, 2017@01:39:05PM +0300, Sagi Grimberg wrote:
> Not sure what you mean, but I didn't mean to change any locking scheme,
> just have the sibling list exist in struct block_device and call
> nvme_ns_find_siblings as a fops callout when the bdev is created, driver
> stays responsible for locking.

A common list with driver specific locking isn't going to work,
we'll need block locking for using it for multipath decisions.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-19 10:42         ` Christoph Hellwig
@ 2017-06-19 10:57           ` Sagi Grimberg
  2017-06-19 10:59             ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Sagi Grimberg @ 2017-06-19 10:57 UTC (permalink / raw)



>> Not sure what you mean, but I didn't mean to change any locking scheme,
>> just have the sibling list exist in struct block_device and call
>> nvme_ns_find_siblings as a fops callout when the bdev is created, driver
>> stays responsible for locking.
> 
> A common list with driver specific locking isn't going to work,
> we'll need block locking for using it for multipath decisions.

Didn't mean driver specific locking on the sibling list itself,
just on the sibling search. The sibling list should obviously have its
own locking. You have yet to reveal how the block layer should handle
the siblings, but I imagine you have some family representation, which
will have its own locking scheme to manage siblings/paths.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-19 10:57           ` Sagi Grimberg
@ 2017-06-19 10:59             ` Christoph Hellwig
  2017-06-19 11:27               ` Sagi Grimberg
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19 10:59 UTC (permalink / raw)


On Mon, Jun 19, 2017@01:57:19PM +0300, Sagi Grimberg wrote:
> Didn't mean driver specific locking on the sibling list itself,
> just on the sibling search. The sibling list should obviously have its
> own locking. You have yet to reveal how the block layer should handle
> the siblings, but I imagine you have some family representation, which
> will have its own locking scheme to manage siblings/paths.

Oh.  No, I don't want to do the search in the block layer at all.

That two (or more) request_queues are siblings makes total sense for the
block layer.  But how we decide that fact is totally driver specific.
E.g. for SCSI the search would look completely different, as we don't have
the equivalent of the per-subsystem controllers list.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-19 10:59             ` Christoph Hellwig
@ 2017-06-19 11:27               ` Sagi Grimberg
  2017-06-19 13:00                 ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Sagi Grimberg @ 2017-06-19 11:27 UTC (permalink / raw)



>> Didn't mean driver specific locking on the sibling list itself,
>> just on the sibling search. The sibling list should obviously have its
>> own locking. You have yet to reveal how the block layer should handle
>> the siblings, but I imagine you have some family representation, which
>> will have its own locking scheme to manage siblings/paths.
> 
> Oh.  No, I don't want to do the search in the block layer at all.
> 
> That two (or more) request_queues are siblings makes total sense for the
> block layer.  But how we decide that fact is totally driver specific.
> E.g. for SCSI the search would look completely different, as we don't have
> the equivalent of the per-subsystem controllers list.

OK, I'm totally not getting my point across obviously...

I completely agree that the multipath map (sibling list) is
driver specific, I'm just arguing that the search itself can
be invoked from the block layer through a block_device_operation
when the bdev is created (in there, the driver sibling search has
its own driver specific locking).

When a match is found, the driver calls something like
blk_link_sibling(a, b), which grows a sibling relationship map; this
call has block layer locking protection.
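
Roughly the shape I have in mind (a sketch only, none of this exists in
the block layer; both names are placeholders):

	/* new block_device_operations entry, called by the block layer when
	 * the bdev is created; the driver does its own locked search and
	 * reports matches back */
	int (*find_siblings)(struct gendisk *disk);

	/* block layer helper called by the driver for each match; the
	 * sibling map and its locking live entirely in the block layer */
	int blk_link_sibling(struct gendisk *a, struct gendisk *b);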

[somehow I still have a feeling this won't get across either...]





btw, I didn't see handling of the case where a sibling match is found but
the matched namespace is already linked (i.e. it already has siblings of
its own).

say we have namespaces a, b and c, where b,c are siblings of a (all with
the same nsid=3).

If I read the code correctly, c will link to both a and b wouldn't it?

Do we need to check: list_empty(&cur->siblings)?

Or am I not understanding the data structure?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 2/6] nvme: read the subsystem NQN from Identify Controller
  2017-06-19  9:57 ` [PATCH 2/6] nvme: read the subsystem NQN from Identify Controller Christoph Hellwig
@ 2017-06-19 11:44   ` Johannes Thumshirn
  2017-06-19 16:50   ` Keith Busch
  1 sibling, 0 replies; 26+ messages in thread
From: Johannes Thumshirn @ 2017-06-19 11:44 UTC (permalink / raw)


On Mon, Jun 19, 2017@11:57:50AM +0200, Christoph Hellwig wrote:
> NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
> Identify controller data structures.  Use this NQN for the subsysnqn
> sysfs attribute by storing it in the nvme_ctrl structure after verifying
> it.  For older controllers we generate a "fake" NQN per non-normative
> text in the NVMe 1.3 spec.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 5/6] nvme: track subsystems
  2017-06-19  9:57 ` [PATCH 5/6] nvme: track subsystems Christoph Hellwig
@ 2017-06-19 11:47   ` Johannes Thumshirn
  2017-06-19 16:52   ` Keith Busch
  1 sibling, 0 replies; 26+ messages in thread
From: Johannes Thumshirn @ 2017-06-19 11:47 UTC (permalink / raw)


On Mon, Jun 19, 2017@11:57:53AM +0200, Christoph Hellwig wrote:
> This adds a new nvme_subsystem structure so that we can track multiple
> controllers that belong to a single subsystem.  For now we only use it
> to store the NQN, and to check that we don't have duplicate NQNs unless
> the involved subsystems support multiple controllers.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---

Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-19 11:27               ` Sagi Grimberg
@ 2017-06-19 13:00                 ` Christoph Hellwig
  2017-06-19 16:43                   ` Sagi Grimberg
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-19 13:00 UTC (permalink / raw)


On Mon, Jun 19, 2017@02:27:26PM +0300, Sagi Grimberg wrote:
> I completely agree that the multipath map (sibling list) is
> driver specific, I'm just arguing that the search itself can
> be invoked from the block layer through a block_device_operation
> when the bdev is created (in there, the driver sibling search has
> its own driver specific locking).

Let's wait for that until we have some block layer code posted.
I'll see if I can make this work, but I suspect driving the search
from the driver is going to be a lot easier.

> btw, I didn't see handling of the case where a sibling match found where
> the sibling is already linked (a sibling too).
>
> say we have namespaces a, b and c, where b,c are siblings of a (all with
> the same nsid=3).
>
> If I read the code correctly, c will link to both a and b wouldn't it?
>
> Do we need to check: list_empty(&cur->siblings)?
>
> Or am I not understanding the data structure?

Yeah, we should break after finding the first sibling.  Let me create
a test setup with three controllers sharing the same namespaces...
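
Something like this on top of the inner loop in nvme_ns_find_siblings()
(an untested sketch; 'found' would be a new local bool):

		mutex_lock(&ctrl->namespaces_mutex);
		list_for_each_entry(cur, &ctrl->namespaces, list) {
			ret = nvme_ns_compare_ids(ns, id, cur);
			if (ret < 0) {
				err = ret;
				break;
			}
			if (ret > 0) {
				/* join the existing ring once, then stop */
				list_add(&ns->siblings, &cur->siblings);
				found = true;
				break;
			}
		}
		mutex_unlock(&ctrl->namespaces_mutex);

		if (err || found)
			break;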

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-19 13:00                 ` Christoph Hellwig
@ 2017-06-19 16:43                   ` Sagi Grimberg
  0 siblings, 0 replies; 26+ messages in thread
From: Sagi Grimberg @ 2017-06-19 16:43 UTC (permalink / raw)



>> I completely agree that the multipath map (sibling list) is
>> driver specific, I'm just arguing that the search itself can
>> be invoked from the block layer through a block_device_operation
>> when the bdev is created (in there, the driver sibling search has
>> its own driver specific locking).
> 
> Let's wait for that until we have some block layer code posted.

I was actually going to ask you to share the block code :)
It will clear up a lot for me...

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/6] nvme: remove a misleading comment on struct nvme_ns
  2017-06-19  9:57 ` [PATCH 1/6] nvme: remove a misleading comment on struct nvme_ns Christoph Hellwig
@ 2017-06-19 16:49   ` Keith Busch
  0 siblings, 0 replies; 26+ messages in thread
From: Keith Busch @ 2017-06-19 16:49 UTC (permalink / raw)


On Mon, Jun 19, 2017@11:57:49AM +0200, Christoph Hellwig wrote:
> While an NVMe Namespace is somewhat similar to a SCSI Logical Unit (and not
> a Logical Unit Number, anyway) there are subtle differences.  Remove the
> misleading comment.

Looks good.

Reviewed-by: Keith Busch <keith.busch at intel.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 2/6] nvme: read the subsystem NQN from Identify Controller
  2017-06-19  9:57 ` [PATCH 2/6] nvme: read the subsystem NQN from Identify Controller Christoph Hellwig
  2017-06-19 11:44   ` Johannes Thumshirn
@ 2017-06-19 16:50   ` Keith Busch
  1 sibling, 0 replies; 26+ messages in thread
From: Keith Busch @ 2017-06-19 16:50 UTC (permalink / raw)


On Mon, Jun 19, 2017@11:57:50AM +0200, Christoph Hellwig wrote:
> NVMe 1.2.1 or later requires controllers to provide a subsystem NQN in the
> Identify controller data structures.  Use this NQN for the subsysnqn
> sysfs attribute by storing it in the nvme_ctrl structure after verifying
> it.  For older controllers we generate a "fake" NQN per non-normative
> text in the NVMe 1.3 spec.

Looks good.

Reviewed-by: Keith Busch <keith.busch at intel.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 3/6] nvme: simplify nvme_dev_attrs_are_visible
  2017-06-19  9:57 ` [PATCH 3/6] nvme: simplify nvme_dev_attrs_are_visible Christoph Hellwig
@ 2017-06-19 16:50   ` Keith Busch
  0 siblings, 0 replies; 26+ messages in thread
From: Keith Busch @ 2017-06-19 16:50 UTC (permalink / raw)


Looks fine.

Reviewed-by: Keith Busch <keith.busch at intel.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 5/6] nvme: track subsystems
  2017-06-19  9:57 ` [PATCH 5/6] nvme: track subsystems Christoph Hellwig
  2017-06-19 11:47   ` Johannes Thumshirn
@ 2017-06-19 16:52   ` Keith Busch
  1 sibling, 0 replies; 26+ messages in thread
From: Keith Busch @ 2017-06-19 16:52 UTC (permalink / raw)


On Mon, Jun 19, 2017@11:57:53AM +0200, Christoph Hellwig wrote:
> This adds a new nvme_subsystem structure so that we can track multiple
> controllers that belong to a single subsystem.  For now we only use it
> to store the NQN, and to check that we don't have duplicate NQNs unless
> the involved subsystems support multiple controllers.

Looks good.

Reviewed-by: Keith Busch <keith.busch at intel.com>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-16  6:20     ` Christoph Hellwig
@ 2017-06-18  8:17       ` Sagi Grimberg
  0 siblings, 0 replies; 26+ messages in thread
From: Sagi Grimberg @ 2017-06-18  8:17 UTC (permalink / raw)



>>>    +static int nvme_ns_link_siblings(struct nvme_ns *ns, struct nvme_id_ns
>>> *id,
>>> +		struct nvme_ns *cur)
>>> +{
>>
>> The fact that this can also not link a sibling makes me think that
>> the name is a bit confusing. I'm wondering if it would be a good
>> idea to split the id check (and comparison) and the actual link
>> itself?
> 
> I thought about that, but the ugliness is that we have three possible
> outcomes:  link up (success), not link up (success) or not link up (error),
> which would be a bit ugly.  But I could probably do it by using -errno, 0
> and 1 as return values.

To me, the code would look better if we split the logic.

Something like:


	list_for_each_entry(cur, &ctrl->namespaces, list) {
		if (nvme_ns_is_sibling(ns, id, cur)) {
			err = nvme_ns_link_sibling(ns, cur);
			if (err)
				break;
		}
	}

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-15 16:49   ` Sagi Grimberg
@ 2017-06-16  6:20     ` Christoph Hellwig
  2017-06-18  8:17       ` Sagi Grimberg
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-16  6:20 UTC (permalink / raw)


On Thu, Jun 15, 2017@07:49:50PM +0300, Sagi Grimberg wrote:
>>   +static int nvme_ns_link_siblings(struct nvme_ns *ns, struct nvme_id_ns 
>> *id,
>> +		struct nvme_ns *cur)
>> +{
>
> The fact that this can also not link a sibling makes me think that
> the name is a bit confusing. I'm wondering if it would be a good
> idea to split the id check (and comparison) and the actual link
> itself?

I thought about that, but the ugliness is that we have three possible
outcomes:  link up (success), not link up (success) or not link up (error),
which would be a bit ugly.  But I could probably do it by using -errno, 0
and 1 as return values.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-15 16:35 ` [PATCH 6/6] nvme: track shared namespaces in a siblings list Christoph Hellwig
@ 2017-06-15 16:49   ` Sagi Grimberg
  2017-06-16  6:20     ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Sagi Grimberg @ 2017-06-15 16:49 UTC (permalink / raw)


Christoph,

> Verify that our IDs match for shared namespaces in a subsystem, and link
> them up in a headless queue so that we can find siblings easily.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>   drivers/nvme/host/core.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
>   drivers/nvme/host/nvme.h |  1 +
>   2 files changed, 89 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 57e48d893173..a67a850ef7db 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2249,6 +2249,89 @@ static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>   	return ret;
>   }
>   
> +static int nvme_ns_link_siblings(struct nvme_ns *ns, struct nvme_id_ns *id,
> +		struct nvme_ns *cur)
> +{

The fact that this can also not link a sibling makes me think that
the name is a bit confusing. I'm wondering if it would be a good
idea to split the id check (and comparison) and the actual link
itself?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 6/6] nvme: track shared namespaces in a siblings list
  2017-06-15 16:34 track subsystem relationships and shared namespaces Christoph Hellwig
@ 2017-06-15 16:35 ` Christoph Hellwig
  2017-06-15 16:49   ` Sagi Grimberg
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2017-06-15 16:35 UTC (permalink / raw)


Verify that our IDs match for shared namespaces in a subsystem, and link
them up in a headless queue so that we can find siblings easily.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  1 +
 2 files changed, 89 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 57e48d893173..a67a850ef7db 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2249,6 +2249,89 @@ static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	return ret;
 }
 
+static int nvme_ns_link_siblings(struct nvme_ns *ns, struct nvme_id_ns *id,
+		struct nvme_ns *cur)
+{
+	bool is_shared = id->nmic & (1 << 0);
+	bool have_id = false;
+
+	if (!uuid_is_null(&ns->uuid)) {
+		have_id = true;
+		if (!uuid_equal(&ns->uuid, &cur->uuid))
+			return 0;
+	}
+	if (memchr_inv(ns->nguid, 0, sizeof(ns->nguid))) {
+		have_id = true;
+		if (memcmp(&ns->nguid, &cur->nguid, sizeof(ns->nguid)))
+			return 0;
+	}
+	if (memchr_inv(ns->eui, 0, sizeof(ns->eui))) {
+		have_id = true;
+		if (memcmp(&ns->eui, &cur->eui, sizeof(ns->eui)))
+			return 0;
+	}
+
+	/*
+	 * XXX: unique namespace ids are only required when namespace
+	 * management is enabled.  But between that and the unique ids not
+	 * being mandatory we are between a rock and a hard place.
+	 */
+	if (!have_id && is_shared) {
+		dev_err(ns->ctrl->device,
+			"shared namespace %u without unique ID.\n",
+			ns->ns_id);
+		return -EINVAL;
+	}
+
+	if (have_id && !is_shared) {
+		dev_err(ns->ctrl->device,
+			"private namespace %u with duplicate ID.\n",
+			ns->ns_id);
+		return -EINVAL;
+	}
+
+	/*
+	 * XXX: this is currently only guaranteed when namespace management
+	 * is supported.  Decide if we should make it conditional or simply
+	 * not support multipathing on odd devices.
+	 */
+	if (is_shared && ns->ns_id != cur->ns_id) {
+		dev_err(ns->ctrl->device,
+			"non-matching nsids (%u vs %u) for shared namespaces\n",
+			ns->ns_id, cur->ns_id);
+	}
+
+	list_add(&ns->siblings, &cur->siblings);
+	return 0;
+}
+
+static int nvme_ns_find_siblings(struct nvme_ns *ns, struct nvme_id_ns *id)
+{
+	struct nvme_subsystem *subsys = ns->ctrl->subsys;
+	struct nvme_ctrl *ctrl;
+	struct nvme_ns *cur;
+	int err;
+
+	mutex_lock(&subsys->lock);
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (ctrl == ns->ctrl)
+			continue;
+
+		mutex_lock(&ctrl->namespaces_mutex);
+		list_for_each_entry(cur, &ctrl->namespaces, list) {
+			err = nvme_ns_link_siblings(ns, id, cur);
+			if (err)
+				break;
+		}
+		mutex_unlock(&ctrl->namespaces_mutex);
+
+		if (err)
+			break;
+	}
+	mutex_unlock(&subsys->lock);
+	return err;
+}
+
 static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 {
 	struct nvme_ns *ns;
@@ -2271,6 +2354,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, ns->queue);
 	ns->queue->queuedata = ns;
 	ns->ctrl = ctrl;
+	INIT_LIST_HEAD(&ns->siblings);
 
 	kref_init(&ns->kref);
 	ns->ns_id = nsid;
@@ -2284,6 +2368,9 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (nvme_revalidate_ns(ns, &id))
 		goto out_free_queue;
 
+	if (nvme_ns_find_siblings(ns, id))
+		goto out_free_id;
+
 	if (nvme_nvm_ns_supported(ns, id) &&
 				nvme_nvm_register(ns, disk_name, node)) {
 		dev_warn(ctrl->device, "%s: LightNVM init failure\n", __func__);
@@ -2348,6 +2435,7 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 
 	mutex_lock(&ns->ctrl->namespaces_mutex);
 	list_del_init(&ns->list);
+	list_del_init(&ns->siblings);
 	mutex_unlock(&ns->ctrl->namespaces_mutex);
 
 	nvme_put_ns(ns);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2a006a00c3fc..777a85cc2a0d 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -196,6 +196,7 @@ struct nvme_ns {
 	struct nvme_ctrl *ctrl;
 	struct request_queue *queue;
 	struct gendisk *disk;
+	struct list_head siblings;
 	struct nvm_dev *ndev;
 	struct kref kref;
 	int instance;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread
