All of lore.kernel.org
 help / color / mirror / Atom feed
* draft ANA support v2
@ 2018-06-01  7:11 Christoph Hellwig
  2018-06-01  7:11 ` [PATCH 1/9] nvme: don't hold nvmf_transports_rwsem for more than transport lookups Christoph Hellwig
                   ` (8 more replies)
  0 siblings, 9 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-01  7:11 UTC (permalink / raw)


Next repost of the ANA support with a lot of changes.

The big TODO list item for the kernel code is support for transition
timeouts, and handling of last resort retries on change or inaccssible
paths.

Besides that we still need nvme-cli and nvmetcli support as well as
an automated test suite.

A git tree is available at:

    git://git.infradead.org/users/hch/block.git nvme-ana

Gitweb:

    http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/nvme-ana

Changes since v1:
 - fix the check for SCT = 3
 - fix the check for immutable ANAGRPID
 - disable ANA if the ANA log can't be transferred in a single chunk
 - enable the ANA AEN explicitly
 - don't zero buffers that don't get copied to the SGL
 - use READ_ONCE/WRITE_ONCE for the ana_state fields
 - update ANA state on completion with an ANA status code
 - improve commit logs
 - add proper AEN enable and masking conditionals
 - fix the numeric value for the change state
 - fixed a memory leak (Hannes Reinecke)
 - fix reporting of NUSE (based on a patch from Hannes Reinecke)
 - make ANATT configurable

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/9] nvme: don't hold nvmf_transports_rwsem for more than transport lookups
  2018-06-01  7:11 draft ANA support v2 Christoph Hellwig
@ 2018-06-01  7:11 ` Christoph Hellwig
  2018-06-03 12:26   ` Sagi Grimberg
  2018-06-04  6:30   ` Hannes Reinecke
  2018-06-01  7:11 ` [PATCH 2/9] nvme.h: add ANA definitions Christoph Hellwig
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-01  7:11 UTC (permalink / raw)


From: Johannes Thumshirn <jthumshirn@suse.de>

Only take nvmf_transports_rwsem when doing a lookup of registered
transports, so that a blocking ->create_ctrl doesn't prevent other
actions on /dev/nvme-fabrics.

Signed-off-by: Johannes Thumshirn <jthumshirn at suse.de>
[hch: increased lock hold time a bit to be safe, added a comment
 and updated the changelog]
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/fabrics.c | 3 ++-
 drivers/nvme/host/fabrics.h | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 5f5f7067c41d..fa32c1216409 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -952,6 +952,7 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
 		ret = -EBUSY;
 		goto out_unlock;
 	}
+	up_read(&nvmf_transports_rwsem);
 
 	ret = nvmf_check_required_opts(opts, ops->required_opts);
 	if (ret)
@@ -968,11 +969,11 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
 	}
 
 	module_put(ops->module);
-	up_read(&nvmf_transports_rwsem);
 	return ctrl;
 
 out_module_put:
 	module_put(ops->module);
+	goto out_free_opts;
 out_unlock:
 	up_read(&nvmf_transports_rwsem);
 out_free_opts:
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index 0cf0460a5c92..7491a0bbf711 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -124,6 +124,9 @@ struct nvmf_ctrl_options {
  *	1. At minimum, 'required_opts' and 'allowed_opts' should
  *	   be set to the same enum parsing options defined earlier.
  *	2. create_ctrl() must be defined (even if it does nothing)
+ *	3. struct nvmf_transport_ops must be statically allocated in the
+ *	   modules .bss section so that a pure module_get on @module
+ *	   prevents the memory from beeing freed.
  */
 struct nvmf_transport_ops {
 	struct list_head	entry;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 2/9] nvme.h: add ANA definitions
  2018-06-01  7:11 draft ANA support v2 Christoph Hellwig
  2018-06-01  7:11 ` [PATCH 1/9] nvme: don't hold nvmf_transports_rwsem for more than transport lookups Christoph Hellwig
@ 2018-06-01  7:11 ` Christoph Hellwig
  2018-06-03 12:27   ` Sagi Grimberg
                     ` (2 more replies)
  2018-06-01  7:11 ` [PATCH 3/9] nvme: add support for the log specific field Christoph Hellwig
                   ` (6 subsequent siblings)
  8 siblings, 3 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-01  7:11 UTC (permalink / raw)


Add various defintions from NVMe 1.3 TP 4004.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 include/linux/nvme.h | 50 +++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 2950ce957656..601837bcb790 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -242,7 +242,12 @@ struct nvme_id_ctrl {
 	__le32			sanicap;
 	__le32			hmminds;
 	__le16			hmmaxd;
-	__u8			rsvd338[174];
+	__u8			rsvd338[4];
+	__u8			anatt;
+	__u8			anacap;
+	__le32			anagrpmax;
+	__le32			nanagrpid;
+	__u8			rsvd352[160];
 	__u8			sqes;
 	__u8			cqes;
 	__le16			maxcmd;
@@ -258,7 +263,8 @@ struct nvme_id_ctrl {
 	__le16			acwu;
 	__u8			rsvd534[2];
 	__le32			sgls;
-	__u8			rsvd540[228];
+	__le32			mnan;
+	__u8			rsvd544[224];
 	char			subnqn[256];
 	__u8			rsvd1024[768];
 	__le32			ioccsz;
@@ -312,7 +318,9 @@ struct nvme_id_ns {
 	__le16			nabspf;
 	__le16			noiob;
 	__u8			nvmcap[16];
-	__u8			rsvd64[40];
+	__u8			rsvd64[28];
+	__le32			anagrpid;
+	__u8			rsvd96[8];
 	__u8			nguid[16];
 	__u8			eui64[8];
 	struct nvme_lbaf	lbaf[16];
@@ -425,6 +433,32 @@ struct nvme_effects_log {
 	__u8   resv[2048];
 };
 
+enum nvme_ana_state {
+	NVME_ANA_OPTIMIZED		= 0x01,
+	NVME_ANA_NONOPTIMIZED		= 0x02,
+	NVME_ANA_INACCESSIBLE		= 0x03,
+	NVME_ANA_PERSISTENT_LOSS	= 0x04,
+	NVME_ANA_CHANGE			= 0x0f,
+};
+
+struct nvme_ana_group_desc {
+	__le32	grpid;
+	__le32	nnsids;
+	__le64	chgcnt;
+	__u8	state;
+	__u8	rsvd17[7];
+	__le32	nsids[];
+};
+
+/* flag for the log specific field of the ANA log */
+#define NVME_ANA_LOG_RGO	(1 << 0)
+
+struct nvme_ana_rsp_hdr {
+	__le64	chgcnt;
+	__le16	ngrps;
+	__le16	rsvd10[3];
+};
+
 enum {
 	NVME_SMART_CRIT_SPARE		= 1 << 0,
 	NVME_SMART_CRIT_TEMPERATURE	= 1 << 1,
@@ -444,11 +478,13 @@ enum {
 enum {
 	NVME_AER_NOTICE_NS_CHANGED	= 0x00,
 	NVME_AER_NOTICE_FW_ACT_STARTING = 0x01,
+	NVME_AER_NOTICE_ANA		= 0x03,
 };
 
 enum {
 	NVME_AEN_CFG_NS_ATTR		= 1 << 8,
 	NVME_AEN_CFG_FW_ACT		= 1 << 9,
+	NVME_AEN_CFG_ANA_CHANGE		= 1 << 11,
 };
 
 struct nvme_lba_range_type {
@@ -758,6 +794,7 @@ enum {
 	NVME_LOG_FW_SLOT	= 0x03,
 	NVME_LOG_CHANGED_NS	= 0x04,
 	NVME_LOG_CMD_EFFECTS	= 0x05,
+	NVME_LOG_ANA		= 0x0c,
 	NVME_LOG_DISC		= 0x70,
 	NVME_LOG_RESERVATION	= 0x80,
 	NVME_FWACT_REPL		= (0 << 3),
@@ -1180,6 +1217,13 @@ enum {
 	NVME_SC_ACCESS_DENIED		= 0x286,
 	NVME_SC_UNWRITTEN_BLOCK		= 0x287,
 
+	/*
+	 * Path-related Errors:
+	 */
+	NVME_SC_ANA_PERSISTENT_LOSS	= 0x301,
+	NVME_SC_ANA_INACCESSIBLE	= 0x302,
+	NVME_SC_ANA_TRANSITION		= 0x303,
+
 	NVME_SC_DNR			= 0x4000,
 };
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 3/9] nvme: add support for the log specific field
  2018-06-01  7:11 draft ANA support v2 Christoph Hellwig
  2018-06-01  7:11 ` [PATCH 1/9] nvme: don't hold nvmf_transports_rwsem for more than transport lookups Christoph Hellwig
  2018-06-01  7:11 ` [PATCH 2/9] nvme.h: add ANA definitions Christoph Hellwig
@ 2018-06-01  7:11 ` Christoph Hellwig
  2018-06-03 12:27   ` Sagi Grimberg
                     ` (2 more replies)
  2018-06-01  7:11 ` [PATCH 4/9] nvme: always failover on path or transport errors Christoph Hellwig
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-01  7:11 UTC (permalink / raw)


NVMe 1.3 added a new log specific field to the get log page CQ
defintion.  Add the field and support passing it to nvme_get_log_ext.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c     | 6 +++---
 drivers/nvme/host/lightnvm.c | 2 +-
 drivers/nvme/host/nvme.h     | 2 +-
 include/linux/nvme.h         | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7ad3cfc9d4e1..d67d29e4c5ff 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2257,8 +2257,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 }
 
 int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-		     u8 log_page, void *log,
-		     size_t size, u64 offset)
+		u8 log_page, u8 lsp, void *log, size_t size, u64 offset)
 {
 	struct nvme_command c = { };
 	unsigned long dwlen = size / 4 - 1;
@@ -2271,6 +2270,7 @@ int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		c.get_log_page.nsid = cpu_to_le32(NVME_NSID_ALL);
 
 	c.get_log_page.lid = log_page;
+	c.get_log_page.lsp = lsp;
 	c.get_log_page.numdl = cpu_to_le16(dwlen & ((1 << 16) - 1));
 	c.get_log_page.numdu = cpu_to_le16(dwlen >> 16);
 	c.get_log_page.lpol = cpu_to_le32(lower_32_bits(offset));
@@ -2282,7 +2282,7 @@ int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 static int nvme_get_log(struct nvme_ctrl *ctrl, u8 log_page, void *log,
 			size_t size)
 {
-	return nvme_get_log_ext(ctrl, NULL, log_page, log, size, 0);
+	return nvme_get_log_ext(ctrl, NULL, log_page, 0, log, size, 0);
 }
 
 static int nvme_get_effects_log(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 41279da799ed..216c6b0a9554 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -598,7 +598,7 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
 	while (left) {
 		len = min_t(unsigned int, left, ctrl->max_hw_sectors << 9);
 
-		ret = nvme_get_log_ext(ctrl, ns, NVME_NVM_LOG_REPORT_CHUNK,
+		ret = nvme_get_log_ext(ctrl, ns, NVME_NVM_LOG_REPORT_CHUNK, 0,
 				dev_meta, len, offset);
 		if (ret) {
 			dev_err(ctrl->device, "Get REPORT CHUNK log error\n");
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 07e8bfe705c6..a9695b8645aa 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -433,7 +433,7 @@ int nvme_delete_ctrl(struct nvme_ctrl *ctrl);
 int nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl);
 
 int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-		u8 log_page, void *log, size_t size, u64 offset);
+		u8 log_page, u8 lsp, void *log, size_t size, u64 offset);
 
 extern const struct attribute_group nvme_ns_id_attr_group;
 extern const struct block_device_operations nvme_ns_head_ops;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 601837bcb790..fdff3c37bdd9 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -917,7 +917,7 @@ struct nvme_get_log_page_command {
 	__u64			rsvd2[2];
 	union nvme_data_ptr	dptr;
 	__u8			lid;
-	__u8			rsvd10;
+	__u8			lsp; /* upper 4 bits reserved */
 	__le16			numdl;
 	__le16			numdu;
 	__u16			rsvd11;
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 4/9] nvme: always failover on path or transport errors
  2018-06-01  7:11 draft ANA support v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2018-06-01  7:11 ` [PATCH 3/9] nvme: add support for the log specific field Christoph Hellwig
@ 2018-06-01  7:11 ` Christoph Hellwig
  2018-06-01 15:27   ` Mike Snitzer
                     ` (2 more replies)
  2018-06-01  7:11 ` [PATCH 5/9] nvme: add ANA support Christoph Hellwig
                   ` (4 subsequent siblings)
  8 siblings, 3 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-01  7:11 UTC (permalink / raw)


NVMe 1.3 TP 4028 introduced a new status code type 3h for
"Path Related Status".  We should always retry on another path for this
class of errors, without even trying to decode them.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/multipath.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index d7b664ae5923..9412e5138d65 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -59,6 +59,9 @@ bool nvme_req_needs_failover(struct request *req, blk_status_t error)
 {
 	if (!(req->cmd_flags & REQ_NVME_MPATH))
 		return false;
+	/* always failover for path related errors */
+	if ((nvme_req(req)->status & 0x700) == 0x300)
+		return true;
 	return blk_path_error(error);
 }
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 5/9] nvme: add ANA support
  2018-06-01  7:11 draft ANA support v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2018-06-01  7:11 ` [PATCH 4/9] nvme: always failover on path or transport errors Christoph Hellwig
@ 2018-06-01  7:11 ` Christoph Hellwig
  2018-06-04  6:36   ` Hannes Reinecke
  2018-06-06 12:01   ` Popuri, Sriram
  2018-06-01  7:11 ` [PATCH 6/9] nvmet: track and limit the number of namespaces per subsystem Christoph Hellwig
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-01  7:11 UTC (permalink / raw)


Add support for Asynchronous Namespace Access as specified in NVMe 1.3
TP 4004.  With ANA each namespace attached to a controller belongs to
an ANA group that describes the characteristics of accessing the
namespaces through this controller.  In the optimized and non-optimized
states namespaces can be accessed regularly, although in a multi-pathing
environment we should always prefer to access a namespace through a
controller where an optimized relationship exists.  Namespaces in
Inaccessible, Permanent-Loss or Change state for a given controller
should not be accessed.

We keep a simple per-controller array of ANA states which is indexed
by the ANA group ID specified in the namespace.  The states are updated
through reading the ANA log page, which is read once during controller
initialization, and whenever the ANA change notice AEN is received, or
when one of the ANA specific status codes that signal a state change
is received on a command.

There currently isn't any support for the ANA transition timeout
yet.

Includes fixes and improvements from Hannes Reinecke.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c      |  30 ++++-
 drivers/nvme/host/multipath.c | 210 +++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h      |  28 +++++
 3 files changed, 261 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d67d29e4c5ff..4a7fd138e25f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1042,7 +1042,7 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
 EXPORT_SYMBOL_GPL(nvme_set_queue_count);
 
 #define NVME_AEN_SUPPORTED \
-	(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT)
+	(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT | NVME_AEN_CFG_ANA_CHANGE)
 
 static void nvme_enable_aen(struct nvme_ctrl *ctrl)
 {
@@ -1473,6 +1473,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	ns->lba_shift = id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ds;
 	if (ns->lba_shift == 0)
 		ns->lba_shift = 9;
+	ns->anagrpid = le32_to_cpu(id->anagrpid);
 	ns->noiob = le16_to_cpu(id->noiob);
 	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
@@ -2387,6 +2388,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	nvme_set_queue_limits(ctrl, ctrl->admin_q);
 	ctrl->sgls = le32_to_cpu(id->sgls);
 	ctrl->kas = le16_to_cpu(id->kas);
+	ctrl->max_namespaces = le32_to_cpu(id->mnan);
+	ctrl->anacap = id->anacap;
+	ctrl->nanagrpid = le32_to_cpu(id->nanagrpid);
+	ctrl->anagrpmax = le32_to_cpu(id->anagrpmax);
 
 	if (id->rtd3e) {
 		/* us -> s */
@@ -2465,6 +2470,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	if (ret < 0)
 		return ret;
 
+	ret = nvme_configure_ana(ctrl);
+	if (ret < 0)
+		return ret;
+
 	ctrl->identified = true;
 
 	return 0;
@@ -2666,6 +2675,10 @@ static struct attribute *nvme_ns_id_attrs[] = {
 	&dev_attr_nguid.attr,
 	&dev_attr_eui.attr,
 	&dev_attr_nsid.attr,
+#ifdef CONFIG_NVME_MULTIPATH
+	&dev_attr_ana_grpid.attr,
+	&dev_attr_ana_state.attr,
+#endif
 	NULL,
 };
 
@@ -2688,6 +2701,14 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
 		if (!memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
 			return 0;
 	}
+#ifdef CONFIG_NVME_MULTIPATH
+	if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
+		if (dev_to_disk(dev)->fops != &nvme_fops) /* per-path attr */
+			return 0;
+		if (!nvme_get_ns_from_dev(dev)->anagrpid)
+			return 0;
+	}
+#endif
 	return a->mode;
 }
 
@@ -3391,6 +3412,11 @@ static void nvme_handle_aen_notice(struct nvme_ctrl *ctrl, u32 result)
 	case NVME_AER_NOTICE_FW_ACT_STARTING:
 		queue_work(nvme_wq, &ctrl->fw_act_work);
 		break;
+	case NVME_AER_NOTICE_ANA:
+		if (WARN_ON_ONCE(!ctrl->ana_log_buf))
+			break;
+		queue_work(nvme_wq, &ctrl->ana_work);
+		break;
 	default:
 		dev_warn(ctrl->device, "async event result %08x\n", result);
 	}
@@ -3426,6 +3452,7 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 	nvme_stop_keep_alive(ctrl);
 	flush_work(&ctrl->async_event_work);
 	flush_work(&ctrl->scan_work);
+	cancel_work_sync(&ctrl->ana_work);
 	cancel_work_sync(&ctrl->fw_act_work);
 	if (ctrl->ops->stop_ctrl)
 		ctrl->ops->stop_ctrl(ctrl);
@@ -3460,6 +3487,7 @@ static void nvme_free_ctrl(struct device *dev)
 
 	ida_simple_remove(&nvme_instance_ida, ctrl->instance);
 	kfree(ctrl->effects);
+	nvme_deconfigure_ana(ctrl);
 
 	if (subsys) {
 		mutex_lock(&subsys->lock);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 9412e5138d65..1a8791340862 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2017 Christoph Hellwig.
+ * Copyright (c) 2017-2018 Christoph Hellwig.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -41,6 +41,11 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 	}
 }
 
+static void nvme_update_ana_state(struct nvme_ns *ns, enum nvme_ana_state state)
+{
+	WRITE_ONCE(ns->ctrl->ana_state[ns->anagrpid], state);
+}
+
 void nvme_failover_req(struct request *req)
 {
 	struct nvme_ns *ns = req->q->queuedata;
@@ -51,7 +56,31 @@ void nvme_failover_req(struct request *req)
 	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
 	blk_mq_end_request(req, 0);
 
-	nvme_reset_ctrl(ns->ctrl);
+	/*
+	 * Reset the controller for any non-ANA error as we don't know what
+	 * caused the error:
+	 */
+	switch (nvme_req(req)->status & 0x7ff) {
+	case NVME_SC_ANA_TRANSITION:
+		/*
+		 * XXX: We should verify the controller doesn't die on during
+		 * the transition.  But that means we per-group timeout from
+		 * when we first hit the change state, so this won't be
+		 * entirely trivial..
+		 */
+		nvme_update_ana_state(ns, NVME_ANA_CHANGE);
+		break;
+	case NVME_SC_ANA_PERSISTENT_LOSS:
+		nvme_update_ana_state(ns, NVME_ANA_PERSISTENT_LOSS);
+		break;
+	case NVME_SC_ANA_INACCESSIBLE:
+		nvme_update_ana_state(ns, NVME_ANA_INACCESSIBLE);
+		break;
+	default:
+		nvme_reset_ctrl(ns->ctrl);
+		break;
+	}
+
 	kblockd_schedule_work(&ns->head->requeue_work);
 }
 
@@ -77,12 +106,32 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 	up_read(&ctrl->namespaces_rwsem);
 }
 
-static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head)
+static inline enum nvme_ana_state nvme_ns_ana_state(struct nvme_ns *ns)
+{
+	if (!nvme_ctrl_has_ana(ns->ctrl))
+		return NVME_ANA_OPTIMIZED;
+	if (WARN_ON_ONCE(ns->anagrpid > ns->ctrl->anagrpmax))
+		return 0;
+	return READ_ONCE(ns->ctrl->ana_state[ns->anagrpid]);
+}
+
+static const char *nvme_ana_state_names[] = {
+	[0]				= "invalid state",
+	[NVME_ANA_OPTIMIZED]		= "optimized",
+	[NVME_ANA_NONOPTIMIZED]		= "non-optimized",
+	[NVME_ANA_INACCESSIBLE]		= "inaccessible",
+	[NVME_ANA_PERSISTENT_LOSS]	= "persistent-loss",
+	[NVME_ANA_CHANGE]		= "change",
+};
+
+static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head,
+		u8 ana_state)
 {
 	struct nvme_ns *ns;
 
 	list_for_each_entry_rcu(ns, &head->list, siblings) {
-		if (ns->ctrl->state == NVME_CTRL_LIVE) {
+		if (ns->ctrl->state == NVME_CTRL_LIVE &&
+		    nvme_ns_ana_state(ns) == ana_state) {
 			rcu_assign_pointer(head->current_path, ns);
 			return ns;
 		}
@@ -95,8 +144,14 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
 {
 	struct nvme_ns *ns = srcu_dereference(head->current_path, &head->srcu);
 
-	if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
-		ns = __nvme_find_path(head);
+	if (likely(ns && ns->ctrl->state == NVME_CTRL_LIVE &&
+			nvme_ns_ana_state(ns) == NVME_ANA_OPTIMIZED))
+		return ns;
+
+	ns = __nvme_find_path(head, NVME_ANA_OPTIMIZED);
+	if (!ns)
+		ns = __nvme_find_path(head, NVME_ANA_NONOPTIMIZED);
+	/* XXX: try an inaccessible path as last resort per 8.18.3.3 */
 	return ns;
 }
 
@@ -249,3 +304,146 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 	blk_cleanup_queue(head->disk->queue);
 	put_disk(head->disk);
 }
+
+static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
+{
+	void *base = ctrl->ana_log_buf;
+	size_t offset = sizeof(struct nvme_ana_rsp_hdr);
+	int error, i;
+
+	/*
+	 * If anagrpid never changes we don't need to process the namespace
+	 * lists.
+	 */
+	if (ctrl->anacap & (1 << 6))
+		groups_only = true;
+
+	error = nvme_get_log_ext(ctrl, NULL, NVME_LOG_ANA,
+			groups_only ? NVME_ANA_LOG_RGO : 0,
+			ctrl->ana_log_buf, ctrl->ana_log_size, 0);
+	if (error) {
+		dev_warn(ctrl->device, "Failed to get ANA log: %d\n", error);
+		return error;
+	}
+
+	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
+		struct nvme_ana_group_desc *desc = base + offset;
+		u32 grpid = le32_to_cpu(desc->grpid);
+		u32 nr_nsids = le32_to_cpu(desc->nnsids), n = 0;
+		size_t nsid_buf_size = nr_nsids * sizeof(__le32);
+		struct nvme_ns *ns;
+
+		if (WARN_ON_ONCE(grpid == 0))
+			return -EINVAL;
+		if (WARN_ON_ONCE(grpid > ctrl->anagrpmax))
+			return -EINVAL;
+		if (WARN_ON_ONCE(desc->state == 0))
+			return -EINVAL;
+		if (WARN_ON_ONCE(desc->state > NVME_ANA_CHANGE))
+			return -EINVAL;
+
+		dev_info(ctrl->device, "ANA group %d: %s.\n",
+				grpid, nvme_ana_state_names[desc->state]);
+		WRITE_ONCE(ctrl->ana_state[grpid], desc->state);
+		offset += sizeof(*desc);
+		if (!nr_nsids)
+			continue;
+
+		if (WARN_ON_ONCE(groups_only))
+			return -EINVAL;
+		if (WARN_ON_ONCE(offset > ctrl->ana_log_size - nsid_buf_size))
+			return -EINVAL;
+
+		down_write(&ctrl->namespaces_rwsem);
+		list_for_each_entry(ns, &ctrl->namespaces, list) {
+			u32 nsid = le32_to_cpu(desc->nsids[n]);
+
+			if (ns->head->ns_id != nsid)
+				continue;
+			ns->anagrpid = grpid;
+			if (++n == nr_nsids)
+				break;
+		}
+		up_write(&ctrl->namespaces_rwsem);
+		WARN_ON_ONCE(n < nr_nsids);
+
+		offset += nsid_buf_size;
+		if (WARN_ON_ONCE(offset > ctrl->ana_log_size - sizeof(*desc)))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void nvme_ana_work(struct work_struct *work)
+{
+	struct nvme_ctrl *ctrl = container_of(work, struct nvme_ctrl, ana_work);
+
+	nvme_process_ana_log(ctrl, false);
+	nvme_kick_requeue_lists(ctrl);
+}
+
+int nvme_configure_ana(struct nvme_ctrl *ctrl)
+{
+	int error;
+
+	if (!nvme_ctrl_has_ana(ctrl))
+		return 0;
+
+	ctrl->ana_log_size = sizeof(struct nvme_ana_rsp_hdr) +
+		ctrl->nanagrpid * sizeof(struct nvme_ana_group_desc);
+	if (!(ctrl->anacap & (1 << 6)))
+		ctrl->ana_log_size += ctrl->max_namespaces * sizeof(__le32);
+
+	if (ctrl->ana_log_size > ctrl->max_hw_sectors << SECTOR_SHIFT) {
+		dev_err(ctrl->device,
+			"ANA log page size (%zd) larger than MDTS (%d).\n",
+			ctrl->ana_log_size,
+			ctrl->max_hw_sectors << SECTOR_SHIFT);
+		dev_err(ctrl->device, "disabling ANA support.\n");
+		return 0;
+	}
+
+	INIT_WORK(&ctrl->ana_work, nvme_ana_work);
+	ctrl->ana_state = kcalloc(ctrl->anagrpmax, sizeof(*ctrl->ana_state),
+			GFP_KERNEL);
+	if (!ctrl->ana_state)
+		return -ENOMEM;
+
+	ctrl->ana_log_buf = kmalloc(ctrl->ana_log_size, GFP_KERNEL);
+	if (!ctrl->ana_log_buf)
+		goto out_free_ana_state;
+
+	error = nvme_process_ana_log(ctrl, true);
+	if (error)
+		goto out_free_ana_log_buf;
+	return 0;
+out_free_ana_log_buf:
+	kfree(ctrl->ana_log_buf);
+out_free_ana_state:
+	kfree(ctrl->ana_state);
+	return -ENOMEM;
+}
+
+void nvme_deconfigure_ana(struct nvme_ctrl *ctrl)
+{
+	kfree(ctrl->ana_log_buf);
+	kfree(ctrl->ana_state);
+}
+
+static ssize_t ana_grpid_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	return sprintf(buf, "%d\n", nvme_get_ns_from_dev(dev)->anagrpid);
+}
+DEVICE_ATTR_RO(ana_grpid);
+
+static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+	enum nvme_ana_state state = nvme_ns_ana_state(ns);
+
+	return sprintf(buf, "%s\n", nvme_ana_state_names[state]);
+}
+DEVICE_ATTR_RO(ana_state);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a9695b8645aa..4a29c736370f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -169,6 +169,7 @@ struct nvme_ctrl {
 	u16 oacs;
 	u16 nssa;
 	u16 nr_streams;
+	u32 max_namespaces;
 	atomic_t abort_limit;
 	u8 vwc;
 	u32 vs;
@@ -192,6 +193,15 @@ struct nvme_ctrl {
 #define EVENT_NS_CHANGED		(1 << 0)
 	unsigned long events;
 
+	/* asymmetric namespace access: */
+	u8 anacap;
+	u32 anagrpmax;
+	u32 nanagrpid;
+	enum nvme_ana_state *ana_state;
+	size_t ana_log_size;
+	struct nvme_ana_rsp_hdr *ana_log_buf;
+	struct work_struct ana_work;
+
 	/* Power saving configuration */
 	u64 ps_max_latency_us;
 	bool apst_enabled;
@@ -297,6 +307,7 @@ struct nvme_ns {
 #define NVME_NS_REMOVING 0
 #define NVME_NS_DEAD     1
 	u16 noiob;
+	u32 anagrpid;
 
 #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
 	struct nvme_fault_inject fault_inject;
@@ -438,6 +449,11 @@ int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 extern const struct attribute_group nvme_ns_id_attr_group;
 extern const struct block_device_operations nvme_ns_head_ops;
 
+static inline bool nvme_ctrl_has_ana(struct nvme_ctrl *ctrl)
+{
+	return ctrl->subsys->cmic & (1 << 3);
+}
+
 #ifdef CONFIG_NVME_MULTIPATH
 void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 			struct nvme_ctrl *ctrl, int *flags);
@@ -447,6 +463,8 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);
 int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);
 void nvme_mpath_add_disk(struct nvme_ns_head *head);
 void nvme_mpath_remove_disk(struct nvme_ns_head *head);
+int nvme_configure_ana(struct nvme_ctrl *ctrl);
+void nvme_deconfigure_ana(struct nvme_ctrl *ctrl);
 
 static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
 {
@@ -465,6 +483,9 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
 		kblockd_schedule_work(&head->requeue_work);
 }
 
+extern struct device_attribute dev_attr_ana_grpid;
+extern struct device_attribute dev_attr_ana_state;
+
 #else
 /*
  * Without the multipath code enabled, multiple controller per subsystems are
@@ -504,6 +525,13 @@ static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
 static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
 {
 }
+static inline int nvme_configure_ana(struct nvme_ctrl *ctrl)
+{
+	return 0;
+}
+static inline void nvme_deconfigure_ana(struct nvme_ctrl *ctrl)
+{
+}
 #endif /* CONFIG_NVME_MULTIPATH */
 
 #ifdef CONFIG_NVM
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 6/9] nvmet: track and limit the number of namespaces per subsystem
  2018-06-01  7:11 draft ANA support v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2018-06-01  7:11 ` [PATCH 5/9] nvme: add ANA support Christoph Hellwig
@ 2018-06-01  7:11 ` Christoph Hellwig
  2018-06-04  6:35   ` Hannes Reinecke
  2018-06-04 12:04   ` Johannes Thumshirn
  2018-06-01  7:11 ` [PATCH 7/9] nvmet: add minimal ANA support Christoph Hellwig
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-01  7:11 UTC (permalink / raw)


TP 4004 introduces a new 'Maximum Number of Allocated Namespaces' field
in the Identify controller data to help the host size resources.  Put
an upper limit on the supported namespaces to be able to support this
value as supporting 32-bits worth of namespaces would lead to very
large buffers.  The limit is completely arbitrary at this point.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/target/admin-cmd.c | 1 +
 drivers/nvme/target/core.c      | 8 +++++++-
 drivers/nvme/target/nvmet.h     | 8 ++++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index ead8fbe6922e..62fe0ee57c31 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -220,6 +220,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	id->maxcmd = cpu_to_le16(NVMET_MAX_CMD);
 
 	id->nn = cpu_to_le32(ctrl->subsys->max_nsid);
+	id->mnan = cpu_to_le32(NVMET_MAX_NAMESPACES);
 	id->oncs = cpu_to_le16(NVME_CTRL_ONCS_DSM |
 			NVME_CTRL_ONCS_WRITE_ZEROES);
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index a03da764ecae..f691ae47c83a 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -332,9 +332,13 @@ static void nvmet_ns_dev_disable(struct nvmet_ns *ns)
 int nvmet_ns_enable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
-	int ret = 0;
+	int ret;
 
 	mutex_lock(&subsys->lock);
+	ret = -EMFILE;
+	if (subsys->nr_namespaces == NVMET_MAX_NAMESPACES)
+		goto out_unlock;
+	ret = 0;
 	if (ns->enabled)
 		goto out_unlock;
 
@@ -369,6 +373,7 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 
 		list_add_tail_rcu(&ns->dev_link, &old->dev_link);
 	}
+	subsys->nr_namespaces++;
 
 	nvmet_ns_changed(subsys, ns->nsid);
 	ns->enabled = true;
@@ -409,6 +414,7 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 	percpu_ref_exit(&ns->ref);
 
 	mutex_lock(&subsys->lock);
+	subsys->nr_namespaces--;
 	nvmet_ns_changed(subsys, ns->nsid);
 	nvmet_ns_dev_disable(ns);
 out_unlock:
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 480dfe10fad9..bd24e2b7317d 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -166,6 +166,7 @@ struct nvmet_subsys {
 	struct kref		ref;
 
 	struct list_head	namespaces;
+	unsigned int		nr_namespaces;
 	unsigned int		max_nsid;
 
 	struct list_head	ctrls;
@@ -357,6 +358,13 @@ u32 nvmet_get_log_page_len(struct nvme_command *cmd);
 #define NVMET_QUEUE_SIZE	1024
 #define NVMET_NR_QUEUES		128
 #define NVMET_MAX_CMD		NVMET_QUEUE_SIZE
+
+/*
+ * Nice round number that makes a list of nsids fit into a page.
+ * Should become tunable at some point in the future.
+ */
+#define NVMET_MAX_NAMESPACES	1024
+
 #define NVMET_KAS		10
 #define NVMET_DISC_KATO		120
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 7/9] nvmet: add minimal ANA support
  2018-06-01  7:11 draft ANA support v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2018-06-01  7:11 ` [PATCH 6/9] nvmet: track and limit the number of namespaces per subsystem Christoph Hellwig
@ 2018-06-01  7:11 ` Christoph Hellwig
  2018-06-04  6:41   ` Hannes Reinecke
  2018-06-04 12:25   ` Johannes Thumshirn
  2018-06-01  7:11 ` [PATCH 8/9] nvmet: support configuring additional ANA groups Christoph Hellwig
  2018-06-01  7:11 ` [PATCH 9/9] nvmet: make ANATT configurable Christoph Hellwig
  8 siblings, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-01  7:11 UTC (permalink / raw)


Add support for Asynchronous Namespace Access as specified in NVMe 1.3
TP 4004.

Just add a default ANA group 1 that is optimized on all ports.  This is
(and will remain) the default assignment for any namespace not epxlicitly
assigned to another ANA group.  The ANA state can be manually changed
through the configfs interface, including the change state.

Includes fixes and improvements from Hannes Reinecke.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/target/admin-cmd.c | 87 +++++++++++++++++++++++++++++++--
 drivers/nvme/target/configfs.c  | 10 ++++
 drivers/nvme/target/core.c      | 34 +++++++++++++
 drivers/nvme/target/nvmet.h     | 15 ++++++
 4 files changed, 142 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 62fe0ee57c31..3f4a3794a17e 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -150,6 +150,69 @@ static void nvmet_execute_get_log_changed_ns(struct nvmet_req *req)
 	nvmet_req_complete(req, status);
 }
 
+static u32 nvmet_format_ana_group(struct nvmet_req *req, u32 grpid,
+		struct nvme_ana_group_desc *desc)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_ns *ns;
+	u32 count = 0;
+
+	if (!(req->cmd->get_log_page.lsp & NVME_ANA_LOG_RGO)) {
+		rcu_read_lock();
+		list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link)
+			if (ns->anagrpid == grpid)
+				desc->nsids[count++] = cpu_to_le32(ns->nsid);
+		rcu_read_unlock();
+	}
+
+	desc->grpid = cpu_to_le32(grpid);
+	desc->nnsids = cpu_to_le32(count);
+	desc->chgcnt = cpu_to_le64(nvmet_ana_chgcnt);
+	desc->state = req->port->ana_state[grpid];
+	memset(desc->rsvd17, 0, sizeof(desc->rsvd17));
+	return sizeof(struct nvme_ana_group_desc) + count * sizeof(__le32);
+}
+
+static void nvmet_execute_get_log_page_ana(struct nvmet_req *req)
+{
+	struct nvme_ana_rsp_hdr hdr = { 0, };
+	struct nvme_ana_group_desc *desc;
+	size_t offset = sizeof(struct nvme_ana_rsp_hdr); /* start beyond hdr */
+	size_t len;
+	u32 grpid;
+	u16 ngrps = 0;
+	u16 status;
+
+	status = NVME_SC_INTERNAL;
+	desc = kmalloc(sizeof(struct nvme_ana_group_desc) +
+			NVMET_MAX_NAMESPACES * sizeof(__le32), GFP_KERNEL);
+	if (!desc)
+		goto out;
+
+	down_read(&nvmet_ana_sem);
+	for (grpid = 1; grpid <= NVMET_MAX_ANAGRPS; grpid++) {
+		if (!nvmet_ana_group_enabled[grpid])
+			continue;
+		len = nvmet_format_ana_group(req, grpid, desc);
+		status = nvmet_copy_to_sgl(req, offset, desc, len);
+		if (status)
+			break;
+		offset += len;
+		ngrps++;
+	}
+
+	hdr.chgcnt = cpu_to_le64(nvmet_ana_chgcnt);
+	hdr.ngrps = cpu_to_le16(ngrps);
+	up_read(&nvmet_ana_sem);
+
+	kfree(desc);
+
+	/* copy the header last once we know the number of groups */
+	status = nvmet_copy_to_sgl(req, 0, &hdr, sizeof(hdr));
+out:
+	nvmet_req_complete(req, status);
+}
+
 static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
@@ -181,8 +244,8 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	 * the safest is to leave it as zeroes.
 	 */
 
-	/* we support multiple ports and multiples hosts: */
-	id->cmic = (1 << 0) | (1 << 1);
+	/* we support multiple ports, multiples hosts and ANA: */
+	id->cmic = (1 << 0) | (1 << 1) | (1 << 3);
 
 	/* no limit on data transfer sizes for now */
 	id->mdts = 0;
@@ -250,6 +313,11 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 	id->msdbd = ctrl->ops->msdbd;
 
+	id->anacap = (1 << 0) | (1 << 1) | (1 << 2) | (1 << 3) | (1 << 4);
+	id->anatt = 10; /* random value */
+	id->anagrpmax = cpu_to_le32(NVMET_MAX_ANAGRPS);
+	id->nanagrpid = cpu_to_le32(NVMET_MAX_ANAGRPS);
+
 	/*
 	 * Meh, we don't really support any power state.  Fake up the same
 	 * values that qemu does.
@@ -287,8 +355,15 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 	 * nuse = ncap = nsze isn't always true, but we have no way to find
 	 * that out from the underlying device.
 	 */
-	id->ncap = id->nuse = id->nsze =
-		cpu_to_le64(ns->size >> ns->blksize_shift);
+	id->ncap = id->nsze = cpu_to_le64(ns->size >> ns->blksize_shift);
+	switch (req->port->ana_state[ns->anagrpid]) {
+	case NVME_ANA_INACCESSIBLE:
+	case NVME_ANA_PERSISTENT_LOSS:
+		break;
+	default:
+		id->nuse = id->nsze;
+		break;
+        }
 
 	/*
 	 * We just provide a single LBA format that matches what the
@@ -302,6 +377,7 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 	 * controllers, but also with any other user of the block device.
 	 */
 	id->nmic = (1 << 0);
+	id->anagrpid = cpu_to_le32(ns->anagrpid);
 
 	memcpy(&id->nguid, &ns->nguid, sizeof(uuid_le));
 
@@ -582,6 +658,9 @@ u16 nvmet_parse_admin_cmd(struct nvmet_req *req)
 		case NVME_LOG_CHANGED_NS:
 			req->execute = nvmet_execute_get_log_changed_ns;
 			return 0;
+		case NVME_LOG_ANA:
+			req->execute = nvmet_execute_get_log_page_ana;
+			return 0;
 		}
 		break;
 	case nvme_admin_identify:
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index ad9ff27234b5..7aa5488dc453 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -861,6 +861,7 @@ static void nvmet_port_release(struct config_item *item)
 {
 	struct nvmet_port *port = to_nvmet_port(item);
 
+	kfree(port->ana_state);
 	kfree(port);
 }
 
@@ -896,6 +897,15 @@ static struct config_group *nvmet_ports_make(struct config_group *group,
 	if (!port)
 		return ERR_PTR(-ENOMEM);
 
+	port->ana_state = kcalloc(NVMET_MAX_ANAGRPS + 1,
+			sizeof(*port->ana_state), GFP_KERNEL);
+	if (!port->ana_state) {
+		kfree(port);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	port->ana_state[NVMET_DEFAULT_ANA_GRPID] = NVME_ANA_OPTIMIZED;
+
 	INIT_LIST_HEAD(&port->entry);
 	INIT_LIST_HEAD(&port->subsystems);
 	INIT_LIST_HEAD(&port->referrals);
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index f691ae47c83a..e3f0762efe52 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -39,6 +39,10 @@ static DEFINE_IDA(cntlid_ida);
  */
 DECLARE_RWSEM(nvmet_config_sem);
 
+u32 nvmet_ana_group_enabled[NVMET_MAX_ANAGRPS + 1];
+u64 nvmet_ana_chgcnt;
+DECLARE_RWSEM(nvmet_ana_sem);
+
 static struct nvmet_subsys *nvmet_find_get_subsys(struct nvmet_port *port,
 		const char *subsysnqn);
 
@@ -425,6 +429,10 @@ void nvmet_ns_free(struct nvmet_ns *ns)
 {
 	nvmet_ns_disable(ns);
 
+	down_write(&nvmet_ana_sem);
+	nvmet_ana_group_enabled[ns->anagrpid]--;
+	up_write(&nvmet_ana_sem);
+
 	kfree(ns->device_path);
 	kfree(ns);
 }
@@ -442,6 +450,12 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
 
 	ns->nsid = nsid;
 	ns->subsys = subsys;
+
+	down_write(&nvmet_ana_sem);
+	ns->anagrpid = NVMET_DEFAULT_ANA_GRPID;
+	nvmet_ana_group_enabled[ns->anagrpid]++;
+	up_write(&nvmet_ana_sem);
+
 	uuid_gen(&ns->uuid);
 
 	return ns;
@@ -548,6 +562,20 @@ int nvmet_sq_init(struct nvmet_sq *sq)
 }
 EXPORT_SYMBOL_GPL(nvmet_sq_init);
 
+static inline u16 nvmet_check_ana_state(struct nvmet_port *port,
+		struct nvmet_ns *ns)
+{
+	enum nvme_ana_state state = port->ana_state[ns->anagrpid];
+
+	if (unlikely(state == NVME_ANA_INACCESSIBLE))
+		return NVME_SC_ANA_INACCESSIBLE;
+	if (unlikely(state == NVME_ANA_PERSISTENT_LOSS))
+		return NVME_SC_ANA_PERSISTENT_LOSS;
+	if (unlikely(state == NVME_ANA_CHANGE))
+		return NVME_SC_ANA_TRANSITION;
+	return 0;
+}
+
 static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
 {
 	struct nvme_command *cmd = req->cmd;
@@ -560,6 +588,9 @@ static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
 	req->ns = nvmet_find_namespace(req->sq->ctrl, cmd->rw.nsid);
 	if (unlikely(!req->ns))
 		return NVME_SC_INVALID_NS | NVME_SC_DNR;
+	ret = nvmet_check_ana_state(req->port, req->ns);
+	if (unlikely(ret))
+		return ret;
 
 	if (req->ns->file)
 		return nvmet_file_parse_io_cmd(req);
@@ -1107,6 +1138,8 @@ static int __init nvmet_init(void)
 {
 	int error;
 
+	nvmet_ana_group_enabled[NVMET_DEFAULT_ANA_GRPID] = 1;
+
 	error = nvmet_init_discovery();
 	if (error)
 		goto out;
@@ -1114,6 +1147,7 @@ static int __init nvmet_init(void)
 	error = nvmet_init_configfs();
 	if (error)
 		goto out_exit_discovery;
+
 	return 0;
 
 out_exit_discovery:
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index bd24e2b7317d..d5d08f70eb0f 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -64,6 +64,7 @@ struct nvmet_ns {
 	loff_t			size;
 	u8			nguid[16];
 	uuid_t			uuid;
+	u32			anagrpid;
 
 	bool			enabled;
 	struct nvmet_subsys	*subsys;
@@ -114,6 +115,7 @@ struct nvmet_port {
 	struct list_head		subsystems;
 	struct config_group		referrals_group;
 	struct list_head		referrals;
+	enum nvme_ana_state		*ana_state;
 	void				*priv;
 	bool				enabled;
 };
@@ -365,6 +367,15 @@ u32 nvmet_get_log_page_len(struct nvme_command *cmd);
  */
 #define NVMET_MAX_NAMESPACES	1024
 
+/*
+ * 0 is not a valid ANA group ID, so we start numbering at 1.
+ *
+ * ANA Group 1 exists without manual intervention, has namespaces assigned to it
+ * by default, and is available in an optimized state through all ports.
+ */
+#define NVMET_MAX_ANAGRPS	1
+#define NVMET_DEFAULT_ANA_GRPID	1
+
 #define NVMET_KAS		10
 #define NVMET_DISC_KATO		120
 
@@ -378,6 +389,10 @@ extern struct nvmet_subsys *nvmet_disc_subsys;
 extern u64 nvmet_genctr;
 extern struct rw_semaphore nvmet_config_sem;
 
+extern u32 nvmet_ana_group_enabled[NVMET_MAX_ANAGRPS + 1];
+extern u64 nvmet_ana_chgcnt;
+extern struct rw_semaphore nvmet_ana_sem;
+
 bool nvmet_host_allowed(struct nvmet_req *req, struct nvmet_subsys *subsys,
 		const char *hostnqn);
 
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 8/9] nvmet: support configuring additional ANA groups
  2018-06-01  7:11 draft ANA support v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2018-06-01  7:11 ` [PATCH 7/9] nvmet: add minimal ANA support Christoph Hellwig
@ 2018-06-01  7:11 ` Christoph Hellwig
  2018-06-04  6:42   ` Hannes Reinecke
  2018-06-01  7:11 ` [PATCH 9/9] nvmet: make ANATT configurable Christoph Hellwig
  8 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-01  7:11 UTC (permalink / raw)


Allow creating non-default ANA groups (group ID > 1).  Groups are created
either by assigning the group ID to a namespace, or by creating a configfs
group object under a specific port.  All namespaces assigned to a group
that doesn't have a configfs object for a given port are marked as
inaccessible.

For all changes in ANA configuration the ANA change AEN is sent.  We only
keep a global changecount instead of additional per-group changecounts to
keep the implementation as simple as possible.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/target/admin-cmd.c |   1 +
 drivers/nvme/target/configfs.c  | 170 +++++++++++++++++++++++++++++++-
 drivers/nvme/target/core.c      |  24 +++++
 drivers/nvme/target/nvmet.h     |  29 +++++-
 4 files changed, 220 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 3f4a3794a17e..128b7e773a8b 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -203,6 +203,7 @@ static void nvmet_execute_get_log_page_ana(struct nvmet_req *req)
 
 	hdr.chgcnt = cpu_to_le64(nvmet_ana_chgcnt);
 	hdr.ngrps = cpu_to_le16(ngrps);
+	clear_bit(NVME_AEN_CFG_ANA_CHANGE, &req->sq->ctrl->aen_masked);
 	up_read(&nvmet_ana_sem);
 
 	kfree(desc);
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 7aa5488dc453..ddc3fa9fd92f 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -378,6 +378,39 @@ static ssize_t nvmet_ns_device_nguid_store(struct config_item *item,
 
 CONFIGFS_ATTR(nvmet_ns_, device_nguid);
 
+static ssize_t nvmet_ns_ana_grpid_show(struct config_item *item, char *page)
+{
+	return sprintf(page, "%u\n", to_nvmet_ns(item)->anagrpid);
+}
+
+static ssize_t nvmet_ns_ana_grpid_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_ns *ns = to_nvmet_ns(item);
+	u32 oldgrpid, newgrpid;
+	int ret;
+
+	ret = kstrtou32(page, 0, &newgrpid);
+	if (ret)
+		return ret;
+
+	if (newgrpid < 1 || newgrpid > NVMET_MAX_ANAGRPS)
+		return -EINVAL;
+
+	down_write(&nvmet_ana_sem);
+	oldgrpid = ns->anagrpid;
+	nvmet_ana_group_enabled[newgrpid]++;
+	ns->anagrpid = newgrpid;
+	nvmet_ana_group_enabled[oldgrpid]--;
+	nvmet_ana_chgcnt++;
+	up_write(&nvmet_ana_sem);
+
+	nvmet_send_ana_event(ns->subsys);
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_ns_, ana_grpid);
+
 static ssize_t nvmet_ns_enable_show(struct config_item *item, char *page)
 {
 	return sprintf(page, "%d\n", to_nvmet_ns(item)->enabled);
@@ -407,6 +440,7 @@ static struct configfs_attribute *nvmet_ns_attrs[] = {
 	&nvmet_ns_attr_device_path,
 	&nvmet_ns_attr_device_nguid,
 	&nvmet_ns_attr_device_uuid,
+	&nvmet_ns_attr_ana_grpid,
 	&nvmet_ns_attr_enable,
 	NULL,
 };
@@ -854,6 +888,131 @@ static const struct config_item_type nvmet_referrals_type = {
 	.ct_group_ops	= &nvmet_referral_group_ops,
 };
 
+static struct {
+	enum nvme_ana_state	state;
+	const char		*name;
+} nvmet_ana_state_names[] = {
+	{ NVME_ANA_OPTIMIZED,		"optimized" },
+	{ NVME_ANA_NONOPTIMIZED,	"non-optimized" },
+	{ NVME_ANA_INACCESSIBLE,	"inaccessible" },
+	{ NVME_ANA_PERSISTENT_LOSS,	"persistent-loss" },
+	{ NVME_ANA_CHANGE,		"change" },
+};
+
+static ssize_t nvmet_ana_group_ana_state_show(struct config_item *item,
+		char *page)
+{
+	struct nvmet_ana_group *grp = to_ana_group(item);
+	enum nvme_ana_state state = grp->port->ana_state[grp->grpid];
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(nvmet_ana_state_names); i++) {
+		if (state != nvmet_ana_state_names[i].state)
+			continue;
+		return sprintf(page, "%s\n", nvmet_ana_state_names[i].name);
+	}
+
+	return sprintf(page, "\n");
+}
+
+static ssize_t nvmet_ana_group_ana_state_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_ana_group *grp = to_ana_group(item);
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(nvmet_ana_state_names); i++) {
+		if (sysfs_streq(page, nvmet_ana_state_names[i].name))
+			goto found;
+	}
+
+	pr_err("Invalid value '%s' for ana_state\n", page);
+	return -EINVAL;
+
+found:
+	down_write(&nvmet_ana_sem);
+	grp->port->ana_state[grp->grpid] = nvmet_ana_state_names[i].state;
+	nvmet_ana_chgcnt++;
+	up_write(&nvmet_ana_sem);
+
+	nvmet_port_send_ana_event(grp->port);
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_ana_group_, ana_state);
+
+static struct configfs_attribute *nvmet_ana_group_attrs[] = {
+	&nvmet_ana_group_attr_ana_state,
+	NULL,
+};
+
+static void nvmet_ana_group_release(struct config_item *item)
+{
+	struct nvmet_ana_group *grp = to_ana_group(item);
+
+	down_write(&nvmet_ana_sem);
+	grp->port->ana_state[grp->grpid] = NVME_ANA_INACCESSIBLE;
+	nvmet_ana_group_enabled[grp->grpid]--;
+	up_write(&nvmet_ana_sem);
+
+	nvmet_port_send_ana_event(grp->port);
+	kfree(grp);
+}
+
+static struct configfs_item_operations nvmet_ana_group_item_ops = {
+	.release		= nvmet_ana_group_release,
+};
+
+static const struct config_item_type nvmet_ana_group_type = {
+	.ct_item_ops		= &nvmet_ana_group_item_ops,
+	.ct_attrs		= nvmet_ana_group_attrs,
+	.ct_owner		= THIS_MODULE,
+};
+
+static struct config_group *nvmet_ana_groups_make_group(
+		struct config_group *group, const char *name)
+{
+	struct nvmet_port *port = ana_groups_to_port(&group->cg_item);
+	struct nvmet_ana_group *grp;
+	u32 grpid;
+	int ret;
+
+	ret = kstrtou32(name, 0, &grpid);
+	if (ret)
+		goto out;
+
+	ret = -EINVAL;
+	if (grpid <= 1 || grpid > NVMET_MAX_ANAGRPS)
+		goto out;
+
+	ret = -ENOMEM;
+	grp = kzalloc(sizeof(*grp), GFP_KERNEL);
+	if (!grp)
+		goto out;
+	grp->port = port;
+	grp->grpid = grpid;
+
+	down_write(&nvmet_ana_sem);
+	nvmet_ana_group_enabled[grpid]++;
+	up_write(&nvmet_ana_sem);
+
+	nvmet_port_send_ana_event(grp->port);
+
+	config_group_init_type_name(&grp->group, name, &nvmet_ana_group_type);
+	return &grp->group;
+out:
+	return ERR_PTR(ret);
+}
+
+static struct configfs_group_operations nvmet_ana_groups_group_ops = {
+	.make_group		= nvmet_ana_groups_make_group,
+};
+
+static const struct config_item_type nvmet_ana_groups_type = {
+	.ct_group_ops		= &nvmet_ana_groups_group_ops,
+	.ct_owner		= THIS_MODULE,
+};
+
 /*
  * Ports definitions.
  */
@@ -889,6 +1048,7 @@ static struct config_group *nvmet_ports_make(struct config_group *group,
 {
 	struct nvmet_port *port;
 	u16 portid;
+	u32 i;
 
 	if (kstrtou16(name, 0, &portid))
 		return ERR_PTR(-EINVAL);
@@ -904,7 +1064,12 @@ static struct config_group *nvmet_ports_make(struct config_group *group,
 		return ERR_PTR(-ENOMEM);
 	}
 
-	port->ana_state[NVMET_DEFAULT_ANA_GRPID] = NVME_ANA_OPTIMIZED;
+	for (i = 1; i <= NVMET_MAX_ANAGRPS; i++) {
+		if (i == NVMET_DEFAULT_ANA_GRPID)
+			port->ana_state[1] = NVME_ANA_OPTIMIZED;
+		else
+			port->ana_state[i] = NVME_ANA_INACCESSIBLE;
+	}
 
 	INIT_LIST_HEAD(&port->entry);
 	INIT_LIST_HEAD(&port->subsystems);
@@ -921,6 +1086,9 @@ static struct config_group *nvmet_ports_make(struct config_group *group,
 			"referrals", &nvmet_referrals_type);
 	configfs_add_default_group(&port->referrals_group, &port->group);
 
+	config_group_init_type_name(&port->ana_groups_group,
+			"ana_groups", &nvmet_ana_groups_type);
+	configfs_add_default_group(&port->ana_groups_group, &port->group);
 	return &port->group;
 }
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index e3f0762efe52..b38c129642f3 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -193,6 +193,30 @@ static void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid)
 	}
 }
 
+void nvmet_send_ana_event(struct nvmet_subsys *subsys)
+{
+	struct nvmet_ctrl *ctrl;
+
+	mutex_lock(&subsys->lock);
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
+		if (nvmet_aen_disabled(ctrl, NVME_AEN_CFG_ANA_CHANGE))
+			continue;
+		nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
+				NVME_AER_NOTICE_ANA, NVME_LOG_ANA);
+	}
+	mutex_unlock(&subsys->lock);
+}
+
+void nvmet_port_send_ana_event(struct nvmet_port *port)
+{
+	struct nvmet_subsys_link *p;
+
+	down_read(&nvmet_config_sem);
+	list_for_each_entry(p, &port->subsystems, entry)
+		nvmet_send_ana_event(p->subsys);
+	up_read(&nvmet_config_sem);
+}
+
 int nvmet_register_transport(const struct nvmet_fabrics_ops *ops)
 {
 	int ret = 0;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index d5d08f70eb0f..74ba9c3e90e5 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -24,18 +24,18 @@
 #include <linux/uuid.h>
 #include <linux/nvme.h>
 #include <linux/configfs.h>
+#include <linux/refcount.h>
 #include <linux/rcupdate.h>
 #include <linux/blkdev.h>
 
 #define NVMET_ASYNC_EVENTS		4
 #define NVMET_ERROR_LOG_SLOTS		128
 
-
 /*
  * Supported optional AENs:
  */
 #define NVMET_AEN_CFG_OPTIONAL \
-	NVME_AEN_CFG_NS_ATTR
+	(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_ANA_CHANGE)
 
 /*
  * Plus mandatory SMART AENs (we'll never send them, but allow enabling them):
@@ -115,6 +115,7 @@ struct nvmet_port {
 	struct list_head		subsystems;
 	struct config_group		referrals_group;
 	struct list_head		referrals;
+	struct config_group		ana_groups_group;
 	enum nvme_ana_state		*ana_state;
 	void				*priv;
 	bool				enabled;
@@ -126,6 +127,13 @@ static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
 			group);
 }
 
+static inline struct nvmet_port *ana_groups_to_port(
+		struct config_item *item)
+{
+	return container_of(to_config_group(item), struct nvmet_port,
+			ana_groups_group);
+}
+
 struct nvmet_ctrl {
 	struct nvmet_subsys	*subsys;
 	struct nvmet_cq		**cqs;
@@ -214,6 +222,18 @@ static inline char *nvmet_host_name(struct nvmet_host *host)
 	return config_item_name(&host->group.cg_item);
 }
 
+struct nvmet_ana_group {
+	struct config_group	group;
+	struct nvmet_port	*port;
+	u32			grpid;
+};
+
+static inline struct nvmet_ana_group *to_ana_group(struct config_item *item)
+{
+	return container_of(to_config_group(item), struct nvmet_ana_group,
+			group);
+}
+
 struct nvmet_host_link {
 	struct list_head	entry;
 	struct nvmet_host	*host;
@@ -340,6 +360,9 @@ void nvmet_ns_disable(struct nvmet_ns *ns);
 struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid);
 void nvmet_ns_free(struct nvmet_ns *ns);
 
+void nvmet_send_ana_event(struct nvmet_subsys *subsys);
+void nvmet_port_send_ana_event(struct nvmet_port *port);
+
 int nvmet_register_transport(const struct nvmet_fabrics_ops *ops);
 void nvmet_unregister_transport(const struct nvmet_fabrics_ops *ops);
 
@@ -373,7 +396,7 @@ u32 nvmet_get_log_page_len(struct nvme_command *cmd);
  * ANA Group 1 exists without manual intervention, has namespaces assigned to it
  * by default, and is available in an optimized state through all ports.
  */
-#define NVMET_MAX_ANAGRPS	1
+#define NVMET_MAX_ANAGRPS	128
 #define NVMET_DEFAULT_ANA_GRPID	1
 
 #define NVMET_KAS		10
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 9/9] nvmet: make ANATT configurable
  2018-06-01  7:11 draft ANA support v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2018-06-01  7:11 ` [PATCH 8/9] nvmet: support configuring additional ANA groups Christoph Hellwig
@ 2018-06-01  7:11 ` Christoph Hellwig
  2018-06-04 10:21   ` Hannes Reinecke
  8 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-01  7:11 UTC (permalink / raw)


From: Hannes Reinecke <hare@suse.de>

Signed-off-by: Hannes Reinecke <hare at suse.com>
Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/target/admin-cmd.c |  2 +-
 drivers/nvme/target/configfs.c  | 29 +++++++++++++++++++++++++++++
 drivers/nvme/target/core.c      |  1 +
 drivers/nvme/target/nvmet.h     |  2 ++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 128b7e773a8b..04c3dd513bcf 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -315,7 +315,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	id->msdbd = ctrl->ops->msdbd;
 
 	id->anacap = (1 << 0) | (1 << 1) | (1 << 2) | (1 << 3) | (1 << 4);
-	id->anatt = 10; /* random value */
+	id->anatt = ctrl->subsys->anatt;
 	id->anagrpmax = cpu_to_le32(NVMET_MAX_ANAGRPS);
 	id->nanagrpid = cpu_to_le32(NVMET_MAX_ANAGRPS);
 
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index ddc3fa9fd92f..706b42b7935e 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -738,10 +738,39 @@ static ssize_t nvmet_subsys_attr_serial_store(struct config_item *item,
 }
 CONFIGFS_ATTR(nvmet_subsys_, attr_serial);
 
+static ssize_t nvmet_subsys_attr_anatt_show(struct config_item *item,
+					    char *page)
+{
+	struct nvmet_subsys *subsys = to_subsys(item);
+
+	return snprintf(page, PAGE_SIZE, "%d\n", subsys->anatt);
+}
+
+static ssize_t nvmet_subsys_attr_anatt_store(struct config_item *item,
+					     const char *page, size_t count)
+{
+	struct nvmet_subsys *subsys = to_subsys(item);
+	int ret;
+	u8 ana_tt;
+
+	ret = kstrtou8(page, 0, &ana_tt);
+	if (ret)
+		return ret;
+	if (ana_tt == 0)
+		return -EINVAL;
+
+	down_write(&nvmet_config_sem);
+	subsys->anatt = ana_tt;
+	up_write(&nvmet_config_sem);
+	return count;
+}
+CONFIGFS_ATTR(nvmet_subsys_, attr_anatt);
+
 static struct configfs_attribute *nvmet_subsys_attrs[] = {
 	&nvmet_subsys_attr_attr_allow_any_host,
 	&nvmet_subsys_attr_attr_version,
 	&nvmet_subsys_attr_attr_serial,
+	&nvmet_subsys_attr_attr_anatt,
 	NULL,
 };
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index b38c129642f3..cebc57876889 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -1115,6 +1115,7 @@ struct nvmet_subsys *nvmet_subsys_alloc(const char *subsysnqn,
 		return NULL;
 	}
 	subsys->type = type;
+	subsys->anatt = NVMET_DEFAULT_ANATT;
 	subsys->subsysnqn = kstrndup(subsysnqn, NVMF_NQN_SIZE,
 			GFP_KERNEL);
 	if (!subsys->subsysnqn) {
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 74ba9c3e90e5..ba4aca31aa60 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -178,6 +178,7 @@ struct nvmet_subsys {
 	struct list_head	namespaces;
 	unsigned int		nr_namespaces;
 	unsigned int		max_nsid;
+	u8			anatt;
 
 	struct list_head	ctrls;
 
@@ -398,6 +399,7 @@ u32 nvmet_get_log_page_len(struct nvme_command *cmd);
  */
 #define NVMET_MAX_ANAGRPS	128
 #define NVMET_DEFAULT_ANA_GRPID	1
+#define NVMET_DEFAULT_ANATT	10
 
 #define NVMET_KAS		10
 #define NVMET_DISC_KATO		120
-- 
2.17.0

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH 4/9] nvme: always failover on path or transport errors
  2018-06-01  7:11 ` [PATCH 4/9] nvme: always failover on path or transport errors Christoph Hellwig
@ 2018-06-01 15:27   ` Mike Snitzer
  2018-06-04  6:29     ` Christoph Hellwig
  2018-06-03 12:28   ` Sagi Grimberg
  2018-06-04  6:32   ` Hannes Reinecke
  2 siblings, 1 reply; 43+ messages in thread
From: Mike Snitzer @ 2018-06-01 15:27 UTC (permalink / raw)


On Fri, Jun 01 2018 at  3:11P -0400,
Christoph Hellwig <hch@lst.de> wrote:

> NVMe 1.3 TP 4028 introduced a new status code type 3h for
> "Path Related Status".  We should always retry on another path for this
> class of errors, without even trying to decode them.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/multipath.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index d7b664ae5923..9412e5138d65 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -59,6 +59,9 @@ bool nvme_req_needs_failover(struct request *req, blk_status_t error)
>  {
>  	if (!(req->cmd_flags & REQ_NVME_MPATH))
>  		return false;
> +	/* always failover for path related errors */
> +	if ((nvme_req(req)->status & 0x700) == 0x300)
> +		return true;
>  	return blk_path_error(error);
>  }

Shouldn't this go into nvme_error_status(), with a new NVME_SC if
needed, so that blk_path_error() just works?

Mike

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/9] nvme: don't hold nvmf_transports_rwsem for more than transport lookups
  2018-06-01  7:11 ` [PATCH 1/9] nvme: don't hold nvmf_transports_rwsem for more than transport lookups Christoph Hellwig
@ 2018-06-03 12:26   ` Sagi Grimberg
  2018-06-04  6:30   ` Hannes Reinecke
  1 sibling, 0 replies; 43+ messages in thread
From: Sagi Grimberg @ 2018-06-03 12:26 UTC (permalink / raw)


Looks fine,

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 2/9] nvme.h: add ANA definitions
  2018-06-01  7:11 ` [PATCH 2/9] nvme.h: add ANA definitions Christoph Hellwig
@ 2018-06-03 12:27   ` Sagi Grimberg
  2018-06-04  6:30   ` Hannes Reinecke
  2018-06-04 11:58   ` Johannes Thumshirn
  2 siblings, 0 replies; 43+ messages in thread
From: Sagi Grimberg @ 2018-06-03 12:27 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimbeg.me>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 3/9] nvme: add support for the log specific field
  2018-06-01  7:11 ` [PATCH 3/9] nvme: add support for the log specific field Christoph Hellwig
@ 2018-06-03 12:27   ` Sagi Grimberg
  2018-06-04  6:31   ` Hannes Reinecke
  2018-06-04 11:59   ` Johannes Thumshirn
  2 siblings, 0 replies; 43+ messages in thread
From: Sagi Grimberg @ 2018-06-03 12:27 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimbeg.me>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 4/9] nvme: always failover on path or transport errors
  2018-06-01  7:11 ` [PATCH 4/9] nvme: always failover on path or transport errors Christoph Hellwig
  2018-06-01 15:27   ` Mike Snitzer
@ 2018-06-03 12:28   ` Sagi Grimberg
  2018-06-04  6:32   ` Hannes Reinecke
  2 siblings, 0 replies; 43+ messages in thread
From: Sagi Grimberg @ 2018-06-03 12:28 UTC (permalink / raw)


OK,

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 4/9] nvme: always failover on path or transport errors
  2018-06-01 15:27   ` Mike Snitzer
@ 2018-06-04  6:29     ` Christoph Hellwig
  2018-06-04 12:01       ` Johannes Thumshirn
  2018-06-04 12:23       ` Mike Snitzer
  0 siblings, 2 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-04  6:29 UTC (permalink / raw)


On Fri, Jun 01, 2018@11:27:11AM -0400, Mike Snitzer wrote:
> > @@ -59,6 +59,9 @@ bool nvme_req_needs_failover(struct request *req, blk_status_t error)
> >  {
> >  	if (!(req->cmd_flags & REQ_NVME_MPATH))
> >  		return false;
> > +	/* always failover for path related errors */
> > +	if ((nvme_req(req)->status & 0x700) == 0x300)
> > +		return true;
> >  	return blk_path_error(error);
> >  }
> 
> Shouldn't this go into nvme_error_status(), with a new NVME_SC if
> needed, so that blk_path_error() just works?

NVME_SC values are defined in the NVMe spec.  The check is intended
as a catchall for multiple of theses values, as we've defined a whole
extensible class for path errors.

But we could probably add a BLK_STS_PATH and I'll map it to that.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 1/9] nvme: don't hold nvmf_transports_rwsem for more than transport lookups
  2018-06-01  7:11 ` [PATCH 1/9] nvme: don't hold nvmf_transports_rwsem for more than transport lookups Christoph Hellwig
  2018-06-03 12:26   ` Sagi Grimberg
@ 2018-06-04  6:30   ` Hannes Reinecke
  1 sibling, 0 replies; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04  6:30 UTC (permalink / raw)


On Fri,  1 Jun 2018 09:11:20 +0200
Christoph Hellwig <hch@lst.de> wrote:

> From: Johannes Thumshirn <jthumshirn at suse.de>
> 
> Only take nvmf_transports_rwsem when doing a lookup of registered
> transports, so that a blocking ->create_ctrl doesn't prevent other
> actions on /dev/nvme-fabrics.
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn at suse.de>
> [hch: increased lock hold time a bit to be safe, added a comment
>  and updated the changelog]
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/fabrics.c | 3 ++-
>  drivers/nvme/host/fabrics.h | 3 +++
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 

Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 2/9] nvme.h: add ANA definitions
  2018-06-01  7:11 ` [PATCH 2/9] nvme.h: add ANA definitions Christoph Hellwig
  2018-06-03 12:27   ` Sagi Grimberg
@ 2018-06-04  6:30   ` Hannes Reinecke
  2018-06-04 11:58   ` Johannes Thumshirn
  2 siblings, 0 replies; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04  6:30 UTC (permalink / raw)


On Fri,  1 Jun 2018 09:11:21 +0200
Christoph Hellwig <hch@lst.de> wrote:

> Add various defintions from NVMe 1.3 TP 4004.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  include/linux/nvme.h | 50
> +++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 47
> insertions(+), 3 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 3/9] nvme: add support for the log specific field
  2018-06-01  7:11 ` [PATCH 3/9] nvme: add support for the log specific field Christoph Hellwig
  2018-06-03 12:27   ` Sagi Grimberg
@ 2018-06-04  6:31   ` Hannes Reinecke
  2018-06-04 11:59   ` Johannes Thumshirn
  2 siblings, 0 replies; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04  6:31 UTC (permalink / raw)


On Fri,  1 Jun 2018 09:11:22 +0200
Christoph Hellwig <hch@lst.de> wrote:

> NVMe 1.3 added a new log specific field to the get log page CQ
> defintion.  Add the field and support passing it to nvme_get_log_ext.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/core.c     | 6 +++---
>  drivers/nvme/host/lightnvm.c | 2 +-
>  drivers/nvme/host/nvme.h     | 2 +-
>  include/linux/nvme.h         | 2 +-
>  4 files changed, 6 insertions(+), 6 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 4/9] nvme: always failover on path or transport errors
  2018-06-01  7:11 ` [PATCH 4/9] nvme: always failover on path or transport errors Christoph Hellwig
  2018-06-01 15:27   ` Mike Snitzer
  2018-06-03 12:28   ` Sagi Grimberg
@ 2018-06-04  6:32   ` Hannes Reinecke
  2 siblings, 0 replies; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04  6:32 UTC (permalink / raw)


On Fri,  1 Jun 2018 09:11:23 +0200
Christoph Hellwig <hch@lst.de> wrote:

> NVMe 1.3 TP 4028 introduced a new status code type 3h for
> "Path Related Status".  We should always retry on another path for
> this class of errors, without even trying to decode them.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/multipath.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/nvme/host/multipath.c
> b/drivers/nvme/host/multipath.c index d7b664ae5923..9412e5138d65
> 100644 --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -59,6 +59,9 @@ bool nvme_req_needs_failover(struct request *req,
> blk_status_t error) {
>  	if (!(req->cmd_flags & REQ_NVME_MPATH))
>  		return false;
> +	/* always failover for path related errors */
> +	if ((nvme_req(req)->status & 0x700) == 0x300)
> +		return true;
>  	return blk_path_error(error);
>  }
>  

Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 6/9] nvmet: track and limit the number of namespaces per subsystem
  2018-06-01  7:11 ` [PATCH 6/9] nvmet: track and limit the number of namespaces per subsystem Christoph Hellwig
@ 2018-06-04  6:35   ` Hannes Reinecke
  2018-06-04 12:04   ` Johannes Thumshirn
  1 sibling, 0 replies; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04  6:35 UTC (permalink / raw)


On Fri,  1 Jun 2018 09:11:25 +0200
Christoph Hellwig <hch@lst.de> wrote:

> TP 4004 introduces a new 'Maximum Number of Allocated Namespaces'
> field in the Identify controller data to help the host size
> resources.  Put an upper limit on the supported namespaces to be able
> to support this value as supporting 32-bits worth of namespaces would
> lead to very large buffers.  The limit is completely arbitrary at
> this point.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/target/admin-cmd.c | 1 +
>  drivers/nvme/target/core.c      | 8 +++++++-
>  drivers/nvme/target/nvmet.h     | 8 ++++++++
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 

Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 5/9] nvme: add ANA support
  2018-06-01  7:11 ` [PATCH 5/9] nvme: add ANA support Christoph Hellwig
@ 2018-06-04  6:36   ` Hannes Reinecke
  2018-06-04  7:03     ` Christoph Hellwig
  2018-06-06 12:01   ` Popuri, Sriram
  1 sibling, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04  6:36 UTC (permalink / raw)


On Fri,  1 Jun 2018 09:11:24 +0200
Christoph Hellwig <hch@lst.de> wrote:

> Add support for Asynchronous Namespace Access as specified in NVMe 1.3
> TP 4004.  With ANA each namespace attached to a controller belongs to
> an ANA group that describes the characteristics of accessing the
> namespaces through this controller.  In the optimized and
> non-optimized states namespaces can be accessed regularly, although
> in a multi-pathing environment we should always prefer to access a
> namespace through a controller where an optimized relationship
> exists.  Namespaces in Inaccessible, Permanent-Loss or Change state
> for a given controller should not be accessed.
> 
> We keep a simple per-controller array of ANA states which is indexed
> by the ANA group ID specified in the namespace.  The states are
> updated through reading the ANA log page, which is read once during
> controller initialization, and whenever the ANA change notice AEN is
> received, or when one of the ANA specific status codes that signal a
> state change is received on a command.
> 
> There currently isn't any support for the ANA transition timeout
> yet.
> 
> Includes fixes and improvements from Hannes Reinecke.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/core.c      |  30 ++++-
>  drivers/nvme/host/multipath.c | 210
> +++++++++++++++++++++++++++++++++- drivers/nvme/host/nvme.h      |
> 28 +++++ 3 files changed, 261 insertions(+), 7 deletions(-)
> 

This suffers from the 'nvme connect' stall if the first detected
namespace is inaccessible.

I would prefer to have it fixed via a separate patchset, though, and
keep this patch as is.
So:

Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 7/9] nvmet: add minimal ANA support
  2018-06-01  7:11 ` [PATCH 7/9] nvmet: add minimal ANA support Christoph Hellwig
@ 2018-06-04  6:41   ` Hannes Reinecke
  2018-06-04 12:25   ` Johannes Thumshirn
  1 sibling, 0 replies; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04  6:41 UTC (permalink / raw)


On Fri,  1 Jun 2018 09:11:26 +0200
Christoph Hellwig <hch@lst.de> wrote:

> Add support for Asynchronous Namespace Access as specified in NVMe 1.3
> TP 4004.
> 
> Just add a default ANA group 1 that is optimized on all ports.  This
> is (and will remain) the default assignment for any namespace not
> epxlicitly assigned to another ANA group.  The ANA state can be
> manually changed through the configfs interface, including the change
> state.
> 
> Includes fixes and improvements from Hannes Reinecke.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/target/admin-cmd.c | 87
> +++++++++++++++++++++++++++++++-- drivers/nvme/target/configfs.c  |
> 10 ++++ drivers/nvme/target/core.c      | 34 +++++++++++++
>  drivers/nvme/target/nvmet.h     | 15 ++++++
>  4 files changed, 142 insertions(+), 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 8/9] nvmet: support configuring additional ANA groups
  2018-06-01  7:11 ` [PATCH 8/9] nvmet: support configuring additional ANA groups Christoph Hellwig
@ 2018-06-04  6:42   ` Hannes Reinecke
  2018-06-04  6:53     ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04  6:42 UTC (permalink / raw)


On Fri,  1 Jun 2018 09:11:27 +0200
Christoph Hellwig <hch@lst.de> wrote:

> Allow creating non-default ANA groups (group ID > 1).  Groups are
> created either by assigning the group ID to a namespace, or by
> creating a configfs group object under a specific port.  All
> namespaces assigned to a group that doesn't have a configfs object
> for a given port are marked as inaccessible.
> 
> For all changes in ANA configuration the ANA change AEN is sent.  We
> only keep a global changecount instead of additional per-group
> changecounts to keep the implementation as simple as possible.
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/target/admin-cmd.c |   1 +
>  drivers/nvme/target/configfs.c  | 170
> +++++++++++++++++++++++++++++++- drivers/nvme/target/core.c      |
> 24 +++++ drivers/nvme/target/nvmet.h     |  29 +++++-
>  4 files changed, 220 insertions(+), 4 deletions(-)
> 

I'm not particularly keen of having the default ANA group _not_ being
present in configfs, but that can be fixed with a later patch.

Reviewed-by: Hannes Reinecke <hare at suse.com>

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 8/9] nvmet: support configuring additional ANA groups
  2018-06-04  6:42   ` Hannes Reinecke
@ 2018-06-04  6:53     ` Christoph Hellwig
  2018-06-04  9:48       ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-04  6:53 UTC (permalink / raw)


On Mon, Jun 04, 2018@08:42:50AM +0200, Hannes Reinecke wrote:
> I'm not particularly keen of having the default ANA group _not_ being
> present in configfs, but that can be fixed with a later patch.

Yes, I came to the conclusion that having it visible and even being
able to change state might be useful.  We just need to careful to
not allow removing it, but I think the SCSI target code has some
precedence for that.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 5/9] nvme: add ANA support
  2018-06-04  6:36   ` Hannes Reinecke
@ 2018-06-04  7:03     ` Christoph Hellwig
  2018-06-04  9:51       ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-04  7:03 UTC (permalink / raw)


On Mon, Jun 04, 2018@08:36:58AM +0200, Hannes Reinecke wrote:
> This suffers from the 'nvme connect' stall if the first detected
> namespace is inaccessible.

Yes.  This isn't really different from any other NVMe target that
constantly returns errors when you connect.

If we want to do this properly we need some block level (e.g. gendisk)
state that a given device is ready for I/O.  And then don't scan
partition tables if it isn't, and instead do the scan once it becomes
life.  In other words: this is something that should be handled
mostly at the block level.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 8/9] nvmet: support configuring additional ANA groups
  2018-06-04  6:53     ` Christoph Hellwig
@ 2018-06-04  9:48       ` Hannes Reinecke
  0 siblings, 0 replies; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04  9:48 UTC (permalink / raw)


On Mon, 4 Jun 2018 08:53:39 +0200
Christoph Hellwig <hch@lst.de> wrote:

> On Mon, Jun 04, 2018@08:42:50AM +0200, Hannes Reinecke wrote:
> > I'm not particularly keen of having the default ANA group _not_
> > being present in configfs, but that can be fixed with a later
> > patch.  
> 
> Yes, I came to the conclusion that having it visible and even being
> able to change state might be useful.  We just need to careful to
> not allow removing it, but I think the SCSI target code has some
> precedence for that.

No worries, I have a patch for that.
I'm currently rebasing my patchset (including ANATT support) and will
be posting it once testing has completed.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 5/9] nvme: add ANA support
  2018-06-04  7:03     ` Christoph Hellwig
@ 2018-06-04  9:51       ` Hannes Reinecke
  2018-06-04 12:31         ` Mike Snitzer
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04  9:51 UTC (permalink / raw)


On Mon, 4 Jun 2018 09:03:57 +0200
Christoph Hellwig <hch@lst.de> wrote:

> On Mon, Jun 04, 2018@08:36:58AM +0200, Hannes Reinecke wrote:
> > This suffers from the 'nvme connect' stall if the first detected
> > namespace is inaccessible.  
> 
> Yes.  This isn't really different from any other NVMe target that
> constantly returns errors when you connect.
> 
> If we want to do this properly we need some block level (e.g. gendisk)
> state that a given device is ready for I/O.  And then don't scan
> partition tables if it isn't, and instead do the scan once it becomes
> life.  In other words: this is something that should be handled
> mostly at the block level.

Okay; that was one option I figured out, but didn't want to mess with
gendisk unless explicitly requested.
I'll be looking into it.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 9/9] nvmet: make ANATT configurable
  2018-06-01  7:11 ` [PATCH 9/9] nvmet: make ANATT configurable Christoph Hellwig
@ 2018-06-04 10:21   ` Hannes Reinecke
  2018-06-04 12:11     ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04 10:21 UTC (permalink / raw)


On Fri,  1 Jun 2018 09:11:28 +0200
Christoph Hellwig <hch@lst.de> wrote:

> From: Hannes Reinecke <hare at suse.de>
> 
> Signed-off-by: Hannes Reinecke <hare at suse.com>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/target/admin-cmd.c |  2 +-
>  drivers/nvme/target/configfs.c  | 29 +++++++++++++++++++++++++++++
>  drivers/nvme/target/core.c      |  1 +
>  drivers/nvme/target/nvmet.h     |  2 ++
>  4 files changed, 33 insertions(+), 1 deletion(-)
> 

_Actually_ I would like to retract this patch.

I've been working on implementing ANA TT on the target, and found this
particular implementation unworkable.

The problem is that with the current implementation the ANA state is
pretty much tied to the _port_, which might be shared across several
subsystems. Consequently any state change might affect namespaces
on several subsystems, too.

But if the anatt value is tied to the subsystem we'd need to have
different timers/workqueue items per port and subsystem/namespace, which
doesn't really scale. Plus we're not giving anything away by making the
anatt per port, not per subsys.

Please?

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 2/9] nvme.h: add ANA definitions
  2018-06-01  7:11 ` [PATCH 2/9] nvme.h: add ANA definitions Christoph Hellwig
  2018-06-03 12:27   ` Sagi Grimberg
  2018-06-04  6:30   ` Hannes Reinecke
@ 2018-06-04 11:58   ` Johannes Thumshirn
  2 siblings, 0 replies; 43+ messages in thread
From: Johannes Thumshirn @ 2018-06-04 11:58 UTC (permalink / raw)


Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 3/9] nvme: add support for the log specific field
  2018-06-01  7:11 ` [PATCH 3/9] nvme: add support for the log specific field Christoph Hellwig
  2018-06-03 12:27   ` Sagi Grimberg
  2018-06-04  6:31   ` Hannes Reinecke
@ 2018-06-04 11:59   ` Johannes Thumshirn
  2 siblings, 0 replies; 43+ messages in thread
From: Johannes Thumshirn @ 2018-06-04 11:59 UTC (permalink / raw)


Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 4/9] nvme: always failover on path or transport errors
  2018-06-04  6:29     ` Christoph Hellwig
@ 2018-06-04 12:01       ` Johannes Thumshirn
  2018-06-04 12:23       ` Mike Snitzer
  1 sibling, 0 replies; 43+ messages in thread
From: Johannes Thumshirn @ 2018-06-04 12:01 UTC (permalink / raw)


On Mon, Jun 04, 2018@08:29:23AM +0200, Christoph Hellwig wrote:
> But we could probably add a BLK_STS_PATH and I'll map it to that.

Sound like a good idea to me.

Anyways,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 6/9] nvmet: track and limit the number of namespaces per subsystem
  2018-06-01  7:11 ` [PATCH 6/9] nvmet: track and limit the number of namespaces per subsystem Christoph Hellwig
  2018-06-04  6:35   ` Hannes Reinecke
@ 2018-06-04 12:04   ` Johannes Thumshirn
  1 sibling, 0 replies; 43+ messages in thread
From: Johannes Thumshirn @ 2018-06-04 12:04 UTC (permalink / raw)


Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 9/9] nvmet: make ANATT configurable
  2018-06-04 10:21   ` Hannes Reinecke
@ 2018-06-04 12:11     ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-04 12:11 UTC (permalink / raw)


On Mon, Jun 04, 2018@12:21:02PM +0200, Hannes Reinecke wrote:
> _Actually_ I would like to retract this patch.
> 
> I've been working on implementing ANA TT on the target, and found this
> particular implementation unworkable.
> 
> The problem is that with the current implementation the ANA state is
> pretty much tied to the _port_, which might be shared across several
> subsystems. Consequently any state change might affect namespaces
> on several subsystems, too.
> 
> But if the anatt value is tied to the subsystem we'd need to have
> different timers/workqueue items per port and subsystem/namespace, which
> doesn't really scale. Plus we're not giving anything away by making the
> anatt per port, not per subsys.

Well.  In the end the ANATT is a property of whatever cluster manager
is used to manage the backend connections.  Neither subsystem nor port
really map to that perfectly.  In fact we probably just want to set
it globally.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 4/9] nvme: always failover on path or transport errors
  2018-06-04  6:29     ` Christoph Hellwig
  2018-06-04 12:01       ` Johannes Thumshirn
@ 2018-06-04 12:23       ` Mike Snitzer
  1 sibling, 0 replies; 43+ messages in thread
From: Mike Snitzer @ 2018-06-04 12:23 UTC (permalink / raw)


On Mon, Jun 04 2018 at  2:29P -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Fri, Jun 01, 2018@11:27:11AM -0400, Mike Snitzer wrote:
> > > @@ -59,6 +59,9 @@ bool nvme_req_needs_failover(struct request *req, blk_status_t error)
> > >  {
> > >  	if (!(req->cmd_flags & REQ_NVME_MPATH))
> > >  		return false;
> > > +	/* always failover for path related errors */
> > > +	if ((nvme_req(req)->status & 0x700) == 0x300)
> > > +		return true;
> > >  	return blk_path_error(error);
> > >  }
> > 
> > Shouldn't this go into nvme_error_status(), with a new NVME_SC if
> > needed, so that blk_path_error() just works?
> 
> NVME_SC values are defined in the NVMe spec.  The check is intended
> as a catchall for multiple of theses values, as we've defined a whole
> extensible class for path errors.

Yes, so adding NVME_SC_ANA_INACCESSIBLE, NVME_SC_ANA_PERSISTENT_LOSS,
and NVME_SC_ANA_TRANSITION to nvme_error_status().
 
> But we could probably add a BLK_STS_PATH and I'll map it to that.

And mapping them to BLK_STS_PATH and updating blk_path_error().
Sounds good.

Thanks,
Mike

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 7/9] nvmet: add minimal ANA support
  2018-06-01  7:11 ` [PATCH 7/9] nvmet: add minimal ANA support Christoph Hellwig
  2018-06-04  6:41   ` Hannes Reinecke
@ 2018-06-04 12:25   ` Johannes Thumshirn
  2018-06-04 12:55     ` Christoph Hellwig
  1 sibling, 1 reply; 43+ messages in thread
From: Johannes Thumshirn @ 2018-06-04 12:25 UTC (permalink / raw)


On Fri, Jun 01, 2018@09:11:26AM +0200, Christoph Hellwig wrote:
> Add support for Asynchronous Namespace Access as specified in NVMe 1.3
> TP 4004.
> 
> Just add a default ANA group 1 that is optimized on all ports.  This is
> (and will remain) the default assignment for any namespace not epxlicitly
> assigned to another ANA group.  The ANA state can be manually changed
> through the configfs interface, including the change state.
> 
> Includes fixes and improvements from Hannes Reinecke.

Hmm I somehow miss the increment of nvmet_ana_chgcnt, here. Am I just
blind or is this intended?

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 5/9] nvme: add ANA support
  2018-06-04  9:51       ` Hannes Reinecke
@ 2018-06-04 12:31         ` Mike Snitzer
  2018-06-04 13:37           ` Hannes Reinecke
  0 siblings, 1 reply; 43+ messages in thread
From: Mike Snitzer @ 2018-06-04 12:31 UTC (permalink / raw)


On Mon, Jun 04 2018 at  5:51P -0400,
Hannes Reinecke <hare@suse.de> wrote:

> On Mon, 4 Jun 2018 09:03:57 +0200
> Christoph Hellwig <hch@lst.de> wrote:
> 
> > On Mon, Jun 04, 2018@08:36:58AM +0200, Hannes Reinecke wrote:
> > > This suffers from the 'nvme connect' stall if the first detected
> > > namespace is inaccessible.  
> > 
> > Yes.  This isn't really different from any other NVMe target that
> > constantly returns errors when you connect.
> > 
> > If we want to do this properly we need some block level (e.g. gendisk)
> > state that a given device is ready for I/O.  And then don't scan
> > partition tables if it isn't, and instead do the scan once it becomes
> > life.  In other words: this is something that should be handled
> > mostly at the block level.
> 
> Okay; that was one option I figured out, but didn't want to mess with
> gendisk unless explicitly requested.
> I'll be looking into it.

Pretty sure this will bring about potentially trying discussion with
udev folks.  The same kinds of requirements have been around in lvm for
a while (and by lvm I mean it brokers udev readiness while it is
activating particular dm targets whose sub devices shouldn't be scanned
yet or at all).

Mike

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 7/9] nvmet: add minimal ANA support
  2018-06-04 12:25   ` Johannes Thumshirn
@ 2018-06-04 12:55     ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-04 12:55 UTC (permalink / raw)


On Mon, Jun 04, 2018@02:25:07PM +0200, Johannes Thumshirn wrote:
> On Fri, Jun 01, 2018@09:11:26AM +0200, Christoph Hellwig wrote:
> > Add support for Asynchronous Namespace Access as specified in NVMe 1.3
> > TP 4004.
> > 
> > Just add a default ANA group 1 that is optimized on all ports.  This is
> > (and will remain) the default assignment for any namespace not epxlicitly
> > assigned to another ANA group.  The ANA state can be manually changed
> > through the configfs interface, including the change state.
> > 
> > Includes fixes and improvements from Hannes Reinecke.
> 
> Hmm I somehow miss the increment of nvmet_ana_chgcnt, here. Am I just
> blind or is this intended?

It is inteded.  As of this patch we report an immutable ANA state,
so nvmet_ana_chgcnt will never change.

Dynamic configuration is only added in the next patch, which also
(hopefully) increments nvmet_ana_chgcnt in all the right places.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 5/9] nvme: add ANA support
  2018-06-04 12:31         ` Mike Snitzer
@ 2018-06-04 13:37           ` Hannes Reinecke
  0 siblings, 0 replies; 43+ messages in thread
From: Hannes Reinecke @ 2018-06-04 13:37 UTC (permalink / raw)


On Mon, 4 Jun 2018 08:31:35 -0400
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Jun 04 2018 at  5:51P -0400,
> Hannes Reinecke <hare@suse.de> wrote:
> 
> > On Mon, 4 Jun 2018 09:03:57 +0200
> > Christoph Hellwig <hch@lst.de> wrote:
> >   
> > > On Mon, Jun 04, 2018 at 08:36:58AM +0200, Hannes Reinecke wrote:  
> > > > This suffers from the 'nvme connect' stall if the first detected
> > > > namespace is inaccessible.    
> > > 
> > > Yes.  This isn't really different from any other NVMe target that
> > > constantly returns errors when you connect.
> > > 
> > > If we want to do this properly we need some block level (e.g.
> > > gendisk) state that a given device is ready for I/O.  And then
> > > don't scan partition tables if it isn't, and instead do the scan
> > > once it becomes life.  In other words: this is something that
> > > should be handled mostly at the block level.  
> > 
> > Okay; that was one option I figured out, but didn't want to mess
> > with gendisk unless explicitly requested.
> > I'll be looking into it.  
> 
> Pretty sure this will bring about potentially trying discussion with
> udev folks.  The same kinds of requirements have been around in lvm
> for a while (and by lvm I mean it brokers udev readiness while it is
> activating particular dm targets whose sub devices shouldn't be
> scanned yet or at all).
> 
One _can_ work around that issue by setting the GENHD_FL_NO_PART_SCAN
flag correctly; that will avoid the hang as the partitions are never
scanned.
Tricky bit is to unset that flag and kick off revalidation whenever a
new path is connected.

That will get us over the initial hurdle (ie 'nvme connect' will
succeed), but we'll potentially ending up with hanging udev processes
trying to call 'blkid' on those devices.

But that's a different story.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 5/9] nvme: add ANA support
  2018-06-01  7:11 ` [PATCH 5/9] nvme: add ANA support Christoph Hellwig
  2018-06-04  6:36   ` Hannes Reinecke
@ 2018-06-06 12:01   ` Popuri, Sriram
  2018-06-06 12:13     ` Christoph Hellwig
  1 sibling, 1 reply; 43+ messages in thread
From: Popuri, Sriram @ 2018-06-06 12:01 UTC (permalink / raw)


+	ctrl->ana_state = kcalloc(ctrl->anagrpmax, sizeof(*ctrl->ana_state),
+			GFP_KERNEL);

You are assuming anagrpmax is a small value. I guess it can go upto ffffffffh.

Regards,
~Sriram

-----Original Message-----
From: Linux-nvme <linux-nvme-bounces@lists.infradead.org> On Behalf Of Christoph Hellwig
Sent: Friday, June 1, 2018 12:41 PM
To: linux-nvme at lists.infradead.org
Cc: Hannes Reinecke <hare at suse.de>
Subject: [PATCH 5/9] nvme: add ANA support

Add support for Asynchronous Namespace Access as specified in NVMe 1.3 TP 4004.  With ANA each namespace attached to a controller belongs to an ANA group that describes the characteristics of accessing the namespaces through this controller.  In the optimized and non-optimized states namespaces can be accessed regularly, although in a multi-pathing environment we should always prefer to access a namespace through a controller where an optimized relationship exists.  Namespaces in Inaccessible, Permanent-Loss or Change state for a given controller should not be accessed.

We keep a simple per-controller array of ANA states which is indexed by the ANA group ID specified in the namespace.  The states are updated through reading the ANA log page, which is read once during controller initialization, and whenever the ANA change notice AEN is received, or when one of the ANA specific status codes that signal a state change is received on a command.

There currently isn't any support for the ANA transition timeout yet.

Includes fixes and improvements from Hannes Reinecke.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c      |  30 ++++-
 drivers/nvme/host/multipath.c | 210 +++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h      |  28 +++++
 3 files changed, 261 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d67d29e4c5ff..4a7fd138e25f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1042,7 +1042,7 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)  EXPORT_SYMBOL_GPL(nvme_set_queue_count);
 
 #define NVME_AEN_SUPPORTED \
-	(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT)
+	(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT | NVME_AEN_CFG_ANA_CHANGE)
 
 static void nvme_enable_aen(struct nvme_ctrl *ctrl)  { @@ -1473,6 +1473,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	ns->lba_shift = id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ds;
 	if (ns->lba_shift == 0)
 		ns->lba_shift = 9;
+	ns->anagrpid = le32_to_cpu(id->anagrpid);
 	ns->noiob = le16_to_cpu(id->noiob);
 	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms); @@ -2387,6 +2388,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	nvme_set_queue_limits(ctrl, ctrl->admin_q);
 	ctrl->sgls = le32_to_cpu(id->sgls);
 	ctrl->kas = le16_to_cpu(id->kas);
+	ctrl->max_namespaces = le32_to_cpu(id->mnan);
+	ctrl->anacap = id->anacap;
+	ctrl->nanagrpid = le32_to_cpu(id->nanagrpid);
+	ctrl->anagrpmax = le32_to_cpu(id->anagrpmax);
 
 	if (id->rtd3e) {
 		/* us -> s */
@@ -2465,6 +2470,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	if (ret < 0)
 		return ret;
 
+	ret = nvme_configure_ana(ctrl);
+	if (ret < 0)
+		return ret;
+
 	ctrl->identified = true;
 
 	return 0;
@@ -2666,6 +2675,10 @@ static struct attribute *nvme_ns_id_attrs[] = {
 	&dev_attr_nguid.attr,
 	&dev_attr_eui.attr,
 	&dev_attr_nsid.attr,
+#ifdef CONFIG_NVME_MULTIPATH
+	&dev_attr_ana_grpid.attr,
+	&dev_attr_ana_state.attr,
+#endif
 	NULL,
 };
 
@@ -2688,6 +2701,14 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
 		if (!memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
 			return 0;
 	}
+#ifdef CONFIG_NVME_MULTIPATH
+	if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
+		if (dev_to_disk(dev)->fops != &nvme_fops) /* per-path attr */
+			return 0;
+		if (!nvme_get_ns_from_dev(dev)->anagrpid)
+			return 0;
+	}
+#endif
 	return a->mode;
 }
 
@@ -3391,6 +3412,11 @@ static void nvme_handle_aen_notice(struct nvme_ctrl *ctrl, u32 result)
 	case NVME_AER_NOTICE_FW_ACT_STARTING:
 		queue_work(nvme_wq, &ctrl->fw_act_work);
 		break;
+	case NVME_AER_NOTICE_ANA:
+		if (WARN_ON_ONCE(!ctrl->ana_log_buf))
+			break;
+		queue_work(nvme_wq, &ctrl->ana_work);
+		break;
 	default:
 		dev_warn(ctrl->device, "async event result %08x\n", result);
 	}
@@ -3426,6 +3452,7 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 	nvme_stop_keep_alive(ctrl);
 	flush_work(&ctrl->async_event_work);
 	flush_work(&ctrl->scan_work);
+	cancel_work_sync(&ctrl->ana_work);
 	cancel_work_sync(&ctrl->fw_act_work);
 	if (ctrl->ops->stop_ctrl)
 		ctrl->ops->stop_ctrl(ctrl);
@@ -3460,6 +3487,7 @@ static void nvme_free_ctrl(struct device *dev)
 
 	ida_simple_remove(&nvme_instance_ida, ctrl->instance);
 	kfree(ctrl->effects);
+	nvme_deconfigure_ana(ctrl);
 
 	if (subsys) {
 		mutex_lock(&subsys->lock);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 9412e5138d65..1a8791340862 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2017 Christoph Hellwig.
+ * Copyright (c) 2017-2018 Christoph Hellwig.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License, @@ -41,6 +41,11 @@ void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 	}
 }
 
+static void nvme_update_ana_state(struct nvme_ns *ns, enum 
+nvme_ana_state state) {
+	WRITE_ONCE(ns->ctrl->ana_state[ns->anagrpid], state); }
+
 void nvme_failover_req(struct request *req)  {
 	struct nvme_ns *ns = req->q->queuedata; @@ -51,7 +56,31 @@ void nvme_failover_req(struct request *req)
 	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
 	blk_mq_end_request(req, 0);
 
-	nvme_reset_ctrl(ns->ctrl);
+	/*
+	 * Reset the controller for any non-ANA error as we don't know what
+	 * caused the error:
+	 */
+	switch (nvme_req(req)->status & 0x7ff) {
+	case NVME_SC_ANA_TRANSITION:
+		/*
+		 * XXX: We should verify the controller doesn't die on during
+		 * the transition.  But that means we per-group timeout from
+		 * when we first hit the change state, so this won't be
+		 * entirely trivial..
+		 */
+		nvme_update_ana_state(ns, NVME_ANA_CHANGE);
+		break;
+	case NVME_SC_ANA_PERSISTENT_LOSS:
+		nvme_update_ana_state(ns, NVME_ANA_PERSISTENT_LOSS);
+		break;
+	case NVME_SC_ANA_INACCESSIBLE:
+		nvme_update_ana_state(ns, NVME_ANA_INACCESSIBLE);
+		break;
+	default:
+		nvme_reset_ctrl(ns->ctrl);
+		break;
+	}
+
 	kblockd_schedule_work(&ns->head->requeue_work);
 }
 
@@ -77,12 +106,32 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 	up_read(&ctrl->namespaces_rwsem);
 }
 
-static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head)
+static inline enum nvme_ana_state nvme_ns_ana_state(struct nvme_ns *ns) 
+{
+	if (!nvme_ctrl_has_ana(ns->ctrl))
+		return NVME_ANA_OPTIMIZED;
+	if (WARN_ON_ONCE(ns->anagrpid > ns->ctrl->anagrpmax))
+		return 0;
+	return READ_ONCE(ns->ctrl->ana_state[ns->anagrpid]);
+}
+
+static const char *nvme_ana_state_names[] = {
+	[0]				= "invalid state",
+	[NVME_ANA_OPTIMIZED]		= "optimized",
+	[NVME_ANA_NONOPTIMIZED]		= "non-optimized",
+	[NVME_ANA_INACCESSIBLE]		= "inaccessible",
+	[NVME_ANA_PERSISTENT_LOSS]	= "persistent-loss",
+	[NVME_ANA_CHANGE]		= "change",
+};
+
+static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head,
+		u8 ana_state)
 {
 	struct nvme_ns *ns;
 
 	list_for_each_entry_rcu(ns, &head->list, siblings) {
-		if (ns->ctrl->state == NVME_CTRL_LIVE) {
+		if (ns->ctrl->state == NVME_CTRL_LIVE &&
+		    nvme_ns_ana_state(ns) == ana_state) {
 			rcu_assign_pointer(head->current_path, ns);
 			return ns;
 		}
@@ -95,8 +144,14 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)  {
 	struct nvme_ns *ns = srcu_dereference(head->current_path, &head->srcu);
 
-	if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
-		ns = __nvme_find_path(head);
+	if (likely(ns && ns->ctrl->state == NVME_CTRL_LIVE &&
+			nvme_ns_ana_state(ns) == NVME_ANA_OPTIMIZED))
+		return ns;
+
+	ns = __nvme_find_path(head, NVME_ANA_OPTIMIZED);
+	if (!ns)
+		ns = __nvme_find_path(head, NVME_ANA_NONOPTIMIZED);
+	/* XXX: try an inaccessible path as last resort per 8.18.3.3 */
 	return ns;
 }
 
@@ -249,3 +304,146 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head)
 	blk_cleanup_queue(head->disk->queue);
 	put_disk(head->disk);
 }
+
+static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool 
+groups_only) {
+	void *base = ctrl->ana_log_buf;
+	size_t offset = sizeof(struct nvme_ana_rsp_hdr);
+	int error, i;
+
+	/*
+	 * If anagrpid never changes we don't need to process the namespace
+	 * lists.
+	 */
+	if (ctrl->anacap & (1 << 6))
+		groups_only = true;
+
+	error = nvme_get_log_ext(ctrl, NULL, NVME_LOG_ANA,
+			groups_only ? NVME_ANA_LOG_RGO : 0,
+			ctrl->ana_log_buf, ctrl->ana_log_size, 0);
+	if (error) {
+		dev_warn(ctrl->device, "Failed to get ANA log: %d\n", error);
+		return error;
+	}
+
+	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
+		struct nvme_ana_group_desc *desc = base + offset;
+		u32 grpid = le32_to_cpu(desc->grpid);
+		u32 nr_nsids = le32_to_cpu(desc->nnsids), n = 0;
+		size_t nsid_buf_size = nr_nsids * sizeof(__le32);
+		struct nvme_ns *ns;
+
+		if (WARN_ON_ONCE(grpid == 0))
+			return -EINVAL;
+		if (WARN_ON_ONCE(grpid > ctrl->anagrpmax))
+			return -EINVAL;
+		if (WARN_ON_ONCE(desc->state == 0))
+			return -EINVAL;
+		if (WARN_ON_ONCE(desc->state > NVME_ANA_CHANGE))
+			return -EINVAL;
+
+		dev_info(ctrl->device, "ANA group %d: %s.\n",
+				grpid, nvme_ana_state_names[desc->state]);
+		WRITE_ONCE(ctrl->ana_state[grpid], desc->state);
+		offset += sizeof(*desc);
+		if (!nr_nsids)
+			continue;
+
+		if (WARN_ON_ONCE(groups_only))
+			return -EINVAL;
+		if (WARN_ON_ONCE(offset > ctrl->ana_log_size - nsid_buf_size))
+			return -EINVAL;
+
+		down_write(&ctrl->namespaces_rwsem);
+		list_for_each_entry(ns, &ctrl->namespaces, list) {
+			u32 nsid = le32_to_cpu(desc->nsids[n]);
+
+			if (ns->head->ns_id != nsid)
+				continue;
+			ns->anagrpid = grpid;
+			if (++n == nr_nsids)
+				break;
+		}
+		up_write(&ctrl->namespaces_rwsem);
+		WARN_ON_ONCE(n < nr_nsids);
+
+		offset += nsid_buf_size;
+		if (WARN_ON_ONCE(offset > ctrl->ana_log_size - sizeof(*desc)))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void nvme_ana_work(struct work_struct *work) {
+	struct nvme_ctrl *ctrl = container_of(work, struct nvme_ctrl, 
+ana_work);
+
+	nvme_process_ana_log(ctrl, false);
+	nvme_kick_requeue_lists(ctrl);
+}
+
+int nvme_configure_ana(struct nvme_ctrl *ctrl) {
+	int error;
+
+	if (!nvme_ctrl_has_ana(ctrl))
+		return 0;
+
+	ctrl->ana_log_size = sizeof(struct nvme_ana_rsp_hdr) +
+		ctrl->nanagrpid * sizeof(struct nvme_ana_group_desc);
+	if (!(ctrl->anacap & (1 << 6)))
+		ctrl->ana_log_size += ctrl->max_namespaces * sizeof(__le32);
+
+	if (ctrl->ana_log_size > ctrl->max_hw_sectors << SECTOR_SHIFT) {
+		dev_err(ctrl->device,
+			"ANA log page size (%zd) larger than MDTS (%d).\n",
+			ctrl->ana_log_size,
+			ctrl->max_hw_sectors << SECTOR_SHIFT);
+		dev_err(ctrl->device, "disabling ANA support.\n");
+		return 0;
+	}
+
+	INIT_WORK(&ctrl->ana_work, nvme_ana_work);
+	ctrl->ana_state = kcalloc(ctrl->anagrpmax, sizeof(*ctrl->ana_state),
+			GFP_KERNEL);
+	if (!ctrl->ana_state)
+		return -ENOMEM;
+
+	ctrl->ana_log_buf = kmalloc(ctrl->ana_log_size, GFP_KERNEL);
+	if (!ctrl->ana_log_buf)
+		goto out_free_ana_state;
+
+	error = nvme_process_ana_log(ctrl, true);
+	if (error)
+		goto out_free_ana_log_buf;
+	return 0;
+out_free_ana_log_buf:
+	kfree(ctrl->ana_log_buf);
+out_free_ana_state:
+	kfree(ctrl->ana_state);
+	return -ENOMEM;
+}
+
+void nvme_deconfigure_ana(struct nvme_ctrl *ctrl) {
+	kfree(ctrl->ana_log_buf);
+	kfree(ctrl->ana_state);
+}
+
+static ssize_t ana_grpid_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	return sprintf(buf, "%d\n", nvme_get_ns_from_dev(dev)->anagrpid);
+}
+DEVICE_ATTR_RO(ana_grpid);
+
+static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+	enum nvme_ana_state state = nvme_ns_ana_state(ns);
+
+	return sprintf(buf, "%s\n", nvme_ana_state_names[state]); } 
+DEVICE_ATTR_RO(ana_state);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index a9695b8645aa..4a29c736370f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -169,6 +169,7 @@ struct nvme_ctrl {
 	u16 oacs;
 	u16 nssa;
 	u16 nr_streams;
+	u32 max_namespaces;
 	atomic_t abort_limit;
 	u8 vwc;
 	u32 vs;
@@ -192,6 +193,15 @@ struct nvme_ctrl {
 #define EVENT_NS_CHANGED		(1 << 0)
 	unsigned long events;
 
+	/* asymmetric namespace access: */
+	u8 anacap;
+	u32 anagrpmax;
+	u32 nanagrpid;
+	enum nvme_ana_state *ana_state;
+	size_t ana_log_size;
+	struct nvme_ana_rsp_hdr *ana_log_buf;
+	struct work_struct ana_work;
+
 	/* Power saving configuration */
 	u64 ps_max_latency_us;
 	bool apst_enabled;
@@ -297,6 +307,7 @@ struct nvme_ns {
 #define NVME_NS_REMOVING 0
 #define NVME_NS_DEAD     1
 	u16 noiob;
+	u32 anagrpid;
 
 #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
 	struct nvme_fault_inject fault_inject; @@ -438,6 +449,11 @@ int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,  extern const struct attribute_group nvme_ns_id_attr_group;  extern const struct block_device_operations nvme_ns_head_ops;
 
+static inline bool nvme_ctrl_has_ana(struct nvme_ctrl *ctrl) {
+	return ctrl->subsys->cmic & (1 << 3);
+}
+
 #ifdef CONFIG_NVME_MULTIPATH
 void nvme_set_disk_name(char *disk_name, struct nvme_ns *ns,
 			struct nvme_ctrl *ctrl, int *flags); @@ -447,6 +463,8 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl);  int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl,struct nvme_ns_head *head);  void nvme_mpath_add_disk(struct nvme_ns_head *head);  void nvme_mpath_remove_disk(struct nvme_ns_head *head);
+int nvme_configure_ana(struct nvme_ctrl *ctrl); void 
+nvme_deconfigure_ana(struct nvme_ctrl *ctrl);
 
 static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)  { @@ -465,6 +483,9 @@ static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
 		kblockd_schedule_work(&head->requeue_work);
 }
 
+extern struct device_attribute dev_attr_ana_grpid; extern struct 
+device_attribute dev_attr_ana_state;
+
 #else
 /*
  * Without the multipath code enabled, multiple controller per subsystems are @@ -504,6 +525,13 @@ static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)  static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)  {  }
+static inline int nvme_configure_ana(struct nvme_ctrl *ctrl) {
+	return 0;
+}
+static inline void nvme_deconfigure_ana(struct nvme_ctrl *ctrl) { }
 #endif /* CONFIG_NVME_MULTIPATH */
 
 #ifdef CONFIG_NVM
--
2.17.0


_______________________________________________
Linux-nvme mailing list
Linux-nvme at lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 5/9] nvme: add ANA support
  2018-06-06 12:01   ` Popuri, Sriram
@ 2018-06-06 12:13     ` Christoph Hellwig
  2018-06-06 12:27       ` Popuri, Sriram
  0 siblings, 1 reply; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-06 12:13 UTC (permalink / raw)


On Wed, Jun 06, 2018@12:01:04PM +0000, Popuri, Sriram wrote:
> +	ctrl->ana_state = kcalloc(ctrl->anagrpmax, sizeof(*ctrl->ana_state),
> +			GFP_KERNEL);
> 
> You are assuming anagrpmax is a small value. I guess it can go upto ffffffffh.

In theory it could.  In practice that is going to get you very large
data transfers for the ANA log, which we can't chunk.

At some point we might want to look into using the page allocator here,
but for now I'd like to keep it simple.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 5/9] nvme: add ANA support
  2018-06-06 12:13     ` Christoph Hellwig
@ 2018-06-06 12:27       ` Popuri, Sriram
  2018-06-06 12:50         ` Christoph Hellwig
  0 siblings, 1 reply; 43+ messages in thread
From: Popuri, Sriram @ 2018-06-06 12:27 UTC (permalink / raw)


" In theory it could.  In practice that is going to get you very large data transfers for the ANA log, which we can't chunk."
I don't think so. ANA log is based on num ana grps. NANAGRPID can be small, but ANAGRPMAX could be large if a sparse id space is chosen.

Regards,
~Sriram

-----Original Message-----
From: Christoph Hellwig <hch@lst.de> 
Sent: Wednesday, June 6, 2018 5:43 PM
To: Popuri, Sriram <Sriram.Popuri at netapp.com>
Cc: Christoph Hellwig <hch at lst.de>; linux-nvme at lists.infradead.org; Hannes Reinecke <hare at suse.de>; Meneghini, John <John.Meneghini at netapp.com>; Knight, Frederick <Frederick.Knight at netapp.com>
Subject: Re: [PATCH 5/9] nvme: add ANA support

On Wed, Jun 06, 2018@12:01:04PM +0000, Popuri, Sriram wrote:
> +	ctrl->ana_state = kcalloc(ctrl->anagrpmax, sizeof(*ctrl->ana_state),
> +			GFP_KERNEL);
> 
> You are assuming anagrpmax is a small value. I guess it can go upto ffffffffh.

In theory it could.  In practice that is going to get you very large data transfers for the ANA log, which we can't chunk.

At some point we might want to look into using the page allocator here, but for now I'd like to keep it simple.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH 5/9] nvme: add ANA support
  2018-06-06 12:27       ` Popuri, Sriram
@ 2018-06-06 12:50         ` Christoph Hellwig
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Hellwig @ 2018-06-06 12:50 UTC (permalink / raw)


On Wed, Jun 06, 2018@12:27:43PM +0000, Popuri, Sriram wrote:
> " In theory it could.  In practice that is going to get you very large data transfers for the ANA log, which we can't chunk."
> I don't think so. ANA log is based on num ana grps. NANAGRPID can be small, but ANAGRPMAX could be large if a sparse id space is chosen.

Still the same issue here - if we add another indirection to looking
up the ana group this is really going to mess up the fast path where
we always need to do a state check.  So for now I want to optimize
for sensible targets.  If we really see targets eventually that say
only use the high bits we'll have to find a workaround.

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2018-06-06 12:50 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-01  7:11 draft ANA support v2 Christoph Hellwig
2018-06-01  7:11 ` [PATCH 1/9] nvme: don't hold nvmf_transports_rwsem for more than transport lookups Christoph Hellwig
2018-06-03 12:26   ` Sagi Grimberg
2018-06-04  6:30   ` Hannes Reinecke
2018-06-01  7:11 ` [PATCH 2/9] nvme.h: add ANA definitions Christoph Hellwig
2018-06-03 12:27   ` Sagi Grimberg
2018-06-04  6:30   ` Hannes Reinecke
2018-06-04 11:58   ` Johannes Thumshirn
2018-06-01  7:11 ` [PATCH 3/9] nvme: add support for the log specific field Christoph Hellwig
2018-06-03 12:27   ` Sagi Grimberg
2018-06-04  6:31   ` Hannes Reinecke
2018-06-04 11:59   ` Johannes Thumshirn
2018-06-01  7:11 ` [PATCH 4/9] nvme: always failover on path or transport errors Christoph Hellwig
2018-06-01 15:27   ` Mike Snitzer
2018-06-04  6:29     ` Christoph Hellwig
2018-06-04 12:01       ` Johannes Thumshirn
2018-06-04 12:23       ` Mike Snitzer
2018-06-03 12:28   ` Sagi Grimberg
2018-06-04  6:32   ` Hannes Reinecke
2018-06-01  7:11 ` [PATCH 5/9] nvme: add ANA support Christoph Hellwig
2018-06-04  6:36   ` Hannes Reinecke
2018-06-04  7:03     ` Christoph Hellwig
2018-06-04  9:51       ` Hannes Reinecke
2018-06-04 12:31         ` Mike Snitzer
2018-06-04 13:37           ` Hannes Reinecke
2018-06-06 12:01   ` Popuri, Sriram
2018-06-06 12:13     ` Christoph Hellwig
2018-06-06 12:27       ` Popuri, Sriram
2018-06-06 12:50         ` Christoph Hellwig
2018-06-01  7:11 ` [PATCH 6/9] nvmet: track and limit the number of namespaces per subsystem Christoph Hellwig
2018-06-04  6:35   ` Hannes Reinecke
2018-06-04 12:04   ` Johannes Thumshirn
2018-06-01  7:11 ` [PATCH 7/9] nvmet: add minimal ANA support Christoph Hellwig
2018-06-04  6:41   ` Hannes Reinecke
2018-06-04 12:25   ` Johannes Thumshirn
2018-06-04 12:55     ` Christoph Hellwig
2018-06-01  7:11 ` [PATCH 8/9] nvmet: support configuring additional ANA groups Christoph Hellwig
2018-06-04  6:42   ` Hannes Reinecke
2018-06-04  6:53     ` Christoph Hellwig
2018-06-04  9:48       ` Hannes Reinecke
2018-06-01  7:11 ` [PATCH 9/9] nvmet: make ANATT configurable Christoph Hellwig
2018-06-04 10:21   ` Hannes Reinecke
2018-06-04 12:11     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.