* [PATCHv2 00/11] nvme: ANA support
@ 2018-05-22  9:09 Hannes Reinecke
  2018-05-22  9:09 ` [PATCHv2 01/11] nvme.h: untangle AEN notice definitions Hannes Reinecke
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:09 UTC (permalink / raw)


Hi all,

here is the second round of adding ANA support to the nvme subsystem.
It's the combined patchset from hch and me, and I've incorporated all
review comments I've received so far.

There are two parts to it:
The first patches update the AEN configuration support, as ANA relies on
AENs being sent, so we had better make sure they are enabled.
The second part is the ANA support proper.

As usual, comments and reviews are welcome.

Christoph Hellwig (8):
  nvme.h: untangle AEN notice definitions
  nvmet: refactor AER handling
  nvme.h: add ANA definitions
  nvme: add support for the log specific field
  nvme: always failover on path or transport errors
  nvme: add ANA support
  nvmet: add a new nvmet_zero_sgl helper
  nvmet: split log page implementation

Hannes Reinecke (3):
  nvme: submit AEN event configuration on startup
  nvmet: Add AEN configuration support
  nvmet: ANA support

 drivers/nvme/host/core.c        | 390 ++++++++++++++++++++++++++++++++++++----
 drivers/nvme/host/lightnvm.c    |   2 +-
 drivers/nvme/host/multipath.c   |  43 ++++-
 drivers/nvme/host/nvme.h        |  20 ++-
 drivers/nvme/target/admin-cmd.c | 247 +++++++++++++++++++------
 drivers/nvme/target/configfs.c  | 381 +++++++++++++++++++++++++++++++++++++++
 drivers/nvme/target/core.c      | 120 +++++++++++--
 drivers/nvme/target/discovery.c |  49 +++--
 drivers/nvme/target/io-cmd.c    |  36 ++++
 drivers/nvme/target/nvmet.h     |  52 ++++++
 include/linux/nvme.h            |  65 ++++++-
 11 files changed, 1275 insertions(+), 130 deletions(-)

-- 
2.12.3

* [PATCHv2 01/11] nvme.h: untangle AEN notice definitions
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
@ 2018-05-22  9:09 ` Hannes Reinecke
  2018-05-22  9:09 ` [PATCHv2 02/11] nvme: submit AEN event configuration on startup Hannes Reinecke
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:09 UTC (permalink / raw)


From: Christoph Hellwig <hch@lst.de>

Stop including the event type in the definitions for the notice type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
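For reference (an illustration, not part of the patch): per NVMe 1.3 the AER
completion result dword carries the event type in bits 2:0, the event
information in bits 15:8 and the log page identifier in bits 23:16, so the
decoding now happens in two stages, roughly:

	u32 result = le32_to_cpu(res->u32);
	u8 type = result & 0x7;			/* NVME_AER_ERROR/_SMART/_NOTICE/... */
	u8 info = (result >> 8) & 0xff;		/* NVME_AER_NOTICE_* if type is NOTICE */
	u8 log_page = (result >> 16) & 0xff;	/* log page to read for the details */
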
 drivers/nvme/host/core.c | 30 ++++++++++++++++++------------
 include/linux/nvme.h     |  8 ++++++--
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 79d49f16d991..fd71daa92d6a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3343,6 +3343,21 @@ static void nvme_fw_act_work(struct work_struct *work)
 	nvme_get_fw_slot_info(ctrl);
 }
 
+static void nvme_handle_aen_notice(struct nvme_ctrl *ctrl, u32 result)
+{
+	switch ((result & 0xff00) >> 8) {
+	case NVME_AER_NOTICE_NS_CHANGED:
+		dev_info(ctrl->device, "rescanning\n");
+		nvme_queue_scan(ctrl);
+		break;
+	case NVME_AER_NOTICE_FW_ACT_STARTING:
+		queue_work(nvme_wq, &ctrl->fw_act_work);
+		break;
+	default:
+		dev_warn(ctrl->device, "async event result %08x\n", result);
+	}
+}
+
 void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 		union nvme_result *res)
 {
@@ -3352,6 +3367,9 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 		return;
 
 	switch (result & 0x7) {
+	case NVME_AER_NOTICE:
+		nvme_handle_aen_notice(ctrl, result);
+		break;
 	case NVME_AER_ERROR:
 	case NVME_AER_SMART:
 	case NVME_AER_CSS:
@@ -3361,18 +3379,6 @@ void nvme_complete_async_event(struct nvme_ctrl *ctrl, __le16 status,
 	default:
 		break;
 	}
-
-	switch (result & 0xff07) {
-	case NVME_AER_NOTICE_NS_CHANGED:
-		dev_info(ctrl->device, "rescanning\n");
-		nvme_queue_scan(ctrl);
-		break;
-	case NVME_AER_NOTICE_FW_ACT_STARTING:
-		queue_work(nvme_wq, &ctrl->fw_act_work);
-		break;
-	default:
-		dev_warn(ctrl->device, "async event result %08x\n", result);
-	}
 	queue_work(nvme_wq, &ctrl->async_event_work);
 }
 EXPORT_SYMBOL_GPL(nvme_complete_async_event);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 4112e2bd747f..c37103a4ad38 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -436,10 +436,14 @@ enum {
 enum {
 	NVME_AER_ERROR			= 0,
 	NVME_AER_SMART			= 1,
+	NVME_AER_NOTICE			= 2,
 	NVME_AER_CSS			= 6,
 	NVME_AER_VS			= 7,
-	NVME_AER_NOTICE_NS_CHANGED	= 0x0002,
-	NVME_AER_NOTICE_FW_ACT_STARTING = 0x0102,
+};
+
+enum {
+	NVME_AER_NOTICE_NS_CHANGED	= 0x00,
+	NVME_AER_NOTICE_FW_ACT_STARTING = 0x01,
 };
 
 struct nvme_lba_range_type {
-- 
2.12.3

* [PATCHv2 02/11] nvme: submit AEN event configuration on startup
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
  2018-05-22  9:09 ` [PATCHv2 01/11] nvme.h: untangle AEN notice definitions Hannes Reinecke
@ 2018-05-22  9:09 ` Hannes Reinecke
  2018-05-22 10:00   ` Christoph Hellwig
  2018-05-22  9:09 ` [PATCHv2 03/11] nvmet: refactor AER handling Hannes Reinecke
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:09 UTC (permalink / raw)


We should explicitly enable the AENs we are interested in; spec-compliant
targets might not send us any optional AENs otherwise.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
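For reference (a sketch, not part of the patch): the Asynchronous Event
Configuration feature (FID 0x0b) takes a dword whose low byte mirrors the
SMART / Health critical warning bits and whose bits 8 and 9 gate the optional
Namespace Attribute and Firmware Activation notices, so the value submitted
by nvme_enable_aen() is built up roughly as:

	/* always ask for the SMART / Health critical warnings ... */
	u32 cfg = NVME_SMART_CRIT_SPARE | NVME_SMART_CRIT_TEMPERATURE |
		  NVME_SMART_CRIT_RELIABILITY | NVME_SMART_CRIT_MEDIA |
		  NVME_SMART_CRIT_VOLATILE_MEMORY;
	/* ... plus the optional notices the controller advertised in OAES */
	cfg |= ctrl->aen_cfg;	/* NVME_AEN_CFG_NS_ATTR and/or NVME_AEN_CFG_FW_ACT */
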
 drivers/nvme/host/core.c | 19 +++++++++++++++++++
 drivers/nvme/host/nvme.h |  1 +
 include/linux/nvme.h     |  5 +++++
 3 files changed, 25 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fd71daa92d6a..ae1ebc2c88b3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1029,6 +1029,22 @@ int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count)
 }
 EXPORT_SYMBOL_GPL(nvme_set_queue_count);
 
+void nvme_enable_aen(struct nvme_ctrl *ctrl)
+{
+	u32 aen_cfg = NVME_SMART_CRIT_SPARE | NVME_SMART_CRIT_TEMPERATURE |
+		NVME_SMART_CRIT_RELIABILITY | NVME_SMART_CRIT_MEDIA |
+		NVME_SMART_CRIT_VOLATILE_MEMORY;
+	u32 result;
+	int status;
+
+	aen_cfg |= ctrl->aen_cfg;
+	status = nvme_set_features(ctrl, NVME_FEAT_ASYNC_EVENT,
+				   aen_cfg, NULL, 0, &result);
+	if (status)
+		dev_warn(ctrl->device, "Failed to configure AEN (cfg %x)\n",
+			 aen_cfg);
+}
+
 static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 {
 	struct nvme_user_io io;
@@ -2345,6 +2361,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 
 	ctrl->oacs = le16_to_cpu(id->oacs);
 	ctrl->oncs = le16_to_cpup(&id->oncs);
+	ctrl->aen_cfg = le32_to_cpu(id->oaes) &
+		(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT);
 	atomic_set(&ctrl->abort_limit, id->acl + 1);
 	ctrl->vwc = id->vwc;
 	ctrl->cntlid = le16_to_cpup(&id->cntlid);
@@ -3401,6 +3419,7 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
 
 	if (ctrl->queue_count > 1) {
 		nvme_queue_scan(ctrl);
+		nvme_enable_aen(ctrl);
 		queue_work(nvme_wq, &ctrl->async_event_work);
 		nvme_start_queues(ctrl);
 	}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 2701c36552bf..3ec9b3e37616 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -176,6 +176,7 @@ struct nvme_ctrl {
 	u16 kas;
 	u8 npss;
 	u8 apsta;
+	u32 aen_cfg;
 	u32 aen_result;
 	unsigned int shutdown_timeout;
 	unsigned int kato;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index c37103a4ad38..21457fc25acd 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -446,6 +446,11 @@ enum {
 	NVME_AER_NOTICE_FW_ACT_STARTING = 0x01,
 };
 
+enum {
+	NVME_AEN_CFG_NS_ATTR		= 1 << 8,
+	NVME_AEN_CFG_FW_ACT		= 1 << 9,
+};
+
 struct nvme_lba_range_type {
 	__u8			type;
 	__u8			attributes;
-- 
2.12.3

* [PATCHv2 03/11] nvmet: refactor AER handling
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
  2018-05-22  9:09 ` [PATCHv2 01/11] nvme.h: untangle AEN notice definitions Hannes Reinecke
  2018-05-22  9:09 ` [PATCHv2 02/11] nvme: submit AEN event configuration on startup Hannes Reinecke
@ 2018-05-22  9:09 ` Hannes Reinecke
  2018-05-22  9:09 ` [PATCHv2 04/11] nvmet: Add AEN configuration support Hannes Reinecke
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:09 UTC (permalink / raw)


From: Christoph Hellwig <hch@lst.de>

Create a helper to iterate over all controllers in a subsystem instead of
open-coding the loop.  Also use the NVME_AER_NOTICE_NS_CHANGED definition
instead of a plain 0.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/target/core.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index e95424f172fd..505d4267b8f1 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -137,6 +137,19 @@ static void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
 	schedule_work(&ctrl->async_event_work);
 }
 
+static void nvmet_send_ns_changed_event(struct nvmet_subsys *subsys)
+{
+	/*
+	 * XXX: should also include the log id, but the changed namespace list
+	 * log needs to be implemented first..
+	 */
+	struct nvmet_ctrl *ctrl;
+
+	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
+		nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
+				NVME_AER_NOTICE_NS_CHANGED, 0);
+}
+
 int nvmet_register_transport(const struct nvmet_fabrics_ops *ops)
 {
 	int ret = 0;
@@ -274,7 +287,6 @@ void nvmet_put_namespace(struct nvmet_ns *ns)
 int nvmet_ns_enable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
-	struct nvmet_ctrl *ctrl;
 	int ret = 0;
 
 	mutex_lock(&subsys->lock);
@@ -320,9 +332,7 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 		list_add_tail_rcu(&ns->dev_link, &old->dev_link);
 	}
 
-	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
-		nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE, 0, 0);
-
+	nvmet_send_ns_changed_event(subsys);
 	ns->enabled = true;
 	ret = 0;
 out_unlock:
@@ -337,7 +347,6 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 void nvmet_ns_disable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
-	struct nvmet_ctrl *ctrl;
 
 	mutex_lock(&subsys->lock);
 	if (!ns->enabled)
@@ -363,9 +372,7 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 	percpu_ref_exit(&ns->ref);
 
 	mutex_lock(&subsys->lock);
-	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
-		nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE, 0, 0);
-
+	nvmet_send_ns_changed_event(subsys);
 	if (ns->bdev)
 		blkdev_put(ns->bdev, FMODE_WRITE|FMODE_READ);
 out_unlock:
-- 
2.12.3

* [PATCHv2 04/11] nvmet: Add AEN configuration support
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
                   ` (2 preceding siblings ...)
  2018-05-22  9:09 ` [PATCHv2 03/11] nvmet: refactor AER handling Hannes Reinecke
@ 2018-05-22  9:09 ` Hannes Reinecke
  2018-05-22 10:01   ` Christoph Hellwig
  2018-05-22  9:09 ` [PATCHv2 05/11] nvme.h: add ANA definitions Hannes Reinecke
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:09 UTC (permalink / raw)


From: Hannes Reinecke <hare@suse.com>

AEN configuration via the 'Get Features' and 'Set Features' admin
commands is mandatory, so we should implement handling for it.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/nvme/target/admin-cmd.c | 8 +++++++-
 drivers/nvme/target/core.c      | 6 ++++--
 drivers/nvme/target/nvmet.h     | 1 +
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 5e0e9fcc0d4d..58df0e5b3f5a 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -447,6 +447,11 @@ static void nvmet_execute_set_features(struct nvmet_req *req)
 		req->sq->ctrl->kato = DIV_ROUND_UP(val32, 1000);
 		nvmet_set_result(req, req->sq->ctrl->kato);
 		break;
+	case NVME_FEAT_ASYNC_EVENT:
+		val32 = le32_to_cpu(req->cmd->common.cdw10[1]);
+		req->sq->ctrl->aen_cfg = val32 & NVME_AEN_CFG_NS_ATTR;
+		nvmet_set_result(req, req->sq->ctrl->aen_cfg);
+		break;
 	case NVME_FEAT_HOST_ID:
 		status = NVME_SC_CMD_SEQ_ERROR | NVME_SC_DNR;
 		break;
@@ -485,9 +490,10 @@ static void nvmet_execute_get_features(struct nvmet_req *req)
 		break;
 	case NVME_FEAT_WRITE_ATOMIC:
 		break;
+#endif
 	case NVME_FEAT_ASYNC_EVENT:
+		nvmet_set_result(req, req->sq->ctrl->aen_cfg);
 		break;
-#endif
 	case NVME_FEAT_VOLATILE_WC:
 		nvmet_set_result(req, 1);
 		break;
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 505d4267b8f1..a8fc05b1aff8 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -146,8 +146,9 @@ static void nvmet_send_ns_changed_event(struct nvmet_subsys *subsys)
 	struct nvmet_ctrl *ctrl;
 
 	list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry)
-		nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
-				NVME_AER_NOTICE_NS_CHANGED, 0);
+		if (ctrl->aen_cfg & NVME_AEN_CFG_NS_ATTR)
+			nvmet_add_async_event(ctrl, NVME_AER_TYPE_NOTICE,
+					      NVME_AER_NOTICE_NS_CHANGED, 0);
 }
 
 int nvmet_register_transport(const struct nvmet_fabrics_ops *ops)
@@ -816,6 +817,7 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 
 	kref_init(&ctrl->ref);
 	ctrl->subsys = subsys;
+	ctrl->aen_cfg = NVME_AEN_CFG_NS_ATTR;
 
 	ctrl->cqs = kcalloc(subsys->max_qid + 1,
 			sizeof(struct nvmet_cq *),
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 15fd84ab21f8..3fd6686ad906 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -120,6 +120,7 @@ struct nvmet_ctrl {
 	u16			cntlid;
 	u32			kato;
 
+	u32			aen_cfg;
 	struct nvmet_req	*async_event_cmds[NVMET_ASYNC_EVENTS];
 	unsigned int		nr_async_event_cmds;
 	struct list_head	async_events;
-- 
2.12.3

* [PATCHv2 05/11] nvme.h: add ANA definitions
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
                   ` (3 preceding siblings ...)
  2018-05-22  9:09 ` [PATCHv2 04/11] nvmet: Add AEN configuration support Hannes Reinecke
@ 2018-05-22  9:09 ` Hannes Reinecke
  2018-05-22  9:09 ` [PATCHv2 06/11] nvme: add support for the log specific field Hannes Reinecke
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:09 UTC (permalink / raw)


From: Christoph Hellwig <hch@lst.de>

Add various definitions from NVMe 1.3 TP 4004.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
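To illustrate how these definitions fit together (a sketch, not part of the
patch): the ANA log page starts with an nvme_ana_rsp_hdr, followed by ngrps
group descriptors, each of which is followed by its nnsids namespace IDs
(omitted when the RGO bit is set in the log specific field).  Assuming 'id'
points at the controller identify data and 'buf' at the log buffer, a
worst-case buffer size and a descriptor walk look roughly like this:

	size_t len = sizeof(struct nvme_ana_rsp_hdr) +
		le32_to_cpu(id->nanagrpid) * sizeof(struct nvme_ana_group_desc) +
		le32_to_cpu(id->mnan) * sizeof(__le32);
	struct nvme_ana_rsp_hdr *hdr = buf;
	void *desc_p = buf + sizeof(*hdr);
	int i;

	for (i = 0; i < le16_to_cpu(hdr->ngrps); i++) {
		struct nvme_ana_group_desc *desc = desc_p;

		/* desc->grpid, desc->state, then nnsids NSIDs follow */
		desc_p += sizeof(*desc) +
			le32_to_cpu(desc->nnsids) * sizeof(__le32);
	}
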
 include/linux/nvme.h | 50 +++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 21457fc25acd..03d6494bb028 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -242,7 +242,12 @@ struct nvme_id_ctrl {
 	__le32			sanicap;
 	__le32			hmminds;
 	__le16			hmmaxd;
-	__u8			rsvd338[174];
+	__u8			rsvd338[4];
+	__u8			anatt;
+	__u8			anacap;
+	__le32			anagrpmax;
+	__le32			nanagrpid;
+	__u8			rsvd352[160];
 	__u8			sqes;
 	__u8			cqes;
 	__le16			maxcmd;
@@ -258,7 +263,8 @@ struct nvme_id_ctrl {
 	__le16			acwu;
 	__u8			rsvd534[2];
 	__le32			sgls;
-	__u8			rsvd540[228];
+	__le32			mnan;
+	__u8			rsvd544[224];
 	char			subnqn[256];
 	__u8			rsvd1024[768];
 	__le32			ioccsz;
@@ -312,7 +318,9 @@ struct nvme_id_ns {
 	__le16			nabspf;
 	__le16			noiob;
 	__u8			nvmcap[16];
-	__u8			rsvd64[40];
+	__u8			rsvd64[28];
+	__le32			anagrpid;
+	__u8			rsvd96[8];
 	__u8			nguid[16];
 	__u8			eui64[8];
 	struct nvme_lbaf	lbaf[16];
@@ -425,6 +433,32 @@ struct nvme_effects_log {
 	__u8   resv[2048];
 };
 
+enum nvme_ana_state {
+	NVME_ANA_OPTIMIZED		= 0x01,
+	NVME_ANA_NONOPTIMIZED		= 0x02,
+	NVME_ANA_INACCESSIBLE		= 0x03,
+	NVME_ANA_PERSISTENT_LOSS	= 0x04,
+	NVME_ANA_CHANGE			= 0x0f,
+};
+
+struct nvme_ana_group_desc {
+	__le32	grpid;
+	__le32	nnsids;
+	__le64	chgcnt;
+	__u8	state;
+	__u8	rsvd17[7];
+	__le32	nsids[];
+};
+
+/* flag for the log specific field of the ANA log */
+#define NVME_ANA_LOG_RGO	(1 << 0)
+
+struct nvme_ana_rsp_hdr {
+	__le64	chgcnt;
+	__le16	ngrps;
+	__le16	rsvd10[3];
+};
+
 enum {
 	NVME_SMART_CRIT_SPARE		= 1 << 0,
 	NVME_SMART_CRIT_TEMPERATURE	= 1 << 1,
@@ -444,11 +478,13 @@ enum {
 enum {
 	NVME_AER_NOTICE_NS_CHANGED	= 0x00,
 	NVME_AER_NOTICE_FW_ACT_STARTING = 0x01,
+	NVME_AER_NOTICE_ANA		= 0x03,
 };
 
 enum {
 	NVME_AEN_CFG_NS_ATTR		= 1 << 8,
 	NVME_AEN_CFG_FW_ACT		= 1 << 9,
+	NVME_AEN_CFG_ANA_CHANGE		= 1 << 11,
 };
 
 struct nvme_lba_range_type {
@@ -757,6 +793,7 @@ enum {
 	NVME_LOG_SMART		= 0x02,
 	NVME_LOG_FW_SLOT	= 0x03,
 	NVME_LOG_CMD_EFFECTS	= 0x05,
+	NVME_LOG_ANA		= 0x0c,
 	NVME_LOG_DISC		= 0x70,
 	NVME_LOG_RESERVATION	= 0x80,
 	NVME_FWACT_REPL		= (0 << 3),
@@ -1177,6 +1214,13 @@ enum {
 	NVME_SC_ACCESS_DENIED		= 0x286,
 	NVME_SC_UNWRITTEN_BLOCK		= 0x287,
 
+	/*
+	 * Path-related Errors:
+	 */
+	NVME_SC_ANA_PERSISTENT_LOSS	= 0x301,
+	NVME_SC_ANA_INACCESSIBLE	= 0x302,
+	NVME_SC_ANA_TRANSITION		= 0x303,
+
 	NVME_SC_DNR			= 0x4000,
 };
 
-- 
2.12.3

* [PATCHv2 06/11] nvme: add support for the log specific field
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
                   ` (4 preceding siblings ...)
  2018-05-22  9:09 ` [PATCHv2 05/11] nvme.h: add ANA definitions Hannes Reinecke
@ 2018-05-22  9:09 ` Hannes Reinecke
  2018-05-22  9:10 ` [PATCHv2 07/11] nvme: always failover on path or transport errors Hannes Reinecke
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:09 UTC (permalink / raw)


From: Christoph Hellwig <hch@lst.de>

NVMe 1.3 added a new log specific field to the Get Log Page command
definition.  Add the field and support passing it to nvme_get_log_ext.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
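As an example (a sketch, not part of this patch; the first real user arrives
with the ANA support patch later in this series), the ANA log page can then
be fetched with the Return Groups Only bit set in the log specific field,
with 'buf' and 'len' standing in for a caller-provided buffer:

	error = nvme_get_log_ext(ctrl, NULL, NVME_LOG_ANA, NVME_ANA_LOG_RGO,
			buf, len, 0);
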
 drivers/nvme/host/core.c     | 6 +++---
 drivers/nvme/host/lightnvm.c | 2 +-
 drivers/nvme/host/nvme.h     | 2 +-
 include/linux/nvme.h         | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ae1ebc2c88b3..b4408e1e677f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2245,8 +2245,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 }
 
 int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-		     u8 log_page, void *log,
-		     size_t size, u64 offset)
+		u8 log_page, u8 lsp, void *log, size_t size, u64 offset)
 {
 	struct nvme_command c = { };
 	unsigned long dwlen = size / 4 - 1;
@@ -2259,6 +2258,7 @@ int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		c.get_log_page.nsid = cpu_to_le32(NVME_NSID_ALL);
 
 	c.get_log_page.lid = log_page;
+	c.get_log_page.lsp = lsp;
 	c.get_log_page.numdl = cpu_to_le16(dwlen & ((1 << 16) - 1));
 	c.get_log_page.numdu = cpu_to_le16(dwlen >> 16);
 	c.get_log_page.lpol = cpu_to_le32(lower_32_bits(offset));
@@ -2270,7 +2270,7 @@ int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 static int nvme_get_log(struct nvme_ctrl *ctrl, u8 log_page, void *log,
 			size_t size)
 {
-	return nvme_get_log_ext(ctrl, NULL, log_page, log, size, 0);
+	return nvme_get_log_ext(ctrl, NULL, log_page, 0, log, size, 0);
 }
 
 static int nvme_get_effects_log(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 41279da799ed..216c6b0a9554 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -598,7 +598,7 @@ static int nvme_nvm_get_chk_meta(struct nvm_dev *ndev,
 	while (left) {
 		len = min_t(unsigned int, left, ctrl->max_hw_sectors << 9);
 
-		ret = nvme_get_log_ext(ctrl, ns, NVME_NVM_LOG_REPORT_CHUNK,
+		ret = nvme_get_log_ext(ctrl, ns, NVME_NVM_LOG_REPORT_CHUNK, 0,
 				dev_meta, len, offset);
 		if (ret) {
 			dev_err(ctrl->device, "Get REPORT CHUNK log error\n");
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 3ec9b3e37616..3f9a4f8aa310 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -432,7 +432,7 @@ int nvme_delete_ctrl(struct nvme_ctrl *ctrl);
 int nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl);
 
 int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-		u8 log_page, void *log, size_t size, u64 offset);
+		u8 log_page, u8 lsp, void *log, size_t size, u64 offset);
 
 extern const struct attribute_group nvme_ns_id_attr_group;
 extern const struct block_device_operations nvme_ns_head_ops;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 03d6494bb028..2169e188973f 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -914,7 +914,7 @@ struct nvme_get_log_page_command {
 	__u64			rsvd2[2];
 	union nvme_data_ptr	dptr;
 	__u8			lid;
-	__u8			rsvd10;
+	__u8			lsp; /* upper 4 bits reserved */
 	__le16			numdl;
 	__le16			numdu;
 	__u16			rsvd11;
-- 
2.12.3

* [PATCHv2 07/11] nvme: always failover on path or transport errors
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
                   ` (5 preceding siblings ...)
  2018-05-22  9:09 ` [PATCHv2 06/11] nvme: add support for the log specific field Hannes Reinecke
@ 2018-05-22  9:10 ` Hannes Reinecke
  2018-05-25 13:24   ` Christoph Hellwig
  2018-05-22  9:10 ` [PATCHv2 08/11] nvme: add ANA support Hannes Reinecke
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:10 UTC (permalink / raw)


From: Christoph Hellwig <hch@lst.de>

NVMe 1.3 TP 4028 introduced a new status code type 3h for
"Path Related Status".  We should always retry on another path for this
class of errors, without even trying to decode them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
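For reference (a spec-level illustration, not part of the patch): in the
status value the core keeps in nvme_req(req)->status, the Status Code sits
in bits 7:0 and the Status Code Type in bits 10:8, so the path-related codes
added earlier in this series all come out as 0x3xx:

	u16 status = nvme_req(req)->status;
	u8 sct = (status >> 8) & 0x7;	/* 0x3 == Path Related Status */
	u8 sc = status & 0xff;		/* e.g. NVME_SC_ANA_TRANSITION & 0xff */
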
 drivers/nvme/host/multipath.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 956e0b8e9c4d..2802cbc7b7d0 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -37,6 +37,8 @@ bool nvme_req_needs_failover(struct request *req, blk_status_t error)
 {
 	if (!(req->cmd_flags & REQ_NVME_MPATH))
 		return false;
+	if (nvme_req(req)->status & 0x300)
+		return true;
 	return blk_path_error(error);
 }
 
-- 
2.12.3

* [PATCHv2 08/11] nvme: add ANA support
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
                   ` (6 preceding siblings ...)
  2018-05-22  9:10 ` [PATCHv2 07/11] nvme: always failover on path or transport errors Hannes Reinecke
@ 2018-05-22  9:10 ` Hannes Reinecke
  2018-05-23 11:52   ` Popuri, Sriram
                     ` (2 more replies)
  2018-05-22  9:10 ` [PATCHv2 09/11] nvmet: add a new nvmet_zero_sgl helper Hannes Reinecke
                   ` (2 subsequent siblings)
  10 siblings, 3 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:10 UTC (permalink / raw)


From: Christoph Hellwig <hch@lst.de>

Add support for Asymmetric Namespace Access (ANA): read and parse the ANA
log page, track the ANA state of each namespace, and prefer optimized
(then non-optimized) paths when multipathing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.com>
---
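A note on the user-visible pieces below: each namespace gains 'ana_grpid'
and 'ana_state' sysfs attributes next to the existing wwid/uuid/nsid entries
(only visible when the namespace reports an ANA group ID), and a new
'ana_log_delay' module parameter (500 msecs by default) controls how long
the host waits before (re)reading the ANA log after an ANA change AEN or a
failed log read.
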
 drivers/nvme/host/core.c      | 335 +++++++++++++++++++++++++++++++++++++++---
 drivers/nvme/host/multipath.c |  41 +++++-
 drivers/nvme/host/nvme.h      |  17 +++
 3 files changed, 366 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b4408e1e677f..0d8167c68dff 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -68,6 +68,10 @@ static bool streams;
 module_param(streams, bool, 0644);
 MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
 
+static unsigned long ana_log_delay = 500;
+module_param(ana_log_delay, ulong, 0644);
+MODULE_PARM_DESC(ana_log_delay, "Delay in msecs before retrieving ANA log");
+
 /*
  * nvme_wq - hosts nvme related works that are not reset or delete
  * nvme_reset_wq - hosts nvme reset works
@@ -1246,6 +1250,25 @@ static void nvme_put_ns_from_disk(struct nvme_ns_head *head, int idx)
 		srcu_read_unlock(&head->srcu, idx);
 }
 
+static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
+{
+	struct nvme_ns *ns, *ret = NULL;
+
+	down_read(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (ns->head->ns_id == nsid) {
+			if (!kref_get_unless_zero(&ns->kref))
+				continue;
+			ret = ns;
+			break;
+		}
+		if (ns->head->ns_id > nsid)
+			break;
+	}
+	up_read(&ctrl->namespaces_rwsem);
+	return ret;
+}
+
 static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned cmd, unsigned long arg)
 {
 	switch (cmd) {
@@ -1357,6 +1380,228 @@ static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
+static const char *nvme_ana_state_names[] = {
+	[0]				= "invalid state",
+	[NVME_ANA_OPTIMIZED]		= "optimized",
+	[NVME_ANA_NONOPTIMIZED]		= "non-optimized",
+	[NVME_ANA_INACCESSIBLE]		= "inaccessible",
+	[NVME_ANA_PERSISTENT_LOSS]	= "persistent-loss",
+	[NVME_ANA_CHANGE]		= "change",
+};
+
+static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
+{
+	void *base = ctrl->ana_log_buf;
+	size_t offset = sizeof(struct nvme_ana_rsp_hdr);
+	int error, i, j;
+	size_t mdts = ctrl->max_hw_sectors << 9;
+	size_t ana_log_size = ctrl->ana_log_size, ana_xfer_size;
+	off_t ana_log_offset = 0;
+
+	/*
+	 * If anagrpid never changes we don't need to process the namespace
+	 * lists.
+	 */
+	if (ctrl->anacap & (1 << 7))
+		groups_only = true;
+
+	if (groups_only)
+		ana_log_size = sizeof(struct nvme_ana_rsp_hdr) +
+			ctrl->nanagrpid * sizeof(struct nvme_ana_group_desc);
+
+	ana_xfer_size = ana_log_size;
+	if (mdts && ana_log_size > mdts) {
+		dev_warn(ctrl->device,
+			 "ANA log too large (%lu, max %zu), truncating\n",
+			 ana_log_size, mdts);
+		ana_xfer_size = mdts;
+	}
+resubmit:
+	error = nvme_get_log_ext(ctrl, NULL, NVME_LOG_ANA,
+			groups_only ? NVME_ANA_LOG_RGO : 0,
+			ctrl->ana_log_buf + ana_log_offset,
+			ana_xfer_size, ana_log_offset);
+	if (error) {
+		dev_warn(ctrl->device, "Failed to get ANA log: %d\n", error);
+		return -EIO;
+	}
+	ana_log_offset += ana_xfer_size;
+	if (ana_log_size > ana_log_offset) {
+		if (ana_log_size - ana_log_offset < mdts)
+			ana_xfer_size = ana_log_size - ana_log_offset;
+		goto resubmit;
+	}
+
+	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
+		struct nvme_ana_group_desc *desc = base + offset;
+		u32 grpid;
+		u32 nr_nsids;
+		size_t nsid_buf_size;
+		struct nvme_ns *ns;
+
+		if (WARN_ON_ONCE(offset > ana_log_size - sizeof(*desc)))
+			return -EINVAL;
+		grpid = le32_to_cpu(desc->grpid);
+		if (WARN_ON_ONCE(grpid == 0))
+			return -EINVAL;
+		if (WARN_ON_ONCE(grpid > ctrl->anagrpmax))
+			return -EINVAL;
+		if (WARN_ON_ONCE(desc->state == 0))
+			return -EINVAL;
+		if (WARN_ON_ONCE(desc->state > NVME_ANA_CHANGE))
+			return -EINVAL;
+
+		offset += sizeof(*desc);
+		if (groups_only) {
+			down_write(&ctrl->namespaces_rwsem);
+			list_for_each_entry(ns, &ctrl->namespaces, list) {
+				if (ns->anagrpid != grpid)
+					continue;
+				if (desc->state == NVME_SC_ANA_TRANSITION)
+					error = -EAGAIN;
+				if (ns->ana_state != desc->state) {
+					ns->ana_state = desc->state;
+					nvme_mpath_clear_current_path(ns);
+				}
+				break;
+			}
+			up_write(&ctrl->namespaces_rwsem);
+			continue;
+		}
+		nr_nsids = le32_to_cpu(desc->nnsids);
+		if (!nr_nsids)
+			continue;
+
+		nsid_buf_size = nr_nsids * sizeof(__le32);
+		if (WARN_ON_ONCE(offset > ana_log_size - nsid_buf_size))
+			return -EINVAL;
+
+		for (j = 0; j < nr_nsids; j++) {
+			u32 nsid = le32_to_cpu(desc->nsids[j]);
+
+			ns = nvme_find_get_ns(ctrl, nsid);
+			if (!ns)
+				continue;
+
+			if (ns->anagrpid != grpid ||
+			    ns->ana_state != desc->state) {
+				ns->anagrpid = grpid;
+				ns->ana_state = desc->state;
+				nvme_mpath_clear_current_path(ns);
+				if (desc->state == NVME_SC_ANA_TRANSITION)
+					error = -EAGAIN;
+			}
+			nvme_put_ns(ns);
+		}
+
+		offset += nsid_buf_size;
+	}
+
+	return error;
+}
+
+static enum nvme_ana_state nvme_get_ana_state(struct nvme_ctrl *ctrl,
+					      unsigned int grpid,
+					      unsigned int nsid)
+{
+	void *base = ctrl->ana_log_buf;
+	size_t offset = sizeof(struct nvme_ana_rsp_hdr);
+	int i, j;
+
+	if (!ctrl->ana_log_buf)
+		return NVME_ANA_OPTIMIZED;
+
+	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
+		struct nvme_ana_group_desc *desc = base + offset;
+		u32 nr_nsids;
+		size_t nsid_buf_size;
+
+		if (offset > ctrl->ana_log_size - sizeof(*desc))
+			break;
+
+		offset += sizeof(*desc);
+		nr_nsids = le32_to_cpu(desc->nnsids);
+		if (!nr_nsids)
+			continue;
+
+		nsid_buf_size = nr_nsids * sizeof(__le32);
+		if (offset > ctrl->ana_log_size - nsid_buf_size)
+			break;
+
+		if (grpid == le32_to_cpu(desc->grpid)) {
+			for (j = 0; j < nr_nsids; j++) {
+				if (le32_to_cpu(desc->nsids[j]) != nsid)
+					continue;
+
+				dev_info(ctrl->device,
+					 "nsid %d: ANA group %d, %s.\n",
+					 nsid, grpid,
+					 nvme_ana_state_names[desc->state]);
+				return desc->state;
+			}
+		}
+		offset += nsid_buf_size;
+	}
+
+	return NVME_ANA_OPTIMIZED;
+}
+
+static void nvme_ana_work(struct work_struct *work)
+{
+	struct nvme_ctrl *ctrl = container_of(work, struct nvme_ctrl,
+					      ana_work.work);
+	int ret;
+
+	if (!ctrl->ana_log_buf)
+		return;
+	if (ctrl->state != NVME_CTRL_LIVE)
+		return;
+	ret = nvme_process_ana_log(ctrl, false);
+	if (ret == -EAGAIN || ret == -EIO) {
+		unsigned long log_delay = ctrl->anatt * HZ;
+
+		if (ret == -EIO)
+			log_delay = msecs_to_jiffies(ana_log_delay);
+		queue_delayed_work(nvme_wq, &ctrl->ana_work, log_delay);
+	}
+	nvme_kick_requeue_lists(ctrl);
+}
+
+int nvme_configure_ana(struct nvme_ctrl *ctrl)
+{
+	int error;
+
+	if (!nvme_ctrl_has_ana(ctrl))
+		return 0;
+
+	INIT_DELAYED_WORK(&ctrl->ana_work, nvme_ana_work);
+	ctrl->ana_log_size = sizeof(struct nvme_ana_rsp_hdr) +
+		ctrl->nanagrpid * sizeof(struct nvme_ana_group_desc) +
+		ctrl->max_namespaces * sizeof(__le32);
+	ctrl->ana_log_buf = kzalloc(ctrl->ana_log_size, GFP_KERNEL);
+	if (!ctrl->ana_log_buf)
+		return -ENOMEM;
+
+	error = nvme_process_ana_log(ctrl, false);
+	if (error) {
+		if (error != -EAGAIN && error != -EIO) {
+			kfree(ctrl->ana_log_buf);
+			ctrl->ana_log_buf = NULL;
+		} else
+			error = 0;
+	}
+	return error;
+}
+
+void nvme_deconfigure_ana(struct nvme_ctrl *ctrl)
+{
+	if (ctrl->ana_log_buf) {
+		cancel_delayed_work_sync(&ctrl->ana_work);
+		kfree(ctrl->ana_log_buf);
+		ctrl->ana_log_buf = NULL;
+	}
+}
+
 static void nvme_set_chunk_size(struct nvme_ns *ns)
 {
 	u32 chunk_size = (((u32)ns->noiob) << (ns->lba_shift - 9));
@@ -1446,7 +1691,9 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
 		capacity = 0;
 
-	set_capacity(disk, capacity);
+	if (ns->ana_state == NVME_ANA_OPTIMIZED ||
+	    ns->ana_state == NVME_ANA_NONOPTIMIZED)
+		set_capacity(disk, capacity);
 	nvme_config_discard(ns);
 	blk_mq_unfreeze_queue(disk->queue);
 }
@@ -1462,6 +1709,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	ns->lba_shift = id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ds;
 	if (ns->lba_shift == 0)
 		ns->lba_shift = 9;
+	ns->anagrpid = le32_to_cpu(id->anagrpid);
 	ns->noiob = le16_to_cpu(id->noiob);
 	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
@@ -1488,6 +1736,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	struct nvme_id_ns *id;
 	struct nvme_ns_ids ids;
+	unsigned int grpid;
 	int ret = 0;
 
 	if (test_bit(NVME_NS_DEAD, &ns->flags)) {
@@ -1504,6 +1753,19 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 		goto out;
 	}
 
+	grpid = le32_to_cpu(id->anagrpid);
+	if (grpid) {
+		if (ns->anagrpid != grpid) {
+			dev_warn(ctrl->device, "ns %x ANA group ID changed\n",
+				 ns->head->ns_id);
+			ns->anagrpid = grpid;
+			queue_delayed_work(nvme_wq, &ctrl->ana_work,
+					   msecs_to_jiffies(ana_log_delay));
+		}
+		ns->ana_state = nvme_get_ana_state(ctrl, grpid,
+						   ns->head->ns_id);
+	}
+
 	__nvme_revalidate_disk(disk, id);
 	nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
 	if (!nvme_ns_ids_equal(&ns->head->ids, &ids)) {
@@ -2363,6 +2625,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	ctrl->oncs = le16_to_cpup(&id->oncs);
 	ctrl->aen_cfg = le32_to_cpu(id->oaes) &
 		(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT);
+	if (nvme_ctrl_has_ana(ctrl))
+		ctrl->aen_cfg |= NVME_AEN_CFG_ANA_CHANGE;
 	atomic_set(&ctrl->abort_limit, id->acl + 1);
 	ctrl->vwc = id->vwc;
 	ctrl->cntlid = le16_to_cpup(&id->cntlid);
@@ -2376,6 +2640,11 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	nvme_set_queue_limits(ctrl, ctrl->admin_q);
 	ctrl->sgls = le32_to_cpu(id->sgls);
 	ctrl->kas = le16_to_cpu(id->kas);
+	ctrl->max_namespaces = le32_to_cpu(id->mnan);
+	ctrl->anacap = id->anacap;
+	ctrl->anatt = id->anatt;
+	ctrl->nanagrpid = le32_to_cpu(id->nanagrpid);
+	ctrl->anagrpmax = le32_to_cpu(id->anagrpmax);
 
 	if (id->rtd3e) {
 		/* us -> s */
@@ -2445,7 +2714,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	ret = nvme_configure_apst(ctrl);
 	if (ret < 0)
 		return ret;
-	
+
 	ret = nvme_configure_timestamp(ctrl);
 	if (ret < 0)
 		return ret;
@@ -2454,6 +2723,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	if (ret < 0)
 		return ret;
 
+	ret = nvme_configure_ana(ctrl);
+	if (ret < 0)
+		return ret;
+
 	ctrl->identified = true;
 
 	return 0;
@@ -2649,12 +2922,30 @@ static ssize_t nsid_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RO(nsid);
 
+static ssize_t ana_grpid_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	return sprintf(buf, "%d\n", nvme_get_ns_from_dev(dev)->anagrpid);
+}
+DEVICE_ATTR_RO(ana_grpid);
+
+static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+	return sprintf(buf, "%s\n", nvme_ana_state_names[ns->ana_state]);
+}
+DEVICE_ATTR_RO(ana_state);
+
 static struct attribute *nvme_ns_id_attrs[] = {
 	&dev_attr_wwid.attr,
 	&dev_attr_uuid.attr,
 	&dev_attr_nguid.attr,
 	&dev_attr_eui.attr,
 	&dev_attr_nsid.attr,
+	&dev_attr_ana_grpid.attr,
+	&dev_attr_ana_state.attr,
 	NULL,
 };
 
@@ -2663,6 +2954,7 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct nvme_ns_ids *ids = &dev_to_ns_head(dev)->ids;
+	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
 
 	if (a == &dev_attr_uuid.attr) {
 		if (uuid_is_null(&ids->uuid) &&
@@ -2677,6 +2969,10 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
 		if (!memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
 			return 0;
 	}
+	if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
+		if (!ns->anagrpid)
+			return 0;
+	}
 	return a->mode;
 }
 
@@ -2939,25 +3235,6 @@ static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
 	return nsa->head->ns_id - nsb->head->ns_id;
 }
 
-static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
-{
-	struct nvme_ns *ns, *ret = NULL;
-
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		if (ns->head->ns_id == nsid) {
-			if (!kref_get_unless_zero(&ns->kref))
-				continue;
-			ret = ns;
-			break;
-		}
-		if (ns->head->ns_id > nsid)
-			break;
-	}
-	up_read(&ctrl->namespaces_rwsem);
-	return ret;
-}
-
 static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns)
 {
 	struct streams_directive_params s;
@@ -2991,6 +3268,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	struct nvme_id_ns *id;
 	char disk_name[DISK_NAME_LEN];
 	int node = dev_to_node(ctrl->dev), flags = GENHD_FL_EXT_DEVT;
+	unsigned int grpid;
 
 	ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
 	if (!ns)
@@ -3002,6 +3280,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
 	ns->queue->queuedata = ns;
 	ns->ctrl = ctrl;
+	ns->ana_state = NVME_ANA_OPTIMIZED;
 
 	kref_init(&ns->kref);
 	ns->lba_shift = 9; /* set to a default value for 512 until disk is validated */
@@ -3019,7 +3298,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (nvme_init_ns_head(ns, nsid, id))
 		goto out_free_id;
 	nvme_setup_streams_ns(ctrl, ns);
-	
+
 #ifdef CONFIG_NVME_MULTIPATH
 	/*
 	 * If multipathing is enabled we need to always use the subsystem
@@ -3063,6 +3342,10 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	memcpy(disk->disk_name, disk_name, DISK_NAME_LEN);
 	ns->disk = disk;
 
+	grpid = le32_to_cpu(id->anagrpid);
+	if (grpid)
+		ns->ana_state = nvme_get_ana_state(ctrl, grpid, nsid);
+
 	__nvme_revalidate_disk(disk, id);
 
 	down_write(&ctrl->namespaces_rwsem);
@@ -3371,6 +3654,12 @@ static void nvme_handle_aen_notice(struct nvme_ctrl *ctrl, u32 result)
 	case NVME_AER_NOTICE_FW_ACT_STARTING:
 		queue_work(nvme_wq, &ctrl->fw_act_work);
 		break;
+	case NVME_AER_NOTICE_ANA:
+		if (WARN_ON_ONCE(!ctrl->ana_log_buf))
+			break;
+		queue_delayed_work(nvme_wq, &ctrl->ana_work,
+				   msecs_to_jiffies(ana_log_delay));
+		break;
 	default:
 		dev_warn(ctrl->device, "async event result %08x\n", result);
 	}
@@ -3406,6 +3695,7 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 	nvme_stop_keep_alive(ctrl);
 	flush_work(&ctrl->async_event_work);
 	flush_work(&ctrl->scan_work);
+	cancel_delayed_work_sync(&ctrl->ana_work);
 	cancel_work_sync(&ctrl->fw_act_work);
 	if (ctrl->ops->stop_ctrl)
 		ctrl->ops->stop_ctrl(ctrl);
@@ -3440,6 +3730,7 @@ static void nvme_free_ctrl(struct device *dev)
 
 	ida_simple_remove(&nvme_instance_ida, ctrl->instance);
 	kfree(ctrl->effects);
+	nvme_deconfigure_ana(ctrl);
 
 	if (subsys) {
 		mutex_lock(&subsys->lock);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 2802cbc7b7d0..e10bfa5e4aa5 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -29,7 +29,30 @@ void nvme_failover_req(struct request *req)
 	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
 	blk_mq_end_request(req, 0);
 
-	nvme_reset_ctrl(ns->ctrl);
+	/*
+	 * Reset the controller for any non-ANA error as we don't know what
+	 * caused the error:
+	 */
+	switch (nvme_req(req)->status & 0x7ff) {
+	case NVME_SC_ANA_TRANSITION:
+		queue_delayed_work(nvme_wq, &ns->ctrl->ana_work,
+				   ns->ctrl->anatt * HZ);
+	case NVME_SC_ANA_PERSISTENT_LOSS:
+	case NVME_SC_ANA_INACCESSIBLE:
+		/*
+		 * We could try to update the ANA group state here instead of
+		 * waiting for the AER and log page read.  But concurrency would
+		 * be nasy.
+		 */
+		nvme_mpath_clear_current_path(ns);
+		if (ns->head->disk)
+			kblockd_schedule_work(&ns->head->requeue_work);
+		break;
+	default:
+		nvme_reset_ctrl(ns->ctrl);
+		break;
+	}
+
 	kblockd_schedule_work(&ns->head->requeue_work);
 }
 
@@ -54,12 +77,14 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 	up_read(&ctrl->namespaces_rwsem);
 }
 
-static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head)
+static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head,
+		u8 ana_state)
 {
 	struct nvme_ns *ns;
 
 	list_for_each_entry_rcu(ns, &head->list, siblings) {
-		if (ns->ctrl->state == NVME_CTRL_LIVE) {
+		if (ns->ctrl->state == NVME_CTRL_LIVE &&
+		    ns->ana_state == ana_state) {
 			rcu_assign_pointer(head->current_path, ns);
 			return ns;
 		}
@@ -72,8 +97,14 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
 {
 	struct nvme_ns *ns = srcu_dereference(head->current_path, &head->srcu);
 
-	if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
-		ns = __nvme_find_path(head);
+	if (likely(ns && ns->ctrl->state == NVME_CTRL_LIVE &&
+			ns->ana_state == NVME_ANA_OPTIMIZED))
+		return ns;
+
+	ns = __nvme_find_path(head, NVME_ANA_OPTIMIZED);
+	if (!ns)
+		ns = __nvme_find_path(head, NVME_ANA_NONOPTIMIZED);
+	/* XXX: try an inaccessible path as last resort per 8.18.3.3 */
 	return ns;
 }
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 3f9a4f8aa310..6ea3584fa62f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -169,6 +169,7 @@ struct nvme_ctrl {
 	u16 oacs;
 	u16 nssa;
 	u16 nr_streams;
+	u32 max_namespaces;
 	atomic_t abort_limit;
 	u8 vwc;
 	u32 vs;
@@ -190,6 +191,15 @@ struct nvme_ctrl {
 	struct nvme_command ka_cmd;
 	struct work_struct fw_act_work;
 
+	/* asymmetric namespace access: */
+	u8 anacap;
+	u8 anatt;
+	u32 anagrpmax;
+	u32 nanagrpid;
+	size_t ana_log_size;
+	struct nvme_ana_rsp_hdr *ana_log_buf;
+	struct delayed_work ana_work;
+
 	/* Power saving configuration */
 	u64 ps_max_latency_us;
 	bool apst_enabled;
@@ -295,6 +305,8 @@ struct nvme_ns {
 #define NVME_NS_REMOVING 0
 #define NVME_NS_DEAD     1
 	u16 noiob;
+	u32 anagrpid;
+	enum nvme_ana_state ana_state;
 
 #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
 	struct nvme_fault_inject fault_inject;
@@ -437,6 +449,11 @@ int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 extern const struct attribute_group nvme_ns_id_attr_group;
 extern const struct block_device_operations nvme_ns_head_ops;
 
+static inline bool nvme_ctrl_has_ana(struct nvme_ctrl *ctrl)
+{
+	return ctrl->subsys->cmic & (1 << 3);
+}
+
 #ifdef CONFIG_NVME_MULTIPATH
 void nvme_failover_req(struct request *req);
 bool nvme_req_needs_failover(struct request *req, blk_status_t error);
-- 
2.12.3

* [PATCHv2 09/11] nvmet: add a new nvmet_zero_sgl helper
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
                   ` (7 preceding siblings ...)
  2018-05-22  9:10 ` [PATCHv2 08/11] nvme: add ANA support Hannes Reinecke
@ 2018-05-22  9:10 ` Hannes Reinecke
  2018-05-22  9:10 ` [PATCHv2 10/11] nvmet: split log page implementation Hannes Reinecke
  2018-05-22  9:10 ` [PATCHv2 11/11] nvmet: ANA support Hannes Reinecke
  10 siblings, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:10 UTC (permalink / raw)


From: Christoph Hellwig <hch@lst.de>

Zeroes the SGL in the payload.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/target/core.c  | 7 +++++++
 drivers/nvme/target/nvmet.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index a8fc05b1aff8..3b9e9eccc950 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -57,6 +57,13 @@ u16 nvmet_copy_from_sgl(struct nvmet_req *req, off_t off, void *buf, size_t len)
 	return 0;
 }
 
+u16 nvmet_zero_sgl(struct nvmet_req *req, off_t off, size_t len)
+{
+	if (sg_zero_buffer(req->sg, req->sg_cnt, len, off) != len)
+		return NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR;
+	return 0;
+}
+
 static unsigned int nvmet_max_nsid(struct nvmet_subsys *subsys)
 {
 	struct nvmet_ns *ns;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 3fd6686ad906..f5f7c1398173 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -317,6 +317,7 @@ u16 nvmet_copy_to_sgl(struct nvmet_req *req, off_t off, const void *buf,
 		size_t len);
 u16 nvmet_copy_from_sgl(struct nvmet_req *req, off_t off, void *buf,
 		size_t len);
+u16 nvmet_zero_sgl(struct nvmet_req *req, off_t off, size_t len);
 
 u32 nvmet_get_log_page_len(struct nvme_command *cmd);
 
-- 
2.12.3

* [PATCHv2 10/11] nvmet: split log page implementation
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
                   ` (8 preceding siblings ...)
  2018-05-22  9:10 ` [PATCHv2 09/11] nvmet: add a new nvmet_zero_sgl helper Hannes Reinecke
@ 2018-05-22  9:10 ` Hannes Reinecke
  2018-05-22  9:10 ` [PATCHv2 11/11] nvmet: ANA support Hannes Reinecke
  10 siblings, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:10 UTC (permalink / raw)


From: Christoph Hellwig <hch@lst.de>

Remove the common code to allocate a buffer and copy it into the SGL.
Instead the two no-op implementations just zero the SGL directly, and
the smart log allocates a buffer on its own.  This prepares for the
more elaborate ANA log page.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/target/admin-cmd.c | 101 +++++++++++++++-------------------------
 1 file changed, 37 insertions(+), 64 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 58df0e5b3f5a..8becc7df47b7 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -32,6 +32,11 @@ u32 nvmet_get_log_page_len(struct nvme_command *cmd)
 	return len;
 }
 
+static void nvmet_execute_get_log_page_noop(struct nvmet_req *req)
+{
+	nvmet_req_complete(req, nvmet_zero_sgl(req, 0, req->data_len));
+}
+
 static u16 nvmet_get_smart_log_nsid(struct nvmet_req *req,
 		struct nvme_smart_log *slog)
 {
@@ -89,74 +94,26 @@ static u16 nvmet_get_smart_log_all(struct nvmet_req *req,
 	return NVME_SC_SUCCESS;
 }
 
-static u16 nvmet_get_smart_log(struct nvmet_req *req,
-		struct nvme_smart_log *slog)
+static void nvmet_execute_get_log_page_smart(struct nvmet_req *req)
 {
-	u16 status;
+	struct nvme_smart_log *log;
+	u16 status = NVME_SC_INTERNAL;
 
-	WARN_ON(req == NULL || slog == NULL);
-	if (req->cmd->get_log_page.nsid == cpu_to_le32(NVME_NSID_ALL))
-		status = nvmet_get_smart_log_all(req, slog);
-	else
-		status = nvmet_get_smart_log_nsid(req, slog);
-	return status;
-}
-
-static void nvmet_execute_get_log_page(struct nvmet_req *req)
-{
-	struct nvme_smart_log *smart_log;
-	size_t data_len = nvmet_get_log_page_len(req->cmd);
-	void *buf;
-	u16 status = 0;
-
-	buf = kzalloc(data_len, GFP_KERNEL);
-	if (!buf) {
-		status = NVME_SC_INTERNAL;
+	if (req->data_len != sizeof(*log))
 		goto out;
-	}
 
-	switch (req->cmd->get_log_page.lid) {
-	case NVME_LOG_ERROR:
-		/*
-		 * We currently never set the More bit in the status field,
-		 * so all error log entries are invalid and can be zeroed out.
-		 * This is called a minum viable implementation (TM) of this
-		 * mandatory log page.
-		 */
-		break;
-	case NVME_LOG_SMART:
-		/*
-		 * XXX: fill out actual smart log
-		 *
-		 * We might have a hard time coming up with useful values for
-		 * many of the fields, and even when we have useful data
-		 * available (e.g. units or commands read/written) those aren't
-		 * persistent over power loss.
-		 */
-		if (data_len != sizeof(*smart_log)) {
-			status = NVME_SC_INTERNAL;
-			goto err;
-		}
-		smart_log = buf;
-		status = nvmet_get_smart_log(req, smart_log);
-		if (status)
-			goto err;
-		break;
-	case NVME_LOG_FW_SLOT:
-		/*
-		 * We only support a single firmware slot which always is
-		 * active, so we can zero out the whole firmware slot log and
-		 * still claim to fully implement this mandatory log page.
-		 */
-		break;
-	default:
-		BUG();
-	}
+	log = kzalloc(sizeof(*log), GFP_KERNEL);
+	if (!log)
+		goto out;
 
-	status = nvmet_copy_to_sgl(req, 0, buf, data_len);
+	if (req->cmd->get_log_page.nsid == cpu_to_le32(NVME_NSID_ALL))
+		status = nvmet_get_smart_log_all(req, log);
+	else
+		status = nvmet_get_smart_log_nsid(req, log);
+	if (status)
+		goto out;
 
-err:
-	kfree(buf);
+	status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log));
 out:
 	nvmet_req_complete(req, status);
 }
@@ -566,9 +523,25 @@ u16 nvmet_parse_admin_cmd(struct nvmet_req *req)
 
 		switch (cmd->get_log_page.lid) {
 		case NVME_LOG_ERROR:
-		case NVME_LOG_SMART:
+			/*
+			 * We currently never set the More bit in the status
+			 * field, so all error log entries are invalid and can
+			 * be zeroed out.  This is called a minum viable
+			 * implementation (TM) of this mandatory log page.
+			 */
+			req->execute = nvmet_execute_get_log_page_noop;
+			return 0;
 		case NVME_LOG_FW_SLOT:
-			req->execute = nvmet_execute_get_log_page;
+			/*
+			 * We only support a single firmware slot which always
+			 * is active, so we can zero out the whole firmware slot
+			 * log and still claim to fully implement this mandatory
+			 * log page.
+			 */
+			req->execute = nvmet_execute_get_log_page_noop;
+			return 0;
+		case NVME_LOG_SMART:
+			req->execute = nvmet_execute_get_log_page_smart;
 			return 0;
 		}
 		break;
-- 
2.12.3

* [PATCHv2 11/11] nvmet: ANA support
  2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
                   ` (9 preceding siblings ...)
  2018-05-22  9:10 ` [PATCHv2 10/11] nvmet: split log page implementation Hannes Reinecke
@ 2018-05-22  9:10 ` Hannes Reinecke
  2018-05-22 10:05   ` Christoph Hellwig
  10 siblings, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22  9:10 UTC (permalink / raw)


Add ANA support to the nvme target.  The ANA configuration is optional and
doesn't interfere with existing configurations.
ANA groups are created under each subsystem; namespaces must be linked
into a group to become part of it.
If ANA groups are created they have to be linked into the respective
ports.  Each ANA group can only be part of one port, and linking the
entire subsystem into a port will be refused if ANA groups are specified.
This implementation also has a limit of a single ANA state per group,
irrespective of the path, so when different ANA states are required one
needs to create separate ANA groups, one for each state.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/nvme/target/admin-cmd.c | 156 +++++++++++++++-
 drivers/nvme/target/configfs.c  | 381 ++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/target/core.c      |  94 +++++++++-
 drivers/nvme/target/discovery.c |  49 ++++--
 drivers/nvme/target/io-cmd.c    |  36 ++++
 drivers/nvme/target/nvmet.h     |  50 ++++++
 6 files changed, 747 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 8becc7df47b7..e87906a9bf98 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -118,6 +118,79 @@ static void nvmet_execute_get_log_page_smart(struct nvmet_req *req)
 	nvmet_req_complete(req, status);
 }
 
+static u32 nvmet_format_ana_group(struct nvmet_req *req, struct nvmet_ag *ag,
+		struct nvme_ana_group_desc *desc)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvmet_ns *ns;
+	u32 count = 0;
+
+	if (!(req->cmd->get_log_page.lsp & NVME_ANA_LOG_RGO)) {
+		struct nvmet_ns_link *n;
+
+		rcu_read_lock();
+		if (!ag->allow_any_ns && !list_empty(&ag->ns_link)) {
+			list_for_each_entry_rcu(n, &ag->ns_link, entry)
+				desc->nsids[count++] = cpu_to_le32(n->ns->nsid);
+		} else {
+			list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces,
+						dev_link)
+				desc->nsids[count++] = cpu_to_le32(ns->nsid);
+		}
+		rcu_read_unlock();
+	}
+
+	desc->grpid = cpu_to_le32(ag->grpid);
+	desc->nnsids = cpu_to_le32(count);
+	desc->chgcnt = cpu_to_le64(ag->change_count);
+	desc->state = ag->state;
+	memset(desc->rsvd17, 0, sizeof(desc->rsvd17));
+
+	return sizeof(struct nvme_ana_group_desc) + count * sizeof(__le32);
+}
+
+static void nvmet_execute_get_log_page_ana(struct nvmet_req *req)
+{
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvme_ana_rsp_hdr hdr = { 0, };
+	struct nvme_ana_group_desc *desc;
+	struct nvmet_ag_link *a;
+	size_t offset = sizeof(struct nvme_ana_rsp_hdr); /* start beyond hdr */
+	size_t len;
+	u16 ngrps = 0;
+	u16 status;
+
+	status = NVME_SC_INTERNAL;
+	desc = kzalloc(sizeof(struct nvme_ana_group_desc) +
+			ctrl->subsys->nr_namespaces * sizeof(__le32),
+		       GFP_KERNEL);
+	if (!desc)
+		goto out;
+
+	down_read(&nvmet_config_sem);
+	list_for_each_entry(a, &req->port->ags, entry) {
+		if (a->ag->subsys != ctrl->subsys)
+			continue;
+		len = nvmet_format_ana_group(req, a->ag, desc);
+		status = nvmet_copy_to_sgl(req, offset, desc, len);
+		if (status)
+			break;
+		offset += len;
+		ngrps++;
+	}
+
+	hdr.chgcnt = cpu_to_le64(ctrl->subsys->ana_chgcnt);
+	hdr.ngrps = cpu_to_le16(ngrps);
+	up_read(&nvmet_config_sem);
+
+	kfree(desc);
+
+	/* copy the header last once we know the information */
+	status = nvmet_copy_to_sgl(req, 0, &hdr, sizeof(hdr));
+out:
+	nvmet_req_complete(req, status);
+}
+
 static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
@@ -149,9 +222,31 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	 * the safest is to leave it as zeroes.
 	 */
 
-	/* we support multiple ports and multiples hosts: */
+	/* we support multiple ports and multiple hosts */
 	id->cmic = (1 << 0) | (1 << 1);
 
+	/* ANA support */
+	if (!list_empty(&ctrl->subsys->ana_groups)) {
+		u32 grpmax = 0, grpnum = 0;
+		struct nvmet_ag *ag;
+
+		id->cmic |= (1 << 3);
+
+		/* All ANA states are supported */
+		id->anacap = (1 << 0) | (1 << 1) | (1 << 2) | \
+			(1 << 3) | (1 << 4);
+		id->anatt = ctrl->subsys->anatt;
+		down_read(&nvmet_config_sem);
+		list_for_each_entry(ag, &ctrl->subsys->ana_groups, entry) {
+			if (grpmax < ag->grpid)
+				grpmax = ag->grpid;
+			grpnum++;
+		}
+		up_read(&nvmet_config_sem);
+		id->anagrpmax = cpu_to_le32(grpmax);
+		id->nanagrpid = cpu_to_le32(grpnum);
+	}
+
 	/* no limit on data transfer sizes for now */
 	id->mdts = 0;
 	id->cntlid = cpu_to_le16(ctrl->cntlid);
@@ -188,6 +283,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	id->maxcmd = cpu_to_le16(NVMET_MAX_CMD);
 
 	id->nn = cpu_to_le32(ctrl->subsys->max_nsid);
+	id->mnan = cpu_to_le32(ctrl->subsys->nr_namespaces);
 	id->oncs = cpu_to_le16(NVME_CTRL_ONCS_DSM |
 			NVME_CTRL_ONCS_WRITE_ZEROES);
 
@@ -236,7 +332,11 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 {
 	struct nvmet_ns *ns;
 	struct nvme_id_ns *id;
+	struct nvmet_ag_link *a;
+	struct nvmet_ns_link *n;
+	struct nvmet_ag *ag = NULL;
 	u16 status = 0;
+	u8 ag_state = 0;
 
 	ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
 	if (!ns) {
@@ -244,18 +344,44 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 		goto out;
 	}
 
+	list_for_each_entry(a, &req->port->ags, entry) {
+		if (a->ag->subsys != req->sq->ctrl->subsys)
+			continue;
+		list_for_each_entry(n, &a->ag->ns_link, entry) {
+			if (a->ag->allow_any_ns || n->ns == ns) {
+				ag = a->ag;
+				break;
+			}
+		}
+	}
+	if (!ag) {
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
 	id = kzalloc(sizeof(*id), GFP_KERNEL);
 	if (!id) {
 		status = NVME_SC_INTERNAL;
 		goto out_put_ns;
 	}
 
+	if (ag) {
+		id->anagrpid = cpu_to_le32(ag->grpid);
+		ag_state = ag->state;
+	}
+
 	/*
 	 * nuse = ncap = nsze isn't always true, but we have no way to find
 	 * that out from the underlying device.
 	 */
 	id->ncap = id->nuse = id->nsze =
 		cpu_to_le64(ns->size >> ns->blksize_shift);
+	/*
+	 * nuse and nsze should be zero for inaccessible or
+	 * persistent loss ANA state.
+	 */
+	if ((ag_state == NVME_ANA_INACCESSIBLE) ||
+	    (ag_state == NVME_ANA_PERSISTENT_LOSS))
+		id->nuse = id->nsze = 0;
 
 	/*
 	 * We just provide a single LBA format that matches what the
@@ -300,6 +426,27 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
 	}
 
 	rcu_read_lock();
+	if (!list_empty(&req->port->ags)) {
+		struct nvmet_ag_link *a;
+
+		list_for_each_entry_rcu(a, &req->port->ags, entry) {
+			struct nvmet_ns_link *n;
+
+			if (a->ag->subsys != ctrl->subsys)
+				continue;
+			if (a->ag->allow_any_ns)
+				goto list_subsys;
+			list_for_each_entry(n, &a->ag->ns_link, entry) {
+				if (n->ns->nsid <= min_nsid)
+					continue;
+				list[i++] = cpu_to_le32(n->ns->nsid);
+				if (i == buf_size / sizeof(__le32))
+					break;
+			}
+		}
+		goto skip_subsys;
+	}
+list_subsys:
 	list_for_each_entry_rcu(ns, &ctrl->subsys->namespaces, dev_link) {
 		if (ns->nsid <= min_nsid)
 			continue;
@@ -307,6 +454,7 @@ static void nvmet_execute_identify_nslist(struct nvmet_req *req)
 		if (i == buf_size / sizeof(__le32))
 			break;
 	}
+skip_subsys:
 	rcu_read_unlock();
 
 	status = nvmet_copy_to_sgl(req, 0, list, buf_size);
@@ -406,7 +554,8 @@ static void nvmet_execute_set_features(struct nvmet_req *req)
 		break;
 	case NVME_FEAT_ASYNC_EVENT:
 		val32 = le32_to_cpu(req->cmd->common.cdw10[1]);
-		req->sq->ctrl->aen_cfg = val32 & NVME_AEN_CFG_NS_ATTR;
+		req->sq->ctrl->aen_cfg = val32 &
+			(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_ANA_CHANGE);
 		nvmet_set_result(req, req->sq->ctrl->aen_cfg);
 		break;
 	case NVME_FEAT_HOST_ID:
@@ -543,6 +692,9 @@ u16 nvmet_parse_admin_cmd(struct nvmet_req *req)
 		case NVME_LOG_SMART:
 			req->execute = nvmet_execute_get_log_page_smart;
 			return 0;
+		case NVME_LOG_ANA:
+			req->execute = nvmet_execute_get_log_page_ana;
+			return 0;
 		}
 		break;
 	case nvme_admin_identify:
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index ad9ff27234b5..5bb554e75e78 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -22,6 +22,7 @@
 
 static const struct config_item_type nvmet_host_type;
 static const struct config_item_type nvmet_subsys_type;
+static const struct config_item_type nvmet_ag_type;
 
 static const struct nvmet_transport_name {
 	u8		type;
@@ -478,6 +479,10 @@ static int nvmet_port_subsys_allow_link(struct config_item *parent,
 		pr_err("can only link subsystems into the subsystems dir.!\n");
 		return -EINVAL;
 	}
+	if (!list_empty(&port->ags)) {
+		pr_err("can only link subsystems if no ANA groups are active!\n");
+		return -EAGAIN;
+	}
 	subsys = to_subsys(target);
 	link = kmalloc(sizeof(*link), GFP_KERNEL);
 	if (!link)
@@ -542,6 +547,338 @@ static const struct config_item_type nvmet_port_subsys_type = {
 	.ct_owner		= THIS_MODULE,
 };
 
+static int nvmet_port_ags_allow_link(struct config_item *parent,
+		struct config_item *target)
+{
+	struct nvmet_port *port = to_nvmet_port(parent->ci_parent);
+	struct nvmet_ag *ag;
+	struct nvmet_ag_link *link, *p;
+	int ret;
+
+	if (target->ci_type != &nvmet_ag_type) {
+		pr_err("can only link ana_groups into the ana_groups dir.!\n");
+		return -EINVAL;
+	}
+	if (!list_empty(&port->subsystems)) {
+		pr_err("can only link ana_groups if no subsystems are specified!\n");
+		return -EAGAIN;
+	}
+	ag = to_nvmet_ag(target);
+	link = kmalloc(sizeof(*link), GFP_KERNEL);
+	if (!link)
+		return -ENOMEM;
+	link->ag = ag;
+
+	down_write(&nvmet_config_sem);
+	ret = -EEXIST;
+	list_for_each_entry(p, &port->ags, entry) {
+		if (p->ag == ag)
+			goto out_free_link;
+	}
+	if (list_empty(&port->ags)) {
+		ret = nvmet_enable_port(port);
+		if (ret)
+			goto out_free_link;
+	}
+	list_add_tail(&link->entry, &port->ags);
+	nvmet_genctr++;
+	up_write(&nvmet_config_sem);
+	return 0;
+
+out_free_link:
+	up_write(&nvmet_config_sem);
+	kfree(link);
+	return ret;
+}
+
+static void nvmet_port_ags_drop_link(struct config_item *parent,
+		struct config_item *target)
+{
+	struct nvmet_port *port = to_nvmet_port(parent->ci_parent);
+	struct nvmet_ag *ag = to_nvmet_ag(target);
+	struct nvmet_ag_link *p;
+
+	down_write(&nvmet_config_sem);
+	list_for_each_entry(p, &port->ags, entry) {
+		if (p->ag == ag)
+			goto found;
+	}
+	up_write(&nvmet_config_sem);
+	return;
+
+found:
+	list_del(&p->entry);
+	nvmet_genctr++;
+	if (list_empty(&port->ags))
+		nvmet_disable_port(port);
+	up_write(&nvmet_config_sem);
+	kfree(p);
+}
+
+static struct configfs_item_operations nvmet_port_ags_item_ops = {
+	.allow_link		= nvmet_port_ags_allow_link,
+	.drop_link		= nvmet_port_ags_drop_link,
+};
+
+static const struct config_item_type nvmet_port_ag_type = {
+	.ct_item_ops		= &nvmet_port_ags_item_ops,
+	.ct_owner		= THIS_MODULE,
+};
+
+static int nvmet_ag_ns_allow_link(struct config_item *parent,
+		struct config_item *target)
+{
+	struct nvmet_ag *ag = to_nvmet_ag(parent->ci_parent);
+	struct nvmet_ns *ns;
+	struct nvmet_ns_link *link, *l;
+	int ret;
+
+	if (target->ci_type != &nvmet_ns_type) {
+		pr_err("can only link namespaces into the namespaces dir.!\n");
+		return -EINVAL;
+	}
+	if (ag->allow_any_ns) {
+		pr_err("can't add namespaces when allow_any_ns is set!\n");
+		return -EINVAL;
+	}
+	ns = to_nvmet_ns(target);
+	link = kmalloc(sizeof(*link), GFP_KERNEL);
+	if (!link)
+		return -ENOMEM;
+	link->ns = ns;
+
+	down_write(&nvmet_config_sem);
+	ret = -EEXIST;
+	list_for_each_entry(l, &ag->ns_link, entry) {
+		if (l->ns == ns)
+			goto out_free_link;
+	}
+	list_add_tail(&link->entry, &ag->ns_link);
+	nvmet_genctr++;
+	up_write(&nvmet_config_sem);
+	return 0;
+
+out_free_link:
+	up_write(&nvmet_config_sem);
+	kfree(link);
+	return ret;
+}
+
+static void nvmet_ag_ns_drop_link(struct config_item *parent,
+		struct config_item *target)
+{
+	struct nvmet_ag *ag = to_nvmet_ag(parent->ci_parent);
+	struct nvmet_ns *ns = to_nvmet_ns(target);
+	struct nvmet_ns_link *l;
+
+	down_write(&nvmet_config_sem);
+	list_for_each_entry(l, &ag->ns_link, entry) {
+		if (l->ns == ns)
+			goto found;
+	}
+	up_write(&nvmet_config_sem);
+	return;
+
+found:
+	list_del(&l->entry);
+	nvmet_genctr++;
+	up_write(&nvmet_config_sem);
+	kfree(l);
+}
+
+static struct configfs_item_operations nvmet_ag_ns_item_ops = {
+	.allow_link		= nvmet_ag_ns_allow_link,
+	.drop_link		= nvmet_ag_ns_drop_link,
+};
+
+static const struct config_item_type nvmet_ag_ns_type = {
+	.ct_item_ops		= &nvmet_ag_ns_item_ops,
+	.ct_owner		= THIS_MODULE,
+};
+
+static ssize_t nvmet_ag_state_show(struct config_item *item, char *page)
+{
+	struct nvmet_ag *ag = to_nvmet_ag(item);
+
+	switch (ag->state) {
+	case NVME_ANA_OPTIMIZED:
+		return sprintf(page, "optimized\n");
+	case NVME_ANA_NONOPTIMIZED:
+		return sprintf(page, "nonoptimized\n");
+	case NVME_ANA_INACCESSIBLE:
+		return sprintf(page, "inaccessible\n");
+	case NVME_ANA_PERSISTENT_LOSS:
+		return sprintf(page, "persistent-loss\n");
+	case NVME_ANA_CHANGE:
+		return sprintf(page, "change-state\n");
+	default:
+		return sprintf(page, "<invalid>\n");
+	}
+}
+
+static ssize_t nvmet_ag_state_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_ag *ag = to_nvmet_ag(item);
+	u8 state;
+
+	if (sysfs_streq(page, "optimized")) {
+		state = NVME_ANA_OPTIMIZED;
+	} else if (sysfs_streq(page, "nonoptimized")) {
+		state = NVME_ANA_NONOPTIMIZED;
+	} else if (sysfs_streq(page, "inaccessible")) {
+		state = NVME_ANA_INACCESSIBLE;
+	} else if (sysfs_streq(page, "persistent-loss")) {
+		state = NVME_ANA_PERSISTENT_LOSS;
+	} else {
+		pr_err("Invalid state '%s' for state\n", page);
+		return -EINVAL;
+	}
+	down_write(&nvmet_config_sem);
+	ag->state = NVME_ANA_CHANGE;
+	ag->pending_state = state;
+	ag->subsys->ana_chgcnt++;
+	ag->change_count++;
+	/* ANA TP mandates that the ANA group change counter rolls over to 1 */
+	if (!ag->change_count)
+		ag->change_count++;
+	up_write(&nvmet_config_sem);
+	/*
+	 * Reduce the delay timeout by one second to guarantee that
+	 * the transition is finished by ana_tt.
+	 */
+	if (ag->subsys->anatt > 0)
+		schedule_delayed_work(&ag->state_change_work,
+				      (ag->subsys->anatt - 1) * HZ);
+
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_ag_, state);
+
+static ssize_t nvmet_ag_allow_any_ns_show(struct config_item *item,
+		char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%d\n",
+		to_nvmet_ag(item)->allow_any_ns);
+}
+
+static ssize_t nvmet_ag_allow_any_ns_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_ag *ag = to_nvmet_ag(item);
+	bool allow_any_ns;
+	int ret = 0;
+
+	if (strtobool(page, &allow_any_ns))
+		return -EINVAL;
+
+	down_write(&nvmet_config_sem);
+	if (allow_any_ns && !list_empty(&ag->ns_link)) {
+		pr_err("Can't set allow_any_ns when explicit namespaces are set!\n");
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	ag->allow_any_ns = allow_any_ns;
+out_unlock:
+	up_write(&nvmet_config_sem);
+	return ret ? ret : count;
+}
+
+CONFIGFS_ATTR(nvmet_ag_, allow_any_ns);
+
+static struct configfs_attribute *nvmet_ag_attrs[] = {
+	&nvmet_ag_attr_state,
+	&nvmet_ag_attr_allow_any_ns,
+	NULL,
+};
+
+static void nvmet_ag_release(struct config_item *item)
+{
+	struct nvmet_ag *ag = to_nvmet_ag(item);
+	struct nvmet_subsys *subsys = ag->subsys;
+
+	down_write(&nvmet_config_sem);
+	list_del_init(&ag->entry);
+	if (list_empty(&subsys->ana_groups))
+		subsys->anatt = 0;
+	up_write(&nvmet_config_sem);
+	nvmet_ag_free(ag);
+}
+
+static struct configfs_item_operations nvmet_ag_item_ops = {
+	.release		= nvmet_ag_release,
+};
+
+static const struct config_item_type nvmet_ag_type = {
+	.ct_item_ops		= &nvmet_ag_item_ops,
+	.ct_attrs		= nvmet_ag_attrs,
+	.ct_owner		= THIS_MODULE,
+};
+
+static struct config_group *nvmet_ag_make(struct config_group *group,
+		const char *name)
+{
+	struct nvmet_subsys *subsys = ag_to_subsys(&group->cg_item);
+	struct nvmet_ag *ag, *a;
+	int ret;
+	u32 agid;
+
+	ret = kstrtou32(name, 0, &agid);
+	if (ret)
+		goto out;
+
+	ret = -EINVAL;
+	if (agid == 0)
+		goto out;
+	ret = -ENOMEM;
+	ag = nvmet_ag_alloc(subsys, agid);
+	if (!ag)
+		goto out;
+	ret = -EEXIST;
+	down_write(&nvmet_config_sem);
+	if (list_empty(&subsys->ana_groups)) {
+		list_add_tail_rcu(&ag->entry, &subsys->ana_groups);
+		/* NVMe-oF requires us to set anatt != 0 for ANA support */
+		subsys->anatt = NVMET_DEFAULT_ANA_TT;
+	} else {
+		/* Keep an ordered list of ANA groups */
+		list_for_each_entry_rcu(a, &subsys->ana_groups, entry) {
+			if (a->grpid == agid) {
+				up_write(&nvmet_config_sem);
+				nvmet_ag_free(ag);
+				goto out;
+			}
+			if (agid < a->grpid)
+				break;
+		}
+		list_add_tail_rcu(&ag->entry, &subsys->ana_groups);
+	}
+	subsys->ana_chgcnt++;
+	up_write(&nvmet_config_sem);
+	config_group_init_type_name(&ag->group, name, &nvmet_ag_type);
+
+	config_group_init_type_name(&ag->ns_group,
+				    "namespaces", &nvmet_ag_ns_type);
+	configfs_add_default_group(&ag->ns_group, &ag->group);
+
+	pr_info("adding ana groupid %d to subsystem %s\n",
+		agid, subsys->subsysnqn);
+	return &ag->group;
+out:
+	return ERR_PTR(ret);
+}
+
+static struct configfs_group_operations nvmet_ana_group_group_ops = {
+	.make_group		= nvmet_ag_make,
+};
+
+static const struct config_item_type nvmet_ana_group_group_type = {
+	.ct_group_ops		= &nvmet_ana_group_group_ops,
+	.ct_owner		= THIS_MODULE,
+};
+
 static int nvmet_allowed_hosts_allow_link(struct config_item *parent,
 		struct config_item *target)
 {
@@ -704,10 +1041,45 @@ static ssize_t nvmet_subsys_attr_serial_store(struct config_item *item,
 }
 CONFIGFS_ATTR(nvmet_subsys_, attr_serial);
 
+static ssize_t nvmet_subsys_attr_anatt_show(struct config_item *item,
+					    char *page)
+{
+	struct nvmet_subsys *subsys = to_subsys(item);
+
+	if (list_empty(&subsys->ana_groups))
+		return snprintf(page, PAGE_SIZE, "0\n");
+	return snprintf(page, PAGE_SIZE, "%d\n", subsys->anatt);
+}
+
+static ssize_t nvmet_subsys_attr_anatt_store(struct config_item *item,
+					     const char *page, size_t count)
+{
+	struct nvmet_subsys *subsys = to_subsys(item);
+	int ret;
+	u8 ana_tt;
+
+	down_write(&nvmet_config_sem);
+	if (list_empty(&subsys->ana_groups)) {
+		up_write(&nvmet_config_sem);
+		return -EAGAIN;
+	}
+	ret = kstrtou8(page, 0, &ana_tt);
+	if (ret || ana_tt == 0) {
+		up_write(&nvmet_config_sem);
+		return -EINVAL;
+	}
+	subsys->anatt = ana_tt;
+	up_write(&nvmet_config_sem);
+
+	return count;
+}
+CONFIGFS_ATTR(nvmet_subsys_, attr_anatt);
+
 static struct configfs_attribute *nvmet_subsys_attrs[] = {
 	&nvmet_subsys_attr_attr_allow_any_host,
 	&nvmet_subsys_attr_attr_version,
 	&nvmet_subsys_attr_attr_serial,
+	&nvmet_subsys_attr_attr_anatt,
 	NULL,
 };
 
@@ -752,6 +1124,10 @@ static struct config_group *nvmet_subsys_make(struct config_group *group,
 			"namespaces", &nvmet_namespaces_type);
 	configfs_add_default_group(&subsys->namespaces_group, &subsys->group);
 
+	config_group_init_type_name(&subsys->ana_group,
+			"ana_groups", &nvmet_ana_group_group_type);
+	configfs_add_default_group(&subsys->ana_group, &subsys->group);
+
 	config_group_init_type_name(&subsys->allowed_hosts_group,
 			"allowed_hosts", &nvmet_allowed_hosts_type);
 	configfs_add_default_group(&subsys->allowed_hosts_group,
@@ -899,6 +1275,7 @@ static struct config_group *nvmet_ports_make(struct config_group *group,
 	INIT_LIST_HEAD(&port->entry);
 	INIT_LIST_HEAD(&port->subsystems);
 	INIT_LIST_HEAD(&port->referrals);
+	INIT_LIST_HEAD(&port->ags);
 
 	port->disc_addr.portid = cpu_to_le16(portid);
 	config_group_init_type_name(&port->group, name, &nvmet_port_type);
@@ -907,6 +1284,10 @@ static struct config_group *nvmet_ports_make(struct config_group *group,
 			"subsystems", &nvmet_port_subsys_type);
 	configfs_add_default_group(&port->subsys_group, &port->group);
 
+	config_group_init_type_name(&port->ana_group,
+			"ana_groups", &nvmet_port_ag_type);
+	configfs_add_default_group(&port->ana_group, &port->group);
+
 	config_group_init_type_name(&port->referrals_group,
 			"referrals", &nvmet_referrals_type);
 	configfs_add_default_group(&port->referrals_group, &port->group);
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 3b9e9eccc950..05a09bb5dbb5 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -144,6 +144,33 @@ static void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
 	schedule_work(&ctrl->async_event_work);
 }
 
+static void nvmet_ana_state_change_work(struct work_struct *work)
+{
+	struct nvmet_ag *ag = container_of(work, struct nvmet_ag,
+					   state_change_work.work);
+	struct nvmet_ctrl *ctrl;
+	struct nvmet_ag_link *a;
+
+	if (list_empty(&ag->entry))
+		return;
+	ag->state = ag->pending_state;
+	mutex_lock(&ag->subsys->lock);
+	list_for_each_entry(ctrl, &ag->subsys->ctrls, subsys_entry) {
+		if (list_empty(&ctrl->port->ags))
+			continue;
+
+		list_for_each_entry(a, &ctrl->port->ags, entry) {
+			if (a->ag == ag &&
+			    ctrl->aen_cfg & NVME_AEN_CFG_ANA_CHANGE)
+				nvmet_add_async_event(ctrl,
+						      NVME_AER_TYPE_NOTICE,
+						      NVME_AER_NOTICE_ANA,
+						      NVME_LOG_ANA);
+		}
+	}
+	mutex_unlock(&ag->subsys->lock);
+}
+
 static void nvmet_send_ns_changed_event(struct nvmet_subsys *subsys)
 {
 	/*
@@ -254,6 +281,32 @@ static void nvmet_stop_keep_alive_timer(struct nvmet_ctrl *ctrl)
 	cancel_delayed_work_sync(&ctrl->ka_work);
 }
 
+void nvmet_ag_free(struct nvmet_ag *ag)
+{
+	cancel_delayed_work_sync(&ag->state_change_work);
+	kfree(ag);
+}
+
+struct nvmet_ag *nvmet_ag_alloc(struct nvmet_subsys *subsys, u32 agid)
+{
+	struct nvmet_ag *ag;
+
+	ag = kzalloc(sizeof(*ag), GFP_KERNEL);
+	if (!ag)
+		return NULL;
+
+	INIT_LIST_HEAD(&ag->entry);
+	INIT_LIST_HEAD(&ag->ns_link);
+	INIT_DELAYED_WORK(&ag->state_change_work, nvmet_ana_state_change_work);
+
+	ag->grpid = agid;
+	ag->subsys = subsys;
+	ag->state = ag->pending_state = NVME_ANA_OPTIMIZED;
+	ag->change_count = 1;
+
+	return ag;
+}
+
 static struct nvmet_ns *__nvmet_find_namespace(struct nvmet_ctrl *ctrl,
 		__le32 nsid)
 {
@@ -339,8 +392,9 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
 
 		list_add_tail_rcu(&ns->dev_link, &old->dev_link);
 	}
-
+	subsys->nr_namespaces++;
 	nvmet_send_ns_changed_event(subsys);
+
 	ns->enabled = true;
 	ret = 0;
 out_unlock:
@@ -380,7 +434,9 @@ void nvmet_ns_disable(struct nvmet_ns *ns)
 	percpu_ref_exit(&ns->ref);
 
 	mutex_lock(&subsys->lock);
+	subsys->nr_namespaces--;
 	nvmet_send_ns_changed_event(subsys);
+
 	if (ns->bdev)
 		blkdev_put(ns->bdev, FMODE_WRITE|FMODE_READ);
 out_unlock:
@@ -759,10 +815,20 @@ static bool nvmet_host_discovery_allowed(struct nvmet_req *req,
 		const char *hostnqn)
 {
 	struct nvmet_subsys_link *s;
+	struct nvmet_ag_link *a;
 
-	list_for_each_entry(s, &req->port->subsystems, entry) {
-		if (__nvmet_host_allowed(s->subsys, hostnqn))
-			return true;
+	if (!list_empty(&req->port->ags)) {
+		list_for_each_entry(a, &req->port->ags, entry) {
+			if (__nvmet_host_allowed(a->ag->subsys, hostnqn)) {
+				rcu_read_unlock();
+				return true;
+			}
+		}
+	} else {
+		list_for_each_entry(s, &req->port->subsystems, entry) {
+			if (__nvmet_host_allowed(s->subsys, hostnqn))
+				return true;
+		}
 	}
 
 	return false;
@@ -813,7 +879,6 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 	if (!ctrl)
 		goto out_put_subsystem;
 	mutex_init(&ctrl->lock);
-
 	nvmet_init_cap(ctrl);
 
 	INIT_WORK(&ctrl->async_event_work, nvmet_async_event_work);
@@ -824,7 +889,8 @@ u16 nvmet_alloc_ctrl(const char *subsysnqn, const char *hostnqn,
 
 	kref_init(&ctrl->ref);
 	ctrl->subsys = subsys;
-	ctrl->aen_cfg = NVME_AEN_CFG_NS_ATTR;
+	ctrl->port = req->port;
+	ctrl->aen_cfg = NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_ANA_CHANGE;
 
 	ctrl->cqs = kcalloc(subsys->max_qid + 1,
 			sizeof(struct nvmet_cq *),
@@ -949,6 +1015,8 @@ static struct nvmet_subsys *nvmet_find_get_subsys(struct nvmet_port *port,
 		const char *subsysnqn)
 {
 	struct nvmet_subsys_link *p;
+	struct nvmet_ag_link *a;
+	struct nvmet_subsys *subsys;
 
 	if (!port)
 		return NULL;
@@ -961,13 +1029,24 @@ static struct nvmet_subsys *nvmet_find_get_subsys(struct nvmet_port *port,
 	}
 
 	down_read(&nvmet_config_sem);
+	list_for_each_entry(a, &port->ags, entry) {
+		if (!strncmp(a->ag->subsys->subsysnqn, subsysnqn,
+			     NVMF_NQN_SIZE)) {
+			if (!kref_get_unless_zero(&a->ag->subsys->ref))
+				break;
+			subsys = a->ag->subsys;
+			up_read(&nvmet_config_sem);
+			return subsys;
+		}
+	}
 	list_for_each_entry(p, &port->subsystems, entry) {
 		if (!strncmp(p->subsys->subsysnqn, subsysnqn,
 				NVMF_NQN_SIZE)) {
 			if (!kref_get_unless_zero(&p->subsys->ref))
 				break;
+			subsys = p->subsys;
 			up_read(&nvmet_config_sem);
-			return p->subsys;
+			return subsys;
 		}
 	}
 	up_read(&nvmet_config_sem);
@@ -1013,6 +1092,7 @@ struct nvmet_subsys *nvmet_subsys_alloc(const char *subsysnqn,
 	INIT_LIST_HEAD(&subsys->namespaces);
 	INIT_LIST_HEAD(&subsys->ctrls);
 	INIT_LIST_HEAD(&subsys->hosts);
+	INIT_LIST_HEAD(&subsys->ana_groups);
 
 	return subsys;
 }
diff --git a/drivers/nvme/target/discovery.c b/drivers/nvme/target/discovery.c
index 231e04e0a496..f7a7bab1f9de 100644
--- a/drivers/nvme/target/discovery.c
+++ b/drivers/nvme/target/discovery.c
@@ -90,6 +90,7 @@ static void nvmet_execute_get_disc_log_page(struct nvmet_req *req)
 	size_t alloc_len = max(data_len, sizeof(*hdr));
 	int residual_len = data_len - sizeof(*hdr);
 	struct nvmet_subsys_link *p;
+	struct nvmet_ag_link *a, *b;
 	struct nvmet_port *r;
 	u32 numrec = 0;
 	u16 status = 0;
@@ -106,19 +107,47 @@ static void nvmet_execute_get_disc_log_page(struct nvmet_req *req)
 	}
 
 	down_read(&nvmet_config_sem);
-	list_for_each_entry(p, &req->port->subsystems, entry) {
-		if (!nvmet_host_allowed(req, p->subsys, ctrl->hostnqn))
-			continue;
-		if (residual_len >= entry_size) {
-			char traddr[NVMF_TRADDR_SIZE];
-
-			nvmet_set_disc_traddr(req, req->port, traddr);
-			nvmet_format_discovery_entry(hdr, req->port,
+	if (!list_empty(&req->port->ags)) {
+		list_for_each_entry(a, &req->port->ags, entry) {
+			bool dup = false;
+			if (!nvmet_host_allowed(req, a->ag->subsys,
+						ctrl->hostnqn))
+				continue;
+			/* Filter out duplicate entries */
+			list_for_each_entry(b, &req->port->ags, entry) {
+				if (a == b)
+					break;
+				if (a->ag->subsys == b->ag->subsys)
+					dup = true;
+			}
+			if (dup)
+				continue;
+			if (residual_len >= entry_size) {
+				char traddr[NVMF_TRADDR_SIZE];
+
+				nvmet_set_disc_traddr(req, req->port, traddr);
+				nvmet_format_discovery_entry(hdr, req->port,
+					a->ag->subsys->subsysnqn, traddr,
+					NVME_NQN_NVME, numrec);
+				residual_len -= entry_size;
+			}
+			numrec++;
+		}
+	} else {
+		list_for_each_entry(p, &req->port->subsystems, entry) {
+			if (!nvmet_host_allowed(req, p->subsys, ctrl->hostnqn))
+				continue;
+			if (residual_len >= entry_size) {
+				char traddr[NVMF_TRADDR_SIZE];
+
+				nvmet_set_disc_traddr(req, req->port, traddr);
+				nvmet_format_discovery_entry(hdr, req->port,
 					p->subsys->subsysnqn, traddr,
 					NVME_NQN_NVME, numrec);
-			residual_len -= entry_size;
+				residual_len -= entry_size;
+			}
+			numrec++;
 		}
-		numrec++;
 	}
 
 	list_for_each_entry(r, &req->port->referrals, entry) {
diff --git a/drivers/nvme/target/io-cmd.c b/drivers/nvme/target/io-cmd.c
index cd2344179673..7fecf8d3baac 100644
--- a/drivers/nvme/target/io-cmd.c
+++ b/drivers/nvme/target/io-cmd.c
@@ -189,6 +189,38 @@ static void nvmet_execute_write_zeroes(struct nvmet_req *req)
 	}
 }
 
+static u16 nvmet_check_ana_state(struct nvmet_req *req)
+{
+	struct nvmet_ag_link *a;
+	struct nvmet_ns_link *n;
+	struct nvmet_ag *ag = NULL;
+	enum nvme_ana_state state = 0;
+
+	if (list_empty(&req->port->ags))
+		return 0;
+
+	list_for_each_entry(a, &req->port->ags, entry) {
+		list_for_each_entry(n, &a->ag->ns_link, entry) {
+			if (n->ns == req->ns) {
+				ag = a->ag;
+				break;
+			}
+		}
+	}
+	if (ag) {
+		if (work_pending(&ag->state_change_work.work))
+			return NVME_SC_ANA_TRANSITION;
+		state = ag->state;
+	}
+	if (unlikely(state == NVME_ANA_INACCESSIBLE))
+		return NVME_SC_ANA_INACCESSIBLE;
+	if (unlikely(state == NVME_ANA_PERSISTENT_LOSS))
+		return NVME_SC_ANA_PERSISTENT_LOSS;
+	if (unlikely(state == NVME_ANA_CHANGE))
+		return NVME_SC_ANA_TRANSITION;
+	return 0;
+}
+
 u16 nvmet_parse_io_cmd(struct nvmet_req *req)
 {
 	struct nvme_command *cmd = req->cmd;
@@ -204,6 +236,10 @@ u16 nvmet_parse_io_cmd(struct nvmet_req *req)
 	if (unlikely(!req->ns))
 		return NVME_SC_INVALID_NS | NVME_SC_DNR;
 
+	ret = nvmet_check_ana_state(req);
+	if (unlikely(ret))
+		return ret;
+
 	switch (cmd->common.opcode) {
 	case nvme_cmd_read:
 	case nvme_cmd_write:
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index f5f7c1398173..a2fbe93bb735 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -29,6 +29,7 @@
 
 #define NVMET_ASYNC_EVENTS		4
 #define NVMET_ERROR_LOG_SLOTS		128
+#define NVMET_DEFAULT_ANA_TT		2
 
 /* Helper Macros when NVMe error is NVME_SC_CONNECT_INVALID_PARAM
  * The 16 bit shift is to set IATTR bit to 1, which means offending
@@ -39,6 +40,25 @@
 #define IPO_IATTR_CONNECT_SQE(x)	\
 	(cpu_to_le32(offsetof(struct nvmf_connect_command, x)))
 
+struct nvmet_ag {
+	struct list_head	entry;
+	struct list_head	ns_link;
+	struct config_group	group;
+	struct config_group	ns_group;
+	struct nvmet_subsys	*subsys;
+	struct delayed_work	state_change_work;
+	u64			change_count;
+	u32			grpid;
+	u8			state;
+	u8			pending_state;
+	bool			allow_any_ns;
+};
+
+static inline struct nvmet_ag *to_nvmet_ag(struct config_item *item)
+{
+	return container_of(to_config_group(item), struct nvmet_ag, group);
+}
+
 struct nvmet_ns {
 	struct list_head	dev_link;
 	struct percpu_ref	ref;
@@ -96,6 +116,8 @@ struct nvmet_port {
 	struct list_head		subsystems;
 	struct config_group		referrals_group;
 	struct list_head		referrals;
+	struct config_group		ana_group;
+	struct list_head		ags;
 	void				*priv;
 	bool				enabled;
 };
@@ -108,6 +130,7 @@ static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
 
 struct nvmet_ctrl {
 	struct nvmet_subsys	*subsys;
+	struct nvmet_port	*port;
 	struct nvmet_cq		**cqs;
 	struct nvmet_sq		**sqs;
 
@@ -144,8 +167,14 @@ struct nvmet_subsys {
 	struct kref		ref;
 
 	struct list_head	namespaces;
+	unsigned int		nr_namespaces;
 	unsigned int		max_nsid;
 
+	struct list_head	ana_groups;
+	u32			max_grpid;
+	u8			anatt;
+	u64			ana_chgcnt;
+
 	struct list_head	ctrls;
 
 	struct list_head	hosts;
@@ -160,6 +189,7 @@ struct nvmet_subsys {
 	struct config_group	group;
 
 	struct config_group	namespaces_group;
+	struct config_group	ana_group;
 	struct config_group	allowed_hosts_group;
 };
 
@@ -175,6 +205,13 @@ static inline struct nvmet_subsys *namespaces_to_subsys(
 			namespaces_group);
 }
 
+static inline struct nvmet_subsys *ag_to_subsys(
+		struct config_item *item)
+{
+	return container_of(to_config_group(item), struct nvmet_subsys,
+			ana_group);
+}
+
 struct nvmet_host {
 	struct config_group	group;
 };
@@ -199,6 +236,16 @@ struct nvmet_subsys_link {
 	struct nvmet_subsys	*subsys;
 };
 
+struct nvmet_ag_link {
+	struct list_head	entry;
+	struct nvmet_ag		*ag;
+};
+
+struct nvmet_ns_link {
+	struct list_head	entry;
+	struct nvmet_ns		*ns;
+};
+
 struct nvmet_req;
 struct nvmet_fabrics_ops {
 	struct module *owner;
@@ -292,6 +339,9 @@ u16 nvmet_ctrl_find_get(const char *subsysnqn, const char *hostnqn, u16 cntlid,
 void nvmet_ctrl_put(struct nvmet_ctrl *ctrl);
 u16 nvmet_check_ctrl_status(struct nvmet_req *req, struct nvme_command *cmd);
 
+void nvmet_ag_free(struct nvmet_ag *ag);
+struct nvmet_ag *nvmet_ag_alloc(struct nvmet_subsys *subsys, u32 agid);
+
 struct nvmet_subsys *nvmet_subsys_alloc(const char *subsysnqn,
 		enum nvme_subsys_type type);
 void nvmet_subsys_put(struct nvmet_subsys *subsys);
-- 
2.12.3

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCHv2 02/11] nvme: submit AEN event configuration on startup
  2018-05-22  9:09 ` [PATCHv2 02/11] nvme: submit AEN event configuration on startup Hannes Reinecke
@ 2018-05-22 10:00   ` Christoph Hellwig
  2018-05-22 10:55     ` Hannes Reinecke
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2018-05-22 10:00 UTC (permalink / raw)


> +void nvme_enable_aen(struct nvme_ctrl *ctrl)
> +{
> +	u32 aen_cfg = NVME_SMART_CRIT_SPARE | NVME_SMART_CRIT_TEMPERATURE |
> +		NVME_SMART_CRIT_RELIABILITY | NVME_SMART_CRIT_MEDIA |
> +		NVME_SMART_CRIT_VOLATILE_MEMORY;

Currently the kernel doesn't really care about smart warnings at
all.  Should we make this a sysfs attribute so that people only enable it
if they actually care?

> +	u32 result;
> +	int status;
> +
> +	aen_cfg |= ctrl->aen_cfg;
> +	status = nvme_set_features(ctrl, NVME_FEAT_ASYNC_EVENT,
> +				   aen_cfg, NULL, 0, &result);
> +	if (status)
> +		dev_warn(ctrl->device, "Failed to configure AEN (cfg %x)\n",
> +			 aen_cfg);
> +}
> +

If status is negative we should abort the reset, so we need to pass
that on.
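
For illustration, passing the status on could look roughly like this; this is
only a sketch of the suggested change, not code from the posted series (the
SMART bits are left out for brevity):

int nvme_enable_aen(struct nvme_ctrl *ctrl)
{
        u32 aen_cfg = ctrl->aen_cfg;
        u32 result;
        int status;

        status = nvme_set_features(ctrl, NVME_FEAT_ASYNC_EVENT,
                                   aen_cfg, NULL, 0, &result);
        if (status)
                dev_warn(ctrl->device,
                         "Failed to configure AEN (cfg %x, status %d)\n",
                         aen_cfg, status);

        /* negative values are driver/transport errors; let the caller abort */
        return status < 0 ? status : 0;
}

The caller in nvme_init_identify() would then do "ret = nvme_enable_aen(ctrl);
if (ret < 0) return ret;" like the other configure helpers.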

> @@ -2345,6 +2361,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>  
>  	ctrl->oacs = le16_to_cpu(id->oacs);
>  	ctrl->oncs = le16_to_cpup(&id->oncs);
> +	ctrl->aen_cfg = le32_to_cpu(id->oaes) &
> +		(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT);

Hmm.  I'd rather store the oaes value in nvme_ctrl and do this
calculation in nvme_enable_aen.
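
In code terms the suggestion amounts to something like the following sketch
(the ctrl->oaes field is an assumption here; it does not exist in the posted
patch):

        /* nvme_init_identify(): just remember the raw OAES value */
        ctrl->oaes = le32_to_cpu(id->oaes);

        /* nvme_enable_aen(): derive the AEN configuration from it */
        aen_cfg |= ctrl->oaes & (NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT);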

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 04/11] nvmet: Add AEN configuration support
  2018-05-22  9:09 ` [PATCHv2 04/11] nvmet: Add AEN configuration support Hannes Reinecke
@ 2018-05-22 10:01   ` Christoph Hellwig
  2018-05-22 10:56     ` Hannes Reinecke
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2018-05-22 10:01 UTC (permalink / raw)


> +	case NVME_FEAT_ASYNC_EVENT:
> +		val32 = le32_to_cpu(req->cmd->common.cdw10[1]);
> +		req->sq->ctrl->aen_cfg = val32 & NVME_AEN_CFG_NS_ATTR;

This needs a validity check, and it should either be under a lock or be
accessed very carefully using WRITE_ONCE/READ_ONCE.
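
A minimal sketch of what that could look like in nvmet_execute_set_features()
(NVMET_AEN_CFG_ALL is an assumed mask of the bits the target supports, not an
existing define):

        case NVME_FEAT_ASYNC_EVENT:
                val32 = le32_to_cpu(req->cmd->common.cdw10[1]);
                /* reject unsupported bits instead of silently masking them */
                if (val32 & ~NVMET_AEN_CFG_ALL) {
                        status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
                        break;
                }
                WRITE_ONCE(req->sq->ctrl->aen_cfg, val32);
                nvmet_set_result(req, val32);
                break;

with the readers then using READ_ONCE(ctrl->aen_cfg) accordingly.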

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 11/11] nvmet: ANA support
  2018-05-22  9:10 ` [PATCHv2 11/11] nvmet: ANA support Hannes Reinecke
@ 2018-05-22 10:05   ` Christoph Hellwig
  2018-05-22 11:05     ` Hannes Reinecke
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2018-05-22 10:05 UTC (permalink / raw)


On Tue, May 22, 2018@11:10:04AM +0200, Hannes Reinecke wrote:
> Add ANA support to the nvme target. The ANA configuration is optional, and
> doesn't interfere with existing configurations.
> ANA groups are created under each subsystem, the namespaces must be linked
> into this group to become part of the ANA group.
> If ANA groups are created it is required to link the ANA groups into the
> respective ports. Each ANA group can only be part of one port.
> Linking the entire subsystem will be refused if ANA groups are specified.
> Also this implementation has a limit of one single ANA state per group,
> irrespective of the path. So when different ANA states are required one needs
>  to create different ANA groups, one for each state.
> 
> Signed-off-by: Hannes Reinecke <hare at suse.com>
> ---
>  drivers/nvme/target/admin-cmd.c | 156 +++++++++++++++-
>  drivers/nvme/target/configfs.c  | 381 ++++++++++++++++++++++++++++++++++++++++
>  drivers/nvme/target/core.c      |  94 +++++++++-
>  drivers/nvme/target/discovery.c |  49 ++++--
>  drivers/nvme/target/io-cmd.c    |  36 ++++
>  drivers/nvme/target/nvmet.h     |  50 ++++++
>  6 files changed, 747 insertions(+), 19 deletions(-)

What is the advantage over my implementation, which is a lot simpler?

Especially the configuration here seems extremely complex for no real
gain.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 02/11] nvme: submit AEN event configuration on startup
  2018-05-22 10:00   ` Christoph Hellwig
@ 2018-05-22 10:55     ` Hannes Reinecke
  0 siblings, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22 10:55 UTC (permalink / raw)


On 05/22/2018 12:00 PM, Christoph Hellwig wrote:
>> +void nvme_enable_aen(struct nvme_ctrl *ctrl)
>> +{
>> +	u32 aen_cfg = NVME_SMART_CRIT_SPARE | NVME_SMART_CRIT_TEMPERATURE |
>> +		NVME_SMART_CRIT_RELIABILITY | NVME_SMART_CRIT_MEDIA |
>> +		NVME_SMART_CRIT_VOLATILE_MEMORY;
> 
> Currently the kernel doesn't really care about smart warnings at
> all.  Should we make this a sysfs to that people only enable if they
> actually care?
> 
Yeah, can do.

>> +	u32 result;
>> +	int status;
>> +
>> +	aen_cfg |= ctrl->aen_cfg;
>> +	status = nvme_set_features(ctrl, NVME_FEAT_ASYNC_EVENT,
>> +				   aen_cfg, NULL, 0, &result);
>> +	if (status)
>> +		dev_warn(ctrl->device, "Failed to configure AEN (cfg %x)\n",
>> +			 aen_cfg);
>> +}
>> +
> 
> If status is negative we should abort the reset, so we need to pass
> that on.
> 
Ah. I would have thought that, as this is technically a new feature, we might
encounter spurious errors here (which we've never noticed until now).
But if you say so ...

>> @@ -2345,6 +2361,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>>   
>>   	ctrl->oacs = le16_to_cpu(id->oacs);
>>   	ctrl->oncs = le16_to_cpup(&id->oncs);
>> +	ctrl->aen_cfg = le32_to_cpu(id->oaes) &
>> +		(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT);
> 
> Hmm.  I'd rather store the oaes value in nvme_ctrl and do this
> calculation in nvme_enable_aen.
> 
Ok. Will be doing so.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 04/11] nvmet: Add AEN configuration support
  2018-05-22 10:01   ` Christoph Hellwig
@ 2018-05-22 10:56     ` Hannes Reinecke
  0 siblings, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22 10:56 UTC (permalink / raw)


On 05/22/2018 12:01 PM, Christoph Hellwig wrote:
>> +	case NVME_FEAT_ASYNC_EVENT:
>> +		val32 = le32_to_cpu(req->cmd->common.cdw10[1]);
>> +		req->sq->ctrl->aen_cfg = val32 & NVME_AEN_CFG_NS_ATTR;
> 
> This needs a validity check, and either be under a lock or very
> carefully accessed using WRITE_ONCE/READ_ONCE.
>
Okay; a controller lock would be good here. Let's see what we have.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 11/11] nvmet: ANA support
  2018-05-22 10:05   ` Christoph Hellwig
@ 2018-05-22 11:05     ` Hannes Reinecke
  2018-05-25 13:34       ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-22 11:05 UTC (permalink / raw)


On 05/22/2018 12:05 PM, Christoph Hellwig wrote:
> On Tue, May 22, 2018@11:10:04AM +0200, Hannes Reinecke wrote:
>> Add ANA support to the nvme target. The ANA configuration is optional, and
>> doesn't interfere with existing configurations.
>> ANA groups are created under each subsystem, the namespaces must be linked
>> into this group to become part of the ANA group.
>> If ANA groups are created it is required to link the ANA groups into the
>> respective ports. Each ANA group can only be part of one port.
>> Linking the entire subsystem will be refused if ANA groups are specified.
>> Also this implementation has a limit of one single ANA state per group,
>> irrespective of the path. So when different ANA states are required one needs
>>   to create different ANA groups, one for each state.
>>
>> Signed-off-by: Hannes Reinecke <hare at suse.com>
>> ---
>>   drivers/nvme/target/admin-cmd.c | 156 +++++++++++++++-
>>   drivers/nvme/target/configfs.c  | 381 ++++++++++++++++++++++++++++++++++++++++
>>   drivers/nvme/target/core.c      |  94 +++++++++-
>>   drivers/nvme/target/discovery.c |  49 ++++--
>>   drivers/nvme/target/io-cmd.c    |  36 ++++
>>   drivers/nvme/target/nvmet.h     |  50 ++++++
>>   6 files changed, 747 insertions(+), 19 deletions(-)
> 
> What is the advantage over my implementation, which is a lot simpler?
> 
> Especially the configuration here seems extremely complex for no real
> gain.
> 
Primarily this is designed to be a testbed for the various ANA 
configurations. Plus ANA support can be disabled, too :-)
So with this we can do all sorts of compatibility testing, like running
against various revisions, and we can do scalability tests etc.

_And_ you can have a simple ANA configuration by enabling

echo 1 > allow_any_ns

then you only need to link the ANA groups into the ports.
I don't really think it's _that_ complicated.
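
For illustration, a minimal single-group setup along those lines could look
roughly like this (subsystem NQN, backing device and port number are made up;
the directory layout follows the configfs groups added in patch 11, and port
1 is assumed to be configured already):

cd /sys/kernel/config/nvmet

# subsystem with one namespace, as usual
mkdir subsystems/testnqn
mkdir subsystems/testnqn/namespaces/1
echo -n /dev/nullb0 > subsystems/testnqn/namespaces/1/device_path
echo 1 > subsystems/testnqn/namespaces/1/enable

# one ANA group covering all namespaces of the subsystem
mkdir subsystems/testnqn/ana_groups/1
echo 1 > subsystems/testnqn/ana_groups/1/allow_any_ns
echo optimized > subsystems/testnqn/ana_groups/1/state

# link the ANA group (instead of the whole subsystem) into the port
ln -s /sys/kernel/config/nvmet/subsystems/testnqn/ana_groups/1 \
      /sys/kernel/config/nvmet/ports/1/ana_groups/1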

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 08/11] nvme: add ANA support
  2018-05-22  9:10 ` [PATCHv2 08/11] nvme: add ANA support Hannes Reinecke
@ 2018-05-23 11:52   ` Popuri, Sriram
  2018-05-23 13:19     ` Hannes Reinecke
  2018-05-25 13:31     ` Christoph Hellwig
  2018-05-25 13:28   ` Christoph Hellwig
  2018-05-31 10:21   ` Sagi Grimberg
  2 siblings, 2 replies; 27+ messages in thread
From: Popuri, Sriram @ 2018-05-23 11:52 UTC (permalink / raw)


Thanks for addressing our comments. Overall it looks good; I have a few comments below:

1) I guess it should be bit 6 not bit 7.
- if (ctrl->anacap & (1 << 7))
+if (ctrl->anacap & (1 << 6))
2) If you resubmit because ana_xfer_size is restricted by mdts, you need to re-read the nvme_ana_rsp_hdr at the end and revalidate chgcnt; otherwise the log you read might be corrupt (see the sketch after this list).
3) NVME_SC_ANA_TRANSITION used instead of NVME_ANA_CHANGE in nvme_process_ana_log (two places)
4) Can you add info-level debug messages for ANA state changes, so it's helpful for debugging?
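
A rough sketch of the re-validation described in point 2, reusing the field
and label names from the posted patch (the exact placement after the chunk
loop, and resetting the transfer size on restart, are assumptions):

        struct nvme_ana_rsp_hdr hdr;
        u64 chgcnt = le64_to_cpu(ctrl->ana_log_buf->chgcnt);

        /* re-read just the header after the last chunk and compare chgcnt */
        error = nvme_get_log_ext(ctrl, NULL, NVME_LOG_ANA, 0,
                                 &hdr, sizeof(hdr), 0);
        if (error)
                return error;
        if (le64_to_cpu(hdr.chgcnt) != chgcnt) {
                /* log changed while we were reading it: start over */
                ana_log_offset = 0;
                goto resubmit;
        }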

Regards,
~Sriram

-----Original Message-----
From: Linux-nvme <linux-nvme-bounces@lists.infradead.org> On Behalf Of Hannes Reinecke
Sent: Tuesday, May 22, 2018 2:40 PM
To: Keith Busch <keith.busch at intel.com>
Cc: Hannes Reinecke <hare at suse.com>; Johannes Thumshirn <jth at kernel.org>; Sagi Grimberg <sagi at grimberg.me>; linux-nvme at lists.infradead.org; Christoph Hellwig <hch at lst.de>
Subject: [PATCHv2 08/11] nvme: add ANA support

From: Christoph Hellwig <hch@lst.de>

Adding support for Asymmetric Namespace Access (ANA)

Signed-off-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Hannes Reinecke <hare at suse.com>
---
 drivers/nvme/host/core.c      | 335 +++++++++++++++++++++++++++++++++++++++---
 drivers/nvme/host/multipath.c |  41 +++++-
 drivers/nvme/host/nvme.h      |  17 +++
 3 files changed, 366 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index b4408e1e677f..0d8167c68dff 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -68,6 +68,10 @@ static bool streams;
 module_param(streams, bool, 0644);
 MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
 
+static unsigned long ana_log_delay = 500;
+module_param(ana_log_delay, ulong, 0644);
+MODULE_PARM_DESC(ana_log_delay, "Delay in msecs before retrieving ANA log");
+
 /*
  * nvme_wq - hosts nvme related works that are not reset or delete
  * nvme_reset_wq - hosts nvme reset works @@ -1246,6 +1250,25 @@ static void nvme_put_ns_from_disk(struct nvme_ns_head *head, int idx)
 		srcu_read_unlock(&head->srcu, idx);
 }
 
+static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
+{
+	struct nvme_ns *ns, *ret = NULL;
+
+	down_read(&ctrl->namespaces_rwsem);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (ns->head->ns_id == nsid) {
+			if (!kref_get_unless_zero(&ns->kref))
+				continue;
+			ret = ns;
+			break;
+		}
+		if (ns->head->ns_id > nsid)
+			break;
+	}
+	up_read(&ctrl->namespaces_rwsem);
+	return ret;
+}
+
 static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned cmd, unsigned long arg)  {
 	switch (cmd) {
static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
+static const char *nvme_ana_state_names[] = {
+	[0]				= "invalid state",
+	[NVME_ANA_OPTIMIZED]		= "optimized",
+	[NVME_ANA_NONOPTIMIZED]		= "non-optimized",
+	[NVME_ANA_INACCESSIBLE]		= "inaccessible",
+	[NVME_ANA_PERSISTENT_LOSS]	= "persistent-loss",
+	[NVME_ANA_CHANGE]		= "change",
+};
+
+static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
+{
+	void *base = ctrl->ana_log_buf;
+	size_t offset = sizeof(struct nvme_ana_rsp_hdr);
+	int error, i, j;
+	size_t mdts = ctrl->max_hw_sectors << 9;
+	size_t ana_log_size = ctrl->ana_log_size, ana_xfer_size;
+	off_t ana_log_offset = 0;
+
+	/*
+	 * If anagrpid never changes we don't need to process the namespace
+	 * lists.
+	 */
+	if (ctrl->anacap & (1 << 7))
+		groups_only = true;
+
+	if (groups_only)
+		ana_log_size = sizeof(struct nvme_ana_rsp_hdr) +
+			ctrl->nanagrpid * sizeof(struct nvme_ana_group_desc);
+
+	ana_xfer_size = ana_log_size;
+	if (mdts && ana_log_size > mdts) {
+		dev_warn(ctrl->device,
+			 "ANA log too large (%lu, max %zu), truncating\n",
+			 ana_log_size, mdts);
+		ana_xfer_size = mdts;
+	}
+resubmit:
+	error = nvme_get_log_ext(ctrl, NULL, NVME_LOG_ANA,
+			groups_only ? NVME_ANA_LOG_RGO : 0,
+			ctrl->ana_log_buf + ana_log_offset,
+			ana_xfer_size, ana_log_offset);
+	if (error) {
+		dev_warn(ctrl->device, "Failed to get ANA log: %d\n", error);
+		return -EIO;
+	}
+	ana_log_offset += ana_xfer_size;
+	if (ana_log_size > ana_log_offset) {
+		if (ana_log_size - ana_log_offset < mdts)
+			ana_xfer_size = ana_log_size - ana_log_offset;
+		goto resubmit;
+	}
+
+	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
+		struct nvme_ana_group_desc *desc = base + offset;
+		u32 grpid;
+		u32 nr_nsids;
+		size_t nsid_buf_size;
+		struct nvme_ns *ns;
+
+		if (WARN_ON_ONCE(offset > ana_log_size - sizeof(*desc)))
+			return -EINVAL;
+		grpid = le32_to_cpu(desc->grpid);
+		if (WARN_ON_ONCE(grpid == 0))
+			return -EINVAL;
+		if (WARN_ON_ONCE(grpid > ctrl->anagrpmax))
+			return -EINVAL;
+		if (WARN_ON_ONCE(desc->state == 0))
+			return -EINVAL;
+		if (WARN_ON_ONCE(desc->state > NVME_ANA_CHANGE))
+			return -EINVAL;
+
+		offset += sizeof(*desc);
+		if (groups_only) {
+			down_write(&ctrl->namespaces_rwsem);
+			list_for_each_entry(ns, &ctrl->namespaces, list) {
+				if (ns->anagrpid != grpid)
+					continue;
+				if (desc->state == NVME_SC_ANA_TRANSITION)
+					error = -EAGAIN;
+				if (ns->ana_state != desc->state) {
+					ns->ana_state = desc->state;
+					nvme_mpath_clear_current_path(ns);
+				}
+				break;
+			}
+			up_write(&ctrl->namespaces_rwsem);
+			continue;
+		}
+		nr_nsids = le32_to_cpu(desc->nnsids);
+		if (!nr_nsids)
+			continue;
+
+		nsid_buf_size = nr_nsids * sizeof(__le32);
+		if (WARN_ON_ONCE(offset > ana_log_size - nsid_buf_size))
+			return -EINVAL;
+
+		for (j = 0; j < nr_nsids; j++) {
+			u32 nsid = le32_to_cpu(desc->nsids[j]);
+
+			ns = nvme_find_get_ns(ctrl, nsid);
+			if (!ns)
+				continue;
+
+			if (ns->anagrpid != grpid ||
+			    ns->ana_state != desc->state) {
+				ns->anagrpid = grpid;
+				ns->ana_state = desc->state;
+				nvme_mpath_clear_current_path(ns);
+				if (desc->state == NVME_SC_ANA_TRANSITION)
+					error = -EAGAIN;
+			}
+			nvme_put_ns(ns);
+		}
+
+		offset += nsid_buf_size;
+	}
+
+	return error;
+}
+
+static enum nvme_ana_state nvme_get_ana_state(struct nvme_ctrl *ctrl,
+					      unsigned int grpid,
+					      unsigned int nsid)
+{
+	void *base = ctrl->ana_log_buf;
+	size_t offset = sizeof(struct nvme_ana_rsp_hdr);
+	int i, j;
+
+	if (!ctrl->ana_log_buf)
+		return NVME_ANA_OPTIMIZED;
+
+	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
+		struct nvme_ana_group_desc *desc = base + offset;
+		u32 nr_nsids;
+		size_t nsid_buf_size;
+
+		if (offset > ctrl->ana_log_size - sizeof(*desc))
+			break;
+
+		offset += sizeof(*desc);
+		nr_nsids = le32_to_cpu(desc->nnsids);
+		if (!nr_nsids)
+			continue;
+
+		nsid_buf_size = nr_nsids * sizeof(__le32);
+		if (offset > ctrl->ana_log_size - nsid_buf_size)
+			break;
+
+		if (grpid == le32_to_cpu(desc->grpid)) {
+			for (j = 0; j < nr_nsids; j++) {
+				if (le32_to_cpu(desc->nsids[j]) != nsid)
+					continue;
+
+				dev_info(ctrl->device,
+					 "nsid %d: ANA group %d, %s.\n",
+					 nsid, grpid,
+					 nvme_ana_state_names[desc->state]);
+				return desc->state;
+			}
+		}
+		offset += nsid_buf_size;
+	}
+
+	return NVME_ANA_OPTIMIZED;
+}
+
+static void nvme_ana_work(struct work_struct *work)
+{
+	struct nvme_ctrl *ctrl = container_of(work, struct nvme_ctrl,
+					      ana_work.work);
+	int ret;
+
+	if (!ctrl->ana_log_buf)
+		return;
+	if (ctrl->state != NVME_CTRL_LIVE)
+		return;
+	ret = nvme_process_ana_log(ctrl, false);
+	if (ret == -EAGAIN || ret == -EIO) {
+		unsigned long log_delay = ctrl->anatt * HZ;
+
+		if (ret == -EIO)
+			log_delay = msecs_to_jiffies(ana_log_delay);
+		queue_delayed_work(nvme_wq, &ctrl->ana_work, log_delay);
+	}
+	nvme_kick_requeue_lists(ctrl);
+}
+
+int nvme_configure_ana(struct nvme_ctrl *ctrl)
+{
+	int error;
+
+	if (!nvme_ctrl_has_ana(ctrl))
+		return 0;
+
+	INIT_DELAYED_WORK(&ctrl->ana_work, nvme_ana_work);
+	ctrl->ana_log_size = sizeof(struct nvme_ana_rsp_hdr) +
+		ctrl->nanagrpid * sizeof(struct nvme_ana_group_desc) +
+		ctrl->max_namespaces * sizeof(__le32);
+	ctrl->ana_log_buf = kzalloc(ctrl->ana_log_size, GFP_KERNEL);
+	if (!ctrl->ana_log_buf)
+		return -ENOMEM;
+
+	error = nvme_process_ana_log(ctrl, false);
+	if (error) {
+		if (error != -EAGAIN && error != -EIO) {
+			kfree(ctrl->ana_log_buf);
+			ctrl->ana_log_buf = NULL;
+		} else
+			error = 0;
+	}
+	return error;
+}
+
+void nvme_deconfigure_ana(struct nvme_ctrl *ctrl)
+{
+	if (ctrl->ana_log_buf) {
+		cancel_delayed_work_sync(&ctrl->ana_work);
+		kfree(ctrl->ana_log_buf);
+		ctrl->ana_log_buf = NULL;
+	}
+}
+
 static void nvme_set_chunk_size(struct nvme_ns *ns)  {
 	u32 chunk_size = (((u32)ns->noiob) << (ns->lba_shift - 9)); @@ -1446,7 +1691,9 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
 		capacity = 0;
 
-	set_capacity(disk, capacity);
+	if (ns->ana_state == NVME_ANA_OPTIMIZED ||
+	    ns->ana_state == NVME_ANA_NONOPTIMIZED)
+		set_capacity(disk, capacity);
 	nvme_config_discard(ns);
 	blk_mq_unfreeze_queue(disk->queue);
 }
@@ -1462,6 +1709,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	ns->lba_shift = id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ds;
 	if (ns->lba_shift == 0)
 		ns->lba_shift = 9;
+	ns->anagrpid = le32_to_cpu(id->anagrpid);
 	ns->noiob = le16_to_cpu(id->noiob);
 	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms); @@ -1488,6 +1736,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	struct nvme_id_ns *id;
 	struct nvme_ns_ids ids;
+	unsigned int grpid;
 	int ret = 0;
 
 	if (test_bit(NVME_NS_DEAD, &ns->flags)) { @@ -1504,6 +1753,19 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 		goto out;
 	}
 
+	grpid = le32_to_cpu(id->anagrpid);
+	if (grpid) {
+		if (ns->anagrpid != grpid) {
+			dev_warn(ctrl->device, "ns %x ANA group ID changed\n",
+				 ns->head->ns_id);
+			ns->anagrpid = grpid;
+			queue_delayed_work(nvme_wq, &ctrl->ana_work,
+					   msecs_to_jiffies(ana_log_delay));
+		}
+		ns->ana_state = nvme_get_ana_state(ctrl, grpid,
+						   ns->head->ns_id);
+	}
+
 	__nvme_revalidate_disk(disk, id);
 	nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
 	if (!nvme_ns_ids_equal(&ns->head->ids, &ids)) { @@ -2363,6 +2625,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	ctrl->oncs = le16_to_cpup(&id->oncs);
 	ctrl->aen_cfg = le32_to_cpu(id->oaes) &
 		(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT);
+	if (nvme_ctrl_has_ana(ctrl))
+		ctrl->aen_cfg |= NVME_AEN_CFG_ANA_CHANGE;
 	atomic_set(&ctrl->abort_limit, id->acl + 1);
 	ctrl->vwc = id->vwc;
 	ctrl->cntlid = le16_to_cpup(&id->cntlid); @@ -2376,6 +2640,11 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	nvme_set_queue_limits(ctrl, ctrl->admin_q);
 	ctrl->sgls = le32_to_cpu(id->sgls);
 	ctrl->kas = le16_to_cpu(id->kas);
+	ctrl->max_namespaces = le32_to_cpu(id->mnan);
+	ctrl->anacap = id->anacap;
+	ctrl->anatt = id->anatt;
+	ctrl->nanagrpid = le32_to_cpu(id->nanagrpid);
+	ctrl->anagrpmax = le32_to_cpu(id->anagrpmax);
 
 	if (id->rtd3e) {
 		/* us -> s */
@@ -2445,7 +2714,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	ret = nvme_configure_apst(ctrl);
 	if (ret < 0)
 		return ret;
-	
+
 	ret = nvme_configure_timestamp(ctrl);
 	if (ret < 0)
 		return ret;
@@ -2454,6 +2723,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	if (ret < 0)
 		return ret;
 
+	ret = nvme_configure_ana(ctrl);
+	if (ret < 0)
+		return ret;
+
 	ctrl->identified = true;
 
 	return 0;
@@ -2649,12 +2922,30 @@ static ssize_t nsid_show(struct device *dev, struct device_attribute *attr,  }  static DEVICE_ATTR_RO(nsid);
 
+static ssize_t ana_grpid_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	return sprintf(buf, "%d\n", nvme_get_ns_from_dev(dev)->anagrpid);
+}
+DEVICE_ATTR_RO(ana_grpid);
+
+static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr,
+		char *buf)
+{
+	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+
+	return sprintf(buf, "%s\n", nvme_ana_state_names[ns->ana_state]);
+}
+DEVICE_ATTR_RO(ana_state);
+
 static struct attribute *nvme_ns_id_attrs[] = {
 	&dev_attr_wwid.attr,
 	&dev_attr_uuid.attr,
 	&dev_attr_nguid.attr,
 	&dev_attr_eui.attr,
 	&dev_attr_nsid.attr,
+	&dev_attr_ana_grpid.attr,
+	&dev_attr_ana_state.attr,
 	NULL,
 };
 
@@ -2663,6 +2954,7 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,  {
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct nvme_ns_ids *ids = &dev_to_ns_head(dev)->ids;
+	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
 
 	if (a == &dev_attr_uuid.attr) {
 		if (uuid_is_null(&ids->uuid) &&
@@ -2677,6 +2969,10 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
 		if (!memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
 			return 0;
 	}
+	if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
+		if (!ns->anagrpid)
+			return 0;
+	}
 	return a->mode;
 }
 
@@ -2939,25 +3235,6 @@ static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
 	return nsa->head->ns_id - nsb->head->ns_id;  }
 
-static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid) -{
-	struct nvme_ns *ns, *ret = NULL;
-
-	down_read(&ctrl->namespaces_rwsem);
-	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		if (ns->head->ns_id == nsid) {
-			if (!kref_get_unless_zero(&ns->kref))
-				continue;
-			ret = ns;
-			break;
-		}
-		if (ns->head->ns_id > nsid)
-			break;
-	}
-	up_read(&ctrl->namespaces_rwsem);
-	return ret;
-}
-
 static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns)  {
 	struct streams_directive_params s;
@@ -2991,6 +3268,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	struct nvme_id_ns *id;
 	char disk_name[DISK_NAME_LEN];
 	int node = dev_to_node(ctrl->dev), flags = GENHD_FL_EXT_DEVT;
+	unsigned int grpid;
 
 	ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
 	if (!ns)
@@ -3002,6 +3280,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
 	ns->queue->queuedata = ns;
 	ns->ctrl = ctrl;
+	ns->ana_state = NVME_ANA_OPTIMIZED;
 
 	kref_init(&ns->kref);
 	ns->lba_shift = 9; /* set to a default value for 512 until disk is validated */ @@ -3019,7 +3298,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (nvme_init_ns_head(ns, nsid, id))
 		goto out_free_id;
 	nvme_setup_streams_ns(ctrl, ns);
-	
+
 #ifdef CONFIG_NVME_MULTIPATH
 	/*
 	 * If multipathing is enabled we need to always use the subsystem @@ -3063,6 +3342,10 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	memcpy(disk->disk_name, disk_name, DISK_NAME_LEN);
 	ns->disk = disk;
 
+	grpid = le32_to_cpu(id->anagrpid);
+	if (grpid)
+		ns->ana_state = nvme_get_ana_state(ctrl, grpid, nsid);
+
 	__nvme_revalidate_disk(disk, id);
 
 	down_write(&ctrl->namespaces_rwsem);
@@ -3371,6 +3654,12 @@ static void nvme_handle_aen_notice(struct nvme_ctrl *ctrl, u32 result)
 	case NVME_AER_NOTICE_FW_ACT_STARTING:
 		queue_work(nvme_wq, &ctrl->fw_act_work);
 		break;
+	case NVME_AER_NOTICE_ANA:
+		if (WARN_ON_ONCE(!ctrl->ana_log_buf))
+			break;
+		queue_delayed_work(nvme_wq, &ctrl->ana_work,
+				   msecs_to_jiffies(ana_log_delay));
+		break;
 	default:
 		dev_warn(ctrl->device, "async event result %08x\n", result);
 	}
@@ -3406,6 +3695,7 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl)
 	nvme_stop_keep_alive(ctrl);
 	flush_work(&ctrl->async_event_work);
 	flush_work(&ctrl->scan_work);
+	cancel_delayed_work_sync(&ctrl->ana_work);
 	cancel_work_sync(&ctrl->fw_act_work);
 	if (ctrl->ops->stop_ctrl)
 		ctrl->ops->stop_ctrl(ctrl);
@@ -3440,6 +3730,7 @@ static void nvme_free_ctrl(struct device *dev)
 
 	ida_simple_remove(&nvme_instance_ida, ctrl->instance);
 	kfree(ctrl->effects);
+	nvme_deconfigure_ana(ctrl);
 
 	if (subsys) {
 		mutex_lock(&subsys->lock);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 2802cbc7b7d0..e10bfa5e4aa5 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -29,7 +29,30 @@ void nvme_failover_req(struct request *req)
 	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
 	blk_mq_end_request(req, 0);
 
-	nvme_reset_ctrl(ns->ctrl);
+	/*
+	 * Reset the controller for any non-ANA error as we don't know what
+	 * caused the error:
+	 */
+	switch (nvme_req(req)->status & 0x7ff) {
+	case NVME_SC_ANA_TRANSITION:
+		queue_delayed_work(nvme_wq, &ns->ctrl->ana_work,
+				   ns->ctrl->anatt * HZ);
+	case NVME_SC_ANA_PERSISTENT_LOSS:
+	case NVME_SC_ANA_INACCESSIBLE:
+		/*
+		 * We could try to update the ANA group state here instead of
+		 * waiting for the AER and log page read.  But concurrency would
+		 * be nasty.
+		 */
+		nvme_mpath_clear_current_path(ns);
+		if (ns->head->disk)
+			kblockd_schedule_work(&ns->head->requeue_work);
+		break;
+	default:
+		nvme_reset_ctrl(ns->ctrl);
+		break;
+	}
+
 	kblockd_schedule_work(&ns->head->requeue_work);
 }
 
@@ -54,12 +77,14 @@ void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
 	up_read(&ctrl->namespaces_rwsem);
 }
 
-static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head)
+static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head,
+		u8 ana_state)
 {
 	struct nvme_ns *ns;
 
 	list_for_each_entry_rcu(ns, &head->list, siblings) {
-		if (ns->ctrl->state == NVME_CTRL_LIVE) {
+		if (ns->ctrl->state == NVME_CTRL_LIVE &&
+		    ns->ana_state == ana_state) {
 			rcu_assign_pointer(head->current_path, ns);
 			return ns;
 		}
@@ -72,8 +97,14 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)  {
 	struct nvme_ns *ns = srcu_dereference(head->current_path, &head->srcu);
 
-	if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
-		ns = __nvme_find_path(head);
+	if (likely(ns && ns->ctrl->state == NVME_CTRL_LIVE &&
+			ns->ana_state == NVME_ANA_OPTIMIZED))
+		return ns;
+
+	ns = __nvme_find_path(head, NVME_ANA_OPTIMIZED);
+	if (!ns)
+		ns = __nvme_find_path(head, NVME_ANA_NONOPTIMIZED);
+	/* XXX: try an inaccessible path as last resort per 8.18.3.3 */
 	return ns;
 }
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 3f9a4f8aa310..6ea3584fa62f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -169,6 +169,7 @@ struct nvme_ctrl {
 	u16 oacs;
 	u16 nssa;
 	u16 nr_streams;
+	u32 max_namespaces;
 	atomic_t abort_limit;
 	u8 vwc;
 	u32 vs;
@@ -190,6 +191,15 @@ struct nvme_ctrl {
 	struct nvme_command ka_cmd;
 	struct work_struct fw_act_work;
 
+	/* asymmetric namespace access: */
+	u8 anacap;
+	u8 anatt;
+	u32 anagrpmax;
+	u32 nanagrpid;
+	size_t ana_log_size;
+	struct nvme_ana_rsp_hdr *ana_log_buf;
+	struct delayed_work ana_work;
+
 	/* Power saving configuration */
 	u64 ps_max_latency_us;
 	bool apst_enabled;
@@ -295,6 +305,8 @@ struct nvme_ns {
 #define NVME_NS_REMOVING 0
 #define NVME_NS_DEAD     1
 	u16 noiob;
+	u32 anagrpid;
+	enum nvme_ana_state ana_state;
 
 #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
 	struct nvme_fault_inject fault_inject; @@ -437,6 +449,11 @@ int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,  extern const struct attribute_group nvme_ns_id_attr_group;  extern const struct block_device_operations nvme_ns_head_ops;
 
+static inline bool nvme_ctrl_has_ana(struct nvme_ctrl *ctrl)
+{
+	return ctrl->subsys->cmic & (1 << 3);
+}
+
 #ifdef CONFIG_NVME_MULTIPATH
 void nvme_failover_req(struct request *req);  bool nvme_req_needs_failover(struct request *req, blk_status_t error);
--
2.12.3


_______________________________________________
Linux-nvme mailing list
Linux-nvme at lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 08/11] nvme: add ANA support
  2018-05-23 11:52   ` Popuri, Sriram
@ 2018-05-23 13:19     ` Hannes Reinecke
  2018-05-25 13:31     ` Christoph Hellwig
  1 sibling, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-23 13:19 UTC (permalink / raw)


On 05/23/2018 01:52 PM, Popuri, Sriram wrote:
> Thanks for addressing our comments. Overall looks good and have few comments as below:
> 
> 1) I guess it should be bit 6 not bit 7.
> - if (ctrl->anacap & (1 << 7))
> +if (ctrl->anacap & (1 << 6))
Ah. Correct.
Will be fixing it up.

> 2) If you resubmit because ana_xfer_size is restricted by mdts, you need to re-read the nvme_ana_rsp_hdr
> at the end and revalidate chgcnt; otherwise the log you read might be corrupt.
Indeed. This whole mdts stuff is pretty raw, so I'm not surprised that 
we're getting errors here.
Maybe it's worthwhile moving it into the generic function; let's see.

> 3) NVME_SC_ANA_TRANSITION used instead of NVME_ANA_CHANGE in nvme_process_ana_log (two places)

Did I? That would've been wrong indeed. Will be fixing it up.

> 4) Can you add info level debug for ANA state changes so its helpful for debugging.
> 
Thought about it myself (and actually did so during development), but 
then decided against it as the entire nvme stack is pretty silent.
Probably I'll make it a conditional logging entry here.

Thanks for the review.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 07/11] nvme: always failover on path or transport errors
  2018-05-22  9:10 ` [PATCHv2 07/11] nvme: always failover on path or transport errors Hannes Reinecke
@ 2018-05-25 13:24   ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2018-05-25 13:24 UTC (permalink / raw)


On Tue, May 22, 2018@11:10:00AM +0200, Hannes Reinecke wrote:
>  	if (!(req->cmd_flags & REQ_NVME_MPATH))
>  		return false;
> +	if (nvme_req(req)->status & 0x300)
> +		return true;

As I love to correct everyone including myself:
This actually needs to be

	if ((nvme_req(req)->status & 0x700) == 0x300)

to accomodate for future new SCT values.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 08/11] nvme: add ANA support
  2018-05-22  9:10 ` [PATCHv2 08/11] nvme: add ANA support Hannes Reinecke
  2018-05-23 11:52   ` Popuri, Sriram
@ 2018-05-25 13:28   ` Christoph Hellwig
  2018-05-25 14:32     ` Hannes Reinecke
  2018-05-31 10:21   ` Sagi Grimberg
  2 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2018-05-25 13:28 UTC (permalink / raw)


On Tue, May 22, 2018@11:10:01AM +0200, Hannes Reinecke wrote:
> From: Christoph Hellwig <hch at lst.de>
> 
> Adding support for Asymmetric Namespace Access (ANA)
> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> Signed-off-by: Hannes Reinecke <hare at suse.com>

I just did a quick diff, and this really doesn't look like my
old code.  Quick differences I spotted:

 - lots of code in core.c instead of multipath.c, where it should
   be
 - adds a per-ns ana_state value, including all the crap to maintain
   it instead of the simple per-controller state table
 - hooks up the sysfs attributes to also show on the multipath
   node, where they can't work, and probably even corrupt
   memory
 - adds a get_log_page call per namespace, which is completely
   counter to the ANA design to not have craptons of roundtrips

And probably a few more things I didn't notice due to all the code
moves.

> ---
>  drivers/nvme/host/core.c      | 335 +++++++++++++++++++++++++++++++++++++++---
>  drivers/nvme/host/multipath.c |  41 +++++-
>  drivers/nvme/host/nvme.h      |  17 +++
>  3 files changed, 366 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index b4408e1e677f..0d8167c68dff 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -68,6 +68,10 @@ static bool streams;
>  module_param(streams, bool, 0644);
>  MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
>  
> +static unsigned long ana_log_delay = 500;
> +module_param(ana_log_delay, ulong, 0644);
> +MODULE_PARM_DESC(ana_log_delay, "Delay in msecs before retrieving ANA log");

Why?  There really is no point in delaying reading the state information
and just messing things up.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 08/11] nvme: add ANA support
  2018-05-23 11:52   ` Popuri, Sriram
  2018-05-23 13:19     ` Hannes Reinecke
@ 2018-05-25 13:31     ` Christoph Hellwig
  1 sibling, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2018-05-25 13:31 UTC (permalink / raw)


On Wed, May 23, 2018@11:52:05AM +0000, Popuri, Sriram wrote:
> Thanks for addressing our comments. Overall it looks good and I have a few comments below:
> 
> 1) I guess it should be bit 6, not bit 7.
> - if (ctrl->anacap & (1 << 7))
> + if (ctrl->anacap & (1 << 6))
> 2) If you resubmit because ana_xfer_size is restricted by mdts, you need to fetch the nvme_ana_rsp_hdr again at the end and revalidate chgcnt. Otherwise the log you read might be corrupt.

We can't sensibly chunk the ANA log page anyway.  So if a controller
has an NANAGRPID/MNAN config where we can't sensibly read it in a single
operation, we'd better reject the ANA support as buggy early on.
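
I.e. something like this at setup time; a sketch using the fields already
introduced in this patch:

	/* reject ANA if the worst-case log cannot be read in one transfer */
	if (ctrl->max_hw_sectors &&
	    ctrl->ana_log_size > (size_t)ctrl->max_hw_sectors << 9) {
		dev_err(ctrl->device,
			"ANA log too large for a single transfer, not using ANA\n");
		return -EINVAL;
	}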

> 3) NVME_SC_ANA_TRANSITION is used instead of NVME_ANA_CHANGE in nvme_process_ana_log (two places).

> 4) Can you add info-level debug for ANA state changes so it's helpful for debugging?

My original code had it, and Hannes' code has it at least in some places.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 11/11] nvmet: ANA support
  2018-05-22 11:05     ` Hannes Reinecke
@ 2018-05-25 13:34       ` Christoph Hellwig
  2018-05-31 10:27         ` Sagi Grimberg
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2018-05-25 13:34 UTC (permalink / raw)


On Tue, May 22, 2018@01:05:39PM +0200, Hannes Reinecke wrote:
> Primarily this is designed to be a testbed for the various ANA 
> configurations.

Well, that isn't really what we need in mainline.

> Plus ANA support can be disabled, too :-)

Great.  That should be worth about 20 lines if we really need it,
not 370 lines of additional code over the initial 315.  And not
require iterating the list of all ANA groups in the ANA fast path.
And so on.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 08/11] nvme: add ANA support
  2018-05-25 13:28   ` Christoph Hellwig
@ 2018-05-25 14:32     ` Hannes Reinecke
  0 siblings, 0 replies; 27+ messages in thread
From: Hannes Reinecke @ 2018-05-25 14:32 UTC (permalink / raw)


On Fri, 25 May 2018 15:28:40 +0200
Christoph Hellwig <hch@lst.de> wrote:

> On Tue, May 22, 2018@11:10:01AM +0200, Hannes Reinecke wrote:
> > From: Christoph Hellwig <hch at lst.de>
> > 
> > Adding support for Asymmetric Namespace Access (ANA)
> > 
> > Signed-off-by: Christoph Hellwig <hch at lst.de>
> > Signed-off-by: Hannes Reinecke <hare at suse.com>  
> 
> I just did a quick diff, and this really doesn't look like my
> old code.  Quick differences I spotted:
> 
>  - lots of code in core.c instead of multipath.c, where it should
>    be
>  - adds a per-ns ana_state value, including all the crap to maintain
>    it instead of the simple per-controller state table
>  - fucks up the sysfs attributes to also show on the multipath
>    node, where they can't work, and probably even corrupt
>    memory
>  - adds a get_log_page call per namespace, which is completely
>    counter to the ANA design to not have craptons of roundtrips
> 
> And probably a few more things I didn't notice due to all the code
> moves.
> 
Okay, will be moving back to storing the ana state in the
per-controller buffer.

> > ---
> >  drivers/nvme/host/core.c      | 335 +++++++++++++++++++++++++++++++++++++++---
> >  drivers/nvme/host/multipath.c |  41 +++++-
> >  drivers/nvme/host/nvme.h      |  17 +++
> >  3 files changed, 366 insertions(+), 27 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index b4408e1e677f..0d8167c68dff 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -68,6 +68,10 @@ static bool streams;
> >  module_param(streams, bool, 0644);
> >  MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
> > 
> > +static unsigned long ana_log_delay = 500;
> > +module_param(ana_log_delay, ulong, 0644);
> > +MODULE_PARM_DESC(ana_log_delay, "Delay in msecs before retrieving ANA log");
> 
> Why?  There really is no point in delaying reading the state
> information and just messing things up.

This delay will be used if we fail to get the ANA log page; in that
case we _should_ be retrying it.
The number of retries needs to be capped (hence ana_log_retries), and we
shouldn't retry immediately, to give the target (and possibly lower
layers) a chance to recover (that's what ana_log_delay is for).

If you don't like having them as module parameters, fine, I can make
them #defines or something.
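
E.g. something along these lines; a sketch only, the retry counter field
and the names are made up:

	#define NVME_ANA_LOG_RETRIES	10
	#define NVME_ANA_LOG_DELAY	msecs_to_jiffies(500)

	ret = nvme_process_ana_log(ctrl, false);
	if ((ret == -EAGAIN || ret == -EIO) &&
	    ctrl->ana_log_retries++ < NVME_ANA_LOG_RETRIES)
		queue_delayed_work(nvme_wq, &ctrl->ana_work,
				   ret == -EAGAIN ? ctrl->anatt * HZ
						  : NVME_ANA_LOG_DELAY);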

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 08/11] nvme: add ANA support
  2018-05-22  9:10 ` [PATCHv2 08/11] nvme: add ANA support Hannes Reinecke
  2018-05-23 11:52   ` Popuri, Sriram
  2018-05-25 13:28   ` Christoph Hellwig
@ 2018-05-31 10:21   ` Sagi Grimberg
  2 siblings, 0 replies; 27+ messages in thread
From: Sagi Grimberg @ 2018-05-31 10:21 UTC (permalink / raw)


> From: Christoph Hellwig <hch at lst.de>
> 
> Adding support for Asymmetric Namespace Access (ANA)

Would be nice to have a slightly more detailed change log
for a substantial addition.

> 
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> Signed-off-by: Hannes Reinecke <hare at suse.com>
> ---
>   drivers/nvme/host/core.c      | 335 +++++++++++++++++++++++++++++++++++++++---
>   drivers/nvme/host/multipath.c |  41 +++++-
>   drivers/nvme/host/nvme.h      |  17 +++
>   3 files changed, 366 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index b4408e1e677f..0d8167c68dff 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -68,6 +68,10 @@ static bool streams;
>   module_param(streams, bool, 0644);
>   MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
>   
> +static unsigned long ana_log_delay = 500;
> +module_param(ana_log_delay, ulong, 0644);
> +MODULE_PARM_DESC(ana_log_delay, "Delay in msecs before retrieving ANA log");
> +
>   /*
>    * nvme_wq - hosts nvme related works that are not reset or delete
>    * nvme_reset_wq - hosts nvme reset works
> @@ -1246,6 +1250,25 @@ static void nvme_put_ns_from_disk(struct nvme_ns_head *head, int idx)
>   		srcu_read_unlock(&head->srcu, idx);
>   }
>   
> +static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
> +{
> +	struct nvme_ns *ns, *ret = NULL;
> +
> +	down_read(&ctrl->namespaces_rwsem);
> +	list_for_each_entry(ns, &ctrl->namespaces, list) {
> +		if (ns->head->ns_id == nsid) {
> +			if (!kref_get_unless_zero(&ns->kref))
> +				continue;

Why continue?

> +			ret = ns;
> +			break;
> +		}
> +		if (ns->head->ns_id > nsid)
> +			break;
> +	}
> +	up_read(&ctrl->namespaces_rwsem);
> +	return ret;
> +}
> +
>   static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned cmd, unsigned long arg)
>   {
>   	switch (cmd) {
> @@ -1357,6 +1380,228 @@ static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
>   }
>   #endif /* CONFIG_BLK_DEV_INTEGRITY */
>   
> +static const char *nvme_ana_state_names[] = {
> +	[0]				= "invalid state",
> +	[NVME_ANA_OPTIMIZED]		= "optimized",
> +	[NVME_ANA_NONOPTIMIZED]		= "non-optimized",
> +	[NVME_ANA_INACCESSIBLE]		= "inaccessible",
> +	[NVME_ANA_PERSISTENT_LOSS]	= "persistent-loss",
> +	[NVME_ANA_CHANGE]		= "change",
> +};
> +
> +static int nvme_process_ana_log(struct nvme_ctrl *ctrl, bool groups_only)
> +{

This is always called with groups_only=false - is it really needed?

> +	void *base = ctrl->ana_log_buf;
> +	size_t offset = sizeof(struct nvme_ana_rsp_hdr);
> +	int error, i, j;
> +	size_t mdts = ctrl->max_hw_sectors << 9;
> +	size_t ana_log_size = ctrl->ana_log_size, ana_xfer_size;
> +	off_t ana_log_offset = 0;
> +
> +	/*
> +	 * If anagrpid never changes we don't need to process the namespace
> +	 * lists.
> +	 */
> +	if (ctrl->anacap & (1 << 7))

Shouldn't this be bit 6?

Bit 6 if set to '1' then the ANAGRPID field in the Identify Namespace
data structure (refer to Figure 114) does not change while the namespace
is attached to any controller. If cleared to '0', then the ANAGRPID
field may change while the namespace is attached to any controller.
Refer to section 8.18.2.

And can we give this a meaningful name?

> +		groups_only = true;
> +
> +	if (groups_only)
> +		ana_log_size = sizeof(struct nvme_ana_rsp_hdr) +
> +			ctrl->nanagrpid * sizeof(struct nvme_ana_group_desc);
> +
> +	ana_xfer_size = ana_log_size;
> +	if (mdts && ana_log_size > mdts) {
> +		dev_warn(ctrl->device,
> +			 "ANA log too large (%lu, max %zu), truncating\n",
> +			 ana_log_size, mdts);
> +		ana_xfer_size = mdts;
> +	}
> +resubmit:
> +	error = nvme_get_log_ext(ctrl, NULL, NVME_LOG_ANA,
> +			groups_only ? NVME_ANA_LOG_RGO : 0,
> +			ctrl->ana_log_buf + ana_log_offset,
> +			ana_xfer_size, ana_log_offset);
> +	if (error) {
> +		dev_warn(ctrl->device, "Failed to get ANA log: %d\n", error);
> +		return -EIO;
> +	}
> +	ana_log_offset += ana_xfer_size;
> +	if (ana_log_size > ana_log_offset) {
> +		if (ana_log_size - ana_log_offset < mdts)
> +			ana_xfer_size = ana_log_size - ana_log_offset;
> +		goto resubmit;
> +	}
> +
> +	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
> +		struct nvme_ana_group_desc *desc = base + offset;
> +		u32 grpid;
> +		u32 nr_nsids;
> +		size_t nsid_buf_size;
> +		struct nvme_ns *ns;
> +
> +		if (WARN_ON_ONCE(offset > ana_log_size - sizeof(*desc)))
> +			return -EINVAL;
> +		grpid = le32_to_cpu(desc->grpid);
> +		if (WARN_ON_ONCE(grpid == 0))
> +			return -EINVAL;
> +		if (WARN_ON_ONCE(grpid > ctrl->anagrpmax))
> +			return -EINVAL;
> +		if (WARN_ON_ONCE(desc->state == 0))
> +			return -EINVAL;
> +		if (WARN_ON_ONCE(desc->state > NVME_ANA_CHANGE))
> +			return -EINVAL;
> +
> +		offset += sizeof(*desc);
> +		if (groups_only) {
> +			down_write(&ctrl->namespaces_rwsem);
> +			list_for_each_entry(ns, &ctrl->namespaces, list) {
> +				if (ns->anagrpid != grpid)
> +					continue;
> +				if (desc->state == NVME_SC_ANA_TRANSITION)
> +					error = -EAGAIN;
> +				if (ns->ana_state != desc->state) {
> +					ns->ana_state = desc->state;
> +					nvme_mpath_clear_current_path(ns);

Should the path be cleared unconditionally, even if it's not the current path?
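
What I'd expect is that the helper only drops the cached path when this
ns actually is the current one, roughly like this (sketch, not necessarily
what multipath.c does in this series):

	static inline void nvme_mpath_clear_current_path(struct nvme_ns *ns)
	{
		struct nvme_ns_head *head = ns->head;

		if (head && ns == rcu_access_pointer(head->current_path))
			rcu_assign_pointer(head->current_path, NULL);
	}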

> +				}
> +				break;
> +			}
> +			up_write(&ctrl->namespaces_rwsem);
> +			continue;
> +		}

This would be easier to follow if it were split into grpid handling
and nsid handling that both receive the desc.

> +		nr_nsids = le32_to_cpu(desc->nnsids);
> +		if (!nr_nsids)
> +			continue;
> +
> +		nsid_buf_size = nr_nsids * sizeof(__le32);
> +		if (WARN_ON_ONCE(offset > ana_log_size - nsid_buf_size))
> +			return -EINVAL;
> +
> +		for (j = 0; j < nr_nsids; j++) {
> +			u32 nsid = le32_to_cpu(desc->nsids[j]);
> +
> +			ns = nvme_find_get_ns(ctrl, nsid);
> +			if (!ns)
> +				continue;
> +
> +			if (ns->anagrpid != grpid ||
> +			    ns->ana_state != desc->state) {
> +				ns->anagrpid = grpid;
> +				ns->ana_state = desc->state;
> +				nvme_mpath_clear_current_path(ns);
> +				if (desc->state == NVME_SC_ANA_TRANSITION)
> +					error = -EAGAIN;
> +			}
> +			nvme_put_ns(ns);
> +		}
> +
> +		offset += nsid_buf_size;
> +	}
> +
> +	return error;
> +}
> +
> +static enum nvme_ana_state nvme_get_ana_state(struct nvme_ctrl *ctrl,
> +					      unsigned int grpid,
> +					      unsigned int nsid)

Any reason not to make grpid and nsid u32?

> +{
> +	void *base = ctrl->ana_log_buf;
> +	size_t offset = sizeof(struct nvme_ana_rsp_hdr);
> +	int i, j;
> +
> +	if (!ctrl->ana_log_buf)
> +		return NVME_ANA_OPTIMIZED;

This can be a goto out (before the last return statement). Also, is this
a case where this triggers before we allocated the ana_log_buf? Because
if not, it's probably better to check:
	nvme_ctrl_has_ana(ctrl)

> +
> +	for (i = 0; i < le16_to_cpu(ctrl->ana_log_buf->ngrps); i++) {
> +		struct nvme_ana_group_desc *desc = base + offset;
> +		u32 nr_nsids;
> +		size_t nsid_buf_size;
> +
> +		if (offset > ctrl->ana_log_size - sizeof(*desc))
> +			break;
> +
> +		offset += sizeof(*desc);
> +		nr_nsids = le32_to_cpu(desc->nnsids);
> +		if (!nr_nsids)
> +			continue;
> +
> +		nsid_buf_size = nr_nsids * sizeof(__le32);
> +		if (offset > ctrl->ana_log_size - nsid_buf_size)
> +			break;
> +
> +		if (grpid == le32_to_cpu(desc->grpid)) {
> +			for (j = 0; j < nr_nsids; j++) {
> +				if (le32_to_cpu(desc->nsids[j]) != nsid)
> +					continue;
> +
> +				dev_info(ctrl->device,
> +					 "nsid %d: ANA group %d, %s.\n",
> +					 nsid, grpid,
> +					 nvme_ana_state_names[desc->state]);
> +				return desc->state;
> +			}
> +		}
> +		offset += nsid_buf_size;
> +	}
> +
> +	return NVME_ANA_OPTIMIZED;
> +}
> +
> +static void nvme_ana_work(struct work_struct *work)
> +{
> +	struct nvme_ctrl *ctrl = container_of(work, struct nvme_ctrl,
> +					      ana_work.work);
> +	int ret;
> +
> +	if (!ctrl->ana_log_buf)
> +		return;

I think it will be clearer to check:
	nvme_ctrl_has_ana(ctrl)

> +	if (ctrl->state != NVME_CTRL_LIVE)
> +		return;
> +	ret = nvme_process_ana_log(ctrl, false);
> +	if (ret == -EAGAIN || ret == -EIO) {
> +		unsigned long log_delay = ctrl->anatt * HZ;
> +
> +		if (ret == -EIO)
> +			log_delay = msecs_to_jiffies(ana_log_delay);
> +		queue_delayed_work(nvme_wq, &ctrl->ana_work, log_delay);
> +	}
> +	nvme_kick_requeue_lists(ctrl);
> +}
> +
> +int nvme_configure_ana(struct nvme_ctrl *ctrl)
> +{
> +	int error;
> +
> +	if (!nvme_ctrl_has_ana(ctrl))
> +		return 0;
> +
> +	INIT_DELAYED_WORK(&ctrl->ana_work, nvme_ana_work);
> +	ctrl->ana_log_size = sizeof(struct nvme_ana_rsp_hdr) +
> +		ctrl->nanagrpid * sizeof(struct nvme_ana_group_desc) +
> +		ctrl->max_namespaces * sizeof(__le32);
> +	ctrl->ana_log_buf = kzalloc(ctrl->ana_log_size, GFP_KERNEL);
> +	if (!ctrl->ana_log_buf)
> +		return -ENOMEM;
> +
> +	error = nvme_process_ana_log(ctrl, false);
> +	if (error) {
> +		if (error != -EAGAIN && error != -EIO) {
> +			kfree(ctrl->ana_log_buf);
> +			ctrl->ana_log_buf = NULL;
> +		} else
> +			error = 0;
> +	}
> +	return error;
> +}
> +
> +void nvme_deconfigure_ana(struct nvme_ctrl *ctrl)
> +{
> +	if (ctrl->ana_log_buf) {

	if (nvme_ctrl_has_ana(ctrl)) {

> +		cancel_delayed_work_sync(&ctrl->ana_work);
> +		kfree(ctrl->ana_log_buf);
> +		ctrl->ana_log_buf = NULL;
> +	}
> +}
> +
>   static void nvme_set_chunk_size(struct nvme_ns *ns)
>   {
>   	u32 chunk_size = (((u32)ns->noiob) << (ns->lba_shift - 9));
> @@ -1446,7 +1691,9 @@ static void nvme_update_disk_info(struct gendisk *disk,
>   	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
>   		capacity = 0;
>   
> -	set_capacity(disk, capacity);
> +	if (ns->ana_state == NVME_ANA_OPTIMIZED ||
> +	    ns->ana_state == NVME_ANA_NONOPTIMIZED)
> +		set_capacity(disk, capacity);

Should we really only set the capacity for accessible states? Why is this the case?

>   	nvme_config_discard(ns);
>   	blk_mq_unfreeze_queue(disk->queue);
>   }
> @@ -1462,6 +1709,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>   	ns->lba_shift = id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ds;
>   	if (ns->lba_shift == 0)
>   		ns->lba_shift = 9;
> +	ns->anagrpid = le32_to_cpu(id->anagrpid);
>   	ns->noiob = le16_to_cpu(id->noiob);
>   	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
>   	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
> @@ -1488,6 +1736,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
>   	struct nvme_ctrl *ctrl = ns->ctrl;
>   	struct nvme_id_ns *id;
>   	struct nvme_ns_ids ids;
> +	unsigned int grpid;
>   	int ret = 0;
>   
>   	if (test_bit(NVME_NS_DEAD, &ns->flags)) {
> @@ -1504,6 +1753,19 @@ static int nvme_revalidate_disk(struct gendisk *disk)
>   		goto out;
>   	}
>   
> +	grpid = le32_to_cpu(id->anagrpid);
> +	if (grpid) {
> +		if (ns->anagrpid != grpid) {
> +			dev_warn(ctrl->device, "ns %x ANA group ID changed\n",
> +				 ns->head->ns_id);
> +			ns->anagrpid = grpid;
> +			queue_delayed_work(nvme_wq, &ctrl->ana_work,
> +					   msecs_to_jiffies(ana_log_delay));
> +		}
> +		ns->ana_state = nvme_get_ana_state(ctrl, grpid,
> +						   ns->head->ns_id);
> +	}
> +
>   	__nvme_revalidate_disk(disk, id);
>   	nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
>   	if (!nvme_ns_ids_equal(&ns->head->ids, &ids)) {
> @@ -2363,6 +2625,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>   	ctrl->oncs = le16_to_cpup(&id->oncs);
>   	ctrl->aen_cfg = le32_to_cpu(id->oaes) &
>   		(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT);
> +	if (nvme_ctrl_has_ana(ctrl))
> +		ctrl->aen_cfg |= NVME_AEN_CFG_ANA_CHANGE;
>   	atomic_set(&ctrl->abort_limit, id->acl + 1);
>   	ctrl->vwc = id->vwc;
>   	ctrl->cntlid = le16_to_cpup(&id->cntlid);
> @@ -2376,6 +2640,11 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>   	nvme_set_queue_limits(ctrl, ctrl->admin_q);
>   	ctrl->sgls = le32_to_cpu(id->sgls);
>   	ctrl->kas = le16_to_cpu(id->kas);
> +	ctrl->max_namespaces = le32_to_cpu(id->mnan);
> +	ctrl->anacap = id->anacap;
> +	ctrl->anatt = id->anatt;
> +	ctrl->nanagrpid = le32_to_cpu(id->nanagrpid);
> +	ctrl->anagrpmax = le32_to_cpu(id->anagrpmax);
>   
>   	if (id->rtd3e) {
>   		/* us -> s */
> @@ -2445,7 +2714,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>   	ret = nvme_configure_apst(ctrl);
>   	if (ret < 0)
>   		return ret;
> -	
> +

??

>   	ret = nvme_configure_timestamp(ctrl);
>   	if (ret < 0)
>   		return ret;
> @@ -2454,6 +2723,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
>   	if (ret < 0)
>   		return ret;
>   
> +	ret = nvme_configure_ana(ctrl);
> +	if (ret < 0)
> +		return ret;
> +
>   	ctrl->identified = true;
>   
>   	return 0;
> @@ -2649,12 +2922,30 @@ static ssize_t nsid_show(struct device *dev, struct device_attribute *attr,
>   }
>   static DEVICE_ATTR_RO(nsid);
>   
> +static ssize_t ana_grpid_show(struct device *dev, struct device_attribute *attr,
> +		char *buf)
> +{
> +	return sprintf(buf, "%d\n", nvme_get_ns_from_dev(dev)->anagrpid);
> +}
> +DEVICE_ATTR_RO(ana_grpid);
> +
> +static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr,
> +		char *buf)
> +{
> +	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
> +
> +	return sprintf(buf, "%s\n", nvme_ana_state_names[ns->ana_state]);
> +}
> +DEVICE_ATTR_RO(ana_state);
> +
>   static struct attribute *nvme_ns_id_attrs[] = {
>   	&dev_attr_wwid.attr,
>   	&dev_attr_uuid.attr,
>   	&dev_attr_nguid.attr,
>   	&dev_attr_eui.attr,
>   	&dev_attr_nsid.attr,
> +	&dev_attr_ana_grpid.attr,
> +	&dev_attr_ana_state.attr,
>   	NULL,
>   };
>   
> @@ -2663,6 +2954,7 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
>   {
>   	struct device *dev = container_of(kobj, struct device, kobj);
>   	struct nvme_ns_ids *ids = &dev_to_ns_head(dev)->ids;
> +	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
>   
>   	if (a == &dev_attr_uuid.attr) {
>   		if (uuid_is_null(&ids->uuid) &&
> @@ -2677,6 +2969,10 @@ static umode_t nvme_ns_id_attrs_are_visible(struct kobject *kobj,
>   		if (!memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
>   			return 0;
>   	}
> +	if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
> +		if (!ns->anagrpid)
> +			return 0;
> +	}
>   	return a->mode;
>   }
>   
> @@ -2939,25 +3235,6 @@ static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
>   	return nsa->head->ns_id - nsb->head->ns_id;
>   }
>   
> -static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
> -{
> -	struct nvme_ns *ns, *ret = NULL;
> -
> -	down_read(&ctrl->namespaces_rwsem);
> -	list_for_each_entry(ns, &ctrl->namespaces, list) {
> -		if (ns->head->ns_id == nsid) {
> -			if (!kref_get_unless_zero(&ns->kref))
> -				continue;
> -			ret = ns;
> -			break;
> -		}
> -		if (ns->head->ns_id > nsid)
> -			break;
> -	}
> -	up_read(&ctrl->namespaces_rwsem);
> -	return ret;
> -}

OK, now I see this was copied...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv2 11/11] nvmet: ANA support
  2018-05-25 13:34       ` Christoph Hellwig
@ 2018-05-31 10:27         ` Sagi Grimberg
  0 siblings, 0 replies; 27+ messages in thread
From: Sagi Grimberg @ 2018-05-31 10:27 UTC (permalink / raw)



>> Primarily this is designed to be a testbed for the various ANA
>> configurations.
> 
> Well, that isn't really what we need in mainline.
> 
>> Plus ANA support can be disabled, too :-)
> 
> Great.  That should be worth about 20 lines if we really need it,
> not 370 lines of additional code over the initial 315.  And not
> require iterating the list of all ANA groups in the ANA fast path.
> And so on.

Hey Hannes, Christoph,

I'm trying to catch up and have reviewed this set. I didn't see a v2 for
this, only "NVMe ANA fixups"... is this supposed to be v2?

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-05-31 10:27 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-22  9:09 [PATCHv2 00/11] nvme: ANA support Hannes Reinecke
2018-05-22  9:09 ` [PATCHv2 01/11] nvme.h: untangle AEN notice definitions Hannes Reinecke
2018-05-22  9:09 ` [PATCHv2 02/11] nvme: submit AEN event configuration on startup Hannes Reinecke
2018-05-22 10:00   ` Christoph Hellwig
2018-05-22 10:55     ` Hannes Reinecke
2018-05-22  9:09 ` [PATCHv2 03/11] nvmet: refactor AER handling Hannes Reinecke
2018-05-22  9:09 ` [PATCHv2 04/11] nvmet: Add AEN configuration support Hannes Reinecke
2018-05-22 10:01   ` Christoph Hellwig
2018-05-22 10:56     ` Hannes Reinecke
2018-05-22  9:09 ` [PATCHv2 05/11] nvme.h: add ANA definitions Hannes Reinecke
2018-05-22  9:09 ` [PATCHv2 06/11] nvme: add support for the log specific field Hannes Reinecke
2018-05-22  9:10 ` [PATCHv2 07/11] nvme: always failover on path or transport errors Hannes Reinecke
2018-05-25 13:24   ` Christoph Hellwig
2018-05-22  9:10 ` [PATCHv2 08/11] nvme: add ANA support Hannes Reinecke
2018-05-23 11:52   ` Popuri, Sriram
2018-05-23 13:19     ` Hannes Reinecke
2018-05-25 13:31     ` Christoph Hellwig
2018-05-25 13:28   ` Christoph Hellwig
2018-05-25 14:32     ` Hannes Reinecke
2018-05-31 10:21   ` Sagi Grimberg
2018-05-22  9:10 ` [PATCHv2 09/11] nvmet: add a new nvmet_zero_sgl helper Hannes Reinecke
2018-05-22  9:10 ` [PATCHv2 10/11] nvmet: split log page implementation Hannes Reinecke
2018-05-22  9:10 ` [PATCHv2 11/11] nvmet: ANA support Hannes Reinecke
2018-05-22 10:05   ` Christoph Hellwig
2018-05-22 11:05     ` Hannes Reinecke
2018-05-25 13:34       ` Christoph Hellwig
2018-05-31 10:27         ` Sagi Grimberg
