Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support
@ 2020-01-06 13:37 Max Gurtovoy
  2020-01-06 13:37 ` [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd Max Gurtovoy
                   ` (15 more replies)
  0 siblings, 16 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

Hello Sagi, Christoph, Keith, Martin and Co

This patchset adds metadata (T10-PI) support for NVMeoF/RDMA host side and
target side, using signature verbs API. This set starts with a few preparation
commits to the NVMe host core layer. It continues with NVMeoF/RDMA host
implementation + few preparation commits to the NVMe target core layer.
The patchset ends with NVMeoF/RDMA target implementation.

Configuration:
Host:
 - nvme connect --pi_enable --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme

Target:
 - echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
 - echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable

The code was tested using Mellanox's ConnectX-4/ConnectX-5 HCAs.
This series applies on top of nvme_5.5 branch cleanly.

Changes from v2:
 - Convert the virtual start sector (which passed to bip_set_seed function)
   to be in integrity interval units (Patch 14/15)
 - Clarify some commit messages

Changes from v1:
 - Added Reviewed-by signatures
 - Added namespace features flag (Patch 01/15)
 - Remove nvme_ns_has_pi function (Patch 01/15)
 - Added has_pi field to struct nvme_request (Patch 01/15)
 - Subject change for patch 02/15
 - Fix comment for PCI metadata (Patch 03/15)
 - Rebase over "nvme: Avoid preallocating big SGL for data" patchset
 - Introduce NVME_INLINE_PROT_SG_CNT flag (Patch 05/15)
 - Introduce nvme_rdma_sgl structure (Patch 06/15)
 - Remove first_sgl pointer from struct nvme_rdma_request (Patch 06/15)
 - Split nvme-rdma patches (Patches 06/15, 07/15)
 - Rename is_protected to use_pi (Patch 07/15)
 - Refactor nvme_rdma_get_max_fr_pages function (Patch 07/15)
 - Added ifdef CONFIG_BLK_DEV_INTEGRITY (Patches 07/15, 09/15, 13/15,
   14/15, 15/15)
 - Added port configfs pi_enable (Patch 14/15)

Israel Rukshin (12):
  nvme: Introduce namespace features flag
  nvme-fabrics: Allow user enabling metadata/T10-PI support
  nvme: Introduce NVME_INLINE_PROT_SG_CNT
  nvme-rdma: Introduce nvme_rdma_sgl structure
  nvmet: Prepare metadata request
  nvmet: Add metadata characteristics for a namespace
  nvmet: Rename nvmet_rw_len to nvmet_rw_data_len
  nvmet: Rename nvmet_check_data_len to nvmet_check_transfer_len
  nvme: Add Metadata Capabilities enumerations
  nvmet: Add metadata/T10-PI support
  nvmet: Add metadata support for block devices
  nvmet-rdma: Add metadata/T10-PI support

Max Gurtovoy (3):
  nvme: Enforce extended LBA format for fabrics metadata
  nvme: Introduce max_integrity_segments ctrl attribute
  nvme-rdma: Add metadata/T10-PI support

 drivers/nvme/host/core.c          |  75 +++++---
 drivers/nvme/host/fabrics.c       |  11 ++
 drivers/nvme/host/fabrics.h       |   3 +
 drivers/nvme/host/nvme.h          |   9 +-
 drivers/nvme/host/pci.c           |   7 +
 drivers/nvme/host/rdma.c          | 368 +++++++++++++++++++++++++++++++++-----
 drivers/nvme/target/admin-cmd.c   |  37 ++--
 drivers/nvme/target/configfs.c    |  61 +++++++
 drivers/nvme/target/core.c        |  54 ++++--
 drivers/nvme/target/discovery.c   |   8 +-
 drivers/nvme/target/fabrics-cmd.c |  19 +-
 drivers/nvme/target/io-cmd-bdev.c | 115 +++++++++++-
 drivers/nvme/target/io-cmd-file.c |   8 +-
 drivers/nvme/target/nvmet.h       |  40 ++++-
 drivers/nvme/target/rdma.c        | 235 ++++++++++++++++++++++--
 include/linux/nvme.h              |   2 +
 16 files changed, 925 insertions(+), 127 deletions(-)

-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-06 13:37 ` [PATCH 01/15] nvme: Introduce namespace features flag Max Gurtovoy
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Added 'pi_enable' to 'connect' command so users can enable metadata support.

usage examples:
nvme connect --pi_enable --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme
nvme connect -p -t rdma -a 10.0.1.1 -n test_nvme

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 fabrics.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fabrics.c b/fabrics.c
index 8982ae4..b22ea14 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -72,6 +72,7 @@ static struct config {
 	int  disable_sqflow;
 	int  hdr_digest;
 	int  data_digest;
+	int  pi_enable;
 	bool persistent;
 	bool quiet;
 } cfg = { NULL };
@@ -756,7 +757,9 @@ static int build_options(char *argstr, int max_len, bool discover)
 	    add_bool_argument(&argstr, &max_len, "disable_sqflow",
 				cfg.disable_sqflow) ||
 	    add_bool_argument(&argstr, &max_len, "hdr_digest", cfg.hdr_digest) ||
-	    add_bool_argument(&argstr, &max_len, "data_digest", cfg.data_digest))
+	    add_bool_argument(&argstr, &max_len, "data_digest",
+				cfg.data_digest) ||
+	    add_bool_argument(&argstr, &max_len, "pi_enable", cfg.pi_enable))
 		return -EINVAL;
 
 	return 0;
@@ -885,6 +888,13 @@ retry:
 		p += len;
 	}
 
+	if (cfg.pi_enable) {
+		len = sprintf(p, ",pi_enable");
+		if (len < 0)
+			return -EINVAL;
+		p += len;
+	}
+
 	switch (e->trtype) {
 	case NVMF_TRTYPE_RDMA:
 	case NVMF_TRTYPE_TCP:
@@ -1171,6 +1181,7 @@ int discover(const char *desc, int argc, char **argv, bool connect)
 		OPT_INT("tos",             'T', &cfg.tos,             "type of service"),
 		OPT_FLAG("hdr_digest",     'g', &cfg.hdr_digest,      "enable transport protocol header digest (TCP transport)"),
 		OPT_FLAG("data_digest",    'G', &cfg.data_digest,     "enable transport protocol data digest (TCP transport)"),
+		OPT_FLAG("pi_enable",      'p', &cfg.pi_enable,       "enable metadata (T10-PI) support (default false)"),
 		OPT_INT("nr-io-queues",    'i', &cfg.nr_io_queues,    "number of io queues to use (default is core count)"),
 		OPT_INT("nr-write-queues", 'W', &cfg.nr_write_queues, "number of write queues to use (default 0)"),
 		OPT_INT("nr-poll-queues",  'P', &cfg.nr_poll_queues,  "number of poll queues to use (default 0)"),
@@ -1231,6 +1242,7 @@ int connect(const char *desc, int argc, char **argv)
 		OPT_FLAG("disable-sqflow",    'd', &cfg.disable_sqflow,    "disable controller sq flow control (default false)"),
 		OPT_FLAG("hdr-digest",        'g', &cfg.hdr_digest,        "enable transport protocol header digest (TCP transport)"),
 		OPT_FLAG("data-digest",       'G', &cfg.data_digest,       "enable transport protocol data digest (TCP transport)"),
+		OPT_FLAG("pi_enable",         'p', &cfg.pi_enable,         "enable metadata (T10-PI) support (default false)"),
 		OPT_END()
 	};
 
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 01/15] nvme: Introduce namespace features flag
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
  2020-01-06 13:37 ` [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-07 18:07   ` Keith Busch
  2020-01-09  3:11   ` Martin K. Petersen
  2020-01-06 13:37 ` [PATCH 02/15] nvme: Enforce extended LBA format for fabrics metadata Max Gurtovoy
                   ` (13 subsequent siblings)
  15 siblings, 2 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Centralize all the integrity checks to one place and make the code more
readable. Also add has_pi field to the nvme_req structure as well, so the
transport drivers could use it.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
---
 drivers/nvme/host/core.c | 40 +++++++++++++++++++++++++++-------------
 drivers/nvme/host/nvme.h |  6 +++++-
 2 files changed, 32 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2a84e14..d98eb48 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -209,11 +209,6 @@ static int nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl)
 	return ret;
 }
 
-static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
-{
-	return ns->pi_type && ns->ms == sizeof(struct t10_pi_tuple);
-}
-
 static blk_status_t nvme_error_status(u16 status)
 {
 	switch (status & 0x7ff) {
@@ -473,6 +468,7 @@ static inline void nvme_clear_nvme_request(struct request *req)
 	if (!(req->rq_flags & RQF_DONTPREP)) {
 		nvme_req(req)->retries = 0;
 		nvme_req(req)->flags = 0;
+		nvme_req(req)->has_pi = false;
 		req->rq_flags |= RQF_DONTPREP;
 	}
 }
@@ -706,6 +702,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 		nvme_assign_write_stream(ctrl, req, &control, &dsmgmt);
 
 	if (ns->ms) {
+		nvme_req(req)->has_pi = ns->features & NVME_NS_DIF_SUPPORTED;
 		/*
 		 * If formated with metadata, the block layer always provides a
 		 * metadata buffer if CONFIG_BLK_DEV_INTEGRITY is enabled.  Else
@@ -713,7 +710,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 		 * namespace capacity to zero to prevent any I/O.
 		 */
 		if (!blk_integrity_rq(req)) {
-			if (WARN_ON_ONCE(!nvme_ns_has_pi(ns)))
+			if (WARN_ON_ONCE(!nvme_req(req)->has_pi))
 				return BLK_STS_NOTSUPP;
 			control |= NVME_RW_PRINFO_PRACT;
 		}
@@ -1271,7 +1268,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	meta_len = (io.nblocks + 1) * ns->ms;
 	metadata = (void __user *)(uintptr_t)io.metadata;
 
-	if (ns->ext) {
+	if (ns->features & NVME_NS_EXT_LBAS) {
 		length += meta_len;
 		meta_len = 0;
 	} else if (meta_len) {
@@ -1801,10 +1798,10 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	blk_queue_io_min(disk->queue, phys_bs);
 	blk_queue_io_opt(disk->queue, io_opt);
 
-	if (ns->ms && !ns->ext &&
-	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
+	if (ns->features & NVME_NS_DIX_SUPPORTED)
 		nvme_init_integrity(disk, ns->ms, ns->pi_type);
-	if ((ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk)) ||
+	if ((ns->ms && !(ns->features & NVME_NS_DIF_SUPPORTED) &&
+	     !blk_get_integrity(disk)) ||
 	    ns->lba_shift > PAGE_SHIFT)
 		capacity = 0;
 
@@ -1834,12 +1831,29 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 		ns->lba_shift = 9;
 	ns->noiob = le16_to_cpu(id->noiob);
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
-	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
+	ns->features = 0;
 	/* the PI implementation requires metadata equal t10 pi tuple size */
-	if (ns->ms == sizeof(struct t10_pi_tuple))
+	if (ns->ms == sizeof(struct t10_pi_tuple)) {
 		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
-	else
+		if (ns->pi_type)
+			ns->features |= NVME_NS_DIF_SUPPORTED;
+	} else {
 		ns->pi_type = 0;
+	}
+
+	if (ns->ms) {
+		if (id->flbas & NVME_NS_FLBAS_META_EXT)
+			ns->features |= NVME_NS_EXT_LBAS;
+
+		/*
+		 * For PCI, Extended logical block will be generated by the
+		 * controller.
+		 */
+		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
+			if (!(ns->features & NVME_NS_EXT_LBAS))
+				ns->features |= NVME_NS_DIX_SUPPORTED;
+		}
+	}
 
 	if (ns->noiob)
 		nvme_set_chunk_size(ns);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1024fec..7247390 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -139,6 +139,7 @@ struct nvme_request {
 	u8			flags;
 	u16			status;
 	struct nvme_ctrl	*ctrl;
+	bool			has_pi;
 };
 
 /*
@@ -381,8 +382,11 @@ struct nvme_ns {
 	u16 ms;
 	u16 sgs;
 	u32 sws;
-	bool ext;
 	u8 pi_type;
+	unsigned long features;
+#define NVME_NS_EXT_LBAS	(1 << 0)
+#define NVME_NS_DIX_SUPPORTED	(1 << 1)
+#define NVME_NS_DIF_SUPPORTED	(1 << 2)
 	unsigned long flags;
 #define NVME_NS_REMOVING	0
 #define NVME_NS_DEAD     	1
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 02/15] nvme: Enforce extended LBA format for fabrics metadata
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
  2020-01-06 13:37 ` [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd Max Gurtovoy
  2020-01-06 13:37 ` [PATCH 01/15] nvme: Introduce namespace features flag Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-16 23:53   ` James Smart
  2020-01-06 13:37 ` [PATCH 03/15] nvme: Introduce max_integrity_segments ctrl attribute Max Gurtovoy
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

An extended LBA is a larger LBA that is created when metadata associated
with the LBA is transferred contiguously with the LBA data (AKA
interleaved). The metadata may be either transferred as part of the LBA
(creating an extended LBA) or it may be transferred as a separate
contiguous buffer of data. According to the NVMeoF spec, a fabrics ctrl
supports only an Extended LBA format. Fail revalidation in case we have a
spec violation. Also initialize the integrity profile for the block device
for fabrics ctrl.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
---
 drivers/nvme/host/core.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d98eb48..089cdc3c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1818,7 +1818,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	blk_mq_unfreeze_queue(disk->queue);
 }
 
-static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
+static int __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 {
 	struct nvme_ns *ns = disk->private_data;
 
@@ -1846,11 +1846,21 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 			ns->features |= NVME_NS_EXT_LBAS;
 
 		/*
+		 * For Fabrics, only metadata as part of extended data LBA is
+		 * supported. Fail in case of a spec violation.
+		 */
+		if (ns->ctrl->ops->flags & NVME_F_FABRICS) {
+			if (WARN_ON_ONCE(!(ns->features & NVME_NS_EXT_LBAS)))
+				return -EINVAL;
+		}
+
+		/*
 		 * For PCI, Extended logical block will be generated by the
 		 * controller.
 		 */
 		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
-			if (!(ns->features & NVME_NS_EXT_LBAS))
+			if (ns->ctrl->ops->flags & NVME_F_FABRICS ||
+			    !(ns->features & NVME_NS_EXT_LBAS))
 				ns->features |= NVME_NS_DIX_SUPPORTED;
 		}
 	}
@@ -1865,6 +1875,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 		revalidate_disk(ns->head->disk);
 	}
 #endif
+	return 0;
 }
 
 static int nvme_revalidate_disk(struct gendisk *disk)
@@ -1889,7 +1900,10 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 		goto free_id;
 	}
 
-	__nvme_revalidate_disk(disk, id);
+	ret = __nvme_revalidate_disk(disk, id);
+	if (ret)
+		goto free_id;
+
 	ret = nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
 	if (ret)
 		goto free_id;
@@ -3565,7 +3579,8 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	memcpy(disk->disk_name, disk_name, DISK_NAME_LEN);
 	ns->disk = disk;
 
-	__nvme_revalidate_disk(disk, id);
+	if (__nvme_revalidate_disk(disk, id))
+		goto out_free_disk;
 
 	if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1) {
 		ret = nvme_nvm_register(ns, disk_name, node);
@@ -3590,6 +3605,8 @@ static int nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	return 0;
  out_put_disk:
 	put_disk(ns->disk);
+ out_free_disk:
+	del_gendisk(ns->disk);
  out_unlink_ns:
 	mutex_lock(&ctrl->subsys->lock);
 	list_del_rcu(&ns->siblings);
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 03/15] nvme: Introduce max_integrity_segments ctrl attribute
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (2 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 02/15] nvme: Enforce extended LBA format for fabrics metadata Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-09  3:12   ` Martin K. Petersen
  2020-01-06 13:37 ` [PATCH 04/15] nvme-fabrics: Allow user enabling metadata/T10-PI support Max Gurtovoy
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

This patch doesn't change any logic, and is needed as a preparation
for adding PI support for fabrics drivers that will use an extended
LBA format for metadata.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/core.c | 11 +++++++----
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  |  7 +++++++
 3 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 089cdc3c..d21a716 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1619,7 +1619,8 @@ static int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
+static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+				u32 max_integrity_segments)
 {
 	struct blk_integrity integrity;
 
@@ -1642,10 +1643,11 @@ static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
 	}
 	integrity.tuple_size = ms;
 	blk_integrity_register(disk, &integrity);
-	blk_queue_max_integrity_segments(disk->queue, 1);
+	blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
 }
 #else
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
+static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+				u32 max_integrity_segments)
 {
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
@@ -1799,7 +1801,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	blk_queue_io_opt(disk->queue, io_opt);
 
 	if (ns->features & NVME_NS_DIX_SUPPORTED)
-		nvme_init_integrity(disk, ns->ms, ns->pi_type);
+		nvme_init_integrity(disk, ns->ms, ns->pi_type,
+				    ns->ctrl->max_integrity_segments);
 	if ((ns->ms && !(ns->features & NVME_NS_DIF_SUPPORTED) &&
 	     !blk_get_integrity(disk)) ||
 	    ns->lba_shift > PAGE_SHIFT)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 7247390..0f8aacf 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -229,6 +229,7 @@ struct nvme_ctrl {
 	u32 page_size;
 	u32 max_hw_sectors;
 	u32 max_segments;
+	u32 max_integrity_segments;
 	u16 crdt[3];
 	u16 oncs;
 	u16 oacs;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 365a2dd..c7113a5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2580,6 +2580,13 @@ static void nvme_reset_work(struct work_struct *work)
 		goto out;
 	}
 
+	/*
+	 * NVMe PCI driver doesn't support Extended LBA format and supports
+	 * only a single integrity segment for a separate contiguous buffer
+	 * of metadata.
+	 */
+	dev->ctrl.max_integrity_segments = 1;
+
 	result = nvme_init_identify(&dev->ctrl);
 	if (result)
 		goto out;
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 04/15] nvme-fabrics: Allow user enabling metadata/T10-PI support
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (3 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 03/15] nvme: Introduce max_integrity_segments ctrl attribute Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-06 13:37 ` [PATCH 05/15] nvme: Introduce NVME_INLINE_PROT_SG_CNT Max Gurtovoy
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Preparation for adding metadata (T10-PI) over fabric support. This will
allow end-to-end protection information passthrough and validation for
NVMe over Fabric.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/host/core.c    |  3 ++-
 drivers/nvme/host/fabrics.c | 11 +++++++++++
 drivers/nvme/host/fabrics.h |  3 +++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d21a716..4cd5b95 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1862,7 +1862,8 @@ static int __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 		 * controller.
 		 */
 		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
-			if (ns->ctrl->ops->flags & NVME_F_FABRICS ||
+			if ((ns->ctrl->ops->flags & NVME_F_FABRICS &&
+			     ns->ctrl->opts->pi_enable) ||
 			    !(ns->features & NVME_NS_EXT_LBAS))
 				ns->features |= NVME_NS_DIX_SUPPORTED;
 		}
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 74b8818..c09230e 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -612,6 +612,7 @@ bool __nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
 	{ NVMF_OPT_NR_WRITE_QUEUES,	"nr_write_queues=%d"	},
 	{ NVMF_OPT_NR_POLL_QUEUES,	"nr_poll_queues=%d"	},
 	{ NVMF_OPT_TOS,			"tos=%d"		},
+	{ NVMF_OPT_PI_ENABLE,		"pi_enable"		},
 	{ NVMF_OPT_ERR,			NULL			}
 };
 
@@ -634,6 +635,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 	opts->hdr_digest = false;
 	opts->data_digest = false;
 	opts->tos = -1; /* < 0 == use transport default */
+	opts->pi_enable = false;
 
 	options = o = kstrdup(buf, GFP_KERNEL);
 	if (!options)
@@ -867,6 +869,15 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts,
 			}
 			opts->tos = token;
 			break;
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+		case NVMF_OPT_PI_ENABLE:
+			if (opts->discovery_nqn) {
+				pr_debug("Ignoring pi_enable value for discovery controller\n");
+				break;
+			}
+			opts->pi_enable = true;
+			break;
+#endif
 		default:
 			pr_warn("unknown parameter or missing value '%s' in ctrl creation request\n",
 				p);
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index a0ec40a..773f748 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -56,6 +56,7 @@ enum {
 	NVMF_OPT_NR_WRITE_QUEUES = 1 << 17,
 	NVMF_OPT_NR_POLL_QUEUES = 1 << 18,
 	NVMF_OPT_TOS		= 1 << 19,
+	NVMF_OPT_PI_ENABLE	= 1 << 20,
 };
 
 /**
@@ -89,6 +90,7 @@ enum {
  * @nr_write_queues: number of queues for write I/O
  * @nr_poll_queues: number of queues for polling I/O
  * @tos: type of service
+ * @pi_enable: Enable metadata (T10-PI) support
  */
 struct nvmf_ctrl_options {
 	unsigned		mask;
@@ -111,6 +113,7 @@ struct nvmf_ctrl_options {
 	unsigned int		nr_write_queues;
 	unsigned int		nr_poll_queues;
 	int			tos;
+	bool			pi_enable;
 };
 
 /*
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 05/15] nvme: Introduce NVME_INLINE_PROT_SG_CNT
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (4 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 04/15] nvme-fabrics: Allow user enabling metadata/T10-PI support Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-09  3:13   ` Martin K. Petersen
  2020-01-06 13:37 ` [PATCH 06/15] nvme-rdma: Introduce nvme_rdma_sgl structure Max Gurtovoy
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

SGL size of PI metadata is usually small. Thus, 1 inline sg should
cover most cases. The macro will be used for pre-allocate a single SGL
entry for protection information. The preallocation of small inline SGLs
depends on SG_CHAIN capability so if the ARCH doesn't support SG_CHAIN,
use the runtime allocation for the SGL. This patch is a preparation for
adding metadata (T10-PI) over fabric support.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/host/nvme.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 0f8aacf..5c94fcf 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -30,8 +30,10 @@
 
 #ifdef CONFIG_ARCH_NO_SG_CHAIN
 #define  NVME_INLINE_SG_CNT  0
+#define  NVME_INLINE_PROT_SG_CNT  0
 #else
 #define  NVME_INLINE_SG_CNT  2
+#define  NVME_INLINE_PROT_SG_CNT  1
 #endif
 
 extern struct workqueue_struct *nvme_wq;
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 06/15] nvme-rdma: Introduce nvme_rdma_sgl structure
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (5 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 05/15] nvme: Introduce NVME_INLINE_PROT_SG_CNT Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-06 13:37 ` [PATCH 07/15] nvme-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Remove first_sgl pointer from struct nvme_rdma_request and use pointer
arithmetic instead. The inline scatterlist, if exists, will be located
right after the nvme_rdma_request. This patch is needed as a preparation
for adding PI support.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/host/rdma.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 2a47c6c..b6dd9f5 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -48,6 +48,11 @@ struct nvme_rdma_qe {
 	u64			dma;
 };
 
+struct nvme_rdma_sgl {
+	int			nents;
+	struct sg_table		sg_table;
+};
+
 struct nvme_rdma_queue;
 struct nvme_rdma_request {
 	struct nvme_request	req;
@@ -58,12 +63,10 @@ struct nvme_rdma_request {
 	refcount_t		ref;
 	struct ib_sge		sge[1 + NVME_RDMA_MAX_INLINE_SEGMENTS];
 	u32			num_sge;
-	int			nents;
 	struct ib_reg_wr	reg_wr;
 	struct ib_cqe		reg_cqe;
 	struct nvme_rdma_queue  *queue;
-	struct sg_table		sg_table;
-	struct scatterlist	first_sgl[];
+	struct nvme_rdma_sgl	data_sgl;
 };
 
 enum nvme_rdma_queue_flags {
@@ -1159,8 +1162,9 @@ static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
 		req->mr = NULL;
 	}
 
-	ib_dma_unmap_sg(ibdev, req->sg_table.sgl, req->nents, rq_dma_dir(rq));
-	sg_free_table_chained(&req->sg_table, NVME_INLINE_SG_CNT);
+	ib_dma_unmap_sg(ibdev, req->data_sgl.sg_table.sgl, req->data_sgl.nents,
+			rq_dma_dir(rq));
+	sg_free_table_chained(&req->data_sgl.sg_table, NVME_INLINE_SG_CNT);
 }
 
 static int nvme_rdma_set_sg_null(struct nvme_command *c)
@@ -1179,7 +1183,7 @@ static int nvme_rdma_map_sg_inline(struct nvme_rdma_queue *queue,
 		int count)
 {
 	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
-	struct scatterlist *sgl = req->sg_table.sgl;
+	struct scatterlist *sgl = req->data_sgl.sg_table.sgl;
 	struct ib_sge *sge = &req->sge[1];
 	u32 len = 0;
 	int i;
@@ -1204,8 +1208,8 @@ static int nvme_rdma_map_sg_single(struct nvme_rdma_queue *queue,
 {
 	struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
 
-	sg->addr = cpu_to_le64(sg_dma_address(req->sg_table.sgl));
-	put_unaligned_le24(sg_dma_len(req->sg_table.sgl), sg->length);
+	sg->addr = cpu_to_le64(sg_dma_address(req->data_sgl.sg_table.sgl));
+	put_unaligned_le24(sg_dma_len(req->data_sgl.sg_table.sgl), sg->length);
 	put_unaligned_le32(queue->device->pd->unsafe_global_rkey, sg->key);
 	sg->type = NVME_KEY_SGL_FMT_DATA_DESC << 4;
 	return 0;
@@ -1226,7 +1230,8 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 	 * Align the MR to a 4K page size to match the ctrl page size and
 	 * the block virtual boundary.
 	 */
-	nr = ib_map_mr_sg(req->mr, req->sg_table.sgl, count, NULL, SZ_4K);
+	nr = ib_map_mr_sg(req->mr, req->data_sgl.sg_table.sgl, count, NULL,
+			  SZ_4K);
 	if (unlikely(nr < count)) {
 		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);
 		req->mr = NULL;
@@ -1273,17 +1278,18 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 	if (!blk_rq_nr_phys_segments(rq))
 		return nvme_rdma_set_sg_null(c);
 
-	req->sg_table.sgl = req->first_sgl;
-	ret = sg_alloc_table_chained(&req->sg_table,
-			blk_rq_nr_phys_segments(rq), req->sg_table.sgl,
+	req->data_sgl.sg_table.sgl = (struct scatterlist *)(req + 1);
+	ret = sg_alloc_table_chained(&req->data_sgl.sg_table,
+			blk_rq_nr_phys_segments(rq), req->data_sgl.sg_table.sgl,
 			NVME_INLINE_SG_CNT);
 	if (ret)
 		return -ENOMEM;
 
-	req->nents = blk_rq_map_sg(rq->q, rq, req->sg_table.sgl);
+	req->data_sgl.nents = blk_rq_map_sg(rq->q, rq,
+					    req->data_sgl.sg_table.sgl);
 
-	count = ib_dma_map_sg(ibdev, req->sg_table.sgl, req->nents,
-			      rq_dma_dir(rq));
+	count = ib_dma_map_sg(ibdev, req->data_sgl.sg_table.sgl,
+			      req->data_sgl.nents, rq_dma_dir(rq));
 	if (unlikely(count <= 0)) {
 		ret = -EIO;
 		goto out_free_table;
@@ -1312,9 +1318,10 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 	return 0;
 
 out_unmap_sg:
-	ib_dma_unmap_sg(ibdev, req->sg_table.sgl, req->nents, rq_dma_dir(rq));
+	ib_dma_unmap_sg(ibdev, req->data_sgl.sg_table.sgl, req->data_sgl.nents,
+			rq_dma_dir(rq));
 out_free_table:
-	sg_free_table_chained(&req->sg_table, NVME_INLINE_SG_CNT);
+	sg_free_table_chained(&req->data_sgl.sg_table, NVME_INLINE_SG_CNT);
 	return ret;
 }
 
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 07/15] nvme-rdma: Add metadata/T10-PI support
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (6 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 06/15] nvme-rdma: Introduce nvme_rdma_sgl structure Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-06 13:37 ` [PATCH 08/15] nvmet: Prepare metadata request Max Gurtovoy
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

For capable HCAs (e.g. ConnectX-4/ConnectX-5) this will allow end-to-end
protection information passthrough and validation for NVMe over RDMA
transport. Metadata offload support was implemented over the new RDMA
signature verbs API and it is enabled per controller by using nvme-cli.

usage example:
nvme connect --pi_enable --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
---
 drivers/nvme/host/rdma.c | 331 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 298 insertions(+), 33 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index b6dd9f5..a50f11f 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -67,6 +67,9 @@ struct nvme_rdma_request {
 	struct ib_cqe		reg_cqe;
 	struct nvme_rdma_queue  *queue;
 	struct nvme_rdma_sgl	data_sgl;
+	/* Metadata (T10-PI) support */
+	struct nvme_rdma_sgl	*pi_sgl;
+	bool			use_pi;
 };
 
 enum nvme_rdma_queue_flags {
@@ -88,6 +91,7 @@ struct nvme_rdma_queue {
 	struct rdma_cm_id	*cm_id;
 	int			cm_error;
 	struct completion	cm_done;
+	bool			pi_support;
 };
 
 struct nvme_rdma_ctrl {
@@ -115,6 +119,7 @@ struct nvme_rdma_ctrl {
 	struct nvme_ctrl	ctrl;
 	bool			use_inline_data;
 	u32			io_queues[HCTX_MAX_TYPES];
+	bool			pi_support;
 };
 
 static inline struct nvme_rdma_ctrl *to_rdma_ctrl(struct nvme_ctrl *ctrl)
@@ -272,6 +277,8 @@ static int nvme_rdma_create_qp(struct nvme_rdma_queue *queue, const int factor)
 	init_attr.qp_type = IB_QPT_RC;
 	init_attr.send_cq = queue->ib_cq;
 	init_attr.recv_cq = queue->ib_cq;
+	if (queue->pi_support)
+		init_attr.create_flags |= IB_QP_CREATE_INTEGRITY_EN;
 
 	ret = rdma_create_qp(queue->cm_id, dev->pd, &init_attr);
 
@@ -287,6 +294,18 @@ static void nvme_rdma_exit_request(struct blk_mq_tag_set *set,
 	kfree(req->sqe.data);
 }
 
+static unsigned int nvme_rdma_cmd_size(bool has_pi)
+{
+	/*
+	 * nvme rdma command consists of struct nvme_rdma_request, data SGL,
+	 * PI nvme_rdma_sgl struct and then PI SGL.
+	 */
+	return sizeof(struct nvme_rdma_request) +
+	       sizeof(struct scatterlist) * NVME_INLINE_SG_CNT +
+	       has_pi * (sizeof(struct nvme_rdma_sgl) +
+			 sizeof(struct scatterlist) * NVME_INLINE_PROT_SG_CNT);
+}
+
 static int nvme_rdma_init_request(struct blk_mq_tag_set *set,
 		struct request *rq, unsigned int hctx_idx,
 		unsigned int numa_node)
@@ -301,6 +320,10 @@ static int nvme_rdma_init_request(struct blk_mq_tag_set *set,
 	if (!req->sqe.data)
 		return -ENOMEM;
 
+	/* PI nvme_rdma_sgl struct is located after command's data SGL */
+	if (queue->pi_support)
+		req->pi_sgl = (void *)nvme_req(rq) + nvme_rdma_cmd_size(false);
+
 	req->queue = queue;
 
 	return 0;
@@ -411,6 +434,8 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 	dev = queue->device;
 	ibdev = dev->dev;
 
+	if (queue->pi_support)
+		ib_mr_pool_destroy(queue->qp, &queue->qp->sig_mrs);
 	ib_mr_pool_destroy(queue->qp, &queue->qp->rdma_mrs);
 
 	/*
@@ -427,10 +452,13 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 	nvme_rdma_dev_put(dev);
 }
 
-static int nvme_rdma_get_max_fr_pages(struct ib_device *ibdev)
+static int nvme_rdma_get_max_fr_pages(struct ib_device *ibdev, bool pi_support)
 {
-	return min_t(u32, NVME_RDMA_MAX_SEGMENTS,
-		     ibdev->attrs.max_fast_reg_page_list_len - 1);
+	u32 max_page_list_len =
+		pi_support ? ibdev->attrs.max_pi_fast_reg_page_list_len :
+			     ibdev->attrs.max_fast_reg_page_list_len;
+
+	return min_t(u32, NVME_RDMA_MAX_SEGMENTS, max_page_list_len - 1);
 }
 
 static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
@@ -487,7 +515,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 	 * misaligned we'll end up using two entries for a single data page,
 	 * so one additional entry is required.
 	 */
-	pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev) + 1;
+	pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev, queue->pi_support) + 1;
 	ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs,
 			      queue->queue_size,
 			      IB_MR_TYPE_MEM_REG,
@@ -499,10 +527,24 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 		goto out_destroy_ring;
 	}
 
+	if (queue->pi_support) {
+		ret = ib_mr_pool_init(queue->qp, &queue->qp->sig_mrs,
+				      queue->queue_size, IB_MR_TYPE_INTEGRITY,
+				      pages_per_mr, pages_per_mr);
+		if (ret) {
+			dev_err(queue->ctrl->ctrl.device,
+				"failed to initialize PI MR pool sized %d for QID %d\n",
+				queue->queue_size, idx);
+			goto out_destroy_mr_pool;
+		}
+	}
+
 	set_bit(NVME_RDMA_Q_TR_READY, &queue->flags);
 
 	return 0;
 
+out_destroy_mr_pool:
+	ib_mr_pool_destroy(queue->qp, &queue->qp->rdma_mrs);
 out_destroy_ring:
 	nvme_rdma_free_ring(ibdev, queue->rsp_ring, queue->queue_size,
 			    sizeof(struct nvme_completion), DMA_FROM_DEVICE);
@@ -524,6 +566,7 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
 
 	queue = &ctrl->queues[idx];
 	queue->ctrl = ctrl;
+	queue->pi_support = idx && ctrl->pi_support;
 	init_completion(&queue->cm_done);
 
 	if (idx > 0)
@@ -733,8 +776,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
 		set->reserved_tags = 2; /* connect + keep-alive */
 		set->numa_node = nctrl->numa_node;
-		set->cmd_size = sizeof(struct nvme_rdma_request) +
-			NVME_INLINE_SG_CNT * sizeof(struct scatterlist);
+		set->cmd_size = nvme_rdma_cmd_size(false);
 		set->driver_data = ctrl;
 		set->nr_hw_queues = 1;
 		set->timeout = ADMIN_TIMEOUT;
@@ -747,8 +789,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->reserved_tags = 1; /* fabric connect */
 		set->numa_node = nctrl->numa_node;
 		set->flags = BLK_MQ_F_SHOULD_MERGE;
-		set->cmd_size = sizeof(struct nvme_rdma_request) +
-			NVME_INLINE_SG_CNT * sizeof(struct scatterlist);
+		set->cmd_size = nvme_rdma_cmd_size(ctrl->pi_support);
 		set->driver_data = ctrl;
 		set->nr_hw_queues = nctrl->queue_count - 1;
 		set->timeout = NVME_IO_TIMEOUT;
@@ -790,7 +831,21 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	ctrl->device = ctrl->queues[0].device;
 	ctrl->ctrl.numa_node = dev_to_node(ctrl->device->dev->dma_device);
 
-	ctrl->max_fr_pages = nvme_rdma_get_max_fr_pages(ctrl->device->dev);
+	/* T10-PI support */
+	if (ctrl->ctrl.opts->pi_enable) {
+		if (!(ctrl->device->dev->attrs.device_cap_flags &
+		      IB_DEVICE_INTEGRITY_HANDOVER)) {
+			dev_warn(ctrl->ctrl.device,
+				 "T10-PI requested but not supported on %s, continue without T10-PI\n",
+				 ctrl->device->dev->name);
+			ctrl->ctrl.opts->pi_enable = false;
+		} else {
+			ctrl->pi_support = true;
+		}
+	}
+
+	ctrl->max_fr_pages = nvme_rdma_get_max_fr_pages(ctrl->device->dev,
+							ctrl->pi_support);
 
 	/*
 	 * Bind the async event SQE DMA mapping to the admin queue lifetime.
@@ -832,6 +887,8 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 
 	ctrl->ctrl.max_segments = ctrl->max_fr_pages;
 	ctrl->ctrl.max_hw_sectors = ctrl->max_fr_pages << (ilog2(SZ_4K) - 9);
+	if (ctrl->pi_support)
+		ctrl->ctrl.max_integrity_segments = ctrl->max_fr_pages;
 
 	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
 
@@ -1157,8 +1214,19 @@ static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
 	if (!blk_rq_nr_phys_segments(rq))
 		return;
 
+	if (blk_integrity_rq(rq)) {
+		ib_dma_unmap_sg(ibdev, req->pi_sgl->sg_table.sgl,
+				req->pi_sgl->nents, rq_dma_dir(rq));
+		sg_free_table_chained(&req->pi_sgl->sg_table,
+				      NVME_INLINE_PROT_SG_CNT);
+	}
+
 	if (req->mr) {
-		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);
+		if (req->use_pi)
+			ib_mr_pool_put(queue->qp, &queue->qp->sig_mrs, req->mr);
+		else
+			ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs,
+				       req->mr);
 		req->mr = NULL;
 	}
 
@@ -1215,34 +1283,115 @@ static int nvme_rdma_map_sg_single(struct nvme_rdma_queue *queue,
 	return 0;
 }
 
-static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
-		struct nvme_rdma_request *req, struct nvme_command *c,
-		int count)
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static void nvme_rdma_set_diff_domain(struct nvme_command *cmd, struct bio *bio,
+		struct ib_sig_domain *domain, struct request *rq)
 {
-	struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
-	int nr;
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
+	struct nvme_ns *ns = rq->rq_disk->private_data;
+
+	WARN_ON(bi == NULL);
 
-	req->mr = ib_mr_pool_get(queue->qp, &queue->qp->rdma_mrs);
-	if (WARN_ON_ONCE(!req->mr))
-		return -EAGAIN;
+	domain->sig_type = IB_SIG_TYPE_T10_DIF;
+	domain->sig.dif.bg_type = IB_T10DIF_CRC;
+	domain->sig.dif.pi_interval = 1 << bi->interval_exp;
+	domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
 
 	/*
-	 * Align the MR to a 4K page size to match the ctrl page size and
-	 * the block virtual boundary.
+	 * At the moment we hard code those, but in the future
+	 * we will take them from cmd.
 	 */
-	nr = ib_map_mr_sg(req->mr, req->data_sgl.sg_table.sgl, count, NULL,
-			  SZ_4K);
-	if (unlikely(nr < count)) {
-		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);
-		req->mr = NULL;
-		if (nr < 0)
-			return nr;
-		return -EINVAL;
+	domain->sig.dif.apptag_check_mask = 0xffff;
+	domain->sig.dif.app_escape = true;
+	domain->sig.dif.ref_escape = true;
+	if (ns->pi_type != NVME_NS_DPS_PI_TYPE3)
+		domain->sig.dif.ref_remap = true;
+}
+
+static void nvme_rdma_set_sig_attrs(struct bio *bio,
+		struct ib_sig_attrs *sig_attrs, struct nvme_command *c,
+		struct request *rq)
+{
+	u16 control = le16_to_cpu(c->rw.control);
+
+	memset(sig_attrs, 0, sizeof(*sig_attrs));
+
+	if (control & NVME_RW_PRINFO_PRACT) {
+		/* for WRITE_INSERT/READ_STRIP no memory domain */
+		sig_attrs->mem.sig_type = IB_SIG_TYPE_NONE;
+		nvme_rdma_set_diff_domain(c, bio, &sig_attrs->wire, rq);
+		/* Clear the PRACT bit since HCA will generate/verify the PI */
+		control &= ~NVME_RW_PRINFO_PRACT;
+		c->rw.control = cpu_to_le16(control);
+	} else {
+		/* for WRITE_PASS/READ_PASS both wire/memory domains exist */
+		nvme_rdma_set_diff_domain(c, bio, &sig_attrs->wire, rq);
+		nvme_rdma_set_diff_domain(c, bio, &sig_attrs->mem, rq);
 	}
+}
+
+static void nvme_rdma_set_prot_checks(struct nvme_command *cmd, u8 *mask)
+{
+	*mask = 0;
+	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_REF)
+		*mask |= IB_SIG_CHECK_REFTAG;
+	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_GUARD)
+		*mask |= IB_SIG_CHECK_GUARD;
+}
+
+static void nvme_rdma_sig_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+	if (unlikely(wc->status != IB_WC_SUCCESS))
+		nvme_rdma_wr_error(cq, wc, "SIG");
+}
+
+static void nvme_rdma_set_pi_wr(struct nvme_rdma_request *req,
+				struct nvme_command *c)
+{
+	struct ib_sig_attrs *sig_attrs = req->mr->sig_attrs;
+	struct ib_reg_wr *wr = &req->reg_wr;
+	struct request *rq = blk_mq_rq_from_pdu(req);
+	struct bio *bio = rq->bio;
+	struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
+
+	nvme_rdma_set_sig_attrs(bio, sig_attrs, c, rq);
+
+	nvme_rdma_set_prot_checks(c, &sig_attrs->check_mask);
 
 	ib_update_fast_reg_key(req->mr, ib_inc_rkey(req->mr->rkey));
 
+	req->reg_cqe.done = nvme_rdma_sig_done;
+
+	memset(wr, 0, sizeof(*wr));
+	wr->wr.opcode = IB_WR_REG_MR_INTEGRITY;
+	wr->wr.wr_cqe = &req->reg_cqe;
+	wr->wr.num_sge = 0;
+	wr->wr.send_flags = 0;
+	wr->mr = req->mr;
+	wr->key = req->mr->rkey;
+	wr->access = IB_ACCESS_LOCAL_WRITE |
+		     IB_ACCESS_REMOTE_READ |
+		     IB_ACCESS_REMOTE_WRITE;
+
+	sg->addr = cpu_to_le64(req->mr->iova);
+	put_unaligned_le24(req->mr->length, sg->length);
+	put_unaligned_le32(req->mr->rkey, sg->key);
+	sg->type = (NVME_KEY_SGL_FMT_DATA_DESC << 4) | NVME_SGL_FMT_INVALIDATE;
+}
+#else
+static void nvme_rdma_set_pi_wr(struct nvme_rdma_request *req,
+				struct nvme_command *c)
+{
+}
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
+static void nvme_rdma_set_reg_wr(struct nvme_rdma_request *req,
+				 struct nvme_command *c)
+{
+	struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
+
 	req->reg_cqe.done = nvme_rdma_memreg_done;
+
 	memset(&req->reg_wr, 0, sizeof(req->reg_wr));
 	req->reg_wr.wr.opcode = IB_WR_REG_MR;
 	req->reg_wr.wr.wr_cqe = &req->reg_cqe;
@@ -1258,8 +1407,52 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 	put_unaligned_le32(req->mr->rkey, sg->key);
 	sg->type = (NVME_KEY_SGL_FMT_DATA_DESC << 4) |
 			NVME_SGL_FMT_INVALIDATE;
+}
+
+static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
+		struct nvme_rdma_request *req, struct nvme_command *c,
+		int count, int pi_count)
+{
+	struct nvme_rdma_sgl *sgl = &req->data_sgl;
+	int nr;
+
+	if (req->use_pi) {
+		req->mr = ib_mr_pool_get(queue->qp, &queue->qp->sig_mrs);
+		if (WARN_ON_ONCE(!req->mr))
+			return -EAGAIN;
+
+		nr = ib_map_mr_sg_pi(req->mr, sgl->sg_table.sgl, count, 0,
+				     req->pi_sgl->sg_table.sgl, pi_count, 0,
+				     SZ_4K);
+		if (unlikely(nr))
+			goto mr_put;
+
+		nvme_rdma_set_pi_wr(req, c);
+	} else {
+		req->mr = ib_mr_pool_get(queue->qp, &queue->qp->rdma_mrs);
+		if (WARN_ON_ONCE(!req->mr))
+			return -EAGAIN;
+		/*
+		 * Align the MR to a 4K page size to match the ctrl page size
+		 * and the block virtual boundary.
+		 */
+		nr = ib_map_mr_sg(req->mr, sgl->sg_table.sgl, count, 0, SZ_4K);
+		if (unlikely(nr < count))
+			goto mr_put;
+
+		nvme_rdma_set_reg_wr(req, c);
+	}
 
 	return 0;
+mr_put:
+	if (req->use_pi)
+		ib_mr_pool_put(queue->qp, &queue->qp->sig_mrs, req->mr);
+	else
+		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);
+	req->mr = NULL;
+	if (nr < 0)
+		return nr;
+	return -EINVAL;
 }
 
 static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
@@ -1268,6 +1461,7 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;
+	int pi_count = 0;
 	int count, ret;
 
 	req->num_sge = 1;
@@ -1295,7 +1489,30 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 		goto out_free_table;
 	}
 
-	if (count <= dev->num_inline_segments) {
+	if (blk_integrity_rq(rq)) {
+		memset(req->pi_sgl, 0, sizeof(struct nvme_rdma_sgl));
+		req->pi_sgl->sg_table.sgl =
+			(struct scatterlist *)(req->pi_sgl + 1);
+		ret = sg_alloc_table_chained(&req->pi_sgl->sg_table,
+				blk_rq_count_integrity_sg(rq->q, rq->bio),
+				req->pi_sgl->sg_table.sgl,
+				NVME_INLINE_PROT_SG_CNT);
+		if (unlikely(ret)) {
+			ret = -ENOMEM;
+			goto out_unmap_sg;
+		}
+
+		req->pi_sgl->nents = blk_rq_map_integrity_sg(rq->q, rq->bio,
+					req->pi_sgl->sg_table.sgl);
+		pi_count = ib_dma_map_sg(ibdev, req->pi_sgl->sg_table.sgl,
+					 req->pi_sgl->nents, rq_dma_dir(rq));
+		if (unlikely(pi_count <= 0)) {
+			ret = -EIO;
+			goto out_free_pi_table;
+		}
+	}
+
+	if (count <= dev->num_inline_segments && !req->use_pi) {
 		if (rq_data_dir(rq) == WRITE && nvme_rdma_queue_idx(queue) &&
 		    queue->ctrl->use_inline_data &&
 		    blk_rq_payload_bytes(rq) <=
@@ -1310,13 +1527,21 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 		}
 	}
 
-	ret = nvme_rdma_map_sg_fr(queue, req, c, count);
+	ret = nvme_rdma_map_sg_fr(queue, req, c, count, pi_count);
 out:
 	if (unlikely(ret))
-		goto out_unmap_sg;
+		goto out_unmap_pi_sg;
 
 	return 0;
 
+out_unmap_pi_sg:
+	if (blk_integrity_rq(rq))
+		ib_dma_unmap_sg(ibdev, req->pi_sgl->sg_table.sgl,
+				req->pi_sgl->nents, rq_dma_dir(rq));
+out_free_pi_table:
+	if (blk_integrity_rq(rq))
+		sg_free_table_chained(&req->pi_sgl->sg_table,
+				      NVME_INLINE_PROT_SG_CNT);
 out_unmap_sg:
 	ib_dma_unmap_sg(ibdev, req->data_sgl.sg_table.sgl, req->data_sgl.nents,
 			rq_dma_dir(rq));
@@ -1769,6 +1994,8 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(rq);
 
+	req->use_pi = queue->pi_support && nvme_req(rq)->has_pi;
+
 	err = nvme_rdma_map_data(queue, rq, c);
 	if (unlikely(err < 0)) {
 		dev_err(queue->ctrl->ctrl.device,
@@ -1809,12 +2036,50 @@ static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx)
 	return ib_process_cq_direct(queue->ib_cq, -1);
 }
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static void nvme_rdma_check_pi_status(struct nvme_rdma_request *req)
+{
+	struct request *rq = blk_mq_rq_from_pdu(req);
+	struct ib_mr_status mr_status;
+	int ret;
+
+	ret = ib_check_mr_status(req->mr, IB_MR_CHECK_SIG_STATUS, &mr_status);
+	if (ret) {
+		pr_err("ib_check_mr_status failed, ret %d\n", ret);
+		nvme_req(rq)->status = NVME_SC_INVALID_PI;
+		return;
+	}
+
+	if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS) {
+		switch (mr_status.sig_err.err_type) {
+		case IB_SIG_BAD_GUARD:
+			nvme_req(rq)->status = NVME_SC_GUARD_CHECK;
+			break;
+		case IB_SIG_BAD_REFTAG:
+			nvme_req(rq)->status = NVME_SC_REFTAG_CHECK;
+			break;
+		case IB_SIG_BAD_APPTAG:
+			nvme_req(rq)->status = NVME_SC_APPTAG_CHECK;
+			break;
+		}
+		pr_err("PI error found type %d expected 0x%x vs actual 0x%x\n",
+		       mr_status.sig_err.err_type, mr_status.sig_err.expected,
+		       mr_status.sig_err.actual);
+	}
+}
+#endif
+
 static void nvme_rdma_complete_rq(struct request *rq)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_rdma_queue *queue = req->queue;
 	struct ib_device *ibdev = queue->device->dev;
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	if (req->use_pi)
+		nvme_rdma_check_pi_status(req);
+#endif
+
 	nvme_rdma_unmap_data(queue, rq);
 	ib_dma_unmap_single(ibdev, req->sqe.dma, sizeof(struct nvme_command),
 			    DMA_TO_DEVICE);
@@ -1934,7 +2199,7 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
 static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
 	.name			= "rdma",
 	.module			= THIS_MODULE,
-	.flags			= NVME_F_FABRICS,
+	.flags			= NVME_F_FABRICS | NVME_F_METADATA_SUPPORTED,
 	.reg_read32		= nvmf_reg_read32,
 	.reg_read64		= nvmf_reg_read64,
 	.reg_write32		= nvmf_reg_write32,
@@ -2078,7 +2343,7 @@ static struct nvme_ctrl *nvme_rdma_create_ctrl(struct device *dev,
 	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY |
 			  NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
 			  NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_NR_POLL_QUEUES |
-			  NVMF_OPT_TOS,
+			  NVMF_OPT_TOS | NVMF_OPT_PI_ENABLE,
 	.create_ctrl	= nvme_rdma_create_ctrl,
 };
 
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 08/15] nvmet: Prepare metadata request
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (7 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 07/15] nvme-rdma: Add metadata/T10-PI support Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-06 13:37 ` [PATCH 09/15] nvmet: Add metadata characteristics for a namespace Max Gurtovoy
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Allocate the metadata SGL buffers and add metadata fields for the
request. This is a preparation for adding metadata support over the
fabrics.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/core.c  | 48 ++++++++++++++++++++++++++++++++++++++-------
 drivers/nvme/target/nvmet.h |  5 +++++
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 28438b8..ab3ffc6 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -863,13 +863,17 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 	req->sq = sq;
 	req->ops = ops;
 	req->sg = NULL;
+	req->prot_sg = NULL;
 	req->sg_cnt = 0;
+	req->prot_sg_cnt = 0;
 	req->transfer_len = 0;
+	req->prot_len = 0;
 	req->cqe->status = 0;
 	req->cqe->sq_head = 0;
 	req->ns = NULL;
 	req->error_loc = NVMET_NO_ERROR_LOC;
 	req->error_slba = 0;
+	req->use_pi = false;
 
 	trace_nvmet_req_init(req, req->cmd);
 
@@ -941,6 +945,7 @@ bool nvmet_check_data_len(struct nvmet_req *req, size_t data_len)
 int nvmet_req_alloc_sgl(struct nvmet_req *req)
 {
 	struct pci_dev *p2p_dev = NULL;
+	int data_len = req->transfer_len - req->prot_len;
 
 	if (IS_ENABLED(CONFIG_PCI_P2PDMA)) {
 		if (req->sq->ctrl && req->ns)
@@ -950,11 +955,23 @@ int nvmet_req_alloc_sgl(struct nvmet_req *req)
 		req->p2p_dev = NULL;
 		if (req->sq->qid && p2p_dev) {
 			req->sg = pci_p2pmem_alloc_sgl(p2p_dev, &req->sg_cnt,
-						       req->transfer_len);
-			if (req->sg) {
-				req->p2p_dev = p2p_dev;
-				return 0;
+						       data_len);
+			if (!req->sg)
+				goto fallback;
+
+			if (req->prot_len) {
+				req->prot_sg =
+					pci_p2pmem_alloc_sgl(p2p_dev,
+							     &req->prot_sg_cnt,
+							     req->prot_len);
+				if (!req->prot_sg) {
+					pci_p2pmem_free_sgl(req->p2p_dev,
+							    req->sg);
+					goto fallback;
+				}
 			}
+			req->p2p_dev = p2p_dev;
+			return 0;
 		}
 
 		/*
@@ -963,23 +980,40 @@ int nvmet_req_alloc_sgl(struct nvmet_req *req)
 		 */
 	}
 
-	req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt);
+fallback:
+	req->sg = sgl_alloc(data_len, GFP_KERNEL, &req->sg_cnt);
 	if (unlikely(!req->sg))
 		return -ENOMEM;
 
+	if (req->prot_len) {
+		req->prot_sg = sgl_alloc(req->prot_len, GFP_KERNEL,
+					 &req->prot_sg_cnt);
+		if (unlikely(!req->prot_sg)) {
+			sgl_free(req->sg);
+			return -ENOMEM;
+		}
+	}
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(nvmet_req_alloc_sgl);
 
 void nvmet_req_free_sgl(struct nvmet_req *req)
 {
-	if (req->p2p_dev)
+	if (req->p2p_dev) {
 		pci_p2pmem_free_sgl(req->p2p_dev, req->sg);
-	else
+		if (req->prot_sg)
+			pci_p2pmem_free_sgl(req->p2p_dev, req->prot_sg);
+	} else {
 		sgl_free(req->sg);
+		if (req->prot_sg)
+			sgl_free(req->prot_sg);
+	}
 
 	req->sg = NULL;
+	req->prot_sg = NULL;
 	req->sg_cnt = 0;
+	req->prot_sg_cnt = 0;
 }
 EXPORT_SYMBOL_GPL(nvmet_req_free_sgl);
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 46df45e..60011f3 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -291,6 +291,7 @@ struct nvmet_req {
 	struct nvmet_cq		*cq;
 	struct nvmet_ns		*ns;
 	struct scatterlist	*sg;
+	struct scatterlist	*prot_sg;
 	struct bio_vec		inline_bvec[NVMET_MAX_INLINE_BIOVEC];
 	union {
 		struct {
@@ -304,8 +305,10 @@ struct nvmet_req {
 		} f;
 	};
 	int			sg_cnt;
+	int			prot_sg_cnt;
 	/* data length as parsed from the SGL descriptor: */
 	size_t			transfer_len;
+	size_t			prot_len;
 
 	struct nvmet_port	*port;
 
@@ -316,6 +319,8 @@ struct nvmet_req {
 	struct device		*p2p_client;
 	u16			error_loc;
 	u64			error_slba;
+	/* Metadata (T10-PI) support */
+	bool			use_pi;
 };
 
 extern struct workqueue_struct *buffered_io_wq;
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 09/15] nvmet: Add metadata characteristics for a namespace
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (8 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 08/15] nvmet: Prepare metadata request Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-09  3:16   ` Martin K. Petersen
  2020-01-06 13:37 ` [PATCH 10/15] nvmet: Rename nvmet_rw_len to nvmet_rw_data_len Max Gurtovoy
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Fill those namespace fields from the block device format for adding
metadata (T10-PI) over fabric support with block devices.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 22 ++++++++++++++++++++++
 drivers/nvme/target/nvmet.h       |  3 +++
 2 files changed, 25 insertions(+)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index b6fca0e..fb40022 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -50,6 +50,9 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
 int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 {
 	int ret;
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	struct blk_integrity *bi;
+#endif
 
 	ns->bdev = blkdev_get_by_path(ns->device_path,
 			FMODE_READ | FMODE_WRITE, NULL);
@@ -64,6 +67,25 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 	}
 	ns->size = i_size_read(ns->bdev->bd_inode);
 	ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
+
+	ns->prot_type = 0;
+	ns->ms = 0;
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	bi = bdev_get_integrity(ns->bdev);
+	if (bi) {
+		ns->ms = bi->tuple_size;
+		if (bi->profile == &t10_pi_type1_crc)
+			ns->prot_type = NVME_NS_DPS_PI_TYPE1;
+		else if (bi->profile == &t10_pi_type3_crc)
+			ns->prot_type = NVME_NS_DPS_PI_TYPE3;
+		else
+			/* Unsupported metadata type */
+			ns->ms = 0;
+	}
+
+	pr_debug("ms %d pi_type %d\n", ns->ms, ns->prot_type);
+#endif
+
 	return 0;
 }
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 60011f3..89e0174 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -19,6 +19,7 @@
 #include <linux/rcupdate.h>
 #include <linux/blkdev.h>
 #include <linux/radix-tree.h>
+#include <linux/t10-pi.h>
 
 #define NVMET_ASYNC_EVENTS		4
 #define NVMET_ERROR_LOG_SLOTS		128
@@ -76,6 +77,8 @@ struct nvmet_ns {
 
 	int			use_p2pmem;
 	struct pci_dev		*p2p_dev;
+	int			prot_type;
+	int			ms;
 };
 
 static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 10/15] nvmet: Rename nvmet_rw_len to nvmet_rw_data_len
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (9 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 09/15] nvmet: Add metadata characteristics for a namespace Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-09  3:17   ` Martin K. Petersen
  2020-01-06 13:37 ` [PATCH 11/15] nvmet: Rename nvmet_check_data_len to nvmet_check_transfer_len Max Gurtovoy
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

The function doesn't add the metadata length (only data length is
calculated). This is preparation for adding metadata (T10-PI) support.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/target/io-cmd-bdev.c | 2 +-
 drivers/nvme/target/io-cmd-file.c | 2 +-
 drivers/nvme/target/nvmet.h       | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index fb40022..190cdc8 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -173,7 +173,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	sector_t sector;
 	int op, i;
 
-	if (!nvmet_check_data_len(req, nvmet_rw_len(req)))
+	if (!nvmet_check_data_len(req, nvmet_rw_data_len(req)))
 		return;
 
 	if (!req->sg_cnt) {
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index caebfce..e0a4859 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -232,7 +232,7 @@ static void nvmet_file_execute_rw(struct nvmet_req *req)
 {
 	ssize_t nr_bvec = req->sg_cnt;
 
-	if (!nvmet_check_data_len(req, nvmet_rw_len(req)))
+	if (!nvmet_check_data_len(req, nvmet_rw_data_len(req)))
 		return;
 
 	if (!req->sg_cnt || !nr_bvec) {
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 89e0174..8e6f11f 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -495,7 +495,7 @@ void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
 u16 nvmet_file_flush(struct nvmet_req *req);
 void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid);
 
-static inline u32 nvmet_rw_len(struct nvmet_req *req)
+static inline u32 nvmet_rw_data_len(struct nvmet_req *req)
 {
 	return ((u32)le16_to_cpu(req->cmd->rw.length) + 1) <<
 			req->ns->blksize_shift;
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 11/15] nvmet: Rename nvmet_check_data_len to nvmet_check_transfer_len
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (10 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 10/15] nvmet: Rename nvmet_rw_len to nvmet_rw_data_len Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-09  3:19   ` Martin K. Petersen
  2020-01-06 13:37 ` [PATCH 12/15] nvme: Add Metadata Capabilities enumerations Max Gurtovoy
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

The function doesn't checks only the data length, because the transfer
length includes also the metadata length in some cases. This is
preparation for adding metadata (T10-PI) support.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/target/admin-cmd.c   | 14 +++++++-------
 drivers/nvme/target/core.c        |  6 +++---
 drivers/nvme/target/discovery.c   |  8 ++++----
 drivers/nvme/target/fabrics-cmd.c |  8 ++++----
 drivers/nvme/target/io-cmd-bdev.c |  8 ++++----
 drivers/nvme/target/io-cmd-file.c |  8 ++++----
 drivers/nvme/target/nvmet.h       |  2 +-
 7 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 56c21b50..19ae155 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -284,7 +284,7 @@ static void nvmet_execute_get_log_page_ana(struct nvmet_req *req)
 
 static void nvmet_execute_get_log_page(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, nvmet_get_log_page_len(req->cmd)))
+	if (!nvmet_check_transfer_len(req, nvmet_get_log_page_len(req->cmd)))
 		return;
 
 	switch (req->cmd->get_log_page.lid) {
@@ -597,7 +597,7 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 
 static void nvmet_execute_identify(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, NVME_IDENTIFY_DATA_SIZE))
+	if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
 		return;
 
 	switch (req->cmd->identify.cns) {
@@ -626,7 +626,7 @@ static void nvmet_execute_identify(struct nvmet_req *req)
  */
 static void nvmet_execute_abort(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 	nvmet_set_result(req, 1);
 	nvmet_req_complete(req, 0);
@@ -712,7 +712,7 @@ static void nvmet_execute_set_features(struct nvmet_req *req)
 	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	switch (cdw10 & 0xff) {
@@ -778,7 +778,7 @@ static void nvmet_execute_get_features(struct nvmet_req *req)
 	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	switch (cdw10 & 0xff) {
@@ -845,7 +845,7 @@ void nvmet_execute_async_event(struct nvmet_req *req)
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	mutex_lock(&ctrl->lock);
@@ -864,7 +864,7 @@ void nvmet_execute_keep_alive(struct nvmet_req *req)
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	pr_debug("ctrl %d update keep-alive timer for %d secs\n",
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index ab3ffc6..092c06f 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -930,9 +930,9 @@ void nvmet_req_uninit(struct nvmet_req *req)
 }
 EXPORT_SYMBOL_GPL(nvmet_req_uninit);
 
-bool nvmet_check_data_len(struct nvmet_req *req, size_t data_len)
+bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len)
 {
-	if (unlikely(data_len != req->transfer_len)) {
+	if (unlikely(len != req->transfer_len)) {
 		req->error_loc = offsetof(struct nvme_common_command, dptr);
 		nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR);
 		return false;
@@ -940,7 +940,7 @@ bool nvmet_check_data_len(struct nvmet_req *req, size_t data_len)
 
 	return true;
 }
-EXPORT_SYMBOL_GPL(nvmet_check_data_len);
+EXPORT_SYMBOL_GPL(nvmet_check_transfer_len);
 
 int nvmet_req_alloc_sgl(struct nvmet_req *req)
 {
diff --git a/drivers/nvme/target/discovery.c b/drivers/nvme/target/discovery.c
index 0c2274b..40cf0b6 100644
--- a/drivers/nvme/target/discovery.c
+++ b/drivers/nvme/target/discovery.c
@@ -171,7 +171,7 @@ static void nvmet_execute_disc_get_log_page(struct nvmet_req *req)
 	u16 status = 0;
 	void *buffer;
 
-	if (!nvmet_check_data_len(req, data_len))
+	if (!nvmet_check_transfer_len(req, data_len))
 		return;
 
 	if (req->cmd->get_log_page.lid != NVME_LOG_DISC) {
@@ -244,7 +244,7 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
 	const char model[] = "Linux";
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, NVME_IDENTIFY_DATA_SIZE))
+	if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
 		return;
 
 	if (req->cmd->identify.cns != NVME_ID_CNS_CTRL) {
@@ -298,7 +298,7 @@ static void nvmet_execute_disc_set_features(struct nvmet_req *req)
 	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
 	u16 stat;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	switch (cdw10 & 0xff) {
@@ -324,7 +324,7 @@ static void nvmet_execute_disc_get_features(struct nvmet_req *req)
 	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
 	u16 stat = 0;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	switch (cdw10 & 0xff) {
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index f729747..ee41b2b 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -12,7 +12,7 @@ static void nvmet_execute_prop_set(struct nvmet_req *req)
 	u64 val = le64_to_cpu(req->cmd->prop_set.value);
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	if (req->cmd->prop_set.attrib & 1) {
@@ -41,7 +41,7 @@ static void nvmet_execute_prop_get(struct nvmet_req *req)
 	u16 status = 0;
 	u64 val = 0;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	if (req->cmd->prop_get.attrib & 1) {
@@ -151,7 +151,7 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 	struct nvmet_ctrl *ctrl = NULL;
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, sizeof(struct nvmf_connect_data)))
+	if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
 		return;
 
 	d = kmalloc(sizeof(*d), GFP_KERNEL);
@@ -218,7 +218,7 @@ static void nvmet_execute_io_connect(struct nvmet_req *req)
 	u16 qid = le16_to_cpu(c->qid);
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, sizeof(struct nvmf_connect_data)))
+	if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
 		return;
 
 	d = kmalloc(sizeof(*d), GFP_KERNEL);
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 190cdc8..9f99b9d 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -173,7 +173,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	sector_t sector;
 	int op, i;
 
-	if (!nvmet_check_data_len(req, nvmet_rw_data_len(req)))
+	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
 		return;
 
 	if (!req->sg_cnt) {
@@ -234,7 +234,7 @@ static void nvmet_bdev_execute_flush(struct nvmet_req *req)
 {
 	struct bio *bio = &req->b.inline_bio;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
@@ -302,7 +302,7 @@ static void nvmet_bdev_execute_discard(struct nvmet_req *req)
 
 static void nvmet_bdev_execute_dsm(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, nvmet_dsm_len(req)))
+	if (!nvmet_check_transfer_len(req, nvmet_dsm_len(req)))
 		return;
 
 	switch (le32_to_cpu(req->cmd->dsm.attributes)) {
@@ -326,7 +326,7 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req)
 	sector_t nr_sector;
 	int ret;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	sector = le64_to_cpu(write_zeroes->slba) <<
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index e0a4859..1492f55 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -232,7 +232,7 @@ static void nvmet_file_execute_rw(struct nvmet_req *req)
 {
 	ssize_t nr_bvec = req->sg_cnt;
 
-	if (!nvmet_check_data_len(req, nvmet_rw_data_len(req)))
+	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
 		return;
 
 	if (!req->sg_cnt || !nr_bvec) {
@@ -276,7 +276,7 @@ static void nvmet_file_flush_work(struct work_struct *w)
 
 static void nvmet_file_execute_flush(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 	INIT_WORK(&req->f.work, nvmet_file_flush_work);
 	schedule_work(&req->f.work);
@@ -336,7 +336,7 @@ static void nvmet_file_dsm_work(struct work_struct *w)
 
 static void nvmet_file_execute_dsm(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, nvmet_dsm_len(req)))
+	if (!nvmet_check_transfer_len(req, nvmet_dsm_len(req)))
 		return;
 	INIT_WORK(&req->f.work, nvmet_file_dsm_work);
 	schedule_work(&req->f.work);
@@ -366,7 +366,7 @@ static void nvmet_file_write_zeroes_work(struct work_struct *w)
 
 static void nvmet_file_execute_write_zeroes(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 	INIT_WORK(&req->f.work, nvmet_file_write_zeroes_work);
 	schedule_work(&req->f.work);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 8e6f11f..c8b7095 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -381,7 +381,7 @@ static inline bool nvmet_aen_bit_disabled(struct nvmet_ctrl *ctrl, u32 bn)
 bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 		struct nvmet_sq *sq, const struct nvmet_fabrics_ops *ops);
 void nvmet_req_uninit(struct nvmet_req *req);
-bool nvmet_check_data_len(struct nvmet_req *req, size_t data_len);
+bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len);
 void nvmet_req_complete(struct nvmet_req *req, u16 status);
 int nvmet_req_alloc_sgl(struct nvmet_req *req);
 void nvmet_req_free_sgl(struct nvmet_req *req);
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 12/15] nvme: Add Metadata Capabilities enumerations
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (11 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 11/15] nvmet: Rename nvmet_check_data_len to nvmet_check_transfer_len Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-06 13:37 ` [PATCH 13/15] nvmet: Add metadata/T10-PI support Max Gurtovoy
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

The enumerations will be used to expose the namespace metadata format by
the target.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 include/linux/nvme.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 3d5189f..4e968d3 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -396,6 +396,8 @@ enum {
 	NVME_NS_FEAT_THIN	= 1 << 0,
 	NVME_NS_FLBAS_LBA_MASK	= 0xf,
 	NVME_NS_FLBAS_META_EXT	= 0x10,
+	NVME_NS_MC_META_EXT	= 1 << 0,
+	NVME_NS_MC_META_BUF	= 1 << 1,
 	NVME_LBAF_RP_BEST	= 0,
 	NVME_LBAF_RP_BETTER	= 1,
 	NVME_LBAF_RP_GOOD	= 2,
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 13/15] nvmet: Add metadata/T10-PI support
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (12 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 12/15] nvme: Add Metadata Capabilities enumerations Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-09  3:24   ` Martin K. Petersen
  2020-01-17 16:46   ` James Smart
  2020-01-06 13:37 ` [PATCH 14/15] nvmet: Add metadata support for block devices Max Gurtovoy
  2020-01-06 13:37 ` [PATCH 15/15] nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
  15 siblings, 2 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Expose the namespace metadata format when PI is enabled. The user needs
to enable the capability per subsystem and per port. The other metadata
properties are taken from the namespace/bdev.

Usage example:
echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/admin-cmd.c   | 23 ++++++++++++---
 drivers/nvme/target/configfs.c    | 61 +++++++++++++++++++++++++++++++++++++++
 drivers/nvme/target/fabrics-cmd.c | 11 +++++++
 drivers/nvme/target/nvmet.h       | 28 ++++++++++++++++++
 4 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 19ae155..20e7f08 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -346,8 +346,8 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	/* we support multiple ports, multiples hosts and ANA: */
 	id->cmic = (1 << 0) | (1 << 1) | (1 << 3);
 
-	/* no limit on data transfer sizes for now */
-	id->mdts = 0;
+	/* Limit MDTS for metadata capable controllers (assume mpsmin is 4k) */
+	id->mdts = ctrl->pi_support ? ilog2(NVMET_T10_PI_MAX_KB_SZ >> 2) : 0;
 	id->cntlid = cpu_to_le16(ctrl->cntlid);
 	id->ver = cpu_to_le32(ctrl->subsys->ver);
 
@@ -405,9 +405,13 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 	strlcpy(id->subnqn, ctrl->subsys->subsysnqn, sizeof(id->subnqn));
 
-	/* Max command capsule size is sqe + single page of in-capsule data */
+	/*
+	 * Max command capsule size is sqe + single page of in-capsule data.
+	 * Disable inline data for Metadata capable controllers.
+	 */
 	id->ioccsz = cpu_to_le32((sizeof(struct nvme_command) +
-				  req->port->inline_data_size) / 16);
+				  req->port->inline_data_size *
+				  !ctrl->pi_support) / 16);
 	/* Max response capsule size is cqe */
 	id->iorcsz = cpu_to_le32(sizeof(struct nvme_completion) / 16);
 
@@ -437,6 +441,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 static void nvmet_execute_identify_ns(struct nvmet_req *req)
 {
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvmet_ns *ns;
 	struct nvme_id_ns *id;
 	u16 status = 0;
@@ -493,6 +498,16 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 
 	id->lbaf[0].ds = ns->blksize_shift;
 
+	if (ctrl->pi_support && nvmet_ns_has_pi(ns)) {
+		id->dpc = NVME_NS_DPC_PI_FIRST | NVME_NS_DPC_PI_LAST |
+			  NVME_NS_DPC_PI_TYPE1 | NVME_NS_DPC_PI_TYPE2 |
+			  NVME_NS_DPC_PI_TYPE3;
+		id->mc = NVME_NS_MC_META_EXT;
+		id->dps = ns->prot_type;
+		id->flbas = NVME_NS_FLBAS_META_EXT;
+		id->lbaf[0].ms = ns->ms;
+	}
+
 	if (ns->readonly)
 		id->nsattr |= (1 << 0);
 	nvmet_put_namespace(ns);
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 98613a4..7543025 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -248,6 +248,36 @@ static ssize_t nvmet_param_inline_data_size_store(struct config_item *item,
 
 CONFIGFS_ATTR(nvmet_, param_inline_data_size);
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static ssize_t nvmet_param_pi_enable_show(struct config_item *item,
+		char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+
+	return snprintf(page, PAGE_SIZE, "%d\n", port->pi_enable);
+}
+
+static ssize_t nvmet_param_pi_enable_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	bool val;
+
+	if (strtobool(page, &val))
+		return -EINVAL;
+
+	if (port->enabled) {
+		pr_err("Disable port before setting pi_enable value.\n");
+		return -EACCES;
+	}
+
+	port->pi_enable = val;
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_, param_pi_enable);
+#endif
+
 static ssize_t nvmet_addr_trtype_show(struct config_item *item,
 		char *page)
 {
@@ -862,10 +892,38 @@ static ssize_t nvmet_subsys_attr_serial_store(struct config_item *item,
 }
 CONFIGFS_ATTR(nvmet_subsys_, attr_serial);
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static ssize_t nvmet_subsys_attr_pi_enable_show(struct config_item *item,
+						char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%d\n", to_subsys(item)->pi_support);
+}
+
+static ssize_t nvmet_subsys_attr_pi_enable_store(struct config_item *item,
+						 const char *page, size_t count)
+{
+	struct nvmet_subsys *subsys = to_subsys(item);
+	bool pi_enable;
+
+	if (strtobool(page, &pi_enable))
+		return -EINVAL;
+
+	down_write(&nvmet_config_sem);
+	subsys->pi_support = pi_enable;
+	up_write(&nvmet_config_sem);
+
+	return count;
+}
+CONFIGFS_ATTR(nvmet_subsys_, attr_pi_enable);
+#endif
+
 static struct configfs_attribute *nvmet_subsys_attrs[] = {
 	&nvmet_subsys_attr_attr_allow_any_host,
 	&nvmet_subsys_attr_attr_version,
 	&nvmet_subsys_attr_attr_serial,
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	&nvmet_subsys_attr_attr_pi_enable,
+#endif
 	NULL,
 };
 
@@ -1161,6 +1219,9 @@ static void nvmet_port_release(struct config_item *item)
 	&nvmet_attr_addr_trsvcid,
 	&nvmet_attr_addr_trtype,
 	&nvmet_attr_param_inline_data_size,
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	&nvmet_attr_param_pi_enable,
+#endif
 	NULL,
 };
 
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index ee41b2b..7211996 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -192,6 +192,17 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 		goto out;
 	}
 
+	if (ctrl->subsys->pi_support && ctrl->port->pi_enable) {
+		if (ctrl->port->pi_capable) {
+			ctrl->pi_support = true;
+			pr_info("controller %d T10-PI enabled\n", ctrl->cntlid);
+		} else {
+			ctrl->pi_support = false;
+			pr_warn("T10-PI is not supported on controller %d\n",
+				ctrl->cntlid);
+		}
+	}
+
 	uuid_copy(&ctrl->hostid, &d->hostid);
 
 	status = nvmet_install_queue(ctrl, req);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index c8b7095..5ee01e0 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -144,6 +144,8 @@ struct nvmet_port {
 	bool				enabled;
 	int				inline_data_size;
 	const struct nvmet_fabrics_ops	*tr_ops;
+	bool				pi_capable;
+	bool				pi_enable;
 };
 
 static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
@@ -203,6 +205,7 @@ struct nvmet_ctrl {
 	spinlock_t		error_lock;
 	u64			err_counter;
 	struct nvme_error_slot	slots[NVMET_ERROR_LOG_SLOTS];
+	bool			pi_support;
 };
 
 struct nvmet_subsys {
@@ -225,6 +228,7 @@ struct nvmet_subsys {
 	u64			ver;
 	u64			serial;
 	char			*subsysnqn;
+	bool			pi_support;
 
 	struct config_group	group;
 
@@ -501,6 +505,30 @@ static inline u32 nvmet_rw_data_len(struct nvmet_req *req)
 			req->ns->blksize_shift;
 }
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static inline u32 nvmet_rw_prot_len(struct nvmet_req *req)
+{
+	return ((u32)le16_to_cpu(req->cmd->rw.length) + 1) * req->ns->ms;
+}
+
+static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
+{
+	return ns->prot_type && ns->ms == sizeof(struct t10_pi_tuple);
+}
+#else
+static inline u32 nvmet_rw_prot_len(struct nvmet_req *req)
+{
+	return 0;
+}
+
+static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
+{
+	return false;
+}
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
+#define NVMET_T10_PI_MAX_KB_SZ 128
+
 static inline u32 nvmet_dsm_len(struct nvmet_req *req)
 {
 	return (le32_to_cpu(req->cmd->dsm.nr) + 1) *
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 14/15] nvmet: Add metadata support for block devices
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (13 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 13/15] nvmet: Add metadata/T10-PI support Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-06 13:37 ` [PATCH 15/15] nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
  15 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Create a block IO request for the metadata from the protection SG list.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 87 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 85 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 9f99b9d..1d84e71 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -164,6 +164,61 @@ static void nvmet_bio_done(struct bio *bio)
 		bio_put(bio);
 }
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static int nvmet_bdev_alloc_bip(struct nvmet_req *req, struct bio *bio,
+				struct sg_mapping_iter *miter)
+{
+	struct blk_integrity *bi;
+	struct bio_integrity_payload *bip;
+	struct block_device *bdev = req->ns->bdev;
+	int rc;
+	size_t resid, len;
+
+	bi = bdev_get_integrity(bdev);
+	if (unlikely(!bi)) {
+		pr_err("Unable to locate bio_integrity\n");
+		return -ENODEV;
+	}
+
+	bip = bio_integrity_alloc(bio, GFP_NOIO,
+			min_t(unsigned int, req->prot_sg_cnt, BIO_MAX_PAGES));
+	if (IS_ERR(bip)) {
+		pr_err("Unable to allocate bio_integrity_payload\n");
+		return PTR_ERR(bip);
+	}
+
+	bip->bip_iter.bi_size = bio_integrity_bytes(bi, bio_sectors(bio));
+	/* virtual start sector must be in integrity interval units */
+	bip_set_seed(bip, bio->bi_iter.bi_sector >>
+				  (bi->interval_exp - SECTOR_SHIFT));
+
+	resid = bip->bip_iter.bi_size;
+	while (resid > 0 && sg_miter_next(miter)) {
+		len = min_t(size_t, miter->length, resid);
+		rc = bio_integrity_add_page(bio, miter->page, len,
+					    offset_in_page(miter->addr));
+		if (unlikely(rc != len)) {
+			pr_err("bio_integrity_add_page() failed; %d\n", rc);
+			sg_miter_stop(miter);
+			return -ENOMEM;
+		}
+
+		resid -= len;
+		if (len < miter->length)
+			miter->consumed -= miter->length - len;
+	}
+	sg_miter_stop(miter);
+
+	return 0;
+}
+#else
+static int nvmet_bdev_alloc_bip(struct nvmet_req *req, struct bio *bio,
+				struct sg_mapping_iter *miter)
+{
+	return 0;
+}
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
 static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 {
 	int sg_cnt = req->sg_cnt;
@@ -171,9 +226,11 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	struct scatterlist *sg;
 	struct blk_plug plug;
 	sector_t sector;
-	int op, i;
+	int op, i, rc;
+	struct sg_mapping_iter prot_miter;
 
-	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+	if (!nvmet_check_transfer_len(req,
+				      nvmet_rw_data_len(req) + req->prot_len))
 		return;
 
 	if (!req->sg_cnt) {
@@ -208,11 +265,25 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	bio->bi_opf = op;
 
 	blk_start_plug(&plug);
+	if (req->use_pi)
+		sg_miter_start(&prot_miter, req->prot_sg, req->prot_sg_cnt,
+			       op == REQ_OP_READ ? SG_MITER_FROM_SG :
+						   SG_MITER_TO_SG);
+
 	for_each_sg(req->sg, sg, req->sg_cnt, i) {
 		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
 				!= sg->length) {
 			struct bio *prev = bio;
 
+			if (req->use_pi) {
+				rc = nvmet_bdev_alloc_bip(req, bio,
+							  &prot_miter);
+				if (unlikely(rc)) {
+					bio_io_error(bio);
+					return;
+				}
+			}
+
 			bio = bio_alloc(GFP_KERNEL, min(sg_cnt, BIO_MAX_PAGES));
 			bio_set_dev(bio, req->ns->bdev);
 			bio->bi_iter.bi_sector = sector;
@@ -226,6 +297,14 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 		sg_cnt--;
 	}
 
+	if (req->use_pi) {
+		rc = nvmet_bdev_alloc_bip(req, bio, &prot_miter);
+		if (unlikely(rc)) {
+			bio_io_error(bio);
+			return;
+		}
+	}
+
 	submit_bio(bio);
 	blk_finish_plug(&plug);
 }
@@ -353,6 +432,10 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_read:
 	case nvme_cmd_write:
 		req->execute = nvmet_bdev_execute_rw;
+		if (req->sq->ctrl->pi_support && nvmet_ns_has_pi(req->ns)) {
+			req->use_pi = true;
+			req->prot_len = nvmet_rw_prot_len(req);
+		}
 		return 0;
 	case nvme_cmd_flush:
 		req->execute = nvmet_bdev_execute_flush;
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 15/15] nvmet-rdma: Add metadata/T10-PI support
  2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (14 preceding siblings ...)
  2020-01-06 13:37 ` [PATCH 14/15] nvmet: Add metadata support for block devices Max Gurtovoy
@ 2020-01-06 13:37 ` Max Gurtovoy
  2020-01-09  3:29   ` Martin K. Petersen
  15 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-06 13:37 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

For capable HCAs (e.g. ConnectX-4/ConnectX-5) this will allow end-to-end
protection information passthrough and validation for NVMe over RDMA
transport. Metadata support was implemented over the new RDMA signature
verbs API.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/rdma.c | 235 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 217 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 37d262a..bdd3299 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -54,6 +54,7 @@ struct nvmet_rdma_rsp {
 	struct nvmet_rdma_queue	*queue;
 
 	struct ib_cqe		read_cqe;
+	struct ib_cqe		write_cqe;
 	struct rdma_rw_ctx	rw;
 
 	struct nvmet_req	req;
@@ -129,6 +130,9 @@ struct nvmet_rdma_device {
 static void nvmet_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc);
 static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc);
 static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc);
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static void nvmet_rdma_write_data_done(struct ib_cq *cq, struct ib_wc *wc);
+#endif
 static void nvmet_rdma_qp_event(struct ib_event *event, void *priv);
 static void nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue);
 static void nvmet_rdma_free_rsp(struct nvmet_rdma_device *ndev,
@@ -386,6 +390,10 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
 
 	/* Data In / RDMA READ */
 	r->read_cqe.done = nvmet_rdma_read_data_done;
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	/* Data Out / RDMA WRITE */
+	r->write_cqe.done = nvmet_rdma_write_data_done;
+#endif
 	return 0;
 
 out_free_rsp:
@@ -495,6 +503,138 @@ static void nvmet_rdma_process_wr_wait_list(struct nvmet_rdma_queue *queue)
 	spin_unlock(&queue->rsp_wr_wait_lock);
 }
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static u16 nvmet_rdma_check_pi_status(struct ib_mr *sig_mr)
+{
+	struct ib_mr_status mr_status;
+	int ret;
+	u16 status = 0;
+
+	ret = ib_check_mr_status(sig_mr, IB_MR_CHECK_SIG_STATUS, &mr_status);
+	if (ret) {
+		pr_err("ib_check_mr_status failed, ret %d\n", ret);
+		return NVME_SC_INVALID_PI;
+	}
+
+	if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS) {
+		switch (mr_status.sig_err.err_type) {
+		case IB_SIG_BAD_GUARD:
+			status = NVME_SC_GUARD_CHECK;
+			break;
+		case IB_SIG_BAD_REFTAG:
+			status = NVME_SC_REFTAG_CHECK;
+			break;
+		case IB_SIG_BAD_APPTAG:
+			status = NVME_SC_APPTAG_CHECK;
+			break;
+		}
+		pr_err("PI error found type %d expected 0x%x vs actual 0x%x\n",
+		       mr_status.sig_err.err_type,
+		       mr_status.sig_err.expected,
+		       mr_status.sig_err.actual);
+	}
+
+	return status;
+}
+
+static void nvmet_rdma_set_diff_domain(struct nvme_command *cmd,
+		struct ib_sig_domain *domain, struct nvmet_ns *ns)
+{
+	struct blk_integrity *bi = bdev_get_integrity(ns->bdev);
+
+	WARN_ON(bi == NULL);
+
+	domain->sig_type = IB_SIG_TYPE_T10_DIF;
+	domain->sig.dif.bg_type = IB_T10DIF_CRC;
+	domain->sig.dif.pi_interval = 1 << bi->interval_exp;
+	domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
+
+	/*
+	 * At the moment we hard code those, but in the future
+	 * we will take them from cmd.
+	 */
+	domain->sig.dif.apptag_check_mask = 0xffff;
+	domain->sig.dif.app_escape = true;
+	domain->sig.dif.ref_escape = true;
+	if (ns->prot_type != NVME_NS_DPS_PI_TYPE3)
+		domain->sig.dif.ref_remap = true;
+}
+
+static void nvmet_rdma_set_sig_attrs(struct nvmet_req *req,
+				     struct ib_sig_attrs *sig_attrs)
+{
+	struct nvme_command *cmd = req->cmd;
+	u16 control = le16_to_cpu(cmd->rw.control);
+
+	memset(sig_attrs, 0, sizeof(*sig_attrs));
+
+	if (control & NVME_RW_PRINFO_PRACT) {
+		/* for WRITE_INSERT/READ_STRIP no wire domain */
+		sig_attrs->wire.sig_type = IB_SIG_TYPE_NONE;
+		nvmet_rdma_set_diff_domain(cmd, &sig_attrs->mem, req->ns);
+		/* Clear the PRACT bit since HCA will generate/verify the PI */
+		control &= ~NVME_RW_PRINFO_PRACT;
+		cmd->rw.control = cpu_to_le16(control);
+		/* PI is added by the HW */
+		req->transfer_len += req->prot_len;
+	} else {
+		/* for WRITE_PASS/READ_PASS both wire/memory domains exist */
+		nvmet_rdma_set_diff_domain(cmd, &sig_attrs->wire, req->ns);
+		nvmet_rdma_set_diff_domain(cmd, &sig_attrs->mem, req->ns);
+	}
+
+	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_REF)
+		sig_attrs->check_mask |= IB_SIG_CHECK_REFTAG;
+	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_GUARD)
+		sig_attrs->check_mask |= IB_SIG_CHECK_GUARD;
+	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_APP)
+		sig_attrs->check_mask |= IB_SIG_CHECK_APPTAG;
+}
+#else
+static u16 nvmet_rdma_check_pi_status(struct ib_mr *sig_mr)
+{
+	return 0;
+}
+
+static void nvmet_rdma_set_sig_attrs(struct nvmet_req *req,
+				     struct ib_sig_attrs *sig_attrs)
+{
+}
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
+static int nvmet_rdma_rw_ctx_init(struct nvmet_rdma_rsp *rsp, u64 addr, u32 key,
+				  struct ib_sig_attrs *sig_attrs)
+{
+	struct rdma_cm_id *cm_id = rsp->queue->cm_id;
+	struct nvmet_req *req = &rsp->req;
+	int ret;
+
+	if (req->use_pi)
+		ret = rdma_rw_ctx_signature_init(&rsp->rw, cm_id->qp,
+			cm_id->port_num, req->sg, req->sg_cnt, req->prot_sg,
+			req->prot_sg_cnt, sig_attrs, addr, key,
+			nvmet_data_dir(req));
+	else
+		ret = rdma_rw_ctx_init(&rsp->rw, cm_id->qp, cm_id->port_num,
+				       req->sg, req->sg_cnt, 0, addr, key,
+				       nvmet_data_dir(req));
+
+	return ret;
+}
+
+static void nvmet_rdma_rw_ctx_destroy(struct nvmet_rdma_rsp *rsp)
+{
+	struct rdma_cm_id *cm_id = rsp->queue->cm_id;
+	struct nvmet_req *req = &rsp->req;
+
+	if (req->use_pi)
+		rdma_rw_ctx_destroy_signature(&rsp->rw, cm_id->qp,
+			cm_id->port_num, req->sg, req->sg_cnt, req->prot_sg,
+			req->prot_sg_cnt, nvmet_data_dir(req));
+	else
+		rdma_rw_ctx_destroy(&rsp->rw, cm_id->qp, cm_id->port_num,
+				    req->sg, req->sg_cnt, nvmet_data_dir(req));
+}
 
 static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
 {
@@ -502,11 +642,8 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
 
 	atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
 
-	if (rsp->n_rdma) {
-		rdma_rw_ctx_destroy(&rsp->rw, queue->cm_id->qp,
-				queue->cm_id->port_num, rsp->req.sg,
-				rsp->req.sg_cnt, nvmet_data_dir(&rsp->req));
-	}
+	if (rsp->n_rdma)
+		nvmet_rdma_rw_ctx_destroy(rsp);
 
 	if (rsp->req.sg != rsp->cmd->inline_sg)
 		nvmet_req_free_sgl(&rsp->req);
@@ -561,11 +698,16 @@ static void nvmet_rdma_queue_response(struct nvmet_req *req)
 		rsp->send_wr.opcode = IB_WR_SEND;
 	}
 
-	if (nvmet_rdma_need_data_out(rsp))
-		first_wr = rdma_rw_ctx_wrs(&rsp->rw, cm_id->qp,
-				cm_id->port_num, NULL, &rsp->send_wr);
-	else
+	if (nvmet_rdma_need_data_out(rsp)) {
+		if (rsp->req.use_pi)
+			first_wr = rdma_rw_ctx_wrs(&rsp->rw, cm_id->qp,
+					cm_id->port_num, &rsp->write_cqe, NULL);
+		else
+			first_wr = rdma_rw_ctx_wrs(&rsp->rw, cm_id->qp,
+					cm_id->port_num, NULL, &rsp->send_wr);
+	} else {
 		first_wr = &rsp->send_wr;
+	}
 
 	nvmet_rdma_post_recv(rsp->queue->dev, rsp->cmd);
 
@@ -584,15 +726,14 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc)
 	struct nvmet_rdma_rsp *rsp =
 		container_of(wc->wr_cqe, struct nvmet_rdma_rsp, read_cqe);
 	struct nvmet_rdma_queue *queue = cq->cq_context;
+	u16 status = 0;
 
 	WARN_ON(rsp->n_rdma <= 0);
 	atomic_add(rsp->n_rdma, &queue->sq_wr_avail);
-	rdma_rw_ctx_destroy(&rsp->rw, queue->cm_id->qp,
-			queue->cm_id->port_num, rsp->req.sg,
-			rsp->req.sg_cnt, nvmet_data_dir(&rsp->req));
 	rsp->n_rdma = 0;
 
 	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		nvmet_rdma_rw_ctx_destroy(rsp);
 		nvmet_req_uninit(&rsp->req);
 		nvmet_rdma_release_rsp(rsp);
 		if (wc->status != IB_WC_WR_FLUSH_ERR) {
@@ -603,9 +744,59 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc)
 		return;
 	}
 
-	rsp->req.execute(&rsp->req);
+	if (rsp->req.use_pi)
+		status = nvmet_rdma_check_pi_status(rsp->rw.reg->mr);
+	nvmet_rdma_rw_ctx_destroy(rsp);
+
+	if (unlikely(status))
+		nvmet_req_complete(&rsp->req, status);
+	else
+		rsp->req.execute(&rsp->req);
 }
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static void nvmet_rdma_write_data_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+	struct nvmet_rdma_rsp *rsp =
+		container_of(wc->wr_cqe, struct nvmet_rdma_rsp, write_cqe);
+	struct nvmet_rdma_queue *queue = cq->cq_context;
+	struct rdma_cm_id *cm_id = rsp->queue->cm_id;
+	u16 status;
+
+	WARN_ON(rsp->n_rdma <= 0);
+	atomic_add(rsp->n_rdma, &queue->sq_wr_avail);
+	rsp->n_rdma = 0;
+
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		nvmet_rdma_rw_ctx_destroy(rsp);
+		nvmet_req_uninit(&rsp->req);
+		nvmet_rdma_release_rsp(rsp);
+		if (wc->status != IB_WC_WR_FLUSH_ERR) {
+			pr_info("RDMA WRITE for CQE 0x%p failed with status %s (%d).\n",
+				wc->wr_cqe, ib_wc_status_msg(wc->status),
+				wc->status);
+			nvmet_rdma_error_comp(queue);
+		}
+		return;
+	}
+
+	/*
+	 * Upon RDMA completion check the signature status
+	 * - if succeeded send good NVMe response
+	 * - if failed send bad NVMe response with appropriate error
+	 */
+	status = nvmet_rdma_check_pi_status(rsp->rw.reg->mr);
+	if (unlikely(status))
+		rsp->req.cqe->status = cpu_to_le16(status << 1);
+	nvmet_rdma_rw_ctx_destroy(rsp);
+
+	if (unlikely(ib_post_send(cm_id->qp, &rsp->send_wr, NULL))) {
+		pr_err("sending cmd response failed\n");
+		nvmet_rdma_release_rsp(rsp);
+	}
+}
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
 static void nvmet_rdma_use_inline_sg(struct nvmet_rdma_rsp *rsp, u32 len,
 		u64 off)
 {
@@ -660,9 +851,9 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
 static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp,
 		struct nvme_keyed_sgl_desc *sgl, bool invalidate)
 {
-	struct rdma_cm_id *cm_id = rsp->queue->cm_id;
 	u64 addr = le64_to_cpu(sgl->addr);
 	u32 key = get_unaligned_le32(sgl->key);
+	struct ib_sig_attrs sig_attrs;
 	int ret;
 
 	rsp->req.transfer_len = get_unaligned_le24(sgl->length);
@@ -671,13 +862,14 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp,
 	if (!rsp->req.transfer_len)
 		return 0;
 
+	if (rsp->req.use_pi)
+		nvmet_rdma_set_sig_attrs(&rsp->req, &sig_attrs);
+
 	ret = nvmet_req_alloc_sgl(&rsp->req);
 	if (unlikely(ret < 0))
 		goto error_out;
 
-	ret = rdma_rw_ctx_init(&rsp->rw, cm_id->qp, cm_id->port_num,
-			rsp->req.sg, rsp->req.sg_cnt, 0, addr, key,
-			nvmet_data_dir(&rsp->req));
+	ret = nvmet_rdma_rw_ctx_init(rsp, addr, key, &sig_attrs);
 	if (unlikely(ret < 0))
 		goto error_out;
 	rsp->n_rdma += ret;
@@ -956,6 +1148,9 @@ static void nvmet_rdma_free_dev(struct kref *ref)
 			goto out_free_pd;
 	}
 
+	port->pi_capable = ndev->device->attrs.device_cap_flags &
+			IB_DEVICE_INTEGRITY_HANDOVER ? true : false;
+
 	list_add(&ndev->entry, &device_list);
 out_unlock:
 	mutex_unlock(&device_list_mutex);
@@ -1020,6 +1215,10 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 		qp_attr.cap.max_recv_sge = 1 + ndev->inline_page_count;
 	}
 
+	if (queue->port->pi_capable && queue->port->pi_enable &&
+	    queue->host_qid)
+		qp_attr.create_flags |= IB_QP_CREATE_INTEGRITY_EN;
+
 	ret = rdma_create_qp(queue->cm_id, ndev->pd, &qp_attr);
 	if (ret) {
 		pr_err("failed to create_qp ret= %d\n", ret);
@@ -1164,6 +1363,7 @@ static int nvmet_rdma_cm_reject(struct rdma_cm_id *cm_id,
 	INIT_WORK(&queue->release_work, nvmet_rdma_release_queue_work);
 	queue->dev = ndev;
 	queue->cm_id = cm_id;
+	queue->port = cm_id->context;
 
 	spin_lock_init(&queue->state_lock);
 	queue->state = NVMET_RDMA_Q_CONNECTING;
@@ -1282,7 +1482,6 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 		ret = -ENOMEM;
 		goto put_device;
 	}
-	queue->port = cm_id->context;
 
 	if (queue->host_qid == 0) {
 		/* Let inflight controller teardown complete */
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/15] nvme: Introduce namespace features flag
  2020-01-06 13:37 ` [PATCH 01/15] nvme: Introduce namespace features flag Max Gurtovoy
@ 2020-01-07 18:07   ` Keith Busch
  2020-01-08 12:00     ` Max Gurtovoy
  2020-01-09  3:11   ` Martin K. Petersen
  1 sibling, 1 reply; 49+ messages in thread
From: Keith Busch @ 2020-01-07 18:07 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, martin.petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, hch

On Mon, Jan 06, 2020 at 03:37:22PM +0200, Max Gurtovoy wrote:
> From: Israel Rukshin <israelr@mellanox.com>
> 
> Centralize all the integrity checks to one place and make the code more
> readable. Also add has_pi field to the nvme_req structure as well, so the
> transport drivers could use it.

I think the two changes should be in different patches, splitting the
namespace "features" flags from nvme_request "has_pi". I'm not even
sure there's a need for the per-IO settings and checks for "has_pi",
though. I don't find any transport driver changes in this series using
this flag outside this patch.
 
> @@ -1834,12 +1831,29 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
> +	if (ns->ms) {
> +		if (id->flbas & NVME_NS_FLBAS_META_EXT)
> +			ns->features |= NVME_NS_EXT_LBAS;
> +
> +		/*
> +		 * For PCI, Extended logical block will be generated by the
> +		 * controller.
> +		 */
> +		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
> +			if (!(ns->features & NVME_NS_EXT_LBAS))

A little simpler:

		if (id->flbas & NVME_NS_FLBAS_META_EXT)
			ns->features |= NVME_NS_EXT_LBAS;
		else if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
			ns->features |= NVME_NS_DIX_SUPPORTED;

> +				ns->features |= NVME_NS_DIX_SUPPORTED;
> +		}
> +	}

It's not really a "DIX" flag since we can observe separate metadata
formats for non-DIX related use. Can we use a more generic name for this
feature flag? maybe "NVME_NS_MS_SEPARATE".

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/15] nvme: Introduce namespace features flag
  2020-01-07 18:07   ` Keith Busch
@ 2020-01-08 12:00     ` Max Gurtovoy
  0 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-08 12:00 UTC (permalink / raw)
  To: Keith Busch
  Cc: axboe, sagi, martin.petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, hch


On 1/7/2020 8:07 PM, Keith Busch wrote:
> On Mon, Jan 06, 2020 at 03:37:22PM +0200, Max Gurtovoy wrote:
>> From: Israel Rukshin <israelr@mellanox.com>
>>
>> Centralize all the integrity checks to one place and make the code more
>> readable. Also add has_pi field to the nvme_req structure as well, so the
>> transport drivers could use it.
> I think the two changes should be in different patches, splitting the
> namespace "features" flags from nvme_request "has_pi". I'm not even
> sure there's a need for the per-IO settings and checks for "has_pi",
> though. I don't find any transport driver changes in this series using
> this flag outside this patch.

Sure, we can split it to 2 commits.

"has_pi" flag is used by RDMA transport (commit 7/15 "nvme-rdma: Add 
metadata/T10-PI support")

>   
>> @@ -1834,12 +1831,29 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>> +	if (ns->ms) {
>> +		if (id->flbas & NVME_NS_FLBAS_META_EXT)
>> +			ns->features |= NVME_NS_EXT_LBAS;
>> +
>> +		/*
>> +		 * For PCI, Extended logical block will be generated by the
>> +		 * controller.
>> +		 */
>> +		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>> +			if (!(ns->features & NVME_NS_EXT_LBAS))
> A little simpler:
>
> 		if (id->flbas & NVME_NS_FLBAS_META_EXT)
> 			ns->features |= NVME_NS_EXT_LBAS;
> 		else if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
> 			ns->features |= NVME_NS_DIX_SUPPORTED;

In fabrics case this will not be true...

>
>> +				ns->features |= NVME_NS_DIX_SUPPORTED;
>> +		}
>> +	}
> It's not really a "DIX" flag since we can observe separate metadata
> formats for non-DIX related use. Can we use a more generic name for this
> feature flag? maybe "NVME_NS_MS_SEPARATE".

The meaning of NVME_NS_DIX_SUPPORTED is that the block layer is 
*allowed* to send metadata to the namespace (and we later on will call 
nvme_init_integrity).

In case of PCI transport, this will be set only if the SSD use 
"separate" mode since Linux block layer only support "separate" mode and 
the nvme_pci driver pass these buffers to the drive (no conversion to 
extended mode if needed).

In case of fabrics transport, this will be set only if the user asked 
for metadata support in the connect command (pi_enable). Again, Linux 
block layer will send "separate" buffers and the transport driver will 
transform it to "extended" mode (this is a *must* since in fabrics, 
according to the spec, only extended mode supported).




_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/15] nvme: Introduce namespace features flag
  2020-01-06 13:37 ` [PATCH 01/15] nvme: Introduce namespace features flag Max Gurtovoy
  2020-01-07 18:07   ` Keith Busch
@ 2020-01-09  3:11   ` Martin K. Petersen
  2020-01-09 10:38     ` Max Gurtovoy
  1 sibling, 1 reply; 49+ messages in thread
From: Martin K. Petersen @ 2020-01-09  3:11 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, martin.petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, kbusch, hch


Max,

> +#define NVME_NS_DIX_SUPPORTED	(1 << 1)
> +#define NVME_NS_DIF_SUPPORTED	(1 << 2)

Not so keen on the DIF/DIX terminology in this context.

DIF as a term was obsoleted long ago in T10. The proper term is
Protection Information which you also use throughout these changes. We
have lots of DIF references in the kernel for hysterical raisins. But
let's not create new ones.

DIX is a spec I wrote to describe how controllers could exchange T10
Protection Information with a host. It's specific to SCSI and although
DIX is referenced in the NVMe spec, it would not be accurate to describe
the NVMe features using the term DIX.

So I'm in agreement with Keith that this should be named using more
descriptive terms such as NVME_NS_PI, NVME_NS_CONTIGUOUS_PI,
NVME_NS_SEPARATE_PI, or similar.

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/15] nvme: Introduce max_integrity_segments ctrl attribute
  2020-01-06 13:37 ` [PATCH 03/15] nvme: Introduce max_integrity_segments ctrl attribute Max Gurtovoy
@ 2020-01-09  3:12   ` Martin K. Petersen
  0 siblings, 0 replies; 49+ messages in thread
From: Martin K. Petersen @ 2020-01-09  3:12 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, martin.petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, kbusch, hch


Max,

> This patch doesn't change any logic, and is needed as a preparation
> for adding PI support for fabrics drivers that will use an extended
> LBA format for metadata.

Looks fine.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/15] nvme: Introduce NVME_INLINE_PROT_SG_CNT
  2020-01-06 13:37 ` [PATCH 05/15] nvme: Introduce NVME_INLINE_PROT_SG_CNT Max Gurtovoy
@ 2020-01-09  3:13   ` Martin K. Petersen
  0 siblings, 0 replies; 49+ messages in thread
From: Martin K. Petersen @ 2020-01-09  3:13 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, martin.petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, kbusch, hch


Max,

> SGL size of PI metadata is usually small. Thus, 1 inline sg should
> cover most cases. The macro will be used for pre-allocate a single SGL
> entry for protection information. The preallocation of small inline
> SGLs depends on SG_CHAIN capability so if the ARCH doesn't support
> SG_CHAIN, use the runtime allocation for the SGL. This patch is a
> preparation for adding metadata (T10-PI) over fabric support.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 09/15] nvmet: Add metadata characteristics for a namespace
  2020-01-06 13:37 ` [PATCH 09/15] nvmet: Add metadata characteristics for a namespace Max Gurtovoy
@ 2020-01-09  3:16   ` Martin K. Petersen
  0 siblings, 0 replies; 49+ messages in thread
From: Martin K. Petersen @ 2020-01-09  3:16 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, martin.petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, kbusch, hch


Max,

> Fill those namespace fields from the block device format for adding
> metadata (T10-PI) over fabric support with block devices.

Looks OK.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 10/15] nvmet: Rename nvmet_rw_len to nvmet_rw_data_len
  2020-01-06 13:37 ` [PATCH 10/15] nvmet: Rename nvmet_rw_len to nvmet_rw_data_len Max Gurtovoy
@ 2020-01-09  3:17   ` Martin K. Petersen
  0 siblings, 0 replies; 49+ messages in thread
From: Martin K. Petersen @ 2020-01-09  3:17 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, martin.petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, kbusch, hch


Max,

> The function doesn't add the metadata length (only data length is
> calculated). This is preparation for adding metadata (T10-PI) support.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/15] nvmet: Rename nvmet_check_data_len to nvmet_check_transfer_len
  2020-01-06 13:37 ` [PATCH 11/15] nvmet: Rename nvmet_check_data_len to nvmet_check_transfer_len Max Gurtovoy
@ 2020-01-09  3:19   ` Martin K. Petersen
  0 siblings, 0 replies; 49+ messages in thread
From: Martin K. Petersen @ 2020-01-09  3:19 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, martin.petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, kbusch, hch


Max,

> The function doesn't checks only the data length, because the transfer
> length includes also the metadata length in some cases. This is
> preparation for adding metadata (T10-PI) support.

Commit description needs a bit of grammar checking. Otherwise OK.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 13/15] nvmet: Add metadata/T10-PI support
  2020-01-06 13:37 ` [PATCH 13/15] nvmet: Add metadata/T10-PI support Max Gurtovoy
@ 2020-01-09  3:24   ` Martin K. Petersen
  2020-01-27 17:17     ` Max Gurtovoy
  2020-01-17 16:46   ` James Smart
  1 sibling, 1 reply; 49+ messages in thread
From: Martin K. Petersen @ 2020-01-09  3:24 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, martin.petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, kbusch, hch


Max,

> -	/* no limit on data transfer sizes for now */
> -	id->mdts = 0;
> +	/* Limit MDTS for metadata capable controllers (assume mpsmin is 4k) */
> +	id->mdts = ctrl->pi_support ? ilog2(NVMET_T10_PI_MAX_KB_SZ >> 2) : 0;

[...]

> +#define NVMET_T10_PI_MAX_KB_SZ 128

Where does this limit come from?

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 15/15] nvmet-rdma: Add metadata/T10-PI support
  2020-01-06 13:37 ` [PATCH 15/15] nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
@ 2020-01-09  3:29   ` Martin K. Petersen
  0 siblings, 0 replies; 49+ messages in thread
From: Martin K. Petersen @ 2020-01-09  3:29 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, martin.petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, kbusch, hch


Max,

> +static void nvmet_rdma_set_diff_domain(struct nvme_command *cmd,
> +		struct ib_sig_domain *domain, struct nvmet_ns *ns)

diff_domain?

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/15] nvme: Introduce namespace features flag
  2020-01-09  3:11   ` Martin K. Petersen
@ 2020-01-09 10:38     ` Max Gurtovoy
  2020-01-09 16:26       ` Keith Busch
  0 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-09 10:38 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: axboe, sagi, vladimirk, shlomin, israelr, linux-nvme, idanb,
	oren, kbusch, hch


On 1/9/2020 5:11 AM, Martin K. Petersen wrote:
> Max,
>
>> +#define NVME_NS_DIX_SUPPORTED	(1 << 1)
>> +#define NVME_NS_DIF_SUPPORTED	(1 << 2)
> Not so keen on the DIF/DIX terminology in this context.

Martin,

how about:

+#define NVME_NS_PI_HOST_SUPPORTED (1 << 1)
+#define NVME_NS_PI_CTRL_SUPPORTED (1 << 2)

>
> DIF as a term was obsoleted long ago in T10. The proper term is
> Protection Information which you also use throughout these changes. We
> have lots of DIF references in the kernel for hysterical raisins. But
> let's not create new ones.
>
> DIX is a spec I wrote to describe how controllers could exchange T10
> Protection Information with a host. It's specific to SCSI and although
> DIX is referenced in the NVMe spec, it would not be accurate to describe
> the NVMe features using the term DIX.
>
> So I'm in agreement with Keith that this should be named using more
> descriptive terms such as NVME_NS_PI, NVME_NS_CONTIGUOUS_PI,
> NVME_NS_SEPARATE_PI, or similar.
>

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/15] nvme: Introduce namespace features flag
  2020-01-09 10:38     ` Max Gurtovoy
@ 2020-01-09 16:26       ` Keith Busch
  2020-01-12  9:40         ` Max Gurtovoy
  2020-01-12  9:40         ` Max Gurtovoy
  0 siblings, 2 replies; 49+ messages in thread
From: Keith Busch @ 2020-01-09 16:26 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, Martin K. Petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, hch

On Thu, Jan 09, 2020 at 12:38:05PM +0200, Max Gurtovoy wrote:
> 
> On 1/9/2020 5:11 AM, Martin K. Petersen wrote:
> > Max,
> > 
> > > +#define NVME_NS_DIX_SUPPORTED	(1 << 1)
> > > +#define NVME_NS_DIF_SUPPORTED	(1 << 2)
> > Not so keen on the DIF/DIX terminology in this context.
> 
> Martin,
> 
> how about:
> 
> +#define NVME_NS_PI_HOST_SUPPORTED (1 << 1)
> +#define NVME_NS_PI_CTRL_SUPPORTED (1 << 2)

Well, I was trying to say earlier that nvme supports formats with metadata
that's not used for protection information. The metadata, whether separate
or interleaved, can be used for some proprietary non-pi related feature.

The nvme driver only leverages "blk-integrity" to facilitate allocating
and managing the metdata payloads even when not used for integrity. It
might make sense to give that block component a more generic name than
"integrity".

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/15] nvme: Introduce namespace features flag
  2020-01-09 16:26       ` Keith Busch
@ 2020-01-12  9:40         ` Max Gurtovoy
  2020-01-13 20:31           ` Keith Busch
  2020-01-12  9:40         ` Max Gurtovoy
  1 sibling, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-12  9:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: axboe, sagi, Martin K. Petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, hch


On 1/9/2020 6:26 PM, Keith Busch wrote:
> On Thu, Jan 09, 2020 at 12:38:05PM +0200, Max Gurtovoy wrote:
>> On 1/9/2020 5:11 AM, Martin K. Petersen wrote:
>>> Max,
>>>
>>>> +#define NVME_NS_DIX_SUPPORTED	(1 << 1)
>>>> +#define NVME_NS_DIF_SUPPORTED	(1 << 2)
>>> Not so keen on the DIF/DIX terminology in this context.
>> Martin,
>>
>> how about:
>>
>> +#define NVME_NS_PI_HOST_SUPPORTED (1 << 1)
>> +#define NVME_NS_PI_CTRL_SUPPORTED (1 << 2)
> Well, I was trying to say earlier that nvme supports formats with metadata
> that's not used for protection information. The metadata, whether separate
> or interleaved, can be used for some proprietary non-pi related feature.

so how about:

+#define NVME_NS_MD_HOST_SUPPORTED (1 << 1)
+#define NVME_NS_MD_CTRL_SUPPORTED (1 << 2)


We didn't change any logic for metadata support for non-PI format.

>
> The nvme driver only leverages "blk-integrity" to facilitate allocating
> and managing the metdata payloads even when not used for integrity. It
> might make sense to give that block component a more generic name than
> "integrity".

Yup that True, But this patchset is already big enough by itself.

We can do block layer refactoring in different series..


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/15] nvme: Introduce namespace features flag
  2020-01-09 16:26       ` Keith Busch
  2020-01-12  9:40         ` Max Gurtovoy
@ 2020-01-12  9:40         ` Max Gurtovoy
  1 sibling, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-12  9:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: axboe, sagi, Martin K. Petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, hch


On 1/9/2020 6:26 PM, Keith Busch wrote:
> On Thu, Jan 09, 2020 at 12:38:05PM +0200, Max Gurtovoy wrote:
>> On 1/9/2020 5:11 AM, Martin K. Petersen wrote:
>>> Max,
>>>
>>>> +#define NVME_NS_DIX_SUPPORTED	(1 << 1)
>>>> +#define NVME_NS_DIF_SUPPORTED	(1 << 2)
>>> Not so keen on the DIF/DIX terminology in this context.
>> Martin,
>>
>> how about:
>>
>> +#define NVME_NS_PI_HOST_SUPPORTED (1 << 1)
>> +#define NVME_NS_PI_CTRL_SUPPORTED (1 << 2)
> Well, I was trying to say earlier that nvme supports formats with metadata
> that's not used for protection information. The metadata, whether separate
> or interleaved, can be used for some proprietary non-pi related feature.

so how about:

+#define NVME_NS_MD_HOST_SUPPORTED (1 << 1)
+#define NVME_NS_MD_CTRL_SUPPORTED (1 << 2)


We didn't change any logic for metadata support for non-PI format.

>
> The nvme driver only leverages "blk-integrity" to facilitate allocating
> and managing the metdata payloads even when not used for integrity. It
> might make sense to give that block component a more generic name than
> "integrity".

Yup that True, But this patchset is already big enough by itself.

We can do block layer refactoring in different series..


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/15] nvme: Introduce namespace features flag
  2020-01-12  9:40         ` Max Gurtovoy
@ 2020-01-13 20:31           ` Keith Busch
  2020-01-14 16:04             ` Max Gurtovoy
  0 siblings, 1 reply; 49+ messages in thread
From: Keith Busch @ 2020-01-13 20:31 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, Martin K. Petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, hch

On Sun, Jan 12, 2020 at 11:40:31AM +0200, Max Gurtovoy wrote:
> On 1/9/2020 6:26 PM, Keith Busch wrote:
> > On Thu, Jan 09, 2020 at 12:38:05PM +0200, Max Gurtovoy wrote:
> > > On 1/9/2020 5:11 AM, Martin K. Petersen wrote:
> > > > Max,
> > > > 
> > > > > +#define NVME_NS_DIX_SUPPORTED	(1 << 1)
> > > > > +#define NVME_NS_DIF_SUPPORTED	(1 << 2)
> > > > Not so keen on the DIF/DIX terminology in this context.
> > > Martin,
> > > 
> > > how about:
> > > 
> > > +#define NVME_NS_PI_HOST_SUPPORTED (1 << 1)
> > > +#define NVME_NS_PI_CTRL_SUPPORTED (1 << 2)
> > Well, I was trying to say earlier that nvme supports formats with metadata
> > that's not used for protection information. The metadata, whether separate
> > or interleaved, can be used for some proprietary non-pi related feature.
> 
> so how about:
> 
> +#define NVME_NS_MD_HOST_SUPPORTED (1 << 1)
> +#define NVME_NS_MD_CTRL_SUPPORTED (1 << 2)

You're using these to report the in-memory format of the meta-data,
right? These are either Extended or Separate from the block data, so
I think the names should convey that. Something like the following
maybe:
 
  #define NVME_NS_MD_SEP_SUPPORTED (1 << 1)
  #define NVME_NS_MD_EXT_SUPPORTED (1 << 2)
 
> We didn't change any logic for metadata support for non-PI format.
> 
> > 
> > The nvme driver only leverages "blk-integrity" to facilitate allocating
> > and managing the metdata payloads even when not used for integrity. It
> > might make sense to give that block component a more generic name than
> > "integrity".
> 
> Yup that True, But this patchset is already big enough by itself.
> 
> We can do block layer refactoring in different series..

Absolutely, didn't mean to imply you take that on. :)

I was just trying to emphasize the current name doesn't always describe how
that component is used.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/15] nvme: Introduce namespace features flag
  2020-01-13 20:31           ` Keith Busch
@ 2020-01-14 16:04             ` Max Gurtovoy
  0 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-14 16:04 UTC (permalink / raw)
  To: Keith Busch
  Cc: axboe, sagi, Martin K. Petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, hch


On 1/13/2020 10:31 PM, Keith Busch wrote:
> On Sun, Jan 12, 2020 at 11:40:31AM +0200, Max Gurtovoy wrote:
>> On 1/9/2020 6:26 PM, Keith Busch wrote:
>>> On Thu, Jan 09, 2020 at 12:38:05PM +0200, Max Gurtovoy wrote:
>>>> On 1/9/2020 5:11 AM, Martin K. Petersen wrote:
>>>>> Max,
>>>>>
>>>>>> +#define NVME_NS_DIX_SUPPORTED	(1 << 1)
>>>>>> +#define NVME_NS_DIF_SUPPORTED	(1 << 2)
>>>>> Not so keen on the DIF/DIX terminology in this context.
>>>> Martin,
>>>>
>>>> how about:
>>>>
>>>> +#define NVME_NS_PI_HOST_SUPPORTED (1 << 1)
>>>> +#define NVME_NS_PI_CTRL_SUPPORTED (1 << 2)
>>> Well, I was trying to say earlier that nvme supports formats with metadata
>>> that's not used for protection information. The metadata, whether separate
>>> or interleaved, can be used for some proprietary non-pi related feature.
>> so how about:
>>
>> +#define NVME_NS_MD_HOST_SUPPORTED (1 << 1)
>> +#define NVME_NS_MD_CTRL_SUPPORTED (1 << 2)
> You're using these to report the in-memory format of the meta-data,
> right? These are either Extended or Separate from the block data, so
> I think the names should convey that. Something like the following
> maybe:
>   
>    #define NVME_NS_MD_SEP_SUPPORTED (1 << 1)
>    #define NVME_NS_MD_EXT_SUPPORTED (1 << 2)

No, we're setting it to know if we have host MD (aka DIX, memory-domain) 
and if we have ctrl MD (aka DIF, ctrl-domain).

for the MD format we use:

#define NVME_NS_EXT_LBAS	(1 << 0)

>   
>> We didn't change any logic for metadata support for non-PI format.
>>
>>> The nvme driver only leverages "blk-integrity" to facilitate allocating
>>> and managing the metdata payloads even when not used for integrity. It
>>> might make sense to give that block component a more generic name than
>>> "integrity".
>> Yup that True, But this patchset is already big enough by itself.
>>
>> We can do block layer refactoring in different series..
> Absolutely, didn't mean to imply you take that on. :)
>
> I was just trying to emphasize the current name doesn't always describe how
> that component is used.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/15] nvme: Enforce extended LBA format for fabrics metadata
  2020-01-06 13:37 ` [PATCH 02/15] nvme: Enforce extended LBA format for fabrics metadata Max Gurtovoy
@ 2020-01-16 23:53   ` James Smart
  2020-01-19 11:20     ` Max Gurtovoy
  0 siblings, 1 reply; 49+ messages in thread
From: James Smart @ 2020-01-16 23:53 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, idanb, israelr, shlomin, oren



On 1/6/2020 5:37 AM, Max Gurtovoy wrote:
> An extended LBA is a larger LBA that is created when metadata associated
> with the LBA is transferred contiguously with the LBA data (AKA
> interleaved). The metadata may be either transferred as part of the LBA
> (creating an extended LBA) or it may be transferred as a separate
> contiguous buffer of data. According to the NVMeoF spec, a fabrics ctrl
> supports only an Extended LBA format. Fail revalidation in case we have a
> spec violation. Also initialize the integrity profile for the block device
> for fabrics ctrl.
>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> ---
>   drivers/nvme/host/core.c | 25 +++++++++++++++++++++----
>   1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d98eb48..089cdc3c 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1818,7 +1818,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
>   	blk_mq_unfreeze_queue(disk->queue);
>   }
>   
> -static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
> +static int __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>   {
>   	struct nvme_ns *ns = disk->private_data;
>   
> @@ -1846,11 +1846,21 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>   			ns->features |= NVME_NS_EXT_LBAS;
>   
>   		/*
> +		 * For Fabrics, only metadata as part of extended data LBA is
> +		 * supported. Fail in case of a spec violation.
> +		 */
> +		if (ns->ctrl->ops->flags & NVME_F_FABRICS) {
> +			if (WARN_ON_ONCE(!(ns->features & NVME_NS_EXT_LBAS)))
> +				return -EINVAL;
> +		}
> +
> +		/*
>   		 * For PCI, Extended logical block will be generated by the
>   		 * controller.
>   		 */
>   		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
> -			if (!(ns->features & NVME_NS_EXT_LBAS))
> +			if (ns->ctrl->ops->flags & NVME_F_FABRICS ||
> +			    !(ns->features & NVME_NS_EXT_LBAS))
>   				ns->features |= NVME_NS_DIX_SUPPORTED;

This last change seems odd - why is DIX set if NVME_F_FABRICS ?

Per patch description above, Fabrics spec requires metadata as an 
extended LBA, thus it doesn't support DIX.

Which is touches on a lot of odd things with the nvme spec as it's 
certainly possible for, within the os host implementation, to have the 
host transmitting engine to convert an OS separate DIF buf to an 
extended lba transmission on the wire and as presented to the 
controller.  Transports can certainly help make this happen - and add 
egress checking as the data leaves the host.    Which means - I'm not 
sure this hard DIX definition being implemented this way is the way to go.

-- james



_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 13/15] nvmet: Add metadata/T10-PI support
  2020-01-06 13:37 ` [PATCH 13/15] nvmet: Add metadata/T10-PI support Max Gurtovoy
  2020-01-09  3:24   ` Martin K. Petersen
@ 2020-01-17 16:46   ` James Smart
  2020-01-19 13:47     ` Max Gurtovoy
  1 sibling, 1 reply; 49+ messages in thread
From: James Smart @ 2020-01-17 16:46 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, idanb, israelr, shlomin, oren

On 1/6/2020 5:37 AM, Max Gurtovoy wrote:
> From: Israel Rukshin <israelr@mellanox.com>
>
> Expose the namespace metadata format when PI is enabled. The user needs
> to enable the capability per subsystem and per port. The other metadata
> properties are taken from the namespace/bdev.
>
> Usage example:
> echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
> echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable
>
>

Mind adding a core routine, callable by a transport, to set the 
pi_enable attribute on the port ?

Also - I don't know that all block sizes, metadata sizes, and DIF types 
are supported by the port. Should these registration be "capabilities" 
lists ?

-- james

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/15] nvme: Enforce extended LBA format for fabrics metadata
  2020-01-16 23:53   ` James Smart
@ 2020-01-19 11:20     ` Max Gurtovoy
  2020-01-21 17:40       ` James Smart
  0 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-19 11:20 UTC (permalink / raw)
  To: James Smart, linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, idanb, israelr, shlomin, oren


On 1/17/2020 1:53 AM, James Smart wrote:
>
>
> On 1/6/2020 5:37 AM, Max Gurtovoy wrote:
>> An extended LBA is a larger LBA that is created when metadata associated
>> with the LBA is transferred contiguously with the LBA data (AKA
>> interleaved). The metadata may be either transferred as part of the LBA
>> (creating an extended LBA) or it may be transferred as a separate
>> contiguous buffer of data. According to the NVMeoF spec, a fabrics ctrl
>> supports only an Extended LBA format. Fail revalidation in case we 
>> have a
>> spec violation. Also initialize the integrity profile for the block 
>> device
>> for fabrics ctrl.
>>
>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>> ---
>>   drivers/nvme/host/core.c | 25 +++++++++++++++++++++----
>>   1 file changed, 21 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index d98eb48..089cdc3c 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1818,7 +1818,7 @@ static void nvme_update_disk_info(struct 
>> gendisk *disk,
>>       blk_mq_unfreeze_queue(disk->queue);
>>   }
>>   -static void __nvme_revalidate_disk(struct gendisk *disk, struct 
>> nvme_id_ns *id)
>> +static int __nvme_revalidate_disk(struct gendisk *disk, struct 
>> nvme_id_ns *id)
>>   {
>>       struct nvme_ns *ns = disk->private_data;
>>   @@ -1846,11 +1846,21 @@ static void __nvme_revalidate_disk(struct 
>> gendisk *disk, struct nvme_id_ns *id)
>>               ns->features |= NVME_NS_EXT_LBAS;
>>             /*
>> +         * For Fabrics, only metadata as part of extended data LBA is
>> +         * supported. Fail in case of a spec violation.
>> +         */
>> +        if (ns->ctrl->ops->flags & NVME_F_FABRICS) {
>> +            if (WARN_ON_ONCE(!(ns->features & NVME_NS_EXT_LBAS)))
>> +                return -EINVAL;
>> +        }
>> +
>> +        /*
>>            * For PCI, Extended logical block will be generated by the
>>            * controller.
>>            */
>>           if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>> -            if (!(ns->features & NVME_NS_EXT_LBAS))
>> +            if (ns->ctrl->ops->flags & NVME_F_FABRICS ||
>> +                !(ns->features & NVME_NS_EXT_LBAS))
>>                   ns->features |= NVME_NS_DIX_SUPPORTED;
>
> This last change seems odd - why is DIX set if NVME_F_FABRICS ?
>
> Per patch description above, Fabrics spec requires metadata as an 
> extended LBA, thus it doesn't support DIX.

we refer DIX as memory domain metadata.

>
> Which is touches on a lot of odd things with the nvme spec as it's 
> certainly possible for, within the os host implementation, to have the 
> host transmitting engine to convert an OS separate DIF buf to an 
> extended lba transmission on the wire and as presented to the 
> controller.  Transports can certainly help make this happen - and add 
> egress checking as the data leaves the host.    Which means - I'm not 
> sure this hard DIX definition being implemented this way is the way to 
> go.

RDMA transport is converting separated SGLs (non-extended mode) that 
sent by the block layer to extended mode.

The idea here is to define on which conditions we'll ask the block layer 
to set it's metadata infrastructure.

for PCI - only in case of non-extended mode (in extended mode the block 
layer will not set integrity, and the nvme driver will set the 
PRACT/PRCHK if needed) since there is no conversion to extended mode in 
the nvme driver.

for fabrics - always ask for blk integrity setting since the transport 
(RDMA only for now) is responsible for transferring it to extended mode 
on the wire.

-Max.


>
> -- james
>
>

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 13/15] nvmet: Add metadata/T10-PI support
  2020-01-17 16:46   ` James Smart
@ 2020-01-19 13:47     ` Max Gurtovoy
  0 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-19 13:47 UTC (permalink / raw)
  To: James Smart, linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, idanb, israelr, shlomin, oren


On 1/17/2020 6:46 PM, James Smart wrote:
> On 1/6/2020 5:37 AM, Max Gurtovoy wrote:
>> From: Israel Rukshin <israelr@mellanox.com>
>>
>> Expose the namespace metadata format when PI is enabled. The user needs
>> to enable the capability per subsystem and per port. The other metadata
>> properties are taken from the namespace/bdev.
>>
>> Usage example:
>> echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
>> echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable
>>
>>
>
> Mind adding a core routine, callable by a transport, to set the 
> pi_enable attribute on the port ?

I'm not sure what you're asking here.

port->pi_enable is set using configfs before port is active and 
port->pi_capable will be set by each transport before creating the ctrl.

ctrl->pi_support will be set according to port capabilities and 
subsystem subsys->pi_support.

E.g in case you would like to expose subsystem with MD namespaces using 
RDMA port and using FC/TCP port it will work.

The RDMA port ctrl will verify/generate the MD and the FC/TCP port ctrl 
will create a regular bio request that will be passed to the bdev driver 
to handle.

Also these ctrl will identify themselves with the correct capabilities.


>
> Also - I don't know that all block sizes, metadata sizes, and DIF 
> types are supported by the port. Should these registration be 
> "capabilities" lists ?

would it worth complicating the code in this stage for ports that might 
not support all types ?

RDMA ports support all needed bs, metadata sizes and DIF types.

We can add "capabilities" in the future for another transports that PI 
will be implemented for them (if needed).


-Max.


>
> -- james

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/15] nvme: Enforce extended LBA format for fabrics metadata
  2020-01-19 11:20     ` Max Gurtovoy
@ 2020-01-21 17:40       ` James Smart
  0 siblings, 0 replies; 49+ messages in thread
From: James Smart @ 2020-01-21 17:40 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, idanb, israelr, shlomin, oren



On 1/19/2020 3:20 AM, Max Gurtovoy wrote:
>
> On 1/17/2020 1:53 AM, James Smart wrote:
>>
>>
>> On 1/6/2020 5:37 AM, Max Gurtovoy wrote:
>>> An extended LBA is a larger LBA that is created when metadata 
>>> associated
>>> with the LBA is transferred contiguously with the LBA data (AKA
>>> interleaved). The metadata may be either transferred as part of the LBA
>>> (creating an extended LBA) or it may be transferred as a separate
>>> contiguous buffer of data. According to the NVMeoF spec, a fabrics ctrl
>>> supports only an Extended LBA format. Fail revalidation in case we 
>>> have a
>>> spec violation. Also initialize the integrity profile for the block 
>>> device
>>> for fabrics ctrl.
>>>
>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>>> ---
>>>   drivers/nvme/host/core.c | 25 +++++++++++++++++++++----
>>>   1 file changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index d98eb48..089cdc3c 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -1818,7 +1818,7 @@ static void nvme_update_disk_info(struct 
>>> gendisk *disk,
>>>       blk_mq_unfreeze_queue(disk->queue);
>>>   }
>>>   -static void __nvme_revalidate_disk(struct gendisk *disk, struct 
>>> nvme_id_ns *id)
>>> +static int __nvme_revalidate_disk(struct gendisk *disk, struct 
>>> nvme_id_ns *id)
>>>   {
>>>       struct nvme_ns *ns = disk->private_data;
>>>   @@ -1846,11 +1846,21 @@ static void __nvme_revalidate_disk(struct 
>>> gendisk *disk, struct nvme_id_ns *id)
>>>               ns->features |= NVME_NS_EXT_LBAS;
>>>             /*
>>> +         * For Fabrics, only metadata as part of extended data LBA is
>>> +         * supported. Fail in case of a spec violation.
>>> +         */
>>> +        if (ns->ctrl->ops->flags & NVME_F_FABRICS) {
>>> +            if (WARN_ON_ONCE(!(ns->features & NVME_NS_EXT_LBAS)))
>>> +                return -EINVAL;
>>> +        }
>>> +
>>> +        /*
>>>            * For PCI, Extended logical block will be generated by the
>>>            * controller.
>>>            */
>>>           if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>>> -            if (!(ns->features & NVME_NS_EXT_LBAS))
>>> +            if (ns->ctrl->ops->flags & NVME_F_FABRICS ||
>>> +                !(ns->features & NVME_NS_EXT_LBAS))
>>>                   ns->features |= NVME_NS_DIX_SUPPORTED;
>>
>> This last change seems odd - why is DIX set if NVME_F_FABRICS ?
>>
>> Per patch description above, Fabrics spec requires metadata as an 
>> extended LBA, thus it doesn't support DIX.
>
> we refer DIX as memory domain metadata.

It's fine.  But somewhere we need to be clear that "DIX" as it's 
referenced here is relative to the OS to host port interface and is not 
"DIX" as per NVME stds definition. The flag you are setting on the ns 
features is *not* a ns attribute as read from the controller.  A comment 
should be had somewhere.


>
>>
>> Which is touches on a lot of odd things with the nvme spec as it's 
>> certainly possible for, within the os host implementation, to have 
>> the host transmitting engine to convert an OS separate DIF buf to an 
>> extended lba transmission on the wire and as presented to the 
>> controller.  Transports can certainly help make this happen - and add 
>> egress checking as the data leaves the host.    Which means - I'm not 
>> sure this hard DIX definition being implemented this way is the way 
>> to go.
>
> RDMA transport is converting separated SGLs (non-extended mode) that 
> sent by the block layer to extended mode.
>
> The idea here is to define on which conditions we'll ask the block 
> layer to set it's metadata infrastructure.
>
> for PCI - only in case of non-extended mode (in extended mode the 
> block layer will not set integrity, and the nvme driver will set the 
> PRACT/PRCHK if needed) since there is no conversion to extended mode 
> in the nvme driver.
>
> for fabrics - always ask for blk integrity setting since the transport 
> (RDMA only for now) is responsible for transferring it to extended 
> mode on the wire.

Yep agree. But there should be a comment on what is happening within the 
OS to host port interface vs what is happening per the nvme std. It 
seems to get muddled.

-- james


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 13/15] nvmet: Add metadata/T10-PI support
  2020-01-09  3:24   ` Martin K. Petersen
@ 2020-01-27 17:17     ` Max Gurtovoy
  2020-01-29  2:32       ` Martin K. Petersen
  0 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2020-01-27 17:17 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: axboe, sagi, vladimirk, shlomin, israelr, linux-nvme, idanb,
	oren, kbusch, hch


On 1/9/2020 5:24 AM, Martin K. Petersen wrote:
> Max,


Martin,


>
>> -	/* no limit on data transfer sizes for now */
>> -	id->mdts = 0;
>> +	/* Limit MDTS for metadata capable controllers (assume mpsmin is 4k) */
>> +	id->mdts = ctrl->pi_support ? ilog2(NVMET_T10_PI_MAX_KB_SZ >> 2) : 0;
> [...]
>
>> +#define NVMET_T10_PI_MAX_KB_SZ 128
> Where does this limit come from?

This is coming from few reasons:

1. In the current driver implementation of the ib_core (RDMA) we are 
limited to 1MB IO for T10-DIF offload so we need to limit the mdts 
anyway. For that, I want to add a ctrl->ops->get_ctrl_mdts(ctrl).

2. There is some unclear (to me) behavior in the block layer regarding 
splitting integrity bios. We get guard error over the fabric transaction 
in case we need to split a bio (the error is at the target side). I'm 
debugging it and found potential bug that I fixed using:

@@ -378,6 +378,12 @@ void bio_integrity_advance(struct bio *bio, 
unsigned int bytes_done)

         bip->bip_iter.bi_sector += bytes_done >> 9;
         bvec_iter_advance(bip->bip_vec, &bip->bip_iter, bytes);
+
+       if (split && bip->bip_iter.bi_bvec_done) {
+               bip->bip_vec[bip->bip_iter.bi_idx].bv_len -= 
bip->bip_iter.bi_bvec_done;
+               bip->bip_vec[bip->bip_iter.bi_idx].bv_offset += 
bip->bip_iter.bi_bvec_done;
+               bip->bip_iter.bi_bvec_done = 0;
+       }
  }

but unfortunately it helped only if we split the bio once. In case we 
split it more than once, we get to guard check error again.

maybe you have an idea how to overcome this ?


3. The maximum_integrity_segments = 1 for NVMe devices, so we can use 
4096/8=512 integrity sectors before reaching the limit of 
max_integrity_segments. This leads to mdts=6 (256KB) if the bs=512 . I'm 
afraid that we might have issues with the fact that the target prot_sgl 
 > 1 and the NVMe driver set maximum_integrity_segments = 1.


Therefore, I suggest to limit the mdts per transport (and internally 
check if this ctrl is pi_enabled).


All the above is happening when using submit_bio --> ... --> bio_split 
function that call bio_advance as well (that is called from completion 
context as well). This path is not used in local NVMe IO.

In the local NVMe IO (e.g using FIO application), there is no call to 
bio_split/bio_advance in the submission flow, and there is only one call 
to bio_advance in the completion flow.


-Max.


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 13/15] nvmet: Add metadata/T10-PI support
  2020-01-27 17:17     ` Max Gurtovoy
@ 2020-01-29  2:32       ` Martin K. Petersen
  0 siblings, 0 replies; 49+ messages in thread
From: Martin K. Petersen @ 2020-01-29  2:32 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, sagi, Martin K. Petersen, shlomin, israelr, vladimirk,
	linux-nvme, idanb, oren, kbusch, hch


Max,

> 2. There is some unclear (to me) behavior in the block layer regarding
> splitting integrity bios. We get guard error over the fabric
> transaction in case we need to split a bio (the error is at the target
> side).
>
> maybe you have an idea how to overcome this ?

I'm working on bringing our data integrity qualification tooling up on
an upstream kernel. The tooling has been stuck in the past for a while
and consequently I haven't tripped over this yet. Will get there soon.

> 3. The maximum_integrity_segments = 1 for NVMe devices, so we can use
> 4096/8=512 integrity sectors before reaching the limit of
> max_integrity_segments. This leads to mdts=6 (256KB) if the bs=512
> . I'm afraid that we might have issues with the fact that the target
> prot_sgl 1 and the NVMe driver set maximum_integrity_segments = 1.

> Therefore, I suggest to limit the mdts per transport (and internally
> check if this ctrl is pi_enabled).

Makes sense. That's kind of what I was fishing for when I asked where
this came from.

> All the above is happening when using submit_bio --> ... --> bio_split
> function that call bio_advance as well (that is called from completion
> context as well). This path is not used in local NVMe IO.

The splitting approach has changed since we cut our last production
kernel. Can't rule out regressions in this area, although I would have
hoped that blktests had caught them.

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd
  2019-12-02 14:47 [PATCH 00/16 V2] nvme-rdma/nvmet-rdma: " Max Gurtovoy
@ 2019-12-02 14:47 ` Max Gurtovoy
  0 siblings, 0 replies; 49+ messages in thread
From: Max Gurtovoy @ 2019-12-02 14:47 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Added 'pi_enable' to 'connect' command so users can enable metadata support.

usage examples:
nvme connect --pi_enable --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme
nvme connect -p -t rdma -a 10.0.1.1 -n test_nvme

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 fabrics.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fabrics.c b/fabrics.c
index 8982ae4..b22ea14 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -72,6 +72,7 @@ static struct config {
 	int  disable_sqflow;
 	int  hdr_digest;
 	int  data_digest;
+	int  pi_enable;
 	bool persistent;
 	bool quiet;
 } cfg = { NULL };
@@ -756,7 +757,9 @@ static int build_options(char *argstr, int max_len, bool discover)
 	    add_bool_argument(&argstr, &max_len, "disable_sqflow",
 				cfg.disable_sqflow) ||
 	    add_bool_argument(&argstr, &max_len, "hdr_digest", cfg.hdr_digest) ||
-	    add_bool_argument(&argstr, &max_len, "data_digest", cfg.data_digest))
+	    add_bool_argument(&argstr, &max_len, "data_digest",
+				cfg.data_digest) ||
+	    add_bool_argument(&argstr, &max_len, "pi_enable", cfg.pi_enable))
 		return -EINVAL;
 
 	return 0;
@@ -885,6 +888,13 @@ retry:
 		p += len;
 	}
 
+	if (cfg.pi_enable) {
+		len = sprintf(p, ",pi_enable");
+		if (len < 0)
+			return -EINVAL;
+		p += len;
+	}
+
 	switch (e->trtype) {
 	case NVMF_TRTYPE_RDMA:
 	case NVMF_TRTYPE_TCP:
@@ -1171,6 +1181,7 @@ int discover(const char *desc, int argc, char **argv, bool connect)
 		OPT_INT("tos",             'T', &cfg.tos,             "type of service"),
 		OPT_FLAG("hdr_digest",     'g', &cfg.hdr_digest,      "enable transport protocol header digest (TCP transport)"),
 		OPT_FLAG("data_digest",    'G', &cfg.data_digest,     "enable transport protocol data digest (TCP transport)"),
+		OPT_FLAG("pi_enable",      'p', &cfg.pi_enable,       "enable metadata (T10-PI) support (default false)"),
 		OPT_INT("nr-io-queues",    'i', &cfg.nr_io_queues,    "number of io queues to use (default is core count)"),
 		OPT_INT("nr-write-queues", 'W', &cfg.nr_write_queues, "number of write queues to use (default 0)"),
 		OPT_INT("nr-poll-queues",  'P', &cfg.nr_poll_queues,  "number of poll queues to use (default 0)"),
@@ -1231,6 +1242,7 @@ int connect(const char *desc, int argc, char **argv)
 		OPT_FLAG("disable-sqflow",    'd', &cfg.disable_sqflow,    "disable controller sq flow control (default false)"),
 		OPT_FLAG("hdr-digest",        'g', &cfg.hdr_digest,        "enable transport protocol header digest (TCP transport)"),
 		OPT_FLAG("data-digest",       'G', &cfg.data_digest,       "enable transport protocol data digest (TCP transport)"),
+		OPT_FLAG("pi_enable",         'p', &cfg.pi_enable,         "enable metadata (T10-PI) support (default false)"),
 		OPT_END()
 	};
 
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd
  2019-11-12 18:14             ` James Smart
@ 2019-11-12 18:23               ` Sagi Grimberg
  0 siblings, 0 replies; 49+ messages in thread
From: Sagi Grimberg @ 2019-11-12 18:23 UTC (permalink / raw)
  To: James Smart, Max Gurtovoy, Martin K. Petersen
  Cc: vladimirk, idanb, israelr, linux-nvme, shlomin, oren, kbusch, hch


>>>> When an I/O request is received by the SCSI stack, we inspect it to
>>>> determine whether we need to allocate one or two scatterlists. If the
>>>> request has a bip attached, we'll allocate a separate scatterlist which
>>>> is then used to map the protection information before the I/O is
>>>> submitted to the device driver.
>>>
>>> For an NVMe/RDMA controller we must allocate a pool of PI MRs in the 
>>> constructor function in order to support E2E PI (and it's expensive).
>>>
>>> We can't do it in queue_rq function. In queue_rq we get the needed MR 
>>> according to the namespace format from a pre-allocated pool (PI_MR 
>>> for PI_namespace and REGULAR_MR for REGULAR_namespace).
>>>
>>> We can go with the module parameter approach but it will be very 
>>> limited for users and wasteful.
>>
>> Yep, this is a fundamental difference for the RDMA devices that supports
>> integrity.. You effectively need substantially more resources to track
>> all the inflight contexts.
>>
>> We could in theory lazily allocate the special resources, but the 
>> queue-pair extra sizing is not something that you can change in 
>> runtime...
>>
>> I'm fine with explicitly enabling PI for controllers that we know that
>> support PI.
> 
> Ok - but we're still going to want the ability to dynamically recognize 
> support in the transport and in the controller and turn it on even 
> without the user-space flag.

Sure, nothing says that fc can't turn it on based on detection.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd
  2019-11-12 17:47           ` Sagi Grimberg
@ 2019-11-12 18:14             ` James Smart
  2019-11-12 18:23               ` Sagi Grimberg
  0 siblings, 1 reply; 49+ messages in thread
From: James Smart @ 2019-11-12 18:14 UTC (permalink / raw)
  To: Sagi Grimberg, Max Gurtovoy, Martin K. Petersen
  Cc: vladimirk, idanb, israelr, linux-nvme, shlomin, oren, kbusch, hch


On 11/12/2019 9:47 AM, Sagi Grimberg wrote:
>
>>> When an I/O request is received by the SCSI stack, we inspect it to
>>> determine whether we need to allocate one or two scatterlists. If the
>>> request has a bip attached, we'll allocate a separate scatterlist which
>>> is then used to map the protection information before the I/O is
>>> submitted to the device driver.
>>
>> For an NVMe/RDMA controller we must allocate a pool of PI MRs in the 
>> constructor function in order to support E2E PI (and it's expensive).
>>
>> We can't do it in queue_rq function. In queue_rq we get the needed MR 
>> according to the namespace format from a pre-allocated pool (PI_MR 
>> for PI_namespace and REGULAR_MR for REGULAR_namespace).
>>
>> We can go with the module parameter approach but it will be very 
>> limited for users and wasteful.
>
> Yep, this is a fundamental difference for the RDMA devices that supports
> integrity.. You effectively need substantially more resources to track
> all the inflight contexts.
>
> We could in theory lazily allocate the special resources, but the 
> queue-pair extra sizing is not something that you can change in 
> runtime...
>
> I'm fine with explicitly enabling PI for controllers that we know that
> support PI.

Ok - but we're still going to want the ability to dynamically recognize 
support in the transport and in the controller and turn it on even 
without the user-space flag.

-- james


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd
  2019-11-10  9:25         ` Max Gurtovoy
@ 2019-11-12 17:47           ` Sagi Grimberg
  2019-11-12 18:14             ` James Smart
  0 siblings, 1 reply; 49+ messages in thread
From: Sagi Grimberg @ 2019-11-12 17:47 UTC (permalink / raw)
  To: Max Gurtovoy, Martin K. Petersen
  Cc: vladimirk, idanb, israelr, linux-nvme, shlomin, oren, kbusch, hch


>> When an I/O request is received by the SCSI stack, we inspect it to
>> determine whether we need to allocate one or two scatterlists. If the
>> request has a bip attached, we'll allocate a separate scatterlist which
>> is then used to map the protection information before the I/O is
>> submitted to the device driver.
> 
> For an NVMe/RDMA controller we must allocate a pool of PI MRs in the 
> constructor function in order to support E2E PI (and it's expensive).
> 
> We can't do it in queue_rq function. In queue_rq we get the needed MR 
> according to the namespace format from a pre-allocated pool (PI_MR for 
> PI_namespace and REGULAR_MR for REGULAR_namespace).
> 
> We can go with the module parameter approach but it will be very limited 
> for users and wasteful.

Yep, this is a fundamental difference for the RDMA devices that supports
integrity.. You effectively need substantially more resources to track
all the inflight contexts.

We could in theory lazily allocate the special resources, but the 
queue-pair extra sizing is not something that you can change in runtime...

I'm fine with explicitly enabling PI for controllers that we know that
support PI.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd
  2019-11-09  1:55       ` Martin K. Petersen
@ 2019-11-10  9:25         ` Max Gurtovoy
  2019-11-12 17:47           ` Sagi Grimberg
  0 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2019-11-10  9:25 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: sagi, vladimirk, idanb, israelr, linux-nvme, shlomin, oren, kbusch, hch


On 11/9/2019 3:55 AM, Martin K. Petersen wrote:
> Max,

Martin,


>
>> So iSER will create a signature resources for every capable device and
>> connection without giving the user the possibility to distinguish
>> between needed PI controllers and un-needed PI controllers.
>>
>> We don't have a format command in the fabrics so this is the best
>> option we can think of for adding flexibility to users.
>>
>> I spoke with Christoph about the module param possibility and we both
>> agreed it's not the way to go.
>>
>> Let me know if you have an idea that will enable flexibility to users.
> The way it works in SCSI is that if a user wants to enable PI, they do
> so when provision the device. So either a format for disk drives or by
> selecting an option while creating a LUN in a storage array management
> interface.
>
> There are some knobs that can be twiddled on the initiator side to
> enable/disable PI and/or DIX but these are meant for test or buggy
> device workaround purposes. Not as a means for the user to select or
> deselect the feature.
>
> The user can decide on a per-device basis whether the block layer should
> generate PI on write and verify on read. Those are the "proper" policy
> knobs on the initiator side.

The block layer option to enable/disable DIX is there, but there is no 
way to disable DIF (wire PI check and generation).

Fabric storage doesn't support formatting a LUN/Namespace so we need to 
give flexibility to users.

Why we need to enforce the usage of E2E PI for all the controllers if 
the user is not interested (because lack of resources for example) ?

Maybe the application has some kind of QOS, e.g. protected namespace is 
ClassA (expensive storage) and non-protected namespace is ClassB (cheap 
storage) and they both use NVMe/RDMA initiator...

For example, we do it for TOS feature.

>
> When an I/O request is received by the SCSI stack, we inspect it to
> determine whether we need to allocate one or two scatterlists. If the
> request has a bip attached, we'll allocate a separate scatterlist which
> is then used to map the protection information before the I/O is
> submitted to the device driver.

For an NVMe/RDMA controller we must allocate a pool of PI MRs in the 
constructor function in order to support E2E PI (and it's expensive).

We can't do it in queue_rq function. In queue_rq we get the needed MR 
according to the namespace format from a pre-allocated pool (PI_MR for 
PI_namespace and REGULAR_MR for REGULAR_namespace).

We can go with the module parameter approach but it will be very limited 
for users and wasteful.



_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd
  2019-11-07 12:02     ` Max Gurtovoy
@ 2019-11-09  1:55       ` Martin K. Petersen
  2019-11-10  9:25         ` Max Gurtovoy
  0 siblings, 1 reply; 49+ messages in thread
From: Martin K. Petersen @ 2019-11-09  1:55 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: sagi, vladimirk, idanb, israelr, linux-nvme, shlomin, oren,
	Martin K. Petersen, kbusch, hch


Max,

> So iSER will create a signature resources for every capable device and
> connection without giving the user the possibility to distinguish
> between needed PI controllers and un-needed PI controllers.
>
> We don't have a format command in the fabrics so this is the best
> option we can think of for adding flexibility to users.
>
> I spoke with Christoph about the module param possibility and we both
> agreed it's not the way to go.
>
> Let me know if you have an idea that will enable flexibility to users.

The way it works in SCSI is that if a user wants to enable PI, they do
so when provision the device. So either a format for disk drives or by
selecting an option while creating a LUN in a storage array management
interface.

There are some knobs that can be twiddled on the initiator side to
enable/disable PI and/or DIX but these are meant for test or buggy
device workaround purposes. Not as a means for the user to select or
deselect the feature.

The user can decide on a per-device basis whether the block layer should
generate PI on write and verify on read. Those are the "proper" policy
knobs on the initiator side.

When an I/O request is received by the SCSI stack, we inspect it to
determine whether we need to allocate one or two scatterlists. If the
request has a bip attached, we'll allocate a separate scatterlist which
is then used to map the protection information before the I/O is
submitted to the device driver.

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd
  2019-11-07  2:36   ` Martin K. Petersen
@ 2019-11-07 12:02     ` Max Gurtovoy
  2019-11-09  1:55       ` Martin K. Petersen
  0 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2019-11-07 12:02 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: sagi, vladimirk, idanb, israelr, linux-nvme, shlomin, oren, kbusch, hch


On 11/7/2019 4:36 AM, Martin K. Petersen wrote:
> Hi Max,

Hi Martin,

>
>> Added 'pi_enable' to 'connect' command so users can enable metadata
>> support.
>>
>> usage examples:
>> nvme connect --pi_enable --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme
>> nvme connect -p -t rdma -a 10.0.1.1 -n test_nvme
> Why does it have to be explicitly enabled? I can understand why you'd
> want the capability to disable PI in case you encounter a bad target.
> But having to opt-in is different from how the rest of the block
> integrity code works. In SCSI, if HBA and target are capable, both DIX
> and PI enabled unless you explicitly disable them.

In SCSI/iSER solution (the only solution over the fabric in Linux) there 
is a module param to enable the initiator side.

I guess it was implemented this way since people don't want to mess 
with  iscsi-daemon.

So iSER will create a signature resources for every capable device and 
connection without giving the user the possibility to distinguish 
between needed PI controllers and un-needed PI controllers.

We don't have a format command in the fabrics so this is the best option 
we can think of for adding flexibility to users.

I spoke with Christoph about the module param possibility and we both 
agreed it's not the way to go.

Let me know if you have an idea that will enable flexibility to users.

>

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd
  2019-11-05 16:20 ` [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd Max Gurtovoy
@ 2019-11-07  2:36   ` Martin K. Petersen
  2019-11-07 12:02     ` Max Gurtovoy
  0 siblings, 1 reply; 49+ messages in thread
From: Martin K. Petersen @ 2019-11-07  2:36 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: sagi, vladimirk, shlomin, israelr, linux-nvme, idanb, oren, kbusch, hch


Hi Max,

> Added 'pi_enable' to 'connect' command so users can enable metadata
> support.
>
> usage examples:
> nvme connect --pi_enable --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme
> nvme connect -p -t rdma -a 10.0.1.1 -n test_nvme

Why does it have to be explicitly enabled? I can understand why you'd
want the capability to disable PI in case you encounter a bad target.
But having to opt-in is different from how the rest of the block
integrity code works. In SCSI, if HBA and target are capable, both DIX
and PI enabled unless you explicitly disable them.

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd
  2019-11-05 16:20 [PATCH 00/15] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
@ 2019-11-05 16:20 ` Max Gurtovoy
  2019-11-07  2:36   ` Martin K. Petersen
  0 siblings, 1 reply; 49+ messages in thread
From: Max Gurtovoy @ 2019-11-05 16:20 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi
  Cc: vladimirk, idanb, israelr, shlomin, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Added 'pi_enable' to 'connect' command so users can enable metadata support.

usage examples:
nvme connect --pi_enable --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme
nvme connect -p -t rdma -a 10.0.1.1 -n test_nvme

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 fabrics.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fabrics.c b/fabrics.c
index 8982ae4..b22ea14 100644
--- a/fabrics.c
+++ b/fabrics.c
@@ -72,6 +72,7 @@ static struct config {
 	int  disable_sqflow;
 	int  hdr_digest;
 	int  data_digest;
+	int  pi_enable;
 	bool persistent;
 	bool quiet;
 } cfg = { NULL };
@@ -756,7 +757,9 @@ static int build_options(char *argstr, int max_len, bool discover)
 	    add_bool_argument(&argstr, &max_len, "disable_sqflow",
 				cfg.disable_sqflow) ||
 	    add_bool_argument(&argstr, &max_len, "hdr_digest", cfg.hdr_digest) ||
-	    add_bool_argument(&argstr, &max_len, "data_digest", cfg.data_digest))
+	    add_bool_argument(&argstr, &max_len, "data_digest",
+				cfg.data_digest) ||
+	    add_bool_argument(&argstr, &max_len, "pi_enable", cfg.pi_enable))
 		return -EINVAL;
 
 	return 0;
@@ -885,6 +888,13 @@ retry:
 		p += len;
 	}
 
+	if (cfg.pi_enable) {
+		len = sprintf(p, ",pi_enable");
+		if (len < 0)
+			return -EINVAL;
+		p += len;
+	}
+
 	switch (e->trtype) {
 	case NVMF_TRTYPE_RDMA:
 	case NVMF_TRTYPE_TCP:
@@ -1171,6 +1181,7 @@ int discover(const char *desc, int argc, char **argv, bool connect)
 		OPT_INT("tos",             'T', &cfg.tos,             "type of service"),
 		OPT_FLAG("hdr_digest",     'g', &cfg.hdr_digest,      "enable transport protocol header digest (TCP transport)"),
 		OPT_FLAG("data_digest",    'G', &cfg.data_digest,     "enable transport protocol data digest (TCP transport)"),
+		OPT_FLAG("pi_enable",      'p', &cfg.pi_enable,       "enable metadata (T10-PI) support (default false)"),
 		OPT_INT("nr-io-queues",    'i', &cfg.nr_io_queues,    "number of io queues to use (default is core count)"),
 		OPT_INT("nr-write-queues", 'W', &cfg.nr_write_queues, "number of write queues to use (default 0)"),
 		OPT_INT("nr-poll-queues",  'P', &cfg.nr_poll_queues,  "number of poll queues to use (default 0)"),
@@ -1231,6 +1242,7 @@ int connect(const char *desc, int argc, char **argv)
 		OPT_FLAG("disable-sqflow",    'd', &cfg.disable_sqflow,    "disable controller sq flow control (default false)"),
 		OPT_FLAG("hdr-digest",        'g', &cfg.hdr_digest,        "enable transport protocol header digest (TCP transport)"),
 		OPT_FLAG("data-digest",       'G', &cfg.data_digest,       "enable transport protocol data digest (TCP transport)"),
+		OPT_FLAG("pi_enable",         'p', &cfg.pi_enable,         "enable metadata (T10-PI) support (default false)"),
 		OPT_END()
 	};
 
-- 
1.8.3.1


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, back to index

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-06 13:37 [PATCH 00/15 V3] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
2020-01-06 13:37 ` [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd Max Gurtovoy
2020-01-06 13:37 ` [PATCH 01/15] nvme: Introduce namespace features flag Max Gurtovoy
2020-01-07 18:07   ` Keith Busch
2020-01-08 12:00     ` Max Gurtovoy
2020-01-09  3:11   ` Martin K. Petersen
2020-01-09 10:38     ` Max Gurtovoy
2020-01-09 16:26       ` Keith Busch
2020-01-12  9:40         ` Max Gurtovoy
2020-01-13 20:31           ` Keith Busch
2020-01-14 16:04             ` Max Gurtovoy
2020-01-12  9:40         ` Max Gurtovoy
2020-01-06 13:37 ` [PATCH 02/15] nvme: Enforce extended LBA format for fabrics metadata Max Gurtovoy
2020-01-16 23:53   ` James Smart
2020-01-19 11:20     ` Max Gurtovoy
2020-01-21 17:40       ` James Smart
2020-01-06 13:37 ` [PATCH 03/15] nvme: Introduce max_integrity_segments ctrl attribute Max Gurtovoy
2020-01-09  3:12   ` Martin K. Petersen
2020-01-06 13:37 ` [PATCH 04/15] nvme-fabrics: Allow user enabling metadata/T10-PI support Max Gurtovoy
2020-01-06 13:37 ` [PATCH 05/15] nvme: Introduce NVME_INLINE_PROT_SG_CNT Max Gurtovoy
2020-01-09  3:13   ` Martin K. Petersen
2020-01-06 13:37 ` [PATCH 06/15] nvme-rdma: Introduce nvme_rdma_sgl structure Max Gurtovoy
2020-01-06 13:37 ` [PATCH 07/15] nvme-rdma: Add metadata/T10-PI support Max Gurtovoy
2020-01-06 13:37 ` [PATCH 08/15] nvmet: Prepare metadata request Max Gurtovoy
2020-01-06 13:37 ` [PATCH 09/15] nvmet: Add metadata characteristics for a namespace Max Gurtovoy
2020-01-09  3:16   ` Martin K. Petersen
2020-01-06 13:37 ` [PATCH 10/15] nvmet: Rename nvmet_rw_len to nvmet_rw_data_len Max Gurtovoy
2020-01-09  3:17   ` Martin K. Petersen
2020-01-06 13:37 ` [PATCH 11/15] nvmet: Rename nvmet_check_data_len to nvmet_check_transfer_len Max Gurtovoy
2020-01-09  3:19   ` Martin K. Petersen
2020-01-06 13:37 ` [PATCH 12/15] nvme: Add Metadata Capabilities enumerations Max Gurtovoy
2020-01-06 13:37 ` [PATCH 13/15] nvmet: Add metadata/T10-PI support Max Gurtovoy
2020-01-09  3:24   ` Martin K. Petersen
2020-01-27 17:17     ` Max Gurtovoy
2020-01-29  2:32       ` Martin K. Petersen
2020-01-17 16:46   ` James Smart
2020-01-19 13:47     ` Max Gurtovoy
2020-01-06 13:37 ` [PATCH 14/15] nvmet: Add metadata support for block devices Max Gurtovoy
2020-01-06 13:37 ` [PATCH 15/15] nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
2020-01-09  3:29   ` Martin K. Petersen
  -- strict thread matches above, loose matches on Subject: below --
2019-12-02 14:47 [PATCH 00/16 V2] nvme-rdma/nvmet-rdma: " Max Gurtovoy
2019-12-02 14:47 ` [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd Max Gurtovoy
2019-11-05 16:20 [PATCH 00/15] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
2019-11-05 16:20 ` [PATCH] nvme-cli/fabrics: Add pi_enable param to connect cmd Max Gurtovoy
2019-11-07  2:36   ` Martin K. Petersen
2019-11-07 12:02     ` Max Gurtovoy
2019-11-09  1:55       ` Martin K. Petersen
2019-11-10  9:25         ` Max Gurtovoy
2019-11-12 17:47           ` Sagi Grimberg
2019-11-12 18:14             ` James Smart
2019-11-12 18:23               ` Sagi Grimberg

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org
	public-inbox-index linux-nvme

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git