linux-nvme.lists.infradead.org archive mirror
* [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support
@ 2020-05-04 15:57 Max Gurtovoy
  2020-05-04 15:57 ` [PATCH 01/16] block: always define struct blk_integrity in genhd.h Max Gurtovoy
                   ` (16 more replies)
  0 siblings, 17 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

Hello Sagi, Christoph, Keith, Martin, Jens, James and Co

This patchset adds metadata (T10-PI) support for the NVMeoF/RDMA host and
target sides, using the signature verbs API. The set starts with a few
preparation commits to the block layer and the NVMe host core layer. It
continues with the NVMeoF/RDMA host implementation and preparations to the
NVMe target core layer, and ends with the NVMeoF/RDMA target
implementation.

In v7 I mainly refactored the NVMe/RDMA host and target drivers according
to Christoph's suggestions. As agreed, for the initial mode we'll use
T10-PI/metadata for capable devices and act according to the target
controller and namespace configuration.

Configuration:
Host:
 - same as before

Target:
 - echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
 - echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable

The code was tested using Mellanox's ConnectX-4/ConnectX-5 HCAs.
This series applies cleanly on top of the nvme_5.8 branch.

Changes from v6:
 - added preparation patch for genhd.h
 - added "Reviewed-by" (Christoph) to patches 2, 3, 4, 6, 10, 13
 - refactored nvme-rdma/nvmet-rdma code according to Christoph's suggestions
 - renamed md to metadata to make the code more readable
 
Changes from v5:
 - rebased over nvme_5.8
 - configured RDMA controllers for T10-PI by default (if capable)
 - removed the RW API patch for exporting a single MR length (will add it in the future if needed)
 - implemented Christoph's suggestions for the NVMe host core
 
Changes from v4:
 - removed get_mdts patches (merged)
 - added enum nvme_ns_features instead of defines (patch 1/17)
 - rename pi/prot to md (patches 2/17 + 6/17 + 8/17 + 9/17 + 10/17)
 - another rebase

Changes from v3:
 - Added Reviewed-by signatures
 - New RDMA/rw patch (Patch 17/19)
 - Add mdts setting op for controllers (Patches 14/19, 18/19)
 - Rename NVME_NS_DIX_SUPPORTED to NVME_NS_MD_HOST_SUPPORTED and
   NVME_NS_DIF_SUPPORTED to NVME_NS_MD_CTRL_SUPPORTED (Patch 01/19)
 - Split "nvme: Introduce namespace features flag" patch (patch 02/19)
 - Rename nvmet_rdma_set_diff_domain to nvmet_rdma_set_sig_domain
   and nvme_rdma_set_diff_domain to nvme_rdma_set_sig_domain
   (Patches 08/19, 19/19)
 - Remove ns parameter from nvme_rdma_set_sig_domain/nvmet_rdma_set_sig_domain
   functions (patch 08/19, 19/19)
 - Rebase over nvme-5.7 branch

Changes from v2:
 - Convert the virtual start sector (which is passed to the bip_set_seed
   function) to be in integrity interval units (Patch 14/15)
 - Clarify some commit messages

Changes from v1:
 - Added Reviewed-by signatures
 - Added namespace features flag (Patch 01/15)
 - Remove nvme_ns_has_pi function (Patch 01/15)
 - Added has_pi field to struct nvme_request (Patch 01/15)
 - Subject change for patch 02/15
 - Fix comment for PCI metadata (Patch 03/15)
 - Rebase over "nvme: Avoid preallocating big SGL for data" patchset
 - Introduce NVME_INLINE_PROT_SG_CNT flag (Patch 05/15)
 - Introduce nvme_rdma_sgl structure (Patch 06/15)
 - Remove first_sgl pointer from struct nvme_rdma_request (Patch 06/15)
 - Split nvme-rdma patches (Patches 06/15, 07/15)
 - Rename is_protected to use_pi (Patch 07/15)
 - Refactor nvme_rdma_get_max_fr_pages function (Patch 07/15)
 - Added ifdef CONFIG_BLK_DEV_INTEGRITY (Patches 07/15, 09/15, 13/15,
   14/15, 15/15)
 - Added port configfs pi_enable (Patch 14/15)

Israel Rukshin (9):
  nvme: introduce NVME_INLINE_METADATA_SG_CNT
  nvme-rdma: introduce nvme_rdma_sgl structure
  nvmet: add metadata characteristics for a namespace
  nvmet: rename nvmet_rw_len to nvmet_rw_data_len
  nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
  nvme: add Metadata Capabilities enumerations
  nvmet: add metadata/T10-PI support
  nvmet: add metadata support for block devices
  nvmet-rdma: add metadata/T10-PI support

James Smart (1):
  nvme: make nvme_ns_has_pi accessible to transports

Max Gurtovoy (6):
  block: always define struct blk_integrity in genhd.h
  nvme: introduce namespace features flag
  nvme: introduce NVME_NS_METADATA_SUPPORTED flag
  nvme: introduce max_integrity_segments ctrl attribute
  nvme: enforce extended LBA format for fabrics metadata
  nvme-rdma: add metadata/T10-PI support

 drivers/nvme/host/core.c          |  83 +++++++---
 drivers/nvme/host/lightnvm.c      |   5 +-
 drivers/nvme/host/nvme.h          |  17 ++-
 drivers/nvme/host/pci.c           |   6 +
 drivers/nvme/host/rdma.c          | 313 ++++++++++++++++++++++++++++++++++----
 drivers/nvme/target/admin-cmd.c   |  40 +++--
 drivers/nvme/target/configfs.c    |  58 +++++++
 drivers/nvme/target/core.c        | 122 +++++++++++----
 drivers/nvme/target/discovery.c   |   8 +-
 drivers/nvme/target/fabrics-cmd.c |  15 +-
 drivers/nvme/target/io-cmd-bdev.c | 113 +++++++++++++-
 drivers/nvme/target/io-cmd-file.c |   6 +-
 drivers/nvme/target/nvmet.h       |  33 +++-
 drivers/nvme/target/rdma.c        | 231 +++++++++++++++++++++++++---
 include/linux/genhd.h             |   4 -
 include/linux/nvme.h              |   6 +
 16 files changed, 919 insertions(+), 141 deletions(-)

-- 
1.8.3.1


* [PATCH 01/16] block: always define struct blk_integrity in genhd.h
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-14  2:51   ` Martin K. Petersen
  2020-05-04 15:57 ` [PATCH 02/16] nvme: introduce namespace features flag Max Gurtovoy
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

This will reduce the number of ifdefs inside the source code of various
drivers and also the number of stub functions that were needed for the
!CONFIG_BLK_DEV_INTEGRITY case.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
---
 include/linux/genhd.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 058d895..8418581 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -169,8 +169,6 @@ struct disk_part_tbl {
 struct disk_events;
 struct badblocks;
 
-#if defined(CONFIG_BLK_DEV_INTEGRITY)
-
 struct blk_integrity {
 	const struct blk_integrity_profile	*profile;
 	unsigned char				flags;
@@ -179,8 +177,6 @@ struct blk_integrity {
 	unsigned char				tag_size;
 };
 
-#endif	/* CONFIG_BLK_DEV_INTEGRITY */
-
 struct gendisk {
 	/* major, first_minor and minors are input parameters only,
 	 * don't use directly.  Use disk_devt() and disk_max_parts().
-- 
1.8.3.1


* [PATCH 02/16] nvme: introduce namespace features flag
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
  2020-05-04 15:57 ` [PATCH 01/16] block: always define struct blk_integrity in genhd.h Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-04 23:59   ` James Smart
  2020-05-14  2:52   ` Martin K. Petersen
  2020-05-04 15:57 ` [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag Max Gurtovoy
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

Replace the dedicated ext boolean (which indicates an extended LBA format)
with a bit in the new namespace features flag. This is a preparation for
adding more namespace features (such as metadata-specific features).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c     | 8 +++++---
 drivers/nvme/host/lightnvm.c | 5 ++++-
 drivers/nvme/host/nvme.h     | 6 +++++-
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8070e32..1d226cc 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1302,7 +1302,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	meta_len = (io.nblocks + 1) * ns->ms;
 	metadata = nvme_to_user_ptr(io.metadata);
 
-	if (ns->ext) {
+	if (ns->features & NVME_NS_EXT_LBAS) {
 		length += meta_len;
 		meta_len = 0;
 	} else if (meta_len) {
@@ -1882,7 +1882,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	blk_queue_io_min(disk->queue, phys_bs);
 	blk_queue_io_opt(disk->queue, io_opt);
 
-	if (ns->ms && !ns->ext &&
+	if (ns->ms && !(ns->features & NVME_NS_EXT_LBAS) &&
 	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
 		nvme_init_integrity(disk, ns->ms, ns->pi_type);
 	if ((ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk)) ||
@@ -1921,8 +1921,10 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	else
 		iob = nvme_lba_to_sect(ns, le16_to_cpu(id->noiob));
 
+	ns->features = 0;
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
-	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
+	if (ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT))
+		ns->features |= NVME_NS_EXT_LBAS;
 	/* the PI implementation requires metadata equal t10 pi tuple size */
 	if (ns->ms == sizeof(struct t10_pi_tuple))
 		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index ec46693..98fffac 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -961,7 +961,10 @@ int nvme_nvm_register(struct nvme_ns *ns, char *disk_name, int node)
 	geo = &dev->geo;
 	geo->csecs = 1 << ns->lba_shift;
 	geo->sos = ns->ms;
-	geo->ext = ns->ext;
+	if (ns->features & NVME_NS_EXT_LBAS)
+		geo->ext = true;
+	else
+		geo->ext = false;
 	geo->mdts = ns->ctrl->max_hw_sectors;
 
 	dev->q = q;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index f3ab177..110577c7 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -364,6 +364,10 @@ struct nvme_ns_head {
 #endif
 };
 
+enum nvme_ns_features {
+	NVME_NS_EXT_LBAS = 1 << 0, /* support extended LBA format */
+};
+
 struct nvme_ns {
 	struct list_head list;
 
@@ -383,8 +387,8 @@ struct nvme_ns {
 	u16 ms;
 	u16 sgs;
 	u32 sws;
-	bool ext;
 	u8 pi_type;
+	unsigned long features;
 	unsigned long flags;
 #define NVME_NS_REMOVING	0
 #define NVME_NS_DEAD     	1
-- 
1.8.3.1


* [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
  2020-05-04 15:57 ` [PATCH 01/16] block: always define struct blk_integrity in genhd.h Max Gurtovoy
  2020-05-04 15:57 ` [PATCH 02/16] nvme: introduce namespace features flag Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-05 23:33   ` James Smart
  2020-05-14  2:53   ` Martin K. Petersen
  2020-05-04 15:57 ` [PATCH 04/16] nvme: make nvme_ns_has_pi accessible to transports Max Gurtovoy
                   ` (13 subsequent siblings)
  16 siblings, 2 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

This is a preparation for adding metadata support in fabrics controllers.
The new flag indicates that the NVMe namespace supports receiving metadata
that was originally generated by the host's block layer.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 41 ++++++++++++++++++++++++++++++++++-------
 drivers/nvme/host/nvme.h |  1 +
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1d226cc..4b7faf9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1882,13 +1882,27 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	blk_queue_io_min(disk->queue, phys_bs);
 	blk_queue_io_opt(disk->queue, io_opt);
 
-	if (ns->ms && !(ns->features & NVME_NS_EXT_LBAS) &&
-	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
-		nvme_init_integrity(disk, ns->ms, ns->pi_type);
-	if ((ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk)) ||
-	    ns->lba_shift > PAGE_SHIFT)
+	/*
+	 * The block layer can't support LBA sizes larger than the page size
+	 * yet, so catch this early and don't allow block I/O.
+	 */
+	if (ns->lba_shift > PAGE_SHIFT)
 		capacity = 0;
 
+	/*
+	 * Register a metadata profile for PI, or the plain non-integrity NVMe
+	 * metadata masquerading as Type 0 if supported, otherwise reject block
+	 * I/O to namespaces with metadata except when the namespace supports
+	 * PI, as it can strip/insert in that case.
+	 */
+	if (ns->ms) {
+		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
+		    (ns->features & NVME_NS_METADATA_SUPPORTED))
+			nvme_init_integrity(disk, ns->ms, ns->pi_type);
+		else if (!nvme_ns_has_pi(ns))
+			capacity = 0;
+	}
+
 	set_capacity_revalidate_and_notify(disk, capacity, false);
 
 	nvme_config_discard(disk, ns);
@@ -1923,14 +1937,27 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 
 	ns->features = 0;
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
-	if (ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT))
-		ns->features |= NVME_NS_EXT_LBAS;
 	/* the PI implementation requires metadata equal t10 pi tuple size */
 	if (ns->ms == sizeof(struct t10_pi_tuple))
 		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
 	else
 		ns->pi_type = 0;
 
+	if (ns->ms) {
+		if (id->flbas & NVME_NS_FLBAS_META_EXT)
+			ns->features |= NVME_NS_EXT_LBAS;
+
+		/*
+		 * For PCI, Extended logical block will be generated by the
+		 * controller. Non-extended format can be generated by the
+		 * block layer.
+		 */
+		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
+			if (!(ns->features & NVME_NS_EXT_LBAS))
+				ns->features |= NVME_NS_METADATA_SUPPORTED;
+		}
+	}
+
 	if (iob)
 		blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(iob));
 	nvme_update_disk_info(disk, ns, id);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 110577c7..0dda145 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -366,6 +366,7 @@ struct nvme_ns_head {
 
 enum nvme_ns_features {
 	NVME_NS_EXT_LBAS = 1 << 0, /* support extended LBA format */
+	NVME_NS_METADATA_SUPPORTED = 1 << 1, /* support getting generated md */
 };
 
 struct nvme_ns {
-- 
1.8.3.1


* [PATCH 04/16] nvme: make nvme_ns_has_pi accessible to transports
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (2 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-14  2:53   ` Martin K. Petersen
  2020-05-04 15:57 ` [PATCH 05/16] nvme: introduce max_integrity_segments ctrl attribute Max Gurtovoy
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

From: James Smart <jsmart2021@gmail.com>

Move the nvme_ns_has_pi() inline from core.c to the nvme.h header.
This allows use by the transports.

Signed-off-by: James Smart <jsmart2021@gmail.com>
[maxg: added a comment for nvme_ns_has_pi()]
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 6 ------
 drivers/nvme/host/nvme.h | 7 +++++++
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4b7faf9..94d2549 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -19,7 +19,6 @@
 #include <linux/pr.h>
 #include <linux/ptrace.h>
 #include <linux/nvme_ioctl.h>
-#include <linux/t10-pi.h>
 #include <linux/pm_qos.h>
 #include <asm/unaligned.h>
 
@@ -204,11 +203,6 @@ static void nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl)
 	nvme_put_ctrl(ctrl);
 }
 
-static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
-{
-	return ns->pi_type && ns->ms == sizeof(struct t10_pi_tuple);
-}
-
 static blk_status_t nvme_error_status(u16 status)
 {
 	switch (status & 0x7ff) {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 0dda145..0f5fa85 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -16,6 +16,7 @@
 #include <linux/fault-inject.h>
 #include <linux/rcupdate.h>
 #include <linux/wait.h>
+#include <linux/t10-pi.h>
 
 #include <trace/events/block.h>
 
@@ -399,6 +400,12 @@ struct nvme_ns {
 
 };
 
+/* NVMe ns supports metadata actions by the controller (generate/strip) */
+static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
+{
+	return ns->pi_type && ns->ms == sizeof(struct t10_pi_tuple);
+}
+
 struct nvme_ctrl_ops {
 	const char *name;
 	struct module *module;
-- 
1.8.3.1


* [PATCH 05/16] nvme: introduce max_integrity_segments ctrl attribute
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (3 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 04/16] nvme: make nvme_ns_has_pi accessible to transports Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-05 23:51   ` James Smart
  2020-05-13 19:04   ` James Smart
  2020-05-04 15:57 ` [PATCH 06/16] nvme: enforce extended LBA format for fabrics metadata Max Gurtovoy
                   ` (11 subsequent siblings)
  16 siblings, 2 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

This patch doesn't change any logic; it is a preparation for adding PI
support to fabrics drivers that will use an extended LBA format for
metadata and will support more than one integrity segment.
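
For illustration only (an editor's sketch, not part of the patch): the
attribute is programmed by each transport and then handed to the block
layer when integrity is registered, so the stack never builds a request
with more metadata segments than the transport can map.

    /* Producer: each transport advertises its capability (PCIe keeps 1,
     * a single separate metadata pointer; fabrics transports may set more).
     */
    dev->ctrl.max_integrity_segments = 1;

    /* Consumer: the limit is handed to the block layer when integrity
     * is registered for the namespace.
     */
    blk_queue_max_integrity_segments(disk->queue,
                                     ns->ctrl->max_integrity_segments);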

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 11 +++++++----
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  |  6 ++++++
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 94d2549..89b1bfcb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1690,7 +1690,8 @@ static int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
+static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+				u32 max_integrity_segments)
 {
 	struct blk_integrity integrity;
 
@@ -1713,10 +1714,11 @@ static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
 	}
 	integrity.tuple_size = ms;
 	blk_integrity_register(disk, &integrity);
-	blk_queue_max_integrity_segments(disk->queue, 1);
+	blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
 }
 #else
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
+static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+				u32 max_integrity_segments)
 {
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
@@ -1892,7 +1894,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	if (ns->ms) {
 		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
 		    (ns->features & NVME_NS_METADATA_SUPPORTED))
-			nvme_init_integrity(disk, ns->ms, ns->pi_type);
+			nvme_init_integrity(disk, ns->ms, ns->pi_type,
+					    ns->ctrl->max_integrity_segments);
 		else if (!nvme_ns_has_pi(ns))
 			capacity = 0;
 	}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 0f5fa85..6c7e76c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -229,6 +229,7 @@ struct nvme_ctrl {
 	u32 page_size;
 	u32 max_hw_sectors;
 	u32 max_segments;
+	u32 max_integrity_segments;
 	u16 crdt[3];
 	u16 oncs;
 	u16 oacs;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index eef5d6a..a9b124a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2539,6 +2539,12 @@ static void nvme_reset_work(struct work_struct *work)
 		goto out;
 	}
 
+	/*
+	 * We do not support an SGL for metadata (yet), so we are limited to a
+	 * single integrity segment for the separate metadata pointer.
+	 */
+	dev->ctrl.max_integrity_segments = 1;
+
 	result = nvme_init_identify(&dev->ctrl);
 	if (result)
 		goto out;
-- 
1.8.3.1


* [PATCH 06/16] nvme: enforce extended LBA format for fabrics metadata
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (4 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 05/16] nvme: introduce max_integrity_segments ctrl attribute Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-13 19:03   ` James Smart
  2020-05-04 15:57 ` [PATCH 07/16] nvme: introduce NVME_INLINE_METADATA_SG_CNT Max Gurtovoy
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

An extended LBA is a larger LBA that is created when metadata associated
with the LBA is transferred contiguously with the LBA data (AKA
interleaved). The metadata may be transferred either as part of the LBA
(creating an extended LBA) or as a separate contiguous buffer of data.
According to the NVMeoF spec, a fabrics controller supports only the
extended LBA format, so fail revalidation in case of a spec violation.
Also set a flag for capable transports and controllers, as part of a
preparation for allowing end-to-end protection information on fabrics
controllers.
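
As an editorial aside, the policy can be condensed as follows (a sketch
using the flags introduced earlier in the series, not the literal hunk):

    if (ns->ms) {
            if (id->flbas & NVME_NS_FLBAS_META_EXT) {
                    /* interleaved metadata: the only layout allowed on fabrics */
                    ns->features |= NVME_NS_EXT_LBAS;
                    if ((ctrl->ops->flags & NVME_F_FABRICS) &&
                        (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) &&
                        ctrl->max_integrity_segments)
                            ns->features |= NVME_NS_METADATA_SUPPORTED;
            } else {
                    /* separate metadata buffer: a spec violation on fabrics */
                    if (WARN_ON_ONCE(ctrl->ops->flags & NVME_F_FABRICS))
                            return -EINVAL;
                    if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
                            ns->features |= NVME_NS_METADATA_SUPPORTED;
            }
    }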

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 89b1bfcb..f565281 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1913,9 +1913,10 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	blk_mq_unfreeze_queue(disk->queue);
 }
 
-static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
+static int __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 {
 	struct nvme_ns *ns = disk->private_data;
+	struct nvme_ctrl *ctrl = ns->ctrl;
 	u32 iob;
 
 	/*
@@ -1926,9 +1927,9 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	if (ns->lba_shift == 0)
 		ns->lba_shift = 9;
 
-	if ((ns->ctrl->quirks & NVME_QUIRK_STRIPE_SIZE) &&
-	    is_power_of_2(ns->ctrl->max_hw_sectors))
-		iob = ns->ctrl->max_hw_sectors;
+	if ((ctrl->quirks & NVME_QUIRK_STRIPE_SIZE) &&
+	    is_power_of_2(ctrl->max_hw_sectors))
+		iob = ctrl->max_hw_sectors;
 	else
 		iob = nvme_lba_to_sect(ns, le16_to_cpu(id->noiob));
 
@@ -1941,16 +1942,24 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 		ns->pi_type = 0;
 
 	if (ns->ms) {
-		if (id->flbas & NVME_NS_FLBAS_META_EXT)
-			ns->features |= NVME_NS_EXT_LBAS;
-
 		/*
-		 * For PCI, Extended logical block will be generated by the
-		 * controller. Non-extended format can be generated by the
-		 * block layer.
+		 * For PCIe only the separate metadata pointer is supported,
+		 * as the block layer supplies metadata in a separate bio_vec
+		 * chain. For Fabrics, only metadata as part of extended data
+		 * LBA is supported on the wire per the Fabrics specification,
+		 * but the HBA/HCA will do the remapping from the separate
+		 * metadata buffers for us.
 		 */
-		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
-			if (!(ns->features & NVME_NS_EXT_LBAS))
+		if (id->flbas & NVME_NS_FLBAS_META_EXT) {
+			ns->features |= NVME_NS_EXT_LBAS;
+			if ((ctrl->ops->flags & NVME_F_FABRICS) &&
+			    (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) &&
+			    ctrl->max_integrity_segments)
+				ns->features |= NVME_NS_METADATA_SUPPORTED;
+		} else {
+			if (WARN_ON_ONCE(ctrl->ops->flags & NVME_F_FABRICS))
+				return -EINVAL;
+			if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
 				ns->features |= NVME_NS_METADATA_SUPPORTED;
 		}
 	}
@@ -1965,6 +1974,7 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 		revalidate_disk(ns->head->disk);
 	}
 #endif
+	return 0;
 }
 
 static int nvme_revalidate_disk(struct gendisk *disk)
@@ -2000,7 +2010,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 		goto free_id;
 	}
 
-	__nvme_revalidate_disk(disk, id);
+	ret = __nvme_revalidate_disk(disk, id);
 free_id:
 	kfree(id);
 out:
@@ -3654,7 +3664,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	memcpy(disk->disk_name, disk_name, DISK_NAME_LEN);
 	ns->disk = disk;
 
-	__nvme_revalidate_disk(disk, id);
+	if (__nvme_revalidate_disk(disk, id))
+		goto out_free_disk;
 
 	if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1) {
 		ret = nvme_nvm_register(ns, disk_name, node);
@@ -3679,6 +3690,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	return;
  out_put_disk:
 	put_disk(ns->disk);
+ out_free_disk:
+	del_gendisk(ns->disk);
  out_unlink_ns:
 	mutex_lock(&ctrl->subsys->lock);
 	list_del_rcu(&ns->siblings);
-- 
1.8.3.1


* [PATCH 07/16] nvme: introduce NVME_INLINE_METADATA_SG_CNT
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (5 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 06/16] nvme: enforce extended LBA format for fabrics metadata Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-13 19:05   ` James Smart
  2020-05-04 15:57 ` [PATCH 08/16] nvme-rdma: introduce nvme_rdma_sgl structure Max Gurtovoy
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

From: Israel Rukshin <israelr@mellanox.com>

The SGL size of the metadata is usually small, so one inline SG entry
should cover most cases. The macro will be used to pre-allocate a single
SGL entry for metadata. The preallocation of small inline SGLs depends on
the SG_CHAIN capability, so if the arch doesn't support SG_CHAIN, use
runtime allocation for the SGL. This patch is a preparation for adding
metadata (T10-PI) over fabrics support.
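
For context, a sketch of how such an inline entry is meant to be used
(field names follow the nvme-rdma patches later in the series, and error
handling is trimmed):

    /* Point the table at the storage embedded in the request PDU, then
     * let sg_alloc_table_chained() chain extra chunks only when the
     * request needs more than NVME_INLINE_METADATA_SG_CNT entries.
     */
    req->metadata_sgl->sg_table.sgl =
            (struct scatterlist *)(req->metadata_sgl + 1);
    ret = sg_alloc_table_chained(&req->metadata_sgl->sg_table,
            blk_rq_count_integrity_sg(rq->q, rq->bio),
            req->metadata_sgl->sg_table.sgl,    /* inline first chunk */
            NVME_INLINE_METADATA_SG_CNT);
    if (ret)
            return -ENOMEM;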

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/nvme.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 6c7e76c..7c2e6d4 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -31,8 +31,10 @@
 
 #ifdef CONFIG_ARCH_NO_SG_CHAIN
 #define  NVME_INLINE_SG_CNT  0
+#define  NVME_INLINE_METADATA_SG_CNT  0
 #else
 #define  NVME_INLINE_SG_CNT  2
+#define  NVME_INLINE_METADATA_SG_CNT  1
 #endif
 
 extern struct workqueue_struct *nvme_wq;
-- 
1.8.3.1


* [PATCH 08/16] nvme-rdma: introduce nvme_rdma_sgl structure
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (6 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 07/16] nvme: introduce NVME_INLINE_METADATA_SG_CNT Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-04 15:57 ` [PATCH 09/16] nvme-rdma: add metadata/T10-PI support Max Gurtovoy
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

From: Israel Rukshin <israelr@mellanox.com>

Remove the first_sgl pointer from struct nvme_rdma_request and use pointer
arithmetic instead. The inline scatterlist, if it exists, will be located
right after the nvme_rdma_request. This patch is needed as a preparation
for adding PI support.
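
As an editorial note, the per-request layout this enables (together with
the metadata SGL added later in the series) is roughly:

    /*
     * Request PDU layout sketch (sizes come from the blk-mq cmd_size):
     *
     *   struct nvme_rdma_request
     *   struct scatterlist[NVME_INLINE_SG_CNT]            data inline SGL
     *   struct nvme_rdma_sgl                              metadata SGL (PI queues only)
     *   struct scatterlist[NVME_INLINE_METADATA_SG_CNT]   metadata inline SGL
     */
    req->data_sgl.sg_table.sgl = (struct scatterlist *)(req + 1);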

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/rdma.c | 41 ++++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index cac8a93..4086874 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -48,6 +48,11 @@ struct nvme_rdma_qe {
 	u64			dma;
 };
 
+struct nvme_rdma_sgl {
+	int			nents;
+	struct sg_table		sg_table;
+};
+
 struct nvme_rdma_queue;
 struct nvme_rdma_request {
 	struct nvme_request	req;
@@ -58,12 +63,10 @@ struct nvme_rdma_request {
 	refcount_t		ref;
 	struct ib_sge		sge[1 + NVME_RDMA_MAX_INLINE_SEGMENTS];
 	u32			num_sge;
-	int			nents;
 	struct ib_reg_wr	reg_wr;
 	struct ib_cqe		reg_cqe;
 	struct nvme_rdma_queue  *queue;
-	struct sg_table		sg_table;
-	struct scatterlist	first_sgl[];
+	struct nvme_rdma_sgl	data_sgl;
 };
 
 enum nvme_rdma_queue_flags {
@@ -1158,8 +1161,9 @@ static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
 		req->mr = NULL;
 	}
 
-	ib_dma_unmap_sg(ibdev, req->sg_table.sgl, req->nents, rq_dma_dir(rq));
-	sg_free_table_chained(&req->sg_table, NVME_INLINE_SG_CNT);
+	ib_dma_unmap_sg(ibdev, req->data_sgl.sg_table.sgl, req->data_sgl.nents,
+			rq_dma_dir(rq));
+	sg_free_table_chained(&req->data_sgl.sg_table, NVME_INLINE_SG_CNT);
 }
 
 static int nvme_rdma_set_sg_null(struct nvme_command *c)
@@ -1178,7 +1182,7 @@ static int nvme_rdma_map_sg_inline(struct nvme_rdma_queue *queue,
 		int count)
 {
 	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
-	struct scatterlist *sgl = req->sg_table.sgl;
+	struct scatterlist *sgl = req->data_sgl.sg_table.sgl;
 	struct ib_sge *sge = &req->sge[1];
 	u32 len = 0;
 	int i;
@@ -1203,8 +1207,8 @@ static int nvme_rdma_map_sg_single(struct nvme_rdma_queue *queue,
 {
 	struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
 
-	sg->addr = cpu_to_le64(sg_dma_address(req->sg_table.sgl));
-	put_unaligned_le24(sg_dma_len(req->sg_table.sgl), sg->length);
+	sg->addr = cpu_to_le64(sg_dma_address(req->data_sgl.sg_table.sgl));
+	put_unaligned_le24(sg_dma_len(req->data_sgl.sg_table.sgl), sg->length);
 	put_unaligned_le32(queue->device->pd->unsafe_global_rkey, sg->key);
 	sg->type = NVME_KEY_SGL_FMT_DATA_DESC << 4;
 	return 0;
@@ -1225,7 +1229,8 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 	 * Align the MR to a 4K page size to match the ctrl page size and
 	 * the block virtual boundary.
 	 */
-	nr = ib_map_mr_sg(req->mr, req->sg_table.sgl, count, NULL, SZ_4K);
+	nr = ib_map_mr_sg(req->mr, req->data_sgl.sg_table.sgl, count, NULL,
+			  SZ_4K);
 	if (unlikely(nr < count)) {
 		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);
 		req->mr = NULL;
@@ -1272,17 +1277,18 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 	if (!blk_rq_nr_phys_segments(rq))
 		return nvme_rdma_set_sg_null(c);
 
-	req->sg_table.sgl = req->first_sgl;
-	ret = sg_alloc_table_chained(&req->sg_table,
-			blk_rq_nr_phys_segments(rq), req->sg_table.sgl,
+	req->data_sgl.sg_table.sgl = (struct scatterlist *)(req + 1);
+	ret = sg_alloc_table_chained(&req->data_sgl.sg_table,
+			blk_rq_nr_phys_segments(rq), req->data_sgl.sg_table.sgl,
 			NVME_INLINE_SG_CNT);
 	if (ret)
 		return -ENOMEM;
 
-	req->nents = blk_rq_map_sg(rq->q, rq, req->sg_table.sgl);
+	req->data_sgl.nents = blk_rq_map_sg(rq->q, rq,
+					    req->data_sgl.sg_table.sgl);
 
-	count = ib_dma_map_sg(ibdev, req->sg_table.sgl, req->nents,
-			      rq_dma_dir(rq));
+	count = ib_dma_map_sg(ibdev, req->data_sgl.sg_table.sgl,
+			      req->data_sgl.nents, rq_dma_dir(rq));
 	if (unlikely(count <= 0)) {
 		ret = -EIO;
 		goto out_free_table;
@@ -1311,9 +1317,10 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 	return 0;
 
 out_unmap_sg:
-	ib_dma_unmap_sg(ibdev, req->sg_table.sgl, req->nents, rq_dma_dir(rq));
+	ib_dma_unmap_sg(ibdev, req->data_sgl.sg_table.sgl, req->data_sgl.nents,
+			rq_dma_dir(rq));
 out_free_table:
-	sg_free_table_chained(&req->sg_table, NVME_INLINE_SG_CNT);
+	sg_free_table_chained(&req->data_sgl.sg_table, NVME_INLINE_SG_CNT);
 	return ret;
 }
 
-- 
1.8.3.1


* [PATCH 09/16] nvme-rdma: add metadata/T10-PI support
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (7 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 08/16] nvme-rdma: introduce nvme_rdma_sgl structure Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-05  6:12   ` Christoph Hellwig
  2020-05-14  3:02   ` Martin K. Petersen
  2020-05-04 15:57 ` [PATCH 10/16] nvmet: add metadata characteristics for a namespace Max Gurtovoy
                   ` (7 subsequent siblings)
  16 siblings, 2 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

For capable HCAs (e.g. ConnectX-5/ConnectX-6) this will allow end-to-end
protection information passthrough and validation for the NVMe over RDMA
transport. Metadata offload support is implemented over the new RDMA
signature verbs API and is enabled for capable controllers.
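
For readers less familiar with the signature verbs API, a condensed
per-I/O sketch of the flow this patch implements (error handling and WR
chaining omitted; the hunks below contain the real code):

    /* 1. Take an IB_MR_TYPE_INTEGRITY MR from the queue's dedicated pool. */
    req->mr = ib_mr_pool_get(queue->qp, &queue->qp->sig_mrs);

    /* 2. Map the data and metadata scatterlists onto the MR in one call. */
    nr = ib_map_mr_sg_pi(req->mr, req->data_sgl.sg_table.sgl, count, NULL,
                         req->metadata_sgl->sg_table.sgl, pi_count, NULL,
                         SZ_4K);

    /* 3. Describe the wire/memory T10-DIF domains and the checks to run,
     *    post an IB_WR_REG_MR_INTEGRITY work request, and reference the
     *    MR from the command's keyed SGL descriptor.
     */

    /* 4. On completion, ask the HCA whether a guard/reftag/apptag check
     *    failed and map the failure to the matching NVMe status code.
     */
    ret = ib_check_mr_status(req->mr, IB_MR_CHECK_SIG_STATUS, &mr_status);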

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
---
 drivers/nvme/host/rdma.c | 272 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 262 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 4086874..05dbed3 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -34,6 +34,11 @@
 
 #define NVME_RDMA_MAX_INLINE_SEGMENTS	4
 
+#define NVME_RDMA_DATA_SGL_SIZE \
+	(sizeof(struct scatterlist) * NVME_INLINE_SG_CNT)
+#define NVME_RDMA_METADATA_SGL_SIZE \
+	(sizeof(struct scatterlist) * NVME_INLINE_METADATA_SG_CNT)
+
 struct nvme_rdma_device {
 	struct ib_device	*dev;
 	struct ib_pd		*pd;
@@ -67,6 +72,8 @@ struct nvme_rdma_request {
 	struct ib_cqe		reg_cqe;
 	struct nvme_rdma_queue  *queue;
 	struct nvme_rdma_sgl	data_sgl;
+	struct nvme_rdma_sgl	*metadata_sgl;
+	bool			use_sig_mr;
 };
 
 enum nvme_rdma_queue_flags {
@@ -88,6 +95,7 @@ struct nvme_rdma_queue {
 	struct rdma_cm_id	*cm_id;
 	int			cm_error;
 	struct completion	cm_done;
+	bool			pi_support;
 };
 
 struct nvme_rdma_ctrl {
@@ -264,6 +272,8 @@ static int nvme_rdma_create_qp(struct nvme_rdma_queue *queue, const int factor)
 	init_attr.qp_type = IB_QPT_RC;
 	init_attr.send_cq = queue->ib_cq;
 	init_attr.recv_cq = queue->ib_cq;
+	if (queue->pi_support)
+		init_attr.create_flags |= IB_QP_CREATE_INTEGRITY_EN;
 
 	ret = rdma_create_qp(queue->cm_id, dev->pd, &init_attr);
 
@@ -293,6 +303,12 @@ static int nvme_rdma_init_request(struct blk_mq_tag_set *set,
 	if (!req->sqe.data)
 		return -ENOMEM;
 
+	/* metadata nvme_rdma_sgl struct is located after command's data SGL */
+	if (queue->pi_support)
+		req->metadata_sgl = (void *)nvme_req(rq) +
+			sizeof(struct nvme_rdma_request) +
+			NVME_RDMA_DATA_SGL_SIZE;
+
 	req->queue = queue;
 
 	return 0;
@@ -403,6 +419,8 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 	dev = queue->device;
 	ibdev = dev->dev;
 
+	if (queue->pi_support)
+		ib_mr_pool_destroy(queue->qp, &queue->qp->sig_mrs);
 	ib_mr_pool_destroy(queue->qp, &queue->qp->rdma_mrs);
 
 	/*
@@ -419,10 +437,16 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 	nvme_rdma_dev_put(dev);
 }
 
-static int nvme_rdma_get_max_fr_pages(struct ib_device *ibdev)
+static int nvme_rdma_get_max_fr_pages(struct ib_device *ibdev, bool pi_support)
 {
-	return min_t(u32, NVME_RDMA_MAX_SEGMENTS,
-		     ibdev->attrs.max_fast_reg_page_list_len - 1);
+	u32 max_page_list_len;
+
+	if (pi_support)
+		max_page_list_len = ibdev->attrs.max_pi_fast_reg_page_list_len;
+	else
+		max_page_list_len = ibdev->attrs.max_fast_reg_page_list_len;
+
+	return min_t(u32, NVME_RDMA_MAX_SEGMENTS, max_page_list_len - 1);
 }
 
 static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
@@ -479,7 +503,7 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 	 * misaligned we'll end up using two entries for a single data page,
 	 * so one additional entry is required.
 	 */
-	pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev) + 1;
+	pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev, queue->pi_support) + 1;
 	ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs,
 			      queue->queue_size,
 			      IB_MR_TYPE_MEM_REG,
@@ -491,10 +515,24 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 		goto out_destroy_ring;
 	}
 
+	if (queue->pi_support) {
+		ret = ib_mr_pool_init(queue->qp, &queue->qp->sig_mrs,
+				      queue->queue_size, IB_MR_TYPE_INTEGRITY,
+				      pages_per_mr, pages_per_mr);
+		if (ret) {
+			dev_err(queue->ctrl->ctrl.device,
+				"failed to initialize PI MR pool sized %d for QID %d\n",
+				queue->queue_size, idx);
+			goto out_destroy_mr_pool;
+		}
+	}
+
 	set_bit(NVME_RDMA_Q_TR_READY, &queue->flags);
 
 	return 0;
 
+out_destroy_mr_pool:
+	ib_mr_pool_destroy(queue->qp, &queue->qp->rdma_mrs);
 out_destroy_ring:
 	nvme_rdma_free_ring(ibdev, queue->rsp_ring, queue->queue_size,
 			    sizeof(struct nvme_completion), DMA_FROM_DEVICE);
@@ -516,6 +554,10 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
 
 	queue = &ctrl->queues[idx];
 	queue->ctrl = ctrl;
+	if (idx && ctrl->ctrl.max_integrity_segments)
+		queue->pi_support = true;
+	else
+		queue->pi_support = false;
 	init_completion(&queue->cm_done);
 
 	if (idx > 0)
@@ -726,7 +768,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->reserved_tags = 2; /* connect + keep-alive */
 		set->numa_node = nctrl->numa_node;
 		set->cmd_size = sizeof(struct nvme_rdma_request) +
-			NVME_INLINE_SG_CNT * sizeof(struct scatterlist);
+				NVME_RDMA_DATA_SGL_SIZE;
 		set->driver_data = ctrl;
 		set->nr_hw_queues = 1;
 		set->timeout = ADMIN_TIMEOUT;
@@ -740,7 +782,10 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->numa_node = nctrl->numa_node;
 		set->flags = BLK_MQ_F_SHOULD_MERGE;
 		set->cmd_size = sizeof(struct nvme_rdma_request) +
-			NVME_INLINE_SG_CNT * sizeof(struct scatterlist);
+				NVME_RDMA_DATA_SGL_SIZE;
+		if (nctrl->max_integrity_segments)
+			set->cmd_size += sizeof(struct nvme_rdma_sgl) +
+					 NVME_RDMA_METADATA_SGL_SIZE;
 		set->driver_data = ctrl;
 		set->nr_hw_queues = nctrl->queue_count - 1;
 		set->timeout = NVME_IO_TIMEOUT;
@@ -773,6 +818,7 @@ static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
 static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 		bool new)
 {
+	bool pi_capable = false;
 	int error;
 
 	error = nvme_rdma_alloc_queue(ctrl, 0, NVME_AQ_DEPTH);
@@ -782,7 +828,13 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 	ctrl->device = ctrl->queues[0].device;
 	ctrl->ctrl.numa_node = dev_to_node(ctrl->device->dev->dma_device);
 
-	ctrl->max_fr_pages = nvme_rdma_get_max_fr_pages(ctrl->device->dev);
+	/* T10-PI support */
+	if (ctrl->device->dev->attrs.device_cap_flags &
+	    IB_DEVICE_INTEGRITY_HANDOVER)
+		pi_capable = true;
+
+	ctrl->max_fr_pages = nvme_rdma_get_max_fr_pages(ctrl->device->dev,
+							pi_capable);
 
 	/*
 	 * Bind the async event SQE DMA mapping to the admin queue lifetime.
@@ -824,6 +876,10 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 
 	ctrl->ctrl.max_segments = ctrl->max_fr_pages;
 	ctrl->ctrl.max_hw_sectors = ctrl->max_fr_pages << (ilog2(SZ_4K) - 9);
+	if (pi_capable)
+		ctrl->ctrl.max_integrity_segments = ctrl->max_fr_pages;
+	else
+		ctrl->ctrl.max_integrity_segments = 0;
 
 	blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
 
@@ -1152,12 +1208,23 @@ static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;
+	struct list_head *pool = &queue->qp->rdma_mrs;
 
 	if (!blk_rq_nr_phys_segments(rq))
 		return;
 
+	if (blk_integrity_rq(rq)) {
+		ib_dma_unmap_sg(ibdev, req->metadata_sgl->sg_table.sgl,
+				req->metadata_sgl->nents, rq_dma_dir(rq));
+		sg_free_table_chained(&req->metadata_sgl->sg_table,
+				      NVME_INLINE_METADATA_SG_CNT);
+	}
+
+	if (req->use_sig_mr)
+		pool = &queue->qp->sig_mrs;
+
 	if (req->mr) {
-		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);
+		ib_mr_pool_put(queue->qp, pool, req->mr);
 		req->mr = NULL;
 	}
 
@@ -1261,12 +1328,117 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 	return 0;
 }
 
+static void nvme_rdma_set_sig_domain(struct blk_integrity *bi,
+		struct nvme_command *cmd, struct ib_sig_domain *domain,
+		u16 control)
+{
+	domain->sig_type = IB_SIG_TYPE_T10_DIF;
+	domain->sig.dif.bg_type = IB_T10DIF_CRC;
+	domain->sig.dif.pi_interval = 1 << bi->interval_exp;
+	domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
+	domain->sig.dif.apptag_check_mask = 0xffff;
+	domain->sig.dif.app_escape = true;
+	domain->sig.dif.ref_escape = true;
+	if (control & NVME_RW_PRINFO_PRCHK_REF)
+		domain->sig.dif.ref_remap = true;
+}
+
+static void nvme_rdma_set_sig_attrs(struct blk_integrity *bi,
+		struct nvme_command *cmd, struct ib_sig_attrs *sig_attrs)
+{
+	u16 control = le16_to_cpu(cmd->rw.control);
+
+	memset(sig_attrs, 0, sizeof(*sig_attrs));
+	if (control & NVME_RW_PRINFO_PRACT) {
+		/* for WRITE_INSERT/READ_STRIP no memory domain */
+		sig_attrs->mem.sig_type = IB_SIG_TYPE_NONE;
+		nvme_rdma_set_sig_domain(bi, cmd, &sig_attrs->wire, control);
+		/* Clear the PRACT bit since HCA will generate/verify the PI */
+		control &= ~NVME_RW_PRINFO_PRACT;
+		cmd->rw.control = cpu_to_le16(control);
+	} else {
+		/* for WRITE_PASS/READ_PASS both wire/memory domains exist */
+		nvme_rdma_set_sig_domain(bi, cmd, &sig_attrs->wire, control);
+		nvme_rdma_set_sig_domain(bi, cmd, &sig_attrs->mem, control);
+	}
+}
+
+static void nvme_rdma_set_prot_checks(struct nvme_command *cmd, u8 *mask)
+{
+	*mask = 0;
+	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_REF)
+		*mask |= IB_SIG_CHECK_REFTAG;
+	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_GUARD)
+		*mask |= IB_SIG_CHECK_GUARD;
+}
+
+static void nvme_rdma_sig_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+	if (unlikely(wc->status != IB_WC_SUCCESS))
+		nvme_rdma_wr_error(cq, wc, "SIG");
+}
+
+static int nvme_rdma_map_sg_pi(struct nvme_rdma_queue *queue,
+		struct nvme_rdma_request *req, struct nvme_command *c,
+		int count, int pi_count)
+{
+	struct nvme_rdma_sgl *sgl = &req->data_sgl;
+	struct ib_reg_wr *wr = &req->reg_wr;
+	struct request *rq = blk_mq_rq_from_pdu(req);
+	struct bio *bio = rq->bio;
+	struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
+	int nr;
+
+	req->mr = ib_mr_pool_get(queue->qp, &queue->qp->sig_mrs);
+	if (WARN_ON_ONCE(!req->mr))
+		return -EAGAIN;
+
+	nr = ib_map_mr_sg_pi(req->mr, sgl->sg_table.sgl, count, NULL,
+			     req->metadata_sgl->sg_table.sgl, pi_count, NULL,
+			     SZ_4K);
+	if (unlikely(nr))
+		goto mr_put;
+
+	nvme_rdma_set_sig_attrs(blk_get_integrity(bio->bi_disk), c,
+				req->mr->sig_attrs);
+	nvme_rdma_set_prot_checks(c, &req->mr->sig_attrs->check_mask);
+
+	ib_update_fast_reg_key(req->mr, ib_inc_rkey(req->mr->rkey));
+
+	req->reg_cqe.done = nvme_rdma_sig_done;
+	memset(wr, 0, sizeof(*wr));
+	wr->wr.opcode = IB_WR_REG_MR_INTEGRITY;
+	wr->wr.wr_cqe = &req->reg_cqe;
+	wr->wr.num_sge = 0;
+	wr->wr.send_flags = 0;
+	wr->mr = req->mr;
+	wr->key = req->mr->rkey;
+	wr->access = IB_ACCESS_LOCAL_WRITE |
+		     IB_ACCESS_REMOTE_READ |
+		     IB_ACCESS_REMOTE_WRITE;
+
+	sg->addr = cpu_to_le64(req->mr->iova);
+	put_unaligned_le24(req->mr->length, sg->length);
+	put_unaligned_le32(req->mr->rkey, sg->key);
+	sg->type = NVME_KEY_SGL_FMT_DATA_DESC << 4;
+
+	return 0;
+
+mr_put:
+	ib_mr_pool_put(queue->qp, &queue->qp->sig_mrs, req->mr);
+	req->mr = NULL;
+	if (nr < 0)
+		return nr;
+	return -EINVAL;
+}
+
 static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 		struct request *rq, struct nvme_command *c)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;
+	int pi_count = 0;
 	int count, ret;
 
 	req->num_sge = 1;
@@ -1294,6 +1466,35 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 		goto out_free_table;
 	}
 
+	if (blk_integrity_rq(rq)) {
+		req->metadata_sgl->sg_table.sgl =
+			(struct scatterlist *)(req->metadata_sgl + 1);
+		ret = sg_alloc_table_chained(&req->metadata_sgl->sg_table,
+				blk_rq_count_integrity_sg(rq->q, rq->bio),
+				req->metadata_sgl->sg_table.sgl,
+				NVME_INLINE_METADATA_SG_CNT);
+		if (unlikely(ret)) {
+			ret = -ENOMEM;
+			goto out_unmap_sg;
+		}
+
+		req->metadata_sgl->nents = blk_rq_map_integrity_sg(rq->q,
+				rq->bio, req->metadata_sgl->sg_table.sgl);
+		pi_count = ib_dma_map_sg(ibdev,
+					 req->metadata_sgl->sg_table.sgl,
+					 req->metadata_sgl->nents,
+					 rq_dma_dir(rq));
+		if (unlikely(pi_count <= 0)) {
+			ret = -EIO;
+			goto out_free_pi_table;
+		}
+	}
+
+	if (req->use_sig_mr) {
+		ret = nvme_rdma_map_sg_pi(queue, req, c, count, pi_count);
+		goto out;
+	}
+
 	if (count <= dev->num_inline_segments) {
 		if (rq_data_dir(rq) == WRITE && nvme_rdma_queue_idx(queue) &&
 		    queue->ctrl->use_inline_data &&
@@ -1312,10 +1513,18 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 	ret = nvme_rdma_map_sg_fr(queue, req, c, count);
 out:
 	if (unlikely(ret))
-		goto out_unmap_sg;
+		goto out_unmap_pi_sg;
 
 	return 0;
 
+out_unmap_pi_sg:
+	if (blk_integrity_rq(rq))
+		ib_dma_unmap_sg(ibdev, req->metadata_sgl->sg_table.sgl,
+				req->metadata_sgl->nents, rq_dma_dir(rq));
+out_free_pi_table:
+	if (blk_integrity_rq(rq))
+		sg_free_table_chained(&req->metadata_sgl->sg_table,
+				      NVME_INLINE_METADATA_SG_CNT);
 out_unmap_sg:
 	ib_dma_unmap_sg(ibdev, req->data_sgl.sg_table.sgl, req->data_sgl.nents,
 			rq_dma_dir(rq));
@@ -1768,6 +1977,15 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(rq);
 
+	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
+	    queue->pi_support &&
+	    (c->common.opcode == nvme_cmd_write ||
+	     c->common.opcode == nvme_cmd_read) &&
+	    nvme_ns_has_pi(ns))
+		req->use_sig_mr = true;
+	else
+		req->use_sig_mr = false;
+
 	err = nvme_rdma_map_data(queue, rq, c);
 	if (unlikely(err < 0)) {
 		dev_err(queue->ctrl->ctrl.device,
@@ -1808,12 +2026,46 @@ static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx)
 	return ib_process_cq_direct(queue->ib_cq, -1);
 }
 
+static void nvme_rdma_check_pi_status(struct nvme_rdma_request *req)
+{
+	struct request *rq = blk_mq_rq_from_pdu(req);
+	struct ib_mr_status mr_status;
+	int ret;
+
+	ret = ib_check_mr_status(req->mr, IB_MR_CHECK_SIG_STATUS, &mr_status);
+	if (ret) {
+		pr_err("ib_check_mr_status failed, ret %d\n", ret);
+		nvme_req(rq)->status = NVME_SC_INVALID_PI;
+		return;
+	}
+
+	if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS) {
+		switch (mr_status.sig_err.err_type) {
+		case IB_SIG_BAD_GUARD:
+			nvme_req(rq)->status = NVME_SC_GUARD_CHECK;
+			break;
+		case IB_SIG_BAD_REFTAG:
+			nvme_req(rq)->status = NVME_SC_REFTAG_CHECK;
+			break;
+		case IB_SIG_BAD_APPTAG:
+			nvme_req(rq)->status = NVME_SC_APPTAG_CHECK;
+			break;
+		}
+		pr_err("PI error found type %d expected 0x%x vs actual 0x%x\n",
+		       mr_status.sig_err.err_type, mr_status.sig_err.expected,
+		       mr_status.sig_err.actual);
+	}
+}
+
 static void nvme_rdma_complete_rq(struct request *rq)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_rdma_queue *queue = req->queue;
 	struct ib_device *ibdev = queue->device->dev;
 
+	if (req->use_sig_mr)
+		nvme_rdma_check_pi_status(req);
+
 	nvme_rdma_unmap_data(queue, rq);
 	ib_dma_unmap_single(ibdev, req->sqe.dma, sizeof(struct nvme_command),
 			    DMA_TO_DEVICE);
@@ -1933,7 +2185,7 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
 static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
 	.name			= "rdma",
 	.module			= THIS_MODULE,
-	.flags			= NVME_F_FABRICS,
+	.flags			= NVME_F_FABRICS | NVME_F_METADATA_SUPPORTED,
 	.reg_read32		= nvmf_reg_read32,
 	.reg_read64		= nvmf_reg_read64,
 	.reg_write32		= nvmf_reg_write32,
-- 
1.8.3.1


* [PATCH 10/16] nvmet: add metadata characteristics for a namespace
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (8 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 09/16] nvme-rdma: add metadata/T10-PI support Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-13 19:25   ` James Smart
  2020-05-04 15:57 ` [PATCH 11/16] nvmet: rename nvmet_rw_len to nvmet_rw_data_len Max Gurtovoy
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

From: Israel Rukshin <israelr@mellanox.com>

Fill the new namespace fields (pi_type and metadata_size) from the block
device integrity profile, as a preparation for adding metadata (T10-PI)
over fabrics support with block devices.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/target/io-cmd-bdev.c | 22 ++++++++++++++++++++++
 drivers/nvme/target/nvmet.h       |  3 +++
 2 files changed, 25 insertions(+)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 0427e04..45dc644 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -47,6 +47,22 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
 	id->nows = to0based(ql->io_opt / ql->logical_block_size);
 }
 
+static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
+{
+	struct blk_integrity *bi = bdev_get_integrity(ns->bdev);
+
+	if (bi) {
+		ns->metadata_size = bi->tuple_size;
+		if (bi->profile == &t10_pi_type1_crc)
+			ns->pi_type = NVME_NS_DPS_PI_TYPE1;
+		else if (bi->profile == &t10_pi_type3_crc)
+			ns->pi_type = NVME_NS_DPS_PI_TYPE3;
+		else
+			/* Unsupported metadata type */
+			ns->metadata_size = 0;
+	}
+}
+
 int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 {
 	int ret;
@@ -64,6 +80,12 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 	}
 	ns->size = i_size_read(ns->bdev->bd_inode);
 	ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
+
+	ns->pi_type = 0;
+	ns->metadata_size = 0;
+	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY))
+		nvmet_bdev_ns_enable_integrity(ns);
+
 	return 0;
 }
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 3d981eb..b9fb57b 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -19,6 +19,7 @@
 #include <linux/rcupdate.h>
 #include <linux/blkdev.h>
 #include <linux/radix-tree.h>
+#include <linux/t10-pi.h>
 
 #define NVMET_ASYNC_EVENTS		4
 #define NVMET_ERROR_LOG_SLOTS		128
@@ -77,6 +78,8 @@ struct nvmet_ns {
 
 	int			use_p2pmem;
 	struct pci_dev		*p2p_dev;
+	int			pi_type;
+	int			metadata_size;
 };
 
 static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
-- 
1.8.3.1


* [PATCH 11/16] nvmet: rename nvmet_rw_len to nvmet_rw_data_len
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (9 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 10/16] nvmet: add metadata characteristics for a namespace Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-13 19:25   ` James Smart
  2020-05-04 15:57 ` [PATCH 12/16] nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len Max Gurtovoy
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

From: Israel Rukshin <israelr@mellanox.com>

The function doesn't add the metadata length (only the data length is
calculated). This is a preparation for adding metadata (T10-PI) support.

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 2 +-
 drivers/nvme/target/io-cmd-file.c | 2 +-
 drivers/nvme/target/nvmet.h       | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 45dc644..1ebee62 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -178,7 +178,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	sector_t sector;
 	int op, i;
 
-	if (!nvmet_check_data_len(req, nvmet_rw_len(req)))
+	if (!nvmet_check_data_len(req, nvmet_rw_data_len(req)))
 		return;
 
 	if (!req->sg_cnt) {
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index f0bd08d..b31717c 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -241,7 +241,7 @@ static void nvmet_file_execute_rw(struct nvmet_req *req)
 {
 	ssize_t nr_bvec = req->sg_cnt;
 
-	if (!nvmet_check_data_len(req, nvmet_rw_len(req)))
+	if (!nvmet_check_data_len(req, nvmet_rw_data_len(req)))
 		return;
 
 	if (!req->sg_cnt || !nr_bvec) {
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index b9fb57b..0ce9404 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -504,7 +504,7 @@ void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
 void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns);
 int nvmet_file_ns_revalidate(struct nvmet_ns *ns);
 
-static inline u32 nvmet_rw_len(struct nvmet_req *req)
+static inline u32 nvmet_rw_data_len(struct nvmet_req *req)
 {
 	return ((u32)le16_to_cpu(req->cmd->rw.length) + 1) <<
 			req->ns->blksize_shift;
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 12/16] nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (10 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 11/16] nvmet: rename nvmet_rw_len to nvmet_rw_data_len Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-13 19:27   ` James Smart
  2020-05-04 15:57 ` [PATCH 13/16] nvme: add Metadata Capabilities enumerations Max Gurtovoy
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

From: Israel Rukshin <israelr@mellanox.com>

The function doesn't check only the data length: in some cases the transfer
length also includes the metadata length, so rename it to
nvmet_check_transfer_len to reflect what it actually verifies. This is a
preparation for adding metadata (T10-PI) support.
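
For illustration only (the helpers and fields referenced here are added by
later patches in this series), a read/write on a metadata-capable namespace
ends up being checked against data plus interleaved metadata rather than
data alone:

        /* sketch: expected wire length for an extended-LBA read/write */
        u32 nlb = (u32)le16_to_cpu(req->cmd->rw.length) + 1; /* 0's based */
        u32 expected = (nlb << req->ns->blksize_shift) +      /* data     */
                       nlb * req->ns->metadata_size;          /* metadata */

        if (!nvmet_check_transfer_len(req, expected))
                return;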

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/nvme/target/admin-cmd.c   | 14 +++++++-------
 drivers/nvme/target/core.c        |  6 +++---
 drivers/nvme/target/discovery.c   |  8 ++++----
 drivers/nvme/target/fabrics-cmd.c |  8 ++++----
 drivers/nvme/target/io-cmd-bdev.c |  6 +++---
 drivers/nvme/target/io-cmd-file.c |  6 +++---
 drivers/nvme/target/nvmet.h       |  2 +-
 7 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 4c79aa8..905aba8 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -295,7 +295,7 @@ static void nvmet_execute_get_log_page_ana(struct nvmet_req *req)
 
 static void nvmet_execute_get_log_page(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, nvmet_get_log_page_len(req->cmd)))
+	if (!nvmet_check_transfer_len(req, nvmet_get_log_page_len(req->cmd)))
 		return;
 
 	switch (req->cmd->get_log_page.lid) {
@@ -630,7 +630,7 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 
 static void nvmet_execute_identify(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, NVME_IDENTIFY_DATA_SIZE))
+	if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
 		return;
 
 	switch (req->cmd->identify.cns) {
@@ -659,7 +659,7 @@ static void nvmet_execute_identify(struct nvmet_req *req)
  */
 static void nvmet_execute_abort(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 	nvmet_set_result(req, 1);
 	nvmet_req_complete(req, 0);
@@ -748,7 +748,7 @@ static void nvmet_execute_set_features(struct nvmet_req *req)
 	u16 nsqr;
 	u16 ncqr;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	switch (cdw10 & 0xff) {
@@ -820,7 +820,7 @@ static void nvmet_execute_get_features(struct nvmet_req *req)
 	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, nvmet_feat_data_len(req, cdw10)))
+	if (!nvmet_check_transfer_len(req, nvmet_feat_data_len(req, cdw10)))
 		return;
 
 	switch (cdw10 & 0xff) {
@@ -887,7 +887,7 @@ void nvmet_execute_async_event(struct nvmet_req *req)
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	mutex_lock(&ctrl->lock);
@@ -906,7 +906,7 @@ void nvmet_execute_keep_alive(struct nvmet_req *req)
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	pr_debug("ctrl %d update keep-alive timer for %d secs\n",
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index b685f99..50dfc60 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -936,9 +936,9 @@ void nvmet_req_uninit(struct nvmet_req *req)
 }
 EXPORT_SYMBOL_GPL(nvmet_req_uninit);
 
-bool nvmet_check_data_len(struct nvmet_req *req, size_t data_len)
+bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len)
 {
-	if (unlikely(data_len != req->transfer_len)) {
+	if (unlikely(len != req->transfer_len)) {
 		req->error_loc = offsetof(struct nvme_common_command, dptr);
 		nvmet_req_complete(req, NVME_SC_SGL_INVALID_DATA | NVME_SC_DNR);
 		return false;
@@ -946,7 +946,7 @@ bool nvmet_check_data_len(struct nvmet_req *req, size_t data_len)
 
 	return true;
 }
-EXPORT_SYMBOL_GPL(nvmet_check_data_len);
+EXPORT_SYMBOL_GPL(nvmet_check_transfer_len);
 
 bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len)
 {
diff --git a/drivers/nvme/target/discovery.c b/drivers/nvme/target/discovery.c
index 0c2274b..40cf0b6 100644
--- a/drivers/nvme/target/discovery.c
+++ b/drivers/nvme/target/discovery.c
@@ -171,7 +171,7 @@ static void nvmet_execute_disc_get_log_page(struct nvmet_req *req)
 	u16 status = 0;
 	void *buffer;
 
-	if (!nvmet_check_data_len(req, data_len))
+	if (!nvmet_check_transfer_len(req, data_len))
 		return;
 
 	if (req->cmd->get_log_page.lid != NVME_LOG_DISC) {
@@ -244,7 +244,7 @@ static void nvmet_execute_disc_identify(struct nvmet_req *req)
 	const char model[] = "Linux";
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, NVME_IDENTIFY_DATA_SIZE))
+	if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
 		return;
 
 	if (req->cmd->identify.cns != NVME_ID_CNS_CTRL) {
@@ -298,7 +298,7 @@ static void nvmet_execute_disc_set_features(struct nvmet_req *req)
 	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
 	u16 stat;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	switch (cdw10 & 0xff) {
@@ -324,7 +324,7 @@ static void nvmet_execute_disc_get_features(struct nvmet_req *req)
 	u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
 	u16 stat = 0;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	switch (cdw10 & 0xff) {
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index feef15c..52a6f70 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -12,7 +12,7 @@ static void nvmet_execute_prop_set(struct nvmet_req *req)
 	u64 val = le64_to_cpu(req->cmd->prop_set.value);
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	if (req->cmd->prop_set.attrib & 1) {
@@ -41,7 +41,7 @@ static void nvmet_execute_prop_get(struct nvmet_req *req)
 	u16 status = 0;
 	u64 val = 0;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	if (req->cmd->prop_get.attrib & 1) {
@@ -156,7 +156,7 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 	struct nvmet_ctrl *ctrl = NULL;
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, sizeof(struct nvmf_connect_data)))
+	if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
 		return;
 
 	d = kmalloc(sizeof(*d), GFP_KERNEL);
@@ -223,7 +223,7 @@ static void nvmet_execute_io_connect(struct nvmet_req *req)
 	u16 qid = le16_to_cpu(c->qid);
 	u16 status = 0;
 
-	if (!nvmet_check_data_len(req, sizeof(struct nvmf_connect_data)))
+	if (!nvmet_check_transfer_len(req, sizeof(struct nvmf_connect_data)))
 		return;
 
 	d = kmalloc(sizeof(*d), GFP_KERNEL);
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 1ebee62..19b57f6 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -178,7 +178,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	sector_t sector;
 	int op, i;
 
-	if (!nvmet_check_data_len(req, nvmet_rw_data_len(req)))
+	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
 		return;
 
 	if (!req->sg_cnt) {
@@ -239,7 +239,7 @@ static void nvmet_bdev_execute_flush(struct nvmet_req *req)
 {
 	struct bio *bio = &req->b.inline_bio;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
@@ -331,7 +331,7 @@ static void nvmet_bdev_execute_write_zeroes(struct nvmet_req *req)
 	sector_t nr_sector;
 	int ret;
 
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 
 	sector = le64_to_cpu(write_zeroes->slba) <<
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index b31717c..0abbefd 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -241,7 +241,7 @@ static void nvmet_file_execute_rw(struct nvmet_req *req)
 {
 	ssize_t nr_bvec = req->sg_cnt;
 
-	if (!nvmet_check_data_len(req, nvmet_rw_data_len(req)))
+	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
 		return;
 
 	if (!req->sg_cnt || !nr_bvec) {
@@ -285,7 +285,7 @@ static void nvmet_file_flush_work(struct work_struct *w)
 
 static void nvmet_file_execute_flush(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 	INIT_WORK(&req->f.work, nvmet_file_flush_work);
 	schedule_work(&req->f.work);
@@ -375,7 +375,7 @@ static void nvmet_file_write_zeroes_work(struct work_struct *w)
 
 static void nvmet_file_execute_write_zeroes(struct nvmet_req *req)
 {
-	if (!nvmet_check_data_len(req, 0))
+	if (!nvmet_check_transfer_len(req, 0))
 		return;
 	INIT_WORK(&req->f.work, nvmet_file_write_zeroes_work);
 	schedule_work(&req->f.work);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 0ce9404..ceedaaf 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -387,7 +387,7 @@ static inline bool nvmet_aen_bit_disabled(struct nvmet_ctrl *ctrl, u32 bn)
 bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 		struct nvmet_sq *sq, const struct nvmet_fabrics_ops *ops);
 void nvmet_req_uninit(struct nvmet_req *req);
-bool nvmet_check_data_len(struct nvmet_req *req, size_t data_len);
+bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len);
 bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len);
 void nvmet_req_complete(struct nvmet_req *req, u16 status);
 int nvmet_req_alloc_sgl(struct nvmet_req *req);
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 13/16] nvme: add Metadata Capabilities enumerations
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (11 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 12/16] nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-13 19:27   ` James Smart
  2020-05-14  3:07   ` Martin K. Petersen
  2020-05-04 15:57 ` [PATCH 14/16] nvmet: add metadata/T10-PI support Max Gurtovoy
                   ` (3 subsequent siblings)
  16 siblings, 2 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

From: Israel Rukshin <israelr@mellanox.com>

The target will use these enumerations to expose the namespace metadata
format to the host.
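
As an aside (not part of this patch), the bits follow the Identify Namespace
MC field semantics, so a hypothetical consumer of the Identify data could
check them like this:

        /* sketch: interpreting the MC field of Identify Namespace */
        bool can_interleave = id->mc & NVME_MC_EXTENDED_LBA; /* metadata inline with LBA data */
        bool can_separate   = id->mc & NVME_MC_METADATA_PTR; /* metadata via separate buffer  */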

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/nvme.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 3d5189f..57c1b6b 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -412,6 +412,12 @@ enum {
 	NVME_NS_DPS_PI_TYPE3	= 3,
 };
 
+/* Identify Namespace Metadata Capabilities (MC): */
+enum {
+	NVME_MC_EXTENDED_LBA	= (1 << 0),
+	NVME_MC_METADATA_PTR	= (1 << 1),
+};
+
 struct nvme_ns_id_desc {
 	__u8 nidt;
 	__u8 nidl;
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 14/16] nvmet: add metadata/T10-PI support
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (12 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 13/16] nvme: add Metadata Capabilities enumerations Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-13 19:51   ` James Smart
  2020-05-04 15:57 ` [PATCH 15/16] nvmet: add metadata support for block devices Max Gurtovoy
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

From: Israel Rukshin <israelr@mellanox.com>

Expose the namespace metadata format when PI is enabled. The user needs
to enable the capability per subsystem and per port. The other metadata
properties are taken from the namespace/bdev.

Usage example:
echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable
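
Condensed from the hunks below, with both knobs set the target behaves
roughly as follows (this is only a summary, see the diff for the complete
logic):

        /* on connect: controller PI support is the AND of both knobs */
        ctrl->pi_support = ctrl->port->pi_enable && ctrl->subsys->pi_support;

        /* in Identify Namespace: advertise the bdev metadata format */
        if (ctrl->pi_support && nvmet_ns_has_pi(ns)) {
                id->mc = NVME_MC_EXTENDED_LBA;
                id->dps = ns->pi_type;
                id->flbas = NVME_NS_FLBAS_META_EXT;
                id->lbaf[0].ms = ns->metadata_size;
        }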

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/admin-cmd.c   | 26 +++++++++++++++---
 drivers/nvme/target/configfs.c    | 58 +++++++++++++++++++++++++++++++++++++++
 drivers/nvme/target/core.c        | 21 +++++++++++---
 drivers/nvme/target/fabrics-cmd.c |  7 +++--
 drivers/nvme/target/nvmet.h       | 19 +++++++++++++
 5 files changed, 121 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 905aba8..9c8a00d 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -341,6 +341,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvme_id_ctrl *id;
+	u32 cmd_capsule_size;
 	u16 status = 0;
 
 	id = kzalloc(sizeof(*id), GFP_KERNEL);
@@ -433,9 +434,15 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 	strlcpy(id->subnqn, ctrl->subsys->subsysnqn, sizeof(id->subnqn));
 
-	/* Max command capsule size is sqe + single page of in-capsule data */
-	id->ioccsz = cpu_to_le32((sizeof(struct nvme_command) +
-				  req->port->inline_data_size) / 16);
+	/*
+	 * Max command capsule size is sqe + in-capsule data size.
+	 * Disable in-capsule data for Metadata capable controllers.
+	 */
+	cmd_capsule_size = sizeof(struct nvme_command);
+	if (!ctrl->pi_support)
+		cmd_capsule_size += req->port->inline_data_size;
+	id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
+
 	/* Max response capsule size is cqe */
 	id->iorcsz = cpu_to_le32(sizeof(struct nvme_completion) / 16);
 
@@ -465,6 +472,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 static void nvmet_execute_identify_ns(struct nvmet_req *req)
 {
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvmet_ns *ns;
 	struct nvme_id_ns *id;
 	u16 status = 0;
@@ -482,7 +490,7 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 	}
 
 	/* return an all zeroed buffer if we can't find an active namespace */
-	ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
+	ns = nvmet_find_namespace(ctrl, req->cmd->identify.nsid);
 	if (!ns)
 		goto done;
 
@@ -526,6 +534,16 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 
 	id->lbaf[0].ds = ns->blksize_shift;
 
+	if (ctrl->pi_support && nvmet_ns_has_pi(ns)) {
+		id->dpc = NVME_NS_DPC_PI_FIRST | NVME_NS_DPC_PI_LAST |
+			  NVME_NS_DPC_PI_TYPE1 | NVME_NS_DPC_PI_TYPE2 |
+			  NVME_NS_DPC_PI_TYPE3;
+		id->mc = NVME_MC_EXTENDED_LBA;
+		id->dps = ns->pi_type;
+		id->flbas = NVME_NS_FLBAS_META_EXT;
+		id->lbaf[0].ms = ns->metadata_size;
+	}
+
 	if (ns->readonly)
 		id->nsattr |= (1 << 0);
 	nvmet_put_namespace(ns);
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 58cabd7..42e5f61 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -248,6 +248,36 @@ static ssize_t nvmet_param_inline_data_size_store(struct config_item *item,
 
 CONFIGFS_ATTR(nvmet_, param_inline_data_size);
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static ssize_t nvmet_param_pi_enable_show(struct config_item *item,
+		char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+
+	return snprintf(page, PAGE_SIZE, "%d\n", port->pi_enable);
+}
+
+static ssize_t nvmet_param_pi_enable_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	bool val;
+
+	if (strtobool(page, &val))
+		return -EINVAL;
+
+	if (port->enabled) {
+		pr_err("Disable port before setting pi_enable value.\n");
+		return -EACCES;
+	}
+
+	port->pi_enable = val;
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_, param_pi_enable);
+#endif
+
 static ssize_t nvmet_addr_trtype_show(struct config_item *item,
 		char *page)
 {
@@ -987,6 +1017,28 @@ static ssize_t nvmet_subsys_attr_model_store(struct config_item *item,
 }
 CONFIGFS_ATTR(nvmet_subsys_, attr_model);
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static ssize_t nvmet_subsys_attr_pi_enable_show(struct config_item *item,
+						char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%d\n", to_subsys(item)->pi_support);
+}
+
+static ssize_t nvmet_subsys_attr_pi_enable_store(struct config_item *item,
+						 const char *page, size_t count)
+{
+	struct nvmet_subsys *subsys = to_subsys(item);
+	bool pi_enable;
+
+	if (strtobool(page, &pi_enable))
+		return -EINVAL;
+
+	subsys->pi_support = pi_enable;
+	return count;
+}
+CONFIGFS_ATTR(nvmet_subsys_, attr_pi_enable);
+#endif
+
 static struct configfs_attribute *nvmet_subsys_attrs[] = {
 	&nvmet_subsys_attr_attr_allow_any_host,
 	&nvmet_subsys_attr_attr_version,
@@ -994,6 +1046,9 @@ static ssize_t nvmet_subsys_attr_model_store(struct config_item *item,
 	&nvmet_subsys_attr_attr_cntlid_min,
 	&nvmet_subsys_attr_attr_cntlid_max,
 	&nvmet_subsys_attr_attr_model,
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	&nvmet_subsys_attr_attr_pi_enable,
+#endif
 	NULL,
 };
 
@@ -1297,6 +1352,9 @@ static void nvmet_port_release(struct config_item *item)
 	&nvmet_attr_addr_trsvcid,
 	&nvmet_attr_addr_trtype,
 	&nvmet_attr_param_inline_data_size,
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	&nvmet_attr_param_pi_enable,
+#endif
 	NULL,
 };
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 50dfc60..9f6b0f9 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -322,12 +322,21 @@ int nvmet_enable_port(struct nvmet_port *port)
 	if (!try_module_get(ops->owner))
 		return -EINVAL;
 
-	ret = ops->add_port(port);
-	if (ret) {
-		module_put(ops->owner);
-		return ret;
+	/*
+	 * If the user requested PI support and the transport isn't pi capable,
+	 * don't enable the port.
+	 */
+	if (port->pi_enable && !ops->metadata_support) {
+		pr_err("T10-PI is not supported by transport type %d\n",
+		       port->disc_addr.trtype);
+		ret = -EINVAL;
+		goto out_put;
 	}
 
+	ret = ops->add_port(port);
+	if (ret)
+		goto out_put;
+
 	/* If the transport didn't set inline_data_size, then disable it. */
 	if (port->inline_data_size < 0)
 		port->inline_data_size = 0;
@@ -335,6 +344,10 @@ int nvmet_enable_port(struct nvmet_port *port)
 	port->enabled = true;
 	port->tr_ops = ops;
 	return 0;
+
+out_put:
+	module_put(ops->owner);
+	return ret;
 }
 
 void nvmet_disable_port(struct nvmet_port *port)
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index 52a6f70..42bd12b 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -197,6 +197,8 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 		goto out;
 	}
 
+	ctrl->pi_support = ctrl->port->pi_enable && ctrl->subsys->pi_support;
+
 	uuid_copy(&ctrl->hostid, &d->hostid);
 
 	status = nvmet_install_queue(ctrl, req);
@@ -205,8 +207,9 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 		goto out;
 	}
 
-	pr_info("creating controller %d for subsystem %s for NQN %s.\n",
-		ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn);
+	pr_info("creating controller %d for subsystem %s for NQN %s%s.\n",
+		ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn,
+		ctrl->pi_support ? " T10-PI is enabled" : "");
 	req->cqe->result.u16 = cpu_to_le16(ctrl->cntlid);
 
 out:
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index ceedaaf..2986e2c 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -145,6 +145,7 @@ struct nvmet_port {
 	bool				enabled;
 	int				inline_data_size;
 	const struct nvmet_fabrics_ops	*tr_ops;
+	bool				pi_enable;
 };
 
 static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
@@ -204,6 +205,7 @@ struct nvmet_ctrl {
 	spinlock_t		error_lock;
 	u64			err_counter;
 	struct nvme_error_slot	slots[NVMET_ERROR_LOG_SLOTS];
+	bool			pi_support;
 };
 
 struct nvmet_subsys_model {
@@ -233,6 +235,7 @@ struct nvmet_subsys {
 	u64			ver;
 	u64			serial;
 	char			*subsysnqn;
+	bool			pi_support;
 
 	struct config_group	group;
 
@@ -284,6 +287,7 @@ struct nvmet_fabrics_ops {
 	unsigned int type;
 	unsigned int msdbd;
 	bool has_keyed_sgls : 1;
+	bool metadata_support : 1;
 	void (*queue_response)(struct nvmet_req *req);
 	int (*add_port)(struct nvmet_port *port);
 	void (*remove_port)(struct nvmet_port *port);
@@ -510,6 +514,14 @@ static inline u32 nvmet_rw_data_len(struct nvmet_req *req)
 			req->ns->blksize_shift;
 }
 
+static inline u32 nvmet_rw_metadata_len(struct nvmet_req *req)
+{
+	if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY))
+		return 0;
+	return ((u32)le16_to_cpu(req->cmd->rw.length) + 1) *
+			req->ns->metadata_size;
+}
+
 static inline u32 nvmet_dsm_len(struct nvmet_req *req)
 {
 	return (le32_to_cpu(req->cmd->dsm.nr) + 1) *
@@ -524,4 +536,11 @@ static inline __le16 to0based(u32 a)
 	return cpu_to_le16(max(1U, min(1U << 16, a)) - 1);
 }
 
+static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
+{
+	if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY))
+		return false;
+	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
+}
+
 #endif /* _NVMET_H */
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 15/16] nvmet: add metadata support for block devices
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (13 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 14/16] nvmet: add metadata/T10-PI support Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-04 15:57 ` [PATCH 16/16] nvmet-rdma: add metadata/T10-PI support Max Gurtovoy
  2020-05-05  6:13 ` [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add " Christoph Hellwig
  16 siblings, 0 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

From: Israel Rukshin <israelr@mellanox.com>

Allocate the metadata SGL buffers and set metadata fields for the
request. Then create a block IO request for the metadata from the
protection SG list.
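
The bdev path below follows the usual bio integrity pattern; roughly (error
handling and the multi-bio case omitted, see the full hunk):

        bip = bio_integrity_alloc(bio, GFP_NOIO, req->metadata_sg_cnt);
        bip->bip_iter.bi_size = bio_integrity_bytes(bi, bio_sectors(bio));

        /* walk the metadata SG list and attach its pages to the bio */
        while (resid > 0 && sg_miter_next(&miter)) {
                len = min_t(size_t, miter.length, resid);
                bio_integrity_add_page(bio, miter.page, len,
                                       offset_in_page(miter.addr));
                resid -= len;
        }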

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/core.c        | 95 ++++++++++++++++++++++++++++-----------
 drivers/nvme/target/io-cmd-bdev.c | 87 ++++++++++++++++++++++++++++++++++-
 drivers/nvme/target/nvmet.h       |  7 ++-
 drivers/nvme/target/rdma.c        |  4 +-
 4 files changed, 161 insertions(+), 32 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 9f6b0f9..4e9a2e7 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -886,8 +886,11 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 	req->sq = sq;
 	req->ops = ops;
 	req->sg = NULL;
+	req->metadata_sg = NULL;
 	req->sg_cnt = 0;
+	req->metadata_sg_cnt = 0;
 	req->transfer_len = 0;
+	req->metadata_len = 0;
 	req->cqe->status = 0;
 	req->cqe->sq_head = 0;
 	req->ns = NULL;
@@ -972,50 +975,90 @@ bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len)
 	return true;
 }
 
-int nvmet_req_alloc_sgl(struct nvmet_req *req)
+static unsigned int nvmet_data_transfer_len(struct nvmet_req *req)
 {
-	struct pci_dev *p2p_dev = NULL;
+	return req->transfer_len - req->metadata_len;
+}
 
-	if (IS_ENABLED(CONFIG_PCI_P2PDMA)) {
-		if (req->sq->ctrl && req->ns)
-			p2p_dev = radix_tree_lookup(&req->sq->ctrl->p2p_ns_map,
-						    req->ns->nsid);
+static int nvmet_req_alloc_p2pmem_sgls(struct nvmet_req *req)
+{
+	req->sg = pci_p2pmem_alloc_sgl(req->p2p_dev, &req->sg_cnt,
+			nvmet_data_transfer_len(req));
+	if (!req->sg)
+		goto out_err;
 
-		req->p2p_dev = NULL;
-		if (req->sq->qid && p2p_dev) {
-			req->sg = pci_p2pmem_alloc_sgl(p2p_dev, &req->sg_cnt,
-						       req->transfer_len);
-			if (req->sg) {
-				req->p2p_dev = p2p_dev;
-				return 0;
-			}
-		}
+	if (req->metadata_len) {
+		req->metadata_sg = pci_p2pmem_alloc_sgl(req->p2p_dev,
+				&req->metadata_sg_cnt, req->metadata_len);
+		if (!req->metadata_sg)
+			goto out_free_sg;
+	}
+	return 0;
+out_free_sg:
+	pci_p2pmem_free_sgl(req->p2p_dev, req->sg);
+out_err:
+	return -ENOMEM;
+}
 
-		/*
-		 * If no P2P memory was available we fallback to using
-		 * regular memory
-		 */
+static bool nvmet_req_find_p2p_dev(struct nvmet_req *req)
+{
+	if (!IS_ENABLED(CONFIG_PCI_P2PDMA))
+		return false;
+
+	if (req->sq->ctrl && req->sq->qid && req->ns) {
+		req->p2p_dev = radix_tree_lookup(&req->sq->ctrl->p2p_ns_map,
+						 req->ns->nsid);
+		if (req->p2p_dev)
+			return true;
 	}
 
-	req->sg = sgl_alloc(req->transfer_len, GFP_KERNEL, &req->sg_cnt);
+	req->p2p_dev = NULL;
+	return false;
+}
+
+int nvmet_req_alloc_sgls(struct nvmet_req *req)
+{
+	if (nvmet_req_find_p2p_dev(req) && !nvmet_req_alloc_p2pmem_sgls(req))
+		return 0;
+
+	req->sg = sgl_alloc(nvmet_data_transfer_len(req), GFP_KERNEL,
+			    &req->sg_cnt);
 	if (unlikely(!req->sg))
-		return -ENOMEM;
+		goto out;
+
+	if (req->metadata_len) {
+		req->metadata_sg = sgl_alloc(req->metadata_len, GFP_KERNEL,
+					     &req->metadata_sg_cnt);
+		if (unlikely(!req->metadata_sg))
+			goto out_free;
+	}
 
 	return 0;
+out_free:
+	sgl_free(req->sg);
+out:
+	return -ENOMEM;
 }
-EXPORT_SYMBOL_GPL(nvmet_req_alloc_sgl);
+EXPORT_SYMBOL_GPL(nvmet_req_alloc_sgls);
 
-void nvmet_req_free_sgl(struct nvmet_req *req)
+void nvmet_req_free_sgls(struct nvmet_req *req)
 {
-	if (req->p2p_dev)
+	if (req->p2p_dev) {
 		pci_p2pmem_free_sgl(req->p2p_dev, req->sg);
-	else
+		if (req->metadata_sg)
+			pci_p2pmem_free_sgl(req->p2p_dev, req->metadata_sg);
+	} else {
 		sgl_free(req->sg);
+		if (req->metadata_sg)
+			sgl_free(req->metadata_sg);
+	}
 
 	req->sg = NULL;
+	req->metadata_sg = NULL;
 	req->sg_cnt = 0;
+	req->metadata_sg_cnt = 0;
 }
-EXPORT_SYMBOL_GPL(nvmet_req_free_sgl);
+EXPORT_SYMBOL_GPL(nvmet_req_free_sgls);
 
 static inline bool nvmet_cc_en(u32 cc)
 {
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 19b57f6..dab7950 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -169,6 +169,61 @@ static void nvmet_bio_done(struct bio *bio)
 		bio_put(bio);
 }
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static int nvmet_bdev_alloc_bip(struct nvmet_req *req, struct bio *bio,
+				struct sg_mapping_iter *miter)
+{
+	struct blk_integrity *bi;
+	struct bio_integrity_payload *bip;
+	struct block_device *bdev = req->ns->bdev;
+	int rc;
+	size_t resid, len;
+
+	bi = bdev_get_integrity(bdev);
+	if (unlikely(!bi)) {
+		pr_err("Unable to locate bio_integrity\n");
+		return -ENODEV;
+	}
+
+	bip = bio_integrity_alloc(bio, GFP_NOIO,
+		min_t(unsigned int, req->metadata_sg_cnt, BIO_MAX_PAGES));
+	if (IS_ERR(bip)) {
+		pr_err("Unable to allocate bio_integrity_payload\n");
+		return PTR_ERR(bip);
+	}
+
+	bip->bip_iter.bi_size = bio_integrity_bytes(bi, bio_sectors(bio));
+	/* virtual start sector must be in integrity interval units */
+	bip_set_seed(bip, bio->bi_iter.bi_sector >>
+		     (bi->interval_exp - SECTOR_SHIFT));
+
+	resid = bip->bip_iter.bi_size;
+	while (resid > 0 && sg_miter_next(miter)) {
+		len = min_t(size_t, miter->length, resid);
+		rc = bio_integrity_add_page(bio, miter->page, len,
+					    offset_in_page(miter->addr));
+		if (unlikely(rc != len)) {
+			pr_err("bio_integrity_add_page() failed; %d\n", rc);
+			sg_miter_stop(miter);
+			return -ENOMEM;
+		}
+
+		resid -= len;
+		if (len < miter->length)
+			miter->consumed -= miter->length - len;
+	}
+	sg_miter_stop(miter);
+
+	return 0;
+}
+#else
+static int nvmet_bdev_alloc_bip(struct nvmet_req *req, struct bio *bio,
+				struct sg_mapping_iter *miter)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
 static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 {
 	int sg_cnt = req->sg_cnt;
@@ -176,9 +231,12 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	struct scatterlist *sg;
 	struct blk_plug plug;
 	sector_t sector;
-	int op, i;
+	int op, i, rc;
+	struct sg_mapping_iter prot_miter;
+	unsigned int iter_flags;
+	unsigned int total_len = nvmet_rw_data_len(req) + req->metadata_len;
 
-	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+	if (!nvmet_check_transfer_len(req, total_len))
 		return;
 
 	if (!req->sg_cnt) {
@@ -190,8 +248,10 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 		op = REQ_OP_WRITE | REQ_SYNC | REQ_IDLE;
 		if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
 			op |= REQ_FUA;
+		iter_flags = SG_MITER_TO_SG;
 	} else {
 		op = REQ_OP_READ;
+		iter_flags = SG_MITER_FROM_SG;
 	}
 
 	if (is_pci_p2pdma_page(sg_page(req->sg)))
@@ -213,11 +273,24 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	bio->bi_opf = op;
 
 	blk_start_plug(&plug);
+	if (req->metadata_len)
+		sg_miter_start(&prot_miter, req->metadata_sg,
+			       req->metadata_sg_cnt, iter_flags);
+
 	for_each_sg(req->sg, sg, req->sg_cnt, i) {
 		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
 				!= sg->length) {
 			struct bio *prev = bio;
 
+			if (req->metadata_len) {
+				rc = nvmet_bdev_alloc_bip(req, bio,
+							  &prot_miter);
+				if (unlikely(rc)) {
+					bio_io_error(bio);
+					return;
+				}
+			}
+
 			bio = bio_alloc(GFP_KERNEL, min(sg_cnt, BIO_MAX_PAGES));
 			bio_set_dev(bio, req->ns->bdev);
 			bio->bi_iter.bi_sector = sector;
@@ -231,6 +304,14 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 		sg_cnt--;
 	}
 
+	if (req->metadata_len) {
+		rc = nvmet_bdev_alloc_bip(req, bio, &prot_miter);
+		if (unlikely(rc)) {
+			bio_io_error(bio);
+			return;
+		}
+	}
+
 	submit_bio(bio);
 	blk_finish_plug(&plug);
 }
@@ -358,6 +439,8 @@ u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req)
 	case nvme_cmd_read:
 	case nvme_cmd_write:
 		req->execute = nvmet_bdev_execute_rw;
+		if (req->sq->ctrl->pi_support && nvmet_ns_has_pi(req->ns))
+			req->metadata_len = nvmet_rw_metadata_len(req);
 		return 0;
 	case nvme_cmd_flush:
 		req->execute = nvmet_bdev_execute_flush;
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 2986e2c..3f72a17 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -309,6 +309,7 @@ struct nvmet_req {
 	struct nvmet_cq		*cq;
 	struct nvmet_ns		*ns;
 	struct scatterlist	*sg;
+	struct scatterlist	*metadata_sg;
 	struct bio_vec		inline_bvec[NVMET_MAX_INLINE_BIOVEC];
 	union {
 		struct {
@@ -322,8 +323,10 @@ struct nvmet_req {
 		} f;
 	};
 	int			sg_cnt;
+	int			metadata_sg_cnt;
 	/* data length as parsed from the SGL descriptor: */
 	size_t			transfer_len;
+	size_t			metadata_len;
 
 	struct nvmet_port	*port;
 
@@ -394,8 +397,8 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 bool nvmet_check_transfer_len(struct nvmet_req *req, size_t len);
 bool nvmet_check_data_len_lte(struct nvmet_req *req, size_t data_len);
 void nvmet_req_complete(struct nvmet_req *req, u16 status);
-int nvmet_req_alloc_sgl(struct nvmet_req *req);
-void nvmet_req_free_sgl(struct nvmet_req *req);
+int nvmet_req_alloc_sgls(struct nvmet_req *req);
+void nvmet_req_free_sgls(struct nvmet_req *req);
 
 void nvmet_execute_keep_alive(struct nvmet_req *req);
 
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 7a90b10..178dbff 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -546,7 +546,7 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
 	}
 
 	if (rsp->req.sg != rsp->cmd->inline_sg)
-		nvmet_req_free_sgl(&rsp->req);
+		nvmet_req_free_sgls(&rsp->req);
 
 	if (unlikely(!list_empty_careful(&queue->rsp_wr_wait_list)))
 		nvmet_rdma_process_wr_wait_list(queue);
@@ -708,7 +708,7 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp,
 	if (!rsp->req.transfer_len)
 		return 0;
 
-	ret = nvmet_req_alloc_sgl(&rsp->req);
+	ret = nvmet_req_alloc_sgls(&rsp->req);
 	if (unlikely(ret < 0))
 		goto error_out;
 
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 16/16] nvmet-rdma: add metadata/T10-PI support
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (14 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 15/16] nvmet: add metadata support for block devices Max Gurtovoy
@ 2020-05-04 15:57 ` Max Gurtovoy
  2020-05-14  3:10   ` Martin K. Petersen
  2020-05-05  6:13 ` [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add " Christoph Hellwig
  16 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-04 15:57 UTC (permalink / raw)
  To: maxg, linux-nvme, kbusch, hch, sagi, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, idanb, oren, nitzanc

From: Israel Rukshin <israelr@mellanox.com>

For capable HCAs (e.g. ConnectX-5/ConnectX-6) this will allow end-to-end
protection information passthrough and validation for NVMe over RDMA
transport. Metadata support was implemented over the new RDMA signature
verbs API.
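
The heart of the change is programming an RDMA signature MR per I/O;
condensed from the hunks below:

        if (control & NVME_RW_PRINFO_PRACT) {
                /* host sends no PI on the wire; the HCA inserts/strips it */
                sig_attrs->wire.sig_type = IB_SIG_TYPE_NONE;
                nvmet_rdma_set_sig_domain(bi, cmd, &sig_attrs->mem, control);
        } else {
                /* PI is carried end-to-end and checked in both domains */
                nvmet_rdma_set_sig_domain(bi, cmd, &sig_attrs->wire, control);
                nvmet_rdma_set_sig_domain(bi, cmd, &sig_attrs->mem, control);
        }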

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/rdma.c | 227 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 208 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 178dbff..ec59f67 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -33,6 +33,7 @@
 
 /* Assume mpsmin == device_page_size == 4KB */
 #define NVMET_RDMA_MAX_MDTS			8
+#define NVMET_RDMA_MAX_METADATA_MDTS		5
 
 struct nvmet_rdma_srq;
 
@@ -60,6 +61,7 @@ struct nvmet_rdma_rsp {
 	struct nvmet_rdma_queue	*queue;
 
 	struct ib_cqe		read_cqe;
+	struct ib_cqe		write_cqe;
 	struct rdma_rw_ctx	rw;
 
 	struct nvmet_req	req;
@@ -161,6 +163,7 @@ struct nvmet_rdma_device {
 static void nvmet_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc);
 static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc);
 static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc);
+static void nvmet_rdma_write_data_done(struct ib_cq *cq, struct ib_wc *wc);
 static void nvmet_rdma_qp_event(struct ib_event *event, void *priv);
 static void nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue);
 static void nvmet_rdma_free_rsp(struct nvmet_rdma_device *ndev,
@@ -423,6 +426,9 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
 
 	/* Data In / RDMA READ */
 	r->read_cqe.done = nvmet_rdma_read_data_done;
+	/* Data Out / RDMA WRITE */
+	r->write_cqe.done = nvmet_rdma_write_data_done;
+
 	return 0;
 
 out_free_rsp:
@@ -532,6 +538,122 @@ static void nvmet_rdma_process_wr_wait_list(struct nvmet_rdma_queue *queue)
 	spin_unlock(&queue->rsp_wr_wait_lock);
 }
 
+static u16 nvmet_rdma_check_pi_status(struct ib_mr *sig_mr)
+{
+	struct ib_mr_status mr_status;
+	int ret;
+	u16 status = 0;
+
+	ret = ib_check_mr_status(sig_mr, IB_MR_CHECK_SIG_STATUS, &mr_status);
+	if (ret) {
+		pr_err("ib_check_mr_status failed, ret %d\n", ret);
+		return NVME_SC_INVALID_PI;
+	}
+
+	if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS) {
+		switch (mr_status.sig_err.err_type) {
+		case IB_SIG_BAD_GUARD:
+			status = NVME_SC_GUARD_CHECK;
+			break;
+		case IB_SIG_BAD_REFTAG:
+			status = NVME_SC_REFTAG_CHECK;
+			break;
+		case IB_SIG_BAD_APPTAG:
+			status = NVME_SC_APPTAG_CHECK;
+			break;
+		}
+		pr_err("PI error found type %d expected 0x%x vs actual 0x%x\n",
+		       mr_status.sig_err.err_type,
+		       mr_status.sig_err.expected,
+		       mr_status.sig_err.actual);
+	}
+
+	return status;
+}
+
+static void nvmet_rdma_set_sig_domain(struct blk_integrity *bi,
+		struct nvme_command *cmd, struct ib_sig_domain *domain,
+		u16 control)
+{
+	domain->sig_type = IB_SIG_TYPE_T10_DIF;
+	domain->sig.dif.bg_type = IB_T10DIF_CRC;
+	domain->sig.dif.pi_interval = 1 << bi->interval_exp;
+	domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
+	domain->sig.dif.apptag_check_mask = 0xffff;
+	domain->sig.dif.app_escape = true;
+	domain->sig.dif.ref_escape = true;
+	if (control & NVME_RW_PRINFO_PRCHK_REF)
+		domain->sig.dif.ref_remap = true;
+}
+
+static void nvmet_rdma_set_sig_attrs(struct nvmet_req *req,
+				     struct ib_sig_attrs *sig_attrs)
+{
+	struct nvme_command *cmd = req->cmd;
+	u16 control = le16_to_cpu(cmd->rw.control);
+	struct blk_integrity *bi;
+
+	bi = bdev_get_integrity(req->ns->bdev);
+
+	memset(sig_attrs, 0, sizeof(*sig_attrs));
+
+	if (control & NVME_RW_PRINFO_PRACT) {
+		/* for WRITE_INSERT/READ_STRIP no wire domain */
+		sig_attrs->wire.sig_type = IB_SIG_TYPE_NONE;
+		nvmet_rdma_set_sig_domain(bi, cmd, &sig_attrs->mem, control);
+		/* Clear the PRACT bit since HCA will generate/verify the PI */
+		control &= ~NVME_RW_PRINFO_PRACT;
+		cmd->rw.control = cpu_to_le16(control);
+		/* PI is added by the HW */
+		req->transfer_len += req->metadata_len;
+	} else {
+		/* for WRITE_PASS/READ_PASS both wire/memory domains exist */
+		nvmet_rdma_set_sig_domain(bi, cmd, &sig_attrs->wire, control);
+		nvmet_rdma_set_sig_domain(bi, cmd, &sig_attrs->mem, control);
+	}
+
+	if (control & NVME_RW_PRINFO_PRCHK_REF)
+		sig_attrs->check_mask |= IB_SIG_CHECK_REFTAG;
+	if (control & NVME_RW_PRINFO_PRCHK_GUARD)
+		sig_attrs->check_mask |= IB_SIG_CHECK_GUARD;
+	if (control & NVME_RW_PRINFO_PRCHK_APP)
+		sig_attrs->check_mask |= IB_SIG_CHECK_APPTAG;
+}
+
+static int nvmet_rdma_rw_ctx_init(struct nvmet_rdma_rsp *rsp, u64 addr, u32 key,
+				  struct ib_sig_attrs *sig_attrs)
+{
+	struct rdma_cm_id *cm_id = rsp->queue->cm_id;
+	struct nvmet_req *req = &rsp->req;
+	int ret;
+
+	if (req->metadata_len)
+		ret = rdma_rw_ctx_signature_init(&rsp->rw, cm_id->qp,
+			cm_id->port_num, req->sg, req->sg_cnt,
+			req->metadata_sg, req->metadata_sg_cnt, sig_attrs,
+			addr, key, nvmet_data_dir(req));
+	else
+		ret = rdma_rw_ctx_init(&rsp->rw, cm_id->qp, cm_id->port_num,
+				       req->sg, req->sg_cnt, 0, addr, key,
+				       nvmet_data_dir(req));
+
+	return ret;
+}
+
+static void nvmet_rdma_rw_ctx_destroy(struct nvmet_rdma_rsp *rsp)
+{
+	struct rdma_cm_id *cm_id = rsp->queue->cm_id;
+	struct nvmet_req *req = &rsp->req;
+
+	if (req->metadata_len)
+		rdma_rw_ctx_destroy_signature(&rsp->rw, cm_id->qp,
+			cm_id->port_num, req->sg, req->sg_cnt,
+			req->metadata_sg, req->metadata_sg_cnt,
+			nvmet_data_dir(req));
+	else
+		rdma_rw_ctx_destroy(&rsp->rw, cm_id->qp, cm_id->port_num,
+				    req->sg, req->sg_cnt, nvmet_data_dir(req));
+}
 
 static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
 {
@@ -539,11 +661,8 @@ static void nvmet_rdma_release_rsp(struct nvmet_rdma_rsp *rsp)
 
 	atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
 
-	if (rsp->n_rdma) {
-		rdma_rw_ctx_destroy(&rsp->rw, queue->qp,
-				queue->cm_id->port_num, rsp->req.sg,
-				rsp->req.sg_cnt, nvmet_data_dir(&rsp->req));
-	}
+	if (rsp->n_rdma)
+		nvmet_rdma_rw_ctx_destroy(rsp);
 
 	if (rsp->req.sg != rsp->cmd->inline_sg)
 		nvmet_req_free_sgls(&rsp->req);
@@ -598,11 +717,16 @@ static void nvmet_rdma_queue_response(struct nvmet_req *req)
 		rsp->send_wr.opcode = IB_WR_SEND;
 	}
 
-	if (nvmet_rdma_need_data_out(rsp))
-		first_wr = rdma_rw_ctx_wrs(&rsp->rw, cm_id->qp,
-				cm_id->port_num, NULL, &rsp->send_wr);
-	else
+	if (nvmet_rdma_need_data_out(rsp)) {
+		if (rsp->req.metadata_len)
+			first_wr = rdma_rw_ctx_wrs(&rsp->rw, cm_id->qp,
+					cm_id->port_num, &rsp->write_cqe, NULL);
+		else
+			first_wr = rdma_rw_ctx_wrs(&rsp->rw, cm_id->qp,
+					cm_id->port_num, NULL, &rsp->send_wr);
+	} else {
 		first_wr = &rsp->send_wr;
+	}
 
 	nvmet_rdma_post_recv(rsp->queue->dev, rsp->cmd);
 
@@ -621,15 +745,14 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc)
 	struct nvmet_rdma_rsp *rsp =
 		container_of(wc->wr_cqe, struct nvmet_rdma_rsp, read_cqe);
 	struct nvmet_rdma_queue *queue = cq->cq_context;
+	u16 status = 0;
 
 	WARN_ON(rsp->n_rdma <= 0);
 	atomic_add(rsp->n_rdma, &queue->sq_wr_avail);
-	rdma_rw_ctx_destroy(&rsp->rw, queue->qp,
-			queue->cm_id->port_num, rsp->req.sg,
-			rsp->req.sg_cnt, nvmet_data_dir(&rsp->req));
 	rsp->n_rdma = 0;
 
 	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		nvmet_rdma_rw_ctx_destroy(rsp);
 		nvmet_req_uninit(&rsp->req);
 		nvmet_rdma_release_rsp(rsp);
 		if (wc->status != IB_WC_WR_FLUSH_ERR) {
@@ -640,7 +763,58 @@ static void nvmet_rdma_read_data_done(struct ib_cq *cq, struct ib_wc *wc)
 		return;
 	}
 
-	rsp->req.execute(&rsp->req);
+	if (rsp->req.metadata_len)
+		status = nvmet_rdma_check_pi_status(rsp->rw.reg->mr);
+	nvmet_rdma_rw_ctx_destroy(rsp);
+
+	if (unlikely(status))
+		nvmet_req_complete(&rsp->req, status);
+	else
+		rsp->req.execute(&rsp->req);
+}
+
+static void nvmet_rdma_write_data_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+	struct nvmet_rdma_rsp *rsp =
+		container_of(wc->wr_cqe, struct nvmet_rdma_rsp, write_cqe);
+	struct nvmet_rdma_queue *queue = cq->cq_context;
+	struct rdma_cm_id *cm_id = rsp->queue->cm_id;
+	u16 status;
+
+	if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY))
+		return;
+
+	WARN_ON(rsp->n_rdma <= 0);
+	atomic_add(rsp->n_rdma, &queue->sq_wr_avail);
+	rsp->n_rdma = 0;
+
+	if (unlikely(wc->status != IB_WC_SUCCESS)) {
+		nvmet_rdma_rw_ctx_destroy(rsp);
+		nvmet_req_uninit(&rsp->req);
+		nvmet_rdma_release_rsp(rsp);
+		if (wc->status != IB_WC_WR_FLUSH_ERR) {
+			pr_info("RDMA WRITE for CQE 0x%p failed with status %s (%d).\n",
+				wc->wr_cqe, ib_wc_status_msg(wc->status),
+				wc->status);
+			nvmet_rdma_error_comp(queue);
+		}
+		return;
+	}
+
+	/*
+	 * Upon RDMA completion check the signature status
+	 * - if succeeded send good NVMe response
+	 * - if failed send bad NVMe response with appropriate error
+	 */
+	status = nvmet_rdma_check_pi_status(rsp->rw.reg->mr);
+	if (unlikely(status))
+		rsp->req.cqe->status = cpu_to_le16(status << 1);
+	nvmet_rdma_rw_ctx_destroy(rsp);
+
+	if (unlikely(ib_post_send(cm_id->qp, &rsp->send_wr, NULL))) {
+		pr_err("sending cmd response failed\n");
+		nvmet_rdma_release_rsp(rsp);
+	}
 }
 
 static void nvmet_rdma_use_inline_sg(struct nvmet_rdma_rsp *rsp, u32 len,
@@ -697,9 +871,9 @@ static u16 nvmet_rdma_map_sgl_inline(struct nvmet_rdma_rsp *rsp)
 static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp,
 		struct nvme_keyed_sgl_desc *sgl, bool invalidate)
 {
-	struct rdma_cm_id *cm_id = rsp->queue->cm_id;
 	u64 addr = le64_to_cpu(sgl->addr);
 	u32 key = get_unaligned_le32(sgl->key);
+	struct ib_sig_attrs sig_attrs;
 	int ret;
 
 	rsp->req.transfer_len = get_unaligned_le24(sgl->length);
@@ -708,13 +882,14 @@ static u16 nvmet_rdma_map_sgl_keyed(struct nvmet_rdma_rsp *rsp,
 	if (!rsp->req.transfer_len)
 		return 0;
 
+	if (rsp->req.metadata_len)
+		nvmet_rdma_set_sig_attrs(&rsp->req, &sig_attrs);
+
 	ret = nvmet_req_alloc_sgls(&rsp->req);
 	if (unlikely(ret < 0))
 		goto error_out;
 
-	ret = rdma_rw_ctx_init(&rsp->rw, cm_id->qp, cm_id->port_num,
-			rsp->req.sg, rsp->req.sg_cnt, 0, addr, key,
-			nvmet_data_dir(&rsp->req));
+	ret = nvmet_rdma_rw_ctx_init(rsp, addr, key, &sig_attrs);
 	if (unlikely(ret < 0))
 		goto error_out;
 	rsp->n_rdma += ret;
@@ -1108,6 +1283,9 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 		qp_attr.cap.max_recv_sge = 1 + ndev->inline_page_count;
 	}
 
+	if (queue->port->pi_enable && queue->host_qid)
+		qp_attr.create_flags |= IB_QP_CREATE_INTEGRITY_EN;
+
 	ret = rdma_create_qp(queue->cm_id, ndev->pd, &qp_attr);
 	if (ret) {
 		pr_err("failed to create_qp ret= %d\n", ret);
@@ -1226,6 +1404,7 @@ static int nvmet_rdma_cm_reject(struct rdma_cm_id *cm_id,
 		struct rdma_cm_id *cm_id,
 		struct rdma_cm_event *event)
 {
+	struct nvmet_rdma_port *port = cm_id->context;
 	struct nvmet_rdma_queue *queue;
 	int ret;
 
@@ -1252,6 +1431,7 @@ static int nvmet_rdma_cm_reject(struct rdma_cm_id *cm_id,
 	INIT_WORK(&queue->release_work, nvmet_rdma_release_queue_work);
 	queue->dev = ndev;
 	queue->cm_id = cm_id;
+	queue->port = port->nport;
 
 	spin_lock_init(&queue->state_lock);
 	queue->state = NVMET_RDMA_Q_CONNECTING;
@@ -1369,7 +1549,6 @@ static int nvmet_rdma_cm_accept(struct rdma_cm_id *cm_id,
 static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 		struct rdma_cm_event *event)
 {
-	struct nvmet_rdma_port *port = cm_id->context;
 	struct nvmet_rdma_device *ndev;
 	struct nvmet_rdma_queue *queue;
 	int ret = -EINVAL;
@@ -1385,7 +1564,6 @@ static int nvmet_rdma_queue_connect(struct rdma_cm_id *cm_id,
 		ret = -ENOMEM;
 		goto put_device;
 	}
-	queue->port = port->nport;
 
 	if (queue->host_qid == 0) {
 		/* Let inflight controller teardown complete */
@@ -1657,6 +1835,14 @@ static int nvmet_rdma_enable_port(struct nvmet_rdma_port *port)
 		goto out_destroy_id;
 	}
 
+	if (port->nport->pi_enable &&
+	    !(cm_id->device->attrs.device_cap_flags &
+	      IB_DEVICE_INTEGRITY_HANDOVER)) {
+		pr_err("T10-PI is not supported for %pISpcs\n", addr);
+		ret = -EINVAL;
+		goto out_destroy_id;
+	}
+
 	port->cm_id = cm_id;
 	return 0;
 
@@ -1766,6 +1952,8 @@ static void nvmet_rdma_disc_port_addr(struct nvmet_req *req,
 
 static u8 nvmet_rdma_get_mdts(const struct nvmet_ctrl *ctrl)
 {
+	if (ctrl->pi_support)
+		return NVMET_RDMA_MAX_METADATA_MDTS;
 	return NVMET_RDMA_MAX_MDTS;
 }
 
@@ -1774,6 +1962,7 @@ static u8 nvmet_rdma_get_mdts(const struct nvmet_ctrl *ctrl)
 	.type			= NVMF_TRTYPE_RDMA,
 	.msdbd			= 1,
 	.has_keyed_sgls		= 1,
+	.metadata_support	= 1,
 	.add_port		= nvmet_rdma_add_port,
 	.remove_port		= nvmet_rdma_remove_port,
 	.queue_response		= nvmet_rdma_queue_response,
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 02/16] nvme: introduce namespace features flag
  2020-05-04 15:57 ` [PATCH 02/16] nvme: introduce namespace features flag Max Gurtovoy
@ 2020-05-04 23:59   ` James Smart
  2020-05-14  2:52   ` Martin K. Petersen
  1 sibling, 0 replies; 58+ messages in thread
From: James Smart @ 2020-05-04 23:59 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> Replace the specific ext boolean (that implies on extended LBA format)
> with a feature in the new namespace features flag. This is a preparation
> for adding more namespace features (such as metadata specific features).
>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Reviewed-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>

Reviewed-by: James Smart <james.smart@broadcom.com>

-- james

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/16] nvme-rdma: add metadata/T10-PI support
  2020-05-04 15:57 ` [PATCH 09/16] nvme-rdma: add metadata/T10-PI support Max Gurtovoy
@ 2020-05-05  6:12   ` Christoph Hellwig
  2020-05-14  3:02   ` Martin K. Petersen
  1 sibling, 0 replies; 58+ messages in thread
From: Christoph Hellwig @ 2020-05-05  6:12 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, jsmart2021, sagi, martin.petersen, shlomin, israelr,
	vladimirk, linux-nvme, idanb, oren, kbusch, nitzanc, hch

On Mon, May 04, 2020 at 06:57:48PM +0300, Max Gurtovoy wrote:
> For capable HCAs (e.g. ConnectX-5/ConnectX-6) this will allow end-to-end
> protection information passthrough and validation for NVMe over RDMA
> transport. Metadata offload support was implemented over the new RDMA
> signature verbs API and it is enabled for capable controllers.
> 
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> ---
>  drivers/nvme/host/rdma.c | 272 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 262 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 4086874..05dbed3 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -34,6 +34,11 @@
>  
>  #define NVME_RDMA_MAX_INLINE_SEGMENTS	4
>  
> +#define NVME_RDMA_DATA_SGL_SIZE \
> +	(sizeof(struct scatterlist) * NVME_INLINE_SG_CNT)
> +#define NVME_RDMA_METADATA_SGL_SIZE \
> +	(sizeof(struct scatterlist) * NVME_INLINE_METADATA_SG_CNT)

It might make sense to lift this to common code as well and clean up the
other transports, but we don't really need to do that in this series.
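
e.g. something like this in common code (just a sketch, the names are made
up):

        /* e.g. in drivers/nvme/host/nvme.h */
        #define NVME_DATA_SGL_SIZE \
                (sizeof(struct scatterlist) * NVME_INLINE_SG_CNT)
        #define NVME_METADATA_SGL_SIZE \
                (sizeof(struct scatterlist) * NVME_INLINE_METADATA_SG_CNT)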

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support
  2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
                   ` (15 preceding siblings ...)
  2020-05-04 15:57 ` [PATCH 16/16] nvmet-rdma: add metadata/T10-PI support Max Gurtovoy
@ 2020-05-05  6:13 ` Christoph Hellwig
  2020-05-14 15:55   ` Max Gurtovoy
  16 siblings, 1 reply; 58+ messages in thread
From: Christoph Hellwig @ 2020-05-05  6:13 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, jsmart2021, sagi, martin.petersen, shlomin, israelr,
	vladimirk, linux-nvme, idanb, oren, kbusch, nitzanc, hch

The series looks good to me now, but I'll wait for a few more comments
before applying it.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag
  2020-05-04 15:57 ` [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag Max Gurtovoy
@ 2020-05-05 23:33   ` James Smart
  2020-05-06  8:39     ` Max Gurtovoy
  2020-05-14  2:53   ` Martin K. Petersen
  1 sibling, 1 reply; 58+ messages in thread
From: James Smart @ 2020-05-05 23:33 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> This is a preparation for adding support for metadata in fabric
> controllers. New flag will imply that NVMe namespace supports getting
> metadata that was originally generated by host's block layer.
>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Reviewed-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 41 ++++++++++++++++++++++++++++++++++-------
>   drivers/nvme/host/nvme.h |  1 +
>   2 files changed, 35 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 1d226cc..4b7faf9 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1882,13 +1882,27 @@ static void nvme_update_disk_info(struct gendisk *disk,
>   	blk_queue_io_min(disk->queue, phys_bs);
>   	blk_queue_io_opt(disk->queue, io_opt);
>   
> -	if (ns->ms && !(ns->features & NVME_NS_EXT_LBAS) &&
> -	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
> -		nvme_init_integrity(disk, ns->ms, ns->pi_type);
> -	if ((ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk)) ||
> -	    ns->lba_shift > PAGE_SHIFT)
> +	/*
> +	 * The block layer can't support LBA sizes larger than the page size
> +	 * yet, so catch this early and don't allow block I/O.
> +	 */
> +	if (ns->lba_shift > PAGE_SHIFT)
>   		capacity = 0;
>   
> +	/*
> +	 * Register a metadata profile for PI, or the plain non-integrity NVMe
> +	 * metadata masquerading as Type 0 if supported, otherwise reject block
> +	 * I/O to namespaces with metadata except when the namespace supports
> +	 * PI, as it can strip/insert in that case.
> +	 */
> +	if (ns->ms) {
> +		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
> +		    (ns->features & NVME_NS_METADATA_SUPPORTED))
> +			nvme_init_integrity(disk, ns->ms, ns->pi_type);
> +		else if (!nvme_ns_has_pi(ns))
> +			capacity = 0;
> +	}
> +
Look below for how I interpret the meaning of the 
NVME_NS_METADATA_SUPPORTED flag. It's a rollup of several conditions. 
Not all of those conditions are considered in the else clause.

The "else if" clause looks too light to address all the cases where 
capacity should be set to 0. Probably shouldn't be an else.
Examples:
- !IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) & meta is pi   (aka nvme_ns_has_pi)
- meta is not pi (thus pi_type=0 in the call to nvme_init_integrity()),
which results in !blk_get_integrity(disk), which is not checked.
- meta is pi and:
   - !ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED
   - !ns->features & NVME_NS_EXT_LBAS

may be a couple others.

>   	set_capacity_revalidate_and_notify(disk, capacity, false);
>   
>   	nvme_config_discard(disk, ns);
> @@ -1923,14 +1937,27 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>   
>   	ns->features = 0;
>   	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
> -	if (ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT))
> -		ns->features |= NVME_NS_EXT_LBAS;
>   	/* the PI implementation requires metadata equal t10 pi tuple size */
>   	if (ns->ms == sizeof(struct t10_pi_tuple))
>   		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
>   	else
>   		ns->pi_type = 0;
>   
> +	if (ns->ms) {
> +		if (id->flbas & NVME_NS_FLBAS_META_EXT)
> +			ns->features |= NVME_NS_EXT_LBAS;
> +
> +		/*
> +		 * For PCI, Extended logical block will be generated by the
> +		 * controller. Non-extended format can be generated by the
> +		 * block layer.
> +		 */
> +		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
> +			if (!(ns->features & NVME_NS_EXT_LBAS))
> +				ns->features |= NVME_NS_METADATA_SUPPORTED;

So I interpret this flag to mean:
1) namespace has metadata,
2) controller transport supports metadata (and implicitly: since it's 
pci so far it's via separate buffer only)
3) the nvme controller requires a separate metadata buffer (thus it 
matches the transport).
4) and implicitly, if we're going to have DIF, the block layer will give 
us a separate buffer.
note: all bets are off if the metadata isn't for DIF. but that's not 
handled here.....

I wish the comment explained that.

And of course these will change with fabrics as it will be expected that 
ns's on fabrics will be ns->features & NVME_NS_EXT_LBAS. But - that 
isn't part of this commit.  So doesn't count against this patch.

> +		}
> +	}
> +
>   	if (iob)
>   		blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(iob));
>   	nvme_update_disk_info(disk, ns, id);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 110577c7..0dda145 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -366,6 +366,7 @@ struct nvme_ns_head {
>   
>   enum nvme_ns_features {
>   	NVME_NS_EXT_LBAS = 1 << 0, /* support extended LBA format */
> +	NVME_NS_METADATA_SUPPORTED = 1 << 1, /* support getting generated md */
>   };
>   
>   struct nvme_ns {


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 05/16] nvme: introduce max_integrity_segments ctrl attribute
  2020-05-04 15:57 ` [PATCH 05/16] nvme: introduce max_integrity_segments ctrl attribute Max Gurtovoy
@ 2020-05-05 23:51   ` James Smart
  2020-05-06  7:08     ` Christoph Hellwig
  2020-05-13 19:04   ` James Smart
  1 sibling, 1 reply; 58+ messages in thread
From: James Smart @ 2020-05-05 23:51 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> This patch doesn't change any logic, and is needed as a preparation
> for adding PI support for fabrics drivers that will use an extended
> LBA format for metadata and will support more than 1 integrity segment.
>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>

Would have preferred this was known as max meta segments without tying 
it to pi. But this is fine.

Reviewed-by: James Smart <james.smart@broadcom.com>

-- james


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 05/16] nvme: introduce max_integrity_segments ctrl attribute
  2020-05-05 23:51   ` James Smart
@ 2020-05-06  7:08     ` Christoph Hellwig
  0 siblings, 0 replies; 58+ messages in thread
From: Christoph Hellwig @ 2020-05-06  7:08 UTC (permalink / raw)
  To: James Smart
  Cc: axboe, jsmart2021, sagi, martin.petersen, idanb, israelr,
	vladimirk, linux-nvme, shlomin, oren, kbusch, Max Gurtovoy,
	nitzanc, hch

On Tue, May 05, 2020 at 04:51:02PM -0700, James Smart wrote:
>
>
> On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
>> This patch doesn't change any logic, and is needed as a preparation
>> for adding PI support for fabrics drivers that will use an extended
>> LBA format for metadata and will support more than 1 integrity segment.
>>
>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
>> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>>
>
> Would have preferred this was known as max meta segments without typing to 
> pi. But this is fine.

I can't find pi anywhere in here - it says integrity.  That's the Linux
name, metadata would be the NVMe one.  I'm fine either way.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag
  2020-05-05 23:33   ` James Smart
@ 2020-05-06  8:39     ` Max Gurtovoy
  2020-05-06 20:44       ` James Smart
  0 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-06  8:39 UTC (permalink / raw)
  To: James Smart, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc


On 5/6/2020 2:33 AM, James Smart wrote:
>
>
> On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
>> This is a preparation for adding support for metadata in fabric
>> controllers. New flag will imply that NVMe namespace supports getting
>> metadata that was originally generated by host's block layer.
>>
>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>> Reviewed-by: Israel Rukshin <israelr@mellanox.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   drivers/nvme/host/core.c | 41 
>> ++++++++++++++++++++++++++++++++++-------
>>   drivers/nvme/host/nvme.h |  1 +
>>   2 files changed, 35 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 1d226cc..4b7faf9 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1882,13 +1882,27 @@ static void nvme_update_disk_info(struct 
>> gendisk *disk,
>>       blk_queue_io_min(disk->queue, phys_bs);
>>       blk_queue_io_opt(disk->queue, io_opt);
>>   -    if (ns->ms && !(ns->features & NVME_NS_EXT_LBAS) &&
>> -        (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
>> -        nvme_init_integrity(disk, ns->ms, ns->pi_type);
>> -    if ((ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk)) ||
>> -        ns->lba_shift > PAGE_SHIFT)
>> +    /*
>> +     * The block layer can't support LBA sizes larger than the page 
>> size
>> +     * yet, so catch this early and don't allow block I/O.
>> +     */
>> +    if (ns->lba_shift > PAGE_SHIFT)
>>           capacity = 0;
>>   +    /*
>> +     * Register a metadata profile for PI, or the plain 
>> non-integrity NVMe
>> +     * metadata masquerading as Type 0 if supported, otherwise 
>> reject block
>> +     * I/O to namespaces with metadata except when the namespace 
>> supports
>> +     * PI, as it can strip/insert in that case.
>> +     */
>> +    if (ns->ms) {
>> +        if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
>> +            (ns->features & NVME_NS_METADATA_SUPPORTED))
>> +            nvme_init_integrity(disk, ns->ms, ns->pi_type);
>> +        else if (!nvme_ns_has_pi(ns))
>> +            capacity = 0;
>> +    }
>> +
> Look below for how I interpret the meaning of the 
> NVME_NS_METADATA_SUPPORTED flag. It's a rollup of several conditions. 
> Not all of those conditions are considered in the else clause.

NVME_NS_METADATA_SUPPORTED has 1 meaning:

support getting metadata from the block layer.

The Linux block layer only supplies 2 separate pointers for data/metadata 
(aka Non-Extended mode in NVMe).

So drivers that can't convert between the two will set this flag only if 
their controller supports separate buffer mode.

It doesn't mean the controller can't generate the metadata by itself...

Maybe we should move the IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) check to 
the place where we're supposed to set NVME_NS_METADATA_SUPPORTED.

Will it make life easier:

if (ns->features & NVME_NS_METADATA_SUPPORTED)
	nvme_init_integrity(disk, ns->ms, ns->pi_type);
else if (!nvme_ns_has_pi(ns))
	capacity = 0;



>
> The "else if" clause looks too light to address all the cases where 
> capacity should be set to 0. Probably shouldn't be an else.
> Examples:
> - ! IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) & meta is pi   (aka 
> nvme_hs_has_pi)
> - meta is not pi (thus pi_type=0 in call to nvme_init_integrity()), 
> which results in  !blk_get_integrity(disk) which is not checked.

This will set the dummy nop_profile, so blk_get_integrity(disk) will not 
return NULL.

If we init the integrity in nvme_init_integrity it will not return NULL 
for type 0 either.
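
(For reference, a simplified sketch of the block layer side - paraphrased, 
not the verbatim blk_integrity_register() code: when the registered 
template carries no PI profile, a dummy no-op profile is installed, which 
is why blk_get_integrity() stays non-NULL even for pi_type 0.)

	void blk_integrity_register(struct gendisk *disk, struct blk_integrity *template)
	{
		struct blk_integrity *bi = &disk->queue->integrity;

		/* no PI profile supplied (e.g. pi_type == 0): fall back to the no-op profile */
		bi->profile = template->profile ?: &nop_profile;
		bi->tuple_size = template->tuple_size;
		bi->tag_size = template->tag_size;
	}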


> - meta is pi and:
>   - !ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED
>   - !ns->features & NVME_NS_EXT_LBAS
>
> may be a couple others.
>
>> set_capacity_revalidate_and_notify(disk, capacity, false);
>>         nvme_config_discard(disk, ns);
>> @@ -1923,14 +1937,27 @@ static void __nvme_revalidate_disk(struct 
>> gendisk *disk, struct nvme_id_ns *id)
>>         ns->features = 0;
>>       ns->ms = le16_to_cpu(id->lbaf[id->flbas & 
>> NVME_NS_FLBAS_LBA_MASK].ms);
>> -    if (ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT))
>> -        ns->features |= NVME_NS_EXT_LBAS;
>>       /* the PI implementation requires metadata equal t10 pi tuple 
>> size */
>>       if (ns->ms == sizeof(struct t10_pi_tuple))
>>           ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
>>       else
>>           ns->pi_type = 0;
>>   +    if (ns->ms) {
>> +        if (id->flbas & NVME_NS_FLBAS_META_EXT)
>> +            ns->features |= NVME_NS_EXT_LBAS;
>> +
>> +        /*
>> +         * For PCI, Extended logical block will be generated by the
>> +         * controller. Non-extended format can be generated by the
>> +         * block layer.
>> +         */
>> +        if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>> +            if (!(ns->features & NVME_NS_EXT_LBAS))
>> +                ns->features |= NVME_NS_METADATA_SUPPORTED;

and here we can do:

+        if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
+            (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)) {
+            if (!(ns->features & NVME_NS_EXT_LBAS))
+                ns->features |= NVME_NS_METADATA_SUPPORTED;


>
> So I interpret this flag to mean:
> 1) namespace has metadata,
> 2) controller transport supports metadata (and implicitly: since it's 
> pci so far it's via separate buffer only)
> 3) the nvme controller requires a separate metadata buffer (thus it 
> matches the transport).
> 4) and implicitly, if we're going to have DIF, the block layer will 
> give us a separate buffer.
> note: all bets are off if the metadata isn't for DIF. but that's not 
> handled here.....
>
> I wish the comment explained that.
>
> And of course these will change with fabrics as it will be expected 
> that ns's on fabrics will be ns->features & NVME_NS_EXT_LBAS. But - 
> that isn't part of this commit.  So doesn't count against this patch.
>
>> +        }
>> +    }
>> +
>>       if (iob)
>>           blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(iob));
>>       nvme_update_disk_info(disk, ns, id);
>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>> index 110577c7..0dda145 100644
>> --- a/drivers/nvme/host/nvme.h
>> +++ b/drivers/nvme/host/nvme.h
>> @@ -366,6 +366,7 @@ struct nvme_ns_head {
>>     enum nvme_ns_features {
>>       NVME_NS_EXT_LBAS = 1 << 0, /* support extended LBA format */
>> +    NVME_NS_METADATA_SUPPORTED = 1 << 1, /* support getting 
>> generated md */
>>   };
>>     struct nvme_ns {
>

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag
  2020-05-06  8:39     ` Max Gurtovoy
@ 2020-05-06 20:44       ` James Smart
  2020-05-07  9:02         ` Max Gurtovoy
  0 siblings, 1 reply; 58+ messages in thread
From: James Smart @ 2020-05-06 20:44 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/6/2020 1:39 AM, Max Gurtovoy wrote:
>
> On 5/6/2020 2:33 AM, James Smart wrote:
>>
>>
>> On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
>>> This is a preparation for adding support for metadata in fabric
>>> controllers. New flag will imply that NVMe namespace supports getting
>>> metadata that was originally generated by host's block layer.
>>>
>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>> Reviewed-by: Israel Rukshin <israelr@mellanox.com>
>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>> ---
>>>   drivers/nvme/host/core.c | 41 
>>> ++++++++++++++++++++++++++++++++++-------
>>>   drivers/nvme/host/nvme.h |  1 +
>>>   2 files changed, 35 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 1d226cc..4b7faf9 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -1882,13 +1882,27 @@ static void nvme_update_disk_info(struct 
>>> gendisk *disk,
>>>       blk_queue_io_min(disk->queue, phys_bs);
>>>       blk_queue_io_opt(disk->queue, io_opt);
>>>   -    if (ns->ms && !(ns->features & NVME_NS_EXT_LBAS) &&
>>> -        (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
>>> -        nvme_init_integrity(disk, ns->ms, ns->pi_type);
>>> -    if ((ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk)) ||
>>> -        ns->lba_shift > PAGE_SHIFT)
>>> +    /*
>>> +     * The block layer can't support LBA sizes larger than the page 
>>> size
>>> +     * yet, so catch this early and don't allow block I/O.
>>> +     */
>>> +    if (ns->lba_shift > PAGE_SHIFT)
>>>           capacity = 0;
>>>   +    /*
>>> +     * Register a metadata profile for PI, or the plain 
>>> non-integrity NVMe
>>> +     * metadata masquerading as Type 0 if supported, otherwise 
>>> reject block
>>> +     * I/O to namespaces with metadata except when the namespace 
>>> supports
>>> +     * PI, as it can strip/insert in that case.
>>> +     */
>>> +    if (ns->ms) {
>>> +        if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
>>> +            (ns->features & NVME_NS_METADATA_SUPPORTED))
>>> +            nvme_init_integrity(disk, ns->ms, ns->pi_type);
>>> +        else if (!nvme_ns_has_pi(ns))
>>> +            capacity = 0;
>>> +    }
>>> +
>> Look below for how I interpret the meaning of the 
>> NVME_NS_METADATA_SUPPORTED flag. It's a rollup of several conditions. 
>> Not all of those conditions are considered in the else clause.
>
> NVME_NS_METADATA_SUPPORTED has 1 meaning:
>
> support getting metadata from the block layer.

Well I disagree with you as several other conditions had to be true in 
order for it to be set.

>
> Linux block is supplying only 2 separate pointers for data/metadata 
> (aka Non-Extended mode in NVMe).
>
> So drivers that can't convert between the 2 will set this flag only in 
> case their controller support separate buffer mode.

Maybe I'm splitting hairs... but..  your 1 meaning is "blk sends 
separate & transport supports separate buffer" and you missed the "& 
controller requires separate".

> It doesn't mean the controller can't generate the metadata by himself...

I didn't say that it did.  And I don't see how this case would be 
covered by this flag unless there's lots of assumptions.  Minimally the 
"& controller requires separate" would likely be false - the controller 
would require extended LBA.  And we would need something else to 
indicate "blk doesn't have to send me separate meta, yet I can still do 
pi".  I know we're going outside the scope of this patch, probably into 
patch 6.


>
> Maybe we move the IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) to the place 
> where we suppose to set the NVME_NS_METADATA_SUPPORTED.
>
> will it make life easier:
>
> if (ns->features & NVME_NS_METADATA_SUPPORTED)
>
>     nvme_init_integrity(disk, ns->ms, ns->pi_type);
>
> else if (!nvme_ns_has_pi(ns))
>
>     capacity = 0;

I think you still missed it.

Let me reword this snippet with what the flag really means:

if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) &&
    !(ns->features & NVME_NS_EXT_LBAS))
	nvme_init_integrity(..., ns->pi_type);
else if (!nvme_ns_has_pi(ns))
	capacity = 0;

This leaves the cases where capacity is not zero'd, thus there may be io 
attempted to the ns:
a kernel w/o CONFIG_BLK_DEV_INTEGRITY enabled, and the ns was formatted 
for pi.
a kernel w/ CONFIG_BLK_DEV_INTEGRITY, the ns was formatted for pi, but 
the transport has no idea about a separate buffer.
a kernel w/ CONFIG_BLK_DEV_INTEGRITY, the ns was formatted for pi, the 
transport knows how to do 2 buffers, but the controller requires 
extended LBAS.

The 1st and last lines can be cases with pcie drives (you would hope you 
couldn't format for pi w/o having support for it, hope no one plugs a 
pre-formatted drive in) .
The 1st and middle lines can be cases with fabric-attached subsystems.

Rather than resolving it in this patch, let's defer the conversation to 
patch 6, where the snippet is modified for fabrics.  I commented because, 
if the 2 patches were ever separated, this patch would leave holes.


>
>
>
>>
>> The "else if" clause looks too light to address all the cases where 
>> capacity should be set to 0. Probably shouldn't be an else.
>> Examples:
>> - ! IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) & meta is pi   (aka 
>> nvme_hs_has_pi)
>> - meta is not pi (thus pi_type=0 in call to nvme_init_integrity()), 
>> which results in !blk_get_integrity(disk) which is not checked.
>
> This will set dummy nop_profile and blk_get_integrity(disk) will not 
> return NULL.
>
> If we init the integrity in nvme_init_integrity it will not return 
> NULL also for type 0.

Good - I was under the impression it would have.


>
>
>> - meta is pi and:
>>   - !ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED
>>   - !ns->features & NVME_NS_EXT_LBAS
>>
>> may be a couple others.
>>
>>> set_capacity_revalidate_and_notify(disk, capacity, false);
>>>         nvme_config_discard(disk, ns);
>>> @@ -1923,14 +1937,27 @@ static void __nvme_revalidate_disk(struct 
>>> gendisk *disk, struct nvme_id_ns *id)
>>>         ns->features = 0;
>>>       ns->ms = le16_to_cpu(id->lbaf[id->flbas & 
>>> NVME_NS_FLBAS_LBA_MASK].ms);
>>> -    if (ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT))
>>> -        ns->features |= NVME_NS_EXT_LBAS;
>>>       /* the PI implementation requires metadata equal t10 pi tuple 
>>> size */
>>>       if (ns->ms == sizeof(struct t10_pi_tuple))
>>>           ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
>>>       else
>>>           ns->pi_type = 0;
>>>   +    if (ns->ms) {
>>> +        if (id->flbas & NVME_NS_FLBAS_META_EXT)
>>> +            ns->features |= NVME_NS_EXT_LBAS;
>>> +
>>> +        /*
>>> +         * For PCI, Extended logical block will be generated by the
>>> +         * controller. Non-extended format can be generated by the
>>> +         * block layer.
>>> +         */
>>> +        if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>>> +            if (!(ns->features & NVME_NS_EXT_LBAS))
>>> +                ns->features |= NVME_NS_METADATA_SUPPORTED;
>
> and here we can do:
>
> +        if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && 
> ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
> +            if (!(ns->features & NVME_NS_EXT_LBAS))
> +                ns->features |= NVME_NS_METADATA_SUPPORTED;

it probably is better to apply it here, but it didn't change the 
discussion above.

-- james



_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag
  2020-05-06 20:44       ` James Smart
@ 2020-05-07  9:02         ` Max Gurtovoy
  2020-05-11 23:50           ` James Smart
  0 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-07  9:02 UTC (permalink / raw)
  To: James Smart, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc


On 5/6/2020 11:44 PM, James Smart wrote:
>
>
> On 5/6/2020 1:39 AM, Max Gurtovoy wrote:
>>
>> On 5/6/2020 2:33 AM, James Smart wrote:
>>>
>>>
>>> On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
>>>> This is a preparation for adding support for metadata in fabric
>>>> controllers. New flag will imply that NVMe namespace supports getting
>>>> metadata that was originally generated by host's block layer.
>>>>
>>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>>> Reviewed-by: Israel Rukshin <israelr@mellanox.com>
>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>> ---
>>>>   drivers/nvme/host/core.c | 41 
>>>> ++++++++++++++++++++++++++++++++++-------
>>>>   drivers/nvme/host/nvme.h |  1 +
>>>>   2 files changed, 35 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>> index 1d226cc..4b7faf9 100644
>>>> --- a/drivers/nvme/host/core.c
>>>> +++ b/drivers/nvme/host/core.c
>>>> @@ -1882,13 +1882,27 @@ static void nvme_update_disk_info(struct 
>>>> gendisk *disk,
>>>>       blk_queue_io_min(disk->queue, phys_bs);
>>>>       blk_queue_io_opt(disk->queue, io_opt);
>>>>   -    if (ns->ms && !(ns->features & NVME_NS_EXT_LBAS) &&
>>>> -        (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
>>>> -        nvme_init_integrity(disk, ns->ms, ns->pi_type);
>>>> -    if ((ns->ms && !nvme_ns_has_pi(ns) && 
>>>> !blk_get_integrity(disk)) ||
>>>> -        ns->lba_shift > PAGE_SHIFT)
>>>> +    /*
>>>> +     * The block layer can't support LBA sizes larger than the 
>>>> page size
>>>> +     * yet, so catch this early and don't allow block I/O.
>>>> +     */
>>>> +    if (ns->lba_shift > PAGE_SHIFT)
>>>>           capacity = 0;
>>>>   +    /*
>>>> +     * Register a metadata profile for PI, or the plain 
>>>> non-integrity NVMe
>>>> +     * metadata masquerading as Type 0 if supported, otherwise 
>>>> reject block
>>>> +     * I/O to namespaces with metadata except when the namespace 
>>>> supports
>>>> +     * PI, as it can strip/insert in that case.
>>>> +     */
>>>> +    if (ns->ms) {
>>>> +        if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
>>>> +            (ns->features & NVME_NS_METADATA_SUPPORTED))
>>>> +            nvme_init_integrity(disk, ns->ms, ns->pi_type);
>>>> +        else if (!nvme_ns_has_pi(ns))
>>>> +            capacity = 0;
>>>> +    }
>>>> +
>>> Look below for how I interpret the meaning of the 
>>> NVME_NS_METADATA_SUPPORTED flag. It's a rollup of several 
>>> conditions. Not all of those conditions are considered in the else 
>>> clause.
>>
>> NVME_NS_METADATA_SUPPORTED has 1 meaning:
>>
>> support getting metadata from the block layer.
>
> Well I disagree with you as several other conditions had to be true in 
> order for it to be set.
>
>>
>> Linux block is supplying only 2 separate pointers for data/metadata 
>> (aka Non-Extended mode in NVMe).
>>
>> So drivers that can't convert between the 2 will set this flag only 
>> in case their controller support separate buffer mode.
>
> Maybe I'm splitting hairs... but..  your 1 meaning is "blk sends 
> separate & transport supports separate buffer" and you missed the "& 
> controller requires separate".
>
>> It doesn't mean the controller can't generate the metadata by himself...
>
> I didn't say that it did.  And I don't see how this case would be 
> covered by this flag unless there's lots of assumptions. Minimally the 
> "& controller requires separate" would likely be false - the 
> controller would require extended LBA.  And we would need something 
> else to indicate "blk doesn't have to send me separate meta, yet I can 
> still do pi".  I know we're going outside the scope of this patch, 
> probably into patch 6.
>
>
>>
>> Maybe we move the IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) to the place 
>> where we suppose to set the NVME_NS_METADATA_SUPPORTED.
>>
>> will it make life easier:
>>
>> if (ns->features & NVME_NS_METADATA_SUPPORTED)
>>
>>     nvme_init_integrity(disk, ns->ms, ns->pi_type);
>>
>> else if (!nvme_ns_has_pi(ns))
>>
>>     capacity = 0;
>
> I think you still missed it.
>
> Let me reword this snippet with what the flag really means:
>
> if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && (ns->ctrl->ops->flags & 
> NVME_F_METADATA_SUPPORTED) && !(ns->features & NVME_NS_EXT_LBAS))
>   nvme_init_integrity(...,  ns->pi_type)
> else if (!nvme_ns_has_pi(ns))
>    capacity=0.
>
> This leaves the cases where capacity is not zero'd, thus there may be 
> io attempted to the ns:
> a kernel w/o CONFIG_BLK_DEV_INTEGRITY enabled, and the ns was 
> formatted for pi.

Controller will generate it.


> a kernel w/ CONFIG_BLK_DEV_INTEGRITY, the ns was formatted for pi, but 
> the transport has no idea about a separate buffer.

If the transport supports METADATA, it must have an idea about separate 
and non-separate modes.


> a kernel w/ CONFIG_BLK_DEV_INTEGRITY, the ns was formatted for pi, the 
> transport knows how to 2 do buffers, but the controller requires 
> extended LBAS.

At this point in time (patch 3/16) only the PCI transport supports the 
metadata feature, and it can't convert from 2 buffers to extended mode.

So in case you get a write/read command from the block layer (without 
metadata of course), the core layer will see that the ns "has pi" and 
will set the PRACT bit so that the SSD controller will generate/strip 
the metadata.
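
(A paraphrased sketch of the request-setup path being described - not a 
verbatim quote from the series - where the core falls back to PRACT when 
the block layer attached no integrity payload:)

	/* in nvme_setup_rw(), simplified */
	if (ns->ms) {
		if (!blk_integrity_rq(req)) {
			/* no metadata buffer from the block layer */
			if (WARN_ON_ONCE(!nvme_ns_has_pi(ns)))
				return BLK_STS_NOTSUPP;
			/* let the controller generate/strip the PI itself */
			control |= NVME_RW_PRINFO_PRACT;
		}
	}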


>
> The 1st and last lines can be cases with pcie drives (you would hope 
> you couldn't format for pi w/o having support for it, hope no one 
> plugs a pre-formatted drive in) .
> The 1st and middle lines can be cases with fabric-attached subsystems.

For PCI we're ok?

For fabrics, the conditions are different and not supported at this stage.



>
> Rather than resolving it in this patch, let's defer the conversation 
> to patch 6, where the snippet is modified for fabrics.  I commented 
> as, if the 2 patches were ever separated, this patch would leave holes.
>
>
>>
>>
>>
>>>
>>> The "else if" clause looks too light to address all the cases where 
>>> capacity should be set to 0. Probably shouldn't be an else.
>>> Examples:
>>> - ! IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) & meta is pi (aka 
>>> nvme_hs_has_pi)
>>> - meta is not pi (thus pi_type=0 in call to nvme_init_integrity()), 
>>> which results in !blk_get_integrity(disk) which is not checked.
>>
>> This will set dummy nop_profile and blk_get_integrity(disk) will not 
>> return NULL.
>>
>> If we init the integrity in nvme_init_integrity it will not return 
>> NULL also for type 0.
>
> Good - I was under the impression it would have.
>
>
>>
>>
>>> - meta is pi and:
>>>   - !ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED
>>>   - !ns->features & NVME_NS_EXT_LBAS
>>>
>>> may be a couple others.
>>>
>>>> set_capacity_revalidate_and_notify(disk, capacity, false);
>>>>         nvme_config_discard(disk, ns);
>>>> @@ -1923,14 +1937,27 @@ static void __nvme_revalidate_disk(struct 
>>>> gendisk *disk, struct nvme_id_ns *id)
>>>>         ns->features = 0;
>>>>       ns->ms = le16_to_cpu(id->lbaf[id->flbas & 
>>>> NVME_NS_FLBAS_LBA_MASK].ms);
>>>> -    if (ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT))
>>>> -        ns->features |= NVME_NS_EXT_LBAS;
>>>>       /* the PI implementation requires metadata equal t10 pi tuple 
>>>> size */
>>>>       if (ns->ms == sizeof(struct t10_pi_tuple))
>>>>           ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
>>>>       else
>>>>           ns->pi_type = 0;
>>>>   +    if (ns->ms) {
>>>> +        if (id->flbas & NVME_NS_FLBAS_META_EXT)
>>>> +            ns->features |= NVME_NS_EXT_LBAS;
>>>> +
>>>> +        /*
>>>> +         * For PCI, Extended logical block will be generated by the
>>>> +         * controller. Non-extended format can be generated by the
>>>> +         * block layer.
>>>> +         */
>>>> +        if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>>>> +            if (!(ns->features & NVME_NS_EXT_LBAS))
>>>> +                ns->features |= NVME_NS_METADATA_SUPPORTED;
>>
>> and here we can do:
>>
>> +        if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && 
>> ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>> +            if (!(ns->features & NVME_NS_EXT_LBAS))
>> +                ns->features |= NVME_NS_METADATA_SUPPORTED;
>
> it probably is better to apply it here, but it didn't change the 
> discussion above.
>
> -- james
>
>

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag
  2020-05-07  9:02         ` Max Gurtovoy
@ 2020-05-11 23:50           ` James Smart
  2020-05-13 18:18             ` Christoph Hellwig
  0 siblings, 1 reply; 58+ messages in thread
From: James Smart @ 2020-05-11 23:50 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/7/2020 2:02 AM, Max Gurtovoy wrote:
>
> On 5/6/2020 11:44 PM, James Smart wrote:
>>
>> Let me reword this snippet with what the flag really means:
>>
>> if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && (ns->ctrl->ops->flags & 
>> NVME_F_METADATA_SUPPORTED) && !(ns->features & NVME_NS_EXT_LBAS))
>>   nvme_init_integrity(...,  ns->pi_type)
>> else if (!nvme_ns_has_pi(ns))
>>    capacity=0.
>>
>> This leaves the cases where capacity is not zero'd, thus there may be 
>> io attempted to the ns:
>> a kernel w/o CONFIG_BLK_DEV_INTEGRITY enabled, and the ns was 
>> formatted for pi.
>
> Controller will generate it.

Ok - So what you meant was: other code will look for the integrity rq, 
and if not set will set PRACT to have the controller generate it.

Agree.  I was thinking that it would be better to validate what was 
going to happen at this point, but you are correct.

>
>
>> a kernel w/ CONFIG_BLK_DEV_INTEGRITY, the ns was formatted for pi, 
>> but the transport has no idea about a separate buffer.
>
> If the transport supports METADATA, it must have an idea about 
> separate and non-separate modes.

Agree - if the transport doesn't set its ops flag, then the new flag will 
never be set, thus it will not have a blk integrity rq struct, so it'll 
fall into PRACT.

>
>
>> a kernel w/ CONFIG_BLK_DEV_INTEGRITY, the ns was formatted for pi, 
>> the transport knows how to 2 do buffers, but the controller requires 
>> extended LBAS.
>
> In this point of time (patch 3/16) only PCI transport supports the 
> metadata feature and it can't convert from 2 buffers to extended mode.
>
> So in case you get a write/read command from block layer (without 
> metadata of course), the core layer will see that the ns "has pi" and 
> will set the PRACT bit so that the SSD controller will generate/strip 
> the metadata.

Agree.

Let's move on. I've been a little short of time. I should review the 
meatier patch, patch 6, tomorrow sometime.

so...
Reviewed-by:  James Smart <james.smart@broadcom.com>


-- james




_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag
  2020-05-11 23:50           ` James Smart
@ 2020-05-13 18:18             ` Christoph Hellwig
  2020-05-13 19:53               ` James Smart
  0 siblings, 1 reply; 58+ messages in thread
From: Christoph Hellwig @ 2020-05-13 18:18 UTC (permalink / raw)
  To: James Smart
  Cc: axboe, jsmart2021, sagi, martin.petersen, idanb, israelr,
	vladimirk, linux-nvme, shlomin, oren, kbusch, Max Gurtovoy,
	nitzanc, hch

On Mon, May 11, 2020 at 04:50:09PM -0700, James Smart wrote:
> Lets move on. I've been a little short of time. I should review the meatier 
> patch, patch 6, tomorrow sometime.

Can you get to it?  I'd like to pick this series up in the next few days.

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 06/16] nvme: enforce extended LBA format for fabrics metadata
  2020-05-04 15:57 ` [PATCH 06/16] nvme: enforce extended LBA format for fabrics metadata Max Gurtovoy
@ 2020-05-13 19:03   ` James Smart
  2020-05-14  2:56     ` Martin K. Petersen
  2020-05-14  8:15     ` Max Gurtovoy
  0 siblings, 2 replies; 58+ messages in thread
From: James Smart @ 2020-05-13 19:03 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> An extended LBA is a larger LBA that is created when metadata associated
> with the LBA is transferred contiguously with the LBA data (AKA
> interleaved). The metadata may be either transferred as part of the LBA
> (creating an extended LBA) or it may be transferred as a separate
> contiguous buffer of data. According to the NVMeoF spec, a fabrics ctrl
> supports only an Extended LBA format. Fail revalidation in case we have a
> spec violation. Also add a flag that will imply on capable transports and
> controllers as part of a preparation for allowing end-to-end protection
> information for fabric controllers.
>
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 41 +++++++++++++++++++++++++++--------------
>   1 file changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 89b1bfcb..f565281 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1941,16 +1942,24 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>   		ns->pi_type = 0;
>   
>   	if (ns->ms) {
> -		if (id->flbas & NVME_NS_FLBAS_META_EXT)
> -			ns->features |= NVME_NS_EXT_LBAS;
> -
>   		/*
> -		 * For PCI, Extended logical block will be generated by the
> -		 * controller. Non-extended format can be generated by the
> -		 * block layer.
> +		 * For PCIe only the separate metadata pointer is supported,
> +		 * as the block layer supplies metadata in a separate bio_vec
> +		 * chain. For Fabrics, only metadata as part of extended data
> +		 * LBA is supported on the wire per the Fabrics specification,
> +		 * but the HBA/HCA will do the remapping from the separate
> +		 * metadata buffers for us.
>   		 */
> -		if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
> -			if (!(ns->features & NVME_NS_EXT_LBAS))
> +		if (id->flbas & NVME_NS_FLBAS_META_EXT) {
> +			ns->features |= NVME_NS_EXT_LBAS;
> +			if ((ctrl->ops->flags & NVME_F_FABRICS) &&
> +			    (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) &&
> +			    ctrl->max_integrity_segments)
> +				ns->features |= NVME_NS_METADATA_SUPPORTED;

This is ok.  Later patches should deal with having no block-layer 
integrity buffer but letting the transport port auto-generate/strip 
interleaved pi data, as we will want to protect during transmission on 
the wire, not just PRACT=1 it.

> +		} else {
> +			if (WARN_ON_ONCE(ctrl->ops->flags & NVME_F_FABRICS))
> +				return -EINVAL;
> +			if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
>   				ns->features |= NVME_NS_METADATA_SUPPORTED;
>   		}

The NVME_F_METADATA_SUPPORTED flag should be on a per-ctrl basis. There 
will be different transport ports under the same transport with 
different abilities.

I'm ok if these patches go in, then a later patch moves the flags (might 
as well move NVME_F_FABRICS too).
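
(A sketch of the direction mentioned above - carrying the capability per 
controller instead of in the transport ops flags; ctrl->metadata_supported 
is a hypothetical field, not something defined by this series:)

	/* hypothetical per-controller capability, set by the transport at init time */
	if (ctrl->metadata_supported) {
		if (!(ns->features & NVME_NS_EXT_LBAS))
			ns->features |= NVME_NS_METADATA_SUPPORTED;
	}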


Reviewed-by:  James Smart <james.smart@broadcom.com>

-- james



_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 05/16] nvme: introduce max_integrity_segments ctrl attribute
  2020-05-04 15:57 ` [PATCH 05/16] nvme: introduce max_integrity_segments ctrl attribute Max Gurtovoy
  2020-05-05 23:51   ` James Smart
@ 2020-05-13 19:04   ` James Smart
  1 sibling, 0 replies; 58+ messages in thread
From: James Smart @ 2020-05-13 19:04 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> This patch doesn't change any logic, and is needed as a preparation
> for adding PI support for fabrics drivers that will use an extended
> LBA format for metadata and will support more than 1 integrity segment.
>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>

Reviewed-by:  James Smart <james.smart@broadcom.com>

-- james


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 07/16] nvme: introduce NVME_INLINE_METADATA_SG_CNT
  2020-05-04 15:57 ` [PATCH 07/16] nvme: introduce NVME_INLINE_METADATA_SG_CNT Max Gurtovoy
@ 2020-05-13 19:05   ` James Smart
  0 siblings, 0 replies; 58+ messages in thread
From: James Smart @ 2020-05-13 19:05 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> From: Israel Rukshin <israelr@mellanox.com>
>
> SGL size of metadata is usually small. Thus, 1 inline sg should cover
> most cases. The macro will be used for pre-allocate a single SGL entry
> for metadata. The preallocation of small inline SGLs depends on SG_CHAIN
> capability so if the ARCH doesn't support SG_CHAIN, use the runtime
> allocation for the SGL. This patch is a preparation for adding metadata
> (T10-PI) over fabric support.
>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>

Reviewed-by:  James Smart <james.smart@broadcom.com>

-- james


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/16] nvmet: add metadata characteristics for a namespace
  2020-05-04 15:57 ` [PATCH 10/16] nvmet: add metadata characteristics for a namespace Max Gurtovoy
@ 2020-05-13 19:25   ` James Smart
  2020-05-14  3:06     ` Martin K. Petersen
  0 siblings, 1 reply; 58+ messages in thread
From: James Smart @ 2020-05-13 19:25 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> From: Israel Rukshin <israelr@mellanox.com>
>
> Fill those namespace fields from the block device format for adding
> metadata (T10-PI) over fabric support with block devices.
>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/target/io-cmd-bdev.c | 22 ++++++++++++++++++++++
>   drivers/nvme/target/nvmet.h       |  3 +++
>   2 files changed, 25 insertions(+)
>
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 0427e04..45dc644 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -47,6 +47,22 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
>   	id->nows = to0based(ql->io_opt / ql->logical_block_size);
>   }
>   
> +static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
> +{
> +	struct blk_integrity *bi = bdev_get_integrity(ns->bdev);
> +
> +	if (bi) {
> +		ns->metadata_size = bi->tuple_size;
> +		if (bi->profile == &t10_pi_type1_crc)
> +			ns->pi_type = NVME_NS_DPS_PI_TYPE1;
> +		else if (bi->profile == &t10_pi_type3_crc)
> +			ns->pi_type = NVME_NS_DPS_PI_TYPE3;
> +		else
> +			/* Unsupported metadata type */
> +			ns->metadata_size = 0;
> +	}
> +}
> +
>   int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>   {
>   	int ret;
> @@ -64,6 +80,12 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
>   	}
>   	ns->size = i_size_read(ns->bdev->bd_inode);
>   	ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
> +
> +	ns->pi_type = 0;
> +	ns->metadata_size = 0;
> +	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY))
> +		nvmet_bdev_ns_enable_integrity(ns);
> +

Ok. In the future we will want to consider the case of a transport 
stripping/inserting PI data, even if not implemented by the controller's 
bdev. Although that only checks the fabric transmission, it's better than 
no checking.

I assume all bdevs that support an integrity profile will report a 
blocksize that does not contain the integrity data?

And we're going to fail as badly now as before if the bdev didn't have a 
power-of-2 blocksize, or supported metadata (a blk-integrity buffer) and 
metadata_size isn't set?

-- james


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 11/16] nvmet: rename nvmet_rw_len to nvmet_rw_data_len
  2020-05-04 15:57 ` [PATCH 11/16] nvmet: rename nvmet_rw_len to nvmet_rw_data_len Max Gurtovoy
@ 2020-05-13 19:25   ` James Smart
  0 siblings, 0 replies; 58+ messages in thread
From: James Smart @ 2020-05-13 19:25 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> From: Israel Rukshin <israelr@mellanox.com>
>
> The function doesn't add the metadata length (only data length is
> calculated). This is preparation for adding metadata (T10-PI) support.
>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
>   drivers/nvme/target/io-cmd-bdev.c | 2 +-
>   drivers/nvme/target/io-cmd-file.c | 2 +-
>   drivers/nvme/target/nvmet.h       | 2 +-
>   3 files changed, 3 insertions(+), 3 deletions(-)
>
>

Reviewed-by:  James Smart <james.smart@broadcom.com>

-- james


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 12/16] nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len
  2020-05-04 15:57 ` [PATCH 12/16] nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len Max Gurtovoy
@ 2020-05-13 19:27   ` James Smart
  0 siblings, 0 replies; 58+ messages in thread
From: James Smart @ 2020-05-13 19:27 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> From: Israel Rukshin <israelr@mellanox.com>
>
> The function doesn't check only the data length, because the transfer
> length includes also the metadata length in some cases. This is
> preparation for adding metadata (T10-PI) support.
>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
>

Reviewed-by:  James Smart <james.smart@broadcom.com>

-- james


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 13/16] nvme: add Metadata Capabilities enumerations
  2020-05-04 15:57 ` [PATCH 13/16] nvme: add Metadata Capabilities enumerations Max Gurtovoy
@ 2020-05-13 19:27   ` James Smart
  2020-05-14  3:07   ` Martin K. Petersen
  1 sibling, 0 replies; 58+ messages in thread
From: James Smart @ 2020-05-13 19:27 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> From: Israel Rukshin <israelr@mellanox.com>
>
> The enumerations will be used to expose the namespace metadata format by
> the target.
>
> Suggested-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>   include/linux/nvme.h | 6 ++++++
>   1 file changed, 6 insertions(+)
>
>

Reviewed-by:  James Smart <james.smart@broadcom.com>

-- james


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 14/16] nvmet: add metadata/T10-PI support
  2020-05-04 15:57 ` [PATCH 14/16] nvmet: add metadata/T10-PI support Max Gurtovoy
@ 2020-05-13 19:51   ` James Smart
  2020-05-14 15:09     ` Max Gurtovoy
  0 siblings, 1 reply; 58+ messages in thread
From: James Smart @ 2020-05-13 19:51 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc



On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
> From: Israel Rukshin <israelr@mellanox.com>
>
> Expose the namespace metadata format when PI is enabled. The user needs
> to enable the capability per subsystem and per port. The other metadata
> properties are taken from the namespace/bdev.
>
> Usage example:
> echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
> echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable
>
> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> ---
>   drivers/nvme/target/admin-cmd.c   | 26 +++++++++++++++---
>   drivers/nvme/target/configfs.c    | 58 +++++++++++++++++++++++++++++++++++++++
>   drivers/nvme/target/core.c        | 21 +++++++++++---
>   drivers/nvme/target/fabrics-cmd.c |  7 +++--
>   drivers/nvme/target/nvmet.h       | 19 +++++++++++++
>   5 files changed, 121 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index 905aba8..9c8a00d 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -341,6 +341,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
>   {
>   	struct nvmet_ctrl *ctrl = req->sq->ctrl;
>   	struct nvme_id_ctrl *id;
> +	u32 cmd_capsule_size;
>   	u16 status = 0;
>   
>   	id = kzalloc(sizeof(*id), GFP_KERNEL);
> @@ -433,9 +434,15 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
>   
>   	strlcpy(id->subnqn, ctrl->subsys->subsysnqn, sizeof(id->subnqn));
>   
> -	/* Max command capsule size is sqe + single page of in-capsule data */
> -	id->ioccsz = cpu_to_le32((sizeof(struct nvme_command) +
> -				  req->port->inline_data_size) / 16);
> +	/*
> +	 * Max command capsule size is sqe + in-capsule data size.
> +	 * Disable in-capsule data for Metadata capable controllers.
> +	 */
> +	cmd_capsule_size = sizeof(struct nvme_command);
> +	if (!ctrl->pi_support)
> +		cmd_capsule_size += req->port->inline_data_size;
> +	id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
> +

This seems odd to me; it should probably be referencing attributes of 
the transport, which should be set based on which transport port the 
command was received on.
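
(A hypothetical rewording of the capsule-size snippet along those lines - 
keying off the port the association came in on; the fields exist in this 
series, but this exact placement is only an illustration:)

	/* hypothetical: derive in-capsule data from the receiving port, not the ctrl flag */
	cmd_capsule_size = sizeof(struct nvme_command);
	if (!req->port->pi_enable)
		cmd_capsule_size += req->port->inline_data_size;
	id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);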


>   	/* Max response capsule size is cqe */
>   	id->iorcsz = cpu_to_le32(sizeof(struct nvme_completion) / 16);
>   
> @@ -465,6 +472,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
>   
>   static void nvmet_execute_identify_ns(struct nvmet_req *req)
>   {
> +	struct nvmet_ctrl *ctrl = req->sq->ctrl;
>   	struct nvmet_ns *ns;
>   	struct nvme_id_ns *id;
>   	u16 status = 0;
> @@ -482,7 +490,7 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
>   	}
>   
>   	/* return an all zeroed buffer if we can't find an active namespace */
> -	ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
> +	ns = nvmet_find_namespace(ctrl, req->cmd->identify.nsid);
>   	if (!ns)
>   		goto done;
>   
> @@ -526,6 +534,16 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
>   
>   	id->lbaf[0].ds = ns->blksize_shift;
>   
> +	if (ctrl->pi_support && nvmet_ns_has_pi(ns)) {
> +		id->dpc = NVME_NS_DPC_PI_FIRST | NVME_NS_DPC_PI_LAST |
> +			  NVME_NS_DPC_PI_TYPE1 | NVME_NS_DPC_PI_TYPE2 |
> +			  NVME_NS_DPC_PI_TYPE3;

Why is PI_TYPE2 getting set if nvmet_bdev_ns_enable_integrity() won't 
set a metadata size ?
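
(If the intent were to advertise only what nvmet_bdev_ns_enable_integrity() 
can actually produce, the mask would presumably drop TYPE2 - a hypothetical 
alternative, not taken from the series:)

		id->dpc = NVME_NS_DPC_PI_FIRST | NVME_NS_DPC_PI_LAST |
			  NVME_NS_DPC_PI_TYPE1 | NVME_NS_DPC_PI_TYPE3;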

> +		id->mc = NVME_MC_EXTENDED_LBA;
> +		id->dps = ns->pi_type;
> +		id->flbas = NVME_NS_FLBAS_META_EXT;

I guess this is ok - always requiring extended lbas. Will this ever 
change over time ?

> +		id->lbaf[0].ms = ns->metadata_size;
> +	}
> +
>   	if (ns->readonly)
>   		id->nsattr |= (1 << 0);
>   	nvmet_put_namespace(ns);
...
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index 50dfc60..9f6b0f9 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -322,12 +322,21 @@ int nvmet_enable_port(struct nvmet_port *port)
>   	if (!try_module_get(ops->owner))
>   		return -EINVAL;
>   
> -	ret = ops->add_port(port);
> -	if (ret) {
> -		module_put(ops->owner);
> -		return ret;
> +	/*
> +	 * If the user requested PI support and the transport isn't pi capable,
> +	 * don't enable the port.
> +	 */
> +	if (port->pi_enable && !ops->metadata_support) {
> +		pr_err("T10-PI is not supported by transport type %d\n",
> +		       port->disc_addr.trtype);
> +		ret = -EINVAL;
> +		goto out_put;
>   	}
>   
> +	ret = ops->add_port(port);
> +	if (ret)
> +		goto out_put;
> +

ok for now - but it's going to have to be changed later to check the 
port attributes, not the transport ops.  So it'll go back to add_port, 
then validation w/ delete_port if it fails.
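
(A sketch of the flow described above - add the port first, then validate 
the per-port capability and tear it down on mismatch; port->metadata_capable 
is a hypothetical attribute, not part of this series:)

	ret = ops->add_port(port);
	if (ret)
		goto out_put;

	/* hypothetical per-port capability check after the port is up */
	if (port->pi_enable && !port->metadata_capable) {
		pr_err("T10-PI is not supported by this port\n");
		ops->remove_port(port);
		ret = -EINVAL;
		goto out_put;
	}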

>   	/* If the transport didn't set inline_data_size, then disable it. */
>   	if (port->inline_data_size < 0)
>   		port->inline_data_size = 0;
> @@ -335,6 +344,10 @@ int nvmet_enable_port(struct nvmet_port *port)
>   	port->enabled = true;
>   	port->tr_ops = ops;
>   	return 0;
> +
> +out_put:
> +	module_put(ops->owner);
> +	return ret;
>   }
>   
>   void nvmet_disable_port(struct nvmet_port *port)
> diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
> index 52a6f70..42bd12b 100644
> --- a/drivers/nvme/target/fabrics-cmd.c
> +++ b/drivers/nvme/target/fabrics-cmd.c
> @@ -197,6 +197,8 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
>   		goto out;
>   	}
>   
> +	ctrl->pi_support = ctrl->port->pi_enable && ctrl->subsys->pi_support;
> +
>   	uuid_copy(&ctrl->hostid, &d->hostid);
>   
>   	status = nvmet_install_queue(ctrl, req);
> @@ -205,8 +207,9 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
>   		goto out;
>   	}
>   
> -	pr_info("creating controller %d for subsystem %s for NQN %s.\n",
> -		ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn);
> +	pr_info("creating controller %d for subsystem %s for NQN %s%s.\n",
> +		ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn,
> +		ctrl->pi_support ? " T10-PI is enabled" : "");
>   	req->cqe->result.u16 = cpu_to_le16(ctrl->cntlid);
>   
>   out:
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index ceedaaf..2986e2c 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -145,6 +145,7 @@ struct nvmet_port {
>   	bool				enabled;
>   	int				inline_data_size;
>   	const struct nvmet_fabrics_ops	*tr_ops;
> +	bool				pi_enable;

ok - but I would have liked to have seen this generalized for metadata 
support, then PI in the metadata.


-- james




_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag
  2020-05-13 18:18             ` Christoph Hellwig
@ 2020-05-13 19:53               ` James Smart
  0 siblings, 0 replies; 58+ messages in thread
From: James Smart @ 2020-05-13 19:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, jsmart2021, sagi, martin.petersen, idanb, israelr,
	vladimirk, linux-nvme, shlomin, oren, kbusch, Max Gurtovoy,
	nitzanc

On 5/13/2020 11:18 AM, Christoph Hellwig wrote:
> On Mon, May 11, 2020 at 04:50:09PM -0700, James Smart wrote:
>> Lets move on. I've been a little short of time. I should review the meatier
>> patch, patch 6, tomorrow sometime.
> Can you get to it?  I'd like to pick this series up in the next days.

Comments posted.  A few questions, but only those on patch 14 are significant.

-- james

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 01/16] block: always define struct blk_integrity in genhd.h
  2020-05-04 15:57 ` [PATCH 01/16] block: always define struct blk_integrity in genhd.h Max Gurtovoy
@ 2020-05-14  2:51   ` Martin K. Petersen
  0 siblings, 0 replies; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-14  2:51 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, jsmart2021, sagi, martin.petersen, shlomin, israelr,
	vladimirk, linux-nvme, idanb, oren, kbusch, nitzanc, hch


Max,

> This will reduce the amount of ifdefs inside the source code for
> various drivers and also will reduce the amount of stub functions that
> were created for the !CONFIG_BLK_DEV_INTEGRITY case.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 02/16] nvme: introduce namespace features flag
  2020-05-04 15:57 ` [PATCH 02/16] nvme: introduce namespace features flag Max Gurtovoy
  2020-05-04 23:59   ` James Smart
@ 2020-05-14  2:52   ` Martin K. Petersen
  1 sibling, 0 replies; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-14  2:52 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, jsmart2021, sagi, martin.petersen, shlomin, israelr,
	vladimirk, linux-nvme, idanb, oren, kbusch, nitzanc, hch


Max,

> Replace the specific ext boolean (that implies on extended LBA format)
> with a feature in the new namespace features flag. This is a
> preparation for adding more namespace features (such as metadata
> specific features).

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag
  2020-05-04 15:57 ` [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag Max Gurtovoy
  2020-05-05 23:33   ` James Smart
@ 2020-05-14  2:53   ` Martin K. Petersen
  1 sibling, 0 replies; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-14  2:53 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, jsmart2021, sagi, martin.petersen, shlomin, israelr,
	vladimirk, linux-nvme, idanb, oren, kbusch, nitzanc, hch


Max,

> This is a preparation for adding support for metadata in fabric
> controllers. The new flag will indicate that the NVMe namespace supports
> receiving metadata that was originally generated by the host's block layer.

I share some of James' concerns. But OK for now.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 04/16] nvme: make nvme_ns_has_pi accessible to transports
  2020-05-04 15:57 ` [PATCH 04/16] nvme: make nvme_ns_has_pi accessible to transports Max Gurtovoy
@ 2020-05-14  2:53   ` Martin K. Petersen
  0 siblings, 0 replies; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-14  2:53 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, jsmart2021, sagi, martin.petersen, idanb, israelr,
	vladimirk, linux-nvme, shlomin, oren, kbusch, nitzanc, hch


Max,

> Move the nvme_ns_has_pi() inline from core.c to the nvme.h header.
> This allows use by the transports.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 06/16] nvme: enforce extended LBA format for fabrics metadata
  2020-05-13 19:03   ` James Smart
@ 2020-05-14  2:56     ` Martin K. Petersen
  2020-05-14  8:28       ` Max Gurtovoy
  2020-05-14  8:15     ` Max Gurtovoy
  1 sibling, 1 reply; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-14  2:56 UTC (permalink / raw)
  To: James Smart
  Cc: axboe, jsmart2021, sagi, martin.petersen, idanb, israelr,
	vladimirk, linux-nvme, shlomin, oren, kbusch, Max Gurtovoy,
	nitzanc, hch


James,

> This is ok. Later patches should deal with the case of no block-layer
> integrity buffer by having the transport port auto-generate/strip
> interleaved PI data, as we will want to protect during transmission on
> the wire, not just PRACT=1 it.

Yeah. Whether the integrity is supplied by the block layer is a per-I/O
property. Revalidation should record capabilities, not enforce any
particular set of I/O properties.

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/16] nvme-rdma: add metadata/T10-PI support
  2020-05-04 15:57 ` [PATCH 09/16] nvme-rdma: add metadata/T10-PI support Max Gurtovoy
  2020-05-05  6:12   ` Christoph Hellwig
@ 2020-05-14  3:02   ` Martin K. Petersen
  2020-05-14  8:48     ` Max Gurtovoy
  1 sibling, 1 reply; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-14  3:02 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, jsmart2021, sagi, martin.petersen, shlomin, israelr,
	vladimirk, linux-nvme, idanb, oren, kbusch, nitzanc, hch


Max,

> +static void nvme_rdma_set_sig_domain(struct blk_integrity *bi,
> +		struct nvme_command *cmd, struct ib_sig_domain *domain,
> +		u16 control)
> +{
> +	domain->sig_type = IB_SIG_TYPE_T10_DIF;
> +	domain->sig.dif.bg_type = IB_T10DIF_CRC;
> +	domain->sig.dif.pi_interval = 1 << bi->interval_exp;
> +	domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
> +	domain->sig.dif.apptag_check_mask = 0xffff;
> +	domain->sig.dif.app_escape = true;
> +	domain->sig.dif.ref_escape = true;
> +	if (control & NVME_RW_PRINFO_PRCHK_REF)
> +		domain->sig.dif.ref_remap = true;
> +}

Why are you setting ref_escape? In DIX that means that checking will
only be disabled for blocks that have a ref tag value of 0xffffffff
*and* an app tag value of 0xffff. That's specific to Type 3. Whereas
both Type 1 and Type 2 rely on being able to disable checking for blocks
with an app tag value of 0xffff regardless of the ref tag value.
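
To put the distinction in code form, a minimal sketch (illustration only,
not code from the posted patch; pi_type here stands for the namespace's
DPS protection type) using the sig.dif fields above:

	/* app-tag escape applies to every PI type: skip the checks for any
	 * block whose app tag is 0xffff */
	domain->sig.dif.app_escape = true;

	/* ref-tag escape (skip the checks only when the ref tag is also
	 * 0xffffffff) only makes sense for Type 3 */
	if (pi_type == NVME_NS_DPS_PI_TYPE3)
		domain->sig.dif.ref_escape = true;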

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/16] nvmet: add metadata characteristics for a namespace
  2020-05-13 19:25   ` James Smart
@ 2020-05-14  3:06     ` Martin K. Petersen
  0 siblings, 0 replies; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-14  3:06 UTC (permalink / raw)
  To: James Smart
  Cc: axboe, jsmart2021, sagi, martin.petersen, idanb, israelr,
	vladimirk, linux-nvme, shlomin, oren, kbusch, Max Gurtovoy,
	nitzanc, hch


James,

> I assume all bdevs that support an integrity profile will report a
> blocksize that does not include the integrity data?

They must.

> And we're going to fail as badly now as before if the bdev doesn't have
> a power-of-2 blocksize, or supports metadata (a blk-integrity buffer)
> but metadata_size isn't set?

We do not support logical block sizes that are not powers of two.

The whole point of PI was to keep the logical block size at 512 despite
520 bytes going over the wire per block. And the point of DIX was to
accommodate sending and receiving the PI portion without having to
change the I/O path to accommodate non-powers-of-two.
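
As a worked example (illustrative numbers only): with 512-byte logical
blocks and 8-byte PI tuples, a 4 KiB I/O spans 8 blocks, so 8 * 520 = 4160
bytes cross the wire (4096 bytes of data plus 64 bytes of PI), while the
logical block size the host sees stays 512.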

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 13/16] nvme: add Metadata Capabilities enumerations
  2020-05-04 15:57 ` [PATCH 13/16] nvme: add Metadata Capabilities enumerations Max Gurtovoy
  2020-05-13 19:27   ` James Smart
@ 2020-05-14  3:07   ` Martin K. Petersen
  1 sibling, 0 replies; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-14  3:07 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, jsmart2021, sagi, martin.petersen, shlomin, israelr,
	vladimirk, linux-nvme, idanb, oren, kbusch, nitzanc, hch


Max,

> The enumerations will be used to expose the namespace metadata format
> by the target.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 16/16] nvmet-rdma: add metadata/T10-PI support
  2020-05-04 15:57 ` [PATCH 16/16] nvmet-rdma: add metadata/T10-PI support Max Gurtovoy
@ 2020-05-14  3:10   ` Martin K. Petersen
  2020-05-14  8:55     ` Max Gurtovoy
  0 siblings, 1 reply; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-14  3:10 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, jsmart2021, sagi, martin.petersen, shlomin, israelr,
	vladimirk, linux-nvme, idanb, oren, kbusch, nitzanc, hch


Max,

> +static void nvmet_rdma_set_sig_domain(struct blk_integrity *bi,
> +		struct nvme_command *cmd, struct ib_sig_domain *domain,
> +		u16 control)
> +{
> +	domain->sig_type = IB_SIG_TYPE_T10_DIF;
> +	domain->sig.dif.bg_type = IB_T10DIF_CRC;
> +	domain->sig.dif.pi_interval = 1 << bi->interval_exp;
> +	domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
> +	domain->sig.dif.apptag_check_mask = 0xffff;
> +	domain->sig.dif.app_escape = true;
> +	domain->sig.dif.ref_escape = true;
> +	if (control & NVME_RW_PRINFO_PRCHK_REF)
> +		domain->sig.dif.ref_remap = true;
> +}

Same comment as the initiator side wrt. the ref tag escape.

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 06/16] nvme: enforce extended LBA format for fabrics metadata
  2020-05-13 19:03   ` James Smart
  2020-05-14  2:56     ` Martin K. Petersen
@ 2020-05-14  8:15     ` Max Gurtovoy
  1 sibling, 0 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-14  8:15 UTC (permalink / raw)
  To: James Smart, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc


On 5/13/2020 10:03 PM, James Smart wrote:
>
>
> On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
>> An extended LBA is a larger LBA that is created when metadata associated
>> with the LBA is transferred contiguously with the LBA data (AKA
>> interleaved). The metadata may be either transferred as part of the LBA
>> (creating an extended LBA) or it may be transferred as a separate
>> contiguous buffer of data. According to the NVMeoF spec, a fabrics ctrl
>> supports only an Extended LBA format. Fail revalidation in case we have
>> a spec violation. Also add a flag that will be set for capable transports
>> and controllers, as part of a preparation for allowing end-to-end
>> protection information for fabric controllers.
>>
>> Suggested-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   drivers/nvme/host/core.c | 41 
>> +++++++++++++++++++++++++++--------------
>>   1 file changed, 27 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 89b1bfcb..f565281 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1941,16 +1942,24 @@ static void __nvme_revalidate_disk(struct 
>> gendisk *disk, struct nvme_id_ns *id)
>>           ns->pi_type = 0;
>>         if (ns->ms) {
>> -        if (id->flbas & NVME_NS_FLBAS_META_EXT)
>> -            ns->features |= NVME_NS_EXT_LBAS;
>> -
>>           /*
>> -         * For PCI, Extended logical block will be generated by the
>> -         * controller. Non-extended format can be generated by the
>> -         * block layer.
>> +         * For PCIe only the separate metadata pointer is supported,
>> +         * as the block layer supplies metadata in a separate bio_vec
>> +         * chain. For Fabrics, only metadata as part of extended data
>> +         * LBA is supported on the wire per the Fabrics specification,
>> +         * but the HBA/HCA will do the remapping from the separate
>> +         * metadata buffers for us.
>>            */
>> -        if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>> -            if (!(ns->features & NVME_NS_EXT_LBAS))
>> +        if (id->flbas & NVME_NS_FLBAS_META_EXT) {
>> +            ns->features |= NVME_NS_EXT_LBAS;
>> +            if ((ctrl->ops->flags & NVME_F_FABRICS) &&
>> +                (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) &&
>> +                ctrl->max_integrity_segments)
>> +                ns->features |= NVME_NS_METADATA_SUPPORTED;
>
> This is ok.  Later patches should deal with the case of no block-layer
> integrity buffer by having the transport port auto-generate/strip
> interleaved PI data, as we will want to protect during transmission on
> the wire, not just PRACT=1 it.

PRACT=1 implies that the action should be done by the controller (this can
be the host controller for fabrics as well).

In RDMA, if we don't get a block-layer integrity buffer, we'll
generate/strip the metadata and it will be sent on the wire.

I agree that this won't be done if !IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) 
and we can fix it in the future.


>
>> +        } else {
>> +            if (WARN_ON_ONCE(ctrl->ops->flags & NVME_F_FABRICS))
>> +                return -EINVAL;
>> +            if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
>>                   ns->features |= NVME_NS_METADATA_SUPPORTED;
>>           }
>
> The NVME_F_METADATA_SUPPORTED flag should be on a per-ctrl basis. 
> There will be different transport ports under the same transport with 
> different abilities.
>
> I'm ok if these patches go in, then a later patch moves the flags 
> (might as well move NVME_F_FABRICS too).

ctrl->max_integrity_segments will cover this as well:

if ((ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) && 
ctrl->max_integrity_segments)


>
>
> Reviewed-by:  James Smart <james.smart@broadcom.com>
>
> -- james
>
>

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 06/16] nvme: enforce extended LBA format for fabrics metadata
  2020-05-14  2:56     ` Martin K. Petersen
@ 2020-05-14  8:28       ` Max Gurtovoy
  0 siblings, 0 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-14  8:28 UTC (permalink / raw)
  To: Martin K. Petersen, James Smart
  Cc: axboe, jsmart2021, sagi, vladimirk, idanb, israelr, linux-nvme,
	shlomin, oren, kbusch, nitzanc, hch


On 5/14/2020 5:56 AM, Martin K. Petersen wrote:
> James,
>
>> This is ok. Later patches should deal with the case of no block-layer
>> integrity buffer by having the transport port auto-generate/strip
>> interleaved PI data, as we will want to protect during transmission on
>> the wire, not just PRACT=1 it.
> Yeah. Whether the integrity is supplied by the block layer is a per-I/O
> property. Revalidation should record capabilities, not enforce any
> particular set of I/O properties.

Maybe the subject is not the best, but I do want to verify that we don't
have a spec violation for fabrics controllers.



_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/16] nvme-rdma: add metadata/T10-PI support
  2020-05-14  3:02   ` Martin K. Petersen
@ 2020-05-14  8:48     ` Max Gurtovoy
  2020-05-14 22:40       ` Martin K. Petersen
  0 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-14  8:48 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: axboe, jsmart2021, sagi, vladimirk, shlomin, israelr, linux-nvme,
	idanb, oren, kbusch, nitzanc, hch


On 5/14/2020 6:02 AM, Martin K. Petersen wrote:
> Max,
>
>> +static void nvme_rdma_set_sig_domain(struct blk_integrity *bi,
>> +		struct nvme_command *cmd, struct ib_sig_domain *domain,
>> +		u16 control)
>> +{
>> +	domain->sig_type = IB_SIG_TYPE_T10_DIF;
>> +	domain->sig.dif.bg_type = IB_T10DIF_CRC;
>> +	domain->sig.dif.pi_interval = 1 << bi->interval_exp;
>> +	domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
>> +	domain->sig.dif.apptag_check_mask = 0xffff;
>> +	domain->sig.dif.app_escape = true;
>> +	domain->sig.dif.ref_escape = true;
>> +	if (control & NVME_RW_PRINFO_PRCHK_REF)
>> +		domain->sig.dif.ref_remap = true;
>> +}
> Why are you setting ref_escape? In DIX that means that checking will
> only be disabled for blocks that have a ref tag value of 0xffffffff
> *and* an app tag value of 0xffff. That's specific to Type 3. Whereas
> both Type 1 and Type 2 rely on being able to disable checking for blocks
> with an app tag value of 0xffff regardless of the ref tag value.


Thanks for catching this, Martin.

Will this fix it (it probably should be fixed in iser/isert as well)?


diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 05dbed3..9b9d11b 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1338,9 +1338,10 @@ static void nvme_rdma_set_sig_domain(struct blk_integrity *bi,
         domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
         domain->sig.dif.apptag_check_mask = 0xffff;
         domain->sig.dif.app_escape = true;
-       domain->sig.dif.ref_escape = true;
         if (control & NVME_RW_PRINFO_PRCHK_REF)
                 domain->sig.dif.ref_remap = true;
+       else
+               domain->sig.dif.ref_escape = true;
  }




_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 16/16] nvmet-rdma: add metadata/T10-PI support
  2020-05-14  3:10   ` Martin K. Petersen
@ 2020-05-14  8:55     ` Max Gurtovoy
  0 siblings, 0 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-14  8:55 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: axboe, jsmart2021, sagi, vladimirk, shlomin, israelr, linux-nvme,
	idanb, oren, kbusch, nitzanc, hch


On 5/14/2020 6:10 AM, Martin K. Petersen wrote:
> Max,
>
>> +static void nvmet_rdma_set_sig_domain(struct blk_integrity *bi,
>> +		struct nvme_command *cmd, struct ib_sig_domain *domain,
>> +		u16 control)
>> +{
>> +	domain->sig_type = IB_SIG_TYPE_T10_DIF;
>> +	domain->sig.dif.bg_type = IB_T10DIF_CRC;
>> +	domain->sig.dif.pi_interval = 1 << bi->interval_exp;
>> +	domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
>> +	domain->sig.dif.apptag_check_mask = 0xffff;
>> +	domain->sig.dif.app_escape = true;
>> +	domain->sig.dif.ref_escape = true;
>> +	if (control & NVME_RW_PRINFO_PRCHK_REF)
>> +		domain->sig.dif.ref_remap = true;
>> +}
> Same comment as the initiator side wrt. the ref tag escape.

same fix as well:


diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index ec59f67..e26cdd7 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -581,9 +581,10 @@ static void nvmet_rdma_set_sig_domain(struct blk_integrity *bi,
         domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
         domain->sig.dif.apptag_check_mask = 0xffff;
         domain->sig.dif.app_escape = true;
-       domain->sig.dif.ref_escape = true;
         if (control & NVME_RW_PRINFO_PRCHK_REF)
                 domain->sig.dif.ref_remap = true;
+       else
+               domain->sig.dif.ref_escape = true;
  }




_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 14/16] nvmet: add metadata/T10-PI support
  2020-05-13 19:51   ` James Smart
@ 2020-05-14 15:09     ` Max Gurtovoy
  2020-05-14 15:37       ` James Smart
  0 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-14 15:09 UTC (permalink / raw)
  To: James Smart, linux-nvme, kbusch, hch, sagi, martin.petersen,
	jsmart2021, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc


On 5/13/2020 10:51 PM, James Smart wrote:
>
>
> On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
>> From: Israel Rukshin <israelr@mellanox.com>
>>
>> Expose the namespace metadata format when PI is enabled. The user needs
>> to enable the capability per subsystem and per port. The other metadata
>> properties are taken from the namespace/bdev.
>>
>> Usage example:
>> echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
>> echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable
>>
>> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>> ---
>>   drivers/nvme/target/admin-cmd.c   | 26 +++++++++++++++---
>>   drivers/nvme/target/configfs.c    | 58 
>> +++++++++++++++++++++++++++++++++++++++
>>   drivers/nvme/target/core.c        | 21 +++++++++++---
>>   drivers/nvme/target/fabrics-cmd.c |  7 +++--
>>   drivers/nvme/target/nvmet.h       | 19 +++++++++++++
>>   5 files changed, 121 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/nvme/target/admin-cmd.c 
>> b/drivers/nvme/target/admin-cmd.c
>> index 905aba8..9c8a00d 100644
>> --- a/drivers/nvme/target/admin-cmd.c
>> +++ b/drivers/nvme/target/admin-cmd.c
>> @@ -341,6 +341,7 @@ static void nvmet_execute_identify_ctrl(struct 
>> nvmet_req *req)
>>   {
>>       struct nvmet_ctrl *ctrl = req->sq->ctrl;
>>       struct nvme_id_ctrl *id;
>> +    u32 cmd_capsule_size;
>>       u16 status = 0;
>>         id = kzalloc(sizeof(*id), GFP_KERNEL);
>> @@ -433,9 +434,15 @@ static void nvmet_execute_identify_ctrl(struct 
>> nvmet_req *req)
>>         strlcpy(id->subnqn, ctrl->subsys->subsysnqn, 
>> sizeof(id->subnqn));
>>   -    /* Max command capsule size is sqe + single page of in-capsule 
>> data */
>> -    id->ioccsz = cpu_to_le32((sizeof(struct nvme_command) +
>> -                  req->port->inline_data_size) / 16);
>> +    /*
>> +     * Max command capsule size is sqe + in-capsule data size.
>> +     * Disable in-capsule data for Metadata capable controllers.
>> +     */
>> +    cmd_capsule_size = sizeof(struct nvme_command);
>> +    if (!ctrl->pi_support)
>> +        cmd_capsule_size += req->port->inline_data_size;
>> +    id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
>> +
>
> This seems odd to me, and should probably be referencing attributes of
> the transport, which should probably be set based on what transport
> port the command was received on.

we set this in admin_connect:

ctrl->pi_support = ctrl->port->pi_enable && ctrl->subsys->pi_support;
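
So the value is per controller and already reflects the port the connect
command arrived on, not a global transport property.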


>
>
>>       /* Max response capsule size is cqe */
>>       id->iorcsz = cpu_to_le32(sizeof(struct nvme_completion) / 16);
>>   @@ -465,6 +472,7 @@ static void nvmet_execute_identify_ctrl(struct 
>> nvmet_req *req)
>>     static void nvmet_execute_identify_ns(struct nvmet_req *req)
>>   {
>> +    struct nvmet_ctrl *ctrl = req->sq->ctrl;
>>       struct nvmet_ns *ns;
>>       struct nvme_id_ns *id;
>>       u16 status = 0;
>> @@ -482,7 +490,7 @@ static void nvmet_execute_identify_ns(struct 
>> nvmet_req *req)
>>       }
>>         /* return an all zeroed buffer if we can't find an active 
>> namespace */
>> -    ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
>> +    ns = nvmet_find_namespace(ctrl, req->cmd->identify.nsid);
>>       if (!ns)
>>           goto done;
>>   @@ -526,6 +534,16 @@ static void nvmet_execute_identify_ns(struct 
>> nvmet_req *req)
>>         id->lbaf[0].ds = ns->blksize_shift;
>>   +    if (ctrl->pi_support && nvmet_ns_has_pi(ns)) {
>> +        id->dpc = NVME_NS_DPC_PI_FIRST | NVME_NS_DPC_PI_LAST |
>> +              NVME_NS_DPC_PI_TYPE1 | NVME_NS_DPC_PI_TYPE2 |
>> +              NVME_NS_DPC_PI_TYPE3;
>
> Why is PI_TYPE2 getting set if nvmet_bdev_ns_enable_integrity() won't
> set a metadata size?

It won't:

static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
{
         if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY))
                 return false;
         return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
}

>
>> +        id->mc = NVME_MC_EXTENDED_LBA;
>> +        id->dps = ns->pi_type;
>> +        id->flbas = NVME_NS_FLBAS_META_EXT;
>
> I guess this is ok - always requiring extended LBAs. Will this ever
> change over time?

This is according to the NVMe over Fabrics spec.


>
>> +        id->lbaf[0].ms = ns->metadata_size;
>> +    }
>> +
>>       if (ns->readonly)
>>           id->nsattr |= (1 << 0);
>>       nvmet_put_namespace(ns);
> ...
>> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
>> index 50dfc60..9f6b0f9 100644
>> --- a/drivers/nvme/target/core.c
>> +++ b/drivers/nvme/target/core.c
>> @@ -322,12 +322,21 @@ int nvmet_enable_port(struct nvmet_port *port)
>>       if (!try_module_get(ops->owner))
>>           return -EINVAL;
>>   -    ret = ops->add_port(port);
>> -    if (ret) {
>> -        module_put(ops->owner);
>> -        return ret;
>> +    /*
>> +     * If the user requested PI support and the transport isn't pi 
>> capable,
>> +     * don't enable the port.
>> +     */
>> +    if (port->pi_enable && !ops->metadata_support) {
>> +        pr_err("T10-PI is not supported by transport type %d\n",
>> +               port->disc_addr.trtype);
>> +        ret = -EINVAL;
>> +        goto out_put;
>>       }
>>   +    ret = ops->add_port(port);
>> +    if (ret)
>> +        goto out_put;
>> +
>
> ok for now - but it's going to have to be changed later to check the
> port attributes, not the transport ops.  So it'll go back to add_port,
> then validation w/ delete_port if it fails.
>
>>       /* If the transport didn't set inline_data_size, then disable 
>> it. */
>>       if (port->inline_data_size < 0)
>>           port->inline_data_size = 0;
>> @@ -335,6 +344,10 @@ int nvmet_enable_port(struct nvmet_port *port)
>>       port->enabled = true;
>>       port->tr_ops = ops;
>>       return 0;
>> +
>> +out_put:
>> +    module_put(ops->owner);
>> +    return ret;
>>   }
>>     void nvmet_disable_port(struct nvmet_port *port)
>> diff --git a/drivers/nvme/target/fabrics-cmd.c 
>> b/drivers/nvme/target/fabrics-cmd.c
>> index 52a6f70..42bd12b 100644
>> --- a/drivers/nvme/target/fabrics-cmd.c
>> +++ b/drivers/nvme/target/fabrics-cmd.c
>> @@ -197,6 +197,8 @@ static void nvmet_execute_admin_connect(struct 
>> nvmet_req *req)
>>           goto out;
>>       }
>>   +    ctrl->pi_support = ctrl->port->pi_enable && 
>> ctrl->subsys->pi_support;
>> +
>>       uuid_copy(&ctrl->hostid, &d->hostid);
>>         status = nvmet_install_queue(ctrl, req);
>> @@ -205,8 +207,9 @@ static void nvmet_execute_admin_connect(struct 
>> nvmet_req *req)
>>           goto out;
>>       }
>>   -    pr_info("creating controller %d for subsystem %s for NQN %s.\n",
>> -        ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn);
>> +    pr_info("creating controller %d for subsystem %s for NQN %s%s.\n",
>> +        ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn,
>> +        ctrl->pi_support ? " T10-PI is enabled" : "");
>>       req->cqe->result.u16 = cpu_to_le16(ctrl->cntlid);
>>     out:
>> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
>> index ceedaaf..2986e2c 100644
>> --- a/drivers/nvme/target/nvmet.h
>> +++ b/drivers/nvme/target/nvmet.h
>> @@ -145,6 +145,7 @@ struct nvmet_port {
>>       bool                enabled;
>>       int                inline_data_size;
>>       const struct nvmet_fabrics_ops    *tr_ops;
>> +    bool                pi_enable;
>
> ok - but would have liked to see this made generic for metadata
> support, with PI then carried in the metadata.
>
>
> -- james
>
>
>

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 14/16] nvmet: add metadata/T10-PI support
  2020-05-14 15:09     ` Max Gurtovoy
@ 2020-05-14 15:37       ` James Smart
  0 siblings, 0 replies; 58+ messages in thread
From: James Smart @ 2020-05-14 15:37 UTC (permalink / raw)
  To: Max Gurtovoy, James Smart, linux-nvme, kbusch, hch, sagi,
	martin.petersen, axboe
  Cc: vladimirk, idanb, israelr, shlomin, oren, nitzanc

ok - I'm good on this

-- james



_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support
  2020-05-05  6:13 ` [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add " Christoph Hellwig
@ 2020-05-14 15:55   ` Max Gurtovoy
  0 siblings, 0 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-14 15:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: axboe, jsmart2021, sagi, martin.petersen, shlomin, israelr,
	vladimirk, linux-nvme, idanb, oren, kbusch, nitzanc


On 5/5/2020 9:13 AM, Christoph Hellwig wrote:
> The series looks good to me now, but I'll wait for a few more comments
> before applying it.

I'll wait for Martin's ack on ref_escape and will send V8 with two small
fixes to the RDMA drivers and the reviewers' signatures.

Hopefully this version can be merged.


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/16] nvme-rdma: add metadata/T10-PI support
  2020-05-14  8:48     ` Max Gurtovoy
@ 2020-05-14 22:40       ` Martin K. Petersen
  2020-05-15 14:50         ` Max Gurtovoy
  0 siblings, 1 reply; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-14 22:40 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: axboe, jsmart2021, sagi, Martin K. Petersen, shlomin, israelr,
	vladimirk, linux-nvme, idanb, oren, kbusch, nitzanc, hch


Hi Max!

> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 05dbed3..9b9d11b 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1338,9 +1338,10 @@ static void nvme_rdma_set_sig_domain(struct
> blk_integrity *bi,
>         domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
>         domain->sig.dif.apptag_check_mask = 0xffff;

Also, this has me wondering. Not sure how it works in your case?

There are basically two modes of operation:

 1. The app tag is opaque storage and may be different for each LBA in
    the I/O. The only thing the HBA should do is to bypass all PI checks
    if the app tag value for a given block is 0xffff (if app_escape is
    true).

 2. The app tag, or part of it, is constant for an entire I/O. And for
    that model one specifies the mask identifying which bits in the app
    tag should be verified against a constant value. The HBA should
    still bypass PI checking if the app tag value for a given block is
    0xffff and app_escape is true.

Linux belongs to school #1, with the app tag being opaque storage. I'm
not entirely sure how that works given apptag_check_mask = 0xffff, which
sounds to me like you want to verify all bits in each app tag against a
constant value. But you don't set the constant value to check against.
What's going on there?
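
Put differently, the two models would look roughly like this with the
sig.dif fields used above (sketch only, assuming apptag_check_mask selects
which app-tag bits get compared):

	/* 1. opaque app tag (the Linux model): compare no app-tag bits,
	 *    only honour the 0xffff escape */
	domain->sig.dif.apptag_check_mask = 0x0;
	domain->sig.dif.app_escape = true;

	/* 2. constant app tag for the whole I/O: compare the masked bits
	 *    against the value carried in the command */
	domain->sig.dif.app_tag = le16_to_cpu(cmd->rw.apptag);
	domain->sig.dif.apptag_check_mask = le16_to_cpu(cmd->rw.appmask);
	domain->sig.dif.app_escape = true;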

>         domain->sig.dif.app_escape = true;
> -       domain->sig.dif.ref_escape = true;
>         if (control & NVME_RW_PRINFO_PRCHK_REF)
>                 domain->sig.dif.ref_remap = true;
> +       else
> +               domain->sig.dif.ref_escape = true;
>  }

ref_escape should really only be true if the namespace is formatted with
Type 3. Whether the ref tag should be checked is orthogonal.

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/16] nvme-rdma: add metadata/T10-PI support
  2020-05-14 22:40       ` Martin K. Petersen
@ 2020-05-15 14:50         ` Max Gurtovoy
  2020-05-18 17:22           ` Martin K. Petersen
  0 siblings, 1 reply; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-15 14:50 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: axboe, jsmart2021, sagi, vladimirk, shlomin, israelr, linux-nvme,
	idanb, oren, kbusch, nitzanc, hch


On 5/15/2020 1:40 AM, Martin K. Petersen wrote:
> Hi Max!
>
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index 05dbed3..9b9d11b 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -1338,9 +1338,10 @@ static void nvme_rdma_set_sig_domain(struct
>> blk_integrity *bi,
>>          domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
>>          domain->sig.dif.apptag_check_mask = 0xffff;
> Also, this has me wondering. Not sure how it works in your case?
>
> There are basically two modes of operation:
>
>   1. The app tag is opaque storage and may be different for each LBA in
>      the I/O. The only thing the HBA should do is to bypass all PI checks
>      if the app tag value for a given block is 0xffff (if app_escape is
>      true).
>
>   2. The app tag, or part of it, is constant for an entire I/O. And for
>      that model one specifies the mask identifying which bits in the app
>      tag should be verified against a constant value. The HBA should
>      still bypass PI checking if the app tag value for a given block is
>      0xffff and app_escape is true.
>
> Linux belongs to school #1, with the app tag being opaque storage. I'm
> not entirely sure how that works given apptag_check_mask = 0xffff, which
> sounds to me like you want to verify all bits in each app tag against a
> constant value. But you don't set the constant value to check against.
> What's going on there?

You're right.

It works since apptag is always 0.

This code was borrowed from iSER, so I kinda assumed it works well.

Thanks for this review.

I'll fix it for iSER ULP as well.

>
>>          domain->sig.dif.app_escape = true;
>> -       domain->sig.dif.ref_escape = true;
>>          if (control & NVME_RW_PRINFO_PRCHK_REF)
>>                  domain->sig.dif.ref_remap = true;
>> +       else
>> +               domain->sig.dif.ref_escape = true;
>>   }
> ref_escape should really only be true if the namespace is formatted with
> Type 3. Whether the ref tag should be checked is orthogonal.

Will the below cover this issue:

static void nvme_rdma_set_sig_domain(struct blk_integrity *bi,
                 struct nvme_command *cmd, struct ib_sig_domain *domain,
                 u16 control, u8 pi_type)
{
         domain->sig_type = IB_SIG_TYPE_T10_DIF;
         domain->sig.dif.bg_type = IB_T10DIF_CRC;
         domain->sig.dif.pi_interval = 1 << bi->interval_exp;
         domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
         if (control & NVME_RW_PRINFO_PRCHK_REF)
                 domain->sig.dif.ref_remap = true;

         domain->sig.dif.app_tag = le16_to_cpu(cmd->rw.apptag);
         domain->sig.dif.apptag_check_mask = le16_to_cpu(cmd->rw.appmask);
         domain->sig.dif.app_escape = true;
         if (pi_type == NVME_NS_DPS_PI_TYPE3)
                 domain->sig.dif.ref_escape = true;
}
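
Taking apptag and appmask from the NVMe command means the HCA only checks
whatever app-tag bits the host actually asked to verify, and the ref-tag
escape now follows the namespace's protection type instead of being set
unconditionally.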



_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/16] nvme-rdma: add metadata/T10-PI support
  2020-05-15 14:50         ` Max Gurtovoy
@ 2020-05-18 17:22           ` Martin K. Petersen
  0 siblings, 0 replies; 58+ messages in thread
From: Martin K. Petersen @ 2020-05-18 17:22 UTC (permalink / raw)
  To: Max Gurtovoy; +Cc: linux-nvme


Max,

> Will the below cover this issue:
>
> static void nvme_rdma_set_sig_domain(struct blk_integrity *bi,
>                 struct nvme_command *cmd, struct ib_sig_domain *domain,
>                 u16 control, u8 pi_type)
> {
>         domain->sig_type = IB_SIG_TYPE_T10_DIF;
>         domain->sig.dif.bg_type = IB_T10DIF_CRC;
>         domain->sig.dif.pi_interval = 1 << bi->interval_exp;
>         domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
>         if (control & NVME_RW_PRINFO_PRCHK_REF)
>                 domain->sig.dif.ref_remap = true;
>
>         domain->sig.dif.app_tag = le16_to_cpu(cmd->rw.apptag);
>         domain->sig.dif.apptag_check_mask = le16_to_cpu(cmd->rw.appmask);
>         domain->sig.dif.app_escape = true;
>         if (pi_type == NVME_NS_DPS_PI_TYPE3)
>                 domain->sig.dif.ref_escape = true;
> }

Yes, this looks OK.

-- 
Martin K. Petersen	Oracle Linux Engineering

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH 14/16] nvmet: add metadata/T10-PI support
  2020-05-19 14:05 [PATCH 00/16 v8] " Max Gurtovoy
@ 2020-05-19 14:06 ` Max Gurtovoy
  0 siblings, 0 replies; 58+ messages in thread
From: Max Gurtovoy @ 2020-05-19 14:06 UTC (permalink / raw)
  To: sagi, linux-nvme, kbusch, hch, martin.petersen, jsmart2021, axboe
  Cc: vladimirk, shlomin, israelr, James Smart, idanb, oren,
	Max Gurtovoy, nitzanc

From: Israel Rukshin <israelr@mellanox.com>

Expose the namespace metadata format when PI is enabled. The user needs
to enable the capability per subsystem and per port. The other metadata
properties are taken from the namespace/bdev.

Usage example:
echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
---
 drivers/nvme/target/admin-cmd.c   | 26 +++++++++++++++---
 drivers/nvme/target/configfs.c    | 58 +++++++++++++++++++++++++++++++++++++++
 drivers/nvme/target/core.c        | 21 +++++++++++---
 drivers/nvme/target/fabrics-cmd.c |  7 +++--
 drivers/nvme/target/nvmet.h       | 19 +++++++++++++
 5 files changed, 121 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 905aba8..9c8a00d 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -341,6 +341,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 {
 	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvme_id_ctrl *id;
+	u32 cmd_capsule_size;
 	u16 status = 0;
 
 	id = kzalloc(sizeof(*id), GFP_KERNEL);
@@ -433,9 +434,15 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 	strlcpy(id->subnqn, ctrl->subsys->subsysnqn, sizeof(id->subnqn));
 
-	/* Max command capsule size is sqe + single page of in-capsule data */
-	id->ioccsz = cpu_to_le32((sizeof(struct nvme_command) +
-				  req->port->inline_data_size) / 16);
+	/*
+	 * Max command capsule size is sqe + in-capsule data size.
+	 * Disable in-capsule data for Metadata capable controllers.
+	 */
+	cmd_capsule_size = sizeof(struct nvme_command);
+	if (!ctrl->pi_support)
+		cmd_capsule_size += req->port->inline_data_size;
+	id->ioccsz = cpu_to_le32(cmd_capsule_size / 16);
+
 	/* Max response capsule size is cqe */
 	id->iorcsz = cpu_to_le32(sizeof(struct nvme_completion) / 16);
 
@@ -465,6 +472,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 static void nvmet_execute_identify_ns(struct nvmet_req *req)
 {
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvmet_ns *ns;
 	struct nvme_id_ns *id;
 	u16 status = 0;
@@ -482,7 +490,7 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 	}
 
 	/* return an all zeroed buffer if we can't find an active namespace */
-	ns = nvmet_find_namespace(req->sq->ctrl, req->cmd->identify.nsid);
+	ns = nvmet_find_namespace(ctrl, req->cmd->identify.nsid);
 	if (!ns)
 		goto done;
 
@@ -526,6 +534,16 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 
 	id->lbaf[0].ds = ns->blksize_shift;
 
+	if (ctrl->pi_support && nvmet_ns_has_pi(ns)) {
+		id->dpc = NVME_NS_DPC_PI_FIRST | NVME_NS_DPC_PI_LAST |
+			  NVME_NS_DPC_PI_TYPE1 | NVME_NS_DPC_PI_TYPE2 |
+			  NVME_NS_DPC_PI_TYPE3;
+		id->mc = NVME_MC_EXTENDED_LBA;
+		id->dps = ns->pi_type;
+		id->flbas = NVME_NS_FLBAS_META_EXT;
+		id->lbaf[0].ms = ns->metadata_size;
+	}
+
 	if (ns->readonly)
 		id->nsattr |= (1 << 0);
 	nvmet_put_namespace(ns);
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 24eb4cf..d71d216 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -248,6 +248,36 @@ static ssize_t nvmet_param_inline_data_size_store(struct config_item *item,
 
 CONFIGFS_ATTR(nvmet_, param_inline_data_size);
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static ssize_t nvmet_param_pi_enable_show(struct config_item *item,
+		char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+
+	return snprintf(page, PAGE_SIZE, "%d\n", port->pi_enable);
+}
+
+static ssize_t nvmet_param_pi_enable_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	bool val;
+
+	if (strtobool(page, &val))
+		return -EINVAL;
+
+	if (port->enabled) {
+		pr_err("Disable port before setting pi_enable value.\n");
+		return -EACCES;
+	}
+
+	port->pi_enable = val;
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_, param_pi_enable);
+#endif
+
 static ssize_t nvmet_addr_trtype_show(struct config_item *item,
 		char *page)
 {
@@ -984,6 +1014,28 @@ static ssize_t nvmet_subsys_attr_model_store(struct config_item *item,
 }
 CONFIGFS_ATTR(nvmet_subsys_, attr_model);
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static ssize_t nvmet_subsys_attr_pi_enable_show(struct config_item *item,
+						char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%d\n", to_subsys(item)->pi_support);
+}
+
+static ssize_t nvmet_subsys_attr_pi_enable_store(struct config_item *item,
+						 const char *page, size_t count)
+{
+	struct nvmet_subsys *subsys = to_subsys(item);
+	bool pi_enable;
+
+	if (strtobool(page, &pi_enable))
+		return -EINVAL;
+
+	subsys->pi_support = pi_enable;
+	return count;
+}
+CONFIGFS_ATTR(nvmet_subsys_, attr_pi_enable);
+#endif
+
 static struct configfs_attribute *nvmet_subsys_attrs[] = {
 	&nvmet_subsys_attr_attr_allow_any_host,
 	&nvmet_subsys_attr_attr_version,
@@ -991,6 +1043,9 @@ static ssize_t nvmet_subsys_attr_model_store(struct config_item *item,
 	&nvmet_subsys_attr_attr_cntlid_min,
 	&nvmet_subsys_attr_attr_cntlid_max,
 	&nvmet_subsys_attr_attr_model,
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	&nvmet_subsys_attr_attr_pi_enable,
+#endif
 	NULL,
 };
 
@@ -1290,6 +1345,9 @@ static void nvmet_port_release(struct config_item *item)
 	&nvmet_attr_addr_trsvcid,
 	&nvmet_attr_addr_trtype,
 	&nvmet_attr_param_inline_data_size,
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	&nvmet_attr_param_pi_enable,
+#endif
 	NULL,
 };
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 50dfc60..9f6b0f9 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -322,12 +322,21 @@ int nvmet_enable_port(struct nvmet_port *port)
 	if (!try_module_get(ops->owner))
 		return -EINVAL;
 
-	ret = ops->add_port(port);
-	if (ret) {
-		module_put(ops->owner);
-		return ret;
+	/*
+	 * If the user requested PI support and the transport isn't pi capable,
+	 * don't enable the port.
+	 */
+	if (port->pi_enable && !ops->metadata_support) {
+		pr_err("T10-PI is not supported by transport type %d\n",
+		       port->disc_addr.trtype);
+		ret = -EINVAL;
+		goto out_put;
 	}
 
+	ret = ops->add_port(port);
+	if (ret)
+		goto out_put;
+
 	/* If the transport didn't set inline_data_size, then disable it. */
 	if (port->inline_data_size < 0)
 		port->inline_data_size = 0;
@@ -335,6 +344,10 @@ int nvmet_enable_port(struct nvmet_port *port)
 	port->enabled = true;
 	port->tr_ops = ops;
 	return 0;
+
+out_put:
+	module_put(ops->owner);
+	return ret;
 }
 
 void nvmet_disable_port(struct nvmet_port *port)
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index 52a6f70..42bd12b 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -197,6 +197,8 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 		goto out;
 	}
 
+	ctrl->pi_support = ctrl->port->pi_enable && ctrl->subsys->pi_support;
+
 	uuid_copy(&ctrl->hostid, &d->hostid);
 
 	status = nvmet_install_queue(ctrl, req);
@@ -205,8 +207,9 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 		goto out;
 	}
 
-	pr_info("creating controller %d for subsystem %s for NQN %s.\n",
-		ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn);
+	pr_info("creating controller %d for subsystem %s for NQN %s%s.\n",
+		ctrl->cntlid, ctrl->subsys->subsysnqn, ctrl->hostnqn,
+		ctrl->pi_support ? " T10-PI is enabled" : "");
 	req->cqe->result.u16 = cpu_to_le16(ctrl->cntlid);
 
 out:
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index ceedaaf..2986e2c 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -145,6 +145,7 @@ struct nvmet_port {
 	bool				enabled;
 	int				inline_data_size;
 	const struct nvmet_fabrics_ops	*tr_ops;
+	bool				pi_enable;
 };
 
 static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
@@ -204,6 +205,7 @@ struct nvmet_ctrl {
 	spinlock_t		error_lock;
 	u64			err_counter;
 	struct nvme_error_slot	slots[NVMET_ERROR_LOG_SLOTS];
+	bool			pi_support;
 };
 
 struct nvmet_subsys_model {
@@ -233,6 +235,7 @@ struct nvmet_subsys {
 	u64			ver;
 	u64			serial;
 	char			*subsysnqn;
+	bool			pi_support;
 
 	struct config_group	group;
 
@@ -284,6 +287,7 @@ struct nvmet_fabrics_ops {
 	unsigned int type;
 	unsigned int msdbd;
 	bool has_keyed_sgls : 1;
+	bool metadata_support : 1;
 	void (*queue_response)(struct nvmet_req *req);
 	int (*add_port)(struct nvmet_port *port);
 	void (*remove_port)(struct nvmet_port *port);
@@ -510,6 +514,14 @@ static inline u32 nvmet_rw_data_len(struct nvmet_req *req)
 			req->ns->blksize_shift;
 }
 
+static inline u32 nvmet_rw_metadata_len(struct nvmet_req *req)
+{
+	if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY))
+		return 0;
+	return ((u32)le16_to_cpu(req->cmd->rw.length) + 1) *
+			req->ns->metadata_size;
+}
+
 static inline u32 nvmet_dsm_len(struct nvmet_req *req)
 {
 	return (le32_to_cpu(req->cmd->dsm.nr) + 1) *
@@ -524,4 +536,11 @@ static inline __le16 to0based(u32 a)
 	return cpu_to_le16(max(1U, min(1U << 16, a)) - 1);
 }
 
+static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
+{
+	if (!IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY))
+		return false;
+	return ns->pi_type && ns->metadata_size == sizeof(struct t10_pi_tuple);
+}
+
 #endif /* _NVMET_H */
-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 14/16] nvmet: Add metadata/T10-PI support
  2019-12-02 14:47 [PATCH 00/16 V2] nvme-rdma/nvmet-rdma: Add " Max Gurtovoy
@ 2019-12-02 14:48 ` Max Gurtovoy
  0 siblings, 0 replies; 58+ messages in thread
From: Max Gurtovoy @ 2019-12-02 14:48 UTC (permalink / raw)
  To: linux-nvme, kbusch, hch, sagi, martin.petersen
  Cc: axboe, vladimirk, shlomin, israelr, idanb, oren, maxg

From: Israel Rukshin <israelr@mellanox.com>

Expose the namespace metadata format when PI is enabled. The user needs
to enable the capability per subsystem and per port. The other metadata
properties are taken from the namespace/bdev.

Usage example:
echo 1 > /config/nvmet/subsystems/${NAME}/attr_pi_enable
echo 1 > /config/nvmet/ports/${PORT_NUM}/param_pi_enable

Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/admin-cmd.c   | 23 ++++++++++++---
 drivers/nvme/target/configfs.c    | 61 +++++++++++++++++++++++++++++++++++++++
 drivers/nvme/target/fabrics-cmd.c | 11 +++++++
 drivers/nvme/target/nvmet.h       | 28 ++++++++++++++++++
 4 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 19ae1552e54a..20e7f08c8b44 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -346,8 +346,8 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 	/* we support multiple ports, multiples hosts and ANA: */
 	id->cmic = (1 << 0) | (1 << 1) | (1 << 3);
 
-	/* no limit on data transfer sizes for now */
-	id->mdts = 0;
+	/* Limit MDTS for metadata capable controllers (assume mpsmin is 4k) */
+	id->mdts = ctrl->pi_support ? ilog2(NVMET_T10_PI_MAX_KB_SZ >> 2) : 0;
 	id->cntlid = cpu_to_le16(ctrl->cntlid);
 	id->ver = cpu_to_le32(ctrl->subsys->ver);
 
@@ -405,9 +405,13 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 	strlcpy(id->subnqn, ctrl->subsys->subsysnqn, sizeof(id->subnqn));
 
-	/* Max command capsule size is sqe + single page of in-capsule data */
+	/*
+	 * Max command capsule size is sqe + single page of in-capsule data.
+	 * Disable inline data for Metadata capable controllers.
+	 */
 	id->ioccsz = cpu_to_le32((sizeof(struct nvme_command) +
-				  req->port->inline_data_size) / 16);
+				  req->port->inline_data_size *
+				  !ctrl->pi_support) / 16);
 	/* Max response capsule size is cqe */
 	id->iorcsz = cpu_to_le32(sizeof(struct nvme_completion) / 16);
 
@@ -437,6 +441,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 static void nvmet_execute_identify_ns(struct nvmet_req *req)
 {
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
 	struct nvmet_ns *ns;
 	struct nvme_id_ns *id;
 	u16 status = 0;
@@ -493,6 +498,16 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 
 	id->lbaf[0].ds = ns->blksize_shift;
 
+	if (ctrl->pi_support && nvmet_ns_has_pi(ns)) {
+		id->dpc = NVME_NS_DPC_PI_FIRST | NVME_NS_DPC_PI_LAST |
+			  NVME_NS_DPC_PI_TYPE1 | NVME_NS_DPC_PI_TYPE2 |
+			  NVME_NS_DPC_PI_TYPE3;
+		id->mc = NVME_NS_MC_META_EXT;
+		id->dps = ns->prot_type;
+		id->flbas = NVME_NS_FLBAS_META_EXT;
+		id->lbaf[0].ms = ns->ms;
+	}
+
 	if (ns->readonly)
 		id->nsattr |= (1 << 0);
 	nvmet_put_namespace(ns);
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 98613a45bd3b..754302532567 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -248,6 +248,36 @@ static ssize_t nvmet_param_inline_data_size_store(struct config_item *item,
 
 CONFIGFS_ATTR(nvmet_, param_inline_data_size);
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static ssize_t nvmet_param_pi_enable_show(struct config_item *item,
+		char *page)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+
+	return snprintf(page, PAGE_SIZE, "%d\n", port->pi_enable);
+}
+
+static ssize_t nvmet_param_pi_enable_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_port *port = to_nvmet_port(item);
+	bool val;
+
+	if (strtobool(page, &val))
+		return -EINVAL;
+
+	if (port->enabled) {
+		pr_err("Disable port before setting pi_enable value.\n");
+		return -EACCES;
+	}
+
+	port->pi_enable = val;
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_, param_pi_enable);
+#endif
+
 static ssize_t nvmet_addr_trtype_show(struct config_item *item,
 		char *page)
 {
@@ -862,10 +892,38 @@ static ssize_t nvmet_subsys_attr_serial_store(struct config_item *item,
 }
 CONFIGFS_ATTR(nvmet_subsys_, attr_serial);
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static ssize_t nvmet_subsys_attr_pi_enable_show(struct config_item *item,
+						char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%d\n", to_subsys(item)->pi_support);
+}
+
+static ssize_t nvmet_subsys_attr_pi_enable_store(struct config_item *item,
+						 const char *page, size_t count)
+{
+	struct nvmet_subsys *subsys = to_subsys(item);
+	bool pi_enable;
+
+	if (strtobool(page, &pi_enable))
+		return -EINVAL;
+
+	down_write(&nvmet_config_sem);
+	subsys->pi_support = pi_enable;
+	up_write(&nvmet_config_sem);
+
+	return count;
+}
+CONFIGFS_ATTR(nvmet_subsys_, attr_pi_enable);
+#endif
+
 static struct configfs_attribute *nvmet_subsys_attrs[] = {
 	&nvmet_subsys_attr_attr_allow_any_host,
 	&nvmet_subsys_attr_attr_version,
 	&nvmet_subsys_attr_attr_serial,
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	&nvmet_subsys_attr_attr_pi_enable,
+#endif
 	NULL,
 };
 
@@ -1161,6 +1219,9 @@ static struct configfs_attribute *nvmet_port_attrs[] = {
 	&nvmet_attr_addr_trsvcid,
 	&nvmet_attr_addr_trtype,
 	&nvmet_attr_param_inline_data_size,
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	&nvmet_attr_param_pi_enable,
+#endif
 	NULL,
 };
 
diff --git a/drivers/nvme/target/fabrics-cmd.c b/drivers/nvme/target/fabrics-cmd.c
index ee41b2bb359e..72119960489d 100644
--- a/drivers/nvme/target/fabrics-cmd.c
+++ b/drivers/nvme/target/fabrics-cmd.c
@@ -192,6 +192,17 @@ static void nvmet_execute_admin_connect(struct nvmet_req *req)
 		goto out;
 	}
 
+	if (ctrl->subsys->pi_support && ctrl->port->pi_enable) {
+		if (ctrl->port->pi_capable) {
+			ctrl->pi_support = true;
+			pr_info("controller %d T10-PI enabled\n", ctrl->cntlid);
+		} else {
+			ctrl->pi_support = false;
+			pr_warn("T10-PI is not supported on controller %d\n",
+				ctrl->cntlid);
+		}
+	}
+
 	uuid_copy(&ctrl->hostid, &d->hostid);
 
 	status = nvmet_install_queue(ctrl, req);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index c8b7095a0ded..5ee01e0887e7 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -144,6 +144,8 @@ struct nvmet_port {
 	bool				enabled;
 	int				inline_data_size;
 	const struct nvmet_fabrics_ops	*tr_ops;
+	bool				pi_capable;
+	bool				pi_enable;
 };
 
 static inline struct nvmet_port *to_nvmet_port(struct config_item *item)
@@ -203,6 +205,7 @@ struct nvmet_ctrl {
 	spinlock_t		error_lock;
 	u64			err_counter;
 	struct nvme_error_slot	slots[NVMET_ERROR_LOG_SLOTS];
+	bool			pi_support;
 };
 
 struct nvmet_subsys {
@@ -225,6 +228,7 @@ struct nvmet_subsys {
 	u64			ver;
 	u64			serial;
 	char			*subsysnqn;
+	bool			pi_support;
 
 	struct config_group	group;
 
@@ -501,6 +505,30 @@ static inline u32 nvmet_rw_data_len(struct nvmet_req *req)
 			req->ns->blksize_shift;
 }
 
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+static inline u32 nvmet_rw_prot_len(struct nvmet_req *req)
+{
+	return ((u32)le16_to_cpu(req->cmd->rw.length) + 1) * req->ns->ms;
+}
+
+static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
+{
+	return ns->prot_type && ns->ms == sizeof(struct t10_pi_tuple);
+}
+#else
+static inline u32 nvmet_rw_prot_len(struct nvmet_req *req)
+{
+	return 0;
+}
+
+static inline bool nvmet_ns_has_pi(struct nvmet_ns *ns)
+{
+	return false;
+}
+#endif /* CONFIG_BLK_DEV_INTEGRITY */
+
+#define NVMET_T10_PI_MAX_KB_SZ 128
+
 static inline u32 nvmet_dsm_len(struct nvmet_req *req)
 {
 	return (le32_to_cpu(req->cmd->dsm.nr) + 1) *
-- 
2.16.3


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2020-05-19 14:08 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-04 15:57 [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add metadata/T10-PI support Max Gurtovoy
2020-05-04 15:57 ` [PATCH 01/16] block: always define struct blk_integrity in genhd.h Max Gurtovoy
2020-05-14  2:51   ` Martin K. Petersen
2020-05-04 15:57 ` [PATCH 02/16] nvme: introduce namespace features flag Max Gurtovoy
2020-05-04 23:59   ` James Smart
2020-05-14  2:52   ` Martin K. Petersen
2020-05-04 15:57 ` [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag Max Gurtovoy
2020-05-05 23:33   ` James Smart
2020-05-06  8:39     ` Max Gurtovoy
2020-05-06 20:44       ` James Smart
2020-05-07  9:02         ` Max Gurtovoy
2020-05-11 23:50           ` James Smart
2020-05-13 18:18             ` Christoph Hellwig
2020-05-13 19:53               ` James Smart
2020-05-14  2:53   ` Martin K. Petersen
2020-05-04 15:57 ` [PATCH 04/16] nvme: make nvme_ns_has_pi accessible to transports Max Gurtovoy
2020-05-14  2:53   ` Martin K. Petersen
2020-05-04 15:57 ` [PATCH 05/16] nvme: introduce max_integrity_segments ctrl attribute Max Gurtovoy
2020-05-05 23:51   ` James Smart
2020-05-06  7:08     ` Christoph Hellwig
2020-05-13 19:04   ` James Smart
2020-05-04 15:57 ` [PATCH 06/16] nvme: enforce extended LBA format for fabrics metadata Max Gurtovoy
2020-05-13 19:03   ` James Smart
2020-05-14  2:56     ` Martin K. Petersen
2020-05-14  8:28       ` Max Gurtovoy
2020-05-14  8:15     ` Max Gurtovoy
2020-05-04 15:57 ` [PATCH 07/16] nvme: introduce NVME_INLINE_METADATA_SG_CNT Max Gurtovoy
2020-05-13 19:05   ` James Smart
2020-05-04 15:57 ` [PATCH 08/16] nvme-rdma: introduce nvme_rdma_sgl structure Max Gurtovoy
2020-05-04 15:57 ` [PATCH 09/16] nvme-rdma: add metadata/T10-PI support Max Gurtovoy
2020-05-05  6:12   ` Christoph Hellwig
2020-05-14  3:02   ` Martin K. Petersen
2020-05-14  8:48     ` Max Gurtovoy
2020-05-14 22:40       ` Martin K. Petersen
2020-05-15 14:50         ` Max Gurtovoy
2020-05-18 17:22           ` Martin K. Petersen
2020-05-04 15:57 ` [PATCH 10/16] nvmet: add metadata characteristics for a namespace Max Gurtovoy
2020-05-13 19:25   ` James Smart
2020-05-14  3:06     ` Martin K. Petersen
2020-05-04 15:57 ` [PATCH 11/16] nvmet: rename nvmet_rw_len to nvmet_rw_data_len Max Gurtovoy
2020-05-13 19:25   ` James Smart
2020-05-04 15:57 ` [PATCH 12/16] nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len Max Gurtovoy
2020-05-13 19:27   ` James Smart
2020-05-04 15:57 ` [PATCH 13/16] nvme: add Metadata Capabilities enumerations Max Gurtovoy
2020-05-13 19:27   ` James Smart
2020-05-14  3:07   ` Martin K. Petersen
2020-05-04 15:57 ` [PATCH 14/16] nvmet: add metadata/T10-PI support Max Gurtovoy
2020-05-13 19:51   ` James Smart
2020-05-14 15:09     ` Max Gurtovoy
2020-05-14 15:37       ` James Smart
2020-05-04 15:57 ` [PATCH 15/16] nvmet: add metadata support for block devices Max Gurtovoy
2020-05-04 15:57 ` [PATCH 16/16] nvmet-rdma: add metadata/T10-PI support Max Gurtovoy
2020-05-14  3:10   ` Martin K. Petersen
2020-05-14  8:55     ` Max Gurtovoy
2020-05-05  6:13 ` [PATCH 00/16 v7] nvme-rdma/nvmet-rdma: Add " Christoph Hellwig
2020-05-14 15:55   ` Max Gurtovoy
  -- strict thread matches above, loose matches on Subject: below --
2020-05-19 14:05 [PATCH 00/16 v8] " Max Gurtovoy
2020-05-19 14:06 ` [PATCH 14/16] nvmet: add " Max Gurtovoy
2019-12-02 14:47 [PATCH 00/16 V2] nvme-rdma/nvmet-rdma: Add " Max Gurtovoy
2019-12-02 14:48 ` [PATCH 14/16] nvmet: " Max Gurtovoy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).