* [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host
@ 2018-05-27 15:50 Max Gurtovoy
  2018-05-27 15:50   ` Max Gurtovoy
                   ` (17 more replies)
  0 siblings, 18 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


Hello Sagi, Christoph, Keith, Jason and Co

This patchset adds T10-PI support to the NVMeoF/RDMA host side, using the signature
verbs API. The set starts with a few preparation commits to the RDMA/core
layer plus a bug fix in the iSER target ref_tag check mask. It continues with a
few fixes in the NVMe core layer and enablement of metadata support over fabrics,
followed by preparation patches in the RDMA transport (minor functionality
changes) and, finally, the actual T10-PI support for nvme-rdma.

The code was tested using Mellanox ConnectX-4/ConnectX-5 HCAs on the initiator
side against a Mellanox offloaded target implementation using a ConnectX-5 adapter
(currently not upstream, but available in the MLNX_OFED-4.1 and later packages).

This series applies cleanly on top of the kernel 4.17-rc3 tag.

T10-PI support for the target side is currently in its initial planning stage,
but I decided to get some feedback first and push the work incrementally.

Most of the early patches can be merged without waiting for the final version of
the later patches in the series.

My next steps are:
 - run a nightly regression test suite to verify that existing functionality still works properly
 - run performance benchmarks with/without T10-PI
 - compare the results of the non-T10-PI runs to previous results
 - run long I/O tests (using tools such as ezFio/fio)
 - test the code with HCAs that don't support T10-PI offload (such as ConnectX-3)

Meanwhile, I would be happy to hear feedback on this series :)
(it would also be great if someone could run it in their own test environment;
a rough example of how I exercise the offload follows).
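
For reference, this is roughly how the offload can be exercised on the host side
(the address, NQN and device name below are placeholders, and pi_enable is the
module parameter added in the last patch of this series):

    # enable the T10-PI offload in the host driver (off by default)
    modprobe nvme-rdma pi_enable=Y
    # connect to a target namespace formatted with a PI-capable LBA format
    nvme connect -t rdma -a 10.0.0.1 -s 4420 -n testnqn
    # run I/O; the block layer generates/verifies PI via the integrity profile
    fio --name=pi-test --filename=/dev/nvme0n1 --rw=randrw --bs=4k \
        --ioengine=libaio --iodepth=32 --direct=1 --runtime=300 --time_based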

Max Gurtovoy (17):
  IB/isert: fix T10-pi check mask setting
  RDMA/core: introduce check masks for T10-PI offload
  IB/iser: use T10-PI check mask definitions from core layer
  IB/isert: use T10-PI check mask definitions from core layer
  nvme: Fix extended data LBA supported setting
  nvme: Add WARN in case fabrics ctrl was set with wrong metadata caps
  nvme: introduce max_integrity_segments ctrl attribute
  nvme: limit integrity segments to be <= data segments
  nvme: reduce the metadata size in case the ctrl doesn't support it
  nvme: export nvme_ns_has_pi function
  nvme-rdma: Introduce cqe for local invalidation
  nvme-rdma: Introduce nvme_rdma_set_keyed_sgl helper func
  nvme-rdma: introduce nvme_rdma_sgl structure
  nvme-rdma: refactor cmd mapping/unmapping mechanism
  nvme-rdma: Add helper function for preparing sg list to RDMA operation
  nvme-rdma: Introduce nvme_rdma_first_wr helper function
  nvme-rdma: Add T10-PI support

 drivers/infiniband/ulp/iser/iscsi_iser.h  |   4 -
 drivers/infiniband/ulp/iser/iser_memory.c |   4 +-
 drivers/infiniband/ulp/isert/ib_isert.c   |   6 +-
 drivers/nvme/host/core.c                  |  41 +-
 drivers/nvme/host/nvme.h                  |   2 +
 drivers/nvme/host/pci.c                   |   6 +
 drivers/nvme/host/rdma.c                  | 761 +++++++++++++++++++++++++-----
 include/rdma/ib_verbs.h                   |  13 +
 8 files changed, 701 insertions(+), 136 deletions(-)

-- 
1.8.3.1

* [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
@ 2018-05-27 15:50   ` Max Gurtovoy
  2018-05-27 15:50 ` [PATCH 02/17] RDMA/core: introduce check masks for T10-PI offload Max Gurtovoy
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)
  To: jgg, keith.busch, linux-nvme, hch, sagi
  Cc: leon, oren, idanb, vladimirk, Max Gurtovoy, stable

A copy/paste bug (probably) caused the app_tag check mask to be set in a case
where a ref_tag check was needed.

Fixes: 38a2d0d429f1 ("IB/isert: convert to the generic RDMA READ/WRITE API")
Fixes: 9e961ae73c2c ("IB/isert: Support T10-PI protected transactions")
Cc: stable@vger.kernel.org
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index fff40b0..d8b6bde 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2108,7 +2108,7 @@
 
 	sig_attrs->check_mask =
 	       (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
-	       (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x30 : 0) |
+	       (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG ? 0x30 : 0) |
 	       (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
 	return 0;
 }
-- 
1.8.3.1

* [PATCH 02/17] RDMA/core: introduce check masks for T10-PI offload
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
  2018-05-27 15:50   ` Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-28  7:21   ` Christoph Hellwig
  2018-05-30 21:56   ` Sagi Grimberg
  2018-05-27 15:50 ` [PATCH 03/17] IB/iser: use T10-PI check mask definitions from core layer Max Gurtovoy
                   ` (15 subsequent siblings)
  17 siblings, 2 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


T10-PI offload capability is currently supported in the iSER protocol only,
and the definitions of the HCA protection information checks are missing
from the core layer. Add those definitions to avoid code duplication in
other drivers (such as the iSER target and NVMeoF).

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 include/rdma/ib_verbs.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9fc8a82..bdb1bbc 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -861,6 +861,19 @@ enum ib_sig_err_type {
 };
 
 /**
+ * Signature check masks (8 bytes in total) according to the T10-PI standard:
+ *  -------- -------- ------------
+ * | GUARD  | APPTAG |   REFTAG   |
+ * |  2B    |  2B    |    4B      |
+ *  -------- -------- ------------
+ */
+enum {
+	IB_SIG_CHECK_GUARD = 0xc0,
+	IB_SIG_CHECK_APPTAG = 0x30,
+	IB_SIG_CHECK_REFTAG = 0x0f,
+};
+
+/**
  * struct ib_sig_err - signature error descriptor
  */
 struct ib_sig_err {
-- 
1.8.3.1

* [PATCH 03/17] IB/iser: use T10-PI check mask definitions from core layer
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
  2018-05-27 15:50   ` Max Gurtovoy
  2018-05-27 15:50 ` [PATCH 02/17] RDMA/core: introduce check masks for T10-PI offload Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-28  7:22   ` Christoph Hellwig
  2018-05-30 21:57   ` Sagi Grimberg
  2018-05-27 15:50 ` [PATCH 04/17] IB/isert: " Max Gurtovoy
                   ` (14 subsequent siblings)
  17 siblings, 2 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


There is no reason to re-define the protection information checks in the ib_iser
driver. Use the check masks from the RDMA core layer.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/infiniband/ulp/iser/iscsi_iser.h  | 4 ----
 drivers/infiniband/ulp/iser/iser_memory.c | 4 ++--
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index c1ae4ae..e13765d 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -383,10 +383,6 @@ struct iser_device {
 	bool                         remote_inv_sup;
 };
 
-#define ISER_CHECK_GUARD	0xc0
-#define ISER_CHECK_REFTAG	0x0f
-#define ISER_CHECK_APPTAG	0x30
-
 /**
  * struct iser_reg_resources - Fast registration recources
  *
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 322209d..ca844a9 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -362,9 +362,9 @@ void iser_unreg_mem_fastreg(struct iscsi_iser_task *iser_task,
 {
 	*mask = 0;
 	if (sc->prot_flags & SCSI_PROT_REF_CHECK)
-		*mask |= ISER_CHECK_REFTAG;
+		*mask |= IB_SIG_CHECK_REFTAG;
 	if (sc->prot_flags & SCSI_PROT_GUARD_CHECK)
-		*mask |= ISER_CHECK_GUARD;
+		*mask |= IB_SIG_CHECK_GUARD;
 }
 
 static inline void
-- 
1.8.3.1

* [PATCH 04/17] IB/isert: use T10-PI check mask definitions from core layer
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (2 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 03/17] IB/iser: use T10-PI check mask definitions from core layer Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-28  7:22   ` Christoph Hellwig
  2018-05-30 21:58   ` Sagi Grimberg
  2018-05-27 15:50 ` [PATCH 05/17] nvme: Fix extended data LBA supported setting Max Gurtovoy
                   ` (13 subsequent siblings)
  17 siblings, 2 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


There is no reason to use hard-coded protection information checks in the ib_isert
driver. Use the check masks from the RDMA core layer.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index d8b6bde..8a0cda6 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2107,9 +2107,9 @@
 	}
 
 	sig_attrs->check_mask =
-	       (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
-	       (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG ? 0x30 : 0) |
-	       (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
+	       (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? IB_SIG_CHECK_GUARD : 0)  |
+	       (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG ? IB_SIG_CHECK_APPTAG : 0) |
+	       (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? IB_SIG_CHECK_REFTAG : 0);
 	return 0;
 }
 
-- 
1.8.3.1

* [PATCH 05/17] nvme: Fix extended data LBA supported setting
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (3 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 04/17] IB/isert: " Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-28  7:22   ` Christoph Hellwig
  2018-05-30 22:00   ` Sagi Grimberg
  2018-05-27 15:50 ` [PATCH 06/17] nvme: Add WARN in case fabrics ctrl was set with wrong metadata caps Max Gurtovoy
                   ` (12 subsequent siblings)
  17 siblings, 2 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


The extended data LBA flag (ns->ext) depends on the metadata size (ns->ms), so
set ns->ms before deriving ns->ext from it.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9df4f71..d8254b4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1443,8 +1443,8 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	if (ns->lba_shift == 0)
 		ns->lba_shift = 9;
 	ns->noiob = le16_to_cpu(id->noiob);
-	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
+	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
 	/* the PI implementation requires metadata equal t10 pi tuple size */
 	if (ns->ms == sizeof(struct t10_pi_tuple))
 		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
-- 
1.8.3.1

* [PATCH 06/17] nvme: Add WARN in case fabrics ctrl was set with wrong metadata caps
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (4 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 05/17] nvme: Fix extended data LBA supported setting Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-28  7:24   ` Christoph Hellwig
  2018-05-27 15:50 ` [PATCH 07/17] nvme: introduce max_integrity_segments ctrl attribute Max Gurtovoy
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


An extended LBA is a larger LBA that is created when the metadata associated
with the LBA is transferred contiguously with the LBA data (a.k.a. interleaved).
The metadata may be transferred either as part of the LBA (creating an extended
LBA) or as a separate contiguous buffer. According to the NVMeoF spec, a fabrics
ctrl supports only the extended LBA format, so add a WARN in case of a spec
violation. Also initialize the integrity profile of the block device for
fabrics ctrls.
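
To illustrate the two layouts (a sketch, assuming a 512B data block with 8B of
protection information):

    extended LBA (interleaved, the only format supported over fabrics):
      | 512B data | 8B PI | 512B data | 8B PI | ...
    separate metadata buffer (not supported over fabrics):
      data:     | 512B data | 512B data | ...
      metadata: | 8B PI     | 8B PI     | ...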

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/core.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d8254b4..c605bb3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1419,9 +1419,10 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	blk_queue_physical_block_size(disk->queue, bs);
 	blk_queue_io_min(disk->queue, bs);
 
-	if (ns->ms && !ns->ext &&
-	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
-		nvme_init_integrity(disk, ns->ms, ns->pi_type);
+	if (ns->ms && (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)) {
+		if ((ns->ctrl->ops->flags & NVME_F_FABRICS) || !ns->ext)
+			nvme_init_integrity(disk, ns->ms, ns->pi_type);
+	}
 	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
 		capacity = 0;
 	set_capacity(disk, capacity);
@@ -1445,6 +1446,18 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	ns->noiob = le16_to_cpu(id->noiob);
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
 	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
+
+	/*
+	 * For Fabrics, only metadata as part of extended data LBA is supported.
+	 * Reduce the metadata size in case of a spec violation.
+	 */
+	if (ns->ms && (ns->ctrl->ops->flags & NVME_F_FABRICS)) {
+		if (!ns->ext) {
+			WARN_ON_ONCE(1);
+			ns->ms = 0;
+		}
+	}
+
 	/* the PI implementation requires metadata equal t10 pi tuple size */
 	if (ns->ms == sizeof(struct t10_pi_tuple))
 		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
-- 
1.8.3.1

* [PATCH 07/17] nvme: introduce max_integrity_segments ctrl attribute
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (5 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 06/17] nvme: Add WARN in case fabrics ctrl was set with wrong metadata caps Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-30 22:08   ` Sagi Grimberg
  2018-05-27 15:50 ` [PATCH 08/17] nvme: limit integrity segments to be <= data segments Max Gurtovoy
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


This patch doesn't change any logic, and is needed as a preparation
for adding T10-PI support for fabrics drivers that will use an extended
LBA format for metadata.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/core.c | 11 +++++++----
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  |  6 ++++++
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c605bb3..197208a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1310,7 +1310,8 @@ static int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
+static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+		u32 max_integrity_segments)
 {
 	struct blk_integrity integrity;
 
@@ -1333,10 +1334,11 @@ static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
 	}
 	integrity.tuple_size = ms;
 	blk_integrity_register(disk, &integrity);
-	blk_queue_max_integrity_segments(disk->queue, 1);
+	blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
 }
 #else
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type)
+static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+		u32 max_integrity_segments)
 {
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
@@ -1421,7 +1423,8 @@ static void nvme_update_disk_info(struct gendisk *disk,
 
 	if (ns->ms && (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)) {
 		if ((ns->ctrl->ops->flags & NVME_F_FABRICS) || !ns->ext)
-			nvme_init_integrity(disk, ns->ms, ns->pi_type);
+			nvme_init_integrity(disk, ns->ms, ns->pi_type,
+					    ns->ctrl->max_integrity_segments);
 	}
 	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
 		capacity = 0;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 061fecf..49759ca 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -164,6 +164,7 @@ struct nvme_ctrl {
 	u64 cap;
 	u32 page_size;
 	u32 max_hw_sectors;
+	u32 max_integrity_segments;
 	u16 oncs;
 	u16 oacs;
 	u16 nssa;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index fbc71fa..0024fb2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2326,6 +2326,12 @@ static void nvme_reset_work(struct work_struct *work)
 	if (result)
 		goto out;
 
+	/*
+	 * NVMe PCI driver doesn't support Extended LBA format so using 1 integrity
+	 * segment should be enough for a separate contiguous buffer of metadata.
+	 */
+	dev->ctrl.max_integrity_segments = 1;
+
 	result = nvme_init_identify(&dev->ctrl);
 	if (result)
 		goto out;
-- 
1.8.3.1

* [PATCH 08/17] nvme: limit integrity segments to be <= data segments
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (6 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 07/17] nvme: introduce max_integrity_segments ctrl attribute Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-30 22:09   ` Sagi Grimberg
  2018-05-27 15:50 ` [PATCH 09/17] nvme: reduce the metadata size in case the ctrl doesn't support it Max Gurtovoy
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


There is no reason to set max_integrity_segments to a value greater than
max_segments.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 197208a..e81e898 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1795,6 +1795,10 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
 		u32 max_segments =
 			(ctrl->max_hw_sectors / (ctrl->page_size >> 9)) + 1;
 
+		if (ctrl->max_integrity_segments)
+			ctrl->max_integrity_segments = min(ctrl->max_integrity_segments,
+							   max_segments);
+
 		blk_queue_max_hw_sectors(q, ctrl->max_hw_sectors);
 		blk_queue_max_segments(q, min_t(u32, max_segments, USHRT_MAX));
 	}
-- 
1.8.3.1

* [PATCH 09/17] nvme: reduce the metadata size in case the ctrl doesn't support it
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (7 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 08/17] nvme: limit integrity segments to be <= data segments Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-28  7:25   ` Christoph Hellwig
  2018-05-27 15:50 ` [PATCH 10/17] nvme: export nvme_ns_has_pi function Max Gurtovoy
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


There is no reason to set the metadata size when the controller is not
capable of performing commands with metadata.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e81e898..1830978 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1461,6 +1461,10 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 		}
 	}
 
+	/* Reduce the metadata size in case the ctrl doesn't support it */
+	if (!ns->ctrl->max_integrity_segments)
+		ns->ms = 0;
+
 	/* the PI implementation requires metadata equal t10 pi tuple size */
 	if (ns->ms == sizeof(struct t10_pi_tuple))
 		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
-- 
1.8.3.1

* [PATCH 10/17] nvme: export nvme_ns_has_pi function
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (8 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 09/17] nvme: reduce the metadata size in case the ctrl doesn't support it Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-28  7:25   ` Christoph Hellwig
  2018-05-30 22:19   ` Sagi Grimberg
  2018-05-27 15:50 ` [PATCH 11/17] nvme-rdma: Introduce cqe for local invalidation Max Gurtovoy
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


This function will be used by transports that support the T10-PI
mechanism.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/core.c | 3 ++-
 drivers/nvme/host/nvme.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1830978..cc71a86 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -168,10 +168,11 @@ int nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_delete_ctrl_sync);
 
-static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
+bool nvme_ns_has_pi(struct nvme_ns *ns)
 {
 	return ns->pi_type && ns->ms == sizeof(struct t10_pi_tuple);
 }
+EXPORT_SYMBOL_GPL(nvme_ns_has_pi);
 
 static blk_status_t nvme_error_status(struct request *req)
 {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 49759ca..b2352de 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -429,6 +429,7 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl);
 int nvme_delete_ctrl(struct nvme_ctrl *ctrl);
 int nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl);
+bool nvme_ns_has_pi(struct nvme_ns *ns);
 
 int nvme_get_log_ext(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		u8 log_page, void *log, size_t size, u64 offset);
-- 
1.8.3.1

* [PATCH 11/17] nvme-rdma: Introduce cqe for local invalidation
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (9 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 10/17] nvme: export nvme_ns_has_pi function Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-28  7:25   ` Christoph Hellwig
  2018-05-30 22:26   ` Sagi Grimberg
  2018-05-27 15:50 ` [PATCH 12/17] nvme-rdma: Introduce nvme_rdma_set_keyed_sgl helper func Max Gurtovoy
                   ` (6 subsequent siblings)
  17 siblings, 2 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


Using the same cqe object for registration and invalidation completions
is not safe. Separate them into two cqe objects. Also pass the rkey and
the cqe as arguments to the local invalidation function, as a preparation
for invalidating sig_mr keys.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/rdma.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 1eb4438..a240800 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -68,6 +68,7 @@ struct nvme_rdma_request {
 	int			nents;
 	struct ib_reg_wr	reg_wr;
 	struct ib_cqe		reg_cqe;
+	struct ib_cqe		inv_cqe;
 	struct nvme_rdma_queue  *queue;
 	struct sg_table		sg_table;
 	struct scatterlist	first_sgl[];
@@ -1020,7 +1021,7 @@ static void nvme_rdma_memreg_done(struct ib_cq *cq, struct ib_wc *wc)
 static void nvme_rdma_inv_rkey_done(struct ib_cq *cq, struct ib_wc *wc)
 {
 	struct nvme_rdma_request *req =
-		container_of(wc->wr_cqe, struct nvme_rdma_request, reg_cqe);
+		container_of(wc->wr_cqe, struct nvme_rdma_request, inv_cqe);
 	struct request *rq = blk_mq_rq_from_pdu(req);
 
 	if (unlikely(wc->status != IB_WC_SUCCESS)) {
@@ -1033,8 +1034,8 @@ static void nvme_rdma_inv_rkey_done(struct ib_cq *cq, struct ib_wc *wc)
 
 }
 
-static int nvme_rdma_inv_rkey(struct nvme_rdma_queue *queue,
-		struct nvme_rdma_request *req)
+static int nvme_rdma_inv_rkey(struct nvme_rdma_queue *queue, u32 rkey,
+		struct ib_cqe *cqe)
 {
 	struct ib_send_wr *bad_wr;
 	struct ib_send_wr wr = {
@@ -1042,11 +1043,10 @@ static int nvme_rdma_inv_rkey(struct nvme_rdma_queue *queue,
 		.next		    = NULL,
 		.num_sge	    = 0,
 		.send_flags	    = IB_SEND_SIGNALED,
-		.ex.invalidate_rkey = req->mr->rkey,
+		.ex.invalidate_rkey = rkey,
 	};
 
-	req->reg_cqe.done = nvme_rdma_inv_rkey_done;
-	wr.wr_cqe = &req->reg_cqe;
+	wr.wr_cqe = cqe;
 
 	return ib_post_send(queue->qp, &wr, &bad_wr);
 }
@@ -1141,6 +1141,7 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 	ib_update_fast_reg_key(req->mr, ib_inc_rkey(req->mr->rkey));
 
 	req->reg_cqe.done = nvme_rdma_memreg_done;
+	req->inv_cqe.done = nvme_rdma_inv_rkey_done;
 	memset(&req->reg_wr, 0, sizeof(req->reg_wr));
 	req->reg_wr.wr.opcode = IB_WR_REG_MR;
 	req->reg_wr.wr.wr_cqe = &req->reg_cqe;
@@ -1348,7 +1349,7 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 			nvme_rdma_error_recovery(queue->ctrl);
 		}
 	} else if (req->mr) {
-		ret = nvme_rdma_inv_rkey(queue, req);
+		ret = nvme_rdma_inv_rkey(queue, req->mr->rkey, &req->inv_cqe);
 		if (unlikely(ret < 0)) {
 			dev_err(queue->ctrl->ctrl.device,
 				"Queueing INV WR for rkey %#x failed (%d)\n",
-- 
1.8.3.1

* [PATCH 12/17] nvme-rdma: Introduce nvme_rdma_set_keyed_sgl helper func
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (10 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 11/17] nvme-rdma: Introduce cqe for local invalidation Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-28  7:26   ` Christoph Hellwig
  2018-05-27 15:50 ` [PATCH 13/17] nvme-rdma: introduce nvme_rdma_sgl structure Max Gurtovoy
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


This function will be used to avoid code duplication when setting the
keyed SGL fields.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/rdma.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index a240800..b96cf88 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1074,14 +1074,23 @@ static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
 	sg_free_table_chained(&req->sg_table, true);
 }
 
-static int nvme_rdma_set_sg_null(struct nvme_command *c)
+static void nvme_rdma_set_keyed_sgl(u64 addr, u64 length, u32 key,
+		struct nvme_command *c, bool invalidate)
 {
 	struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
 
-	sg->addr = 0;
-	put_unaligned_le24(0, sg->length);
-	put_unaligned_le32(0, sg->key);
+	sg->addr = cpu_to_le64(addr);
+	put_unaligned_le24(length, sg->length);
+	put_unaligned_le32(key, sg->key);
 	sg->type = NVME_KEY_SGL_FMT_DATA_DESC << 4;
+	if (invalidate)
+		sg->type |= NVME_SGL_FMT_INVALIDATE;
+}
+
+static int nvme_rdma_set_sg_null(struct nvme_command *c)
+{
+	nvme_rdma_set_keyed_sgl(0, 0, 0, c, false);
+
 	return 0;
 }
 
@@ -1105,12 +1114,11 @@ static int nvme_rdma_map_sg_inline(struct nvme_rdma_queue *queue,
 static int nvme_rdma_map_sg_single(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_request *req, struct nvme_command *c)
 {
-	struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
+	nvme_rdma_set_keyed_sgl(sg_dma_address(req->sg_table.sgl),
+				sg_dma_len(req->sg_table.sgl),
+				queue->device->pd->unsafe_global_rkey,
+				c, false);
 
-	sg->addr = cpu_to_le64(sg_dma_address(req->sg_table.sgl));
-	put_unaligned_le24(sg_dma_len(req->sg_table.sgl), sg->length);
-	put_unaligned_le32(queue->device->pd->unsafe_global_rkey, sg->key);
-	sg->type = NVME_KEY_SGL_FMT_DATA_DESC << 4;
 	return 0;
 }
 
@@ -1118,7 +1126,6 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_request *req, struct nvme_command *c,
 		int count)
 {
-	struct nvme_keyed_sgl_desc *sg = &c->common.dptr.ksgl;
 	int nr;
 
 	req->mr = ib_mr_pool_get(queue->qp, &queue->qp->rdma_mrs);
@@ -1152,11 +1159,8 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 			     IB_ACCESS_REMOTE_READ |
 			     IB_ACCESS_REMOTE_WRITE;
 
-	sg->addr = cpu_to_le64(req->mr->iova);
-	put_unaligned_le24(req->mr->length, sg->length);
-	put_unaligned_le32(req->mr->rkey, sg->key);
-	sg->type = (NVME_KEY_SGL_FMT_DATA_DESC << 4) |
-			NVME_SGL_FMT_INVALIDATE;
+	nvme_rdma_set_keyed_sgl(req->mr->iova, req->mr->length, req->mr->rkey,
+				c, true);
 
 	return 0;
 }
-- 
1.8.3.1

* [PATCH 13/17] nvme-rdma: introduce nvme_rdma_sgl structure
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (11 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 12/17] nvme-rdma: Introduce nvme_rdma_set_keyed_sgl helper func Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-27 15:50 ` [PATCH 14/17] nvme-rdma: refactor cmd mapping/unmapping mechanism Max Gurtovoy
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


This structure binds all the properties necessary for mapping and sending an
sg list received from the block layer (which may also need a memory
registration). This is a preparation patch for adding T10-PI support, which
will use this structure to map the integrity sg list as well.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/rdma.c | 111 +++++++++++++++++++++++++----------------------
 1 file changed, 60 insertions(+), 51 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index b96cf88..3b63811 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -55,23 +55,27 @@ struct nvme_rdma_qe {
 	u64			dma;
 };
 
+struct nvme_rdma_sgl {
+	struct ib_mr		*mr;
+	int			nents;
+	struct ib_reg_wr	reg_wr;
+	struct ib_cqe		reg_cqe;
+	struct ib_cqe		inv_cqe;
+	struct sg_table		sg_table;
+	struct scatterlist	first_sgl[SG_CHUNK_SIZE];
+};
+
 struct nvme_rdma_queue;
 struct nvme_rdma_request {
 	struct nvme_request	req;
-	struct ib_mr		*mr;
 	struct nvme_rdma_qe	sqe;
 	union nvme_result	result;
 	__le16			status;
 	refcount_t		ref;
 	struct ib_sge		sge[1 + NVME_RDMA_MAX_INLINE_SEGMENTS];
 	u32			num_sge;
-	int			nents;
-	struct ib_reg_wr	reg_wr;
-	struct ib_cqe		reg_cqe;
-	struct ib_cqe		inv_cqe;
 	struct nvme_rdma_queue  *queue;
-	struct sg_table		sg_table;
-	struct scatterlist	first_sgl[];
+	struct nvme_rdma_sgl	data_sgl;
 };
 
 enum nvme_rdma_queue_flags {
@@ -689,8 +693,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->queue_depth = NVME_AQ_MQ_TAG_DEPTH;
 		set->reserved_tags = 2; /* connect + keep-alive */
 		set->numa_node = NUMA_NO_NODE;
-		set->cmd_size = sizeof(struct nvme_rdma_request) +
-			SG_CHUNK_SIZE * sizeof(struct scatterlist);
+		set->cmd_size = sizeof(struct nvme_rdma_request);
 		set->driver_data = ctrl;
 		set->nr_hw_queues = 1;
 		set->timeout = ADMIN_TIMEOUT;
@@ -703,8 +706,7 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->reserved_tags = 1; /* fabric connect */
 		set->numa_node = NUMA_NO_NODE;
 		set->flags = BLK_MQ_F_SHOULD_MERGE;
-		set->cmd_size = sizeof(struct nvme_rdma_request) +
-			SG_CHUNK_SIZE * sizeof(struct scatterlist);
+		set->cmd_size = sizeof(struct nvme_rdma_request);
 		set->driver_data = ctrl;
 		set->nr_hw_queues = nctrl->queue_count - 1;
 		set->timeout = NVME_IO_TIMEOUT;
@@ -1020,8 +1022,10 @@ static void nvme_rdma_memreg_done(struct ib_cq *cq, struct ib_wc *wc)
 
 static void nvme_rdma_inv_rkey_done(struct ib_cq *cq, struct ib_wc *wc)
 {
+	struct nvme_rdma_sgl *sgl =
+		container_of(wc->wr_cqe, struct nvme_rdma_sgl, inv_cqe);
 	struct nvme_rdma_request *req =
-		container_of(wc->wr_cqe, struct nvme_rdma_request, inv_cqe);
+		container_of(sgl, struct nvme_rdma_request, data_sgl);
 	struct request *rq = blk_mq_rq_from_pdu(req);
 
 	if (unlikely(wc->status != IB_WC_SUCCESS)) {
@@ -1055,23 +1059,24 @@ static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
 		struct request *rq)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_rdma_sgl *sgl = &req->data_sgl;
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;
 
 	if (!blk_rq_payload_bytes(rq))
 		return;
 
-	if (req->mr) {
-		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);
-		req->mr = NULL;
+	if (sgl->mr) {
+		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, sgl->mr);
+		sgl->mr = NULL;
 	}
 
-	ib_dma_unmap_sg(ibdev, req->sg_table.sgl,
-			req->nents, rq_data_dir(rq) ==
+	ib_dma_unmap_sg(ibdev, sgl->sg_table.sgl,
+			sgl->nents, rq_data_dir(rq) ==
 				    WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
 
 	nvme_cleanup_cmd(rq);
-	sg_free_table_chained(&req->sg_table, true);
+	sg_free_table_chained(&sgl->sg_table, true);
 }
 
 static void nvme_rdma_set_keyed_sgl(u64 addr, u64 length, u32 key,
@@ -1099,12 +1104,12 @@ static int nvme_rdma_map_sg_inline(struct nvme_rdma_queue *queue,
 {
 	struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
 
-	req->sge[1].addr = sg_dma_address(req->sg_table.sgl);
-	req->sge[1].length = sg_dma_len(req->sg_table.sgl);
+	req->sge[1].addr = sg_dma_address(req->data_sgl.sg_table.sgl);
+	req->sge[1].length = sg_dma_len(req->data_sgl.sg_table.sgl);
 	req->sge[1].lkey = queue->device->pd->local_dma_lkey;
 
 	sg->addr = cpu_to_le64(queue->ctrl->ctrl.icdoff);
-	sg->length = cpu_to_le32(sg_dma_len(req->sg_table.sgl));
+	sg->length = cpu_to_le32(sg_dma_len(req->data_sgl.sg_table.sgl));
 	sg->type = (NVME_SGL_FMT_DATA_DESC << 4) | NVME_SGL_FMT_OFFSET;
 
 	req->num_sge++;
@@ -1114,8 +1119,8 @@ static int nvme_rdma_map_sg_inline(struct nvme_rdma_queue *queue,
 static int nvme_rdma_map_sg_single(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_request *req, struct nvme_command *c)
 {
-	nvme_rdma_set_keyed_sgl(sg_dma_address(req->sg_table.sgl),
-				sg_dma_len(req->sg_table.sgl),
+	nvme_rdma_set_keyed_sgl(sg_dma_address(req->data_sgl.sg_table.sgl),
+				sg_dma_len(req->data_sgl.sg_table.sgl),
 				queue->device->pd->unsafe_global_rkey,
 				c, false);
 
@@ -1126,40 +1131,41 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_request *req, struct nvme_command *c,
 		int count)
 {
+	struct nvme_rdma_sgl *sgl = &req->data_sgl;
 	int nr;
 
-	req->mr = ib_mr_pool_get(queue->qp, &queue->qp->rdma_mrs);
-	if (WARN_ON_ONCE(!req->mr))
+	sgl->mr = ib_mr_pool_get(queue->qp, &queue->qp->rdma_mrs);
+	if (WARN_ON_ONCE(!sgl->mr))
 		return -EAGAIN;
 
 	/*
 	 * Align the MR to a 4K page size to match the ctrl page size and
 	 * the block virtual boundary.
 	 */
-	nr = ib_map_mr_sg(req->mr, req->sg_table.sgl, count, NULL, SZ_4K);
+	nr = ib_map_mr_sg(sgl->mr, sgl->sg_table.sgl, count, NULL, SZ_4K);
 	if (unlikely(nr < count)) {
-		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->mr);
-		req->mr = NULL;
+		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, sgl->mr);
+		sgl->mr = NULL;
 		if (nr < 0)
 			return nr;
 		return -EINVAL;
 	}
 
-	ib_update_fast_reg_key(req->mr, ib_inc_rkey(req->mr->rkey));
+	ib_update_fast_reg_key(sgl->mr, ib_inc_rkey(sgl->mr->rkey));
 
-	req->reg_cqe.done = nvme_rdma_memreg_done;
-	req->inv_cqe.done = nvme_rdma_inv_rkey_done;
-	memset(&req->reg_wr, 0, sizeof(req->reg_wr));
-	req->reg_wr.wr.opcode = IB_WR_REG_MR;
-	req->reg_wr.wr.wr_cqe = &req->reg_cqe;
-	req->reg_wr.wr.num_sge = 0;
-	req->reg_wr.mr = req->mr;
-	req->reg_wr.key = req->mr->rkey;
-	req->reg_wr.access = IB_ACCESS_LOCAL_WRITE |
+	sgl->reg_cqe.done = nvme_rdma_memreg_done;
+	sgl->inv_cqe.done = nvme_rdma_inv_rkey_done;
+	memset(&sgl->reg_wr, 0, sizeof(sgl->reg_wr));
+	sgl->reg_wr.wr.opcode = IB_WR_REG_MR;
+	sgl->reg_wr.wr.wr_cqe = &sgl->reg_cqe;
+	sgl->reg_wr.wr.num_sge = 0;
+	sgl->reg_wr.mr = sgl->mr;
+	sgl->reg_wr.key = sgl->mr->rkey;
+	sgl->reg_wr.access = IB_ACCESS_LOCAL_WRITE |
 			     IB_ACCESS_REMOTE_READ |
 			     IB_ACCESS_REMOTE_WRITE;
 
-	nvme_rdma_set_keyed_sgl(req->mr->iova, req->mr->length, req->mr->rkey,
+	nvme_rdma_set_keyed_sgl(sgl->mr->iova, sgl->mr->length, sgl->mr->rkey,
 				c, true);
 
 	return 0;
@@ -1169,6 +1175,7 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 		struct request *rq, struct nvme_command *c)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
+	struct nvme_rdma_sgl *sgl = &req->data_sgl;
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;
 	int count, ret;
@@ -1181,18 +1188,18 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 	if (!blk_rq_payload_bytes(rq))
 		return nvme_rdma_set_sg_null(c);
 
-	req->sg_table.sgl = req->first_sgl;
-	ret = sg_alloc_table_chained(&req->sg_table,
-			blk_rq_nr_phys_segments(rq), req->sg_table.sgl);
+	sgl->sg_table.sgl = sgl->first_sgl;
+	ret = sg_alloc_table_chained(&sgl->sg_table,
+			blk_rq_nr_phys_segments(rq), sgl->sg_table.sgl);
 	if (ret)
 		return -ENOMEM;
 
-	req->nents = blk_rq_map_sg(rq->q, rq, req->sg_table.sgl);
+	sgl->nents = blk_rq_map_sg(rq->q, rq, sgl->sg_table.sgl);
 
-	count = ib_dma_map_sg(ibdev, req->sg_table.sgl, req->nents,
+	count = ib_dma_map_sg(ibdev, sgl->sg_table.sgl, sgl->nents,
 		    rq_data_dir(rq) == WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
 	if (unlikely(count <= 0)) {
-		sg_free_table_chained(&req->sg_table, true);
+		sg_free_table_chained(&sgl->sg_table, true);
 		return -EIO;
 	}
 
@@ -1330,6 +1337,7 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 {
 	struct request *rq;
 	struct nvme_rdma_request *req;
+	struct nvme_rdma_sgl *sgl;
 	int ret = 0;
 
 	rq = blk_mq_tag_to_rq(nvme_rdma_tagset(queue), cqe->command_id);
@@ -1344,20 +1352,21 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 
 	req->status = cqe->status;
 	req->result = cqe->result;
+	sgl = &req->data_sgl;
 
 	if (wc->wc_flags & IB_WC_WITH_INVALIDATE) {
-		if (unlikely(wc->ex.invalidate_rkey != req->mr->rkey)) {
+		if (unlikely(wc->ex.invalidate_rkey != sgl->mr->rkey)) {
 			dev_err(queue->ctrl->ctrl.device,
 				"Bogus remote invalidation for rkey %#x\n",
-				req->mr->rkey);
+				sgl->mr->rkey);
 			nvme_rdma_error_recovery(queue->ctrl);
 		}
-	} else if (req->mr) {
-		ret = nvme_rdma_inv_rkey(queue, req->mr->rkey, &req->inv_cqe);
+	} else if (sgl->mr) {
+		ret = nvme_rdma_inv_rkey(queue, sgl->mr->rkey, &sgl->inv_cqe);
 		if (unlikely(ret < 0)) {
 			dev_err(queue->ctrl->ctrl.device,
 				"Queueing INV WR for rkey %#x failed (%d)\n",
-				req->mr->rkey, ret);
+				sgl->mr->rkey, ret);
 			nvme_rdma_error_recovery(queue->ctrl);
 		}
 		/* the local invalidation completion will end the request */
@@ -1650,7 +1659,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 			sizeof(struct nvme_command), DMA_TO_DEVICE);
 
 	err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
-			req->mr ? &req->reg_wr.wr : NULL);
+			req->data_sgl.mr ? &req->data_sgl.reg_wr.wr : NULL);
 	if (unlikely(err)) {
 		nvme_rdma_unmap_data(queue, rq);
 		goto err;
-- 
1.8.3.1

* [PATCH 14/17] nvme-rdma: refactor cmd mapping/unmapping mechanism
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (12 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 13/17] nvme-rdma: introduce nvme_rdma_sgl structure Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-30 22:33   ` Sagi Grimberg
  2018-05-27 15:50 ` [PATCH 15/17] nvme-rdma: Add helper function for preparing sg list to RDMA operation Max Gurtovoy
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


Split the I/O request mapping for the RDMA operation into a structured
procedure:
- map the rq data sgl to the rdma data_sgl
- map the rdma data_sgl to the nvme descriptor

This is a preparation patch for adding T10-PI support.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/rdma.c | 114 +++++++++++++++++++++++++++++++----------------
 1 file changed, 75 insertions(+), 39 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 3b63811..a54de37 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1055,28 +1055,35 @@ static int nvme_rdma_inv_rkey(struct nvme_rdma_queue *queue, u32 rkey,
 	return ib_post_send(queue->qp, &wr, &bad_wr);
 }
 
-static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
-		struct request *rq)
+static void nvme_rdma_unmap_data_sgl(struct nvme_rdma_queue *queue,
+		struct nvme_rdma_sgl *sgl, struct request *rq)
 {
-	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
-	struct nvme_rdma_sgl *sgl = &req->data_sgl;
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;
 
+	ib_dma_unmap_sg(ibdev, sgl->sg_table.sgl,
+			sgl->nents, rq_data_dir(rq) ==
+				    WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
+
+	sg_free_table_chained(&sgl->sg_table, true);
+}
+
+static void nvme_rdma_unmap_cmd(struct nvme_rdma_queue *queue,
+		struct request *rq)
+{
+	struct nvme_rdma_request *req;
+
 	if (!blk_rq_payload_bytes(rq))
 		return;
 
-	if (sgl->mr) {
-		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, sgl->mr);
-		sgl->mr = NULL;
+	req = blk_mq_rq_to_pdu(rq);
+	if (req->data_sgl.mr) {
+		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->data_sgl.mr);
+		req->data_sgl.mr = NULL;
 	}
 
-	ib_dma_unmap_sg(ibdev, sgl->sg_table.sgl,
-			sgl->nents, rq_data_dir(rq) ==
-				    WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
-
+	nvme_rdma_unmap_data_sgl(queue, &req->data_sgl, rq);
 	nvme_cleanup_cmd(rq);
-	sg_free_table_chained(&sgl->sg_table, true);
 }
 
 static void nvme_rdma_set_keyed_sgl(u64 addr, u64 length, u32 key,
@@ -1171,7 +1178,49 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 	return 0;
 }
 
-static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
+static int nvme_rdma_map_data_sgl(struct nvme_rdma_sgl *sgl,
+		struct request *rq, struct ib_device *ibdev, int *count)
+{
+	int ret;
+
+	sgl->sg_table.sgl = sgl->first_sgl;
+	ret = sg_alloc_table_chained(&sgl->sg_table,
+			blk_rq_nr_phys_segments(rq), sgl->sg_table.sgl);
+	if (unlikely(ret))
+		return -ENOMEM;
+
+	sgl->nents = blk_rq_map_sg(rq->q, rq, sgl->sg_table.sgl);
+
+	*count = ib_dma_map_sg(ibdev, sgl->sg_table.sgl, sgl->nents,
+		    rq_data_dir(rq) == WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
+	if (unlikely(*count <= 0)) {
+		ret = -EIO;
+		goto out_free_table;
+	}
+
+	return 0;
+
+out_free_table:
+	sg_free_table_chained(&sgl->sg_table, true);
+	return ret;
+}
+
+static int nvme_rdma_map_rq(struct nvme_rdma_queue *queue,
+		struct nvme_rdma_request *req, struct request *rq,
+		struct nvme_command *c, int nents)
+{
+	if (nents == 1) {
+		if (rq_data_dir(rq) == WRITE && nvme_rdma_queue_idx(queue) &&
+		    blk_rq_payload_bytes(rq) <=	nvme_rdma_inline_data_size(queue))
+			return nvme_rdma_map_sg_inline(queue, req, c);
+		if (queue->device->pd->flags & IB_PD_UNSAFE_GLOBAL_RKEY)
+			return nvme_rdma_map_sg_single(queue, req, c);
+	}
+
+	return nvme_rdma_map_sg_fr(queue, req, c, nents);
+}
+
+static int nvme_rdma_setup_cmd(struct nvme_rdma_queue *queue,
 		struct request *rq, struct nvme_command *c)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
@@ -1188,32 +1237,19 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
 	if (!blk_rq_payload_bytes(rq))
 		return nvme_rdma_set_sg_null(c);
 
-	sgl->sg_table.sgl = sgl->first_sgl;
-	ret = sg_alloc_table_chained(&sgl->sg_table,
-			blk_rq_nr_phys_segments(rq), sgl->sg_table.sgl);
-	if (ret)
-		return -ENOMEM;
-
-	sgl->nents = blk_rq_map_sg(rq->q, rq, sgl->sg_table.sgl);
-
-	count = ib_dma_map_sg(ibdev, sgl->sg_table.sgl, sgl->nents,
-		    rq_data_dir(rq) == WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
-	if (unlikely(count <= 0)) {
-		sg_free_table_chained(&sgl->sg_table, true);
-		return -EIO;
-	}
+	ret = nvme_rdma_map_data_sgl(sgl, rq, ibdev, &count);
+	if (unlikely(ret))
+		return ret;
 
-	if (count == 1) {
-		if (rq_data_dir(rq) == WRITE && nvme_rdma_queue_idx(queue) &&
-		    blk_rq_payload_bytes(rq) <=
-				nvme_rdma_inline_data_size(queue))
-			return nvme_rdma_map_sg_inline(queue, req, c);
+	ret = nvme_rdma_map_rq(queue, req, rq, c, count);
+	if (unlikely(ret))
+		goto out_unmap_data_sgl;
 
-		if (dev->pd->flags & IB_PD_UNSAFE_GLOBAL_RKEY)
-			return nvme_rdma_map_sg_single(queue, req, c);
-	}
+	return 0;
 
-	return nvme_rdma_map_sg_fr(queue, req, c, count);
+out_unmap_data_sgl:
+	nvme_rdma_unmap_data_sgl(queue, sgl, rq);
+	return ret;
 }
 
 static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
@@ -1645,7 +1681,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(rq);
 
-	err = nvme_rdma_map_data(queue, rq, c);
+	err = nvme_rdma_setup_cmd(queue, rq, c);
 	if (unlikely(err < 0)) {
 		dev_err(queue->ctrl->ctrl.device,
 			     "Failed to map data (%d)\n", err);
@@ -1661,7 +1697,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
 			req->data_sgl.mr ? &req->data_sgl.reg_wr.wr : NULL);
 	if (unlikely(err)) {
-		nvme_rdma_unmap_data(queue, rq);
+		nvme_rdma_unmap_cmd(queue, rq);
 		goto err;
 	}
 
@@ -1697,7 +1733,7 @@ static void nvme_rdma_complete_rq(struct request *rq)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 
-	nvme_rdma_unmap_data(req->queue, rq);
+	nvme_rdma_unmap_cmd(req->queue, rq);
 	nvme_complete_rq(rq);
 }
 
-- 
1.8.3.1

* [PATCH 15/17] nvme-rdma: Add helper function for preparing sg list to RDMA operation
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (13 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 14/17] nvme-rdma: refactor cmd mapping/unmapping mechanism Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-27 15:50 ` [PATCH 16/17] nvme-rdma: Introduce nvme_rdma_first_wr helper function Max Gurtovoy
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


In order to prepare a scatterlist for an RDMA operation, the following actions
should be accomplished:
- Map a DMA-mapped SG list to a memory region (MR).
- Set an appropriate work request (WR) for the MR registration.
- Set appropriate callbacks for the MR registration and invalidation completions.

This helper will also be used to prepare the protected sg list for T10-PI.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/rdma.c | 53 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index a54de37..f1d9759 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1134,14 +1134,16 @@ static int nvme_rdma_map_sg_single(struct nvme_rdma_queue *queue,
 	return 0;
 }
 
-static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
-		struct nvme_rdma_request *req, struct nvme_command *c,
-		int count)
+static int nvme_rdma_map_mr_sg(struct nvme_rdma_queue *queue,
+			       struct list_head *mr_pool,
+			       struct nvme_rdma_sgl *sgl, int nents)
 {
-	struct nvme_rdma_sgl *sgl = &req->data_sgl;
+	struct scatterlist *sg = sgl->sg_table.sgl;
+	struct ib_reg_wr *reg_wr = &sgl->reg_wr;
+	struct ib_cqe *reg_cqe = &sgl->reg_cqe;
 	int nr;
 
-	sgl->mr = ib_mr_pool_get(queue->qp, &queue->qp->rdma_mrs);
+	sgl->mr = ib_mr_pool_get(queue->qp, mr_pool);
 	if (WARN_ON_ONCE(!sgl->mr))
 		return -EAGAIN;
 
@@ -1149,9 +1151,9 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 	 * Align the MR to a 4K page size to match the ctrl page size and
 	 * the block virtual boundary.
 	 */
-	nr = ib_map_mr_sg(sgl->mr, sgl->sg_table.sgl, count, NULL, SZ_4K);
-	if (unlikely(nr < count)) {
-		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, sgl->mr);
+	nr = ib_map_mr_sg(sgl->mr, sg, nents, NULL, SZ_4K);
+	if (unlikely(nr < nents)) {
+		ib_mr_pool_put(queue->qp, mr_pool, sgl->mr);
 		sgl->mr = NULL;
 		if (nr < 0)
 			return nr;
@@ -1160,17 +1162,32 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 
 	ib_update_fast_reg_key(sgl->mr, ib_inc_rkey(sgl->mr->rkey));
 
-	sgl->reg_cqe.done = nvme_rdma_memreg_done;
+	reg_cqe->done = nvme_rdma_memreg_done;
+	memset(reg_wr, 0, sizeof(*reg_wr));
+	reg_wr->wr.opcode = IB_WR_REG_MR;
+	reg_wr->wr.wr_cqe = reg_cqe;
+	reg_wr->wr.num_sge = 0;
+	reg_wr->mr = sgl->mr;
+	reg_wr->key = sgl->mr->rkey;
+	reg_wr->access = IB_ACCESS_LOCAL_WRITE |
+			 IB_ACCESS_REMOTE_READ |
+			 IB_ACCESS_REMOTE_WRITE;
+
+	return 0;
+}
+
+static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
+		struct nvme_rdma_request *req, struct nvme_command *c,
+		int nents)
+{
+	struct nvme_rdma_sgl *sgl = &req->data_sgl;
+	int ret;
+
+	ret = nvme_rdma_map_mr_sg(queue, &queue->qp->rdma_mrs, sgl, nents);
+	if (unlikely(ret))
+		return -EAGAIN;
+
 	sgl->inv_cqe.done = nvme_rdma_inv_rkey_done;
-	memset(&sgl->reg_wr, 0, sizeof(sgl->reg_wr));
-	sgl->reg_wr.wr.opcode = IB_WR_REG_MR;
-	sgl->reg_wr.wr.wr_cqe = &sgl->reg_cqe;
-	sgl->reg_wr.wr.num_sge = 0;
-	sgl->reg_wr.mr = sgl->mr;
-	sgl->reg_wr.key = sgl->mr->rkey;
-	sgl->reg_wr.access = IB_ACCESS_LOCAL_WRITE |
-			     IB_ACCESS_REMOTE_READ |
-			     IB_ACCESS_REMOTE_WRITE;
 
 	nvme_rdma_set_keyed_sgl(sgl->mr->iova, sgl->mr->length, sgl->mr->rkey,
 				c, true);
-- 
1.8.3.1

* [PATCH 16/17] nvme-rdma: Introduce nvme_rdma_first_wr helper function
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (14 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 15/17] nvme-rdma: Add helper function for preparing sg list to RDMA operation Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-27 15:50 ` [PATCH 17/17] nvme-rdma: Add T10-PI support Max Gurtovoy
  2018-05-30 21:47 ` [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Sagi Grimberg
  17 siblings, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


This patch doesn't change any logic and is needed only for improving
code readability.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/rdma.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f1d9759..88f2f00 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1668,6 +1668,15 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
 	return BLK_EH_HANDLED;
 }
 
+static inline struct ib_send_wr *
+nvme_rdma_first_wr(struct nvme_rdma_request *req)
+{
+	if (req->data_sgl.mr)
+		return &req->data_sgl.reg_wr.wr;
+
+	return NULL;
+}
+
 static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 		const struct blk_mq_queue_data *bd)
 {
@@ -1712,7 +1721,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 			sizeof(struct nvme_command), DMA_TO_DEVICE);
 
 	err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
-			req->data_sgl.mr ? &req->data_sgl.reg_wr.wr : NULL);
+				  nvme_rdma_first_wr(req));
 	if (unlikely(err)) {
 		nvme_rdma_unmap_cmd(queue, rq);
 		goto err;
-- 
1.8.3.1

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (15 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 16/17] nvme-rdma: Introduce nvme_rdma_first_wr helper function Max Gurtovoy
@ 2018-05-27 15:50 ` Max Gurtovoy
  2018-05-28  7:28   ` Christoph Hellwig
  2018-05-30 23:05   ` Sagi Grimberg
  2018-05-30 21:47 ` [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Sagi Grimberg
  17 siblings, 2 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-27 15:50 UTC (permalink / raw)


For capable HCAs (e.g. ConnectX-4/ConnectX-5) this allows end-to-end protection
information passthrough and validation for the NVMe over RDMA transport. As in
iSER, the T10-PI offload is implemented over the RDMA signature verbs API and is
enabled via a module parameter. In the future we might want to add per-controller
enablement.
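
For reviewers less familiar with the signature verbs API: the core of the offload
is filling an ib_sig_attrs descriptor per request and registering the data through
a signature-enabled MR. Below is a rough, simplified sketch of the Type 1 DIF case,
mirroring what iSER does today; the values are illustrative only, and the actual
patch derives the domains and check mask from the request and the namespace format:

    /* illustrative only - not part of this patch */
    static void example_fill_sig_attrs(struct ib_sig_attrs *sig_attrs, u32 ref_tag)
    {
            memset(sig_attrs, 0, sizeof(*sig_attrs));

            /* which PI fields the HCA should verify (masks from patch 02) */
            sig_attrs->check_mask = IB_SIG_CHECK_GUARD | IB_SIG_CHECK_REFTAG;

            /* wire domain: DIF Type 1 PI is present on the wire */
            sig_attrs->wire.sig_type = IB_SIG_TYPE_T10_DIF;
            sig_attrs->wire.sig.dif.bg_type = IB_T10DIF_CRC;
            sig_attrs->wire.sig.dif.pi_interval = 512;
            sig_attrs->wire.sig.dif.ref_tag = ref_tag;
            sig_attrs->wire.sig.dif.ref_remap = true;

            /* memory domain: no PI in host memory, the HCA strips/inserts it */
            sig_attrs->mem.sig_type = IB_SIG_TYPE_NONE;
    }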

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/host/rdma.c | 533 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 490 insertions(+), 43 deletions(-)

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 88f2f00..f64a91f 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -57,6 +57,7 @@ struct nvme_rdma_qe {
 
 struct nvme_rdma_sgl {
 	struct ib_mr		*mr;
+	struct ib_sge		sge;
 	int			nents;
 	struct ib_reg_wr	reg_wr;
 	struct ib_cqe		reg_cqe;
@@ -65,6 +66,16 @@ struct nvme_rdma_sgl {
 	struct scatterlist	first_sgl[SG_CHUNK_SIZE];
 };
 
+struct nvme_rdma_pi_sgl {
+	struct nvme_rdma_sgl		sgl;
+	struct ib_sig_attrs		sig_attrs;
+	struct ib_sig_handover_wr	sig_wr;
+	struct ib_cqe			sig_cqe;
+	struct ib_cqe			sig_inv_cqe;
+	struct ib_sge			sig_sge;
+	struct ib_mr			*sig_mr;
+};
+
 struct nvme_rdma_queue;
 struct nvme_rdma_request {
 	struct nvme_request	req;
@@ -76,6 +87,10 @@ struct nvme_rdma_request {
 	u32			num_sge;
 	struct nvme_rdma_queue  *queue;
 	struct nvme_rdma_sgl	data_sgl;
+
+	/* T10-PI support */
+	bool			is_protected;
+	struct nvme_rdma_pi_sgl	pi_sgl[];
 };
 
 enum nvme_rdma_queue_flags {
@@ -97,6 +112,7 @@ struct nvme_rdma_queue {
 	struct rdma_cm_id	*cm_id;
 	int			cm_error;
 	struct completion	cm_done;
+	bool			pi_support;
 };
 
 struct nvme_rdma_ctrl {
@@ -122,6 +138,7 @@ struct nvme_rdma_ctrl {
 	struct sockaddr_storage src_addr;
 
 	struct nvme_ctrl	ctrl;
+	bool			pi_support;
 };
 
 static inline struct nvme_rdma_ctrl *to_rdma_ctrl(struct nvme_ctrl *ctrl)
@@ -145,6 +162,10 @@ static inline struct nvme_rdma_ctrl *to_rdma_ctrl(struct nvme_ctrl *ctrl)
 MODULE_PARM_DESC(register_always,
 	 "Use memory registration even for contiguous memory regions");
 
+static bool pi_enable = false;
+module_param_named(pi_enable, pi_enable, bool, 0444);
+MODULE_PARM_DESC(pi_enable, "Enable T10-PI offload support (default:disabled)");
+
 static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
 		struct rdma_cm_event *event);
 static void nvme_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc);
@@ -259,6 +280,8 @@ static int nvme_rdma_create_qp(struct nvme_rdma_queue *queue, const int factor)
 	init_attr.qp_type = IB_QPT_RC;
 	init_attr.send_cq = queue->ib_cq;
 	init_attr.recv_cq = queue->ib_cq;
+	if (queue->pi_support)
+		init_attr.create_flags |= IB_QP_CREATE_SIGNATURE_EN;
 
 	ret = rdma_create_qp(queue->cm_id, dev->pd, &init_attr);
 
@@ -393,6 +416,12 @@ static int nvme_rdma_dev_get(struct nvme_rdma_device *dev)
 	return NULL;
 }
 
+static int nvme_rdma_get_max_fr_pages(struct ib_device *ibdev)
+{
+	return min_t(u32, NVME_RDMA_MAX_SEGMENTS,
+		     ibdev->attrs.max_fast_reg_page_list_len);
+}
+
 static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 {
 	struct nvme_rdma_device *dev;
@@ -404,6 +433,8 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 	dev = queue->device;
 	ibdev = dev->dev;
 
+	if (queue->pi_support)
+		ib_mr_pool_destroy(queue->qp, &queue->qp->sig_mrs);
 	ib_mr_pool_destroy(queue->qp, &queue->qp->rdma_mrs);
 
 	/*
@@ -420,16 +451,11 @@ static void nvme_rdma_destroy_queue_ib(struct nvme_rdma_queue *queue)
 	nvme_rdma_dev_put(dev);
 }
 
-static int nvme_rdma_get_max_fr_pages(struct ib_device *ibdev)
-{
-	return min_t(u32, NVME_RDMA_MAX_SEGMENTS,
-		     ibdev->attrs.max_fast_reg_page_list_len);
-}
-
 static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 {
 	struct ib_device *ibdev;
-	const int send_wr_factor = 3;			/* MR, SEND, INV */
+	/* (MR, SEND, INV) + MR (prot, sig), INV (prot, sig) for T10-PI */
+	const int send_wr_factor = (queue->pi_support * 4) + 3;
 	const int cq_factor = send_wr_factor + 1;	/* + RECV */
 	int comp_vector, idx = nvme_rdma_queue_idx(queue);
 	int ret;
@@ -469,20 +495,35 @@ static int nvme_rdma_create_queue_ib(struct nvme_rdma_queue *queue)
 	}
 
 	ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs,
-			      queue->queue_size,
+			      (queue->pi_support + 1) * queue->queue_size,
 			      IB_MR_TYPE_MEM_REG,
 			      nvme_rdma_get_max_fr_pages(ibdev));
 	if (ret) {
 		dev_err(queue->ctrl->ctrl.device,
 			"failed to initialize MR pool sized %d for QID %d\n",
-			queue->queue_size, idx);
+			(queue->pi_support + 1) * queue->queue_size, idx);
 		goto out_destroy_ring;
 	}
 
+	if (queue->pi_support) {
+		ret = ib_mr_pool_init(queue->qp, &queue->qp->sig_mrs,
+				queue->queue_size,
+				IB_MR_TYPE_SIGNATURE,
+				2);
+		if (ret) {
+			dev_err(queue->ctrl->ctrl.device,
+			"failed to init sig MR pool sized %d for QID %d\n",
+			queue->queue_size, nvme_rdma_queue_idx(queue));
+			goto out_destroy_mr_pool;
+		}
+	}
+
 	set_bit(NVME_RDMA_Q_TR_READY, &queue->flags);
 
 	return 0;
 
+out_destroy_mr_pool:
+	ib_mr_pool_destroy(queue->qp, &queue->qp->rdma_mrs);
 out_destroy_ring:
 	nvme_rdma_free_ring(ibdev, queue->rsp_ring, queue->queue_size,
 			    sizeof(struct nvme_completion), DMA_FROM_DEVICE);
@@ -504,9 +545,11 @@ static int nvme_rdma_alloc_queue(struct nvme_rdma_ctrl *ctrl,
 
 	queue = &ctrl->queues[idx];
 	queue->ctrl = ctrl;
+	queue->pi_support = idx && ctrl->pi_support;
 	init_completion(&queue->cm_done);
 
-	if (idx > 0)
+	/* No inline data for Admin queue and T10-PI capable queues */
+	if (idx > 0 && !queue->pi_support)
 		queue->cmnd_capsule_len = ctrl->ctrl.ioccsz * 16;
 	else
 		queue->cmnd_capsule_len = sizeof(struct nvme_command);
@@ -706,7 +749,8 @@ static struct blk_mq_tag_set *nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
 		set->reserved_tags = 1; /* fabric connect */
 		set->numa_node = NUMA_NO_NODE;
 		set->flags = BLK_MQ_F_SHOULD_MERGE;
-		set->cmd_size = sizeof(struct nvme_rdma_request);
+		set->cmd_size = sizeof(struct nvme_rdma_request) +
+			(ctrl->pi_support * sizeof(struct nvme_rdma_pi_sgl));
 		set->driver_data = ctrl;
 		set->nr_hw_queues = nctrl->queue_count - 1;
 		set->timeout = NVME_IO_TIMEOUT;
@@ -756,6 +800,19 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 
 	ctrl->device = ctrl->queues[0].device;
 
+	/* T10-PI support */
+	if (pi_enable) {
+		if (!(ctrl->device->dev->attrs.device_cap_flags &
+		      IB_DEVICE_SIGNATURE_HANDOVER)) {
+			dev_warn(ctrl->ctrl.device,
+				 "T10-PI requested but not supported on %s, "
+				 "continue without T10-PI\n",
+				 ctrl->device->dev->name);
+			ctrl->pi_support = false;
+		} else {
+			ctrl->pi_support = true;
+		}
+	}
 	ctrl->max_fr_pages = nvme_rdma_get_max_fr_pages(ctrl->device->dev);
 
 	if (new) {
@@ -793,6 +850,8 @@ static int nvme_rdma_configure_admin_queue(struct nvme_rdma_ctrl *ctrl,
 
 	ctrl->ctrl.max_hw_sectors =
 		(ctrl->max_fr_pages - 1) << (ilog2(SZ_4K) - 9);
+	if (ctrl->pi_support)
+		ctrl->ctrl.max_integrity_segments = ctrl->max_fr_pages >> 1;
 
 	error = nvme_init_identify(&ctrl->ctrl);
 	if (error)
@@ -1020,6 +1079,18 @@ static void nvme_rdma_memreg_done(struct ib_cq *cq, struct ib_wc *wc)
 		nvme_rdma_wr_error(cq, wc, "MEMREG");
 }
 
+static void nvme_rdma_inv_prot_rkey_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+	if (unlikely(wc->status != IB_WC_SUCCESS))
+		nvme_rdma_wr_error(cq, wc, "LOCAL_PROT_INV");
+}
+
+static void nvme_rdma_inv_sig_rkey_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+	if (unlikely(wc->status != IB_WC_SUCCESS))
+		nvme_rdma_wr_error(cq, wc, "LOCAL_SIG_INV");
+}
+
 static void nvme_rdma_inv_rkey_done(struct ib_cq *cq, struct ib_wc *wc)
 {
 	struct nvme_rdma_sgl *sgl =
@@ -1068,6 +1139,13 @@ static void nvme_rdma_unmap_data_sgl(struct nvme_rdma_queue *queue,
 	sg_free_table_chained(&sgl->sg_table, true);
 }
 
+static void nvme_rdma_unmap_integrity_sgl(struct nvme_rdma_queue *queue,
+		struct nvme_rdma_pi_sgl *pi_sgl,
+		struct request *rq)
+{
+	nvme_rdma_unmap_data_sgl(queue, &pi_sgl->sgl, rq);
+}
+
 static void nvme_rdma_unmap_cmd(struct nvme_rdma_queue *queue,
 		struct request *rq)
 {
@@ -1077,6 +1155,19 @@ static void nvme_rdma_unmap_cmd(struct nvme_rdma_queue *queue,
 		return;
 
 	req = blk_mq_rq_to_pdu(rq);
+
+	if (req->is_protected) {
+		ib_mr_pool_put(queue->qp, &queue->qp->sig_mrs,
+			       req->pi_sgl->sig_mr);
+		req->pi_sgl->sig_mr = NULL;
+		if (blk_integrity_rq(rq)) {
+			ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs,
+				       req->pi_sgl->sgl.mr);
+			req->pi_sgl->sgl.mr = NULL;
+			nvme_rdma_unmap_integrity_sgl(queue, req->pi_sgl, rq);
+		}
+	}
+
 	if (req->data_sgl.mr) {
 		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->data_sgl.mr);
 		req->data_sgl.mr = NULL;
@@ -1136,11 +1227,13 @@ static int nvme_rdma_map_sg_single(struct nvme_rdma_queue *queue,
 
 static int nvme_rdma_map_mr_sg(struct nvme_rdma_queue *queue,
 			       struct list_head *mr_pool,
-			       struct nvme_rdma_sgl *sgl, int nents)
+			       struct nvme_rdma_sgl *sgl,
+			       int nents, struct ib_send_wr *last)
 {
 	struct scatterlist *sg = sgl->sg_table.sgl;
 	struct ib_reg_wr *reg_wr = &sgl->reg_wr;
 	struct ib_cqe *reg_cqe = &sgl->reg_cqe;
+	struct ib_sge *sge = &sgl->sge;
 	int nr;
 
 	sgl->mr = ib_mr_pool_get(queue->qp, mr_pool);
@@ -1173,26 +1266,230 @@ static int nvme_rdma_map_mr_sg(struct nvme_rdma_queue *queue,
 			 IB_ACCESS_REMOTE_READ |
 			 IB_ACCESS_REMOTE_WRITE;
 
+	sge->lkey = sgl->mr->lkey;
+	sge->addr = sgl->mr->iova;
+	sge->length = sgl->mr->length;
+
+	if (last)
+		last->next = &reg_wr->wr;
+
 	return 0;
 }
 
+static void nvme_rdma_set_diff_domain(struct nvme_command *cmd,
+		struct bio *bio,
+		struct ib_sig_attrs *sig_attrs,
+		struct ib_sig_domain *domain, struct request *rq)
+{
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
+	struct nvme_ns *ns = rq->rq_disk->private_data;
+
+	WARN_ON(bi == NULL);
+
+	domain->sig_type = IB_SIG_TYPE_T10_DIF;
+	domain->sig.dif.pi_interval = 1 << bi->interval_exp;
+	domain->sig.dif.ref_tag = le32_to_cpu(cmd->rw.reftag);
+
+	/*
+	 * At the moment we hard code those, but in the future
+	 * we will take them from cmd.
+	 */
+	domain->sig.dif.apptag_check_mask = 0xffff;
+	domain->sig.dif.app_escape = true;
+	domain->sig.dif.ref_escape = true;
+	if (ns->pi_type != NVME_NS_DPS_PI_TYPE3)
+		domain->sig.dif.ref_remap = true;
+}
+
+static int nvme_rdma_set_sig_attrs(struct bio *bio,
+		struct ib_sig_attrs *sig_attrs, struct nvme_command *c,
+		struct request *rq)
+{
+	u16 control = le16_to_cpu(c->rw.control);
+
+	if (control & NVME_RW_PRINFO_PRACT) {
+		/* for WRITE_INSERT/READ_STRIP no memory domain */
+		sig_attrs->mem.sig_type = IB_SIG_TYPE_NONE;
+		nvme_rdma_set_diff_domain(c, bio, sig_attrs, &sig_attrs->wire, rq);
+		sig_attrs->wire.sig.dif.bg_type = IB_T10DIF_CRC;
+		/* Clear the PRACT bit since HCA will generate/verify the PI */
+		control &= ~NVME_RW_PRINFO_PRACT;
+		c->rw.control = cpu_to_le16(control);
+	} else {
+		/* for WRITE_PASS/READ_PASS both wire/memory domains exist */
+		nvme_rdma_set_diff_domain(c, bio, sig_attrs, &sig_attrs->wire, rq);
+		sig_attrs->wire.sig.dif.bg_type = IB_T10DIF_CRC;
+		nvme_rdma_set_diff_domain(c, bio, sig_attrs, &sig_attrs->mem, rq);
+		sig_attrs->mem.sig.dif.bg_type = IB_T10DIF_CRC;
+	}
+
+	return 0;
+}
+
+static void nvme_rdma_set_prot_checks(struct nvme_command *cmd, u8 *mask)
+{
+	*mask = 0;
+	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_REF)
+		*mask |= IB_SIG_CHECK_REFTAG;
+	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_GUARD)
+		*mask |= IB_SIG_CHECK_GUARD;
+}
+
+static void nvme_rdma_sig_done(struct ib_cq *cq, struct ib_wc *wc)
+{
+	if (unlikely(wc->status != IB_WC_SUCCESS))
+		nvme_rdma_wr_error(cq, wc, "SIG");
+}
+
+static int nvme_rdma_map_sig_mr_sg(struct nvme_rdma_queue *queue,
+			struct list_head *mr_pool,
+			struct nvme_rdma_request *req,
+			struct nvme_command *c,
+			struct ib_send_wr *last)
+{
+	struct nvme_rdma_pi_sgl *pi_sgl = req->pi_sgl;
+	struct ib_sig_attrs *sig_attrs = &pi_sgl->sig_attrs;
+	struct ib_cqe *cqe = &pi_sgl->sig_cqe;
+	struct ib_cqe *inv_cqe = &pi_sgl->sig_inv_cqe;
+	struct ib_sig_handover_wr *wr = &pi_sgl->sig_wr;
+	struct request *rq = blk_mq_rq_from_pdu(req);
+	struct bio *bio = rq->bio;
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_disk);
+	int ret;
+
+	pi_sgl->sig_mr = ib_mr_pool_get(queue->qp, mr_pool);
+	if (WARN_ON_ONCE(!pi_sgl->sig_mr))
+		return -EAGAIN;
+
+	memset(sig_attrs, 0, sizeof(*sig_attrs));
+
+	ret = nvme_rdma_set_sig_attrs(bio, sig_attrs, c, rq);
+	if (unlikely(ret))
+		goto put_sig_mr;
+
+	nvme_rdma_set_prot_checks(c, &sig_attrs->check_mask);
+
+	ib_update_fast_reg_key(pi_sgl->sig_mr, ib_inc_rkey(pi_sgl->sig_mr->rkey));
+
+	cqe->done = nvme_rdma_sig_done;
+	inv_cqe->done = nvme_rdma_inv_sig_rkey_done;
+
+	memset(wr, 0, sizeof(*wr));
+	wr->wr.opcode = IB_WR_REG_SIG_MR;
+	wr->wr.wr_cqe = cqe;
+	wr->wr.sg_list = &req->data_sgl.sge;
+	wr->wr.num_sge = 1;
+	wr->wr.send_flags = 0;
+	wr->sig_attrs = sig_attrs;
+	wr->sig_mr = pi_sgl->sig_mr;
+	if (blk_integrity_rq(rq))
+		wr->prot = &pi_sgl->sgl.sge;
+	else
+		wr->prot = NULL;
+	wr->access_flags = IB_ACCESS_LOCAL_WRITE |
+			   IB_ACCESS_REMOTE_READ |
+			   IB_ACCESS_REMOTE_WRITE;
+
+	pi_sgl->sig_sge.lkey = pi_sgl->sig_mr->lkey;
+	pi_sgl->sig_sge.addr = 0;
+	pi_sgl->sig_sge.length = blk_rq_bytes(rq) +
+		bi->tuple_size * (blk_rq_bytes(rq) >> bi->interval_exp);
+
+	if (last)
+		last->next = &wr->wr;
+
+	return 0;
+
+put_sig_mr:
+	ib_mr_pool_put(queue->qp, mr_pool, pi_sgl->sig_mr);
+	return ret;
+}
+
 static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_request *req, struct nvme_command *c,
-		int nents)
+		int nents, int prot_nents)
 {
 	struct nvme_rdma_sgl *sgl = &req->data_sgl;
+	struct ib_send_wr *last;
+	struct ib_sge *sge;
+	u32 rkey;
 	int ret;
 
-	ret = nvme_rdma_map_mr_sg(queue, &queue->qp->rdma_mrs, sgl, nents);
+	ret = nvme_rdma_map_mr_sg(queue, &queue->qp->rdma_mrs, sgl, nents, NULL);
 	if (unlikely(ret))
 		return -EAGAIN;
 
 	sgl->inv_cqe.done = nvme_rdma_inv_rkey_done;
+	last = &sgl->reg_wr.wr;
+
+	if (req->is_protected) {
+		if (prot_nents) {
+			ret = nvme_rdma_map_mr_sg(queue, &queue->qp->rdma_mrs, &req->pi_sgl->sgl,
+						  prot_nents, last);
+			if (unlikely(ret)) {
+				ret = -EAGAIN;
+				goto put_mr;
+			}
+
+			req->pi_sgl->sgl.inv_cqe.done = nvme_rdma_inv_prot_rkey_done;
+			last = &req->pi_sgl->sgl.reg_wr.wr;
+		}
 
-	nvme_rdma_set_keyed_sgl(sgl->mr->iova, sgl->mr->length, sgl->mr->rkey,
+		ret = nvme_rdma_map_sig_mr_sg(queue, &queue->qp->sig_mrs, req,
+					      c, last);
+		if (unlikely(ret)) {
+			ret = -EAGAIN;
+			goto put_prot_mr;
+		}
+
+		sge = &req->pi_sgl->sig_sge;
+		rkey = req->pi_sgl->sig_mr->rkey;
+	} else {
+		sge = &sgl->sge;
+		rkey = sgl->mr->rkey;
+	}
+
+	nvme_rdma_set_keyed_sgl(sge->addr, sge->length, rkey,
 				c, true);
 
 	return 0;
+put_prot_mr:
+	if (prot_nents) {
+		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->pi_sgl->sgl.mr);
+		req->pi_sgl->sgl.mr = NULL;
+	}
+put_mr:
+	ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, sgl->mr);
+	sgl->mr = NULL;
+	return ret;
+}
+
+static int nvme_rdma_map_integrity_sgl(struct nvme_rdma_pi_sgl *pi_sgl,
+		struct request *rq, struct ib_device *ibdev, int *count)
+{
+	int ret;
+
+	pi_sgl->sgl.sg_table.sgl = pi_sgl->sgl.first_sgl;
+	ret = sg_alloc_table_chained(&pi_sgl->sgl.sg_table,
+			blk_rq_count_integrity_sg(rq->q, rq->bio),
+			pi_sgl->sgl.sg_table.sgl);
+	if (unlikely(ret))
+		return -ENOMEM;
+
+	pi_sgl->sgl.nents = blk_rq_map_integrity_sg(rq->q, rq->bio,
+				pi_sgl->sgl.sg_table.sgl);
+	*count = ib_dma_map_sg(ibdev, pi_sgl->sgl.sg_table.sgl, pi_sgl->sgl.nents,
+			rq_data_dir(rq) == WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
+	if (unlikely(*count <= 0)) {
+		ret = -EIO;
+		goto out_free_table;
+	}
+
+	return ret;
+
+out_free_table:
+	sg_free_table_chained(&pi_sgl->sgl.sg_table, true);
+	return ret;
 }
 
 static int nvme_rdma_map_data_sgl(struct nvme_rdma_sgl *sgl,
@@ -1224,9 +1521,9 @@ static int nvme_rdma_map_data_sgl(struct nvme_rdma_sgl *sgl,
 
 static int nvme_rdma_map_rq(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_request *req, struct request *rq,
-		struct nvme_command *c, int nents)
+		struct nvme_command *c, int nents, int prot_nents)
 {
-	if (nents == 1) {
+	if (nents == 1 && !req->is_protected) {
 		if (rq_data_dir(rq) == WRITE && nvme_rdma_queue_idx(queue) &&
 		    blk_rq_payload_bytes(rq) <=	nvme_rdma_inline_data_size(queue))
 			return nvme_rdma_map_sg_inline(queue, req, c);
@@ -1234,7 +1531,7 @@ static int nvme_rdma_map_rq(struct nvme_rdma_queue *queue,
 			return nvme_rdma_map_sg_single(queue, req, c);
 	}
 
-	return nvme_rdma_map_sg_fr(queue, req, c, nents);
+	return nvme_rdma_map_sg_fr(queue, req, c, nents, prot_nents);
 }
 
 static int nvme_rdma_setup_cmd(struct nvme_rdma_queue *queue,
@@ -1244,6 +1541,7 @@ static int nvme_rdma_setup_cmd(struct nvme_rdma_queue *queue,
 	struct nvme_rdma_sgl *sgl = &req->data_sgl;
 	struct nvme_rdma_device *dev = queue->device;
 	struct ib_device *ibdev = dev->dev;
+	int prot_count = 0;
 	int count, ret;
 
 	req->num_sge = 1;
@@ -1258,12 +1556,21 @@ static int nvme_rdma_setup_cmd(struct nvme_rdma_queue *queue,
 	if (unlikely(ret))
 		return ret;
 
-	ret = nvme_rdma_map_rq(queue, req, rq, c, count);
+	if (blk_integrity_rq(rq)) {
+		ret = nvme_rdma_map_integrity_sgl(req->pi_sgl, rq, ibdev, &prot_count);
+		if (ret < 0)
+			goto out_unmap_data_sgl;
+	}
+
+	ret = nvme_rdma_map_rq(queue, req, rq, c, count, prot_count);
 	if (unlikely(ret))
-		goto out_unmap_data_sgl;
+		goto out_unmap_integrity_sgl;
 
 	return 0;
 
+out_unmap_integrity_sgl:
+	if (blk_integrity_rq(rq))
+		nvme_rdma_unmap_integrity_sgl(queue, req->pi_sgl, rq);
 out_unmap_data_sgl:
 	nvme_rdma_unmap_data_sgl(queue, sgl, rq);
 	return ret;
@@ -1288,7 +1595,7 @@ static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
 
 static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 		struct nvme_rdma_qe *qe, struct ib_sge *sge, u32 num_sge,
-		struct ib_send_wr *first)
+		struct ib_send_wr *first, struct ib_send_wr *last)
 {
 	struct ib_send_wr wr, *bad_wr;
 	int ret;
@@ -1305,7 +1612,7 @@ static int nvme_rdma_post_send(struct nvme_rdma_queue *queue,
 	wr.send_flags = IB_SEND_SIGNALED;
 
 	if (first)
-		first->next = &wr;
+		last->next = &wr; /* chain the send WR after the last configured WR */
 	else
 		first = &wr;
 
@@ -1381,16 +1688,115 @@ static void nvme_rdma_submit_async_event(struct nvme_ctrl *arg)
 	ib_dma_sync_single_for_device(dev, sqe->dma, sizeof(*cmd),
 			DMA_TO_DEVICE);
 
-	ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL);
+	ret = nvme_rdma_post_send(queue, sqe, &sge, 1, NULL, NULL);
 	WARN_ON_ONCE(ret);
 }
 
+static int nvme_rdma_local_invalidate_keys(struct nvme_rdma_queue *queue,
+		struct nvme_rdma_request *req, u32 sig_rkey, u32 prot_rkey,
+		u32 data_rkey)
+{
+	struct request *rq;
+	int ret = 0;
+
+	if (sig_rkey) {
+		ret = nvme_rdma_inv_rkey(queue, sig_rkey,
+					 &req->pi_sgl->sig_inv_cqe);
+		if (unlikely(ret < 0)) {
+			dev_err(queue->ctrl->ctrl.device,
+				"Queueing INV SIG WR for rkey %#x failed (%d)\n",
+				sig_rkey, ret);
+			/* fail with DNR on posting local inv failure */
+			rq = blk_mq_rq_from_pdu(req);
+			nvme_req(rq)->status = NVME_SC_INTERNAL | NVME_SC_DNR;
+			ret = -EINVAL;
+		}
+	}
+	if (prot_rkey) {
+		ret = nvme_rdma_inv_rkey(queue, prot_rkey,
+					 &req->pi_sgl->sgl.inv_cqe);
+		if (unlikely(ret < 0)) {
+			dev_err(queue->ctrl->ctrl.device,
+				"Queueing INV PROT WR for rkey %#x failed (%d)\n",
+				prot_rkey, ret);
+			/* fail with DNR on posting local inv failure */
+			rq = blk_mq_rq_from_pdu(req);
+			nvme_req(rq)->status = NVME_SC_INTERNAL | NVME_SC_DNR;
+			ret = -EINVAL;
+		}
+	}
+	if (data_rkey) {
+		ret = nvme_rdma_inv_rkey(queue, data_rkey,
+					 &req->data_sgl.inv_cqe);
+		if (unlikely(ret < 0)) {
+			dev_err(queue->ctrl->ctrl.device,
+				"Queueing INV WR for rkey %#x failed (%d)\n",
+				data_rkey, ret);
+			/* fail with DNR on posting local inv failure */
+			rq = blk_mq_rq_from_pdu(req);
+			nvme_req(rq)->status = NVME_SC_INTERNAL | NVME_SC_DNR;
+			ret = -EINVAL;
+		}
+	}
+
+	return ret;
+}
+
+static int nvme_rdma_invalidate_req(struct nvme_rdma_queue *queue,
+		struct nvme_rdma_request *req, struct ib_wc *wc)
+{
+	u32 rkey;
+	int ret;
+
+	if (req->is_protected) {
+		rkey = req->pi_sgl->sig_mr->rkey;
+		if (wc->wc_flags & IB_WC_WITH_INVALIDATE) {
+			if (unlikely(wc->ex.invalidate_rkey != rkey)) {
+				dev_err(queue->ctrl->ctrl.device,
+					"Bogus remote invalidation for rkey %#x\n",
+					rkey);
+				ret = -EINVAL;
+				goto out_error_recovery;
+			} else {
+				rkey = 0; /* no need to invalidate this key */
+			}
+		}
+		ret = nvme_rdma_local_invalidate_keys(queue, req, rkey,
+				req->pi_sgl->sgl.mr ? req->pi_sgl->sgl.mr->rkey : 0,
+				req->data_sgl.mr ? req->data_sgl.mr->rkey : 0);
+		if (unlikely(ret))
+			goto out_error_recovery;
+	} else if (req->data_sgl.mr) {
+		rkey = req->data_sgl.mr->rkey;
+		if (wc->wc_flags & IB_WC_WITH_INVALIDATE) {
+			if (unlikely(wc->ex.invalidate_rkey != rkey)) {
+				dev_err(queue->ctrl->ctrl.device,
+					"Bogus remote invalidation for rkey %#x\n",
+					rkey);
+				ret = -EINVAL;
+				goto out_error_recovery;
+			} else {
+				return 1; /* This will indicate that we should complete the cmd */
+			}
+		} else {
+			ret = nvme_rdma_local_invalidate_keys(queue, req, 0, 0, rkey);
+			if (unlikely(ret))
+				goto out_error_recovery;
+		}
+	}
+
+	return 0;
+
+out_error_recovery:
+	nvme_rdma_error_recovery(queue->ctrl);
+	return ret;
+}
+
 static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 		struct nvme_completion *cqe, struct ib_wc *wc, int tag)
 {
 	struct request *rq;
 	struct nvme_rdma_request *req;
-	struct nvme_rdma_sgl *sgl;
 	int ret = 0;
 
 	rq = blk_mq_tag_to_rq(nvme_rdma_tagset(queue), cqe->command_id);
@@ -1405,25 +1811,12 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
 
 	req->status = cqe->status;
 	req->result = cqe->result;
-	sgl = &req->data_sgl;
 
-	if (wc->wc_flags & IB_WC_WITH_INVALIDATE) {
-		if (unlikely(wc->ex.invalidate_rkey != sgl->mr->rkey)) {
-			dev_err(queue->ctrl->ctrl.device,
-				"Bogus remote invalidation for rkey %#x\n",
-				sgl->mr->rkey);
-			nvme_rdma_error_recovery(queue->ctrl);
+	if (req->data_sgl.mr || req->is_protected) {
+		if (!nvme_rdma_invalidate_req(queue, req, wc)) {
+			/* the local invalidation completion will end the request */
+			return 0;
 		}
-	} else if (sgl->mr) {
-		ret = nvme_rdma_inv_rkey(queue, sgl->mr->rkey, &sgl->inv_cqe);
-		if (unlikely(ret < 0)) {
-			dev_err(queue->ctrl->ctrl.device,
-				"Queueing INV WR for rkey %#x failed (%d)\n",
-				sgl->mr->rkey, ret);
-			nvme_rdma_error_recovery(queue->ctrl);
-		}
-		/* the local invalidation completion will end the request */
-		return 0;
 	}
 
 	if (refcount_dec_and_test(&req->ref)) {
@@ -1669,6 +2062,17 @@ static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
 }
 
 static inline struct ib_send_wr *
+nvme_rdma_last_wr(struct nvme_rdma_request *req)
+{
+	if (req->is_protected)
+		return &req->pi_sgl->sig_wr.wr;
+	else if (req->data_sgl.mr)
+		return &req->data_sgl.reg_wr.wr;
+
+	return NULL;
+}
+
+static inline struct ib_send_wr *
 nvme_rdma_first_wr(struct nvme_rdma_request *req)
 {
 	if (req->data_sgl.mr)
@@ -1707,6 +2111,12 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 	blk_mq_start_request(rq);
 
+	if (nvme_rdma_queue_idx(queue)) {
+		if (c->common.opcode == nvme_cmd_write ||
+		    c->common.opcode == nvme_cmd_read)
+			req->is_protected = nvme_ns_has_pi(ns);
+	}
+
 	err = nvme_rdma_setup_cmd(queue, rq, c);
 	if (unlikely(err < 0)) {
 		dev_err(queue->ctrl->ctrl.device,
@@ -1721,7 +2131,8 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 			sizeof(struct nvme_command), DMA_TO_DEVICE);
 
 	err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
-				  nvme_rdma_first_wr(req));
+				  nvme_rdma_first_wr(req),
+				  nvme_rdma_last_wr(req));
 	if (unlikely(err)) {
 		nvme_rdma_unmap_cmd(queue, rq);
 		goto err;
@@ -1755,10 +2166,46 @@ static int nvme_rdma_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
 	return found;
 }
 
+static void
+nvme_rdma_check_pi_status(struct nvme_rdma_request *req)
+{
+	struct request *rq = blk_mq_rq_from_pdu(req);
+	struct ib_mr_status mr_status;
+	int ret;
+
+	ret = ib_check_mr_status(req->pi_sgl->sig_mr, IB_MR_CHECK_SIG_STATUS,
+				 &mr_status);
+	if (ret) {
+		pr_err("ib_check_mr_status failed, ret %d\n", ret);
+		nvme_req(rq)->status = NVME_SC_INVALID_PI;
+		return;
+	}
+
+	if (mr_status.fail_status & IB_MR_CHECK_SIG_STATUS) {
+		switch (mr_status.sig_err.err_type) {
+		case IB_SIG_BAD_GUARD:
+			nvme_req(rq)->status = NVME_SC_GUARD_CHECK;
+			break;
+		case IB_SIG_BAD_REFTAG:
+			nvme_req(rq)->status = NVME_SC_REFTAG_CHECK;
+			break;
+		case IB_SIG_BAD_APPTAG:
+			nvme_req(rq)->status = NVME_SC_APPTAG_CHECK;
+			break;
+		}
+		pr_err("PI error found type %d expected 0x%x vs actual 0x%x\n",
+		       mr_status.sig_err.err_type,
+		       mr_status.sig_err.expected,
+		       mr_status.sig_err.actual);
+	}
+}
+
 static void nvme_rdma_complete_rq(struct request *rq)
 {
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 
+	if (req->is_protected)
+		nvme_rdma_check_pi_status(req);
 	nvme_rdma_unmap_cmd(req->queue, rq);
 	nvme_complete_rq(rq);
 }
@@ -1861,7 +2308,7 @@ static void nvme_rdma_reset_ctrl_work(struct work_struct *work)
 static const struct nvme_ctrl_ops nvme_rdma_ctrl_ops = {
 	.name			= "rdma",
 	.module			= THIS_MODULE,
-	.flags			= NVME_F_FABRICS,
+	.flags			= NVME_F_FABRICS | NVME_F_METADATA_SUPPORTED,
 	.reg_read32		= nvmf_reg_read32,
 	.reg_read64		= nvmf_reg_read64,
 	.reg_write32		= nvmf_reg_write32,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 93+ messages in thread
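
One detail in the patch above that is easy to gloss over is the signature SGE
length, computed as blk_rq_bytes(rq) + bi->tuple_size * (blk_rq_bytes(rq) >>
bi->interval_exp). A worked example, assuming a 4KB request on a namespace
formatted with 512-byte blocks plus 8 bytes of PI (illustrative numbers, not
taken from a specific run):

	/*
	 * blk_rq_bytes(rq)  = 4096   (data payload)
	 * bi->interval_exp  = 9      (512-byte protection intervals)
	 * bi->tuple_size    = 8      (T10 PI tuple: guard, app tag, ref tag)
	 *
	 * sig_sge.length = 4096 + 8 * (4096 >> 9)
	 *                = 4096 + 64
	 *                = 4160 bytes
	 */

In other words, the signature MR spans the data plus the interleaved PI that
the HCA generates, strips or verifies on the wire.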

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-27 15:50   ` Max Gurtovoy
@ 2018-05-28  7:21     ` Christoph Hellwig
  -1 siblings, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:21 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: jgg, keith.busch, linux-nvme, hch, sagi, leon, oren, idanb,
	vladimirk, stable

On Sun, May 27, 2018 at 06:50:06PM +0300, Max Gurtovoy wrote:
> A copy/paste bug (probably) caused setting of an app_tag check mask
> in case where a ref_tag check was needed.
> 
> Fixes: 38a2d0d429f1 ("IB/isert: convert to the generic RDMA READ/WRITE API")
> Fixes: 9e961ae73c2c ("IB/isert: Support T10-PI protected transactions")
> Cc: stable@vger.kernel.org
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>

Looks good, and should probably go into the scsi tree ASAP.

Btw, please don't cc stable@ for the whole series.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 02/17] RDMA/core: introduce check masks for T10-PI offload
  2018-05-27 15:50 ` [PATCH 02/17] RDMA/core: introduce check masks for T10-PI offload Max Gurtovoy
@ 2018-05-28  7:21   ` Christoph Hellwig
  2018-05-30 21:56   ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:21 UTC (permalink / raw)


On Sun, May 27, 2018@06:50:07PM +0300, Max Gurtovoy wrote:
> T10-PI offload capability is currently supported in the iSER protocol only,
> and the definitions of the HCA protection information checks are missing
> from the core layer. Add those definitions to avoid code duplication in
> other drivers (such as the iSER target and NVMeoF).
> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>  include/rdma/ib_verbs.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 9fc8a82..bdb1bbc 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -861,6 +861,19 @@ enum ib_sig_err_type {
>  };
>  
>  /**
> + * Signature check masks (8 bytes in total) according to the T10-PI standard:
> + *  -------- -------- ------------
> + * | GUARD  | APPTAG |   REFTAG   |
> + * |  2B    |  2B    |    4B      |
> + *  -------- -------- ------------
> + */
> +enum {
> +	IB_SIG_CHECK_GUARD = 0xc0,
> +	IB_SIG_CHECK_APPTAG = 0x30,
> +	IB_SIG_CHECK_REFTAG = 0x0f,
> +};

Maybe align the equal signs using tabs?

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <hch at lst.de>

^ permalink raw reply	[flat|nested] 93+ messages in thread
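
The three masks above select which bytes of the 8-byte PI tuple the HCA
verifies: 0xc0 covers the two guard bytes, 0x30 the two app tag bytes and 0x0f
the four ref tag bytes. A minimal sketch of how a ULP combines them, mirroring
nvme_rdma_set_prot_checks() in patch 17 of this series (the APPTAG line is
shown for completeness only; patch 17 only wires up the REF and GUARD checks):

	/* sketch only: derive the signature check mask from the NVMe PRINFO bits */
	u8 mask = 0;

	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_REF)
		mask |= IB_SIG_CHECK_REFTAG;
	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_GUARD)
		mask |= IB_SIG_CHECK_GUARD;
	if (le16_to_cpu(cmd->rw.control) & NVME_RW_PRINFO_PRCHK_APP)
		mask |= IB_SIG_CHECK_APPTAG;

	sig_attrs->check_mask = mask;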

* [PATCH 03/17] IB/iser: use T10-PI check mask definitions from core layer
  2018-05-27 15:50 ` [PATCH 03/17] IB/iser: use T10-PI check mask definitions from core layer Max Gurtovoy
@ 2018-05-28  7:22   ` Christoph Hellwig
  2018-05-30 21:57   ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:22 UTC (permalink / raw)


On Sun, May 27, 2018@06:50:08PM +0300, Max Gurtovoy wrote:
> No reason to re-define the protection information checks in the ib_iser driver.
> Use the check masks from the RDMA core driver.
> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>

Looks fine:

Reviewed-by: Christoph Hellwig <hch at lst.de>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 04/17] IB/isert: use T10-PI check mask definitions from core layer
  2018-05-27 15:50 ` [PATCH 04/17] IB/isert: " Max Gurtovoy
@ 2018-05-28  7:22   ` Christoph Hellwig
  2018-05-30 10:48     ` Max Gurtovoy
  2018-05-30 21:58   ` Sagi Grimberg
  1 sibling, 1 reply; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:22 UTC (permalink / raw)


On Sun, May 27, 2018@06:50:09PM +0300, Max Gurtovoy wrote:
> No reason to use hard-coded protection information checks in the ib_isert driver.
> Use the check masks from the RDMA core driver.

Looks fine:

Reviewed-by: Christoph Hellwig <hch at lst.de>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 05/17] nvme: Fix extended data LBA supported setting
  2018-05-27 15:50 ` [PATCH 05/17] nvme: Fix extended data LBA supported setting Max Gurtovoy
@ 2018-05-28  7:22   ` Christoph Hellwig
  2018-05-29 12:47     ` Max Gurtovoy
  2018-05-30 22:00   ` Sagi Grimberg
  1 sibling, 1 reply; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:22 UTC (permalink / raw)


On Sun, May 27, 2018@06:50:10PM +0300, Max Gurtovoy wrote:
> This value depends on the metadata support value.
> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>

Looks fine, I'll pull this into the nvme tree ASAP.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 06/17] nvme: Add WARN in case fabrics ctrl was set with wrong metadata caps
  2018-05-27 15:50 ` [PATCH 06/17] nvme: Add WARN in case fabrics ctrl was set with wrong metadata caps Max Gurtovoy
@ 2018-05-28  7:24   ` Christoph Hellwig
  2018-05-28 14:56     ` Max Gurtovoy
  2018-05-30 22:05     ` Sagi Grimberg
  0 siblings, 2 replies; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:24 UTC (permalink / raw)


On Sun, May 27, 2018@06:50:11PM +0300, Max Gurtovoy wrote:
> An extended LBA is a larger LBA that is created when metadata associated
> with the LBA is transferred contiguously with the LBA data (AKA interleaved).
> The metadata may be either transferred as part of the LBA (creating an extended
> LBA) or it may be transferred as a separate contiguous buffer of data. According
> to the NVMeoF spec, a fabrics ctrl supports only an Extended LBA format. Add a WARN
> in case we have a spec violation. Also init the integrity profile for the block
> device for fabrics ctrl.
> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>  drivers/nvme/host/core.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d8254b4..c605bb3 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1419,9 +1419,10 @@ static void nvme_update_disk_info(struct gendisk *disk,
>  	blk_queue_physical_block_size(disk->queue, bs);
>  	blk_queue_io_min(disk->queue, bs);
>  
> -	if (ns->ms && !ns->ext &&
> -	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
> -		nvme_init_integrity(disk, ns->ms, ns->pi_type);
> +	if (ns->ms && (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)) {
> +		if ((ns->ctrl->ops->flags & NVME_F_FABRICS) || !ns->ext)
> +			nvme_init_integrity(disk, ns->ms, ns->pi_type);
> +	}
>  	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
>  		capacity = 0;
>  	set_capacity(disk, capacity);
> @@ -1445,6 +1446,18 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>  	ns->noiob = le16_to_cpu(id->noiob);
>  	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
>  	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
> +
> +	/*
> +	 * For Fabrics, only metadata as part of extended data LBA is supported.
> +	 * Reduce the metadata size in case of a spec violation.
> +	 */
> +	if (ns->ms && (ns->ctrl->ops->flags & NVME_F_FABRICS)) {
> +		if (!ns->ext) {
> +			WARN_ON_ONCE(1);
> +			ns->ms = 0;
> +		}
> +	}

Should we just reject the probe instead of papering over it?

Also if we keep this it should be:

		if (WARN_ON_ONCE(!ns->ext))
			ns->ms = 0;

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 09/17] nvme: reduce the metadata size in case the ctrl doesn't support it
  2018-05-27 15:50 ` [PATCH 09/17] nvme: reduce the metadata size in case the ctrl doesn't support it Max Gurtovoy
@ 2018-05-28  7:25   ` Christoph Hellwig
  0 siblings, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:25 UTC (permalink / raw)


On Sun, May 27, 2018@06:50:14PM +0300, Max Gurtovoy wrote:
> There is no reason to set the metadata size while the controller is
> not capable of performing commands with metadata.
> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>  drivers/nvme/host/core.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index e81e898..1830978 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1461,6 +1461,10 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>  		}
>  	}
>  
> +	/* Reduce the metadata size in case the ctrl doesn't support it */
> +	if (!ns->ctrl->max_integrity_segments)
> +		ns->ms = 0;

You already had the reduce wording in a previous patch, and I think it
is rather confusing.  By setting ns->ms to 0 you do not reduce the size,
but you disable metadata, so it should reflect that.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 10/17] nvme: export nvme_ns_has_pi function
  2018-05-27 15:50 ` [PATCH 10/17] nvme: export nvme_ns_has_pi function Max Gurtovoy
@ 2018-05-28  7:25   ` Christoph Hellwig
  2018-05-28 12:41     ` Max Gurtovoy
  2018-05-30 22:19   ` Sagi Grimberg
  1 sibling, 1 reply; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:25 UTC (permalink / raw)


> -static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
> +bool nvme_ns_has_pi(struct nvme_ns *ns)
>  {
>  	return ns->pi_type && ns->ms == sizeof(struct t10_pi_tuple);
>  }
> +EXPORT_SYMBOL_GPL(nvme_ns_has_pi);

Just make this an inline in nvme.h

^ permalink raw reply	[flat|nested] 93+ messages in thread
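
Concretely, the suggestion amounts to something like the following in
drivers/nvme/host/nvme.h (a sketch; the body is unchanged from the version
quoted above):

	static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
	{
		return ns->pi_type && ns->ms == sizeof(struct t10_pi_tuple);
	}

which drops the EXPORT_SYMBOL_GPL and lets nvme-rdma use the helper without an
out-of-line call.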

* [PATCH 11/17] nvme-rdma: Introduce cqe for local invalidation
  2018-05-27 15:50 ` [PATCH 11/17] nvme-rdma: Introduce cqe for local invalidation Max Gurtovoy
@ 2018-05-28  7:25   ` Christoph Hellwig
  2018-05-30 22:26   ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:25 UTC (permalink / raw)


On Sun, May 27, 2018@06:50:16PM +0300, Max Gurtovoy wrote:
> Using the same cqe object for registration and invalidation completions
> is not safe.

Is not safe why?

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 12/17] nvme-rdma: Introduce nvme_rdma_set_keyed_sgl helper func
  2018-05-27 15:50 ` [PATCH 12/17] nvme-rdma: Introduce nvme_rdma_set_keyed_sgl helper func Max Gurtovoy
@ 2018-05-28  7:26   ` Christoph Hellwig
  2018-05-30 22:27     ` Sagi Grimberg
  0 siblings, 1 reply; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:26 UTC (permalink / raw)


On Sun, May 27, 2018@06:50:17PM +0300, Max Gurtovoy wrote:
> This function will be used to avoid code duplication while
> setting keyed sgl fields.

Does this really buy us anything?

> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>  drivers/nvme/host/rdma.c | 34 +++++++++++++++++++---------------
>  1 file changed, 19 insertions(+), 15 deletions(-)

Because the diffstat doesn't look very promising, and the code isn't
really more readable either.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-05-27 15:50 ` [PATCH 17/17] nvme-rdma: Add T10-PI support Max Gurtovoy
@ 2018-05-28  7:28   ` Christoph Hellwig
  2018-05-30 23:05   ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28  7:28 UTC (permalink / raw)


This seems to miss a call to nvme_dif_remap().  Which btw, I'd rather
merge with sd_dif_prepare/sd_dif_complete and move to the block layer.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-28  7:21     ` Christoph Hellwig
@ 2018-05-28 11:54       ` Max Gurtovoy
  -1 siblings, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-28 11:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: jgg, keith.busch, linux-nvme, sagi, leon, oren, idanb, vladimirk, stable



On 5/28/2018 10:21 AM, Christoph Hellwig wrote:
> On Sun, May 27, 2018 at 06:50:06PM +0300, Max Gurtovoy wrote:
>> A copy/paste bug (probably) caused setting of an app_tag check mask
>> in case where a ref_tag check was needed.
>>
>> Fixes: 38a2d0d429f1 ("IB/isert: convert to the generic RDMA READ/WRITE API")
>> Fixes: 9e961ae73c2c ("IB/isert: Support T10-PI protected transactions")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> 
> Looks good, and should probably go into the scsi tree ASAP.

Yes, there are a few patches that can be pushed "as is".

Should it go through Nicholas' PR?

> 
> Btw, please don't cc stable@ for the whole series.
> 

The cc stable@ is only for this patch and not for the series.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-28 11:54       ` Max Gurtovoy
@ 2018-05-28 12:03         ` Christoph Hellwig
  -1 siblings, 0 replies; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-28 12:03 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Christoph Hellwig, jgg, keith.busch, linux-nvme, sagi, leon,
	oren, idanb, vladimirk, stable

On Mon, May 28, 2018 at 02:54:48PM +0300, Max Gurtovoy wrote:
>
>
> On 5/28/2018 10:21 AM, Christoph Hellwig wrote:
>> On Sun, May 27, 2018 at 06:50:06PM +0300, Max Gurtovoy wrote:
>>> A copy/paste bug (probably) caused setting of an app_tag check mask
>>> in case where a ref_tag check was needed.
>>>
>>> Fixes: 38a2d0d429f1 ("IB/isert: convert to the generic RDMA READ/WRITE API")
>>> Fixes: 9e961ae73c2c ("IB/isert: Support T10-PI protected transactions")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>
>> Looks good, and should probably go into the scsi tree ASAP.
>
> Yes, there are a few patches that can be pushed "as is".
>
> Should it go through Nicholas' PR?

Nic has been non-responsive for a while and Martin has queued up
target fixes when needed.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-28 12:03         ` Christoph Hellwig
@ 2018-05-28 12:04           ` Max Gurtovoy
  -1 siblings, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-28 12:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: jgg, keith.busch, linux-nvme, sagi, leon, oren, idanb, vladimirk,
	stable, martin.petersen

Hi Martin/Jason,

can you please coordinate on taking this fix through your PR (RDMA or
SCSI)?

On 5/28/2018 3:03 PM, Christoph Hellwig wrote:
> On Mon, May 28, 2018 at 02:54:48PM +0300, Max Gurtovoy wrote:
>>
>>
>> On 5/28/2018 10:21 AM, Christoph Hellwig wrote:
>>> On Sun, May 27, 2018 at 06:50:06PM +0300, Max Gurtovoy wrote:
>>>> A copy/paste bug (probably) caused setting of an app_tag check mask
>>>> in case where a ref_tag check was needed.
>>>>
>>>> Fixes: 38a2d0d429f1 ("IB/isert: convert to the generic RDMA READ/WRITE API")
>>>> Fixes: 9e961ae73c2c ("IB/isert: Support T10-PI protected transactions")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>>
>>> Looks good, and should probably go into the scsi tree ASAP.
>>
>> Yes, there are a few patches that can be pushed "as is".
>>
>> Should it go through Nicholas' PR?
> 
> Nic has been non-responsive for a while and Martin has queued up
> target fixes when needed.
> 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 10/17] nvme: export nvme_ns_has_pi function
  2018-05-28  7:25   ` Christoph Hellwig
@ 2018-05-28 12:41     ` Max Gurtovoy
  0 siblings, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-28 12:41 UTC (permalink / raw)




On 5/28/2018 10:25 AM, Christoph Hellwig wrote:
>> -static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
>> +bool nvme_ns_has_pi(struct nvme_ns *ns)
>>   {
>>   	return ns->pi_type && ns->ms == sizeof(struct t10_pi_tuple);
>>   }
>> +EXPORT_SYMBOL_GPL(nvme_ns_has_pi);
> 
> Just make this an inline in nvme.h
> 

Sure, I'll do it in V2.
Thanks.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 06/17] nvme: Add WARN in case fabrics ctrl was set with wrong metadata caps
  2018-05-28  7:24   ` Christoph Hellwig
@ 2018-05-28 14:56     ` Max Gurtovoy
  2018-05-30 22:05     ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-28 14:56 UTC (permalink / raw)




On 5/28/2018 10:24 AM, Christoph Hellwig wrote:
> On Sun, May 27, 2018@06:50:11PM +0300, Max Gurtovoy wrote:
>> An extended LBA is a larger LBA that is created when metadata associated
>> with the LBA is transferred contiguously with the LBA data (AKA interleaved).
>> The metadata may be either transferred as part of the LBA (creating an extended
>> LBA) or it may be transferred as a separate contiguous buffer of data. According
>> to the NVMeoF spec, a fabrics ctrl supports only an Extended LBA format. Add a WARN
>> in case we have a spec violation. Also init the integrity profile for the block
>> device for fabrics ctrl.
>>
>> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
>> ---
>>   drivers/nvme/host/core.c | 19 ++++++++++++++++---
>>   1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index d8254b4..c605bb3 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1419,9 +1419,10 @@ static void nvme_update_disk_info(struct gendisk *disk,
>>   	blk_queue_physical_block_size(disk->queue, bs);
>>   	blk_queue_io_min(disk->queue, bs);
>>   
>> -	if (ns->ms && !ns->ext &&
>> -	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
>> -		nvme_init_integrity(disk, ns->ms, ns->pi_type);
>> +	if (ns->ms && (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)) {
>> +		if ((ns->ctrl->ops->flags & NVME_F_FABRICS) || !ns->ext)
>> +			nvme_init_integrity(disk, ns->ms, ns->pi_type);
>> +	}
>>   	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
>>   		capacity = 0;
>>   	set_capacity(disk, capacity);
>> @@ -1445,6 +1446,18 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>>   	ns->noiob = le16_to_cpu(id->noiob);
>>   	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
>>   	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
>> +
>> +	/*
>> +	 * For Fabrics, only metadata as part of extended data LBA is supported.
>> +	 * Reduce the metadata size in case of a spec violation.
>> +	 */
>> +	if (ns->ms && (ns->ctrl->ops->flags & NVME_F_FABRICS)) {
>> +		if (!ns->ext) {
>> +			WARN_ON_ONCE(1);
>> +			ns->ms = 0;
>> +		}
>> +	}
> 
> Should we just reject the probe instead of papering over it?

Are you suggesting making __nvme_revalidate_disk a non-void function?

> 
> Also if we keep this it should be:
> 
> 		if (WARN_ON_ONCE(!ns->ext))
> 			ns->ms = 0;
>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-28 12:04           ` Max Gurtovoy
@ 2018-05-28 16:33             ` Jason Gunthorpe
  -1 siblings, 0 replies; 93+ messages in thread
From: Jason Gunthorpe @ 2018-05-28 16:33 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Christoph Hellwig, keith.busch, linux-nvme, sagi, leon, oren,
	idanb, vladimirk, stable, martin.petersen

On Mon, May 28, 2018 at 03:04:38PM +0300, Max Gurtovoy wrote:
> Hi Martin/Jason,
> 
> can you please coordinate on taking this fix through your PR (RDMA or SCSi)
> ?

Martin, I see no problem for you to take it. Let me know if you prefer
we do it through RDMA.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-28 16:33             ` Jason Gunthorpe
@ 2018-05-29  3:01               ` Martin K. Petersen
  -1 siblings, 0 replies; 93+ messages in thread
From: Martin K. Petersen @ 2018-05-29  3:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Max Gurtovoy, Christoph Hellwig, keith.busch, linux-nvme, sagi,
	leon, oren, idanb, vladimirk, stable, martin.petersen


Jason,

> I see no problem for you to take it. Let me know if you prefer we do
> it through RDMA.

Applied to 4.17/scsi-fixes. Thanks!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-29  3:01               ` Martin K. Petersen
@ 2018-05-29 12:08                 ` Max Gurtovoy
  -1 siblings, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-29 12:08 UTC (permalink / raw)
  To: Martin K. Petersen, Jason Gunthorpe
  Cc: Christoph Hellwig, keith.busch, linux-nvme, sagi, leon, oren,
	idanb, vladimirk, stable



On 5/29/2018 6:01 AM, Martin K. Petersen wrote:
> 
> Jason,
> 
>> I see no problem for you to take it. Let me know if you prefer we do
>> it through RDMA.
> 
> Applied to 4.17/scsi-fixes. Thanks!
> 

Thanks Martin.

Jason,
how do you suggest we proceed with patches 2/3/4 that were reviewed by
Christoph?

We must wait until Martin's PR for sure...

-Max.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 05/17] nvme: Fix extended data LBA supported setting
  2018-05-28  7:22   ` Christoph Hellwig
@ 2018-05-29 12:47     ` Max Gurtovoy
  0 siblings, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-29 12:47 UTC (permalink / raw)




On 5/28/2018 10:22 AM, Christoph Hellwig wrote:
> On Sun, May 27, 2018@06:50:10PM +0300, Max Gurtovoy wrote:
>> This value depends on the metadata support value.
>>
>> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> 
> Looks fine, I'll pull this into the nvme tree ASAP.
> 

Great.
BTW, we can use it for the stable kernel as well IMO.
I get an NMI and many ref tag errors while configuring my nvme drive with an
extended LBA format and running a local fio test. It shouldn't init the
integrity profile (by calling blk_integrity_register) in this case at all, since
the driver supports only pointer-mode metadata. The right behavior in
this case is to set the PRACT bit (and that's done in nvme_setup_rw).

-Max.

^ permalink raw reply	[flat|nested] 93+ messages in thread
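
The PRACT behavior referred to above can be illustrated with a rough sketch
(this is the idea behind nvme_setup_rw(), not a verbatim copy of it):

	/*
	 * Sketch: the namespace has PI, but no integrity buffer is attached
	 * to the request, so ask the controller to generate/strip the
	 * protection information itself instead of transferring it.
	 */
	if (ns->ms && nvme_ns_has_pi(ns) && !blk_integrity_rq(req))
		control |= NVME_RW_PRINFO_PRACT;

	cmnd->rw.control = cpu_to_le16(control);

With an extended LBA format the metadata travels interleaved with the data,
which the driver's pointer-mode metadata handling cannot provide, so setting
PRACT and skipping blk_integrity_register(), as described above, is the
expected behavior.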

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-29 12:08                 ` Max Gurtovoy
@ 2018-05-29 19:23                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 93+ messages in thread
From: Jason Gunthorpe @ 2018-05-29 19:23 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Martin K. Petersen, Christoph Hellwig, keith.busch, linux-nvme,
	sagi, leon, oren, idanb, vladimirk, stable

On Tue, May 29, 2018 at 03:08:21PM +0300, Max Gurtovoy wrote:
> 
> 
> On 5/29/2018 6:01 AM, Martin K. Petersen wrote:
> >
> >Jason,
> >
> >>I see no problem for you to take it. Let me know if you prefer we do
> >>it through RDMA.
> >
> >Applied to 4.17/scsi-fixes. Thanks!
> >
> 
> Thanks Martin.
> 
> Jason,
> how do you suggest to proceed with patches 2/3/4 that were reviewed by
> Christoph ?

If you want RDMA to take any patches send them to
linux-rdma@vger.kernel.org, updated to include the tags that Christoph
included.. Otherwise they are not in the RDMA patchworks.

> we must wait until Martin PR for sure...

Hmm, maybe we should have sent this patch through rdma in that
case.. The merge window is soon anyhow so it doesn't really matter
this time - but you need to think about what tree patches need to go
to when building your patch plan, don't just send a large unstructured
series like this and expect the maintainers to figure out all the
dependencies!

So, send the patches you want to the rdma list around rc1

Thanks,
Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-29 19:23                   ` Jason Gunthorpe
@ 2018-05-29 22:11                     ` Martin K. Petersen
  -1 siblings, 0 replies; 93+ messages in thread
From: Martin K. Petersen @ 2018-05-29 22:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Max Gurtovoy, Martin K. Petersen, Christoph Hellwig, keith.busch,
	linux-nvme, sagi, leon, oren, idanb, vladimirk, stable


Jason,

>> we must wait until Martin PR for sure...
>
> Hmm, maybe we should have sent this patch through rdma in that
> case.. The merge window is soon anyhow so it doesn't really matter
> this time - but you need to think about what tree patches need to go
> to when building your patch plan, don't just send a large unstructured
> series like this and expect the maintainers to figure out all the
> dependencies!

I am also happy to just drop the patch from scsi-fixes and you can
funnel everything through RDMA. Doesn't matter to me.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-29 22:11                     ` Martin K. Petersen
@ 2018-05-29 22:19                       ` Jason Gunthorpe
  -1 siblings, 0 replies; 93+ messages in thread
From: Jason Gunthorpe @ 2018-05-29 22:19 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Max Gurtovoy, Christoph Hellwig, keith.busch, linux-nvme, sagi,
	leon, oren, idanb, vladimirk, stable

On Tue, May 29, 2018 at 06:11:18PM -0400, Martin K. Petersen wrote:
> 
> Jason,
> 
> >> we must wait until Martin PR for sure...
> >
> > Hmm, maybe we should have sent this patch through rdma in that
> > case.. The merge window is soon anyhow so it doesn't really matter
> > this time - but you need to think about what tree patches need to go
> > to when building your patch plan, don't just send a large unstructured
> > series like this and expect the maintainers to figure out all the
> > dependencies!
> 
> I am also happy to just drop the patch from scsi-fixes and you can
> funnel everything through RDMA. Doesn't matter to me.

Sure, if you can still do that.. Max can you send me all four patches
tomorrow and I'll route them through rdma.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-29 22:19                       ` Jason Gunthorpe
@ 2018-05-29 22:41                         ` Martin K. Petersen
  -1 siblings, 0 replies; 93+ messages in thread
From: Martin K. Petersen @ 2018-05-29 22:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Martin K. Petersen, Max Gurtovoy, Christoph Hellwig, keith.busch,
	linux-nvme, sagi, leon, oren, idanb, vladimirk, stable


Jason,

>> I am also happy to just drop the patch from scsi-fixes and you can
>> funnel everything through RDMA. Doesn't matter to me.
>
> Sure, if you can still do that..

No problem. Done.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-29 22:19                       ` Jason Gunthorpe
@ 2018-05-30  8:07                         ` Max Gurtovoy
  -1 siblings, 0 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-30  8:07 UTC (permalink / raw)
  To: Jason Gunthorpe, Martin K. Petersen
  Cc: Christoph Hellwig, keith.busch, linux-nvme, sagi, leon, oren,
	idanb, vladimirk, stable



On 5/30/2018 1:19 AM, Jason Gunthorpe wrote:
> On Tue, May 29, 2018 at 06:11:18PM -0400, Martin K. Petersen wrote:
>>
>> Jason,
>>
>>>> we must wait until Martin PR for sure...
>>>
>>> Hmm, maybe we should have sent this patch through rdma in that
>>> case.. The merge window is soon anyhow so it doesn't really matter
>>> this time - but you need to think about what tree patches need to go
>>> to when building your patch plan, don't just send a large unstructured
>>> series like this and expect the maintainers to figure out all the
>>> dependencies!
>>
>> I am also happy to just drop the patch from scsi-fixes and you can
>> funnel everything through RDMA. Doesn't matter to me.
> 
> Sure, if you can still do that.. Max can you send me all four patches
> tomorrow and I'll route them through rdma.

Yes, sure. I'm sorry for the ping-pong. I tried to cover this in the 
cover letter, but I guess I should've mentioned specifically which 
patches go through which tree.

> 
> Thanks,
> Jason
> 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 04/17] IB/isert: use T10-PI check mask definitions from core layer
  2018-05-28  7:22   ` Christoph Hellwig
@ 2018-05-30 10:48     ` Max Gurtovoy
  2018-05-30 12:08       ` Christoph Hellwig
  0 siblings, 1 reply; 93+ messages in thread
From: Max Gurtovoy @ 2018-05-30 10:48 UTC (permalink / raw)




On 5/28/2018 10:22 AM, Christoph Hellwig wrote:
> On Sun, May 27, 2018 at 06:50:09PM +0300, Max Gurtovoy wrote:
>> No reason to use hard-coded protection information checks in ib_isert driver.
>> Use check masks from RDMA core driver.
> 
> Looks fine:
> 
> Reviewed-by: Christoph Hellwig <hch at lst.de>
> 

Christoph,

Before I send the patches to Jason, I will change the commit to:

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c 
b/drivers/infiniband/ulp/isert/ib_isert.c
index d8b6bde..d03f05a 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2106,10 +2106,13 @@
                 return -EINVAL;
         }

-       sig_attrs->check_mask =
-              (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
-              (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG ? 0x30 : 0) |
-              (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
+       if (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD)
+               sig_attrs->check_mask |= IB_SIG_CHECK_GUARD;
+       if (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG)
+               sig_attrs->check_mask |= IB_SIG_CHECK_APPTAG;
+       if (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG)
+               sig_attrs->check_mask |= IB_SIG_CHECK_REFTAG;
+
         return 0;
  }



I can reduce the instructions since we don't need to do a bitwise "or" 
with 0 (the check mask is set to 0 at the start of the function).

Let me know if I can add your "Reviewed-by" for this.

^ permalink raw reply related	[flat|nested] 93+ messages in thread

* [PATCH 04/17] IB/isert: use T10-PI check mask definitions from core layer
  2018-05-30 10:48     ` Max Gurtovoy
@ 2018-05-30 12:08       ` Christoph Hellwig
  2018-05-30 15:24         ` Jason Gunthorpe
  0 siblings, 1 reply; 93+ messages in thread
From: Christoph Hellwig @ 2018-05-30 12:08 UTC (permalink / raw)


On Wed, May 30, 2018 at 01:48:13PM +0300, Max Gurtovoy wrote:
> -       sig_attrs->check_mask =
> -              (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
> -              (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG ? 0x30 : 0) |
> -              (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
> +       if (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD)
> +               sig_attrs->check_mask |= IB_SIG_CHECK_GUARD;
> +       if (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG)
> +               sig_attrs->check_mask |= IB_SIG_CHECK_APPTAG;
> +       if (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG)
> +               sig_attrs->check_mask |= IB_SIG_CHECK_REFTAG;
> +
>         return 0;
>  }
>
>
>
> I can reduce the instructions since we don't need to do a bitwise "or" with 
> 0 (check mask is set to 0 in the start of the function).

And it looks much cleaner to start with..
>
> Let me know if I can add your "Reviewed-by" for this.

Please keep it.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 04/17] IB/isert: use T10-PI check mask definitions from core layer
  2018-05-30 12:08       ` Christoph Hellwig
@ 2018-05-30 15:24         ` Jason Gunthorpe
  2018-05-30 21:59           ` Sagi Grimberg
  0 siblings, 1 reply; 93+ messages in thread
From: Jason Gunthorpe @ 2018-05-30 15:24 UTC (permalink / raw)


On Wed, May 30, 2018 at 02:08:01PM +0200, Christoph Hellwig wrote:
> On Wed, May 30, 2018 at 01:48:13PM +0300, Max Gurtovoy wrote:
> > -       sig_attrs->check_mask =
> > -              (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
> > -              (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG ? 0x30 : 0) |
> > -              (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
> > +       if (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD)
> > +               sig_attrs->check_mask |= IB_SIG_CHECK_GUARD;
> > +       if (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG)
> > +               sig_attrs->check_mask |= IB_SIG_CHECK_APPTAG;
> > +       if (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG)
> > +               sig_attrs->check_mask |= IB_SIG_CHECK_REFTAG;
> > +
> >         return 0;
> >  }
> >
> >
> >
> > I can reduce the instructions since we don't need to do a bitwise "or" with 
> > 0 (check mask is set to 0 in the start of the function).
> 
> And it looks much cleaner to start with..

I also prefer this version to the use of the ternary operator..

Thanks,
Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-30  8:07                         ` Max Gurtovoy
@ 2018-05-30 15:30                           ` Jason Gunthorpe
  -1 siblings, 0 replies; 93+ messages in thread
From: Jason Gunthorpe @ 2018-05-30 15:30 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Martin K. Petersen, Christoph Hellwig, keith.busch, linux-nvme,
	sagi, leon, oren, idanb, vladimirk, stable

On Wed, May 30, 2018 at 11:07:17AM +0300, Max Gurtovoy wrote:
> 
> 
> On 5/30/2018 1:19 AM, Jason Gunthorpe wrote:
> >On Tue, May 29, 2018 at 06:11:18PM -0400, Martin K. Petersen wrote:
> >>
> >>Jason,
> >>
> >>>>we must wait until Martin PR for sure...
> >>>
> >>>Hmm, maybe we should have sent this patch through rdma in that
> >>>case.. The merge window is soon anyhow so it doesn't really matter
> >>>this time - but you need to think about what tree patches need to go
> >>>to when building your patch plan, don't just send a large unstructured
> >>>series like this and expect the maintainers to figure out all the
> >>>dependencies!
> >>
> >>I am also happy to just drop the patch from scsi-fixes and you can
> >>funnel everything through RDMA. Doesn't matter to me.
> >
> >Sure, if you can still do that.. Max can you send me all four patches
> >tomorrow and I'll route them through rdma.
> 
> Yes sure. I'm sorry for the ping-pong, I've tried to cover this in the cover
> letter but I guess I should've mentioned specifically what patches goes
> through which tree.

Just in general, don't send such a jumble of patches for multiple
maintainers under one cover letter. It is error-prone to sort it out.

Jason

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host
  2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
                   ` (16 preceding siblings ...)
  2018-05-27 15:50 ` [PATCH 17/17] nvme-rdma: Add T10-PI support Max Gurtovoy
@ 2018-05-30 21:47 ` Sagi Grimberg
  17 siblings, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 21:47 UTC (permalink / raw)



> Hello Sagi, Christoph, Keith, Jason and Co
> 
> This patchset adds T10-PI support for NVMeoF/RDMA host side, using signature
> verbs API.

Cool ;)

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-27 15:50   ` Max Gurtovoy
@ 2018-05-30 21:47     ` Sagi Grimberg
  -1 siblings, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 21:47 UTC (permalink / raw)
  To: Max Gurtovoy, jgg, keith.busch, linux-nvme, hch
  Cc: leon, oren, idanb, vladimirk, stable

Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* Re: [PATCH 01/17] IB/isert: fix T10-pi check mask setting
  2018-05-27 15:50   ` Max Gurtovoy
@ 2018-05-30 21:49     ` Sagi Grimberg
  -1 siblings, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 21:49 UTC (permalink / raw)
  To: Max Gurtovoy, jgg, keith.busch, linux-nvme, hch
  Cc: leon, oren, idanb, vladimirk, stable

btw, please CC target-devel for isert patches in the future. Also the
subject prefix is usually iser-target (not a must though)

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 02/17] RDMA/core: introduce check masks for T10-PI offload
  2018-05-27 15:50 ` [PATCH 02/17] RDMA/core: introduce check masks for T10-PI offload Max Gurtovoy
  2018-05-28  7:21   ` Christoph Hellwig
@ 2018-05-30 21:56   ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 21:56 UTC (permalink / raw)



>   /**
> + * Signature check masks (8 bytes in total) according to the T10-PI standard:
> + *  -------- -------- ------------
> + * | GUARD  | APPTAG |   REFTAG   |
> + * |  2B    |  2B    |    4B      |
> + *  -------- -------- ------------
> + */
> +enum {
> +	IB_SIG_CHECK_GUARD = 0xc0,
> +	IB_SIG_CHECK_APPTAG = 0x30,
> +	IB_SIG_CHECK_REFTAG = 0x0f,
> +};
> +
> +/**
>    * struct ib_sig_err - signature error descriptor
>    */
>   struct ib_sig_err {

I'm thinking we ought to move all the dif/signature stuff to its own
header rather than spamming ib_verbs.h...
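
(Just to illustrate the kind of split I mean; the header name and exact
contents below are hypothetical, not an existing file:

	/* include/rdma/signature.h - hypothetical new home for the PI bits */
	#ifndef _RDMA_SIGNATURE_H_
	#define _RDMA_SIGNATURE_H_

	/* T10-PI check masks, moved out of ib_verbs.h unchanged */
	enum {
		IB_SIG_CHECK_GUARD	= 0xc0,
		IB_SIG_CHECK_APPTAG	= 0x30,
		IB_SIG_CHECK_REFTAG	= 0x0f,
	};

	#endif /* _RDMA_SIGNATURE_H_ */

with ib_verbs.h including it for now, so nothing else has to change.)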

Otherwise looks good,

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 03/17] IB/iser: use T10-PI check mask definitions from core layer
  2018-05-27 15:50 ` [PATCH 03/17] IB/iser: use T10-PI check mask definitions from core layer Max Gurtovoy
  2018-05-28  7:22   ` Christoph Hellwig
@ 2018-05-30 21:57   ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 21:57 UTC (permalink / raw)


Looks good,

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 04/17] IB/isert: use T10-PI check mask definitions from core layer
  2018-05-27 15:50 ` [PATCH 04/17] IB/isert: " Max Gurtovoy
  2018-05-28  7:22   ` Christoph Hellwig
@ 2018-05-30 21:58   ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 21:58 UTC (permalink / raw)



> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
> index d8b6bde..8a0cda6 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -2107,9 +2107,9 @@
>   	}
>   
>   	sig_attrs->check_mask =
> -	       (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
> -	       (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG ? 0x30 : 0) |
> -	       (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
> +	       (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? IB_SIG_CHECK_GUARD : 0)  |
> +	       (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG ? IB_SIG_CHECK_APPTAG : 0) |
> +	       (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? IB_SIG_CHECK_REFTAG : 0);

Please consider avoiding >80 char lines.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 04/17] IB/isert: use T10-PI check mask definitions from core layer
  2018-05-30 15:24         ` Jason Gunthorpe
@ 2018-05-30 21:59           ` Sagi Grimberg
  0 siblings, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 21:59 UTC (permalink / raw)



>>> -       sig_attrs->check_mask =
>>> -              (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
>>> -              (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG ? 0x30 : 0) |
>>> -              (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
>>> +       if (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD)
>>> +               sig_attrs->check_mask |= IB_SIG_CHECK_GUARD;
>>> +       if (se_cmd->prot_checks & TARGET_DIF_CHECK_APPTAG)
>>> +               sig_attrs->check_mask |= IB_SIG_CHECK_APPTAG;
>>> +       if (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG)
>>> +               sig_attrs->check_mask |= IB_SIG_CHECK_REFTAG;
>>> +
>>>          return 0;
>>>   }
>>>
>>>
>>>
>>> I can reduce the instructions since we don't need to do a bitwise "or" with
>>> 0 (check mask is set to 0 in the start of the function).
>>
>> And it looks much cleaner to start with..
> 
> I also prefer this version to the use of the ternary operator..

I concur. When you send it out, you can add

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 05/17] nvme: Fix extended data LBA supported setting
  2018-05-27 15:50 ` [PATCH 05/17] nvme: Fix extended data LBA supported setting Max Gurtovoy
  2018-05-28  7:22   ` Christoph Hellwig
@ 2018-05-30 22:00   ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 22:00 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 06/17] nvme: Add WARN in case fabrics ctrl was set with wrong metadata caps
  2018-05-28  7:24   ` Christoph Hellwig
  2018-05-28 14:56     ` Max Gurtovoy
@ 2018-05-30 22:05     ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 22:05 UTC (permalink / raw)



>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index d8254b4..c605bb3 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1419,9 +1419,10 @@ static void nvme_update_disk_info(struct gendisk *disk,
>>   	blk_queue_physical_block_size(disk->queue, bs);
>>   	blk_queue_io_min(disk->queue, bs);
>>   
>> -	if (ns->ms && !ns->ext &&
>> -	    (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
>> -		nvme_init_integrity(disk, ns->ms, ns->pi_type);
>> +	if (ns->ms && (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)) {
>> +		if ((ns->ctrl->ops->flags & NVME_F_FABRICS) || !ns->ext)
>> +			nvme_init_integrity(disk, ns->ms, ns->pi_type);
>> +	}
>>   	if (ns->ms && !nvme_ns_has_pi(ns) && !blk_get_integrity(disk))
>>   		capacity = 0;
>>   	set_capacity(disk, capacity);
>> @@ -1445,6 +1446,18 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>>   	ns->noiob = le16_to_cpu(id->noiob);
>>   	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
>>   	ns->ext = ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT);
>> +
>> +	/*
>> +	 * For Fabrics, only metadata as part of extended data LBA is supported.
>> +	 * Reduce the metadata size in case of a spec violation.
>> +	 */
>> +	if (ns->ms && (ns->ctrl->ops->flags & NVME_F_FABRICS)) {
>> +		if (!ns->ext) {
>> +			WARN_ON_ONCE(1);
>> +			ns->ms = 0;
>> +		}
>> +	}
> 
> Should we just reject the probe instead of papering over it?

Yes, I think we should; we can't really go ahead with this namespace as
if nothing happened...
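
Something like the below is what I have in mind; purely a sketch of the
direction (hypothetical helper name, untested), not a patch:

	/*
	 * Over fabrics we only support metadata as part of an extended
	 * data LBA, so refuse such a namespace instead of silently
	 * clearing ns->ms behind the user's back.
	 */
	static bool nvme_ns_metadata_ok(struct nvme_ns *ns)
	{
		if (!ns->ms)
			return true;
		if (ns->ctrl->ops->flags & NVME_F_FABRICS)
			return ns->ext;
		return true;
	}

and have the revalidate path fail (and not set up the disk) when this
returns false.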

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 07/17] nvme: introduce max_integrity_segments ctrl attribute
  2018-05-27 15:50 ` [PATCH 07/17] nvme: introduce max_integrity_segments ctrl attribute Max Gurtovoy
@ 2018-05-30 22:08   ` Sagi Grimberg
  0 siblings, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 22:08 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 08/17] nvme: limit integrity segments to be <= data segments
  2018-05-27 15:50 ` [PATCH 08/17] nvme: limit integrity segments to be <= data segments Max Gurtovoy
@ 2018-05-30 22:09   ` Sagi Grimberg
  2018-06-07 13:02     ` Max Gurtovoy
  0 siblings, 1 reply; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 22:09 UTC (permalink / raw)



> There is no reason for setting max_integrity_segments to be greater than
> max_segments.
> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>   drivers/nvme/host/core.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 197208a..e81e898 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1795,6 +1795,10 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
>   		u32 max_segments =
>   			(ctrl->max_hw_sectors / (ctrl->page_size >> 9)) + 1;
>   
> +		if (ctrl->max_integrity_segments)
> +			ctrl->max_integrity_segments = min(ctrl->max_integrity_segments,
> +							   max_segments);
> +

What is this protecting against?

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 10/17] nvme: export nvme_ns_has_pi function
  2018-05-27 15:50 ` [PATCH 10/17] nvme: export nvme_ns_has_pi function Max Gurtovoy
  2018-05-28  7:25   ` Christoph Hellwig
@ 2018-05-30 22:19   ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 22:19 UTC (permalink / raw)


Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 11/17] nvme-rdma: Introduce cqe for local invalidation
  2018-05-27 15:50 ` [PATCH 11/17] nvme-rdma: Introduce cqe for local invalidation Max Gurtovoy
  2018-05-28  7:25   ` Christoph Hellwig
@ 2018-05-30 22:26   ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 22:26 UTC (permalink / raw)



> Using the same cqe object for registration and invalidation completions
> is not safe.

It's perfectly safe: we never post the invalidate before the reg_mr
has completed, as the subsequent send is fenced behind it.

> Separate them to use 2 cqe objects. Also pass the rkey and
> the cqe as arguments for local invalidation function, as a preparation
> for invalidating sig_mr keys.
> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>   drivers/nvme/host/rdma.c | 15 ++++++++-------
>   1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 1eb4438..a240800 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -68,6 +68,7 @@ struct nvme_rdma_request {
>   	int			nents;
>   	struct ib_reg_wr	reg_wr;
>   	struct ib_cqe		reg_cqe;
> +	struct ib_cqe		inv_cqe;
>   	struct nvme_rdma_queue  *queue;
>   	struct sg_table		sg_table;
>   	struct scatterlist	first_sgl[];
> @@ -1020,7 +1021,7 @@ static void nvme_rdma_memreg_done(struct ib_cq *cq, struct ib_wc *wc)
>   static void nvme_rdma_inv_rkey_done(struct ib_cq *cq, struct ib_wc *wc)
>   {
>   	struct nvme_rdma_request *req =
> -		container_of(wc->wr_cqe, struct nvme_rdma_request, reg_cqe);
> +		container_of(wc->wr_cqe, struct nvme_rdma_request, inv_cqe);
>   	struct request *rq = blk_mq_rq_from_pdu(req);
>   
>   	if (unlikely(wc->status != IB_WC_SUCCESS)) {
> @@ -1033,8 +1034,8 @@ static void nvme_rdma_inv_rkey_done(struct ib_cq *cq, struct ib_wc *wc)
>   
>   }
>   
> -static int nvme_rdma_inv_rkey(struct nvme_rdma_queue *queue,
> -		struct nvme_rdma_request *req)
> +static int nvme_rdma_inv_rkey(struct nvme_rdma_queue *queue, u32 rkey,
> +		struct ib_cqe *cqe)
>   {
>   	struct ib_send_wr *bad_wr;
>   	struct ib_send_wr wr = {
> @@ -1042,11 +1043,10 @@ static int nvme_rdma_inv_rkey(struct nvme_rdma_queue *queue,
>   		.next		    = NULL,
>   		.num_sge	    = 0,
>   		.send_flags	    = IB_SEND_SIGNALED,
> -		.ex.invalidate_rkey = req->mr->rkey,
> +		.ex.invalidate_rkey = rkey,
>   	};
>   
> -	req->reg_cqe.done = nvme_rdma_inv_rkey_done;
> -	wr.wr_cqe = &req->reg_cqe;
> +	wr.wr_cqe = cqe;
>   
>   	return ib_post_send(queue->qp, &wr, &bad_wr);
>   }
> @@ -1141,6 +1141,7 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
>   	ib_update_fast_reg_key(req->mr, ib_inc_rkey(req->mr->rkey));
>   
>   	req->reg_cqe.done = nvme_rdma_memreg_done;
> +	req->inv_cqe.done = nvme_rdma_inv_rkey_done;
>   	memset(&req->reg_wr, 0, sizeof(req->reg_wr));
>   	req->reg_wr.wr.opcode = IB_WR_REG_MR;
>   	req->reg_wr.wr.wr_cqe = &req->reg_cqe;
> @@ -1348,7 +1349,7 @@ static int nvme_rdma_process_nvme_rsp(struct nvme_rdma_queue *queue,
>   			nvme_rdma_error_recovery(queue->ctrl);
>   		}
>   	} else if (req->mr) {
> -		ret = nvme_rdma_inv_rkey(queue, req);
> +		ret = nvme_rdma_inv_rkey(queue, req->mr->rkey, &req->inv_cqe);
>   		if (unlikely(ret < 0)) {
>   			dev_err(queue->ctrl->ctrl.device,
>   				"Queueing INV WR for rkey %#x failed (%d)\n",
> 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 12/17] nvme-rdma: Introduce nvme_rdma_set_keyed_sgl helper func
  2018-05-28  7:26   ` Christoph Hellwig
@ 2018-05-30 22:27     ` Sagi Grimberg
  0 siblings, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 22:27 UTC (permalink / raw)



>> This function will be used to avoid code duplication while
>> setting keyed sgl fields.
> 
> Does this really buy us anything?
> 
>>
>> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
>> ---
>>   drivers/nvme/host/rdma.c | 34 +++++++++++++++++++---------------
>>   1 file changed, 19 insertions(+), 15 deletions(-)
> 
> Because the diffstate doesn't look very promising, and the code isn't
> really more readable either.

Can't say I like it either...

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 14/17] nvme-rdma: refactor cmd mapping/unmapping mechanism
  2018-05-27 15:50 ` [PATCH 14/17] nvme-rdma: refactor cmd mapping/unmapping mechanism Max Gurtovoy
@ 2018-05-30 22:33   ` Sagi Grimberg
  0 siblings, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 22:33 UTC (permalink / raw)



> Split the IO request mapping for RDMA operation to a structured
> procedure:
> - map the rq data sgl to rdma data_sgl
> - map the rdma data_sgl to nvme descriptor

At first sight it looks somewhat redundant,

> 
> This is a preparation patch for adding T10-PI support.

but we shall see ;)

> 
> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
> ---
>   drivers/nvme/host/rdma.c | 114 +++++++++++++++++++++++++++++++----------------
>   1 file changed, 75 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 3b63811..a54de37 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1055,28 +1055,35 @@ static int nvme_rdma_inv_rkey(struct nvme_rdma_queue *queue, u32 rkey,
>   	return ib_post_send(queue->qp, &wr, &bad_wr);
>   }
>   
> -static void nvme_rdma_unmap_data(struct nvme_rdma_queue *queue,
> -		struct request *rq)
> +static void nvme_rdma_unmap_data_sgl(struct nvme_rdma_queue *queue,
> +		struct nvme_rdma_sgl *sgl, struct request *rq)
>   {
> -	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
> -	struct nvme_rdma_sgl *sgl = &req->data_sgl;
>   	struct nvme_rdma_device *dev = queue->device;
>   	struct ib_device *ibdev = dev->dev;
>   
> +	ib_dma_unmap_sg(ibdev, sgl->sg_table.sgl,
> +			sgl->nents, rq_data_dir(rq) ==
> +				    WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
> +
> +	sg_free_table_chained(&sgl->sg_table, true);
> +}
> +
> +static void nvme_rdma_unmap_cmd(struct nvme_rdma_queue *queue,
> +		struct request *rq)
> +{
> +	struct nvme_rdma_request *req;
> +
>   	if (!blk_rq_payload_bytes(rq))
>   		return;
>   
> -	if (sgl->mr) {
> -		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, sgl->mr);
> -		sgl->mr = NULL;
> +	req = blk_mq_rq_to_pdu(rq);
> +	if (req->data_sgl.mr) {
> +		ib_mr_pool_put(queue->qp, &queue->qp->rdma_mrs, req->data_sgl.mr);
> +		req->data_sgl.mr = NULL;
>   	}
>   
> -	ib_dma_unmap_sg(ibdev, sgl->sg_table.sgl,
> -			sgl->nents, rq_data_dir(rq) ==
> -				    WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
> -
> +	nvme_rdma_unmap_data_sgl(queue, &req->data_sgl, rq);
>   	nvme_cleanup_cmd(rq);
> -	sg_free_table_chained(&sgl->sg_table, true);
>   }
>   
>   static void nvme_rdma_set_keyed_sgl(u64 addr, u64 length, u32 key,
> @@ -1171,7 +1178,49 @@ static int nvme_rdma_map_sg_fr(struct nvme_rdma_queue *queue,
>   	return 0;
>   }
>   
> -static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
> +static int nvme_rdma_map_data_sgl(struct nvme_rdma_sgl *sgl,
> +		struct request *rq, struct ib_device *ibdev, int *count)
> +{
> +	int ret;
> +
> +	sgl->sg_table.sgl = sgl->first_sgl;
> +	ret = sg_alloc_table_chained(&sgl->sg_table,
> +			blk_rq_nr_phys_segments(rq), sgl->sg_table.sgl);
> +	if (unlikely(ret))
> +		return -ENOMEM;
> +
> +	sgl->nents = blk_rq_map_sg(rq->q, rq, sgl->sg_table.sgl);
> +
> +	*count = ib_dma_map_sg(ibdev, sgl->sg_table.sgl, sgl->nents,
> +		    rq_data_dir(rq) == WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
> +	if (unlikely(*count <= 0)) {
> +		ret = -EIO;
> +		goto out_free_table;
> +	}
> +
> +	return 0;
> +
> +out_free_table:
> +	sg_free_table_chained(&sgl->sg_table, true);
> +	return ret;
> +}
> +
> +static int nvme_rdma_map_rq(struct nvme_rdma_queue *queue,
> +		struct nvme_rdma_request *req, struct request *rq,
> +		struct nvme_command *c, int nents)
> +{
> +	if (nents == 1) {
> +		if (rq_data_dir(rq) == WRITE && nvme_rdma_queue_idx(queue) &&
> +		    blk_rq_payload_bytes(rq) <=	nvme_rdma_inline_data_size(queue))
> +			return nvme_rdma_map_sg_inline(queue, req, c);
> +		if (queue->device->pd->flags & IB_PD_UNSAFE_GLOBAL_RKEY)
> +			return nvme_rdma_map_sg_single(queue, req, c);
> +	}
> +
> +	return nvme_rdma_map_sg_fr(queue, req, c, nents);
> +}
> +
> +static int nvme_rdma_setup_cmd(struct nvme_rdma_queue *queue,
>   		struct request *rq, struct nvme_command *c)
>   {
>   	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
> @@ -1188,32 +1237,19 @@ static int nvme_rdma_map_data(struct nvme_rdma_queue *queue,
>   	if (!blk_rq_payload_bytes(rq))
>   		return nvme_rdma_set_sg_null(c);
>   
> -	sgl->sg_table.sgl = sgl->first_sgl;
> -	ret = sg_alloc_table_chained(&sgl->sg_table,
> -			blk_rq_nr_phys_segments(rq), sgl->sg_table.sgl);
> -	if (ret)
> -		return -ENOMEM;
> -
> -	sgl->nents = blk_rq_map_sg(rq->q, rq, sgl->sg_table.sgl);
> -
> -	count = ib_dma_map_sg(ibdev, sgl->sg_table.sgl, sgl->nents,
> -		    rq_data_dir(rq) == WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
> -	if (unlikely(count <= 0)) {
> -		sg_free_table_chained(&sgl->sg_table, true);
> -		return -EIO;
> -	}
> +	ret = nvme_rdma_map_data_sgl(sgl, rq, ibdev, &count);
> +	if (unlikely(ret))
> +		return ret;
>   
> -	if (count == 1) {
> -		if (rq_data_dir(rq) == WRITE && nvme_rdma_queue_idx(queue) &&
> -		    blk_rq_payload_bytes(rq) <=
> -				nvme_rdma_inline_data_size(queue))
> -			return nvme_rdma_map_sg_inline(queue, req, c);
> +	ret = nvme_rdma_map_rq(queue, req, rq, c, count);
> +	if (unlikely(ret))
> +		goto out_unmap_data_sgl;
>   
> -		if (dev->pd->flags & IB_PD_UNSAFE_GLOBAL_RKEY)
> -			return nvme_rdma_map_sg_single(queue, req, c);
> -	}
> +	return 0;
>   
> -	return nvme_rdma_map_sg_fr(queue, req, c, count);
> +out_unmap_data_sgl:
> +	nvme_rdma_unmap_data_sgl(queue, sgl, rq);
> +	return ret;
>   }
>   
>   static void nvme_rdma_send_done(struct ib_cq *cq, struct ib_wc *wc)
> @@ -1645,7 +1681,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>   
>   	blk_mq_start_request(rq);
>   
> -	err = nvme_rdma_map_data(queue, rq, c);
> +	err = nvme_rdma_setup_cmd(queue, rq, c);
>   	if (unlikely(err < 0)) {
>   		dev_err(queue->ctrl->ctrl.device,
>   			     "Failed to map data (%d)\n", err);
> @@ -1661,7 +1697,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>   	err = nvme_rdma_post_send(queue, sqe, req->sge, req->num_sge,
>   			req->data_sgl.mr ? &req->data_sgl.reg_wr.wr : NULL);
>   	if (unlikely(err)) {
> -		nvme_rdma_unmap_data(queue, rq);
> +		nvme_rdma_unmap_cmd(queue, rq);
>   		goto err;
>   	}
>   
> @@ -1697,7 +1733,7 @@ static void nvme_rdma_complete_rq(struct request *rq)
>   {
>   	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
>   
> -	nvme_rdma_unmap_data(req->queue, rq);
> +	nvme_rdma_unmap_cmd(req->queue, rq);
>   	nvme_complete_rq(rq);
>   }
>   
> 

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-05-27 15:50 ` [PATCH 17/17] nvme-rdma: Add T10-PI support Max Gurtovoy
  2018-05-28  7:28   ` Christoph Hellwig
@ 2018-05-30 23:05   ` Sagi Grimberg
  2018-06-03  8:51     ` Max Gurtovoy
  1 sibling, 1 reply; 93+ messages in thread
From: Sagi Grimberg @ 2018-05-30 23:05 UTC (permalink / raw)


Max,

> For capable HCAs (e.g. ConnectX-4/ConnectX-5) this will allow end-to-end
> protection information passthrough and validation for NVMe over RDMA transport.
> Similar to iSER, T10-PI offload support was implemented over RDMA signature
> verbs API and is enabled via module parameter. In the future we might want
> to add support per controller.

In NVMe we like to call inherited features differently; in NVMe it's
metadata and not T10-PI :)

Thanks for doing this btw!

Looking at the code, I realize that the rdma core API can improve
significantly. I think we can actually achieve a much better
abstraction of the HW implementation details. This would be a good
chance to clean it all up.

What I'm thinking of is we should have a single MR that
describes everything (data, metadata, signature parameters) instead
of exposing the use of multiple MRs via the API.

Here's how I think we can do it (and correct me where I'm wrong):
1. modify ib_alloc_mr to accept another parameter, max_meta_sg,
which will tell the driver to allocate more "internal" MRs or entries
for a signature-enabled MR (a driver that does not support this would
fail if it's set or a non-signature MR was requested).

2. introduce ib_map_mr_sg_sig that will accept both data and meta sgls
and will prepare the internal MR mappings.

3. have ib_sig_handover_wr perform whatever is needed to do the job,
which is posting the reg_mr for the "internal" mr(s).

Note that IIRC the Mellanox device can register the MR with the check
that it was invalidated disabled, so it wouldn't need to be locally
invalidated (note that this is an internal MR that is private to the
driver with local access rights, so there is no harm in not
invalidating it).

4. Move ib_sig_attrs into a signature enabled ib_mr (dynamically
allocated).

5. Given that we now have two block consumers of this, add better
abstraction helpers that take struct request and blk_integrity and
fill out the ib_sig_attrs accordingly...

Side note, the signature enabled MR can have only a single "internal"
mr because the strided format in the mlx5 signature segment is perfectly
capable of accepting two strides in a single lkey.

So consider a signature stride that takes two sgls:
data: {d1, d2, d3, d4}
meta: {m1, m2, m3, m4}

So the internal MR will look like:

            data start  meta start
            |           |
            -------------------------
            |d1|d2|d3|d4|m1|m2|m3|m4|
            -------------------------

and the sig_mr stride block would use the same lkey but different
offsets in it (offset 0 and offset d1+d2+d3+d4).
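
(To put rough numbers on it, purely as an illustration and not something
taken from the patch: if d1..d4 are 4 KiB each and m1..m4 are 512 B each,
the data stride starts at byte offset 0, the metadata stride starts at
offset 4 * 4096 = 16384, and the single lkey covers 16384 + 4 * 512 =
18432 bytes in total.)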

Thoughts?

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-05-30 23:05   ` Sagi Grimberg
@ 2018-06-03  8:51     ` Max Gurtovoy
  2018-06-03 11:30       ` Sagi Grimberg
  0 siblings, 1 reply; 93+ messages in thread
From: Max Gurtovoy @ 2018-06-03  8:51 UTC (permalink / raw)


Sagi,

our architecture team presented MR features and improvements during
OFA 2018, and there are some ideas there similar to your idea below. We
need to think and plan how to progress here. Your suggestion looks
reasonable, but in my opinion it should be divided into 2 phases
(otherwise it will be hard to push in the near future):
1. use 1 MR to map data + metadata
2. use 1 MR to describe data, metadata and signature params.

Oren,
Can you share the vision for the MR layout and how it can fit into the
T10 solution for ULPs? Also the pros/cons of the internal MR(s)
implementation?

On 5/31/2018 2:05 AM, Sagi Grimberg wrote:
> Max,
> 
>> For capable HCAs (e.g. ConnectX-4/ConnectX-5) this will allow end-to-end
>> protection information passthrough and validation for NVMe over RDMA 
>> transport.
>> Similar to iSER, T10-PI offload support was implemented over RDMA 
>> signature
>> verbs API and is enabled via module parameter. In the future we might 
>> want
>> to add support per controller.
> 
> In NVMe we like to call inherited features differently, in nvme its
> metadata and not T10-PI :)
> 
> Thanks for doing this btw!

NP :)

> 
> Looking at the code, I realize that the rdma core API can improve
> significantly. I think we can actually achieve a much better
> abstraction of the HW implementation details. This would be a good
> chance to clean it all up.
> 
> What I'm thinking of is we should have a single MR that
> describes everything (data, metadata, signature parameters) instead
> of exposing the use of multiple MRs via the API.
> 
> Here's how I think we can do it (and correct me where I'm wrong)
> 1. modify ib_alloc_mr to accept another parameter of max_meta_sg
> which will tell the driver to allocate more "internal" mrs or entries
> for a signature enabled MR (driver that does not support would fail if
> its set or non-signature MR was requested).
> 
> 2. introduce ib_map_mr_sg_sig that will accept both data and meta sgls
> and will prepare the internal MR mappings.
> 
> 3. have ib_sig_handover_wr perform whatever needed to do the job.
> Which is posting the reg_mr for the "internal" mr(s).
> 
> Note that IIRC the Mellanox device can register with disabling the
> device checking the MR was invalidated so it wouldn't need to be
> locally invalidated (note that this is an internal MR that is private
> to the driver with local access rights so no harm in not invalidating it).
> 
> 4. Move ib_sig_attrs into a signature enabled ib_mr (dynamically
> allocated).
> 
> 5. Given that we now have two block consumers of this, add a better
> abstraction helpers that take struct request and blk_integrity and
> fill out the ib_sig_attrs accordingly...
> 
> Side note, the signature enabled MR can have only a single "internal"
> mr because the strided format in the mlx5 signature segment is perfectly
> capable of accepting two strides in a single lkey.
> 
> So consider a signature stride that takes two sgls:
> data: {d1, d2, d3, d4}
> meta: {m1, m2, m3, m4}
> 
> So the internal MR will look like:
> 
>             data start  meta start
>             |           |
>             -------------------------
>             |d1|d2|d3|d4|m1|m2|m3|m4|
>             -------------------------
> 
> and the sig_mr stride block would be same lkey but different
> offsets in it (offset 0 and offset d1+d2+d3+d4)
> 
> Thoughts?

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-06-03  8:51     ` Max Gurtovoy
@ 2018-06-03 11:30       ` Sagi Grimberg
  2018-06-03 14:01         ` Oren Duer
  2018-06-06 12:33         ` Max Gurtovoy
  0 siblings, 2 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-06-03 11:30 UTC (permalink / raw)



> Sagi,
> 
> our architecture team have presented MR features and improvments during 
> OFA 2018, and there are some similar ideas to your idea below. We need 
> to thing and plan how to progress here.

I have not attended the OFA workshop.

> Your suggestion looks reasonable 
> but in my opinion it should be divided to 2 phases (otherwise it will be 
> hard to push it in the near future):
> 1. use 1 MR to map data + metadata
> 2. use 1 MR to describe data, metadata and signature params.

Actually I think that given that we have only a single user and a single
provider we are in an ideal position to avoid the hassle of phasing this
cleanup. I think the proposed change for adding PI support is more
intrusive than it should be due to the API we came up with at the time
(mostly my fault probably).

> Oren,
> Can you share the vision for the MR layout and how it can fit to the T10 
> solution for ULPs ? Also pros/cons about the internal MR/s implemetation ?

Without knowing the end goal, I'd suggest making the API very specific
to PI and keeping all the generic stuff inside the provider driver.
Looking back, I think it was a mistake to try to present a generic
interface for adding block metadata to consumers that only use PI (and
will probably never do anything else).

What I'd think we should aim for as an interface is something like:

1. mr = ib_alloc_mr(pd, IB_MR_TYPE_PI, max_data_sg, max_meta_sg)

2. count = ib_mr_map_sg_pi(mr, data_sg, data_sg_cnt, data_sg_offset,
                             meta_sg, meta_sg_cnt, meta_sg_offset,
                             page_size)

3. ret = ib_mr_setup_pi(struct ib_mr *mr, struct blk_integrity *bi)
    *although this might not be able to converge scsi and nvme in one
     shot so we might have each ulp do its own thing here..

4. ret = ib_post_send(qp, pi_wr)
    where ib_pi_wr looks exactly like ib_reg_wr but has the opcode
    dissection for the provider to do all internal stuff.

5. ret = ib_check_pi_status(mr)

Try to hide all the multiple mr madness...
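
To make the shape of it concrete, a rough C sketch of the flow; the
prototypes and names are only illustrative (taken from the items above),
not an existing kernel API:

	/* 1. allocate a PI-capable MR with room for data + metadata */
	mr = ib_alloc_mr(pd, IB_MR_TYPE_PI, max_data_sg, max_meta_sg);

	/* 2. map both scatterlists behind the single MR */
	count = ib_mr_map_sg_pi(mr, data_sg, data_sg_cnt, NULL,
				meta_sg, meta_sg_cnt, NULL, PAGE_SIZE);

	/* 3. fill the PI attributes (per-ULP for now, see item 3) */
	ret = ib_mr_setup_pi(mr, bi);

	/* 4. post a single registration WR, internal MRs stay hidden */
	ret = ib_post_send(qp, &pi_wr.wr, &bad_wr);

	/* 5. once the I/O completes, ask the MR for PI errors */
	ret = ib_check_pi_status(mr);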

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-06-03 11:30       ` Sagi Grimberg
@ 2018-06-03 14:01         ` Oren Duer
  2018-06-03 14:04           ` Oren Duer
  2018-06-03 16:30           ` Sagi Grimberg
  2018-06-06 12:33         ` Max Gurtovoy
  1 sibling, 2 replies; 93+ messages in thread
From: Oren Duer @ 2018-06-03 14:01 UTC (permalink / raw)


On Sun, Jun 3, 2018 at 2:30 PM, Sagi Grimberg <sagi@grimberg.me> wrote:
>
>> Sagi,
>>
>> our architecture team have presented MR features and improvments during
>> OFA 2018, and there are some similar ideas to your idea below. We need to
>> thing and plan how to progress here.
>
>
> I have not attended the OFA workshop.

We started pushing a new approach to all that cool MR registration stuff.
Presented in last OFA conf:
https://www.openfabrics.org/images/2018workshop/presentations/301_TOved_NonContiguousMemoryRegistration.pdf

The basic idea is aligned with your proposal: allocating an MR,
setting a memory layout to it using a variety
of ibv_mr_set_layout_*() calls, and then posting a WR.

This framework then can be expanded to support DIF, as was also
presented in the same conference:
https://www.openfabrics.org/images/2018workshop/presentations/301_TOved_NonContiguousMemoryRegistration.pdf
Basically, we're proposing DIF to be yet another memory layout using
ibv_mr_set_layout_signature().

This was all proposed for user space, but I believe we would like to
keep kernel and user verbs as similar as possible.

> Try to hide all the multiple mr madness...

Not so sure here. Some implementations will want to control the MRs
explicitly. On targets, for instance,
data buffers can be in full control of the target implementation, in
which case each buffer (or bunch of them)
will be pre-registered with an MR that will be kept across different IOs.


-- 
Oren

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-06-03 14:01         ` Oren Duer
@ 2018-06-03 14:04           ` Oren Duer
  2018-06-03 16:30           ` Sagi Grimberg
  1 sibling, 0 replies; 93+ messages in thread
From: Oren Duer @ 2018-06-03 14:04 UTC (permalink / raw)


On Sun, Jun 3, 2018 at 5:01 PM, Oren Duer <oren.duer@gmail.com> wrote:
> This framework then can be expanded to support DIF, as was also
> presented in the same conference:
> https://www.openfabrics.org/images/2018workshop/presentations/301_TOved_NonContiguousMemoryRegistration.pdf

Sorry, pasted the same link twice.
This is the correct T10-DIF ppt:
https://www.openfabrics.org/images/2018workshop/presentations/307_TOved_T10-DIFOffload.pdf


Oren

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-06-03 14:01         ` Oren Duer
  2018-06-03 14:04           ` Oren Duer
@ 2018-06-03 16:30           ` Sagi Grimberg
  2018-06-05  6:35             ` Oren Duer
  1 sibling, 1 reply; 93+ messages in thread
From: Sagi Grimberg @ 2018-06-03 16:30 UTC (permalink / raw)


Hey Oren,

> We started pushing a new approach to all that cool MR registration stuff.
> Presented in last OFA conf:
> https://www.openfabrics.org/images/2018workshop/presentations/301_TOved_NonContiguousMemoryRegistration.pdf
> 
> The basic idea is aligned with your proposal: allocating an MR,
> setting a memory layout to it using a variety
> of ibv_mr_set_layout_*() calls, and then posting a WR.

Hmm, in my mind, overloading the interface to be generic was
counterproductive specifically for PI. I know that there are a bunch of
operations to do on an MR, but I think that each of them is better off
with its own clean and simple API that gives the user the minimum stuff
it needs to understand or perform in order to get the task done.

> This framework then can be expanded to support DIF, as was also
> presented in the same conference:
> https://www.openfabrics.org/images/2018workshop/presentations/301_TOved_NonContiguousMemoryRegistration.pdf
> Basically, we're proposing DIF to be yet another memory layout using
> ibv_mr_set_layout_signature().
> 
> This was all proposed for user space, but I believe we would like to
> keep kernel and user verbs as similar as possible.

I understand that this approach unifies all the MR operations with the
concept of a layout. But I think that the ULP would want an interface
that offers a simple and clean conversion from upper-level PI into
verbs PI.

>> Try to hide all the multiple mr madness...
> 
> Not so sure here. Some implementations will want to control the MRs
> explicitly. On targets, for instance,
> data buffers can be in full control of the target implementation, in
> which case each buffer (or bunch of them)
> will be pre-registered with mr that will be kept across different IOs.

I think this is ignoring how all our kernel consumers use MRs today and
in the foreseeable future. What I see is that our ULPs are exposed to
more than they probably want just for a theoretical level of freedom
for a use-case that I don't think will ever find its way to the kernel.

Moreover, I'm also not absolutely sure that the current API is not
dictating how RDMA vendors should implement PI.

Note that my comments are mostly directed at the MR hierarchy that PI
is presenting to the consumer. I think we are much better off hiding it
down in the providers, such that the ULP will see yet another type of
memory region that can do PI.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-06-03 16:30           ` Sagi Grimberg
@ 2018-06-05  6:35             ` Oren Duer
  2018-06-07 15:30               ` Sagi Grimberg
  0 siblings, 1 reply; 93+ messages in thread
From: Oren Duer @ 2018-06-05  6:35 UTC (permalink / raw)


On Sun, Jun 3, 2018 at 7:30 PM, Sagi Grimberg <sagi@grimberg.me> wrote:
> Hey Oren,
>
>> We started pushing a new approach to all that cool MR registration stuff.
>> Presented in last OFA conf:
>>
>> https://www.openfabrics.org/images/2018workshop/presentations/301_TOved_NonContiguousMemoryRegistration.pdf
>>
>> The basic idea is aligned with your proposal: allocating an MR,
>> setting a memory layout to it using a variety
>> of ibv_mr_set_layout_*() calls, and then posting a WR.
>
>
> Hmm, in my mind, overloading the interface to be generic was counter
> productive specifically for PI. I know that there are bunch of
> operations to do on an MR, but I think that each of them is better off
> with its own clean and simple API that gives the user the minimum stuff
> it needs to understand or perform in order to get the task done.

You can say the opposite too... When you see different use cases converge
into one elegant API then you know you've done something right.
I think this is one of those cases. DIF and other signatures, simple layouts,
indirections, complex n-dim matrix layouts are all attributes assigned to
memory, and apply when this memory is accessed.

> Note that my comments are mostly directed to the MR hierarchy that PI
> is presenting to the consumer. I think we are much better off with
> hiding it down in the providers such that the ULP will see yet another
> type of a memory region that can do PI.

We'll look into that. Basically, I don't mind having the
set_mr_layout_signature() API
take (data_sgl, dif_sgl) instead of (data_sge, dif_sge). We just need to make
sure we can hide the pre-allocation of all the "hidden" mkeys for that case
in the provider. We don't want the ULP to experience hiccups due to internal
key allocations and management...

-- 
Oren

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-06-03 11:30       ` Sagi Grimberg
  2018-06-03 14:01         ` Oren Duer
@ 2018-06-06 12:33         ` Max Gurtovoy
  2018-06-07 15:26           ` Sagi Grimberg
  1 sibling, 1 reply; 93+ messages in thread
From: Max Gurtovoy @ 2018-06-06 12:33 UTC (permalink / raw)




On 6/3/2018 2:30 PM, Sagi Grimberg wrote:
> 
>> Sagi,
>>
>> our architecture team have presented MR features and improvments 
>> during OFA 2018, and there are some similar ideas to your idea below. 
>> We need to thing and plan how to progress here.
> 
> I have not attended the OFA workshop.
> 
>> Your suggestion looks reasonable but in my opinion it should be 
>> divided to 2 phases (otherwise it will be hard to push it in the near 
>> future):
>> 1. use 1 MR to map data + metadata
>> 2. use 1 MR to describe data, metadata and signature params.
> 
> Actually I think that given that we have only a single user and a single
> provider we are in an ideal position to avoid the hassle of phasing this
> cleanup. I think the proposed change for adding PI support is more
> intrusive than it should be due to the API we came up with at the time
> (mostly my fault probably).
> 
>> Oren,
>> Can you share the vision for the MR layout and how it can fit to the 
>> T10 solution for ULPs ? Also pros/cons about the internal MR/s 
>> implemetation ?
> 
> Without knowing the end-goal in mind, I'd suggest to make the API
> very specific to PI and have all the generic stuff inside the provider
> driver. Looking back, I think it was a mistake to try and reflect a
> generic interface for adding block metadata for consumers that only use
> PI (and will probably never do anything else).
> 
> What I'd think we should aim for as an interface is something like:
> 
> 1. mr = ib_alloc_mr(pd, IB_MR_TYPE_PI, max_data_sg, max_meta_sg)

OK, I'll also change all the other callers and providers to the new API.
> 
> 2. count = ib_mr_map_sg_pi(mr, data_sg, data_sg_cnt, data_sg_offset,
>                              meta_sg, meta_sg_cnt, meta_sg_offset,
>                              page_size)

Sounds reasonable too.

> 
> 3. ret = ib_mr_setup_pi(struct ib_mr *mr, struct blk_integrity *bi)
>     *although this might not be able to converge scsi and nvme in one
>      shot so we might have each ulp do its own thing here..

In the first patchset each ULP will fill in the sig_attrs by itself.
We can add this incrementally.

> 
> 4. ret = ib_post_send(qp, pi_wr)
>     where ib_pi_wr looks exactly like ib_reg_wr but has the opcode
>     dissection for the provider to do all internal stuff.

Did you mean similar to IB_WR_REG_SIG_MR (and not IB_WR_REG_MR)?
This IB_WR_REG_PI_MR will need to create an internal WR of type
IB_WR_REG_MR for the internal MR.
> 
> 5. ret = ib_check_pi_status(mr)
> 
> Try to hide all the multiple mr madness...

I'll create an internal mlx5_ib_mr pi_mr for MRs of type IB_MR_TYPE_PI.

Max.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 08/17] nvme: limit integrity segments to be <= data segments
  2018-05-30 22:09   ` Sagi Grimberg
@ 2018-06-07 13:02     ` Max Gurtovoy
  2018-06-07 15:23       ` Sagi Grimberg
  2018-06-07 23:50       ` Martin K. Petersen
  0 siblings, 2 replies; 93+ messages in thread
From: Max Gurtovoy @ 2018-06-07 13:02 UTC (permalink / raw)




On 5/31/2018 1:09 AM, Sagi Grimberg wrote:
> 
>> There is no reason for setting max_integrity_segments to be greater than
>> max_segments.
>>
>> Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
>> ---
>>  drivers/nvme/host/core.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 197208a..e81e898 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -1795,6 +1795,10 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
>>  		u32 max_segments =
>>  			(ctrl->max_hw_sectors / (ctrl->page_size >> 9)) + 1;
>> +		if (ctrl->max_integrity_segments)
>> +			ctrl->max_integrity_segments = min(ctrl->max_integrity_segments,
>> +							   max_segments);
>> +
> 
> What is this protecting against?

This is just good practice; I don't think the block layer will/can send 
us more integrity_segments than data_segments. But it seems weird to have 
more max_integrity_segments than max_data_segments, doesn't it?

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 08/17] nvme: limit integrity segments to be <= data segments
  2018-06-07 13:02     ` Max Gurtovoy
@ 2018-06-07 15:23       ` Sagi Grimberg
  2018-06-07 23:50       ` Martin K. Petersen
  1 sibling, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-06-07 15:23 UTC (permalink / raw)



>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index 197208a..e81e898 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -1795,6 +1795,10 @@ static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
>>>  		u32 max_segments =
>>>  			(ctrl->max_hw_sectors / (ctrl->page_size >> 9)) + 1;
>>> +		if (ctrl->max_integrity_segments)
>>> +			ctrl->max_integrity_segments = min(ctrl->max_integrity_segments,
>>> +							   max_segments);
>>> +
>>
>> What is this protecting against?
> 
> This is just good practice; I don't think the block layer will/can send 
> us more integrity_segments than data_segments. But it seems weird to have 
> more max_integrity_segments than max_data_segments, doesn't it?

Drivers are not responsible for covering up upper layer bugs. If there is a
bug, it needs to be fixed, not hidden behind a driver protection.

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-06-06 12:33         ` Max Gurtovoy
@ 2018-06-07 15:26           ` Sagi Grimberg
  0 siblings, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-06-07 15:26 UTC (permalink / raw)



>> 3. ret = ib_mr_setup_pi(struct ib_mr *mr, struct blk_integrity *bi)
>>    *although this might not be able to converge scsi and nvme in one
>>     shot so we might have each ulp do its own thing here..
> 
> In the first patchset each ULP will fill in the sig_attrs by itself.
> We can add this incrementally.

OK... I would like to have a lib in place that maps better to
block integrity semantics.

>> 4. ret = ib_post_send(qp, pi_wr)
>>    where ib_pi_wr looks exactly like ib_reg_wr but has the opcode
>>    dissection for the provider to do all internal stuff.
> 
> Did you mean something similar to IB_WR_REG_SIG_MR (and not IB_WR_REG_MR)?
> This IB_WR_REG_PI_MR will need to create an internal WR of type IB_WR_REG_MR 
> for the internal MR.

We could even simply use IB_WR_REG_MR, and have the provider take the PI
path if the MR has PI set (i.e. ib_map_mr_sg_pi was previously called on it).
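
As a small, illustrative sketch of that variant (assuming the MR was already
mapped with the proposed ib_map_mr_sg_pi()), the ULP posting path would then
be indistinguishable from a regular fast registration and the provider would
detect the PI case internally:

/* Illustrative only: plain IB_WR_REG_MR on an MR previously mapped with PI. */
static int ulp_post_pi_reg(struct ib_qp *qp, struct ib_mr *mr)
{
        struct ib_send_wr *bad_wr;
        struct ib_reg_wr reg = {
                .wr.opcode      = IB_WR_REG_MR,
                .mr             = mr,
                .key            = mr->rkey,
                .access         = IB_ACCESS_LOCAL_WRITE |
                                  IB_ACCESS_REMOTE_READ |
                                  IB_ACCESS_REMOTE_WRITE,
        };

        return ib_post_send(qp, &reg.wr, &bad_wr);
}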

>> 5. ret = ib_check_pi_status(mr)
>>
>> Try to hide all the multiple mr madness...
> 
> I'll create an internal mlx5_ib_mr pi_mr for MRs of type IB_MR_TYPE_PI.

Cool..

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 17/17] nvme-rdma: Add T10-PI support
  2018-06-05  6:35             ` Oren Duer
@ 2018-06-07 15:30               ` Sagi Grimberg
  0 siblings, 0 replies; 93+ messages in thread
From: Sagi Grimberg @ 2018-06-07 15:30 UTC (permalink / raw)



>> Hmm, in my mind, overloading the interface to be generic was
>> counterproductive specifically for PI. I know that there are a bunch of
>> operations to do on an MR, but I think that each of them is better off
>> with its own clean and simple API that gives the user the minimum stuff
>> it needs to understand or perform in order to get the task done.
> 
> You can say the opposite too... When you see different use cases converge
> into one elegant API then you know you've done something right.
> I think this is one of those cases. DIF and other signatures, simple layouts,
> indirections, complex n-dim matrix layouts are all attributes assigned to
> memory, and apply when this memory is accessed.

I'm not opposed to having a common API, but I also think that we
should layer a feature-specific API on top of it such that the actual
consumer only needs to care about what it needs.

>> Note that my comments are mostly directed at the MR hierarchy that PI
>> is presenting to the consumer. I think we are much better off
>> hiding it down in the providers such that the ULP will see yet another
>> type of memory region that can do PI.
> 
> We'll look into that. Basically I don't mind the API of
> set_mr_layout_signature() taking (data_sgl, dif_sgl) instead of
> (data_sge, dif_sge). Just need to make sure we can hide the
> pre-allocation of all the "hidden" mkeys for that case in the provider.
> We don't want the ULP to experience hiccups due to internal key
> allocations and management...

100% agree. All allocations must not be made in the data path.
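
A rough, purely illustrative sketch of that pre-allocation idea (struct pi_mr
and provider_alloc_pi_mr() are made-up names, not the real mlx5 layout); the
only point is that every hidden resource is created at MR allocation time,
never in the I/O path:

struct pi_mr {
        struct ib_mr    ibmr;           /* the MR handed back to the ULP */
        struct ib_mr    *data_mr;       /* hidden MR covering the data layout */
        struct ib_mr    *meta_mr;       /* hidden MR covering the metadata */
};

static struct ib_mr *provider_alloc_pi_mr(struct ib_pd *pd,
                                          u32 max_data_sg, u32 max_meta_sg)
{
        struct pi_mr *mr = kzalloc(sizeof(*mr), GFP_KERNEL);

        if (!mr)
                return ERR_PTR(-ENOMEM);

        /* allocate all internal keys/MRs here, in process context */
        mr->data_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, max_data_sg);
        if (IS_ERR(mr->data_mr))
                goto err_free;

        mr->meta_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, max_meta_sg);
        if (IS_ERR(mr->meta_mr))
                goto err_data;

        return &mr->ibmr;

err_data:
        ib_dereg_mr(mr->data_mr);
err_free:
        kfree(mr);
        return ERR_PTR(-ENOMEM);
}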

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 08/17] nvme: limit integrity segments to be <= data segments
  2018-06-07 13:02     ` Max Gurtovoy
  2018-06-07 15:23       ` Sagi Grimberg
@ 2018-06-07 23:50       ` Martin K. Petersen
  2018-06-09  1:33         ` Max Gurtovoy
  1 sibling, 1 reply; 93+ messages in thread
From: Martin K. Petersen @ 2018-06-07 23:50 UTC (permalink / raw)



Max,

> This is just good practice; I don't think the block layer will/can send
> us more integrity_segments than data_segments.

They are orthogonal. The number of integrity segments is gated by the
max_integrity_segments queue limit.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 08/17] nvme: limit integrity segments to be <= data segments
  2018-06-07 23:50       ` Martin K. Petersen
@ 2018-06-09  1:33         ` Max Gurtovoy
  2018-06-13  0:35           ` Martin K. Petersen
  0 siblings, 1 reply; 93+ messages in thread
From: Max Gurtovoy @ 2018-06-09  1:33 UTC (permalink / raw)




On 6/8/2018 2:50 AM, Martin K. Petersen wrote:
> 
> Max,
> 
>> This is just good practice; I don't think the block layer will/can send
>> us more integrity_segments than data_segments.
> 
> They are orthogonal. The number of integrity segments is gated by the
> max_integrity_segments queue limit.
> 

I agree, but do you think it's reasonable to set max_integrity_segments > max_segments?

^ permalink raw reply	[flat|nested] 93+ messages in thread

* [PATCH 08/17] nvme: limit integrity segments to be <= data segments
  2018-06-09  1:33         ` Max Gurtovoy
@ 2018-06-13  0:35           ` Martin K. Petersen
  0 siblings, 0 replies; 93+ messages in thread
From: Martin K. Petersen @ 2018-06-13  0:35 UTC (permalink / raw)



Max,

> I agree, but do you think it's reasonable to set max_integrity_segments > max_segments?

The upper bound for max_integrity_segments is the number of sectors in the
I/O.

That's clearly a pathological corner case. But we used to be able to hit
it with the "right" choice of kernel memory allocator and dd with bs=512
(effectively because the allocator always allocated backwards in memory
so no two 8-byte PI allocations for a 512-byte buffer_head could be
combined into one contiguous segment).
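
A back-of-the-envelope illustration of that corner case (the numbers below
are assumed for the example, not taken from the thread):

/*
 * Assumed example: a 1 MiB I/O of 512-byte sectors, 8 bytes of PI per
 * sector.  If the data merges into 4K segments but none of the 8-byte
 * PI buffers happen to be physically contiguous (the allocator pattern
 * described above), the integrity segment count equals the sector count.
 */
unsigned int io_bytes  = 1024 * 1024;
unsigned int sectors   = io_bytes / 512;        /* 2048 */
unsigned int data_segs = io_bytes / 4096;       /*  256 data segments */
unsigned int pi_segs   = sectors;               /* 2048 integrity segments, worst case */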

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 93+ messages in thread

end of thread, other threads:[~2018-06-13  0:35 UTC | newest]

Thread overview: 93+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-27 15:50 [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Max Gurtovoy
2018-05-27 15:50 ` [PATCH 01/17] IB/isert: fix T10-pi check mask setting Max Gurtovoy
2018-05-27 15:50   ` Max Gurtovoy
2018-05-28  7:21   ` Christoph Hellwig
2018-05-28  7:21     ` Christoph Hellwig
2018-05-28 11:54     ` Max Gurtovoy
2018-05-28 11:54       ` Max Gurtovoy
2018-05-28 12:03       ` Christoph Hellwig
2018-05-28 12:03         ` Christoph Hellwig
2018-05-28 12:04         ` Max Gurtovoy
2018-05-28 12:04           ` Max Gurtovoy
2018-05-28 16:33           ` Jason Gunthorpe
2018-05-28 16:33             ` Jason Gunthorpe
2018-05-29  3:01             ` Martin K. Petersen
2018-05-29  3:01               ` Martin K. Petersen
2018-05-29 12:08               ` Max Gurtovoy
2018-05-29 12:08                 ` Max Gurtovoy
2018-05-29 19:23                 ` Jason Gunthorpe
2018-05-29 19:23                   ` Jason Gunthorpe
2018-05-29 22:11                   ` Martin K. Petersen
2018-05-29 22:11                     ` Martin K. Petersen
2018-05-29 22:19                     ` Jason Gunthorpe
2018-05-29 22:19                       ` Jason Gunthorpe
2018-05-29 22:41                       ` Martin K. Petersen
2018-05-29 22:41                         ` Martin K. Petersen
2018-05-30  8:07                       ` Max Gurtovoy
2018-05-30  8:07                         ` Max Gurtovoy
2018-05-30 15:30                         ` Jason Gunthorpe
2018-05-30 15:30                           ` Jason Gunthorpe
2018-05-30 21:47   ` Sagi Grimberg
2018-05-30 21:47     ` Sagi Grimberg
2018-05-30 21:49   ` Sagi Grimberg
2018-05-30 21:49     ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 02/17] RDMA/core: introduce check masks for T10-PI offload Max Gurtovoy
2018-05-28  7:21   ` Christoph Hellwig
2018-05-30 21:56   ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 03/17] IB/iser: use T10-PI check mask definitions from core layer Max Gurtovoy
2018-05-28  7:22   ` Christoph Hellwig
2018-05-30 21:57   ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 04/17] IB/isert: " Max Gurtovoy
2018-05-28  7:22   ` Christoph Hellwig
2018-05-30 10:48     ` Max Gurtovoy
2018-05-30 12:08       ` Christoph Hellwig
2018-05-30 15:24         ` Jason Gunthorpe
2018-05-30 21:59           ` Sagi Grimberg
2018-05-30 21:58   ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 05/17] nvme: Fix extended data LBA supported setting Max Gurtovoy
2018-05-28  7:22   ` Christoph Hellwig
2018-05-29 12:47     ` Max Gurtovoy
2018-05-30 22:00   ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 06/17] nvme: Add WARN in case fabrics ctrl was set with wrong metadata caps Max Gurtovoy
2018-05-28  7:24   ` Christoph Hellwig
2018-05-28 14:56     ` Max Gurtovoy
2018-05-30 22:05     ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 07/17] nvme: introduce max_integrity_segments ctrl attribute Max Gurtovoy
2018-05-30 22:08   ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 08/17] nvme: limit integrity segments to be <= data segments Max Gurtovoy
2018-05-30 22:09   ` Sagi Grimberg
2018-06-07 13:02     ` Max Gurtovoy
2018-06-07 15:23       ` Sagi Grimberg
2018-06-07 23:50       ` Martin K. Petersen
2018-06-09  1:33         ` Max Gurtovoy
2018-06-13  0:35           ` Martin K. Petersen
2018-05-27 15:50 ` [PATCH 09/17] nvme: reduce the metadata size in case the ctrl doesn't support it Max Gurtovoy
2018-05-28  7:25   ` Christoph Hellwig
2018-05-27 15:50 ` [PATCH 10/17] nvme: export nvme_ns_has_pi function Max Gurtovoy
2018-05-28  7:25   ` Christoph Hellwig
2018-05-28 12:41     ` Max Gurtovoy
2018-05-30 22:19   ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 11/17] nvme-rdma: Introduce cqe for local invalidation Max Gurtovoy
2018-05-28  7:25   ` Christoph Hellwig
2018-05-30 22:26   ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 12/17] nvme-rdma: Introduce nvme_rdma_set_keyed_sgl helper func Max Gurtovoy
2018-05-28  7:26   ` Christoph Hellwig
2018-05-30 22:27     ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 13/17] nvme-rdma: introduce nvme_rdma_sgl structure Max Gurtovoy
2018-05-27 15:50 ` [PATCH 14/17] nvme-rdma: refactor cmd mapping/unmapping mechanism Max Gurtovoy
2018-05-30 22:33   ` Sagi Grimberg
2018-05-27 15:50 ` [PATCH 15/17] nvme-rdma: Add helper function for preparing sg list to RDMA operation Max Gurtovoy
2018-05-27 15:50 ` [PATCH 16/17] nvme-rdma: Introduce nvme_rdma_first_wr helper function Max Gurtovoy
2018-05-27 15:50 ` [PATCH 17/17] nvme-rdma: Add T10-PI support Max Gurtovoy
2018-05-28  7:28   ` Christoph Hellwig
2018-05-30 23:05   ` Sagi Grimberg
2018-06-03  8:51     ` Max Gurtovoy
2018-06-03 11:30       ` Sagi Grimberg
2018-06-03 14:01         ` Oren Duer
2018-06-03 14:04           ` Oren Duer
2018-06-03 16:30           ` Sagi Grimberg
2018-06-05  6:35             ` Oren Duer
2018-06-07 15:30               ` Sagi Grimberg
2018-06-06 12:33         ` Max Gurtovoy
2018-06-07 15:26           ` Sagi Grimberg
2018-05-30 21:47 ` [RFC PATCH 00/17] T10-PI support for NVMeoF/RDMA host Sagi Grimberg
