* generic RDMA READ/WRITE API V6
@ 2016-04-11 21:32 Christoph Hellwig
  2016-04-11 21:32 ` [PATCH 02/12] IB/cma: pass the port number to ib_create_qp Christoph Hellwig
                   ` (11 more replies)
  0 siblings, 12 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

This series contains patches that implement a first version of a generic
API to handle RDMA READ/WRITE operations as commonly used on the target
(or server) side for storage protocols.

This has been developed for the upcoming NVMe over Fabrics target, and
extensively tested as part of that, although this upstream version has
additional updates over the one we're currently using.

It hides details such as the use of MRs for iWarp devices, and will make
it easy to handle other HCA specifics in the future.

This series also contains conversions of the SRP and iSER targets to the
new API.
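
To give an idea of the consumer side, the per-I/O flow with the new API
looks roughly like the sketch below.  This is illustrative only: the
function, its arguments and the completion handling are placeholders,
not code from this series.

#include <rdma/rw.h>

/*
 * Illustrative sketch of an RDMA READ on the target side.  @ctx lives in
 * the caller's per-I/O structure so it can be handed to
 * rdma_rw_ctx_destroy() from the completion handler once the data has
 * arrived.
 */
static int example_rdma_read(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
		u8 port_num, struct scatterlist *sgl, u32 sg_cnt,
		u64 remote_addr, u32 rkey, struct ib_cqe *done)
{
	int ret;

	/* map the SG list and build the READ WRs (plus MR WRs on iWarp) */
	ret = rdma_rw_ctx_init(ctx, qp, port_num, sgl, sg_cnt, 0,
			remote_addr, rkey, DMA_FROM_DEVICE);
	if (ret < 0)
		return ret;

	/* post the whole chain; the last WR signals completion on @done */
	return rdma_rw_ctx_post(ctx, qp, port_num, done, NULL);
}

On completion the caller releases the MRs and DMA mappings with
rdma_rw_ctx_destroy(), passing the same scatterlist and direction.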

I think it's basically ready to merge now.

I also have a git tree available at:

	git://git.infradead.org/users/hch/rdma.git rdma-rw-api

gitweb:

	http://git.infradead.org/users/hch/rdma.git/shortlog/refs/heads/rdma-rw-api

Changes since V5:
 - rebase to 4.6-rc3
 - now includes signature MR support
 - now contains the iSER target conversion
 - new module option to force MR usage for debugging purposes
 - fixes a bug where the non-merged SG count was passed to the MR map
   routine, causing lockups when using SRP with MRs.
 - includes the mlx5 max_sge_rd fix (should probably go into 4.6-rc
   and -stable)

Changes since V4:
 - fix SG iteration in rdma_rw_init_mr_wrs
 - address various misc review feedback items from Bart and Leon

Changes since V3:
 - really fold the list_del in mr_pool_get into the right patch

Changes since V2:
 - fold the list_del in mr_pool_get into the right patch
 - clamp the max FR page list length
 - minor srpt style fix
 - spelling fixes
 - renamed rdma_has_read_invalidate to rdma_cap_read_inv

Changes since V1:
 - fixed offset handling in ib_sg_to_pages
 - uses proper SG iterators to handle larger than PAGE_SIZE segments
 - adjusted parameters for some functions to reduce size of the context
 - SRP target support

* [PATCH 01/12] IB/mlx5: Expose correct max_sge_rd limit
       [not found] ` <1460410360-13104-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-11 21:32   ` Christoph Hellwig
       [not found]     ` <1460410360-13104-2-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-04-11 21:32   ` [PATCH 06/12] IB/core: add a simple MR pool Christoph Hellwig
  2016-04-11 21:32   ` [PATCH 09/12] target: enhance and export target_alloc_sgl/target_free_sgl Christoph Hellwig
  2 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Sagi Grimberg

From: Sagi Grimberg <sagig-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

mlx5 devices (Connect-IB, ConnectX-4, ConnectX-4-LX) have a limitation
where RDMA READ work queue entries cannot exceed 512 bytes.
An RDMA READ WQE needs to fit in 512 bytes:
- wqe control segment (16 bytes)
- rdma segment (16 bytes)
- scatter elements (16 bytes each)

So max_sge_rd should be: (512 - 16 - 16) / 16 = 30.
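
For reference, consumers see this limit through the device attributes.
An upper layer that builds its own RDMA READ work requests would clamp
its per-WR scatter list to it, roughly as in this illustrative helper
(not part of the patch):

static u32 example_read_sges_per_wr(struct ib_device *dev, u32 sg_cnt)
{
	/* never put more scatter entries in one READ WR than the HCA allows */
	return min_t(u32, sg_cnt, dev->attrs.max_sge_rd);
}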

Reported-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Tested-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Signed-off-by: Sagi Grimberg <sagig-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c |  2 +-
 include/linux/mlx5/device.h       | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 5acf346..049754f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -530,7 +530,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 		     sizeof(struct mlx5_wqe_ctrl_seg)) /
 		     sizeof(struct mlx5_wqe_data_seg);
 	props->max_sge = min(max_rq_sg, max_sq_sg);
-	props->max_sge_rd = props->max_sge;
+	props->max_sge_rd	   = MLX5_MAX_SGE_RD;
 	props->max_cq		   = 1 << MLX5_CAP_GEN(mdev, log_max_cq);
 	props->max_cqe = (1 << MLX5_CAP_GEN(mdev, log_max_cq_sz)) - 1;
 	props->max_mr		   = 1 << MLX5_CAP_GEN(mdev, log_max_mkey);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 8156e3c..b3575f3 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -392,6 +392,17 @@ enum {
 	MLX5_CAP_OFF_CMDIF_CSUM		= 46,
 };
 
+enum {
+	/*
+	 * Max wqe size for rdma read is 512 bytes, so this
+	 * limits our max_sge_rd as the wqe needs to fit:
+	 * - ctrl segment (16 bytes)
+	 * - rdma segment (16 bytes)
+	 * - scatter elements (16 bytes each)
+	 */
+	MLX5_MAX_SGE_RD	= (512 - 16 - 16) / 16
+};
+
 struct mlx5_inbox_hdr {
 	__be16		opcode;
 	u8		rsvd[4];
-- 
2.1.4

* [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
@ 2016-04-11 21:32 ` Christoph Hellwig
       [not found]   ` <1460410360-13104-3-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-04-19  3:14   ` Ira Weiny
  2016-04-11 21:32 ` [PATCH 03/12] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg Christoph Hellwig
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford; +Cc: bart.vanassche, swise, sagi, linux-rdma, target-devel

The new RW API will need this.
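
For context, the RW API sizes the send queue per port when the QP is
created, so ib_create_qp() needs a valid port number in the init
attributes.  Condensed from this patch and patch 8, purely for
illustration, the flow ends up as:

	/* rdma_create_qp() after this patch: */
	qp_init_attr->port_num = id->port_num;
	qp = ib_create_qp(pd, qp_init_attr);

	/* ib_create_qp() once the RW API (patch 8) lands: */
	if (qp_init_attr->cap.max_rdma_ctxs)
		rdma_rw_init_qp(device, qp_init_attr);	/* uses attr->port_num */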

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/core/cma.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 93ab0ae..6ebaf20 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -800,6 +800,7 @@ int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd,
 	if (id->device != pd->device)
 		return -EINVAL;
 
+	qp_init_attr->port_num = id->port_num;
 	qp = ib_create_qp(pd, qp_init_attr);
 	if (IS_ERR(qp))
 		return PTR_ERR(qp);
-- 
2.1.4

* [PATCH 03/12] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
  2016-04-11 21:32 ` [PATCH 02/12] IB/cma: pass the port number to ib_create_qp Christoph Hellwig
@ 2016-04-11 21:32 ` Christoph Hellwig
       [not found]   ` <1460410360-13104-4-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-04-11 21:32 ` [PATCH 04/12] IB/core: add a helper to check for READ WITH INVALIDATE support Christoph Hellwig
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford; +Cc: bart.vanassche, swise, sagi, linux-rdma, target-devel

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/core/verbs.c             | 24 ++++++++++++------------
 drivers/infiniband/hw/cxgb3/iwch_provider.c |  7 +++----
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h      |  5 ++---
 drivers/infiniband/hw/cxgb4/mem.c           |  7 +++----
 drivers/infiniband/hw/mlx4/mlx4_ib.h        |  5 ++---
 drivers/infiniband/hw/mlx4/mr.c             |  7 +++----
 drivers/infiniband/hw/mlx5/mlx5_ib.h        |  5 ++---
 drivers/infiniband/hw/mlx5/mr.c             | 21 ++++++++++++---------
 drivers/infiniband/hw/nes/nes_verbs.c       |  7 +++----
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  7 +++----
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |  5 ++---
 drivers/infiniband/ulp/iser/iser_memory.c   |  4 ++--
 drivers/infiniband/ulp/isert/ib_isert.c     |  2 +-
 drivers/infiniband/ulp/srp/ib_srp.c         |  2 +-
 include/rdma/ib_verbs.h                     | 23 +++++++++--------------
 net/rds/ib_frmr.c                           |  2 +-
 net/sunrpc/xprtrdma/frwr_ops.c              |  2 +-
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c     |  2 +-
 18 files changed, 63 insertions(+), 74 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 15b8adb..064dbef 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1597,6 +1597,7 @@ EXPORT_SYMBOL(ib_set_vf_guid);
  * @mr:            memory region
  * @sg:            dma mapped scatterlist
  * @sg_nents:      number of entries in sg
+ * @sg_offset:     offset in bytes into sg
  * @page_size:     page vector desired page size
  *
  * Constraints:
@@ -1615,17 +1616,15 @@ EXPORT_SYMBOL(ib_set_vf_guid);
  * After this completes successfully, the  memory region
  * is ready for registration.
  */
-int ib_map_mr_sg(struct ib_mr *mr,
-		 struct scatterlist *sg,
-		 int sg_nents,
-		 unsigned int page_size)
+int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset, unsigned int page_size)
 {
 	if (unlikely(!mr->device->map_mr_sg))
 		return -ENOSYS;
 
 	mr->page_size = page_size;
 
-	return mr->device->map_mr_sg(mr, sg, sg_nents);
+	return mr->device->map_mr_sg(mr, sg, sg_nents, sg_offset);
 }
 EXPORT_SYMBOL(ib_map_mr_sg);
 
@@ -1635,6 +1634,7 @@ EXPORT_SYMBOL(ib_map_mr_sg);
  * @mr:            memory region
  * @sgl:           dma mapped scatterlist
  * @sg_nents:      number of entries in sg
+ * @sg_offset:     offset in bytes into sg
  * @set_page:      driver page assignment function pointer
  *
  * Core service helper for drivers to convert the largest
@@ -1645,10 +1645,8 @@ EXPORT_SYMBOL(ib_map_mr_sg);
  * Returns the number of sg elements that were assigned to
  * a page vector.
  */
-int ib_sg_to_pages(struct ib_mr *mr,
-		   struct scatterlist *sgl,
-		   int sg_nents,
-		   int (*set_page)(struct ib_mr *, u64))
+int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
+		unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64))
 {
 	struct scatterlist *sg;
 	u64 last_end_dma_addr = 0;
@@ -1656,12 +1654,12 @@ int ib_sg_to_pages(struct ib_mr *mr,
 	u64 page_mask = ~((u64)mr->page_size - 1);
 	int i, ret;
 
-	mr->iova = sg_dma_address(&sgl[0]);
+	mr->iova = sg_dma_address(&sgl[0]) + sg_offset;
 	mr->length = 0;
 
 	for_each_sg(sgl, sg, sg_nents, i) {
-		u64 dma_addr = sg_dma_address(sg);
-		unsigned int dma_len = sg_dma_len(sg);
+		u64 dma_addr = sg_dma_address(sg) + sg_offset;
+		unsigned int dma_len = sg_dma_len(sg) - sg_offset;
 		u64 end_dma_addr = dma_addr + dma_len;
 		u64 page_addr = dma_addr & page_mask;
 
@@ -1694,6 +1692,8 @@ next_page:
 		mr->length += dma_len;
 		last_end_dma_addr = end_dma_addr;
 		last_page_off = end_dma_addr & ~page_mask;
+
+		sg_offset = 0;
 	}
 
 	return i;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 42a7b89..1423538 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -783,15 +783,14 @@ static int iwch_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-static int iwch_map_mr_sg(struct ib_mr *ibmr,
-			  struct scatterlist *sg,
-			  int sg_nents)
+static int iwch_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
+		int sg_nents, unsigned sg_offset)
 {
 	struct iwch_mr *mhp = to_iwch_mr(ibmr);
 
 	mhp->npages = 0;
 
-	return ib_sg_to_pages(ibmr, sg, sg_nents, iwch_set_page);
+	return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, iwch_set_page);
 }
 
 static int iwch_destroy_qp(struct ib_qp *ib_qp)
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index df43f87..067cb3f 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -917,9 +917,8 @@ void c4iw_qp_rem_ref(struct ib_qp *qp);
 struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
 			    enum ib_mr_type mr_type,
 			    u32 max_num_sg);
-int c4iw_map_mr_sg(struct ib_mr *ibmr,
-		   struct scatterlist *sg,
-		   int sg_nents);
+int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset);
 int c4iw_dealloc_mw(struct ib_mw *mw);
 struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type,
 			    struct ib_udata *udata);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 008be07..38afb3d 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -690,15 +690,14 @@ static int c4iw_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-int c4iw_map_mr_sg(struct ib_mr *ibmr,
-		   struct scatterlist *sg,
-		   int sg_nents)
+int c4iw_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset)
 {
 	struct c4iw_mr *mhp = to_c4iw_mr(ibmr);
 
 	mhp->mpl_len = 0;
 
-	return ib_sg_to_pages(ibmr, sg, sg_nents, c4iw_set_page);
+	return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, c4iw_set_page);
 }
 
 int c4iw_dereg_mr(struct ib_mr *ib_mr)
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 1eca01c..ba32817 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -717,9 +717,8 @@ int mlx4_ib_dealloc_mw(struct ib_mw *mw);
 struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_num_sg);
-int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
-		      struct scatterlist *sg,
-		      int sg_nents);
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset);
 int mlx4_ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
 int mlx4_ib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata);
 struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index ce0b5aa..b04f623 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -528,9 +528,8 @@ static int mlx4_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
-		      struct scatterlist *sg,
-		      int sg_nents)
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset)
 {
 	struct mlx4_ib_mr *mr = to_mmr(ibmr);
 	int rc;
@@ -541,7 +540,7 @@ int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
 				   sizeof(u64) * mr->max_pages,
 				   DMA_TO_DEVICE);
 
-	rc = ib_sg_to_pages(ibmr, sg, sg_nents, mlx4_set_page);
+	rc = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, mlx4_set_page);
 
 	ib_dma_sync_single_for_device(ibmr->device, mr->page_map,
 				      sizeof(u64) * mr->max_pages,
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index b46c255..8c835b2 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -712,9 +712,8 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
 struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_num_sg);
-int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
-		      struct scatterlist *sg,
-		      int sg_nents);
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset);
 int mlx5_ib_process_mad(struct ib_device *ibdev, int mad_flags, u8 port_num,
 			const struct ib_wc *in_wc, const struct ib_grh *in_grh,
 			const struct ib_mad_hdr *in, size_t in_mad_size,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 4d5bff1..b678eac 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1751,24 +1751,27 @@ done:
 static int
 mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
 		   struct scatterlist *sgl,
-		   unsigned short sg_nents)
+		   unsigned short sg_nents,
+		   unsigned int sg_offset)
 {
 	struct scatterlist *sg = sgl;
 	struct mlx5_klm *klms = mr->descs;
 	u32 lkey = mr->ibmr.pd->local_dma_lkey;
 	int i;
 
-	mr->ibmr.iova = sg_dma_address(sg);
+	mr->ibmr.iova = sg_dma_address(sg) + sg_offset;
 	mr->ibmr.length = 0;
 	mr->ndescs = sg_nents;
 
 	for_each_sg(sgl, sg, sg_nents, i) {
 		if (unlikely(i > mr->max_descs))
 			break;
-		klms[i].va = cpu_to_be64(sg_dma_address(sg));
-		klms[i].bcount = cpu_to_be32(sg_dma_len(sg));
+		klms[i].va = cpu_to_be64(sg_dma_address(sg) + sg_offset);
+		klms[i].bcount = cpu_to_be32(sg_dma_len(sg) - sg_offset);
 		klms[i].key = cpu_to_be32(lkey);
 		mr->ibmr.length += sg_dma_len(sg);
+
+		sg_offset = 0;
 	}
 
 	return i;
@@ -1788,9 +1791,8 @@ static int mlx5_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
-		      struct scatterlist *sg,
-		      int sg_nents)
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset)
 {
 	struct mlx5_ib_mr *mr = to_mmr(ibmr);
 	int n;
@@ -1802,9 +1804,10 @@ int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
 				   DMA_TO_DEVICE);
 
 	if (mr->access_mode == MLX5_ACCESS_MODE_KLM)
-		n = mlx5_ib_sg_to_klms(mr, sg, sg_nents);
+		n = mlx5_ib_sg_to_klms(mr, sg, sg_nents, sg_offset);
 	else
-		n = ib_sg_to_pages(ibmr, sg, sg_nents, mlx5_set_page);
+		n = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset,
+				mlx5_set_page);
 
 	ib_dma_sync_single_for_device(ibmr->device, mr->desc_map,
 				      mr->desc_size * mr->max_descs,
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index fba69a3..698aab6 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -402,15 +402,14 @@ static int nes_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-static int nes_map_mr_sg(struct ib_mr *ibmr,
-			 struct scatterlist *sg,
-			 int sg_nents)
+static int nes_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg,
+		int sg_nents, unsigned int sg_offset)
 {
 	struct nes_mr *nesmr = to_nesmr(ibmr);
 
 	nesmr->npages = 0;
 
-	return ib_sg_to_pages(ibmr, sg, sg_nents, nes_set_page);
+	return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, nes_set_page);
 }
 
 /**
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index a8496a1..9ddd550 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -3081,13 +3081,12 @@ static int ocrdma_set_page(struct ib_mr *ibmr, u64 addr)
 	return 0;
 }
 
-int ocrdma_map_mr_sg(struct ib_mr *ibmr,
-		     struct scatterlist *sg,
-		     int sg_nents)
+int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset)
 {
 	struct ocrdma_mr *mr = get_ocrdma_mr(ibmr);
 
 	mr->npages = 0;
 
-	return ib_sg_to_pages(ibmr, sg, sg_nents, ocrdma_set_page);
+	return ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, ocrdma_set_page);
 }
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index 8b517fd..b290e5d 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -122,8 +122,7 @@ struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *, u64 start, u64 length,
 struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
 			      enum ib_mr_type mr_type,
 			      u32 max_num_sg);
-int ocrdma_map_mr_sg(struct ib_mr *ibmr,
-		     struct scatterlist *sg,
-		     int sg_nents);
+int ocrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
+		unsigned sg_offset);
 
 #endif				/* __OCRDMA_VERBS_H__ */
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 9a391cc..44cc85f 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -236,7 +236,7 @@ int iser_fast_reg_fmr(struct iscsi_iser_task *iser_task,
 	page_vec->npages = 0;
 	page_vec->fake_mr.page_size = SIZE_4K;
 	plen = ib_sg_to_pages(&page_vec->fake_mr, mem->sg,
-			      mem->size, iser_set_page);
+			      mem->size, 0, iser_set_page);
 	if (unlikely(plen < mem->size)) {
 		iser_err("page vec too short to hold this SG\n");
 		iser_data_buf_dump(mem, device->ib_device);
@@ -446,7 +446,7 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
 
 	ib_update_fast_reg_key(mr, ib_inc_rkey(mr->rkey));
 
-	n = ib_map_mr_sg(mr, mem->sg, mem->size, SIZE_4K);
+	n = ib_map_mr_sg(mr, mem->sg, mem->size, 0, SIZE_4K);
 	if (unlikely(n != mem->size)) {
 		iser_err("failed to map sg (%d/%d)\n",
 			 n, mem->size);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 411e446..a44a736 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2461,7 +2461,7 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
 		wr = &inv_wr;
 	}
 
-	n = ib_map_mr_sg(mr, mem->sg, mem->nents, PAGE_SIZE);
+	n = ib_map_mr_sg(mr, mem->sg, mem->nents, 0, PAGE_SIZE);
 	if (unlikely(n != mem->nents)) {
 		isert_err("failed to map mr sg (%d/%d)\n",
 			 n, mem->nents);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index b6bf204..ca425f2 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1333,7 +1333,7 @@ static int srp_map_finish_fr(struct srp_map_state *state,
 	rkey = ib_inc_rkey(desc->mr->rkey);
 	ib_update_fast_reg_key(desc->mr, rkey);
 
-	n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, dev->mr_page_size);
+	n = ib_map_mr_sg(desc->mr, state->sg, sg_nents, 0, dev->mr_page_size);
 	if (unlikely(n < 0))
 		return n;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index fb2cef4..24d0d82 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1827,7 +1827,8 @@ struct ib_device {
 					       u32 max_num_sg);
 	int                        (*map_mr_sg)(struct ib_mr *mr,
 						struct scatterlist *sg,
-						int sg_nents);
+						int sg_nents,
+						unsigned sg_offset);
 	struct ib_mw *             (*alloc_mw)(struct ib_pd *pd,
 					       enum ib_mw_type type,
 					       struct ib_udata *udata);
@@ -3111,29 +3112,23 @@ struct net_device *ib_get_net_dev_by_params(struct ib_device *dev, u8 port,
 					    u16 pkey, const union ib_gid *gid,
 					    const struct sockaddr *addr);
 
-int ib_map_mr_sg(struct ib_mr *mr,
-		 struct scatterlist *sg,
-		 int sg_nents,
-		 unsigned int page_size);
+int ib_map_mr_sg(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset, unsigned int page_size);
 
 static inline int
-ib_map_mr_sg_zbva(struct ib_mr *mr,
-		  struct scatterlist *sg,
-		  int sg_nents,
-		  unsigned int page_size)
+ib_map_mr_sg_zbva(struct ib_mr *mr, struct scatterlist *sg, int sg_nents,
+		unsigned int sg_offset, unsigned int page_size)
 {
 	int n;
 
-	n = ib_map_mr_sg(mr, sg, sg_nents, page_size);
+	n = ib_map_mr_sg(mr, sg, sg_nents, sg_offset, page_size);
 	mr->iova = 0;
 
 	return n;
 }
 
-int ib_sg_to_pages(struct ib_mr *mr,
-		   struct scatterlist *sgl,
-		   int sg_nents,
-		   int (*set_page)(struct ib_mr *, u64));
+int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
+		unsigned int sg_offset, int (*set_page)(struct ib_mr *, u64));
 
 void ib_drain_rq(struct ib_qp *qp);
 void ib_drain_sq(struct ib_qp *qp);
diff --git a/net/rds/ib_frmr.c b/net/rds/ib_frmr.c
index 93ff038..d921adc 100644
--- a/net/rds/ib_frmr.c
+++ b/net/rds/ib_frmr.c
@@ -111,7 +111,7 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
 		cpu_relax();
 	}
 
-	ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_len, PAGE_SIZE);
+	ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_len, 0, PAGE_SIZE);
 	if (unlikely(ret != ibmr->sg_len))
 		return ret < 0 ? ret : -EINVAL;
 
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index c250924..3274a4a 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -421,7 +421,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
 		return -ENOMEM;
 	}
 
-	n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+	n = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, 0, PAGE_SIZE);
 	if (unlikely(n != frmr->sg_nents)) {
 		pr_err("RPC:       %s: failed to map mr %p (%u/%u)\n",
 		       __func__, frmr->fr_mr, n, frmr->sg_nents);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 3b24a64..19a74e9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -281,7 +281,7 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
 	}
 	atomic_inc(&xprt->sc_dma_used);
 
-	n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, PAGE_SIZE);
+	n = ib_map_mr_sg(frmr->mr, frmr->sg, frmr->sg_nents, 0, PAGE_SIZE);
 	if (unlikely(n != frmr->sg_nents)) {
 		pr_err("svcrdma: failed to map mr %p (%d/%d elements)\n",
 		       frmr->mr, n, frmr->sg_nents);
-- 
2.1.4

* [PATCH 04/12] IB/core: add a helper to check for READ WITH INVALIDATE support
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
  2016-04-11 21:32 ` [PATCH 02/12] IB/cma: pass the port number to ib_create_qp Christoph Hellwig
  2016-04-11 21:32 ` [PATCH 03/12] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg Christoph Hellwig
@ 2016-04-11 21:32 ` Christoph Hellwig
       [not found]   ` <1460410360-13104-5-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-04-19  3:15   ` Ira Weiny
  2016-04-11 21:32 ` [PATCH 05/12] IB/core: refactor ib_create_qp Christoph Hellwig
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford; +Cc: bart.vanassche, swise, sagi, linux-rdma, target-devel

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
---
 include/rdma/ib_verbs.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 24d0d82..9e8616a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2318,6 +2318,18 @@ static inline bool rdma_cap_roce_gid_table(const struct ib_device *device,
 		device->add_gid && device->del_gid;
 }
 
+/*
+ * Check if the device supports READ W/ INVALIDATE.
+ */
+static inline bool rdma_cap_read_inv(struct ib_device *dev, u32 port_num)
+{
+	/*
+	 * iWarp drivers must support READ W/ INVALIDATE.  No other protocol
+	 * has support for it yet.
+	 */
+	return rdma_protocol_iwarp(dev, port_num);
+}
+
 int ib_query_gid(struct ib_device *device,
 		 u8 port_num, int index, union ib_gid *gid,
 		 struct ib_gid_attr *attr);
-- 
2.1.4

* [PATCH 05/12] IB/core: refactor ib_create_qp
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2016-04-11 21:32 ` [PATCH 04/12] IB/core: add a helper to check for READ WITH INVALIDATE support Christoph Hellwig
@ 2016-04-11 21:32 ` Christoph Hellwig
  2016-04-17 20:00   ` Sagi Grimberg
       [not found]   ` <1460410360-13104-6-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-04-11 21:32 ` [PATCH 07/12] IB/core: add a need_inval flag to struct ib_mr Christoph Hellwig
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford; +Cc: bart.vanassche, swise, sagi, linux-rdma, target-devel

Split the XRC magic into a separate function, and return early on failure
to make the initialization code readable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/core/verbs.c | 103 +++++++++++++++++++++-------------------
 1 file changed, 54 insertions(+), 49 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 064dbef..d0ed260 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -723,62 +723,67 @@ struct ib_qp *ib_open_qp(struct ib_xrcd *xrcd,
 }
 EXPORT_SYMBOL(ib_open_qp);
 
+static struct ib_qp *ib_create_xrc_qp(struct ib_qp *qp,
+		struct ib_qp_init_attr *qp_init_attr)
+{
+	struct ib_qp *real_qp = qp;
+
+	qp->event_handler = __ib_shared_qp_event_handler;
+	qp->qp_context = qp;
+	qp->pd = NULL;
+	qp->send_cq = qp->recv_cq = NULL;
+	qp->srq = NULL;
+	qp->xrcd = qp_init_attr->xrcd;
+	atomic_inc(&qp_init_attr->xrcd->usecnt);
+	INIT_LIST_HEAD(&qp->open_list);
+
+	qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
+			  qp_init_attr->qp_context);
+	if (!IS_ERR(qp))
+		__ib_insert_xrcd_qp(qp_init_attr->xrcd, real_qp);
+	else
+		real_qp->device->destroy_qp(real_qp);
+	return qp;
+}
+
 struct ib_qp *ib_create_qp(struct ib_pd *pd,
 			   struct ib_qp_init_attr *qp_init_attr)
 {
-	struct ib_qp *qp, *real_qp;
-	struct ib_device *device;
+	struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
+	struct ib_qp *qp;
 
-	device = pd ? pd->device : qp_init_attr->xrcd->device;
 	qp = device->create_qp(pd, qp_init_attr, NULL);
-
-	if (!IS_ERR(qp)) {
-		qp->device     = device;
-		qp->real_qp    = qp;
-		qp->uobject    = NULL;
-		qp->qp_type    = qp_init_attr->qp_type;
-
-		atomic_set(&qp->usecnt, 0);
-		if (qp_init_attr->qp_type == IB_QPT_XRC_TGT) {
-			qp->event_handler = __ib_shared_qp_event_handler;
-			qp->qp_context = qp;
-			qp->pd = NULL;
-			qp->send_cq = qp->recv_cq = NULL;
-			qp->srq = NULL;
-			qp->xrcd = qp_init_attr->xrcd;
-			atomic_inc(&qp_init_attr->xrcd->usecnt);
-			INIT_LIST_HEAD(&qp->open_list);
-
-			real_qp = qp;
-			qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
-					  qp_init_attr->qp_context);
-			if (!IS_ERR(qp))
-				__ib_insert_xrcd_qp(qp_init_attr->xrcd, real_qp);
-			else
-				real_qp->device->destroy_qp(real_qp);
-		} else {
-			qp->event_handler = qp_init_attr->event_handler;
-			qp->qp_context = qp_init_attr->qp_context;
-			if (qp_init_attr->qp_type == IB_QPT_XRC_INI) {
-				qp->recv_cq = NULL;
-				qp->srq = NULL;
-			} else {
-				qp->recv_cq = qp_init_attr->recv_cq;
-				atomic_inc(&qp_init_attr->recv_cq->usecnt);
-				qp->srq = qp_init_attr->srq;
-				if (qp->srq)
-					atomic_inc(&qp_init_attr->srq->usecnt);
-			}
-
-			qp->pd	    = pd;
-			qp->send_cq = qp_init_attr->send_cq;
-			qp->xrcd    = NULL;
-
-			atomic_inc(&pd->usecnt);
-			atomic_inc(&qp_init_attr->send_cq->usecnt);
-		}
+	if (IS_ERR(qp))
+		return qp;
+
+	qp->device     = device;
+	qp->real_qp    = qp;
+	qp->uobject    = NULL;
+	qp->qp_type    = qp_init_attr->qp_type;
+
+	atomic_set(&qp->usecnt, 0);
+	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
+		return ib_create_xrc_qp(qp, qp_init_attr);
+
+	qp->event_handler = qp_init_attr->event_handler;
+	qp->qp_context = qp_init_attr->qp_context;
+	if (qp_init_attr->qp_type == IB_QPT_XRC_INI) {
+		qp->recv_cq = NULL;
+		qp->srq = NULL;
+	} else {
+		qp->recv_cq = qp_init_attr->recv_cq;
+		atomic_inc(&qp_init_attr->recv_cq->usecnt);
+		qp->srq = qp_init_attr->srq;
+		if (qp->srq)
+			atomic_inc(&qp_init_attr->srq->usecnt);
 	}
 
+	qp->pd	    = pd;
+	qp->send_cq = qp_init_attr->send_cq;
+	qp->xrcd    = NULL;
+
+	atomic_inc(&pd->usecnt);
+	atomic_inc(&qp_init_attr->send_cq->usecnt);
 	return qp;
 }
 EXPORT_SYMBOL(ib_create_qp);
-- 
2.1.4

* [PATCH 06/12] IB/core: add a simple MR pool
       [not found] ` <1460410360-13104-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-04-11 21:32   ` [PATCH 01/12] IB/mlx5: Expose correct max_sge_rd limit Christoph Hellwig
@ 2016-04-11 21:32   ` Christoph Hellwig
  2016-04-17 20:01     ` Sagi Grimberg
  2016-04-19  3:19     ` Ira Weiny
  2016-04-11 21:32   ` [PATCH 09/12] target: enhance and export target_alloc_sgl/target_free_sgl Christoph Hellwig
  2 siblings, 2 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Tested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
---
 drivers/infiniband/core/Makefile  |  2 +-
 drivers/infiniband/core/mr_pool.c | 86 +++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/verbs.c   |  5 +++
 include/rdma/ib_verbs.h           |  8 +++-
 include/rdma/mr_pool.h            | 25 ++++++++++++
 5 files changed, 124 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/mr_pool.c
 create mode 100644 include/rdma/mr_pool.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index f818538..48bd9d8 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o \
 
 ib_core-y :=			packer.o ud_header.o verbs.o cq.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
-				roce_gid_mgmt.o
+				roce_gid_mgmt.o mr_pool.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
 
diff --git a/drivers/infiniband/core/mr_pool.c b/drivers/infiniband/core/mr_pool.c
new file mode 100644
index 0000000..49d478b
--- /dev/null
+++ b/drivers/infiniband/core/mr_pool.c
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <rdma/ib_verbs.h>
+#include <rdma/mr_pool.h>
+
+struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list)
+{
+	struct ib_mr *mr;
+	unsigned long flags;
+
+	spin_lock_irqsave(&qp->mr_lock, flags);
+	mr = list_first_entry_or_null(list, struct ib_mr, qp_entry);
+	if (mr) {
+		list_del(&mr->qp_entry);
+		qp->mrs_used++;
+	}
+	spin_unlock_irqrestore(&qp->mr_lock, flags);
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_mr_pool_get);
+
+void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&qp->mr_lock, flags);
+	list_add(&mr->qp_entry, list);
+	qp->mrs_used--;
+	spin_unlock_irqrestore(&qp->mr_lock, flags);
+}
+EXPORT_SYMBOL(ib_mr_pool_put);
+
+int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
+		enum ib_mr_type type, u32 max_num_sg)
+{
+	struct ib_mr *mr;
+	unsigned long flags;
+	int ret, i;
+
+	for (i = 0; i < nr; i++) {
+		mr = ib_alloc_mr(qp->pd, type, max_num_sg);
+		if (IS_ERR(mr)) {
+			ret = PTR_ERR(mr);
+			goto out;
+		}
+
+		spin_lock_irqsave(&qp->mr_lock, flags);
+		list_add_tail(&mr->qp_entry, list);
+		spin_unlock_irqrestore(&qp->mr_lock, flags);
+	}
+
+	return 0;
+out:
+	ib_mr_pool_destroy(qp, list);
+	return ret;
+}
+EXPORT_SYMBOL(ib_mr_pool_init);
+
+void ib_mr_pool_destroy(struct ib_qp *qp, struct list_head *list)
+{
+	struct ib_mr *mr;
+	unsigned long flags;
+
+	spin_lock_irqsave(&qp->mr_lock, flags);
+	while (!list_empty(list)) {
+		mr = list_first_entry(list, struct ib_mr, qp_entry);
+		list_del(&mr->qp_entry);
+
+		spin_unlock_irqrestore(&qp->mr_lock, flags);
+		ib_dereg_mr(mr);
+		spin_lock_irqsave(&qp->mr_lock, flags);
+	}
+	spin_unlock_irqrestore(&qp->mr_lock, flags);
+}
+EXPORT_SYMBOL(ib_mr_pool_destroy);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index d0ed260..d9ea2fb 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -762,6 +762,9 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	qp->qp_type    = qp_init_attr->qp_type;
 
 	atomic_set(&qp->usecnt, 0);
+	qp->mrs_used = 0;
+	spin_lock_init(&qp->mr_lock);
+
 	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
 		return ib_create_xrc_qp(qp, qp_init_attr);
 
@@ -1255,6 +1258,8 @@ int ib_destroy_qp(struct ib_qp *qp)
 	struct ib_srq *srq;
 	int ret;
 
+	WARN_ON_ONCE(qp->mrs_used > 0);
+
 	if (atomic_read(&qp->usecnt))
 		return -EBUSY;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9e8616a..400a8a0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1421,9 +1421,12 @@ struct ib_qp {
 	struct ib_pd	       *pd;
 	struct ib_cq	       *send_cq;
 	struct ib_cq	       *recv_cq;
+	spinlock_t		mr_lock;
+	int			mrs_used;
 	struct ib_srq	       *srq;
 	struct ib_xrcd	       *xrcd; /* XRC TGT QPs only */
 	struct list_head	xrcd_list;
+
 	/* count times opened, mcast attaches, flow attaches */
 	atomic_t		usecnt;
 	struct list_head	open_list;
@@ -1438,12 +1441,15 @@ struct ib_qp {
 struct ib_mr {
 	struct ib_device  *device;
 	struct ib_pd	  *pd;
-	struct ib_uobject *uobject;
 	u32		   lkey;
 	u32		   rkey;
 	u64		   iova;
 	u32		   length;
 	unsigned int	   page_size;
+	union {
+		struct ib_uobject	*uobject;	/* user */
+		struct list_head	qp_entry;	/* FR */
+	};
 };
 
 struct ib_mw {
diff --git a/include/rdma/mr_pool.h b/include/rdma/mr_pool.h
new file mode 100644
index 0000000..986010b
--- /dev/null
+++ b/include/rdma/mr_pool.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#ifndef _RDMA_MR_POOL_H
+#define _RDMA_MR_POOL_H 1
+
+#include <rdma/ib_verbs.h>
+
+struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list);
+void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr);
+
+int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
+		enum ib_mr_type type, u32 max_num_sg);
+void ib_mr_pool_destroy(struct ib_qp *qp, struct list_head *list);
+
+#endif /* _RDMA_MR_POOL_H */
-- 
2.1.4
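
The pool is driven by its consumer roughly as follows; this is an
illustrative condensation of how the RW code in patch 8 uses it, with
&qp->rdma_mrs being the list that patch adds and nr / pages_per_mr
being placeholders:

	/* at QP setup: pre-allocate nr MRs of up to pages_per_mr pages each */
	ret = ib_mr_pool_init(qp, &qp->rdma_mrs, nr, IB_MR_TYPE_MEM_REG,
			pages_per_mr);

	/* per I/O: take an MR out of the pool, map and post it ... */
	mr = ib_mr_pool_get(qp, &qp->rdma_mrs);
	if (!mr)
		return -EAGAIN;		/* pool exhausted, caller backs off */

	/* ... and return it once the I/O has completed */
	ib_mr_pool_put(qp, &qp->rdma_mrs, mr);

	/* at QP teardown */
	ib_mr_pool_destroy(qp, &qp->rdma_mrs);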

* [PATCH 07/12] IB/core: add a need_inval flag to struct ib_mr
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2016-04-11 21:32 ` [PATCH 05/12] IB/core: refactor ib_create_qp Christoph Hellwig
@ 2016-04-11 21:32 ` Christoph Hellwig
       [not found]   ` <1460410360-13104-8-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-04-19  3:20   ` Ira Weiny
  2016-04-11 21:32 ` [PATCH 08/12] IB/core: generic RDMA READ/WRITE API Christoph Hellwig
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford
  Cc: bart.vanassche, swise, sagi, linux-rdma, target-devel, Steve Wise

From: Steve Wise <swise@chelsio.com>

This is the first step toward moving MR invalidation decisions
to the core.  It will be needed by the upcoming RW API.
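
To show where the flag ends up being used, this is a condensed view of
how the RW code in patch 8 consumes it when reusing an MR from the pool:

	/* from rdma_rw_init_one_mr(): if the MR still carries a registration
	 * from a previous I/O, chain a LOCAL_INV ahead of the new REG_MR WR */
	if (reg->mr->need_inval) {
		reg->inv_wr.opcode = IB_WR_LOCAL_INV;
		reg->inv_wr.ex.invalidate_rkey = reg->mr->lkey;
		reg->inv_wr.next = &reg->reg_wr.wr;
	} else {
		reg->inv_wr.next = NULL;
	}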

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/core/verbs.c | 2 ++
 include/rdma/ib_verbs.h         | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index d9ea2fb..179d800 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1353,6 +1353,7 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 		mr->pd      = pd;
 		mr->uobject = NULL;
 		atomic_inc(&pd->usecnt);
+		mr->need_inval = false;
 	}
 
 	return mr;
@@ -1399,6 +1400,7 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
 		mr->pd      = pd;
 		mr->uobject = NULL;
 		atomic_inc(&pd->usecnt);
+		mr->need_inval = false;
 	}
 
 	return mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 400a8a0..3f66647 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1446,6 +1446,7 @@ struct ib_mr {
 	u64		   iova;
 	u32		   length;
 	unsigned int	   page_size;
+	bool		   need_inval;
 	union {
 		struct ib_uobject	*uobject;	/* user */
 		struct list_head	qp_entry;	/* FR */
-- 
2.1.4

* [PATCH 08/12] IB/core: generic RDMA READ/WRITE API
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2016-04-11 21:32 ` [PATCH 07/12] IB/core: add a need_inval flag to struct ib_mr Christoph Hellwig
@ 2016-04-11 21:32 ` Christoph Hellwig
       [not found]   ` <1460410360-13104-9-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
       [not found] ` <1460410360-13104-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford; +Cc: bart.vanassche, swise, sagi, linux-rdma, target-devel

This supports both manual mapping of many SGEs and using MRs from the
QP's MR pool, for iWarp or other cases where that is more optimal.
For now, MRs are only used for iWARP transports.  The user of the RDMA-RW
API must allocate the QP MR pool as well as size the SQ accordingly.

Thanks to Steve Wise for testing, fixing and rewriting the iWarp support,
and to Sagi Grimberg for ideas, reviews and fixes.
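
A sketch of the setup side as this patch assumes it follows; the names
and sizes are placeholders, and rdma_create_qp() supplies the port
number as per patch 2:

static int example_create_rw_qp(struct rdma_cm_id *cm_id, struct ib_pd *pd,
		struct ib_cq *cq, u32 nr_cmds)
{
	struct ib_qp_init_attr attr = { };

	attr.qp_type = IB_QPT_RC;
	attr.send_cq = cq;
	attr.recv_cq = cq;
	attr.sq_sig_type = IB_SIGNAL_REQ_WR;
	attr.cap.max_recv_wr = nr_cmds;
	attr.cap.max_send_wr = nr_cmds;		/* non-RDMA sends (e.g. responses) */
	attr.cap.max_send_sge = 1;
	attr.cap.max_rdma_ctxs = nr_cmds;	/* one rw context per command */

	/*
	 * ib_create_qp() sees max_rdma_ctxs and grows max_send_wr via
	 * rdma_rw_init_qp() before handing the attributes to the HCA driver.
	 */
	return rdma_create_qp(cm_id, pd, &attr);
}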

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/core/Makefile |   2 +-
 drivers/infiniband/core/rw.c     | 503 +++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/verbs.c  |  25 ++
 include/rdma/ib_verbs.h          |  14 +-
 include/rdma/rw.h                |  73 ++++++
 5 files changed, 615 insertions(+), 2 deletions(-)
 create mode 100644 drivers/infiniband/core/rw.c
 create mode 100644 include/rdma/rw.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 48bd9d8..26987d9 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_INFINIBAND_USER_MAD) +=	ib_umad.o
 obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o \
 					$(user_access-y)
 
-ib_core-y :=			packer.o ud_header.o verbs.o cq.o sysfs.o \
+ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
 				roce_gid_mgmt.o mr_pool.o
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
new file mode 100644
index 0000000..a5a094b
--- /dev/null
+++ b/drivers/infiniband/core/rw.c
@@ -0,0 +1,503 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <linux/moduleparam.h>
+#include <linux/slab.h>
+#include <rdma/mr_pool.h>
+#include <rdma/rw.h>
+
+static bool rdma_rw_force_mr;
+module_param_named(force_mr, rdma_rw_force_mr, bool, 0);
+MODULE_PARM_DESC(force_mr, "Force usage of MRs for RDMA READ/WRITE operations");
+
+/*
+ * Check if the device might use memory registration.  This is currently only
+ * true for iWarp devices. In the future we can hopefully fine tune this based
+ * on HCA driver input.
+ */
+static inline bool rdma_rw_can_use_mr(struct ib_device *dev, u8 port_num)
+{
+	if (rdma_protocol_iwarp(dev, port_num))
+		return true;
+	if (unlikely(rdma_rw_force_mr))
+		return true;
+	return false;
+}
+
+/*
+ * Check if the device will use memory registration for this RW operation.
+ * We currently always use memory registrations for iWarp reads, and iWarp
+ * writes, but never for IB and RoCE.
+ *
+ * XXX: In the future we can hopefully fine tune this based on HCA driver
+ * input.
+ */
+static inline bool rdma_rw_io_needs_mr(struct ib_device *dev, u8 port_num,
+		enum dma_data_direction dir, int dma_nents)
+{
+	if (rdma_protocol_iwarp(dev, port_num) && dir == DMA_FROM_DEVICE)
+		return true;
+	if (unlikely(rdma_rw_force_mr))
+		return true;
+	return false;
+}
+
+static inline u32 rdma_rw_max_sge(struct ib_device *dev,
+		enum dma_data_direction dir)
+{
+	return dir == DMA_TO_DEVICE ?
+		dev->attrs.max_sge : dev->attrs.max_sge_rd;
+}
+
+static inline u32 rdma_rw_fr_page_list_len(struct ib_device *dev)
+{
+	/* arbitrary limit to avoid allocating gigantic resources */
+	return min_t(u32, dev->attrs.max_fast_reg_page_list_len, 256);
+}
+
+static int rdma_rw_init_one_mr(struct ib_qp *qp, u8 port_num,
+		struct rdma_rw_reg_ctx *reg, struct scatterlist *sg,
+		u32 sg_cnt, u32 offset)
+{
+	u32 pages_per_mr = rdma_rw_fr_page_list_len(qp->pd->device);
+	u32 nents = min(sg_cnt, pages_per_mr);
+	int count = 0, ret;
+
+	reg->mr = ib_mr_pool_get(qp, &qp->rdma_mrs);
+	if (!reg->mr)
+		return -EAGAIN;
+
+	if (reg->mr->need_inval) {
+		reg->inv_wr.opcode = IB_WR_LOCAL_INV;
+		reg->inv_wr.ex.invalidate_rkey = reg->mr->lkey;
+		reg->inv_wr.next = &reg->reg_wr.wr;
+		count++;
+	} else {
+		reg->inv_wr.next = NULL;
+	}
+
+	ret = ib_map_mr_sg(reg->mr, sg, nents, offset, PAGE_SIZE);
+	if (ret < nents) {
+		ib_mr_pool_put(qp, &qp->rdma_mrs, reg->mr);
+		return -EINVAL;
+	}
+
+	reg->reg_wr.wr.opcode = IB_WR_REG_MR;
+	reg->reg_wr.mr = reg->mr;
+	reg->reg_wr.access = IB_ACCESS_LOCAL_WRITE;
+	if (rdma_protocol_iwarp(qp->device, port_num))
+		reg->reg_wr.access |= IB_ACCESS_REMOTE_WRITE;
+	count++;
+
+	reg->sge.addr = reg->mr->iova;
+	reg->sge.length = reg->mr->length;
+	return count;
+}
+
+static int rdma_rw_init_mr_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u8 port_num, struct scatterlist *sg, u32 sg_cnt, u32 offset,
+		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
+{
+	u32 pages_per_mr = rdma_rw_fr_page_list_len(qp->pd->device);
+	int i, j, ret = 0, count = 0;
+
+	ctx->nr_ops = (sg_cnt + pages_per_mr - 1) / pages_per_mr;
+	ctx->reg = kcalloc(ctx->nr_ops, sizeof(*ctx->reg), GFP_KERNEL);
+	if (!ctx->reg) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < ctx->nr_ops; i++) {
+		struct rdma_rw_reg_ctx *prev = i ? &ctx->reg[i - 1] : NULL;
+		struct rdma_rw_reg_ctx *reg = &ctx->reg[i];
+		u32 nents = min(sg_cnt, pages_per_mr);
+
+		ret = rdma_rw_init_one_mr(qp, port_num, reg, sg, sg_cnt,
+				offset);
+		if (ret < 0)
+			goto out_free;
+		count += ret;
+
+		if (prev) {
+			if (reg->mr->need_inval)
+				prev->wr.wr.next = &reg->inv_wr;
+			else
+				prev->wr.wr.next = &reg->reg_wr.wr;
+		}
+
+		reg->reg_wr.wr.next = &reg->wr.wr;
+
+		reg->wr.wr.sg_list = &reg->sge;
+		reg->wr.wr.num_sge = 1;
+		reg->wr.remote_addr = remote_addr;
+		reg->wr.rkey = rkey;
+		if (dir == DMA_TO_DEVICE) {
+			reg->wr.wr.opcode = IB_WR_RDMA_WRITE;
+		} else if (!rdma_cap_read_inv(qp->device, port_num)) {
+			reg->wr.wr.opcode = IB_WR_RDMA_READ;
+		} else {
+			reg->wr.wr.opcode = IB_WR_RDMA_READ_WITH_INV;
+			reg->wr.wr.ex.invalidate_rkey = reg->mr->lkey;
+		}
+		count++;
+
+		remote_addr += reg->sge.length;
+		sg_cnt -= nents;
+		for (j = 0; j < nents; j++)
+			sg = sg_next(sg);
+		offset = 0;
+	}
+
+	ctx->type = RDMA_RW_MR;
+	return count;
+
+out_free:
+	while (--i >= 0)
+		ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->reg[i].mr);
+	kfree(ctx->reg);
+out:
+	return ret;
+}
+
+static int rdma_rw_init_map_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		struct scatterlist *sg, u32 sg_cnt, u32 offset,
+		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
+{
+	struct ib_device *dev = qp->pd->device;
+	u32 max_sge = rdma_rw_max_sge(dev, dir);
+	struct ib_sge *sge;
+	u32 total_len = 0, i, j;
+
+	ctx->nr_ops = DIV_ROUND_UP(sg_cnt, max_sge);
+
+	ctx->map.sges = sge = kcalloc(sg_cnt, sizeof(*sge), GFP_KERNEL);
+	if (!ctx->map.sges)
+		goto out;
+
+	ctx->map.wrs = kcalloc(ctx->nr_ops, sizeof(*ctx->map.wrs), GFP_KERNEL);
+	if (!ctx->map.wrs)
+		goto out_free_sges;
+
+	for (i = 0; i < ctx->nr_ops; i++) {
+		struct ib_rdma_wr *rdma_wr = &ctx->map.wrs[i];
+		u32 nr_sge = min(sg_cnt, max_sge);
+
+		if (dir == DMA_TO_DEVICE)
+			rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
+		else
+			rdma_wr->wr.opcode = IB_WR_RDMA_READ;
+		rdma_wr->remote_addr = remote_addr + total_len;
+		rdma_wr->rkey = rkey;
+		rdma_wr->wr.sg_list = sge;
+
+		for (j = 0; j < nr_sge; j++, sg = sg_next(sg)) {
+			rdma_wr->wr.num_sge++;
+
+			sge->addr = ib_sg_dma_address(dev, sg) + offset;
+			sge->length = ib_sg_dma_len(dev, sg) - offset;
+			sge->lkey = qp->pd->local_dma_lkey;
+
+			total_len += sge->length;
+			sge++;
+			sg_cnt--;
+			offset = 0;
+		}
+
+		if (i + 1 < ctx->nr_ops)
+			rdma_wr->wr.next = &ctx->map.wrs[i + 1].wr;
+	}
+
+	ctx->type = RDMA_RW_MULTI_WR;
+	return ctx->nr_ops;
+
+out_free_sges:
+	kfree(ctx->map.sges);
+out:
+	return -ENOMEM;
+}
+
+static int rdma_rw_init_single_wr(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		struct scatterlist *sg, u32 offset, u64 remote_addr, u32 rkey,
+		enum dma_data_direction dir)
+{
+	struct ib_device *dev = qp->pd->device;
+	struct ib_rdma_wr *rdma_wr = &ctx->single.wr;
+
+	ctx->nr_ops = 1;
+
+	ctx->single.sge.lkey = qp->pd->local_dma_lkey;
+	ctx->single.sge.addr = ib_sg_dma_address(dev, sg) + offset;
+	ctx->single.sge.length = ib_sg_dma_len(dev, sg) - offset;
+
+	memset(rdma_wr, 0, sizeof(*rdma_wr));
+	if (dir == DMA_TO_DEVICE)
+		rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
+	else
+		rdma_wr->wr.opcode = IB_WR_RDMA_READ;
+	rdma_wr->wr.sg_list = &ctx->single.sge;
+	rdma_wr->wr.num_sge = 1;
+	rdma_wr->remote_addr = remote_addr;
+	rdma_wr->rkey = rkey;
+
+	ctx->type = RDMA_RW_SINGLE_WR;
+	return 1;
+}
+
+/**
+ * rdma_rw_ctx_init - initialize a RDMA READ/WRITE context
+ * @ctx:	context to initialize
+ * @qp:		queue pair to operate on
+ * @port_num:	port num to which the connection is bound
+ * @sg:		scatterlist to READ/WRITE from/to
+ * @sg_cnt:	number of entries in @sg
+ * @sg_offset:	current byte offset into @sg
+ * @remote_addr:remote address to read/write (relative to @rkey)
+ * @rkey:	remote key to operate on
+ * @dir:	%DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
+ *
+ * Returns the number of WQEs that will be needed on the workqueue if
+ * successful, or a negative error code.
+ */
+int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+		struct scatterlist *sg, u32 sg_cnt, u32 sg_offset,
+		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
+{
+	struct ib_device *dev = qp->pd->device;
+	int ret;
+
+	ret = ib_dma_map_sg(dev, sg, sg_cnt, dir);
+	if (!ret)
+		return -ENOMEM;
+	sg_cnt = ret;
+
+	/*
+	 * Skip to the S/G entry that sg_offset falls into:
+	 */
+	for (;;) {
+		u32 len = ib_sg_dma_len(dev, sg);
+
+		if (sg_offset < len)
+			break;
+
+		sg = sg_next(sg);
+		sg_offset -= len;
+		sg_cnt--;
+	}
+
+	ret = -EIO;
+	if (WARN_ON_ONCE(sg_cnt == 0))
+		goto out_unmap_sg;
+
+	if (rdma_rw_io_needs_mr(qp->device, port_num, dir, sg_cnt)) {
+		ret = rdma_rw_init_mr_wrs(ctx, qp, port_num, sg, sg_cnt,
+				sg_offset, remote_addr, rkey, dir);
+	} else if (sg_cnt > 1) {
+		ret = rdma_rw_init_map_wrs(ctx, qp, sg, sg_cnt, sg_offset,
+				remote_addr, rkey, dir);
+	} else {
+		ret = rdma_rw_init_single_wr(ctx, qp, sg, sg_offset,
+				remote_addr, rkey, dir);
+	}
+
+	if (ret < 0)
+		goto out_unmap_sg;
+	return ret;
+
+out_unmap_sg:
+	ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_rw_ctx_init);
+
+/*
+ * Now that we are going to post the WRs we can update the lkey and need_inval
+ * state on the MRs.  If we were doing this at init time, we would get double
+ * or missing invalidations if a context was initialized but not actually
+ * posted.
+ */
+static void rdma_rw_update_lkey(struct rdma_rw_reg_ctx *reg, bool need_inval)
+{
+	reg->mr->need_inval = need_inval;
+	ib_update_fast_reg_key(reg->mr, ib_inc_rkey(reg->mr->lkey));
+	reg->reg_wr.key = reg->mr->lkey;
+	reg->sge.lkey = reg->mr->lkey;
+}
+
+/**
+ * rdma_rw_ctx_wrs - return chain of WRs for a RDMA READ or WRITE operation
+ * @ctx:	context to operate on
+ * @qp:		queue pair to operate on
+ * @port_num:	port num to which the connection is bound
+ * @cqe:	completion queue entry for the last WR
+ * @chain_wr:	WR to append to the posted chain
+ *
+ * Return the WR chain for the set of RDMA READ/WRITE operations described by
+ * @ctx, as well as any memory registration operations needed.  If @chain_wr
+ * is non-NULL the WR it points to will be appended to the chain of WRs posted.
+ * If @chain_wr is not set @cqe must be set so that the caller gets a
+ * completion notification.
+ */
+struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u8 port_num, struct ib_cqe *cqe, struct ib_send_wr *chain_wr)
+{
+	struct ib_send_wr *first_wr, *last_wr;
+	int i;
+
+	switch (ctx->type) {
+	case RDMA_RW_MR:
+		for (i = 0; i < ctx->nr_ops; i++) {
+			rdma_rw_update_lkey(&ctx->reg[i],
+				ctx->reg[i].wr.wr.opcode !=
+					IB_WR_RDMA_READ_WITH_INV);
+		}
+
+		if (ctx->reg[0].inv_wr.next)
+			first_wr = &ctx->reg[0].inv_wr;
+		else
+			first_wr = &ctx->reg[0].reg_wr.wr;
+		last_wr = &ctx->reg[ctx->nr_ops - 1].wr.wr;
+		break;
+	case RDMA_RW_MULTI_WR:
+		first_wr = &ctx->map.wrs[0].wr;
+		last_wr = &ctx->map.wrs[ctx->nr_ops - 1].wr;
+		break;
+	case RDMA_RW_SINGLE_WR:
+		first_wr = &ctx->single.wr.wr;
+		last_wr = &ctx->single.wr.wr;
+		break;
+	default:
+		BUG();
+	}
+
+	if (chain_wr) {
+		last_wr->next = chain_wr;
+	} else {
+		last_wr->wr_cqe = cqe;
+		last_wr->send_flags |= IB_SEND_SIGNALED;
+	}
+
+	return first_wr;
+}
+EXPORT_SYMBOL(rdma_rw_ctx_wrs);
+
+/**
+ * rdma_rw_ctx_post - post a RDMA READ or RDMA WRITE operation
+ * @ctx:	context to operate on
+ * @qp:		queue pair to operate on
+ * @port_num:	port num to which the connection is bound
+ * @cqe:	completion queue entry for the last WR
+ * @chain_wr:	WR to append to the posted chain
+ *
+ * Post the set of RDMA READ/WRITE operations described by @ctx, as well as
+ * any memory registration operations needed.  If @chain_wr is non-NULL the
+ * WR it points to will be appended to the chain of WRs posted.  If @chain_wr
+ * is not set @cqe must be set so that the caller gets a completion
+ * notification.
+ */
+int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+		struct ib_cqe *cqe, struct ib_send_wr *chain_wr)
+{
+	struct ib_send_wr *first_wr, *bad_wr;
+
+	first_wr = rdma_rw_ctx_wrs(ctx, qp, port_num, cqe, chain_wr);
+	return ib_post_send(qp, first_wr, &bad_wr);
+}
+EXPORT_SYMBOL(rdma_rw_ctx_post);
+
+/**
+ * rdma_rw_ctx_destroy - release all resources allocated by rdma_rw_ctx_init
+ * @ctx:	context to release
+ * @qp:		queue pair to operate on
+ * @port_num:	port num to which the connection is bound
+ * @sg:		scatterlist that was used for the READ/WRITE
+ * @sg_cnt:	number of entries in @sg
+ * @dir:	%DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
+ */
+void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+		struct scatterlist *sg, u32 sg_cnt, enum dma_data_direction dir)
+{
+	int i;
+
+	switch (ctx->type) {
+	case RDMA_RW_MR:
+		for (i = 0; i < ctx->nr_ops; i++)
+			ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->reg[i].mr);
+		kfree(ctx->reg);
+		break;
+	case RDMA_RW_MULTI_WR:
+		kfree(ctx->map.wrs);
+		kfree(ctx->map.sges);
+		break;
+	case RDMA_RW_SINGLE_WR:
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+}
+EXPORT_SYMBOL(rdma_rw_ctx_destroy);
+
+void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr)
+{
+	u32 factor;
+
+	WARN_ON_ONCE(attr->port_num == 0);
+
+	/*
+	 * Each context needs at least one RDMA READ or WRITE WR.
+	 *
+	 * For some hardware we might need more; eventually we should ask the
+	 * HCA driver for a multiplier here.
+	 */
+	factor = 1;
+
+	/*
+	 * If the device needs MRs to perform RDMA READ or WRITE operations,
+	 * we'll need two additional MRs for the registrations and the
+	 * invalidation.
+	 */
+	if (rdma_rw_can_use_mr(dev, attr->port_num))
+		factor += 2;	/* inv + reg */
+
+	attr->cap.max_send_wr += factor * attr->cap.max_rdma_ctxs;
+
+	/*
+	 * But maybe we were just too high in the sky and the device doesn't
+	 * even support all we need, and we'll have to live with what we get..
+	 */
+	attr->cap.max_send_wr =
+		min_t(u32, attr->cap.max_send_wr, dev->attrs.max_qp_wr);
+}
+
+int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr)
+{
+	struct ib_device *dev = qp->pd->device;
+	int ret = 0;
+
+	if (rdma_rw_can_use_mr(dev, attr->port_num)) {
+		ret = ib_mr_pool_init(qp, &qp->rdma_mrs,
+				attr->cap.max_rdma_ctxs, IB_MR_TYPE_MEM_REG,
+				rdma_rw_fr_page_list_len(dev));
+		if (ret)
+			return ret;
+	}
+
+	return ret;
+}
+
+void rdma_rw_cleanup_mrs(struct ib_qp *qp)
+{
+	ib_mr_pool_destroy(qp, &qp->rdma_mrs);
+}
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 179d800..769b000 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -48,6 +48,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_cache.h>
 #include <rdma/ib_addr.h>
+#include <rdma/rw.h>
 
 #include "core_priv.h"
 
@@ -751,6 +752,16 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 {
 	struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
 	struct ib_qp *qp;
+	int ret;
+
+	/*
+	 * If the caller is using the RDMA API, calculate the resources
+	 * needed for the RDMA READ/WRITE operations.
+	 *
+	 * Note that these callers need to pass in a port number.
+	 */
+	if (qp_init_attr->cap.max_rdma_ctxs)
+		rdma_rw_init_qp(device, qp_init_attr);
 
 	qp = device->create_qp(pd, qp_init_attr, NULL);
 	if (IS_ERR(qp))
@@ -764,6 +775,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	atomic_set(&qp->usecnt, 0);
 	qp->mrs_used = 0;
 	spin_lock_init(&qp->mr_lock);
+	INIT_LIST_HEAD(&qp->rdma_mrs);
 
 	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
 		return ib_create_xrc_qp(qp, qp_init_attr);
@@ -787,6 +799,16 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 
 	atomic_inc(&pd->usecnt);
 	atomic_inc(&qp_init_attr->send_cq->usecnt);
+
+	if (qp_init_attr->cap.max_rdma_ctxs) {
+		ret = rdma_rw_init_mrs(qp, qp_init_attr);
+		if (ret) {
+			pr_err("failed to init MR pool ret= %d\n", ret);
+			ib_destroy_qp(qp);
+			qp = ERR_PTR(ret);
+		}
+	}
+
 	return qp;
 }
 EXPORT_SYMBOL(ib_create_qp);
@@ -1271,6 +1293,9 @@ int ib_destroy_qp(struct ib_qp *qp)
 	rcq  = qp->recv_cq;
 	srq  = qp->srq;
 
+	if (!qp->uobject)
+		rdma_rw_cleanup_mrs(qp);
+
 	ret = qp->device->destroy_qp(qp);
 	if (!ret) {
 		if (pd)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 3f66647..dd8e15d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -931,6 +931,13 @@ struct ib_qp_cap {
 	u32	max_send_sge;
 	u32	max_recv_sge;
 	u32	max_inline_data;
+
+	/*
+	 * Maximum number of rdma_rw_ctx structures in flight at a time.
+	 * ib_create_qp() will calculate the right number of WRs
+	 * and MRs based on this.
+	 */
+	u32	max_rdma_ctxs;
 };
 
 enum ib_sig_type {
@@ -1002,7 +1009,11 @@ struct ib_qp_init_attr {
 	enum ib_sig_type	sq_sig_type;
 	enum ib_qp_type		qp_type;
 	enum ib_qp_create_flags	create_flags;
-	u8			port_num; /* special QP types only */
+
+	/*
+	 * Only needed for special QP types, or when using the RW API.
+	 */
+	u8			port_num;
 };
 
 struct ib_qp_open_attr {
@@ -1423,6 +1434,7 @@ struct ib_qp {
 	struct ib_cq	       *recv_cq;
 	spinlock_t		mr_lock;
 	int			mrs_used;
+	struct list_head	rdma_mrs;
 	struct ib_srq	       *srq;
 	struct ib_xrcd	       *xrcd; /* XRC TGT QPs only */
 	struct list_head	xrcd_list;
diff --git a/include/rdma/rw.h b/include/rdma/rw.h
new file mode 100644
index 0000000..5e93146
--- /dev/null
+++ b/include/rdma/rw.h
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2016 HGST, a Western Digital Company.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#ifndef _RDMA_RW_H
+#define _RDMA_RW_H
+
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/rdma_cm.h>
+#include <rdma/mr_pool.h>
+
+#define RDMA_RW_SINGLE_WR	0
+#define RDMA_RW_MULTI_WR	1
+#define RDMA_RW_MR		2
+
+struct rdma_rw_ctx {
+	/* number of RDMA READ/WRITE WRs (not counting MR WRs) */
+	u32			nr_ops;
+
+	/* tag for the union below: */
+	u8			type;
+
+	union {
+		/* for mapping a single SGE: */
+		struct {
+			struct ib_sge		sge;
+			struct ib_rdma_wr	wr;
+		} single;
+
+		/* for mapping of multiple SGEs: */
+		struct {
+			struct ib_sge		*sges;
+			struct ib_rdma_wr	*wrs;
+		} map;
+
+		/* for registering multiple WRs: */
+		struct rdma_rw_reg_ctx {
+			struct ib_sge		sge;
+			struct ib_rdma_wr	wr;
+			struct ib_reg_wr	reg_wr;
+			struct ib_send_wr	inv_wr;
+			struct ib_mr		*mr;
+		} *reg;
+	};
+};
+
+int rdma_rw_ctx_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+		struct scatterlist *sg, u32 sg_cnt, u32 sg_offset,
+		u64 remote_addr, u32 rkey, enum dma_data_direction dir);
+void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+		struct scatterlist *sg, u32 sg_cnt,
+		enum dma_data_direction dir);
+
+struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u8 port_num, struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
+int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
+		struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
+
+void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr);
+int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr);
+void rdma_rw_cleanup_mrs(struct ib_qp *qp);
+
+#endif /* _RDMA_RW_H */
-- 
2.1.4
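
As an illustration of how a ULP is expected to consume the interfaces declared
in <rdma/rw.h> above, here is a minimal sketch of a single RDMA READ.  The
my_ulp_* names are invented for the example and QP setup and error handling are
reduced to comments; only the rdma_rw_* calls, struct ib_cqe and the new
max_rdma_ctxs field come from this series.

#include <rdma/rw.h>

/*
 * Illustrative only: per-I/O state for a hypothetical ULP that reads data
 * from a remote buffer with a single rdma_rw context.
 */
struct my_ulp_io {
	struct rdma_rw_ctx	rw;
	struct ib_cqe		cqe;
	struct ib_qp		*qp;
	u8			port_num;
	struct scatterlist	*sg;
	u32			sg_cnt;
};

static void my_ulp_read_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct my_ulp_io *io = container_of(wc->wr_cqe, struct my_ulp_io, cqe);

	/* unmaps the SG list and returns any MRs to the QP's pool */
	rdma_rw_ctx_destroy(&io->rw, io->qp, io->port_num, io->sg, io->sg_cnt,
			DMA_FROM_DEVICE);
	/* ... hand the data to the upper layers and complete the I/O ... */
}

/*
 * Assumes the QP was created with qp_init_attr.cap.max_rdma_ctxs set to the
 * expected number of I/Os in flight and qp_init_attr.port_num filled in, so
 * that ib_create_qp() sized the send queue and the MR pool for us.
 */
static int my_ulp_start_read(struct my_ulp_io *io, u64 remote_addr, u32 rkey)
{
	int ret;

	/* maps io->sg and decides between plain READ WRs and MRs internally */
	ret = rdma_rw_ctx_init(&io->rw, io->qp, io->port_num, io->sg,
			io->sg_cnt, 0, remote_addr, rkey, DMA_FROM_DEVICE);
	if (ret < 0)
		return ret;

	io->cqe.done = my_ulp_read_done;
	return rdma_rw_ctx_post(&io->rw, io->qp, io->port_num, &io->cqe, NULL);
}

The same flow handles RDMA WRITE with DMA_TO_DEVICE, and rdma_rw_ctx_wrs() can
be used instead of rdma_rw_ctx_post() when the caller wants to chain further
WRs of its own.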

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 09/12] target: enhance and export target_alloc_sgl/target_free_sgl
       [not found] ` <1460410360-13104-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-04-11 21:32   ` [PATCH 01/12] IB/mlx5: Expose correct max_sge_rd limit Christoph Hellwig
  2016-04-11 21:32   ` [PATCH 06/12] IB/core: add a simple MR pool Christoph Hellwig
@ 2016-04-11 21:32   ` Christoph Hellwig
  2 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

The SRP target driver will soon need to allocate and chain its own SGLs.
To support this, export target_alloc_sgl and add a new argument to it so
that it can allocate an additional chain entry that doesn't point to a
page.  Also export transport_free_sgl, renamed to target_free_sgl, so that
these SGLs can be freed again.

Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
---
 drivers/target/target_core_transport.c | 32 ++++++++++++++++++--------------
 drivers/target/target_core_xcopy.c     |  2 +-
 include/target/target_core_backend.h   |  1 -
 include/target/target_core_fabric.h    |  4 ++++
 4 files changed, 23 insertions(+), 16 deletions(-)
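
As an illustration (not part of the patch), the new 'chainable' argument is
meant to be used like this; the sizes are arbitrary and the example_* names
are invented.  The srpt conversion later in the series follows the same
pattern when it links one SGL per SRP direct buffer:

#include <linux/scatterlist.h>
#include <target/target_core_fabric.h>

/*
 * Illustrative only: allocate two SGLs and chain the first into the second.
 * Passing chainable=true makes target_alloc_sgl() reserve one extra entry
 * at the end of the first list to hold the chain pointer.
 */
static int example_alloc_chained_sgl(struct scatterlist **sg,
		unsigned int *sg_cnt)
{
	struct scatterlist *first, *second;
	unsigned int first_nents, second_nents;
	int ret;

	ret = target_alloc_sgl(&first, &first_nents, 8192, false, true);
	if (ret)
		return ret;

	ret = target_alloc_sgl(&second, &second_nents, 4096, false, false);
	if (ret) {
		target_free_sgl(first, first_nents);
		return ret;
	}

	/* clear the end marker on the last data entry, then chain */
	sg_unmark_end(&first[first_nents - 1]);
	sg_chain(first, first_nents + 1, second);

	*sg = first;
	*sg_cnt = first_nents + second_nents;
	return 0;
}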

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index ab2bf12..590384a 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2195,7 +2195,7 @@ queue_full:
 	transport_handle_queue_full(cmd, cmd->se_dev);
 }
 
-static inline void transport_free_sgl(struct scatterlist *sgl, int nents)
+void target_free_sgl(struct scatterlist *sgl, int nents)
 {
 	struct scatterlist *sg;
 	int count;
@@ -2205,6 +2205,7 @@ static inline void transport_free_sgl(struct scatterlist *sgl, int nents)
 
 	kfree(sgl);
 }
+EXPORT_SYMBOL(target_free_sgl);
 
 static inline void transport_reset_sgl_orig(struct se_cmd *cmd)
 {
@@ -2225,7 +2226,7 @@ static inline void transport_reset_sgl_orig(struct se_cmd *cmd)
 static inline void transport_free_pages(struct se_cmd *cmd)
 {
 	if (!(cmd->se_cmd_flags & SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC)) {
-		transport_free_sgl(cmd->t_prot_sg, cmd->t_prot_nents);
+		target_free_sgl(cmd->t_prot_sg, cmd->t_prot_nents);
 		cmd->t_prot_sg = NULL;
 		cmd->t_prot_nents = 0;
 	}
@@ -2236,7 +2237,7 @@ static inline void transport_free_pages(struct se_cmd *cmd)
 		 * SG_TO_MEM_NOALLOC to function with COMPARE_AND_WRITE
 		 */
 		if (cmd->se_cmd_flags & SCF_COMPARE_AND_WRITE) {
-			transport_free_sgl(cmd->t_bidi_data_sg,
+			target_free_sgl(cmd->t_bidi_data_sg,
 					   cmd->t_bidi_data_nents);
 			cmd->t_bidi_data_sg = NULL;
 			cmd->t_bidi_data_nents = 0;
@@ -2246,11 +2247,11 @@ static inline void transport_free_pages(struct se_cmd *cmd)
 	}
 	transport_reset_sgl_orig(cmd);
 
-	transport_free_sgl(cmd->t_data_sg, cmd->t_data_nents);
+	target_free_sgl(cmd->t_data_sg, cmd->t_data_nents);
 	cmd->t_data_sg = NULL;
 	cmd->t_data_nents = 0;
 
-	transport_free_sgl(cmd->t_bidi_data_sg, cmd->t_bidi_data_nents);
+	target_free_sgl(cmd->t_bidi_data_sg, cmd->t_bidi_data_nents);
 	cmd->t_bidi_data_sg = NULL;
 	cmd->t_bidi_data_nents = 0;
 }
@@ -2324,20 +2325,22 @@ EXPORT_SYMBOL(transport_kunmap_data_sg);
 
 int
 target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents, u32 length,
-		 bool zero_page)
+		 bool zero_page, bool chainable)
 {
 	struct scatterlist *sg;
 	struct page *page;
 	gfp_t zero_flag = (zero_page) ? __GFP_ZERO : 0;
-	unsigned int nent;
+	unsigned int nalloc, nent;
 	int i = 0;
 
-	nent = DIV_ROUND_UP(length, PAGE_SIZE);
-	sg = kmalloc(sizeof(struct scatterlist) * nent, GFP_KERNEL);
+	nalloc = nent = DIV_ROUND_UP(length, PAGE_SIZE);
+	if (chainable)
+		nalloc++;
+	sg = kmalloc_array(nalloc, sizeof(struct scatterlist), GFP_KERNEL);
 	if (!sg)
 		return -ENOMEM;
 
-	sg_init_table(sg, nent);
+	sg_init_table(sg, nalloc);
 
 	while (length) {
 		u32 page_len = min_t(u32, length, PAGE_SIZE);
@@ -2361,6 +2364,7 @@ out:
 	kfree(sg);
 	return -ENOMEM;
 }
+EXPORT_SYMBOL(target_alloc_sgl);
 
 /*
  * Allocate any required resources to execute the command.  For writes we
@@ -2376,7 +2380,7 @@ transport_generic_new_cmd(struct se_cmd *cmd)
 	if (cmd->prot_op != TARGET_PROT_NORMAL &&
 	    !(cmd->se_cmd_flags & SCF_PASSTHROUGH_PROT_SG_TO_MEM_NOALLOC)) {
 		ret = target_alloc_sgl(&cmd->t_prot_sg, &cmd->t_prot_nents,
-				       cmd->prot_length, true);
+				       cmd->prot_length, true, false);
 		if (ret < 0)
 			return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 	}
@@ -2401,13 +2405,13 @@ transport_generic_new_cmd(struct se_cmd *cmd)
 
 			ret = target_alloc_sgl(&cmd->t_bidi_data_sg,
 					       &cmd->t_bidi_data_nents,
-					       bidi_length, zero_flag);
+					       bidi_length, zero_flag, false);
 			if (ret < 0)
 				return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 		}
 
 		ret = target_alloc_sgl(&cmd->t_data_sg, &cmd->t_data_nents,
-				       cmd->data_length, zero_flag);
+				       cmd->data_length, zero_flag, false);
 		if (ret < 0)
 			return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 	} else if ((cmd->se_cmd_flags & SCF_COMPARE_AND_WRITE) &&
@@ -2421,7 +2425,7 @@ transport_generic_new_cmd(struct se_cmd *cmd)
 
 		ret = target_alloc_sgl(&cmd->t_bidi_data_sg,
 				       &cmd->t_bidi_data_nents,
-				       caw_length, zero_flag);
+				       caw_length, zero_flag, false);
 		if (ret < 0)
 			return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 	}
diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
index 47fe94e..75cd854 100644
--- a/drivers/target/target_core_xcopy.c
+++ b/drivers/target/target_core_xcopy.c
@@ -563,7 +563,7 @@ static int target_xcopy_setup_pt_cmd(
 
 	if (alloc_mem) {
 		rc = target_alloc_sgl(&cmd->t_data_sg, &cmd->t_data_nents,
-				      cmd->data_length, false);
+				      cmd->data_length, false, false);
 		if (rc < 0) {
 			ret = rc;
 			goto out;
diff --git a/include/target/target_core_backend.h b/include/target/target_core_backend.h
index 28ee5c2..d8ab510 100644
--- a/include/target/target_core_backend.h
+++ b/include/target/target_core_backend.h
@@ -85,7 +85,6 @@ extern struct configfs_attribute *passthrough_attrib_attrs[];
 void	*transport_kmap_data_sg(struct se_cmd *);
 void	transport_kunmap_data_sg(struct se_cmd *);
 /* core helpers also used by xcopy during internal command setup */
-int	target_alloc_sgl(struct scatterlist **, unsigned int *, u32, bool);
 sense_reason_t	transport_generic_map_mem_to_cmd(struct se_cmd *,
 		struct scatterlist *, u32, struct scatterlist *, u32);
 
diff --git a/include/target/target_core_fabric.h b/include/target/target_core_fabric.h
index 8ff6d40..78d88f0 100644
--- a/include/target/target_core_fabric.h
+++ b/include/target/target_core_fabric.h
@@ -185,6 +185,10 @@ int	core_tpg_set_initiator_node_tag(struct se_portal_group *,
 int	core_tpg_register(struct se_wwn *, struct se_portal_group *, int);
 int	core_tpg_deregister(struct se_portal_group *);
 
+int	target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents,
+		u32 length, bool zero_page, bool chainable);
+void	target_free_sgl(struct scatterlist *sgl, int nents);
+
 /*
  * The LIO target core uses DMA_TO_DEVICE to mean that data is going
  * to the target (eg handling a WRITE) and DMA_FROM_DEVICE to mean
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 10/12] IB/srpt: convert to the generic RDMA READ/WRITE API
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
                   ` (6 preceding siblings ...)
       [not found] ` <1460410360-13104-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-11 21:32 ` Christoph Hellwig
  2016-04-13 18:57   ` Bart Van Assche
  2016-04-11 21:32 ` [PATCH 11/12] IB/core: add RW API support for signature MRs Christoph Hellwig
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford; +Cc: bart.vanassche, swise, sagi, linux-rdma, target-devel

Replace the homegrown RDMA READ/WRITE code in srpt with the generic API.
The only real twist here is that we need to allocate one Linux scatterlist
per direct buffer in the SRP command, and chain them before handing them
off to the target core.

As a side-effect of the conversion the driver will also chain the SEND
of the SRP response to the RDMA WRITE WRs for a DATA OUT command, and
properly account for RDMA WRITE WRs instead of just for RDMA READ WRs
like the driver previously did.

We now allocate half of the SQ size to RDMA READ/WRITE contexts, assuming
by default one RDMA READ or WRITE operation per command.  If a command
has multiple operations it will eat into the budget but will still succeed,
possibly after waiting for WQEs to become available.

Also ensure the QPs request the maximum allowed number of SGEs so that the
RDMA R/W API works correctly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 737 ++++++++++++----------------------
 drivers/infiniband/ulp/srpt/ib_srpt.h |  31 +-
 2 files changed, 269 insertions(+), 499 deletions(-)
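
The response chaining mentioned above amounts to passing the SEND WR as
@chain_wr to rdma_rw_ctx_wrs(), so a single ib_post_send() covers both the
data transfer and the SRP response.  A condensed, illustrative-only sketch
follows; the real srpt_queue_response() below also handles multiple rw
contexts, flow control and the error paths:

#include <rdma/rw.h>

static int example_write_data_and_respond(struct ib_qp *qp, u8 port_num,
		struct rdma_rw_ctx *rw, struct ib_cqe *send_cqe,
		struct ib_sge *rsp_sge)
{
	struct ib_send_wr send_wr = { }, *first_wr, *bad_wr;

	send_wr.wr_cqe = send_cqe;
	send_wr.sg_list = rsp_sge;
	send_wr.num_sge = 1;
	send_wr.opcode = IB_WR_SEND;
	send_wr.send_flags = IB_SEND_SIGNALED;

	/*
	 * rdma_rw_ctx_wrs() returns the head of the chain: any MR
	 * registration WRs, the RDMA WRITE WRs, and finally our SEND.
	 */
	first_wr = rdma_rw_ctx_wrs(rw, qp, port_num, NULL, &send_wr);

	return ib_post_send(qp, first_wr, &bad_wr);
}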

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 8b42401..d69b1a9 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -765,52 +765,6 @@ static int srpt_post_recv(struct srpt_device *sdev,
 }
 
 /**
- * srpt_post_send() - Post an IB send request.
- *
- * Returns zero upon success and a non-zero value upon failure.
- */
-static int srpt_post_send(struct srpt_rdma_ch *ch,
-			  struct srpt_send_ioctx *ioctx, int len)
-{
-	struct ib_sge list;
-	struct ib_send_wr wr, *bad_wr;
-	struct srpt_device *sdev = ch->sport->sdev;
-	int ret;
-
-	atomic_inc(&ch->req_lim);
-
-	ret = -ENOMEM;
-	if (unlikely(atomic_dec_return(&ch->sq_wr_avail) < 0)) {
-		pr_warn("IB send queue full (needed 1)\n");
-		goto out;
-	}
-
-	ib_dma_sync_single_for_device(sdev->device, ioctx->ioctx.dma, len,
-				      DMA_TO_DEVICE);
-
-	list.addr = ioctx->ioctx.dma;
-	list.length = len;
-	list.lkey = sdev->pd->local_dma_lkey;
-
-	ioctx->ioctx.cqe.done = srpt_send_done;
-	wr.next = NULL;
-	wr.wr_cqe = &ioctx->ioctx.cqe;
-	wr.sg_list = &list;
-	wr.num_sge = 1;
-	wr.opcode = IB_WR_SEND;
-	wr.send_flags = IB_SEND_SIGNALED;
-
-	ret = ib_post_send(ch->qp, &wr, &bad_wr);
-
-out:
-	if (ret < 0) {
-		atomic_inc(&ch->sq_wr_avail);
-		atomic_dec(&ch->req_lim);
-	}
-	return ret;
-}
-
-/**
  * srpt_zerolength_write() - Perform a zero-length RDMA write.
  *
  * A quote from the InfiniBand specification: C9-88: For an HCA responder
@@ -843,6 +797,110 @@ static void srpt_zerolength_write_done(struct ib_cq *cq, struct ib_wc *wc)
 	}
 }
 
+static int srpt_alloc_rw_ctxs(struct srpt_send_ioctx *ioctx,
+		struct srp_direct_buf *db, int nbufs, struct scatterlist **sg,
+		unsigned *sg_cnt)
+{
+	enum dma_data_direction dir = target_reverse_dma_direction(&ioctx->cmd);
+	struct srpt_rdma_ch *ch = ioctx->ch;
+	struct scatterlist *prev = NULL;
+	unsigned prev_nents;
+	int ret, i;
+
+	if (nbufs == 1) {
+		ioctx->rw_ctxs = &ioctx->s_rw_ctx;
+	} else {
+		ioctx->rw_ctxs = kmalloc_array(nbufs, sizeof(*ioctx->rw_ctxs),
+				GFP_KERNEL);
+		if (!ioctx->rw_ctxs)
+			return -ENOMEM;
+	}
+
+	for (i = 0; i < nbufs; i++, db++) {
+		struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
+		u64 remote_addr = be64_to_cpu(db->va);
+		u32 size = be32_to_cpu(db->len);
+		u32 rkey = be32_to_cpu(db->key);
+
+		ret = target_alloc_sgl(&ctx->sg, &ctx->nents, size, false,
+				i < nbufs - 1);
+		if (ret)
+			goto unwind;
+
+		ret = rdma_rw_ctx_init(&ctx->rw, ch->qp, ch->sport->port,
+				ctx->sg, ctx->nents, 0, remote_addr, rkey, dir);
+		if (ret < 0) {
+			target_free_sgl(ctx->sg, ctx->nents);
+			goto unwind;
+		}
+
+		ioctx->n_rdma += ret;
+
+		if (prev) {
+			sg_unmark_end(&prev[prev_nents - 1]);
+			sg_chain(prev, prev_nents + 1, ctx->sg);
+		} else {
+			*sg = ctx->sg;
+		}
+
+		prev = ctx->sg;
+		prev_nents = ctx->nents;
+
+		*sg_cnt += ctx->nents;
+	}
+
+	ioctx->n_rw_ctx = nbufs;
+	return 0;
+
+unwind:
+	while (--i >= 0) {
+		struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
+
+		rdma_rw_ctx_destroy(&ctx->rw, ch->qp, ch->sport->port,
+				ctx->sg, ctx->nents, dir);
+		target_free_sgl(ctx->sg, ctx->nents);
+	}
+	if (ioctx->rw_ctxs != &ioctx->s_rw_ctx)
+		kfree(ioctx->rw_ctxs);
+	return ret;
+}
+
+static void srpt_free_rw_ctxs(struct srpt_rdma_ch *ch,
+				    struct srpt_send_ioctx *ioctx)
+{
+	enum dma_data_direction dir = target_reverse_dma_direction(&ioctx->cmd);
+	int i;
+
+	for (i = 0; i < ioctx->n_rw_ctx; i++) {
+		struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
+
+		rdma_rw_ctx_destroy(&ctx->rw, ch->qp, ch->sport->port,
+				ctx->sg, ctx->nents, dir);
+		target_free_sgl(ctx->sg, ctx->nents);
+	}
+
+	if (ioctx->rw_ctxs != &ioctx->s_rw_ctx)
+		kfree(ioctx->rw_ctxs);
+}
+
+static inline void *srpt_get_desc_buf(struct srp_cmd *srp_cmd)
+{
+	/*
+	 * The pointer computations below will only be compiled correctly
+	 * if srp_cmd::add_data is declared as s8*, u8*, s8[] or u8[], so check
+	 * whether srp_cmd::add_data has been declared as a byte pointer.
+	 */
+	BUILD_BUG_ON(!__same_type(srp_cmd->add_data[0], (s8)0) &&
+		     !__same_type(srp_cmd->add_data[0], (u8)0));
+
+	/*
+	 * According to the SRP spec, the lower two bits of the 'ADDITIONAL
+	 * CDB LENGTH' field are reserved and the size in bytes of this field
+	 * is four times the value specified in bits 3..7. Hence the "& ~3".
+	 */
+	return srp_cmd->add_data + (srp_cmd->add_cdb_len & ~3);
+}
+
 /**
  * srpt_get_desc_tbl() - Parse the data descriptors of an SRP_CMD request.
  * @ioctx: Pointer to the I/O context associated with the request.
@@ -858,94 +916,59 @@ static void srpt_zerolength_write_done(struct ib_cq *cq, struct ib_wc *wc)
  * -ENOMEM when memory allocation fails and zero upon success.
  */
 static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx,
-			     struct srp_cmd *srp_cmd,
-			     enum dma_data_direction *dir, u64 *data_len)
+		struct srp_cmd *srp_cmd, enum dma_data_direction *dir,
+		struct scatterlist **sg, unsigned *sg_cnt, u64 *data_len)
 {
-	struct srp_indirect_buf *idb;
-	struct srp_direct_buf *db;
-	unsigned add_cdb_offset;
-	int ret;
-
-	/*
-	 * The pointer computations below will only be compiled correctly
-	 * if srp_cmd::add_data is declared as s8*, u8*, s8[] or u8[], so check
-	 * whether srp_cmd::add_data has been declared as a byte pointer.
-	 */
-	BUILD_BUG_ON(!__same_type(srp_cmd->add_data[0], (s8)0)
-		     && !__same_type(srp_cmd->add_data[0], (u8)0));
-
 	BUG_ON(!dir);
 	BUG_ON(!data_len);
 
-	ret = 0;
-	*data_len = 0;
-
 	/*
 	 * The lower four bits of the buffer format field contain the DATA-IN
 	 * buffer descriptor format, and the highest four bits contain the
 	 * DATA-OUT buffer descriptor format.
 	 */
-	*dir = DMA_NONE;
 	if (srp_cmd->buf_fmt & 0xf)
 		/* DATA-IN: transfer data from target to initiator (read). */
 		*dir = DMA_FROM_DEVICE;
 	else if (srp_cmd->buf_fmt >> 4)
 		/* DATA-OUT: transfer data from initiator to target (write). */
 		*dir = DMA_TO_DEVICE;
+	else
+		*dir = DMA_NONE;
+
+	/* initialize data_direction early as srpt_alloc_rw_ctxs needs it */
+	ioctx->cmd.data_direction = *dir;
 
-	/*
-	 * According to the SRP spec, the lower two bits of the 'ADDITIONAL
-	 * CDB LENGTH' field are reserved and the size in bytes of this field
-	 * is four times the value specified in bits 3..7. Hence the "& ~3".
-	 */
-	add_cdb_offset = srp_cmd->add_cdb_len & ~3;
 	if (((srp_cmd->buf_fmt & 0xf) == SRP_DATA_DESC_DIRECT) ||
 	    ((srp_cmd->buf_fmt >> 4) == SRP_DATA_DESC_DIRECT)) {
-		ioctx->n_rbuf = 1;
-		ioctx->rbufs = &ioctx->single_rbuf;
+		struct srp_direct_buf *db = srpt_get_desc_buf(srp_cmd);
 
-		db = (struct srp_direct_buf *)(srp_cmd->add_data
-					       + add_cdb_offset);
-		memcpy(ioctx->rbufs, db, sizeof(*db));
 		*data_len = be32_to_cpu(db->len);
+		return srpt_alloc_rw_ctxs(ioctx, db, 1, sg, sg_cnt);
 	} else if (((srp_cmd->buf_fmt & 0xf) == SRP_DATA_DESC_INDIRECT) ||
 		   ((srp_cmd->buf_fmt >> 4) == SRP_DATA_DESC_INDIRECT)) {
-		idb = (struct srp_indirect_buf *)(srp_cmd->add_data
-						  + add_cdb_offset);
+		struct srp_indirect_buf *idb = srpt_get_desc_buf(srp_cmd);
+		int nbufs = be32_to_cpu(idb->table_desc.len) /
+				sizeof(struct srp_direct_buf);
 
-		ioctx->n_rbuf = be32_to_cpu(idb->table_desc.len) / sizeof(*db);
-
-		if (ioctx->n_rbuf >
+		if (nbufs >
 		    (srp_cmd->data_out_desc_cnt + srp_cmd->data_in_desc_cnt)) {
 			pr_err("received unsupported SRP_CMD request"
 			       " type (%u out + %u in != %u / %zu)\n",
 			       srp_cmd->data_out_desc_cnt,
 			       srp_cmd->data_in_desc_cnt,
 			       be32_to_cpu(idb->table_desc.len),
-			       sizeof(*db));
-			ioctx->n_rbuf = 0;
-			ret = -EINVAL;
-			goto out;
-		}
-
-		if (ioctx->n_rbuf == 1)
-			ioctx->rbufs = &ioctx->single_rbuf;
-		else {
-			ioctx->rbufs =
-				kmalloc(ioctx->n_rbuf * sizeof(*db), GFP_ATOMIC);
-			if (!ioctx->rbufs) {
-				ioctx->n_rbuf = 0;
-				ret = -ENOMEM;
-				goto out;
-			}
+			       sizeof(struct srp_direct_buf));
+			return -EINVAL;
 		}
 
-		db = idb->desc_list;
-		memcpy(ioctx->rbufs, db, ioctx->n_rbuf * sizeof(*db));
 		*data_len = be32_to_cpu(idb->len);
+		return srpt_alloc_rw_ctxs(ioctx, idb->desc_list, nbufs,
+				sg, sg_cnt);
+	} else {
+		*data_len = 0;
+		return 0;
 	}
-out:
-	return ret;
 }
 
 /**
@@ -1049,217 +1072,6 @@ static int srpt_ch_qp_err(struct srpt_rdma_ch *ch)
 }
 
 /**
- * srpt_unmap_sg_to_ib_sge() - Unmap an IB SGE list.
- */
-static void srpt_unmap_sg_to_ib_sge(struct srpt_rdma_ch *ch,
-				    struct srpt_send_ioctx *ioctx)
-{
-	struct scatterlist *sg;
-	enum dma_data_direction dir;
-
-	BUG_ON(!ch);
-	BUG_ON(!ioctx);
-	BUG_ON(ioctx->n_rdma && !ioctx->rdma_wrs);
-
-	while (ioctx->n_rdma)
-		kfree(ioctx->rdma_wrs[--ioctx->n_rdma].wr.sg_list);
-
-	kfree(ioctx->rdma_wrs);
-	ioctx->rdma_wrs = NULL;
-
-	if (ioctx->mapped_sg_count) {
-		sg = ioctx->sg;
-		WARN_ON(!sg);
-		dir = ioctx->cmd.data_direction;
-		BUG_ON(dir == DMA_NONE);
-		ib_dma_unmap_sg(ch->sport->sdev->device, sg, ioctx->sg_cnt,
-				target_reverse_dma_direction(&ioctx->cmd));
-		ioctx->mapped_sg_count = 0;
-	}
-}
-
-/**
- * srpt_map_sg_to_ib_sge() - Map an SG list to an IB SGE list.
- */
-static int srpt_map_sg_to_ib_sge(struct srpt_rdma_ch *ch,
-				 struct srpt_send_ioctx *ioctx)
-{
-	struct ib_device *dev = ch->sport->sdev->device;
-	struct se_cmd *cmd;
-	struct scatterlist *sg, *sg_orig;
-	int sg_cnt;
-	enum dma_data_direction dir;
-	struct ib_rdma_wr *riu;
-	struct srp_direct_buf *db;
-	dma_addr_t dma_addr;
-	struct ib_sge *sge;
-	u64 raddr;
-	u32 rsize;
-	u32 tsize;
-	u32 dma_len;
-	int count, nrdma;
-	int i, j, k;
-
-	BUG_ON(!ch);
-	BUG_ON(!ioctx);
-	cmd = &ioctx->cmd;
-	dir = cmd->data_direction;
-	BUG_ON(dir == DMA_NONE);
-
-	ioctx->sg = sg = sg_orig = cmd->t_data_sg;
-	ioctx->sg_cnt = sg_cnt = cmd->t_data_nents;
-
-	count = ib_dma_map_sg(ch->sport->sdev->device, sg, sg_cnt,
-			      target_reverse_dma_direction(cmd));
-	if (unlikely(!count))
-		return -EAGAIN;
-
-	ioctx->mapped_sg_count = count;
-
-	if (ioctx->rdma_wrs && ioctx->n_rdma_wrs)
-		nrdma = ioctx->n_rdma_wrs;
-	else {
-		nrdma = (count + SRPT_DEF_SG_PER_WQE - 1) / SRPT_DEF_SG_PER_WQE
-			+ ioctx->n_rbuf;
-
-		ioctx->rdma_wrs = kcalloc(nrdma, sizeof(*ioctx->rdma_wrs),
-				GFP_KERNEL);
-		if (!ioctx->rdma_wrs)
-			goto free_mem;
-
-		ioctx->n_rdma_wrs = nrdma;
-	}
-
-	db = ioctx->rbufs;
-	tsize = cmd->data_length;
-	dma_len = ib_sg_dma_len(dev, &sg[0]);
-	riu = ioctx->rdma_wrs;
-
-	/*
-	 * For each remote desc - calculate the #ib_sge.
-	 * If #ib_sge < SRPT_DEF_SG_PER_WQE per rdma operation then
-	 *      each remote desc rdma_iu is required a rdma wr;
-	 * else
-	 *      we need to allocate extra rdma_iu to carry extra #ib_sge in
-	 *      another rdma wr
-	 */
-	for (i = 0, j = 0;
-	     j < count && i < ioctx->n_rbuf && tsize > 0; ++i, ++riu, ++db) {
-		rsize = be32_to_cpu(db->len);
-		raddr = be64_to_cpu(db->va);
-		riu->remote_addr = raddr;
-		riu->rkey = be32_to_cpu(db->key);
-		riu->wr.num_sge = 0;
-
-		/* calculate how many sge required for this remote_buf */
-		while (rsize > 0 && tsize > 0) {
-
-			if (rsize >= dma_len) {
-				tsize -= dma_len;
-				rsize -= dma_len;
-				raddr += dma_len;
-
-				if (tsize > 0) {
-					++j;
-					if (j < count) {
-						sg = sg_next(sg);
-						dma_len = ib_sg_dma_len(
-								dev, sg);
-					}
-				}
-			} else {
-				tsize -= rsize;
-				dma_len -= rsize;
-				rsize = 0;
-			}
-
-			++riu->wr.num_sge;
-
-			if (rsize > 0 &&
-			    riu->wr.num_sge == SRPT_DEF_SG_PER_WQE) {
-				++ioctx->n_rdma;
-				riu->wr.sg_list = kmalloc_array(riu->wr.num_sge,
-						sizeof(*riu->wr.sg_list),
-						GFP_KERNEL);
-				if (!riu->wr.sg_list)
-					goto free_mem;
-
-				++riu;
-				riu->wr.num_sge = 0;
-				riu->remote_addr = raddr;
-				riu->rkey = be32_to_cpu(db->key);
-			}
-		}
-
-		++ioctx->n_rdma;
-		riu->wr.sg_list = kmalloc_array(riu->wr.num_sge,
-					sizeof(*riu->wr.sg_list),
-					GFP_KERNEL);
-		if (!riu->wr.sg_list)
-			goto free_mem;
-	}
-
-	db = ioctx->rbufs;
-	tsize = cmd->data_length;
-	riu = ioctx->rdma_wrs;
-	sg = sg_orig;
-	dma_len = ib_sg_dma_len(dev, &sg[0]);
-	dma_addr = ib_sg_dma_address(dev, &sg[0]);
-
-	/* this second loop is really mapped sg_addres to rdma_iu->ib_sge */
-	for (i = 0, j = 0;
-	     j < count && i < ioctx->n_rbuf && tsize > 0; ++i, ++riu, ++db) {
-		rsize = be32_to_cpu(db->len);
-		sge = riu->wr.sg_list;
-		k = 0;
-
-		while (rsize > 0 && tsize > 0) {
-			sge->addr = dma_addr;
-			sge->lkey = ch->sport->sdev->pd->local_dma_lkey;
-
-			if (rsize >= dma_len) {
-				sge->length =
-					(tsize < dma_len) ? tsize : dma_len;
-				tsize -= dma_len;
-				rsize -= dma_len;
-
-				if (tsize > 0) {
-					++j;
-					if (j < count) {
-						sg = sg_next(sg);
-						dma_len = ib_sg_dma_len(
-								dev, sg);
-						dma_addr = ib_sg_dma_address(
-								dev, sg);
-					}
-				}
-			} else {
-				sge->length = (tsize < rsize) ? tsize : rsize;
-				tsize -= rsize;
-				dma_len -= rsize;
-				dma_addr += rsize;
-				rsize = 0;
-			}
-
-			++k;
-			if (k == riu->wr.num_sge && rsize > 0 && tsize > 0) {
-				++riu;
-				sge = riu->wr.sg_list;
-				k = 0;
-			} else if (rsize > 0 && tsize > 0)
-				++sge;
-		}
-	}
-
-	return 0;
-
-free_mem:
-	srpt_unmap_sg_to_ib_sge(ch, ioctx);
-
-	return -ENOMEM;
-}
-
-/**
  * srpt_get_send_ioctx() - Obtain an I/O context for sending to the initiator.
  */
 static struct srpt_send_ioctx *srpt_get_send_ioctx(struct srpt_rdma_ch *ch)
@@ -1284,12 +1096,7 @@ static struct srpt_send_ioctx *srpt_get_send_ioctx(struct srpt_rdma_ch *ch)
 	BUG_ON(ioctx->ch != ch);
 	spin_lock_init(&ioctx->spinlock);
 	ioctx->state = SRPT_STATE_NEW;
-	ioctx->n_rbuf = 0;
-	ioctx->rbufs = NULL;
-	ioctx->n_rdma = 0;
-	ioctx->n_rdma_wrs = 0;
-	ioctx->rdma_wrs = NULL;
-	ioctx->mapped_sg_count = 0;
+	ioctx->n_rw_ctx = 0;
 	init_completion(&ioctx->tx_done);
 	ioctx->queue_status_only = false;
 	/*
@@ -1359,7 +1166,6 @@ static int srpt_abort_cmd(struct srpt_send_ioctx *ioctx)
 		 * SRP_RSP sending failed or the SRP_RSP send completion has
 		 * not been received in time.
 		 */
-		srpt_unmap_sg_to_ib_sge(ioctx->ch, ioctx);
 		transport_generic_free_cmd(&ioctx->cmd, 0);
 		break;
 	case SRPT_STATE_MGMT_RSP_SENT:
@@ -1387,6 +1193,7 @@ static void srpt_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	WARN_ON(ioctx->n_rdma <= 0);
 	atomic_add(ioctx->n_rdma, &ch->sq_wr_avail);
+	ioctx->n_rdma = 0;
 
 	if (unlikely(wc->status != IB_WC_SUCCESS)) {
 		pr_info("RDMA_READ for ioctx 0x%p failed with status %d\n",
@@ -1403,23 +1210,6 @@ static void srpt_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
 		       __LINE__, srpt_get_cmd_state(ioctx));
 }
 
-static void srpt_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
-{
-	struct srpt_send_ioctx *ioctx =
-		container_of(wc->wr_cqe, struct srpt_send_ioctx, rdma_cqe);
-
-	if (unlikely(wc->status != IB_WC_SUCCESS)) {
-		/*
-		 * Note: if an RDMA write error completion is received that
-		 * means that a SEND also has been posted. Defer further
-		 * processing of the associated command until the send error
-		 * completion has been received.
-		 */
-		pr_info("RDMA_WRITE for ioctx 0x%p failed with status %d\n",
-			ioctx, wc->status);
-	}
-}
-
 /**
  * srpt_build_cmd_rsp() - Build an SRP_RSP response.
  * @ch: RDMA channel through which the request has been received.
@@ -1531,12 +1321,14 @@ static int srpt_check_stop_free(struct se_cmd *cmd)
 /**
  * srpt_handle_cmd() - Process SRP_CMD.
  */
-static void srpt_handle_cmd(struct srpt_rdma_ch *ch,
+static int srpt_handle_cmd(struct srpt_rdma_ch *ch,
 			    struct srpt_recv_ioctx *recv_ioctx,
 			    struct srpt_send_ioctx *send_ioctx)
 {
 	struct se_cmd *cmd;
 	struct srp_cmd *srp_cmd;
+	struct scatterlist *sg = NULL;
+	unsigned sg_cnt = 0;
 	u64 data_len;
 	enum dma_data_direction dir;
 	int rc;
@@ -1563,26 +1355,34 @@ static void srpt_handle_cmd(struct srpt_rdma_ch *ch,
 		break;
 	}
 
-	if (srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &data_len)) {
-		pr_err("0x%llx: parsing SRP descriptor table failed.\n",
-		       srp_cmd->tag);
+	rc = srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &sg, &sg_cnt,
+			&data_len);
+	if (rc) {
+		if (rc != -EAGAIN) {
+			pr_err("0x%llx: parsing SRP descriptor table failed.\n",
+			       srp_cmd->tag);
+		} else {
+			printk_ratelimited("out of MRs for 0x%llx\n", srp_cmd->tag);
+		}
 		goto release_ioctx;
 	}
 
-	rc = target_submit_cmd(cmd, ch->sess, srp_cmd->cdb,
+	rc = target_submit_cmd_map_sgls(cmd, ch->sess, srp_cmd->cdb,
 			       &send_ioctx->sense_data[0],
 			       scsilun_to_int(&srp_cmd->lun), data_len,
-			       TCM_SIMPLE_TAG, dir, TARGET_SCF_ACK_KREF);
+			       TCM_SIMPLE_TAG, dir, TARGET_SCF_ACK_KREF,
+			       sg, sg_cnt, NULL, 0, NULL, 0);
 	if (rc != 0) {
 		pr_debug("target_submit_cmd() returned %d for tag %#llx\n", rc,
 			 srp_cmd->tag);
 		goto release_ioctx;
 	}
-	return;
+	return 0;
 
 release_ioctx:
 	send_ioctx->state = SRPT_STATE_DONE;
 	srpt_release_cmd(cmd);
+	return rc;
 }
 
 static int srp_tmr_to_tcm(int fn)
@@ -1664,28 +1464,24 @@ static void srpt_handle_new_iu(struct srpt_rdma_ch *ch,
 				   recv_ioctx->ioctx.dma, srp_max_req_size,
 				   DMA_FROM_DEVICE);
 
-	if (unlikely(ch->state == CH_CONNECTING)) {
-		list_add_tail(&recv_ioctx->wait_list, &ch->cmd_wait_list);
-		goto out;
-	}
+	if (unlikely(ch->state == CH_CONNECTING))
+		goto out_wait;
 
 	if (unlikely(ch->state != CH_LIVE))
-		goto out;
+		return;
 
 	srp_cmd = recv_ioctx->ioctx.buf;
 	if (srp_cmd->opcode == SRP_CMD || srp_cmd->opcode == SRP_TSK_MGMT) {
 		if (!send_ioctx)
 			send_ioctx = srpt_get_send_ioctx(ch);
-		if (unlikely(!send_ioctx)) {
-			list_add_tail(&recv_ioctx->wait_list,
-				      &ch->cmd_wait_list);
-			goto out;
-		}
+		if (unlikely(!send_ioctx))
+			goto out_wait;
 	}
 
 	switch (srp_cmd->opcode) {
 	case SRP_CMD:
-		srpt_handle_cmd(ch, recv_ioctx, send_ioctx);
+		if (srpt_handle_cmd(ch, recv_ioctx, send_ioctx) == -EAGAIN)
+			goto out_wait;
 		break;
 	case SRP_TSK_MGMT:
 		srpt_handle_tsk_mgmt(ch, recv_ioctx, send_ioctx);
@@ -1709,8 +1505,10 @@ static void srpt_handle_new_iu(struct srpt_rdma_ch *ch,
 	}
 
 	srpt_post_recv(ch->sport->sdev, recv_ioctx);
-out:
 	return;
+
+out_wait:
+	list_add_tail(&recv_ioctx->wait_list, &ch->cmd_wait_list);
 }
 
 static void srpt_recv_done(struct ib_cq *cq, struct ib_wc *wc)
@@ -1779,14 +1577,13 @@ static void srpt_send_done(struct ib_cq *cq, struct ib_wc *wc)
 	WARN_ON(state != SRPT_STATE_CMD_RSP_SENT &&
 		state != SRPT_STATE_MGMT_RSP_SENT);
 
-	atomic_inc(&ch->sq_wr_avail);
+	atomic_add(1 + ioctx->n_rdma, &ch->sq_wr_avail);
 
 	if (wc->status != IB_WC_SUCCESS)
 		pr_info("sending response for ioctx 0x%p failed"
 			" with status %d\n", ioctx, wc->status);
 
 	if (state != SRPT_STATE_DONE) {
-		srpt_unmap_sg_to_ib_sge(ch, ioctx);
 		transport_generic_free_cmd(&ioctx->cmd, 0);
 	} else {
 		pr_err("IB completion has been received too late for"
@@ -1832,8 +1629,18 @@ retry:
 	qp_init->srq = sdev->srq;
 	qp_init->sq_sig_type = IB_SIGNAL_REQ_WR;
 	qp_init->qp_type = IB_QPT_RC;
-	qp_init->cap.max_send_wr = srp_sq_size;
-	qp_init->cap.max_send_sge = SRPT_DEF_SG_PER_WQE;
+	/*
+	 * We divide up our send queue size into half SEND WRs to send the
+	 * completions, and half R/W contexts to actually do the RDMA
+	 * READ/WRITE transfers.  Note that we need to allocate CQ slots for
+	 * both, as RDMA contexts will also post completions for the
+	 * RDMA READ case.
+	 */
+	qp_init->cap.max_send_wr = srp_sq_size / 2;
+	qp_init->cap.max_rdma_ctxs = srp_sq_size / 2;
+	qp_init->cap.max_send_sge = max(sdev->device->attrs.max_sge_rd,
+					sdev->device->attrs.max_sge);
+	qp_init->port_num = ch->sport->port;
 
 	ch->qp = ib_create_qp(sdev->pd, qp_init);
 	if (IS_ERR(ch->qp)) {
@@ -2386,95 +2193,6 @@ static int srpt_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 	return ret;
 }
 
-/**
- * srpt_perform_rdmas() - Perform IB RDMA.
- *
- * Returns zero upon success or a negative number upon failure.
- */
-static int srpt_perform_rdmas(struct srpt_rdma_ch *ch,
-			      struct srpt_send_ioctx *ioctx)
-{
-	struct ib_send_wr *bad_wr;
-	int sq_wr_avail, ret, i;
-	enum dma_data_direction dir;
-	const int n_rdma = ioctx->n_rdma;
-
-	dir = ioctx->cmd.data_direction;
-	if (dir == DMA_TO_DEVICE) {
-		/* write */
-		ret = -ENOMEM;
-		sq_wr_avail = atomic_sub_return(n_rdma, &ch->sq_wr_avail);
-		if (sq_wr_avail < 0) {
-			pr_warn("IB send queue full (needed %d)\n",
-				n_rdma);
-			goto out;
-		}
-	}
-
-	for (i = 0; i < n_rdma; i++) {
-		struct ib_send_wr *wr = &ioctx->rdma_wrs[i].wr;
-
-		wr->opcode = (dir == DMA_FROM_DEVICE) ?
-				IB_WR_RDMA_WRITE : IB_WR_RDMA_READ;
-
-		if (i == n_rdma - 1) {
-			/* only get completion event for the last rdma read */
-			if (dir == DMA_TO_DEVICE) {
-				wr->send_flags = IB_SEND_SIGNALED;
-				ioctx->rdma_cqe.done = srpt_rdma_read_done;
-			} else {
-				ioctx->rdma_cqe.done = srpt_rdma_write_done;
-			}
-			wr->wr_cqe = &ioctx->rdma_cqe;
-			wr->next = NULL;
-		} else {
-			wr->wr_cqe = NULL;
-			wr->next = &ioctx->rdma_wrs[i + 1].wr;
-		}
-	}
-
-	ret = ib_post_send(ch->qp, &ioctx->rdma_wrs->wr, &bad_wr);
-	if (ret)
-		pr_err("%s[%d]: ib_post_send() returned %d for %d/%d\n",
-				 __func__, __LINE__, ret, i, n_rdma);
-out:
-	if (unlikely(dir == DMA_TO_DEVICE && ret < 0))
-		atomic_add(n_rdma, &ch->sq_wr_avail);
-	return ret;
-}
-
-/**
- * srpt_xfer_data() - Start data transfer from initiator to target.
- */
-static int srpt_xfer_data(struct srpt_rdma_ch *ch,
-			  struct srpt_send_ioctx *ioctx)
-{
-	int ret;
-
-	ret = srpt_map_sg_to_ib_sge(ch, ioctx);
-	if (ret) {
-		pr_err("%s[%d] ret=%d\n", __func__, __LINE__, ret);
-		goto out;
-	}
-
-	ret = srpt_perform_rdmas(ch, ioctx);
-	if (ret) {
-		if (ret == -EAGAIN || ret == -ENOMEM)
-			pr_info("%s[%d] queue full -- ret=%d\n",
-				__func__, __LINE__, ret);
-		else
-			pr_err("%s[%d] fatal error -- ret=%d\n",
-			       __func__, __LINE__, ret);
-		goto out_unmap;
-	}
-
-out:
-	return ret;
-out_unmap:
-	srpt_unmap_sg_to_ib_sge(ch, ioctx);
-	goto out;
-}
-
 static int srpt_write_pending_status(struct se_cmd *se_cmd)
 {
 	struct srpt_send_ioctx *ioctx;
@@ -2491,11 +2209,42 @@ static int srpt_write_pending(struct se_cmd *se_cmd)
 	struct srpt_send_ioctx *ioctx =
 		container_of(se_cmd, struct srpt_send_ioctx, cmd);
 	struct srpt_rdma_ch *ch = ioctx->ch;
+	struct ib_send_wr *first_wr = NULL, *bad_wr;
+	struct ib_cqe *cqe = &ioctx->rdma_cqe;
 	enum srpt_command_state new_state;
+	int ret, i;
 
 	new_state = srpt_set_cmd_state(ioctx, SRPT_STATE_NEED_DATA);
 	WARN_ON(new_state == SRPT_STATE_DONE);
-	return srpt_xfer_data(ch, ioctx);
+
+	if (atomic_sub_return(ioctx->n_rdma, &ch->sq_wr_avail) < 0) {
+		pr_warn("%s: IB send queue full (needed %d)\n",
+				__func__, ioctx->n_rdma);
+		ret = -ENOMEM;
+		goto out_undo;
+	}
+
+	cqe->done = srpt_rdma_read_done;
+	for (i = ioctx->n_rw_ctx - 1; i >= 0; i--) {
+		struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
+
+		first_wr = rdma_rw_ctx_wrs(&ctx->rw, ch->qp, ch->sport->port,
+				cqe, first_wr);
+		cqe = NULL;
+	}
+
+	ret = ib_post_send(ch->qp, first_wr, &bad_wr);
+	if (ret) {
+		pr_err("%s: ib_post_send() returned %d for %d (avail: %d)\n",
+			 __func__, ret, ioctx->n_rdma,
+			 atomic_read(&ch->sq_wr_avail));
+		goto out_undo;
+	}
+
+	return 0;
+out_undo:
+	atomic_add(ioctx->n_rdma, &ch->sq_wr_avail);
+	return ret;
 }
 
 static u8 tcm_to_srp_tsk_mgmt_status(const int tcm_mgmt_status)
@@ -2517,17 +2266,17 @@ static u8 tcm_to_srp_tsk_mgmt_status(const int tcm_mgmt_status)
  */
 static void srpt_queue_response(struct se_cmd *cmd)
 {
-	struct srpt_rdma_ch *ch;
-	struct srpt_send_ioctx *ioctx;
+	struct srpt_send_ioctx *ioctx =
+		container_of(cmd, struct srpt_send_ioctx, cmd);
+	struct srpt_rdma_ch *ch = ioctx->ch;
+	struct srpt_device *sdev = ch->sport->sdev;
+	struct ib_send_wr send_wr, *first_wr = NULL, *bad_wr;
+	struct ib_sge sge;
 	enum srpt_command_state state;
 	unsigned long flags;
-	int ret;
-	enum dma_data_direction dir;
-	int resp_len;
+	int resp_len, ret, i;
 	u8 srp_tm_status;
 
-	ioctx = container_of(cmd, struct srpt_send_ioctx, cmd);
-	ch = ioctx->ch;
 	BUG_ON(!ch);
 
 	spin_lock_irqsave(&ioctx->spinlock, flags);
@@ -2554,17 +2303,19 @@ static void srpt_queue_response(struct se_cmd *cmd)
 		return;
 	}
 
-	dir = ioctx->cmd.data_direction;
-
 	/* For read commands, transfer the data to the initiator. */
-	if (dir == DMA_FROM_DEVICE && ioctx->cmd.data_length &&
+	if (ioctx->cmd.data_direction == DMA_FROM_DEVICE &&
+	    ioctx->cmd.data_length &&
 	    !ioctx->queue_status_only) {
-		ret = srpt_xfer_data(ch, ioctx);
-		if (ret) {
-			pr_err("xfer_data failed for tag %llu\n",
-			       ioctx->cmd.tag);
-			return;
+		for (i = ioctx->n_rw_ctx - 1; i >= 0; i--) {
+			struct srpt_rw_ctx *ctx = &ioctx->rw_ctxs[i];
+
+			first_wr = rdma_rw_ctx_wrs(&ctx->rw, ch->qp,
+					ch->sport->port, NULL,
+					first_wr ? first_wr : &send_wr);
 		}
+	} else {
+		first_wr = &send_wr;
 	}
 
 	if (state != SRPT_STATE_MGMT)
@@ -2576,14 +2327,46 @@ static void srpt_queue_response(struct se_cmd *cmd)
 		resp_len = srpt_build_tskmgmt_rsp(ch, ioctx, srp_tm_status,
 						 ioctx->cmd.tag);
 	}
-	ret = srpt_post_send(ch, ioctx, resp_len);
-	if (ret) {
-		pr_err("sending cmd response failed for tag %llu\n",
-		       ioctx->cmd.tag);
-		srpt_unmap_sg_to_ib_sge(ch, ioctx);
-		srpt_set_cmd_state(ioctx, SRPT_STATE_DONE);
-		target_put_sess_cmd(&ioctx->cmd);
+
+	atomic_inc(&ch->req_lim);
+
+	if (unlikely(atomic_sub_return(1 + ioctx->n_rdma,
+			&ch->sq_wr_avail) < 0)) {
+		pr_warn("%s: IB send queue full (needed %d)\n",
+				__func__, ioctx->n_rdma);
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ib_dma_sync_single_for_device(sdev->device, ioctx->ioctx.dma, resp_len,
+				      DMA_TO_DEVICE);
+
+	sge.addr = ioctx->ioctx.dma;
+	sge.length = resp_len;
+	sge.lkey = sdev->pd->local_dma_lkey;
+
+	ioctx->ioctx.cqe.done = srpt_send_done;
+	send_wr.next = NULL;
+	send_wr.wr_cqe = &ioctx->ioctx.cqe;
+	send_wr.sg_list = &sge;
+	send_wr.num_sge = 1;
+	send_wr.opcode = IB_WR_SEND;
+	send_wr.send_flags = IB_SEND_SIGNALED;
+
+	ret = ib_post_send(ch->qp, first_wr, &bad_wr);
+	if (ret < 0) {
+		pr_err("%s: sending cmd response failed for tag %llu (%d)\n",
+			__func__, ioctx->cmd.tag, ret);
+		goto out;
 	}
+
+	return;
+
+out:
+	atomic_add(1 + ioctx->n_rdma, &ch->sq_wr_avail);
+	atomic_dec(&ch->req_lim);
+	srpt_set_cmd_state(ioctx, SRPT_STATE_DONE);
+	target_put_sess_cmd(&ioctx->cmd);
 }
 
 static int srpt_queue_data_in(struct se_cmd *cmd)
@@ -2599,10 +2382,6 @@ static void srpt_queue_tm_rsp(struct se_cmd *cmd)
 
 static void srpt_aborted_task(struct se_cmd *cmd)
 {
-	struct srpt_send_ioctx *ioctx = container_of(cmd,
-				struct srpt_send_ioctx, cmd);
-
-	srpt_unmap_sg_to_ib_sge(ioctx->ch, ioctx);
 }
 
 static int srpt_queue_status(struct se_cmd *cmd)
@@ -2903,12 +2682,10 @@ static void srpt_release_cmd(struct se_cmd *se_cmd)
 	unsigned long flags;
 
 	WARN_ON(ioctx->state != SRPT_STATE_DONE);
-	WARN_ON(ioctx->mapped_sg_count != 0);
 
-	if (ioctx->n_rbuf > 1) {
-		kfree(ioctx->rbufs);
-		ioctx->rbufs = NULL;
-		ioctx->n_rbuf = 0;
+	if (ioctx->n_rw_ctx) {
+		srpt_free_rw_ctxs(ch, ioctx);
+		ioctx->n_rw_ctx = 0;
 	}
 
 	spin_lock_irqsave(&ch->spinlock, flags);
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h
index af9b8b5..fee6bfd 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.h
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
@@ -42,6 +42,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_sa.h>
 #include <rdma/ib_cm.h>
+#include <rdma/rw.h>
 
 #include <scsi/srp.h>
 
@@ -105,7 +106,6 @@ enum {
 	SRP_LOGIN_RSP_MULTICHAN_MAINTAINED = 0x2,
 
 	SRPT_DEF_SG_TABLESIZE = 128,
-	SRPT_DEF_SG_PER_WQE = 16,
 
 	MIN_SRPT_SQ_SIZE = 16,
 	DEF_SRPT_SQ_SIZE = 4096,
@@ -174,21 +174,17 @@ struct srpt_recv_ioctx {
 	struct srpt_ioctx	ioctx;
 	struct list_head	wait_list;
 };
+
+struct srpt_rw_ctx {
+	struct rdma_rw_ctx	rw;
+	struct scatterlist	*sg;
+	unsigned int		nents;
+};
 
 /**
  * struct srpt_send_ioctx - SRPT send I/O context.
  * @ioctx:       See above.
  * @ch:          Channel pointer.
- * @free_list:   Node in srpt_rdma_ch.free_list.
- * @n_rbuf:      Number of data buffers in the received SRP command.
- * @rbufs:       Pointer to SRP data buffer array.
- * @single_rbuf: SRP data buffer if the command has only a single buffer.
- * @sg:          Pointer to sg-list associated with this I/O context.
- * @sg_cnt:      SG-list size.
- * @mapped_sg_count: ib_dma_map_sg() return value.
- * @n_rdma_wrs:  Number of elements in the rdma_wrs array.
- * @rdma_wrs:    Array with information about the RDMA mapping.
- * @tag:         Tag of the received SRP information unit.
  * @spinlock:    Protects 'state'.
  * @state:       I/O context state.
  * @cmd:         Target core command data structure.
@@ -197,21 +193,18 @@ struct srpt_recv_ioctx {
 struct srpt_send_ioctx {
 	struct srpt_ioctx	ioctx;
 	struct srpt_rdma_ch	*ch;
-	struct ib_rdma_wr	*rdma_wrs;
+
+	struct srpt_rw_ctx	s_rw_ctx;
+	struct srpt_rw_ctx	*rw_ctxs;
+
 	struct ib_cqe		rdma_cqe;
-	struct srp_direct_buf	*rbufs;
-	struct srp_direct_buf	single_rbuf;
-	struct scatterlist	*sg;
 	struct list_head	free_list;
 	spinlock_t		spinlock;
 	enum srpt_command_state	state;
 	struct se_cmd		cmd;
 	struct completion	tx_done;
-	int			sg_cnt;
-	int			mapped_sg_count;
-	u16			n_rdma_wrs;
 	u8			n_rdma;
-	u8			n_rbuf;
+	u8			n_rw_ctx;
 	bool			queue_status_only;
 	u8			sense_data[TRANSPORT_SENSE_BUFFER];
 };
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 11/12] IB/core: add RW API support for signature MRs
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2016-04-11 21:32 ` [PATCH 10/12] IB/srpt: convert to the generic RDMA READ/WRITE API Christoph Hellwig
@ 2016-04-11 21:32 ` Christoph Hellwig
       [not found]   ` <1460410360-13104-12-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-04-11 21:32 ` [PATCH 12/12] IB/isert: convert to the generic RDMA READ/WRITE API Christoph Hellwig
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford; +Cc: bart.vanassche, swise, sagi, linux-rdma, target-devel

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/core/rw.c    | 226 +++++++++++++++++++++++++++++++++++++++-
 drivers/infiniband/core/verbs.c |   1 +
 include/rdma/ib_verbs.h         |   1 +
 include/rdma/rw.h               |  20 ++++
 4 files changed, 243 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index a5a094b..7a999a5 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -319,6 +319,146 @@ out_unmap_sg:
 }
 EXPORT_SYMBOL(rdma_rw_ctx_init);
 
+/**
+ * rdma_rw_ctx_signature_init - initialize a RW context with signature offload
+ * @ctx:	context to initialize
+ * @qp:		queue pair to operate on
+ * @port_num:	port num to which the connection is bound
+ * @sg:		scatterlist to READ/WRITE from/to
+ * @sg_cnt:	number of entries in @sg
+ * @prot_sg:	scatterlist to READ/WRITE protection information from/to
+ * @prot_sg_cnt: number of entries in @prot_sg
+ * @sig_attrs:	signature offloading algorithms
+ * @remote_addr:remote address to read/write (relative to @rkey)
+ * @rkey:	remote key to operate on
+ * @dir:	%DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
+ *
+ * Returns the number of WQEs that will be needed on the workqueue if
+ * successful, or a negative error code.
+ */
+int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u8 port_num, struct scatterlist *sg, u32 sg_cnt,
+		struct scatterlist *prot_sg, u32 prot_sg_cnt,
+		struct ib_sig_attrs *sig_attrs,
+		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
+{
+	struct ib_device *dev = qp->pd->device;
+	u32 pages_per_mr = rdma_rw_fr_page_list_len(qp->pd->device);
+	struct ib_rdma_wr *rdma_wr;
+	struct ib_send_wr *prev_wr = NULL;
+	int count = 0, ret;
+
+	if (sg_cnt > pages_per_mr || prot_sg_cnt > pages_per_mr) {
+		pr_err("SG count too large\n");
+		return -EINVAL;
+	}
+
+	ret = ib_dma_map_sg(dev, sg, sg_cnt, dir);
+	if (!ret)
+		return -ENOMEM;
+	sg_cnt = ret;
+
+	ret = ib_dma_map_sg(dev, prot_sg, prot_sg_cnt, dir);
+	if (!ret) {
+		ret = -ENOMEM;
+		goto out_unmap_sg;
+	}
+	prot_sg_cnt = ret;
+
+	ctx->type = RDMA_RW_SIG_MR;
+	ctx->nr_ops = 1;
+	ctx->sig = kcalloc(1, sizeof(*ctx->sig), GFP_KERNEL);
+	if (!ctx->sig) {
+		ret = -ENOMEM;
+		goto out_unmap_prot_sg;
+	}
+
+	ret = rdma_rw_init_one_mr(qp, port_num, &ctx->sig->data, sg, sg_cnt, 0);
+	if (ret < 0)
+		goto out_free_ctx;
+	count += ret;
+	prev_wr = &ctx->sig->data.reg_wr.wr;
+
+	if (prot_sg_cnt) {
+		ret = rdma_rw_init_one_mr(qp, port_num, &ctx->sig->prot,
+				prot_sg, prot_sg_cnt, 0);
+		if (ret < 0)
+			goto out_destroy_data_mr;
+		count += ret;
+
+		if (ctx->sig->prot.inv_wr.next)
+			prev_wr->next = &ctx->sig->prot.inv_wr;
+		else
+			prev_wr->next = &ctx->sig->prot.reg_wr.wr;
+		prev_wr = &ctx->sig->prot.reg_wr.wr;
+	} else {
+		ctx->sig->prot.mr = NULL;
+	}
+
+	ctx->sig->sig_mr = ib_mr_pool_get(qp, &qp->sig_mrs);
+	if (!ctx->sig->sig_mr) {
+		ret = -EAGAIN;
+		goto out_destroy_prot_mr;
+	}
+
+	if (ctx->sig->sig_mr->need_inval) {
+		memset(&ctx->sig->sig_inv_wr, 0, sizeof(ctx->sig->sig_inv_wr));
+
+		ctx->sig->sig_inv_wr.opcode = IB_WR_LOCAL_INV;
+		ctx->sig->sig_inv_wr.ex.invalidate_rkey = ctx->sig->sig_mr->rkey;
+
+		prev_wr->next = &ctx->sig->sig_inv_wr;
+		prev_wr = &ctx->sig->sig_inv_wr;
+	}
+
+	ctx->sig->sig_wr.wr.opcode = IB_WR_REG_SIG_MR;
+	ctx->sig->sig_wr.wr.wr_cqe = NULL;
+	ctx->sig->sig_wr.wr.sg_list = &ctx->sig->data.sge;
+	ctx->sig->sig_wr.wr.num_sge = 1;
+	ctx->sig->sig_wr.access_flags = IB_ACCESS_LOCAL_WRITE;
+	ctx->sig->sig_wr.sig_attrs = sig_attrs;
+	ctx->sig->sig_wr.sig_mr = ctx->sig->sig_mr;
+	if (prot_sg_cnt)
+		ctx->sig->sig_wr.prot = &ctx->sig->prot.sge;
+	prev_wr->next = &ctx->sig->sig_wr.wr;
+	prev_wr = &ctx->sig->sig_wr.wr;
+	count++;
+
+	ctx->sig->sig_sge.addr = 0;
+	ctx->sig->sig_sge.length = ctx->sig->data.sge.length;
+	if (sig_attrs->wire.sig_type != IB_SIG_TYPE_NONE)
+		ctx->sig->sig_sge.length += ctx->sig->prot.sge.length;
+
+	rdma_wr = &ctx->sig->data.wr;
+	rdma_wr->wr.sg_list = &ctx->sig->sig_sge;
+	rdma_wr->wr.num_sge = 1;
+	rdma_wr->remote_addr = remote_addr;
+	rdma_wr->rkey = rkey;
+	if (dir == DMA_TO_DEVICE)
+		rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
+	else
+		rdma_wr->wr.opcode = IB_WR_RDMA_READ;
+	prev_wr->next = &rdma_wr->wr;
+	prev_wr = &rdma_wr->wr;
+	count++;
+
+	return count;
+
+out_destroy_prot_mr:
+	if (prot_sg_cnt)
+		ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->sig->prot.mr);
+out_destroy_data_mr:
+	ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->sig->data.mr);
+out_free_ctx:
+	kfree(ctx->sig);
+out_unmap_prot_sg:
+	ib_dma_unmap_sg(dev, prot_sg, prot_sg_cnt, dir);
+out_unmap_sg:
+	ib_dma_unmap_sg(dev, sg, sg_cnt, dir);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_rw_ctx_signature_init);
+
 /*
  * Now that we are going to post the WRs we can update the lkey and need_inval
  * state on the MRs.  If we were doing this at init time, we would get double
@@ -354,6 +494,22 @@ struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 	int i;
 
 	switch (ctx->type) {
+	case RDMA_RW_SIG_MR:
+		rdma_rw_update_lkey(&ctx->sig->data, true);
+		if (ctx->sig->prot.mr)
+			rdma_rw_update_lkey(&ctx->sig->prot, true);
+
+		ctx->sig->sig_mr->need_inval = true;
+		ib_update_fast_reg_key(ctx->sig->sig_mr,
+			ib_inc_rkey(ctx->sig->sig_mr->lkey));
+		ctx->sig->sig_sge.lkey = ctx->sig->sig_mr->lkey;
+
+		if (ctx->sig->data.inv_wr.next)
+			first_wr = &ctx->sig->data.inv_wr;
+		else
+			first_wr = &ctx->sig->data.reg_wr.wr;
+		last_wr = &ctx->sig->data.wr.wr;
+		break;
 	case RDMA_RW_MR:
 		for (i = 0; i < ctx->nr_ops; i++) {
 			rdma_rw_update_lkey(&ctx->reg[i],
@@ -449,6 +605,38 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy);
 
+/**
+ * rdma_rw_ctx_destroy_signature - release all resources allocated by
+ *	rdma_rw_ctx_signature_init
+ * @ctx:	context to release
+ * @qp:		queue pair to operate on
+ * @port_num:	port num to which the connection is bound
+ * @sg:		scatterlist that was used for the READ/WRITE
+ * @sg_cnt:	number of entries in @sg
+ * @prot_sg:	scatterlist that was used for the READ/WRITE of the PI
+ * @prot_sg_cnt: number of entries in @prot_sg
+ * @dir:	%DMA_TO_DEVICE for RDMA WRITE, %DMA_FROM_DEVICE for RDMA READ
+ */
+void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u8 port_num, struct scatterlist *sg, u32 sg_cnt,
+		struct scatterlist *prot_sg, u32 prot_sg_cnt,
+		enum dma_data_direction dir)
+{
+	if (WARN_ON_ONCE(ctx->type != RDMA_RW_SIG_MR))
+		return;
+
+	ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->sig->data.mr);
+	if (ctx->sig->prot.mr)
+		ib_mr_pool_put(qp, &qp->rdma_mrs, ctx->sig->prot.mr);
+	ib_mr_pool_put(qp, &qp->sig_mrs, ctx->sig->sig_mr);
+
+	if (ctx->sig->prot.mr)
+		ib_dma_unmap_sg(qp->pd->device, prot_sg, prot_sg_cnt, dir);
+	ib_dma_unmap_sg(qp->pd->device, sg, sg_cnt, dir);
+	kfree(ctx->sig);
+}
+EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
+
 void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr)
 {
 	u32 factor;
@@ -468,7 +656,9 @@ void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr)
 	 * we'll need two additional MRs for the registrations and the
 	 * invalidation.
 	 */
-	if (rdma_rw_can_use_mr(dev, attr->port_num))
+	if (attr->create_flags & IB_QP_CREATE_SIGNATURE_EN)
+		factor += 6;	/* (inv + reg) * (data + prot + sig) */
+	else if (rdma_rw_can_use_mr(dev, attr->port_num))
 		factor += 2;	/* inv + reg */
 
 	attr->cap.max_send_wr += factor * attr->cap.max_rdma_ctxs;
@@ -484,20 +674,46 @@ void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr)
 int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr)
 {
 	struct ib_device *dev = qp->pd->device;
+	u32 nr_mrs = 0, nr_sig_mrs = 0;
 	int ret = 0;
 
-	if (rdma_rw_can_use_mr(dev, attr->port_num)) {
-		ret = ib_mr_pool_init(qp, &qp->rdma_mrs,
-				attr->cap.max_rdma_ctxs, IB_MR_TYPE_MEM_REG,
+	if (attr->create_flags & IB_QP_CREATE_SIGNATURE_EN) {
+		nr_sig_mrs = attr->cap.max_rdma_ctxs;
+		nr_mrs = attr->cap.max_rdma_ctxs * 2;
+	} else if (rdma_rw_can_use_mr(dev, attr->port_num)) {
+		nr_mrs = attr->cap.max_rdma_ctxs;
+	}
+
+	if (nr_mrs) {
+		ret = ib_mr_pool_init(qp, &qp->rdma_mrs, nr_mrs,
+				IB_MR_TYPE_MEM_REG,
 				rdma_rw_fr_page_list_len(dev));
-		if (ret)
+		if (ret) {
+			pr_err("%s: failed to allocate %d MRs\n",
+				__func__, nr_mrs);
 			return ret;
+		}
 	}
 
+	if (nr_sig_mrs) {
+		ret = ib_mr_pool_init(qp, &qp->sig_mrs, nr_sig_mrs,
+				IB_MR_TYPE_SIGNATURE, 2);
+		if (ret) {
+			pr_err("%s: failed to allocate %d SIG MRs\n",
+				__func__, nr_sig_mrs);
+			goto out_free_rdma_mrs;
+		}
+	}
+
+	return 0;
+
+out_free_rdma_mrs:
+	ib_mr_pool_destroy(qp, &qp->rdma_mrs);
 	return ret;
 }
 
 void rdma_rw_cleanup_mrs(struct ib_qp *qp)
 {
+	ib_mr_pool_destroy(qp, &qp->sig_mrs);
 	ib_mr_pool_destroy(qp, &qp->rdma_mrs);
 }
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 769b000..e2b6634 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -776,6 +776,7 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
 	qp->mrs_used = 0;
 	spin_lock_init(&qp->mr_lock);
 	INIT_LIST_HEAD(&qp->rdma_mrs);
+	INIT_LIST_HEAD(&qp->sig_mrs);
 
 	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
 		return ib_create_xrc_qp(qp, qp_init_attr);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index dd8e15d..544c55b 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1435,6 +1435,7 @@ struct ib_qp {
 	spinlock_t		mr_lock;
 	int			mrs_used;
 	struct list_head	rdma_mrs;
+	struct list_head	sig_mrs;
 	struct ib_srq	       *srq;
 	struct ib_xrcd	       *xrcd; /* XRC TGT QPs only */
 	struct list_head	xrcd_list;
diff --git a/include/rdma/rw.h b/include/rdma/rw.h
index 5e93146..3d75420 100644
--- a/include/rdma/rw.h
+++ b/include/rdma/rw.h
@@ -22,6 +22,7 @@
 #define RDMA_RW_SINGLE_WR	0
 #define RDMA_RW_MULTI_WR	1
 #define RDMA_RW_MR		2
+#define RDMA_RW_SIG_MR		3
 
 struct rdma_rw_ctx {
 	/* number of RDMA READ/WRITE WRs (not counting MR WRs) */
@@ -51,6 +52,15 @@ struct rdma_rw_ctx {
 			struct ib_send_wr	inv_wr;
 			struct ib_mr		*mr;
 		} *reg;
+
+		struct {
+			struct rdma_rw_reg_ctx	data;
+			struct rdma_rw_reg_ctx	prot;
+			struct ib_send_wr	sig_inv_wr;
+			struct ib_mr		*sig_mr;
+			struct ib_sge		sig_sge;
+			struct ib_sig_handover_wr sig_wr;
+		} *sig;
 	};
 };
 
@@ -61,6 +71,16 @@ void rdma_rw_ctx_destroy(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 		struct scatterlist *sg, u32 sg_cnt,
 		enum dma_data_direction dir);
 
+int rdma_rw_ctx_signature_init(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u8 port_num, struct scatterlist *sg, u32 sg_cnt,
+		struct scatterlist *prot_sg, u32 prot_sg_cnt,
+		struct ib_sig_attrs *sig_attrs, u64 remote_addr, u32 rkey,
+		enum dma_data_direction dir);
+void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
+		u8 port_num, struct scatterlist *sg, u32 sg_cnt,
+		struct scatterlist *prot_sg, u32 prot_sg_cnt,
+		enum dma_data_direction dir);
+
 struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 		u8 port_num, struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
 int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 12/12] IB/isert: convert to the generic RDMA READ/WRITE API
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2016-04-11 21:32 ` [PATCH 11/12] IB/core: add RW API support for signature MRs Christoph Hellwig
@ 2016-04-11 21:32 ` Christoph Hellwig
       [not found]   ` <1460410360-13104-13-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  2016-04-12 18:31 ` generic RDMA READ/WRITE API V6 Steve Wise
  2016-04-22 22:29 ` Bart Van Assche
  11 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-11 21:32 UTC (permalink / raw)
  To: dledford; +Cc: bart.vanassche, swise, sagi, linux-rdma, target-devel

Replace the homegrown RDMA READ/WRITE code in isert with the generic API,
which also adds iWarp support to the I/O path as a side effect.  Note
that full iWarp operation will need a few additional patches from Steve.
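
For reference, the calling convention a target driver follows with the
new API is roughly the following (a minimal sketch based on the
declarations in <rdma/rw.h>; error handling and the signature/PI
variants are omitted):

	struct rdma_rw_ctx ctx;
	int ret;

	/* map the local SG list and build the RDMA READ/WRITE (+ MR) WRs */
	ret = rdma_rw_ctx_init(&ctx, qp, port_num, sg, sg_cnt, 0 /* offset */,
			remote_addr, rkey, dir);
	if (ret < 0)
		return ret;

	/* post the WR chain, optionally chaining a response send WR */
	ret = rdma_rw_ctx_post(&ctx, qp, port_num, cqe, chain_wr);

	/* once the completion for the last WR has arrived */
	rdma_rw_ctx_destroy(&ctx, qp, port_num, sg, sg_cnt, dir);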

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 841 ++++----------------------------
 drivers/infiniband/ulp/isert/ib_isert.h |  69 +--
 2 files changed, 85 insertions(+), 825 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index a44a736..2fcdbe0 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -33,7 +33,8 @@
 
 #define	ISERT_MAX_CONN		8
 #define ISER_MAX_RX_CQ_LEN	(ISERT_QP_MAX_RECV_DTOS * ISERT_MAX_CONN)
-#define ISER_MAX_TX_CQ_LEN	(ISERT_QP_MAX_REQ_DTOS  * ISERT_MAX_CONN)
+#define ISER_MAX_TX_CQ_LEN \
+	((ISERT_QP_MAX_REQ_DTOS + ISCSI_DEF_XMIT_CMDS_MAX) * ISERT_MAX_CONN)
 #define ISER_MAX_CQ_LEN		(ISER_MAX_RX_CQ_LEN + ISER_MAX_TX_CQ_LEN + \
 				 ISERT_MAX_CONN)
 
@@ -46,14 +47,6 @@ static LIST_HEAD(device_list);
 static struct workqueue_struct *isert_comp_wq;
 static struct workqueue_struct *isert_release_wq;
 
-static void
-isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
-static int
-isert_map_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn);
-static void
-isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn);
-static int
-isert_reg_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn);
 static int
 isert_put_response(struct iscsi_conn *conn, struct iscsi_cmd *cmd);
 static int
@@ -142,6 +135,7 @@ isert_create_qp(struct isert_conn *isert_conn,
 	attr.recv_cq = comp->cq;
 	attr.cap.max_send_wr = ISERT_QP_MAX_REQ_DTOS + 1;
 	attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS + 1;
+	attr.cap.max_rdma_ctxs = ISCSI_DEF_XMIT_CMDS_MAX;
 	attr.cap.max_send_sge = device->ib_device->attrs.max_sge;
 	isert_conn->max_sge = min(device->ib_device->attrs.max_sge,
 				  device->ib_device->attrs.max_sge_rd);
@@ -270,9 +264,9 @@ isert_alloc_comps(struct isert_device *device)
 				 device->ib_device->num_comp_vectors));
 
 	isert_info("Using %d CQs, %s supports %d vectors support "
-		   "Fast registration %d pi_capable %d\n",
+		   "pi_capable %d\n",
 		   device->comps_used, device->ib_device->name,
-		   device->ib_device->num_comp_vectors, device->use_fastreg,
+		   device->ib_device->num_comp_vectors,
 		   device->pi_capable);
 
 	device->comps = kcalloc(device->comps_used, sizeof(struct isert_comp),
@@ -313,18 +307,6 @@ isert_create_device_ib_res(struct isert_device *device)
 	isert_dbg("devattr->max_sge: %d\n", ib_dev->attrs.max_sge);
 	isert_dbg("devattr->max_sge_rd: %d\n", ib_dev->attrs.max_sge_rd);
 
-	/* asign function handlers */
-	if (ib_dev->attrs.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS &&
-	    ib_dev->attrs.device_cap_flags & IB_DEVICE_SIGNATURE_HANDOVER) {
-		device->use_fastreg = 1;
-		device->reg_rdma_mem = isert_reg_rdma;
-		device->unreg_rdma_mem = isert_unreg_rdma;
-	} else {
-		device->use_fastreg = 0;
-		device->reg_rdma_mem = isert_map_rdma;
-		device->unreg_rdma_mem = isert_unmap_cmd;
-	}
-
 	ret = isert_alloc_comps(device);
 	if (ret)
 		goto out;
@@ -417,146 +399,6 @@ isert_device_get(struct rdma_cm_id *cma_id)
 }
 
 static void
-isert_conn_free_fastreg_pool(struct isert_conn *isert_conn)
-{
-	struct fast_reg_descriptor *fr_desc, *tmp;
-	int i = 0;
-
-	if (list_empty(&isert_conn->fr_pool))
-		return;
-
-	isert_info("Freeing conn %p fastreg pool", isert_conn);
-
-	list_for_each_entry_safe(fr_desc, tmp,
-				 &isert_conn->fr_pool, list) {
-		list_del(&fr_desc->list);
-		ib_dereg_mr(fr_desc->data_mr);
-		if (fr_desc->pi_ctx) {
-			ib_dereg_mr(fr_desc->pi_ctx->prot_mr);
-			ib_dereg_mr(fr_desc->pi_ctx->sig_mr);
-			kfree(fr_desc->pi_ctx);
-		}
-		kfree(fr_desc);
-		++i;
-	}
-
-	if (i < isert_conn->fr_pool_size)
-		isert_warn("Pool still has %d regions registered\n",
-			isert_conn->fr_pool_size - i);
-}
-
-static int
-isert_create_pi_ctx(struct fast_reg_descriptor *desc,
-		    struct ib_device *device,
-		    struct ib_pd *pd)
-{
-	struct pi_context *pi_ctx;
-	int ret;
-
-	pi_ctx = kzalloc(sizeof(*desc->pi_ctx), GFP_KERNEL);
-	if (!pi_ctx) {
-		isert_err("Failed to allocate pi context\n");
-		return -ENOMEM;
-	}
-
-	pi_ctx->prot_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
-				      ISCSI_ISER_SG_TABLESIZE);
-	if (IS_ERR(pi_ctx->prot_mr)) {
-		isert_err("Failed to allocate prot frmr err=%ld\n",
-			  PTR_ERR(pi_ctx->prot_mr));
-		ret = PTR_ERR(pi_ctx->prot_mr);
-		goto err_pi_ctx;
-	}
-	desc->ind |= ISERT_PROT_KEY_VALID;
-
-	pi_ctx->sig_mr = ib_alloc_mr(pd, IB_MR_TYPE_SIGNATURE, 2);
-	if (IS_ERR(pi_ctx->sig_mr)) {
-		isert_err("Failed to allocate signature enabled mr err=%ld\n",
-			  PTR_ERR(pi_ctx->sig_mr));
-		ret = PTR_ERR(pi_ctx->sig_mr);
-		goto err_prot_mr;
-	}
-
-	desc->pi_ctx = pi_ctx;
-	desc->ind |= ISERT_SIG_KEY_VALID;
-	desc->ind &= ~ISERT_PROTECTED;
-
-	return 0;
-
-err_prot_mr:
-	ib_dereg_mr(pi_ctx->prot_mr);
-err_pi_ctx:
-	kfree(pi_ctx);
-
-	return ret;
-}
-
-static int
-isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd,
-		     struct fast_reg_descriptor *fr_desc)
-{
-	fr_desc->data_mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
-				       ISCSI_ISER_SG_TABLESIZE);
-	if (IS_ERR(fr_desc->data_mr)) {
-		isert_err("Failed to allocate data frmr err=%ld\n",
-			  PTR_ERR(fr_desc->data_mr));
-		return PTR_ERR(fr_desc->data_mr);
-	}
-	fr_desc->ind |= ISERT_DATA_KEY_VALID;
-
-	isert_dbg("Created fr_desc %p\n", fr_desc);
-
-	return 0;
-}
-
-static int
-isert_conn_create_fastreg_pool(struct isert_conn *isert_conn)
-{
-	struct fast_reg_descriptor *fr_desc;
-	struct isert_device *device = isert_conn->device;
-	struct se_session *se_sess = isert_conn->conn->sess->se_sess;
-	struct se_node_acl *se_nacl = se_sess->se_node_acl;
-	int i, ret, tag_num;
-	/*
-	 * Setup the number of FRMRs based upon the number of tags
-	 * available to session in iscsi_target_locate_portal().
-	 */
-	tag_num = max_t(u32, ISCSIT_MIN_TAGS, se_nacl->queue_depth);
-	tag_num = (tag_num * 2) + ISCSIT_EXTRA_TAGS;
-
-	isert_conn->fr_pool_size = 0;
-	for (i = 0; i < tag_num; i++) {
-		fr_desc = kzalloc(sizeof(*fr_desc), GFP_KERNEL);
-		if (!fr_desc) {
-			isert_err("Failed to allocate fast_reg descriptor\n");
-			ret = -ENOMEM;
-			goto err;
-		}
-
-		ret = isert_create_fr_desc(device->ib_device,
-					   device->pd, fr_desc);
-		if (ret) {
-			isert_err("Failed to create fastreg descriptor err=%d\n",
-			       ret);
-			kfree(fr_desc);
-			goto err;
-		}
-
-		list_add_tail(&fr_desc->list, &isert_conn->fr_pool);
-		isert_conn->fr_pool_size++;
-	}
-
-	isert_dbg("Creating conn %p fastreg pool size=%d",
-		 isert_conn, isert_conn->fr_pool_size);
-
-	return 0;
-
-err:
-	isert_conn_free_fastreg_pool(isert_conn);
-	return ret;
-}
-
-static void
 isert_init_conn(struct isert_conn *isert_conn)
 {
 	isert_conn->state = ISER_CONN_INIT;
@@ -565,8 +407,6 @@ isert_init_conn(struct isert_conn *isert_conn)
 	init_completion(&isert_conn->login_req_comp);
 	kref_init(&isert_conn->kref);
 	mutex_init(&isert_conn->mutex);
-	spin_lock_init(&isert_conn->pool_lock);
-	INIT_LIST_HEAD(&isert_conn->fr_pool);
 	INIT_WORK(&isert_conn->release_work, isert_release_work);
 }
 
@@ -739,9 +579,6 @@ isert_connect_release(struct isert_conn *isert_conn)
 
 	BUG_ON(!device);
 
-	if (device->use_fastreg)
-		isert_conn_free_fastreg_pool(isert_conn);
-
 	isert_free_rx_descriptors(isert_conn);
 	if (isert_conn->cm_id)
 		rdma_destroy_id(isert_conn->cm_id);
@@ -1080,7 +917,6 @@ isert_init_send_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
 {
 	struct iser_tx_desc *tx_desc = &isert_cmd->tx_desc;
 
-	isert_cmd->iser_ib_op = ISER_IB_SEND;
 	tx_desc->tx_cqe.done = isert_send_done;
 	send_wr->wr_cqe = &tx_desc->tx_cqe;
 
@@ -1160,16 +996,6 @@ isert_put_login_tx(struct iscsi_conn *conn, struct iscsi_login *login,
 	}
 	if (!login->login_failed) {
 		if (login->login_complete) {
-			if (!conn->sess->sess_ops->SessionType &&
-			    isert_conn->device->use_fastreg) {
-				ret = isert_conn_create_fastreg_pool(isert_conn);
-				if (ret) {
-					isert_err("Conn: %p failed to create"
-					       " fastreg pool\n", isert_conn);
-					return ret;
-				}
-			}
-
 			ret = isert_alloc_rx_descriptors(isert_conn);
 			if (ret)
 				return ret;
@@ -1633,97 +1459,26 @@ isert_login_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 				ISER_RX_PAYLOAD_SIZE, DMA_FROM_DEVICE);
 }
 
-static int
-isert_map_data_buf(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
-		   struct scatterlist *sg, u32 nents, u32 length, u32 offset,
-		   enum iser_ib_op_code op, struct isert_data_buf *data)
-{
-	struct ib_device *ib_dev = isert_conn->cm_id->device;
-
-	data->dma_dir = op == ISER_IB_RDMA_WRITE ?
-			      DMA_TO_DEVICE : DMA_FROM_DEVICE;
-
-	data->len = length - offset;
-	data->offset = offset;
-	data->sg_off = data->offset / PAGE_SIZE;
-
-	data->sg = &sg[data->sg_off];
-	data->nents = min_t(unsigned int, nents - data->sg_off,
-					  ISCSI_ISER_SG_TABLESIZE);
-	data->len = min_t(unsigned int, data->len, ISCSI_ISER_SG_TABLESIZE *
-					PAGE_SIZE);
-
-	data->dma_nents = ib_dma_map_sg(ib_dev, data->sg, data->nents,
-					data->dma_dir);
-	if (unlikely(!data->dma_nents)) {
-		isert_err("Cmd: unable to dma map SGs %p\n", sg);
-		return -EINVAL;
-	}
-
-	isert_dbg("Mapped cmd: %p count: %u sg: %p sg_nents: %u rdma_len %d\n",
-		  isert_cmd, data->dma_nents, data->sg, data->nents, data->len);
-
-	return 0;
-}
-
 static void
-isert_unmap_data_buf(struct isert_conn *isert_conn, struct isert_data_buf *data)
+isert_rdma_rw_ctx_destroy(struct isert_cmd *cmd, struct isert_conn *conn)
 {
-	struct ib_device *ib_dev = isert_conn->cm_id->device;
-
-	ib_dma_unmap_sg(ib_dev, data->sg, data->nents, data->dma_dir);
-	memset(data, 0, sizeof(*data));
-}
-
-
-
-static void
-isert_unmap_cmd(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
-{
-	isert_dbg("Cmd %p\n", isert_cmd);
+	struct se_cmd *se_cmd = &cmd->iscsi_cmd->se_cmd;
+	enum dma_data_direction dir = target_reverse_dma_direction(se_cmd);
 
-	if (isert_cmd->data.sg) {
-		isert_dbg("Cmd %p unmap_sg op\n", isert_cmd);
-		isert_unmap_data_buf(isert_conn, &isert_cmd->data);
-	}
-
-	if (isert_cmd->rdma_wr) {
-		isert_dbg("Cmd %p free send_wr\n", isert_cmd);
-		kfree(isert_cmd->rdma_wr);
-		isert_cmd->rdma_wr = NULL;
-	}
-
-	if (isert_cmd->ib_sge) {
-		isert_dbg("Cmd %p free ib_sge\n", isert_cmd);
-		kfree(isert_cmd->ib_sge);
-		isert_cmd->ib_sge = NULL;
-	}
-}
-
-static void
-isert_unreg_rdma(struct isert_cmd *isert_cmd, struct isert_conn *isert_conn)
-{
-	isert_dbg("Cmd %p\n", isert_cmd);
-
-	if (isert_cmd->fr_desc) {
-		isert_dbg("Cmd %p free fr_desc %p\n", isert_cmd, isert_cmd->fr_desc);
-		if (isert_cmd->fr_desc->ind & ISERT_PROTECTED) {
-			isert_unmap_data_buf(isert_conn, &isert_cmd->prot);
-			isert_cmd->fr_desc->ind &= ~ISERT_PROTECTED;
-		}
-		spin_lock_bh(&isert_conn->pool_lock);
-		list_add_tail(&isert_cmd->fr_desc->list, &isert_conn->fr_pool);
-		spin_unlock_bh(&isert_conn->pool_lock);
-		isert_cmd->fr_desc = NULL;
-	}
+	if (!cmd->rw.nr_ops)
+		return;
 
-	if (isert_cmd->data.sg) {
-		isert_dbg("Cmd %p unmap_sg op\n", isert_cmd);
-		isert_unmap_data_buf(isert_conn, &isert_cmd->data);
+	if (isert_prot_cmd(conn, se_cmd)) {
+		rdma_rw_ctx_destroy_signature(&cmd->rw, conn->qp,
+				conn->cm_id->port_num, se_cmd->t_data_sg,
+				se_cmd->t_data_nents, se_cmd->t_prot_sg,
+				se_cmd->t_prot_nents, dir);
+	} else {
+		rdma_rw_ctx_destroy(&cmd->rw, conn->qp, conn->cm_id->port_num,
+				se_cmd->t_data_sg, se_cmd->t_data_nents, dir);
 	}
 
-	isert_cmd->ib_sge = NULL;
-	isert_cmd->rdma_wr = NULL;
+	cmd->rw.nr_ops = 0;
 }
 
 static void
@@ -1732,7 +1487,6 @@ isert_put_cmd(struct isert_cmd *isert_cmd, bool comp_err)
 	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
 	struct isert_conn *isert_conn = isert_cmd->conn;
 	struct iscsi_conn *conn = isert_conn->conn;
-	struct isert_device *device = isert_conn->device;
 	struct iscsi_text_rsp *hdr;
 
 	isert_dbg("Cmd %p\n", isert_cmd);
@@ -1760,7 +1514,7 @@ isert_put_cmd(struct isert_cmd *isert_cmd, bool comp_err)
 			}
 		}
 
-		device->unreg_rdma_mem(isert_cmd, isert_conn);
+		isert_rdma_rw_ctx_destroy(isert_cmd, isert_conn);
 		transport_generic_free_cmd(&cmd->se_cmd, 0);
 		break;
 	case ISCSI_OP_SCSI_TMFUNC:
@@ -1894,14 +1648,9 @@ isert_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	isert_dbg("Cmd %p\n", isert_cmd);
 
-	if (isert_cmd->fr_desc && isert_cmd->fr_desc->ind & ISERT_PROTECTED) {
-		ret = isert_check_pi_status(cmd,
-				isert_cmd->fr_desc->pi_ctx->sig_mr);
-		isert_cmd->fr_desc->ind &= ~ISERT_PROTECTED;
-	}
+	ret = isert_check_pi_status(cmd, isert_cmd->rw.sig->sig_mr);
+	isert_rdma_rw_ctx_destroy(isert_cmd, isert_conn);
 
-	device->unreg_rdma_mem(isert_cmd, isert_conn);
-	isert_cmd->rdma_wr_num = 0;
 	if (ret)
 		transport_send_check_condition_and_sense(cmd, cmd->pi_err, 0);
 	else
@@ -1929,16 +1678,12 @@ isert_rdma_read_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	isert_dbg("Cmd %p\n", isert_cmd);
 
-	if (isert_cmd->fr_desc && isert_cmd->fr_desc->ind & ISERT_PROTECTED) {
-		ret = isert_check_pi_status(se_cmd,
-					    isert_cmd->fr_desc->pi_ctx->sig_mr);
-		isert_cmd->fr_desc->ind &= ~ISERT_PROTECTED;
-	}
-
 	iscsit_stop_dataout_timer(cmd);
-	device->unreg_rdma_mem(isert_cmd, isert_conn);
-	cmd->write_data_done = isert_cmd->data.len;
-	isert_cmd->rdma_wr_num = 0;
+
+	if (isert_cmd->rw.type == RDMA_RW_SIG_MR)
+		ret = isert_check_pi_status(se_cmd, isert_cmd->rw.sig->sig_mr);
+	isert_rdma_rw_ctx_destroy(isert_cmd, isert_conn);
+	cmd->write_data_done = 0;
 
 	isert_dbg("Cmd: %p RDMA_READ comp calling execute_cmd\n", isert_cmd);
 	spin_lock_bh(&cmd->istate_lock);
@@ -2111,7 +1856,6 @@ isert_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 {
 	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
 	struct isert_conn *isert_conn = conn->context;
-	struct isert_device *device = isert_conn->device;
 
 	spin_lock_bh(&conn->cmd_lock);
 	if (!list_empty(&cmd->i_conn_node))
@@ -2120,8 +1864,7 @@ isert_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 
 	if (cmd->data_direction == DMA_TO_DEVICE)
 		iscsit_stop_dataout_timer(cmd);
-
-	device->unreg_rdma_mem(isert_cmd, isert_conn);
+	isert_rdma_rw_ctx_destroy(isert_cmd, isert_conn);
 }
 
 static enum target_prot_op
@@ -2274,234 +2017,6 @@ isert_put_text_rsp(struct iscsi_cmd *cmd, struct iscsi_conn *conn)
 	return isert_post_response(isert_conn, isert_cmd);
 }
 
-static int
-isert_build_rdma_wr(struct isert_conn *isert_conn, struct isert_cmd *isert_cmd,
-		    struct ib_sge *ib_sge, struct ib_rdma_wr *rdma_wr,
-		    u32 data_left, u32 offset)
-{
-	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
-	struct scatterlist *sg_start, *tmp_sg;
-	struct isert_device *device = isert_conn->device;
-	struct ib_device *ib_dev = device->ib_device;
-	u32 sg_off, page_off;
-	int i = 0, sg_nents;
-
-	sg_off = offset / PAGE_SIZE;
-	sg_start = &cmd->se_cmd.t_data_sg[sg_off];
-	sg_nents = min(cmd->se_cmd.t_data_nents - sg_off, isert_conn->max_sge);
-	page_off = offset % PAGE_SIZE;
-
-	rdma_wr->wr.sg_list = ib_sge;
-	rdma_wr->wr.wr_cqe = &isert_cmd->tx_desc.tx_cqe;
-
-	/*
-	 * Perform mapping of TCM scatterlist memory ib_sge dma_addr.
-	 */
-	for_each_sg(sg_start, tmp_sg, sg_nents, i) {
-		isert_dbg("RDMA from SGL dma_addr: 0x%llx dma_len: %u, "
-			  "page_off: %u\n",
-			  (unsigned long long)tmp_sg->dma_address,
-			  tmp_sg->length, page_off);
-
-		ib_sge->addr = ib_sg_dma_address(ib_dev, tmp_sg) + page_off;
-		ib_sge->length = min_t(u32, data_left,
-				ib_sg_dma_len(ib_dev, tmp_sg) - page_off);
-		ib_sge->lkey = device->pd->local_dma_lkey;
-
-		isert_dbg("RDMA ib_sge: addr: 0x%llx  length: %u lkey: %x\n",
-			  ib_sge->addr, ib_sge->length, ib_sge->lkey);
-		page_off = 0;
-		data_left -= ib_sge->length;
-		if (!data_left)
-			break;
-		ib_sge++;
-		isert_dbg("Incrementing ib_sge pointer to %p\n", ib_sge);
-	}
-
-	rdma_wr->wr.num_sge = ++i;
-	isert_dbg("Set outgoing sg_list: %p num_sg: %u from TCM SGLs\n",
-		  rdma_wr->wr.sg_list, rdma_wr->wr.num_sge);
-
-	return rdma_wr->wr.num_sge;
-}
-
-static int
-isert_map_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn)
-{
-	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
-	struct se_cmd *se_cmd = &cmd->se_cmd;
-	struct isert_conn *isert_conn = conn->context;
-	struct isert_data_buf *data = &isert_cmd->data;
-	struct ib_rdma_wr *rdma_wr;
-	struct ib_sge *ib_sge;
-	u32 offset, data_len, data_left, rdma_write_max, va_offset = 0;
-	int ret = 0, i, ib_sge_cnt;
-
-	offset = isert_cmd->iser_ib_op == ISER_IB_RDMA_READ ?
-			cmd->write_data_done : 0;
-	ret = isert_map_data_buf(isert_conn, isert_cmd, se_cmd->t_data_sg,
-				 se_cmd->t_data_nents, se_cmd->data_length,
-				 offset, isert_cmd->iser_ib_op,
-				 &isert_cmd->data);
-	if (ret)
-		return ret;
-
-	data_left = data->len;
-	offset = data->offset;
-
-	ib_sge = kzalloc(sizeof(struct ib_sge) * data->nents, GFP_KERNEL);
-	if (!ib_sge) {
-		isert_warn("Unable to allocate ib_sge\n");
-		ret = -ENOMEM;
-		goto unmap_cmd;
-	}
-	isert_cmd->ib_sge = ib_sge;
-
-	isert_cmd->rdma_wr_num = DIV_ROUND_UP(data->nents, isert_conn->max_sge);
-	isert_cmd->rdma_wr = kzalloc(sizeof(struct ib_rdma_wr) *
-			isert_cmd->rdma_wr_num, GFP_KERNEL);
-	if (!isert_cmd->rdma_wr) {
-		isert_dbg("Unable to allocate isert_cmd->rdma_wr\n");
-		ret = -ENOMEM;
-		goto unmap_cmd;
-	}
-
-	rdma_write_max = isert_conn->max_sge * PAGE_SIZE;
-
-	for (i = 0; i < isert_cmd->rdma_wr_num; i++) {
-		rdma_wr = &isert_cmd->rdma_wr[i];
-		data_len = min(data_left, rdma_write_max);
-
-		rdma_wr->wr.send_flags = 0;
-		if (isert_cmd->iser_ib_op == ISER_IB_RDMA_WRITE) {
-			isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
-
-			rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
-			rdma_wr->remote_addr = isert_cmd->read_va + offset;
-			rdma_wr->rkey = isert_cmd->read_stag;
-			if (i + 1 == isert_cmd->rdma_wr_num)
-				rdma_wr->wr.next = &isert_cmd->tx_desc.send_wr;
-			else
-				rdma_wr->wr.next = &isert_cmd->rdma_wr[i + 1].wr;
-		} else {
-			isert_cmd->tx_desc.tx_cqe.done = isert_rdma_read_done;
-
-			rdma_wr->wr.opcode = IB_WR_RDMA_READ;
-			rdma_wr->remote_addr = isert_cmd->write_va + va_offset;
-			rdma_wr->rkey = isert_cmd->write_stag;
-			if (i + 1 == isert_cmd->rdma_wr_num)
-				rdma_wr->wr.send_flags = IB_SEND_SIGNALED;
-			else
-				rdma_wr->wr.next = &isert_cmd->rdma_wr[i + 1].wr;
-		}
-
-		ib_sge_cnt = isert_build_rdma_wr(isert_conn, isert_cmd, ib_sge,
-					rdma_wr, data_len, offset);
-		ib_sge += ib_sge_cnt;
-
-		offset += data_len;
-		va_offset += data_len;
-		data_left -= data_len;
-	}
-
-	return 0;
-unmap_cmd:
-	isert_unmap_data_buf(isert_conn, data);
-
-	return ret;
-}
-
-static inline void
-isert_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr)
-{
-	u32 rkey;
-
-	memset(inv_wr, 0, sizeof(*inv_wr));
-	inv_wr->wr_cqe = NULL;
-	inv_wr->opcode = IB_WR_LOCAL_INV;
-	inv_wr->ex.invalidate_rkey = mr->rkey;
-
-	/* Bump the key */
-	rkey = ib_inc_rkey(mr->rkey);
-	ib_update_fast_reg_key(mr, rkey);
-}
-
-static int
-isert_fast_reg_mr(struct isert_conn *isert_conn,
-		  struct fast_reg_descriptor *fr_desc,
-		  struct isert_data_buf *mem,
-		  enum isert_indicator ind,
-		  struct ib_sge *sge)
-{
-	struct isert_device *device = isert_conn->device;
-	struct ib_device *ib_dev = device->ib_device;
-	struct ib_mr *mr;
-	struct ib_reg_wr reg_wr;
-	struct ib_send_wr inv_wr, *bad_wr, *wr = NULL;
-	int ret, n;
-
-	if (mem->dma_nents == 1) {
-		sge->lkey = device->pd->local_dma_lkey;
-		sge->addr = ib_sg_dma_address(ib_dev, &mem->sg[0]);
-		sge->length = ib_sg_dma_len(ib_dev, &mem->sg[0]);
-		isert_dbg("sge: addr: 0x%llx  length: %u lkey: %x\n",
-			 sge->addr, sge->length, sge->lkey);
-		return 0;
-	}
-
-	if (ind == ISERT_DATA_KEY_VALID)
-		/* Registering data buffer */
-		mr = fr_desc->data_mr;
-	else
-		/* Registering protection buffer */
-		mr = fr_desc->pi_ctx->prot_mr;
-
-	if (!(fr_desc->ind & ind)) {
-		isert_inv_rkey(&inv_wr, mr);
-		wr = &inv_wr;
-	}
-
-	n = ib_map_mr_sg(mr, mem->sg, mem->nents, 0, PAGE_SIZE);
-	if (unlikely(n != mem->nents)) {
-		isert_err("failed to map mr sg (%d/%d)\n",
-			 n, mem->nents);
-		return n < 0 ? n : -EINVAL;
-	}
-
-	isert_dbg("Use fr_desc %p sg_nents %d offset %u\n",
-		  fr_desc, mem->nents, mem->offset);
-
-	reg_wr.wr.next = NULL;
-	reg_wr.wr.opcode = IB_WR_REG_MR;
-	reg_wr.wr.wr_cqe = NULL;
-	reg_wr.wr.send_flags = 0;
-	reg_wr.wr.num_sge = 0;
-	reg_wr.mr = mr;
-	reg_wr.key = mr->lkey;
-	reg_wr.access = IB_ACCESS_LOCAL_WRITE;
-
-	if (!wr)
-		wr = &reg_wr.wr;
-	else
-		wr->next = &reg_wr.wr;
-
-	ret = ib_post_send(isert_conn->qp, wr, &bad_wr);
-	if (ret) {
-		isert_err("fast registration failed, ret:%d\n", ret);
-		return ret;
-	}
-	fr_desc->ind &= ~ind;
-
-	sge->lkey = mr->lkey;
-	sge->addr = mr->iova;
-	sge->length = mr->length;
-
-	isert_dbg("sge: addr: 0x%llx  length: %u lkey: %x\n",
-		  sge->addr, sge->length, sge->lkey);
-
-	return ret;
-}
-
 static inline void
 isert_set_dif_domain(struct se_cmd *se_cmd, struct ib_sig_attrs *sig_attrs,
 		     struct ib_sig_domain *domain)
@@ -2526,6 +2041,8 @@ isert_set_dif_domain(struct se_cmd *se_cmd, struct ib_sig_attrs *sig_attrs,
 static int
 isert_set_sig_attrs(struct se_cmd *se_cmd, struct ib_sig_attrs *sig_attrs)
 {
+	memset(sig_attrs, 0, sizeof(*sig_attrs));
+
 	switch (se_cmd->prot_op) {
 	case TARGET_PROT_DIN_INSERT:
 	case TARGET_PROT_DOUT_STRIP:
@@ -2547,228 +2064,59 @@ isert_set_sig_attrs(struct se_cmd *se_cmd, struct ib_sig_attrs *sig_attrs)
 		return -EINVAL;
 	}
 
+	sig_attrs->check_mask =
+	       (se_cmd->prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
+	       (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x30 : 0) |
+	       (se_cmd->prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
 	return 0;
 }
 
-static inline u8
-isert_set_prot_checks(u8 prot_checks)
-{
-	return (prot_checks & TARGET_DIF_CHECK_GUARD  ? 0xc0 : 0) |
-	       (prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x30 : 0) |
-	       (prot_checks & TARGET_DIF_CHECK_REFTAG ? 0x0f : 0);
-}
-
-static int
-isert_reg_sig_mr(struct isert_conn *isert_conn,
-		 struct isert_cmd *isert_cmd,
-		 struct fast_reg_descriptor *fr_desc)
-{
-	struct se_cmd *se_cmd = &isert_cmd->iscsi_cmd->se_cmd;
-	struct ib_sig_handover_wr sig_wr;
-	struct ib_send_wr inv_wr, *bad_wr, *wr = NULL;
-	struct pi_context *pi_ctx = fr_desc->pi_ctx;
-	struct ib_sig_attrs sig_attrs;
-	int ret;
-
-	memset(&sig_attrs, 0, sizeof(sig_attrs));
-	ret = isert_set_sig_attrs(se_cmd, &sig_attrs);
-	if (ret)
-		goto err;
-
-	sig_attrs.check_mask = isert_set_prot_checks(se_cmd->prot_checks);
-
-	if (!(fr_desc->ind & ISERT_SIG_KEY_VALID)) {
-		isert_inv_rkey(&inv_wr, pi_ctx->sig_mr);
-		wr = &inv_wr;
-	}
-
-	memset(&sig_wr, 0, sizeof(sig_wr));
-	sig_wr.wr.opcode = IB_WR_REG_SIG_MR;
-	sig_wr.wr.wr_cqe = NULL;
-	sig_wr.wr.sg_list = &isert_cmd->ib_sg[DATA];
-	sig_wr.wr.num_sge = 1;
-	sig_wr.access_flags = IB_ACCESS_LOCAL_WRITE;
-	sig_wr.sig_attrs = &sig_attrs;
-	sig_wr.sig_mr = pi_ctx->sig_mr;
-	if (se_cmd->t_prot_sg)
-		sig_wr.prot = &isert_cmd->ib_sg[PROT];
-
-	if (!wr)
-		wr = &sig_wr.wr;
-	else
-		wr->next = &sig_wr.wr;
-
-	ret = ib_post_send(isert_conn->qp, wr, &bad_wr);
-	if (ret) {
-		isert_err("fast registration failed, ret:%d\n", ret);
-		goto err;
-	}
-	fr_desc->ind &= ~ISERT_SIG_KEY_VALID;
-
-	isert_cmd->ib_sg[SIG].lkey = pi_ctx->sig_mr->lkey;
-	isert_cmd->ib_sg[SIG].addr = 0;
-	isert_cmd->ib_sg[SIG].length = se_cmd->data_length;
-	if (se_cmd->prot_op != TARGET_PROT_DIN_STRIP &&
-	    se_cmd->prot_op != TARGET_PROT_DOUT_INSERT)
-		/*
-		 * We have protection guards on the wire
-		 * so we need to set a larget transfer
-		 */
-		isert_cmd->ib_sg[SIG].length += se_cmd->prot_length;
-
-	isert_dbg("sig_sge: addr: 0x%llx  length: %u lkey: %x\n",
-		  isert_cmd->ib_sg[SIG].addr, isert_cmd->ib_sg[SIG].length,
-		  isert_cmd->ib_sg[SIG].lkey);
-err:
-	return ret;
-}
-
 static int
-isert_handle_prot_cmd(struct isert_conn *isert_conn,
-		      struct isert_cmd *isert_cmd)
-{
-	struct isert_device *device = isert_conn->device;
-	struct se_cmd *se_cmd = &isert_cmd->iscsi_cmd->se_cmd;
+isert_rdma_rw_ctx_post(struct isert_cmd *cmd, struct isert_conn *conn,
+		struct ib_cqe *cqe, struct ib_send_wr *chain_wr)
+{
+	struct se_cmd *se_cmd = &cmd->iscsi_cmd->se_cmd;
+	enum dma_data_direction dir = target_reverse_dma_direction(se_cmd);
+	u8 port_num = conn->cm_id->port_num;
+	u64 addr;
+	u32 rkey, offset;
 	int ret;
 
-	if (!isert_cmd->fr_desc->pi_ctx) {
-		ret = isert_create_pi_ctx(isert_cmd->fr_desc,
-					  device->ib_device,
-					  device->pd);
-		if (ret) {
-			isert_err("conn %p failed to allocate pi_ctx\n",
-				  isert_conn);
-			return ret;
-		}
-	}
-
-	if (se_cmd->t_prot_sg) {
-		ret = isert_map_data_buf(isert_conn, isert_cmd,
-					 se_cmd->t_prot_sg,
-					 se_cmd->t_prot_nents,
-					 se_cmd->prot_length,
-					 0,
-					 isert_cmd->iser_ib_op,
-					 &isert_cmd->prot);
-		if (ret) {
-			isert_err("conn %p failed to map protection buffer\n",
-				  isert_conn);
-			return ret;
-		}
-
-		memset(&isert_cmd->ib_sg[PROT], 0, sizeof(isert_cmd->ib_sg[PROT]));
-		ret = isert_fast_reg_mr(isert_conn, isert_cmd->fr_desc,
-					&isert_cmd->prot,
-					ISERT_PROT_KEY_VALID,
-					&isert_cmd->ib_sg[PROT]);
-		if (ret) {
-			isert_err("conn %p failed to fast reg mr\n",
-				  isert_conn);
-			goto unmap_prot_cmd;
-		}
-	}
-
-	ret = isert_reg_sig_mr(isert_conn, isert_cmd, isert_cmd->fr_desc);
-	if (ret) {
-		isert_err("conn %p failed to fast reg mr\n",
-			  isert_conn);
-		goto unmap_prot_cmd;
-	}
-	isert_cmd->fr_desc->ind |= ISERT_PROTECTED;
-
-	return 0;
-
-unmap_prot_cmd:
-	if (se_cmd->t_prot_sg)
-		isert_unmap_data_buf(isert_conn, &isert_cmd->prot);
-
-	return ret;
-}
-
-static int
-isert_reg_rdma(struct isert_cmd *isert_cmd, struct iscsi_conn *conn)
-{
-	struct iscsi_cmd *cmd = isert_cmd->iscsi_cmd;
-	struct se_cmd *se_cmd = &cmd->se_cmd;
-	struct isert_conn *isert_conn = conn->context;
-	struct fast_reg_descriptor *fr_desc = NULL;
-	struct ib_rdma_wr *rdma_wr;
-	struct ib_sge *ib_sg;
-	u32 offset;
-	int ret = 0;
-	unsigned long flags;
-
-	offset = isert_cmd->iser_ib_op == ISER_IB_RDMA_READ ?
-			cmd->write_data_done : 0;
-	ret = isert_map_data_buf(isert_conn, isert_cmd, se_cmd->t_data_sg,
-				 se_cmd->t_data_nents, se_cmd->data_length,
-				 offset, isert_cmd->iser_ib_op,
-				 &isert_cmd->data);
-	if (ret)
-		return ret;
-
-	if (isert_cmd->data.dma_nents != 1 ||
-	    isert_prot_cmd(isert_conn, se_cmd)) {
-		spin_lock_irqsave(&isert_conn->pool_lock, flags);
-		fr_desc = list_first_entry(&isert_conn->fr_pool,
-					   struct fast_reg_descriptor, list);
-		list_del(&fr_desc->list);
-		spin_unlock_irqrestore(&isert_conn->pool_lock, flags);
-		isert_cmd->fr_desc = fr_desc;
-	}
-
-	ret = isert_fast_reg_mr(isert_conn, fr_desc, &isert_cmd->data,
-				ISERT_DATA_KEY_VALID, &isert_cmd->ib_sg[DATA]);
-	if (ret)
-		goto unmap_cmd;
-
-	if (isert_prot_cmd(isert_conn, se_cmd)) {
-		ret = isert_handle_prot_cmd(isert_conn, isert_cmd);
-		if (ret)
-			goto unmap_cmd;
-
-		ib_sg = &isert_cmd->ib_sg[SIG];
+	if (dir == DMA_FROM_DEVICE) {
+		addr = cmd->write_va;
+		rkey = cmd->write_stag;
+		offset = cmd->iscsi_cmd->write_data_done;
 	} else {
-		ib_sg = &isert_cmd->ib_sg[DATA];
+		addr = cmd->read_va;
+		rkey = cmd->read_stag;
+		offset = 0;
 	}
 
-	memcpy(&isert_cmd->s_ib_sge, ib_sg, sizeof(*ib_sg));
-	isert_cmd->ib_sge = &isert_cmd->s_ib_sge;
-	isert_cmd->rdma_wr_num = 1;
-	memset(&isert_cmd->s_rdma_wr, 0, sizeof(isert_cmd->s_rdma_wr));
-	isert_cmd->rdma_wr = &isert_cmd->s_rdma_wr;
+	if (isert_prot_cmd(conn, se_cmd)) {
+		struct ib_sig_attrs sig_attrs;
 
-	rdma_wr = &isert_cmd->s_rdma_wr;
-	rdma_wr->wr.sg_list = &isert_cmd->s_ib_sge;
-	rdma_wr->wr.num_sge = 1;
-	rdma_wr->wr.wr_cqe = &isert_cmd->tx_desc.tx_cqe;
-	if (isert_cmd->iser_ib_op == ISER_IB_RDMA_WRITE) {
-		isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
+		ret = isert_set_sig_attrs(se_cmd, &sig_attrs);
+		if (ret)
+			return ret;
 
-		rdma_wr->wr.opcode = IB_WR_RDMA_WRITE;
-		rdma_wr->remote_addr = isert_cmd->read_va;
-		rdma_wr->rkey = isert_cmd->read_stag;
-		rdma_wr->wr.send_flags = !isert_prot_cmd(isert_conn, se_cmd) ?
-				      0 : IB_SEND_SIGNALED;
+		WARN_ON_ONCE(offset);
+		ret = rdma_rw_ctx_signature_init(&cmd->rw, conn->qp, port_num,
+				se_cmd->t_data_sg, se_cmd->t_data_nents,
+				se_cmd->t_prot_sg, se_cmd->t_prot_nents,
+				&sig_attrs, addr, rkey, dir);
 	} else {
-		isert_cmd->tx_desc.tx_cqe.done = isert_rdma_read_done;
-
-		rdma_wr->wr.opcode = IB_WR_RDMA_READ;
-		rdma_wr->remote_addr = isert_cmd->write_va;
-		rdma_wr->rkey = isert_cmd->write_stag;
-		rdma_wr->wr.send_flags = IB_SEND_SIGNALED;
+		ret = rdma_rw_ctx_init(&cmd->rw, conn->qp, port_num,
+				se_cmd->t_data_sg, se_cmd->t_data_nents,
+				offset, addr, rkey, dir);
 	}
-
-	return 0;
-
-unmap_cmd:
-	if (fr_desc) {
-		spin_lock_irqsave(&isert_conn->pool_lock, flags);
-		list_add_tail(&fr_desc->list, &isert_conn->fr_pool);
-		spin_unlock_irqrestore(&isert_conn->pool_lock, flags);
+	if (ret < 0) {
+		isert_err("Cmd: %p failed to prepare RDMA res\n", cmd);
+		return ret;
 	}
-	isert_unmap_data_buf(isert_conn, &isert_cmd->data);
 
+	ret = rdma_rw_ctx_post(&cmd->rw, conn->qp, port_num, cqe, chain_wr);
+	if (ret < 0)
+		isert_err("Cmd: %p failed to post RDMA res\n", cmd);
 	return ret;
 }
 
@@ -2778,21 +2126,17 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 	struct se_cmd *se_cmd = &cmd->se_cmd;
 	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
 	struct isert_conn *isert_conn = conn->context;
-	struct isert_device *device = isert_conn->device;
-	struct ib_send_wr *wr_failed;
+	struct ib_cqe *cqe = NULL;
+	struct ib_send_wr *chain_wr = NULL;
 	int rc;
 
 	isert_dbg("Cmd: %p RDMA_WRITE data_length: %u\n",
 		 isert_cmd, se_cmd->data_length);
 
-	isert_cmd->iser_ib_op = ISER_IB_RDMA_WRITE;
-	rc = device->reg_rdma_mem(isert_cmd, conn);
-	if (rc) {
-		isert_err("Cmd: %p failed to prepare RDMA res\n", isert_cmd);
-		return rc;
-	}
-
-	if (!isert_prot_cmd(isert_conn, se_cmd)) {
+	if (isert_prot_cmd(isert_conn, se_cmd)) {
+		isert_cmd->tx_desc.tx_cqe.done = isert_rdma_write_done;
+		cqe = &isert_cmd->tx_desc.tx_cqe;
+	} else {
 		/*
 		 * Build isert_conn->tx_desc for iSCSI response PDU and attach
 		 */
@@ -2803,56 +2147,35 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
 		isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
 		isert_init_send_wr(isert_conn, isert_cmd,
 				   &isert_cmd->tx_desc.send_wr);
-		isert_cmd->s_rdma_wr.wr.next = &isert_cmd->tx_desc.send_wr;
-		isert_cmd->rdma_wr_num += 1;
 
 		rc = isert_post_recv(isert_conn, isert_cmd->rx_desc);
 		if (rc) {
 			isert_err("ib_post_recv failed with %d\n", rc);
 			return rc;
 		}
-	}
 
-	rc = ib_post_send(isert_conn->qp, &isert_cmd->rdma_wr->wr, &wr_failed);
-	if (rc)
-		isert_warn("ib_post_send() failed for IB_WR_RDMA_WRITE\n");
-
-	if (!isert_prot_cmd(isert_conn, se_cmd))
-		isert_dbg("Cmd: %p posted RDMA_WRITE + Response for iSER Data "
-			 "READ\n", isert_cmd);
-	else
-		isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ\n",
-			 isert_cmd);
+		chain_wr = &isert_cmd->tx_desc.send_wr;
+	}
 
+	isert_rdma_rw_ctx_post(isert_cmd, isert_conn, cqe, chain_wr);
+	isert_dbg("Cmd: %p posted RDMA_WRITE for iSER Data READ\n", isert_cmd);
 	return 1;
 }
 
 static int
 isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
 {
-	struct se_cmd *se_cmd = &cmd->se_cmd;
 	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
-	struct isert_conn *isert_conn = conn->context;
-	struct isert_device *device = isert_conn->device;
-	struct ib_send_wr *wr_failed;
-	int rc;
 
 	isert_dbg("Cmd: %p RDMA_READ data_length: %u write_data_done: %u\n",
-		 isert_cmd, se_cmd->data_length, cmd->write_data_done);
-	isert_cmd->iser_ib_op = ISER_IB_RDMA_READ;
-	rc = device->reg_rdma_mem(isert_cmd, conn);
-	if (rc) {
-		isert_err("Cmd: %p failed to prepare RDMA res\n", isert_cmd);
-		return rc;
-	}
+		 isert_cmd, cmd->se_cmd.data_length, cmd->write_data_done);
 
-	rc = ib_post_send(isert_conn->qp, &isert_cmd->rdma_wr->wr, &wr_failed);
-	if (rc)
-		isert_warn("ib_post_send() failed for IB_WR_RDMA_READ\n");
+	isert_cmd->tx_desc.tx_cqe.done = isert_rdma_read_done;
+	isert_rdma_rw_ctx_post(isert_cmd, conn->context,
+			&isert_cmd->tx_desc.tx_cqe, NULL);
 
 	isert_dbg("Cmd: %p posted RDMA_READ memory for ISER Data WRITE\n",
 		 isert_cmd);
-
 	return 0;
 }
 
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 147900c..e512ba9 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -3,6 +3,7 @@
 #include <linux/in6.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/rdma_cm.h>
+#include <rdma/rw.h>
 #include <scsi/iser.h>
 
 
@@ -53,10 +54,7 @@
 
 #define ISERT_MIN_POSTED_RX	(ISCSI_DEF_XMIT_CMDS_MAX >> 2)
 
-#define ISERT_INFLIGHT_DATAOUTS	8
-
-#define ISERT_QP_MAX_REQ_DTOS	(ISCSI_DEF_XMIT_CMDS_MAX *    \
-				(1 + ISERT_INFLIGHT_DATAOUTS) + \
+#define ISERT_QP_MAX_REQ_DTOS	(ISCSI_DEF_XMIT_CMDS_MAX +    \
 				ISERT_MAX_TX_MISC_PDUS	+ \
 				ISERT_MAX_RX_MISC_PDUS)
 
@@ -71,13 +69,6 @@ enum isert_desc_type {
 	ISCSI_TX_DATAIN
 };
 
-enum iser_ib_op_code {
-	ISER_IB_RECV,
-	ISER_IB_SEND,
-	ISER_IB_RDMA_WRITE,
-	ISER_IB_RDMA_READ,
-};
-
 enum iser_conn_state {
 	ISER_CONN_INIT,
 	ISER_CONN_UP,
@@ -118,42 +109,6 @@ static inline struct iser_tx_desc *cqe_to_tx_desc(struct ib_cqe *cqe)
 	return container_of(cqe, struct iser_tx_desc, tx_cqe);
 }
 
-
-enum isert_indicator {
-	ISERT_PROTECTED		= 1 << 0,
-	ISERT_DATA_KEY_VALID	= 1 << 1,
-	ISERT_PROT_KEY_VALID	= 1 << 2,
-	ISERT_SIG_KEY_VALID	= 1 << 3,
-};
-
-struct pi_context {
-	struct ib_mr		       *prot_mr;
-	struct ib_mr		       *sig_mr;
-};
-
-struct fast_reg_descriptor {
-	struct list_head		list;
-	struct ib_mr		       *data_mr;
-	u8				ind;
-	struct pi_context	       *pi_ctx;
-};
-
-struct isert_data_buf {
-	struct scatterlist     *sg;
-	int			nents;
-	u32			sg_off;
-	u32			len; /* cur_rdma_length */
-	u32			offset;
-	unsigned int		dma_nents;
-	enum dma_data_direction dma_dir;
-};
-
-enum {
-	DATA = 0,
-	PROT = 1,
-	SIG = 2,
-};
-
 struct isert_cmd {
 	uint32_t		read_stag;
 	uint32_t		write_stag;
@@ -166,16 +121,7 @@ struct isert_cmd {
 	struct iscsi_cmd	*iscsi_cmd;
 	struct iser_tx_desc	tx_desc;
 	struct iser_rx_desc	*rx_desc;
-	enum iser_ib_op_code	iser_ib_op;
-	struct ib_sge		*ib_sge;
-	struct ib_sge		s_ib_sge;
-	int			rdma_wr_num;
-	struct ib_rdma_wr	*rdma_wr;
-	struct ib_rdma_wr	s_rdma_wr;
-	struct ib_sge		ib_sg[3];
-	struct isert_data_buf	data;
-	struct isert_data_buf	prot;
-	struct fast_reg_descriptor *fr_desc;
+	struct rdma_rw_ctx	rw;
 	struct work_struct	comp_work;
 	struct scatterlist	sg;
 };
@@ -210,10 +156,6 @@ struct isert_conn {
 	struct isert_device	*device;
 	struct mutex		mutex;
 	struct kref		kref;
-	struct list_head	fr_pool;
-	int			fr_pool_size;
-	/* lock to protect fastreg pool */
-	spinlock_t		pool_lock;
 	struct work_struct	release_work;
 	bool                    logout_posted;
 	bool                    snd_w_inv;
@@ -236,7 +178,6 @@ struct isert_comp {
 };
 
 struct isert_device {
-	int			use_fastreg;
 	bool			pi_capable;
 	int			refcount;
 	struct ib_device	*ib_device;
@@ -244,10 +185,6 @@ struct isert_device {
 	struct isert_comp	*comps;
 	int                     comps_used;
 	struct list_head	dev_node;
-	int			(*reg_rdma_mem)(struct isert_cmd *isert_cmd,
-						struct iscsi_conn *conn);
-	void			(*unreg_rdma_mem)(struct isert_cmd *isert_cmd,
-						  struct isert_conn *isert_conn);
 };
 
 struct isert_np {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* RE: generic RDMA READ/WRITE API V6
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2016-04-11 21:32 ` [PATCH 12/12] IB/isert: convert to the generic RDMA READ/WRITE API Christoph Hellwig
@ 2016-04-12 18:31 ` Steve Wise
  2016-04-22 22:29 ` Bart Van Assche
  11 siblings, 0 replies; 66+ messages in thread
From: Steve Wise @ 2016-04-12 18:31 UTC (permalink / raw)
  To: 'Christoph Hellwig', dledford
  Cc: bart.vanassche, sagi, linux-rdma, target-devel

> This series contains patches that implement a first version of a generic
> API to handle RDMA READ/WRITE operations as commonly used on the target
> (or server) side for storage protocols.
> 
> This has been developed for the upcoming NVMe over Fabrics target, and
> extensively tested as part of that, although this upstream version has
> additional updates over the one we're currently using.
> 
> It hides details such as the use of MRs for iWarp devices, and will allow
> looking at other HCA specifics easily in the future.
> 
> This series contains also conversion the SRP and iSER targets to the new
> API.
> 
> I think it's basically ready to merge now.
>

I agree.
 
> I also have a git tree available at:
> 
> 	git://git.infradead.org/users/hch/rdma.git rdma-rw-api
> 

With your rdma-rw-api branch, + these fixes:

From: Steve Wise <swise@...>
Subject: [PATCH 0/3] iw_cxgb3/4 bug fixes
http://article.gmane.org/gmane.linux.drivers.rdma/35350

From: Christoph Hellwig <hch@...>
Subject: fix large I/O regression with iSER in 4.4+ V2
http://article.gmane.org/gmane.linux.drivers.rdma/35345


And a workaround for this 4.6 regression that breaks cxgb4:

From: Steve Wise <swise <at> opengridcomputing.com>
Subject: RE: 4.6-rc2 regression with commit 104daa71b396: check VPD access offset against length
http://article.gmane.org/gmane.linux.kernel.pci/50831


The iSER initiator and target test ok on cxgb4/iWARP.

Tested-by: Steve Wise <swise@opengridcomputing.com>

Steve.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/12] IB/core: generic RDMA READ/WRITE API
       [not found]   ` <1460410360-13104-9-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-12 23:52     ` Bart Van Assche
       [not found]       ` <570D8A42.9040107-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 66+ messages in thread
From: Bart Van Assche @ 2016-04-12 23:52 UTC (permalink / raw)
  To: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On 04/11/2016 02:32 PM, Christoph Hellwig wrote:
> +/*
> + * Check if the device will use memory registration for this RW operation.
> + * We currently always use memory registrations for iWarp reads, and iWarp
> + * writes, but never for IB and RoCE.
> + *
> + * XXX: In the future we can hopefully fine tune this based on HCA driver
> + * input.
> + */
> +static inline bool rdma_rw_io_needs_mr(struct ib_device *dev, u8 port_num,
> +		enum dma_data_direction dir, int dma_nents)
> +{
> +	if (rdma_protocol_iwarp(dev, port_num) && dir == DMA_FROM_DEVICE)
> +		return true;
> +	if (unlikely(rdma_rw_force_mr))
> +		return true;
> +	return false;
> +}

Please clarify the comment above this function. The way that comment is 
written seems to contradict the code for iWARP writes.

> +static int rdma_rw_init_one_mr(struct ib_qp *qp, u8 port_num,
> +		struct rdma_rw_reg_ctx *reg, struct scatterlist *sg,
> +		u32 sg_cnt, u32 offset)
> +{
> +	u32 pages_per_mr = rdma_rw_fr_page_list_len(qp->pd->device);
> +	u32 nents = min(sg_cnt, pages_per_mr);
> +	int count = 0, ret;
> +
> +	reg->mr = ib_mr_pool_get(qp, &qp->rdma_mrs);
> +	if (!reg->mr)
> +		return -EAGAIN;
> +
> +	if (reg->mr->need_inval) {
> +		reg->inv_wr.opcode = IB_WR_LOCAL_INV;
> +		reg->inv_wr.ex.invalidate_rkey = reg->mr->lkey;
> +		reg->inv_wr.next = &reg->reg_wr.wr;
> +		count++;
> +	} else {
> +		reg->inv_wr.next = NULL;
> +	}
> +
> +	ret = ib_map_mr_sg(reg->mr, sg, nents, offset, PAGE_SIZE);
> +	if (ret < nents) {
> +		ib_mr_pool_put(qp, &qp->rdma_mrs, reg->mr);
> +		return -EINVAL;
> +	}

The above code assumes that the length of each sg list element is lower 
than or equal to mr->page_size. I think this is something that should be 
documented since the block layer has to be configured explicitly to 
ensure this.
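
For what it's worth, a block layer based consumer would have to cap its
segments explicitly to guarantee this, e.g. with something like the
following (purely illustrative, not part of this series):

	/* ensure each SG element fits within a single MR page */
	blk_queue_max_segment_size(q, PAGE_SIZE);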

> +static int rdma_rw_init_mr_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
> +		u8 port_num, struct scatterlist *sg, u32 sg_cnt, u32 offset,
> +		u64 remote_addr, u32 rkey, enum dma_data_direction dir)
> +{
> +	u32 pages_per_mr = rdma_rw_fr_page_list_len(qp->pd->device);
> +	int i, j, ret = 0, count = 0;
> +
> +	ctx->nr_ops = (sg_cnt + pages_per_mr - 1) / pages_per_mr;
> +	ctx->reg = kcalloc(ctx->nr_ops, sizeof(*ctx->reg), GFP_KERNEL);
> +	if (!ctx->reg) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	for (i = 0; i < ctx->nr_ops; i++) {
> +		struct rdma_rw_reg_ctx *prev = i ? &ctx->reg[i - 1] : NULL;
> +		struct rdma_rw_reg_ctx *reg = &ctx->reg[i];
> +		u32 nents = min(sg_cnt, pages_per_mr);

The same min(sg_cnt, pages_per_mr) computation occurs here and in 
rdma_rw_init_one_mr(). Is there a way to avoid that duplication?

> +#define RDMA_RW_SINGLE_WR	0
> +#define RDMA_RW_MULTI_WR	1
> +#define RDMA_RW_MR		2

The above constants are only used in the rw.c source file. Do we need 
these constants in the header file or can these be moved into source 
file rw.c?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 08/12] IB/core: generic RDMA READ/WRITE API
       [not found]       ` <570D8A42.9040107-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2016-04-13 13:50         ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-13 13:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Tue, Apr 12, 2016 at 04:52:34PM -0700, Bart Van Assche wrote:
> Please clarify the comment above this function. The way that comment is 
> written seems to contradict the code for iWARP writes.

Thanks, fixed.

>> +		count++;
>> +	} else {
>> +		reg->inv_wr.next = NULL;
>> +	}
>> +
>> +	ret = ib_map_mr_sg(reg->mr, sg, nents, offset, PAGE_SIZE);
>> +	if (ret < nents) {
>> +		ib_mr_pool_put(qp, &qp->rdma_mrs, reg->mr);
>> +		return -EINVAL;
>> +	}
>
> The above code assumes that the length of each sg list element is lower 
> than or equal to mr->page_size. I think this is something that should be 
> documented since the block layer has to be configured explicitly to ensure 
> this.

It shouldn't assume that - ib_map_mr_sg just uses PAGE_SIZE as
the MR granularity.  Note that the block layer isn't involved for any of
the users of this function - it's used on the target sides of iSER, SRP,
and (out of tree for a few more weeks) NVMe over Fabrics.  All of them
only provide 4k segments, so while I think this code should handle larger
segments, there is no way to verify that until we have consumers that
provide them.  I think SCST had an allocator that could use larger
pages, and Pure Storage mentioned they had LIO changes to use large
pages as well, so if this comes up I think we should be able to support
it without too much effort.
>
>> +	u32 pages_per_mr = rdma_rw_fr_page_list_len(qp->pd->device);
>> +	int i, j, ret = 0, count = 0;
>> +
>> +	ctx->nr_ops = (sg_cnt + pages_per_mr - 1) / pages_per_mr;
>> +	ctx->reg = kcalloc(ctx->nr_ops, sizeof(*ctx->reg), GFP_KERNEL);
>> +	if (!ctx->reg) {
>> +		ret = -ENOMEM;
>> +		goto out;
>> +	}
>> +
>> +	for (i = 0; i < ctx->nr_ops; i++) {
>> +		struct rdma_rw_reg_ctx *prev = i ? &ctx->reg[i - 1] : NULL;
>> +		struct rdma_rw_reg_ctx *reg = &ctx->reg[i];
>> +		u32 nents = min(sg_cnt, pages_per_mr);
>
> The same min(sg_cnt, pages_per_mr) computation occurs here and in 
> rdma_rw_init_one_mr(). Is there a way to avoid that duplication?

It's just a single min statement, so coming up with an inline function
to wrap it seems like too much overhead to me.  If you have a better
suggestion, I'd be happy to look into it.
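
For completeness, the kind of wrapper under discussion would amount to
no more than something like this (hypothetical, not part of the series):

	static inline u32 rdma_rw_mr_sg_cnt(struct ib_qp *qp, u32 sg_cnt)
	{
		return min_t(u32, sg_cnt,
			     rdma_rw_fr_page_list_len(qp->pd->device));
	}

which doesn't buy much over the open-coded min().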

>> +#define RDMA_RW_SINGLE_WR	0
>> +#define RDMA_RW_MULTI_WR	1
>> +#define RDMA_RW_MR		2
>
> The above constants are only used in the rw.c source file. Do we need these 
> constants in the header file or can these be moved into source file rw.c?

I have moved them, and converted the constants to an enum while I was at
it.
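
I.e. roughly the following (sketch only, modulo the final naming in rw.c):

	enum {
		RDMA_RW_SINGLE_WR,
		RDMA_RW_MULTI_WR,
		RDMA_RW_MR,
		RDMA_RW_SIG_MR,
	};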

Thanks for the review!
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/12] IB/srpt: convert to the generic RDMA READ/WRITE API
  2016-04-11 21:32 ` [PATCH 10/12] IB/srpt: convert to the generic RDMA READ/WRITE API Christoph Hellwig
@ 2016-04-13 18:57   ` Bart Van Assche
  2016-04-14 13:32     ` Christoph Hellwig
  0 siblings, 1 reply; 66+ messages in thread
From: Bart Van Assche @ 2016-04-13 18:57 UTC (permalink / raw)
  To: Christoph Hellwig, dledford
  Cc: bart.vanassche, swise, sagi, linux-rdma, target-devel

On 04/11/2016 02:32 PM, Christoph Hellwig wrote:
>   static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx,
> -			     struct srp_cmd *srp_cmd,
> -			     enum dma_data_direction *dir, u64 *data_len)
> +		struct srp_cmd *srp_cmd, enum dma_data_direction *dir,
> +		struct scatterlist **sg, unsigned *sg_cnt, u64 *data_len)
>   {
[ ... ]
>
> -		db = idb->desc_list;
> -		memcpy(ioctx->rbufs, db, ioctx->n_rbuf * sizeof(*db));
>   		*data_len = be32_to_cpu(idb->len);
> +		return srpt_alloc_rw_ctxs(ioctx, idb->desc_list, nbufs,
> +				sg, sg_cnt);
> +	} else {
> +		*data_len = 0;
> +		return 0;
>   	}
> -out:
> -	return ret;
>   }

srpt_get_desc_tbl() only has one caller. Have you considered moving 
srpt_alloc_rw_ctxs() from this function to the caller of 
srpt_get_desc_tbl()?

> -	if (srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &data_len)) {
> -		pr_err("0x%llx: parsing SRP descriptor table failed.\n",
> -		       srp_cmd->tag);
> +	rc = srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &sg, &sg_cnt,
> +			&data_len);
> +	if (rc) {
> +		if (rc != -EAGAIN) {
> +			pr_err("0x%llx: parsing SRP descriptor table failed.\n",
> +			       srp_cmd->tag);
> +		} else {
> +			printk_ratelimited("out of MRs for 0x%llx\n", srp_cmd->tag);
> +		}
>   		goto release_ioctx;
>   	}

Sorry, but releasing an I/O context if srpt_alloc_rw_ctxs() returns 
-EAGAIN looks wrong to me. If this happens, the I/O context should be 
added to the wait list without being released. Additionally, 
srpt_recv_done() will have to be modified so that newly received 
commands are added to the wait list if that list is not empty, to prevent 
a postponed request from being starved by new incoming requests.

Bart.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/12] IB/srpt: convert to the generic RDMA READ/WRITE API
  2016-04-13 18:57   ` Bart Van Assche
@ 2016-04-14 13:32     ` Christoph Hellwig
  2016-04-28 21:02       ` Doug Ledford
  0 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-14 13:32 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, dledford, swise, sagi, linux-rdma, target-devel

On Wed, Apr 13, 2016 at 11:57:57AM -0700, Bart Van Assche wrote:
> On 04/11/2016 02:32 PM, Christoph Hellwig wrote:
>>   static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx,
>> -			     struct srp_cmd *srp_cmd,
>> -			     enum dma_data_direction *dir, u64 *data_len)
>> +		struct srp_cmd *srp_cmd, enum dma_data_direction *dir,
>> +		struct scatterlist **sg, unsigned *sg_cnt, u64 *data_len)
>>   {
> [ ... ]
>>
>> -		db = idb->desc_list;
>> -		memcpy(ioctx->rbufs, db, ioctx->n_rbuf * sizeof(*db));
>>   		*data_len = be32_to_cpu(idb->len);
>> +		return srpt_alloc_rw_ctxs(ioctx, idb->desc_list, nbufs,
>> +				sg, sg_cnt);
>> +	} else {
>> +		*data_len = 0;
>> +		return 0;
>>   	}
>> -out:
>> -	return ret;
>>   }
>
> srpt_get_desc_tbl() only has one caller. Have you considered moving 
> srpt_alloc_rw_ctxs() from this function to the caller of 
> srpt_get_desc_tbl()?

I looked into a couple of options.  srpt_alloc_rw_ctxs needs the
pointer to the srp_direct_buf array and the number of buffers, so we'd
need two more output arguments to srpt_get_desc_tbl, which didn't
seem worthwhile to me.  If you want me to make the change anyway, I can
update the patch.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
       [not found]   ` <1460410360-13104-3-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-15 17:55     ` Sagi Grimberg
  0 siblings, 0 replies; 66+ messages in thread
From: Sagi Grimberg @ 2016-04-15 17:55 UTC (permalink / raw)
  To: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

Looks fine,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 03/12] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg
       [not found]   ` <1460410360-13104-4-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-15 17:56     ` Sagi Grimberg
  0 siblings, 0 replies; 66+ messages in thread
From: Sagi Grimberg @ 2016-04-15 17:56 UTC (permalink / raw)
  To: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

Looks good,

Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 04/12] IB/core: add a helper to check for READ WITH INVALIDATE support
       [not found]   ` <1460410360-13104-5-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-15 17:56     ` Sagi Grimberg
  0 siblings, 0 replies; 66+ messages in thread
From: Sagi Grimberg @ 2016-04-15 17:56 UTC (permalink / raw)
  To: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

Looks good,

Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 01/12] IB/mlx5: Expose correct max_sge_rd limit
       [not found]     ` <1460410360-13104-2-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-17 13:53       ` Leon Romanovsky
       [not found]         ` <20160417135341.GC6349-2ukJVAZIZ/Y@public.gmane.org>
  0 siblings, 1 reply; 66+ messages in thread
From: Leon Romanovsky @ 2016-04-17 13:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Sagi Grimberg

On Mon, Apr 11, 2016 at 02:32:29PM -0700, Christoph Hellwig wrote:
> From: Sagi Grimberg <sagig-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

Sagi sent it to the ML approximately two weeks ago [1]

[1] http://www.spinics.net/lists/linux-rdma/msg34619.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 01/12] IB/mlx5: Expose correct max_sge_rd limit
       [not found]         ` <20160417135341.GC6349-2ukJVAZIZ/Y@public.gmane.org>
@ 2016-04-17 18:06           ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-17 18:06 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Sagi Grimberg

On Sun, Apr 17, 2016 at 04:53:41PM +0300, Leon Romanovsky wrote:
> On Mon, Apr 11, 2016 at 02:32:29PM -0700, Christoph Hellwig wrote:
> > From: Sagi Grimberg <sagig-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> 
> Sagi sent it to ML approximately two weeks ago [1]

I know..
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/12] IB/core: refactor ib_create_qp
  2016-04-11 21:32 ` [PATCH 05/12] IB/core: refactor ib_create_qp Christoph Hellwig
@ 2016-04-17 20:00   ` Sagi Grimberg
       [not found]   ` <1460410360-13104-6-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
  1 sibling, 0 replies; 66+ messages in thread
From: Sagi Grimberg @ 2016-04-17 20:00 UTC (permalink / raw)
  To: Christoph Hellwig, dledford
  Cc: bart.vanassche, swise, linux-rdma, target-devel

It's not needed as part of the patchset, but looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 06/12] IB/core: add a simple MR pool
  2016-04-11 21:32   ` [PATCH 06/12] IB/core: add a simple MR pool Christoph Hellwig
@ 2016-04-17 20:01     ` Sagi Grimberg
  2016-04-19  3:19     ` Ira Weiny
  1 sibling, 0 replies; 66+ messages in thread
From: Sagi Grimberg @ 2016-04-17 20:01 UTC (permalink / raw)
  To: Christoph Hellwig, dledford
  Cc: bart.vanassche, swise, linux-rdma, target-devel

Looks fine,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 07/12] IB/core: add a need_inval flag to struct ib_mr
       [not found]   ` <1460410360-13104-8-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-17 20:01     ` Sagi Grimberg
  0 siblings, 0 replies; 66+ messages in thread
From: Sagi Grimberg @ 2016-04-17 20:01 UTC (permalink / raw)
  To: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Steve Wise

Yes, thank you.

Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 05/12] IB/core: refactor ib_create_qp
       [not found]   ` <1460410360-13104-6-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-19  3:08     ` Ira Weiny
  0 siblings, 0 replies; 66+ messages in thread
From: Ira Weiny @ 2016-04-19  3:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Mon, Apr 11, 2016 at 02:32:33PM -0700, Christoph Hellwig wrote:
> Split the XRC magic into a separate function, and return early on failure
> to make the initialization code readable.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Tested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Reviewed-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>

Reviewed-by: Ira Weiny <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

> ---
>  drivers/infiniband/core/verbs.c | 103 +++++++++++++++++++++-------------------
>  1 file changed, 54 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index 064dbef..d0ed260 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -723,62 +723,67 @@ struct ib_qp *ib_open_qp(struct ib_xrcd *xrcd,
>  }
>  EXPORT_SYMBOL(ib_open_qp);
>  
> +static struct ib_qp *ib_create_xrc_qp(struct ib_qp *qp,
> +		struct ib_qp_init_attr *qp_init_attr)
> +{
> +	struct ib_qp *real_qp = qp;
> +
> +	qp->event_handler = __ib_shared_qp_event_handler;
> +	qp->qp_context = qp;
> +	qp->pd = NULL;
> +	qp->send_cq = qp->recv_cq = NULL;
> +	qp->srq = NULL;
> +	qp->xrcd = qp_init_attr->xrcd;
> +	atomic_inc(&qp_init_attr->xrcd->usecnt);
> +	INIT_LIST_HEAD(&qp->open_list);
> +
> +	qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
> +			  qp_init_attr->qp_context);
> +	if (!IS_ERR(qp))
> +		__ib_insert_xrcd_qp(qp_init_attr->xrcd, real_qp);
> +	else
> +		real_qp->device->destroy_qp(real_qp);
> +	return qp;
> +}
> +
>  struct ib_qp *ib_create_qp(struct ib_pd *pd,
>  			   struct ib_qp_init_attr *qp_init_attr)
>  {
> -	struct ib_qp *qp, *real_qp;
> -	struct ib_device *device;
> +	struct ib_device *device = pd ? pd->device : qp_init_attr->xrcd->device;
> +	struct ib_qp *qp;
>  
> -	device = pd ? pd->device : qp_init_attr->xrcd->device;
>  	qp = device->create_qp(pd, qp_init_attr, NULL);
> -
> -	if (!IS_ERR(qp)) {
> -		qp->device     = device;
> -		qp->real_qp    = qp;
> -		qp->uobject    = NULL;
> -		qp->qp_type    = qp_init_attr->qp_type;
> -
> -		atomic_set(&qp->usecnt, 0);
> -		if (qp_init_attr->qp_type == IB_QPT_XRC_TGT) {
> -			qp->event_handler = __ib_shared_qp_event_handler;
> -			qp->qp_context = qp;
> -			qp->pd = NULL;
> -			qp->send_cq = qp->recv_cq = NULL;
> -			qp->srq = NULL;
> -			qp->xrcd = qp_init_attr->xrcd;
> -			atomic_inc(&qp_init_attr->xrcd->usecnt);
> -			INIT_LIST_HEAD(&qp->open_list);
> -
> -			real_qp = qp;
> -			qp = __ib_open_qp(real_qp, qp_init_attr->event_handler,
> -					  qp_init_attr->qp_context);
> -			if (!IS_ERR(qp))
> -				__ib_insert_xrcd_qp(qp_init_attr->xrcd, real_qp);
> -			else
> -				real_qp->device->destroy_qp(real_qp);
> -		} else {
> -			qp->event_handler = qp_init_attr->event_handler;
> -			qp->qp_context = qp_init_attr->qp_context;
> -			if (qp_init_attr->qp_type == IB_QPT_XRC_INI) {
> -				qp->recv_cq = NULL;
> -				qp->srq = NULL;
> -			} else {
> -				qp->recv_cq = qp_init_attr->recv_cq;
> -				atomic_inc(&qp_init_attr->recv_cq->usecnt);
> -				qp->srq = qp_init_attr->srq;
> -				if (qp->srq)
> -					atomic_inc(&qp_init_attr->srq->usecnt);
> -			}
> -
> -			qp->pd	    = pd;
> -			qp->send_cq = qp_init_attr->send_cq;
> -			qp->xrcd    = NULL;
> -
> -			atomic_inc(&pd->usecnt);
> -			atomic_inc(&qp_init_attr->send_cq->usecnt);
> -		}
> +	if (IS_ERR(qp))
> +		return qp;
> +
> +	qp->device     = device;
> +	qp->real_qp    = qp;
> +	qp->uobject    = NULL;
> +	qp->qp_type    = qp_init_attr->qp_type;
> +
> +	atomic_set(&qp->usecnt, 0);
> +	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
> +		return ib_create_xrc_qp(qp, qp_init_attr);
> +
> +	qp->event_handler = qp_init_attr->event_handler;
> +	qp->qp_context = qp_init_attr->qp_context;
> +	if (qp_init_attr->qp_type == IB_QPT_XRC_INI) {
> +		qp->recv_cq = NULL;
> +		qp->srq = NULL;
> +	} else {
> +		qp->recv_cq = qp_init_attr->recv_cq;
> +		atomic_inc(&qp_init_attr->recv_cq->usecnt);
> +		qp->srq = qp_init_attr->srq;
> +		if (qp->srq)
> +			atomic_inc(&qp_init_attr->srq->usecnt);
>  	}
>  
> +	qp->pd	    = pd;
> +	qp->send_cq = qp_init_attr->send_cq;
> +	qp->xrcd    = NULL;
> +
> +	atomic_inc(&pd->usecnt);
> +	atomic_inc(&qp_init_attr->send_cq->usecnt);
>  	return qp;
>  }
>  EXPORT_SYMBOL(ib_create_qp);
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-11 21:32 ` [PATCH 02/12] IB/cma: pass the port number to ib_create_qp Christoph Hellwig
       [not found]   ` <1460410360-13104-3-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-19  3:14   ` Ira Weiny
  2016-04-19 17:30     ` Jason Gunthorpe
  1 sibling, 1 reply; 66+ messages in thread
From: Ira Weiny @ 2016-04-19  3:14 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dledford, bart.vanassche, swise, sagi, linux-rdma, target-devel

On Mon, Apr 11, 2016 at 02:32:30PM -0700, Christoph Hellwig wrote:
> The new RW API will need this.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Tested-by: Steve Wise <swise@opengridcomputing.com>

I'm not opposed to this change but traditionally QPs are bound to a device not
to a single port.

How does this change that semantic?

Ira

> ---
>  drivers/infiniband/core/cma.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index 93ab0ae..6ebaf20 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -800,6 +800,7 @@ int rdma_create_qp(struct rdma_cm_id *id, struct ib_pd *pd,
>  	if (id->device != pd->device)
>  		return -EINVAL;
>  
> +	qp_init_attr->port_num = id->port_num;
>  	qp = ib_create_qp(pd, qp_init_attr);
>  	if (IS_ERR(qp))
>  		return PTR_ERR(qp);
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
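
For illustration, the stored port number is only consulted for per-port
capability checks at QP setup time as far as this series is concerned.
A rough sketch of the kind of check the RW code added later in the
series performs (an approximation, not the merged implementation):

#include <rdma/ib_verbs.h>
#include <linux/dma-mapping.h>

/*
 * Sketch only: decide whether RDMA READ/WRITE on this port needs MRs.
 * The real helper in the RW code also honours a force_mr debug module
 * option; this version just shows the per-port capability check.
 */
static bool example_rw_io_needs_mr(struct ib_device *dev, u8 port_num,
				   enum dma_data_direction dir)
{
	/* iWarp requires a registered MR for the local sink of a READ */
	if (dir == DMA_FROM_DEVICE && rdma_protocol_iwarp(dev, port_num))
		return true;

	return false;
}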

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 04/12] IB/core: add a helper to check for READ WITH INVALIDATE support
  2016-04-11 21:32 ` [PATCH 04/12] IB/core: add a helper to check for READ WITH INVALIDATE support Christoph Hellwig
       [not found]   ` <1460410360-13104-5-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-19  3:15   ` Ira Weiny
  1 sibling, 0 replies; 66+ messages in thread
From: Ira Weiny @ 2016-04-19  3:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dledford, bart.vanassche, swise, sagi, linux-rdma, target-devel

On Mon, Apr 11, 2016 at 02:32:32PM -0700, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Steve Wise <swise@opengridcomputing.com>
> Reviewed-by: Steve Wise <swise@opengridcomputing.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> ---
>  include/rdma/ib_verbs.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 24d0d82..9e8616a 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -2318,6 +2318,18 @@ static inline bool rdma_cap_roce_gid_table(const struct ib_device *device,
>  		device->add_gid && device->del_gid;
>  }
>  
> +/*
> + * Check if the device supports READ W/ INVALIDATE.
> + */
> +static inline bool rdma_cap_read_inv(struct ib_device *dev, u32 port_num)
> +{
> +	/*
> +	 * iWarp drivers must support READ W/ INVALIDATE.  No other protocol
> +	 * has support for it yet.
> +	 */
> +	return rdma_protocol_iwarp(dev, port_num);
> +}
> +
>  int ib_query_gid(struct ib_device *device,
>  		 u8 port_num, int index, union ib_gid *gid,
>  		 struct ib_gid_attr *attr);
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
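
For illustration, a rough sketch of how the RW code later in this
series is expected to consume the helper when building the final READ
of a chain; the surrounding structures are simplified, and
mr->need_inval comes from patch 07 of this series:

#include <rdma/ib_verbs.h>

/*
 * Sketch only: if the port supports READ WITH INVALIDATE (iWarp), the
 * last RDMA READ can invalidate the local MR as a side effect and no
 * explicit IB_WR_LOCAL_INV is needed afterwards.
 */
static void example_setup_read_wr(struct ib_qp *qp, u32 port_num,
				  struct ib_rdma_wr *rdma_wr,
				  struct ib_mr *mr)
{
	if (rdma_cap_read_inv(qp->device, port_num)) {
		rdma_wr->wr.opcode = IB_WR_RDMA_READ_WITH_INV;
		rdma_wr->wr.ex.invalidate_rkey = mr->lkey;
		mr->need_inval = false;
	} else {
		rdma_wr->wr.opcode = IB_WR_RDMA_READ;
		mr->need_inval = true;	/* invalidate explicitly later */
	}
}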

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 06/12] IB/core: add a simple MR pool
  2016-04-11 21:32   ` [PATCH 06/12] IB/core: add a simple MR pool Christoph Hellwig
  2016-04-17 20:01     ` Sagi Grimberg
@ 2016-04-19  3:19     ` Ira Weiny
  1 sibling, 0 replies; 66+ messages in thread
From: Ira Weiny @ 2016-04-19  3:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dledford, bart.vanassche, swise, sagi, linux-rdma, target-devel

On Mon, Apr 11, 2016 at 02:32:34PM -0700, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Tested-by: Steve Wise <swise@opengridcomputing.com>
> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Reviewed-by: Steve Wise <swise@opengridcomputing.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> ---
>  drivers/infiniband/core/Makefile  |  2 +-
>  drivers/infiniband/core/mr_pool.c | 86 +++++++++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/verbs.c   |  5 +++
>  include/rdma/ib_verbs.h           |  8 +++-
>  include/rdma/mr_pool.h            | 25 ++++++++++++
>  5 files changed, 124 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/infiniband/core/mr_pool.c
>  create mode 100644 include/rdma/mr_pool.h
> 
> diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> index f818538..48bd9d8 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -10,7 +10,7 @@ obj-$(CONFIG_INFINIBAND_USER_ACCESS) +=	ib_uverbs.o ib_ucm.o \
>  
>  ib_core-y :=			packer.o ud_header.o verbs.o cq.o sysfs.o \
>  				device.o fmr_pool.o cache.o netlink.o \
> -				roce_gid_mgmt.o
> +				roce_gid_mgmt.o mr_pool.o
>  ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
>  ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
>  
> diff --git a/drivers/infiniband/core/mr_pool.c b/drivers/infiniband/core/mr_pool.c
> new file mode 100644
> index 0000000..49d478b
> --- /dev/null
> +++ b/drivers/infiniband/core/mr_pool.c
> @@ -0,0 +1,86 @@
> +/*
> + * Copyright (c) 2016 HGST, a Western Digital Company.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +#include <rdma/ib_verbs.h>
> +#include <rdma/mr_pool.h>
> +
> +struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list)
> +{
> +	struct ib_mr *mr;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&qp->mr_lock, flags);
> +	mr = list_first_entry_or_null(list, struct ib_mr, qp_entry);
> +	if (mr) {
> +		list_del(&mr->qp_entry);
> +		qp->mrs_used++;
> +	}
> +	spin_unlock_irqrestore(&qp->mr_lock, flags);
> +
> +	return mr;
> +}
> +EXPORT_SYMBOL(ib_mr_pool_get);
> +
> +void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&qp->mr_lock, flags);
> +	list_add(&mr->qp_entry, list);
> +	qp->mrs_used--;
> +	spin_unlock_irqrestore(&qp->mr_lock, flags);
> +}
> +EXPORT_SYMBOL(ib_mr_pool_put);
> +
> +int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
> +		enum ib_mr_type type, u32 max_num_sg)
> +{
> +	struct ib_mr *mr;
> +	unsigned long flags;
> +	int ret, i;
> +
> +	for (i = 0; i < nr; i++) {
> +		mr = ib_alloc_mr(qp->pd, type, max_num_sg);
> +		if (IS_ERR(mr)) {
> +			ret = PTR_ERR(mr);
> +			goto out;
> +		}
> +
> +		spin_lock_irqsave(&qp->mr_lock, flags);
> +		list_add_tail(&mr->qp_entry, list);
> +		spin_unlock_irqrestore(&qp->mr_lock, flags);
> +	}
> +
> +	return 0;
> +out:
> +	ib_mr_pool_destroy(qp, list);
> +	return ret;
> +}
> +EXPORT_SYMBOL(ib_mr_pool_init);
> +
> +void ib_mr_pool_destroy(struct ib_qp *qp, struct list_head *list)
> +{
> +	struct ib_mr *mr;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&qp->mr_lock, flags);
> +	while (!list_empty(list)) {
> +		mr = list_first_entry(list, struct ib_mr, qp_entry);
> +		list_del(&mr->qp_entry);
> +
> +		spin_unlock_irqrestore(&qp->mr_lock, flags);
> +		ib_dereg_mr(mr);
> +		spin_lock_irqsave(&qp->mr_lock, flags);
> +	}
> +	spin_unlock_irqrestore(&qp->mr_lock, flags);
> +}
> +EXPORT_SYMBOL(ib_mr_pool_destroy);
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index d0ed260..d9ea2fb 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -762,6 +762,9 @@ struct ib_qp *ib_create_qp(struct ib_pd *pd,
>  	qp->qp_type    = qp_init_attr->qp_type;
>  
>  	atomic_set(&qp->usecnt, 0);
> +	qp->mrs_used = 0;
> +	spin_lock_init(&qp->mr_lock);
> +
>  	if (qp_init_attr->qp_type == IB_QPT_XRC_TGT)
>  		return ib_create_xrc_qp(qp, qp_init_attr);
>  
> @@ -1255,6 +1258,8 @@ int ib_destroy_qp(struct ib_qp *qp)
>  	struct ib_srq *srq;
>  	int ret;
>  
> +	WARN_ON_ONCE(qp->mrs_used > 0);
> +
>  	if (atomic_read(&qp->usecnt))
>  		return -EBUSY;
>  
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 9e8616a..400a8a0 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1421,9 +1421,12 @@ struct ib_qp {
>  	struct ib_pd	       *pd;
>  	struct ib_cq	       *send_cq;
>  	struct ib_cq	       *recv_cq;
> +	spinlock_t		mr_lock;
> +	int			mrs_used;
>  	struct ib_srq	       *srq;
>  	struct ib_xrcd	       *xrcd; /* XRC TGT QPs only */
>  	struct list_head	xrcd_list;
> +
>  	/* count times opened, mcast attaches, flow attaches */
>  	atomic_t		usecnt;
>  	struct list_head	open_list;
> @@ -1438,12 +1441,15 @@ struct ib_qp {
>  struct ib_mr {
>  	struct ib_device  *device;
>  	struct ib_pd	  *pd;
> -	struct ib_uobject *uobject;
>  	u32		   lkey;
>  	u32		   rkey;
>  	u64		   iova;
>  	u32		   length;
>  	unsigned int	   page_size;
> +	union {
> +		struct ib_uobject	*uobject;	/* user */
> +		struct list_head	qp_entry;	/* FR */
> +	};
>  };
>  
>  struct ib_mw {
> diff --git a/include/rdma/mr_pool.h b/include/rdma/mr_pool.h
> new file mode 100644
> index 0000000..986010b
> --- /dev/null
> +++ b/include/rdma/mr_pool.h
> @@ -0,0 +1,25 @@
> +/*
> + * Copyright (c) 2016 HGST, a Western Digital Company.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + */
> +#ifndef _RDMA_MR_POOL_H
> +#define _RDMA_MR_POOL_H 1
> +
> +#include <rdma/ib_verbs.h>
> +
> +struct ib_mr *ib_mr_pool_get(struct ib_qp *qp, struct list_head *list);
> +void ib_mr_pool_put(struct ib_qp *qp, struct list_head *list, struct ib_mr *mr);
> +
> +int ib_mr_pool_init(struct ib_qp *qp, struct list_head *list, int nr,
> +		enum ib_mr_type type, u32 max_num_sg);
> +void ib_mr_pool_destroy(struct ib_qp *qp, struct list_head *list);
> +
> +#endif /* _RDMA_MR_POOL_H */
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
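
For illustration, a minimal usage sketch of the pool API above.  The RW
API later in this series keeps its MR lists on the QP itself; the
example_conn container below is purely hypothetical:

#include <rdma/ib_verbs.h>
#include <rdma/mr_pool.h>

struct example_conn {
	struct ib_qp		*qp;
	struct list_head	free_mrs;	/* handed to the pool helpers */
};

/* Sketch only: fill the pool once at connection/QP setup time. */
static int example_setup(struct example_conn *conn, int nr_mrs, u32 max_sg)
{
	INIT_LIST_HEAD(&conn->free_mrs);
	return ib_mr_pool_init(conn->qp, &conn->free_mrs, nr_mrs,
			       IB_MR_TYPE_MEM_REG, max_sg);
}

/* Sketch only: grab an MR per I/O and return it once the I/O is done. */
static int example_do_io(struct example_conn *conn)
{
	struct ib_mr *mr;

	mr = ib_mr_pool_get(conn->qp, &conn->free_mrs);
	if (!mr)
		return -EAGAIN;		/* pool exhausted, retry later */

	/* ... ib_map_mr_sg(), post IB_WR_REG_MR + RDMA work requests ... */

	ib_mr_pool_put(conn->qp, &conn->free_mrs, mr);
	return 0;
}

/* Sketch only: drain the pool before the QP is destroyed. */
static void example_teardown(struct example_conn *conn)
{
	ib_mr_pool_destroy(conn->qp, &conn->free_mrs);
}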

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 07/12] IB/core: add a need_inval flag to struct ib_mr
  2016-04-11 21:32 ` [PATCH 07/12] IB/core: add a need_inval flag to struct ib_mr Christoph Hellwig
       [not found]   ` <1460410360-13104-8-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-19  3:20   ` Ira Weiny
  1 sibling, 0 replies; 66+ messages in thread
From: Ira Weiny @ 2016-04-19  3:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dledford, bart.vanassche, swise, sagi, linux-rdma, target-devel,
	Steve Wise

On Mon, Apr 11, 2016 at 02:32:35PM -0700, Christoph Hellwig wrote:
> From: Steve Wise <swise@chelsio.com>
> 
> This is the first step toward moving MR invalidation decisions
> to the core.  It will be needed by the upcoming RW API.
> 
> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> ---
>  drivers/infiniband/core/verbs.c | 2 ++
>  include/rdma/ib_verbs.h         | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index d9ea2fb..179d800 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -1353,6 +1353,7 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
>  		mr->pd      = pd;
>  		mr->uobject = NULL;
>  		atomic_inc(&pd->usecnt);
> +		mr->need_inval = false;
>  	}
>  
>  	return mr;
> @@ -1399,6 +1400,7 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
>  		mr->pd      = pd;
>  		mr->uobject = NULL;
>  		atomic_inc(&pd->usecnt);
> +		mr->need_inval = false;
>  	}
>  
>  	return mr;
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 400a8a0..3f66647 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1446,6 +1446,7 @@ struct ib_mr {
>  	u64		   iova;
>  	u32		   length;
>  	unsigned int	   page_size;
> +	bool		   need_inval;
>  	union {
>  		struct ib_uobject	*uobject;	/* user */
>  		struct list_head	qp_entry;	/* FR */
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
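
For illustration, a rough sketch of how a target side is expected to
consume the new flag; the helper names are hypothetical, only the
verbs calls are real:

#include <rdma/ib_verbs.h>

/*
 * Sketch only: a peer that responds with Send-with-Invalidate clears
 * the flag for us; anything still flagged gets an explicit local
 * invalidate before the MR is reused.
 */
static void example_recv_done(struct ib_wc *wc, struct ib_mr *mr)
{
	if ((wc->wc_flags & IB_WC_WITH_INVALIDATE) &&
	    wc->ex.invalidate_rkey == mr->rkey)
		mr->need_inval = false;		/* remotely invalidated */
}

static int example_invalidate_if_needed(struct ib_qp *qp, struct ib_mr *mr,
					struct ib_cqe *inv_cqe)
{
	struct ib_send_wr inv_wr = {}, *bad_wr;

	if (!mr->need_inval)
		return 0;

	inv_wr.opcode = IB_WR_LOCAL_INV;
	inv_wr.ex.invalidate_rkey = mr->lkey;
	inv_wr.send_flags = IB_SEND_SIGNALED;
	inv_wr.wr_cqe = inv_cqe;

	mr->need_inval = false;
	return ib_post_send(qp, &inv_wr, &bad_wr);
}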

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-19  3:14   ` Ira Weiny
@ 2016-04-19 17:30     ` Jason Gunthorpe
  2016-04-19 18:38       ` Christoph Hellwig
       [not found]       ` <20160419173032.GD20844-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 2 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2016-04-19 17:30 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Christoph Hellwig, dledford, bart.vanassche, swise, sagi,
	linux-rdma, target-devel

On Mon, Apr 18, 2016 at 11:14:27PM -0400, Ira Weiny wrote:
> On Mon, Apr 11, 2016 at 02:32:30PM -0700, Christoph Hellwig wrote:
> > The new RW API will need this.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> > Tested-by: Steve Wise <swise@opengridcomputing.com>
> 
> I'm not opposed to this change but traditionally QPs are bound to a
> device not to a single port.

Right, this was done because rdma_protocol_iwarp takes a port number.

I think we discussed this once, the core code doesn't actually support
different protocols on different ports, so the port_num argument to
rdma_protocol_iwarp is redundant.

This all starts to look really goofy when multi-port APM is used and
the QP's port number changes dynamically at runtime. (I have some
experimental patches that do that), I'd rather see all the port_num
stuff in this series go away. :(

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-19 17:30     ` Jason Gunthorpe
@ 2016-04-19 18:38       ` Christoph Hellwig
       [not found]         ` <20160419183830.GB1211-jcswGhMUV9g@public.gmane.org>
       [not found]       ` <20160419173032.GD20844-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-19 18:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ira Weiny, Christoph Hellwig, dledford, bart.vanassche, swise,
	sagi, linux-rdma, target-devel

On Tue, Apr 19, 2016 at 11:30:32AM -0600, Jason Gunthorpe wrote:
> Right, this was done because rdma_protocol_iwarp takes a port number.
> 
> I think we discussed this once, the core code doesn't actually support
> different protocols on different ports, so the port_num argument to
> rdma_protocol_iwarp is redundant.
> 
> This all starts to look really goofy when multi-port APM is used and
> the QP's port number changes dynamically at runtime. (I have some
> experimental patches that do that), I'd rather see all the port_num
> stuff in this series go away. :(

Really, I would _love_ to kill all that port_num crap.  But until
we get agreement from Doug and all the core maintainers that we
can kill it from the core, and that multi-protocol devices are
indeed as silly as they seem, I can't.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
       [not found]       ` <20160419173032.GD20844-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-04-19 18:49         ` Sagi Grimberg
  2016-04-19 19:24           ` Jason Gunthorpe
  0 siblings, 1 reply; 66+ messages in thread
From: Sagi Grimberg @ 2016-04-19 18:49 UTC (permalink / raw)
  To: Jason Gunthorpe, Ira Weiny
  Cc: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA


>>> The new RW API will need this.
>>>
>>> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
>>> Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>> Tested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>>
>> I'm not opposed to this change but traditionally QPs are bound to a
>> device not to a single port.
>
> Right, this was done because rdma_protocol_iwarp takes a port number.
>
> I think we discussed this once, the core code doesn't actually support
> different protocols on different ports, so the port_num argument to
> rdma_protocol_iwarp is redundant.
>
> This all starts to look really goofy when multi-port APM is used and
> the QP's port number changes dynamically at runtime. (I have some
> experimental patches that do that), I'd rather see all the port_num
> stuff in this series go away. :(

HCH and I complained about this per-port distinction in several private
conversations. I'd really love to see it go away too.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-19 18:49         ` Sagi Grimberg
@ 2016-04-19 19:24           ` Jason Gunthorpe
       [not found]             ` <20160419192430.GB27028-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2016-04-19 19:24 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Ira Weiny, Christoph Hellwig, dledford, bart.vanassche, swise,
	linux-rdma, target-devel

On Tue, Apr 19, 2016 at 09:49:03PM +0300, Sagi Grimberg wrote:
> 
> >>>The new RW API will need this.
> >>>
> >>>Signed-off-by: Christoph Hellwig <hch@lst.de>
> >>>Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> >>>Tested-by: Steve Wise <swise@opengridcomputing.com>
> >>
> >>I'm not opposed to this change but traditionally QPs are bound to a
> >>device not to a single port.
> >
> >Right, this was done because rdma_protocol_iwarp takes a port number.
> >
> >I think we discussed this once, the core code doesn't actually support
> >different protocols on different ports, so the port_num argument to
> >rdma_protocol_iwarp is redundant.
> >
> >This all starts to look really goofy when multi-port APM is used and
> >the QP's port number changes dynamically at runtime. (I have some
> >experimental patches that do that), I'd rather see all the port_num
> >stuff in this series go away. :(
> 
> HCH and I complained about this per-port distinction in several private
> conversations. I'd really love to see it go away too.

I'm in support of eliminating them. One protocol per device.

IB APM hard requires those semantics, and that reflects the reality of
all the drivers today.

Nothing more is required than sending a patch, IMHO.

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
       [not found]             ` <20160419192430.GB27028-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2016-04-19 19:41               ` Steve Wise
  2016-04-19 20:05                 ` 'Christoph Hellwig'
  2016-04-28 19:43               ` Hefty, Sean
  1 sibling, 1 reply; 66+ messages in thread
From: Steve Wise @ 2016-04-19 19:41 UTC (permalink / raw)
  To: 'Jason Gunthorpe', 'Sagi Grimberg'
  Cc: 'Ira Weiny', 'Christoph Hellwig',
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

> On Tue, Apr 19, 2016 at 09:49:03PM +0300, Sagi Grimberg wrote:
> >
> > >>>The new RW API will need this.
> > >>>
> > >>>Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> > >>>Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> > >>>Tested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> > >>
> > >>I'm not opposed to this change but traditionally QPs are bound to a
> > >>device not to a single port.
> > >
> > >Right, this was done because rdma_protocol_iwarp takes a port number.
> > >
> > >I think we discussed this once, the core code doesn't actually support
> > >different protocols on different ports, so the port_num argument to
> > >rdma_protocol_iwarp is redundant.
> > >
> > >This all starts to look really goofy when multi-port APM is used and
> > >the QP's port number changes dynamically at runtime. (I have some
> > >experimental patches that do that), I'd rather see all the port_num
> > >stuff in this series go away. :(
> >
> > HCH and I complained about this per-port distinction in several private
> > conversations. I'd really love to see it go away too.
> 
> I'm in support of eliminating them. One protocol per device.
>

Ditto.
 
> IB APM hard requires those semantics, and that reflects the reality of
> all the drivers today.
> 
> Nothing more is required than sending a patch, IMHO.
> 

I've been trying to sift through the original threads regarding
rdma_protocol_iwarp() and friends.  I couldn't find anybody advocating hard that
the protocol/transport type should be per port.
 
I think this thread has Doug stating it really should be per-device and static.
Doug, correct me if I'm wrong...

https://lkml.org/lkml/2015/4/10/612

Steve.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-19 19:41               ` Steve Wise
@ 2016-04-19 20:05                 ` 'Christoph Hellwig'
  2016-04-19 20:21                   ` Jason Gunthorpe
       [not found]                   ` <20160419200555.GA2561-jcswGhMUV9g@public.gmane.org>
  0 siblings, 2 replies; 66+ messages in thread
From: 'Christoph Hellwig' @ 2016-04-19 20:05 UTC (permalink / raw)
  To: Steve Wise
  Cc: 'Jason Gunthorpe', 'Sagi Grimberg',
	'Ira Weiny', 'Christoph Hellwig',
	dledford, bart.vanassche, linux-rdma, target-devel

I can offer a trade:  once this series is accepted I'll clean
up all the port_num arguments in the protocol checks over the whole
tree :)

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-19 20:05                 ` 'Christoph Hellwig'
@ 2016-04-19 20:21                   ` Jason Gunthorpe
       [not found]                   ` <20160419200555.GA2561-jcswGhMUV9g@public.gmane.org>
  1 sibling, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2016-04-19 20:21 UTC (permalink / raw)
  To: 'Christoph Hellwig'
  Cc: Steve Wise, 'Sagi Grimberg', 'Ira Weiny',
	dledford, bart.vanassche, linux-rdma, target-devel

On Tue, Apr 19, 2016 at 10:05:55PM +0200, 'Christoph Hellwig' wrote:
> I can offer a trade:  once this series is accepted I'll clean
> up all the port_num arguments in the protocol checks over the whole
> tree :)

Yeah, this series has spun enough, it should land! :)

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
       [not found]                   ` <20160419200555.GA2561-jcswGhMUV9g@public.gmane.org>
@ 2016-04-19 20:26                     ` Steve Wise
  2016-04-21  3:11                       ` ira.weiny
  0 siblings, 1 reply; 66+ messages in thread
From: Steve Wise @ 2016-04-19 20:26 UTC (permalink / raw)
  To: 'Christoph Hellwig'
  Cc: 'Jason Gunthorpe', 'Sagi Grimberg',
	'Ira Weiny',
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA



> -----Original Message-----
> From: 'Christoph Hellwig' [mailto:hch-jcswGhMUV9g@public.gmane.org]
> Sent: Tuesday, April 19, 2016 3:06 PM
> To: Steve Wise
> Cc: 'Jason Gunthorpe'; 'Sagi Grimberg'; 'Ira Weiny'; 'Christoph Hellwig';
> dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> target-devel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
> 
> I can offer a trade:  once this series is accepted I'll clean
> up all the port_num arguments in the protocol checks over the whole
> tree :)

I will help as needed.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-19 20:26                     ` Steve Wise
@ 2016-04-21  3:11                       ` ira.weiny
  0 siblings, 0 replies; 66+ messages in thread
From: ira.weiny @ 2016-04-21  3:11 UTC (permalink / raw)
  To: Steve Wise
  Cc: 'Christoph Hellwig', 'Jason Gunthorpe',
	'Sagi Grimberg',
	dledford, bart.vanassche, linux-rdma, target-devel

On Tue, Apr 19, 2016 at 03:26:37PM -0500, Steve Wise wrote:
> 
> 
> > -----Original Message-----
> > From: 'Christoph Hellwig' [mailto:hch@lst.de]
> > Sent: Tuesday, April 19, 2016 3:06 PM
> > To: Steve Wise
> > Cc: 'Jason Gunthorpe'; 'Sagi Grimberg'; 'Ira Weiny'; 'Christoph Hellwig';
> > dledford@redhat.com; bart.vanassche@sandisk.com; linux-rdma@vger.kernel.org;
> > target-devel@vger.kernel.org
> > Subject: Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
> > 
> > I can offer a trade:  once this series is accepted I'll clean
> > up all the port_num arguments in the protocol checks over the whole
> > tree :)
> 
> I will help as needed.
> 

Agreed.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 11/12] IB/core: add RW API support for signature MRs
       [not found]   ` <1460410360-13104-12-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-22 21:53     ` Bart Van Assche
  0 siblings, 0 replies; 66+ messages in thread
From: Bart Van Assche @ 2016-04-22 21:53 UTC (permalink / raw)
  To: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On 04/11/2016 02:32 PM, Christoph Hellwig wrote:
> +			pr_err("%s: failed to allocated %d MRs\n",
> +				__func__, nr_mrs);

The above probably should read "failed to allocate" instead of "failed 
to allocated"?

> +			pr_err("%s: failed to allocated %d SIG MRs\n",
> +				__func__, nr_mrs);

Same comment here.

If you have to repost this patch, please remove the trailing whitespace 
introduced by this patch.

Anyway:

Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: generic RDMA READ/WRITE API V6
  2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2016-04-12 18:31 ` generic RDMA READ/WRITE API V6 Steve Wise
@ 2016-04-22 22:29 ` Bart Van Assche
       [not found]   ` <571AA5C8.4080502-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  11 siblings, 1 reply; 66+ messages in thread
From: Bart Van Assche @ 2016-04-22 22:29 UTC (permalink / raw)
  To: Christoph Hellwig, dledford; +Cc: swise, sagi, linux-rdma, target-devel

On 04/11/2016 02:32 PM, Christoph Hellwig wrote:
> 	git://git.infradead.org/users/hch/rdma.git rdma-rw-api

Hello Christoph,

Is the version that has been pushed on April 18 the latest and greatest 
version of this patch series? I'm asking because with that version I 
see error messages appearing that I hadn't seen with the previous version:

ib_srpt:srpt_qp_event: ib_srpt QP event 16 on cm_id=ffff8801713d5628 
sess_name=0x0000000000000000e41d2d03000a85b1 state=1
ib_srpt:srpt_qp_event: ib_srpt 0x0000000000000000e41d2d03000a85b1-522, 
state live: received Last WQE event.
ib_srpt RDMA_READ for ioctx 0xffff8804593092a8 failed with status 4

This test was run with the force_mr=Y:

$ cat /etc/modprobe.d/ib_core.conf
options ib_core force_mr=Y

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
       [not found]             ` <20160419192430.GB27028-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2016-04-19 19:41               ` Steve Wise
@ 2016-04-28 19:43               ` Hefty, Sean
  2016-04-28 20:07                 ` Jason Gunthorpe
  1 sibling, 1 reply; 66+ messages in thread
From: Hefty, Sean @ 2016-04-28 19:43 UTC (permalink / raw)
  To: Jason Gunthorpe, Sagi Grimberg
  Cc: Weiny, Ira, Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

> > HCH and I complained about this per-port distinction in several private
> > conversations. I'd really love to see it go away too.
> 
> I'm in support of eliminating them. One protocol per device.

I'm slow reading this thread, but there are devices today (e.g. qlogic) that support multiple protocols (e.g. iwarp, roce, rocev2).  Even the qib and opa drivers do, if you include psm as a separate protocol from ib.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-28 19:43               ` Hefty, Sean
@ 2016-04-28 20:07                 ` Jason Gunthorpe
  2016-04-28 21:53                   ` Hefty, Sean
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Gunthorpe @ 2016-04-28 20:07 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Sagi Grimberg, Weiny, Ira, Christoph Hellwig, dledford,
	bart.vanassche, swise, linux-rdma, target-devel

On Thu, Apr 28, 2016 at 07:43:59PM +0000, Hefty, Sean wrote:
> > > HCH and I complained about this per-port distinction in several private
> > > conversations. I'd really love to see it go away too.
> > 
> > I'm in support of eliminating them. One protocol per device.
> 
> I'm slow reading this thread, but there are devices today
> (e.g. qlogic) that support multiple protocols (e.g. iwarp, roce,
> rocev2).  Even the qib and opa drivers do, if you include psm as a
> separate protocol from ib.

I see several litmus tests for what kinds of ports can be combined
into a device (eg the 'protocol'):

1) Various cap tests are the same on every port. Particularly the
   iWarp special behaviours we are talking about here.
2) AHs are not port-specific, so the AH addressing format must be
   defined by the device. Thus IB and iWarp cannot be combined.
3) Verbs APM must work across ports. So eg rocee and IB cannot be
   combined since they use a different CM process.

Multi-port really only exists to support APM, if APM doesn't work then
drivers don't need to create multi-port devices.

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/12] IB/srpt: convert to the generic RDMA READ/WRITE API
  2016-04-14 13:32     ` Christoph Hellwig
@ 2016-04-28 21:02       ` Doug Ledford
       [not found]         ` <57227A6D.4000802-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2016-04-28 21:02 UTC (permalink / raw)
  To: Christoph Hellwig, Bart Van Assche; +Cc: swise, sagi, linux-rdma, target-devel

[-- Attachment #1: Type: text/plain, Size: 2678 bytes --]

On 04/14/2016 09:32 AM, Christoph Hellwig wrote:
> On Wed, Apr 13, 2016 at 11:57:57AM -0700, Bart Van Assche wrote:
>> On 04/11/2016 02:32 PM, Christoph Hellwig wrote:
>>>   static int srpt_get_desc_tbl(struct srpt_send_ioctx *ioctx,
>>> -			     struct srp_cmd *srp_cmd,
>>> -			     enum dma_data_direction *dir, u64 *data_len)
>>> +		struct srp_cmd *srp_cmd, enum dma_data_direction *dir,
>>> +		struct scatterlist **sg, unsigned *sg_cnt, u64 *data_len)
>>>   {
>> [ ... ]
>>>
>>> -		db = idb->desc_list;
>>> -		memcpy(ioctx->rbufs, db, ioctx->n_rbuf * sizeof(*db));
>>>   		*data_len = be32_to_cpu(idb->len);
>>> +		return srpt_alloc_rw_ctxs(ioctx, idb->desc_list, nbufs,
>>> +				sg, sg_cnt);
>>> +	} else {
>>> +		*data_len = 0;
>>> +		return 0;
>>>   	}
>>> -out:
>>> -	return ret;
>>>   }
>>
>> srpt_get_desc_tbl() only has one caller. Have you considered to move 
>> srpt_alloc_rw_ctxs() from this function to the caller of 
>> srpt_get_desc_tbl()?
> 
> I looked into a couple options.  srpt_alloc_rw_ctxs needs the
> pointer to the srp_direct_buf array, and the number of buffers, so we'd
> need two more output arguments to srpt_get_desc_tbl, so it didn't
> seem worthwhile to me.  If you want me to make the change anyway, I can
> update the patch.
> 

Hi Christoph,

I see you responded to Bart's comment above, but in the same email he
had a second comment on this patch (that the logic was incorrect in part
of it), and I've not seen a response to that.  Here's the comment I'm
referring to:

>> -    if (srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &data_len)) {
>> -        pr_err("0x%llx: parsing SRP descriptor table failed.\n",
>> -               srp_cmd->tag);
>> +    rc = srpt_get_desc_tbl(send_ioctx, srp_cmd, &dir, &sg, &sg_cnt,
>> +            &data_len);
>> +    if (rc) {
>> +        if (rc != -EAGAIN) {
>> +            pr_err("0x%llx: parsing SRP descriptor table failed.\n",
>> +                   srp_cmd->tag);
>> +        } else {
>> +            printk_ratelimited("out of MRs for 0x%llx\n", srp_cmd->tag);
>> +        }
>>           goto release_ioctx;
>>       }
> 
> Sorry but releasing an I/O context if srpt_alloc_rw_ctxs() returns
> -EAGAIN looks wrong to me. If this happens the I/O context should be
> added to the wait list without releasing it. Additionally,
> srpt_recv_done() will have to be modified such that newly received
> commands are added to the wait list if this list is not empty to
> prevent that starvation of a postponed request occurs due to new
> incoming requests.
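
The queue-and-retry pattern described above would look roughly like the
following; every name below is hypothetical and only illustrates
parking a command on -EAGAIN and replaying parked commands before new
ones:

#include <linux/list.h>

struct example_ioctx {
	struct list_head	wait_entry;
	/* ... per-command state ... */
};

struct example_ch {
	struct list_head	cmd_wait_list;
	/* ... per-channel state ... */
};

/* hypothetical: maps buffers, may fail with -EAGAIN when MRs run out */
int example_map_and_submit(struct example_ch *ch, struct example_ioctx *ioctx);

static void example_queue_or_exec(struct example_ch *ch,
				  struct example_ioctx *ioctx)
{
	int rc = example_map_and_submit(ch, ioctx);

	if (rc == -EAGAIN) {
		/* out of MRs: keep the ioctx and retry it later */
		list_add_tail(&ioctx->wait_entry, &ch->cmd_wait_list);
		return;
	}
	/* other errors release the ioctx as the current code does */
}

static void example_recv_done(struct example_ch *ch,
			      struct example_ioctx *new_ioctx)
{
	/* don't let new commands starve the ones already parked */
	if (!list_empty(&ch->cmd_wait_list)) {
		list_add_tail(&new_ioctx->wait_entry, &ch->cmd_wait_list);
		return;
	}
	example_queue_or_exec(ch, new_ioctx);
}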



-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 12/12] IB/isert: convert to the generic RDMA READ/WRITE API
       [not found]   ` <1460410360-13104-13-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-28 21:04     ` Doug Ledford
  2016-04-29 11:46       ` Sagi Grimberg
  0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2016-04-28 21:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 846 bytes --]

On 04/11/2016 05:32 PM, Christoph Hellwig wrote:
> Replace the homegrown RDMA READ/WRITE code in isert with the generic API,
> which also adds iWarp support to the I/O path as a side effect.  Note
> that full iWarp operation will need a few additional patches from Steve.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> ---
>  drivers/infiniband/ulp/isert/ib_isert.c | 841 ++++----------------------------
>  drivers/infiniband/ulp/isert/ib_isert.h |  69 +--
>  2 files changed, 85 insertions(+), 825 deletions(-)

Hi Sagi,

I've seen your reviews on the smaller patches in this series, but this
one in particular has your name all over it.  If you could review it, I
would appreciate it ;-)
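
The consumer pattern the conversion moves to is roughly the following;
the rdma_rw_ctx_* calls are the ones this series adds, but the exact
arguments shown here are an approximation rather than a copy of the
patch:

#include <rdma/ib_verbs.h>
#include <rdma/rw.h>

struct example_io {
	struct rdma_rw_ctx	rw;	/* must live until the I/O completes */
	struct ib_cqe		cqe;	/* cqe.done is the completion handler */
	struct scatterlist	*sg;
	u32			sg_cnt;
	enum dma_data_direction	dir;
};

/* Sketch only: map the local SG list, build the WR chain, and post it. */
static int example_start_rdma(struct ib_qp *qp, u8 port_num,
			      struct example_io *io,
			      u64 remote_addr, u32 rkey)
{
	int ret;

	ret = rdma_rw_ctx_init(&io->rw, qp, port_num, io->sg, io->sg_cnt, 0,
			       remote_addr, rkey, io->dir);
	if (ret < 0)
		return ret;

	ret = rdma_rw_ctx_post(&io->rw, qp, port_num, &io->cqe, NULL);
	if (ret)
		rdma_rw_ctx_destroy(&io->rw, qp, port_num, io->sg,
				    io->sg_cnt, io->dir);
	return ret;
}

/* Sketch only: tear the context down again from the completion path. */
static void example_rdma_done(struct ib_qp *qp, u8 port_num,
			      struct example_io *io)
{
	rdma_rw_ctx_destroy(&io->rw, qp, port_num, io->sg, io->sg_cnt,
			    io->dir);
}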

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
       [not found]         ` <20160419183830.GB1211-jcswGhMUV9g@public.gmane.org>
@ 2016-04-28 21:05           ` Doug Ledford
  0 siblings, 0 replies; 66+ messages in thread
From: Doug Ledford @ 2016-04-28 21:05 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: Ira Weiny, bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 1119 bytes --]

On 04/19/2016 02:38 PM, Christoph Hellwig wrote:
> On Tue, Apr 19, 2016 at 11:30:32AM -0600, Jason Gunthorpe wrote:
>> Right, this was done because rdma_protocol_iwarp takes a port number.
>>
>> I think we discussed this once, the core code doesn't actually support
>> different protocols on different ports, so the port_num argument to
>> rdma_protocol_iwarp is redundant.
>>
>> This all starts to look really goofy when multi-port APM is used and
>> the QP's port number changes dynamically at runtime. (I have some
>> experimental patches that do that), I'd rather see all the port_num
>> stuff in this series go away. :(
> 
> Really, I would _love_ to kill all that port_num crap.  But until
> we get agreement from Doug and all the core maintainers that we
> can kill it from the core, and that multi-protocol devices are
> indeed as silly as they seem, I can't.
> 

No worries, the patchset (and the subsequent series you promised a few
emails later in this thread) can proceed ;-)

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-28 20:07                 ` Jason Gunthorpe
@ 2016-04-28 21:53                   ` Hefty, Sean
  2016-04-28 22:09                     ` Jason Gunthorpe
  0 siblings, 1 reply; 66+ messages in thread
From: Hefty, Sean @ 2016-04-28 21:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Weiny, Ira, Christoph Hellwig, dledford,
	bart.vanassche, swise, linux-rdma, target-devel

> I see several litmus tests for what kinds of ports can be combined
> into a device (eg the 'protocol'):
> 
> 1) Various cap tests are the same on every port. Particularly the
>    iWarp special behaviours we are talking about here.
> 2) AHs are not port-specific, so the AH addressing format must be
>    defined by the device. Thus IB and iWarp cannot be combined.
> 3) Verbs APM must work across ports. So eg rocee and IB cannot be
>    combined since they use a different CM process.
> 
> Multi-port really only exists to support APM, if APM doesn't work then
> drivers don't need to create multi-port devices.

I don't know the details of the qlogic device, but it is entirely possible that it allows different protocols to share resources (PDs, CQs, IP addresses, etc.).  I think we need to be careful dismissing multi-protocol devices as silly, or restricting which protocols can run over which port.  Restricting all ports on a device to support all protocols is different than restricting a device to supporting a single protocol, and it affects more than APM.

- Sean

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-28 21:53                   ` Hefty, Sean
@ 2016-04-28 22:09                     ` Jason Gunthorpe
  2016-04-28 23:23                       ` Hefty, Sean
  2016-04-28 23:25                       ` Weiny, Ira
  0 siblings, 2 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2016-04-28 22:09 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Sagi Grimberg, Weiny, Ira, Christoph Hellwig, dledford,
	bart.vanassche, swise, linux-rdma, target-devel

On Thu, Apr 28, 2016 at 09:53:52PM +0000, Hefty, Sean wrote:
> > I see several litmus tests for what kinds of ports can be combined
> > into a device (eg the 'protocol'):
> > 
> > 1) Various cap tests are the same on every port. Particularly the
> >    iWarp special behaviours we are talking about here.
> > 2) AHs are not port-specific, so the AH addressing format must be
> >    defined by the device. Thus IB and iWarp cannot be combined.
> > 3) Verbs APM must work across ports. So eg rocee and IB cannot be
> >    combined since they use a different CM process.
> > 
> > Multi-port really only exists to support APM, if APM doesn't work then
> > drivers don't need to create multi-port devices.
> 
> I don't know the details of the qlogic device, but it is entirely
> possible that it allows different protocols to share resources (PDs,
> CQs, IP addresses, etc.).  I think we need to be careful dismissing
> multi-protocol devices as silly, or restricting which protocols can
> run over which port.

This isn't dismissing them as silly, it is a pragmatic need in the
core code that everything associated with a PD have a minimum standard
of uniformity - and it is very clear that includes things like the
iwarp special cases and the particular format of the AHs.

For instance, even if a hardware device can run rocee and iwarp
concurrently over a single port, today we absolutely must have
different struct ib_devices for the same physical port to be able to
plug that into the core stack.

Fundamentally we have the wrong model for such hardware. When a PD is
created it should set the 'protocol' and select the compatible member
ports that belong to the PD. Cap tests and so forth should be done
against the PD, not a port or a device.

Fixing that is major surgery, and having cap tests to the port is not
helping clarify the current situation.

> Restricting all ports on a device to support all protocols is
> different than restricting a device to supporting a single protocol,
> and it affects more than APM.

What else is there that is cross port in verbs?

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-28 22:09                     ` Jason Gunthorpe
@ 2016-04-28 23:23                       ` Hefty, Sean
  2016-04-28 23:49                         ` Jason Gunthorpe
  2016-04-28 23:25                       ` Weiny, Ira
  1 sibling, 1 reply; 66+ messages in thread
From: Hefty, Sean @ 2016-04-28 23:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Weiny, Ira, Christoph Hellwig, dledford,
	bart.vanassche, swise, linux-rdma, target-devel

> This isn't dismissing them as silly, it is a pragmatic need in the
> core code that everything associated with a PD have a minimum standard
> of uniformity - and it is very clear that includes things like the
> iwarp special cases and the particular format of the AHs.

I was referring to Christoph's comment "that multi-protocol devices are indeed as silly as they seem".  Maybe we're using different meanings for the term 'device'.  I'm referring to the physical hardware.

> For instance, even if a hardware device can run rocee and iwarp
> concurrently over a single port, today we absolutely must have
> different struct ib_devices for the same physical port to be able to
> plug that into the core stack.
> 
> Fundamentally we have the wrong model for such hardware. When a PD is
> created it should set the 'protocol' and select the compatible member
> ports that belong to the PD. Cap tests and so forth should be done
> against the PD, not a port or a device.

I agree that the model is wrong.  But this is the first email I've read (and I skip reading a lot) mentioning the PD.  My concern is that the discussion mentioned removing multi-protocol support completely, rather than improving it.

> Fixing that is major surgery, and having cap tests to the port is not
> helping clarify the current situation.
> 
> > Restricting all ports on a device to support all protocols is
> > different than restricting a device to supporting a single protocol,
> > and it affects more than APM.
> 
> What else is there that is cross port in verbs?

I was referring to the sharing of resources (e.g. CQs, MRs) across different protocols on the same device.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-28 22:09                     ` Jason Gunthorpe
  2016-04-28 23:23                       ` Hefty, Sean
@ 2016-04-28 23:25                       ` Weiny, Ira
       [not found]                         ` <2807E5FD2F6FDA4886F6618EAC48510E22EC858F-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 66+ messages in thread
From: Weiny, Ira @ 2016-04-28 23:25 UTC (permalink / raw)
  To: Jason Gunthorpe, Hefty, Sean
  Cc: Sagi Grimberg, Christoph Hellwig, dledford, bart.vanassche,
	swise, linux-rdma, target-devel

> 
> On Thu, Apr 28, 2016 at 09:53:52PM +0000, Hefty, Sean wrote:
> > > I see several litmus tests for what kinds of ports can be combined
> > > into a device (eg the 'protocol'):
> > >
> > > 1) Various cap tests are the same on every port. Particularly the
> > >    iWarp special behaviours we are talking about here.
> > > 2) AHs are not port-specific, so the AH addressing format must be
> > >    defined by the device. Thus IB and iWarp cannot be combined.
> > > 3) Verbs APM must work across ports. So eg rocee and IB cannot be
> > >    combined since they use a different CM process.
> > >
> > > Multi-port really only exists to support APM, if APM doesn't work
> > > then drivers don't need to create multi-port devices.
> >
> > I don't know the details of the qlogic device, but it is entirely
> > possible that it allows different protocols to share resources (PDs,
> > CQs, IP addresses, etc.).  I think we need to be careful dismissing
> > multi-protocol devices as silly, or restricting which protocols can
> > run over which port.
> 
> This isn't dismissing them as silly, it is a pragmatic need in the core code that
> everything associated with a PD have a minimum standard of uniformity -
> and it is very clear that includes things like the iwarp special cases and the
> particular format of the AHs.
> 
> For instance, even if a hardware device can run rocee and iwarp concurrently
> over a single port, today we absolutely must have different struct ib_devices
> for the same physical port to be able to plug that into the core stack.
> 
> Fundamentally we have the wrong model for such hardware. When a PD is
> created it should set the 'protocol' and select the compatible member ports
> that belong to the PD. Cap tests and so forth should be done against the PD,
> not a port or a device.
> 
> Fixing that is major surgery, and having cap tests to the port is not helping
> clarify the current situation.
> 
> > Restricting all ports on a device to support all protocols is
> > different than restricting a device to supporting a single protocol,
> > and it affects more than APM.
> 
> What else is there that is cross port in verbs?
> 

Well, the statement said nothing about verbs.  It said:

<quote>
But until we get agreement from Doug and all the core maintainers that we can kill 
> it from the core, and that multi-protocol devices are indeed as silly 
> as they seem
</quote>

What Sean is again pointing out is that there are devices which support multiple protocols even on the same port.  What you say is true for _verbs_ QPs and _verbs_ PDs but not everything is a QP.

Also part of the history is that when these immutable capability flags were added we recognized that some devices would be supporting Ethernet (RoCE) on 1 port and IB on the other.  I do agree with you that this is probably better modeled as 2 devices each with a single port.

Mellanox, how hard will it be to change your drivers to that model?  I'm not even sure how the detection of the link layer works any more.  Back in the day it took a config file and was done when the module loaded, but I thought support for autodetection was in the works.  Is the driver capable of that now?  If so, I see a number of issues here with users changing 1 of 2 IB ports from an IB switch to an Ethernet switch, and autodetection then having to tear down one port and create a separate device with the other.  What happens with APM then?   :-/

But all that is different from the qib/hfi case where we have 1 port with 2 protocols on it.  If we are going to add PSM into the core then I think it is _semantically_ appropriate for users to be able to query for the protocols supported on a port and get back more than 1.

I did not oppose the change Christoph suggested, but that was before we started talking about adding in PSM...

Ira

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
  2016-04-28 23:23                       ` Hefty, Sean
@ 2016-04-28 23:49                         ` Jason Gunthorpe
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2016-04-28 23:49 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Sagi Grimberg, Weiny, Ira, Christoph Hellwig, dledford,
	bart.vanassche, swise, linux-rdma, target-devel

On Thu, Apr 28, 2016 at 11:23:33PM +0000, Hefty, Sean wrote:
> > This isn't dismissing them as silly, it is a pragmatic need in the
> > core code that everything associated with a PD have a minimum standard
> > of uniformity - and it is very clear that includes things like the
> > iwarp special cases and the particular format of the AHs.
> 
> I was referring to Christoph's comment "that multi-protocol devices
> are indeed as silly as they seem".  Maybe we're using different
> meanings for the term 'device'.  I'm referring to the physical
> hardware.

Oh. I think the rest of this thread uses device to refer to a 'struct
ib_device'.

> > Fundamentally we have the wrong model for such hardware. When a PD is
> > created it should set the 'protocol' and select the compatible member
> > ports that belong to the PD. Cap tests and so forth should be done
> > against the PD, not a port or a device.
> 
> I agree that the model is wrong.  But this is the first email I've
> read (and I skip reading a lot) mentioning the PD.

The last time this came up we talked about the right place to apply
the 'cap' tests and the idea of using the PD or QP was briefly
discussed.

> My concern is that the discussion mentioned removing multi-protocol
> support completely, rather than improving it.

No, the topic is to remove the port num from the cap tests.

This is clarifying the current capability of the core code, which is
that a struct ib_device must have certain uniformity across all ports.

The port_num is totally wrong headed and is not the way to support
some future multi-protocol hardware within a single struct
ib_device.

> I was referring to the sharing of resources (e.g. CQs, MRs) across
> different protocols on the same device.

Hum, can those even realistically be shared?  E.g. iWarp and IB MRs
have very different semantics, both in terms of the key and how the
permission model works.

IIRC CQs had some subtle differences too, and of course extracting an
address from a WC is very different.

This is why I bring up the PD as a 'narrower' container for the
required uniformity.
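
For concreteness, the two shapes under discussion look roughly like this
(the per-port helper paraphrases the rdma_cap_* style from this series;
the PD-scoped variant is purely hypothetical - struct ib_pd has no
'protocol' member today):

	/* Per-port capability test, the style the series uses today: */
	static inline bool rdma_cap_read_inv(struct ib_device *dev, u8 port_num)
	{
		return rdma_protocol_iwarp(dev, port_num);
	}

	/*
	 * Hypothetical PD-scoped test: the protocol would be fixed when the
	 * PD is created, so uniformity checks no longer need a port number.
	 */
	static inline bool rdma_pd_cap_read_inv(struct ib_pd *pd)
	{
		return pd->protocol == RDMA_PROTOCOL_IWARP;
	}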

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 02/12] IB/cma: pass the port number to ib_create_qp
       [not found]                         ` <2807E5FD2F6FDA4886F6618EAC48510E22EC858F-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2016-04-29  0:01                           ` Jason Gunthorpe
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Gunthorpe @ 2016-04-29  0:01 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: Hefty, Sean, Sagi Grimberg, Christoph Hellwig,
	dledford-H+wXaHxf7aLQT0dZR+AlfA,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Thu, Apr 28, 2016 at 11:25:35PM +0000, Weiny, Ira wrote:

> Mellanox how hard will it be to change your drivers to that model?
> I'm not even sure how the detection of Link Layer works any more.

Hmm, I vaguely remember looking into this and thinking the mlx drivers
already did this?

It may even be that IB and rocee can do APM and could perhaps be part
of the same struct ib_device without breaking the world. I actually
have no idea.

> But all that is different from the qib/hfi case where we have 1 port
> with 2 protocols on it.  If we are going to add PSM into the core
> then I think it is _semantically_ appropriate for users to be able
> to query for the protocols supported on a port and get back more
> than 1.

That doesn't make sense, the issue here is that we have a variety of
verbs 'flavours'. PSM is not a verbs flavour.

*If* PSM gets a kAPI (nobody is talking about doing this?) *and* gains
multiple incompatible flavours (such as OPA and IB?) then it will need
a unique set of cap tests and restrictions on which physical ports can
be used together.

Jason

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/12] IB/srpt: convert to the generic RDMA READ/WRITE API
       [not found]         ` <57227A6D.4000802-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-04-29  6:34           ` Christoph Hellwig
       [not found]             ` <20160429063443.GA18893-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-29  6:34 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Christoph Hellwig, Bart Van Assche,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Thu, Apr 28, 2016 at 05:02:37PM -0400, Doug Ledford wrote:
> I see you responded to Bart's comment above, but in the same email he
> had a second comment on this patch (that the logic was incorrect in part
> of it), and I've not seen a response to that.  Here's the comment I'm
> referring to:

That one has also been addressed in the latest repost.  Bart has reported
another issue with the current version, which I plan to look into by
the weekend or early next week.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 12/12] IB/isert: convert to the generic RDMA READ/WRITE API
  2016-04-28 21:04     ` Doug Ledford
@ 2016-04-29 11:46       ` Sagi Grimberg
       [not found]         ` <572349AA.2070407-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
  0 siblings, 1 reply; 66+ messages in thread
From: Sagi Grimberg @ 2016-04-29 11:46 UTC (permalink / raw)
  To: Doug Ledford, Christoph Hellwig
  Cc: bart.vanassche, swise, linux-rdma, target-devel


>> Replace the homegrown RDMA READ/WRITE code in isert with the generic API,
>> which also adds iWarp support to the I/O path as a side effect.  Note
>> that full iWarp operation will need a few additional patches from Steve.
>>
>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>> ---
>>   drivers/infiniband/ulp/isert/ib_isert.c | 841 ++++----------------------------
>>   drivers/infiniband/ulp/isert/ib_isert.h |  69 +--
>>   2 files changed, 85 insertions(+), 825 deletions(-)
>
> Hi Sagi,
>
> I've seen your reviews on the smaller patches in this series, but this
> one in particular has your name all over it.  If you could review it, I
> would appreciate it ;-)

~800 LOC deleted, what's not to love ? :)

The patch looks fine to me,
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

I did some light testing with rxe and it works fine (no signature stuff
though). I've asked Mellanox folks to get this set into their regression
systems and still hoping to get their tested-by tag, but if we don't
hear from them by the merge window I don't think we should block it.
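
For readers new to the series, here is a heavily simplified target-side
sketch of what the conversions boil down to.  The rdma_rw_ctx_* signatures
are recalled from the patches and may differ in detail; error and
completion handling is elided, and in a real driver the context is
embedded in the per-I/O structure rather than living on the stack:

	#include <rdma/rw.h>

	/* Illustrative only: pull payload data from the initiator into sg. */
	static int example_rdma_read(struct ib_qp *qp, u8 port_num,
			struct scatterlist *sg, u32 sg_cnt,
			u64 remote_addr, u32 rkey, struct ib_cqe *done_cqe)
	{
		struct rdma_rw_ctx ctx;	/* normally part of the per-I/O context */
		int ret;

		/* Map the SG list; MRs are set up internally on iWarp devices. */
		ret = rdma_rw_ctx_init(&ctx, qp, port_num, sg, sg_cnt, 0,
				remote_addr, rkey, DMA_FROM_DEVICE);
		if (ret < 0)
			return ret;

		/* Post the READ chain, including any registration WRs. */
		ret = rdma_rw_ctx_post(&ctx, qp, port_num, done_cqe, NULL);

		/* Once done_cqe->done() fires, the caller undoes the mapping:
		 * rdma_rw_ctx_destroy(&ctx, qp, port_num, sg, sg_cnt,
		 *		       DMA_FROM_DEVICE);
		 */
		return ret;
	}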

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 10/12] IB/srpt: convert to the generic RDMA READ/WRITE API
       [not found]             ` <20160429063443.GA18893-jcswGhMUV9g@public.gmane.org>
@ 2016-04-29 14:44               ` Doug Ledford
  0 siblings, 0 replies; 66+ messages in thread
From: Doug Ledford @ 2016-04-29 14:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bart Van Assche, swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA


On 04/29/2016 02:34 AM, Christoph Hellwig wrote:
> On Thu, Apr 28, 2016 at 05:02:37PM -0400, Doug Ledford wrote:
>> I see you responded to Bart's comment above, but in the same email he
>> had a second comment on this patch (that the logic was incorrect in part
>> of it), and I've not seen a response to that.  Here's the comment I'm
>> referring to:
> 
> That one has also been addressed in the latest repost.  Bart has reported
> another issue with the current version, which I plan to look into by
> the weekend or early next week.
> 

OK, thanks.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 12/12] IB/isert: convert to the generic RDMA READ/WRITE API
       [not found]         ` <572349AA.2070407-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
@ 2016-04-29 14:45           ` Doug Ledford
       [not found]             ` <e7959da7-79ca-0422-fbc9-9b3814516e1b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 66+ messages in thread
From: Doug Ledford @ 2016-04-29 14:45 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA


On 04/29/2016 07:46 AM, Sagi Grimberg wrote:
> 
>>> Replace the homegrown RDMA READ/WRITE code in isert with the generic
>>> API,
>>> which also adds iWarp support to the I/O path as a side effect.  Note
>>> that full iWarp operation will need a few additional patches from Steve.
>>>
>>> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
>>> ---
>>>   drivers/infiniband/ulp/isert/ib_isert.c | 841
>>> ++++----------------------------
>>>   drivers/infiniband/ulp/isert/ib_isert.h |  69 +--
>>>   2 files changed, 85 insertions(+), 825 deletions(-)
>>
>> Hi Sagi,
>>
>> I've seen your reviews on the smaller patches in this series, but this
>> one in particular has your name all over it.  If you could review it, I
>> would appreciate it ;-)
> 
> ~800 LOC deleted, what's not to love ? :)

Well, if it neutered support for something in the process, it wouldn't
be to love ;-)

> The patch looks fine to me,
> Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> 
> I did some light testing with rxe and it works fine (no signature stuff
> though). I've asked Mellanox folks to get this set into their regression
> systems and still hoping to get their tested-by tag, but if we don't
> hear from them by the merge window I don't think we should block it.

Good to hear, thanks!

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 12/12] IB/isert: convert to the generic RDMA READ/WRITE API
       [not found]             ` <e7959da7-79ca-0422-fbc9-9b3814516e1b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2016-04-29 16:42               ` Leon Romanovsky
  0 siblings, 0 replies; 66+ messages in thread
From: Leon Romanovsky @ 2016-04-29 16:42 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Sagi Grimberg, Christoph Hellwig,
	bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA


On Fri, Apr 29, 2016 at 10:45:34AM -0400, Doug Ledford wrote:
> On 04/29/2016 07:46 AM, Sagi Grimberg wrote:
> > 
> >>> Replace the homegrown RDMA READ/WRITE code in isert with the generic
> >>> API,
> >>> which also adds iWarp support to the I/O path as a side effect.  Note
> >>> that full iWarp operation will need a few additional patches from Steve.
> >>>
> >>> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> >>> ---
> >>>   drivers/infiniband/ulp/isert/ib_isert.c | 841
> >>> ++++----------------------------
> >>>   drivers/infiniband/ulp/isert/ib_isert.h |  69 +--
> >>>   2 files changed, 85 insertions(+), 825 deletions(-)
> >>
> >> Hi Sagi,
> >>
> >> I've seen your reviews on the smaller patches in this series, but this
> >> one in particular has your name all over it.  If you could review it, I
> >> would appreciate it ;-)
> > 
> > ~800 LOC deleted, what's not to love ? :)
> 
> Well, if it neutered support for something in the process, it wouldn't
> be to love ;-)
> 
> > The patch looks fine to me,
> > Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
> > 
> > I did some light testing with rxe and it works fine (no signature stuff
> > though). I've asked Mellanox folks to get this set into their regression
> > systems and still hoping to get their tested-by tag, but if we don't
> > hear from them by the merge window I don't think we should block it.
> 
> Good to hear, thanks!

We are on Passover vacation and most of the people aren't at work,
so I don't know when the testing is planned.

> 
> -- 
> Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>               GPG KeyID: 0E572FDD
> 
> 




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: generic RDMA READ/WRITE API V6
       [not found]   ` <571AA5C8.4080502-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2016-05-02 15:15     ` Christoph Hellwig
       [not found]       ` <20160502151535.GA520-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-05-02 15:15 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Fri, Apr 22, 2016 at 03:29:28PM -0700, Bart Van Assche wrote:
> On 04/11/2016 02:32 PM, Christoph Hellwig wrote:
>> 	git://git.infradead.org/users/hch/rdma.git rdma-rw-api
>
> Hello Christoph,
>
> Is the version that has been pushed on April 18 the latest and greatest 
> version of this patch series ?

Should be.  I've pushed out a new version, but the only changes are
in response to your small review comments, and a no-op rebase to Doug's
latest tree.

> I'm asking because with that version I see 
> error messages appearing that I hadn't seen with the previous version:
>
> ib_srpt:srpt_qp_event: ib_srpt QP event 16 on cm_id=ffff8801713d5628 
> sess_name=0x0000000000000000e41d2d03000a85b1 state=1
> ib_srpt:srpt_qp_event: ib_srpt 0x0000000000000000e41d2d03000a85b1-522, 
> state live: received Last WQE event.
> ib_srpt RDMA_READ for ioctx 0xffff8804593092a8 failed with status 4
>
> This test was run with the force_mr=Y:
>
> $ cat /etc/modprobe.d/ib_core.conf
> options ib_core force_mr=Y

I haven't been able to reproduce this with my usual xfstests run
on mlx4 hardware.  What did you do to reproduce the issue, and what
hardware were you using?


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: generic RDMA READ/WRITE API V6
       [not found]       ` <20160502151535.GA520-jcswGhMUV9g@public.gmane.org>
@ 2016-05-02 19:08         ` Bart Van Assche
  2016-05-02 22:14           ` Bart Van Assche
       [not found]           ` <5727A5C7.1090009-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 2 replies; 66+ messages in thread
From: Bart Van Assche @ 2016-05-02 19:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On 05/02/2016 08:15 AM, Christoph Hellwig wrote:
> On Fri, Apr 22, 2016 at 03:29:28PM -0700, Bart Van Assche wrote:
>> On 04/11/2016 02:32 PM, Christoph Hellwig wrote:
>>> 	git://git.infradead.org/users/hch/rdma.git rdma-rw-api
>>
>> Hello Christoph,
>>
>> Is the version that has been pushed on April 18 the latest and greatest
>> version of this patch series ?
>
> Should be.  I've pushed out a new version, but the only changes are
> in response to your small review comments, and a no-op rebase to Doug's
> latest tree.
>
>> I'm asking because with that version I see
>> error messages appearing that I hadn't seen with the previous version:
>>
>> ib_srpt:srpt_qp_event: ib_srpt QP event 16 on cm_id=ffff8801713d5628
>> sess_name=0x0000000000000000e41d2d03000a85b1 state=1
>> ib_srpt:srpt_qp_event: ib_srpt 0x0000000000000000e41d2d03000a85b1-522,
>> state live: received Last WQE event.
>> ib_srpt RDMA_READ for ioctx 0xffff8804593092a8 failed with status 4
>>
>> This test was run with the force_mr=Y:
>>
>> $ cat /etc/modprobe.d/ib_core.conf
>> options ib_core force_mr=Y
>
> I haven't been able to reproduce this with my usual xfstests run
> on mlx4 hardware.  What did you do to reproduce the issue, and what
> hardware were you using?

After having disabled CONFIG_SLUB_DEBUG_ON I don't see the "QP event" 
message anymore. But running xfstests triggered the following (mlx4 
hardware; SRP initiator and LIO target running on the same server and 
communicating over loopback):

WARNING: CPU: 11 PID: 9224 at drivers/infiniband/ulp/srpt/ib_srpt.c:1209 
srpt_rdma_read_done+0xc7/0x110 [ib_srpt]
Call Trace:
  [<ffffffff812c0bf5>] dump_stack+0x67/0x92
  [<ffffffff81058011>] __warn+0xc1/0xe0
  [<ffffffff810580e8>] warn_slowpath_null+0x18/0x20
  [<ffffffffa05db7c7>] srpt_rdma_read_done+0xc7/0x110 [ib_srpt]
  [<ffffffffa045c73b>] __ib_process_cq+0x4b/0xd0 [ib_core]
  [<ffffffffa045c82b>] ib_cq_poll_work+0x1b/0x60 [ib_core]
  [<ffffffff81071fea>] process_one_work+0x19a/0x490
  [<ffffffff81071f8a>] ? process_one_work+0x13a/0x490
  [<ffffffff81072329>] worker_thread+0x49/0x490
  [<ffffffff810722e0>] ? process_one_work+0x490/0x490
  [<ffffffff810788da>] kthread+0xea/0x100
  [<ffffffff8159e632>] ret_from_fork+0x22/0x40

Bart.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: generic RDMA READ/WRITE API V6
  2016-05-02 19:08         ` Bart Van Assche
@ 2016-05-02 22:14           ` Bart Van Assche
  2016-05-03  8:40             ` Christoph Hellwig
       [not found]           ` <5727A5C7.1090009-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  1 sibling, 1 reply; 66+ messages in thread
From: Bart Van Assche @ 2016-05-02 22:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: dledford, swise, sagi, linux-rdma, target-devel

On 05/02/2016 12:08 PM, Bart Van Assche wrote:
> On 05/02/2016 08:15 AM, Christoph Hellwig wrote:
>> On Fri, Apr 22, 2016 at 03:29:28PM -0700, Bart Van Assche wrote:
>>> On 04/11/2016 02:32 PM, Christoph Hellwig wrote:
>>>> 	git://git.infradead.org/users/hch/rdma.git rdma-rw-api
>>>
>>> Hello Christoph,
>>>
>>> Is the version that has been pushed on April 18 the latest and greatest
>>> version of this patch series ?
>>
>> Should be.  I've pushed out a new version, but the only changes are
>> in response to your small review comments, and a no-op rebase to Doug's
>> latest tree.
>>
>>> I'm asking because with that version I see
>>> error messages appearing that I hadn't seen with the previous version:
>>>
>>> ib_srpt:srpt_qp_event: ib_srpt QP event 16 on cm_id=ffff8801713d5628
>>> sess_name=0x0000000000000000e41d2d03000a85b1 state=1
>>> ib_srpt:srpt_qp_event: ib_srpt 0x0000000000000000e41d2d03000a85b1-522,
>>> state live: received Last WQE event.
>>> ib_srpt RDMA_READ for ioctx 0xffff8804593092a8 failed with status 4
>>>
>>> This test was run with the force_mr=Y:
>>>
>>> $ cat /etc/modprobe.d/ib_core.conf
>>> options ib_core force_mr=Y
>>
>> I haven't been able to reproduce this with my usual xfstests run
>> on mlx4 hardware.  What did you do to reproduce the issue, and what
>> hardware were you using?
>
> After having disabled CONFIG_SLUB_DEBUG_ON I don't see the "QP event"
> message anymore. But running xfstests triggered the following (mlx4
> hardware; SRP initiator and LIO target running on the same server and
> communicating over loopback):
>
> WARNING: CPU: 11 PID: 9224 at drivers/infiniband/ulp/srpt/ib_srpt.c:1209
> srpt_rdma_read_done+0xc7/0x110 [ib_srpt]
> Call Trace:
>    [<ffffffff812c0bf5>] dump_stack+0x67/0x92
>    [<ffffffff81058011>] __warn+0xc1/0xe0
>    [<ffffffff810580e8>] warn_slowpath_null+0x18/0x20
>    [<ffffffffa05db7c7>] srpt_rdma_read_done+0xc7/0x110 [ib_srpt]
>    [<ffffffffa045c73b>] __ib_process_cq+0x4b/0xd0 [ib_core]
>    [<ffffffffa045c82b>] ib_cq_poll_work+0x1b/0x60 [ib_core]
>    [<ffffffff81071fea>] process_one_work+0x19a/0x490
>    [<ffffffff81071f8a>] ? process_one_work+0x13a/0x490
>    [<ffffffff81072329>] worker_thread+0x49/0x490
>    [<ffffffff810722e0>] ? process_one_work+0x490/0x490
>    [<ffffffff810788da>] kthread+0xea/0x100
>    [<ffffffff8159e632>] ret_from_fork+0x22/0x40

(replying to my own e-mail)

I just noticed that ib_comp_wq is created as follows:

	ib_comp_wq = alloc_workqueue("ib-comp-wq",
			WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM,
			WQ_UNBOUND_MAX_ACTIVE);

I think this breaks the locking guarantees for completion handlers. A 
quote from Documentation/infiniband/core_locking.txt: "The driver must 
guarantee that only one CQ event handler for a given CQ is running at a 
time." The ib_srpt driver assumes that completion handler invocations 
are serialized such that no locking is needed to access wait_list from 
inside a completion handler.

Bart.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: generic RDMA READ/WRITE API V6
  2016-05-02 22:14           ` Bart Van Assche
@ 2016-05-03  8:40             ` Christoph Hellwig
  2016-05-03 16:10               ` Bart Van Assche
  0 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-05-03  8:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, dledford, swise, sagi, linux-rdma, target-devel

On Mon, May 02, 2016 at 03:14:34PM -0700, Bart Van Assche wrote:
> I just noticed that ib_comp_wq is created as follows:
>
> 	ib_comp_wq = alloc_workqueue("ib-comp-wq",
> 			WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM,
> 			WQ_UNBOUND_MAX_ACTIVE);
>
> I think this breaks the locking guarantees for completion handlers. A quote 
> from Documentation/infiniband/core_locking.txt: "The driver must guarantee 
> that only one CQ event handler for a given CQ is running at a time." The 
> ib_srpt driver assumes that completion handler invocations are serialized 
> such that no locking is needed to access wait_list from inside a completion 
> handler.

This should still be the case - the max_active argument to alloc_workqueue
just specifies the number of work_structs that may be executed on the
workqueue concurrently, but each individual work_struct can only be
executed once at a time.  See the following paragraph in
Documentation/workqueue.txt:

"Note that the flag WQ_NON_REENTRANT no longer exists as all workqueues
 are now non-reentrant - any work item is guaranteed to be executed by
 at most one worker system-wide at any given time."
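
In other words, the serialization comes from the fact that each CQ owns
exactly one work_struct.  A simplified paraphrase of the pattern follows
(the batch size and notify flags are from memory and may differ from the
real code in drivers/infiniband/core/cq.c):

	static void example_cq_poll_work(struct work_struct *work)
	{
		struct ib_cq *cq = container_of(work, struct ib_cq, work);

		/*
		 * Poll a bounded batch of completions.  Because this is the
		 * only work item for this CQ, and any single work item runs
		 * on at most one worker system-wide, completion handlers for
		 * a given CQ never run concurrently, no matter how large
		 * max_active is on the unbound ib-comp-wq.
		 */
		if (__ib_process_cq(cq, 16) >= 16 ||
		    ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
					 IB_CQ_REPORT_MISSED_EVENTS) > 0)
			queue_work(ib_comp_wq, &cq->work);
	}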

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: generic RDMA READ/WRITE API V6
       [not found]           ` <5727A5C7.1090009-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2016-05-03 14:31             ` Christoph Hellwig
       [not found]               ` <20160503143104.GA30342-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 66+ messages in thread
From: Christoph Hellwig @ 2016-05-03 14:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, dledford-H+wXaHxf7aLQT0dZR+AlfA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On Mon, May 02, 2016 at 12:08:55PM -0700, Bart Van Assche wrote:
> After having disabled CONFIG_SLUB_DEBUG_ON I don't see the "QP event" 
> message anymore.

This brings up memories: we've seen odd, unexplainable issues with SLUB debug
and MRs during NVMe over Fabrics development.

> But running xfstests triggered the following (mlx4 
> hardware; SRP initiator and LIO target running on the same server and 
> communicating over loopback):

I can reproduce this, thanks.  The issue was that my implementation
of keeping the MRs around when getting an -EAGAIN for setting up new
RDMA R/W contexts wasn't correct.  To make it properly work we'd need
a pointer to the send ioctx from the recv ioctx.  I don't feel safe
making this change at this point, and given that force_mr is only
a debug option for the SRP target until RDMA/CM support goes in, I
think we should be fine without it.  I'll resend the series once I
get feedback from the buildbot, and I'd be happy if you could review
it quickly.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: generic RDMA READ/WRITE API V6
  2016-05-03  8:40             ` Christoph Hellwig
@ 2016-05-03 16:10               ` Bart Van Assche
  0 siblings, 0 replies; 66+ messages in thread
From: Bart Van Assche @ 2016-05-03 16:10 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: dledford, swise, sagi, linux-rdma, target-devel

On 05/03/2016 01:40 AM, Christoph Hellwig wrote:
> This should still be the case - the max_active argument to alloc_workqueue
> just specified the amount of work_structs that may be executed on the
> workqueue concurrently, but each individual work_struct can only be
> executed once at a time.  See the following paragraph in
> Documentation/workqueue.txt:
>
> "Note that the flag WQ_NON_REENTRANT no longer exists as all workqueues
>   are now non-reentrant - any work item is guaranteed to be executed by
>   at most one worker system-wide at any given time."

Thanks for the feedback. I had overlooked the 
find_worker_executing_work() call in __queue_work() in 
kernel/workqueue.c when I reviewed that code yesterday.

Bart.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: generic RDMA READ/WRITE API V6
       [not found]               ` <20160503143104.GA30342-jcswGhMUV9g@public.gmane.org>
@ 2016-05-03 21:23                 ` Bart Van Assche
  0 siblings, 0 replies; 66+ messages in thread
From: Bart Van Assche @ 2016-05-03 21:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA

On 05/03/2016 07:31 AM, Christoph Hellwig wrote:
>> But running xfstests triggered the following (mlx4
>> hardware; SRP initiator and LIO target running on the same server and
>> communicating over loopback):
>
> I can reproduce this, thanks.  The issue was that my implementation
> of keeping the MRs around when getting an -EAGAIN for setting up new
> RDMA R/W contexts wasn't correct.  To make it properly work we'd need
> a pointer to the send ioctx from the recv ioctx.  I don't feel safe
> to make this change at this point, and given that force_mr is only
> a debug option for the SRP target until RDMA/CM support goes in I
> think we should be fine without it.  I'll resend the series once I
> get feedback from the buildbot, and I'd be happy if you could review
> it quickly.

Thanks for the analysis. I'm fine with this patch series going upstream.

Bart.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 07/12] IB/core: add a need_inval flag to struct ib_mr
       [not found] ` <1461010463-6603-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
@ 2016-04-18 20:14   ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2016-04-18 20:14 UTC (permalink / raw)
  To: dledford-H+wXaHxf7aLQT0dZR+AlfA
  Cc: bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	sagi-NQWnxTmZq1alnMjI0IkVqw, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	target-devel-u79uwXL29TY76Z2rM5mHXA, Steve Wise

From: Steve Wise <swise-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>

This is the first step toward moving MR invalidation decisions
to the core.  It will be needed by the upcoming RW API.

Signed-off-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Reviewed-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
---
 drivers/infiniband/core/verbs.c | 2 ++
 include/rdma/ib_verbs.h         | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index d9ea2fb..179d800 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1353,6 +1353,7 @@ struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags)
 		mr->pd      = pd;
 		mr->uobject = NULL;
 		atomic_inc(&pd->usecnt);
+		mr->need_inval = false;
 	}
 
 	return mr;
@@ -1399,6 +1400,7 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
 		mr->pd      = pd;
 		mr->uobject = NULL;
 		atomic_inc(&pd->usecnt);
+		mr->need_inval = false;
 	}
 
 	return mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 400a8a0..3f66647 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1446,6 +1446,7 @@ struct ib_mr {
 	u64		   iova;
 	u32		   length;
 	unsigned int	   page_size;
+	bool		   need_inval;
 	union {
 		struct ib_uobject	*uobject;	/* user */
 		struct list_head	qp_entry;	/* FR */
-- 
2.1.4
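
Not part of the patch, but to illustrate the intended use of the flag, a
hedged consumer-side sketch; the example_* function names are made up for
illustration and completion plumbing is elided:

	/* The remote peer sent a Send with Invalidate that hit our rkey. */
	static void example_recv_done(struct ib_mr *mr, struct ib_wc *wc)
	{
		if ((wc->wc_flags & IB_WC_WITH_INVALIDATE) &&
		    wc->ex.invalidate_rkey == mr->rkey)
			mr->need_inval = false;
	}

	/* Only post a LOCAL_INV if nobody has invalidated the rkey yet. */
	static int example_invalidate_if_needed(struct ib_qp *qp, struct ib_mr *mr)
	{
		struct ib_send_wr inv_wr = { }, *bad_wr;

		if (!mr->need_inval)
			return 0;

		inv_wr.opcode = IB_WR_LOCAL_INV;
		inv_wr.ex.invalidate_rkey = mr->rkey;
		return ib_post_send(qp, &inv_wr, &bad_wr);
	}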


^ permalink raw reply related	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2016-05-03 21:23 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-11 21:32 generic RDMA READ/WRITE API V6 Christoph Hellwig
2016-04-11 21:32 ` [PATCH 02/12] IB/cma: pass the port number to ib_create_qp Christoph Hellwig
     [not found]   ` <1460410360-13104-3-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-15 17:55     ` Sagi Grimberg
2016-04-19  3:14   ` Ira Weiny
2016-04-19 17:30     ` Jason Gunthorpe
2016-04-19 18:38       ` Christoph Hellwig
     [not found]         ` <20160419183830.GB1211-jcswGhMUV9g@public.gmane.org>
2016-04-28 21:05           ` Doug Ledford
     [not found]       ` <20160419173032.GD20844-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-04-19 18:49         ` Sagi Grimberg
2016-04-19 19:24           ` Jason Gunthorpe
     [not found]             ` <20160419192430.GB27028-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-04-19 19:41               ` Steve Wise
2016-04-19 20:05                 ` 'Christoph Hellwig'
2016-04-19 20:21                   ` Jason Gunthorpe
     [not found]                   ` <20160419200555.GA2561-jcswGhMUV9g@public.gmane.org>
2016-04-19 20:26                     ` Steve Wise
2016-04-21  3:11                       ` ira.weiny
2016-04-28 19:43               ` Hefty, Sean
2016-04-28 20:07                 ` Jason Gunthorpe
2016-04-28 21:53                   ` Hefty, Sean
2016-04-28 22:09                     ` Jason Gunthorpe
2016-04-28 23:23                       ` Hefty, Sean
2016-04-28 23:49                         ` Jason Gunthorpe
2016-04-28 23:25                       ` Weiny, Ira
     [not found]                         ` <2807E5FD2F6FDA4886F6618EAC48510E22EC858F-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2016-04-29  0:01                           ` Jason Gunthorpe
2016-04-11 21:32 ` [PATCH 03/12] IB/core: allow passing mapping an offset into the SG in ib_map_mr_sg Christoph Hellwig
     [not found]   ` <1460410360-13104-4-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-15 17:56     ` Sagi Grimberg
2016-04-11 21:32 ` [PATCH 04/12] IB/core: add a helper to check for READ WITH INVALIDATE support Christoph Hellwig
     [not found]   ` <1460410360-13104-5-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-15 17:56     ` Sagi Grimberg
2016-04-19  3:15   ` Ira Weiny
2016-04-11 21:32 ` [PATCH 05/12] IB/core: refactor ib_create_qp Christoph Hellwig
2016-04-17 20:00   ` Sagi Grimberg
     [not found]   ` <1460410360-13104-6-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-19  3:08     ` Ira Weiny
2016-04-11 21:32 ` [PATCH 07/12] IB/core: add a need_inval flag to struct ib_mr Christoph Hellwig
     [not found]   ` <1460410360-13104-8-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-17 20:01     ` Sagi Grimberg
2016-04-19  3:20   ` Ira Weiny
2016-04-11 21:32 ` [PATCH 08/12] IB/core: generic RDMA READ/WRITE API Christoph Hellwig
     [not found]   ` <1460410360-13104-9-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-12 23:52     ` Bart Van Assche
     [not found]       ` <570D8A42.9040107-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-04-13 13:50         ` Christoph Hellwig
     [not found] ` <1460410360-13104-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-11 21:32   ` [PATCH 01/12] IB/mlx5: Expose correct max_sge_rd limit Christoph Hellwig
     [not found]     ` <1460410360-13104-2-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-17 13:53       ` Leon Romanovsky
     [not found]         ` <20160417135341.GC6349-2ukJVAZIZ/Y@public.gmane.org>
2016-04-17 18:06           ` Christoph Hellwig
2016-04-11 21:32   ` [PATCH 06/12] IB/core: add a simple MR pool Christoph Hellwig
2016-04-17 20:01     ` Sagi Grimberg
2016-04-19  3:19     ` Ira Weiny
2016-04-11 21:32   ` [PATCH 09/12] target: enhance and export target_alloc_sgl/target_free_sgl Christoph Hellwig
2016-04-11 21:32 ` [PATCH 10/12] IB/srpt: convert to the generic RDMA READ/WRITE API Christoph Hellwig
2016-04-13 18:57   ` Bart Van Assche
2016-04-14 13:32     ` Christoph Hellwig
2016-04-28 21:02       ` Doug Ledford
     [not found]         ` <57227A6D.4000802-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-04-29  6:34           ` Christoph Hellwig
     [not found]             ` <20160429063443.GA18893-jcswGhMUV9g@public.gmane.org>
2016-04-29 14:44               ` Doug Ledford
2016-04-11 21:32 ` [PATCH 11/12] IB/core: add RW API support for signature MRs Christoph Hellwig
     [not found]   ` <1460410360-13104-12-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-22 21:53     ` Bart Van Assche
2016-04-11 21:32 ` [PATCH 12/12] IB/isert: convert to the generic RDMA READ/WRITE API Christoph Hellwig
     [not found]   ` <1460410360-13104-13-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-28 21:04     ` Doug Ledford
2016-04-29 11:46       ` Sagi Grimberg
     [not found]         ` <572349AA.2070407-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2016-04-29 14:45           ` Doug Ledford
     [not found]             ` <e7959da7-79ca-0422-fbc9-9b3814516e1b-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2016-04-29 16:42               ` Leon Romanovsky
2016-04-12 18:31 ` generic RDMA READ/WRITE API V6 Steve Wise
2016-04-22 22:29 ` Bart Van Assche
     [not found]   ` <571AA5C8.4080502-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-05-02 15:15     ` Christoph Hellwig
     [not found]       ` <20160502151535.GA520-jcswGhMUV9g@public.gmane.org>
2016-05-02 19:08         ` Bart Van Assche
2016-05-02 22:14           ` Bart Van Assche
2016-05-03  8:40             ` Christoph Hellwig
2016-05-03 16:10               ` Bart Van Assche
     [not found]           ` <5727A5C7.1090009-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2016-05-03 14:31             ` Christoph Hellwig
     [not found]               ` <20160503143104.GA30342-jcswGhMUV9g@public.gmane.org>
2016-05-03 21:23                 ` Bart Van Assche
2016-04-18 20:14 generic RDMA READ/WRITE API V7 Christoph Hellwig
     [not found] ` <1461010463-6603-1-git-send-email-hch-jcswGhMUV9g@public.gmane.org>
2016-04-18 20:14   ` [PATCH 07/12] IB/core: add a need_inval flag to struct ib_mr Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.