All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH WIP 00/43] New fast registration API
@ 2015-07-22  6:55 Sagi Grimberg
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Hi all,

So I went ahead and tried to implement some of the stuff
we've been talking about. I figured I'd send out a WIP version
to try and communicate early where this is heading.

In order to have a sane patchset I followed a scheme that
add-new/port-existing/drop-old...

The set starts with:
- Convert ib_create_mr API to ib_alloc_mr as Christoph suggested (1)
- Add vendor drivers support for ib_alloc_mr (2-7)
- Port ULPs to use ib_alloc_mr (8-12)
- Drop alloc_fast_reg_mr API (core + vendor drivers) (13-20)

Continues with:
- Allocate vendor private page lists (21-27)
- Add a new fast registration API that will replace existing frwr (28)
- Add support for the new API in relevant vendor drivers (29-35)
  * its a bit hacky since just bluntly duplicated the registration routines
    keep in mind that this is transient until we drop the old API...
- Port ULPs to use the new API (iser, isert, xprtrdma for now) (36-38)
  this is on top of Chuck's nfs-rdma-for-4.3 and updated iser/isert code

The set should end with:
- Complete ULPs porting (svcrdma, rds, srp)
- Drop old fast registration API - FRWR (core + vendor drivers)
- Still have the huge-pages bit to work out.

I also added the arbitrary sg list registration support to mlx5 and iser
in a less intrusive API additions (39-43) just to show the concept.

This set was lightly tested on the ported ULPs over mlx5 (didn't have a
chance to test mlx4 yet).

The main reasons for this preview are:
- Help with testing (especially on devices that I don't have access to
  e.g cxgb3, cxgb4, ocrdma, nes, qib). I probably have bugs there
  as I just compile tested so far.
- Help with porting of the rest of the ULPs (rds, srp, svcrdma)
- Early code review

What I've noticed from this effort was that several drivers keep
a shadow mapped page lists for specific device settings. At registration
time, the drivers iterate on the page list and sets the mapped page list
entries with some extra information. I'd expect these drivers not to use
the core function to map SG list to pages and use it's own function which
will allow them to lose their page list duplication. I haven't done that yet.

Comments and review are welcomed (and needed!).

Sorry for the long series, but it's kinda transverse...

The code/patches can be found in:
https://github.com/sagigrimberg/linux/tree/fastreg_api_wip

Sagi Grimberg (43):
  IB: Modify ib_create_mr API
  IB/mlx4: Support ib_alloc_mr verb
  ocrdma: Support ib_alloc_mr verb
  iw_cxgb4: Support ib_alloc_mr verb
  cxgb3: Support ib_alloc_mr verb
  nes: Support ib_alloc_mr verb
  qib: Support ib_alloc_mr verb
  IB/iser: Convert to ib_alloc_mr
  iser-target: Convert to ib_alloc_mr
  IB/srp: Convert to ib_alloc_mr
  xprtrdma, svcrdma: Convert to ib_alloc_mr
  RDS: Convert to ib_alloc_mr
  mlx5: Drop mlx5_ib_alloc_fast_reg_mr
  mlx4: Drop mlx4_ib_alloc_fast_reg_mr
  ocrdma: Drop ocrdma_alloc_frmr
  qib: Drop qib_alloc_fast_reg_mr
  nes: Drop nes_alloc_fast_reg_mr
  cxgb4: Drop c4iw_alloc_fast_reg_mr
  cxgb3: Drop iwch_alloc_fast_reg_mr
  IB/core: Drop ib_alloc_fast_reg_mr
  mlx5: Allocate a private page list in ib_alloc_mr
  mlx4: Allocate a private page list in ib_alloc_mr
  ocrdma: Allocate a private page list in ib_alloc_mr
  cxgb3: Allocate a provate page list in ib_alloc_mr
  cxgb4: Allocate a private page list in ib_alloc_mr
  qib: Allocate a private page list in ib_alloc_mr
  nes: Allocate a private page list in ib_alloc_mr
  IB/core: Introduce new fast registration API
  mlx5: Support the new memory registration API
  mlx4: Support the new memory registration API
  ocrdma: Support the new memory registration API
  cxgb3: Support the new memory registration API
  cxgb4: Support the new memory registration API
  nes: Support the new memory registration API
  qib: Support the new memory registration API
  iser: Port to new fast registration api
  xprtrdma: Port to new memory registration API
  iser-target: Port to new memory registration API
  IB/core: Add arbitrary sg_list support
  mlx5: Allocate private context for arbitrary scatterlist registration
  mlx5: Add arbitrary sg list support
  iser: Accept arbitrary sg lists mapping if the device supports it
  iser: Move unaligned counter increment

 drivers/infiniband/core/verbs.c             | 164 ++++++++++++++++++----
 drivers/infiniband/hw/cxgb3/iwch_provider.c |  35 ++++-
 drivers/infiniband/hw/cxgb3/iwch_provider.h |   2 +
 drivers/infiniband/hw/cxgb3/iwch_qp.c       |  48 +++++++
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h      |  12 +-
 drivers/infiniband/hw/cxgb4/mem.c           |  38 +++++-
 drivers/infiniband/hw/cxgb4/provider.c      |   3 +-
 drivers/infiniband/hw/cxgb4/qp.c            |  75 +++++++++-
 drivers/infiniband/hw/mlx4/main.c           |   3 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h        |  14 +-
 drivers/infiniband/hw/mlx4/mr.c             |  74 +++++++++-
 drivers/infiniband/hw/mlx4/qp.c             |  27 ++++
 drivers/infiniband/hw/mlx5/main.c           |   5 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h        |  20 ++-
 drivers/infiniband/hw/mlx5/mr.c             | 204 +++++++++++++++++++++-------
 drivers/infiniband/hw/mlx5/qp.c             | 107 +++++++++++++++
 drivers/infiniband/hw/nes/nes_verbs.c       | 129 +++++++++++++++++-
 drivers/infiniband/hw/nes/nes_verbs.h       |   5 +
 drivers/infiniband/hw/ocrdma/ocrdma.h       |   2 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |   3 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  88 +++++++++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   8 +-
 drivers/infiniband/hw/qib/qib_keys.c        |  56 ++++++++
 drivers/infiniband/hw/qib/qib_mr.c          |  30 +++-
 drivers/infiniband/hw/qib/qib_verbs.c       |   8 +-
 drivers/infiniband/hw/qib/qib_verbs.h       |  12 +-
 drivers/infiniband/ulp/iser/iscsi_iser.h    |   6 +-
 drivers/infiniband/ulp/iser/iser_memory.c   |  48 +++----
 drivers/infiniband/ulp/iser/iser_verbs.c    |  38 ++----
 drivers/infiniband/ulp/isert/ib_isert.c     | 128 ++++-------------
 drivers/infiniband/ulp/isert/ib_isert.h     |   2 -
 drivers/infiniband/ulp/srp/ib_srp.c         |   3 +-
 include/rdma/ib_verbs.h                     |  88 +++++++-----
 net/rds/iw_rdma.c                           |   5 +-
 net/rds/iw_send.c                           |   5 +-
 net/sunrpc/xprtrdma/frwr_ops.c              |  86 ++++++------
 net/sunrpc/xprtrdma/svc_rdma_transport.c    |   2 +-
 net/sunrpc/xprtrdma/xprt_rdma.h             |   4 +-
 38 files changed, 1223 insertions(+), 364 deletions(-)

-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-22  6:55   ` Sagi Grimberg
       [not found]     ` <1437548143-24893-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22  6:55   ` [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb Sagi Grimberg
                     ` (42 subsequent siblings)
  43 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Use ib_alloc_mr with specific parameters.
Change the existing callers.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c          | 20 ++++++++++++------
 drivers/infiniband/hw/mlx5/main.c        |  2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h     |  6 ++++--
 drivers/infiniband/hw/mlx5/mr.c          | 21 ++++++++++++++-----
 drivers/infiniband/ulp/iser/iser_verbs.c |  4 +---
 drivers/infiniband/ulp/isert/ib_isert.c  |  6 +-----
 include/rdma/ib_verbs.h                  | 36 ++++++++++----------------------
 7 files changed, 48 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8197ce7..23d73bd 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1235,16 +1235,24 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);
 
-struct ib_mr *ib_create_mr(struct ib_pd *pd,
-			   struct ib_mr_init_attr *mr_init_attr)
+/**
+ * ib_alloc_mr() - Allocates a memory region
+ * @pd:            protection domain associated with the region
+ * @mr_type:       memory region type
+ * @max_entries:   maximum registration entries available
+ * @flags:         create flags
+ */
+struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
+			  enum ib_mr_type mr_type,
+			  u32 max_entries,
+			  u32 flags)
 {
 	struct ib_mr *mr;
 
-	if (!pd->device->create_mr)
+	if (!pd->device->alloc_mr)
 		return ERR_PTR(-ENOSYS);
 
-	mr = pd->device->create_mr(pd, mr_init_attr);
-
+	mr = pd->device->alloc_mr(pd, mr_type, max_entries, flags);
 	if (!IS_ERR(mr)) {
 		mr->device  = pd->device;
 		mr->pd      = pd;
@@ -1255,7 +1263,7 @@ struct ib_mr *ib_create_mr(struct ib_pd *pd,
 
 	return mr;
 }
-EXPORT_SYMBOL(ib_create_mr);
+EXPORT_SYMBOL(ib_alloc_mr);
 
 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 48f02da..82a371f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1502,7 +1502,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.attach_mcast	= mlx5_ib_mcg_attach;
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
-	dev->ib_dev.create_mr		= mlx5_ib_create_mr;
+	dev->ib_dev.alloc_mr		= mlx5_ib_alloc_mr;
 	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 29c74e9..cd6fb5d 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -573,8 +573,10 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index,
 		       int npages, int zap);
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
-struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
-				struct ib_mr_init_attr *mr_init_attr);
+struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
+			       enum ib_mr_type mr_type,
+			       u32 max_entries,
+			       u32 flags);
 struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
 					int max_page_list_len);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 3197c00..185c963 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1247,14 +1247,19 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
 	return 0;
 }
 
-struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
-				struct ib_mr_init_attr *mr_init_attr)
+struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
+			       enum ib_mr_type mr_type,
+			       u32 max_entries,
+			       u32 flags)
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct mlx5_create_mkey_mbox_in *in;
 	struct mlx5_ib_mr *mr;
 	int access_mode, err;
-	int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
+	int ndescs = roundup(max_entries, 4);
+
+	if (flags)
+		return ERR_PTR(-EINVAL);
 
 	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
 	if (!mr)
@@ -1270,9 +1275,11 @@ struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
 	in->seg.xlt_oct_size = cpu_to_be32(ndescs);
 	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
 	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
-	access_mode = MLX5_ACCESS_MODE_MTT;
 
-	if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
+	if (mr_type == IB_MR_TYPE_FAST_REG) {
+		access_mode = MLX5_ACCESS_MODE_MTT;
+		in->seg.log2_page_size = PAGE_SHIFT;
+	} else if (mr_type == IB_MR_TYPE_SIGNATURE) {
 		u32 psv_index[2];
 
 		in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
@@ -1298,6 +1305,10 @@ struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
 		mr->sig->sig_err_exists = false;
 		/* Next UMR, Arm SIGERR */
 		++mr->sig->sigerr_count;
+	} else {
+		mlx5_ib_warn(dev, "Invalid mr type %d\n", mr_type);
+		err = -EINVAL;
+		goto err_free_in;
 	}
 
 	in->seg.flags = MLX5_PERM_UMR_EN | access_mode;
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 7a5c49f..6be4d4a 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -326,8 +326,6 @@ iser_alloc_pi_ctx(struct ib_device *ib_device,
 		  unsigned int size)
 {
 	struct iser_pi_context *pi_ctx = NULL;
-	struct ib_mr_init_attr mr_init_attr = {.max_reg_descriptors = 2,
-					       .flags = IB_MR_SIGNATURE_EN};
 	int ret;
 
 	desc->pi_ctx = kzalloc(sizeof(*desc->pi_ctx), GFP_KERNEL);
@@ -342,7 +340,7 @@ iser_alloc_pi_ctx(struct ib_device *ib_device,
 		goto alloc_reg_res_err;
 	}
 
-	pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr);
+	pi_ctx->sig_mr = ib_alloc_mr(pd, IB_MR_TYPE_SIGNATURE, 2, 0);
 	if (IS_ERR(pi_ctx->sig_mr)) {
 		ret = PTR_ERR(pi_ctx->sig_mr);
 		goto sig_mr_failure;
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index e59228d..f0b7c9b 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -508,7 +508,6 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
 		    struct ib_device *device,
 		    struct ib_pd *pd)
 {
-	struct ib_mr_init_attr mr_init_attr;
 	struct pi_context *pi_ctx;
 	int ret;
 
@@ -536,10 +535,7 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
 	}
 	desc->ind |= ISERT_PROT_KEY_VALID;
 
-	memset(&mr_init_attr, 0, sizeof(mr_init_attr));
-	mr_init_attr.max_reg_descriptors = 2;
-	mr_init_attr.flags |= IB_MR_SIGNATURE_EN;
-	pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr);
+	pi_ctx->sig_mr = ib_alloc_mr(pd, IB_MR_TYPE_SIGNATURE, 2, 0);
 	if (IS_ERR(pi_ctx->sig_mr)) {
 		isert_err("Failed to allocate signature enabled mr err=%ld\n",
 			  PTR_ERR(pi_ctx->sig_mr));
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 4468a64..5ec9a70 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -556,20 +556,9 @@ __attribute_const__ int ib_rate_to_mult(enum ib_rate rate);
  */
 __attribute_const__ int ib_rate_to_mbps(enum ib_rate rate);
 
-enum ib_mr_create_flags {
-	IB_MR_SIGNATURE_EN = 1,
-};
-
-/**
- * ib_mr_init_attr - Memory region init attributes passed to routine
- *     ib_create_mr.
- * @max_reg_descriptors: max number of registration descriptors that
- *     may be used with registration work requests.
- * @flags: MR creation flags bit mask.
- */
-struct ib_mr_init_attr {
-	int	    max_reg_descriptors;
-	u32	    flags;
+enum ib_mr_type {
+	IB_MR_TYPE_FAST_REG,
+	IB_MR_TYPE_SIGNATURE,
 };
 
 /**
@@ -1668,8 +1657,10 @@ struct ib_device {
 	int                        (*query_mr)(struct ib_mr *mr,
 					       struct ib_mr_attr *mr_attr);
 	int                        (*dereg_mr)(struct ib_mr *mr);
-	struct ib_mr *		   (*create_mr)(struct ib_pd *pd,
-						struct ib_mr_init_attr *mr_init_attr);
+	struct ib_mr *		   (*alloc_mr)(struct ib_pd *pd,
+					       enum ib_mr_type mr_type,
+					       u32 max_entries,
+					       u32 flags);
 	struct ib_mr *		   (*alloc_fast_reg_mr)(struct ib_pd *pd,
 					       int max_page_list_len);
 	struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
@@ -2806,15 +2797,10 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr);
  */
 int ib_dereg_mr(struct ib_mr *mr);
 
-
-/**
- * ib_create_mr - Allocates a memory region that may be used for
- *     signature handover operations.
- * @pd: The protection domain associated with the region.
- * @mr_init_attr: memory region init attributes.
- */
-struct ib_mr *ib_create_mr(struct ib_pd *pd,
-			   struct ib_mr_init_attr *mr_init_attr);
+struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
+			  enum ib_mr_type mr_type,
+			  u32 max_entries,
+			  u32 flags);
 
 /**
  * ib_alloc_fast_reg_mr - Allocates memory region usable with the
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22  6:55   ` [PATCH WIP 01/43] IB: Modify ib_create_mr API Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
       [not found]     ` <1437548143-24893-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22  6:55   ` [PATCH WIP 03/43] ocrdma: " Sagi Grimberg
                     ` (41 subsequent siblings)
  43 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c    |  1 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  4 ++++
 drivers/infiniband/hw/mlx4/mr.c      | 38 ++++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index a6f44ee..54671c7 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2298,6 +2298,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	ibdev->ib_dev.rereg_user_mr	= mlx4_ib_rereg_user_mr;
 	ibdev->ib_dev.dereg_mr		= mlx4_ib_dereg_mr;
 	ibdev->ib_dev.alloc_fast_reg_mr = mlx4_ib_alloc_fast_reg_mr;
+	ibdev->ib_dev.alloc_mr		= mlx4_ib_alloc_mr;
 	ibdev->ib_dev.alloc_fast_reg_page_list = mlx4_ib_alloc_fast_reg_page_list;
 	ibdev->ib_dev.free_fast_reg_page_list  = mlx4_ib_free_fast_reg_page_list;
 	ibdev->ib_dev.attach_mcast	= mlx4_ib_mcg_attach;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 334387f..c8b5679 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -680,6 +680,10 @@ struct ib_mw *mlx4_ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
 int mlx4_ib_bind_mw(struct ib_qp *qp, struct ib_mw *mw,
 		    struct ib_mw_bind *mw_bind);
 int mlx4_ib_dealloc_mw(struct ib_mw *mw);
+struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
+			       enum ib_mr_type mr_type,
+			       u32 max_entries,
+			       u32 flags);
 struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd,
 					int max_page_list_len);
 struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index e0d2717..3cba374 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -350,6 +350,44 @@ int mlx4_ib_dealloc_mw(struct ib_mw *ibmw)
 	return 0;
 }
 
+struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
+			       enum ib_mr_type mr_type,
+			       u32 max_entries,
+			       u32 flags)
+{
+	struct mlx4_ib_dev *dev = to_mdev(pd->device);
+	struct mlx4_ib_mr *mr;
+	int err;
+
+	if (mr_type != IB_MR_TYPE_FAST_REG || flags)
+		return ERR_PTR(-EINVAL);
+
+	mr = kmalloc(sizeof *mr, GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, 0, 0, 0,
+			    max_entries, 0, &mr->mmr);
+	if (err)
+		goto err_free;
+
+	err = mlx4_mr_enable(dev->dev, &mr->mmr);
+	if (err)
+		goto err_mr;
+
+	mr->ibmr.rkey = mr->ibmr.lkey = mr->mmr.key;
+	mr->umem = NULL;
+
+	return &mr->ibmr;
+
+err_mr:
+	(void) mlx4_mr_free(dev->dev, &mr->mmr);
+
+err_free:
+	kfree(mr);
+	return ERR_PTR(err);
+}
+
 struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd,
 					int max_page_list_len)
 {
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 03/43] ocrdma: Support ib_alloc_mr verb
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22  6:55   ` [PATCH WIP 01/43] IB: Modify ib_create_mr API Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 04/43] iw_cxgb4: " Sagi Grimberg
                     ` (40 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |  1 +
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 47 +++++++++++++++++++++++++++++
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |  4 +++
 3 files changed, 52 insertions(+)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 8a1398b..d7ebe04 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -294,6 +294,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
 	dev->ibdev.dereg_mr = ocrdma_dereg_mr;
 	dev->ibdev.reg_user_mr = ocrdma_reg_user_mr;
 
+	dev->ibdev.alloc_mr = ocrdma_alloc_mr;
 	dev->ibdev.alloc_fast_reg_mr = ocrdma_alloc_frmr;
 	dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list;
 	dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 5bb61eb..3487780 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -2983,6 +2983,53 @@ int ocrdma_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags cq_flags)
 	return 0;
 }
 
+struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd,
+			      enum ib_mr_type mr_type,
+			      u32 max_entries,
+			      u32 flags)
+{
+	int status;
+	struct ocrdma_mr *mr;
+	struct ocrdma_pd *pd = get_ocrdma_pd(ibpd);
+	struct ocrdma_dev *dev = get_ocrdma_dev(ibpd->device);
+
+	if (mr_type != IB_MR_TYPE_FAST_REG || flags)
+		return ERR_PTR(-EINVAL);
+
+	if (max_entries > dev->attr.max_pages_per_frmr)
+		return ERR_PTR(-EINVAL);
+
+	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+	if (!mr)
+		return ERR_PTR(-ENOMEM);
+
+	status = ocrdma_get_pbl_info(dev, mr, max_entries);
+	if (status)
+		goto pbl_err;
+	mr->hwmr.fr_mr = 1;
+	mr->hwmr.remote_rd = 0;
+	mr->hwmr.remote_wr = 0;
+	mr->hwmr.local_rd = 0;
+	mr->hwmr.local_wr = 0;
+	mr->hwmr.mw_bind = 0;
+	status = ocrdma_build_pbl_tbl(dev, &mr->hwmr);
+	if (status)
+		goto pbl_err;
+	status = ocrdma_reg_mr(dev, &mr->hwmr, pd->id, 0);
+	if (status)
+		goto mbx_err;
+	mr->ibmr.rkey = mr->hwmr.lkey;
+	mr->ibmr.lkey = mr->hwmr.lkey;
+	dev->stag_arr[(mr->hwmr.lkey >> 8) & (OCRDMA_MAX_STAG - 1)] =
+		(unsigned long) mr;
+	return &mr->ibmr;
+mbx_err:
+	ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);
+pbl_err:
+	kfree(mr);
+	return ERR_PTR(-ENOMEM);
+}
+
 struct ib_mr *ocrdma_alloc_frmr(struct ib_pd *ibpd, int max_page_list_len)
 {
 	int status;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index b15c608..eebcda2 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -96,6 +96,10 @@ struct ib_mr *ocrdma_reg_kernel_mr(struct ib_pd *,
 				   int num_phys_buf, int acc, u64 *iova_start);
 struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *, u64 start, u64 length,
 				 u64 virt, int acc, struct ib_udata *);
+struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
+			      enum ib_mr_type mr_type,
+			      u32 max_entries,
+			      u32 flags);
 struct ib_mr *ocrdma_alloc_frmr(struct ib_pd *pd, int max_page_list_len);
 struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
 							*ibdev,
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 04/43] iw_cxgb4: Support ib_alloc_mr verb
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 03/43] ocrdma: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 05/43] cxgb3: " Sagi Grimberg
                     ` (39 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |  4 +++
 drivers/infiniband/hw/cxgb4/mem.c      | 57 ++++++++++++++++++++++++++++++++++
 drivers/infiniband/hw/cxgb4/provider.c |  1 +
 3 files changed, 62 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index cc77844..97b2568 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -970,6 +970,10 @@ void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list *page_list);
 struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(
 					struct ib_device *device,
 					int page_list_len);
+struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
+			    enum ib_mr_type mr_type,
+			    u32 max_entries,
+			    u32 flags);
 struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth);
 int c4iw_dealloc_mw(struct ib_mw *mw);
 struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index cff815b..7ee01ce 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -853,6 +853,63 @@ int c4iw_dealloc_mw(struct ib_mw *mw)
 	return 0;
 }
 
+struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
+			    enum ib_mr_type mr_type,
+			    u32 max_entries,
+			    u32 flags)
+{
+	struct c4iw_dev *rhp;
+	struct c4iw_pd *php;
+	struct c4iw_mr *mhp;
+	u32 mmid;
+	u32 stag = 0;
+	int ret = 0;
+
+	if (mr_type != IB_MR_TYPE_FAST_REG || flags)
+		return ERR_PTR(-EINVAL);
+
+	php = to_c4iw_pd(pd);
+	rhp = php->rhp;
+	mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
+	if (!mhp) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	mhp->rhp = rhp;
+	ret = alloc_pbl(mhp, max_entries);
+	if (ret)
+		goto err1;
+	mhp->attr.pbl_size = max_entries;
+	ret = allocate_stag(&rhp->rdev, &stag, php->pdid,
+				 mhp->attr.pbl_size, mhp->attr.pbl_addr);
+	if (ret)
+		goto err2;
+	mhp->attr.pdid = php->pdid;
+	mhp->attr.type = FW_RI_STAG_NSMR;
+	mhp->attr.stag = stag;
+	mhp->attr.state = 1;
+	mmid = (stag) >> 8;
+	mhp->ibmr.rkey = mhp->ibmr.lkey = stag;
+	if (insert_handle(rhp, &rhp->mmidr, mhp, mmid)) {
+		ret = -ENOMEM;
+		goto err3;
+	}
+
+	PDBG("%s mmid 0x%x mhp %p stag 0x%x\n", __func__, mmid, mhp, stag);
+	return &(mhp->ibmr);
+err3:
+	dereg_mem(&rhp->rdev, stag, mhp->attr.pbl_size,
+		       mhp->attr.pbl_addr);
+err2:
+	c4iw_pblpool_free(&mhp->rhp->rdev, mhp->attr.pbl_addr,
+			      mhp->attr.pbl_size << 3);
+err1:
+	kfree(mhp);
+err:
+	return ERR_PTR(ret);
+}
+
 struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
 {
 	struct c4iw_dev *rhp;
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 6eee3d3..2885aba 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -556,6 +556,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
 	dev->ibdev.alloc_mw = c4iw_alloc_mw;
 	dev->ibdev.bind_mw = c4iw_bind_mw;
 	dev->ibdev.dealloc_mw = c4iw_dealloc_mw;
+	dev->ibdev.alloc_mr = c4iw_alloc_mr;
 	dev->ibdev.alloc_fast_reg_mr = c4iw_alloc_fast_reg_mr;
 	dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl;
 	dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 05/43] cxgb3: Support ib_alloc_mr verb
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (3 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 04/43] iw_cxgb4: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 06/43] nes: " Sagi Grimberg
                     ` (38 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/cxgb3/iwch_provider.c | 53 +++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index b1b7323..d0e9e2d 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -796,6 +796,58 @@ static int iwch_dealloc_mw(struct ib_mw *mw)
 	return 0;
 }
 
+static struct ib_mr *iwch_alloc_mr(struct ib_pd *pd,
+				   enum ib_mr_type mr_type,
+				   u32 max_entries,
+				   u32 flags)
+{
+	struct iwch_dev *rhp;
+	struct iwch_pd *php;
+	struct iwch_mr *mhp;
+	u32 mmid;
+	u32 stag = 0;
+	int ret = 0;
+
+	if (mr_type != IB_MR_TYPE_FAST_REG || flags)
+		return ERR_PTR(-EINVAL);
+
+	php = to_iwch_pd(pd);
+	rhp = php->rhp;
+	mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
+	if (!mhp)
+		goto err;
+
+	mhp->rhp = rhp;
+	ret = iwch_alloc_pbl(mhp, max_entries);
+	if (ret)
+		goto err1;
+	mhp->attr.pbl_size = max_entries;
+	ret = cxio_allocate_stag(&rhp->rdev, &stag, php->pdid,
+				 mhp->attr.pbl_size, mhp->attr.pbl_addr);
+	if (ret)
+		goto err2;
+	mhp->attr.pdid = php->pdid;
+	mhp->attr.type = TPT_NON_SHARED_MR;
+	mhp->attr.stag = stag;
+	mhp->attr.state = 1;
+	mmid = (stag) >> 8;
+	mhp->ibmr.rkey = mhp->ibmr.lkey = stag;
+	if (insert_handle(rhp, &rhp->mmidr, mhp, mmid))
+		goto err3;
+
+	PDBG("%s mmid 0x%x mhp %p stag 0x%x\n", __func__, mmid, mhp, stag);
+	return &(mhp->ibmr);
+err3:
+	cxio_dereg_mem(&rhp->rdev, stag, mhp->attr.pbl_size,
+		       mhp->attr.pbl_addr);
+err2:
+	iwch_free_pbl(mhp);
+err1:
+	kfree(mhp);
+err:
+	return ERR_PTR(ret);
+}
+
 static struct ib_mr *iwch_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
 {
 	struct iwch_dev *rhp;
@@ -1439,6 +1491,7 @@ int iwch_register_device(struct iwch_dev *dev)
 	dev->ibdev.alloc_mw = iwch_alloc_mw;
 	dev->ibdev.bind_mw = iwch_bind_mw;
 	dev->ibdev.dealloc_mw = iwch_dealloc_mw;
+	dev->ibdev.alloc_mr = iwch_alloc_mr;
 	dev->ibdev.alloc_fast_reg_mr = iwch_alloc_fast_reg_mr;
 	dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl;
 	dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 06/43] nes: Support ib_alloc_mr verb
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (4 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 05/43] cxgb3: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 07/43] qib: " Sagi Grimberg
                     ` (37 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/nes/nes_verbs.c | 73 +++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)

diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index fbc43e5..ac63763 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -375,6 +375,78 @@ static int alloc_fast_reg_mr(struct nes_device *nesdev, struct nes_pd *nespd,
 }
 
 /*
+ * nes_alloc_mr
+ */
+static struct ib_mr *nes_alloc_mr(struct ib_pd *ibpd,
+				  enum ib_mr_type mr_type,
+				  u32 max_entries,
+				  u32 flags)
+{
+	struct nes_pd *nespd = to_nespd(ibpd);
+	struct nes_vnic *nesvnic = to_nesvnic(ibpd->device);
+	struct nes_device *nesdev = nesvnic->nesdev;
+	struct nes_adapter *nesadapter = nesdev->nesadapter;
+
+	u32 next_stag_index;
+	u8 stag_key = 0;
+	u32 driver_key = 0;
+	int err = 0;
+	u32 stag_index = 0;
+	struct nes_mr *nesmr;
+	u32 stag;
+	int ret;
+	struct ib_mr *ibmr;
+
+	if (mr_type != IB_MR_TYPE_FAST_REG || flags)
+		return ERR_PTR(-EINVAL);
+
+/*
+ * Note:  Set to always use a fixed length single page entry PBL.  This is to allow
+ *	 for the fast_reg_mr operation to always know the size of the PBL.
+ */
+	if (max_entries > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64)))
+		return ERR_PTR(-E2BIG);
+
+	get_random_bytes(&next_stag_index, sizeof(next_stag_index));
+	stag_key = (u8)next_stag_index;
+	next_stag_index >>= 8;
+	next_stag_index %= nesadapter->max_mr;
+
+	err = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs,
+				 nesadapter->max_mr, &stag_index,
+				 &next_stag_index, NES_RESOURCE_FAST_MR);
+	if (err)
+		return ERR_PTR(err);
+
+	nesmr = kzalloc(sizeof(*nesmr), GFP_KERNEL);
+	if (!nesmr) {
+		nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	stag = stag_index << 8;
+	stag |= driver_key;
+	stag += (u32)stag_key;
+
+	nes_debug(NES_DBG_MR, "Allocating STag 0x%08X index = 0x%08X\n",
+		  stag, stag_index);
+
+	ret = alloc_fast_reg_mr(nesdev, nespd, stag, max_entries);
+
+	if (ret == 0) {
+		nesmr->ibmr.rkey = stag;
+		nesmr->ibmr.lkey = stag;
+		nesmr->mode = IWNES_MEMREG_TYPE_FMEM;
+		ibmr = &nesmr->ibmr;
+	} else {
+		kfree(nesmr);
+		nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index);
+		ibmr = ERR_PTR(-ENOMEM);
+	}
+	return ibmr;
+}
+
+/*
  * nes_alloc_fast_reg_mr
  */
 static struct ib_mr *nes_alloc_fast_reg_mr(struct ib_pd *ibpd, int max_page_list_len)
@@ -3929,6 +4001,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
 	nesibdev->ibdev.dealloc_mw = nes_dealloc_mw;
 	nesibdev->ibdev.bind_mw = nes_bind_mw;
 
+	nesibdev->ibdev.alloc_mr = nes_alloc_mr;
 	nesibdev->ibdev.alloc_fast_reg_mr = nes_alloc_fast_reg_mr;
 	nesibdev->ibdev.alloc_fast_reg_page_list = nes_alloc_fast_reg_page_list;
 	nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 07/43] qib: Support ib_alloc_mr verb
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (5 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 06/43] nes: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 08/43] IB/iser: Convert to ib_alloc_mr Sagi Grimberg
                     ` (36 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/qib/qib_mr.c    | 23 +++++++++++++++++++++++
 drivers/infiniband/hw/qib/qib_verbs.c |  1 +
 drivers/infiniband/hw/qib/qib_verbs.h |  5 +++++
 3 files changed, 29 insertions(+)

diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c
index c4473db..1522255 100644
--- a/drivers/infiniband/hw/qib/qib_mr.c
+++ b/drivers/infiniband/hw/qib/qib_mr.c
@@ -327,6 +327,29 @@ out:
  *
  * Return the memory region on success, otherwise return an errno.
  */
+struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
+			   enum ib_mr_type mr_type,
+			   u32 max_entries,
+			   u32 flags)
+{
+	struct qib_mr *mr;
+
+	if (mr_type != IB_MR_TYPE_FAST_REG || flags)
+		return ERR_PTR(-EINVAL);
+
+	mr = alloc_mr(max_entries, pd);
+	if (IS_ERR(mr))
+		return (struct ib_mr *)mr;
+
+	return &mr->ibmr;
+}
+
+/*
+ * Allocate a memory region usable with the
+ * IB_WR_FAST_REG_MR send work request.
+ *
+ * Return the memory region on success, otherwise return an errno.
+ */
 struct ib_mr *qib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
 	struct qib_mr *mr;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index a05d1a3..323666b 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -2235,6 +2235,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
 	ibdev->reg_phys_mr = qib_reg_phys_mr;
 	ibdev->reg_user_mr = qib_reg_user_mr;
 	ibdev->dereg_mr = qib_dereg_mr;
+	ibdev->alloc_mr = qib_alloc_mr;
 	ibdev->alloc_fast_reg_mr = qib_alloc_fast_reg_mr;
 	ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list;
 	ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index 1635572..034510c 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -1032,6 +1032,11 @@ struct ib_mr *qib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 
 int qib_dereg_mr(struct ib_mr *ibmr);
 
+struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
+			   enum ib_mr_type mr_type,
+			   u32 max_entries,
+			   u32 flags);
+
 struct ib_mr *qib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len);
 
 struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list(
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 08/43] IB/iser: Convert to ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (6 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 07/43] qib: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 09/43] iser-target: " Sagi Grimberg
                     ` (35 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/iser/iser_verbs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 6be4d4a..ecc3265 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -296,7 +296,7 @@ iser_alloc_reg_res(struct ib_device *ib_device,
 		return PTR_ERR(res->frpl);
 	}
 
-	res->mr = ib_alloc_fast_reg_mr(pd, size);
+	res->mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, size, 0);
 	if (IS_ERR(res->mr)) {
 		ret = PTR_ERR(res->mr);
 		iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 09/43] iser-target: Convert to ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (7 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 08/43] IB/iser: Convert to ib_alloc_mr Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 10/43] IB/srp: " Sagi Grimberg
                     ` (34 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index f0b7c9b..94395ce 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -526,7 +526,8 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
 		goto err_pi_ctx;
 	}
 
-	pi_ctx->prot_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE);
+	pi_ctx->prot_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG,
+				      ISCSI_ISER_SG_TABLESIZE, 0);
 	if (IS_ERR(pi_ctx->prot_mr)) {
 		isert_err("Failed to allocate prot frmr err=%ld\n",
 			  PTR_ERR(pi_ctx->prot_mr));
@@ -573,7 +574,8 @@ isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd,
 		return PTR_ERR(fr_desc->data_frpl);
 	}
 
-	fr_desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE);
+	fr_desc->data_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG,
+				       ISCSI_ISER_SG_TABLESIZE, 0);
 	if (IS_ERR(fr_desc->data_mr)) {
 		isert_err("Failed to allocate data frmr err=%ld\n",
 			  PTR_ERR(fr_desc->data_mr));
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 10/43] IB/srp: Convert to ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (8 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 09/43] iser-target: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 11/43] xprtrdma, svcrdma: " Sagi Grimberg
                     ` (33 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 1218738..7747587 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -378,7 +378,8 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device,
 	INIT_LIST_HEAD(&pool->free_list);
 
 	for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) {
-		mr = ib_alloc_fast_reg_mr(pd, max_page_list_len);
+		mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG,
+				 max_page_list_len, 0);
 		if (IS_ERR(mr)) {
 			ret = PTR_ERR(mr);
 			goto destroy_pool;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 11/43] xprtrdma, svcrdma: Convert to ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (9 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 10/43] IB/srp: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 12/43] RDS: " Sagi Grimberg
                     ` (32 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 net/sunrpc/xprtrdma/frwr_ops.c           | 6 +++---
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 63f282e..517efed 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -117,7 +117,7 @@ __frwr_recovery_worker(struct work_struct *work)
 	if (ib_dereg_mr(r->r.frmr.fr_mr))
 		goto out_fail;
 
-	r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(pd, depth);
+	r->r.frmr.fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0);
 	if (IS_ERR(r->r.frmr.fr_mr))
 		goto out_fail;
 
@@ -148,7 +148,7 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
 	struct rpcrdma_frmr *f = &r->r.frmr;
 	int rc;
 
-	f->fr_mr = ib_alloc_fast_reg_mr(pd, depth);
+	f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0);
 	if (IS_ERR(f->fr_mr))
 		goto out_mr_err;
 	f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
@@ -158,7 +158,7 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
 
 out_mr_err:
 	rc = PTR_ERR(f->fr_mr);
-	dprintk("RPC:       %s: ib_alloc_fast_reg_mr status %i\n",
+	dprintk("RPC:       %s: ib_alloc_mr status %i\n",
 		__func__, rc);
 	return rc;
 
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 6b36279..fd933d9 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -738,7 +738,7 @@ static struct svc_rdma_fastreg_mr *rdma_alloc_frmr(struct svcxprt_rdma *xprt)
 	if (!frmr)
 		goto err;
 
-	mr = ib_alloc_fast_reg_mr(xprt->sc_pd, RPCSVC_MAXPAGES);
+	mr = ib_alloc_mr(xprt->sc_pd, IB_MR_TYPE_FAST_REG, RPCSVC_MAXPAGES, 0);
 	if (IS_ERR(mr))
 		goto err_free_frmr;
 
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 12/43] RDS: Convert to ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (10 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 11/43] xprtrdma, svcrdma: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 13/43] mlx5: Drop mlx5_ib_alloc_fast_reg_mr Sagi Grimberg
                     ` (31 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 net/rds/iw_rdma.c | 5 +++--
 net/rds/iw_send.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c
index dba8d08..dac0131 100644
--- a/net/rds/iw_rdma.c
+++ b/net/rds/iw_rdma.c
@@ -667,11 +667,12 @@ static int rds_iw_init_fastreg(struct rds_iw_mr_pool *pool,
 	struct ib_mr *mr;
 	int err;
 
-	mr = ib_alloc_fast_reg_mr(rds_iwdev->pd, pool->max_message_size);
+	mr = ib_alloc_mr(rds_iwdev->pd, IB_MR_TYPE_FAST_REG,
+			 pool->max_message_size, 0);
 	if (IS_ERR(mr)) {
 		err = PTR_ERR(mr);
 
-		printk(KERN_WARNING "RDS/IW: ib_alloc_fast_reg_mr failed (err=%d)\n", err);
+		printk(KERN_WARNING "RDS/IW: ib_alloc_mr failed (err=%d)\n", err);
 		return err;
 	}
 
diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c
index 334fe98..0d8e74b 100644
--- a/net/rds/iw_send.c
+++ b/net/rds/iw_send.c
@@ -153,9 +153,10 @@ void rds_iw_send_init_ring(struct rds_iw_connection *ic)
 		sge->length = sizeof(struct rds_header);
 		sge->lkey = 0;
 
-		send->s_mr = ib_alloc_fast_reg_mr(ic->i_pd, fastreg_message_size);
+		send->s_mr = ib_alloc_mr(ic->i_pd, IB_MR_TYPE_FAST_REG,
+					 fastreg_message_size, 0);
 		if (IS_ERR(send->s_mr)) {
-			printk(KERN_WARNING "RDS/IW: ib_alloc_fast_reg_mr failed\n");
+			printk(KERN_WARNING "RDS/IW: ib_alloc_mr failed\n");
 			break;
 		}
 
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 13/43] mlx5: Drop mlx5_ib_alloc_fast_reg_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (11 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 12/43] RDS: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 14/43] mlx4: Drop mlx4_ib_alloc_fast_reg_mr Sagi Grimberg
                     ` (30 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c    |  1 -
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  2 --
 drivers/infiniband/hw/mlx5/mr.c      | 44 ------------------------------------
 3 files changed, 47 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 82a371f..ce75875 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1503,7 +1503,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
 	dev->ib_dev.alloc_mr		= mlx5_ib_alloc_mr;
-	dev->ib_dev.alloc_fast_reg_mr	= mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
 	dev->ib_dev.check_mr_status	= mlx5_ib_check_mr_status;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index cd6fb5d..c2916f1 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -577,8 +577,6 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_entries,
 			       u32 flags);
-struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
-					int max_page_list_len);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
 							       int page_list_len);
 void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 185c963..c8de302 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1344,50 +1344,6 @@ err_free:
 	return ERR_PTR(err);
 }
 
-struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
-					int max_page_list_len)
-{
-	struct mlx5_ib_dev *dev = to_mdev(pd->device);
-	struct mlx5_create_mkey_mbox_in *in;
-	struct mlx5_ib_mr *mr;
-	int err;
-
-	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
-	if (!mr)
-		return ERR_PTR(-ENOMEM);
-
-	in = kzalloc(sizeof(*in), GFP_KERNEL);
-	if (!in) {
-		err = -ENOMEM;
-		goto err_free;
-	}
-
-	in->seg.status = MLX5_MKEY_STATUS_FREE;
-	in->seg.xlt_oct_size = cpu_to_be32((max_page_list_len + 1) / 2);
-	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
-	in->seg.flags = MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
-	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
-	/*
-	 * TBD not needed - issue 197292 */
-	in->seg.log2_page_size = PAGE_SHIFT;
-
-	err = mlx5_core_create_mkey(dev->mdev, &mr->mmr, in, sizeof(*in), NULL,
-				    NULL, NULL);
-	kfree(in);
-	if (err)
-		goto err_free;
-
-	mr->ibmr.lkey = mr->mmr.key;
-	mr->ibmr.rkey = mr->mmr.key;
-	mr->umem = NULL;
-
-	return &mr->ibmr;
-
-err_free:
-	kfree(mr);
-	return ERR_PTR(err);
-}
-
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
 							       int page_list_len)
 {
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 14/43] mlx4: Drop mlx4_ib_alloc_fast_reg_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (12 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 13/43] mlx5: Drop mlx5_ib_alloc_fast_reg_mr Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 15/43] ocrdma: Drop ocrdma_alloc_frmr Sagi Grimberg
                     ` (29 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c    |  1 -
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  2 --
 drivers/infiniband/hw/mlx4/mr.c      | 33 ---------------------------------
 3 files changed, 36 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 54671c7..829fcf4 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2297,7 +2297,6 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	ibdev->ib_dev.reg_user_mr	= mlx4_ib_reg_user_mr;
 	ibdev->ib_dev.rereg_user_mr	= mlx4_ib_rereg_user_mr;
 	ibdev->ib_dev.dereg_mr		= mlx4_ib_dereg_mr;
-	ibdev->ib_dev.alloc_fast_reg_mr = mlx4_ib_alloc_fast_reg_mr;
 	ibdev->ib_dev.alloc_mr		= mlx4_ib_alloc_mr;
 	ibdev->ib_dev.alloc_fast_reg_page_list = mlx4_ib_alloc_fast_reg_page_list;
 	ibdev->ib_dev.free_fast_reg_page_list  = mlx4_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index c8b5679..9220faf 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -684,8 +684,6 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_entries,
 			       u32 flags);
-struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd,
-					int max_page_list_len);
 struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
 							       int page_list_len);
 void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 3cba374..121ee7f 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -388,39 +388,6 @@ err_free:
 	return ERR_PTR(err);
 }
 
-struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd,
-					int max_page_list_len)
-{
-	struct mlx4_ib_dev *dev = to_mdev(pd->device);
-	struct mlx4_ib_mr *mr;
-	int err;
-
-	mr = kmalloc(sizeof *mr, GFP_KERNEL);
-	if (!mr)
-		return ERR_PTR(-ENOMEM);
-
-	err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, 0, 0, 0,
-			    max_page_list_len, 0, &mr->mmr);
-	if (err)
-		goto err_free;
-
-	err = mlx4_mr_enable(dev->dev, &mr->mmr);
-	if (err)
-		goto err_mr;
-
-	mr->ibmr.rkey = mr->ibmr.lkey = mr->mmr.key;
-	mr->umem = NULL;
-
-	return &mr->ibmr;
-
-err_mr:
-	(void) mlx4_mr_free(dev->dev, &mr->mmr);
-
-err_free:
-	kfree(mr);
-	return ERR_PTR(err);
-}
-
 struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
 							       int page_list_len)
 {
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 15/43] ocrdma: Drop ocrdma_alloc_frmr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (13 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 14/43] mlx4: Drop mlx4_ib_alloc_fast_reg_mr Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 16/43] qib: Drop qib_alloc_fast_reg_mr Sagi Grimberg
                     ` (28 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |  1 -
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 41 -----------------------------
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |  1 -
 3 files changed, 43 deletions(-)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index d7ebe04..47d2814 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -295,7 +295,6 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
 	dev->ibdev.reg_user_mr = ocrdma_reg_user_mr;
 
 	dev->ibdev.alloc_mr = ocrdma_alloc_mr;
-	dev->ibdev.alloc_fast_reg_mr = ocrdma_alloc_frmr;
 	dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list;
 	dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list;
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 3487780..fb97db1 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -3030,47 +3030,6 @@ pbl_err:
 	return ERR_PTR(-ENOMEM);
 }
 
-struct ib_mr *ocrdma_alloc_frmr(struct ib_pd *ibpd, int max_page_list_len)
-{
-	int status;
-	struct ocrdma_mr *mr;
-	struct ocrdma_pd *pd = get_ocrdma_pd(ibpd);
-	struct ocrdma_dev *dev = get_ocrdma_dev(ibpd->device);
-
-	if (max_page_list_len > dev->attr.max_pages_per_frmr)
-		return ERR_PTR(-EINVAL);
-
-	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
-	if (!mr)
-		return ERR_PTR(-ENOMEM);
-
-	status = ocrdma_get_pbl_info(dev, mr, max_page_list_len);
-	if (status)
-		goto pbl_err;
-	mr->hwmr.fr_mr = 1;
-	mr->hwmr.remote_rd = 0;
-	mr->hwmr.remote_wr = 0;
-	mr->hwmr.local_rd = 0;
-	mr->hwmr.local_wr = 0;
-	mr->hwmr.mw_bind = 0;
-	status = ocrdma_build_pbl_tbl(dev, &mr->hwmr);
-	if (status)
-		goto pbl_err;
-	status = ocrdma_reg_mr(dev, &mr->hwmr, pd->id, 0);
-	if (status)
-		goto mbx_err;
-	mr->ibmr.rkey = mr->hwmr.lkey;
-	mr->ibmr.lkey = mr->hwmr.lkey;
-	dev->stag_arr[(mr->hwmr.lkey >> 8) & (OCRDMA_MAX_STAG - 1)] =
-		(unsigned long) mr;
-	return &mr->ibmr;
-mbx_err:
-	ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);
-pbl_err:
-	kfree(mr);
-	return ERR_PTR(-ENOMEM);
-}
-
 struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
 							  *ibdev,
 							  int page_list_len)
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index eebcda2..d09ff8e 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -100,7 +100,6 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
 			      enum ib_mr_type mr_type,
 			      u32 max_entries,
 			      u32 flags);
-struct ib_mr *ocrdma_alloc_frmr(struct ib_pd *pd, int max_page_list_len);
 struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
 							*ibdev,
 							int page_list_len);
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 16/43] qib: Drop qib_alloc_fast_reg_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (14 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 15/43] ocrdma: Drop ocrdma_alloc_frmr Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 17/43] nes: Drop nes_alloc_fast_reg_mr Sagi Grimberg
                     ` (27 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/qib/qib_mr.c    | 17 -----------------
 drivers/infiniband/hw/qib/qib_verbs.c |  1 -
 drivers/infiniband/hw/qib/qib_verbs.h |  2 --
 3 files changed, 20 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c
index 1522255..2a4afea 100644
--- a/drivers/infiniband/hw/qib/qib_mr.c
+++ b/drivers/infiniband/hw/qib/qib_mr.c
@@ -344,23 +344,6 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
 	return &mr->ibmr;
 }
 
-/*
- * Allocate a memory region usable with the
- * IB_WR_FAST_REG_MR send work request.
- *
- * Return the memory region on success, otherwise return an errno.
- */
-struct ib_mr *qib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
-{
-	struct qib_mr *mr;
-
-	mr = alloc_mr(max_page_list_len, pd);
-	if (IS_ERR(mr))
-		return (struct ib_mr *)mr;
-
-	return &mr->ibmr;
-}
-
 struct ib_fast_reg_page_list *
 qib_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len)
 {
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 323666b..ef022a1 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -2236,7 +2236,6 @@ int qib_register_ib_device(struct qib_devdata *dd)
 	ibdev->reg_user_mr = qib_reg_user_mr;
 	ibdev->dereg_mr = qib_dereg_mr;
 	ibdev->alloc_mr = qib_alloc_mr;
-	ibdev->alloc_fast_reg_mr = qib_alloc_fast_reg_mr;
 	ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list;
 	ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list;
 	ibdev->alloc_fmr = qib_alloc_fmr;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index 034510c..8fbd995 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -1037,8 +1037,6 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
 			   u32 max_entries,
 			   u32 flags);
 
-struct ib_mr *qib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len);
-
 struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list(
 				struct ib_device *ibdev, int page_list_len);
 
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 17/43] nes: Drop nes_alloc_fast_reg_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (15 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 16/43] qib: Drop qib_alloc_fast_reg_mr Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 18/43] cxgb4: Drop c4iw_alloc_fast_reg_mr Sagi Grimberg
                     ` (26 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/nes/nes_verbs.c | 66 -----------------------------------
 1 file changed, 66 deletions(-)

diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index ac63763..752e6ea 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -447,71 +447,6 @@ static struct ib_mr *nes_alloc_mr(struct ib_pd *ibpd,
 }
 
 /*
- * nes_alloc_fast_reg_mr
- */
-static struct ib_mr *nes_alloc_fast_reg_mr(struct ib_pd *ibpd, int max_page_list_len)
-{
-	struct nes_pd *nespd = to_nespd(ibpd);
-	struct nes_vnic *nesvnic = to_nesvnic(ibpd->device);
-	struct nes_device *nesdev = nesvnic->nesdev;
-	struct nes_adapter *nesadapter = nesdev->nesadapter;
-
-	u32 next_stag_index;
-	u8 stag_key = 0;
-	u32 driver_key = 0;
-	int err = 0;
-	u32 stag_index = 0;
-	struct nes_mr *nesmr;
-	u32 stag;
-	int ret;
-	struct ib_mr *ibmr;
-/*
- * Note:  Set to always use a fixed length single page entry PBL.  This is to allow
- *	 for the fast_reg_mr operation to always know the size of the PBL.
- */
-	if (max_page_list_len > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64)))
-		return ERR_PTR(-E2BIG);
-
-	get_random_bytes(&next_stag_index, sizeof(next_stag_index));
-	stag_key = (u8)next_stag_index;
-	next_stag_index >>= 8;
-	next_stag_index %= nesadapter->max_mr;
-
-	err = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs,
-				 nesadapter->max_mr, &stag_index,
-				 &next_stag_index, NES_RESOURCE_FAST_MR);
-	if (err)
-		return ERR_PTR(err);
-
-	nesmr = kzalloc(sizeof(*nesmr), GFP_KERNEL);
-	if (!nesmr) {
-		nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	stag = stag_index << 8;
-	stag |= driver_key;
-	stag += (u32)stag_key;
-
-	nes_debug(NES_DBG_MR, "Allocating STag 0x%08X index = 0x%08X\n",
-		  stag, stag_index);
-
-	ret = alloc_fast_reg_mr(nesdev, nespd, stag, max_page_list_len);
-
-	if (ret == 0) {
-		nesmr->ibmr.rkey = stag;
-		nesmr->ibmr.lkey = stag;
-		nesmr->mode = IWNES_MEMREG_TYPE_FMEM;
-		ibmr = &nesmr->ibmr;
-	} else {
-		kfree(nesmr);
-		nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index);
-		ibmr = ERR_PTR(-ENOMEM);
-	}
-	return ibmr;
-}
-
-/*
  * nes_alloc_fast_reg_page_list
  */
 static struct ib_fast_reg_page_list *nes_alloc_fast_reg_page_list(
@@ -4002,7 +3937,6 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
 	nesibdev->ibdev.bind_mw = nes_bind_mw;
 
 	nesibdev->ibdev.alloc_mr = nes_alloc_mr;
-	nesibdev->ibdev.alloc_fast_reg_mr = nes_alloc_fast_reg_mr;
 	nesibdev->ibdev.alloc_fast_reg_page_list = nes_alloc_fast_reg_page_list;
 	nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list;
 
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 18/43] cxgb4: Drop c4iw_alloc_fast_reg_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (16 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 17/43] nes: Drop nes_alloc_fast_reg_mr Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 19/43] cxgb3: Drop iwch_alloc_fast_reg_mr Sagi Grimberg
                     ` (25 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |  1 -
 drivers/infiniband/hw/cxgb4/mem.c      | 51 ----------------------------------
 drivers/infiniband/hw/cxgb4/provider.c |  1 -
 3 files changed, 53 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 97b2568..886be9c 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -974,7 +974,6 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
 			    enum ib_mr_type mr_type,
 			    u32 max_entries,
 			    u32 flags);
-struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth);
 int c4iw_dealloc_mw(struct ib_mw *mw);
 struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
 struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 7ee01ce..5ecf4aa 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -910,57 +910,6 @@ err:
 	return ERR_PTR(ret);
 }
 
-struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
-{
-	struct c4iw_dev *rhp;
-	struct c4iw_pd *php;
-	struct c4iw_mr *mhp;
-	u32 mmid;
-	u32 stag = 0;
-	int ret = 0;
-
-	php = to_c4iw_pd(pd);
-	rhp = php->rhp;
-	mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
-	if (!mhp) {
-		ret = -ENOMEM;
-		goto err;
-	}
-
-	mhp->rhp = rhp;
-	ret = alloc_pbl(mhp, pbl_depth);
-	if (ret)
-		goto err1;
-	mhp->attr.pbl_size = pbl_depth;
-	ret = allocate_stag(&rhp->rdev, &stag, php->pdid,
-				 mhp->attr.pbl_size, mhp->attr.pbl_addr);
-	if (ret)
-		goto err2;
-	mhp->attr.pdid = php->pdid;
-	mhp->attr.type = FW_RI_STAG_NSMR;
-	mhp->attr.stag = stag;
-	mhp->attr.state = 1;
-	mmid = (stag) >> 8;
-	mhp->ibmr.rkey = mhp->ibmr.lkey = stag;
-	if (insert_handle(rhp, &rhp->mmidr, mhp, mmid)) {
-		ret = -ENOMEM;
-		goto err3;
-	}
-
-	PDBG("%s mmid 0x%x mhp %p stag 0x%x\n", __func__, mmid, mhp, stag);
-	return &(mhp->ibmr);
-err3:
-	dereg_mem(&rhp->rdev, stag, mhp->attr.pbl_size,
-		       mhp->attr.pbl_addr);
-err2:
-	c4iw_pblpool_free(&mhp->rhp->rdev, mhp->attr.pbl_addr,
-			      mhp->attr.pbl_size << 3);
-err1:
-	kfree(mhp);
-err:
-	return ERR_PTR(ret);
-}
-
 struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(struct ib_device *device,
 						     int page_list_len)
 {
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 2885aba..7746113 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -557,7 +557,6 @@ int c4iw_register_device(struct c4iw_dev *dev)
 	dev->ibdev.bind_mw = c4iw_bind_mw;
 	dev->ibdev.dealloc_mw = c4iw_dealloc_mw;
 	dev->ibdev.alloc_mr = c4iw_alloc_mr;
-	dev->ibdev.alloc_fast_reg_mr = c4iw_alloc_fast_reg_mr;
 	dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl;
 	dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl;
 	dev->ibdev.attach_mcast = c4iw_multicast_attach;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 19/43] cxgb3: Drop iwch_alloc_fast_reg_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (17 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 18/43] cxgb4: Drop c4iw_alloc_fast_reg_mr Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 20/43] IB/core: Drop ib_alloc_fast_reg_mr Sagi Grimberg
                     ` (24 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/cxgb3/iwch_provider.c | 47 -----------------------------
 1 file changed, 47 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index d0e9e2d..af55b79 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -848,52 +848,6 @@ err:
 	return ERR_PTR(ret);
 }
 
-static struct ib_mr *iwch_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth)
-{
-	struct iwch_dev *rhp;
-	struct iwch_pd *php;
-	struct iwch_mr *mhp;
-	u32 mmid;
-	u32 stag = 0;
-	int ret = 0;
-
-	php = to_iwch_pd(pd);
-	rhp = php->rhp;
-	mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
-	if (!mhp)
-		goto err;
-
-	mhp->rhp = rhp;
-	ret = iwch_alloc_pbl(mhp, pbl_depth);
-	if (ret)
-		goto err1;
-	mhp->attr.pbl_size = pbl_depth;
-	ret = cxio_allocate_stag(&rhp->rdev, &stag, php->pdid,
-				 mhp->attr.pbl_size, mhp->attr.pbl_addr);
-	if (ret)
-		goto err2;
-	mhp->attr.pdid = php->pdid;
-	mhp->attr.type = TPT_NON_SHARED_MR;
-	mhp->attr.stag = stag;
-	mhp->attr.state = 1;
-	mmid = (stag) >> 8;
-	mhp->ibmr.rkey = mhp->ibmr.lkey = stag;
-	if (insert_handle(rhp, &rhp->mmidr, mhp, mmid))
-		goto err3;
-
-	PDBG("%s mmid 0x%x mhp %p stag 0x%x\n", __func__, mmid, mhp, stag);
-	return &(mhp->ibmr);
-err3:
-	cxio_dereg_mem(&rhp->rdev, stag, mhp->attr.pbl_size,
-		       mhp->attr.pbl_addr);
-err2:
-	iwch_free_pbl(mhp);
-err1:
-	kfree(mhp);
-err:
-	return ERR_PTR(ret);
-}
-
 static struct ib_fast_reg_page_list *iwch_alloc_fastreg_pbl(
 					struct ib_device *device,
 					int page_list_len)
@@ -1492,7 +1446,6 @@ int iwch_register_device(struct iwch_dev *dev)
 	dev->ibdev.bind_mw = iwch_bind_mw;
 	dev->ibdev.dealloc_mw = iwch_dealloc_mw;
 	dev->ibdev.alloc_mr = iwch_alloc_mr;
-	dev->ibdev.alloc_fast_reg_mr = iwch_alloc_fast_reg_mr;
 	dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl;
 	dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl;
 	dev->ibdev.attach_mcast = iwch_multicast_attach;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 20/43] IB/core: Drop ib_alloc_fast_reg_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (18 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 19/43] cxgb3: Drop iwch_alloc_fast_reg_mr Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr Sagi Grimberg
                     ` (23 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Fully replaced by a more generic and suitable
ib_alloc_mr

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c | 21 ---------------------
 include/rdma/ib_verbs.h         | 11 -----------
 2 files changed, 32 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 23d73bd..beed431 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1265,27 +1265,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
 }
 EXPORT_SYMBOL(ib_alloc_mr);
 
-struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
-{
-	struct ib_mr *mr;
-
-	if (!pd->device->alloc_fast_reg_mr)
-		return ERR_PTR(-ENOSYS);
-
-	mr = pd->device->alloc_fast_reg_mr(pd, max_page_list_len);
-
-	if (!IS_ERR(mr)) {
-		mr->device  = pd->device;
-		mr->pd      = pd;
-		mr->uobject = NULL;
-		atomic_inc(&pd->usecnt);
-		atomic_set(&mr->usecnt, 0);
-	}
-
-	return mr;
-}
-EXPORT_SYMBOL(ib_alloc_fast_reg_mr);
-
 struct ib_fast_reg_page_list *ib_alloc_fast_reg_page_list(struct ib_device *device,
 							  int max_page_list_len)
 {
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ec9a70..7a93e2d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1661,8 +1661,6 @@ struct ib_device {
 					       enum ib_mr_type mr_type,
 					       u32 max_entries,
 					       u32 flags);
-	struct ib_mr *		   (*alloc_fast_reg_mr)(struct ib_pd *pd,
-					       int max_page_list_len);
 	struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
 								   int page_list_len);
 	void			   (*free_fast_reg_page_list)(struct ib_fast_reg_page_list *page_list);
@@ -2803,15 +2801,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
 			  u32 flags);
 
 /**
- * ib_alloc_fast_reg_mr - Allocates memory region usable with the
- *   IB_WR_FAST_REG_MR send work request.
- * @pd: The protection domain associated with the region.
- * @max_page_list_len: requested max physical buffer list length to be
- *   used with fast register work requests for this MR.
- */
-struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len);
-
-/**
  * ib_alloc_fast_reg_page_list - Allocates a page list array
  * @device - ib device pointer.
  * @page_list_len - size of the page list array to be allocated.
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (19 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 20/43] IB/core: Drop ib_alloc_fast_reg_mr Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
       [not found]     ` <1437548143-24893-22-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22  6:55   ` [PATCH WIP 22/43] mlx4: " Sagi Grimberg
                     ` (22 subsequent siblings)
  43 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  5 ++++
 drivers/infiniband/hw/mlx5/mr.c      | 45 ++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index c2916f1..df5e959 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -315,6 +315,11 @@ enum mlx5_ib_mtt_access_flags {
 
 struct mlx5_ib_mr {
 	struct ib_mr		ibmr;
+	u64		        *pl;
+	__be64			*mpl;
+	dma_addr_t		pl_map;
+	int			ndescs;
+	int			max_descs;
 	struct mlx5_core_mr	mmr;
 	struct ib_umem	       *umem;
 	struct mlx5_shared_mr_info	*smr_info;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index c8de302..1075065 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1167,6 +1167,42 @@ error:
 	return err;
 }
 
+static int
+mlx5_alloc_page_list(struct ib_device *device,
+		     struct mlx5_ib_mr *mr, int ndescs)
+{
+	int size = ndescs * sizeof(u64);
+
+	mr->pl = kcalloc(ndescs, sizeof(u64), GFP_KERNEL);
+	if (!mr->pl)
+		return -ENOMEM;
+
+	mr->mpl = dma_alloc_coherent(device->dma_device, size,
+				     &mr->pl_map, GFP_KERNEL);
+	if (!mr->mpl)
+		goto err;
+
+	return 0;
+err:
+	kfree(mr->pl);
+
+	return -ENOMEM;
+}
+
+static void
+mlx5_free_page_list(struct mlx5_ib_mr *mr)
+{
+	struct ib_device *device = mr->ibmr.device;
+	int size = mr->max_descs * sizeof(u64);
+
+	kfree(mr->pl);
+	if (mr->mpl)
+		dma_free_coherent(device->dma_device, size,
+				  mr->mpl, mr->pl_map);
+	mr->pl = NULL;
+	mr->mpl = NULL;
+}
+
 static int clean_mr(struct mlx5_ib_mr *mr)
 {
 	struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device);
@@ -1186,6 +1222,8 @@ static int clean_mr(struct mlx5_ib_mr *mr)
 		mr->sig = NULL;
 	}
 
+	mlx5_free_page_list(mr);
+
 	if (!umred) {
 		err = destroy_mkey(dev, mr);
 		if (err) {
@@ -1279,6 +1317,12 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 	if (mr_type == IB_MR_TYPE_FAST_REG) {
 		access_mode = MLX5_ACCESS_MODE_MTT;
 		in->seg.log2_page_size = PAGE_SHIFT;
+
+		err = mlx5_alloc_page_list(pd->device, mr, ndescs);
+		if (err)
+			goto err_free_in;
+
+		mr->max_descs = ndescs;
 	} else if (mr_type == IB_MR_TYPE_SIGNATURE) {
 		u32 psv_index[2];
 
@@ -1335,6 +1379,7 @@ err_destroy_psv:
 			mlx5_ib_warn(dev, "failed to destroy wire psv %d\n",
 				     mr->sig->psv_wire.psv_idx);
 	}
+	mlx5_free_page_list(mr);
 err_free_sig:
 	kfree(mr->sig);
 err_free_in:
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 22/43] mlx4: Allocate a private page list in ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (20 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 23/43] ocrdma: " Sagi Grimberg
                     ` (21 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  5 ++++
 drivers/infiniband/hw/mlx4/mr.c      | 52 +++++++++++++++++++++++++++++++++---
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 9220faf..a9a4a7f 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -120,6 +120,11 @@ struct mlx4_ib_mr {
 	struct ib_mr		ibmr;
 	struct mlx4_mr		mmr;
 	struct ib_umem	       *umem;
+	u64		        *pl;
+	__be64			*mpl;
+	dma_addr_t		pl_map;
+	u32			npages;
+	u32			max_pages;
 };
 
 struct mlx4_ib_mw {
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 121ee7f..01e16bc 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -271,11 +271,50 @@ release_mpt_entry:
 	return err;
 }
 
+static int
+mlx4_alloc_page_list(struct ib_device *device,
+		     struct mlx4_ib_mr *mr,
+		      int max_entries)
+{
+	int size = max_entries * sizeof (u64);
+
+	mr->pl = kcalloc(max_entries, sizeof(u64), GFP_KERNEL);
+	if (!mr->pl)
+		return -ENOMEM;
+
+	mr->mpl = dma_alloc_coherent(device->dma_device, size,
+				     &mr->pl_map, GFP_KERNEL);
+	if (!mr->mpl)
+		goto err;
+
+	return 0;
+err:
+	kfree(mr->pl);
+
+	return -ENOMEM;
+}
+
+static void
+mlx4_free_page_list(struct mlx4_ib_mr *mr)
+{
+	struct ib_device *device = mr->ibmr.device;
+	int size = mr->max_pages * sizeof(u64);
+
+	kfree(mr->pl);
+	if (mr->mpl)
+		dma_free_coherent(device->dma_device, size,
+				  mr->mpl, mr->pl_map);
+	mr->pl = NULL;
+	mr->mpl = NULL;
+}
+
 int mlx4_ib_dereg_mr(struct ib_mr *ibmr)
 {
 	struct mlx4_ib_mr *mr = to_mmr(ibmr);
 	int ret;
 
+	mlx4_free_page_list(mr);
+
 	ret = mlx4_mr_free(to_mdev(ibmr->device)->dev, &mr->mmr);
 	if (ret)
 		return ret;
@@ -371,18 +410,25 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
 	if (err)
 		goto err_free;
 
+	err = mlx4_alloc_page_list(pd->device, mr, max_entries);
+	if (err)
+		goto err_free_mr;
+
+	mr->max_pages = max_entries;
+
 	err = mlx4_mr_enable(dev->dev, &mr->mmr);
 	if (err)
-		goto err_mr;
+		goto err_free_pl;
 
 	mr->ibmr.rkey = mr->ibmr.lkey = mr->mmr.key;
 	mr->umem = NULL;
 
 	return &mr->ibmr;
 
-err_mr:
+err_free_pl:
+	mlx4_free_page_list(mr);
+err_free_mr:
 	(void) mlx4_mr_free(dev->dev, &mr->mmr);
-
 err_free:
 	kfree(mr);
 	return ERR_PTR(err);
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 23/43] ocrdma: Allocate a private page list in ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (21 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 22/43] mlx4: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 24/43] cxgb3: Allocate a provate " Sagi Grimberg
                     ` (20 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/ocrdma/ocrdma.h       | 2 ++
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 9 +++++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h
index b396344..37deea2 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma.h
@@ -178,6 +178,8 @@ struct ocrdma_mr {
 	struct ib_mr ibmr;
 	struct ib_umem *umem;
 	struct ocrdma_hw_mr hwmr;
+	u64 *pl;
+	u32 npages;
 };
 
 struct ocrdma_stats {
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index fb97db1..a764cb9 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -957,6 +957,7 @@ int ocrdma_dereg_mr(struct ib_mr *ib_mr)
 
 	(void) ocrdma_mbx_dealloc_lkey(dev, mr->hwmr.fr_mr, mr->hwmr.lkey);
 
+	kfree(mr->pl);
 	ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);
 
 	/* it could be user registered memory. */
@@ -3003,6 +3004,12 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd,
 	if (!mr)
 		return ERR_PTR(-ENOMEM);
 
+	mr->pl = kcalloc(max_entries, sizeof(u64), GFP_KERNEL);
+	if (!mr->pl) {
+		status = -ENOMEM;
+		goto pl_err;
+	}
+
 	status = ocrdma_get_pbl_info(dev, mr, max_entries);
 	if (status)
 		goto pbl_err;
@@ -3026,6 +3033,8 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd,
 mbx_err:
 	ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);
 pbl_err:
+	kfree(mr->pl);
+pl_err:
 	kfree(mr);
 	return ERR_PTR(-ENOMEM);
 }
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 24/43] cxgb3: Allocate a provate page list in ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (22 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 23/43] ocrdma: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 25/43] cxgb4: Allocate a private " Sagi Grimberg
                     ` (19 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/cxgb3/iwch_provider.c | 9 +++++++++
 drivers/infiniband/hw/cxgb3/iwch_provider.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index af55b79..c9368e6 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -463,6 +463,7 @@ static int iwch_dereg_mr(struct ib_mr *ib_mr)
 		return -EINVAL;
 
 	mhp = to_iwch_mr(ib_mr);
+	kfree(mhp->pl);
 	rhp = mhp->rhp;
 	mmid = mhp->attr.stag >> 8;
 	cxio_dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
@@ -817,6 +818,12 @@ static struct ib_mr *iwch_alloc_mr(struct ib_pd *pd,
 	if (!mhp)
 		goto err;
 
+	mhp->pl = kcalloc(max_entries, sizeof(u64), GFP_KERNEL);
+	if (!mhp->pl) {
+		ret = -ENOMEM;
+		goto pl_err;
+	}
+
 	mhp->rhp = rhp;
 	ret = iwch_alloc_pbl(mhp, max_entries);
 	if (ret)
@@ -843,6 +850,8 @@ err3:
 err2:
 	iwch_free_pbl(mhp);
 err1:
+	kfree(mhp->pl);
+pl_err:
 	kfree(mhp);
 err:
 	return ERR_PTR(ret);
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
index 87c14b0..8e16da9 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.h
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -77,6 +77,8 @@ struct iwch_mr {
 	struct iwch_dev *rhp;
 	u64 kva;
 	struct tpt_attributes attr;
+	u64 *pl;
+	u32 npages;
 };
 
 typedef struct iwch_mw iwch_mw_handle;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 25/43] cxgb4: Allocate a private page list in ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (23 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 24/43] cxgb3: Allocate a provate " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 26/43] qib: " Sagi Grimberg
                     ` (18 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |  4 ++++
 drivers/infiniband/hw/cxgb4/mem.c      | 15 +++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 886be9c..e529ace 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -386,6 +386,10 @@ struct c4iw_mr {
 	struct c4iw_dev *rhp;
 	u64 kva;
 	struct tpt_attributes attr;
+	u64 *mpl;
+	dma_addr_t mpl_addr;
+	u32 max_mpl_len;
+	u32 mpl_len;
 };
 
 static inline struct c4iw_mr *to_c4iw_mr(struct ib_mr *ibmr)
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 5ecf4aa..91aedce 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -864,6 +864,7 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
 	u32 mmid;
 	u32 stag = 0;
 	int ret = 0;
+	int length = roundup(max_entries * sizeof(u64), 32);
 
 	if (mr_type != IB_MR_TYPE_FAST_REG || flags)
 		return ERR_PTR(-EINVAL);
@@ -876,6 +877,14 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
 		goto err;
 	}
 
+	mhp->mpl = dma_alloc_coherent(&rhp->rdev.lldi.pdev->dev,
+				      length, &mhp->mpl_addr, GFP_KERNEL);
+	if (!mhp->mpl) {
+		ret = -ENOMEM;
+		goto err_mpl;
+	}
+	mhp->max_mpl_len = length;
+
 	mhp->rhp = rhp;
 	ret = alloc_pbl(mhp, max_entries);
 	if (ret)
@@ -905,6 +914,9 @@ err2:
 	c4iw_pblpool_free(&mhp->rhp->rdev, mhp->attr.pbl_addr,
 			      mhp->attr.pbl_size << 3);
 err1:
+	dma_free_coherent(&mhp->rhp->rdev.lldi.pdev->dev,
+			  mhp->max_mpl_len, mhp->mpl, mhp->mpl_addr);
+err_mpl:
 	kfree(mhp);
 err:
 	return ERR_PTR(ret);
@@ -970,6 +982,9 @@ int c4iw_dereg_mr(struct ib_mr *ib_mr)
 	rhp = mhp->rhp;
 	mmid = mhp->attr.stag >> 8;
 	remove_handle(rhp, &rhp->mmidr, mmid);
+	if (mhp->mpl)
+		dma_free_coherent(&mhp->rhp->rdev.lldi.pdev->dev,
+				  mhp->max_mpl_len, mhp->mpl, mhp->mpl_addr);
 	dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
 		       mhp->attr.pbl_addr);
 	if (mhp->attr.pbl_size)
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 26/43] qib: Allocate a private page list in ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (24 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 25/43] cxgb4: Allocate a private " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 27/43] nes: " Sagi Grimberg
                     ` (17 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/qib/qib_mr.c    | 9 +++++++++
 drivers/infiniband/hw/qib/qib_verbs.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c
index 2a4afea..a58a347 100644
--- a/drivers/infiniband/hw/qib/qib_mr.c
+++ b/drivers/infiniband/hw/qib/qib_mr.c
@@ -303,6 +303,7 @@ int qib_dereg_mr(struct ib_mr *ibmr)
 	int ret = 0;
 	unsigned long timeout;
 
+	kfree(mr->pl);
 	qib_free_lkey(&mr->mr);
 
 	qib_put_mr(&mr->mr); /* will set completion if last */
@@ -341,7 +342,15 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
 	if (IS_ERR(mr))
 		return (struct ib_mr *)mr;
 
+	mr->pl = kcalloc(max_entries, sizeof(u64), GFP_KERNEL);
+	if (!mr->pl)
+		goto err;
+
 	return &mr->ibmr;
+
+err:
+	qib_dereg_mr(&mr->ibmr);
+	return ERR_PTR(-ENOMEM);
 }
 
 struct ib_fast_reg_page_list *
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index 8fbd995..c8062ae 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -330,6 +330,8 @@ struct qib_mr {
 	struct ib_mr ibmr;
 	struct ib_umem *umem;
 	struct qib_mregion mr;  /* must be last */
+	u64 *pl;
+	u32 npages;
 };
 
 /*
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 27/43] nes: Allocate a private page list in ib_alloc_mr
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (25 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 26/43] qib: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 28/43] IB/core: Introduce new fast registration API Sagi Grimberg
                     ` (16 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/nes/nes_verbs.c | 27 +++++++++++++++++++++++++++
 drivers/infiniband/hw/nes/nes_verbs.h |  5 +++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 752e6ea..532496d 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -51,6 +51,7 @@ atomic_t qps_created;
 atomic_t sw_qps_destroyed;
 
 static void nes_unregister_ofa_device(struct nes_ib_device *nesibdev);
+static int nes_dereg_mr(struct ib_mr *ib_mr);
 
 /**
  * nes_alloc_mw
@@ -443,7 +444,25 @@ static struct ib_mr *nes_alloc_mr(struct ib_pd *ibpd,
 		nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index);
 		ibmr = ERR_PTR(-ENOMEM);
 	}
+
+	nesmr->pl = kcalloc(max_entries, sizeof(u64), GFP_KERNEL);
+	if (!nesmr->pl)
+		goto err;
+
+	nesmr->mpl = pci_alloc_consistent(nesdev->pcidev,
+					  max_entries * sizeof(u64),
+					  &nesmr->mpl_addr);
+	if (!nesmr->mpl_addr)
+		goto err;
+
+	nesmr->max_pages = max_entries;
+
 	return ibmr;
+
+err:
+	nes_dereg_mr(ibmr);
+
+	return ERR_PTR(-ENOMEM);
 }
 
 /*
@@ -2681,6 +2700,14 @@ static int nes_dereg_mr(struct ib_mr *ib_mr)
 	u16 major_code;
 	u16 minor_code;
 
+
+	kfree(nesmr->pl);
+	if (nesmr->mpl)
+		pci_free_consistent(nesdev->pcidev,
+				    nesmr->max_pages * sizeof(u64),
+				    nesmr->mpl,
+				    nesmr->mpl_addr);
+
 	if (nesmr->region) {
 		ib_umem_release(nesmr->region);
 	}
diff --git a/drivers/infiniband/hw/nes/nes_verbs.h b/drivers/infiniband/hw/nes/nes_verbs.h
index 309b31c..e99aa69 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.h
+++ b/drivers/infiniband/hw/nes/nes_verbs.h
@@ -79,6 +79,11 @@ struct nes_mr {
 	u16               pbls_used;
 	u8                mode;
 	u8                pbl_4k;
+	u64               *pl;
+	u64               *mpl;
+	dma_addr_t        mpl_addr;
+	u32               max_pages;
+	u32		  npages;
 };
 
 struct nes_hw_pb {
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (26 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 27/43] nes: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
       [not found]     ` <1437548143-24893-29-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22  6:55   ` [PATCH WIP 29/43] mlx5: Support the new memory " Sagi Grimberg
                     ` (15 subsequent siblings)
  43 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

The new fast registration is receiving a struct
scatterlist and converts it to a page list under
the verbs API. The user is provided with a new
verb ib_map_mr_sg, and a helper to set the send work
request structure.

The drivers are handed with a generic helper that
converts a scatterlist into a vector of pages.
Given that some drivers have a shadow mapped page list,
I expect that drivers might use their own routines to
avoid the extra copies.

The new registration API is added with fast_reg for
now, but once all drivers and ULPs will be ported, we
can drop the old registration API.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c | 123 ++++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h         |  37 ++++++++++++
 2 files changed, 160 insertions(+)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index beed431..9875163 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1481,3 +1481,126 @@ int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
 		mr->device->check_mr_status(mr, check_mask, mr_status) : -ENOSYS;
 }
 EXPORT_SYMBOL(ib_check_mr_status);
+
+
+/**
+ * ib_map_mr_sg() - Populates MR with a dma mapped SG list
+ * @mr:            memory region
+ * @sg:            dma mapped scatterlist
+ * @sg_nents:      number of entries in sg
+ * @access:        access permissions
+ *
+ * After this completes successfully, the memory region is ready
+ * for fast registration.
+ */
+int ib_map_mr_sg(struct ib_mr *mr,
+		 struct scatterlist *sg,
+		 unsigned short sg_nents,
+		 unsigned int access)
+{
+	int rc;
+
+	if (!mr->device->map_mr_sg)
+		return -ENOSYS;
+
+	rc = mr->device->map_mr_sg(mr, sg, sg_nents);
+	if (!rc)
+		mr->access = access;
+
+	return rc;
+}
+EXPORT_SYMBOL(ib_map_mr_sg);
+
+/**
+ * ib_sg_to_pages() - Convert a sg list to a page vector
+ * @dev:           ib device
+ * @sgl:           dma mapped scatterlist
+ * @sg_nents:      number of entries in sg
+ * @max_pages:     maximum pages allowed
+ * @pages:         output page vector
+ * @npages:        output number of mapped pages
+ * @length:        output total byte length
+ * @offset:        output first byte offset
+ *
+ * Core service helper for drivers to convert a scatter
+ * list to a page vector. The assumption is that the
+ * sg must meet the following conditions:
+ * - Only the first sg is allowed to have an offset
+ * - All the elements are of the same size - PAGE_SIZE
+ * - The last element is allowed to have length less than
+ *   PAGE_SIZE
+ *
+ * If any of those conditions is not met, the routine will
+ * fail with EINVAL.
+ */
+int ib_sg_to_pages(struct scatterlist *sgl,
+		   unsigned short sg_nents,
+		   unsigned short max_pages,
+		   u64 *pages, u32 *npages,
+		   u32 *length, u64 *offset)
+{
+	struct scatterlist *sg;
+	u64 last_end_dma_addr = 0, last_page_addr = 0;
+	unsigned int last_page_off = 0;
+	int i, j = 0;
+
+	/* TODO: We can do better with huge pages */
+
+	*offset = sg_dma_address(&sgl[0]);
+	*length = 0;
+
+	for_each_sg(sgl, sg, sg_nents, i) {
+		u64 dma_addr = sg_dma_address(sg);
+		unsigned int dma_len = sg_dma_len(sg);
+		u64 end_dma_addr = dma_addr + dma_len;
+		u64 page_addr = dma_addr & PAGE_MASK;
+
+		*length += dma_len;
+
+		/* Fail we ran out of pages */
+		if (unlikely(j > max_pages))
+			return -EINVAL;
+
+		if (i && sg->offset) {
+			if (unlikely((last_end_dma_addr) != dma_addr)) {
+				/* gap - fail */
+				goto err;
+			}
+			if (last_page_off + dma_len < PAGE_SIZE) {
+				/* chunk this fragment with the last */
+				last_end_dma_addr += dma_len;
+				last_page_off += dma_len;
+				continue;
+			} else {
+				/* map starting from the next page */
+				page_addr = last_page_addr + PAGE_SIZE;
+				dma_len -= PAGE_SIZE - last_page_off;
+			}
+		}
+
+		do {
+			pages[j++] = page_addr;
+			page_addr += PAGE_SIZE;
+		} while (page_addr < end_dma_addr);
+
+		last_end_dma_addr = end_dma_addr;
+		last_page_addr = end_dma_addr & PAGE_MASK;
+		last_page_off = end_dma_addr & ~PAGE_MASK;
+	}
+
+	*npages = j;
+
+	return 0;
+err:
+	pr_err("RDMA alignment violation\n");
+	for_each_sg(sgl, sg, sg_nents, i) {
+		u64 dma_addr = sg_dma_address(sg);
+		unsigned int dma_len = sg_dma_len(sg);
+
+		pr_err("sg[%d]: offset=0x%x, dma_addr=0x%llx, dma_len=0x%x\n",
+			i, sg->offset, dma_addr, dma_len);
+	}
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(ib_sg_to_pages);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7a93e2d..d543fee 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1013,6 +1013,7 @@ enum ib_wr_opcode {
 	IB_WR_RDMA_READ_WITH_INV,
 	IB_WR_LOCAL_INV,
 	IB_WR_FAST_REG_MR,
+	IB_WR_FASTREG_MR,
 	IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
 	IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
 	IB_WR_BIND_MW,
@@ -1117,6 +1118,10 @@ struct ib_send_wr {
 			u32				rkey;
 		} fast_reg;
 		struct {
+			struct ib_mr *mr;
+			u32          key;
+		} fastreg;
+		struct {
 			struct ib_mw            *mw;
 			/* The new rkey for the memory window. */
 			u32                      rkey;
@@ -1316,6 +1321,9 @@ struct ib_mr {
 	struct ib_uobject *uobject;
 	u32		   lkey;
 	u32		   rkey;
+	int		   access;
+	u64		   iova;
+	u32		   length;
 	atomic_t	   usecnt; /* count number of MWs */
 };
 
@@ -1661,6 +1669,9 @@ struct ib_device {
 					       enum ib_mr_type mr_type,
 					       u32 max_entries,
 					       u32 flags);
+	int                        (*map_mr_sg)(struct ib_mr *mr,
+						struct scatterlist *sg,
+						unsigned short sg_nents);
 	struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
 								   int page_list_len);
 	void			   (*free_fast_reg_page_list)(struct ib_fast_reg_page_list *page_list);
@@ -2991,4 +3002,30 @@ static inline int ib_check_mr_access(int flags)
 int ib_check_mr_status(struct ib_mr *mr, u32 check_mask,
 		       struct ib_mr_status *mr_status);
 
+int ib_map_mr_sg(struct ib_mr *mr,
+		 struct scatterlist *sg,
+		 unsigned short sg_nents,
+		 unsigned int access);
+
+int ib_sg_to_pages(struct scatterlist *sgl,
+		   unsigned short sg_nents,
+		   unsigned short max_pages,
+		   u64 *pages, u32 *npages,
+		   u32 *length, u64 *offset);
+
+static inline void
+ib_set_fastreg_wr(struct ib_mr *mr,
+		  u32 key,
+		  uintptr_t wr_id,
+		  bool signaled,
+		  struct ib_send_wr *wr)
+{
+	wr->opcode = IB_WR_FASTREG_MR;
+	wr->wr_id = wr_id;
+	wr->send_flags = signaled ? IB_SEND_SIGNALED : 0;
+	wr->num_sge = 0;
+	wr->wr.fastreg.mr = mr;
+	wr->wr.fastreg.key = key;
+}
+
 #endif /* IB_VERBS_H */
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 29/43] mlx5: Support the new memory registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (27 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 28/43] IB/core: Introduce new fast registration API Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 30/43] mlx4: " Sagi Grimberg
                     ` (14 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Just duplicated the functions to take the needed
arguments from the private MR context. The old
fast_reg routines will be dropped later.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c    |  1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  3 ++
 drivers/infiniband/hw/mlx5/mr.c      | 11 +++++
 drivers/infiniband/hw/mlx5/qp.c      | 90 ++++++++++++++++++++++++++++++++++++
 4 files changed, 105 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index ce75875..a90ef7a 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1503,6 +1503,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.detach_mcast	= mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad		= mlx5_ib_process_mad;
 	dev->ib_dev.alloc_mr		= mlx5_ib_alloc_mr;
+	dev->ib_dev.map_mr_sg		= mlx5_ib_map_mr_sg;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list  = mlx5_ib_free_fast_reg_page_list;
 	dev->ib_dev.check_mr_status	= mlx5_ib_check_mr_status;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index df5e959..7017a1a 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -582,6 +582,9 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_entries,
 			       u32 flags);
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
+		      struct scatterlist *sg,
+		      unsigned short sg_nents);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
 							       int page_list_len);
 void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 1075065..7a030a2 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1471,3 +1471,14 @@ int mlx5_ib_check_mr_status(struct ib_mr *ibmr, u32 check_mask,
 done:
 	return ret;
 }
+
+int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
+		      struct scatterlist *sg,
+		      unsigned short sg_nents)
+{
+	struct mlx5_ib_mr *mr = to_mmr(ibmr);
+
+	return ib_sg_to_pages(sg, sg_nents, mr->max_descs,
+			      mr->pl, &mr->ndescs,
+			      &ibmr->length, &ibmr->iova);
+}
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 203c8a4..f0a03aa 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -65,6 +65,7 @@ static const u32 mlx5_ib_opcode[] = {
 	[IB_WR_SEND_WITH_INV]			= MLX5_OPCODE_SEND_INVAL,
 	[IB_WR_LOCAL_INV]			= MLX5_OPCODE_UMR,
 	[IB_WR_FAST_REG_MR]			= MLX5_OPCODE_UMR,
+	[IB_WR_FASTREG_MR]			= MLX5_OPCODE_UMR,
 	[IB_WR_MASKED_ATOMIC_CMP_AND_SWP]	= MLX5_OPCODE_ATOMIC_MASKED_CS,
 	[IB_WR_MASKED_ATOMIC_FETCH_AND_ADD]	= MLX5_OPCODE_ATOMIC_MASKED_FA,
 	[MLX5_IB_WR_UMR]			= MLX5_OPCODE_UMR,
@@ -1903,6 +1904,17 @@ static __be64 sig_mkey_mask(void)
 	return cpu_to_be64(result);
 }
 
+static void set_fastreg_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr,
+				struct mlx5_ib_mr *mr)
+{
+	int ndescs = mr->ndescs;
+
+	memset(umr, 0, sizeof(*umr));
+	umr->flags = MLX5_UMR_CHECK_NOT_FREE;
+	umr->klm_octowords = get_klm_octo(ndescs);
+	umr->mkey_mask = frwr_mkey_mask();
+}
+
 static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr,
 				 struct ib_send_wr *wr, int li)
 {
@@ -1994,6 +2006,23 @@ static u8 get_umr_flags(int acc)
 		MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
 }
 
+static void set_fastreg_mkey_seg(struct mlx5_mkey_seg *seg,
+				 struct mlx5_ib_mr *mr, u32 key,
+				 int *writ)
+{
+	int ndescs = ALIGN(mr->ndescs, 8) >> 1;
+
+	memset(seg, 0, sizeof(*seg));
+	seg->flags = get_umr_flags(mr->ibmr.access) | MLX5_ACCESS_MODE_MTT;
+	*writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
+	seg->qpn_mkey7_0 = cpu_to_be32((key & 0xff) | 0xffffff00);
+	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
+	seg->start_addr = cpu_to_be64(mr->ibmr.iova);
+	seg->len = cpu_to_be64(mr->ibmr.length);
+	seg->xlt_oct_size = cpu_to_be32(ndescs);
+	seg->log2_page_size = PAGE_SHIFT;
+}
+
 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
 			     int li, int *writ)
 {
@@ -2035,6 +2064,23 @@ static void set_reg_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *w
 				       mlx5_mkey_variant(umrwr->mkey));
 }
 
+static void set_fastreg_ds(struct mlx5_wqe_data_seg *dseg,
+			   struct mlx5_ib_mr *mr,
+			   struct mlx5_ib_pd *pd,
+			   int writ)
+{
+	u64 perm = MLX5_EN_RD | (writ ? MLX5_EN_WR : 0);
+	int bcount = sizeof(u64) * mr->ndescs;
+	int i;
+
+	for (i = 0; i < mr->ndescs; i++)
+		mr->mpl[i] = cpu_to_be64(mr->pl[i] | perm);
+
+	dseg->addr = cpu_to_be64(mr->pl_map);
+	dseg->byte_count = cpu_to_be32(ALIGN(bcount, 64));
+	dseg->lkey = cpu_to_be32(pd->pa_lkey);
+}
+
 static void set_frwr_pages(struct mlx5_wqe_data_seg *dseg,
 			   struct ib_send_wr *wr,
 			   struct mlx5_core_dev *mdev,
@@ -2440,6 +2486,37 @@ static int set_psv_wr(struct ib_sig_domain *domain,
 	return 0;
 }
 
+static int set_fastreg_wr(struct mlx5_ib_qp *qp,
+			  struct ib_send_wr *wr,
+			  void **seg, int *size)
+{
+	struct mlx5_ib_mr *mr = to_mmr(wr->wr.fastreg.mr);
+	struct mlx5_ib_pd *pd = to_mpd(qp->ibqp.pd);
+	u32 key = wr->wr.fastreg.key;
+	int writ = 0;
+
+	if (unlikely(wr->send_flags & IB_SEND_INLINE))
+		return -EINVAL;
+
+	set_fastreg_umr_seg(*seg, mr);
+	*seg += sizeof(struct mlx5_wqe_umr_ctrl_seg);
+	*size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16;
+	if (unlikely((*seg == qp->sq.qend)))
+		*seg = mlx5_get_send_wqe(qp, 0);
+
+	set_fastreg_mkey_seg(*seg, mr, key, &writ);
+	*seg += sizeof(struct mlx5_mkey_seg);
+	*size += sizeof(struct mlx5_mkey_seg) / 16;
+	if (unlikely((*seg == qp->sq.qend)))
+		*seg = mlx5_get_send_wqe(qp, 0);
+
+	set_fastreg_ds(*seg, mr, pd, writ);
+	*seg += sizeof(struct mlx5_wqe_data_seg);
+	*size += (sizeof(struct mlx5_wqe_data_seg) / 16);
+
+	return 0;
+}
+
 static int set_frwr_li_wr(void **seg, struct ib_send_wr *wr, int *size,
 			  struct mlx5_core_dev *mdev, struct mlx5_ib_pd *pd, struct mlx5_ib_qp *qp)
 {
@@ -2683,6 +2760,19 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 				num_sge = 0;
 				break;
 
+			case IB_WR_FASTREG_MR:
+				next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL;
+				qp->sq.wr_data[idx] = IB_WR_FASTREG_MR;
+				ctrl->imm = cpu_to_be32(wr->wr.fastreg.key);
+				err = set_fastreg_wr(qp, wr, &seg, &size);
+				if (err) {
+					mlx5_ib_warn(dev, "\n");
+					*bad_wr = wr;
+					goto out;
+				}
+				num_sge = 0;
+				break;
+
 			case IB_WR_REG_SIG_MR:
 				qp->sq.wr_data[idx] = IB_WR_REG_SIG_MR;
 				mr = to_mmr(wr->wr.sig_handover.sig_mr);
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 30/43] mlx4: Support the new memory registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (28 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 29/43] mlx5: Support the new memory " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 31/43] ocrdma: " Sagi Grimberg
                     ` (13 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Just duplicated the functions to take the needed
arguments from the private MR context. The old
fast_reg routines will be dropped later.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx4/main.c    |  1 +
 drivers/infiniband/hw/mlx4/mlx4_ib.h |  3 +++
 drivers/infiniband/hw/mlx4/mr.c      | 11 +++++++++++
 drivers/infiniband/hw/mlx4/qp.c      | 27 +++++++++++++++++++++++++++
 4 files changed, 42 insertions(+)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 829fcf4..f2d101c 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2298,6 +2298,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 	ibdev->ib_dev.rereg_user_mr	= mlx4_ib_rereg_user_mr;
 	ibdev->ib_dev.dereg_mr		= mlx4_ib_dereg_mr;
 	ibdev->ib_dev.alloc_mr		= mlx4_ib_alloc_mr;
+	ibdev->ib_dev.map_mr_sg		= mlx4_ib_map_mr_sg;
 	ibdev->ib_dev.alloc_fast_reg_page_list = mlx4_ib_alloc_fast_reg_page_list;
 	ibdev->ib_dev.free_fast_reg_page_list  = mlx4_ib_free_fast_reg_page_list;
 	ibdev->ib_dev.attach_mcast	= mlx4_ib_mcg_attach;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index a9a4a7f..e5c7292 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -689,6 +689,9 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
 			       enum ib_mr_type mr_type,
 			       u32 max_entries,
 			       u32 flags);
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
+		      struct scatterlist *sg,
+		      unsigned short sg_nents);
 struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
 							       int page_list_len);
 void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list);
diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c
index 01e16bc..9a86829 100644
--- a/drivers/infiniband/hw/mlx4/mr.c
+++ b/drivers/infiniband/hw/mlx4/mr.c
@@ -574,3 +574,14 @@ int mlx4_ib_fmr_dealloc(struct ib_fmr *ibfmr)
 
 	return err;
 }
+
+int mlx4_ib_map_mr_sg(struct ib_mr *ibmr,
+		      struct scatterlist *sg,
+		      unsigned short sg_nents)
+{
+	struct mlx4_ib_mr *mr = to_mmr(ibmr);
+
+	return ib_sg_to_pages(sg, sg_nents, mr->max_pages,
+			      mr->pl, &mr->npages,
+			      &ibmr->length, &ibmr->iova);
+}
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index c5a3a5f..492e799 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -2401,6 +2401,25 @@ static __be32 convert_access(int acc)
 		cpu_to_be32(MLX4_WQE_FMR_PERM_LOCAL_READ);
 }
 
+static void set_fastreg_seg(struct mlx4_wqe_fmr_seg *fseg, struct ib_send_wr *wr)
+{
+	struct mlx4_ib_mr *mr = to_mmr(wr->wr.fastreg.mr);
+	int i;
+
+	for (i = 0; i < mr->npages; ++i)
+		mr->mpl[i] = cpu_to_be64(mr->pl[i] | MLX4_MTT_FLAG_PRESENT);
+
+	fseg->flags		= convert_access(mr->ibmr.access);
+	fseg->mem_key		= cpu_to_be32(wr->wr.fastreg.key);
+	fseg->buf_list		= cpu_to_be64(mr->pl_map);
+	fseg->start_addr	= cpu_to_be64(mr->ibmr.iova);
+	fseg->reg_len		= cpu_to_be64(mr->ibmr.length);
+	fseg->offset		= 0; /* XXX -- is this just for ZBVA? */
+	fseg->page_size		= cpu_to_be32(PAGE_SHIFT);
+	fseg->reserved[0]	= 0;
+	fseg->reserved[1]	= 0;
+}
+
 static void set_fmr_seg(struct mlx4_wqe_fmr_seg *fseg, struct ib_send_wr *wr)
 {
 	struct mlx4_ib_fast_reg_page_list *mfrpl = to_mfrpl(wr->wr.fast_reg.page_list);
@@ -2759,6 +2778,14 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 				size += sizeof (struct mlx4_wqe_fmr_seg) / 16;
 				break;
 
+			case IB_WR_FASTREG_MR:
+				ctrl->srcrb_flags |=
+					cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
+				set_fastreg_seg(wqe, wr);
+				wqe  += sizeof (struct mlx4_wqe_fmr_seg);
+				size += sizeof (struct mlx4_wqe_fmr_seg) / 16;
+				break;
+
 			case IB_WR_BIND_MW:
 				ctrl->srcrb_flags |=
 					cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER);
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 31/43] ocrdma: Support the new memory registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (29 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 30/43] mlx4: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 32/43] cxgb3: " Sagi Grimberg
                     ` (12 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Just duplicated the functions to take the needed
arguments from the private MR context. The old
fast_reg routines will be dropped later.
---
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |  1 +
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 67 +++++++++++++++++++++++++++++
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |  3 ++
 3 files changed, 71 insertions(+)

diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 47d2814..2dd6b06 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -295,6 +295,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
 	dev->ibdev.reg_user_mr = ocrdma_reg_user_mr;
 
 	dev->ibdev.alloc_mr = ocrdma_alloc_mr;
+	dev->ibdev.map_mr_sg = ocrdma_map_mr_sg;
 	dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list;
 	dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list;
 
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index a764cb9..0f32fc4 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -2121,6 +2121,59 @@ static int get_encoded_page_size(int pg_sz)
 	return i;
 }
 
+static int ocrdma_build_fr2(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr,
+			   struct ib_send_wr *wr)
+{
+	u64 fbo;
+	struct ocrdma_ewqe_fr *fast_reg = (struct ocrdma_ewqe_fr *)(hdr + 1);
+	struct ocrdma_mr *mr = get_ocrdma_mr(wr->wr.fastreg.mr);
+	struct ocrdma_pbl *pbl_tbl = mr->hwmr.pbl_table;
+	struct ocrdma_pbe *pbe;
+	u32 wqe_size = sizeof(*fast_reg) + sizeof(*hdr);
+	int num_pbes = 0, i;
+
+	wqe_size = roundup(wqe_size, OCRDMA_WQE_ALIGN_BYTES);
+
+	hdr->cw |= (OCRDMA_FR_MR << OCRDMA_WQE_OPCODE_SHIFT);
+	hdr->cw |= ((wqe_size / OCRDMA_WQE_STRIDE) << OCRDMA_WQE_SIZE_SHIFT);
+
+	if (mr->ibmr.access & IB_ACCESS_LOCAL_WRITE)
+		hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_LOCAL_WR;
+	if (mr->ibmr.access & IB_ACCESS_REMOTE_WRITE)
+		hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_WR;
+	if (mr->ibmr.access & IB_ACCESS_REMOTE_READ)
+		hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_RD;
+	hdr->lkey = wr->wr.fastreg.key;
+	hdr->total_len = mr->ibmr.length;
+
+	fbo = mr->ibmr.iova - mr->pl[0];
+
+	fast_reg->va_hi = upper_32_bits(mr->ibmr.iova);
+	fast_reg->va_lo = (u32) (mr->ibmr.iova & 0xffffffff);
+	fast_reg->fbo_hi = upper_32_bits(fbo);
+	fast_reg->fbo_lo = (u32) fbo & 0xffffffff;
+	fast_reg->num_sges = mr->npages;
+	fast_reg->size_sge = get_encoded_page_size(1 << PAGE_SHIFT);
+
+	pbe = pbl_tbl->va;
+	for (i = 0; i < mr->npages; i++) {
+		u64 buf_addr = mr->pl[i];
+		pbe->pa_lo = cpu_to_le32((u32) (buf_addr & PAGE_MASK));
+		pbe->pa_hi = cpu_to_le32((u32) upper_32_bits(buf_addr));
+		num_pbes += 1;
+		pbe++;
+
+		/* if the pbl is full storing the pbes,
+		 * move to next pbl.
+		*/
+		if (num_pbes == (mr->hwmr.pbl_size/sizeof(u64))) {
+			pbl_tbl++;
+			pbe = (struct ocrdma_pbe *)pbl_tbl->va;
+		}
+	}
+
+	return 0;
+}
 
 static int ocrdma_build_fr(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr,
 			   struct ib_send_wr *wr)
@@ -2248,6 +2301,9 @@ int ocrdma_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		case IB_WR_FAST_REG_MR:
 			status = ocrdma_build_fr(qp, hdr, wr);
 			break;
+		case IB_WR_FASTREG_MR:
+			status = ocrdma_build_fr2(qp, hdr, wr);
+			break;
 		default:
 			status = -EINVAL;
 			break;
@@ -3221,3 +3277,14 @@ pbl_err:
 	kfree(mr);
 	return ERR_PTR(status);
 }
+
+int ocrdma_map_mr_sg(struct ib_mr *ibmr,
+		     struct scatterlist *sg,
+		     unsigned short sg_nents)
+{
+	struct ocrdma_mr *mr = get_ocrdma_mr(ibmr);
+
+	return ib_sg_to_pages(sg, sg_nents, mr->hwmr.num_pbes,
+			      mr->pl, &mr->npages,
+			      &ibmr->length, &ibmr->iova);
+}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index d09ff8e..4c60eec 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -100,6 +100,9 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd,
 			      enum ib_mr_type mr_type,
 			      u32 max_entries,
 			      u32 flags);
+int ocrdma_map_mr_sg(struct ib_mr *ibmr,
+		     struct scatterlist *sg,
+		     unsigned short sg_nents);
 struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device
 							*ibdev,
 							int page_list_len);
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 32/43] cxgb3: Support the new memory registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (30 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 31/43] ocrdma: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 33/43] cxgb4: " Sagi Grimberg
                     ` (11 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Just duplicated the functions to take the needed
arguments from the private MR context. The old
fast_reg routines will be dropped later.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/cxgb3/iwch_provider.c | 12 ++++++++
 drivers/infiniband/hw/cxgb3/iwch_qp.c       | 48 +++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index c9368e6..b25cb6a 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -857,6 +857,17 @@ err:
 	return ERR_PTR(ret);
 }
 
+static int iwch_map_mr_sg(struct ib_mr *ibmr,
+			  struct scatterlist *sg,
+			  unsigned short sg_nents)
+{
+	struct iwch_mr *mhp = to_iwch_mr(ibmr);
+
+	return ib_sg_to_pages(sg, sg_nents, mhp->attr.pbl_size,
+			      mhp->pl, &mhp->npages,
+			      &ibmr->length, &ibmr->iova);
+}
+
 static struct ib_fast_reg_page_list *iwch_alloc_fastreg_pbl(
 					struct ib_device *device,
 					int page_list_len)
@@ -1455,6 +1466,7 @@ int iwch_register_device(struct iwch_dev *dev)
 	dev->ibdev.bind_mw = iwch_bind_mw;
 	dev->ibdev.dealloc_mw = iwch_dealloc_mw;
 	dev->ibdev.alloc_mr = iwch_alloc_mr;
+	dev->ibdev.map_mr_sg = iwch_map_mr_sg;
 	dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl;
 	dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl;
 	dev->ibdev.attach_mcast = iwch_multicast_attach;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
index b57c0be..2c30326 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_qp.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -146,6 +146,49 @@ static int build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
 	return 0;
 }
 
+static int build_fastreg2(union t3_wr *wqe, struct ib_send_wr *wr,
+			  u8 *flit_cnt, int *wr_cnt, struct t3_wq *wq)
+{
+	struct iwch_mr *mhp = to_iwch_mr(wr->wr.fastreg.mr);
+	int i;
+	__be64 *p;
+
+	if (mhp->npages > T3_MAX_FASTREG_DEPTH)
+		return -EINVAL;
+	*wr_cnt = 1;
+	wqe->fastreg.stag = cpu_to_be32(wr->wr.fastreg.key);
+	wqe->fastreg.len = cpu_to_be32(mhp->ibmr.length);
+	wqe->fastreg.va_base_hi = cpu_to_be32(mhp->ibmr.iova >> 32);
+	wqe->fastreg.va_base_lo_fbo =
+				cpu_to_be32(mhp->ibmr.iova & 0xffffffff);
+	wqe->fastreg.page_type_perms = cpu_to_be32(
+		V_FR_PAGE_COUNT(mhp->npages) |
+		V_FR_PAGE_SIZE(PAGE_SHIFT - 12) |
+		V_FR_TYPE(TPT_VATO) |
+		V_FR_PERMS(iwch_ib_to_tpt_access(mhp->ibmr.access)));
+	p = &wqe->fastreg.pbl_addrs[0];
+	for (i = 0; i < mhp->npages; i++, p++) {
+
+		/* If we need a 2nd WR, then set it up */
+		if (i == T3_MAX_FASTREG_FRAG) {
+			*wr_cnt = 2;
+			wqe = (union t3_wr *)(wq->queue +
+				Q_PTR2IDX((wq->wptr+1), wq->size_log2));
+			build_fw_riwrh((void *)wqe, T3_WR_FASTREG, 0,
+			       Q_GENBIT(wq->wptr + 1, wq->size_log2),
+			       0, 1 + mhp->npages - T3_MAX_FASTREG_FRAG,
+			       T3_EOP);
+
+			p = &wqe->pbl_frag.pbl_addrs[0];
+		}
+		*p = cpu_to_be64((u64)mhp->pl[i]);
+	}
+	*flit_cnt = 5 + mhp->npages;
+	if (*flit_cnt > 15)
+		*flit_cnt = 15;
+	return 0;
+}
+
 static int build_fastreg(union t3_wr *wqe, struct ib_send_wr *wr,
 				u8 *flit_cnt, int *wr_cnt, struct t3_wq *wq)
 {
@@ -419,6 +462,11 @@ int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 			err = build_fastreg(wqe, wr, &t3_wr_flit_cnt,
 						 &wr_cnt, &qhp->wq);
 			break;
+		case IB_WR_FASTREG_MR:
+			t3_wr_opcode = T3_WR_FASTREG;
+			err = build_fastreg2(wqe, wr, &t3_wr_flit_cnt,
+						 &wr_cnt, &qhp->wq);
+			break;
 		case IB_WR_LOCAL_INV:
 			if (wr->send_flags & IB_SEND_FENCE)
 				t3_wr_flags |= T3_LOCAL_FENCE_FLAG;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 33/43] cxgb4: Support the new memory registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (31 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 32/43] cxgb3: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 34/43] nes: " Sagi Grimberg
                     ` (10 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Just duplicated the functions to take the needed
arguments from the private MR context. The old
fast_reg routines will be dropped later.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |  3 ++
 drivers/infiniband/hw/cxgb4/mem.c      | 11 +++++
 drivers/infiniband/hw/cxgb4/provider.c |  1 +
 drivers/infiniband/hw/cxgb4/qp.c       | 75 +++++++++++++++++++++++++++++++++-
 4 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index e529ace..ce2bbf3 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -978,6 +978,9 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd,
 			    enum ib_mr_type mr_type,
 			    u32 max_entries,
 			    u32 flags);
+int c4iw_map_mr_sg(struct ib_mr *ibmr,
+		   struct scatterlist *sg,
+		   unsigned short sg_nents);
 int c4iw_dealloc_mw(struct ib_mw *mw);
 struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
 struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start,
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 91aedce..ea37fc7 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -922,6 +922,17 @@ err:
 	return ERR_PTR(ret);
 }
 
+int c4iw_map_mr_sg(struct ib_mr *ibmr,
+		   struct scatterlist *sg,
+		   unsigned short sg_nents)
+{
+	struct c4iw_mr *mhp = to_c4iw_mr(ibmr);
+
+	return ib_sg_to_pages(sg, sg_nents, mhp->max_mpl_len,
+			      mhp->mpl, &mhp->mpl_len,
+			      &ibmr->length, &ibmr->iova);
+}
+
 struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(struct ib_device *device,
 						     int page_list_len)
 {
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 7746113..55dedad 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -557,6 +557,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
 	dev->ibdev.bind_mw = c4iw_bind_mw;
 	dev->ibdev.dealloc_mw = c4iw_dealloc_mw;
 	dev->ibdev.alloc_mr = c4iw_alloc_mr;
+	dev->ibdev.map_mr_sg = c4iw_map_mr_sg;
 	dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl;
 	dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl;
 	dev->ibdev.attach_mcast = c4iw_multicast_attach;
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 6517e12..e5d1d99 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -605,10 +605,75 @@ static int build_rdma_recv(struct c4iw_qp *qhp, union t4_recv_wr *wqe,
 	return 0;
 }
 
-static int build_fastreg(struct t4_sq *sq, union t4_wr *wqe,
+static int build_fastreg2(struct t4_sq *sq, union t4_wr *wqe,
 			 struct ib_send_wr *wr, u8 *len16, u8 t5dev)
 {
+	struct c4iw_mr *mhp = to_c4iw_mr(wr->wr.fastreg.mr);
+	struct fw_ri_immd *imdp;
+	__be64 *p;
+	int i;
+	int pbllen = roundup(mhp->mpl_len * sizeof(u64), 32);
+	int rem;
+
+	if (mhp->mpl_len > t4_max_fr_depth(use_dsgl))
+		return -EINVAL;
+
+	wqe->fr.qpbinde_to_dcacpu = 0;
+	wqe->fr.pgsz_shift = PAGE_SHIFT - 12;
+	wqe->fr.addr_type = FW_RI_VA_BASED_TO;
+	wqe->fr.mem_perms = c4iw_ib_to_tpt_access(mhp->ibmr.access);
+	wqe->fr.len_hi = 0;
+	wqe->fr.len_lo = cpu_to_be32(mhp->ibmr.length);
+	wqe->fr.stag = cpu_to_be32(wr->wr.fastreg.key);
+	wqe->fr.va_hi = cpu_to_be32(mhp->ibmr.iova >> 32);
+	wqe->fr.va_lo_fbo = cpu_to_be32(mhp->ibmr.iova &
+					0xffffffff);
+
+	if (t5dev && use_dsgl && (pbllen > max_fr_immd)) {
+		struct fw_ri_dsgl *sglp;
+
+		for (i = 0; i < mhp->mpl_len; i++) {
+			mhp->mpl[i] = (__force u64)cpu_to_be64((u64)mhp->mpl[i]);
+		}
+
+		sglp = (struct fw_ri_dsgl *)(&wqe->fr + 1);
+		sglp->op = FW_RI_DATA_DSGL;
+		sglp->r1 = 0;
+		sglp->nsge = cpu_to_be16(1);
+		sglp->addr0 = cpu_to_be64(mhp->mpl_addr);
+		sglp->len0 = cpu_to_be32(pbllen);
+
+		*len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*sglp), 16);
+	} else {
+		imdp = (struct fw_ri_immd *)(&wqe->fr + 1);
+		imdp->op = FW_RI_DATA_IMMD;
+		imdp->r1 = 0;
+		imdp->r2 = 0;
+		imdp->immdlen = cpu_to_be32(pbllen);
+		p = (__be64 *)(imdp + 1);
+		rem = pbllen;
+		for (i = 0; i < mhp->mpl_len; i++) {
+			*p = cpu_to_be64((u64)mhp->mpl[i]);
+			rem -= sizeof(*p);
+			if (++p == (__be64 *)&sq->queue[sq->size])
+				p = (__be64 *)sq->queue;
+		}
+		BUG_ON(rem < 0);
+		while (rem) {
+			*p = 0;
+			rem -= sizeof(*p);
+			if (++p == (__be64 *)&sq->queue[sq->size])
+				p = (__be64 *)sq->queue;
+		}
+		*len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*imdp)
+				      + pbllen, 16);
+	}
+	return 0;
+}
 
+static int build_fastreg(struct t4_sq *sq, union t4_wr *wqe,
+			 struct ib_send_wr *wr, u8 *len16, u8 t5dev)
+{
 	struct fw_ri_immd *imdp;
 	__be64 *p;
 	int i;
@@ -821,6 +886,14 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 					    qhp->rhp->rdev.lldi.adapter_type) ?
 					    1 : 0);
 			break;
+		case IB_WR_FASTREG_MR:
+			fw_opcode = FW_RI_FR_NSMR_WR;
+			swsqe->opcode = FW_RI_FAST_REGISTER;
+			err = build_fastreg2(&qhp->wq.sq, wqe, wr, &len16,
+					    is_t5(
+					    qhp->rhp->rdev.lldi.adapter_type) ?
+					    1 : 0);
+			break;
 		case IB_WR_LOCAL_INV:
 			if (wr->send_flags & IB_SEND_FENCE)
 				fw_flags |= FW_RI_LOCAL_FENCE_FLAG;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 34/43] nes: Support the new memory registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (32 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 33/43] cxgb4: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 35/43] qib: " Sagi Grimberg
                     ` (9 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Just duplicated the functions to take the needed
arguments from the private MR context. The old
fast_reg routines will be dropped later.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/nes/nes_verbs.c | 85 +++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index 532496d..d5d8b01 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -465,6 +465,17 @@ err:
 	return ERR_PTR(-ENOMEM);
 }
 
+static int nes_map_mr_sg(struct ib_mr *ibmr,
+			 struct scatterlist *sg,
+			 unsigned short sg_nents)
+{
+	struct nes_mr *nesmr = to_nesmr(ibmr);
+
+	return ib_sg_to_pages(sg, sg_nents, nesmr->max_pages,
+			      nesmr->pl, &nesmr->npages,
+			      &ibmr->length, &ibmr->iova);
+}
+
 /*
  * nes_alloc_fast_reg_page_list
  */
@@ -3537,6 +3548,79 @@ static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr,
 				  wqe_misc);
 			break;
 		}
+		case IB_WR_FASTREG_MR:
+		{
+			int i;
+			struct nes_mr *mr = to_nesmr(ib_wr->wr.fastreg.mr);
+			int flags = mr->ibmr.access;
+			u64 *src_page_list = mr->pl;
+			u64 *dst_page_list = mr->mpl;
+
+			if (mr->npages > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64))) {
+				nes_debug(NES_DBG_IW_TX, "SQ_FMR: bad page_list_len\n");
+				err = -EINVAL;
+				break;
+			}
+			wqe_misc = NES_IWARP_SQ_OP_FAST_REG;
+			set_wqe_64bit_value(wqe->wqe_words,
+					    NES_IWARP_SQ_FMR_WQE_VA_FBO_LOW_IDX,
+					    mr->ibmr.iova);
+			set_wqe_32bit_value(wqe->wqe_words,
+					    NES_IWARP_SQ_FMR_WQE_LENGTH_LOW_IDX,
+					    mr->ibmr.length);
+			set_wqe_32bit_value(wqe->wqe_words,
+					    NES_IWARP_SQ_FMR_WQE_LENGTH_HIGH_IDX, 0);
+			set_wqe_32bit_value(wqe->wqe_words,
+					    NES_IWARP_SQ_FMR_WQE_MR_STAG_IDX,
+					    ib_wr->wr.fastreg.key);
+
+			/* Set page size: currently only 4K*/
+			if (ib_wr->wr.fast_reg.page_shift == 12) {
+				wqe_misc |= NES_IWARP_SQ_FMR_WQE_PAGE_SIZE_4K;
+			} else {
+				nes_debug(NES_DBG_IW_TX, "Invalid page shift,"
+					  " ib_wr=%u, max=1\n", ib_wr->num_sge);
+				err = -EINVAL;
+				break;
+			}
+
+			/* Set access_flags */
+			wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_READ;
+			if (flags & IB_ACCESS_LOCAL_WRITE)
+				wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_WRITE;
+
+			if (flags & IB_ACCESS_REMOTE_WRITE)
+				wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_WRITE;
+
+			if (flags & IB_ACCESS_REMOTE_READ)
+				wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_READ;
+
+			if (flags & IB_ACCESS_MW_BIND)
+				wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_WINDOW_BIND;
+
+			/* Fill in PBL info: */
+			set_wqe_64bit_value(wqe->wqe_words,
+					    NES_IWARP_SQ_FMR_WQE_PBL_ADDR_LOW_IDX,
+					    mr->mpl_addr);
+
+			set_wqe_32bit_value(wqe->wqe_words,
+					    NES_IWARP_SQ_FMR_WQE_PBL_LENGTH_IDX,
+					    mr->npages * 8);
+
+			for (i = 0; i < mr->npages; i++)
+				dst_page_list[i] = cpu_to_le64(src_page_list[i]);
+
+			nes_debug(NES_DBG_IW_TX, "SQ_FMR: iova_start: %llx, "
+				  "length: %d, rkey: %0x, pgl_paddr: %llx, "
+				  "page_list_len: %u, wqe_misc: %x\n",
+				  (unsigned long long) mr->ibmr.iova,
+				  mr->ibmr.length,
+				  ib_wr->wr.fastreg.key,
+				  (unsigned long long) mr->mpl_addr,
+				  mr->npages,
+				  wqe_misc);
+			break;
+		}
 		default:
 			/* error */
 			err = -EINVAL;
@@ -3964,6 +4048,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
 	nesibdev->ibdev.bind_mw = nes_bind_mw;
 
 	nesibdev->ibdev.alloc_mr = nes_alloc_mr;
+	nesibdev->ibdev.map_mr_sg = nes_map_mr_sg;
 	nesibdev->ibdev.alloc_fast_reg_page_list = nes_alloc_fast_reg_page_list;
 	nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list;
 
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 35/43] qib: Support the new memory registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (33 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 34/43] nes: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 36/43] iser: Port to new fast registration api Sagi Grimberg
                     ` (8 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Just duplicated the functions to take the needed
arguments from the private MR context. The old
fast_reg routines will be dropped later.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/qib/qib_keys.c  | 56 +++++++++++++++++++++++++++++++++++
 drivers/infiniband/hw/qib/qib_mr.c    | 11 +++++++
 drivers/infiniband/hw/qib/qib_verbs.c |  6 +++-
 drivers/infiniband/hw/qib/qib_verbs.h |  5 ++++
 4 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/qib/qib_keys.c b/drivers/infiniband/hw/qib/qib_keys.c
index ad843c7..557e6c2 100644
--- a/drivers/infiniband/hw/qib/qib_keys.c
+++ b/drivers/infiniband/hw/qib/qib_keys.c
@@ -385,3 +385,59 @@ bail:
 	spin_unlock_irqrestore(&rkt->lock, flags);
 	return ret;
 }
+
+/*
+ * Initialize the memory region specified by the work reqeust.
+ */
+int qib_fastreg_mr(struct qib_qp *qp, struct ib_send_wr *wr)
+{
+	struct qib_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table;
+	struct qib_pd *pd = to_ipd(qp->ibqp.pd);
+	struct qib_mr *mr = to_imr(wr->wr.fastreg.mr);
+	struct qib_mregion *mrg;
+	u32 key = wr->wr.fastreg.key;
+	unsigned i, n, m;
+	int ret = -EINVAL;
+	unsigned long flags;
+	u64 *page_list;
+	size_t ps;
+
+	spin_lock_irqsave(&rkt->lock, flags);
+	if (pd->user || key == 0)
+		goto bail;
+
+	mrg = rcu_dereference_protected(
+		rkt->table[(key >> (32 - ib_qib_lkey_table_size))],
+		lockdep_is_held(&rkt->lock));
+	if (unlikely(mrg == NULL || qp->ibqp.pd != mrg->pd))
+		goto bail;
+
+	if (mr->npages > mrg->max_segs)
+		goto bail;
+
+	ps = 1UL << PAGE_SHIFT;
+	if (mr->ibmr.length > ps * mr->npages)
+		goto bail;
+
+	mrg->user_base = mr->ibmr.iova;
+	mrg->iova = mr->ibmr.iova;
+	mrg->lkey = key;
+	mrg->length = mr->ibmr.length;
+	mrg->access_flags = mr->ibmr.access;
+	page_list = mr->pl;
+	m = 0;
+	n = 0;
+	for (i = 0; i < wr->wr.fast_reg.page_list_len; i++) {
+		mrg->map[m]->segs[n].vaddr = (void *) page_list[i];
+		mrg->map[m]->segs[n].length = ps;
+		if (++n == QIB_SEGSZ) {
+			m++;
+			n = 0;
+		}
+	}
+
+	ret = 0;
+bail:
+	spin_unlock_irqrestore(&rkt->lock, flags);
+	return ret;
+}
diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c
index a58a347..a4986f0 100644
--- a/drivers/infiniband/hw/qib/qib_mr.c
+++ b/drivers/infiniband/hw/qib/qib_mr.c
@@ -353,6 +353,17 @@ err:
 	return ERR_PTR(-ENOMEM);
 }
 
+int qib_map_mr_sg(struct ib_mr *ibmr,
+		  struct scatterlist *sg,
+		  unsigned short sg_nents)
+{
+	struct qib_mr *mr = to_imr(ibmr);
+
+	return ib_sg_to_pages(sg, sg_nents, mr->mr.max_segs,
+			      mr->pl, &mr->npages,
+			      &ibmr->length, &ibmr->iova);
+}
+
 struct ib_fast_reg_page_list *
 qib_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len)
 {
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index ef022a1..8561f90 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -361,7 +361,10 @@ static int qib_post_one_send(struct qib_qp *qp, struct ib_send_wr *wr,
 	 * undefined operations.
 	 * Make sure buffer is large enough to hold the result for atomics.
 	 */
-	if (wr->opcode == IB_WR_FAST_REG_MR) {
+	if (wr->opcode == IB_WR_FASTREG_MR) {
+		if (qib_fastreg_mr(qp, wr))
+			goto bail_inval;
+	} else if (wr->opcode == IB_WR_FAST_REG_MR) {
 		if (qib_fast_reg_mr(qp, wr))
 			goto bail_inval;
 	} else if (qp->ibqp.qp_type == IB_QPT_UC) {
@@ -2236,6 +2239,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
 	ibdev->reg_user_mr = qib_reg_user_mr;
 	ibdev->dereg_mr = qib_dereg_mr;
 	ibdev->alloc_mr = qib_alloc_mr;
+	ibdev->map_mr_sg = qib_map_mr_sg;
 	ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list;
 	ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list;
 	ibdev->alloc_fmr = qib_alloc_fmr;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
index c8062ae..c7a3af5 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.h
+++ b/drivers/infiniband/hw/qib/qib_verbs.h
@@ -1039,12 +1039,17 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd,
 			   u32 max_entries,
 			   u32 flags);
 
+int qib_map_mr_sg(struct ib_mr *ibmr,
+		  struct scatterlist *sg,
+		  unsigned short sg_nents);
+
 struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list(
 				struct ib_device *ibdev, int page_list_len);
 
 void qib_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl);
 
 int qib_fast_reg_mr(struct qib_qp *qp, struct ib_send_wr *wr);
+int qib_fastreg_mr(struct qib_qp *qp, struct ib_send_wr *wr);
 
 struct ib_fmr *qib_alloc_fmr(struct ib_pd *pd, int mr_access_flags,
 			     struct ib_fmr_attr *fmr_attr);
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 36/43] iser: Port to new fast registration api
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (34 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 35/43] qib: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 37/43] xprtrdma: Port to new memory registration API Sagi Grimberg
                     ` (7 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/iser/iscsi_iser.h  |  6 +----
 drivers/infiniband/ulp/iser/iser_memory.c | 40 ++++++++++++-------------------
 drivers/infiniband/ulp/iser/iser_verbs.c  | 16 +------------
 3 files changed, 17 insertions(+), 45 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 6c7efe6..88d0ffc 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -413,7 +413,6 @@ struct iser_device {
  *
  * @mr:         memory region
  * @fmr_pool:   pool of fmrs
- * @frpl:       fast reg page list used by frwrs
  * @page_vec:   fast reg page list used by fmr pool
  * @mr_valid:   is mr valid indicator
  */
@@ -422,10 +421,7 @@ struct iser_reg_resources {
 		struct ib_mr             *mr;
 		struct ib_fmr_pool       *fmr_pool;
 	};
-	union {
-		struct ib_fast_reg_page_list     *frpl;
-		struct iser_page_vec             *page_vec;
-	};
+	struct iser_page_vec             *page_vec;
 	u8				  mr_valid:1;
 };
 
diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index d6d980b..094cf8a 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -732,19 +732,19 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
 			    struct iser_reg_resources *rsc,
 			    struct iser_mem_reg *reg)
 {
-	struct ib_conn *ib_conn = &iser_task->iser_conn->ib_conn;
-	struct iser_device *device = ib_conn->device;
-	struct ib_mr *mr = rsc->mr;
-	struct ib_fast_reg_page_list *frpl = rsc->frpl;
 	struct iser_tx_desc *tx_desc = &iser_task->desc;
+	struct ib_mr *mr = rsc->mr;
 	struct ib_send_wr *wr;
-	int offset, size, plen;
-
-	plen = iser_sg_to_page_vec(mem, device->ib_device, frpl->page_list,
-				   &offset, &size);
-	if (plen * SIZE_4K < size) {
-		iser_err("fast reg page_list too short to hold this SG\n");
-		return -EINVAL;
+	int err;
+	int access = IB_ACCESS_LOCAL_WRITE  |
+		     IB_ACCESS_REMOTE_WRITE |
+		     IB_ACCESS_REMOTE_READ;
+
+	err = ib_map_mr_sg(mr, mem->sg, mem->size, access);
+	if (err) {
+		iser_err("failed to map sg %p with %d entries\n",
+			 mem->sg, mem->dma_nents);
+		return err;
 	}
 
 	if (!rsc->mr_valid) {
@@ -753,24 +753,14 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task,
 	}
 
 	wr = iser_tx_next_wr(tx_desc);
-	wr->opcode = IB_WR_FAST_REG_MR;
-	wr->wr_id = ISER_FASTREG_LI_WRID;
-	wr->send_flags = 0;
-	wr->wr.fast_reg.iova_start = frpl->page_list[0] + offset;
-	wr->wr.fast_reg.page_list = frpl;
-	wr->wr.fast_reg.page_list_len = plen;
-	wr->wr.fast_reg.page_shift = SHIFT_4K;
-	wr->wr.fast_reg.length = size;
-	wr->wr.fast_reg.rkey = mr->rkey;
-	wr->wr.fast_reg.access_flags = (IB_ACCESS_LOCAL_WRITE  |
-					IB_ACCESS_REMOTE_WRITE |
-					IB_ACCESS_REMOTE_READ);
+	ib_set_fastreg_wr(mr, mr->rkey, ISER_FASTREG_LI_WRID,
+			  false, wr);
 	rsc->mr_valid = 0;
 
 	reg->sge.lkey = mr->lkey;
 	reg->rkey = mr->rkey;
-	reg->sge.addr = frpl->page_list[0] + offset;
-	reg->sge.length = size;
+	reg->sge.addr = mr->iova;
+	reg->sge.length = mr->length;
 
 	iser_dbg("fast reg: lkey=0x%x, rkey=0x%x, addr=0x%llx,"
 		 " length=0x%x\n", reg->sge.lkey, reg->rkey,
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index ecc3265..332f784 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -288,35 +288,21 @@ iser_alloc_reg_res(struct ib_device *ib_device,
 {
 	int ret;
 
-	res->frpl = ib_alloc_fast_reg_page_list(ib_device, size);
-	if (IS_ERR(res->frpl)) {
-		ret = PTR_ERR(res->frpl);
-		iser_err("Failed to allocate ib_fast_reg_page_list err=%d\n",
-			 ret);
-		return PTR_ERR(res->frpl);
-	}
-
 	res->mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, size, 0);
 	if (IS_ERR(res->mr)) {
 		ret = PTR_ERR(res->mr);
 		iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
-		goto fast_reg_mr_failure;
+		return ret;
 	}
 	res->mr_valid = 1;
 
 	return 0;
-
-fast_reg_mr_failure:
-	ib_free_fast_reg_page_list(res->frpl);
-
-	return ret;
 }
 
 static void
 iser_free_reg_res(struct iser_reg_resources *rsc)
 {
 	ib_dereg_mr(rsc->mr);
-	ib_free_fast_reg_page_list(rsc->frpl);
 }
 
 static int
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (35 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 36/43] iser: Port to new fast registration api Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
       [not found]     ` <1437548143-24893-38-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22  6:55   ` [PATCH WIP 38/43] iser-target: " Sagi Grimberg
                     ` (6 subsequent siblings)
  43 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 net/sunrpc/xprtrdma/frwr_ops.c  | 80 ++++++++++++++++++++++-------------------
 net/sunrpc/xprtrdma/xprt_rdma.h |  4 ++-
 2 files changed, 47 insertions(+), 37 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 517efed..e28246b 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
 	f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0);
 	if (IS_ERR(f->fr_mr))
 		goto out_mr_err;
-	f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
-	if (IS_ERR(f->fr_pgl))
+
+	f->sg = kcalloc(sizeof(*f->sg), depth, GFP_KERNEL);
+	if (IS_ERR(f->sg))
 		goto out_list_err;
+
+	sg_init_table(f->sg, depth);
+
 	return 0;
 
 out_mr_err:
@@ -163,7 +167,7 @@ out_mr_err:
 	return rc;
 
 out_list_err:
-	rc = PTR_ERR(f->fr_pgl);
+	rc = -ENOMEM;
 	dprintk("RPC:       %s: ib_alloc_fast_reg_page_list status %i\n",
 		__func__, rc);
 	ib_dereg_mr(f->fr_mr);
@@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r)
 	if (rc)
 		dprintk("RPC:       %s: ib_dereg_mr status %i\n",
 			__func__, rc);
-	ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
+	kfree(r->r.frmr.sg);
 }
 
 static int
@@ -320,10 +324,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
 	struct ib_send_wr fastreg_wr, *bad_wr;
 	u8 key;
 	int len, pageoff;
-	int i, rc;
-	int seg_len;
-	u64 pa;
-	int page_no;
+	int i, rc, access;
 
 	mw = seg1->rl_mw;
 	seg1->rl_mw = NULL;
@@ -344,39 +345,46 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
 	if (nsegs > ia->ri_max_frmr_depth)
 		nsegs = ia->ri_max_frmr_depth;
 
-	for (page_no = i = 0; i < nsegs;) {
-		rpcrdma_map_one(device, seg, direction);
-		pa = seg->mr_dma;
-		for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
-			frmr->fr_pgl->page_list[page_no++] = pa;
-			pa += PAGE_SIZE;
-		}
+	for (i = 0; i < nsegs;) {
+		sg_set_page(&frmr->sg[i], seg->mr_page,
+			    seg->mr_len, offset_in_page(seg->mr_offset));
 		len += seg->mr_len;
-		++seg;
 		++i;
-		/* Check for holes */
+		++seg;
+
+		/* Check for holes - needed?? */
 		if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
 		    offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
 			break;
 	}
+
+	frmr->sg_nents = i;
+	frmr->dma_nents = ib_dma_map_sg(device, frmr->sg,
+					frmr->sg_nents, direction);
+	if (!frmr->dma_nents) {
+		pr_err("RPC:       %s: failed to dma map sg %p sg_nents %d\n",
+			__func__, frmr->sg, frmr->sg_nents);
+		return -ENOMEM;
+	}
+
 	dprintk("RPC:       %s: Using frmr %p to map %d segments (%d bytes)\n",
 		__func__, mw, i, len);
 
-	memset(&fastreg_wr, 0, sizeof(fastreg_wr));
-	fastreg_wr.wr_id = (unsigned long)(void *)mw;
-	fastreg_wr.opcode = IB_WR_FAST_REG_MR;
-	fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff;
-	fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
-	fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
-	fastreg_wr.wr.fast_reg.page_list_len = page_no;
-	fastreg_wr.wr.fast_reg.length = len;
-	fastreg_wr.wr.fast_reg.access_flags = writing ?
-				IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
-				IB_ACCESS_REMOTE_READ;
 	mr = frmr->fr_mr;
+	access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
+			   IB_ACCESS_REMOTE_READ;
+	rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access);
+	if (rc) {
+		pr_err("RPC:       %s: failed to map mr %p rc %d\n",
+			__func__, frmr->fr_mr, rc);
+		return rc;
+	}
+
 	key = (u8)(mr->rkey & 0x000000FF);
 	ib_update_fast_reg_key(mr, ++key);
-	fastreg_wr.wr.fast_reg.rkey = mr->rkey;
+
+	memset(&fastreg_wr, 0, sizeof(fastreg_wr));
+	ib_set_fastreg_wr(mr, mr->rkey, (uintptr_t)mw, false, &fastreg_wr);
 
 	DECR_CQCOUNT(&r_xprt->rx_ep);
 	rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr);
@@ -385,15 +393,14 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
 
 	seg1->rl_mw = mw;
 	seg1->mr_rkey = mr->rkey;
-	seg1->mr_base = seg1->mr_dma + pageoff;
+	seg1->mr_base = mr->iova;
 	seg1->mr_nsegs = i;
 	seg1->mr_len = len;
 	return i;
 
 out_senderr:
 	dprintk("RPC:       %s: ib_post_send status %i\n", __func__, rc);
-	while (i--)
-		rpcrdma_unmap_one(device, --seg);
+	ib_dma_unmap_sg(device, frmr->sg, frmr->sg_nents, direction);
 	__frwr_queue_recovery(mw);
 	return rc;
 }
@@ -407,22 +414,23 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
 	struct rpcrdma_mr_seg *seg1 = seg;
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 	struct rpcrdma_mw *mw = seg1->rl_mw;
+	struct rpcrdma_frmr *frmr = &mw->r.frmr;
 	struct ib_send_wr invalidate_wr, *bad_wr;
 	int rc, nsegs = seg->mr_nsegs;
 
 	dprintk("RPC:       %s: FRMR %p\n", __func__, mw);
 
 	seg1->rl_mw = NULL;
-	mw->r.frmr.fr_state = FRMR_IS_INVALID;
+	frmr->fr_state = FRMR_IS_INVALID;
 
 	memset(&invalidate_wr, 0, sizeof(invalidate_wr));
 	invalidate_wr.wr_id = (unsigned long)(void *)mw;
 	invalidate_wr.opcode = IB_WR_LOCAL_INV;
-	invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey;
+	invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey;
 	DECR_CQCOUNT(&r_xprt->rx_ep);
 
-	while (seg1->mr_nsegs--)
-		rpcrdma_unmap_one(ia->ri_device, seg++);
+	ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);
+
 	read_lock(&ia->ri_qplock);
 	rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
 	read_unlock(&ia->ri_qplock);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 886f8c8..a1c3ab2b 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -195,7 +195,9 @@ enum rpcrdma_frmr_state {
 };
 
 struct rpcrdma_frmr {
-	struct ib_fast_reg_page_list	*fr_pgl;
+	struct scatterlist		*sg;
+	unsigned int			sg_nents;
+	unsigned int			dma_nents;
 	struct ib_mr			*fr_mr;
 	enum rpcrdma_frmr_state		fr_state;
 	struct work_struct		fr_work;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (36 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 37/43] xprtrdma: Port to new memory registration API Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
       [not found]     ` <1437548143-24893-39-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22  6:55   ` [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support Sagi Grimberg
                     ` (5 subsequent siblings)
  43 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/isert/ib_isert.c | 116 ++++++--------------------------
 drivers/infiniband/ulp/isert/ib_isert.h |   2 -
 2 files changed, 19 insertions(+), 99 deletions(-)

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 94395ce..af1c01d 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -486,10 +486,8 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn)
 	list_for_each_entry_safe(fr_desc, tmp,
 				 &isert_conn->fr_pool, list) {
 		list_del(&fr_desc->list);
-		ib_free_fast_reg_page_list(fr_desc->data_frpl);
 		ib_dereg_mr(fr_desc->data_mr);
 		if (fr_desc->pi_ctx) {
-			ib_free_fast_reg_page_list(fr_desc->pi_ctx->prot_frpl);
 			ib_dereg_mr(fr_desc->pi_ctx->prot_mr);
 			ib_dereg_mr(fr_desc->pi_ctx->sig_mr);
 			kfree(fr_desc->pi_ctx);
@@ -517,22 +515,13 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
 		return -ENOMEM;
 	}
 
-	pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(device,
-					    ISCSI_ISER_SG_TABLESIZE);
-	if (IS_ERR(pi_ctx->prot_frpl)) {
-		isert_err("Failed to allocate prot frpl err=%ld\n",
-			  PTR_ERR(pi_ctx->prot_frpl));
-		ret = PTR_ERR(pi_ctx->prot_frpl);
-		goto err_pi_ctx;
-	}
-
 	pi_ctx->prot_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG,
 				      ISCSI_ISER_SG_TABLESIZE, 0);
 	if (IS_ERR(pi_ctx->prot_mr)) {
 		isert_err("Failed to allocate prot frmr err=%ld\n",
 			  PTR_ERR(pi_ctx->prot_mr));
 		ret = PTR_ERR(pi_ctx->prot_mr);
-		goto err_prot_frpl;
+		goto err_pi_ctx;
 	}
 	desc->ind |= ISERT_PROT_KEY_VALID;
 
@@ -552,8 +541,6 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
 
 err_prot_mr:
 	ib_dereg_mr(pi_ctx->prot_mr);
-err_prot_frpl:
-	ib_free_fast_reg_page_list(pi_ctx->prot_frpl);
 err_pi_ctx:
 	kfree(pi_ctx);
 
@@ -564,34 +551,18 @@ static int
 isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd,
 		     struct fast_reg_descriptor *fr_desc)
 {
-	int ret;
-
-	fr_desc->data_frpl = ib_alloc_fast_reg_page_list(ib_device,
-							 ISCSI_ISER_SG_TABLESIZE);
-	if (IS_ERR(fr_desc->data_frpl)) {
-		isert_err("Failed to allocate data frpl err=%ld\n",
-			  PTR_ERR(fr_desc->data_frpl));
-		return PTR_ERR(fr_desc->data_frpl);
-	}
-
 	fr_desc->data_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG,
 				       ISCSI_ISER_SG_TABLESIZE, 0);
 	if (IS_ERR(fr_desc->data_mr)) {
 		isert_err("Failed to allocate data frmr err=%ld\n",
 			  PTR_ERR(fr_desc->data_mr));
-		ret = PTR_ERR(fr_desc->data_mr);
-		goto err_data_frpl;
+		return PTR_ERR(fr_desc->data_mr);
 	}
 	fr_desc->ind |= ISERT_DATA_KEY_VALID;
 
 	isert_dbg("Created fr_desc %p\n", fr_desc);
 
 	return 0;
-
-err_data_frpl:
-	ib_free_fast_reg_page_list(fr_desc->data_frpl);
-
-	return ret;
 }
 
 static int
@@ -2521,45 +2492,6 @@ unmap_cmd:
 	return ret;
 }
 
-static int
-isert_map_fr_pagelist(struct ib_device *ib_dev,
-		      struct scatterlist *sg_start, int sg_nents, u64 *fr_pl)
-{
-	u64 start_addr, end_addr, page, chunk_start = 0;
-	struct scatterlist *tmp_sg;
-	int i = 0, new_chunk, last_ent, n_pages;
-
-	n_pages = 0;
-	new_chunk = 1;
-	last_ent = sg_nents - 1;
-	for_each_sg(sg_start, tmp_sg, sg_nents, i) {
-		start_addr = ib_sg_dma_address(ib_dev, tmp_sg);
-		if (new_chunk)
-			chunk_start = start_addr;
-		end_addr = start_addr + ib_sg_dma_len(ib_dev, tmp_sg);
-
-		isert_dbg("SGL[%d] dma_addr: 0x%llx len: %u\n",
-			  i, (unsigned long long)tmp_sg->dma_address,
-			  tmp_sg->length);
-
-		if ((end_addr & ~PAGE_MASK) && i < last_ent) {
-			new_chunk = 0;
-			continue;
-		}
-		new_chunk = 1;
-
-		page = chunk_start & PAGE_MASK;
-		do {
-			fr_pl[n_pages++] = page;
-			isert_dbg("Mapped page_list[%d] page_addr: 0x%llx\n",
-				  n_pages - 1, page);
-			page += PAGE_SIZE;
-		} while (page < end_addr);
-	}
-
-	return n_pages;
-}
-
 static inline void
 isert_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr)
 {
@@ -2585,11 +2517,9 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
 	struct isert_device *device = isert_conn->device;
 	struct ib_device *ib_dev = device->ib_device;
 	struct ib_mr *mr;
-	struct ib_fast_reg_page_list *frpl;
 	struct ib_send_wr fr_wr, inv_wr;
 	struct ib_send_wr *bad_wr, *wr = NULL;
-	int ret, pagelist_len;
-	u32 page_off;
+	int ret;
 
 	if (mem->dma_nents == 1) {
 		sge->lkey = device->mr->lkey;
@@ -2600,40 +2530,32 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
 		return 0;
 	}
 
-	if (ind == ISERT_DATA_KEY_VALID) {
+	if (ind == ISERT_DATA_KEY_VALID)
 		/* Registering data buffer */
 		mr = fr_desc->data_mr;
-		frpl = fr_desc->data_frpl;
-	} else {
+	else
 		/* Registering protection buffer */
 		mr = fr_desc->pi_ctx->prot_mr;
-		frpl = fr_desc->pi_ctx->prot_frpl;
-	}
-
-	page_off = mem->offset % PAGE_SIZE;
-
-	isert_dbg("Use fr_desc %p sg_nents %d offset %u\n",
-		  fr_desc, mem->nents, mem->offset);
-
-	pagelist_len = isert_map_fr_pagelist(ib_dev, mem->sg, mem->nents,
-					     &frpl->page_list[0]);
 
 	if (!(fr_desc->ind & ind)) {
 		isert_inv_rkey(&inv_wr, mr);
 		wr = &inv_wr;
 	}
 
+	ret = ib_map_mr_sg(mr, mem->sg, mem->nents, IB_ACCESS_LOCAL_WRITE);
+	if (ret) {
+		isert_err("failed to map sg %p with %d entries\n",
+			 mem->sg, mem->dma_nents);
+		return ret;
+	}
+
+	isert_dbg("Use fr_desc %p sg_nents %d offset %u\n",
+		  fr_desc, mem->nents, mem->offset);
+
 	/* Prepare FASTREG WR */
 	memset(&fr_wr, 0, sizeof(fr_wr));
-	fr_wr.wr_id = ISER_FASTREG_LI_WRID;
-	fr_wr.opcode = IB_WR_FAST_REG_MR;
-	fr_wr.wr.fast_reg.iova_start = frpl->page_list[0] + page_off;
-	fr_wr.wr.fast_reg.page_list = frpl;
-	fr_wr.wr.fast_reg.page_list_len = pagelist_len;
-	fr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
-	fr_wr.wr.fast_reg.length = mem->len;
-	fr_wr.wr.fast_reg.rkey = mr->rkey;
-	fr_wr.wr.fast_reg.access_flags = IB_ACCESS_LOCAL_WRITE;
+	ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID,
+			  false, &fr_wr);
 
 	if (!wr)
 		wr = &fr_wr;
@@ -2648,8 +2570,8 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
 	fr_desc->ind &= ~ind;
 
 	sge->lkey = mr->lkey;
-	sge->addr = frpl->page_list[0] + page_off;
-	sge->length = mem->len;
+	sge->addr = mr->iova;
+	sge->length = mr->length;
 
 	isert_dbg("sge: addr: 0x%llx  length: %u lkey: %x\n",
 		  sge->addr, sge->length, sge->lkey);
diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h
index 9ec23a78..a63fc6a 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.h
+++ b/drivers/infiniband/ulp/isert/ib_isert.h
@@ -84,14 +84,12 @@ enum isert_indicator {
 
 struct pi_context {
 	struct ib_mr		       *prot_mr;
-	struct ib_fast_reg_page_list   *prot_frpl;
 	struct ib_mr		       *sig_mr;
 };
 
 struct fast_reg_descriptor {
 	struct list_head		list;
 	struct ib_mr		       *data_mr;
-	struct ib_fast_reg_page_list   *data_frpl;
 	u8				ind;
 	struct pi_context	       *pi_ctx;
 };
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (37 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 38/43] iser-target: " Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
       [not found]     ` <1437548143-24893-40-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22  6:55   ` [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration Sagi Grimberg
                     ` (4 subsequent siblings)
  43 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 include/rdma/ib_verbs.h | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index d543fee..cc83c39 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -133,6 +133,7 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29),
 	IB_DEVICE_SIGNATURE_HANDOVER	= (1<<30),
 	IB_DEVICE_ON_DEMAND_PAGING	= (1<<31),
+	IB_DEVICE_MAP_ARB_SG		= (1ULL<<32),
 };
 
 enum ib_signature_prot_cap {
@@ -193,7 +194,7 @@ struct ib_device_attr {
 	u32			hw_ver;
 	int			max_qp;
 	int			max_qp_wr;
-	int			device_cap_flags;
+	u64			device_cap_flags;
 	int			max_sge;
 	int			max_sge_rd;
 	int			max_cq;
@@ -556,6 +557,11 @@ __attribute_const__ int ib_rate_to_mult(enum ib_rate rate);
  */
 __attribute_const__ int ib_rate_to_mbps(enum ib_rate rate);
 
+enum ib_mr_flags {
+	IB_MR_MAP_ARB_SG = 1,
+};
+
+
 enum ib_mr_type {
 	IB_MR_TYPE_FAST_REG,
 	IB_MR_TYPE_SIGNATURE,
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (38 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
       [not found]     ` <1437548143-24893-41-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22  6:55   ` [PATCH WIP 41/43] mlx5: Add arbitrary sg list support Sagi Grimberg
                     ` (3 subsequent siblings)
  43 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  6 ++-
 drivers/infiniband/hw/mlx5/mr.c      | 71 ++++++++++++++++++++++++++++++------
 2 files changed, 64 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 7017a1a..fb3ac22 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -315,11 +315,15 @@ enum mlx5_ib_mtt_access_flags {
 
 struct mlx5_ib_mr {
 	struct ib_mr		ibmr;
-	u64		        *pl;
+	union {
+		__be64			*pl;
+		struct mlx5_klm		*klms;
+	};
 	__be64			*mpl;
 	dma_addr_t		pl_map;
 	int			ndescs;
 	int			max_descs;
+	int			access_mode;
 	struct mlx5_core_mr	mmr;
 	struct ib_umem	       *umem;
 	struct mlx5_shared_mr_info	*smr_info;
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 7a030a2..45209c7 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1168,6 +1168,40 @@ error:
 }
 
 static int
+mlx5_alloc_klm_list(struct ib_device *device,
+		    struct mlx5_ib_mr *mr, int ndescs)
+{
+	int size = sizeof(struct mlx5_klm) * ndescs;
+
+	size += max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
+	mr->klms = kzalloc(size, GFP_KERNEL);
+	if (!mr->klms)
+		return -ENOMEM;
+
+	mr->pl_map = dma_map_single(device->dma_device, mr->klms,
+				    size, DMA_TO_DEVICE);
+	if (dma_mapping_error(device->dma_device, mr->pl_map))
+		goto err;
+
+	return 0;
+err:
+	kfree(mr->klms);
+
+	return -ENOMEM;
+}
+
+static void
+mlx5_free_klm_list(struct mlx5_ib_mr *mr)
+{
+	struct ib_device *device = mr->ibmr.device;
+	int size = mr->max_descs * sizeof(struct mlx5_klm);
+
+	kfree(mr->klms);
+	dma_unmap_single(device->dma_device, mr->pl_map, size, DMA_TO_DEVICE);
+	mr->klms = NULL;
+}
+
+static int
 mlx5_alloc_page_list(struct ib_device *device,
 		     struct mlx5_ib_mr *mr, int ndescs)
 {
@@ -1222,7 +1256,10 @@ static int clean_mr(struct mlx5_ib_mr *mr)
 		mr->sig = NULL;
 	}
 
-	mlx5_free_page_list(mr);
+	if (mr->access_mode == MLX5_ACCESS_MODE_MTT)
+		mlx5_free_page_list(mr);
+	else
+		mlx5_free_klm_list(mr);
 
 	if (!umred) {
 		err = destroy_mkey(dev, mr);
@@ -1293,10 +1330,10 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct mlx5_create_mkey_mbox_in *in;
 	struct mlx5_ib_mr *mr;
-	int access_mode, err;
-	int ndescs = roundup(max_entries, 4);
+	int ndescs = ALIGN(max_entries, 4);
+	int err;
 
-	if (flags)
+	if (flags & ~IB_MR_MAP_ARB_SG)
 		return ERR_PTR(-EINVAL);
 
 	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
@@ -1315,13 +1352,20 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
 
 	if (mr_type == IB_MR_TYPE_FAST_REG) {
-		access_mode = MLX5_ACCESS_MODE_MTT;
-		in->seg.log2_page_size = PAGE_SHIFT;
+		if (flags & IB_MR_MAP_ARB_SG) {
+			mr->access_mode = MLX5_ACCESS_MODE_KLM;
 
-		err = mlx5_alloc_page_list(pd->device, mr, ndescs);
-		if (err)
-			goto err_free_in;
+			err = mlx5_alloc_klm_list(pd->device, mr, ndescs);
+			if (err)
+				goto err_free_in;
+		} else {
+			mr->access_mode = MLX5_ACCESS_MODE_MTT;
+			in->seg.log2_page_size = PAGE_SHIFT;
 
+			err = mlx5_alloc_page_list(pd->device, mr, ndescs);
+			if (err)
+				goto err_free_in;
+		}
 		mr->max_descs = ndescs;
 	} else if (mr_type == IB_MR_TYPE_SIGNATURE) {
 		u32 psv_index[2];
@@ -1341,7 +1385,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 		if (err)
 			goto err_free_sig;
 
-		access_mode = MLX5_ACCESS_MODE_KLM;
+		mr->access_mode = MLX5_ACCESS_MODE_KLM;
 		mr->sig->psv_memory.psv_idx = psv_index[0];
 		mr->sig->psv_wire.psv_idx = psv_index[1];
 
@@ -1355,7 +1399,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
 		goto err_free_in;
 	}
 
-	in->seg.flags = MLX5_PERM_UMR_EN | access_mode;
+	in->seg.flags = MLX5_PERM_UMR_EN | mr->access_mode;
 	err = mlx5_core_create_mkey(dev->mdev, &mr->mmr, in, sizeof(*in),
 				    NULL, NULL, NULL);
 	if (err)
@@ -1379,7 +1423,10 @@ err_destroy_psv:
 			mlx5_ib_warn(dev, "failed to destroy wire psv %d\n",
 				     mr->sig->psv_wire.psv_idx);
 	}
-	mlx5_free_page_list(mr);
+	if (mr->access_mode == MLX5_ACCESS_MODE_MTT)
+		mlx5_free_page_list(mr);
+	else
+		mlx5_free_klm_list(mr);
 err_free_sig:
 	kfree(mr->sig);
 err_free_in:
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 41/43] mlx5: Add arbitrary sg list support
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (39 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 42/43] iser: Accept arbitrary sg lists mapping if the device supports it Sagi Grimberg
                     ` (2 subsequent siblings)
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

If ib_alloc_mr is called with IB_MR_MAP_ARB_SG, the driver
allocate a private klm list instead of a private page list.

And set the UMR wqe correctly when posting the fast registration.

Also, expose device cap IB_DEVICE_MAP_ARB_SG

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c |  1 +
 drivers/infiniband/hw/mlx5/mr.c   | 30 ++++++++++++++++++++++++++++++
 drivers/infiniband/hw/mlx5/qp.c   | 31 ++++++++++++++++++++++++-------
 3 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index a90ef7a..2402563 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -249,6 +249,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (MLX5_CAP_GEN(mdev, xrc))
 		props->device_cap_flags |= IB_DEVICE_XRC;
 	props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
+	props->device_cap_flags |= IB_DEVICE_MAP_ARB_SG;
 	if (MLX5_CAP_GEN(mdev, sho)) {
 		props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER;
 		/* At this stage no support for signature handover */
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 45209c7..836e717 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1519,12 +1519,42 @@ done:
 	return ret;
 }
 
+static int
+mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr,
+		   struct scatterlist *sgl,
+		   unsigned short sg_nents)
+{
+	struct scatterlist *sg = sgl;
+	u32 lkey = mr->ibmr.device->local_dma_lkey;
+	int i;
+
+	if (sg_nents > mr->max_descs)
+		return -EINVAL;
+
+	mr->ibmr.iova = sg_dma_address(sg);
+	mr->ibmr.length = 0;
+	mr->ndescs = sg_nents;
+
+	for (i = 0; i < sg_nents; i++) {
+		mr->klms[i].va = cpu_to_be64(sg_dma_address(sg));
+		mr->klms[i].bcount = cpu_to_be32(sg_dma_len(sg));
+		mr->klms[i].key = cpu_to_be32(lkey);
+		mr->ibmr.length += sg_dma_len(sg);
+		sg = sg_next(sg);
+	}
+
+	return 0;
+}
+
 int mlx5_ib_map_mr_sg(struct ib_mr *ibmr,
 		      struct scatterlist *sg,
 		      unsigned short sg_nents)
 {
 	struct mlx5_ib_mr *mr = to_mmr(ibmr);
 
+	if (mr->access_mode == MLX5_ACCESS_MODE_KLM)
+		return mlx5_ib_sg_to_klms(mr, sg, sg_nents);
+
 	return ib_sg_to_pages(sg, sg_nents, mr->max_descs,
 			      mr->pl, &mr->ndescs,
 			      &ibmr->length, &ibmr->iova);
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index f0a03aa..3fb0396 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1909,6 +1909,10 @@ static void set_fastreg_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr,
 {
 	int ndescs = mr->ndescs;
 
+	if (mr->access_mode == MLX5_ACCESS_MODE_KLM)
+		/* KLMs take twice the size of MTTs */
+		ndescs *= 2;
+
 	memset(umr, 0, sizeof(*umr));
 	umr->flags = MLX5_UMR_CHECK_NOT_FREE;
 	umr->klm_octowords = get_klm_octo(ndescs);
@@ -2012,15 +2016,21 @@ static void set_fastreg_mkey_seg(struct mlx5_mkey_seg *seg,
 {
 	int ndescs = ALIGN(mr->ndescs, 8) >> 1;
 
+	if (mr->access_mode == MLX5_ACCESS_MODE_MTT)
+		seg->log2_page_size = PAGE_SHIFT;
+	else if (mr->access_mode == MLX5_ACCESS_MODE_KLM)
+		/* KLMs take twice the size of MTTs */
+		ndescs *= 2;
+
+
 	memset(seg, 0, sizeof(*seg));
-	seg->flags = get_umr_flags(mr->ibmr.access) | MLX5_ACCESS_MODE_MTT;
+	seg->flags = get_umr_flags(mr->ibmr.access) | mr->access_mode;
 	*writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
 	seg->qpn_mkey7_0 = cpu_to_be32((key & 0xff) | 0xffffff00);
 	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
 	seg->start_addr = cpu_to_be64(mr->ibmr.iova);
 	seg->len = cpu_to_be64(mr->ibmr.length);
 	seg->xlt_oct_size = cpu_to_be32(ndescs);
-	seg->log2_page_size = PAGE_SHIFT;
 }
 
 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
@@ -2069,12 +2079,19 @@ static void set_fastreg_ds(struct mlx5_wqe_data_seg *dseg,
 			   struct mlx5_ib_pd *pd,
 			   int writ)
 {
-	u64 perm = MLX5_EN_RD | (writ ? MLX5_EN_WR : 0);
-	int bcount = sizeof(u64) * mr->ndescs;
-	int i;
+	int bcount;
+
+	if (mr->access_mode == MLX5_ACCESS_MODE_MTT) {
+		u64 perm = MLX5_EN_RD | (writ ? MLX5_EN_WR : 0);
+		int i;
+
+		bcount = sizeof(u64) * mr->ndescs;
+		for (i = 0; i < mr->ndescs; i++)
+			mr->mpl[i] = cpu_to_be64(mr->pl[i] | perm);
+	} else {
+		bcount = sizeof(struct mlx5_klm) * mr->ndescs;
+	}
 
-	for (i = 0; i < mr->ndescs; i++)
-		mr->mpl[i] = cpu_to_be64(mr->pl[i] | perm);
 
 	dseg->addr = cpu_to_be64(mr->pl_map);
 	dseg->byte_count = cpu_to_be32(ALIGN(bcount, 64));
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 42/43] iser: Accept arbitrary sg lists mapping if the device supports it
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (40 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 41/43] mlx5: Add arbitrary sg list support Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22  6:55   ` [PATCH WIP 43/43] iser: Move unaligned counter increment Sagi Grimberg
  2015-07-22 17:10   ` [PATCH WIP 00/43] New fast registration API Christoph Hellwig
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

If the device support arbitrary sg list mapping (device cap
IB_DEVICE_MAP_ARB_SG set) we allocate the memory regions with
IB_MR_MAP_ARB_SG and skip the counce buffer work around.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/iser/iser_memory.c |  4 ++++
 drivers/infiniband/ulp/iser/iser_verbs.c  | 20 ++++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 094cf8a..690f840 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -781,6 +781,10 @@ iser_handle_unaligned_buf(struct iscsi_iser_task *task,
 	aligned_len = iser_data_buf_aligned_len(mem, device->ib_device,
 						iser_conn->scsi_sg_tablesize);
 	if (aligned_len != mem->dma_nents) {
+		if (device->dev_attr.device_cap_flags & IB_DEVICE_MAP_ARB_SG)
+			/* Arbitrary sg support, no need to bounce :) */
+			return 0;
+
 		err = fall_to_bounce_buf(task, mem, dir);
 		if (err)
 			return err;
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 332f784..978e283 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -281,14 +281,18 @@ void iser_free_fmr_pool(struct ib_conn *ib_conn)
 }
 
 static int
-iser_alloc_reg_res(struct ib_device *ib_device,
+iser_alloc_reg_res(struct iser_device *device,
 		   struct ib_pd *pd,
 		   struct iser_reg_resources *res,
 		   unsigned int size)
 {
 	int ret;
+	int flags = 0;
 
-	res->mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, size, 0);
+	if (device->dev_attr.device_cap_flags & IB_DEVICE_MAP_ARB_SG)
+		flags = IB_MR_MAP_ARB_SG;
+
+	res->mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, size, flags);
 	if (IS_ERR(res->mr)) {
 		ret = PTR_ERR(res->mr);
 		iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret);
@@ -306,7 +310,7 @@ iser_free_reg_res(struct iser_reg_resources *rsc)
 }
 
 static int
-iser_alloc_pi_ctx(struct ib_device *ib_device,
+iser_alloc_pi_ctx(struct iser_device *device,
 		  struct ib_pd *pd,
 		  struct iser_fr_desc *desc,
 		  unsigned int size)
@@ -320,7 +324,7 @@ iser_alloc_pi_ctx(struct ib_device *ib_device,
 
 	pi_ctx = desc->pi_ctx;
 
-	ret = iser_alloc_reg_res(ib_device, pd, &pi_ctx->rsc, size);
+	ret = iser_alloc_reg_res(device, pd, &pi_ctx->rsc, size);
 	if (ret) {
 		iser_err("failed to allocate reg_resources\n");
 		goto alloc_reg_res_err;
@@ -353,7 +357,7 @@ iser_free_pi_ctx(struct iser_pi_context *pi_ctx)
 }
 
 static struct iser_fr_desc *
-iser_create_fastreg_desc(struct ib_device *ib_device,
+iser_create_fastreg_desc(struct iser_device *device,
 			 struct ib_pd *pd,
 			 bool pi_enable,
 			 unsigned int size)
@@ -365,12 +369,12 @@ iser_create_fastreg_desc(struct ib_device *ib_device,
 	if (!desc)
 		return ERR_PTR(-ENOMEM);
 
-	ret = iser_alloc_reg_res(ib_device, pd, &desc->rsc, size);
+	ret = iser_alloc_reg_res(device, pd, &desc->rsc, size);
 	if (ret)
 		goto reg_res_alloc_failure;
 
 	if (pi_enable) {
-		ret = iser_alloc_pi_ctx(ib_device, pd, desc, size);
+		ret = iser_alloc_pi_ctx(device, pd, desc, size);
 		if (ret)
 			goto pi_ctx_alloc_failure;
 	}
@@ -403,7 +407,7 @@ int iser_alloc_fastreg_pool(struct ib_conn *ib_conn,
 	spin_lock_init(&fr_pool->lock);
 	fr_pool->size = 0;
 	for (i = 0; i < cmds_max; i++) {
-		desc = iser_create_fastreg_desc(device->ib_device, device->pd,
+		desc = iser_create_fastreg_desc(device, device->pd,
 						ib_conn->pi_support, size);
 		if (IS_ERR(desc)) {
 			ret = PTR_ERR(desc);
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* [PATCH WIP 43/43] iser: Move unaligned counter increment
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (41 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 42/43] iser: Accept arbitrary sg lists mapping if the device supports it Sagi Grimberg
@ 2015-07-22  6:55   ` Sagi Grimberg
  2015-07-22 17:10   ` [PATCH WIP 00/43] New fast registration API Christoph Hellwig
  43 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22  6:55 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

We don't always use bounce buffers, still we update
this counter.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/iser/iser_memory.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c
index 690f840..4d3dc1c 100644
--- a/drivers/infiniband/ulp/iser/iser_memory.c
+++ b/drivers/infiniband/ulp/iser/iser_memory.c
@@ -487,11 +487,8 @@ static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task,
 			      struct iser_data_buf *mem,
 			      enum iser_data_dir cmd_dir)
 {
-	struct iscsi_conn *iscsi_conn = iser_task->iser_conn->iscsi_conn;
 	struct iser_device *device = iser_task->iser_conn->ib_conn.device;
 
-	iscsi_conn->fmr_unalign_cnt++;
-
 	if (iser_debug_level > 0)
 		iser_data_buf_dump(mem, device->ib_device);
 
@@ -781,6 +778,7 @@ iser_handle_unaligned_buf(struct iscsi_iser_task *task,
 	aligned_len = iser_data_buf_aligned_len(mem, device->ib_device,
 						iser_conn->scsi_sg_tablesize);
 	if (aligned_len != mem->dma_nents) {
+		iser_conn->iscsi_conn->fmr_unalign_cnt++;
 		if (device->dev_attr.device_cap_flags & IB_DEVICE_MAP_ARB_SG)
 			/* Arbitrary sg support, no need to bounce :) */
 			return 0;
-- 
1.8.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
       [not found]     ` <1437548143-24893-38-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-22 15:03       ` Chuck Lever
       [not found]         ` <795F4F28-D92F-46A1-8DA3-2B1B19A17AA3-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2015-07-22 19:21       ` Steve Wise
  1 sibling, 1 reply; 142+ messages in thread
From: Chuck Lever @ 2015-07-22 15:03 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma, Liran Liss, Oren Duer


On Jul 22, 2015, at 2:55 AM, Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:

> Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
> net/sunrpc/xprtrdma/frwr_ops.c  | 80 ++++++++++++++++++++++-------------------
> net/sunrpc/xprtrdma/xprt_rdma.h |  4 ++-
> 2 files changed, 47 insertions(+), 37 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 517efed..e28246b 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
> 	f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0);
> 	if (IS_ERR(f->fr_mr))
> 		goto out_mr_err;
> -	f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
> -	if (IS_ERR(f->fr_pgl))
> +
> +	f->sg = kcalloc(sizeof(*f->sg), depth, GFP_KERNEL);
> +	if (IS_ERR(f->sg))
> 		goto out_list_err;
> +
> +	sg_init_table(f->sg, depth);
> +
> 	return 0;
> 
> out_mr_err:
> @@ -163,7 +167,7 @@ out_mr_err:
> 	return rc;
> 
> out_list_err:
> -	rc = PTR_ERR(f->fr_pgl);
> +	rc = -ENOMEM;
> 	dprintk("RPC:       %s: ib_alloc_fast_reg_page_list status %i\n",
> 		__func__, rc);
> 	ib_dereg_mr(f->fr_mr);
> @@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r)
> 	if (rc)
> 		dprintk("RPC:       %s: ib_dereg_mr status %i\n",
> 			__func__, rc);
> -	ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
> +	kfree(r->r.frmr.sg);
> }
> 
> static int
> @@ -320,10 +324,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
> 	struct ib_send_wr fastreg_wr, *bad_wr;
> 	u8 key;
> 	int len, pageoff;
> -	int i, rc;
> -	int seg_len;
> -	u64 pa;
> -	int page_no;
> +	int i, rc, access;
> 
> 	mw = seg1->rl_mw;
> 	seg1->rl_mw = NULL;
> @@ -344,39 +345,46 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
> 	if (nsegs > ia->ri_max_frmr_depth)
> 		nsegs = ia->ri_max_frmr_depth;
> 
> -	for (page_no = i = 0; i < nsegs;) {
> -		rpcrdma_map_one(device, seg, direction);
> -		pa = seg->mr_dma;
> -		for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
> -			frmr->fr_pgl->page_list[page_no++] = pa;
> -			pa += PAGE_SIZE;
> -		}
> +	for (i = 0; i < nsegs;) {
> +		sg_set_page(&frmr->sg[i], seg->mr_page,
> +			    seg->mr_len, offset_in_page(seg->mr_offset));

Cautionary note: here we’re dealing with both the “contiguous
set of pages” case and the “small region of bytes in a single page”
case. See rpcrdma_convert_iovs(): sometimes RPC send or receive
buffers can be registered (RDMA_NOMSG).


> 		len += seg->mr_len;
> -		++seg;
> 		++i;
> -		/* Check for holes */
> +		++seg;
> +
> +		/* Check for holes - needed?? */
> 		if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
> 		    offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
> 			break;
> 	}
> +
> +	frmr->sg_nents = i;
> +	frmr->dma_nents = ib_dma_map_sg(device, frmr->sg,
> +					frmr->sg_nents, direction);
> +	if (!frmr->dma_nents) {
> +		pr_err("RPC:       %s: failed to dma map sg %p sg_nents %d\n",
> +			__func__, frmr->sg, frmr->sg_nents);
> +		return -ENOMEM;
> +	}
> +
> 	dprintk("RPC:       %s: Using frmr %p to map %d segments (%d bytes)\n",
> 		__func__, mw, i, len);
> 
> -	memset(&fastreg_wr, 0, sizeof(fastreg_wr));
> -	fastreg_wr.wr_id = (unsigned long)(void *)mw;
> -	fastreg_wr.opcode = IB_WR_FAST_REG_MR;
> -	fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff;
> -	fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
> -	fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
> -	fastreg_wr.wr.fast_reg.page_list_len = page_no;
> -	fastreg_wr.wr.fast_reg.length = len;
> -	fastreg_wr.wr.fast_reg.access_flags = writing ?
> -				IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
> -				IB_ACCESS_REMOTE_READ;
> 	mr = frmr->fr_mr;
> +	access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
> +			   IB_ACCESS_REMOTE_READ;
> +	rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access);

I like this (and the matching ib_dma_unmap_sg). But why wouldn’t
this function be called ib_dma_map_sg() ? The name ib_map_mr_sg()
had me thinking for a moment that this API actually posted the
FASTREG WR, but I see that it doesn’t.


> +	if (rc) {
> +		pr_err("RPC:       %s: failed to map mr %p rc %d\n",
> +			__func__, frmr->fr_mr, rc);
> +		return rc;
> +	}
> +
> 	key = (u8)(mr->rkey & 0x000000FF);
> 	ib_update_fast_reg_key(mr, ++key);
> -	fastreg_wr.wr.fast_reg.rkey = mr->rkey;
> +
> +	memset(&fastreg_wr, 0, sizeof(fastreg_wr));
> +	ib_set_fastreg_wr(mr, mr->rkey, (uintptr_t)mw, false, &fastreg_wr);
> 
> 	DECR_CQCOUNT(&r_xprt->rx_ep);
> 	rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr);
> @@ -385,15 +393,14 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
> 
> 	seg1->rl_mw = mw;
> 	seg1->mr_rkey = mr->rkey;
> -	seg1->mr_base = seg1->mr_dma + pageoff;
> +	seg1->mr_base = mr->iova;
> 	seg1->mr_nsegs = i;
> 	seg1->mr_len = len;
> 	return i;
> 
> out_senderr:
> 	dprintk("RPC:       %s: ib_post_send status %i\n", __func__, rc);
> -	while (i--)
> -		rpcrdma_unmap_one(device, --seg);
> +	ib_dma_unmap_sg(device, frmr->sg, frmr->sg_nents, direction);
> 	__frwr_queue_recovery(mw);
> 	return rc;
> }
> @@ -407,22 +414,23 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
> 	struct rpcrdma_mr_seg *seg1 = seg;
> 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
> 	struct rpcrdma_mw *mw = seg1->rl_mw;
> +	struct rpcrdma_frmr *frmr = &mw->r.frmr;
> 	struct ib_send_wr invalidate_wr, *bad_wr;
> 	int rc, nsegs = seg->mr_nsegs;
> 
> 	dprintk("RPC:       %s: FRMR %p\n", __func__, mw);
> 
> 	seg1->rl_mw = NULL;
> -	mw->r.frmr.fr_state = FRMR_IS_INVALID;
> +	frmr->fr_state = FRMR_IS_INVALID;
> 
> 	memset(&invalidate_wr, 0, sizeof(invalidate_wr));
> 	invalidate_wr.wr_id = (unsigned long)(void *)mw;
> 	invalidate_wr.opcode = IB_WR_LOCAL_INV;
> -	invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey;
> +	invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey;
> 	DECR_CQCOUNT(&r_xprt->rx_ep);
> 
> -	while (seg1->mr_nsegs--)
> -		rpcrdma_unmap_one(ia->ri_device, seg++);
> +	ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);

->mr_dir was previously set by rpcrdma_map_one(), which you’ve replaced
with ib_map_mr_sg(). So maybe frwr_op_map() needs to save “direction”
in the rpcrdma_frmr.


> +
> 	read_lock(&ia->ri_qplock);
> 	rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
> 	read_unlock(&ia->ri_qplock);
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index 886f8c8..a1c3ab2b 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -195,7 +195,9 @@ enum rpcrdma_frmr_state {
> };
> 
> struct rpcrdma_frmr {
> -	struct ib_fast_reg_page_list	*fr_pgl;
> +	struct scatterlist		*sg;
> +	unsigned int			sg_nents;
> +	unsigned int			dma_nents;
> 	struct ib_mr			*fr_mr;
> 	enum rpcrdma_frmr_state		fr_state;
> 	struct work_struct		fr_work;

--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
       [not found]         ` <795F4F28-D92F-46A1-8DA3-2B1B19A17AA3-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2015-07-22 15:41           ` Sagi Grimberg
       [not found]             ` <55AFB9A7.4030103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-07-22 16:59           ` Christoph Hellwig
  1 sibling, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22 15:41 UTC (permalink / raw)
  To: Chuck Lever, Sagi Grimberg; +Cc: linux-rdma, Liran Liss, Oren Duer


>> +	for (i = 0; i < nsegs;) {
>> +		sg_set_page(&frmr->sg[i], seg->mr_page,
>> +			    seg->mr_len, offset_in_page(seg->mr_offset));
>
> Cautionary note: here we’re dealing with both the “contiguous
> set of pages” case and the “small region of bytes in a single page”
> case. See rpcrdma_convert_iovs(): sometimes RPC send or receive
> buffers can be registered (RDMA_NOMSG).

I noticed that (I think). I think this is handled correctly.
What exactly is the caution note here?

>> 	mr = frmr->fr_mr;
>> +	access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
>> +			   IB_ACCESS_REMOTE_READ;
>> +	rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access);
>
> I like this (and the matching ib_dma_unmap_sg). But why wouldn’t
> this function be called ib_dma_map_sg() ? The name ib_map_mr_sg()
> had me thinking for a moment that this API actually posted the
> FASTREG WR, but I see that it doesn’t.

Umm, ib_dma_map_sg is already taken :)

This is what I came up with, it maps the SG elements to the MR
private context.

I'd like to keep the post API for now. It will be possible to
to add a wrapper function that would do:
- dma_map_sg
- ib_map_mr_sg
- init fastreg send_wr
- post_send (maybe)


>> -	while (seg1->mr_nsegs--)
>> -		rpcrdma_unmap_one(ia->ri_device, seg++);
>> +	ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);
>
> ->mr_dir was previously set by rpcrdma_map_one(), which you’ve replaced
> with ib_map_mr_sg(). So maybe frwr_op_map() needs to save “direction”
> in the rpcrdma_frmr.

Yep, that's correct, if I had turned on dma mapping debug it would shout
at me here...

Note, I added in the git repo a patch to allow arbitrary sg lists in
frwr_op_map() which would allow you to skip the holes check... seems to
work with mlx5...

I did noticed the mlx4 gives a protection error with after the 
conversion... I'll look into that...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
       [not found]             ` <55AFB9A7.4030103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-22 16:04               ` Chuck Lever
       [not found]                 ` <5114D0F0-7C66-4889-85D8-E7297009AF23-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Chuck Lever @ 2015-07-22 16:04 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: Sagi Grimberg, linux-rdma, Liran Liss, Oren Duer


On Jul 22, 2015, at 11:41 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:

> 
>>> +	for (i = 0; i < nsegs;) {
>>> +		sg_set_page(&frmr->sg[i], seg->mr_page,
>>> +			    seg->mr_len, offset_in_page(seg->mr_offset));
>> 
>> Cautionary note: here we’re dealing with both the “contiguous
>> set of pages” case and the “small region of bytes in a single page”
>> case. See rpcrdma_convert_iovs(): sometimes RPC send or receive
>> buffers can be registered (RDMA_NOMSG).
> 
> I noticed that (I think). I think this is handled correctly.
> What exactly is the caution note here?

Well the sg is turned into a page list below your API. Just
want to make sure that we have tested your xprtrdma alterations
with all the ULP possibilities. When you are further along I
can pull this and run my functional tests.


>>> 	mr = frmr->fr_mr;
>>> +	access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
>>> +			   IB_ACCESS_REMOTE_READ;
>>> +	rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access);
>> 
>> I like this (and the matching ib_dma_unmap_sg). But why wouldn’t
>> this function be called ib_dma_map_sg() ? The name ib_map_mr_sg()
>> had me thinking for a moment that this API actually posted the
>> FASTREG WR, but I see that it doesn’t.
> 
> Umm, ib_dma_map_sg is already taken :)
> 
> This is what I came up with, it maps the SG elements to the MR
> private context.
> 
> I'd like to keep the post API for now. It will be possible to
> to add a wrapper function that would do:
> - dma_map_sg
> - ib_map_mr_sg
> - init fastreg send_wr
> - post_send (maybe)

Where xprtrdma might improve is by setting up all the FASTREG
WRs for one RPC with a single chain and post_send. We could do
that with your INDIR_MR concept, for example.


>>> -	while (seg1->mr_nsegs--)
>>> -		rpcrdma_unmap_one(ia->ri_device, seg++);
>>> +	ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);
>> 
>> ->mr_dir was previously set by rpcrdma_map_one(), which you’ve replaced
>> with ib_map_mr_sg(). So maybe frwr_op_map() needs to save “direction”
>> in the rpcrdma_frmr.
> 
> Yep, that's correct, if I had turned on dma mapping debug it would shout
> at me here...
> 
> Note, I added in the git repo a patch to allow arbitrary sg lists in
> frwr_op_map() which would allow you to skip the holes check... seems to
> work with mlx5...
> 
> I did noticed the mlx4 gives a protection error with after the conversion... I'll look into that...

Should also get Steve and Devesh to try this with their adapters.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]     ` <1437548143-24893-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-22 16:34       ` Jason Gunthorpe
       [not found]         ` <20150722163405.GA26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-07-23  0:57       ` Hefty, Sean
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 16:34 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

> +/**
> + * ib_alloc_mr() - Allocates a memory region
> + * @pd:            protection domain associated with the region
> + * @mr_type:       memory region type
> + * @max_entries:   maximum registration entries available
> + * @flags:         create flags
> + */

Can you update this comment to elaborate some more on what the
parameters are? 'max_entries' is the number of s/g elements or
something?

> +enum ib_mr_type {
> +	IB_MR_TYPE_FAST_REG,
> +	IB_MR_TYPE_SIGNATURE,
>  };

Sure would be nice to have some documentation for what these things
do..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]         ` <20150722163405.GA26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-22 16:44           ` Christoph Hellwig
       [not found]             ` <20150722164421.GA6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-07-22 16:59           ` Sagi Grimberg
  1 sibling, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-22 16:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 10:34:05AM -0600, Jason Gunthorpe wrote:
> > +/**
> > + * ib_alloc_mr() - Allocates a memory region
> > + * @pd:            protection domain associated with the region
> > + * @mr_type:       memory region type
> > + * @max_entries:   maximum registration entries available
> > + * @flags:         create flags
> > + */
> 
> Can you update this comment to elaborate some more on what the
> parameters are? 'max_entries' is the number of s/g elements or
> something?
> 
> > +enum ib_mr_type {
> > +	IB_MR_TYPE_FAST_REG,
> > +	IB_MR_TYPE_SIGNATURE,
> >  };
> 
> Sure would be nice to have some documentation for what these things
> do..

Agreed on both counts.  Otherwise this looks pretty good to me.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr
       [not found]     ` <1437548143-24893-22-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-22 16:46       ` Christoph Hellwig
       [not found]         ` <20150722164605.GB6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-07-28 10:57       ` Haggai Eran
  1 sibling, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-22 16:46 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

Just curious: what's the tradeoff between allocating the page list
in the core vs duplicating it in all the drivers?  Does the driver
variant give us any benefits?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]     ` <1437548143-24893-29-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-22 16:50       ` Christoph Hellwig
       [not found]         ` <20150722165012.GC6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-07-22 18:02       ` Jason Gunthorpe
  2015-07-28 11:20       ` Haggai Eran
  2 siblings, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-22 16:50 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

> +/**
> + * ib_map_mr_sg() - Populates MR with a dma mapped SG list
> + * @mr:            memory region
> + * @sg:            dma mapped scatterlist
> + * @sg_nents:      number of entries in sg
> + * @access:        access permissions

I know moving the access flags here was my idea originally, but I seem
convinced by your argument that it might fit in better with the posting
helper.  Or did someone else come up with a better argument that mine
for moving it here?

> +int ib_map_mr_sg(struct ib_mr *mr,
> +		 struct scatterlist *sg,
> +		 unsigned short sg_nents,
> +		 unsigned int access)
> +{
> +	int rc;
> +
> +	if (!mr->device->map_mr_sg)
> +		return -ENOSYS;
> +
> +	rc = mr->device->map_mr_sg(mr, sg, sg_nents);

Do we really need a driver callout here?  It seems like we should
just do the map here, and then either have a flag for the mlx5 indirect
mapping, or if you want to keep the abstraction add the method at that
point but make it optional, so that all the other drivers don't need the
boilerplate code.

Also it seems like this returns 0/-error.  How do callers like SRP
see that it only did a partial mapping and it needs another MR?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr
       [not found]         ` <20150722164605.GB6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-07-22 16:51           ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22 16:51 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 7:46 PM, Christoph Hellwig wrote:
> Just curious: what's the tradeoff between allocating the page list
> in the core vs duplicating it in all the drivers?  Does the driver
> variant give us any benefits?

It's not necessarily a page list... (i.e. a real scatterlist).
I it will make more sense in patch 41/43.

Moreover, as I wrote in the cover-letter. I noticed that several
drivers keep shadows anyway for various reasons. For example mlx4
sets the page list with a preset-bit (related to ODP...) so at
registration time we see the loop:

for (i = 0; i < mr->npages; ++i)
         mr->mpl[i] = cpu_to_be64(mr->pl[i] | MLX4_MTT_FLAG_PRESENT);

Given that this not a single example, I'd expect drivers to skip this
duplication (hopefully).

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]         ` <20150722165012.GC6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-07-22 16:56           ` Sagi Grimberg
  2015-07-22 17:44           ` Jason Gunthorpe
  1 sibling, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22 16:56 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 7:50 PM, Christoph Hellwig wrote:
>> +/**
>> + * ib_map_mr_sg() - Populates MR with a dma mapped SG list
>> + * @mr:            memory region
>> + * @sg:            dma mapped scatterlist
>> + * @sg_nents:      number of entries in sg
>> + * @access:        access permissions
>
> I know moving the access flags here was my idea originally, but I seem
> convinced by your argument that it might fit in better with the posting
> helper.  Or did someone else come up with a better argument that mine
> for moving it here?

Not really. I was and still pretty indifferent about it...

>
>> +int ib_map_mr_sg(struct ib_mr *mr,
>> +		 struct scatterlist *sg,
>> +		 unsigned short sg_nents,
>> +		 unsigned int access)
>> +{
>> +	int rc;
>> +
>> +	if (!mr->device->map_mr_sg)
>> +		return -ENOSYS;
>> +
>> +	rc = mr->device->map_mr_sg(mr, sg, sg_nents);
>
> Do we really need a driver callout here?  It seems like we should
> just do the map here, and then either have a flag for the mlx5 indirect
> mapping, or if you want to keep the abstraction add the method at that
> point but make it optional, so that all the other drivers don't need the
> boilerplate code.

I commented on this bit in another reply. I think that several drivers
will want to use their own mappings. But I can change that if it's not
the case...

>
> Also it seems like this returns 0/-error.  How do callers like SRP
> see that it only did a partial mapping and it needs another MR?

Umm, I think SRP would need to iterate over the sg list and pass partial
SGs to the mapping (I can add a break; statement if we met sg_nents)

It's not perfect, but the idea was not to do backflips here.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]             ` <20150722164421.GA6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-07-22 16:58               ` Sagi Grimberg
       [not found]                 ` <55AFCBAF.2000504-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22 16:58 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 7:44 PM, Christoph Hellwig wrote:
> On Wed, Jul 22, 2015 at 10:34:05AM -0600, Jason Gunthorpe wrote:
>>> +/**
>>> + * ib_alloc_mr() - Allocates a memory region
>>> + * @pd:            protection domain associated with the region
>>> + * @mr_type:       memory region type
>>> + * @max_entries:   maximum registration entries available
>>> + * @flags:         create flags
>>> + */
>>
>> Can you update this comment to elaborate some more on what the
>> parameters are? 'max_entries' is the number of s/g elements or
>> something?
>>
>>> +enum ib_mr_type {
>>> +	IB_MR_TYPE_FAST_REG,
>>> +	IB_MR_TYPE_SIGNATURE,
>>>   };
>>
>> Sure would be nice to have some documentation for what these things
>> do..
>
> Agreed on both counts.  Otherwise this looks pretty good to me.

I can add some more documentation here...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb
       [not found]     ` <1437548143-24893-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-22 16:58       ` Jason Gunthorpe
       [not found]         ` <20150722165831.GB26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 16:58 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 09:55:02AM +0300, Sagi Grimberg wrote:
>  
> +struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
> +			       enum ib_mr_type mr_type,
> +			       u32 max_entries,
> +			       u32 flags)
> +{

This is just a copy of mlx4_ib_alloc_fast_reg_mr with
this added:

> +	if (mr_type != IB_MR_TYPE_FAST_REG || flags)
> +		return ERR_PTR(-EINVAL);

Are all the driver updates the same? It looks like it.

I'd suggest shortening this patch series, have the core provide the
wrapper immediately:

struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
{
...

    if (pd->device->alloc_mr) {
	mr = pd->device->alloc_mr(pd, mr_type, max_entries, flags);
    } else {
   	if (mr_type != IB_MR_TYPE_FAST_REG || flags ||
	    !ib_dev->alloc_fast_reg_mr)
		return ERR_PTR(-ENOSYS);
	mr = pd->device->alloc_fast_reg_mr(..);
    }
}

Then go through the series to remove ib_alloc_fast_reg_mr

Then go through one series to migrate the drivers from
alloc_fast_reg_mr to alloc_mr

Then entirely drop alloc_fast_reg_mr from the driver API.

That should be shorter and easier to read the driver diffs, which is
the major change here.

This whole section (up to 20) looks reasonable to me..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]         ` <20150722163405.GA26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-07-22 16:44           ` Christoph Hellwig
@ 2015-07-22 16:59           ` Sagi Grimberg
       [not found]             ` <55AFCBE4.1070803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22 16:59 UTC (permalink / raw)
  To: Jason Gunthorpe, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 7:34 PM, Jason Gunthorpe wrote:
>> +/**
>> + * ib_alloc_mr() - Allocates a memory region
>> + * @pd:            protection domain associated with the region
>> + * @mr_type:       memory region type
>> + * @max_entries:   maximum registration entries available
>> + * @flags:         create flags
>> + */
>
> Can you update this comment to elaborate some more on what the
> parameters are? 'max_entries' is the number of s/g elements or
> something?
>
>> +enum ib_mr_type {
>> +	IB_MR_TYPE_FAST_REG,
>> +	IB_MR_TYPE_SIGNATURE,
>>   };
>
> Sure would be nice to have some documentation for what these things
> do..

Do we want to pull ib_get_dma_mr() here with type IB_MR_TYPE_DMA?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
       [not found]         ` <795F4F28-D92F-46A1-8DA3-2B1B19A17AA3-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2015-07-22 15:41           ` Sagi Grimberg
@ 2015-07-22 16:59           ` Christoph Hellwig
  1 sibling, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-22 16:59 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Sagi Grimberg, linux-rdma, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 11:03:49AM -0400, Chuck Lever wrote:
> I like this (and the matching ib_dma_unmap_sg). But why wouldn?t
> this function be called ib_dma_map_sg() ? The name ib_map_mr_sg()
> had me thinking for a moment that this API actually posted the
> FASTREG WR, but I see that it doesn?t.

We already have a ib_dma_map_sg, which is a wrapper around dma_map_sg
that allows ehc ipath amd qib to do naughty things instead of the
regular dma mapping.

But it seems maybe the dma_map_sg calls or the magic for those other
drivers should be folded into Sagi's new API as those HCA apparently
don't need physical addresses and thus the S/G list.

God knows what's they're doing with a list of virtual addresses, but
removing the struct scatterlist abuse there would be highly welcome.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]             ` <55AFCBE4.1070803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-22 17:01               ` Jason Gunthorpe
       [not found]                 ` <20150722170120.GC26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 17:01 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 07:59:16PM +0300, Sagi Grimberg wrote:
> Do we want to pull ib_get_dma_mr() here with type IB_MR_TYPE_DMA?

I want to get rid of ib_get_dma_mr...

My plan was to get rid of it as my last series shows for all lkey
usages and then rename it to:

ib_get_insecure_all_physical_rkey

For the remaining usages, and a future kernel version will taint the
kernel if anyone calls it.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]                 ` <20150722170120.GC26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-22 17:03                   ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22 17:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 8:01 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 07:59:16PM +0300, Sagi Grimberg wrote:
>> Do we want to pull ib_get_dma_mr() here with type IB_MR_TYPE_DMA?
>
> I want to get rid of ib_get_dma_mr...

That's why I asked :)

So I'll take it as a no...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]     ` <1437548143-24893-39-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-22 17:04       ` Christoph Hellwig
       [not found]         ` <20150722170413.GE6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-22 17:04 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

> @@ -2585,11 +2517,9 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
>  	struct isert_device *device = isert_conn->device;
>  	struct ib_device *ib_dev = device->ib_device;
>  	struct ib_mr *mr;
>  	struct ib_send_wr fr_wr, inv_wr;
>  	struct ib_send_wr *bad_wr, *wr = NULL;
> +	int ret;
>  
>  	if (mem->dma_nents == 1) {
>  		sge->lkey = device->mr->lkey;
> @@ -2600,40 +2530,32 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
>  		return 0;
>  	}
>  
> +	if (ind == ISERT_DATA_KEY_VALID)
>  		/* Registering data buffer */
>  		mr = fr_desc->data_mr;
> +	else
>  		/* Registering protection buffer */
>  		mr = fr_desc->pi_ctx->prot_mr;
>  
>  	if (!(fr_desc->ind & ind)) {
>  		isert_inv_rkey(&inv_wr, mr);
>  		wr = &inv_wr;
>  	}
>  
> +	ret = ib_map_mr_sg(mr, mem->sg, mem->nents, IB_ACCESS_LOCAL_WRITE);
> +	if (ret) {
> +		isert_err("failed to map sg %p with %d entries\n",
> +			 mem->sg, mem->dma_nents);
> +		return ret;
> +	}
> +
> +	isert_dbg("Use fr_desc %p sg_nents %d offset %u\n",
> +		  fr_desc, mem->nents, mem->offset);
> +
>  	/* Prepare FASTREG WR */
>  	memset(&fr_wr, 0, sizeof(fr_wr));
> +	ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID,
> +			  false, &fr_wr);

Shouldn't ib_set_fastreg_wr take care of this memset?  Also it seems
instead of the singalled flag to it we might just set that or
other flags later if we really want to.

>  struct pi_context {
>  	struct ib_mr		       *prot_mr;
> -	struct ib_fast_reg_page_list   *prot_frpl;
>  	struct ib_mr		       *sig_mr;
>  };
>  
>  struct fast_reg_descriptor {
>  	struct list_head		list;
>  	struct ib_mr		       *data_mr;
> -	struct ib_fast_reg_page_list   *data_frpl;
>  	u8				ind;
>  	struct pi_context	       *pi_ctx;

As a follow on it might be worth to just kill off the separate
pi_context structure here.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support
       [not found]     ` <1437548143-24893-40-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-22 17:05       ` Christoph Hellwig
  2015-07-22 17:22       ` Jason Gunthorpe
  1 sibling, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-22 17:05 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

> +	IB_DEVICE_MAP_ARB_SG		= (1ULL<<32),

> +enum ib_mr_flags {
> +	IB_MR_MAP_ARB_SG = 1,
> +};
> +

s/ARB_SG/SG_GAPS/?

Also please try to document new flags.  I know the IB code currently
doesn't do it, but starting a trend there would be very useful.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 00/43] New fast registration API
       [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
                     ` (42 preceding siblings ...)
  2015-07-22  6:55   ` [PATCH WIP 43/43] iser: Move unaligned counter increment Sagi Grimberg
@ 2015-07-22 17:10   ` Christoph Hellwig
       [not found]     ` <20150722171023.GA18934-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  43 siblings, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-22 17:10 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

Thanks Sagi,

this looks pretty good in general, various nitpicks nonwithstanding.

The one thing I'm curious about is how we can support SRP with it's
multiple MR support without too much boilerplate code.  One option
would be that pass an array of MRs to the map routines, and while
most callers would just pass in one it would handle multiple for those
drivers that supply them.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb
       [not found]         ` <20150722165831.GB26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-22 17:22           ` Sagi Grimberg
       [not found]             ` <55AFD14C.8040007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22 17:22 UTC (permalink / raw)
  To: Jason Gunthorpe, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 7:58 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 09:55:02AM +0300, Sagi Grimberg wrote:
>>
>> +struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
>> +			       enum ib_mr_type mr_type,
>> +			       u32 max_entries,
>> +			       u32 flags)
>> +{
>
> This is just a copy of mlx4_ib_alloc_fast_reg_mr with
> this added:
>
>> +	if (mr_type != IB_MR_TYPE_FAST_REG || flags)
>> +		return ERR_PTR(-EINVAL);
>
> Are all the driver updates the same? It looks like it.
>
> I'd suggest shortening this patch series, have the core provide the
> wrapper immediately:
>
> struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
> {
> ...
>
>      if (pd->device->alloc_mr) {
> 	mr = pd->device->alloc_mr(pd, mr_type, max_entries, flags);
>      } else {
>     	if (mr_type != IB_MR_TYPE_FAST_REG || flags ||
> 	    !ib_dev->alloc_fast_reg_mr)
> 		return ERR_PTR(-ENOSYS);
> 	mr = pd->device->alloc_fast_reg_mr(..);
>      }
> }
>
> Then go through the series to remove ib_alloc_fast_reg_mr
>
> Then go through one series to migrate the drivers from
> alloc_fast_reg_mr to alloc_mr
>
> Then entirely drop alloc_fast_reg_mr from the driver API.
>
> That should be shorter and easier to read the driver diffs, which is
> the major change here.

Yea, it would be better...

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support
       [not found]     ` <1437548143-24893-40-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22 17:05       ` Christoph Hellwig
@ 2015-07-22 17:22       ` Jason Gunthorpe
       [not found]         ` <20150722172255.GD26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 17:22 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 09:55:39AM +0300, Sagi Grimberg wrote:
> +enum ib_mr_flags {
> +	IB_MR_MAP_ARB_SG = 1,
> +};

Something about this just seems ugly. We are back to what we were
trying to avoid: Adding more types of MRs..

Is this really necessary? Do you really need to know the MR type when
the MR is created, or can the adaptor change types on the fly during
registration?

iSER for example has a rarely used corner case where it needs this,
but it just turns on the feature unconditionally right away. This
incures 2x the overhead in the MR allocations and who knows what
performance impact on the adaptor side.

It would be so much better if it could switch to this mode on a SG by
SG list basis.

Same for signature.

In other words: It would be so much cleaner if ib_map_mr_sg set the
MR type based on the need.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 00/43] New fast registration API
       [not found]     ` <20150722171023.GA18934-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-07-22 17:27       ` Jason Gunthorpe
       [not found]         ` <20150722172702.GE26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-07-22 17:42       ` Sagi Grimberg
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 17:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 10:10:23AM -0700, Christoph Hellwig wrote:
> The one thing I'm curious about is how we can support SRP with it's
> multiple MR support without too much boilerplate code.  One option
> would be that pass an array of MRs to the map routines, and while
> most callers would just pass in one it would handle multiple for those
> drivers that supply them.

What is SRP trying to accomplish with that?

The only reason that springs to mind is to emulate IB_MR_MAP_ARB_SG ?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support
       [not found]         ` <20150722172255.GD26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-22 17:29           ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22 17:29 UTC (permalink / raw)
  To: Jason Gunthorpe, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 8:22 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 09:55:39AM +0300, Sagi Grimberg wrote:
>> +enum ib_mr_flags {
>> +	IB_MR_MAP_ARB_SG = 1,
>> +};
>
> Something about this just seems ugly. We are back to what we were
> trying to avoid: Adding more types of MRs..
>
> Is this really necessary? Do you really need to know the MR type when
> the MR is created, or can the adaptor change types on the fly during
> registration?
>
> iSER for example has a rarely used corner case where it needs this,

I can tell you that its anything but a corner case. direct-io, bio
merges, FS operations and PI are examples where most of the sg lists
*will* be "gappy".

Trust me, it's fairly common to see those...

> but it just turns on the feature unconditionally right away. This
> incures 2x the overhead in the MR allocations and who knows what
> performance impact on the adaptor side.

I ran various workloads with this, and performance seems to sustain.

>
> It would be so much better if it could switch to this mode on a SG by
> SG list basis.

It would, but unfortunately it can't.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration
       [not found]     ` <1437548143-24893-41-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-22 17:30       ` Jason Gunthorpe
       [not found]         ` <20150722173048.GF26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 17:30 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 09:55:40AM +0300, Sagi Grimberg wrote:
> +	size += max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
> +	mr->klms = kzalloc(size, GFP_KERNEL);
> +	if (!mr->klms)
> +		return -ENOMEM;
> +
> +	mr->pl_map = dma_map_single(device->dma_device, mr->klms,
> +				    size, DMA_TO_DEVICE);

This is a misuse of the DMA API, you must call dma_map_single after
the memory is set by the CPU, not before.

The fast reg varient is using coherent allocations, which is OK..

Personally, I'd switch them both to map_single, then when copying the
scatter list
 - Make sure the buffer is DMA unmapped
 - Copy
 - dma_map_single

Unless there is some additional reason for the coherent allocation..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]         ` <20150722170413.GE6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-07-22 17:33           ` Sagi Grimberg
       [not found]             ` <55AFD3DC.8070508-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22 17:33 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 8:04 PM, Christoph Hellwig wrote:
>> @@ -2585,11 +2517,9 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
>>   	struct isert_device *device = isert_conn->device;
>>   	struct ib_device *ib_dev = device->ib_device;
>>   	struct ib_mr *mr;
>>   	struct ib_send_wr fr_wr, inv_wr;
>>   	struct ib_send_wr *bad_wr, *wr = NULL;
>> +	int ret;
>>
>>   	if (mem->dma_nents == 1) {
>>   		sge->lkey = device->mr->lkey;
>> @@ -2600,40 +2530,32 @@ isert_fast_reg_mr(struct isert_conn *isert_conn,
>>   		return 0;
>>   	}
>>
>> +	if (ind == ISERT_DATA_KEY_VALID)
>>   		/* Registering data buffer */
>>   		mr = fr_desc->data_mr;
>> +	else
>>   		/* Registering protection buffer */
>>   		mr = fr_desc->pi_ctx->prot_mr;
>>
>>   	if (!(fr_desc->ind & ind)) {
>>   		isert_inv_rkey(&inv_wr, mr);
>>   		wr = &inv_wr;
>>   	}
>>
>> +	ret = ib_map_mr_sg(mr, mem->sg, mem->nents, IB_ACCESS_LOCAL_WRITE);
>> +	if (ret) {
>> +		isert_err("failed to map sg %p with %d entries\n",
>> +			 mem->sg, mem->dma_nents);
>> +		return ret;
>> +	}
>> +
>> +	isert_dbg("Use fr_desc %p sg_nents %d offset %u\n",
>> +		  fr_desc, mem->nents, mem->offset);
>> +
>>   	/* Prepare FASTREG WR */
>>   	memset(&fr_wr, 0, sizeof(fr_wr));
>> +	ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID,
>> +			  false, &fr_wr);
>
> Shouldn't ib_set_fastreg_wr take care of this memset?  Also it seems
> instead of the singalled flag to it we might just set that or
> other flags later if we really want to.

The reason I didn't put it in was that ib_send_wr is not a small struct
(92 bytes IIRC). So I'm a bit reluctant to add an unconditional memset.
Maybe it's better that the callers can carefully set it to save some
cycles?

>
>>   struct pi_context {
>>   	struct ib_mr		       *prot_mr;
>> -	struct ib_fast_reg_page_list   *prot_frpl;
>>   	struct ib_mr		       *sig_mr;
>>   };
>>
>>   struct fast_reg_descriptor {
>>   	struct list_head		list;
>>   	struct ib_mr		       *data_mr;
>> -	struct ib_fast_reg_page_list   *data_frpl;
>>   	u8				ind;
>>   	struct pi_context	       *pi_ctx;
>
> As a follow on it might be worth to just kill off the separate
> pi_context structure here.

Yea we can do that..
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 00/43] New fast registration API
       [not found]     ` <20150722171023.GA18934-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-07-22 17:27       ` Jason Gunthorpe
@ 2015-07-22 17:42       ` Sagi Grimberg
       [not found]         ` <55AFD608.401-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-22 17:42 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 8:10 PM, Christoph Hellwig wrote:
> Thanks Sagi,
>
> this looks pretty good in general, various nitpicks nonwithstanding.
>
> The one thing I'm curious about is how we can support SRP with it's
> multiple MR support without too much boilerplate code.  One option
> would be that pass an array of MRs to the map routines, and while
> most callers would just pass in one it would handle multiple for those
> drivers that supply them.

We can do that, but I'd prefer not to pollute the API just for this
single use case. What we can do, is add a pool API that would take care
of that. But even then we might end up with different strategies as not
all ULPs can use it the same way (protocol constraints)...

Today SRP has this logic that registers multiple SG aligned partials.
We can just have it pass a partial SG list to what we have today instead
of building the page vectors...

Or if we can come up with something that will keep the API trivial, we
can take care of that too.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]         ` <20150722165012.GC6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-07-22 16:56           ` Sagi Grimberg
@ 2015-07-22 17:44           ` Jason Gunthorpe
       [not found]             ` <20150722174401.GG26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 17:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 09:50:12AM -0700, Christoph Hellwig wrote:
> > +/**
> > + * ib_map_mr_sg() - Populates MR with a dma mapped SG list
> > + * @mr:            memory region
> > + * @sg:            dma mapped scatterlist
> > + * @sg_nents:      number of entries in sg
> > + * @access:        access permissions
> 
> I know moving the access flags here was my idea originally, but I seem
> convinced by your argument that it might fit in better with the posting
> helper.  Or did someone else come up with a better argument that mine
> for moving it here?

I was hoping we'd move the DMA flush and translate into here and make
it mandatory. Is there any reason not to do that?

> > +int ib_map_mr_sg(struct ib_mr *mr,
> > +		 struct scatterlist *sg,
> > +		 unsigned short sg_nents,
> > +		 unsigned int access)
> > +{
> > +	int rc;
> > +
> > +	if (!mr->device->map_mr_sg)
> > +		return -ENOSYS;
> > +
> > +	rc = mr->device->map_mr_sg(mr, sg, sg_nents);
> 
> Do we really need a driver callout here?  It seems like we should

The call out makes sense to me..

The driver will convert the scatter list directly into whatever HW
representation it needs and prepare everything for posting. Every
driver has a different HW format, so it must be a callout.

> Also it seems like this returns 0/-error.  How do callers like SRP
> see that it only did a partial mapping and it needs another MR?

I would think it is an error to pass in more sg_nents than the MR was
created with, so SRP should never get a partial mapping as it should
never ask for more than max_entries.

(? Sagi, did I get the intent of this right?)

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]             ` <55AFD3DC.8070508-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-22 17:57               ` Jason Gunthorpe
       [not found]                 ` <20150722175755.GH26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 17:57 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 08:33:16PM +0300, Sagi Grimberg wrote:
> >>  	memset(&fr_wr, 0, sizeof(fr_wr));
> >>+	ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID,
> >>+			  false, &fr_wr);
> >
> >Shouldn't ib_set_fastreg_wr take care of this memset?  Also it seems
> >instead of the singalled flag to it we might just set that or
> >other flags later if we really want to.

Seems reasonable.

If you want to micro optimize then just zero the few items that are
defined to be accessed for fastreg, no need to zero the whole
structure. Infact, you may have already done that, so just drop the
memset entirely.

> The reason I didn't put it in was that ib_send_wr is not a small struct
> (92 bytes IIRC). So I'm a bit reluctant to add an unconditional memset.
> Maybe it's better that the callers can carefully set it to save some
> cycles?

If you want to optimize this path, then Sean is right, move the post
into the driver and stop pretending that ib_post_send is a performance
API.

ib_post_fastreg_wr would be a function that needs 3 register passed
arguments and does a simple copy to the driver's actual sendq

No 96 byte structure memset, no stack traffic, no conditional
jumps.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]     ` <1437548143-24893-29-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22 16:50       ` Christoph Hellwig
@ 2015-07-22 18:02       ` Jason Gunthorpe
       [not found]         ` <20150722180203.GI26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-07-28 11:20       ` Haggai Eran
  2 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 18:02 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 09:55:28AM +0300, Sagi Grimberg wrote:
> +/**
> + * ib_map_mr_sg() - Populates MR with a dma mapped SG list
> + * @mr:            memory region
> + * @sg:            dma mapped scatterlist
> + * @sg_nents:      number of entries in sg
> + * @access:        access permissions

Again, related to my prior comments, please have two of these:

ib_map_mr_sg_rkey()
ib_map_mr_sg_lkey()

So we force ULPs to think about what they are doing properly, and we
get a chance to actually force lkey to be local use only for IB.

> +static inline void
> +ib_set_fastreg_wr(struct ib_mr *mr,
> +		  u32 key,

The key should come from MR. Once the above is split then it is
obvious which key to use.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb
       [not found]             ` <55AFD14C.8040007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-22 18:50               ` Steve Wise
       [not found]                 ` <55AFE5D9.3050102-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Steve Wise @ 2015-07-22 18:50 UTC (permalink / raw)
  To: Sagi Grimberg, Jason Gunthorpe, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 12:22 PM, Sagi Grimberg wrote:
> On 7/22/2015 7:58 PM, Jason Gunthorpe wrote:
>> On Wed, Jul 22, 2015 at 09:55:02AM +0300, Sagi Grimberg wrote:
>>>
>>> +struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd,
>>> +                   enum ib_mr_type mr_type,
>>> +                   u32 max_entries,
>>> +                   u32 flags)
>>> +{
>>
>> This is just a copy of mlx4_ib_alloc_fast_reg_mr with
>> this added:
>>
>>> +    if (mr_type != IB_MR_TYPE_FAST_REG || flags)
>>> +        return ERR_PTR(-EINVAL);
>>
>> Are all the driver updates the same? It looks like it.
>>
>> I'd suggest shortening this patch series, have the core provide the
>> wrapper immediately:
>>
>> struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
>> {
>> ...
>>
>>      if (pd->device->alloc_mr) {
>>     mr = pd->device->alloc_mr(pd, mr_type, max_entries, flags);
>>      } else {
>>         if (mr_type != IB_MR_TYPE_FAST_REG || flags ||
>>         !ib_dev->alloc_fast_reg_mr)
>>         return ERR_PTR(-ENOSYS);
>>     mr = pd->device->alloc_fast_reg_mr(..);
>>      }
>> }
>>
>> Then go through the series to remove ib_alloc_fast_reg_mr
>>
>> Then go through one series to migrate the drivers from
>> alloc_fast_reg_mr to alloc_mr
>>
>> Then entirely drop alloc_fast_reg_mr from the driver API.
>>
>> That should be shorter and easier to read the driver diffs, which is
>> the major change here.
>
> Yea, it would be better...

43 patches overflows my stack ;)  I agree with Jason's suggestion.

Steve.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb
       [not found]                 ` <55AFE5D9.3050102-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2015-07-22 18:54                   ` Jason Gunthorpe
       [not found]                     ` <20150722185410.GA4527-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 18:54 UTC (permalink / raw)
  To: Steve Wise
  Cc: Sagi Grimberg, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 01:50:01PM -0500, Steve Wise wrote:
 
> 43 patches overflows my stack ;)  I agree with Jason's suggestion.

Saig, you may as well just send the ib_alloc_mr rework as a series and
get it done with, I'd pass off on the core parts of v2.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]                 ` <55AFCBAF.2000504-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-22 19:05                   ` Jason Gunthorpe
       [not found]                     ` <20150722190555.GB4527-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 19:05 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 07:58:23PM +0300, Sagi Grimberg wrote:
> On 7/22/2015 7:44 PM, Christoph Hellwig wrote:
> >On Wed, Jul 22, 2015 at 10:34:05AM -0600, Jason Gunthorpe wrote:
> >>>+/**
> >>>+ * ib_alloc_mr() - Allocates a memory region
> >>>+ * @pd:            protection domain associated with the region
> >>>+ * @mr_type:       memory region type
> >>>+ * @max_entries:   maximum registration entries available
> >>>+ * @flags:         create flags
> >>>+ */
> >>
> >>Can you update this comment to elaborate some more on what the
> >>parameters are? 'max_entries' is the number of s/g elements or
> >>something?
> >>
> >>>+enum ib_mr_type {
> >>>+	IB_MR_TYPE_FAST_REG,
> >>>+	IB_MR_TYPE_SIGNATURE,
> >>>  };
> >>
> >>Sure would be nice to have some documentation for what these things
> >>do..
> >
> >Agreed on both counts.  Otherwise this looks pretty good to me.
> 
> I can add some more documentation here...

So, I was wrong, 'max_entries' is the number of page entires, not
really the s/g element limit?

In other words the ULP can submit at most max_entires*PAGE_SIZE bytes
for the non ARB_SG case

For the ARB_SG case.. It is some other more difficult computation?

It is somewhat ugly to ask for this upfront as a hard limit..

Is there any reason we can't use a hint_prealloc_pages as the argument
here, and then realloc in the map routine if the hint turns out to be
too small for a particular s/g list?

It looks like all drivers can support this.

That would make it much easier to use correctly, and free ULPs from
dealing with any impedance mismatch with core kernel code that assumes
a sg list length limit, or overall side limit, not some oddball
computation based on pages...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
       [not found]     ` <1437548143-24893-38-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22 15:03       ` Chuck Lever
@ 2015-07-22 19:21       ` Steve Wise
       [not found]         ` <55AFED4C.9040409-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Steve Wise @ 2015-07-22 19:21 UTC (permalink / raw)
  To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer


On 7/22/2015 1:55 AM, Sagi Grimberg wrote:
> Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>   net/sunrpc/xprtrdma/frwr_ops.c  | 80 ++++++++++++++++++++++-------------------
>   net/sunrpc/xprtrdma/xprt_rdma.h |  4 ++-
>   2 files changed, 47 insertions(+), 37 deletions(-)

Did you intend to change svcrdma as well?

> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 517efed..e28246b 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device,
>   	f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0);
>   	if (IS_ERR(f->fr_mr))
>   		goto out_mr_err;
> -	f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth);
> -	if (IS_ERR(f->fr_pgl))
> +
> +	f->sg = kcalloc(sizeof(*f->sg), depth, GFP_KERNEL);
> +	if (IS_ERR(f->sg))
>   		goto out_list_err;
> +
> +	sg_init_table(f->sg, depth);
> +
>   	return 0;
>   
>   out_mr_err:
> @@ -163,7 +167,7 @@ out_mr_err:
>   	return rc;
>   
>   out_list_err:
> -	rc = PTR_ERR(f->fr_pgl);
> +	rc = -ENOMEM;
>   	dprintk("RPC:       %s: ib_alloc_fast_reg_page_list status %i\n",
>   		__func__, rc);
>   	ib_dereg_mr(f->fr_mr);
> @@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r)
>   	if (rc)
>   		dprintk("RPC:       %s: ib_dereg_mr status %i\n",
>   			__func__, rc);
> -	ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
> +	kfree(r->r.frmr.sg);
>   }
>   
>   static int
> @@ -320,10 +324,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
>   	struct ib_send_wr fastreg_wr, *bad_wr;
>   	u8 key;
>   	int len, pageoff;
> -	int i, rc;
> -	int seg_len;
> -	u64 pa;
> -	int page_no;
> +	int i, rc, access;
>   
>   	mw = seg1->rl_mw;
>   	seg1->rl_mw = NULL;
> @@ -344,39 +345,46 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
>   	if (nsegs > ia->ri_max_frmr_depth)
>   		nsegs = ia->ri_max_frmr_depth;
>   
> -	for (page_no = i = 0; i < nsegs;) {
> -		rpcrdma_map_one(device, seg, direction);
> -		pa = seg->mr_dma;
> -		for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) {
> -			frmr->fr_pgl->page_list[page_no++] = pa;
> -			pa += PAGE_SIZE;
> -		}
> +	for (i = 0; i < nsegs;) {
> +		sg_set_page(&frmr->sg[i], seg->mr_page,
> +			    seg->mr_len, offset_in_page(seg->mr_offset));
>   		len += seg->mr_len;
> -		++seg;
>   		++i;
> -		/* Check for holes */
> +		++seg;
> +
> +		/* Check for holes - needed?? */
>   		if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
>   		    offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
>   			break;
>   	}
> +
> +	frmr->sg_nents = i;
> +	frmr->dma_nents = ib_dma_map_sg(device, frmr->sg,
> +					frmr->sg_nents, direction);
> +	if (!frmr->dma_nents) {
> +		pr_err("RPC:       %s: failed to dma map sg %p sg_nents %d\n",
> +			__func__, frmr->sg, frmr->sg_nents);
> +		return -ENOMEM;
> +	}
> +
>   	dprintk("RPC:       %s: Using frmr %p to map %d segments (%d bytes)\n",
>   		__func__, mw, i, len);
>   
> -	memset(&fastreg_wr, 0, sizeof(fastreg_wr));
> -	fastreg_wr.wr_id = (unsigned long)(void *)mw;
> -	fastreg_wr.opcode = IB_WR_FAST_REG_MR;
> -	fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff;
> -	fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl;
> -	fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
> -	fastreg_wr.wr.fast_reg.page_list_len = page_no;
> -	fastreg_wr.wr.fast_reg.length = len;
> -	fastreg_wr.wr.fast_reg.access_flags = writing ?
> -				IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
> -				IB_ACCESS_REMOTE_READ;
>   	mr = frmr->fr_mr;
> +	access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
> +			   IB_ACCESS_REMOTE_READ;
> +	rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access);
> +	if (rc) {
> +		pr_err("RPC:       %s: failed to map mr %p rc %d\n",
> +			__func__, frmr->fr_mr, rc);
> +		return rc;
> +	}
> +
>   	key = (u8)(mr->rkey & 0x000000FF);
>   	ib_update_fast_reg_key(mr, ++key);
> -	fastreg_wr.wr.fast_reg.rkey = mr->rkey;
> +
> +	memset(&fastreg_wr, 0, sizeof(fastreg_wr));
> +	ib_set_fastreg_wr(mr, mr->rkey, (uintptr_t)mw, false, &fastreg_wr);
>   
>   	DECR_CQCOUNT(&r_xprt->rx_ep);
>   	rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr);
> @@ -385,15 +393,14 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
>   
>   	seg1->rl_mw = mw;
>   	seg1->mr_rkey = mr->rkey;
> -	seg1->mr_base = seg1->mr_dma + pageoff;
> +	seg1->mr_base = mr->iova;
>   	seg1->mr_nsegs = i;
>   	seg1->mr_len = len;
>   	return i;
>   
>   out_senderr:
>   	dprintk("RPC:       %s: ib_post_send status %i\n", __func__, rc);
> -	while (i--)
> -		rpcrdma_unmap_one(device, --seg);
> +	ib_dma_unmap_sg(device, frmr->sg, frmr->sg_nents, direction);
>   	__frwr_queue_recovery(mw);
>   	return rc;
>   }
> @@ -407,22 +414,23 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg)
>   	struct rpcrdma_mr_seg *seg1 = seg;
>   	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
>   	struct rpcrdma_mw *mw = seg1->rl_mw;
> +	struct rpcrdma_frmr *frmr = &mw->r.frmr;
>   	struct ib_send_wr invalidate_wr, *bad_wr;
>   	int rc, nsegs = seg->mr_nsegs;
>   
>   	dprintk("RPC:       %s: FRMR %p\n", __func__, mw);
>   
>   	seg1->rl_mw = NULL;
> -	mw->r.frmr.fr_state = FRMR_IS_INVALID;
> +	frmr->fr_state = FRMR_IS_INVALID;
>   
>   	memset(&invalidate_wr, 0, sizeof(invalidate_wr));
>   	invalidate_wr.wr_id = (unsigned long)(void *)mw;
>   	invalidate_wr.opcode = IB_WR_LOCAL_INV;
> -	invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey;
> +	invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey;
>   	DECR_CQCOUNT(&r_xprt->rx_ep);
>   
> -	while (seg1->mr_nsegs--)
> -		rpcrdma_unmap_one(ia->ri_device, seg++);
> +	ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);
> +
>   	read_lock(&ia->ri_qplock);
>   	rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
>   	read_unlock(&ia->ri_qplock);
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index 886f8c8..a1c3ab2b 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -195,7 +195,9 @@ enum rpcrdma_frmr_state {
>   };
>   
>   struct rpcrdma_frmr {
> -	struct ib_fast_reg_page_list	*fr_pgl;
> +	struct scatterlist		*sg;
> +	unsigned int			sg_nents;
> +	unsigned int			dma_nents;
>   	struct ib_mr			*fr_mr;
>   	enum rpcrdma_frmr_state		fr_state;
>   	struct work_struct		fr_work;

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* RE: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]     ` <1437548143-24893-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22 16:34       ` Jason Gunthorpe
@ 2015-07-23  0:57       ` Hefty, Sean
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A9001357-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Hefty, Sean @ 2015-07-23  0:57 UTC (permalink / raw)
  To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

> +enum ib_mr_type {
> +	IB_MR_TYPE_FAST_REG,
> +	IB_MR_TYPE_SIGNATURE,

If we're going to go through the trouble of changing everything, I vote for dropping the word 'fast'. It's a marketing term.  It's goofy.  And the IB spec is goofy for using it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]             ` <20150722174401.GG26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23  9:19               ` Christoph Hellwig
       [not found]                 ` <20150723091955.GA32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-07-23 10:15               ` Sagi Grimberg
  1 sibling, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-23  9:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 11:44:01AM -0600, Jason Gunthorpe wrote:
> I was hoping we'd move the DMA flush and translate into here and make
> it mandatory. Is there any reason not to do that?

That would be a reason for passing in a direction, but it would also
up the question on what form we pass that access flag in.  The
old-school RDMA local/remote read/write flags, or a enum_dma_direction
and either a bool or separate functions for lkey/rkey.

Although I wonder if we really need to differentiate between rkey and
leky in this ib_map_mr_sg function, or if we should do it when
allocating the mr, i.e. in ib_alloc_mr.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration
       [not found]         ` <20150722173048.GF26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23  9:25           ` Christoph Hellwig
       [not found]             ` <20150723092532.GC32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-23  9:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 11:30:48AM -0600, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 09:55:40AM +0300, Sagi Grimberg wrote:
> > +	size += max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
> > +	mr->klms = kzalloc(size, GFP_KERNEL);
> > +	if (!mr->klms)
> > +		return -ENOMEM;
> > +
> > +	mr->pl_map = dma_map_single(device->dma_device, mr->klms,
> > +				    size, DMA_TO_DEVICE);
> 
> This is a misuse of the DMA API, you must call dma_map_single after
> the memory is set by the CPU, not before.
>
> The fast reg varient is using coherent allocations, which is OK..

It's fine as long as you dma_sync_*_for_{cpu,device} in the right
places, which is what a lot of drivers do for longer held allocations.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 00/43] New fast registration API
       [not found]         ` <20150722172702.GE26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23  9:26           ` Christoph Hellwig
  0 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-23  9:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 11:27:02AM -0600, Jason Gunthorpe wrote:
> What is SRP trying to accomplish with that?
> 
> The only reason that springs to mind is to emulate IB_MR_MAP_ARB_SG ?

It's not emulating IB_MR_MAP_ARB_SG, it simply allows muliple memory
registrations per I/O request.  Be that to support gappy SGLs in a
generic way, or to allow larger I/O sizes than the HCA MR size.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 00/43] New fast registration API
       [not found]         ` <55AFD608.401-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-23  9:28           ` Christoph Hellwig
       [not found]             ` <20150723092857.GE32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-23  9:28 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 08:42:32PM +0300, Sagi Grimberg wrote:
> We can do that, but I'd prefer not to pollute the API just for this
> single use case. What we can do, is add a pool API that would take care
> of that. But even then we might end up with different strategies as not
> all ULPs can use it the same way (protocol constraints)...
> 
> Today SRP has this logic that registers multiple SG aligned partials.
> We can just have it pass a partial SG list to what we have today instead
> of building the page vectors...
> 
> Or if we can come up with something that will keep the API trivial, we
> can take care of that too.


Supporting an array or list of MRs seems pretty easy.  If you ignore the
weird fallback to physical DMA case when a MR fails case the SRP memory
registration code isn't significanly more complex than that in iSER for
example.  And I think NFS needs the same support as well, as it allows
using additional MRs when detecting a gap.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A9001357-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2015-07-23  9:30           ` Christoph Hellwig
       [not found]             ` <20150723093046.GF32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-23  9:30 UTC (permalink / raw)
  To: Hefty, Sean
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 12:57:34AM +0000, Hefty, Sean wrote:
> > +enum ib_mr_type {
> > +	IB_MR_TYPE_FAST_REG,
> > +	IB_MR_TYPE_SIGNATURE,
> 
> If we're going to go through the trouble of changing everything, I vote
> for dropping the word 'fast'. It's a marketing term.  It's goofy.  And
> the IB spec is goofy for using it.

Yes.  Especially as the infrastructure will be usable to support FMR
on legacy adapters as well except that instead of the ib_post_send it'll
need a call to the FMR code at the very end.

While we're at it  wonder if we should consolidate the type and the
flags field as well, as the split between the two is a little confusing.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]                     ` <20150722190555.GB4527-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23 10:07                       ` Sagi Grimberg
       [not found]                         ` <55B0BCFC.6040602-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 10:05 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 07:58:23PM +0300, Sagi Grimberg wrote:
>> On 7/22/2015 7:44 PM, Christoph Hellwig wrote:
>>> On Wed, Jul 22, 2015 at 10:34:05AM -0600, Jason Gunthorpe wrote:
>>>>> +/**
>>>>> + * ib_alloc_mr() - Allocates a memory region
>>>>> + * @pd:            protection domain associated with the region
>>>>> + * @mr_type:       memory region type
>>>>> + * @max_entries:   maximum registration entries available
>>>>> + * @flags:         create flags
>>>>> + */
>>>>
>>>> Can you update this comment to elaborate some more on what the
>>>> parameters are? 'max_entries' is the number of s/g elements or
>>>> something?
>>>>
>>>>> +enum ib_mr_type {
>>>>> +	IB_MR_TYPE_FAST_REG,
>>>>> +	IB_MR_TYPE_SIGNATURE,
>>>>>   };
>>>>
>>>> Sure would be nice to have some documentation for what these things
>>>> do..
>>>
>>> Agreed on both counts.  Otherwise this looks pretty good to me.
>>
>> I can add some more documentation here...
>
> So, I was wrong, 'max_entries' is the number of page entires, not
> really the s/g element limit?

The max_entries stands for the maximum number of sg entries. Other than
that, the SG list must meet the requirements documented in ib_map_mr_sg.

The reason I named max_entries is because might might not be pages but
real SG elements. It stands for maximum registration entries.

Do you have a better name?

>
> In other words the ULP can submit at most max_entires*PAGE_SIZE bytes
> for the non ARB_SG case
>
> For the ARB_SG case.. It is some other more difficult computation?

Not really. The ULP needs to submit sg_nents < max_entries. The SG
list needs to meed the alignment requirements.

For ARB_SG, the condition is the same, but the SG is free from the
alignment constraints.

>
> It is somewhat ugly to ask for this upfront as a hard limit..
>
> Is there any reason we can't use a hint_prealloc_pages as the argument
> here, and then realloc in the map routine if the hint turns out to be
> too small for a particular s/g list?

The reason is that it is not possible. The memory key allocation
reserves resources in the device translation tables. realloc means
reallocating the memory key. In any event, this is not possible in
the IO path.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]             ` <20150723093046.GF32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-07-23 10:09               ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:09 UTC (permalink / raw)
  To: Christoph Hellwig, Hefty, Sean
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 12:30 PM, Christoph Hellwig wrote:
> On Thu, Jul 23, 2015 at 12:57:34AM +0000, Hefty, Sean wrote:
>>> +enum ib_mr_type {
>>> +	IB_MR_TYPE_FAST_REG,
>>> +	IB_MR_TYPE_SIGNATURE,
>>
>> If we're going to go through the trouble of changing everything, I vote
>> for dropping the word 'fast'. It's a marketing term.  It's goofy.  And
>> the IB spec is goofy for using it.

So IB_MR_TYPE_MEM_REG?

>
> Yes.  Especially as the infrastructure will be usable to support FMR
> on legacy adapters as well except that instead of the ib_post_send it'll
> need a call to the FMR code at the very end.
>
> While we're at it  wonder if we should consolidate the type and the
> flags field as well, as the split between the two is a little confusing.

I can do that.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb
       [not found]                     ` <20150722185410.GA4527-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23 10:10                       ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:10 UTC (permalink / raw)
  To: Jason Gunthorpe, Steve Wise
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 9:54 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 01:50:01PM -0500, Steve Wise wrote:
>
>> 43 patches overflows my stack ;)  I agree with Jason's suggestion.
>
> Saig, you may as well just send the ib_alloc_mr rework as a series and
> get it done with, I'd pass off on the core parts of v2.

I'll split that off from the rest.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]             ` <20150722174401.GG26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-07-23  9:19               ` Christoph Hellwig
@ 2015-07-23 10:15               ` Sagi Grimberg
       [not found]                 ` <55B0BEB4.9080702-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:15 UTC (permalink / raw)
  To: Jason Gunthorpe, Christoph Hellwig
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 8:44 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 09:50:12AM -0700, Christoph Hellwig wrote:
>>> +/**
>>> + * ib_map_mr_sg() - Populates MR with a dma mapped SG list
>>> + * @mr:            memory region
>>> + * @sg:            dma mapped scatterlist
>>> + * @sg_nents:      number of entries in sg
>>> + * @access:        access permissions
>>
>> I know moving the access flags here was my idea originally, but I seem
>> convinced by your argument that it might fit in better with the posting
>> helper.  Or did someone else come up with a better argument that mine
>> for moving it here?
>
> I was hoping we'd move the DMA flush and translate into here and make
> it mandatory. Is there any reason not to do that?

The reason I didn't added it in was so the ULPs can make sure they meet
the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his
SG list set partials and iSER to detect gaps (they need to dma map
for that).

>
>>> +int ib_map_mr_sg(struct ib_mr *mr,
>>> +		 struct scatterlist *sg,
>>> +		 unsigned short sg_nents,
>>> +		 unsigned int access)
>>> +{
>>> +	int rc;
>>> +
>>> +	if (!mr->device->map_mr_sg)
>>> +		return -ENOSYS;
>>> +
>>> +	rc = mr->device->map_mr_sg(mr, sg, sg_nents);
>>
>> Do we really need a driver callout here?  It seems like we should
>
> The call out makes sense to me..
>
> The driver will convert the scatter list directly into whatever HW
> representation it needs and prepare everything for posting. Every
> driver has a different HW format, so it must be a callout.
>
>> Also it seems like this returns 0/-error.  How do callers like SRP
>> see that it only did a partial mapping and it needs another MR?
>
> I would think it is an error to pass in more sg_nents than the MR was
> created with, so SRP should never get a partial mapping as it should
> never ask for more than max_entries.
>
> (? Sagi, did I get the intent of this right?)

Error is returned when:
- sg_nents > max_entries
- sg has gaps
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]         ` <20150722180203.GI26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23 10:19           ` Sagi Grimberg
       [not found]             ` <55B0BFA4.4060509-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:19 UTC (permalink / raw)
  To: Jason Gunthorpe, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 9:02 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 09:55:28AM +0300, Sagi Grimberg wrote:
>> +/**
>> + * ib_map_mr_sg() - Populates MR with a dma mapped SG list
>> + * @mr:            memory region
>> + * @sg:            dma mapped scatterlist
>> + * @sg_nents:      number of entries in sg
>> + * @access:        access permissions
>
> Again, related to my prior comments, please have two of these:
>
> ib_map_mr_sg_rkey()
> ib_map_mr_sg_lkey()
>
> So we force ULPs to think about what they are doing properly, and we
> get a chance to actually force lkey to be local use only for IB.

The lkey/rkey decision is passed in the fastreg post_send().

ib_map_mr_sg is just a mapping API, not the registration itself.

>
>> +static inline void
>> +ib_set_fastreg_wr(struct ib_mr *mr,
>> +		  u32 key,
>
> The key should come from MR. Once the above is split then it is
> obvious which key to use.

IMO, it's obvious as it is. I don't see why should anyone get it
wrong.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
       [not found]         ` <55AFED4C.9040409-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2015-07-23 10:20           ` Sagi Grimberg
       [not found]             ` <55B0C002.60307-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:20 UTC (permalink / raw)
  To: Steve Wise, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: Liran Liss, Oren Duer

On 7/22/2015 10:21 PM, Steve Wise wrote:
>
> On 7/22/2015 1:55 AM, Sagi Grimberg wrote:
>> Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>>   net/sunrpc/xprtrdma/frwr_ops.c  | 80
>> ++++++++++++++++++++++-------------------
>>   net/sunrpc/xprtrdma/xprt_rdma.h |  4 ++-
>>   2 files changed, 47 insertions(+), 37 deletions(-)
>
> Did you intend to change svcrdma as well?

All the ULPs need to convert. I didn't have a chance to convert
svcrdma yet. Want to take it?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                 ` <20150722175755.GH26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23 10:27                   ` Sagi Grimberg
       [not found]                     ` <55B0C18B.4080901-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 8:57 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 08:33:16PM +0300, Sagi Grimberg wrote:
>>>>   	memset(&fr_wr, 0, sizeof(fr_wr));
>>>> +	ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID,
>>>> +			  false, &fr_wr);
>>>
>>> Shouldn't ib_set_fastreg_wr take care of this memset?  Also it seems
>>> instead of the singalled flag to it we might just set that or
>>> other flags later if we really want to.
>
> Seems reasonable.
>
> If you want to micro optimize then just zero the few items that are
> defined to be accessed for fastreg, no need to zero the whole
> structure. Infact, you may have already done that, so just drop the
> memset entirely.

I will.

>
>> The reason I didn't put it in was that ib_send_wr is not a small struct
>> (92 bytes IIRC). So I'm a bit reluctant to add an unconditional memset.
>> Maybe it's better that the callers can carefully set it to save some
>> cycles?
>
> If you want to optimize this path, then Sean is right, move the post
> into the driver and stop pretending that ib_post_send is a performance
> API.
>
> ib_post_fastreg_wr would be a function that needs 3 register passed
> arguments and does a simple copy to the driver's actual sendq

That will require to take the SQ lock and write a doorbell for each
registration and post you want to do. I'm confident that constructing
a post chain with a single sq lock acquire and a single doorbell will
be much much better even with conditional jumps and memsets.

svcrdma, isert (and iser - not upstream yet) are doing it. I think that
others should do it too. My tests shows that this makes a difference in
small IO workloads.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration
       [not found]             ` <20150723092532.GC32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-07-23 10:28               ` Sagi Grimberg
  2015-07-23 16:04               ` Jason Gunthorpe
  1 sibling, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:28 UTC (permalink / raw)
  To: Christoph Hellwig, Jason Gunthorpe
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 12:25 PM, Christoph Hellwig wrote:
> On Wed, Jul 22, 2015 at 11:30:48AM -0600, Jason Gunthorpe wrote:
>> On Wed, Jul 22, 2015 at 09:55:40AM +0300, Sagi Grimberg wrote:
>>> +	size += max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
>>> +	mr->klms = kzalloc(size, GFP_KERNEL);
>>> +	if (!mr->klms)
>>> +		return -ENOMEM;
>>> +
>>> +	mr->pl_map = dma_map_single(device->dma_device, mr->klms,
>>> +				    size, DMA_TO_DEVICE);
>>
>> This is a misuse of the DMA API, you must call dma_map_single after
>> the memory is set by the CPU, not before.
>>
>> The fast reg varient is using coherent allocations, which is OK..
>
> It's fine as long as you dma_sync_*_for_{cpu,device} in the right
> places, which is what a lot of drivers do for longer held allocations.

OK. I'll fix that.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 00/43] New fast registration API
       [not found]             ` <20150723092857.GE32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-07-23 10:34               ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 12:28 PM, Christoph Hellwig wrote:
> On Wed, Jul 22, 2015 at 08:42:32PM +0300, Sagi Grimberg wrote:
>> We can do that, but I'd prefer not to pollute the API just for this
>> single use case. What we can do, is add a pool API that would take care
>> of that. But even then we might end up with different strategies as not
>> all ULPs can use it the same way (protocol constraints)...
>>
>> Today SRP has this logic that registers multiple SG aligned partials.
>> We can just have it pass a partial SG list to what we have today instead
>> of building the page vectors...
>>
>> Or if we can come up with something that will keep the API trivial, we
>> can take care of that too.
>
>
> Supporting an array or list of MRs seems pretty easy.

I'm missing the simplicity here...

> If you ignore the
> weird fallback to physical DMA case when a MR fails case the SRP memory
> registration code isn't significanly more complex than that in iSER for
> example.  And I think NFS needs the same support as well, as it allows
> using additional MRs when detecting a gap.
>

This kinda changing the semantics a bit. With this we need to return a
value of how many MRs used to register. It will also make it a bit
sloppy as the actual mapping is driven from the drivers (which use their
internal buffers).

Don't you think that a separate pool API is better for addressing this?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
       [not found]                 ` <5114D0F0-7C66-4889-85D8-E7297009AF23-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2015-07-23 10:42                   ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:42 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Sagi Grimberg, linux-rdma, Liran Liss, Oren Duer

On 7/22/2015 7:04 PM, Chuck Lever wrote:
>
> On Jul 22, 2015, at 11:41 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>
>>
>>>> +	for (i = 0; i < nsegs;) {
>>>> +		sg_set_page(&frmr->sg[i], seg->mr_page,
>>>> +			    seg->mr_len, offset_in_page(seg->mr_offset));
>>>
>>> Cautionary note: here we’re dealing with both the “contiguous
>>> set of pages” case and the “small region of bytes in a single page”
>>> case. See rpcrdma_convert_iovs(): sometimes RPC send or receive
>>> buffers can be registered (RDMA_NOMSG).
>>
>> I noticed that (I think). I think this is handled correctly.
>> What exactly is the caution note here?
>
> Well the sg is turned into a page list below your API. Just
> want to make sure that we have tested your xprtrdma alterations
> with all the ULP possibilities. When you are further along I
> can pull this and run my functional tests.
>
>
>>>> 	mr = frmr->fr_mr;
>>>> +	access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
>>>> +			   IB_ACCESS_REMOTE_READ;
>>>> +	rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access);
>>>
>>> I like this (and the matching ib_dma_unmap_sg). But why wouldn’t
>>> this function be called ib_dma_map_sg() ? The name ib_map_mr_sg()
>>> had me thinking for a moment that this API actually posted the
>>> FASTREG WR, but I see that it doesn’t.
>>
>> Umm, ib_dma_map_sg is already taken :)
>>
>> This is what I came up with, it maps the SG elements to the MR
>> private context.
>>
>> I'd like to keep the post API for now. It will be possible to
>> to add a wrapper function that would do:
>> - dma_map_sg
>> - ib_map_mr_sg
>> - init fastreg send_wr
>> - post_send (maybe)
>
> Where xprtrdma might improve is by setting up all the FASTREG
> WRs for one RPC with a single chain and post_send. We could do
> that with your INDIR_MR concept, for example.

BTW, it would be great if you can play with it a little bit. I'm more
confident with the iSER part... I added two small fixes when I tested
with mlx4. It seems to work...

>
>
>>>> -	while (seg1->mr_nsegs--)
>>>> -		rpcrdma_unmap_one(ia->ri_device, seg++);
>>>> +	ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir);
>>>
>>> ->mr_dir was previously set by rpcrdma_map_one(), which you’ve replaced
>>> with ib_map_mr_sg(). So maybe frwr_op_map() needs to save “direction”
>>> in the rpcrdma_frmr.
>>
>> Yep, that's correct, if I had turned on dma mapping debug it would shout
>> at me here...
>>
>> Note, I added in the git repo a patch to allow arbitrary sg lists in
>> frwr_op_map() which would allow you to skip the holes check... seems to
>> work with mlx5...
>>
>> I did noticed the mlx4 gives a protection error with after the conversion... I'll look into that...
>
> Should also get Steve and Devesh to try this with their adapters.

Ah, yes please. I've only compiled tested drivers other than mlx4, mlx5
which means there is a 99.9% (probably 100%) that it doesn't work.

It would be great to get help on porting the rest of the ULPs as well,
but that can wait until we converge on the API...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                     ` <55B0C18B.4080901-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-23 13:35                       ` Chuck Lever
  2015-07-23 16:31                       ` Jason Gunthorpe
  1 sibling, 0 replies; 142+ messages in thread
From: Chuck Lever @ 2015-07-23 13:35 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jason Gunthorpe, Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer


On Jul 23, 2015, at 6:27 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:

> On 7/22/2015 8:57 PM, Jason Gunthorpe wrote:
>> On Wed, Jul 22, 2015 at 08:33:16PM +0300, Sagi Grimberg wrote:
>>>>>  	memset(&fr_wr, 0, sizeof(fr_wr));
>>>>> +	ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID,
>>>>> +			  false, &fr_wr);
>>>> 
>>>> Shouldn't ib_set_fastreg_wr take care of this memset?  Also it seems
>>>> instead of the singalled flag to it we might just set that or
>>>> other flags later if we really want to.
>> 
>> Seems reasonable.
>> 
>> If you want to micro optimize then just zero the few items that are
>> defined to be accessed for fastreg, no need to zero the whole
>> structure. Infact, you may have already done that, so just drop the
>> memset entirely.
> 
> I will.
> 
>> 
>>> The reason I didn't put it in was that ib_send_wr is not a small struct
>>> (92 bytes IIRC). So I'm a bit reluctant to add an unconditional memset.
>>> Maybe it's better that the callers can carefully set it to save some
>>> cycles?
>> 
>> If you want to optimize this path, then Sean is right, move the post
>> into the driver and stop pretending that ib_post_send is a performance
>> API.
>> 
>> ib_post_fastreg_wr would be a function that needs 3 register passed
>> arguments and does a simple copy to the driver's actual sendq
> 
> That will require to take the SQ lock and write a doorbell for each
> registration and post you want to do. I'm confident that constructing
> a post chain with a single sq lock acquire and a single doorbell will
> be much much better even with conditional jumps and memsets.

I agree. xprtrdma uses several MRs per RPC. It would be more efficient
to chain together several WRs and post once to deal with these,
especially for HCAs/providers that have a shallow page_list depth.


> svcrdma, isert (and iser - not upstream yet) are doing it. I think that
> others should do it too. My tests shows that this makes a difference in
> small IO workloads.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* RE: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
       [not found]             ` <55B0C002.60307-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-23 13:46               ` Steve Wise
  0 siblings, 0 replies; 142+ messages in thread
From: Steve Wise @ 2015-07-23 13:46 UTC (permalink / raw)
  To: 'Sagi Grimberg', 'Sagi Grimberg',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
  Cc: 'Liran Liss', 'Oren Duer'



> -----Original Message-----
> From: Sagi Grimberg [mailto:sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org]
> Sent: Thursday, July 23, 2015 5:21 AM
> To: Steve Wise; Sagi Grimberg; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Cc: Liran Liss; Oren Duer
> Subject: Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API
> 
> On 7/22/2015 10:21 PM, Steve Wise wrote:
> >
> > On 7/22/2015 1:55 AM, Sagi Grimberg wrote:
> >> Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >> ---
> >>   net/sunrpc/xprtrdma/frwr_ops.c  | 80
> >> ++++++++++++++++++++++-------------------
> >>   net/sunrpc/xprtrdma/xprt_rdma.h |  4 ++-
> >>   2 files changed, 47 insertions(+), 37 deletions(-)
> >
> > Did you intend to change svcrdma as well?
> 
> All the ULPs need to convert. I didn't have a chance to convert
> svcrdma yet. Want to take it?

Not right now.  My focus is still on enabling iSER.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                 ` <20150723091955.GA32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-07-23 16:03                   ` Jason Gunthorpe
  0 siblings, 0 replies; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 16:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 02:19:55AM -0700, Christoph Hellwig wrote:
> Although I wonder if we really need to differentiate between rkey and
> leky in this ib_map_mr_sg function, or if we should do it when
> allocating the mr, i.e. in ib_alloc_mr.

The allocation is agnostic to the usage, the map is what solidifies
things into a certain use, effectively based on the access flags..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration
       [not found]             ` <20150723092532.GC32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-07-23 10:28               ` Sagi Grimberg
@ 2015-07-23 16:04               ` Jason Gunthorpe
  1 sibling, 0 replies; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 16:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 02:25:32AM -0700, Christoph Hellwig wrote:
> On Wed, Jul 22, 2015 at 11:30:48AM -0600, Jason Gunthorpe wrote:
> > On Wed, Jul 22, 2015 at 09:55:40AM +0300, Sagi Grimberg wrote:
> > > +	size += max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0);
> > > +	mr->klms = kzalloc(size, GFP_KERNEL);
> > > +	if (!mr->klms)
> > > +		return -ENOMEM;
> > > +
> > > +	mr->pl_map = dma_map_single(device->dma_device, mr->klms,
> > > +				    size, DMA_TO_DEVICE);
> > 
> > This is a misuse of the DMA API, you must call dma_map_single after
> > the memory is set by the CPU, not before.
> >
> > The fast reg varient is using coherent allocations, which is OK..
> 
> It's fine as long as you dma_sync_*_for_{cpu,device} in the right
> places, which is what a lot of drivers do for longer held allocations.

Right, that is the other better option.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]             ` <55B0BFA4.4060509-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-23 16:14               ` Jason Gunthorpe
       [not found]                 ` <20150723161436.GC25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 16:14 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 01:19:16PM +0300, Sagi Grimberg wrote:
> >Again, related to my prior comments, please have two of these:
> >
> >ib_map_mr_sg_rkey()
> >ib_map_mr_sg_lkey()
> >
> >So we force ULPs to think about what they are doing properly, and we
> >get a chance to actually force lkey to be local use only for IB.
> 
> The lkey/rkey decision is passed in the fastreg post_send().

That is too late to check the access flags.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                     ` <55B0C18B.4080901-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-07-23 13:35                       ` Chuck Lever
@ 2015-07-23 16:31                       ` Jason Gunthorpe
       [not found]                         ` <20150723163124.GD25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 16:31 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 01:27:23PM +0300, Sagi Grimberg wrote:
> >ib_post_fastreg_wr would be a function that needs 3 register passed
> >arguments and does a simple copy to the driver's actual sendq
> 
> That will require to take the SQ lock and write a doorbell for each
> registration and post you want to do. I'm confident that constructing
> a post chain with a single sq lock acquire and a single doorbell will
> be much much better even with conditional jumps and memsets.

You are still thinking at a micro level, the ULP should be working at
a higher level and requesting the MR(s) and the actual work together
so the driver can run the the whole chain of posts without extra stack
traffic, locking or doorbells.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                 ` <20150723161436.GC25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23 16:47                   ` Sagi Grimberg
       [not found]                     ` <55B11A92.9040406-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 16:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 7:14 PM, Jason Gunthorpe wrote:
> On Thu, Jul 23, 2015 at 01:19:16PM +0300, Sagi Grimberg wrote:
>>> Again, related to my prior comments, please have two of these:
>>>
>>> ib_map_mr_sg_rkey()
>>> ib_map_mr_sg_lkey()
>>>
>>> So we force ULPs to think about what they are doing properly, and we
>>> get a chance to actually force lkey to be local use only for IB.
>>
>> The lkey/rkey decision is passed in the fastreg post_send().
>
> That is too late to check the access flags.

Why? the access permissions are kept in the mr context?
I can move it to the post interface if it makes more sense.
the access is kind of out of place in the mapping routine anyway...

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                         ` <20150723163124.GD25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23 16:59                           ` Sagi Grimberg
       [not found]                             ` <55B11D84.102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 16:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 7:31 PM, Jason Gunthorpe wrote:
> On Thu, Jul 23, 2015 at 01:27:23PM +0300, Sagi Grimberg wrote:
>>> ib_post_fastreg_wr would be a function that needs 3 register passed
>>> arguments and does a simple copy to the driver's actual sendq
>>
>> That will require to take the SQ lock and write a doorbell for each
>> registration and post you want to do. I'm confident that constructing
>> a post chain with a single sq lock acquire and a single doorbell will
>> be much much better even with conditional jumps and memsets.
>
> You are still thinking at a micro level, the ULP should be working at
> a higher level and requesting the MR(s) and the actual work together
> so the driver can run the the whole chain of posts without extra stack
> traffic, locking or doorbells.

But I'd also want to chain the subsequent RDMA(s) or SEND (with the
rkey(s) under the same post.

I'm sorry but the idea of handling memory region mapping (possibly more
than one), detecting gaps and deciding on the strategy of what to do
and who knows what else under the send queue lock doesn't seem like a
good idea, its a complete overkill IMO.

I don't mean to be negative about your ideas, I just don't think that
doing all the work in the drivers is going to get us to a better place.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                 ` <55B0BEB4.9080702-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-23 17:55                   ` Jason Gunthorpe
       [not found]                     ` <20150723175535.GE25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-07-23 18:42                   ` Jason Gunthorpe
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 17:55 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote:
> >I was hoping we'd move the DMA flush and translate into here and make
> >it mandatory. Is there any reason not to do that?
> 
> The reason I didn't added it in was so the ULPs can make sure they meet
> the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his
> SG list set partials and iSER to detect gaps (they need to dma map
> for that).

The ULP can always get the sg list's virtual address to check for
gaps. Page aligned gaps are always OK.

BTW, the logic in ib_sg_to_pages should be checking that directly, as
coded, it won't work with swiotlb:

// Only the first SG entry can start unaligned
if (i && page_addr != dma_addr)
    return EINVAL;
// Only the last SG entry can end unaligned
if ((page_addr + dma_len) & PAGE_MASK != end_dma_addr)
 if (!is_last)
     return EINVAL;

Don't use sg->offset after dma mapping.

The biggest problem with checking the virtual address is
swiotlb. However, if swiotlb is used this API is basically broken as
swiotlb downgrades everything to a 2k alignment, which means we only
ever get 1 s/g entry.

To efficiently support swiotlb we'd need to see the driver be able to
work with a page size of IO_TLB_SEGSIZE (2k) so it can handle the
de-alignment that happens during bouncing.

My biggest problem with pushing the dma address up to the ULP is
basically that: The ULP has no idea what the driver can handle, maybe
the driver can handle the 2k pages.

So, that leaves a flow where the ULP does a basic sanity check on the
virtual side, then asks the IB core to map it. The mapping could still
fail because of swiotlb.

If the mapping fails, then the ULP has to bounce buffer, or MR split,
or totally fail.

For bounce buffer, all solutions have a DMA map/unmap cost, so it
doesn't matter if ib_map_mr_sg does that internally.

For MR fragment, the DMA mapping is still usable. Maybe we do need a
slightly different core API to help MR fragmentation? Sounds like NFS
uses this too?

 num_mrs = ib_mr_fragment_sg(&scatterlist);
 while (..)
   ib_map_fragment_sg(mr[i++],&scatterlist,&offset);

Perhaps?

Maybe that is even better because something like iser could do
the parallel:
 ib_mr_needs_fragment_sg(reference_mr,&scatterlist)

Which hides all the various restrictions in driver code.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                 ` <55B0BEB4.9080702-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-07-23 17:55                   ` Jason Gunthorpe
@ 2015-07-23 18:42                   ` Jason Gunthorpe
       [not found]                     ` <20150723184221.GA30303-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 18:42 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote:
> On 7/22/2015 8:44 PM, Jason Gunthorpe wrote:
> >On Wed, Jul 22, 2015 at 09:50:12AM -0700, Christoph Hellwig wrote:
> >>>+/**
> >>>+ * ib_map_mr_sg() - Populates MR with a dma mapped SG list
> >>>+ * @mr:            memory region
> >>>+ * @sg:            dma mapped scatterlist
> >>>+ * @sg_nents:      number of entries in sg
> >>>+ * @access:        access permissions
> >>
> >>I know moving the access flags here was my idea originally, but I seem
> >>convinced by your argument that it might fit in better with the posting
> >>helper.  Or did someone else come up with a better argument that mine
> >>for moving it here?
> >
> >I was hoping we'd move the DMA flush and translate into here and make
> >it mandatory. Is there any reason not to do that?
> 
> The reason I didn't added it in was so the ULPs can make sure they meet
> the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his
> SG list set partials and iSER to detect gaps (they need to dma map
> for that).

I would like to see the kdoc for ib_map_mr_sg explain exactly what is
required of the caller, maybe just hoist this bit from the
ib_sg_to_pages

Not entirely required if we are going to have an API to do the test..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                     ` <55B11A92.9040406-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-23 18:51                       ` Jason Gunthorpe
       [not found]                         ` <20150723185126.GA31346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 18:51 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 07:47:14PM +0300, Sagi Grimberg wrote:

> >>>So we force ULPs to think about what they are doing properly, and we
> >>>get a chance to actually force lkey to be local use only for IB.
> >>
> >>The lkey/rkey decision is passed in the fastreg post_send().
> >
> >That is too late to check the access flags.
> 
> Why? the access permissions are kept in the mr context?

Sure, one could do if (key == mr->lkey) .. check lkey flags in the
post, but that seems silly considering we want the post inlined..

> I can move it to the post interface if it makes more sense.
> the access is kind of out of place in the mapping routine anyway...

All the dma routines have an access equivalent during map, I don't
think it is out of place..

To my mind, the map is the point where the MR should crystallize into
an rkey or lkey MR, not at the post.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                             ` <55B11D84.102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-23 18:53                               ` Jason Gunthorpe
       [not found]                                 ` <20150723185334.GB31346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 18:53 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 07:59:48PM +0300, Sagi Grimberg wrote:
> I don't mean to be negative about your ideas, I just don't think that
> doing all the work in the drivers is going to get us to a better place.

No worries, I'm hoping someone can put the peices together and figure
out how to code share all the duplication we seem to have in the ULPs.

The more I've look at them, the more it seems like they get basic
things wrong, like SQE accouting in NFS, dma flush ordering in NFS,
rkey security in SRP/iSER..

Sharing code means we can fix those problems for good.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]                         ` <55B0BCFC.6040602-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-23 19:08                           ` Jason Gunthorpe
       [not found]                             ` <20150723190855.GB31577-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 19:08 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 01:07:56PM +0300, Sagi Grimberg wrote:
> On 7/22/2015 10:05 PM, Jason Gunthorpe wrote:
> The reason I named max_entries is because might might not be pages but
> real SG elements. It stands for maximum registration entries.
> 
> Do you have a better name?

I wouldn't try and be both..

Use 'max_num_sg' and document that no aggregate scatterlist with
length larger than 'max_num_sg*PAGE_SIZE' or with more entries than
max_num_sg can be submitted?

Maybe document with ARB_SG that it is not length limited?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                 ` <20150723185334.GB31346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-24 14:36                                   ` Chuck Lever
       [not found]                                     ` <DE0226A1-A7FC-4618-91F1-FE34347C252A-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Chuck Lever @ 2015-07-24 14:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer


On Jul 23, 2015, at 2:53 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:

> On Thu, Jul 23, 2015 at 07:59:48PM +0300, Sagi Grimberg wrote:
>> I don't mean to be negative about your ideas, I just don't think that
>> doing all the work in the drivers is going to get us to a better place.
> 
> No worries, I'm hoping someone can put the peices together and figure
> out how to code share all the duplication we seem to have in the ULPs.
> 
> The more I've look at them, the more it seems like they get basic
> things wrong, like SQE accouting in NFS, dma flush ordering in NFS,

I have a work-in-progress prototype that addresses both of these issues.

Unfinished, but operational:

http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future

Having this should give us time to analyze the performance impact
of these changes, and to dial in an approach that aligns with your
vision about the unified APIs that you and Sagi have been
discussing.

FRWR is seeing a 10-15% throughput reduction with 8-thread dbench,
but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct
read and write are negatively impacted.

I don’t see any significant change in client CPU utilization, but
have not yet examined changes in interrupt workload, nor have I
done any spin lock or CPU bus traffic analysis.

But none of this is as bad as I feared it could be. There are
plenty of other areas that can recoup some or all of this loss
eventually.

I converted the RPC reply handler tasklet to a work queue context
to allow sleeping. A new .ro_unmap_sync method is invoked after
the RPC/RDMA header is parsed but before xprt_complete_rqst()
wakes up the waiting RPC.

.ro_unmap_sync is 100% synchronous. It does not return to the
reply handler until the MRs are invalid and unmapped.

For FMR, .ro_unmap_sync makes a list of the RPC’s MRs and passes
that list to a single ib_unmap_fmr() call, then performs DMA
unmap and releases the MRs.

This is actually much more efficient than the current logic,
which serially does an ib_unmap_fmr() for each MR the RPC owns.
So FMR overall performs better with this change.

For FRWR, .ro_unmap_sync builds a chain of LOCAL_INV WRs for the
RPC’s MRs and posts that with a single ib_post_send(). The final
WR in the chain is signaled. A kernel completion is used to wait
for the LINV chain to complete. Then DMA unmap and MR release.

This lengthens per-RPC latency for FRWR, because the LINVs are
now fully accounted for in the RPC round-trip rather than being
done asynchronously after the RPC completes. So here performance
is closer to FMR, but is still better by a substantial margin.

Because the next RPC cannot awaken until the last send completes,
send queue accounting is based on RPC/RDMA credit flow control.
I’m sure there are some details here that still need to be
addressed, but this fixes the big problem with FRWR send queue
accounting, which was that LOCAL_INV WRs would continue to
consume SQEs while another RPC was allowed to start.

I think switching to use s/g lists will be straightforward and
could simplify the overall approach somewhat.


> rkey security in SRP/iSER..
> 
> Sharing code means we can fix those problems for good.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                     ` <DE0226A1-A7FC-4618-91F1-FE34347C252A-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2015-07-24 16:26                                       ` Jason Gunthorpe
       [not found]                                         ` <20150724162657.GA21473-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-24 16:26 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer

On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote:

> Unfinished, but operational:
> 
> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future

Nice..

Can you spend some time and reflect on how some of this could be
lowered into the core code? The FMR and FRWR side have many
similarities now..

> FRWR is seeing a 10-15% throughput reduction with 8-thread dbench,
> but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct
> read and write are negatively impacted.

I'm not surprised since invalidate is sync. I belive you need to
incorporate SEND WITH INVALIDATE to substantially recover this
overhead.

It would be neat if the RQ could continue to advance while waiting for
the invalidate.. That looks almost doable..

> I converted the RPC reply handler tasklet to a work queue context
> to allow sleeping. A new .ro_unmap_sync method is invoked after
> the RPC/RDMA header is parsed but before xprt_complete_rqst()
> wakes up the waiting RPC.

.. so the issue is the RPC must be substantially parsed to learn which
MR it is associated with to schedule the invalidate? 

> This is actually much more efficient than the current logic,
> which serially does an ib_unmap_fmr() for each MR the RPC owns.
> So FMR overall performs better with this change.

Interesting..

> Because the next RPC cannot awaken until the last send completes,
> send queue accounting is based on RPC/RDMA credit flow control.

So for FRWR the sync invalidate effectively guarentees all SQEs
related to this RPC are flushed. That seems reasonable, if the number
of SQEs and CQEs are properly sized in relation to the RPC slot count
it should be workable..

How does FMR and PHYS synchronize?

> I’m sure there are some details here that still need to be
> addressed, but this fixes the big problem with FRWR send queue
> accounting, which was that LOCAL_INV WRs would continue to
> consume SQEs while another RPC was allowed to start.

Did you test without that artificial limit you mentioned before?

I'm also wondering about this:

> During some other testing I found that when a completion upcall
> returns to the provider leaving CQEs still on the completion queue,
> there is a non-zero probability that a completion will be lost.

What does lost mean?

The CQ is edge triggered, so if you don't drain it you might not get
another timely CQ callback (which is bad), but CQEs themselves should
not be lost.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* RE: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                         ` <20150724162657.GA21473-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-24 16:34                                           ` Steve Wise
  2015-07-24 17:46                                           ` Chuck Lever
  1 sibling, 0 replies; 142+ messages in thread
From: Steve Wise @ 2015-07-24 16:34 UTC (permalink / raw)
  To: 'Jason Gunthorpe', 'Chuck Lever'
  Cc: 'Sagi Grimberg', 'Christoph Hellwig',
	'linux-rdma', 'Liran Liss', 'Oren Duer'



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Jason Gunthorpe
> Sent: Friday, July 24, 2015 11:27 AM
> To: Chuck Lever
> Cc: Sagi Grimberg; Christoph Hellwig; linux-rdma; Liran Liss; Oren Duer
> Subject: Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
> 
> On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote:
> 
> > Unfinished, but operational:
> >
> > http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future
> 
> Nice..
> 
> Can you spend some time and reflect on how some of this could be
> lowered into the core code? The FMR and FRWR side have many
> similarities now..
> 
> > FRWR is seeing a 10-15% throughput reduction with 8-thread dbench,
> > but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct
> > read and write are negatively impacted.
> 
> I'm not surprised since invalidate is sync. I belive you need to
> incorporate SEND WITH INVALIDATE to substantially recover this
> overhead.
> 
> It would be neat if the RQ could continue to advance while waiting for
> the invalidate.. That looks almost doable..
> 
> > I converted the RPC reply handler tasklet to a work queue context
> > to allow sleeping. A new .ro_unmap_sync method is invoked after
> > the RPC/RDMA header is parsed but before xprt_complete_rqst()
> > wakes up the waiting RPC.
> 
> .. so the issue is the RPC must be substantially parsed to learn which
> MR it is associated with to schedule the invalidate?
> 
> > This is actually much more efficient than the current logic,
> > which serially does an ib_unmap_fmr() for each MR the RPC owns.
> > So FMR overall performs better with this change.
> 
> Interesting..
> 
> > Because the next RPC cannot awaken until the last send completes,
> > send queue accounting is based on RPC/RDMA credit flow control.
> 
> So for FRWR the sync invalidate effectively guarentees all SQEs
> related to this RPC are flushed. That seems reasonable, if the number
> of SQEs and CQEs are properly sized in relation to the RPC slot count
> it should be workable..
> 
> How does FMR and PHYS synchronize?
> 
> > I’m sure there are some details here that still need to be
> > addressed, but this fixes the big problem with FRWR send queue
> > accounting, which was that LOCAL_INV WRs would continue to
> > consume SQEs while another RPC was allowed to start.
> 
> Did you test without that artificial limit you mentioned before?
> 
> I'm also wondering about this:
> 
> > During some other testing I found that when a completion upcall
> > returns to the provider leaving CQEs still on the completion queue,
> > there is a non-zero probability that a completion will be lost.
> 
> What does lost mean?
> 
> The CQ is edge triggered, so if you don't drain it you might not get
> another timely CQ callback (which is bad), but CQEs themselves should
> not be lost.
> 

This condition (not fully draining the CQEs) is due to SQ flow control, yes?  If so, then when the SQ resumes can it wake up the appropriate thread (simulating another CQE insertion)?




--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                         ` <20150724162657.GA21473-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-07-24 16:34                                           ` Steve Wise
@ 2015-07-24 17:46                                           ` Chuck Lever
       [not found]                                             ` <903CDFB5-04FE-47B6-B044-E960E8A8BC4C-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Chuck Lever @ 2015-07-24 17:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer


On Jul 24, 2015, at 12:26 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:

> On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote:
> 
>> Unfinished, but operational:
>> 
>> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future
> 
> Nice..
> 
> Can you spend some time and reflect on how some of this could be
> lowered into the core code?

The point of the prototype is to start thinking about this with
actual data. :-) So I’m with you.


> The FMR and FRWR side have many
> similarities now..


>> FRWR is seeing a 10-15% throughput reduction with 8-thread dbench,
>> but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct
>> read and write are negatively impacted.
> 
> I'm not surprised since invalidate is sync. I belive you need to
> incorporate SEND WITH INVALIDATE to substantially recover this
> overhead.

I tried to find another kernel ULP using SEND WITH INVALIDATE, but
I didn’t see one. I assume you mean the NFS server would use this
WR when replying, to knock down the RPC’s client MRs remotely?


> It would be neat if the RQ could continue to advance while waiting for
> the invalidate.. That looks almost doable..

The new reply handling work queue is not restricted to serial reply
processing. Unlike the tasklet model, multiple RPC replies can be
processed at once, and can run across all CPUs.

The tasklet was global, shared across all RPC/RDMA receive queues on
that client. AFAICT there is very little else that is shared between
RPC replies.

I think using a work queue instead may be a tiny bit slower for each
RPC (perhaps due to additional context switching), but will allow
much better scaling with the number of transports and mount points
the client creates.

I may not have understood your comment.


>> I converted the RPC reply handler tasklet to a work queue context
>> to allow sleeping. A new .ro_unmap_sync method is invoked after
>> the RPC/RDMA header is parsed but before xprt_complete_rqst()
>> wakes up the waiting RPC.
> 
> .. so the issue is the RPC must be substantially parsed to learn which
> MR it is associated with to schedule the invalidate? 

Only the RPC/RDMA header has to be parsed, but yes. The needed
parsing is handled in rpcrdma_reply_handler right before the
.ro_unmap_unsync call.

Parsing the RPC reply results is then done by the upper layer
once xprt_complete_rqst() has run.


>> This is actually much more efficient than the current logic,
>> which serially does an ib_unmap_fmr() for each MR the RPC owns.
>> So FMR overall performs better with this change.
> 
> Interesting..
> 
>> Because the next RPC cannot awaken until the last send completes,
>> send queue accounting is based on RPC/RDMA credit flow control.
> 
> So for FRWR the sync invalidate effectively guarentees all SQEs
> related to this RPC are flushed. That seems reasonable, if the number
> of SQEs and CQEs are properly sized in relation to the RPC slot count
> it should be workable..

Yes, both queues are sized in rpcrdma_ep_create() according to
the slot count / credit limit.


> How does FMR and PHYS synchronize?

We still rely on timing there.

The RPC's send buffer may be re-allocated by the next RPC
if that RPC wants to send a bigger request than this one. Thus
there is still a tiny but non-zero risk the HCA may not be
done with that send buffer. Closing that hole is still on my
to-do list.


>> I’m sure there are some details here that still need to be
>> addressed, but this fixes the big problem with FRWR send queue
>> accounting, which was that LOCAL_INV WRs would continue to
>> consume SQEs while another RPC was allowed to start.
> 
> Did you test without that artificial limit you mentioned before?

Yes. No problems now, the limit is removed in the last patch
in that series.


> I'm also wondering about this:
> 
>> During some other testing I found that when a completion upcall
>> returns to the provider leaving CQEs still on the completion queue,
>> there is a non-zero probability that a completion will be lost.
> 
> What does lost mean?

Lost means a WC in the CQ is skipped by ib_poll_cq().

In other words, I expected that during the next upcall,
ib_poll_cq() would return WCs that were not processed, starting
with the last one on the CQ when my upcall handler returned.

I found this by intentionally having the completion handler
process only one or two WCs and then return.


> The CQ is edge triggered, so if you don't drain it you might not get
> another timely CQ callback (which is bad), but CQEs themselves should
> not be lost.

I’m not sure I fully understand this problem, it might
even be my misuderstanding about ib_poll_cq(). But forcing
the completion upcall handler to completely drain the CQ
during each upcall prevents the issue.

(Note, I don’t think fixing this is a pre-requisite for
the synchronous invalidate work, but it just happened
to be in the patch queue).


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                             ` <903CDFB5-04FE-47B6-B044-E960E8A8BC4C-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2015-07-24 19:10                                               ` Jason Gunthorpe
       [not found]                                                 ` <20150724191003.GA26225-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-07-27 15:57                                               ` Chuck Lever
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-24 19:10 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer

On Fri, Jul 24, 2015 at 01:46:05PM -0400, Chuck Lever wrote:
> > I'm not surprised since invalidate is sync. I belive you need to
> > incorporate SEND WITH INVALIDATE to substantially recover this
> > overhead.
> 
> I tried to find another kernel ULP using SEND WITH INVALIDATE, but
> I didn’t see one. I assume you mean the NFS server would use this
> WR when replying, to knock down the RPC’s client MRs remotely?

Yes. I think the issue with it not being used in the kernel is mainly
to do with lack of standardization. The verb cannot be used unless
both sides negotiate it and perhaps the older RDMA protocols have not
been revised to include that.

For simple testing purposes it shouldn't be too hard to force it to
get an idea if it is worth perusing. On the RECV work completion check
if the right rkey was invalidated and skip the invalidation
step. Presumably the HCA does all this internally very quickly..
 
> I may not have understood your comment.

Okay, I didn't looke closely at the entire series together..

> Only the RPC/RDMA header has to be parsed, but yes. The needed
> parsing is handled in rpcrdma_reply_handler right before the
> .ro_unmap_unsync call.

Right, okay, if this could be done in the rq callback itself rather
than bounce to a wq and immediately turn around the needed invalidate
posts you'd get back a little more overhead by reducing the time to
turn it around... Then bounce to the wq to complete from the SQ
callback ?

> > Did you test without that artificial limit you mentioned before?
> 
> Yes. No problems now, the limit is removed in the last patch
> in that series.

Okay, so that was just overflowing the sq due to not accounting..

> >> During some other testing I found that when a completion upcall
> >> returns to the provider leaving CQEs still on the completion queue,
> >> there is a non-zero probability that a completion will be lost.
> > 
> > What does lost mean?
> 
> Lost means a WC in the CQ is skipped by ib_poll_cq().
> 
> In other words, I expected that during the next upcall,
> ib_poll_cq() would return WCs that were not processed, starting
> with the last one on the CQ when my upcall handler returned.

Yes, this is what it should do. I wouldn't expect a timely upcall, but
none should be lost.

> I found this by intentionally having the completion handler
> process only one or two WCs and then return.
> 
> > The CQ is edge triggered, so if you don't drain it you might not get
> > another timely CQ callback (which is bad), but CQEs themselves should
> > not be lost.
> 
> I’m not sure I fully understand this problem, it might
> even be my misuderstanding about ib_poll_cq(). But forcing
> the completion upcall handler to completely drain the CQ
> during each upcall prevents the issue.

CQs should never be lost.

The idea that you can completely drain the CQ during the upcall is
inherently racey, so this cannot be the answer to whatever the problem
is..

Is there any chance this is still an artifact of the lazy SQE flow
control? The RDMA buffer SQE recycling is solved by the sync
invalidate, but workloads that don't use RDMA buffers (ie SEND only)
will still run without proper flow control...

If you are totally certain a CQ was dropped from ib_poll_cq, and that
the SQ is not overflowing by strict accounting, then I'd say driver
problem, but the odds of having an undetected driver problem like that
at this point seem somehow small...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                                 ` <20150724191003.GA26225-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-24 19:59                                                   ` Chuck Lever
       [not found]                                                     ` <A1A0BF6E-992A-4B34-8D24-EA8AA8D6983B-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Chuck Lever @ 2015-07-24 19:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer


On Jul 24, 2015, at 3:10 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:

> On Fri, Jul 24, 2015 at 01:46:05PM -0400, Chuck Lever wrote:
>>> I'm not surprised since invalidate is sync. I belive you need to
>>> incorporate SEND WITH INVALIDATE to substantially recover this
>>> overhead.
>> 
>> I tried to find another kernel ULP using SEND WITH INVALIDATE, but
>> I didn’t see one. I assume you mean the NFS server would use this
>> WR when replying, to knock down the RPC’s client MRs remotely?
> 
> Yes. I think the issue with it not being used in the kernel is mainly
> to do with lack of standardization. The verb cannot be used unless
> both sides negotiate it and perhaps the older RDMA protocols have not
> been revised to include that.

And RPC-over-RDMA version 1 does not have any way to signal that
the server has invalidated the MRs. Such signaling would be a
pre-requisite to allow the Linux NFS/RDMA client to interoperate
with non-Linux NFS/RDMA servers that do not have such support.


>> Only the RPC/RDMA header has to be parsed, but yes. The needed
>> parsing is handled in rpcrdma_reply_handler right before the
>> .ro_unmap_unsync call.
> 
> Right, okay, if this could be done in the rq callback itself rather
> than bounce to a wq and immediately turn around the needed invalidate
> posts you'd get back a little more overhead by reducing the time to
> turn it around... Then bounce to the wq to complete from the SQ
> callback ?

For FRWR, you could post LINV from the receive completion upcall
handler, and handle the rest of the invalidation from the send
completion upcall, then poke the RPC reply handler.

But this wouldn’t work at all for FMR, whose unmap verb is
synchronous, would it?

I’m not sure we’d buy more than a few microseconds here, and
the receive upcall is single-threaded.

I’ll move the “lost WC” discussion to another thread.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                                     ` <A1A0BF6E-992A-4B34-8D24-EA8AA8D6983B-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2015-07-24 20:24                                                       ` Jason Gunthorpe
       [not found]                                                         ` <20150724202445.GA28033-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-24 20:24 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer

On Fri, Jul 24, 2015 at 03:59:06PM -0400, Chuck Lever wrote:
> And RPC-over-RDMA version 1 does not have any way to signal that
> the server has invalidated the MRs. Such signaling would be a
> pre-requisite to allow the Linux NFS/RDMA client to interoperate
> with non-Linux NFS/RDMA servers that do not have such support.

You can implement client support immediately, nothing special is
required.

When processing a SEND WC check ex.invalidate_rkey and
IB_WC_WITH_INVALIDATE. If that rkey matches the MR associated with
that RPC slot then skip the invalidate.

No protocol negotiation is required at that point.

I am unclear what happens sever side if the server starts issuing
SEND_WITH_INVALIDATE to a client that doesn't expect it. The net
result is a MR would be invalidated twice. I don't know if this is OK
or not.

If it is OK, then the server can probably just start using it as
well without negotiation.

Otherwise the client has to signal the server it supports it once at
connection setup.

> For FRWR, you could post LINV from the receive completion upcall
> handler, and handle the rest of the invalidation from the send
> completion upcall, then poke the RPC reply handler.

Yes

> But this wouldn’t work at all for FMR, whose unmap verb is
> synchronous, would it?

It could run the FMR unmap in a thread/workqueue/tasklet and then
complete the RPC side from that context. Same basic idea, using your
taslket not the driver's sendq context.

> I’m not sure we’d buy more than a few microseconds here, and
> the receive upcall is single-threaded.

Not sure on how that matches your performance goals, just remarking
that lauching the invalidate in the recv upcall and completing
processing from the sendq upcall is the very best performance you can
expect from this API.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* RE: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                                         ` <20150724202445.GA28033-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-24 22:13                                                           ` Steve Wise
  2015-07-24 22:44                                                             ` Jason Gunthorpe
  0 siblings, 1 reply; 142+ messages in thread
From: Steve Wise @ 2015-07-24 22:13 UTC (permalink / raw)
  To: 'Jason Gunthorpe', 'Chuck Lever'
  Cc: 'Sagi Grimberg', 'Christoph Hellwig',
	'linux-rdma', 'Liran Liss', 'Oren Duer'



> -----Original Message-----
> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Jason Gunthorpe
> Sent: Friday, July 24, 2015 3:25 PM
> To: Chuck Lever
> Cc: Sagi Grimberg; Christoph Hellwig; linux-rdma; Liran Liss; Oren Duer
> Subject: Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
> 
> On Fri, Jul 24, 2015 at 03:59:06PM -0400, Chuck Lever wrote:
> > And RPC-over-RDMA version 1 does not have any way to signal that
> > the server has invalidated the MRs. Such signaling would be a
> > pre-requisite to allow the Linux NFS/RDMA client to interoperate
> > with non-Linux NFS/RDMA servers that do not have such support.
> 
> You can implement client support immediately, nothing special is
> required.
> 
> When processing a SEND WC check ex.invalidate_rkey and
> IB_WC_WITH_INVALIDATE. If that rkey matches the MR associated with
> that RPC slot then skip the invalidate.
> 
> No protocol negotiation is required at that point.
> 
> I am unclear what happens sever side if the server starts issuing
> SEND_WITH_INVALIDATE to a client that doesn't expect it. The net
> result is a MR would be invalidated twice. I don't know if this is OK
> or not.
> 

It is ok to invalidate an already-invalid MR.

> If it is OK, then the server can probably just start using it as
> well without negotiation.
> 
> Otherwise the client has to signal the server it supports it once at
> connection setup.
> 
> > For FRWR, you could post LINV from the receive completion upcall
> > handler, and handle the rest of the invalidation from the send
> > completion upcall, then poke the RPC reply handler.
> 
> Yes
> 
> > But this wouldn’t work at all for FMR, whose unmap verb is
> > synchronous, would it?
> 
> It could run the FMR unmap in a thread/workqueue/tasklet and then
> complete the RPC side from that context. Same basic idea, using your
> taslket not the driver's sendq context.
> 
> > I’m not sure we’d buy more than a few microseconds here, and
> > the receive upcall is single-threaded.
> 
> Not sure on how that matches your performance goals, just remarking
> that lauching the invalidate in the recv upcall and completing
> processing from the sendq upcall is the very best performance you can
> expect from this API.
> 
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
  2015-07-24 22:13                                                           ` Steve Wise
@ 2015-07-24 22:44                                                             ` Jason Gunthorpe
  0 siblings, 0 replies; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-24 22:44 UTC (permalink / raw)
  To: Steve Wise
  Cc: 'Chuck Lever', 'Sagi Grimberg',
	'Christoph Hellwig', 'linux-rdma',
	'Liran Liss', 'Oren Duer'

> > I am unclear what happens sever side if the server starts issuing
> > SEND_WITH_INVALIDATE to a client that doesn't expect it. The net
> > result is a MR would be invalidated twice. I don't know if this is OK
> > or not.
> 
> It is ok to invalidate an already-invalid MR.

Nice, ah but I forgot about the last issue..

A server must not send the SEND_WITH_INVALIDATE OP to a client HCA
that does not support it in HW. At least on IB the operation code is
different, so it will break..

So negotiation is needed..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
       [not found]                             ` <20150723190855.GB31577-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-26  8:51                               ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-26  8:51 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 10:08 PM, Jason Gunthorpe wrote:
> On Thu, Jul 23, 2015 at 01:07:56PM +0300, Sagi Grimberg wrote:
>> On 7/22/2015 10:05 PM, Jason Gunthorpe wrote:
>> The reason I named max_entries is because might might not be pages but
>> real SG elements. It stands for maximum registration entries.
>>
>> Do you have a better name?
>
> I wouldn't try and be both..
>
> Use 'max_num_sg' and document that no aggregate scatterlist with
> length larger than 'max_num_sg*PAGE_SIZE' or with more entries than
> max_num_sg can be submitted?
>
> Maybe document with ARB_SG that it is not length limited?

OK, I can do that.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                     ` <20150723184221.GA30303-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-26  8:54                       ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-26  8:54 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

>
> I would like to see the kdoc for ib_map_mr_sg explain exactly what is
> required of the caller, maybe just hoist this bit from the
> ib_sg_to_pages

I'll add the kdoc.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                     ` <20150723175535.GE25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-26  9:37                       ` Sagi Grimberg
       [not found]                         ` <55B4AA73.3090803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-08-19 11:56                       ` Sagi Grimberg
  1 sibling, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-26  9:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 8:55 PM, Jason Gunthorpe wrote:
> On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote:
>>> I was hoping we'd move the DMA flush and translate into here and make
>>> it mandatory. Is there any reason not to do that?
>>
>> The reason I didn't added it in was so the ULPs can make sure they meet
>> the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his
>> SG list set partials and iSER to detect gaps (they need to dma map
>> for that).
>
> The ULP can always get the sg list's virtual address to check for
> gaps. Page aligned gaps are always OK.

I guess I can pull DMA mapping in there, but we will need an opposite
routine ib_umap_mr_sg() since it'll be weird if the ULP will do dma
unmap without doing the map...

>
> BTW, the logic in ib_sg_to_pages should be checking that directly, as
> coded, it won't work with swiotlb:
>
> // Only the first SG entry can start unaligned
> if (i && page_addr != dma_addr)
>      return EINVAL;
> // Only the last SG entry can end unaligned
> if ((page_addr + dma_len) & PAGE_MASK != end_dma_addr)
>   if (!is_last)
>       return EINVAL;
>
> Don't use sg->offset after dma mapping.
>
> The biggest problem with checking the virtual address is
> swiotlb. However, if swiotlb is used this API is basically broken as
> swiotlb downgrades everything to a 2k alignment, which means we only
> ever get 1 s/g entry.

Can you explain what do you mean by "downgrades everything to a 2k 
alignment"? If the ULP is responsible for a PAGE_SIZE alignment than
how would this get out of alignment with swiotlb?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                         ` <20150723185126.GA31346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-26  9:45                           ` Sagi Grimberg
       [not found]                             ` <55B4AC26.20405-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-26  9:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 9:51 PM, Jason Gunthorpe wrote:
> On Thu, Jul 23, 2015 at 07:47:14PM +0300, Sagi Grimberg wrote:
>
>>>>> So we force ULPs to think about what they are doing properly, and we
>>>>> get a chance to actually force lkey to be local use only for IB.
>>>>
>>>> The lkey/rkey decision is passed in the fastreg post_send().
>>>
>>> That is too late to check the access flags.
>>
>> Why? the access permissions are kept in the mr context?
>
> Sure, one could do if (key == mr->lkey) .. check lkey flags in the
> post, but that seems silly considering we want the post inlined..

Why should we check the lkey/rkey access flags in the post?

>
>> I can move it to the post interface if it makes more sense.
>> the access is kind of out of place in the mapping routine anyway...
>
> All the dma routines have an access equivalent during map, I don't
> think it is out of place..
>
> To my mind, the map is the point where the MR should crystallize into
> an rkey or lkey MR, not at the post.

I'm not sure I understand why the lkey/rkey should be set at the map
routine. To me, it seems more natural to map_mr_sg and then either
register the lkey or the rkey.

It's easy enough to move the key arg to ib_map_mr_sg, but I don't see a
good reason why at the moment.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                             ` <903CDFB5-04FE-47B6-B044-E960E8A8BC4C-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  2015-07-24 19:10                                               ` Jason Gunthorpe
@ 2015-07-27 15:57                                               ` Chuck Lever
       [not found]                                                 ` <8A2BC019-1DC0-4531-9659-3181EE9A4B43-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Chuck Lever @ 2015-07-27 15:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer


On Jul 24, 2015, at 1:46 PM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:

> On Jul 24, 2015, at 12:26 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
> 
>> On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote:
>> 
>>> Unfinished, but operational:
>>> 
>>> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future
>> 
>> Nice..
>> 
>> Can you spend some time and reflect on how some of this could be
>> lowered into the core code?
> 
> The point of the prototype is to start thinking about this with
> actual data. :-) So I’m with you.
> 
> 
>> The FMR and FRWR side have many
>> similarities now..

IMO ib_unmap_fmr is a very different animal from LOCAL_INV WR.

ib_unmap_fmr is synchronous, provides no ordering guarantees with
send queue operations, and does not depend on a connected QP to
be available. You could emulate asynchronicity with a work queue
but that still does not provide SQ ordering. There are few if any
failure modes for ib_unmap_fmr.

LOCAL_INV WR is asynchronous, provides strong ordering with other
send queue operations, but does require a non-NULL QP in RTS to
work. The failure modes are complex: without a QP in RTS, the
post_send fails. If the QP leaves RTS while LOCAL_INV is in
flight, the LINV flushes. MRs can be left in a state where the
MR's rkey is not in sync with the HW, in which case a
synchronous operation may be required to recover the MR.

These are the reasons I elected to employ a synchronous
invalidation model in the RPC reply handler. This model can be
made to work adequately for both FMR and FRWR, provides
proper DMA unmap ordering guarantees for both, and hides wonky
transport disconnect recovery mechanics. The only downside
is the performance cost.

A generic MR invalidation API that buries underlying verb
activity and guarantees proper DMA unmap ordering I think would
have to be synchronous.

In the long run, two things will change: first, FMR will
eventually be deprecated; and second, ULPs will likely adopt
SEND_WITH_INV.

The complexion of MR invalidation could be vastly different in
a few years: handled entirely by the target-side, and only
verified by the initiator. Verification doesn't need to sleep,
and the slow path (the target failed to invalidate) can be
deferred.

All that would be necessary at that point would be a synchronous
invalidation API (synchronous in the sense that the invalidate
is complete if the API returns without error).


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                         ` <55B4AA73.3090803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-27 17:04                           ` Jason Gunthorpe
       [not found]                             ` <20150727170459.GA18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-27 17:04 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Sun, Jul 26, 2015 at 12:37:55PM +0300, Sagi Grimberg wrote:
> I guess I can pull DMA mapping in there, but we will need an opposite
> routine ib_umap_mr_sg() since it'll be weird if the ULP will do dma
> unmap without doing the map...

Yes, and ideally it would help ULPs to order these operations
properly. eg we shouldn't be abusing the DMA API and unmapping before
invalidate completes by default. That breaks obscure stuff in various
ways...

> >The biggest problem with checking the virtual address is
> >swiotlb. However, if swiotlb is used this API is basically broken as
> >swiotlb downgrades everything to a 2k alignment, which means we only
> >ever get 1 s/g entry.
> 
> Can you explain what do you mean by "downgrades everything to a 2k
> alignment"? If the ULP is responsible for a PAGE_SIZE alignment than
> how would this get out of alignment with swiotlb?

swiotlb copies all DMA maps to a shared buffer below 4G so it can be
used with 32 bit devices.

The shared buffer is managed in a way that copies each s/g element to
a continuous 2k aligned subsection of the buffer.

Basically, swiotlb realigns everything that passes through it.

The DMA API allows this, so ultimately, code has to check the dma
physical address when concerned about alignment.. But we should not
expect this to commonly fail.

So, something like..

  if (!ib_does_sgl_fit_in_mr(mr,sg))
     .. bounce buffer ..
     
  if (!ib_map_mr_sg(mr,sg)) // does dma mapping and checks it
     .. bounce buffer ..
     
  .. post ..
  .. send invalidate ..
  .. catch invalidate completion ...

  ib_unmap_mr(mr); // does dma unmap

?
 
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                             ` <55B4AC26.20405-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-27 17:14                               ` Jason Gunthorpe
       [not found]                                 ` <20150727171441.GC18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-27 17:14 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Sun, Jul 26, 2015 at 12:45:10PM +0300, Sagi Grimberg wrote:
> On 7/23/2015 9:51 PM, Jason Gunthorpe wrote:
> >On Thu, Jul 23, 2015 at 07:47:14PM +0300, Sagi Grimberg wrote:
> >
> >>>>>So we force ULPs to think about what they are doing properly, and we
> >>>>>get a chance to actually force lkey to be local use only for IB.
> >>>>
> >>>>The lkey/rkey decision is passed in the fastreg post_send().
> >>>
> >>>That is too late to check the access flags.
> >>
> >>Why? the access permissions are kept in the mr context?
> >
> >Sure, one could do if (key == mr->lkey) .. check lkey flags in the
> >post, but that seems silly considering we want the post inlined..
> 
> Why should we check the lkey/rkey access flags in the post?

Eh? It was your idea..

I just want to check the access flags and force lkey's to not have
ACCESS_REMOTE set without complaining loudly.

To do that you need to know if the mr is a lkey/rkey, and you need to
know the flags.

> >>I can move it to the post interface if it makes more sense.
> >>the access is kind of out of place in the mapping routine anyway...
> >
> >All the dma routines have an access equivalent during map, I don't
> >think it is out of place..
> >
> >To my mind, the map is the point where the MR should crystallize into
> >an rkey or lkey MR, not at the post.
> 
> I'm not sure I understand why the lkey/rkey should be set at the map
> routine. To me, it seems more natural to map_mr_sg and then either
> register the lkey or the rkey.

We need to check the access flags to put a stop to this remote access
lkey security problem. That means we need to label every MR as a lkey
or rkey MR.

No more MR's can be both nonsense.

Pick a place to do that and enforce that IB cannot have remote access
LKEYs.

My vote is to do that work in map, because I don't think it make any
sense in post (post should not fail)

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                                 ` <8A2BC019-1DC0-4531-9659-3181EE9A4B43-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2015-07-27 17:25                                                   ` Jason Gunthorpe
       [not found]                                                     ` <20150727172510.GD18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-27 17:25 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer

On Mon, Jul 27, 2015 at 11:57:46AM -0400, Chuck Lever wrote:
> IMO ib_unmap_fmr is a very different animal from LOCAL_INV WR.

Sure, but how many of these properties does NFS actually care about,
now that it is running the API properly?

> ib_unmap_fmr is synchronous, provides no ordering guarantees with
> send queue operations, and does not depend on a connected QP to
> be available. You could emulate asynchronicity with a work queue
> but that still does not provide SQ ordering. There are few if any
> failure modes for ib_unmap_fmr.

I'm having a hard time seeing how SQ ordering is important when the
API is used properly. Once you explicitly order the DMA unmap after
the invalidate completion you no longer need implicit SQ ordering

Is there a way to combine SQ implicit ordering and the Linux DMA API
together correctly?

> flight, the LINV flushes. MRs can be left in a state where the
> MR's rkey is not in sync with the HW, in which case a
> synchronous operation may be required to recover the MR.

The error handling seems like a trivial difference, a
ib_recover_failed_qp_mr(mr); sort of call could resync everything
after a QP blows up..

> The complexion of MR invalidation could be vastly different in
> a few years: handled entirely by the target-side, and only
> verified by the initiator. Verification doesn't need to sleep,
> and the slow path (the target failed to invalidate) can be
> deferred.

The initiator still needs to have the ability to issue the invalidate
if the target doesn't do it, so all the code still exists..

Even ignoring those issues, should we be talking about putting FMR
under the new ib_alloc_mr and ib_map_mr interfaces? Would that help
much even if the post and unmap flows are totally different?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                                 ` <20150727171441.GC18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-27 20:11                                   ` Steve Wise
       [not found]                                     ` <55B69058.70403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Steve Wise @ 2015-07-27 20:11 UTC (permalink / raw)
  To: Jason Gunthorpe, Sagi Grimberg
  Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/27/2015 12:14 PM, Jason Gunthorpe wrote:
> On Sun, Jul 26, 2015 at 12:45:10PM +0300, Sagi Grimberg wrote:
>> On 7/23/2015 9:51 PM, Jason Gunthorpe wrote:
>>> On Thu, Jul 23, 2015 at 07:47:14PM +0300, Sagi Grimberg wrote:
>>>
>>>>>>> So we force ULPs to think about what they are doing properly, and we
>>>>>>> get a chance to actually force lkey to be local use only for IB.
>>>>>> The lkey/rkey decision is passed in the fastreg post_send().
>>>>> That is too late to check the access flags.
>>>> Why? the access permissions are kept in the mr context?
>>> Sure, one could do if (key == mr->lkey) .. check lkey flags in the
>>> post, but that seems silly considering we want the post inlined..
>> Why should we check the lkey/rkey access flags in the post?
> Eh? It was your idea..
>
> I just want to check the access flags and force lkey's to not have
> ACCESS_REMOTE set without complaining loudly.
>
> To do that you need to know if the mr is a lkey/rkey, and you need to
> know the flags.
>
>>>> I can move it to the post interface if it makes more sense.
>>>> the access is kind of out of place in the mapping routine anyway...
>>> All the dma routines have an access equivalent during map, I don't
>>> think it is out of place..
>>>
>>> To my mind, the map is the point where the MR should crystallize into
>>> an rkey or lkey MR, not at the post.
>> I'm not sure I understand why the lkey/rkey should be set at the map
>> routine. To me, it seems more natural to map_mr_sg and then either
>> register the lkey or the rkey.
> We need to check the access flags to put a stop to this remote access
> lkey security problem. That means we need to label every MR as a lkey
> or rkey MR.
>
> No more MR's can be both nonsense.

Well technically an MR with REMOTE_WRITE also has LOCAL_WRITE set. So 
you are proposing the core disallow a ULP from using the lkey for this 
type of MR?  Say in a RECV sge?



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                                     ` <55B69058.70403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2015-07-27 20:29                                       ` Jason Gunthorpe
  0 siblings, 0 replies; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-27 20:29 UTC (permalink / raw)
  To: Steve Wise
  Cc: Sagi Grimberg, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Oren Duer

On Mon, Jul 27, 2015 at 03:11:04PM -0500, Steve Wise wrote:
> Well technically an MR with REMOTE_WRITE also has LOCAL_WRITE set. So you
> are proposing the core disallow a ULP from using the lkey for this type of
> MR?  Say in a RECV sge?

Yes, absolutely.

It is wrong anyhow, RECV isn't special, if you RECV into memory that
is exposed via a rkey MR, you have to invalidate that MR and fence DMA
before you can touch the buffer. Only very special, carefully
designed, cases could avoid that.

We don't have those cases, so lets just ban it. The only exception is
the iWarp RDMA READ thing.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr
       [not found]     ` <1437548143-24893-22-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22 16:46       ` Christoph Hellwig
@ 2015-07-28 10:57       ` Haggai Eran
       [not found]         ` <55B75FFC.6040200-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Haggai Eran @ 2015-07-28 10:57 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

Hi Sagi,

On 22/07/2015 09:55, Sagi Grimberg wrote:
> Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/infiniband/hw/mlx5/mlx5_ib.h |  5 ++++
>  drivers/infiniband/hw/mlx5/mr.c      | 45 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> index c2916f1..df5e959 100644
> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
> @@ -315,6 +315,11 @@ enum mlx5_ib_mtt_access_flags {
>  
>  struct mlx5_ib_mr {
>  	struct ib_mr		ibmr;
> +	u64		        *pl;
> +	__be64			*mpl;
> +	dma_addr_t		pl_map;
Nit: could you choose more descriptive names for these fields? It can be
difficult to understand what they mean just based on the acronym.

> +	int			ndescs;
This one isn't used in this patch, right?

> +	int			max_descs;
>  	struct mlx5_core_mr	mmr;
>  	struct ib_umem	       *umem;
>  	struct mlx5_shared_mr_info	*smr_info;

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]     ` <1437548143-24893-29-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22 16:50       ` Christoph Hellwig
  2015-07-22 18:02       ` Jason Gunthorpe
@ 2015-07-28 11:20       ` Haggai Eran
  2 siblings, 0 replies; 142+ messages in thread
From: Haggai Eran @ 2015-07-28 11:20 UTC (permalink / raw)
  To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

On 22/07/2015 09:55, Sagi Grimberg wrote:
> +/**
> + * ib_sg_to_pages() - Convert a sg list to a page vector
> + * @dev:           ib device
> + * @sgl:           dma mapped scatterlist
> + * @sg_nents:      number of entries in sg
> + * @max_pages:     maximum pages allowed
> + * @pages:         output page vector
> + * @npages:        output number of mapped pages
> + * @length:        output total byte length
> + * @offset:        output first byte offset
> + *
> + * Core service helper for drivers to convert a scatter
> + * list to a page vector. The assumption is that the
> + * sg must meet the following conditions:
> + * - Only the first sg is allowed to have an offset
> + * - All the elements are of the same size - PAGE_SIZE
> + * - The last element is allowed to have length less than
> + *   PAGE_SIZE
> + *
> + * If any of those conditions is not met, the routine will
> + * fail with EINVAL.
> + */
> +int ib_sg_to_pages(struct scatterlist *sgl,
> +		   unsigned short sg_nents,
> +		   unsigned short max_pages,
> +		   u64 *pages, u32 *npages,
> +		   u32 *length, u64 *offset)
> +{
> +	struct scatterlist *sg;
> +	u64 last_end_dma_addr = 0, last_page_addr = 0;
> +	unsigned int last_page_off = 0;
> +	int i, j = 0;
> +
> +	/* TODO: We can do better with huge pages */
> +
> +	*offset = sg_dma_address(&sgl[0]);
> +	*length = 0;
> +
> +	for_each_sg(sgl, sg, sg_nents, i) {
> +		u64 dma_addr = sg_dma_address(sg);
> +		unsigned int dma_len = sg_dma_len(sg);
> +		u64 end_dma_addr = dma_addr + dma_len;
> +		u64 page_addr = dma_addr & PAGE_MASK;
> +
> +		*length += dma_len;
> +
> +		/* Fail we ran out of pages */
> +		if (unlikely(j > max_pages))
> +			return -EINVAL;
> +
> +		if (i && sg->offset) {
> +			if (unlikely((last_end_dma_addr) != dma_addr)) {
> +				/* gap - fail */
> +				goto err;
> +			}
> +			if (last_page_off + dma_len < PAGE_SIZE) {
> +				/* chunk this fragment with the last */
> +				last_end_dma_addr += dma_len;
> +				last_page_off += dma_len;
> +				continue;
> +			} else {
> +				/* map starting from the next page */
> +				page_addr = last_page_addr + PAGE_SIZE;
> +				dma_len -= PAGE_SIZE - last_page_off;
> +			}
> +		}
> +
> +		do {
> +			pages[j++] = page_addr;
I think this line could overrun the pages buffer. The test above only checks
at the beginning of the sg, but with an sg larger than PAGE_SIZE, you could
still overrun.

> +			page_addr += PAGE_SIZE;
> +		} while (page_addr < end_dma_addr);
> +
> +		last_end_dma_addr = end_dma_addr;
> +		last_page_addr = end_dma_addr & PAGE_MASK;
> +		last_page_off = end_dma_addr & ~PAGE_MASK;
> +	}
> +
> +	*npages = j;
> +
> +	return 0;
> +err:
> +	pr_err("RDMA alignment violation\n");
> +	for_each_sg(sgl, sg, sg_nents, i) {
> +		u64 dma_addr = sg_dma_address(sg);
> +		unsigned int dma_len = sg_dma_len(sg);
> +
> +		pr_err("sg[%d]: offset=0x%x, dma_addr=0x%llx, dma_len=0x%x\n",
> +			i, sg->offset, dma_addr, dma_len);
> +	}
> +
> +	return -EINVAL;
> +}
> +EXPORT_SYMBOL(ib_sg_to_pages);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                                     ` <20150727172510.GD18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-28 20:06                                                       ` Chuck Lever
       [not found]                                                         ` <B045BAC2-0360-4D97-A220-7DB52AF90BF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Chuck Lever @ 2015-07-28 20:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer


On Jul 27, 2015, at 1:25 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote:

> On Mon, Jul 27, 2015 at 11:57:46AM -0400, Chuck Lever wrote:
>> IMO ib_unmap_fmr is a very different animal from LOCAL_INV WR.
> 
> Sure, but how many of these properties does NFS actually care about,
> now that it is running the API properly?
> 
>> ib_unmap_fmr is synchronous, provides no ordering guarantees with
>> send queue operations, and does not depend on a connected QP to
>> be available. You could emulate asynchronicity with a work queue
>> but that still does not provide SQ ordering. There are few if any
>> failure modes for ib_unmap_fmr.
> 
> I'm having a hard time seeing how SQ ordering is important when the
> API is used properly. Once you explicitly order the DMA unmap after
> the invalidate completion you no longer need implicit SQ ordering
> 
> Is there a way to combine SQ implicit ordering and the Linux DMA API
> together correctly?
> 
>> flight, the LINV flushes. MRs can be left in a state where the
>> MR's rkey is not in sync with the HW, in which case a
>> synchronous operation may be required to recover the MR.
> 
> The error handling seems like a trivial difference, a
> ib_recover_failed_qp_mr(mr); sort of call could resync everything
> after a QP blows up..

Out of interest, why does this need to be exposed to ULPs?

I don't feel a ULP should have to deal with broken MRs
following a transport disconnect. It falls in that category
of things every ULP that supports FRWR has to do, and each
has plenty of opportunity to get it wrong.


>> The complexion of MR invalidation could be vastly different in
>> a few years: handled entirely by the target-side, and only
>> verified by the initiator. Verification doesn't need to sleep,
>> and the slow path (the target failed to invalidate) can be
>> deferred.
> 
> The initiator still needs to have the ability to issue the invalidate
> if the target doesn't do it, so all the code still exists..
> 
> Even ignoring those issues, should we be talking about putting FMR
> under the new ib_alloc_mr and ib_map_mr interfaces? Would that help
> much even if the post and unmap flows are totally different?

My opinion is FMR should be separate from the new API. Some have
expressed an interest in combining all kernel registration
mechanisms under a single API, but they seem too different from
each other to do that successfully.


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
       [not found]                                                         ` <B045BAC2-0360-4D97-A220-7DB52AF90BF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2015-07-29  6:32                                                           ` Christoph Hellwig
  0 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-29  6:32 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Jason Gunthorpe, Sagi Grimberg, Christoph Hellwig, linux-rdma,
	Liran Liss, Oren Duer

On Tue, Jul 28, 2015 at 04:06:23PM -0400, Chuck Lever wrote:
> My opinion is FMR should be separate from the new API. Some have
> expressed an interest in combining all kernel registration
> mechanisms under a single API, but they seem too different from
> each other to do that successfully.

Hi Chuck,

I think we can fit FMR partially under this API, e.g. alloc and map_sg
fit in very well, but then instead of post and invalidate we'll need to
call into slightly modified existing FMR pool APIs.

I'd suggest to postponed the issue for now, I'll prepare a prototype
once we've finished the FR-side API.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                             ` <20150727170459.GA18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-30  7:13                               ` Sagi Grimberg
       [not found]                                 ` <55B9CE85.40007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-30  7:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer


>> Can you explain what do you mean by "downgrades everything to a 2k
>> alignment"? If the ULP is responsible for a PAGE_SIZE alignment than
>> how would this get out of alignment with swiotlb?
>
> swiotlb copies all DMA maps to a shared buffer below 4G so it can be
> used with 32 bit devices.
>
> The shared buffer is managed in a way that copies each s/g element to
> a continuous 2k aligned subsection of the buffer.
>

Thanks for the explanation.

> Basically, swiotlb realigns everything that passes through it.

So this won't ever happen if the ULP will DMA map the SG and check
for gaps right?

Also, is it interesting to support swiotlb even if we don't have
any devices that require it (and should we expect one to ever exist)?

>
> The DMA API allows this, so ultimately, code has to check the dma
> physical address when concerned about alignment.. But we should not
> expect this to commonly fail.
>
> So, something like..
>
>    if (!ib_does_sgl_fit_in_mr(mr,sg))
>       .. bounce buffer ..

I don't understand the need for this is we do the same thing
if the actual mapping fails...

>
>    if (!ib_map_mr_sg(mr,sg)) // does dma mapping and checks it
>       .. bounce buffer ..

Each ULP would want to do something different, iser
will bounce but srp would need to use multiple mrs, nfs will
split the request.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr
       [not found]         ` <55B75FFC.6040200-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2015-07-30  8:08           ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-30  8:08 UTC (permalink / raw)
  To: Haggai Eran, Sagi Grimberg
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/28/2015 1:57 PM, Haggai Eran wrote:
> Hi Sagi,
>
> On 22/07/2015 09:55, Sagi Grimberg wrote:
>> Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>>   drivers/infiniband/hw/mlx5/mlx5_ib.h |  5 ++++
>>   drivers/infiniband/hw/mlx5/mr.c      | 45 ++++++++++++++++++++++++++++++++++++
>>   2 files changed, 50 insertions(+)
>>
>> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
>> index c2916f1..df5e959 100644
>> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
>> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
>> @@ -315,6 +315,11 @@ enum mlx5_ib_mtt_access_flags {
>>
>>   struct mlx5_ib_mr {
>>   	struct ib_mr		ibmr;
>> +	u64		        *pl;
>> +	__be64			*mpl;
>> +	dma_addr_t		pl_map;
> Nit: could you choose more descriptive names for these fields? It can be
> difficult to understand what they mean just based on the acronym.

OK - I'll name it better in v1.

>
>> +	int			ndescs;
> This one isn't used in this patch, right?

Not in this patch - I can move it.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                                 ` <55B9CE85.40007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-07-30 16:36                                   ` Jason Gunthorpe
       [not found]                                     ` <20150730163631.GB16659-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-30 16:36 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 30, 2015 at 10:13:09AM +0300, Sagi Grimberg wrote:

> >Basically, swiotlb realigns everything that passes through it.
> 
> So this won't ever happen if the ULP will DMA map the SG and check
> for gaps right?

Once mapped the physical address isn't going to change - but at some
point we must check the physical address directly.

> Also, is it interesting to support swiotlb even if we don't have
> any devices that require it (and should we expect one to ever exist)?

swiotlb is an obvious example, and totally uninteresting to support,
but we must correctly use the DMA API.

> >The DMA API allows this, so ultimately, code has to check the dma
> >physical address when concerned about alignment.. But we should not
> >expect this to commonly fail.
> >
> >So, something like..
> >
> >   if (!ib_does_sgl_fit_in_mr(mr,sg))
> >      .. bounce buffer ..
> 
> I don't understand the need for this is we do the same thing
> if the actual mapping fails...

Just performance. DMA mapping is potentially very expensive, the
common case to detect will be a sg that is virtually unaligned.

This virtual scan could be bundled insde the map, but if a ULP knows
it is page aligned already then that is just creating overhead..

I'm ambivalent..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                                     ` <20150730163631.GB16659-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-30 16:39                                       ` Christoph Hellwig
  0 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-30 16:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 30, 2015 at 10:36:31AM -0600, Jason Gunthorpe wrote:
> > Also, is it interesting to support swiotlb even if we don't have
> > any devices that require it (and should we expect one to ever exist)?
> 
> swiotlb is an obvious example, and totally uninteresting to support,
> but we must correctly use the DMA API.

Do we have a choice?  It seems like various setups with DMA restrictions
rely on it, including many Xen PV guests.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                     ` <20150723175535.GE25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-07-26  9:37                       ` Sagi Grimberg
@ 2015-08-19 11:56                       ` Sagi Grimberg
       [not found]                         ` <55D46EE8.4060701-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-08-19 11:56 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 8:55 PM, Jason Gunthorpe wrote:
> On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote:
>>> I was hoping we'd move the DMA flush and translate into here and make
>>> it mandatory. Is there any reason not to do that?
>>
>> The reason I didn't added it in was so the ULPs can make sure they meet
>> the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his
>> SG list set partials and iSER to detect gaps (they need to dma map
>> for that).
>
> The ULP can always get the sg list's virtual address to check for
> gaps. Page aligned gaps are always OK.

So I had a go with moving the DMA mapping into ib_map_mr_sg() and
it turns out mapping somewhat poorly if the ULP _may_ register memory
or just send sg_lists (like storage targets over IB/iWARP). So the ULP
will sometimes use the DMA mapping and sometimes it won't... feels
kinda off to me...

it's much saner to do:
1. dma_map_sg
2. register / send-sg-list
3. unregister (if needed)
4. dma_unmap_sg

then:
1. if register - call ib_map_mr_sg (which calls dma_map_sg)
    else do dma_map_sg
2. if registered - call ib_dma_unmap_sg (which calles dma_unmap_sg)
    else do dma_unmap_sg

this kinda forces ULP to completely separate these code paths with
with very little sharing.

Also, at the moment, when ULPs are doing either FRWR or FMRs
its a pain to get a non-intrusive conversion.

I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
it to the ULP like it does today (at least in the first stage...)

Thoughts?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                         ` <55D46EE8.4060701-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-08-19 12:52                           ` Christoph Hellwig
       [not found]                             ` <20150819125253.GB24746-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-08-19 17:37                           ` Jason Gunthorpe
  1 sibling, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-08-19 12:52 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Jason Gunthorpe, Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Aug 19, 2015 at 02:56:24PM +0300, Sagi Grimberg wrote:
> So I had a go with moving the DMA mapping into ib_map_mr_sg() and
> it turns out mapping somewhat poorly if the ULP _may_ register memory
> or just send sg_lists (like storage targets over IB/iWARP). So the ULP
> will sometimes use the DMA mapping and sometimes it won't... feels
> kinda off to me...

Yes, it's odd.

> it's much saner to do:
> 1. dma_map_sg
> 2. register / send-sg-list
> 3. unregister (if needed)
> 4. dma_unmap_sg
> 
> then:
> 1. if register - call ib_map_mr_sg (which calls dma_map_sg)
>    else do dma_map_sg
> 2. if registered - call ib_dma_unmap_sg (which calles dma_unmap_sg)
>    else do dma_unmap_sg
> 
> this kinda forces ULP to completely separate these code paths with
> with very little sharing.
> 
> Also, at the moment, when ULPs are doing either FRWR or FMRs
> its a pain to get a non-intrusive conversion.
> 
> I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
> it to the ULP like it does today (at least in the first stage...)

Keep it out for now.  I think we need to move the dma mapping into
the RDMA care rather sooner than later, but that must also include
ib_post_send/recv, so it's better done separately.

After having a look at the mess some drivers (ipath,qib,hfi & ehca)
cause with abuse of dma_map_ops I've got an even strong opion on
the whole subject now.  However I think we'll get more things done
if we split them into smaller steps.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                             ` <20150819125253.GB24746-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-08-19 16:09                               ` Sagi Grimberg
       [not found]                                 ` <55D4AA2E.7090204-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-08-19 16:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Gunthorpe, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer


>
> Keep it out for now.

Ok, I was also thinking on moving the access flags
to the work request again. It doesn't make much sense there
unless I go with what Jason suggested with ib_map_mr_[lkey|rkey]
to protect against remote access for lkeys in IB which to me, sounds
redundant at this point given that ULPs will set the access according
to iWARP anyway.

I'd prefer to get this right with a different helper like Steve
suggested:
int rdma_access_flags(int mr_roles);

This way we don't need to protect against it.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                                 ` <55D4AA2E.7090204-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-08-19 16:58                                   ` Christoph Hellwig
  0 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2015-08-19 16:58 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Jason Gunthorpe, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer,
	Chuck Lever, Wengang Wang

On Wed, Aug 19, 2015 at 07:09:18PM +0300, Sagi Grimberg wrote:
> Ok, I was also thinking on moving the access flags
> to the work request again.

Yes, with the current code I don't think we need it in the MR.

> I'd prefer to get this right with a different helper like Steve
> suggested:
> int rdma_access_flags(int mr_roles);

We can start with that.  In the long run we really want to have
two higher level helpers to RDMA READ a scatterlist:

 - one for iWARP that uses an FR and RDMA READ WITH INVALIDATE
 - one of IB-like transports that just uses a READ with the
   local lkey

Right now every ULP that wants to support iWarp needs to duplicate
that code.  This leads to some curious situations like the NFS
server apparently always using FRs if available for this if my
reading of svc_rdma_accept() is correct, or the weird parallel
code pathes for IB vs iWarp in RDS:

hch@brick:~/work/linux/net/rds$ ls ib*
ib.c  ib_cm.c  ib.h  ib_rdma.c  ib_recv.c  ib_ring.c  ib_send.c
ib_stats.c  ib_sysctl.c
hch@brick:~/work/linux/net/rds$ ls iw*
iw.c  iw_cm.c  iw.h  iw_rdma.c  iw_recv.c  iw_ring.c  iw_send.c
iw_stats.c  iw_sysctl.c

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                         ` <55D46EE8.4060701-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-08-19 12:52                           ` Christoph Hellwig
@ 2015-08-19 17:37                           ` Jason Gunthorpe
       [not found]                             ` <20150819173751.GB22646-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-08-19 17:37 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Aug 19, 2015 at 02:56:24PM +0300, Sagi Grimberg wrote:
> On 7/23/2015 8:55 PM, Jason Gunthorpe wrote:
> >On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote:
> >>>I was hoping we'd move the DMA flush and translate into here and make
> >>>it mandatory. Is there any reason not to do that?
> >>
> >>The reason I didn't added it in was so the ULPs can make sure they meet
> >>the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his
> >>SG list set partials and iSER to detect gaps (they need to dma map
> >>for that).
> >
> >The ULP can always get the sg list's virtual address to check for
> >gaps. Page aligned gaps are always OK.
> 
> So I had a go with moving the DMA mapping into ib_map_mr_sg() and
> it turns out mapping somewhat poorly if the ULP _may_ register memory
> or just send sg_lists (like storage targets over IB/iWARP). So the ULP
> will sometimes use the DMA mapping and sometimes it won't... feels
> kinda off to me...

You need to split the rkey and lkey API flows to pull this off - the
rkey side never needs to touch a sg, while the lkey side should always
try and use a sg first. I keep saying this: they have
fundamentally different ULP usages.

> 1. if register - call ib_map_mr_sg (which calls dma_map_sg)
>    else do dma_map_sg
> 2. if registered - call ib_dma_unmap_sg (which calles dma_unmap_sg)
>    else do dma_unmap_sg

>From what I've seen in the ULPs the flow control is generally such
that the MR is 'consumed' even if it isn't used by a send.

So lkey usage is simply split into things that absolutely don't need a
MR, and things that maybe do. The maybe side can go ahead and always
consume the MR resource, but optimize the implementation to a SG list
to avoid a performance hit.

Then the whole API becomes symmetric. The ULP says, 'here is a
scatterlist list and a lkey MR, make me a ib_sg list' and the core
either packes it as is into the sg, or it spins up the MR and packs
that.

This lets the unmap be symmetric, as the core always dma_unmaps, but
only tears down the MR if it was used.

The cost is the lkey MR slot is always consumed, which should be OK
because SQE flow control bounds the number of concurrent MRs required,
so consuming a SQE but not a MR doesn't provide an advantage.

> Also, at the moment, when ULPs are doing either FRWR or FMRs
> its a pain to get a non-intrusive conversion.

Without FMR sharing API entry points it is going to be hard to unify
them..

ie the map and alloc API side certainly could be shared..

> I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
> it to the ULP like it does today (at least in the first stage...)

I'm fine with first stage, but we still really do need to figure how
how to get better code sharing in our API here..

Maybe we can do the rkey side right away until we can figure out how
to harmonize the rkey sg/mr usage?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                             ` <20150819173751.GB22646-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-08-20 10:05                               ` Sagi Grimberg
       [not found]                                 ` <55D5A687.90102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-08-20 10:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

>> 1. if register - call ib_map_mr_sg (which calls dma_map_sg)
>>     else do dma_map_sg
>> 2. if registered - call ib_dma_unmap_sg (which calles dma_unmap_sg)
>>     else do dma_unmap_sg
>
>  From what I've seen in the ULPs the flow control is generally such
> that the MR is 'consumed' even if it isn't used by a send.

Not really. if registration is not needed, an MR is not consumed. In
fact, in svcrdma the IB code path never uses those, and the iWARP code
path always use those for RDMA_READs and not RDMA_WRITEs. Also, isert
use those only when signature is enabled and registration is required.

>
> So lkey usage is simply split into things that absolutely don't need a
> MR, and things that maybe do. The maybe side can go ahead and always
> consume the MR resource, but optimize the implementation to a SG list
> to avoid a performance hit.
>
> Then the whole API becomes symmetric. The ULP says, 'here is a
> scatterlist list and a lkey MR, make me a ib_sg list' and the core
> either packes it as is into the sg, or it spins up the MR and packs
> that.

Always consuming an MR resource is an extra lock acquire given these
are always kept in a pool structure.

>> I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
>> it to the ULP like it does today (at least in the first stage...)
>
> I'm fine with first stage, but we still really do need to figure how
> how to get better code sharing in our API here..
>
> Maybe we can do the rkey side right away until we can figure out how
> to harmonize the rkey sg/mr usage?

I'm fine with that. I agree we still need to do better.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                                 ` <55D5A687.90102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-08-20 19:04                                   ` Jason Gunthorpe
       [not found]                                     ` <20150820190413.GB29567-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-08-20 19:04 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Aug 20, 2015 at 01:05:59PM +0300, Sagi Grimberg wrote:
> >>1. if register - call ib_map_mr_sg (which calls dma_map_sg)
> >>    else do dma_map_sg
> >>2. if registered - call ib_dma_unmap_sg (which calles dma_unmap_sg)
> >>    else do dma_unmap_sg
> >
> > From what I've seen in the ULPs the flow control is generally such
> >that the MR is 'consumed' even if it isn't used by a send.
> 
> Not really. if registration is not needed, an MR is not consumed. In
> fact, in svcrdma the IB code path never uses those, and the iWARP code
> path always use those for RDMA_READs and not RDMA_WRITEs. Also, isert
> use those only when signature is enabled and registration is required.

The MR is not *used* but it should be 'consumed' - in the sense that
every RPC slot is associated (implicitly) with a MR, leaving the unused MR
in some kind of pool doesn't really help anything. Honestly, the MR
pool idea doesn't really help anything, it just makes confusion.

What should be pooled is the 'request slot' itself, in the sense that
if a request slot is in the 'ready to go' pool it is guarenteed to be
able to complete *any* request without blocking. That means the
MR/SQE/CQE resources are all ready to go. Any ancillary memory is
ready to use, etc.

The ULP should design its slot with the idea that it doesn't have to
allocate memory, or IB resources, or block, once the slot becomes
'ready to go'.

Review the discussion Chuck and I had on SQE flow control for a sense
on what that means. Understand why the lifetime of the MR and the SQE
and the slot are all convoluted together if RDMA is used correctly.

Trying to decouple the sub resources, ie by separately pooling the
MR/SQE/etc, is just unnecessary complexity, IMHO.. NFS client already
had serioues bugs in this area.

So, I turn to the idea that every ULP should work as the above, which
means when it gets to working on a 'slot' that implies there is an
actual struct ib_mr resource guaranteed available. This is why I
suggested using the 'struct ib_mr' to guide the SG construction even if
the actual HW MR isn't going to be used. The struct ib_mr is tied to
the slot, so using it has no cost.

-------

But, maybe that is too much of a shortcut, and thinking about it more
and all the other things I've written about.. Lets just directly
address this issue and add something called 'struct ib_op_slot'.

Data transfer would look like this:

 struct *ib_send_wr cur;

 cur = ib_slot_make_send(&req->op_slot,scatter_list);
 cur->next = ib_slot_make_rdma_read(&next_req->op,scatter_list,
 	          rkey,length);
 ib_post_send(qp,cur);

 [.. at CQE time ..]
 if (ib_slot_complete(qp,req->op_slot))
    [.. slot is now 'ready to go' ..]
 else
    [.. otherwise more stuff was posted, have to wait ...]

This forces the ULP to deal with many of the issues. Having a slot
means guarenteed minimum avaiable MR,SQE,CQE resources. That
guarenteed minimum avoids the messy API struggle in my prior writings.

.. and maybe the above is even thinking too small, to Christoph's
earlier musings, I wonder if a slot based middle API could hijack the
entire SCQ processing and have a per-slot callback scheme
instead. That kind of intervention is exactly what we'd need to
trivially hide the FMR difference.

... and now this provides enough context to start talking about common
helper APIs for common ULP things, like the rdma_read switch. The slot
has pre-allocated everything needed to handle the variations.

... which suddenly starts to be really viable because the slot
guarentees SQE availability too.

... and we start having the idea of a slot able to do certain tasks,
and codify that with API help at creation:

  struct nfs_rpc_slot
  {
     strict ib_op_slot slot;
  };

  struct ib_op_slot_attributes attrs;
  ib_init_slot_attrs(&attrs,ib_pd);

  ib_request_action(&attrs, "args describing RDMA read with N SGEs");
  if (ib_request_action("args describing a requirement for signature"))
      signature_supported = true;
  if (ib_request_action("args describing a requirement for non-page-aligned"))
      byte_sgl_supported = true;
  ib_request_action("args describing SEND with N SGEs");
  ib_request_action("args describing N RDMA reads each with N SGEs");

  for (required slot concurrency)
    ib_alloc_slot(&rpc.slot,&attrs);

Then the alloc just grabs everything required. ..mumble mumble.. some
way to flow into the QP/CQ allocation attributes too ..

Essentially, the ULP says 'here is what I want to do with this slot'
and the core code *guarentees* that if the slot is 'ready to go' then
'any single work of any requested type' can be queued without blocking
or memory allocation. Covers SQEs, CQEs, MRs, etc.

ib_request_action is a basic pattern that does various tests and ends
up doing:
  attrs->num_mrs = max(attrs->num_mrs, needed_for_this_action);
  attrs->num_mrs_sge = max(attrs->num_mrs_sge, needed_for_this_action);
  attrs->num_wr_sge = max(attrs->num_qp_sqe, needed_for_this_action);
  attrs->num_sqe = max(attrs->num_sqe, needed_for_this_action);
  attrs->num_cqe = max(attrs->num_cqe, needed_for_this_action);
[ie we compute the maximum allocation needed to satisfy the
 requested requirement]

Each request could fail, eg if signature is not supported then the
request_action will fail, so we have a more uniform way to talk about
send queue features.

... and the ULP could have a 'heavy' and 'light' slot pool if that
made some kind of sense for its work load.

So, that is a long road, but maybe there are reasonable interm stages?

Anyhow, conceptually, an idea. Eliminates the hated fmr pool concept,
cleans up bad thinking around queue flow control. Provides at least a
structure to abstract transport differences.

---------

It could look something like this:

 struct ib_op_slot
 {
    struct ib_mr **mr_list; // null terminated
    void *wr_memory;
    void *sg_memory;
    unsigned int num_sgs;
 };

 struct ib_send_wr *ib_slot_make_send(struct ib_op_slot *slot,
               const struct scatter_list *sgl)
 {
     dma_map(sgl);
     if (num_sges(sgl) < slot->num_sgs) {
        // send fits in the sg list
        struct ib_send_wr *wr = slot->wr_memory;
	wr->sg0list = slot->sg_memory;
	.. pack it in ..
	return wr;
     } else {
        // Need to spin up a MR..
	struct {
	   struct ib_send_wr frwr_wr;
	   struct ib_send_wr send_wr;
	} *wrs = slot->wr_memory;
	wrs->frwr_wr.next = &wrs->send_wr
	... pack it in ...
	return &wrs->frwr_wr;
    }
    // similar for FMR
 }

.. similar concept for rdma read, etc.
.. ib_request_action makes sure the wr_memory/sg_memory are pre-sized
   to accommodate the action. Add optional #ifdef'd debugging to check
   for bad ULP usage
.. function pointers could be used to provide special optimal versions
   if necessary
.. Complex things like signature just vanish from the API. ULP sees
   something like:

    if (ib_request_action("args describing a requirement for signature"))
       signature_supported = true;
    wr = ib_slot_make_rdma_write_signature(slot,....);

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                                     ` <20150820190413.GB29567-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-08-21  6:34                                       ` Christoph Hellwig
       [not found]                                         ` <20150821063458.GA875-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-08-21  6:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Oren Duer

On Thu, Aug 20, 2015 at 01:04:13PM -0600, Jason Gunthorpe wrote:
> Trying to decouple the sub resources, ie by separately pooling the
> MR/SQE/etc, is just unnecessary complexity, IMHO.. NFS client already
> had serioues bugs in this area.
> 
> So, I turn to the idea that every ULP should work as the above, which
> means when it gets to working on a 'slot' that implies there is an
> actual struct ib_mr resource guaranteed available. This is why I
> suggested using the 'struct ib_mr' to guide the SG construction even if
> the actual HW MR isn't going to be used. The struct ib_mr is tied to
> the slot, so using it has no cost.

How is this going to work for drivers that might consumer multiple
MRs per request like SRP or similar upcoming block drivers?  Unless
you want to allocate a potentially large number of MRs for each
request that scheme doesn't work.

> This forces the ULP to deal with many of the issues. Having a slot
> means guarenteed minimum avaiable MR,SQE,CQE resources. That
> guarenteed minimum avoids the messy API struggle in my prior writings.
> 
> .. and maybe the above is even thinking too small, to Christoph's
> earlier musings, I wonder if a slot based middle API could hijack the
> entire SCQ processing and have a per-slot callback scheme
> instead. That kind of intervention is exactly what we'd need to
> trivially hide the FMR difference.

FYI, I have working early patches to do per-WR completion callback,
I'll post them after I get them into a slightly better shape.

As for your grand schemes:  I like some of the idea there, but we
need to get there gradually.  I'd much prefer to finish Sagi's simple
scheme, get my completion work in, add abstractions for RDMA READ and
WRITE scatterlist mapping and build things up slowly.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
       [not found]                                         ` <20150821063458.GA875-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-08-21 18:08                                           ` Jason Gunthorpe
  0 siblings, 0 replies; 142+ messages in thread
From: Jason Gunthorpe @ 2015-08-21 18:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	Liran Liss, Oren Duer

On Thu, Aug 20, 2015 at 11:34:58PM -0700, Christoph Hellwig wrote:

> How is this going to work for drivers that might consumer multiple
> MRs per request like SRP or similar upcoming block drivers?  Unless
> you want to allocate a potentially large number of MRs for each
> request that scheme doesn't work.

There are at least two approaches, and it depends on how flow control
to the driving layer works out. Look at what the ULP does when the
existing MR pool exhausts:
- Exhaustion is not allowed. In this model every slot must truely handle
  every required action without blocking. The ULP somehow wrangles
  things so pool exhaustion is not possible. NFS client is a
  good example.

  Where NFS client went wrong is that the MR alone is not enough,
  issuing a request requires SQE/CQE resources, failing to track that
  caused hard to find bugs.
- Exhaustion is allowed, and somehow the ULP is able to stop
  processing. In this case you'd just swap MRs for slots in the pool,
  probably having pools of different kinds of slots to optimize
  resource use.

  Pool draw down includes SQE/CQE/etc resources as well. A multiple
  rkey MR case would just draw down the required slots from the pool.

I suspect client side tends to lean toward the first option and target
side the second - targets can always do back pressure flow control by
simply halting RQE processing, and it makes alot of sense on a target
to globally pool slots across all client QPs.

This idea of a slot is just a higher level structure we can hang other
stuff off - like the sg/mr decision, the iwarp rdma read change, sqe
accounting.

We don't need to start with everything, but I'm looking at Sagi's
notes on trying to factor the lkey side code paths and thinking a
broader abstraction than raw MR is needed to solve that.

> FYI, I have working early patches to do per-WR completion callback,
> I'll post them after I get them into a slightly better shape.

Interesting..

> As for your grand schemes:  I like some of the idea there, but we
> need to get there gradually.  I'd much prefer to finish Sagi's simple
> scheme, get my completion work in, add abstractions for RDMA READ and
> WRITE scatterlist mapping and build things up slowly.

Yes, absolutely, we have to go slowly - but exploring how we can fit
this together in some other way can help guide some of the smaller
choices.

Sagi could drop the lkey side, getting the rkey side in order would be
nice enough. Something like this is a direction to address the
lkey side.

Ie we could 1:1 replace MR with 'slot' and use that to factor the lkey
code paths. Over time slot can grow organically to factor more code.

Slot would be a new object for the core, one that is guarenteed to
last from post->completion, that seems like exactly the sort of object
a completion callback scheme would benefit from. Guarenteed memory to
hang callback pointers/etc off.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
@ 2015-07-23  9:22 Christoph Hellwig
  0 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2015-07-23  9:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Sagi Grimberg, Christoph Hellwig, Sagi Grimberg,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

> If you want to micro optimize then just zero the few items that are
> defined to be accessed for fastreg, no need to zero the whole
> structure. Infact, you may have already done that, so just drop the
> memset entirely.

Oh, indeed.

> If you want to optimize this path, then Sean is right, move the post
> into the driver and stop pretending that ib_post_send is a performance
> API.
> 
> ib_post_fastreg_wr would be a function that needs 3 register passed
> arguments and does a simple copy to the driver's actual sendq

Now that sounds even better.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 142+ messages in thread

end of thread, other threads:[~2015-08-21 18:08 UTC | newest]

Thread overview: 142+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-22  6:55 [PATCH WIP 00/43] New fast registration API Sagi Grimberg
     [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-22  6:55   ` [PATCH WIP 01/43] IB: Modify ib_create_mr API Sagi Grimberg
     [not found]     ` <1437548143-24893-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-22 16:34       ` Jason Gunthorpe
     [not found]         ` <20150722163405.GA26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-22 16:44           ` Christoph Hellwig
     [not found]             ` <20150722164421.GA6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-22 16:58               ` Sagi Grimberg
     [not found]                 ` <55AFCBAF.2000504-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-22 19:05                   ` Jason Gunthorpe
     [not found]                     ` <20150722190555.GB4527-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23 10:07                       ` Sagi Grimberg
     [not found]                         ` <55B0BCFC.6040602-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-23 19:08                           ` Jason Gunthorpe
     [not found]                             ` <20150723190855.GB31577-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-26  8:51                               ` Sagi Grimberg
2015-07-22 16:59           ` Sagi Grimberg
     [not found]             ` <55AFCBE4.1070803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-22 17:01               ` Jason Gunthorpe
     [not found]                 ` <20150722170120.GC26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-22 17:03                   ` Sagi Grimberg
2015-07-23  0:57       ` Hefty, Sean
     [not found]         ` <1828884A29C6694DAF28B7E6B8A82373A9001357-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-07-23  9:30           ` Christoph Hellwig
     [not found]             ` <20150723093046.GF32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-23 10:09               ` Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb Sagi Grimberg
     [not found]     ` <1437548143-24893-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-22 16:58       ` Jason Gunthorpe
     [not found]         ` <20150722165831.GB26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-22 17:22           ` Sagi Grimberg
     [not found]             ` <55AFD14C.8040007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-22 18:50               ` Steve Wise
     [not found]                 ` <55AFE5D9.3050102-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2015-07-22 18:54                   ` Jason Gunthorpe
     [not found]                     ` <20150722185410.GA4527-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23 10:10                       ` Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 03/43] ocrdma: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 04/43] iw_cxgb4: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 05/43] cxgb3: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 06/43] nes: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 07/43] qib: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 08/43] IB/iser: Convert to ib_alloc_mr Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 09/43] iser-target: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 10/43] IB/srp: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 11/43] xprtrdma, svcrdma: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 12/43] RDS: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 13/43] mlx5: Drop mlx5_ib_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 14/43] mlx4: Drop mlx4_ib_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 15/43] ocrdma: Drop ocrdma_alloc_frmr Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 16/43] qib: Drop qib_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 17/43] nes: Drop nes_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 18/43] cxgb4: Drop c4iw_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 19/43] cxgb3: Drop iwch_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 20/43] IB/core: Drop ib_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr Sagi Grimberg
     [not found]     ` <1437548143-24893-22-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-22 16:46       ` Christoph Hellwig
     [not found]         ` <20150722164605.GB6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-22 16:51           ` Sagi Grimberg
2015-07-28 10:57       ` Haggai Eran
     [not found]         ` <55B75FFC.6040200-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-30  8:08           ` Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 22/43] mlx4: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 23/43] ocrdma: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 24/43] cxgb3: Allocate a provate " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 25/43] cxgb4: Allocate a private " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 26/43] qib: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 27/43] nes: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 28/43] IB/core: Introduce new fast registration API Sagi Grimberg
     [not found]     ` <1437548143-24893-29-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-22 16:50       ` Christoph Hellwig
     [not found]         ` <20150722165012.GC6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-22 16:56           ` Sagi Grimberg
2015-07-22 17:44           ` Jason Gunthorpe
     [not found]             ` <20150722174401.GG26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23  9:19               ` Christoph Hellwig
     [not found]                 ` <20150723091955.GA32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-23 16:03                   ` Jason Gunthorpe
2015-07-23 10:15               ` Sagi Grimberg
     [not found]                 ` <55B0BEB4.9080702-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-23 17:55                   ` Jason Gunthorpe
     [not found]                     ` <20150723175535.GE25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-26  9:37                       ` Sagi Grimberg
     [not found]                         ` <55B4AA73.3090803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-27 17:04                           ` Jason Gunthorpe
     [not found]                             ` <20150727170459.GA18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-30  7:13                               ` Sagi Grimberg
     [not found]                                 ` <55B9CE85.40007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-30 16:36                                   ` Jason Gunthorpe
     [not found]                                     ` <20150730163631.GB16659-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-30 16:39                                       ` Christoph Hellwig
2015-08-19 11:56                       ` Sagi Grimberg
     [not found]                         ` <55D46EE8.4060701-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-08-19 12:52                           ` Christoph Hellwig
     [not found]                             ` <20150819125253.GB24746-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-08-19 16:09                               ` Sagi Grimberg
     [not found]                                 ` <55D4AA2E.7090204-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-08-19 16:58                                   ` Christoph Hellwig
2015-08-19 17:37                           ` Jason Gunthorpe
     [not found]                             ` <20150819173751.GB22646-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-08-20 10:05                               ` Sagi Grimberg
     [not found]                                 ` <55D5A687.90102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-08-20 19:04                                   ` Jason Gunthorpe
     [not found]                                     ` <20150820190413.GB29567-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-08-21  6:34                                       ` Christoph Hellwig
     [not found]                                         ` <20150821063458.GA875-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-08-21 18:08                                           ` Jason Gunthorpe
2015-07-23 18:42                   ` Jason Gunthorpe
     [not found]                     ` <20150723184221.GA30303-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-26  8:54                       ` Sagi Grimberg
2015-07-22 18:02       ` Jason Gunthorpe
     [not found]         ` <20150722180203.GI26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23 10:19           ` Sagi Grimberg
     [not found]             ` <55B0BFA4.4060509-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-23 16:14               ` Jason Gunthorpe
     [not found]                 ` <20150723161436.GC25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23 16:47                   ` Sagi Grimberg
     [not found]                     ` <55B11A92.9040406-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-23 18:51                       ` Jason Gunthorpe
     [not found]                         ` <20150723185126.GA31346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-26  9:45                           ` Sagi Grimberg
     [not found]                             ` <55B4AC26.20405-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-27 17:14                               ` Jason Gunthorpe
     [not found]                                 ` <20150727171441.GC18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-27 20:11                                   ` Steve Wise
     [not found]                                     ` <55B69058.70403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2015-07-27 20:29                                       ` Jason Gunthorpe
2015-07-28 11:20       ` Haggai Eran
2015-07-22  6:55   ` [PATCH WIP 29/43] mlx5: Support the new memory " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 30/43] mlx4: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 31/43] ocrdma: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 32/43] cxgb3: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 33/43] cxgb4: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 34/43] nes: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 35/43] qib: " Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 36/43] iser: Port to new fast registration api Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 37/43] xprtrdma: Port to new memory registration API Sagi Grimberg
     [not found]     ` <1437548143-24893-38-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-22 15:03       ` Chuck Lever
     [not found]         ` <795F4F28-D92F-46A1-8DA3-2B1B19A17AA3-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-22 15:41           ` Sagi Grimberg
     [not found]             ` <55AFB9A7.4030103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-22 16:04               ` Chuck Lever
     [not found]                 ` <5114D0F0-7C66-4889-85D8-E7297009AF23-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-23 10:42                   ` Sagi Grimberg
2015-07-22 16:59           ` Christoph Hellwig
2015-07-22 19:21       ` Steve Wise
     [not found]         ` <55AFED4C.9040409-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2015-07-23 10:20           ` Sagi Grimberg
     [not found]             ` <55B0C002.60307-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-23 13:46               ` Steve Wise
2015-07-22  6:55   ` [PATCH WIP 38/43] iser-target: " Sagi Grimberg
     [not found]     ` <1437548143-24893-39-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-22 17:04       ` Christoph Hellwig
     [not found]         ` <20150722170413.GE6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-22 17:33           ` Sagi Grimberg
     [not found]             ` <55AFD3DC.8070508-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-22 17:57               ` Jason Gunthorpe
     [not found]                 ` <20150722175755.GH26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23 10:27                   ` Sagi Grimberg
     [not found]                     ` <55B0C18B.4080901-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-23 13:35                       ` Chuck Lever
2015-07-23 16:31                       ` Jason Gunthorpe
     [not found]                         ` <20150723163124.GD25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23 16:59                           ` Sagi Grimberg
     [not found]                             ` <55B11D84.102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-23 18:53                               ` Jason Gunthorpe
     [not found]                                 ` <20150723185334.GB31346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-24 14:36                                   ` Chuck Lever
     [not found]                                     ` <DE0226A1-A7FC-4618-91F1-FE34347C252A-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-24 16:26                                       ` Jason Gunthorpe
     [not found]                                         ` <20150724162657.GA21473-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-24 16:34                                           ` Steve Wise
2015-07-24 17:46                                           ` Chuck Lever
     [not found]                                             ` <903CDFB5-04FE-47B6-B044-E960E8A8BC4C-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-24 19:10                                               ` Jason Gunthorpe
     [not found]                                                 ` <20150724191003.GA26225-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-24 19:59                                                   ` Chuck Lever
     [not found]                                                     ` <A1A0BF6E-992A-4B34-8D24-EA8AA8D6983B-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-24 20:24                                                       ` Jason Gunthorpe
     [not found]                                                         ` <20150724202445.GA28033-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-24 22:13                                                           ` Steve Wise
2015-07-24 22:44                                                             ` Jason Gunthorpe
2015-07-27 15:57                                               ` Chuck Lever
     [not found]                                                 ` <8A2BC019-1DC0-4531-9659-3181EE9A4B43-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-27 17:25                                                   ` Jason Gunthorpe
     [not found]                                                     ` <20150727172510.GD18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-28 20:06                                                       ` Chuck Lever
     [not found]                                                         ` <B045BAC2-0360-4D97-A220-7DB52AF90BF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2015-07-29  6:32                                                           ` Christoph Hellwig
2015-07-22  6:55   ` [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support Sagi Grimberg
     [not found]     ` <1437548143-24893-40-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-22 17:05       ` Christoph Hellwig
2015-07-22 17:22       ` Jason Gunthorpe
     [not found]         ` <20150722172255.GD26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-22 17:29           ` Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration Sagi Grimberg
     [not found]     ` <1437548143-24893-41-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2015-07-22 17:30       ` Jason Gunthorpe
     [not found]         ` <20150722173048.GF26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23  9:25           ` Christoph Hellwig
     [not found]             ` <20150723092532.GC32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-23 10:28               ` Sagi Grimberg
2015-07-23 16:04               ` Jason Gunthorpe
2015-07-22  6:55   ` [PATCH WIP 41/43] mlx5: Add arbitrary sg list support Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 42/43] iser: Accept arbitrary sg lists mapping if the device supports it Sagi Grimberg
2015-07-22  6:55   ` [PATCH WIP 43/43] iser: Move unaligned counter increment Sagi Grimberg
2015-07-22 17:10   ` [PATCH WIP 00/43] New fast registration API Christoph Hellwig
     [not found]     ` <20150722171023.GA18934-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-22 17:27       ` Jason Gunthorpe
     [not found]         ` <20150722172702.GE26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2015-07-23  9:26           ` Christoph Hellwig
2015-07-22 17:42       ` Sagi Grimberg
     [not found]         ` <55AFD608.401-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-07-23  9:28           ` Christoph Hellwig
     [not found]             ` <20150723092857.GE32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2015-07-23 10:34               ` Sagi Grimberg
2015-07-23  9:22 [PATCH WIP 38/43] iser-target: Port to new memory " Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.