* [PATCH WIP 00/43] New fast registration API
From: Sagi Grimberg @ 2015-07-22 6:55 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Hi all,

So I went ahead and tried to implement some of the stuff we've been
talking about. I figured I'd send out a WIP version to try and
communicate early where this is heading.

In order to have a sane patchset I followed an
add-new/port-existing/drop-old scheme.

The set starts with:
- Convert the ib_create_mr API to ib_alloc_mr as Christoph suggested (1)
- Add vendor driver support for ib_alloc_mr (2-7)
- Port ULPs to use ib_alloc_mr (8-12)
- Drop the alloc_fast_reg_mr API (core + vendor drivers) (13-20)

Continues with:
- Allocate vendor private page lists (21-27)
- Add a new fast registration API that will replace the existing FRWR (28)
- Add support for the new API in the relevant vendor drivers (29-35)
  * It's a bit hacky since I just bluntly duplicated the registration
    routines; keep in mind that this is transient until we drop the
    old API...
- Port ULPs to use the new API (iser, isert, xprtrdma for now) (36-38);
  this is on top of Chuck's nfs-rdma-for-4.3 and updated iser/isert code

The set should end with:
- Complete the ULP porting (svcrdma, rds, srp)
- Drop the old fast registration API - FRWR (core + vendor drivers)
- Still have the huge-pages bit to work out.

I also added arbitrary sg list registration support to mlx5 and iser
via less intrusive API additions (39-43), just to show the concept.

This set was lightly tested on the ported ULPs over mlx5 (didn't have
a chance to test mlx4 yet).

The main reasons for this preview are:
- Help with testing (especially on devices that I don't have access to,
  e.g. cxgb3, cxgb4, ocrdma, nes, qib). I probably have bugs there as
  I just compile tested so far.
- Help with porting the rest of the ULPs (rds, srp, svcrdma)
- Early code review

What I've noticed from this effort is that several drivers keep shadow
mapped page lists for specific device settings. At registration time,
the drivers iterate over the page list and set the mapped page list
entries with some extra information. I'd expect these drivers not to
use the core function that maps an SG list to pages, and instead use
their own functions, which would allow them to lose their page list
duplication. I haven't done that yet.

Comments and review are welcomed (and needed!). Sorry for the long
series, but it's kinda transverse...

The code/patches can be found in:
https://github.com/sagigrimberg/linux/tree/fastreg_api_wip

Sagi Grimberg (43):
  IB: Modify ib_create_mr API
  IB/mlx4: Support ib_alloc_mr verb
  ocrdma: Support ib_alloc_mr verb
  iw_cxgb4: Support ib_alloc_mr verb
  cxgb3: Support ib_alloc_mr verb
  nes: Support ib_alloc_mr verb
  qib: Support ib_alloc_mr verb
  IB/iser: Convert to ib_alloc_mr
  iser-target: Convert to ib_alloc_mr
  IB/srp: Convert to ib_alloc_mr
  xprtrdma, svcrdma: Convert to ib_alloc_mr
  RDS: Convert to ib_alloc_mr
  mlx5: Drop mlx5_ib_alloc_fast_reg_mr
  mlx4: Drop mlx4_ib_alloc_fast_reg_mr
  ocrdma: Drop ocrdma_alloc_frmr
  qib: Drop qib_alloc_fast_reg_mr
  nes: Drop nes_alloc_fast_reg_mr
  cxgb4: Drop c4iw_alloc_fast_reg_mr
  cxgb3: Drop iwch_alloc_fast_reg_mr
  IB/core: Drop ib_alloc_fast_reg_mr
  mlx5: Allocate a private page list in ib_alloc_mr
  mlx4: Allocate a private page list in ib_alloc_mr
  ocrdma: Allocate a private page list in ib_alloc_mr
  cxgb3: Allocate a private page list in ib_alloc_mr
  cxgb4: Allocate a private page list in ib_alloc_mr
  qib: Allocate a private page list in ib_alloc_mr
  nes: Allocate a private page list in ib_alloc_mr
  IB/core: Introduce new fast registration API
  mlx5: Support the new memory registration API
  mlx4: Support the new memory registration API
  ocrdma: Support the new memory registration API
  cxgb3: Support the new memory registration API
  cxgb4: Support the new memory registration API
  nes: Support the new memory registration API
  qib: Support the new memory registration API
  iser: Port to new fast registration API
  xprtrdma: Port to new memory registration API
  iser-target: Port to new memory registration API
  IB/core: Add arbitrary sg_list support
  mlx5: Allocate private context for arbitrary scatterlist registration
  mlx5: Add arbitrary sg list support
  iser: Accept arbitrary sg lists mapping if the device supports it
  iser: Move unaligned counter increment

 drivers/infiniband/core/verbs.c             | 164 ++++++++++++++++++----
 drivers/infiniband/hw/cxgb3/iwch_provider.c |  35 ++++-
 drivers/infiniband/hw/cxgb3/iwch_provider.h |   2 +
 drivers/infiniband/hw/cxgb3/iwch_qp.c       |  48 +++++++
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h      |  12 +-
 drivers/infiniband/hw/cxgb4/mem.c           |  38 +++++-
 drivers/infiniband/hw/cxgb4/provider.c      |   3 +-
 drivers/infiniband/hw/cxgb4/qp.c            |  75 +++++++++-
 drivers/infiniband/hw/mlx4/main.c           |   3 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h        |  14 +-
 drivers/infiniband/hw/mlx4/mr.c             |  74 +++++++++-
 drivers/infiniband/hw/mlx4/qp.c             |  27 ++++
 drivers/infiniband/hw/mlx5/main.c           |   5 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h        |  20 ++-
 drivers/infiniband/hw/mlx5/mr.c             | 204 +++++++++++++++++++++-------
 drivers/infiniband/hw/mlx5/qp.c             | 107 +++++++++++++++
 drivers/infiniband/hw/nes/nes_verbs.c       | 129 +++++++++++++++++-
 drivers/infiniband/hw/nes/nes_verbs.h       |   5 +
 drivers/infiniband/hw/ocrdma/ocrdma.h       |   2 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c  |   3 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c |  88 +++++++++++-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h |   8 +-
 drivers/infiniband/hw/qib/qib_keys.c        |  56 ++++++++
 drivers/infiniband/hw/qib/qib_mr.c          |  30 +++-
 drivers/infiniband/hw/qib/qib_verbs.c       |   8 +-
 drivers/infiniband/hw/qib/qib_verbs.h       |  12 +-
 drivers/infiniband/ulp/iser/iscsi_iser.h    |   6 +-
 drivers/infiniband/ulp/iser/iser_memory.c   |  48 +++---
 drivers/infiniband/ulp/iser/iser_verbs.c    |  38 ++----
 drivers/infiniband/ulp/isert/ib_isert.c     | 128 ++++------------
 drivers/infiniband/ulp/isert/ib_isert.h     |   2 -
 drivers/infiniband/ulp/srp/ib_srp.c         |   3 +-
 include/rdma/ib_verbs.h                     |  88 +++++++-----
 net/rds/iw_rdma.c                           |   5 +-
 net/rds/iw_send.c                           |   5 +-
 net/sunrpc/xprtrdma/frwr_ops.c              |  86 ++++++------
 net/sunrpc/xprtrdma/svc_rdma_transport.c    |   2 +-
 net/sunrpc/xprtrdma/xprt_rdma.h             |   4 +-
 38 files changed, 1223 insertions(+), 364 deletions(-)

-- 
1.8.4.3
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Sagi Grimberg @ 2015-07-22 6:55 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer

Use ib_alloc_mr with specific parameters. Change the existing callers.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c          | 20 ++++++++++++------
 drivers/infiniband/hw/mlx5/main.c        |  2 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h     |  6 ++++--
 drivers/infiniband/hw/mlx5/mr.c          | 21 ++++++++++++++-----
 drivers/infiniband/ulp/iser/iser_verbs.c |  4 +---
 drivers/infiniband/ulp/isert/ib_isert.c  |  6 +-----
 include/rdma/ib_verbs.h                  | 36 ++++++++++----------------------
 7 files changed, 48 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8197ce7..23d73bd 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1235,16 +1235,24 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);

-struct ib_mr *ib_create_mr(struct ib_pd *pd,
-			   struct ib_mr_init_attr *mr_init_attr)
+/**
+ * ib_alloc_mr() - Allocates a memory region
+ * @pd:          protection domain associated with the region
+ * @mr_type:     memory region type
+ * @max_entries: maximum registration entries available
+ * @flags:       create flags
+ */
+struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
+			  enum ib_mr_type mr_type,
+			  u32 max_entries,
+			  u32 flags)
 {
 	struct ib_mr *mr;

-	if (!pd->device->create_mr)
+	if (!pd->device->alloc_mr)
 		return ERR_PTR(-ENOSYS);

-	mr = pd->device->create_mr(pd, mr_init_attr);
-
+	mr = pd->device->alloc_mr(pd, mr_type, max_entries, flags);
 	if (!IS_ERR(mr)) {
 		mr->device = pd->device;
 		mr->pd     = pd;
@@ -1255,7 +1263,7 @@ struct ib_mr *ib_create_mr(struct ib_pd *pd,

 	return mr;
 }
-EXPORT_SYMBOL(ib_create_mr);
+EXPORT_SYMBOL(ib_alloc_mr);

 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 48f02da..82a371f 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1502,7 +1502,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 	dev->ib_dev.attach_mcast = mlx5_ib_mcg_attach;
 	dev->ib_dev.detach_mcast = mlx5_ib_mcg_detach;
 	dev->ib_dev.process_mad = mlx5_ib_process_mad;
-	dev->ib_dev.create_mr = mlx5_ib_create_mr;
+	dev->ib_dev.alloc_mr = mlx5_ib_alloc_mr;
 	dev->ib_dev.alloc_fast_reg_mr = mlx5_ib_alloc_fast_reg_mr;
 	dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list;
 	dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list;
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 29c74e9..cd6fb5d 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -573,8 +573,10 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
 int mlx5_ib_update_mtt(struct mlx5_ib_mr *mr, u64 start_page_index,
 		       int npages, int zap);
 int mlx5_ib_dereg_mr(struct ib_mr *ibmr);
-struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
-				struct ib_mr_init_attr *mr_init_attr);
+struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
+			       enum ib_mr_type mr_type,
+			       u32 max_entries,
+			       u32 flags);
 struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd,
					int max_page_list_len);
 struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev,
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 3197c00..185c963 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1247,14 +1247,19 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr)
 	return 0;
 }

-struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
-				struct ib_mr_init_attr *mr_init_attr)
+struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd,
+			       enum ib_mr_type mr_type,
+			       u32 max_entries,
+			       u32 flags)
 {
 	struct mlx5_ib_dev *dev = to_mdev(pd->device);
 	struct mlx5_create_mkey_mbox_in *in;
 	struct mlx5_ib_mr *mr;
 	int access_mode, err;
-	int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4);
+	int ndescs = roundup(max_entries, 4);
+
+	if (flags)
+		return ERR_PTR(-EINVAL);

 	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
 	if (!mr)
@@ -1270,9 +1275,11 @@ struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
 	in->seg.xlt_oct_size = cpu_to_be32(ndescs);
 	in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8);
 	in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn);
-	access_mode = MLX5_ACCESS_MODE_MTT;
-	if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) {
+	if (mr_type == IB_MR_TYPE_FAST_REG) {
+		access_mode = MLX5_ACCESS_MODE_MTT;
+		in->seg.log2_page_size = PAGE_SHIFT;
+	} else if (mr_type == IB_MR_TYPE_SIGNATURE) {
 		u32 psv_index[2];

 		in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) |
@@ -1298,6 +1305,10 @@ struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd,
 		mr->sig->sig_err_exists = false;
 		/* Next UMR, Arm SIGERR */
 		++mr->sig->sigerr_count;
+	} else {
+		mlx5_ib_warn(dev, "Invalid mr type %d\n", mr_type);
+		err = -EINVAL;
+		goto err_free_in;
 	}

 	in->seg.flags = MLX5_PERM_UMR_EN | access_mode;
diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c
index 7a5c49f..6be4d4a 100644
--- a/drivers/infiniband/ulp/iser/iser_verbs.c
+++ b/drivers/infiniband/ulp/iser/iser_verbs.c
@@ -326,8 +326,6 @@ iser_alloc_pi_ctx(struct ib_device *ib_device,
		  unsigned int size)
 {
 	struct iser_pi_context *pi_ctx = NULL;
-	struct ib_mr_init_attr mr_init_attr = {.max_reg_descriptors = 2,
-					       .flags = IB_MR_SIGNATURE_EN};
 	int ret;

 	desc->pi_ctx = kzalloc(sizeof(*desc->pi_ctx), GFP_KERNEL);
@@ -342,7 +340,7 @@ iser_alloc_pi_ctx(struct ib_device *ib_device,
 		goto alloc_reg_res_err;
 	}

-	pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr);
+	pi_ctx->sig_mr = ib_alloc_mr(pd, IB_MR_TYPE_SIGNATURE, 2, 0);
 	if (IS_ERR(pi_ctx->sig_mr)) {
 		ret = PTR_ERR(pi_ctx->sig_mr);
 		goto sig_mr_failure;
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index e59228d..f0b7c9b 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -508,7 +508,6 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
 		    struct ib_device *device,
 		    struct ib_pd *pd)
 {
-	struct ib_mr_init_attr mr_init_attr;
 	struct pi_context *pi_ctx;
 	int ret;

@@ -536,10 +535,7 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc,
 	}
 	desc->ind |= ISERT_PROT_KEY_VALID;

-	memset(&mr_init_attr, 0, sizeof(mr_init_attr));
-	mr_init_attr.max_reg_descriptors = 2;
-	mr_init_attr.flags |= IB_MR_SIGNATURE_EN;
-	pi_ctx->sig_mr = ib_create_mr(pd, &mr_init_attr);
+	pi_ctx->sig_mr = ib_alloc_mr(pd, IB_MR_TYPE_SIGNATURE, 2, 0);
 	if (IS_ERR(pi_ctx->sig_mr)) {
 		isert_err("Failed to allocate signature enabled mr err=%ld\n",
 			  PTR_ERR(pi_ctx->sig_mr));
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 4468a64..5ec9a70 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -556,20 +556,9 @@ __attribute_const__ int ib_rate_to_mult(enum ib_rate rate);
  */
 __attribute_const__ int ib_rate_to_mbps(enum ib_rate rate);

-enum ib_mr_create_flags {
-	IB_MR_SIGNATURE_EN = 1,
-};
-
-/**
- * ib_mr_init_attr - Memory region init attributes passed to routine
- *	ib_create_mr.
- * @max_reg_descriptors: max number of registration descriptors that
- *	may be used with registration work requests.
- * @flags: MR creation flags bit mask.
- */
-struct ib_mr_init_attr {
-	int		max_reg_descriptors;
-	u32		flags;
+enum ib_mr_type {
+	IB_MR_TYPE_FAST_REG,
+	IB_MR_TYPE_SIGNATURE,
 };

 /**
@@ -1668,8 +1657,10 @@ struct ib_device {
 	int		(*query_mr)(struct ib_mr *mr,
				    struct ib_mr_attr *mr_attr);
 	int		(*dereg_mr)(struct ib_mr *mr);
-	struct ib_mr *	(*create_mr)(struct ib_pd *pd,
-				     struct ib_mr_init_attr *mr_init_attr);
+	struct ib_mr *	(*alloc_mr)(struct ib_pd *pd,
+				    enum ib_mr_type mr_type,
+				    u32 max_entries,
+				    u32 flags);
 	struct ib_mr *	(*alloc_fast_reg_mr)(struct ib_pd *pd,
					     int max_page_list_len);
 	struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
@@ -2806,15 +2797,10 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr);
  */
 int ib_dereg_mr(struct ib_mr *mr);

-/**
- * ib_create_mr - Allocates a memory region that may be used for
- *	signature handover operations.
- * @pd: The protection domain associated with the region.
- * @mr_init_attr: memory region init attributes.
- */
-struct ib_mr *ib_create_mr(struct ib_pd *pd,
-			   struct ib_mr_init_attr *mr_init_attr);
+struct ib_mr *ib_alloc_mr(struct ib_pd *pd,
+			  enum ib_mr_type mr_type,
+			  u32 max_entries,
+			  u32 flags);

 /**
  * ib_alloc_fast_reg_mr - Allocates memory region usable with the
-- 
1.8.4.3
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Jason Gunthorpe @ 2015-07-22 16:34 UTC
To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

> +/**
> + * ib_alloc_mr() - Allocates a memory region
> + * @pd:          protection domain associated with the region
> + * @mr_type:     memory region type
> + * @max_entries: maximum registration entries available
> + * @flags:       create flags
> + */

Can you update this comment to elaborate some more on what the
parameters are? 'max_entries' is the number of s/g elements or
something?

> +enum ib_mr_type {
> +	IB_MR_TYPE_FAST_REG,
> +	IB_MR_TYPE_SIGNATURE,
> };

Sure would be nice to have some documentation for what these things
do..

Jason
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Christoph Hellwig @ 2015-07-22 16:44 UTC
To: Jason Gunthorpe
Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 10:34:05AM -0600, Jason Gunthorpe wrote:
> > +/**
> > + * ib_alloc_mr() - Allocates a memory region
> > + * @pd:          protection domain associated with the region
> > + * @mr_type:     memory region type
> > + * @max_entries: maximum registration entries available
> > + * @flags:       create flags
> > + */
>
> Can you update this comment to elaborate some more on what the
> parameters are? 'max_entries' is the number of s/g elements or
> something?
>
> > +enum ib_mr_type {
> > +	IB_MR_TYPE_FAST_REG,
> > +	IB_MR_TYPE_SIGNATURE,
> > };
>
> Sure would be nice to have some documentation for what these things
> do..

Agreed on both counts. Otherwise this looks pretty good to me.
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Sagi Grimberg @ 2015-07-22 16:58 UTC
To: Christoph Hellwig, Jason Gunthorpe
Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 7:44 PM, Christoph Hellwig wrote:
> On Wed, Jul 22, 2015 at 10:34:05AM -0600, Jason Gunthorpe wrote:
>>> +/**
>>> + * ib_alloc_mr() - Allocates a memory region
>>> + * @pd:          protection domain associated with the region
>>> + * @mr_type:     memory region type
>>> + * @max_entries: maximum registration entries available
>>> + * @flags:       create flags
>>> + */
>>
>> Can you update this comment to elaborate some more on what the
>> parameters are? 'max_entries' is the number of s/g elements or
>> something?
>>
>>> +enum ib_mr_type {
>>> +	IB_MR_TYPE_FAST_REG,
>>> +	IB_MR_TYPE_SIGNATURE,
>>> };
>>
>> Sure would be nice to have some documentation for what these things
>> do..
>
> Agreed on both counts. Otherwise this looks pretty good to me.

I can add some more documentation here...
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Jason Gunthorpe @ 2015-07-22 19:05 UTC
To: Sagi Grimberg
Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 07:58:23PM +0300, Sagi Grimberg wrote:
> On 7/22/2015 7:44 PM, Christoph Hellwig wrote:
> > On Wed, Jul 22, 2015 at 10:34:05AM -0600, Jason Gunthorpe wrote:
> >>> +/**
> >>> + * ib_alloc_mr() - Allocates a memory region
> >>> + * @pd:          protection domain associated with the region
> >>> + * @mr_type:     memory region type
> >>> + * @max_entries: maximum registration entries available
> >>> + * @flags:       create flags
> >>> + */
> >>
> >> Can you update this comment to elaborate some more on what the
> >> parameters are? 'max_entries' is the number of s/g elements or
> >> something?
> >>
> >>> +enum ib_mr_type {
> >>> +	IB_MR_TYPE_FAST_REG,
> >>> +	IB_MR_TYPE_SIGNATURE,
> >>> };
> >>
> >> Sure would be nice to have some documentation for what these things
> >> do..
> >
> > Agreed on both counts. Otherwise this looks pretty good to me.
>
> I can add some more documentation here...

So, I was wrong, 'max_entries' is the number of page entries, not
really the s/g element limit?

In other words, the ULP can submit at most max_entries*PAGE_SIZE bytes
for the non-ARB_SG case.

For the ARB_SG case.. It is some other more difficult computation?

It is somewhat ugly to ask for this upfront as a hard limit..

Is there any reason we can't use a hint_prealloc_pages as the argument
here, and then realloc in the map routine if the hint turns out to be
too small for a particular s/g list? It looks like all drivers can
support this.

That would make it much easier to use correctly, and free ULPs from
dealing with any impedance mismatch with core kernel code that assumes
an sg list length limit, or overall size limit, not some oddball
computation based on pages...

Jason
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Sagi Grimberg @ 2015-07-23 10:07 UTC
To: Jason Gunthorpe
Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 10:05 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 07:58:23PM +0300, Sagi Grimberg wrote:
>> On 7/22/2015 7:44 PM, Christoph Hellwig wrote:
>>> On Wed, Jul 22, 2015 at 10:34:05AM -0600, Jason Gunthorpe wrote:
>>>>> +/**
>>>>> + * ib_alloc_mr() - Allocates a memory region
>>>>> + * @pd:          protection domain associated with the region
>>>>> + * @mr_type:     memory region type
>>>>> + * @max_entries: maximum registration entries available
>>>>> + * @flags:       create flags
>>>>> + */
>>>>
>>>> Can you update this comment to elaborate some more on what the
>>>> parameters are? 'max_entries' is the number of s/g elements or
>>>> something?
>>>>
>>>>> +enum ib_mr_type {
>>>>> +	IB_MR_TYPE_FAST_REG,
>>>>> +	IB_MR_TYPE_SIGNATURE,
>>>>> };
>>>>
>>>> Sure would be nice to have some documentation for what these things
>>>> do..
>>>
>>> Agreed on both counts. Otherwise this looks pretty good to me.
>>
>> I can add some more documentation here...
>
> So, I was wrong, 'max_entries' is the number of page entries, not
> really the s/g element limit?

The max_entries stands for the maximum number of sg entries. Other
than that, the SG list must meet the requirements documented in
ib_map_mr_sg.

The reason I named it max_entries is because the entries might not be
pages but real SG elements. It stands for maximum registration
entries. Do you have a better name?

> In other words, the ULP can submit at most max_entries*PAGE_SIZE bytes
> for the non-ARB_SG case.
>
> For the ARB_SG case.. It is some other more difficult computation?

Not really. The ULP needs to submit sg_nents < max_entries, and the SG
list needs to meet the alignment requirements. For ARB_SG, the
condition is the same, but the SG is free from the alignment
constraints.

> It is somewhat ugly to ask for this upfront as a hard limit..
>
> Is there any reason we can't use a hint_prealloc_pages as the argument
> here, and then realloc in the map routine if the hint turns out to be
> too small for a particular s/g list?

The reason is that it is not possible. The memory key allocation
reserves resources in the device translation tables. realloc means
reallocating the memory key. In any event, this is not possible in
the IO path.
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Jason Gunthorpe @ 2015-07-23 19:08 UTC
To: Sagi Grimberg
Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 01:07:56PM +0300, Sagi Grimberg wrote:
> On 7/22/2015 10:05 PM, Jason Gunthorpe wrote:
> The reason I named it max_entries is because the entries might not be
> pages but real SG elements. It stands for maximum registration
> entries.
>
> Do you have a better name?

I wouldn't try and be both..

Use 'max_num_sg' and document that no aggregate scatterlist with
length larger than 'max_num_sg*PAGE_SIZE' or with more entries than
max_num_sg can be submitted?

Maybe document with ARB_SG that it is not length limited?

Jason
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Sagi Grimberg @ 2015-07-26 8:51 UTC
To: Jason Gunthorpe
Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 10:08 PM, Jason Gunthorpe wrote:
> On Thu, Jul 23, 2015 at 01:07:56PM +0300, Sagi Grimberg wrote:
>> On 7/22/2015 10:05 PM, Jason Gunthorpe wrote:
>> The reason I named it max_entries is because the entries might not be
>> pages but real SG elements. It stands for maximum registration
>> entries.
>>
>> Do you have a better name?
>
> I wouldn't try and be both..
>
> Use 'max_num_sg' and document that no aggregate scatterlist with
> length larger than 'max_num_sg*PAGE_SIZE' or with more entries than
> max_num_sg can be submitted?
>
> Maybe document with ARB_SG that it is not length limited?

OK, I can do that.
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Sagi Grimberg @ 2015-07-22 16:59 UTC
To: Jason Gunthorpe, Sagi Grimberg
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 7:34 PM, Jason Gunthorpe wrote:
>> +/**
>> + * ib_alloc_mr() - Allocates a memory region
>> + * @pd:          protection domain associated with the region
>> + * @mr_type:     memory region type
>> + * @max_entries: maximum registration entries available
>> + * @flags:       create flags
>> + */
>
> Can you update this comment to elaborate some more on what the
> parameters are? 'max_entries' is the number of s/g elements or
> something?
>
>> +enum ib_mr_type {
>> +	IB_MR_TYPE_FAST_REG,
>> +	IB_MR_TYPE_SIGNATURE,
>> };
>
> Sure would be nice to have some documentation for what these things
> do..

Do we want to pull ib_get_dma_mr() here with type IB_MR_TYPE_DMA?
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Jason Gunthorpe @ 2015-07-22 17:01 UTC
To: Sagi Grimberg
Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 07:59:16PM +0300, Sagi Grimberg wrote:
> Do we want to pull ib_get_dma_mr() here with type IB_MR_TYPE_DMA?

I want to get rid of ib_get_dma_mr...

My plan was to get rid of it, as my last series shows, for all lkey
usages, and then rename it to:

 ib_get_insecure_all_physical_rkey

for the remaining usages, and a future kernel version will taint the
kernel if anyone calls it.

Jason
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API
From: Sagi Grimberg @ 2015-07-22 17:03 UTC
To: Jason Gunthorpe
Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 8:01 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 07:59:16PM +0300, Sagi Grimberg wrote:
>> Do we want to pull ib_get_dma_mr() here with type IB_MR_TYPE_DMA?
>
> I want to get rid of ib_get_dma_mr...

That's why I asked :)

So I'll take it as a no...
* RE: [PATCH WIP 01/43] IB: Modify ib_create_mr API [not found] ` <1437548143-24893-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 16:34 ` Jason Gunthorpe @ 2015-07-23 0:57 ` Hefty, Sean [not found] ` <1828884A29C6694DAF28B7E6B8A82373A9001357-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Hefty, Sean @ 2015-07-23 0:57 UTC (permalink / raw) To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer > +enum ib_mr_type { > + IB_MR_TYPE_FAST_REG, > + IB_MR_TYPE_SIGNATURE, If we're going to go through the trouble of changing everything, I vote for dropping the word 'fast'. It's a marketing term. It's goofy. And the IB spec is goofy for using it. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API [not found] ` <1828884A29C6694DAF28B7E6B8A82373A9001357-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2015-07-23 9:30 ` Christoph Hellwig [not found] ` <20150723093046.GF32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Christoph Hellwig @ 2015-07-23 9:30 UTC (permalink / raw) To: Hefty, Sean Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Thu, Jul 23, 2015 at 12:57:34AM +0000, Hefty, Sean wrote: > > +enum ib_mr_type { > > + IB_MR_TYPE_FAST_REG, > > + IB_MR_TYPE_SIGNATURE, > > If we're going to go through the trouble of changing everything, I vote > for dropping the word 'fast'. It's a marketing term. It's goofy. And > the IB spec is goofy for using it. Yes. Especially as the infrastructure will be usable to support FMR on legacy adapters as well except that instead of the ib_post_send it'll need a call to the FMR code at the very end. While we're at it wonder if we should consolidate the type and the flags field as well, as the split between the two is a little confusing. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 01/43] IB: Modify ib_create_mr API [not found] ` <20150723093046.GF32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> @ 2015-07-23 10:09 ` Sagi Grimberg 0 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-23 10:09 UTC (permalink / raw) To: Christoph Hellwig, Hefty, Sean Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/23/2015 12:30 PM, Christoph Hellwig wrote: > On Thu, Jul 23, 2015 at 12:57:34AM +0000, Hefty, Sean wrote: >>> +enum ib_mr_type { >>> + IB_MR_TYPE_FAST_REG, >>> + IB_MR_TYPE_SIGNATURE, >> >> If we're going to go through the trouble of changing everything, I vote >> for dropping the word 'fast'. It's a marketing term. It's goofy. And >> the IB spec is goofy for using it. So IB_MR_TYPE_MEM_REG? > > Yes. Especially as the infrastructure will be usable to support FMR > on legacy adapters as well except that instead of the ib_post_send it'll > need a call to the FMR code at the very end. > > While we're at it wonder if we should consolidate the type and the > flags field as well, as the split between the two is a little confusing. I can do that. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
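The rename discussed above can be written out as a tiny self-contained enum. Note this is only a sketch of the proposal in this subthread: IB_MR_TYPE_MEM_REG is the name Sagi floats, not something merged at this point, and a consolidated variant (as Christoph suggests) could fold today's separate flags word into additional enum values.

```c
/* Sketch of the proposed ib_mr_type after dropping "fast".
 * IB_MR_TYPE_MEM_REG is the name proposed in this subthread;
 * IB_MR_TYPE_SIGNATURE is carried over from the posted patch. */
enum ib_mr_type {
	IB_MR_TYPE_MEM_REG,	/* plain memory registration (was FAST_REG) */
	IB_MR_TYPE_SIGNATURE,	/* signature/data-integrity capable MR */
};
```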
* [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 6:55 ` [PATCH WIP 01/43] IB: Modify ib_create_mr API Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg [not found] ` <1437548143-24893-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 6:55 ` [PATCH WIP 03/43] ocrdma: " Sagi Grimberg ` (41 subsequent siblings) 43 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx4/main.c | 1 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 4 ++++ drivers/infiniband/hw/mlx4/mr.c | 38 ++++++++++++++++++++++++++++++++++++ 3 files changed, 43 insertions(+) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index a6f44ee..54671c7 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -2298,6 +2298,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) ibdev->ib_dev.rereg_user_mr = mlx4_ib_rereg_user_mr; ibdev->ib_dev.dereg_mr = mlx4_ib_dereg_mr; ibdev->ib_dev.alloc_fast_reg_mr = mlx4_ib_alloc_fast_reg_mr; + ibdev->ib_dev.alloc_mr = mlx4_ib_alloc_mr; ibdev->ib_dev.alloc_fast_reg_page_list = mlx4_ib_alloc_fast_reg_page_list; ibdev->ib_dev.free_fast_reg_page_list = mlx4_ib_free_fast_reg_page_list; ibdev->ib_dev.attach_mcast = mlx4_ib_mcg_attach; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 334387f..c8b5679 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -680,6 +680,10 @@ struct ib_mw *mlx4_ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type); int mlx4_ib_bind_mw(struct ib_qp *qp, struct ib_mw *mw, struct ib_mw_bind *mw_bind); int mlx4_ib_dealloc_mw(struct ib_mw *mw); +struct ib_mr 
*mlx4_ib_alloc_mr(struct ib_pd *pd, + enum ib_mr_type mr_type, + u32 max_entries, + u32 flags); struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len); struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev, diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index e0d2717..3cba374 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -350,6 +350,44 @@ int mlx4_ib_dealloc_mw(struct ib_mw *ibmw) return 0; } +struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd, + enum ib_mr_type mr_type, + u32 max_entries, + u32 flags) +{ + struct mlx4_ib_dev *dev = to_mdev(pd->device); + struct mlx4_ib_mr *mr; + int err; + + if (mr_type != IB_MR_TYPE_FAST_REG || flags) + return ERR_PTR(-EINVAL); + + mr = kmalloc(sizeof *mr, GFP_KERNEL); + if (!mr) + return ERR_PTR(-ENOMEM); + + err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, 0, 0, 0, + max_entries, 0, &mr->mmr); + if (err) + goto err_free; + + err = mlx4_mr_enable(dev->dev, &mr->mmr); + if (err) + goto err_mr; + + mr->ibmr.rkey = mr->ibmr.lkey = mr->mmr.key; + mr->umem = NULL; + + return &mr->ibmr; + +err_mr: + (void) mlx4_mr_free(dev->dev, &mr->mmr); + +err_free: + kfree(mr); + return ERR_PTR(err); +} + struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len) { -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb [not found] ` <1437548143-24893-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-07-22 16:58 ` Jason Gunthorpe [not found] ` <20150722165831.GB26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-22 16:58 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 09:55:02AM +0300, Sagi Grimberg wrote: > > +struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd, > + enum ib_mr_type mr_type, > + u32 max_entries, > + u32 flags) > +{ This is just a copy of mlx4_ib_alloc_fast_reg_mr with this added: > + if (mr_type != IB_MR_TYPE_FAST_REG || flags) > + return ERR_PTR(-EINVAL); Are all the driver updates the same? It looks like it. I'd suggest shortening this patch series, have the core provide the wrapper immediately: struct ib_mr *ib_alloc_mr(struct ib_pd *pd, { ... if (pd->device->alloc_mr) { mr = pd->device->alloc_mr(pd, mr_type, max_entries, flags); } else { if (mr_type != IB_MR_TYPE_FAST_REG || flags || !ib_dev->alloc_fast_reg_mr) return ERR_PTR(-ENOSYS); mr = pd->device->alloc_fast_reg_mr(..); } } Then go through the series to remove ib_alloc_fast_reg_mr Then go through one series to migrate the drivers from alloc_fast_reg_mr to alloc_mr Then entirely drop alloc_fast_reg_mr from the driver API. That should be shorter and easier to read the driver diffs, which is the major change here. This whole section (up to 20) looks reasonable to me.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
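Jason's suggested core wrapper can be modeled outside the kernel as a plain dispatch function. Everything below is a simplified stand-in for illustration: the struct layouts, the NULL-plus-errno return convention (instead of the kernel's ERR_PTR()), and the stub driver are all invented here; only the fallback logic mirrors his outline.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the verbs structures. */
enum ib_mr_type { IB_MR_TYPE_FAST_REG, IB_MR_TYPE_SIGNATURE };

struct ib_pd { int id; };
struct ib_mr { int lkey; };

struct ib_device_ops {
	struct ib_mr *(*alloc_mr)(struct ib_pd *pd, enum ib_mr_type mr_type,
				  unsigned int max_entries, unsigned int flags);
	struct ib_mr *(*alloc_fast_reg_mr)(struct ib_pd *pd,
					   int max_page_list_len);
};

/* Core wrapper following Jason's outline: prefer the new alloc_mr
 * entry point, and fall back to the legacy alloc_fast_reg_mr only
 * for the one combination it can express. */
static struct ib_mr *ib_alloc_mr(struct ib_device_ops *dev, struct ib_pd *pd,
				 enum ib_mr_type mr_type,
				 unsigned int max_entries, unsigned int flags,
				 int *err)
{
	*err = 0;
	if (dev->alloc_mr)
		return dev->alloc_mr(pd, mr_type, max_entries, flags);
	if (mr_type != IB_MR_TYPE_FAST_REG || flags ||
	    !dev->alloc_fast_reg_mr) {
		*err = -ENOSYS;
		return NULL;
	}
	return dev->alloc_fast_reg_mr(pd, (int)max_entries);
}

/* Stub "driver" implementing only the legacy entry point, so the
 * fallback path can be exercised. */
static struct ib_mr stub_mr = { .lkey = 42 };

static struct ib_mr *stub_fast_reg(struct ib_pd *pd, int max_page_list_len)
{
	(void)pd;
	(void)max_page_list_len;
	return &stub_mr;
}

static struct ib_device_ops legacy_dev = {
	.alloc_fast_reg_mr = stub_fast_reg,
};
```

With this shape, the driver migration can happen one at a time: a driver that grows an alloc_mr entry stops hitting the fallback, and once every driver has migrated, the alloc_fast_reg_mr leg can be deleted.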
* Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb [not found] ` <20150722165831.GB26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-22 17:22 ` Sagi Grimberg [not found] ` <55AFD14C.8040007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 17:22 UTC (permalink / raw) To: Jason Gunthorpe, Sagi Grimberg Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/22/2015 7:58 PM, Jason Gunthorpe wrote: > On Wed, Jul 22, 2015 at 09:55:02AM +0300, Sagi Grimberg wrote: >> >> +struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd, >> + enum ib_mr_type mr_type, >> + u32 max_entries, >> + u32 flags) >> +{ > > This is just a copy of mlx4_ib_alloc_fast_reg_mr with > this added: > >> + if (mr_type != IB_MR_TYPE_FAST_REG || flags) >> + return ERR_PTR(-EINVAL); > > Are all the driver updates the same? It looks like it. > > I'd suggest shortening this patch series, have the core provide the > wrapper immediately: > > struct ib_mr *ib_alloc_mr(struct ib_pd *pd, > { > ... > > if (pd->device->alloc_mr) { > mr = pd->device->alloc_mr(pd, mr_type, max_entries, flags); > } else { > if (mr_type != IB_MR_TYPE_FAST_REG || flags || > !ib_dev->alloc_fast_reg_mr) > return ERR_PTR(-ENOSYS); > mr = pd->device->alloc_fast_reg_mr(..); > } > } > > Then go through the series to remove ib_alloc_fast_reg_mr > > Then go through one series to migrate the drivers from > alloc_fast_reg_mr to alloc_mr > > Then entirely drop alloc_fast_reg_mr from the driver API. > > That should be shorter and easier to read the driver diffs, which is > the major change here. Yea, it would be better... Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb [not found] ` <55AFD14C.8040007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-22 18:50 ` Steve Wise [not found] ` <55AFE5D9.3050102-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Steve Wise @ 2015-07-22 18:50 UTC (permalink / raw) To: Sagi Grimberg, Jason Gunthorpe, Sagi Grimberg Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/22/2015 12:22 PM, Sagi Grimberg wrote: > On 7/22/2015 7:58 PM, Jason Gunthorpe wrote: >> On Wed, Jul 22, 2015 at 09:55:02AM +0300, Sagi Grimberg wrote: >>> >>> +struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd, >>> + enum ib_mr_type mr_type, >>> + u32 max_entries, >>> + u32 flags) >>> +{ >> >> This is just a copy of mlx4_ib_alloc_fast_reg_mr with >> this added: >> >>> + if (mr_type != IB_MR_TYPE_FAST_REG || flags) >>> + return ERR_PTR(-EINVAL); >> >> Are all the driver updates the same? It looks like it. >> >> I'd suggest shortening this patch series, have the core provide the >> wrapper immediately: >> >> struct ib_mr *ib_alloc_mr(struct ib_pd *pd, >> { >> ... >> >> if (pd->device->alloc_mr) { >> mr = pd->device->alloc_mr(pd, mr_type, max_entries, flags); >> } else { >> if (mr_type != IB_MR_TYPE_FAST_REG || flags || >> !ib_dev->alloc_fast_reg_mr) >> return ERR_PTR(-ENOSYS); >> mr = pd->device->alloc_fast_reg_mr(..); >> } >> } >> >> Then go through the series to remove ib_alloc_fast_reg_mr >> >> Then go through one series to migrate the drivers from >> alloc_fast_reg_mr to alloc_mr >> >> Then entirely drop alloc_fast_reg_mr from the driver API. >> >> That should be shorter and easier to read the driver diffs, which is >> the major change here. > > Yea, it would be better... 43 patches overflows my stack ;) I agree with Jason's suggestion. Steve. 
* Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb [not found] ` <55AFE5D9.3050102-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> @ 2015-07-22 18:54 ` Jason Gunthorpe [not found] ` <20150722185410.GA4527-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-22 18:54 UTC (permalink / raw) To: Steve Wise Cc: Sagi Grimberg, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 01:50:01PM -0500, Steve Wise wrote: > 43 patches overflows my stack ;) I agree with Jason's suggestion. Sagi, you may as well just send the ib_alloc_mr rework as a series and get it done with, I'd pass off on the core parts of v2. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb [not found] ` <20150722185410.GA4527-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-23 10:10 ` Sagi Grimberg 0 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-23 10:10 UTC (permalink / raw) To: Jason Gunthorpe, Steve Wise Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/22/2015 9:54 PM, Jason Gunthorpe wrote: > On Wed, Jul 22, 2015 at 01:50:01PM -0500, Steve Wise wrote: > >> 43 patches overflows my stack ;) I agree with Jason's suggestion. > > Saig, you may as well just send the ib_alloc_mr rework as a series and > get it done with, I'd pass off on the core parts of v2. I'll split that off from the rest. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH WIP 03/43] ocrdma: Support ib_alloc_mr verb [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 6:55 ` [PATCH WIP 01/43] IB: Modify ib_create_mr API Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 04/43] iw_cxgb4: " Sagi Grimberg ` (40 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 + drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 47 +++++++++++++++++++++++++++++ drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 4 +++ 3 files changed, 52 insertions(+) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c index 8a1398b..d7ebe04 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c @@ -294,6 +294,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev) dev->ibdev.dereg_mr = ocrdma_dereg_mr; dev->ibdev.reg_user_mr = ocrdma_reg_user_mr; + dev->ibdev.alloc_mr = ocrdma_alloc_mr; dev->ibdev.alloc_fast_reg_mr = ocrdma_alloc_frmr; dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list; dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list; diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c index 5bb61eb..3487780 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c @@ -2983,6 +2983,53 @@ int ocrdma_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags cq_flags) return 0; } +struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd, + enum ib_mr_type mr_type, + u32 max_entries, + u32 flags) +{ + int status; + struct 
ocrdma_mr *mr; + struct ocrdma_pd *pd = get_ocrdma_pd(ibpd); + struct ocrdma_dev *dev = get_ocrdma_dev(ibpd->device); + + if (mr_type != IB_MR_TYPE_FAST_REG || flags) + return ERR_PTR(-EINVAL); + + if (max_entries > dev->attr.max_pages_per_frmr) + return ERR_PTR(-EINVAL); + + mr = kzalloc(sizeof(*mr), GFP_KERNEL); + if (!mr) + return ERR_PTR(-ENOMEM); + + status = ocrdma_get_pbl_info(dev, mr, max_entries); + if (status) + goto pbl_err; + mr->hwmr.fr_mr = 1; + mr->hwmr.remote_rd = 0; + mr->hwmr.remote_wr = 0; + mr->hwmr.local_rd = 0; + mr->hwmr.local_wr = 0; + mr->hwmr.mw_bind = 0; + status = ocrdma_build_pbl_tbl(dev, &mr->hwmr); + if (status) + goto pbl_err; + status = ocrdma_reg_mr(dev, &mr->hwmr, pd->id, 0); + if (status) + goto mbx_err; + mr->ibmr.rkey = mr->hwmr.lkey; + mr->ibmr.lkey = mr->hwmr.lkey; + dev->stag_arr[(mr->hwmr.lkey >> 8) & (OCRDMA_MAX_STAG - 1)] = + (unsigned long) mr; + return &mr->ibmr; +mbx_err: + ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr); +pbl_err: + kfree(mr); + return ERR_PTR(-ENOMEM); +} + struct ib_mr *ocrdma_alloc_frmr(struct ib_pd *ibpd, int max_page_list_len) { int status; diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h index b15c608..eebcda2 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h @@ -96,6 +96,10 @@ struct ib_mr *ocrdma_reg_kernel_mr(struct ib_pd *, int num_phys_buf, int acc, u64 *iova_start); struct ib_mr *ocrdma_reg_user_mr(struct ib_pd *, u64 start, u64 length, u64 virt, int acc, struct ib_udata *); +struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd, + enum ib_mr_type mr_type, + u32 max_entries, + u32 flags); struct ib_mr *ocrdma_alloc_frmr(struct ib_pd *pd, int max_page_list_len); struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device *ibdev, -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to 
majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 04/43] iw_cxgb4: Support ib_alloc_mr verb [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (2 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 03/43] ocrdma: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 05/43] cxgb3: " Sagi Grimberg ` (39 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 4 +++ drivers/infiniband/hw/cxgb4/mem.c | 57 ++++++++++++++++++++++++++++++++++ drivers/infiniband/hw/cxgb4/provider.c | 1 + 3 files changed, 62 insertions(+) diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h index cc77844..97b2568 100644 --- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h @@ -970,6 +970,10 @@ void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list *page_list); struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl( struct ib_device *device, int page_list_len); +struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd, + enum ib_mr_type mr_type, + u32 max_entries, + u32 flags); struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth); int c4iw_dealloc_mw(struct ib_mw *mw); struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type); diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c index cff815b..7ee01ce 100644 --- a/drivers/infiniband/hw/cxgb4/mem.c +++ b/drivers/infiniband/hw/cxgb4/mem.c @@ -853,6 +853,63 @@ int c4iw_dealloc_mw(struct ib_mw *mw) return 0; } +struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd, + enum ib_mr_type mr_type, + u32 max_entries, + u32 flags) +{ + struct c4iw_dev *rhp; + struct c4iw_pd *php; + struct c4iw_mr *mhp; + u32 mmid; + u32 stag = 0; + int ret = 0; + + if 
(mr_type != IB_MR_TYPE_FAST_REG || flags) + return ERR_PTR(-EINVAL); + + php = to_c4iw_pd(pd); + rhp = php->rhp; + mhp = kzalloc(sizeof(*mhp), GFP_KERNEL); + if (!mhp) { + ret = -ENOMEM; + goto err; + } + + mhp->rhp = rhp; + ret = alloc_pbl(mhp, max_entries); + if (ret) + goto err1; + mhp->attr.pbl_size = max_entries; + ret = allocate_stag(&rhp->rdev, &stag, php->pdid, + mhp->attr.pbl_size, mhp->attr.pbl_addr); + if (ret) + goto err2; + mhp->attr.pdid = php->pdid; + mhp->attr.type = FW_RI_STAG_NSMR; + mhp->attr.stag = stag; + mhp->attr.state = 1; + mmid = (stag) >> 8; + mhp->ibmr.rkey = mhp->ibmr.lkey = stag; + if (insert_handle(rhp, &rhp->mmidr, mhp, mmid)) { + ret = -ENOMEM; + goto err3; + } + + PDBG("%s mmid 0x%x mhp %p stag 0x%x\n", __func__, mmid, mhp, stag); + return &(mhp->ibmr); +err3: + dereg_mem(&rhp->rdev, stag, mhp->attr.pbl_size, + mhp->attr.pbl_addr); +err2: + c4iw_pblpool_free(&mhp->rhp->rdev, mhp->attr.pbl_addr, + mhp->attr.pbl_size << 3); +err1: + kfree(mhp); +err: + return ERR_PTR(ret); +} + struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth) { struct c4iw_dev *rhp; diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c index 6eee3d3..2885aba 100644 --- a/drivers/infiniband/hw/cxgb4/provider.c +++ b/drivers/infiniband/hw/cxgb4/provider.c @@ -556,6 +556,7 @@ int c4iw_register_device(struct c4iw_dev *dev) dev->ibdev.alloc_mw = c4iw_alloc_mw; dev->ibdev.bind_mw = c4iw_bind_mw; dev->ibdev.dealloc_mw = c4iw_dealloc_mw; + dev->ibdev.alloc_mr = c4iw_alloc_mr; dev->ibdev.alloc_fast_reg_mr = c4iw_alloc_fast_reg_mr; dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl; dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl; -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 
142+ messages in thread
* [PATCH WIP 05/43] cxgb3: Support ib_alloc_mr verb [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (3 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 04/43] iw_cxgb4: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 06/43] nes: " Sagi Grimberg ` (38 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 53 +++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index b1b7323..d0e9e2d 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -796,6 +796,58 @@ static int iwch_dealloc_mw(struct ib_mw *mw) return 0; } +static struct ib_mr *iwch_alloc_mr(struct ib_pd *pd, + enum ib_mr_type mr_type, + u32 max_entries, + u32 flags) +{ + struct iwch_dev *rhp; + struct iwch_pd *php; + struct iwch_mr *mhp; + u32 mmid; + u32 stag = 0; + int ret = 0; + + if (mr_type != IB_MR_TYPE_FAST_REG || flags) + return ERR_PTR(-EINVAL); + + php = to_iwch_pd(pd); + rhp = php->rhp; + mhp = kzalloc(sizeof(*mhp), GFP_KERNEL); + if (!mhp) + goto err; + + mhp->rhp = rhp; + ret = iwch_alloc_pbl(mhp, max_entries); + if (ret) + goto err1; + mhp->attr.pbl_size = max_entries; + ret = cxio_allocate_stag(&rhp->rdev, &stag, php->pdid, + mhp->attr.pbl_size, mhp->attr.pbl_addr); + if (ret) + goto err2; + mhp->attr.pdid = php->pdid; + mhp->attr.type = TPT_NON_SHARED_MR; + mhp->attr.stag = stag; + mhp->attr.state = 1; + mmid = (stag) >> 8; + mhp->ibmr.rkey = mhp->ibmr.lkey = stag; + if (insert_handle(rhp, &rhp->mmidr, mhp, mmid)) + goto err3; + + PDBG("%s mmid 0x%x mhp %p stag 0x%x\n", __func__, mmid, 
mhp, stag); + return &(mhp->ibmr); +err3: + cxio_dereg_mem(&rhp->rdev, stag, mhp->attr.pbl_size, + mhp->attr.pbl_addr); +err2: + iwch_free_pbl(mhp); +err1: + kfree(mhp); +err: + return ERR_PTR(ret); +} + static struct ib_mr *iwch_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth) { struct iwch_dev *rhp; @@ -1439,6 +1491,7 @@ int iwch_register_device(struct iwch_dev *dev) dev->ibdev.alloc_mw = iwch_alloc_mw; dev->ibdev.bind_mw = iwch_bind_mw; dev->ibdev.dealloc_mw = iwch_dealloc_mw; + dev->ibdev.alloc_mr = iwch_alloc_mr; dev->ibdev.alloc_fast_reg_mr = iwch_alloc_fast_reg_mr; dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl; dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl; -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 06/43] nes: Support ib_alloc_mr verb [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (4 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 05/43] cxgb3: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 07/43] qib: " Sagi Grimberg ` (37 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/nes/nes_verbs.c | 73 +++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index fbc43e5..ac63763 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -375,6 +375,78 @@ static int alloc_fast_reg_mr(struct nes_device *nesdev, struct nes_pd *nespd, } /* + * nes_alloc_mr + */ +static struct ib_mr *nes_alloc_mr(struct ib_pd *ibpd, + enum ib_mr_type mr_type, + u32 max_entries, + u32 flags) +{ + struct nes_pd *nespd = to_nespd(ibpd); + struct nes_vnic *nesvnic = to_nesvnic(ibpd->device); + struct nes_device *nesdev = nesvnic->nesdev; + struct nes_adapter *nesadapter = nesdev->nesadapter; + + u32 next_stag_index; + u8 stag_key = 0; + u32 driver_key = 0; + int err = 0; + u32 stag_index = 0; + struct nes_mr *nesmr; + u32 stag; + int ret; + struct ib_mr *ibmr; + + if (mr_type != IB_MR_TYPE_FAST_REG || flags) + return ERR_PTR(-EINVAL); + +/* + * Note: Set to always use a fixed length single page entry PBL. This is to allow + * for the fast_reg_mr operation to always know the size of the PBL. 
+ */ + if (max_entries > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64))) + return ERR_PTR(-E2BIG); + + get_random_bytes(&next_stag_index, sizeof(next_stag_index)); + stag_key = (u8)next_stag_index; + next_stag_index >>= 8; + next_stag_index %= nesadapter->max_mr; + + err = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs, + nesadapter->max_mr, &stag_index, + &next_stag_index, NES_RESOURCE_FAST_MR); + if (err) + return ERR_PTR(err); + + nesmr = kzalloc(sizeof(*nesmr), GFP_KERNEL); + if (!nesmr) { + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + return ERR_PTR(-ENOMEM); + } + + stag = stag_index << 8; + stag |= driver_key; + stag += (u32)stag_key; + + nes_debug(NES_DBG_MR, "Allocating STag 0x%08X index = 0x%08X\n", + stag, stag_index); + + ret = alloc_fast_reg_mr(nesdev, nespd, stag, max_entries); + + if (ret == 0) { + nesmr->ibmr.rkey = stag; + nesmr->ibmr.lkey = stag; + nesmr->mode = IWNES_MEMREG_TYPE_FMEM; + ibmr = &nesmr->ibmr; + } else { + kfree(nesmr); + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + ibmr = ERR_PTR(-ENOMEM); + } + return ibmr; +} + +/* * nes_alloc_fast_reg_mr */ static struct ib_mr *nes_alloc_fast_reg_mr(struct ib_pd *ibpd, int max_page_list_len) @@ -3929,6 +4001,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev) nesibdev->ibdev.dealloc_mw = nes_dealloc_mw; nesibdev->ibdev.bind_mw = nes_bind_mw; + nesibdev->ibdev.alloc_mr = nes_alloc_mr; nesibdev->ibdev.alloc_fast_reg_mr = nes_alloc_fast_reg_mr; nesibdev->ibdev.alloc_fast_reg_page_list = nes_alloc_fast_reg_page_list; nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list; -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
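The cap in the nes patch above limits max_entries to however many 64-bit page addresses fit in a single fixed-length PBL chunk. Assuming NES_4K_PBL_CHUNK_SIZE is 4096 bytes (implied by the name, but the constant's definition is not shown in this hunk), that works out to 512 entries:

```c
#include <stdint.h>

/* Assumed value: implied by the constant's name, not shown in the
 * posted hunk. */
#define NES_4K_PBL_CHUNK_SIZE 4096

/* Largest max_entries nes_alloc_mr accepts before returning -E2BIG:
 * one u64 page address per slot in a single 4K PBL chunk. */
static unsigned int nes_max_frmr_entries(void)
{
	return NES_4K_PBL_CHUNK_SIZE / sizeof(uint64_t);
}
```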
* [PATCH WIP 07/43] qib: Support ib_alloc_mr verb [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (5 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 06/43] nes: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 08/43] IB/iser: Convert to ib_alloc_mr Sagi Grimberg ` (36 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/qib/qib_mr.c | 23 +++++++++++++++++++++++ drivers/infiniband/hw/qib/qib_verbs.c | 1 + drivers/infiniband/hw/qib/qib_verbs.h | 5 +++++ 3 files changed, 29 insertions(+) diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c index c4473db..1522255 100644 --- a/drivers/infiniband/hw/qib/qib_mr.c +++ b/drivers/infiniband/hw/qib/qib_mr.c @@ -327,6 +327,29 @@ out: * * Return the memory region on success, otherwise return an errno. */ +struct ib_mr *qib_alloc_mr(struct ib_pd *pd, + enum ib_mr_type mr_type, + u32 max_entries, + u32 flags) +{ + struct qib_mr *mr; + + if (mr_type != IB_MR_TYPE_FAST_REG || flags) + return ERR_PTR(-EINVAL); + + mr = alloc_mr(max_entries, pd); + if (IS_ERR(mr)) + return (struct ib_mr *)mr; + + return &mr->ibmr; +} + +/* + * Allocate a memory region usable with the + * IB_WR_FAST_REG_MR send work request. + * + * Return the memory region on success, otherwise return an errno. 
+ */ struct ib_mr *qib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len) { struct qib_mr *mr; diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c index a05d1a3..323666b 100644 --- a/drivers/infiniband/hw/qib/qib_verbs.c +++ b/drivers/infiniband/hw/qib/qib_verbs.c @@ -2235,6 +2235,7 @@ int qib_register_ib_device(struct qib_devdata *dd) ibdev->reg_phys_mr = qib_reg_phys_mr; ibdev->reg_user_mr = qib_reg_user_mr; ibdev->dereg_mr = qib_dereg_mr; + ibdev->alloc_mr = qib_alloc_mr; ibdev->alloc_fast_reg_mr = qib_alloc_fast_reg_mr; ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list; ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list; diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h index 1635572..034510c 100644 --- a/drivers/infiniband/hw/qib/qib_verbs.h +++ b/drivers/infiniband/hw/qib/qib_verbs.h @@ -1032,6 +1032,11 @@ struct ib_mr *qib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, int qib_dereg_mr(struct ib_mr *ibmr); +struct ib_mr *qib_alloc_mr(struct ib_pd *pd, + enum ib_mr_type mr_type, + u32 max_entries, + u32 flags); + struct ib_mr *qib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len); struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list( -- 1.8.4.3
* [PATCH WIP 08/43] IB/iser: Convert to ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (6 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 07/43] qib: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 09/43] iser-target: " Sagi Grimberg ` (35 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/ulp/iser/iser_verbs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 6be4d4a..ecc3265 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -296,7 +296,7 @@ iser_alloc_reg_res(struct ib_device *ib_device, return PTR_ERR(res->frpl); } - res->mr = ib_alloc_fast_reg_mr(pd, size); + res->mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, size, 0); if (IS_ERR(res->mr)) { ret = PTR_ERR(res->mr); iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret); -- 1.8.4.3
* [PATCH WIP 09/43] iser-target: Convert to ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (7 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 08/43] IB/iser: Convert to ib_alloc_mr Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 10/43] IB/srp: " Sagi Grimberg ` (34 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/ulp/isert/ib_isert.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index f0b7c9b..94395ce 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -526,7 +526,8 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc, goto err_pi_ctx; } - pi_ctx->prot_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE); + pi_ctx->prot_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, + ISCSI_ISER_SG_TABLESIZE, 0); if (IS_ERR(pi_ctx->prot_mr)) { isert_err("Failed to allocate prot frmr err=%ld\n", PTR_ERR(pi_ctx->prot_mr)); @@ -573,7 +574,8 @@ isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd, return PTR_ERR(fr_desc->data_frpl); } - fr_desc->data_mr = ib_alloc_fast_reg_mr(pd, ISCSI_ISER_SG_TABLESIZE); + fr_desc->data_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, + ISCSI_ISER_SG_TABLESIZE, 0); if (IS_ERR(fr_desc->data_mr)) { isert_err("Failed to allocate data frmr err=%ld\n", PTR_ERR(fr_desc->data_mr)); -- 1.8.4.3
* [PATCH WIP 10/43] IB/srp: Convert to ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (8 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 09/43] iser-target: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 11/43] xprtrdma, svcrdma: " Sagi Grimberg ` (33 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/ulp/srp/ib_srp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 1218738..7747587 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp/ib_srp.c @@ -378,7 +378,8 @@ static struct srp_fr_pool *srp_create_fr_pool(struct ib_device *device, INIT_LIST_HEAD(&pool->free_list); for (i = 0, d = &pool->desc[0]; i < pool->size; i++, d++) { - mr = ib_alloc_fast_reg_mr(pd, max_page_list_len); + mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, + max_page_list_len, 0); if (IS_ERR(mr)) { ret = PTR_ERR(mr); goto destroy_pool; -- 1.8.4.3
* [PATCH WIP 11/43] xprtrdma, svcrdma: Convert to ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (9 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 10/43] IB/srp: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 12/43] RDS: " Sagi Grimberg ` (32 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- net/sunrpc/xprtrdma/frwr_ops.c | 6 +++--- net/sunrpc/xprtrdma/svc_rdma_transport.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 63f282e..517efed 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -117,7 +117,7 @@ __frwr_recovery_worker(struct work_struct *work) if (ib_dereg_mr(r->r.frmr.fr_mr)) goto out_fail; - r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(pd, depth); + r->r.frmr.fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0); if (IS_ERR(r->r.frmr.fr_mr)) goto out_fail; @@ -148,7 +148,7 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device, struct rpcrdma_frmr *f = &r->r.frmr; int rc; - f->fr_mr = ib_alloc_fast_reg_mr(pd, depth); + f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0); if (IS_ERR(f->fr_mr)) goto out_mr_err; f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth); @@ -158,7 +158,7 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device, out_mr_err: rc = PTR_ERR(f->fr_mr); - dprintk("RPC: %s: ib_alloc_fast_reg_mr status %i\n", + dprintk("RPC: %s: ib_alloc_mr status %i\n", __func__, rc); return rc; diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c index 6b36279..fd933d9 100644 --- 
a/net/sunrpc/xprtrdma/svc_rdma_transport.c +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c @@ -738,7 +738,7 @@ static struct svc_rdma_fastreg_mr *rdma_alloc_frmr(struct svcxprt_rdma *xprt) if (!frmr) goto err; - mr = ib_alloc_fast_reg_mr(xprt->sc_pd, RPCSVC_MAXPAGES); + mr = ib_alloc_mr(xprt->sc_pd, IB_MR_TYPE_FAST_REG, RPCSVC_MAXPAGES, 0); if (IS_ERR(mr)) goto err_free_frmr; -- 1.8.4.3
* [PATCH WIP 12/43] RDS: Convert to ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (10 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 11/43] xprtrdma, svcrdma: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 13/43] mlx5: Drop mlx5_ib_alloc_fast_reg_mr Sagi Grimberg ` (31 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- net/rds/iw_rdma.c | 5 +++-- net/rds/iw_send.c | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/net/rds/iw_rdma.c b/net/rds/iw_rdma.c index dba8d08..dac0131 100644 --- a/net/rds/iw_rdma.c +++ b/net/rds/iw_rdma.c @@ -667,11 +667,12 @@ static int rds_iw_init_fastreg(struct rds_iw_mr_pool *pool, struct ib_mr *mr; int err; - mr = ib_alloc_fast_reg_mr(rds_iwdev->pd, pool->max_message_size); + mr = ib_alloc_mr(rds_iwdev->pd, IB_MR_TYPE_FAST_REG, + pool->max_message_size, 0); if (IS_ERR(mr)) { err = PTR_ERR(mr); - printk(KERN_WARNING "RDS/IW: ib_alloc_fast_reg_mr failed (err=%d)\n", err); + printk(KERN_WARNING "RDS/IW: ib_alloc_mr failed (err=%d)\n", err); return err; } diff --git a/net/rds/iw_send.c b/net/rds/iw_send.c index 334fe98..0d8e74b 100644 --- a/net/rds/iw_send.c +++ b/net/rds/iw_send.c @@ -153,9 +153,10 @@ void rds_iw_send_init_ring(struct rds_iw_connection *ic) sge->length = sizeof(struct rds_header); sge->lkey = 0; - send->s_mr = ib_alloc_fast_reg_mr(ic->i_pd, fastreg_message_size); + send->s_mr = ib_alloc_mr(ic->i_pd, IB_MR_TYPE_FAST_REG, + fastreg_message_size, 0); if (IS_ERR(send->s_mr)) { - printk(KERN_WARNING "RDS/IW: ib_alloc_fast_reg_mr failed\n"); + printk(KERN_WARNING "RDS/IW: ib_alloc_mr failed\n"); break; } -- 1.8.4.3
* [PATCH WIP 13/43] mlx5: Drop mlx5_ib_alloc_fast_reg_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (11 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 12/43] RDS: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 14/43] mlx4: Drop mlx4_ib_alloc_fast_reg_mr Sagi Grimberg ` (30 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx5/main.c | 1 - drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 -- drivers/infiniband/hw/mlx5/mr.c | 44 ------------------------------------ 3 files changed, 47 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 82a371f..ce75875 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1503,7 +1503,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev) dev->ib_dev.detach_mcast = mlx5_ib_mcg_detach; dev->ib_dev.process_mad = mlx5_ib_process_mad; dev->ib_dev.alloc_mr = mlx5_ib_alloc_mr; - dev->ib_dev.alloc_fast_reg_mr = mlx5_ib_alloc_fast_reg_mr; dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list; dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list; dev->ib_dev.check_mr_status = mlx5_ib_check_mr_status; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index cd6fb5d..c2916f1 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -577,8 +577,6 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_entries, u32 flags); -struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd, - int max_page_list_len); struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev, 
int page_list_len); void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list); diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 185c963..c8de302 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -1344,50 +1344,6 @@ err_free: return ERR_PTR(err); } -struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd, - int max_page_list_len) -{ - struct mlx5_ib_dev *dev = to_mdev(pd->device); - struct mlx5_create_mkey_mbox_in *in; - struct mlx5_ib_mr *mr; - int err; - - mr = kzalloc(sizeof(*mr), GFP_KERNEL); - if (!mr) - return ERR_PTR(-ENOMEM); - - in = kzalloc(sizeof(*in), GFP_KERNEL); - if (!in) { - err = -ENOMEM; - goto err_free; - } - - in->seg.status = MLX5_MKEY_STATUS_FREE; - in->seg.xlt_oct_size = cpu_to_be32((max_page_list_len + 1) / 2); - in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8); - in->seg.flags = MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT; - in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn); - /* - * TBD not needed - issue 197292 */ - in->seg.log2_page_size = PAGE_SHIFT; - - err = mlx5_core_create_mkey(dev->mdev, &mr->mmr, in, sizeof(*in), NULL, - NULL, NULL); - kfree(in); - if (err) - goto err_free; - - mr->ibmr.lkey = mr->mmr.key; - mr->ibmr.rkey = mr->mmr.key; - mr->umem = NULL; - - return &mr->ibmr; - -err_free: - kfree(mr); - return ERR_PTR(err); -} - struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len) { -- 1.8.4.3
* [PATCH WIP 14/43] mlx4: Drop mlx4_ib_alloc_fast_reg_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (12 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 13/43] mlx5: Drop mlx5_ib_alloc_fast_reg_mr Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 15/43] ocrdma: Drop ocrdma_alloc_frmr Sagi Grimberg ` (29 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx4/main.c | 1 - drivers/infiniband/hw/mlx4/mlx4_ib.h | 2 -- drivers/infiniband/hw/mlx4/mr.c | 33 --------------------------------- 3 files changed, 36 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 54671c7..829fcf4 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -2297,7 +2297,6 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) ibdev->ib_dev.reg_user_mr = mlx4_ib_reg_user_mr; ibdev->ib_dev.rereg_user_mr = mlx4_ib_rereg_user_mr; ibdev->ib_dev.dereg_mr = mlx4_ib_dereg_mr; - ibdev->ib_dev.alloc_fast_reg_mr = mlx4_ib_alloc_fast_reg_mr; ibdev->ib_dev.alloc_mr = mlx4_ib_alloc_mr; ibdev->ib_dev.alloc_fast_reg_page_list = mlx4_ib_alloc_fast_reg_page_list; ibdev->ib_dev.free_fast_reg_page_list = mlx4_ib_free_fast_reg_page_list; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index c8b5679..9220faf 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -684,8 +684,6 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_entries, u32 flags); -struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd, - int max_page_list_len); struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct 
ib_device *ibdev, int page_list_len); void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list); diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index 3cba374..121ee7f 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -388,39 +388,6 @@ err_free: return ERR_PTR(err); } -struct ib_mr *mlx4_ib_alloc_fast_reg_mr(struct ib_pd *pd, - int max_page_list_len) -{ - struct mlx4_ib_dev *dev = to_mdev(pd->device); - struct mlx4_ib_mr *mr; - int err; - - mr = kmalloc(sizeof *mr, GFP_KERNEL); - if (!mr) - return ERR_PTR(-ENOMEM); - - err = mlx4_mr_alloc(dev->dev, to_mpd(pd)->pdn, 0, 0, 0, - max_page_list_len, 0, &mr->mmr); - if (err) - goto err_free; - - err = mlx4_mr_enable(dev->dev, &mr->mmr); - if (err) - goto err_mr; - - mr->ibmr.rkey = mr->ibmr.lkey = mr->mmr.key; - mr->umem = NULL; - - return &mr->ibmr; - -err_mr: - (void) mlx4_mr_free(dev->dev, &mr->mmr); - -err_free: - kfree(mr); - return ERR_PTR(err); -} - struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len) { -- 1.8.4.3
* [PATCH WIP 15/43] ocrdma: Drop ocrdma_alloc_frmr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (13 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 14/43] mlx4: Drop mlx4_ib_alloc_fast_reg_mr Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 16/43] qib: Drop qib_alloc_fast_reg_mr Sagi Grimberg ` (28 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 - drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 41 ----------------------------- drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 1 - 3 files changed, 43 deletions(-) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c index d7ebe04..47d2814 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c @@ -295,7 +295,6 @@ static int ocrdma_register_device(struct ocrdma_dev *dev) dev->ibdev.reg_user_mr = ocrdma_reg_user_mr; dev->ibdev.alloc_mr = ocrdma_alloc_mr; - dev->ibdev.alloc_fast_reg_mr = ocrdma_alloc_frmr; dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list; dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list; diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c index 3487780..fb97db1 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c @@ -3030,47 +3030,6 @@ pbl_err: return ERR_PTR(-ENOMEM); } -struct ib_mr *ocrdma_alloc_frmr(struct ib_pd *ibpd, int max_page_list_len) -{ - int status; - struct ocrdma_mr *mr; - struct ocrdma_pd *pd = get_ocrdma_pd(ibpd); - struct ocrdma_dev *dev = get_ocrdma_dev(ibpd->device); - - if (max_page_list_len > 
dev->attr.max_pages_per_frmr) - return ERR_PTR(-EINVAL); - - mr = kzalloc(sizeof(*mr), GFP_KERNEL); - if (!mr) - return ERR_PTR(-ENOMEM); - - status = ocrdma_get_pbl_info(dev, mr, max_page_list_len); - if (status) - goto pbl_err; - mr->hwmr.fr_mr = 1; - mr->hwmr.remote_rd = 0; - mr->hwmr.remote_wr = 0; - mr->hwmr.local_rd = 0; - mr->hwmr.local_wr = 0; - mr->hwmr.mw_bind = 0; - status = ocrdma_build_pbl_tbl(dev, &mr->hwmr); - if (status) - goto pbl_err; - status = ocrdma_reg_mr(dev, &mr->hwmr, pd->id, 0); - if (status) - goto mbx_err; - mr->ibmr.rkey = mr->hwmr.lkey; - mr->ibmr.lkey = mr->hwmr.lkey; - dev->stag_arr[(mr->hwmr.lkey >> 8) & (OCRDMA_MAX_STAG - 1)] = - (unsigned long) mr; - return &mr->ibmr; -mbx_err: - ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr); -pbl_err: - kfree(mr); - return ERR_PTR(-ENOMEM); -} - struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device *ibdev, int page_list_len) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h index eebcda2..d09ff8e 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h @@ -100,7 +100,6 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_entries, u32 flags); -struct ib_mr *ocrdma_alloc_frmr(struct ib_pd *pd, int max_page_list_len); struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device *ibdev, int page_list_len); -- 1.8.4.3
* [PATCH WIP 16/43] qib: Drop qib_alloc_fast_reg_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (14 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 15/43] ocrdma: Drop ocrdma_alloc_frmr Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 17/43] nes: Drop nes_alloc_fast_reg_mr Sagi Grimberg ` (27 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/qib/qib_mr.c | 17 ----------------- drivers/infiniband/hw/qib/qib_verbs.c | 1 - drivers/infiniband/hw/qib/qib_verbs.h | 2 -- 3 files changed, 20 deletions(-) diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c index 1522255..2a4afea 100644 --- a/drivers/infiniband/hw/qib/qib_mr.c +++ b/drivers/infiniband/hw/qib/qib_mr.c @@ -344,23 +344,6 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd, return &mr->ibmr; } -/* - * Allocate a memory region usable with the - * IB_WR_FAST_REG_MR send work request. - * - * Return the memory region on success, otherwise return an errno. 
- */ -struct ib_mr *qib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len) -{ - struct qib_mr *mr; - - mr = alloc_mr(max_page_list_len, pd); - if (IS_ERR(mr)) - return (struct ib_mr *)mr; - - return &mr->ibmr; -} - struct ib_fast_reg_page_list * qib_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len) { diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c index 323666b..ef022a1 100644 --- a/drivers/infiniband/hw/qib/qib_verbs.c +++ b/drivers/infiniband/hw/qib/qib_verbs.c @@ -2236,7 +2236,6 @@ int qib_register_ib_device(struct qib_devdata *dd) ibdev->reg_user_mr = qib_reg_user_mr; ibdev->dereg_mr = qib_dereg_mr; ibdev->alloc_mr = qib_alloc_mr; - ibdev->alloc_fast_reg_mr = qib_alloc_fast_reg_mr; ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list; ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list; ibdev->alloc_fmr = qib_alloc_fmr; diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h index 034510c..8fbd995 100644 --- a/drivers/infiniband/hw/qib/qib_verbs.h +++ b/drivers/infiniband/hw/qib/qib_verbs.h @@ -1037,8 +1037,6 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd, u32 max_entries, u32 flags); -struct ib_mr *qib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len); - struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list( struct ib_device *ibdev, int page_list_len); -- 1.8.4.3
* [PATCH WIP 17/43] nes: Drop nes_alloc_fast_reg_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (15 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 16/43] qib: Drop qib_alloc_fast_reg_mr Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 18/43] cxgb4: Drop c4iw_alloc_fast_reg_mr Sagi Grimberg ` (26 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/nes/nes_verbs.c | 66 ----------------------------------- 1 file changed, 66 deletions(-) diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index ac63763..752e6ea 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -447,71 +447,6 @@ static struct ib_mr *nes_alloc_mr(struct ib_pd *ibpd, } /* - * nes_alloc_fast_reg_mr - */ -static struct ib_mr *nes_alloc_fast_reg_mr(struct ib_pd *ibpd, int max_page_list_len) -{ - struct nes_pd *nespd = to_nespd(ibpd); - struct nes_vnic *nesvnic = to_nesvnic(ibpd->device); - struct nes_device *nesdev = nesvnic->nesdev; - struct nes_adapter *nesadapter = nesdev->nesadapter; - - u32 next_stag_index; - u8 stag_key = 0; - u32 driver_key = 0; - int err = 0; - u32 stag_index = 0; - struct nes_mr *nesmr; - u32 stag; - int ret; - struct ib_mr *ibmr; -/* - * Note: Set to always use a fixed length single page entry PBL. This is to allow - * for the fast_reg_mr operation to always know the size of the PBL. 
- */ - if (max_page_list_len > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64))) - return ERR_PTR(-E2BIG); - - get_random_bytes(&next_stag_index, sizeof(next_stag_index)); - stag_key = (u8)next_stag_index; - next_stag_index >>= 8; - next_stag_index %= nesadapter->max_mr; - - err = nes_alloc_resource(nesadapter, nesadapter->allocated_mrs, - nesadapter->max_mr, &stag_index, - &next_stag_index, NES_RESOURCE_FAST_MR); - if (err) - return ERR_PTR(err); - - nesmr = kzalloc(sizeof(*nesmr), GFP_KERNEL); - if (!nesmr) { - nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); - return ERR_PTR(-ENOMEM); - } - - stag = stag_index << 8; - stag |= driver_key; - stag += (u32)stag_key; - - nes_debug(NES_DBG_MR, "Allocating STag 0x%08X index = 0x%08X\n", - stag, stag_index); - - ret = alloc_fast_reg_mr(nesdev, nespd, stag, max_page_list_len); - - if (ret == 0) { - nesmr->ibmr.rkey = stag; - nesmr->ibmr.lkey = stag; - nesmr->mode = IWNES_MEMREG_TYPE_FMEM; - ibmr = &nesmr->ibmr; - } else { - kfree(nesmr); - nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); - ibmr = ERR_PTR(-ENOMEM); - } - return ibmr; -} - -/* * nes_alloc_fast_reg_page_list */ static struct ib_fast_reg_page_list *nes_alloc_fast_reg_page_list( @@ -4002,7 +3937,6 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev) nesibdev->ibdev.bind_mw = nes_bind_mw; nesibdev->ibdev.alloc_mr = nes_alloc_mr; - nesibdev->ibdev.alloc_fast_reg_mr = nes_alloc_fast_reg_mr; nesibdev->ibdev.alloc_fast_reg_page_list = nes_alloc_fast_reg_page_list; nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list; -- 1.8.4.3
* [PATCH WIP 18/43] cxgb4: Drop c4iw_alloc_fast_reg_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (16 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 17/43] nes: Drop nes_alloc_fast_reg_mr Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 19/43] cxgb3: Drop iwch_alloc_fast_reg_mr Sagi Grimberg ` (25 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 1 - drivers/infiniband/hw/cxgb4/mem.c | 51 ---------------------------------- drivers/infiniband/hw/cxgb4/provider.c | 1 - 3 files changed, 53 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h index 97b2568..886be9c 100644 --- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h @@ -974,7 +974,6 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_entries, u32 flags); -struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth); int c4iw_dealloc_mw(struct ib_mw *mw); struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type); struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c index 7ee01ce..5ecf4aa 100644 --- a/drivers/infiniband/hw/cxgb4/mem.c +++ b/drivers/infiniband/hw/cxgb4/mem.c @@ -910,57 +910,6 @@ err: return ERR_PTR(ret); } -struct ib_mr *c4iw_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth) -{ - struct c4iw_dev *rhp; - struct c4iw_pd *php; - struct c4iw_mr *mhp; - u32 mmid; - u32 stag = 0; - int ret = 0; - - php = to_c4iw_pd(pd); - rhp = php->rhp; - mhp = kzalloc(sizeof(*mhp), GFP_KERNEL); - if (!mhp) { - ret = -ENOMEM; - goto err; - } - 
- mhp->rhp = rhp; - ret = alloc_pbl(mhp, pbl_depth); - if (ret) - goto err1; - mhp->attr.pbl_size = pbl_depth; - ret = allocate_stag(&rhp->rdev, &stag, php->pdid, - mhp->attr.pbl_size, mhp->attr.pbl_addr); - if (ret) - goto err2; - mhp->attr.pdid = php->pdid; - mhp->attr.type = FW_RI_STAG_NSMR; - mhp->attr.stag = stag; - mhp->attr.state = 1; - mmid = (stag) >> 8; - mhp->ibmr.rkey = mhp->ibmr.lkey = stag; - if (insert_handle(rhp, &rhp->mmidr, mhp, mmid)) { - ret = -ENOMEM; - goto err3; - } - - PDBG("%s mmid 0x%x mhp %p stag 0x%x\n", __func__, mmid, mhp, stag); - return &(mhp->ibmr); -err3: - dereg_mem(&rhp->rdev, stag, mhp->attr.pbl_size, - mhp->attr.pbl_addr); -err2: - c4iw_pblpool_free(&mhp->rhp->rdev, mhp->attr.pbl_addr, - mhp->attr.pbl_size << 3); -err1: - kfree(mhp); -err: - return ERR_PTR(ret); -} - struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(struct ib_device *device, int page_list_len) { diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c index 2885aba..7746113 100644 --- a/drivers/infiniband/hw/cxgb4/provider.c +++ b/drivers/infiniband/hw/cxgb4/provider.c @@ -557,7 +557,6 @@ int c4iw_register_device(struct c4iw_dev *dev) dev->ibdev.bind_mw = c4iw_bind_mw; dev->ibdev.dealloc_mw = c4iw_dealloc_mw; dev->ibdev.alloc_mr = c4iw_alloc_mr; - dev->ibdev.alloc_fast_reg_mr = c4iw_alloc_fast_reg_mr; dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl; dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl; dev->ibdev.attach_mcast = c4iw_multicast_attach; -- 1.8.4.3
* [PATCH WIP 19/43] cxgb3: Drop iwch_alloc_fast_reg_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (17 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 18/43] cxgb4: Drop c4iw_alloc_fast_reg_mr Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 20/43] IB/core: Drop ib_alloc_fast_reg_mr Sagi Grimberg ` (24 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 47 ----------------------------- 1 file changed, 47 deletions(-) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index d0e9e2d..af55b79 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -848,52 +848,6 @@ err: return ERR_PTR(ret); } -static struct ib_mr *iwch_alloc_fast_reg_mr(struct ib_pd *pd, int pbl_depth) -{ - struct iwch_dev *rhp; - struct iwch_pd *php; - struct iwch_mr *mhp; - u32 mmid; - u32 stag = 0; - int ret = 0; - - php = to_iwch_pd(pd); - rhp = php->rhp; - mhp = kzalloc(sizeof(*mhp), GFP_KERNEL); - if (!mhp) - goto err; - - mhp->rhp = rhp; - ret = iwch_alloc_pbl(mhp, pbl_depth); - if (ret) - goto err1; - mhp->attr.pbl_size = pbl_depth; - ret = cxio_allocate_stag(&rhp->rdev, &stag, php->pdid, - mhp->attr.pbl_size, mhp->attr.pbl_addr); - if (ret) - goto err2; - mhp->attr.pdid = php->pdid; - mhp->attr.type = TPT_NON_SHARED_MR; - mhp->attr.stag = stag; - mhp->attr.state = 1; - mmid = (stag) >> 8; - mhp->ibmr.rkey = mhp->ibmr.lkey = stag; - if (insert_handle(rhp, &rhp->mmidr, mhp, mmid)) - goto err3; - - PDBG("%s mmid 0x%x mhp %p stag 0x%x\n", __func__, mmid, mhp, stag); - return &(mhp->ibmr); -err3: - cxio_dereg_mem(&rhp->rdev, stag, 
mhp->attr.pbl_size, - mhp->attr.pbl_addr); -err2: - iwch_free_pbl(mhp); -err1: - kfree(mhp); -err: - return ERR_PTR(ret); -} - static struct ib_fast_reg_page_list *iwch_alloc_fastreg_pbl( struct ib_device *device, int page_list_len) @@ -1492,7 +1446,6 @@ int iwch_register_device(struct iwch_dev *dev) dev->ibdev.bind_mw = iwch_bind_mw; dev->ibdev.dealloc_mw = iwch_dealloc_mw; dev->ibdev.alloc_mr = iwch_alloc_mr; - dev->ibdev.alloc_fast_reg_mr = iwch_alloc_fast_reg_mr; dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl; dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl; dev->ibdev.attach_mcast = iwch_multicast_attach; -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 20/43] IB/core: Drop ib_alloc_fast_reg_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (18 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 19/43] cxgb3: Drop iwch_alloc_fast_reg_mr Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr Sagi Grimberg ` (23 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Fully replaced by a more generic and suitable ib_alloc_mr Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/core/verbs.c | 21 --------------------- include/rdma/ib_verbs.h | 11 ----------- 2 files changed, 32 deletions(-) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 23d73bd..beed431 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1265,27 +1265,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd, } EXPORT_SYMBOL(ib_alloc_mr); -struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len) -{ - struct ib_mr *mr; - - if (!pd->device->alloc_fast_reg_mr) - return ERR_PTR(-ENOSYS); - - mr = pd->device->alloc_fast_reg_mr(pd, max_page_list_len); - - if (!IS_ERR(mr)) { - mr->device = pd->device; - mr->pd = pd; - mr->uobject = NULL; - atomic_inc(&pd->usecnt); - atomic_set(&mr->usecnt, 0); - } - - return mr; -} -EXPORT_SYMBOL(ib_alloc_fast_reg_mr); - struct ib_fast_reg_page_list *ib_alloc_fast_reg_page_list(struct ib_device *device, int max_page_list_len) { diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 5ec9a70..7a93e2d 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1661,8 +1661,6 @@ struct ib_device { enum ib_mr_type mr_type, u32 max_entries, u32 flags); - struct ib_mr * (*alloc_fast_reg_mr)(struct ib_pd 
*pd, - int max_page_list_len); struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device, int page_list_len); void (*free_fast_reg_page_list)(struct ib_fast_reg_page_list *page_list); @@ -2803,15 +2801,6 @@ struct ib_mr *ib_alloc_mr(struct ib_pd *pd, u32 flags); /** - * ib_alloc_fast_reg_mr - Allocates memory region usable with the - * IB_WR_FAST_REG_MR send work request. - * @pd: The protection domain associated with the region. - * @max_page_list_len: requested max physical buffer list length to be - * used with fast register work requests for this MR. - */ -struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len); - -/** * ib_alloc_fast_reg_page_list - Allocates a page list array * @device - ib device pointer. * @page_list_len - size of the page list array to be allocated. -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (19 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 20/43] IB/core: Drop ib_alloc_fast_reg_mr Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg [not found] ` <1437548143-24893-22-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 6:55 ` [PATCH WIP 22/43] mlx4: " Sagi Grimberg ` (22 subsequent siblings) 43 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 ++++ drivers/infiniband/hw/mlx5/mr.c | 45 ++++++++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index c2916f1..df5e959 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -315,6 +315,11 @@ enum mlx5_ib_mtt_access_flags { struct mlx5_ib_mr { struct ib_mr ibmr; + u64 *pl; + __be64 *mpl; + dma_addr_t pl_map; + int ndescs; + int max_descs; struct mlx5_core_mr mmr; struct ib_umem *umem; struct mlx5_shared_mr_info *smr_info; diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index c8de302..1075065 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -1167,6 +1167,42 @@ error: return err; } +static int +mlx5_alloc_page_list(struct ib_device *device, + struct mlx5_ib_mr *mr, int ndescs) +{ + int size = ndescs * sizeof(u64); + + mr->pl = kcalloc(ndescs, sizeof(u64), GFP_KERNEL); + if (!mr->pl) + return -ENOMEM; + + mr->mpl = dma_alloc_coherent(device->dma_device, size, + &mr->pl_map, GFP_KERNEL); + if (!mr->mpl) + goto err; + + return 0; +err: + kfree(mr->pl); + + return -ENOMEM; +} + 
+static void +mlx5_free_page_list(struct mlx5_ib_mr *mr) +{ + struct ib_device *device = mr->ibmr.device; + int size = mr->max_descs * sizeof(u64); + + kfree(mr->pl); + if (mr->mpl) + dma_free_coherent(device->dma_device, size, + mr->mpl, mr->pl_map); + mr->pl = NULL; + mr->mpl = NULL; +} + static int clean_mr(struct mlx5_ib_mr *mr) { struct mlx5_ib_dev *dev = to_mdev(mr->ibmr.device); @@ -1186,6 +1222,8 @@ static int clean_mr(struct mlx5_ib_mr *mr) mr->sig = NULL; } + mlx5_free_page_list(mr); + if (!umred) { err = destroy_mkey(dev, mr); if (err) { @@ -1279,6 +1317,12 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd, if (mr_type == IB_MR_TYPE_FAST_REG) { access_mode = MLX5_ACCESS_MODE_MTT; in->seg.log2_page_size = PAGE_SHIFT; + + err = mlx5_alloc_page_list(pd->device, mr, ndescs); + if (err) + goto err_free_in; + + mr->max_descs = ndescs; } else if (mr_type == IB_MR_TYPE_SIGNATURE) { u32 psv_index[2]; @@ -1335,6 +1379,7 @@ err_destroy_psv: mlx5_ib_warn(dev, "failed to destroy wire psv %d\n", mr->sig->psv_wire.psv_idx); } + mlx5_free_page_list(mr); err_free_sig: kfree(mr->sig); err_free_in: -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
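[Editor's note] The same two-step allocation recurs across patches 21-27: a plain kernel array for the page list plus a device-visible copy obtained with dma_alloc_coherent(), with the first allocation unwound if the second fails. A minimal userspace sketch of the pattern (malloc() standing in for dma_alloc_coherent(); struct and function names here are hypothetical, not the driver's):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace analogue of the mlx5/mlx4 private page list allocation:
 * pl models the CPU-side array, mpl models the dma_alloc_coherent()
 * device-visible copy. */
struct mr_lists {
	uint64_t *pl;
	uint64_t *mpl;
};

static int alloc_page_lists(struct mr_lists *m, int ndescs)
{
	m->pl = calloc(ndescs, sizeof(uint64_t));
	if (!m->pl)
		return -1;

	m->mpl = malloc(ndescs * sizeof(uint64_t));
	if (!m->mpl) {
		/* unwind the first allocation, as the patches do at err: */
		free(m->pl);
		m->pl = NULL;
		return -1;
	}
	return 0;
}

static void free_page_lists(struct mr_lists *m)
{
	free(m->mpl);
	free(m->pl);
	m->pl = NULL;
	m->mpl = NULL;
}
```

The real drivers additionally size the coherent buffer from max_descs and track it for teardown in dereg_mr, as the diffs above show.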
* Re: [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr [not found] ` <1437548143-24893-22-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-07-22 16:46 ` Christoph Hellwig [not found] ` <20150722164605.GB6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 2015-07-28 10:57 ` Haggai Eran 1 sibling, 1 reply; 142+ messages in thread From: Christoph Hellwig @ 2015-07-22 16:46 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer Just curious: what's the tradeoff between allocating the page list in the core vs duplicating it in all the drivers? Does the driver variant give us any benefits? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr [not found] ` <20150722164605.GB6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> @ 2015-07-22 16:51 ` Sagi Grimberg 0 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 16:51 UTC (permalink / raw) To: Christoph Hellwig, Sagi Grimberg Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/22/2015 7:46 PM, Christoph Hellwig wrote: > Just curious: what's the tradeoff between allocating the page list > in the core vs duplicating it in all the drivers? Does the driver > variant give us any benefits? It's not necessarily a page list... (i.e. a real scatterlist). It will make more sense in patch 41/43. Moreover, as I wrote in the cover-letter, I noticed that several drivers keep shadows anyway for various reasons. For example mlx4 sets the page list with a present-bit (related to ODP...) so at registration time we see the loop: for (i = 0; i < mr->npages; ++i) mr->mpl[i] = cpu_to_be64(mr->pl[i] | MLX4_MTT_FLAG_PRESENT); Given that this is not a single example, I'd expect drivers to skip this duplication (hopefully). Sagi. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
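[Editor's note] To illustrate the duplication being discussed, here is a userspace sketch (not the driver code; cpu_to_be64() is elided, and set_page_direct is a hypothetical name) of the mlx4 shadow copy quoted above, next to the direct-write alternative a driver-private map_mr_sg routine could use:

```c
#include <assert.h>
#include <stdint.h>

#define MLX4_MTT_FLAG_PRESENT 1ULL

/* Shadow approach: a CPU page list (pl) is filled first, then re-walked
 * at registration time just to tag the device-format copy (mpl) with the
 * present bit. */
static void shadow_copy(const uint64_t *pl, uint64_t *mpl, int npages)
{
	int i;

	for (i = 0; i < npages; ++i)
		mpl[i] = pl[i] | MLX4_MTT_FLAG_PRESENT;
}

/* Alternative: write the device-format entry directly as each page
 * address is produced, so no shadow list is needed at all. */
static void set_page_direct(uint64_t *mpl, int i, uint64_t page_addr)
{
	mpl[i] = page_addr | MLX4_MTT_FLAG_PRESENT;
}
```

With the second form, the extra per-registration loop and the duplicate array both disappear, which is the saving the reply refers to.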
* Re: [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr [not found] ` <1437548143-24893-22-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 16:46 ` Christoph Hellwig @ 2015-07-28 10:57 ` Haggai Eran [not found] ` <55B75FFC.6040200-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Haggai Eran @ 2015-07-28 10:57 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer Hi Sagi, On 22/07/2015 09:55, Sagi Grimberg wrote: > Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > --- > drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 ++++ > drivers/infiniband/hw/mlx5/mr.c | 45 ++++++++++++++++++++++++++++++++++++ > 2 files changed, 50 insertions(+) > > diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h > index c2916f1..df5e959 100644 > --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h > +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h > @@ -315,6 +315,11 @@ enum mlx5_ib_mtt_access_flags { > > struct mlx5_ib_mr { > struct ib_mr ibmr; > + u64 *pl; > + __be64 *mpl; > + dma_addr_t pl_map; Nit: could you choose more descriptive names for these fields? It can be difficult to understand what they mean just based on the acronym. > + int ndescs; This one isn't used in this patch, right? > + int max_descs; > struct mlx5_core_mr mmr; > struct ib_umem *umem; > struct mlx5_shared_mr_info *smr_info; -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr [not found] ` <55B75FFC.6040200-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-07-30 8:08 ` Sagi Grimberg 0 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-30 8:08 UTC (permalink / raw) To: Haggai Eran, Sagi Grimberg Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/28/2015 1:57 PM, Haggai Eran wrote: > Hi Sagi, > > On 22/07/2015 09:55, Sagi Grimberg wrote: >> Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> >> --- >> drivers/infiniband/hw/mlx5/mlx5_ib.h | 5 ++++ >> drivers/infiniband/hw/mlx5/mr.c | 45 ++++++++++++++++++++++++++++++++++++ >> 2 files changed, 50 insertions(+) >> >> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h >> index c2916f1..df5e959 100644 >> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h >> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h >> @@ -315,6 +315,11 @@ enum mlx5_ib_mtt_access_flags { >> >> struct mlx5_ib_mr { >> struct ib_mr ibmr; >> + u64 *pl; >> + __be64 *mpl; >> + dma_addr_t pl_map; > Nit: could you choose more descriptive names for these fields? It can be > difficult to understand what they mean just based on the acronym. OK - I'll name it better in v1. > >> + int ndescs; > This one isn't used in this patch, right? Not in this patch - I can move it. Thanks! -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH WIP 22/43] mlx4: Allocate a private page list in ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (20 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 23/43] ocrdma: " Sagi Grimberg ` (21 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx4/mlx4_ib.h | 5 ++++ drivers/infiniband/hw/mlx4/mr.c | 52 +++++++++++++++++++++++++++++++++--- 2 files changed, 54 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index 9220faf..a9a4a7f 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -120,6 +120,11 @@ struct mlx4_ib_mr { struct ib_mr ibmr; struct mlx4_mr mmr; struct ib_umem *umem; + u64 *pl; + __be64 *mpl; + dma_addr_t pl_map; + u32 npages; + u32 max_pages; }; struct mlx4_ib_mw { diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index 121ee7f..01e16bc 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -271,11 +271,50 @@ release_mpt_entry: return err; } +static int +mlx4_alloc_page_list(struct ib_device *device, + struct mlx4_ib_mr *mr, + int max_entries) +{ + int size = max_entries * sizeof (u64); + + mr->pl = kcalloc(max_entries, sizeof(u64), GFP_KERNEL); + if (!mr->pl) + return -ENOMEM; + + mr->mpl = dma_alloc_coherent(device->dma_device, size, + &mr->pl_map, GFP_KERNEL); + if (!mr->mpl) + goto err; + + return 0; +err: + kfree(mr->pl); + + return -ENOMEM; +} + +static void +mlx4_free_page_list(struct mlx4_ib_mr *mr) +{ + struct ib_device *device = 
mr->ibmr.device; + int size = mr->max_pages * sizeof(u64); + + kfree(mr->pl); + if (mr->mpl) + dma_free_coherent(device->dma_device, size, + mr->mpl, mr->pl_map); + mr->pl = NULL; + mr->mpl = NULL; +} + int mlx4_ib_dereg_mr(struct ib_mr *ibmr) { struct mlx4_ib_mr *mr = to_mmr(ibmr); int ret; + mlx4_free_page_list(mr); + ret = mlx4_mr_free(to_mdev(ibmr->device)->dev, &mr->mmr); if (ret) return ret; @@ -371,18 +410,25 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd, if (err) goto err_free; + err = mlx4_alloc_page_list(pd->device, mr, max_entries); + if (err) + goto err_free_mr; + + mr->max_pages = max_entries; + err = mlx4_mr_enable(dev->dev, &mr->mmr); if (err) - goto err_mr; + goto err_free_pl; mr->ibmr.rkey = mr->ibmr.lkey = mr->mmr.key; mr->umem = NULL; return &mr->ibmr; -err_mr: +err_free_pl: + mlx4_free_page_list(mr); +err_free_mr: (void) mlx4_mr_free(dev->dev, &mr->mmr); - err_free: kfree(mr); return ERR_PTR(err); -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 23/43] ocrdma: Allocate a private page list in ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (21 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 22/43] mlx4: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 24/43] cxgb3: Allocate a provate " Sagi Grimberg ` (20 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/ocrdma/ocrdma.h | 2 ++ drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 9 +++++++++ 2 files changed, 11 insertions(+) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma.h b/drivers/infiniband/hw/ocrdma/ocrdma.h index b396344..37deea2 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma.h +++ b/drivers/infiniband/hw/ocrdma/ocrdma.h @@ -178,6 +178,8 @@ struct ocrdma_mr { struct ib_mr ibmr; struct ib_umem *umem; struct ocrdma_hw_mr hwmr; + u64 *pl; + u32 npages; }; struct ocrdma_stats { diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c index fb97db1..a764cb9 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c @@ -957,6 +957,7 @@ int ocrdma_dereg_mr(struct ib_mr *ib_mr) (void) ocrdma_mbx_dealloc_lkey(dev, mr->hwmr.fr_mr, mr->hwmr.lkey); + kfree(mr->pl); ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr); /* it could be user registered memory. 
*/ @@ -3003,6 +3004,12 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd, if (!mr) return ERR_PTR(-ENOMEM); + mr->pl = kcalloc(max_entries, sizeof(u64), GFP_KERNEL); + if (!mr->pl) { + status = -ENOMEM; + goto pl_err; + } + status = ocrdma_get_pbl_info(dev, mr, max_entries); if (status) goto pbl_err; @@ -3026,6 +3033,8 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *ibpd, mbx_err: ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr); pbl_err: + kfree(mr->pl); +pl_err: kfree(mr); return ERR_PTR(-ENOMEM); } -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 24/43] cxgb3: Allocate a provate page list in ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (22 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 23/43] ocrdma: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 25/43] cxgb4: Allocate a private " Sagi Grimberg ` (19 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 9 +++++++++ drivers/infiniband/hw/cxgb3/iwch_provider.h | 2 ++ 2 files changed, 11 insertions(+) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index af55b79..c9368e6 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -463,6 +463,7 @@ static int iwch_dereg_mr(struct ib_mr *ib_mr) return -EINVAL; mhp = to_iwch_mr(ib_mr); + kfree(mhp->pl); rhp = mhp->rhp; mmid = mhp->attr.stag >> 8; cxio_dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size, @@ -817,6 +818,12 @@ static struct ib_mr *iwch_alloc_mr(struct ib_pd *pd, if (!mhp) goto err; + mhp->pl = kcalloc(max_entries, sizeof(u64), GFP_KERNEL); + if (!mhp->pl) { + ret = -ENOMEM; + goto pl_err; + } + mhp->rhp = rhp; ret = iwch_alloc_pbl(mhp, max_entries); if (ret) @@ -843,6 +850,8 @@ err3: err2: iwch_free_pbl(mhp); err1: + kfree(mhp->pl); +pl_err: kfree(mhp); err: return ERR_PTR(ret); diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h index 87c14b0..8e16da9 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.h +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h @@ -77,6 +77,8 @@ struct iwch_mr { struct iwch_dev *rhp; u64 kva; struct tpt_attributes attr; + u64 *pl; + u32 
npages; }; typedef struct iwch_mw iwch_mw_handle; -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 25/43] cxgb4: Allocate a private page list in ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (23 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 24/43] cxgb3: Allocate a provate " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 26/43] qib: " Sagi Grimberg ` (18 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 4 ++++ drivers/infiniband/hw/cxgb4/mem.c | 15 +++++++++++++++ 2 files changed, 19 insertions(+) diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h index 886be9c..e529ace 100644 --- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h @@ -386,6 +386,10 @@ struct c4iw_mr { struct c4iw_dev *rhp; u64 kva; struct tpt_attributes attr; + u64 *mpl; + dma_addr_t mpl_addr; + u32 max_mpl_len; + u32 mpl_len; }; static inline struct c4iw_mr *to_c4iw_mr(struct ib_mr *ibmr) diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c index 5ecf4aa..91aedce 100644 --- a/drivers/infiniband/hw/cxgb4/mem.c +++ b/drivers/infiniband/hw/cxgb4/mem.c @@ -864,6 +864,7 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd, u32 mmid; u32 stag = 0; int ret = 0; + int length = roundup(max_entries * sizeof(u64), 32); if (mr_type != IB_MR_TYPE_FAST_REG || flags) return ERR_PTR(-EINVAL); @@ -876,6 +877,14 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd, goto err; } + mhp->mpl = dma_alloc_coherent(&rhp->rdev.lldi.pdev->dev, + length, &mhp->mpl_addr, GFP_KERNEL); + if (!mhp->mpl) { + ret = -ENOMEM; + goto err_mpl; + } + mhp->max_mpl_len = length; + mhp->rhp = rhp; ret = alloc_pbl(mhp, max_entries); if (ret) @@ -905,6 +914,9 @@ err2: 
c4iw_pblpool_free(&mhp->rhp->rdev, mhp->attr.pbl_addr, mhp->attr.pbl_size << 3); err1: + dma_free_coherent(&mhp->rhp->rdev.lldi.pdev->dev, + mhp->max_mpl_len, mhp->mpl, mhp->mpl_addr); +err_mpl: kfree(mhp); err: return ERR_PTR(ret); @@ -970,6 +982,9 @@ int c4iw_dereg_mr(struct ib_mr *ib_mr) rhp = mhp->rhp; mmid = mhp->attr.stag >> 8; remove_handle(rhp, &rhp->mmidr, mmid); + if (mhp->mpl) + dma_free_coherent(&mhp->rhp->rdev.lldi.pdev->dev, + mhp->max_mpl_len, mhp->mpl, mhp->mpl_addr); dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size, mhp->attr.pbl_addr); if (mhp->attr.pbl_size) -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 26/43] qib: Allocate a private page list in ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (24 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 25/43] cxgb4: Allocate a private " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 27/43] nes: " Sagi Grimberg ` (17 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/qib/qib_mr.c | 9 +++++++++ drivers/infiniband/hw/qib/qib_verbs.h | 2 ++ 2 files changed, 11 insertions(+) diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c index 2a4afea..a58a347 100644 --- a/drivers/infiniband/hw/qib/qib_mr.c +++ b/drivers/infiniband/hw/qib/qib_mr.c @@ -303,6 +303,7 @@ int qib_dereg_mr(struct ib_mr *ibmr) int ret = 0; unsigned long timeout; + kfree(mr->pl); qib_free_lkey(&mr->mr); qib_put_mr(&mr->mr); /* will set completion if last */ @@ -341,7 +342,15 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd, if (IS_ERR(mr)) return (struct ib_mr *)mr; + mr->pl = kcalloc(max_entries, sizeof(u64), GFP_KERNEL); + if (!mr->pl) + goto err; + return &mr->ibmr; + +err: + qib_dereg_mr(&mr->ibmr); + return ERR_PTR(-ENOMEM); } struct ib_fast_reg_page_list * diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h index 8fbd995..c8062ae 100644 --- a/drivers/infiniband/hw/qib/qib_verbs.h +++ b/drivers/infiniband/hw/qib/qib_verbs.h @@ -330,6 +330,8 @@ struct qib_mr { struct ib_mr ibmr; struct ib_umem *umem; struct qib_mregion mr; /* must be last */ + u64 *pl; + u32 npages; }; /* -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More 
majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 27/43] nes: Allocate a private page list in ib_alloc_mr [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (25 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 26/43] qib: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 28/43] IB/core: Introduce new fast registration API Sagi Grimberg ` (16 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/nes/nes_verbs.c | 27 +++++++++++++++++++++++++++ drivers/infiniband/hw/nes/nes_verbs.h | 5 +++++ 2 files changed, 32 insertions(+) diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 752e6ea..532496d 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -51,6 +51,7 @@ atomic_t qps_created; atomic_t sw_qps_destroyed; static void nes_unregister_ofa_device(struct nes_ib_device *nesibdev); +static int nes_dereg_mr(struct ib_mr *ib_mr); /** * nes_alloc_mw @@ -443,7 +444,25 @@ static struct ib_mr *nes_alloc_mr(struct ib_pd *ibpd, nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); ibmr = ERR_PTR(-ENOMEM); } + + nesmr->pl = kcalloc(max_entries, sizeof(u64), GFP_KERNEL); + if (!nesmr->pl) + goto err; + + nesmr->mpl = pci_alloc_consistent(nesdev->pcidev, + max_entries * sizeof(u64), + &nesmr->mpl_addr); + if (!nesmr->mpl_addr) + goto err; + + nesmr->max_pages = max_entries; + return ibmr; + +err: + nes_dereg_mr(ibmr); + + return ERR_PTR(-ENOMEM); } /* @@ -2681,6 +2700,14 @@ static int nes_dereg_mr(struct ib_mr *ib_mr) u16 major_code; u16 minor_code; + + kfree(nesmr->pl); + if (nesmr->mpl) + pci_free_consistent(nesdev->pcidev, + nesmr->max_pages * sizeof(u64), + nesmr->mpl, + nesmr->mpl_addr); + if 
(nesmr->region) { ib_umem_release(nesmr->region); } diff --git a/drivers/infiniband/hw/nes/nes_verbs.h b/drivers/infiniband/hw/nes/nes_verbs.h index 309b31c..e99aa69 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.h +++ b/drivers/infiniband/hw/nes/nes_verbs.h @@ -79,6 +79,11 @@ struct nes_mr { u16 pbls_used; u8 mode; u8 pbl_4k; + u64 *pl; + u64 *mpl; + dma_addr_t mpl_addr; + u32 max_pages; + u32 npages; }; struct nes_hw_pb { -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (26 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 27/43] nes: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg [not found] ` <1437548143-24893-29-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 6:55 ` [PATCH WIP 29/43] mlx5: Support the new memory " Sagi Grimberg ` (15 subsequent siblings) 43 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer The new fast registration API receives a struct scatterlist and converts it to a page list under the verbs API. The user is provided with a new verb, ib_map_mr_sg, and a helper to set the send work request structure. The drivers are handed a generic helper that converts a scatterlist into a vector of pages. Given that some drivers have a shadow mapped page list, I expect that drivers might use their own routines to avoid the extra copies. The new registration API is added alongside fast_reg for now, but once all drivers and ULPs are ported, we can drop the old registration API.
Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/core/verbs.c | 123 ++++++++++++++++++++++++++++++++++++++++ include/rdma/ib_verbs.h | 37 ++++++++++++ 2 files changed, 160 insertions(+) diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index beed431..9875163 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1481,3 +1481,126 @@ int ib_check_mr_status(struct ib_mr *mr, u32 check_mask, mr->device->check_mr_status(mr, check_mask, mr_status) : -ENOSYS; } EXPORT_SYMBOL(ib_check_mr_status); + + +/** + * ib_map_mr_sg() - Populates MR with a dma mapped SG list + * @mr: memory region + * @sg: dma mapped scatterlist + * @sg_nents: number of entries in sg + * @access: access permissions + * + * After this completes successfully, the memory region is ready + * for fast registration. + */ +int ib_map_mr_sg(struct ib_mr *mr, + struct scatterlist *sg, + unsigned short sg_nents, + unsigned int access) +{ + int rc; + + if (!mr->device->map_mr_sg) + return -ENOSYS; + + rc = mr->device->map_mr_sg(mr, sg, sg_nents); + if (!rc) + mr->access = access; + + return rc; +} +EXPORT_SYMBOL(ib_map_mr_sg); + +/** + * ib_sg_to_pages() - Convert a sg list to a page vector + * @dev: ib device + * @sgl: dma mapped scatterlist + * @sg_nents: number of entries in sg + * @max_pages: maximum pages allowed + * @pages: output page vector + * @npages: output number of mapped pages + * @length: output total byte length + * @offset: output first byte offset + * + * Core service helper for drivers to convert a scatter + * list to a page vector. The assumption is that the + * sg must meet the following conditions: + * - Only the first sg is allowed to have an offset + * - All the elements are of the same size - PAGE_SIZE + * - The last element is allowed to have length less than + * PAGE_SIZE + * + * If any of those conditions is not met, the routine will + * fail with EINVAL. 
+ */ +int ib_sg_to_pages(struct scatterlist *sgl, + unsigned short sg_nents, + unsigned short max_pages, + u64 *pages, u32 *npages, + u32 *length, u64 *offset) +{ + struct scatterlist *sg; + u64 last_end_dma_addr = 0, last_page_addr = 0; + unsigned int last_page_off = 0; + int i, j = 0; + + /* TODO: We can do better with huge pages */ + + *offset = sg_dma_address(&sgl[0]); + *length = 0; + + for_each_sg(sgl, sg, sg_nents, i) { + u64 dma_addr = sg_dma_address(sg); + unsigned int dma_len = sg_dma_len(sg); + u64 end_dma_addr = dma_addr + dma_len; + u64 page_addr = dma_addr & PAGE_MASK; + + *length += dma_len; + + /* Fail we ran out of pages */ + if (unlikely(j > max_pages)) + return -EINVAL; + + if (i && sg->offset) { + if (unlikely((last_end_dma_addr) != dma_addr)) { + /* gap - fail */ + goto err; + } + if (last_page_off + dma_len < PAGE_SIZE) { + /* chunk this fragment with the last */ + last_end_dma_addr += dma_len; + last_page_off += dma_len; + continue; + } else { + /* map starting from the next page */ + page_addr = last_page_addr + PAGE_SIZE; + dma_len -= PAGE_SIZE - last_page_off; + } + } + + do { + pages[j++] = page_addr; + page_addr += PAGE_SIZE; + } while (page_addr < end_dma_addr); + + last_end_dma_addr = end_dma_addr; + last_page_addr = end_dma_addr & PAGE_MASK; + last_page_off = end_dma_addr & ~PAGE_MASK; + } + + *npages = j; + + return 0; +err: + pr_err("RDMA alignment violation\n"); + for_each_sg(sgl, sg, sg_nents, i) { + u64 dma_addr = sg_dma_address(sg); + unsigned int dma_len = sg_dma_len(sg); + + pr_err("sg[%d]: offset=0x%x, dma_addr=0x%llx, dma_len=0x%x\n", + i, sg->offset, dma_addr, dma_len); + } + + return -EINVAL; +} +EXPORT_SYMBOL(ib_sg_to_pages); diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 7a93e2d..d543fee 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1013,6 +1013,7 @@ enum ib_wr_opcode { IB_WR_RDMA_READ_WITH_INV, IB_WR_LOCAL_INV, IB_WR_FAST_REG_MR, + IB_WR_FASTREG_MR, 
IB_WR_MASKED_ATOMIC_CMP_AND_SWP, IB_WR_MASKED_ATOMIC_FETCH_AND_ADD, IB_WR_BIND_MW, @@ -1117,6 +1118,10 @@ struct ib_send_wr { u32 rkey; } fast_reg; struct { + struct ib_mr *mr; + u32 key; + } fastreg; + struct { struct ib_mw *mw; /* The new rkey for the memory window. */ u32 rkey; @@ -1316,6 +1321,9 @@ struct ib_mr { struct ib_uobject *uobject; u32 lkey; u32 rkey; + int access; + u64 iova; + u32 length; atomic_t usecnt; /* count number of MWs */ }; @@ -1661,6 +1669,9 @@ struct ib_device { enum ib_mr_type mr_type, u32 max_entries, u32 flags); + int (*map_mr_sg)(struct ib_mr *mr, + struct scatterlist *sg, + unsigned short sg_nents); struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device, int page_list_len); void (*free_fast_reg_page_list)(struct ib_fast_reg_page_list *page_list); @@ -2991,4 +3002,30 @@ static inline int ib_check_mr_access(int flags) int ib_check_mr_status(struct ib_mr *mr, u32 check_mask, struct ib_mr_status *mr_status); +int ib_map_mr_sg(struct ib_mr *mr, + struct scatterlist *sg, + unsigned short sg_nents, + unsigned int access); + +int ib_sg_to_pages(struct scatterlist *sgl, + unsigned short sg_nents, + unsigned short max_pages, + u64 *pages, u32 *npages, + u32 *length, u64 *offset); + +static inline void +ib_set_fastreg_wr(struct ib_mr *mr, + u32 key, + uintptr_t wr_id, + bool signaled, + struct ib_send_wr *wr) +{ + wr->opcode = IB_WR_FASTREG_MR; + wr->wr_id = wr_id; + wr->send_flags = signaled ? IB_SEND_SIGNALED : 0; + wr->num_sge = 0; + wr->wr.fastreg.mr = mr; + wr->wr.fastreg.key = key; +} + #endif /* IB_VERBS_H */ -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <1437548143-24893-29-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-07-22 16:50 ` Christoph Hellwig [not found] ` <20150722165012.GC6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 2015-07-22 18:02 ` Jason Gunthorpe 2015-07-28 11:20 ` Haggai Eran 2 siblings, 1 reply; 142+ messages in thread From: Christoph Hellwig @ 2015-07-22 16:50 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer > +/** > + * ib_map_mr_sg() - Populates MR with a dma mapped SG list > + * @mr: memory region > + * @sg: dma mapped scatterlist > + * @sg_nents: number of entries in sg > + * @access: access permissions I know moving the access flags here was my idea originally, but I seem convinced by your argument that it might fit in better with the posting helper. Or did someone else come up with a better argument that mine for moving it here? > +int ib_map_mr_sg(struct ib_mr *mr, > + struct scatterlist *sg, > + unsigned short sg_nents, > + unsigned int access) > +{ > + int rc; > + > + if (!mr->device->map_mr_sg) > + return -ENOSYS; > + > + rc = mr->device->map_mr_sg(mr, sg, sg_nents); Do we really need a driver callout here? It seems like we should just do the map here, and then either have a flag for the mlx5 indirect mapping, or if you want to keep the abstraction add the method at that point but make it optional, so that all the other drivers don't need the boilerplate code. Also it seems like this returns 0/-error. How do callers like SRP see that it only did a partial mapping and it needs another MR? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150722165012.GC6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> @ 2015-07-22 16:56 ` Sagi Grimberg 2015-07-22 17:44 ` Jason Gunthorpe 1 sibling, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 16:56 UTC (permalink / raw) To: Christoph Hellwig, Sagi Grimberg Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/22/2015 7:50 PM, Christoph Hellwig wrote: >> +/** >> + * ib_map_mr_sg() - Populates MR with a dma mapped SG list >> + * @mr: memory region >> + * @sg: dma mapped scatterlist >> + * @sg_nents: number of entries in sg >> + * @access: access permissions > > I know moving the access flags here was my idea originally, but I seem > convinced by your argument that it might fit in better with the posting > helper. Or did someone else come up with a better argument that mine > for moving it here? Not really. I was and still pretty indifferent about it... > >> +int ib_map_mr_sg(struct ib_mr *mr, >> + struct scatterlist *sg, >> + unsigned short sg_nents, >> + unsigned int access) >> +{ >> + int rc; >> + >> + if (!mr->device->map_mr_sg) >> + return -ENOSYS; >> + >> + rc = mr->device->map_mr_sg(mr, sg, sg_nents); > > Do we really need a driver callout here? It seems like we should > just do the map here, and then either have a flag for the mlx5 indirect > mapping, or if you want to keep the abstraction add the method at that > point but make it optional, so that all the other drivers don't need the > boilerplate code. I commented on this bit in another reply. I think that several drivers will want to use their own mappings. But I can change that if it's not the case... > > Also it seems like this returns 0/-error. How do callers like SRP > see that it only did a partial mapping and it needs another MR? 
Umm, I think SRP would need to iterate over the sg list and pass partial SGs to the mapping (I can add a break; statement once we reach sg_nents). It's not perfect, but the idea was not to do backflips here. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150722165012.GC6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 2015-07-22 16:56 ` Sagi Grimberg @ 2015-07-22 17:44 ` Jason Gunthorpe [not found] ` <20150722174401.GG26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-22 17:44 UTC (permalink / raw) To: Christoph Hellwig Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 09:50:12AM -0700, Christoph Hellwig wrote: > > +/** > > + * ib_map_mr_sg() - Populates MR with a dma mapped SG list > > + * @mr: memory region > > + * @sg: dma mapped scatterlist > > + * @sg_nents: number of entries in sg > > + * @access: access permissions > > I know moving the access flags here was my idea originally, but I seem > convinced by your argument that it might fit in better with the posting > helper. Or did someone else come up with a better argument that mine > for moving it here? I was hoping we'd move the DMA flush and translate into here and make it mandatory. Is there any reason not to do that? > > +int ib_map_mr_sg(struct ib_mr *mr, > > + struct scatterlist *sg, > > + unsigned short sg_nents, > > + unsigned int access) > > +{ > > + int rc; > > + > > + if (!mr->device->map_mr_sg) > > + return -ENOSYS; > > + > > + rc = mr->device->map_mr_sg(mr, sg, sg_nents); > > Do we really need a driver callout here? It seems like we should The call out makes sense to me.. The driver will convert the scatter list directly into whatever HW representation it needs and prepare everything for posting. Every driver has a different HW format, so it must be a callout. > Also it seems like this returns 0/-error. How do callers like SRP > see that it only did a partial mapping and it needs another MR? 
I would think it is an error to pass in more sg_nents than the MR was created with, so SRP should never get a partial mapping as it should never ask for more than max_entries. (? Sagi, did I get the intent of this right?) Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150722174401.GG26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-23 9:19 ` Christoph Hellwig [not found] ` <20150723091955.GA32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 2015-07-23 10:15 ` Sagi Grimberg 1 sibling, 1 reply; 142+ messages in thread From: Christoph Hellwig @ 2015-07-23 9:19 UTC (permalink / raw) To: Jason Gunthorpe Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 11:44:01AM -0600, Jason Gunthorpe wrote: > I was hoping we'd move the DMA flush and translate into here and make > it mandatory. Is there any reason not to do that? That would be a reason for passing in a direction, but it would also open up the question of what form we pass that access flag in: the old-school RDMA local/remote read/write flags, or an enum dma_data_direction and either a bool or separate functions for lkey/rkey. Although I wonder if we really need to differentiate between rkey and lkey in this ib_map_mr_sg function, or if we should do it when allocating the MR, i.e. in ib_alloc_mr. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150723091955.GA32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> @ 2015-07-23 16:03 ` Jason Gunthorpe 0 siblings, 0 replies; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-23 16:03 UTC (permalink / raw) To: Christoph Hellwig Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Thu, Jul 23, 2015 at 02:19:55AM -0700, Christoph Hellwig wrote: > Although I wonder if we really need to differentiate between rkey and > leky in this ib_map_mr_sg function, or if we should do it when > allocating the mr, i.e. in ib_alloc_mr. The allocation is agnostic to the usage, the map is what solidifies things into a certain use, effectively based on the access flags.. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150722174401.GG26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2015-07-23 9:19 ` Christoph Hellwig @ 2015-07-23 10:15 ` Sagi Grimberg [not found] ` <55B0BEB4.9080702-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-23 10:15 UTC (permalink / raw) To: Jason Gunthorpe, Christoph Hellwig Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/22/2015 8:44 PM, Jason Gunthorpe wrote: > On Wed, Jul 22, 2015 at 09:50:12AM -0700, Christoph Hellwig wrote: >>> +/** >>> + * ib_map_mr_sg() - Populates MR with a dma mapped SG list >>> + * @mr: memory region >>> + * @sg: dma mapped scatterlist >>> + * @sg_nents: number of entries in sg >>> + * @access: access permissions >> >> I know moving the access flags here was my idea originally, but I seem >> convinced by your argument that it might fit in better with the posting >> helper. Or did someone else come up with a better argument that mine >> for moving it here? > > I was hoping we'd move the DMA flush and translate into here and make > it mandatory. Is there any reason not to do that? The reason I didn't added it in was so the ULPs can make sure they meet the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his SG list set partials and iSER to detect gaps (they need to dma map for that). > >>> +int ib_map_mr_sg(struct ib_mr *mr, >>> + struct scatterlist *sg, >>> + unsigned short sg_nents, >>> + unsigned int access) >>> +{ >>> + int rc; >>> + >>> + if (!mr->device->map_mr_sg) >>> + return -ENOSYS; >>> + >>> + rc = mr->device->map_mr_sg(mr, sg, sg_nents); >> >> Do we really need a driver callout here? It seems like we should > > The call out makes sense to me.. > > The driver will convert the scatter list directly into whatever HW > representation it needs and prepare everything for posting. 
Every > driver has a different HW format, so it must be a callout. > >> Also it seems like this returns 0/-error. How do callers like SRP >> see that it only did a partial mapping and it needs another MR? > > I would think it is an error to pass in more sg_nents than the MR was > created with, so SRP should never get a partial mapping as it should > never ask for more than max_entries. > > (? Sagi, did I get the intent of this right?) Error is returned when: - sg_nents > max_entries - sg has gaps -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <55B0BEB4.9080702-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-23 17:55 ` Jason Gunthorpe [not found] ` <20150723175535.GE25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2015-07-23 18:42 ` Jason Gunthorpe 1 sibling, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-23 17:55 UTC (permalink / raw) To: Sagi Grimberg Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote: > >I was hoping we'd move the DMA flush and translate into here and make > >it mandatory. Is there any reason not to do that? > > The reason I didn't added it in was so the ULPs can make sure they meet > the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his > SG list set partials and iSER to detect gaps (they need to dma map > for that). The ULP can always get the sg list's virtual address to check for gaps. Page aligned gaps are always OK. BTW, the logic in ib_sg_to_pages should be checking that directly, as coded, it won't work with swiotlb: // Only the first SG entry can start unaligned if (i && page_addr != dma_addr) return EINVAL; // Only the last SG entry can end unaligned if ((page_addr + dma_len) & PAGE_MASK != end_dma_addr) if (!is_last) return EINVAL; Don't use sg->offset after dma mapping. The biggest problem with checking the virtual address is swiotlb. However, if swiotlb is used this API is basically broken as swiotlb downgrades everything to a 2k alignment, which means we only ever get 1 s/g entry. To efficiently support swiotlb we'd need to see the driver be able to work with a page size of IO_TLB_SEGSIZE (2k) so it can handle the de-alignment that happens during bouncing. My biggest problem with pushing the dma address up to the ULP is basically that: The ULP has no idea what the driver can handle, maybe the driver can handle the 2k pages. 
So, that leaves a flow where the ULP does a basic sanity check on the virtual side, then asks the IB core to map it. The mapping could still fail because of swiotlb. If the mapping fails, then the ULP has to bounce buffer, or MR split, or totally fail. For bounce buffer, all solutions have a DMA map/unmap cost, so it doesn't matter if ib_map_mr_sg does that internally. For MR fragment, the DMA mapping is still usable. Maybe we do need a slightly different core API to help MR fragmentation? Sounds like NFS uses this too? num_mrs = ib_mr_fragment_sg(&scatterlist); while (..) ib_map_fragment_sg(mr[i++],&scatterlist,&offset); Perhaps? Maybe that is even better because something like iser could do the parallel: ib_mr_needs_fragment_sg(reference_mr,&scatterlist) Which hides all the various restrictions in driver code. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150723175535.GE25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-26 9:37 ` Sagi Grimberg [not found] ` <55B4AA73.3090803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 2015-08-19 11:56 ` Sagi Grimberg 1 sibling, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-26 9:37 UTC (permalink / raw) To: Jason Gunthorpe Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/23/2015 8:55 PM, Jason Gunthorpe wrote: > On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote: >>> I was hoping we'd move the DMA flush and translate into here and make >>> it mandatory. Is there any reason not to do that? >> >> The reason I didn't added it in was so the ULPs can make sure they meet >> the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his >> SG list set partials and iSER to detect gaps (they need to dma map >> for that). > > The ULP can always get the sg list's virtual address to check for > gaps. Page aligned gaps are always OK. I guess I can pull DMA mapping in there, but we will need an opposite routine ib_umap_mr_sg() since it'll be weird if the ULP will do dma unmap without doing the map... > > BTW, the logic in ib_sg_to_pages should be checking that directly, as > coded, it won't work with swiotlb: > > // Only the first SG entry can start unaligned > if (i && page_addr != dma_addr) > return EINVAL; > // Only the last SG entry can end unaligned > if ((page_addr + dma_len) & PAGE_MASK != end_dma_addr) > if (!is_last) > return EINVAL; > > Don't use sg->offset after dma mapping. > > The biggest problem with checking the virtual address is > swiotlb. However, if swiotlb is used this API is basically broken as > swiotlb downgrades everything to a 2k alignment, which means we only > ever get 1 s/g entry. Can you explain what do you mean by "downgrades everything to a 2k alignment"? 
If the ULP is responsible for a PAGE_SIZE alignment then how would this get out of alignment with swiotlb? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <55B4AA73.3090803-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-27 17:04 ` Jason Gunthorpe [not found] ` <20150727170459.GA18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-27 17:04 UTC (permalink / raw) To: Sagi Grimberg Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Sun, Jul 26, 2015 at 12:37:55PM +0300, Sagi Grimberg wrote: > I guess I can pull DMA mapping in there, but we will need an opposite > routine ib_umap_mr_sg() since it'll be weird if the ULP will do dma > unmap without doing the map... Yes, and ideally it would help ULPs to order these operations properly. eg we shouldn't be abusing the DMA API and unmapping before invalidate completes by default. That breaks obscure stuff in various ways... > >The biggest problem with checking the virtual address is > >swiotlb. However, if swiotlb is used this API is basically broken as > >swiotlb downgrades everything to a 2k alignment, which means we only > >ever get 1 s/g entry. > > Can you explain what do you mean by "downgrades everything to a 2k > alignment"? If the ULP is responsible for a PAGE_SIZE alignment than > how would this get out of alignment with swiotlb? swiotlb copies all DMA maps to a shared buffer below 4G so it can be used with 32 bit devices. The shared buffer is managed in a way that copies each s/g element to a continuous 2k aligned subsection of the buffer. Basically, swiotlb realigns everything that passes through it. The DMA API allows this, so ultimately, code has to check the dma physical address when concerned about alignment.. But we should not expect this to commonly fail. So, something like.. if (!ib_does_sgl_fit_in_mr(mr,sg)) .. bounce buffer .. if (!ib_map_mr_sg(mr,sg)) // does dma mapping and checks it .. bounce buffer .. .. post .. .. 
send invalidate .. .. catch invalidate completion ... ib_unmap_mr(mr); // does dma unmap ? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150727170459.GA18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-30 7:13 ` Sagi Grimberg [not found] ` <55B9CE85.40007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-30 7:13 UTC (permalink / raw) To: Jason Gunthorpe Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer >> Can you explain what do you mean by "downgrades everything to a 2k >> alignment"? If the ULP is responsible for a PAGE_SIZE alignment than >> how would this get out of alignment with swiotlb? > > swiotlb copies all DMA maps to a shared buffer below 4G so it can be > used with 32 bit devices. > > The shared buffer is managed in a way that copies each s/g element to > a continuous 2k aligned subsection of the buffer. > Thanks for the explanation. > Basically, swiotlb realigns everything that passes through it. So this won't ever happen if the ULP DMA maps the SG and checks for gaps, right? Also, is it interesting to support swiotlb even if we don't have any devices that require it (and should we expect one to ever exist)? > > The DMA API allows this, so ultimately, code has to check the dma > physical address when concerned about alignment.. But we should not > expect this to commonly fail. > > So, something like.. > > if (!ib_does_sgl_fit_in_mr(mr,sg)) > .. bounce buffer .. I don't understand the need for this if we do the same thing when the actual mapping fails... > > if (!ib_map_mr_sg(mr,sg)) // does dma mapping and checks it > .. bounce buffer .. Each ULP would want to do something different: iser will bounce, srp would need to use multiple MRs, and nfs will split the request.
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <55B9CE85.40007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-30 16:36 ` Jason Gunthorpe [not found] ` <20150730163631.GB16659-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-30 16:36 UTC (permalink / raw) To: Sagi Grimberg Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Thu, Jul 30, 2015 at 10:13:09AM +0300, Sagi Grimberg wrote: > >Basically, swiotlb realigns everything that passes through it. > > So this won't ever happen if the ULP will DMA map the SG and check > for gaps right? Once mapped the physical address isn't going to change - but at some point we must check the physical address directly. > Also, is it interesting to support swiotlb even if we don't have > any devices that require it (and should we expect one to ever exist)? swiotlb is an obvious example, and totally uninteresting to support, but we must correctly use the DMA API. > >The DMA API allows this, so ultimately, code has to check the dma > >physical address when concerned about alignment.. But we should not > >expect this to commonly fail. > > > >So, something like.. > > > > if (!ib_does_sgl_fit_in_mr(mr,sg)) > > .. bounce buffer .. > > I don't understand the need for this is we do the same thing > if the actual mapping fails... Just performance. DMA mapping is potentially very expensive, the common case to detect will be a sg that is virtually unaligned. This virtual scan could be bundled insde the map, but if a ULP knows it is page aligned already then that is just creating overhead.. I'm ambivalent.. 
Jason
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150730163631.GB16659-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-30 16:39 ` Christoph Hellwig 0 siblings, 0 replies; 142+ messages in thread From: Christoph Hellwig @ 2015-07-30 16:39 UTC (permalink / raw) To: Jason Gunthorpe Cc: Sagi Grimberg, Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Thu, Jul 30, 2015 at 10:36:31AM -0600, Jason Gunthorpe wrote: > > Also, is it interesting to support swiotlb even if we don't have > > any devices that require it (and should we expect one to ever exist)? > > swiotlb is an obvious example, and totally uninteresting to support, > but we must correctly use the DMA API. Do we have a choice? It seems like various setups with DMA restrictions rely on it, including many Xen PV guests. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150723175535.GE25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2015-07-26 9:37 ` Sagi Grimberg @ 2015-08-19 11:56 ` Sagi Grimberg [not found] ` <55D46EE8.4060701-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-08-19 11:56 UTC (permalink / raw) To: Jason Gunthorpe Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/23/2015 8:55 PM, Jason Gunthorpe wrote: > On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote: >>> I was hoping we'd move the DMA flush and translate into here and make >>> it mandatory. Is there any reason not to do that? >> >> The reason I didn't added it in was so the ULPs can make sure they meet >> the restrictions of ib_map_mr_sg(). Allow SRP to iterate on his >> SG list set partials and iSER to detect gaps (they need to dma map >> for that). > > The ULP can always get the sg list's virtual address to check for > gaps. Page aligned gaps are always OK. So I had a go with moving the DMA mapping into ib_map_mr_sg() and it turns out to map somewhat poorly if the ULP _may_ register memory or just send sg_lists (like storage targets over IB/iWARP). So the ULP will sometimes use the DMA mapping and sometimes it won't... feels kinda off to me... it's much saner to do: 1. dma_map_sg 2. register / send-sg-list 3. unregister (if needed) 4. dma_unmap_sg then: 1. if register - call ib_map_mr_sg (which calls dma_map_sg) else do dma_map_sg 2. if registered - call ib_dma_unmap_sg (which calls dma_unmap_sg) else do dma_unmap_sg this kinda forces the ULP to completely separate these code paths with very little sharing. Also, at the moment, when ULPs are doing either FRWR or FMRs it's a pain to get a non-intrusive conversion.
I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave it to the ULP like it does today (at least in the first stage...) Thoughts? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <55D46EE8.4060701-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-08-19 12:52 ` Christoph Hellwig
  [not found] ` <20150819125253.GB24746-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2015-08-19 17:37 ` Jason Gunthorpe
  1 sibling, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-08-19 12:52 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Jason Gunthorpe, Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Aug 19, 2015 at 02:56:24PM +0300, Sagi Grimberg wrote:
> So I had a go at moving the DMA mapping into ib_map_mr_sg() and
> it turns out to map somewhat poorly if the ULP _may_ register memory
> or just send sg_lists (like storage targets over IB/iWARP). So the ULP
> will sometimes use the DMA mapping and sometimes it won't... feels
> kinda off to me...

Yes, it's odd.

> it's much saner to do:
> 1. dma_map_sg
> 2. register / send-sg-list
> 3. unregister (if needed)
> 4. dma_unmap_sg
>
> than:
> 1. if register - call ib_map_mr_sg (which calls dma_map_sg)
>    else do dma_map_sg
> 2. if registered - call ib_dma_unmap_sg (which calls dma_unmap_sg)
>    else do dma_unmap_sg
>
> This kinda forces the ULP to completely separate these code paths,
> with very little sharing.
>
> Also, at the moment, when ULPs are doing either FRWR or FMRs
> it's a pain to get a non-intrusive conversion.
>
> I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
> it to the ULP as it does today (at least in the first stage...)

Keep it out for now. I think we need to move the dma mapping into the
RDMA core rather sooner than later, but that must also include
ib_post_send/recv, so it's better done separately. After having a look
at the mess some drivers (ipath, qib, hfi & ehca) cause with their abuse
of dma_map_ops I've got an even stronger opinion on the whole subject
now. However, I think we'll get more things done if we split them into
smaller steps.
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <20150819125253.GB24746-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-08-19 16:09 ` Sagi Grimberg
  [not found] ` <55D4AA2E.7090204-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-08-19 16:09 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jason Gunthorpe, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

> Keep it out for now.

Ok. I was also thinking of moving the access flags to the work request
again. It doesn't make much sense there unless I go with what Jason
suggested with ib_map_mr_[lkey|rkey] to protect against remote access
for lkeys in IB, which, to me, sounds redundant at this point given
that ULPs will set the access according to iWARP anyway.

I'd prefer to get this right with a different helper like Steve
suggested:

int rdma_access_flags(int mr_roles);

This way we don't need to protect against it.
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <55D4AA2E.7090204-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-08-19 16:58 ` Christoph Hellwig
  0 siblings, 0 replies; 142+ messages in thread
From: Christoph Hellwig @ 2015-08-19 16:58 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Jason Gunthorpe, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer, Chuck Lever, Wengang Wang

On Wed, Aug 19, 2015 at 07:09:18PM +0300, Sagi Grimberg wrote:
> Ok. I was also thinking of moving the access flags
> to the work request again.

Yes, with the current code I don't think we need it in the MR.

> I'd prefer to get this right with a different helper like Steve
> suggested:
> int rdma_access_flags(int mr_roles);

We can start with that.

In the long run we really want to have two higher-level helpers to
RDMA READ a scatterlist:

 - one for iWARP that uses an FR and RDMA READ WITH INVALIDATE
 - one for IB-like transports that just uses a READ with the local lkey

Right now every ULP that wants to support iWarp needs to duplicate that
code. This leads to some curious situations, like the NFS server
apparently always using FRs for this if available, if my reading of
svc_rdma_accept() is correct, or the weird parallel code paths for IB
vs iWarp in RDS:

hch@brick:~/work/linux/net/rds$ ls ib*
ib.c ib_cm.c ib.h ib_rdma.c ib_recv.c ib_ring.c ib_send.c ib_stats.c ib_sysctl.c
hch@brick:~/work/linux/net/rds$ ls iw*
iw.c iw_cm.c iw.h iw_rdma.c iw_recv.c iw_ring.c iw_send.c iw_stats.c iw_sysctl.c
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <55D46EE8.4060701-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-08-19 12:52 ` Christoph Hellwig
@ 2015-08-19 17:37 ` Jason Gunthorpe
  [not found] ` <20150819173751.GB22646-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-08-19 17:37 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Aug 19, 2015 at 02:56:24PM +0300, Sagi Grimberg wrote:
> On 7/23/2015 8:55 PM, Jason Gunthorpe wrote:
> > On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote:
> >>> I was hoping we'd move the DMA flush and translate into here and make
> >>> it mandatory. Is there any reason not to do that?
> >>
> >> The reason I didn't add it in was so the ULPs can make sure they meet
> >> the restrictions of ib_map_mr_sg(). Allow SRP to iterate on its
> >> SG list and set partials, and iSER to detect gaps (they need to dma map
> >> for that).
> >
> > The ULP can always get the sg list's virtual address to check for
> > gaps. Page aligned gaps are always OK.
>
> So I had a go at moving the DMA mapping into ib_map_mr_sg() and
> it turns out to map somewhat poorly if the ULP _may_ register memory
> or just send sg_lists (like storage targets over IB/iWARP). So the ULP
> will sometimes use the DMA mapping and sometimes it won't... feels
> kinda off to me...

You need to split the rkey and lkey API flows to pull this off - the
rkey side never needs to touch a sg, while the lkey side should always
try and use a sg first. I keep saying this: they have fundamentally
different ULP usages.

> 1. if register - call ib_map_mr_sg (which calls dma_map_sg)
>    else do dma_map_sg
> 2. if registered - call ib_dma_unmap_sg (which calls dma_unmap_sg)
>    else do dma_unmap_sg

From what I've seen in the ULPs the flow control is generally such
that the MR is 'consumed' even if it isn't used by a send.

So lkey usage is simply split into things that absolutely don't need a
MR, and things that maybe do. The maybe side can go ahead and always
consume the MR resource, but optimize the implementation to a SG list
to avoid a performance hit.

Then the whole API becomes symmetric. The ULP says, 'here is a
scatterlist and a lkey MR, make me an ib_sg list' and the core either
packs it as-is into the sg, or it spins up the MR and packs that.

This lets the unmap be symmetric, as the core always dma_unmaps, but
only tears down the MR if it was used.

The cost is the lkey MR slot is always consumed, which should be OK
because SQE flow control bounds the number of concurrent MRs required,
so consuming a SQE but not a MR doesn't provide an advantage.

> Also, at the moment, when ULPs are doing either FRWR or FMRs
> it's a pain to get a non-intrusive conversion.

Without FMR sharing API entry points it is going to be hard to unify
them.. ie the map and alloc API side certainly could be shared..

> I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
> it to the ULP as it does today (at least in the first stage...)

I'm fine with the first stage, but we still really do need to figure
out how to get better code sharing in our API here..

Maybe we can do the rkey side right away until we can figure out how
to harmonize the lkey sg/mr usage?

Jason
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <20150819173751.GB22646-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-08-20 10:05 ` Sagi Grimberg
  [not found] ` <55D5A687.90102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-08-20 10:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

>> 1. if register - call ib_map_mr_sg (which calls dma_map_sg)
>>    else do dma_map_sg
>> 2. if registered - call ib_dma_unmap_sg (which calls dma_unmap_sg)
>>    else do dma_unmap_sg
>
> From what I've seen in the ULPs the flow control is generally such
> that the MR is 'consumed' even if it isn't used by a send.

Not really. If registration is not needed, an MR is not consumed. In
fact, in svcrdma the IB code path never uses those, and the iWARP code
path always uses those for RDMA_READs and not RDMA_WRITEs. Also, isert
uses those only when signature is enabled and registration is required.

> So lkey usage is simply split into things that absolutely don't need a
> MR, and things that maybe do. The maybe side can go ahead and always
> consume the MR resource, but optimize the implementation to a SG list
> to avoid a performance hit.
>
> Then the whole API becomes symmetric. The ULP says, 'here is a
> scatterlist and a lkey MR, make me an ib_sg list' and the core
> either packs it as-is into the sg, or it spins up the MR and packs
> that.

Always consuming an MR resource is an extra lock acquire given these
are always kept in a pool structure.

>> I'm thinking we should keep dma_map_sg out of ib_map_mr_sg, and leave
>> it to the ULP as it does today (at least in the first stage...)
>
> I'm fine with the first stage, but we still really do need to figure
> out how to get better code sharing in our API here..
>
> Maybe we can do the rkey side right away until we can figure out how
> to harmonize the lkey sg/mr usage?

I'm fine with that. I agree we still need to do better.
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <55D5A687.90102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-08-20 19:04 ` Jason Gunthorpe
  [not found] ` <20150820190413.GB29567-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-08-20 19:04 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Aug 20, 2015 at 01:05:59PM +0300, Sagi Grimberg wrote:
> >> 1. if register - call ib_map_mr_sg (which calls dma_map_sg)
> >>    else do dma_map_sg
> >> 2. if registered - call ib_dma_unmap_sg (which calls dma_unmap_sg)
> >>    else do dma_unmap_sg
> >
> > From what I've seen in the ULPs the flow control is generally such
> > that the MR is 'consumed' even if it isn't used by a send.
>
> Not really. If registration is not needed, an MR is not consumed. In
> fact, in svcrdma the IB code path never uses those, and the iWARP code
> path always uses those for RDMA_READs and not RDMA_WRITEs. Also, isert
> uses those only when signature is enabled and registration is required.

The MR is not *used* but it should be 'consumed' - in the sense that
every RPC slot is associated (implicitly) with a MR, so leaving the
unused MR in some kind of pool doesn't really help anything.

Honestly, the MR pool idea doesn't really help anything, it just makes
confusion. What should be pooled is the 'request slot' itself, in the
sense that if a request slot is in the 'ready to go' pool it is
guaranteed to be able to complete *any* request without blocking. That
means the MR/SQE/CQE resources are all ready to go. Any ancillary
memory is ready to use, etc.

The ULP should design its slot with the idea that it doesn't have to
allocate memory, or IB resources, or block, once the slot becomes
'ready to go'. Review the discussion Chuck and I had on SQE flow
control for a sense of what that means, and why the lifetime of the MR
and the SQE and the slot are all convoluted together if RDMA is used
correctly.

Trying to decouple the sub-resources, ie by separately pooling the
MR/SQE/etc, is just unnecessary complexity, IMHO.. NFS client already
had serious bugs in this area.

So, I turn to the idea that every ULP should work as the above, which
means when it gets to working on a 'slot' that implies there is an
actual struct ib_mr resource guaranteed available. This is why I
suggested using the 'struct ib_mr' to guide the SG construction even if
the actual HW MR isn't going to be used. The struct ib_mr is tied to
the slot, so using it has no cost.

-------

But, maybe that is too much of a shortcut, and thinking about it more
and all the other things I've written about.. Lets just directly
address this issue and add something called 'struct ib_op_slot'.

Data transfer would look like this:

   struct ib_send_wr *cur;

   cur = ib_slot_make_send(&req->op_slot, scatter_list);
   cur->next = ib_slot_make_rdma_read(&next_req->op, scatter_list,
                                      rkey, length);
   ib_post_send(qp, cur);

   [.. at CQE time ..]
   if (ib_slot_complete(qp, req->op_slot))
       [.. slot is now 'ready to go' ..]
   else
       [.. otherwise more stuff was posted, have to wait ...]

This forces the ULP to deal with many of the issues. Having a slot
means a guaranteed minimum of available MR/SQE/CQE resources. That
guaranteed minimum avoids the messy API struggle in my prior writings.

.. and maybe the above is even thinking too small, to Christoph's
earlier musings, I wonder if a slot based middle API could hijack the
entire SCQ processing and have a per-slot callback scheme instead.
That kind of intervention is exactly what we'd need to trivially hide
the FMR difference.

... and now this provides enough context to start talking about common
helper APIs for common ULP things, like the rdma_read switch. The slot
has pre-allocated everything needed to handle the variations.

... which suddenly starts to be really viable because the slot
guarantees SQE availability too.

... and we start having the idea of a slot able to do certain tasks,
and codify that with API help at creation:

   struct nfs_rpc_slot {
        struct ib_op_slot slot;
   };

   struct ib_op_slot_attributes attrs;
   ib_init_slot_attrs(&attrs, ib_pd);

   ib_request_action(&attrs, "args describing RDMA read with N SGEs");
   if (ib_request_action("args describing a requirement for signature"))
        signature_supported = true;
   if (ib_request_action("args describing a requirement for non-page-aligned"))
        byte_sgl_supported = true;
   ib_request_action("args describing SEND with N SGEs");
   ib_request_action("args describing N RDMA reads each with N SGEs");

   for (required slot concurrency)
        ib_alloc_slot(&rpc.slot, &attrs);

Then the alloc just grabs everything required. ..mumble mumble.. some
way to flow into the QP/CQ allocation attributes too ..

Essentially, the ULP says 'here is what I want to do with this slot'
and the core code *guarantees* that if the slot is 'ready to go' then
'any single work of any requested type' can be queued without blocking
or memory allocation. Covers SQEs, CQEs, MRs, etc.

ib_request_action is a basic pattern that does various tests and ends
up doing:

   attrs->num_mrs = max(attrs->num_mrs, needed_for_this_action);
   attrs->num_mrs_sge = max(attrs->num_mrs_sge, needed_for_this_action);
   attrs->num_wr_sge = max(attrs->num_wr_sge, needed_for_this_action);
   attrs->num_sqe = max(attrs->num_sqe, needed_for_this_action);
   attrs->num_cqe = max(attrs->num_cqe, needed_for_this_action);

[ie we compute the maximum allocation needed to satisfy the requested
requirement]

Each request could fail, eg if signature is not supported then the
request_action will fail, so we have a more uniform way to talk about
send queue features.

... and the ULP could have a 'heavy' and 'light' slot pool if that
made some kind of sense for its work load.

So, that is a long road, but maybe there are reasonable interim
stages?

Anyhow, conceptually, an idea. Eliminates the hated fmr pool concept,
cleans up bad thinking around queue flow control. Provides at least a
structure to abstract transport differences.

---------

It could look something like this:

   struct ib_op_slot {
        struct ib_mr **mr_list; // null terminated
        void *wr_memory;
        void *sg_memory;
        unsigned int num_sgs;
   };

   struct ib_send_wr *ib_slot_make_send(struct ib_op_slot *slot,
                                        const struct scatter_list *sgl)
   {
        dma_map(sgl);
        if (num_sges(sgl) < slot->num_sgs) {
             // send fits in the sg list
             struct ib_send_wr *wr = slot->wr_memory;
             wr->sg_list = slot->sg_memory;
             .. pack it in ..
             return wr;
        } else {
             // Need to spin up a MR..
             struct {
                  struct ib_send_wr frwr_wr;
                  struct ib_send_wr send_wr;
             } *wrs = slot->wr_memory;
             wrs->frwr_wr.next = &wrs->send_wr;
             ... pack it in ...
             return &wrs->frwr_wr;
        }
        // similar for FMR
   }

.. similar concept for rdma read, etc. .. ib_request_action makes sure
the wr_memory/sg_memory are pre-sized to accommodate the action. Add
optional #ifdef'd debugging to check for bad ULP usage .. function
pointers could be used to provide special optimal versions if
necessary ..

Complex things like signature just vanish from the API. ULP sees
something like:

   if (ib_request_action("args describing a requirement for signature"))
        signature_supported = true;
   wr = ib_slot_make_rdma_write_signature(slot, ....);

Jason
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <20150820190413.GB29567-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-08-21 6:34 ` Christoph Hellwig
  [not found] ` <20150821063458.GA875-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Christoph Hellwig @ 2015-08-21 6:34 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Sagi Grimberg, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Aug 20, 2015 at 01:04:13PM -0600, Jason Gunthorpe wrote:
> Trying to decouple the sub-resources, ie by separately pooling the
> MR/SQE/etc, is just unnecessary complexity, IMHO.. NFS client already
> had serious bugs in this area.
>
> So, I turn to the idea that every ULP should work as the above, which
> means when it gets to working on a 'slot' that implies there is an
> actual struct ib_mr resource guaranteed available. This is why I
> suggested using the 'struct ib_mr' to guide the SG construction even if
> the actual HW MR isn't going to be used. The struct ib_mr is tied to
> the slot, so using it has no cost.

How is this going to work for drivers that might consume multiple
MRs per request like SRP or similar upcoming block drivers? Unless
you want to allocate a potentially large number of MRs for each
request that scheme doesn't work.

> This forces the ULP to deal with many of the issues. Having a slot
> means a guaranteed minimum of available MR/SQE/CQE resources. That
> guaranteed minimum avoids the messy API struggle in my prior writings.
>
> .. and maybe the above is even thinking too small, to Christoph's
> earlier musings, I wonder if a slot based middle API could hijack the
> entire SCQ processing and have a per-slot callback scheme
> instead. That kind of intervention is exactly what we'd need to
> trivially hide the FMR difference.

FYI, I have working early patches to do per-WR completion callbacks,
I'll post them after I get them into a slightly better shape.

As for your grand schemes: I like some of the ideas there, but we
need to get there gradually. I'd much prefer to finish Sagi's simple
scheme, get my completion work in, add abstractions for RDMA READ and
WRITE scatterlist mapping, and build things up slowly.
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <20150821063458.GA875-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2015-08-21 18:08 ` Jason Gunthorpe
  0 siblings, 0 replies; 142+ messages in thread
From: Jason Gunthorpe @ 2015-08-21 18:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Sagi Grimberg, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Aug 20, 2015 at 11:34:58PM -0700, Christoph Hellwig wrote:
> How is this going to work for drivers that might consume multiple
> MRs per request like SRP or similar upcoming block drivers? Unless
> you want to allocate a potentially large number of MRs for each
> request that scheme doesn't work.

There are at least two approaches, and it depends on how flow control
to the driving layer works out. Look at what the ULP does when the
existing MR pool is exhausted:

- Exhaustion is not allowed. In this model every slot must truly
  handle every required action without blocking. The ULP somehow
  wrangles things so pool exhaustion is not possible. NFS client is a
  good example. Where NFS client went wrong is that the MR alone is
  not enough, issuing a request requires SQE/CQE resources; failing to
  track that caused hard-to-find bugs.

- Exhaustion is allowed, and somehow the ULP is able to stop
  processing. In this case you'd just swap MRs for slots in the pool,
  probably having pools of different kinds of slots to optimize
  resource use. Pool draw-down includes SQE/CQE/etc resources as well.
  A multiple-rkey MR case would just draw down the required slots from
  the pool.

I suspect the client side tends to lean toward the first option and
the target side the second - targets can always do back-pressure flow
control by simply halting RQE processing, and it makes a lot of sense
on a target to globally pool slots across all client QPs.

This idea of a slot is just a higher level structure we can hang other
stuff off - like the sg/mr decision, the iwarp rdma read change, sqe
accounting. We don't need to start with everything, but I'm looking at
Sagi's notes on trying to factor the lkey side code paths and thinking
a broader abstraction than a raw MR is needed to solve that.

> FYI, I have working early patches to do per-WR completion callbacks,
> I'll post them after I get them into a slightly better shape.

Interesting..

> As for your grand schemes: I like some of the ideas there, but we
> need to get there gradually. I'd much prefer to finish Sagi's simple
> scheme, get my completion work in, add abstractions for RDMA READ and
> WRITE scatterlist mapping, and build things up slowly.

Yes, absolutely, we have to go slowly - but exploring how we can fit
this together in some other way can help guide some of the smaller
choices. Sagi could drop the lkey side, getting the rkey side in order
would be nice enough.

Something like this is a direction to address the lkey side. Ie we
could 1:1 replace MR with 'slot' and use that to factor the lkey code
paths. Over time slot can grow organically to factor more code.

Slot would be a new object for the core, one that is guaranteed to
last from post->completion, and that seems like exactly the sort of
object a completion callback scheme would benefit from. Guaranteed
memory to hang callback pointers/etc off.

Jason
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <55B0BEB4.9080702-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2015-07-23 17:55 ` Jason Gunthorpe
@ 2015-07-23 18:42 ` Jason Gunthorpe
  [not found] ` <20150723184221.GA30303-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-23 18:42 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Thu, Jul 23, 2015 at 01:15:16PM +0300, Sagi Grimberg wrote:
> On 7/22/2015 8:44 PM, Jason Gunthorpe wrote:
> > On Wed, Jul 22, 2015 at 09:50:12AM -0700, Christoph Hellwig wrote:
> >>> +/**
> >>> + * ib_map_mr_sg() - Populates MR with a dma mapped SG list
> >>> + * @mr: memory region
> >>> + * @sg: dma mapped scatterlist
> >>> + * @sg_nents: number of entries in sg
> >>> + * @access: access permissions
> >>
> >> I know moving the access flags here was my idea originally, but I seem
> >> convinced by your argument that it might fit in better with the posting
> >> helper. Or did someone else come up with a better argument than mine
> >> for moving it here?
> >
> > I was hoping we'd move the DMA flush and translate into here and make
> > it mandatory. Is there any reason not to do that?
>
> The reason I didn't add it in was so the ULPs can make sure they meet
> the restrictions of ib_map_mr_sg(). Allow SRP to iterate on its
> SG list and set partials, and iSER to detect gaps (they need to dma map
> for that).

I would like to see the kdoc for ib_map_mr_sg explain exactly what is
required of the caller, maybe just hoist this bit from
ib_sg_to_pages. Not entirely required if we are going to have an API
to do the test..

Jason
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <20150723184221.GA30303-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-26 8:54 ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-26 8:54 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

> I would like to see the kdoc for ib_map_mr_sg explain exactly what is
> required of the caller, maybe just hoist this bit from
> ib_sg_to_pages

I'll add the kdoc.
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <1437548143-24893-29-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2015-07-22 16:50 ` Christoph Hellwig
@ 2015-07-22 18:02 ` Jason Gunthorpe
  [not found] ` <20150722180203.GI26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  2015-07-28 11:20 ` Haggai Eran
  2 siblings, 1 reply; 142+ messages in thread
From: Jason Gunthorpe @ 2015-07-22 18:02 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 09:55:28AM +0300, Sagi Grimberg wrote:
> +/**
> + * ib_map_mr_sg() - Populates MR with a dma mapped SG list
> + * @mr: memory region
> + * @sg: dma mapped scatterlist
> + * @sg_nents: number of entries in sg
> + * @access: access permissions

Again, related to my prior comments, please have two of these:

ib_map_mr_sg_rkey()
ib_map_mr_sg_lkey()

So we force ULPs to think about what they are doing properly, and we
get a chance to actually force lkey to be local use only for IB.

> +static inline void
> +ib_set_fastreg_wr(struct ib_mr *mr,
> +                  u32 key,

The key should come from the MR. Once the above is split then it is
obvious which key to use.

Jason
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API
  [not found] ` <20150722180203.GI26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2015-07-23 10:19 ` Sagi Grimberg
  [not found] ` <55B0BFA4.4060509-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 142+ messages in thread
From: Sagi Grimberg @ 2015-07-23 10:19 UTC (permalink / raw)
To: Jason Gunthorpe, Sagi Grimberg
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/22/2015 9:02 PM, Jason Gunthorpe wrote:
> On Wed, Jul 22, 2015 at 09:55:28AM +0300, Sagi Grimberg wrote:
>> +/**
>> + * ib_map_mr_sg() - Populates MR with a dma mapped SG list
>> + * @mr: memory region
>> + * @sg: dma mapped scatterlist
>> + * @sg_nents: number of entries in sg
>> + * @access: access permissions
>
> Again, related to my prior comments, please have two of these:
>
> ib_map_mr_sg_rkey()
> ib_map_mr_sg_lkey()
>
> So we force ULPs to think about what they are doing properly, and we
> get a chance to actually force lkey to be local use only for IB.

The lkey/rkey decision is passed in the fastreg post_send().
ib_map_mr_sg is just a mapping API, not the registration itself.

>> +static inline void
>> +ib_set_fastreg_wr(struct ib_mr *mr,
>> +                  u32 key,
>
> The key should come from the MR. Once the above is split then it is
> obvious which key to use.

IMO, it's obvious as it is. I don't see why anyone should get it
wrong.
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <55B0BFA4.4060509-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-23 16:14 ` Jason Gunthorpe [not found] ` <20150723161436.GC25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-23 16:14 UTC (permalink / raw) To: Sagi Grimberg Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Thu, Jul 23, 2015 at 01:19:16PM +0300, Sagi Grimberg wrote: > >Again, related to my prior comments, please have two of these: > > > >ib_map_mr_sg_rkey() > >ib_map_mr_sg_lkey() > > > >So we force ULPs to think about what they are doing properly, and we > >get a chance to actually force lkey to be local use only for IB. > > The lkey/rkey decision is passed in the fastreg post_send(). That is too late to check the access flags. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150723161436.GC25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-23 16:47 ` Sagi Grimberg [not found] ` <55B11A92.9040406-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-23 16:47 UTC (permalink / raw) To: Jason Gunthorpe Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/23/2015 7:14 PM, Jason Gunthorpe wrote: > On Thu, Jul 23, 2015 at 01:19:16PM +0300, Sagi Grimberg wrote: >>> Again, related to my prior comments, please have two of these: >>> >>> ib_map_mr_sg_rkey() >>> ib_map_mr_sg_lkey() >>> >>> So we force ULPs to think about what they are doing properly, and we >>> get a chance to actually force lkey to be local use only for IB. >> >> The lkey/rkey decision is passed in the fastreg post_send(). > > That is too late to check the access flags. Why? the access permissions are kept in the mr context? I can move it to the post interface if it makes more sense. the access is kind of out of place in the mapping routine anyway... -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <55B11A92.9040406-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-23 18:51 ` Jason Gunthorpe [not found] ` <20150723185126.GA31346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-23 18:51 UTC (permalink / raw) To: Sagi Grimberg Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Thu, Jul 23, 2015 at 07:47:14PM +0300, Sagi Grimberg wrote: > >>>So we force ULPs to think about what they are doing properly, and we > >>>get a chance to actually force lkey to be local use only for IB. > >> > >>The lkey/rkey decision is passed in the fastreg post_send(). > > > >That is too late to check the access flags. > > Why? the access permissions are kept in the mr context? Sure, one could do if (key == mr->lkey) .. check lkey flags in the post, but that seems silly considering we want the post inlined.. > I can move it to the post interface if it makes more sense. > the access is kind of out of place in the mapping routine anyway... All the dma routines have an access equivalent during map, I don't think it is out of place.. To my mind, the map is the point where the MR should crystallize into an rkey or lkey MR, not at the post. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150723185126.GA31346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-26 9:45 ` Sagi Grimberg [not found] ` <55B4AC26.20405-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-26 9:45 UTC (permalink / raw) To: Jason Gunthorpe Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/23/2015 9:51 PM, Jason Gunthorpe wrote: > On Thu, Jul 23, 2015 at 07:47:14PM +0300, Sagi Grimberg wrote: > >>>>> So we force ULPs to think about what they are doing properly, and we >>>>> get a chance to actually force lkey to be local use only for IB. >>>> >>>> The lkey/rkey decision is passed in the fastreg post_send(). >>> >>> That is too late to check the access flags. >> >> Why? the access permissions are kept in the mr context? > > Sure, one could do if (key == mr->lkey) .. check lkey flags in the > post, but that seems silly considering we want the post inlined.. Why should we check the lkey/rkey access flags in the post? > >> I can move it to the post interface if it makes more sense. >> the access is kind of out of place in the mapping routine anyway... > > All the dma routines have an access equivalent during map, I don't > think it is out of place.. > > To my mind, the map is the point where the MR should crystallize into > an rkey or lkey MR, not at the post. I'm not sure I understand why the lkey/rkey should be set at the map routine. To me, it seems more natural to map_mr_sg and then either register the lkey or the rkey. It's easy enough to move the key arg to ib_map_mr_sg, but I don't see a good reason why at the moment. 
-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <55B4AC26.20405-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-27 17:14 ` Jason Gunthorpe [not found] ` <20150727171441.GC18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-27 17:14 UTC (permalink / raw) To: Sagi Grimberg Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Sun, Jul 26, 2015 at 12:45:10PM +0300, Sagi Grimberg wrote: > On 7/23/2015 9:51 PM, Jason Gunthorpe wrote: > >On Thu, Jul 23, 2015 at 07:47:14PM +0300, Sagi Grimberg wrote: > > > >>>>>So we force ULPs to think about what they are doing properly, and we > >>>>>get a chance to actually force lkey to be local use only for IB. > >>>> > >>>>The lkey/rkey decision is passed in the fastreg post_send(). > >>> > >>>That is too late to check the access flags. > >> > >>Why? the access permissions are kept in the mr context? > > > >Sure, one could do if (key == mr->lkey) .. check lkey flags in the > >post, but that seems silly considering we want the post inlined.. > > Why should we check the lkey/rkey access flags in the post? Eh? It was your idea.. I just want to check the access flags and force lkey's to not have ACCESS_REMOTE set without complaining loudly. To do that you need to know if the mr is a lkey/rkey, and you need to know the flags. > >>I can move it to the post interface if it makes more sense. > >>the access is kind of out of place in the mapping routine anyway... > > > >All the dma routines have an access equivalent during map, I don't > >think it is out of place.. > > > >To my mind, the map is the point where the MR should crystallize into > >an rkey or lkey MR, not at the post. > > I'm not sure I understand why the lkey/rkey should be set at the map > routine. To me, it seems more natural to map_mr_sg and then either > register the lkey or the rkey. 
We need to check the access flags to put a stop to this remote access lkey security problem. That means we need to label every MR as a lkey or rkey MR. No more MR's can be both nonsense. Pick a place to do that and enforce that IB cannot have remote access LKEYs. My vote is to do that work in map, because I don't think it make any sense in post (post should not fail) Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <20150727171441.GC18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-27 20:11 ` Steve Wise [not found] ` <55B69058.70403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Steve Wise @ 2015-07-27 20:11 UTC (permalink / raw) To: Jason Gunthorpe, Sagi Grimberg Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/27/2015 12:14 PM, Jason Gunthorpe wrote: > On Sun, Jul 26, 2015 at 12:45:10PM +0300, Sagi Grimberg wrote: >> On 7/23/2015 9:51 PM, Jason Gunthorpe wrote: >>> On Thu, Jul 23, 2015 at 07:47:14PM +0300, Sagi Grimberg wrote: >>> >>>>>>> So we force ULPs to think about what they are doing properly, and we >>>>>>> get a chance to actually force lkey to be local use only for IB. >>>>>> The lkey/rkey decision is passed in the fastreg post_send(). >>>>> That is too late to check the access flags. >>>> Why? the access permissions are kept in the mr context? >>> Sure, one could do if (key == mr->lkey) .. check lkey flags in the >>> post, but that seems silly considering we want the post inlined.. >> Why should we check the lkey/rkey access flags in the post? > Eh? It was your idea.. > > I just want to check the access flags and force lkey's to not have > ACCESS_REMOTE set without complaining loudly. > > To do that you need to know if the mr is a lkey/rkey, and you need to > know the flags. > >>>> I can move it to the post interface if it makes more sense. >>>> the access is kind of out of place in the mapping routine anyway... >>> All the dma routines have an access equivalent during map, I don't >>> think it is out of place.. >>> >>> To my mind, the map is the point where the MR should crystallize into >>> an rkey or lkey MR, not at the post. >> I'm not sure I understand why the lkey/rkey should be set at the map >> routine. 
To me, it seems more natural to map_mr_sg and then either >> register the lkey or the rkey. > We need to check the access flags to put a stop to this remote access > lkey security problem. That means we need to label every MR as a lkey > or rkey MR. > > No more MR's can be both nonsense. Well technically an MR with REMOTE_WRITE also has LOCAL_WRITE set. So you are proposing the core disallow a ULP from using the lkey for this type of MR? Say in a RECV sge? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <55B69058.70403-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> @ 2015-07-27 20:29 ` Jason Gunthorpe 0 siblings, 0 replies; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-27 20:29 UTC (permalink / raw) To: Steve Wise Cc: Sagi Grimberg, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Mon, Jul 27, 2015 at 03:11:04PM -0500, Steve Wise wrote: > Well technically an MR with REMOTE_WRITE also has LOCAL_WRITE set. So you > are proposing the core disallow a ULP from using the lkey for this type of > MR? Say in a RECV sge? Yes, absolutely. It is wrong anyhow, RECV isn't special, if you RECV into memory that is exposed via a rkey MR, you have to invalidate that MR and fence DMA before you can touch the buffer. Only very special, carefully designed, cases could avoid that. We don't have those cases, so lets just ban it. The only exception is the iWarp RDMA READ thing. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 28/43] IB/core: Introduce new fast registration API [not found] ` <1437548143-24893-29-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 16:50 ` Christoph Hellwig 2015-07-22 18:02 ` Jason Gunthorpe @ 2015-07-28 11:20 ` Haggai Eran 2 siblings, 0 replies; 142+ messages in thread From: Haggai Eran @ 2015-07-28 11:20 UTC (permalink / raw) To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer On 22/07/2015 09:55, Sagi Grimberg wrote: > +/** > + * ib_sg_to_pages() - Convert a sg list to a page vector > + * @dev: ib device > + * @sgl: dma mapped scatterlist > + * @sg_nents: number of entries in sg > + * @max_pages: maximum pages allowed > + * @pages: output page vector > + * @npages: output number of mapped pages > + * @length: output total byte length > + * @offset: output first byte offset > + * > + * Core service helper for drivers to convert a scatter > + * list to a page vector. The assumption is that the > + * sg must meet the following conditions: > + * - Only the first sg is allowed to have an offset > + * - All the elements are of the same size - PAGE_SIZE > + * - The last element is allowed to have length less than > + * PAGE_SIZE > + * > + * If any of those conditions is not met, the routine will > + * fail with EINVAL. 
> + */ > +int ib_sg_to_pages(struct scatterlist *sgl, > + unsigned short sg_nents, > + unsigned short max_pages, > + u64 *pages, u32 *npages, > + u32 *length, u64 *offset) > +{ > + struct scatterlist *sg; > + u64 last_end_dma_addr = 0, last_page_addr = 0; > + unsigned int last_page_off = 0; > + int i, j = 0; > + > + /* TODO: We can do better with huge pages */ > + > + *offset = sg_dma_address(&sgl[0]); > + *length = 0; > + > + for_each_sg(sgl, sg, sg_nents, i) { > + u64 dma_addr = sg_dma_address(sg); > + unsigned int dma_len = sg_dma_len(sg); > + u64 end_dma_addr = dma_addr + dma_len; > + u64 page_addr = dma_addr & PAGE_MASK; > + > + *length += dma_len; > + > + /* Fail we ran out of pages */ > + if (unlikely(j > max_pages)) > + return -EINVAL; > + > + if (i && sg->offset) { > + if (unlikely((last_end_dma_addr) != dma_addr)) { > + /* gap - fail */ > + goto err; > + } > + if (last_page_off + dma_len < PAGE_SIZE) { > + /* chunk this fragment with the last */ > + last_end_dma_addr += dma_len; > + last_page_off += dma_len; > + continue; > + } else { > + /* map starting from the next page */ > + page_addr = last_page_addr + PAGE_SIZE; > + dma_len -= PAGE_SIZE - last_page_off; > + } > + } > + > + do { > + pages[j++] = page_addr; I think this line could overrun the pages buffer. The test above only checks at the beginning of the sg, but with an sg larger than PAGE_SIZE, you could still overrun. 
> + page_addr += PAGE_SIZE; > + } while (page_addr < end_dma_addr); > + > + last_end_dma_addr = end_dma_addr; > + last_page_addr = end_dma_addr & PAGE_MASK; > + last_page_off = end_dma_addr & ~PAGE_MASK; > + } > + > + *npages = j; > + > + return 0; > +err: > + pr_err("RDMA alignment violation\n"); > + for_each_sg(sgl, sg, sg_nents, i) { > + u64 dma_addr = sg_dma_address(sg); > + unsigned int dma_len = sg_dma_len(sg); > + > + pr_err("sg[%d]: offset=0x%x, dma_addr=0x%llx, dma_len=0x%x\n", > + i, sg->offset, dma_addr, dma_len); > + } > + > + return -EINVAL; > +} > +EXPORT_SYMBOL(ib_sg_to_pages); -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH WIP 29/43] mlx5: Support the new memory registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (27 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 28/43] IB/core: Introduce new fast registration API Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 30/43] mlx4: " Sagi Grimberg ` (14 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Just duplicated the functions to take the needed arguments from the private MR context. The old fast_reg routines will be dropped later. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx5/main.c | 1 + drivers/infiniband/hw/mlx5/mlx5_ib.h | 3 ++ drivers/infiniband/hw/mlx5/mr.c | 11 +++++ drivers/infiniband/hw/mlx5/qp.c | 90 ++++++++++++++++++++++++++++++++++++ 4 files changed, 105 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index ce75875..a90ef7a 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1503,6 +1503,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev) dev->ib_dev.detach_mcast = mlx5_ib_mcg_detach; dev->ib_dev.process_mad = mlx5_ib_process_mad; dev->ib_dev.alloc_mr = mlx5_ib_alloc_mr; + dev->ib_dev.map_mr_sg = mlx5_ib_map_mr_sg; dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list; dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list; dev->ib_dev.check_mr_status = mlx5_ib_check_mr_status; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index df5e959..7017a1a 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -582,6 +582,9 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_entries, 
u32 flags); +int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents); struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len); void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list); diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 1075065..7a030a2 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -1471,3 +1471,14 @@ int mlx5_ib_check_mr_status(struct ib_mr *ibmr, u32 check_mask, done: return ret; } + +int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents) +{ + struct mlx5_ib_mr *mr = to_mmr(ibmr); + + return ib_sg_to_pages(sg, sg_nents, mr->max_descs, + mr->pl, &mr->ndescs, + &ibmr->length, &ibmr->iova); +} diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 203c8a4..f0a03aa 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -65,6 +65,7 @@ static const u32 mlx5_ib_opcode[] = { [IB_WR_SEND_WITH_INV] = MLX5_OPCODE_SEND_INVAL, [IB_WR_LOCAL_INV] = MLX5_OPCODE_UMR, [IB_WR_FAST_REG_MR] = MLX5_OPCODE_UMR, + [IB_WR_FASTREG_MR] = MLX5_OPCODE_UMR, [IB_WR_MASKED_ATOMIC_CMP_AND_SWP] = MLX5_OPCODE_ATOMIC_MASKED_CS, [IB_WR_MASKED_ATOMIC_FETCH_AND_ADD] = MLX5_OPCODE_ATOMIC_MASKED_FA, [MLX5_IB_WR_UMR] = MLX5_OPCODE_UMR, @@ -1903,6 +1904,17 @@ static __be64 sig_mkey_mask(void) return cpu_to_be64(result); } +static void set_fastreg_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr, + struct mlx5_ib_mr *mr) +{ + int ndescs = mr->ndescs; + + memset(umr, 0, sizeof(*umr)); + umr->flags = MLX5_UMR_CHECK_NOT_FREE; + umr->klm_octowords = get_klm_octo(ndescs); + umr->mkey_mask = frwr_mkey_mask(); +} + static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr, struct ib_send_wr *wr, int li) { @@ -1994,6 +2006,23 @@ static u8 get_umr_flags(int acc) MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN; } +static void 
set_fastreg_mkey_seg(struct mlx5_mkey_seg *seg, + struct mlx5_ib_mr *mr, u32 key, + int *writ) +{ + int ndescs = ALIGN(mr->ndescs, 8) >> 1; + + memset(seg, 0, sizeof(*seg)); + seg->flags = get_umr_flags(mr->ibmr.access) | MLX5_ACCESS_MODE_MTT; + *writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE); + seg->qpn_mkey7_0 = cpu_to_be32((key & 0xff) | 0xffffff00); + seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL); + seg->start_addr = cpu_to_be64(mr->ibmr.iova); + seg->len = cpu_to_be64(mr->ibmr.length); + seg->xlt_oct_size = cpu_to_be32(ndescs); + seg->log2_page_size = PAGE_SHIFT; +} + static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr, int li, int *writ) { @@ -2035,6 +2064,23 @@ static void set_reg_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *w mlx5_mkey_variant(umrwr->mkey)); } +static void set_fastreg_ds(struct mlx5_wqe_data_seg *dseg, + struct mlx5_ib_mr *mr, + struct mlx5_ib_pd *pd, + int writ) +{ + u64 perm = MLX5_EN_RD | (writ ? 
MLX5_EN_WR : 0); + int bcount = sizeof(u64) * mr->ndescs; + int i; + + for (i = 0; i < mr->ndescs; i++) + mr->mpl[i] = cpu_to_be64(mr->pl[i] | perm); + + dseg->addr = cpu_to_be64(mr->pl_map); + dseg->byte_count = cpu_to_be32(ALIGN(bcount, 64)); + dseg->lkey = cpu_to_be32(pd->pa_lkey); +} + static void set_frwr_pages(struct mlx5_wqe_data_seg *dseg, struct ib_send_wr *wr, struct mlx5_core_dev *mdev, @@ -2440,6 +2486,37 @@ static int set_psv_wr(struct ib_sig_domain *domain, return 0; } +static int set_fastreg_wr(struct mlx5_ib_qp *qp, + struct ib_send_wr *wr, + void **seg, int *size) +{ + struct mlx5_ib_mr *mr = to_mmr(wr->wr.fastreg.mr); + struct mlx5_ib_pd *pd = to_mpd(qp->ibqp.pd); + u32 key = wr->wr.fastreg.key; + int writ = 0; + + if (unlikely(wr->send_flags & IB_SEND_INLINE)) + return -EINVAL; + + set_fastreg_umr_seg(*seg, mr); + *seg += sizeof(struct mlx5_wqe_umr_ctrl_seg); + *size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16; + if (unlikely((*seg == qp->sq.qend))) + *seg = mlx5_get_send_wqe(qp, 0); + + set_fastreg_mkey_seg(*seg, mr, key, &writ); + *seg += sizeof(struct mlx5_mkey_seg); + *size += sizeof(struct mlx5_mkey_seg) / 16; + if (unlikely((*seg == qp->sq.qend))) + *seg = mlx5_get_send_wqe(qp, 0); + + set_fastreg_ds(*seg, mr, pd, writ); + *seg += sizeof(struct mlx5_wqe_data_seg); + *size += (sizeof(struct mlx5_wqe_data_seg) / 16); + + return 0; +} + static int set_frwr_li_wr(void **seg, struct ib_send_wr *wr, int *size, struct mlx5_core_dev *mdev, struct mlx5_ib_pd *pd, struct mlx5_ib_qp *qp) { @@ -2683,6 +2760,19 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, num_sge = 0; break; + case IB_WR_FASTREG_MR: + next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL; + qp->sq.wr_data[idx] = IB_WR_FASTREG_MR; + ctrl->imm = cpu_to_be32(wr->wr.fastreg.key); + err = set_fastreg_wr(qp, wr, &seg, &size); + if (err) { + mlx5_ib_warn(dev, "\n"); + *bad_wr = wr; + goto out; + } + num_sge = 0; + break; + case IB_WR_REG_SIG_MR: qp->sq.wr_data[idx] = 
IB_WR_REG_SIG_MR; mr = to_mmr(wr->wr.sig_handover.sig_mr); -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 30/43] mlx4: Support the new memory registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (28 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 29/43] mlx5: Support the new memory " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 31/43] ocrdma: " Sagi Grimberg ` (13 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Just duplicated the functions to take the needed arguments from the private MR context. The old fast_reg routines will be dropped later. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx4/main.c | 1 + drivers/infiniband/hw/mlx4/mlx4_ib.h | 3 +++ drivers/infiniband/hw/mlx4/mr.c | 11 +++++++++++ drivers/infiniband/hw/mlx4/qp.c | 27 +++++++++++++++++++++++++++ 4 files changed, 42 insertions(+) diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c index 829fcf4..f2d101c 100644 --- a/drivers/infiniband/hw/mlx4/main.c +++ b/drivers/infiniband/hw/mlx4/main.c @@ -2298,6 +2298,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev) ibdev->ib_dev.rereg_user_mr = mlx4_ib_rereg_user_mr; ibdev->ib_dev.dereg_mr = mlx4_ib_dereg_mr; ibdev->ib_dev.alloc_mr = mlx4_ib_alloc_mr; + ibdev->ib_dev.map_mr_sg = mlx4_ib_map_mr_sg; ibdev->ib_dev.alloc_fast_reg_page_list = mlx4_ib_alloc_fast_reg_page_list; ibdev->ib_dev.free_fast_reg_page_list = mlx4_ib_free_fast_reg_page_list; ibdev->ib_dev.attach_mcast = mlx4_ib_mcg_attach; diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h index a9a4a7f..e5c7292 100644 --- a/drivers/infiniband/hw/mlx4/mlx4_ib.h +++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h @@ -689,6 +689,9 @@ struct ib_mr *mlx4_ib_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_entries, u32 flags); +int 
mlx4_ib_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents); struct ib_fast_reg_page_list *mlx4_ib_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len); void mlx4_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list); diff --git a/drivers/infiniband/hw/mlx4/mr.c b/drivers/infiniband/hw/mlx4/mr.c index 01e16bc..9a86829 100644 --- a/drivers/infiniband/hw/mlx4/mr.c +++ b/drivers/infiniband/hw/mlx4/mr.c @@ -574,3 +574,14 @@ int mlx4_ib_fmr_dealloc(struct ib_fmr *ibfmr) return err; } + +int mlx4_ib_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents) +{ + struct mlx4_ib_mr *mr = to_mmr(ibmr); + + return ib_sg_to_pages(sg, sg_nents, mr->max_pages, + mr->pl, &mr->npages, + &ibmr->length, &ibmr->iova); +} diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index c5a3a5f..492e799 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -2401,6 +2401,25 @@ static __be32 convert_access(int acc) cpu_to_be32(MLX4_WQE_FMR_PERM_LOCAL_READ); } +static void set_fastreg_seg(struct mlx4_wqe_fmr_seg *fseg, struct ib_send_wr *wr) +{ + struct mlx4_ib_mr *mr = to_mmr(wr->wr.fastreg.mr); + int i; + + for (i = 0; i < mr->npages; ++i) + mr->mpl[i] = cpu_to_be64(mr->pl[i] | MLX4_MTT_FLAG_PRESENT); + + fseg->flags = convert_access(mr->ibmr.access); + fseg->mem_key = cpu_to_be32(wr->wr.fastreg.key); + fseg->buf_list = cpu_to_be64(mr->pl_map); + fseg->start_addr = cpu_to_be64(mr->ibmr.iova); + fseg->reg_len = cpu_to_be64(mr->ibmr.length); + fseg->offset = 0; /* XXX -- is this just for ZBVA? 
*/ + fseg->page_size = cpu_to_be32(PAGE_SHIFT); + fseg->reserved[0] = 0; + fseg->reserved[1] = 0; +} + static void set_fmr_seg(struct mlx4_wqe_fmr_seg *fseg, struct ib_send_wr *wr) { struct mlx4_ib_fast_reg_page_list *mfrpl = to_mfrpl(wr->wr.fast_reg.page_list); @@ -2759,6 +2778,14 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, size += sizeof (struct mlx4_wqe_fmr_seg) / 16; break; + case IB_WR_FASTREG_MR: + ctrl->srcrb_flags |= + cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER); + set_fastreg_seg(wqe, wr); + wqe += sizeof (struct mlx4_wqe_fmr_seg); + size += sizeof (struct mlx4_wqe_fmr_seg) / 16; + break; + case IB_WR_BIND_MW: ctrl->srcrb_flags |= cpu_to_be32(MLX4_WQE_CTRL_STRONG_ORDER); -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 31/43] ocrdma: Support the new memory registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (29 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 30/43] mlx4: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 32/43] cxgb3: " Sagi Grimberg ` (12 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Just duplicated the functions to take the needed arguments from the private MR context. The old fast_reg routines will be dropped later. --- drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 + drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 67 +++++++++++++++++++++++++++++ drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 ++ 3 files changed, 71 insertions(+) diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c index 47d2814..2dd6b06 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c @@ -295,6 +295,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev) dev->ibdev.reg_user_mr = ocrdma_reg_user_mr; dev->ibdev.alloc_mr = ocrdma_alloc_mr; + dev->ibdev.map_mr_sg = ocrdma_map_mr_sg; dev->ibdev.alloc_fast_reg_page_list = ocrdma_alloc_frmr_page_list; dev->ibdev.free_fast_reg_page_list = ocrdma_free_frmr_page_list; diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c index a764cb9..0f32fc4 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c @@ -2121,6 +2121,59 @@ static int get_encoded_page_size(int pg_sz) return i; } +static int ocrdma_build_fr2(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr, + struct ib_send_wr *wr) +{ + u64 fbo; + struct ocrdma_ewqe_fr *fast_reg = (struct ocrdma_ewqe_fr *)(hdr + 1); + struct ocrdma_mr *mr = 
get_ocrdma_mr(wr->wr.fastreg.mr); + struct ocrdma_pbl *pbl_tbl = mr->hwmr.pbl_table; + struct ocrdma_pbe *pbe; + u32 wqe_size = sizeof(*fast_reg) + sizeof(*hdr); + int num_pbes = 0, i; + + wqe_size = roundup(wqe_size, OCRDMA_WQE_ALIGN_BYTES); + + hdr->cw |= (OCRDMA_FR_MR << OCRDMA_WQE_OPCODE_SHIFT); + hdr->cw |= ((wqe_size / OCRDMA_WQE_STRIDE) << OCRDMA_WQE_SIZE_SHIFT); + + if (mr->ibmr.access & IB_ACCESS_LOCAL_WRITE) + hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_LOCAL_WR; + if (mr->ibmr.access & IB_ACCESS_REMOTE_WRITE) + hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_WR; + if (mr->ibmr.access & IB_ACCESS_REMOTE_READ) + hdr->rsvd_lkey_flags |= OCRDMA_LKEY_FLAG_REMOTE_RD; + hdr->lkey = wr->wr.fastreg.key; + hdr->total_len = mr->ibmr.length; + + fbo = mr->ibmr.iova - mr->pl[0]; + + fast_reg->va_hi = upper_32_bits(mr->ibmr.iova); + fast_reg->va_lo = (u32) (mr->ibmr.iova & 0xffffffff); + fast_reg->fbo_hi = upper_32_bits(fbo); + fast_reg->fbo_lo = (u32) fbo & 0xffffffff; + fast_reg->num_sges = mr->npages; + fast_reg->size_sge = get_encoded_page_size(1 << PAGE_SHIFT); + + pbe = pbl_tbl->va; + for (i = 0; i < mr->npages; i++) { + u64 buf_addr = mr->pl[i]; + pbe->pa_lo = cpu_to_le32((u32) (buf_addr & PAGE_MASK)); + pbe->pa_hi = cpu_to_le32((u32) upper_32_bits(buf_addr)); + num_pbes += 1; + pbe++; + + /* if the pbl is full storing the pbes, + * move to next pbl. 
+ */ + if (num_pbes == (mr->hwmr.pbl_size/sizeof(u64))) { + pbl_tbl++; + pbe = (struct ocrdma_pbe *)pbl_tbl->va; + } + } + + return 0; +} static int ocrdma_build_fr(struct ocrdma_qp *qp, struct ocrdma_hdr_wqe *hdr, struct ib_send_wr *wr) @@ -2248,6 +2301,9 @@ int ocrdma_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, case IB_WR_FAST_REG_MR: status = ocrdma_build_fr(qp, hdr, wr); break; + case IB_WR_FASTREG_MR: + status = ocrdma_build_fr2(qp, hdr, wr); + break; default: status = -EINVAL; break; @@ -3221,3 +3277,14 @@ pbl_err: kfree(mr); return ERR_PTR(status); } + +int ocrdma_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents) +{ + struct ocrdma_mr *mr = get_ocrdma_mr(ibmr); + + return ib_sg_to_pages(sg, sg_nents, mr->hwmr.num_pbes, + mr->pl, &mr->npages, + &ibmr->length, &ibmr->iova); +} diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h index d09ff8e..4c60eec 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h @@ -100,6 +100,9 @@ struct ib_mr *ocrdma_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_entries, u32 flags); +int ocrdma_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents); struct ib_fast_reg_page_list *ocrdma_alloc_frmr_page_list(struct ib_device *ibdev, int page_list_len); -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
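Most of the map_mr_sg callbacks in this series (ocrdma above, and cxgb3/cxgb4/nes/qib below) simply delegate to the proposed ib_sg_to_pages() core helper and keep the result in the MR's private page list. As a rough illustration of what that helper has to produce, here is a minimal userspace C sketch; the sim_* names, the fixed 4K page size, and the simplified handling of segment boundaries are assumptions for illustration, not the actual core implementation.

```c
#include <stdint.h>

#define SIM_PAGE_SIZE 4096UL
#define SIM_PAGE_MASK (~(SIM_PAGE_SIZE - 1))

struct sim_sge {
	uint64_t addr;	/* DMA address of the segment */
	uint32_t len;	/* length in bytes */
};

/* Walk the segment list and record the aligned address of every page
 * it touches, reporting the total length and the iova (start of the
 * first segment) - loosely mirroring the proposed ib_sg_to_pages(). */
static int sim_sg_to_pages(const struct sim_sge *sg, int sg_nents,
			   int max_pages, uint64_t *pl, int *npages,
			   uint64_t *length, uint64_t *iova)
{
	uint64_t total = 0;
	int i, n = 0;

	if (sg_nents < 1)
		return -1;

	*iova = sg[0].addr;
	for (i = 0; i < sg_nents; i++) {
		uint64_t page = sg[i].addr & SIM_PAGE_MASK;
		uint64_t end = sg[i].addr + sg[i].len;

		for (; page < end; page += SIM_PAGE_SIZE) {
			if (n >= max_pages)
				return -1;	/* -EINVAL in the kernel */
			pl[n++] = page;
		}
		total += sg[i].len;
	}
	*npages = n;
	*length = total;
	return 0;
}
```

With such a page list in hand, a driver only has to translate pl[] into its hardware PBL format at post time, which is what ocrdma_build_fr2() above does with mr->pl.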
* [PATCH WIP 32/43] cxgb3: Support the new memory registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (30 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 31/43] ocrdma: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 33/43] cxgb4: " Sagi Grimberg ` (11 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Just duplicated the functions to take the needed arguments from the private MR context. The old fast_reg routines will be dropped later. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/cxgb3/iwch_provider.c | 12 ++++++++ drivers/infiniband/hw/cxgb3/iwch_qp.c | 48 +++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+) diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index c9368e6..b25cb6a 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -857,6 +857,17 @@ err: return ERR_PTR(ret); } +static int iwch_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents) +{ + struct iwch_mr *mhp = to_iwch_mr(ibmr); + + return ib_sg_to_pages(sg, sg_nents, mhp->attr.pbl_size, + mhp->pl, &mhp->npages, + &ibmr->length, &ibmr->iova); +} + static struct ib_fast_reg_page_list *iwch_alloc_fastreg_pbl( struct ib_device *device, int page_list_len) @@ -1455,6 +1466,7 @@ int iwch_register_device(struct iwch_dev *dev) dev->ibdev.bind_mw = iwch_bind_mw; dev->ibdev.dealloc_mw = iwch_dealloc_mw; dev->ibdev.alloc_mr = iwch_alloc_mr; + dev->ibdev.map_mr_sg = iwch_map_mr_sg; dev->ibdev.alloc_fast_reg_page_list = iwch_alloc_fastreg_pbl; dev->ibdev.free_fast_reg_page_list = iwch_free_fastreg_pbl; dev->ibdev.attach_mcast = iwch_multicast_attach; diff --git 
a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c index b57c0be..2c30326 100644 --- a/drivers/infiniband/hw/cxgb3/iwch_qp.c +++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c @@ -146,6 +146,49 @@ static int build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr, return 0; } +static int build_fastreg2(union t3_wr *wqe, struct ib_send_wr *wr, + u8 *flit_cnt, int *wr_cnt, struct t3_wq *wq) +{ + struct iwch_mr *mhp = to_iwch_mr(wr->wr.fastreg.mr); + int i; + __be64 *p; + + if (mhp->npages > T3_MAX_FASTREG_DEPTH) + return -EINVAL; + *wr_cnt = 1; + wqe->fastreg.stag = cpu_to_be32(wr->wr.fastreg.key); + wqe->fastreg.len = cpu_to_be32(mhp->ibmr.length); + wqe->fastreg.va_base_hi = cpu_to_be32(mhp->ibmr.iova >> 32); + wqe->fastreg.va_base_lo_fbo = + cpu_to_be32(mhp->ibmr.iova & 0xffffffff); + wqe->fastreg.page_type_perms = cpu_to_be32( + V_FR_PAGE_COUNT(mhp->npages) | + V_FR_PAGE_SIZE(PAGE_SHIFT - 12) | + V_FR_TYPE(TPT_VATO) | + V_FR_PERMS(iwch_ib_to_tpt_access(mhp->ibmr.access))); + p = &wqe->fastreg.pbl_addrs[0]; + for (i = 0; i < mhp->npages; i++, p++) { + + /* If we need a 2nd WR, then set it up */ + if (i == T3_MAX_FASTREG_FRAG) { + *wr_cnt = 2; + wqe = (union t3_wr *)(wq->queue + + Q_PTR2IDX((wq->wptr+1), wq->size_log2)); + build_fw_riwrh((void *)wqe, T3_WR_FASTREG, 0, + Q_GENBIT(wq->wptr + 1, wq->size_log2), + 0, 1 + mhp->npages - T3_MAX_FASTREG_FRAG, + T3_EOP); + + p = &wqe->pbl_frag.pbl_addrs[0]; + } + *p = cpu_to_be64((u64)mhp->pl[i]); + } + *flit_cnt = 5 + mhp->npages; + if (*flit_cnt > 15) + *flit_cnt = 15; + return 0; +} + static int build_fastreg(union t3_wr *wqe, struct ib_send_wr *wr, u8 *flit_cnt, int *wr_cnt, struct t3_wq *wq) { @@ -419,6 +462,11 @@ int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, err = build_fastreg(wqe, wr, &t3_wr_flit_cnt, &wr_cnt, &qhp->wq); break; + case IB_WR_FASTREG_MR: + t3_wr_opcode = T3_WR_FASTREG; + err = build_fastreg2(wqe, wr, &t3_wr_flit_cnt, + &wr_cnt, &qhp->wq); + break; case 
IB_WR_LOCAL_INV: if (wr->send_flags & IB_SEND_FENCE) t3_wr_flags |= T3_LOCAL_FENCE_FLAG; -- 1.8.4.3
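One detail of the cxgb3 build_fastreg2() above is the WQE sizing: a T3 fastreg WR holds at most T3_MAX_FASTREG_FRAG page addresses, so longer page lists spill into a second WR, and the flit count of the first WR is 5 header flits plus one per page, capped at 15. A small userspace sketch of that arithmetic (max_frag stands in for T3_MAX_FASTREG_FRAG, whose value is not restated here):

```c
/* Sketch of the WR sizing done in build_fastreg2(): page lists longer
 * than max_frag need a second work request, and the first WR's flit
 * count is 5 header flits plus one flit per page address, capped at 15. */
static void t3_fastreg_sizing(int npages, int max_frag,
			      int *wr_cnt, int *flit_cnt)
{
	*wr_cnt = (npages > max_frag) ? 2 : 1;
	*flit_cnt = 5 + npages;
	if (*flit_cnt > 15)
		*flit_cnt = 15;
}
```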
* [PATCH WIP 33/43] cxgb4: Support the new memory registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (31 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 32/43] cxgb3: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 34/43] nes: " Sagi Grimberg ` (10 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Just duplicated the functions to take the needed arguments from the private MR context. The old fast_reg routines will be dropped later. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/cxgb4/iw_cxgb4.h | 3 ++ drivers/infiniband/hw/cxgb4/mem.c | 11 +++++ drivers/infiniband/hw/cxgb4/provider.c | 1 + drivers/infiniband/hw/cxgb4/qp.c | 75 +++++++++++++++++++++++++++++++++- 4 files changed, 89 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h index e529ace..ce2bbf3 100644 --- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h +++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h @@ -978,6 +978,9 @@ struct ib_mr *c4iw_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type, u32 max_entries, u32 flags); +int c4iw_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents); int c4iw_dealloc_mw(struct ib_mw *mw); struct ib_mw *c4iw_alloc_mw(struct ib_pd *pd, enum ib_mw_type type); struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c index 91aedce..ea37fc7 100644 --- a/drivers/infiniband/hw/cxgb4/mem.c +++ b/drivers/infiniband/hw/cxgb4/mem.c @@ -922,6 +922,17 @@ err: return ERR_PTR(ret); } +int c4iw_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents) +{ + struct c4iw_mr *mhp = to_c4iw_mr(ibmr); + + 
return ib_sg_to_pages(sg, sg_nents, mhp->max_mpl_len, + mhp->mpl, &mhp->mpl_len, + &ibmr->length, &ibmr->iova); +} + struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(struct ib_device *device, int page_list_len) { diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c index 7746113..55dedad 100644 --- a/drivers/infiniband/hw/cxgb4/provider.c +++ b/drivers/infiniband/hw/cxgb4/provider.c @@ -557,6 +557,7 @@ int c4iw_register_device(struct c4iw_dev *dev) dev->ibdev.bind_mw = c4iw_bind_mw; dev->ibdev.dealloc_mw = c4iw_dealloc_mw; dev->ibdev.alloc_mr = c4iw_alloc_mr; + dev->ibdev.map_mr_sg = c4iw_map_mr_sg; dev->ibdev.alloc_fast_reg_page_list = c4iw_alloc_fastreg_pbl; dev->ibdev.free_fast_reg_page_list = c4iw_free_fastreg_pbl; dev->ibdev.attach_mcast = c4iw_multicast_attach; diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c index 6517e12..e5d1d99 100644 --- a/drivers/infiniband/hw/cxgb4/qp.c +++ b/drivers/infiniband/hw/cxgb4/qp.c @@ -605,10 +605,75 @@ static int build_rdma_recv(struct c4iw_qp *qhp, union t4_recv_wr *wqe, return 0; } -static int build_fastreg(struct t4_sq *sq, union t4_wr *wqe, +static int build_fastreg2(struct t4_sq *sq, union t4_wr *wqe, struct ib_send_wr *wr, u8 *len16, u8 t5dev) { + struct c4iw_mr *mhp = to_c4iw_mr(wr->wr.fastreg.mr); + struct fw_ri_immd *imdp; + __be64 *p; + int i; + int pbllen = roundup(mhp->mpl_len * sizeof(u64), 32); + int rem; + + if (mhp->mpl_len > t4_max_fr_depth(use_dsgl)) + return -EINVAL; + + wqe->fr.qpbinde_to_dcacpu = 0; + wqe->fr.pgsz_shift = PAGE_SHIFT - 12; + wqe->fr.addr_type = FW_RI_VA_BASED_TO; + wqe->fr.mem_perms = c4iw_ib_to_tpt_access(mhp->ibmr.access); + wqe->fr.len_hi = 0; + wqe->fr.len_lo = cpu_to_be32(mhp->ibmr.length); + wqe->fr.stag = cpu_to_be32(wr->wr.fastreg.key); + wqe->fr.va_hi = cpu_to_be32(mhp->ibmr.iova >> 32); + wqe->fr.va_lo_fbo = cpu_to_be32(mhp->ibmr.iova & + 0xffffffff); + + if (t5dev && use_dsgl && (pbllen > max_fr_immd)) 
{ + struct fw_ri_dsgl *sglp; + + for (i = 0; i < mhp->mpl_len; i++) { + mhp->mpl[i] = (__force u64)cpu_to_be64((u64)mhp->mpl[i]); + } + + sglp = (struct fw_ri_dsgl *)(&wqe->fr + 1); + sglp->op = FW_RI_DATA_DSGL; + sglp->r1 = 0; + sglp->nsge = cpu_to_be16(1); + sglp->addr0 = cpu_to_be64(mhp->mpl_addr); + sglp->len0 = cpu_to_be32(pbllen); + + *len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*sglp), 16); + } else { + imdp = (struct fw_ri_immd *)(&wqe->fr + 1); + imdp->op = FW_RI_DATA_IMMD; + imdp->r1 = 0; + imdp->r2 = 0; + imdp->immdlen = cpu_to_be32(pbllen); + p = (__be64 *)(imdp + 1); + rem = pbllen; + for (i = 0; i < mhp->mpl_len; i++) { + *p = cpu_to_be64((u64)mhp->mpl[i]); + rem -= sizeof(*p); + if (++p == (__be64 *)&sq->queue[sq->size]) + p = (__be64 *)sq->queue; + } + BUG_ON(rem < 0); + while (rem) { + *p = 0; + rem -= sizeof(*p); + if (++p == (__be64 *)&sq->queue[sq->size]) + p = (__be64 *)sq->queue; + } + *len16 = DIV_ROUND_UP(sizeof(wqe->fr) + sizeof(*imdp) + + pbllen, 16); + } + return 0; +} +static int build_fastreg(struct t4_sq *sq, union t4_wr *wqe, + struct ib_send_wr *wr, u8 *len16, u8 t5dev) +{ struct fw_ri_immd *imdp; __be64 *p; int i; @@ -821,6 +886,14 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, qhp->rhp->rdev.lldi.adapter_type) ? 1 : 0); break; + case IB_WR_FASTREG_MR: + fw_opcode = FW_RI_FR_NSMR_WR; + swsqe->opcode = FW_RI_FAST_REGISTER; + err = build_fastreg2(&qhp->wq.sq, wqe, wr, &len16, + is_t5( + qhp->rhp->rdev.lldi.adapter_type) ? + 1 : 0); + break; case IB_WR_LOCAL_INV: if (wr->send_flags & IB_SEND_FENCE) fw_flags |= FW_RI_LOCAL_FENCE_FLAG; -- 1.8.4.3
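The cxgb4 build_fastreg2() above sizes the PBL by rounding the page-list bytes up to a 32-byte boundary, and on a T5 device with use_dsgl enabled it switches from an inline (IMMD) copy in the WQE to a DSGL DMA fetch once the list exceeds max_fr_immd. A hedged userspace sketch of those two decisions (parameter names mirror the driver; the values are caller-supplied, not the driver's constants):

```c
#include <stdint.h>

/* PBL byte length, padded to a 32-byte boundary as in build_fastreg2():
 * pbllen = roundup(mpl_len * sizeof(u64), 32). */
static int cxgb4_pbl_len(int mpl_len)
{
	int pbllen = mpl_len * (int)sizeof(uint64_t);

	return (pbllen + 31) & ~31;
}

/* Inline vs. DSGL choice: DMA the page list only on T5, with DSGL
 * enabled, and only when it no longer fits as immediate data. */
static int use_dsgl_for_pbl(int t5dev, int use_dsgl, int pbllen,
			    int max_fr_immd)
{
	return t5dev && use_dsgl && pbllen > max_fr_immd;
}
```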
* [PATCH WIP 34/43] nes: Support the new memory registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (32 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 33/43] cxgb4: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 35/43] qib: " Sagi Grimberg ` (9 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Just duplicated the functions to take the needed arguments from the private MR context. The old fast_reg routines will be dropped later. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/nes/nes_verbs.c | 85 +++++++++++++++++++++++++++++++++++ 1 file changed, 85 insertions(+) diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 532496d..d5d8b01 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -465,6 +465,17 @@ err: return ERR_PTR(-ENOMEM); } +static int nes_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents) +{ + struct nes_mr *nesmr = to_nesmr(ibmr); + + return ib_sg_to_pages(sg, sg_nents, nesmr->max_pages, + nesmr->pl, &nesmr->npages, + &ibmr->length, &ibmr->iova); +} + /* * nes_alloc_fast_reg_page_list */ @@ -3537,6 +3548,79 @@ static int nes_post_send(struct ib_qp *ibqp, struct ib_send_wr *ib_wr, wqe_misc); break; } + case IB_WR_FASTREG_MR: + { + int i; + struct nes_mr *mr = to_nesmr(ib_wr->wr.fastreg.mr); + int flags = mr->ibmr.access; + u64 *src_page_list = mr->pl; + u64 *dst_page_list = mr->mpl; + + if (mr->npages > (NES_4K_PBL_CHUNK_SIZE / sizeof(u64))) { + nes_debug(NES_DBG_IW_TX, "SQ_FMR: bad page_list_len\n"); + err = -EINVAL; + break; + } + wqe_misc = NES_IWARP_SQ_OP_FAST_REG; + set_wqe_64bit_value(wqe->wqe_words, + 
NES_IWARP_SQ_FMR_WQE_VA_FBO_LOW_IDX, + mr->ibmr.iova); + set_wqe_32bit_value(wqe->wqe_words, + NES_IWARP_SQ_FMR_WQE_LENGTH_LOW_IDX, + mr->ibmr.length); + set_wqe_32bit_value(wqe->wqe_words, + NES_IWARP_SQ_FMR_WQE_LENGTH_HIGH_IDX, 0); + set_wqe_32bit_value(wqe->wqe_words, + NES_IWARP_SQ_FMR_WQE_MR_STAG_IDX, + ib_wr->wr.fastreg.key); + + /* Set page size: currently only 4K*/ + if (ib_wr->wr.fast_reg.page_shift == 12) { + wqe_misc |= NES_IWARP_SQ_FMR_WQE_PAGE_SIZE_4K; + } else { + nes_debug(NES_DBG_IW_TX, "Invalid page shift," + " ib_wr=%u, max=1\n", ib_wr->num_sge); + err = -EINVAL; + break; + } + + /* Set access_flags */ + wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_READ; + if (flags & IB_ACCESS_LOCAL_WRITE) + wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_LOCAL_WRITE; + + if (flags & IB_ACCESS_REMOTE_WRITE) + wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_WRITE; + + if (flags & IB_ACCESS_REMOTE_READ) + wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_REMOTE_READ; + + if (flags & IB_ACCESS_MW_BIND) + wqe_misc |= NES_IWARP_SQ_FMR_WQE_RIGHTS_ENABLE_WINDOW_BIND; + + /* Fill in PBL info: */ + set_wqe_64bit_value(wqe->wqe_words, + NES_IWARP_SQ_FMR_WQE_PBL_ADDR_LOW_IDX, + mr->mpl_addr); + + set_wqe_32bit_value(wqe->wqe_words, + NES_IWARP_SQ_FMR_WQE_PBL_LENGTH_IDX, + mr->npages * 8); + + for (i = 0; i < mr->npages; i++) + dst_page_list[i] = cpu_to_le64(src_page_list[i]); + + nes_debug(NES_DBG_IW_TX, "SQ_FMR: iova_start: %llx, " + "length: %d, rkey: %0x, pgl_paddr: %llx, " + "page_list_len: %u, wqe_misc: %x\n", + (unsigned long long) mr->ibmr.iova, + mr->ibmr.length, + ib_wr->wr.fastreg.key, + (unsigned long long) mr->mpl_addr, + mr->npages, + wqe_misc); + break; + } default: /* error */ err = -EINVAL; @@ -3964,6 +4048,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev) nesibdev->ibdev.bind_mw = nes_bind_mw; nesibdev->ibdev.alloc_mr = nes_alloc_mr; + nesibdev->ibdev.map_mr_sg = nes_map_mr_sg; nesibdev->ibdev.alloc_fast_reg_page_list = 
nes_alloc_fast_reg_page_list; nesibdev->ibdev.free_fast_reg_page_list = nes_free_fast_reg_page_list; -- 1.8.4.3
* [PATCH WIP 35/43] qib: Support the new memory registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (33 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 34/43] nes: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 36/43] iser: Port to new fast registration api Sagi Grimberg ` (8 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Just duplicated the functions to take the needed arguments from the private MR context. The old fast_reg routines will be dropped later. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/qib/qib_keys.c | 56 +++++++++++++++++++++++++++++ drivers/infiniband/hw/qib/qib_mr.c | 11 +++++++ drivers/infiniband/hw/qib/qib_verbs.c | 6 +++- drivers/infiniband/hw/qib/qib_verbs.h | 5 ++++ 4 files changed, 77 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/qib/qib_keys.c b/drivers/infiniband/hw/qib/qib_keys.c index ad843c7..557e6c2 100644 --- a/drivers/infiniband/hw/qib/qib_keys.c +++ b/drivers/infiniband/hw/qib/qib_keys.c @@ -385,3 +385,59 @@ bail: spin_unlock_irqrestore(&rkt->lock, flags); return ret; } + +/* + * Initialize the memory region specified by the work request.
+ */ +int qib_fastreg_mr(struct qib_qp *qp, struct ib_send_wr *wr) +{ + struct qib_lkey_table *rkt = &to_idev(qp->ibqp.device)->lk_table; + struct qib_pd *pd = to_ipd(qp->ibqp.pd); + struct qib_mr *mr = to_imr(wr->wr.fastreg.mr); + struct qib_mregion *mrg; + u32 key = wr->wr.fastreg.key; + unsigned i, n, m; + int ret = -EINVAL; + unsigned long flags; + u64 *page_list; + size_t ps; + + spin_lock_irqsave(&rkt->lock, flags); + if (pd->user || key == 0) + goto bail; + + mrg = rcu_dereference_protected( + rkt->table[(key >> (32 - ib_qib_lkey_table_size))], + lockdep_is_held(&rkt->lock)); + if (unlikely(mrg == NULL || qp->ibqp.pd != mrg->pd)) + goto bail; + + if (mr->npages > mrg->max_segs) + goto bail; + + ps = 1UL << PAGE_SHIFT; + if (mr->ibmr.length > ps * mr->npages) + goto bail; + + mrg->user_base = mr->ibmr.iova; + mrg->iova = mr->ibmr.iova; + mrg->lkey = key; + mrg->length = mr->ibmr.length; + mrg->access_flags = mr->ibmr.access; + page_list = mr->pl; + m = 0; + n = 0; + for (i = 0; i < mr->npages; i++) { + mrg->map[m]->segs[n].vaddr = (void *) page_list[i]; + mrg->map[m]->segs[n].length = ps; + if (++n == QIB_SEGSZ) { + m++; + n = 0; + } + } + + ret = 0; +bail: + spin_unlock_irqrestore(&rkt->lock, flags); + return ret; +} diff --git a/drivers/infiniband/hw/qib/qib_mr.c b/drivers/infiniband/hw/qib/qib_mr.c index a58a347..a4986f0 100644 --- a/drivers/infiniband/hw/qib/qib_mr.c +++ b/drivers/infiniband/hw/qib/qib_mr.c @@ -353,6 +353,17 @@ err: return ERR_PTR(-ENOMEM); } +int qib_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents) +{ + struct qib_mr *mr = to_imr(ibmr); + + return ib_sg_to_pages(sg, sg_nents, mr->mr.max_segs, + mr->pl, &mr->npages, + &ibmr->length, &ibmr->iova); +} + struct ib_fast_reg_page_list * qib_alloc_fast_reg_page_list(struct ib_device *ibdev, int page_list_len) { diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c index ef022a1..8561f90 100644 ---
a/drivers/infiniband/hw/qib/qib_verbs.c +++ b/drivers/infiniband/hw/qib/qib_verbs.c @@ -361,7 +361,10 @@ static int qib_post_one_send(struct qib_qp *qp, struct ib_send_wr *wr, * undefined operations. * Make sure buffer is large enough to hold the result for atomics. */ - if (wr->opcode == IB_WR_FAST_REG_MR) { + if (wr->opcode == IB_WR_FASTREG_MR) { + if (qib_fastreg_mr(qp, wr)) + goto bail_inval; + } else if (wr->opcode == IB_WR_FAST_REG_MR) { if (qib_fast_reg_mr(qp, wr)) goto bail_inval; } else if (qp->ibqp.qp_type == IB_QPT_UC) { @@ -2236,6 +2239,7 @@ int qib_register_ib_device(struct qib_devdata *dd) ibdev->reg_user_mr = qib_reg_user_mr; ibdev->dereg_mr = qib_dereg_mr; ibdev->alloc_mr = qib_alloc_mr; + ibdev->map_mr_sg = qib_map_mr_sg; ibdev->alloc_fast_reg_page_list = qib_alloc_fast_reg_page_list; ibdev->free_fast_reg_page_list = qib_free_fast_reg_page_list; ibdev->alloc_fmr = qib_alloc_fmr; diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h index c8062ae..c7a3af5 100644 --- a/drivers/infiniband/hw/qib/qib_verbs.h +++ b/drivers/infiniband/hw/qib/qib_verbs.h @@ -1039,12 +1039,17 @@ struct ib_mr *qib_alloc_mr(struct ib_pd *pd, u32 max_entries, u32 flags); +int qib_map_mr_sg(struct ib_mr *ibmr, + struct scatterlist *sg, + unsigned short sg_nents); + struct ib_fast_reg_page_list *qib_alloc_fast_reg_page_list( struct ib_device *ibdev, int page_list_len); void qib_free_fast_reg_page_list(struct ib_fast_reg_page_list *pl); int qib_fast_reg_mr(struct qib_qp *qp, struct ib_send_wr *wr); +int qib_fastreg_mr(struct qib_qp *qp, struct ib_send_wr *wr); struct ib_fmr *qib_alloc_fmr(struct ib_pd *pd, int mr_access_flags, struct ib_fmr_attr *fmr_attr); -- 1.8.4.3
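The qib_fastreg_mr() loop above distributes the page list into a two-level table, map[m]->segs[n], advancing n until it reaches QIB_SEGSZ and then moving to the next map block. That index arithmetic can be sketched in isolation (SIM_QIB_SEGSZ is an illustrative value, not qib's actual QIB_SEGSZ):

```c
#define SIM_QIB_SEGSZ 16

/* Map a flat page-list index i onto the (m, n) pair used to address
 * map[m]->segs[n] in the qib_fastreg_mr() fill loop. */
static void qib_seg_index(int i, int *m, int *n)
{
	*m = i / SIM_QIB_SEGSZ;
	*n = i % SIM_QIB_SEGSZ;
}
```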
* [PATCH WIP 36/43] iser: Port to new fast registration api [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (34 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 35/43] qib: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 37/43] xprtrdma: Port to new memory registration API Sagi Grimberg ` (7 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/ulp/iser/iscsi_iser.h | 6 +---- drivers/infiniband/ulp/iser/iser_memory.c | 40 ++++++++++++------------------- drivers/infiniband/ulp/iser/iser_verbs.c | 16 +------------ 3 files changed, 17 insertions(+), 45 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h index 6c7efe6..88d0ffc 100644 --- a/drivers/infiniband/ulp/iser/iscsi_iser.h +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h @@ -413,7 +413,6 @@ struct iser_device { * * @mr: memory region * @fmr_pool: pool of fmrs - * @frpl: fast reg page list used by frwrs * @page_vec: fast reg page list used by fmr pool * @mr_valid: is mr valid indicator */ @@ -422,10 +421,7 @@ struct iser_reg_resources { struct ib_mr *mr; struct ib_fmr_pool *fmr_pool; }; - union { - struct ib_fast_reg_page_list *frpl; - struct iser_page_vec *page_vec; - }; + struct iser_page_vec *page_vec; u8 mr_valid:1; }; diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index d6d980b..094cf8a 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -732,19 +732,19 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task, struct iser_reg_resources *rsc, struct iser_mem_reg *reg) { - struct ib_conn *ib_conn = 
&iser_task->iser_conn->ib_conn; - struct iser_device *device = ib_conn->device; - struct ib_mr *mr = rsc->mr; - struct ib_fast_reg_page_list *frpl = rsc->frpl; struct iser_tx_desc *tx_desc = &iser_task->desc; + struct ib_mr *mr = rsc->mr; struct ib_send_wr *wr; - int offset, size, plen; - - plen = iser_sg_to_page_vec(mem, device->ib_device, frpl->page_list, - &offset, &size); - if (plen * SIZE_4K < size) { - iser_err("fast reg page_list too short to hold this SG\n"); - return -EINVAL; + int err; + int access = IB_ACCESS_LOCAL_WRITE | + IB_ACCESS_REMOTE_WRITE | + IB_ACCESS_REMOTE_READ; + + err = ib_map_mr_sg(mr, mem->sg, mem->size, access); + if (err) { + iser_err("failed to map sg %p with %d entries\n", + mem->sg, mem->dma_nents); + return err; } if (!rsc->mr_valid) { @@ -753,24 +753,14 @@ static int iser_fast_reg_mr(struct iscsi_iser_task *iser_task, } wr = iser_tx_next_wr(tx_desc); - wr->opcode = IB_WR_FAST_REG_MR; - wr->wr_id = ISER_FASTREG_LI_WRID; - wr->send_flags = 0; - wr->wr.fast_reg.iova_start = frpl->page_list[0] + offset; - wr->wr.fast_reg.page_list = frpl; - wr->wr.fast_reg.page_list_len = plen; - wr->wr.fast_reg.page_shift = SHIFT_4K; - wr->wr.fast_reg.length = size; - wr->wr.fast_reg.rkey = mr->rkey; - wr->wr.fast_reg.access_flags = (IB_ACCESS_LOCAL_WRITE | - IB_ACCESS_REMOTE_WRITE | - IB_ACCESS_REMOTE_READ); + ib_set_fastreg_wr(mr, mr->rkey, ISER_FASTREG_LI_WRID, + false, wr); rsc->mr_valid = 0; reg->sge.lkey = mr->lkey; reg->rkey = mr->rkey; - reg->sge.addr = frpl->page_list[0] + offset; - reg->sge.length = size; + reg->sge.addr = mr->iova; + reg->sge.length = mr->length; iser_dbg("fast reg: lkey=0x%x, rkey=0x%x, addr=0x%llx," " length=0x%x\n", reg->sge.lkey, reg->rkey, diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index ecc3265..332f784 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -288,35 +288,21 @@ iser_alloc_reg_res(struct ib_device 
*ib_device, { int ret; - res->frpl = ib_alloc_fast_reg_page_list(ib_device, size); - if (IS_ERR(res->frpl)) { - ret = PTR_ERR(res->frpl); - iser_err("Failed to allocate ib_fast_reg_page_list err=%d\n", - ret); - return PTR_ERR(res->frpl); - } - res->mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, size, 0); if (IS_ERR(res->mr)) { ret = PTR_ERR(res->mr); iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret); - goto fast_reg_mr_failure; + return ret; } res->mr_valid = 1; return 0; - -fast_reg_mr_failure: - ib_free_fast_reg_page_list(res->frpl); - - return ret; } static void iser_free_reg_res(struct iser_reg_resources *rsc) { ib_dereg_mr(rsc->mr); - ib_free_fast_reg_page_list(rsc->frpl); } static int -- 1.8.4.3
* [PATCH WIP 37/43] xprtrdma: Port to new memory registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (35 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 36/43] iser: Port to new fast registration api Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg [not found] ` <1437548143-24893-38-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 6:55 ` [PATCH WIP 38/43] iser-target: " Sagi Grimberg ` (6 subsequent siblings) 43 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- net/sunrpc/xprtrdma/frwr_ops.c | 80 ++++++++++++++++++++++------------------- net/sunrpc/xprtrdma/xprt_rdma.h | 4 ++- 2 files changed, 47 insertions(+), 37 deletions(-) diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 517efed..e28246b 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device, f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0); if (IS_ERR(f->fr_mr)) goto out_mr_err; - f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth); - if (IS_ERR(f->fr_pgl)) + + f->sg = kcalloc(sizeof(*f->sg), depth, GFP_KERNEL); + if (IS_ERR(f->sg)) goto out_list_err; + + sg_init_table(f->sg, depth); + return 0; out_mr_err: @@ -163,7 +167,7 @@ out_mr_err: return rc; out_list_err: - rc = PTR_ERR(f->fr_pgl); + rc = -ENOMEM; dprintk("RPC: %s: ib_alloc_fast_reg_page_list status %i\n", __func__, rc); ib_dereg_mr(f->fr_mr); @@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r) if (rc) dprintk("RPC: %s: ib_dereg_mr status %i\n", __func__, rc); - ib_free_fast_reg_page_list(r->r.frmr.fr_pgl); + kfree(r->r.frmr.sg); } static int @@ -320,10 +324,7 @@ frwr_op_map(struct rpcrdma_xprt 
*r_xprt, struct rpcrdma_mr_seg *seg, struct ib_send_wr fastreg_wr, *bad_wr; u8 key; int len, pageoff; - int i, rc; - int seg_len; - u64 pa; - int page_no; + int i, rc, access; mw = seg1->rl_mw; seg1->rl_mw = NULL; @@ -344,39 +345,46 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, if (nsegs > ia->ri_max_frmr_depth) nsegs = ia->ri_max_frmr_depth; - for (page_no = i = 0; i < nsegs;) { - rpcrdma_map_one(device, seg, direction); - pa = seg->mr_dma; - for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) { - frmr->fr_pgl->page_list[page_no++] = pa; - pa += PAGE_SIZE; - } + for (i = 0; i < nsegs;) { + sg_set_page(&frmr->sg[i], seg->mr_page, + seg->mr_len, offset_in_page(seg->mr_offset)); len += seg->mr_len; - ++seg; ++i; - /* Check for holes */ + ++seg; + + /* Check for holes - needed?? */ if ((i < nsegs && offset_in_page(seg->mr_offset)) || offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len)) break; } + + frmr->sg_nents = i; + frmr->dma_nents = ib_dma_map_sg(device, frmr->sg, + frmr->sg_nents, direction); + if (!frmr->dma_nents) { + pr_err("RPC: %s: failed to dma map sg %p sg_nents %d\n", + __func__, frmr->sg, frmr->sg_nents); + return -ENOMEM; + } + dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n", __func__, mw, i, len); - memset(&fastreg_wr, 0, sizeof(fastreg_wr)); - fastreg_wr.wr_id = (unsigned long)(void *)mw; - fastreg_wr.opcode = IB_WR_FAST_REG_MR; - fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff; - fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl; - fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT; - fastreg_wr.wr.fast_reg.page_list_len = page_no; - fastreg_wr.wr.fast_reg.length = len; - fastreg_wr.wr.fast_reg.access_flags = writing ? - IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE : - IB_ACCESS_REMOTE_READ; mr = frmr->fr_mr; + access = writing ? 
IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE : + IB_ACCESS_REMOTE_READ; + rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access); + if (rc) { + pr_err("RPC: %s: failed to map mr %p rc %d\n", + __func__, frmr->fr_mr, rc); + return rc; + } + key = (u8)(mr->rkey & 0x000000FF); ib_update_fast_reg_key(mr, ++key); - fastreg_wr.wr.fast_reg.rkey = mr->rkey; + + memset(&fastreg_wr, 0, sizeof(fastreg_wr)); + ib_set_fastreg_wr(mr, mr->rkey, (uintptr_t)mw, false, &fastreg_wr); DECR_CQCOUNT(&r_xprt->rx_ep); rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr); @@ -385,15 +393,14 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, seg1->rl_mw = mw; seg1->mr_rkey = mr->rkey; - seg1->mr_base = seg1->mr_dma + pageoff; + seg1->mr_base = mr->iova; seg1->mr_nsegs = i; seg1->mr_len = len; return i; out_senderr: dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc); - while (i--) - rpcrdma_unmap_one(device, --seg); + ib_dma_unmap_sg(device, frmr->sg, frmr->sg_nents, direction); __frwr_queue_recovery(mw); return rc; } @@ -407,22 +414,23 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) struct rpcrdma_mr_seg *seg1 = seg; struct rpcrdma_ia *ia = &r_xprt->rx_ia; struct rpcrdma_mw *mw = seg1->rl_mw; + struct rpcrdma_frmr *frmr = &mw->r.frmr; struct ib_send_wr invalidate_wr, *bad_wr; int rc, nsegs = seg->mr_nsegs; dprintk("RPC: %s: FRMR %p\n", __func__, mw); seg1->rl_mw = NULL; - mw->r.frmr.fr_state = FRMR_IS_INVALID; + frmr->fr_state = FRMR_IS_INVALID; memset(&invalidate_wr, 0, sizeof(invalidate_wr)); invalidate_wr.wr_id = (unsigned long)(void *)mw; invalidate_wr.opcode = IB_WR_LOCAL_INV; - invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey; + invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey; DECR_CQCOUNT(&r_xprt->rx_ep); - while (seg1->mr_nsegs--) - rpcrdma_unmap_one(ia->ri_device, seg++); + ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir); + read_lock(&ia->ri_qplock); rc = ib_post_send(ia->ri_id->qp, 
&invalidate_wr, &bad_wr); read_unlock(&ia->ri_qplock); diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h index 886f8c8..a1c3ab2b 100644 --- a/net/sunrpc/xprtrdma/xprt_rdma.h +++ b/net/sunrpc/xprtrdma/xprt_rdma.h @@ -195,7 +195,9 @@ enum rpcrdma_frmr_state { }; struct rpcrdma_frmr { - struct ib_fast_reg_page_list *fr_pgl; + struct scatterlist *sg; + unsigned int sg_nents; + unsigned int dma_nents; struct ib_mr *fr_mr; enum rpcrdma_frmr_state fr_state; struct work_struct fr_work; -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API [not found] ` <1437548143-24893-38-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-07-22 15:03 ` Chuck Lever [not found] ` <795F4F28-D92F-46A1-8DA3-2B1B19A17AA3-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 2015-07-22 19:21 ` Steve Wise 1 sibling, 1 reply; 142+ messages in thread From: Chuck Lever @ 2015-07-22 15:03 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-rdma, Liran Liss, Oren Duer On Jul 22, 2015, at 2:55 AM, Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote: > Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > --- > net/sunrpc/xprtrdma/frwr_ops.c | 80 ++++++++++++++++++++++------------------- > net/sunrpc/xprtrdma/xprt_rdma.h | 4 ++- > 2 files changed, 47 insertions(+), 37 deletions(-) > > diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c > index 517efed..e28246b 100644 > --- a/net/sunrpc/xprtrdma/frwr_ops.c > +++ b/net/sunrpc/xprtrdma/frwr_ops.c > @@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device, > f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0); > if (IS_ERR(f->fr_mr)) > goto out_mr_err; > - f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth); > - if (IS_ERR(f->fr_pgl)) > + > + f->sg = kcalloc(sizeof(*f->sg), depth, GFP_KERNEL); > + if (IS_ERR(f->sg)) > goto out_list_err; > + > + sg_init_table(f->sg, depth); > + > return 0; > > out_mr_err: > @@ -163,7 +167,7 @@ out_mr_err: > return rc; > > out_list_err: > - rc = PTR_ERR(f->fr_pgl); > + rc = -ENOMEM; > dprintk("RPC: %s: ib_alloc_fast_reg_page_list status %i\n", > __func__, rc); > ib_dereg_mr(f->fr_mr); > @@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r) > if (rc) > dprintk("RPC: %s: ib_dereg_mr status %i\n", > __func__, rc); > - ib_free_fast_reg_page_list(r->r.frmr.fr_pgl); > + kfree(r->r.frmr.sg); > } > > static int > @@ -320,10 +324,7 @@ frwr_op_map(struct rpcrdma_xprt 
*r_xprt, struct rpcrdma_mr_seg *seg, > struct ib_send_wr fastreg_wr, *bad_wr; > u8 key; > int len, pageoff; > - int i, rc; > - int seg_len; > - u64 pa; > - int page_no; > + int i, rc, access; > > mw = seg1->rl_mw; > seg1->rl_mw = NULL; > @@ -344,39 +345,46 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, > if (nsegs > ia->ri_max_frmr_depth) > nsegs = ia->ri_max_frmr_depth; > > - for (page_no = i = 0; i < nsegs;) { > - rpcrdma_map_one(device, seg, direction); > - pa = seg->mr_dma; > - for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) { > - frmr->fr_pgl->page_list[page_no++] = pa; > - pa += PAGE_SIZE; > - } > + for (i = 0; i < nsegs;) { > + sg_set_page(&frmr->sg[i], seg->mr_page, > + seg->mr_len, offset_in_page(seg->mr_offset)); Cautionary note: here we’re dealing with both the “contiguous set of pages” case and the “small region of bytes in a single page” case. See rpcrdma_convert_iovs(): sometimes RPC send or receive buffers can be registered (RDMA_NOMSG). > len += seg->mr_len; > - ++seg; > ++i; > - /* Check for holes */ > + ++seg; > + > + /* Check for holes - needed?? 
*/ > if ((i < nsegs && offset_in_page(seg->mr_offset)) || > offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len)) > break; > } > + > + frmr->sg_nents = i; > + frmr->dma_nents = ib_dma_map_sg(device, frmr->sg, > + frmr->sg_nents, direction); > + if (!frmr->dma_nents) { > + pr_err("RPC: %s: failed to dma map sg %p sg_nents %d\n", > + __func__, frmr->sg, frmr->sg_nents); > + return -ENOMEM; > + } > + > dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n", > __func__, mw, i, len); > > - memset(&fastreg_wr, 0, sizeof(fastreg_wr)); > - fastreg_wr.wr_id = (unsigned long)(void *)mw; > - fastreg_wr.opcode = IB_WR_FAST_REG_MR; > - fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff; > - fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl; > - fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT; > - fastreg_wr.wr.fast_reg.page_list_len = page_no; > - fastreg_wr.wr.fast_reg.length = len; > - fastreg_wr.wr.fast_reg.access_flags = writing ? > - IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE : > - IB_ACCESS_REMOTE_READ; > mr = frmr->fr_mr; > + access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE : > + IB_ACCESS_REMOTE_READ; > + rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access); I like this (and the matching ib_dma_unmap_sg). But why wouldn’t this function be called ib_dma_map_sg() ? The name ib_map_mr_sg() had me thinking for a moment that this API actually posted the FASTREG WR, but I see that it doesn’t. 
> + if (rc) { > + pr_err("RPC: %s: failed to map mr %p rc %d\n", > + __func__, frmr->fr_mr, rc); > + return rc; > + } > + > key = (u8)(mr->rkey & 0x000000FF); > ib_update_fast_reg_key(mr, ++key); > - fastreg_wr.wr.fast_reg.rkey = mr->rkey; > + > + memset(&fastreg_wr, 0, sizeof(fastreg_wr)); > + ib_set_fastreg_wr(mr, mr->rkey, (uintptr_t)mw, false, &fastreg_wr); > > DECR_CQCOUNT(&r_xprt->rx_ep); > rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr); > @@ -385,15 +393,14 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, > > seg1->rl_mw = mw; > seg1->mr_rkey = mr->rkey; > - seg1->mr_base = seg1->mr_dma + pageoff; > + seg1->mr_base = mr->iova; > seg1->mr_nsegs = i; > seg1->mr_len = len; > return i; > > out_senderr: > dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc); > - while (i--) > - rpcrdma_unmap_one(device, --seg); > + ib_dma_unmap_sg(device, frmr->sg, frmr->sg_nents, direction); > __frwr_queue_recovery(mw); > return rc; > } > @@ -407,22 +414,23 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) > struct rpcrdma_mr_seg *seg1 = seg; > struct rpcrdma_ia *ia = &r_xprt->rx_ia; > struct rpcrdma_mw *mw = seg1->rl_mw; > + struct rpcrdma_frmr *frmr = &mw->r.frmr; > struct ib_send_wr invalidate_wr, *bad_wr; > int rc, nsegs = seg->mr_nsegs; > > dprintk("RPC: %s: FRMR %p\n", __func__, mw); > > seg1->rl_mw = NULL; > - mw->r.frmr.fr_state = FRMR_IS_INVALID; > + frmr->fr_state = FRMR_IS_INVALID; > > memset(&invalidate_wr, 0, sizeof(invalidate_wr)); > invalidate_wr.wr_id = (unsigned long)(void *)mw; > invalidate_wr.opcode = IB_WR_LOCAL_INV; > - invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey; > + invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey; > DECR_CQCOUNT(&r_xprt->rx_ep); > > - while (seg1->mr_nsegs--) > - rpcrdma_unmap_one(ia->ri_device, seg++); > + ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir); ->mr_dir was previously set by rpcrdma_map_one(), which you’ve replaced with 
ib_map_mr_sg(). So maybe frwr_op_map() needs to save “direction” in the rpcrdma_frmr. > + > read_lock(&ia->ri_qplock); > rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr); > read_unlock(&ia->ri_qplock); > diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h > index 886f8c8..a1c3ab2b 100644 > --- a/net/sunrpc/xprtrdma/xprt_rdma.h > +++ b/net/sunrpc/xprtrdma/xprt_rdma.h > @@ -195,7 +195,9 @@ enum rpcrdma_frmr_state { > }; > > struct rpcrdma_frmr { > - struct ib_fast_reg_page_list *fr_pgl; > + struct scatterlist *sg; > + unsigned int sg_nents; > + unsigned int dma_nents; > struct ib_mr *fr_mr; > enum rpcrdma_frmr_state fr_state; > struct work_struct fr_work; -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread

* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API [not found] ` <795F4F28-D92F-46A1-8DA3-2B1B19A17AA3-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2015-07-22 15:41 ` Sagi Grimberg [not found] ` <55AFB9A7.4030103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 2015-07-22 16:59 ` Christoph Hellwig 1 sibling, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 15:41 UTC (permalink / raw) To: Chuck Lever, Sagi Grimberg; +Cc: linux-rdma, Liran Liss, Oren Duer >> + for (i = 0; i < nsegs;) { >> + sg_set_page(&frmr->sg[i], seg->mr_page, >> + seg->mr_len, offset_in_page(seg->mr_offset)); > > Cautionary note: here we’re dealing with both the “contiguous > set of pages” case and the “small region of bytes in a single page” > case. See rpcrdma_convert_iovs(): sometimes RPC send or receive > buffers can be registered (RDMA_NOMSG). I noticed that (I think). I think this is handled correctly. What exactly is the caution note here? >> mr = frmr->fr_mr; >> + access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE : >> + IB_ACCESS_REMOTE_READ; >> + rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access); > > I like this (and the matching ib_dma_unmap_sg). But why wouldn’t > this function be called ib_dma_map_sg() ? The name ib_map_mr_sg() > had me thinking for a moment that this API actually posted the > FASTREG WR, but I see that it doesn’t. Umm, ib_dma_map_sg is already taken :) This is what I came up with, it maps the SG elements to the MR private context. I'd like to keep the post API for now. It will be possible to to add a wrapper function that would do: - dma_map_sg - ib_map_mr_sg - init fastreg send_wr - post_send (maybe) >> - while (seg1->mr_nsegs--) >> - rpcrdma_unmap_one(ia->ri_device, seg++); >> + ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir); > > ->mr_dir was previously set by rpcrdma_map_one(), which you’ve replaced > with ib_map_mr_sg(). 
So maybe frwr_op_map() needs to save “direction” > in the rpcrdma_frmr. Yep, that's correct; if I had turned on DMA mapping debug it would have shouted at me here... Note, I added a patch in the git repo to allow arbitrary sg lists in frwr_op_map(), which would let you skip the holes check... seems to work with mlx5... I did notice that mlx4 gives a protection error after the conversion... I'll look into that... ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API [not found] ` <55AFB9A7.4030103-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-22 16:04 ` Chuck Lever [not found] ` <5114D0F0-7C66-4889-85D8-E7297009AF23-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Chuck Lever @ 2015-07-22 16:04 UTC (permalink / raw) To: Sagi Grimberg; +Cc: Sagi Grimberg, linux-rdma, Liran Liss, Oren Duer On Jul 22, 2015, at 11:41 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote: > >>> + for (i = 0; i < nsegs;) { >>> + sg_set_page(&frmr->sg[i], seg->mr_page, >>> + seg->mr_len, offset_in_page(seg->mr_offset)); >> >> Cautionary note: here we’re dealing with both the “contiguous >> set of pages” case and the “small region of bytes in a single page” >> case. See rpcrdma_convert_iovs(): sometimes RPC send or receive >> buffers can be registered (RDMA_NOMSG). > > I noticed that (I think). I think this is handled correctly. > What exactly is the caution note here? Well the sg is turned into a page list below your API. Just want to make sure that we have tested your xprtrdma alterations with all the ULP possibilities. When you are further along I can pull this and run my functional tests. >>> mr = frmr->fr_mr; >>> + access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE : >>> + IB_ACCESS_REMOTE_READ; >>> + rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access); >> >> I like this (and the matching ib_dma_unmap_sg). But why wouldn’t >> this function be called ib_dma_map_sg() ? The name ib_map_mr_sg() >> had me thinking for a moment that this API actually posted the >> FASTREG WR, but I see that it doesn’t. > > Umm, ib_dma_map_sg is already taken :) > > This is what I came up with, it maps the SG elements to the MR > private context. > > I'd like to keep the post API for now. 
It will be possible to > to add a wrapper function that would do: > - dma_map_sg > - ib_map_mr_sg > - init fastreg send_wr > - post_send (maybe) Where xprtrdma might improve is by setting up all the FASTREG WRs for one RPC with a single chain and post_send. We could do that with your INDIR_MR concept, for example. >>> - while (seg1->mr_nsegs--) >>> - rpcrdma_unmap_one(ia->ri_device, seg++); >>> + ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir); >> >> ->mr_dir was previously set by rpcrdma_map_one(), which you’ve replaced >> with ib_map_mr_sg(). So maybe frwr_op_map() needs to save “direction” >> in the rpcrdma_frmr. > > Yep, that's correct, if I had turned on dma mapping debug it would shout > at me here... > > Note, I added in the git repo a patch to allow arbitrary sg lists in > frwr_op_map() which would allow you to skip the holes check... seems to > work with mlx5... > > I did noticed the mlx4 gives a protection error with after the conversion... I'll look into that... Should also get Steve and Devesh to try this with their adapters. -- Chuck Lever -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API [not found] ` <5114D0F0-7C66-4889-85D8-E7297009AF23-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2015-07-23 10:42 ` Sagi Grimberg 0 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-23 10:42 UTC (permalink / raw) To: Chuck Lever; +Cc: Sagi Grimberg, linux-rdma, Liran Liss, Oren Duer On 7/22/2015 7:04 PM, Chuck Lever wrote: > > On Jul 22, 2015, at 11:41 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote: > >> >>>> + for (i = 0; i < nsegs;) { >>>> + sg_set_page(&frmr->sg[i], seg->mr_page, >>>> + seg->mr_len, offset_in_page(seg->mr_offset)); >>> >>> Cautionary note: here we’re dealing with both the “contiguous >>> set of pages” case and the “small region of bytes in a single page” >>> case. See rpcrdma_convert_iovs(): sometimes RPC send or receive >>> buffers can be registered (RDMA_NOMSG). >> >> I noticed that (I think). I think this is handled correctly. >> What exactly is the caution note here? > > Well the sg is turned into a page list below your API. Just > want to make sure that we have tested your xprtrdma alterations > with all the ULP possibilities. When you are further along I > can pull this and run my functional tests. > > >>>> mr = frmr->fr_mr; >>>> + access = writing ? IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE : >>>> + IB_ACCESS_REMOTE_READ; >>>> + rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access); >>> >>> I like this (and the matching ib_dma_unmap_sg). But why wouldn’t >>> this function be called ib_dma_map_sg() ? The name ib_map_mr_sg() >>> had me thinking for a moment that this API actually posted the >>> FASTREG WR, but I see that it doesn’t. >> >> Umm, ib_dma_map_sg is already taken :) >> >> This is what I came up with, it maps the SG elements to the MR >> private context. >> >> I'd like to keep the post API for now. 
It will be possible to >> to add a wrapper function that would do: >> - dma_map_sg >> - ib_map_mr_sg >> - init fastreg send_wr >> - post_send (maybe) > > Where xprtrdma might improve is by setting up all the FASTREG > WRs for one RPC with a single chain and post_send. We could do > that with your INDIR_MR concept, for example. BTW, it would be great if you can play with it a little bit. I'm more confident with the iSER part... I added two small fixes when I tested with mlx4. It seems to work... > > >>>> - while (seg1->mr_nsegs--) >>>> - rpcrdma_unmap_one(ia->ri_device, seg++); >>>> + ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir); >>> >>> ->mr_dir was previously set by rpcrdma_map_one(), which you’ve replaced >>> with ib_map_mr_sg(). So maybe frwr_op_map() needs to save “direction” >>> in the rpcrdma_frmr. >> >> Yep, that's correct, if I had turned on dma mapping debug it would shout >> at me here... >> >> Note, I added in the git repo a patch to allow arbitrary sg lists in >> frwr_op_map() which would allow you to skip the holes check... seems to >> work with mlx5... >> >> I did noticed the mlx4 gives a protection error with after the conversion... I'll look into that... > > Should also get Steve and Devesh to try this with their adapters. Ah, yes please. I've only compiled tested drivers other than mlx4, mlx5 which means there is a 99.9% (probably 100%) that it doesn't work. It would be great to get help on porting the rest of the ULPs as well, but that can wait until we converge on the API... -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API [not found] ` <795F4F28-D92F-46A1-8DA3-2B1B19A17AA3-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 2015-07-22 15:41 ` Sagi Grimberg @ 2015-07-22 16:59 ` Christoph Hellwig 1 sibling, 0 replies; 142+ messages in thread From: Christoph Hellwig @ 2015-07-22 16:59 UTC (permalink / raw) To: Chuck Lever; +Cc: Sagi Grimberg, linux-rdma, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 11:03:49AM -0400, Chuck Lever wrote: > I like this (and the matching ib_dma_unmap_sg). But why wouldn?t > this function be called ib_dma_map_sg() ? The name ib_map_mr_sg() > had me thinking for a moment that this API actually posted the > FASTREG WR, but I see that it doesn?t. We already have a ib_dma_map_sg, which is a wrapper around dma_map_sg that allows ehc ipath amd qib to do naughty things instead of the regular dma mapping. But it seems maybe the dma_map_sg calls or the magic for those other drivers should be folded into Sagi's new API as those HCA apparently don't need physical addresses and thus the S/G list. God knows what's they're doing with a list of virtual addresses, but removing the struct scatterlist abuse there would be highly welcome. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API [not found] ` <1437548143-24893-38-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 15:03 ` Chuck Lever @ 2015-07-22 19:21 ` Steve Wise [not found] ` <55AFED4C.9040409-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Steve Wise @ 2015-07-22 19:21 UTC (permalink / raw) To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer On 7/22/2015 1:55 AM, Sagi Grimberg wrote: > Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > --- > net/sunrpc/xprtrdma/frwr_ops.c | 80 ++++++++++++++++++++++------------------- > net/sunrpc/xprtrdma/xprt_rdma.h | 4 ++- > 2 files changed, 47 insertions(+), 37 deletions(-) Did you intend to change svcrdma as well? > diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c > index 517efed..e28246b 100644 > --- a/net/sunrpc/xprtrdma/frwr_ops.c > +++ b/net/sunrpc/xprtrdma/frwr_ops.c > @@ -151,9 +151,13 @@ __frwr_init(struct rpcrdma_mw *r, struct ib_pd *pd, struct ib_device *device, > f->fr_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, depth, 0); > if (IS_ERR(f->fr_mr)) > goto out_mr_err; > - f->fr_pgl = ib_alloc_fast_reg_page_list(device, depth); > - if (IS_ERR(f->fr_pgl)) > + > + f->sg = kcalloc(sizeof(*f->sg), depth, GFP_KERNEL); > + if (IS_ERR(f->sg)) > goto out_list_err; > + > + sg_init_table(f->sg, depth); > + > return 0; > > out_mr_err: > @@ -163,7 +167,7 @@ out_mr_err: > return rc; > > out_list_err: > - rc = PTR_ERR(f->fr_pgl); > + rc = -ENOMEM; > dprintk("RPC: %s: ib_alloc_fast_reg_page_list status %i\n", > __func__, rc); > ib_dereg_mr(f->fr_mr); > @@ -179,7 +183,7 @@ __frwr_release(struct rpcrdma_mw *r) > if (rc) > dprintk("RPC: %s: ib_dereg_mr status %i\n", > __func__, rc); > - ib_free_fast_reg_page_list(r->r.frmr.fr_pgl); > + kfree(r->r.frmr.sg); > } > > static int > @@ -320,10 +324,7 @@ frwr_op_map(struct rpcrdma_xprt 
*r_xprt, struct rpcrdma_mr_seg *seg, > struct ib_send_wr fastreg_wr, *bad_wr; > u8 key; > int len, pageoff; > - int i, rc; > - int seg_len; > - u64 pa; > - int page_no; > + int i, rc, access; > > mw = seg1->rl_mw; > seg1->rl_mw = NULL; > @@ -344,39 +345,46 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, > if (nsegs > ia->ri_max_frmr_depth) > nsegs = ia->ri_max_frmr_depth; > > - for (page_no = i = 0; i < nsegs;) { > - rpcrdma_map_one(device, seg, direction); > - pa = seg->mr_dma; > - for (seg_len = seg->mr_len; seg_len > 0; seg_len -= PAGE_SIZE) { > - frmr->fr_pgl->page_list[page_no++] = pa; > - pa += PAGE_SIZE; > - } > + for (i = 0; i < nsegs;) { > + sg_set_page(&frmr->sg[i], seg->mr_page, > + seg->mr_len, offset_in_page(seg->mr_offset)); > len += seg->mr_len; > - ++seg; > ++i; > - /* Check for holes */ > + ++seg; > + > + /* Check for holes - needed?? */ > if ((i < nsegs && offset_in_page(seg->mr_offset)) || > offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len)) > break; > } > + > + frmr->sg_nents = i; > + frmr->dma_nents = ib_dma_map_sg(device, frmr->sg, > + frmr->sg_nents, direction); > + if (!frmr->dma_nents) { > + pr_err("RPC: %s: failed to dma map sg %p sg_nents %d\n", > + __func__, frmr->sg, frmr->sg_nents); > + return -ENOMEM; > + } > + > dprintk("RPC: %s: Using frmr %p to map %d segments (%d bytes)\n", > __func__, mw, i, len); > > - memset(&fastreg_wr, 0, sizeof(fastreg_wr)); > - fastreg_wr.wr_id = (unsigned long)(void *)mw; > - fastreg_wr.opcode = IB_WR_FAST_REG_MR; > - fastreg_wr.wr.fast_reg.iova_start = seg1->mr_dma + pageoff; > - fastreg_wr.wr.fast_reg.page_list = frmr->fr_pgl; > - fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT; > - fastreg_wr.wr.fast_reg.page_list_len = page_no; > - fastreg_wr.wr.fast_reg.length = len; > - fastreg_wr.wr.fast_reg.access_flags = writing ? > - IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE : > - IB_ACCESS_REMOTE_READ; > mr = frmr->fr_mr; > + access = writing ? 
IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE : > + IB_ACCESS_REMOTE_READ; > + rc = ib_map_mr_sg(mr, frmr->sg, frmr->sg_nents, access); > + if (rc) { > + pr_err("RPC: %s: failed to map mr %p rc %d\n", > + __func__, frmr->fr_mr, rc); > + return rc; > + } > + > key = (u8)(mr->rkey & 0x000000FF); > ib_update_fast_reg_key(mr, ++key); > - fastreg_wr.wr.fast_reg.rkey = mr->rkey; > + > + memset(&fastreg_wr, 0, sizeof(fastreg_wr)); > + ib_set_fastreg_wr(mr, mr->rkey, (uintptr_t)mw, false, &fastreg_wr); > > DECR_CQCOUNT(&r_xprt->rx_ep); > rc = ib_post_send(ia->ri_id->qp, &fastreg_wr, &bad_wr); > @@ -385,15 +393,14 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, > > seg1->rl_mw = mw; > seg1->mr_rkey = mr->rkey; > - seg1->mr_base = seg1->mr_dma + pageoff; > + seg1->mr_base = mr->iova; > seg1->mr_nsegs = i; > seg1->mr_len = len; > return i; > > out_senderr: > dprintk("RPC: %s: ib_post_send status %i\n", __func__, rc); > - while (i--) > - rpcrdma_unmap_one(device, --seg); > + ib_dma_unmap_sg(device, frmr->sg, frmr->sg_nents, direction); > __frwr_queue_recovery(mw); > return rc; > } > @@ -407,22 +414,23 @@ frwr_op_unmap(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg) > struct rpcrdma_mr_seg *seg1 = seg; > struct rpcrdma_ia *ia = &r_xprt->rx_ia; > struct rpcrdma_mw *mw = seg1->rl_mw; > + struct rpcrdma_frmr *frmr = &mw->r.frmr; > struct ib_send_wr invalidate_wr, *bad_wr; > int rc, nsegs = seg->mr_nsegs; > > dprintk("RPC: %s: FRMR %p\n", __func__, mw); > > seg1->rl_mw = NULL; > - mw->r.frmr.fr_state = FRMR_IS_INVALID; > + frmr->fr_state = FRMR_IS_INVALID; > > memset(&invalidate_wr, 0, sizeof(invalidate_wr)); > invalidate_wr.wr_id = (unsigned long)(void *)mw; > invalidate_wr.opcode = IB_WR_LOCAL_INV; > - invalidate_wr.ex.invalidate_rkey = mw->r.frmr.fr_mr->rkey; > + invalidate_wr.ex.invalidate_rkey = frmr->fr_mr->rkey; > DECR_CQCOUNT(&r_xprt->rx_ep); > > - while (seg1->mr_nsegs--) > - rpcrdma_unmap_one(ia->ri_device, seg++); > + 
ib_dma_unmap_sg(ia->ri_device, frmr->sg, frmr->sg_nents, seg1->mr_dir); > + > read_lock(&ia->ri_qplock); > rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr); > read_unlock(&ia->ri_qplock); > diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h > index 886f8c8..a1c3ab2b 100644 > --- a/net/sunrpc/xprtrdma/xprt_rdma.h > +++ b/net/sunrpc/xprtrdma/xprt_rdma.h > @@ -195,7 +195,9 @@ enum rpcrdma_frmr_state { > }; > > struct rpcrdma_frmr { > - struct ib_fast_reg_page_list *fr_pgl; > + struct scatterlist *sg; > + unsigned int sg_nents; > + unsigned int dma_nents; > struct ib_mr *fr_mr; > enum rpcrdma_frmr_state fr_state; > struct work_struct fr_work; -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API [not found] ` <55AFED4C.9040409-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> @ 2015-07-23 10:20 ` Sagi Grimberg [not found] ` <55B0C002.60307-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-23 10:20 UTC (permalink / raw) To: Steve Wise, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: Liran Liss, Oren Duer On 7/22/2015 10:21 PM, Steve Wise wrote: > > On 7/22/2015 1:55 AM, Sagi Grimberg wrote: >> Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> >> --- >> net/sunrpc/xprtrdma/frwr_ops.c | 80 >> ++++++++++++++++++++++------------------- >> net/sunrpc/xprtrdma/xprt_rdma.h | 4 ++- >> 2 files changed, 47 insertions(+), 37 deletions(-) > > Did you intend to change svcrdma as well? All the ULPs need to convert. I didn't have a chance to convert svcrdma yet. Want to take it? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* RE: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API [not found] ` <55B0C002.60307-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-23 13:46 ` Steve Wise 0 siblings, 0 replies; 142+ messages in thread From: Steve Wise @ 2015-07-23 13:46 UTC (permalink / raw) To: 'Sagi Grimberg', 'Sagi Grimberg', linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: 'Liran Liss', 'Oren Duer' > -----Original Message----- > From: Sagi Grimberg [mailto:sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org] > Sent: Thursday, July 23, 2015 5:21 AM > To: Steve Wise; Sagi Grimberg; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Cc: Liran Liss; Oren Duer > Subject: Re: [PATCH WIP 37/43] xprtrdma: Port to new memory registration API > > On 7/22/2015 10:21 PM, Steve Wise wrote: > > > > On 7/22/2015 1:55 AM, Sagi Grimberg wrote: > >> Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > >> --- > >> net/sunrpc/xprtrdma/frwr_ops.c | 80 > >> ++++++++++++++++++++++------------------- > >> net/sunrpc/xprtrdma/xprt_rdma.h | 4 ++- > >> 2 files changed, 47 insertions(+), 37 deletions(-) > > > > Did you intend to change svcrdma as well? > > All the ULPs need to convert. I didn't have a chance to convert > svcrdma yet. Want to take it? Not right now. My focus is still on enabling iSER. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (36 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 37/43] xprtrdma: Port to new memory registration API Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg [not found] ` <1437548143-24893-39-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 6:55 ` [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support Sagi Grimberg ` (5 subsequent siblings) 43 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/ulp/isert/ib_isert.c | 116 ++++++-------------------------- drivers/infiniband/ulp/isert/ib_isert.h | 2 - 2 files changed, 19 insertions(+), 99 deletions(-) diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 94395ce..af1c01d 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -486,10 +486,8 @@ isert_conn_free_fastreg_pool(struct isert_conn *isert_conn) list_for_each_entry_safe(fr_desc, tmp, &isert_conn->fr_pool, list) { list_del(&fr_desc->list); - ib_free_fast_reg_page_list(fr_desc->data_frpl); ib_dereg_mr(fr_desc->data_mr); if (fr_desc->pi_ctx) { - ib_free_fast_reg_page_list(fr_desc->pi_ctx->prot_frpl); ib_dereg_mr(fr_desc->pi_ctx->prot_mr); ib_dereg_mr(fr_desc->pi_ctx->sig_mr); kfree(fr_desc->pi_ctx); @@ -517,22 +515,13 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc, return -ENOMEM; } - pi_ctx->prot_frpl = ib_alloc_fast_reg_page_list(device, - ISCSI_ISER_SG_TABLESIZE); - if (IS_ERR(pi_ctx->prot_frpl)) { - isert_err("Failed to allocate prot frpl err=%ld\n", - PTR_ERR(pi_ctx->prot_frpl)); - ret = PTR_ERR(pi_ctx->prot_frpl); - goto err_pi_ctx; - } - pi_ctx->prot_mr 
= ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, ISCSI_ISER_SG_TABLESIZE, 0); if (IS_ERR(pi_ctx->prot_mr)) { isert_err("Failed to allocate prot frmr err=%ld\n", PTR_ERR(pi_ctx->prot_mr)); ret = PTR_ERR(pi_ctx->prot_mr); - goto err_prot_frpl; + goto err_pi_ctx; } desc->ind |= ISERT_PROT_KEY_VALID; @@ -552,8 +541,6 @@ isert_create_pi_ctx(struct fast_reg_descriptor *desc, err_prot_mr: ib_dereg_mr(pi_ctx->prot_mr); -err_prot_frpl: - ib_free_fast_reg_page_list(pi_ctx->prot_frpl); err_pi_ctx: kfree(pi_ctx); @@ -564,34 +551,18 @@ static int isert_create_fr_desc(struct ib_device *ib_device, struct ib_pd *pd, struct fast_reg_descriptor *fr_desc) { - int ret; - - fr_desc->data_frpl = ib_alloc_fast_reg_page_list(ib_device, - ISCSI_ISER_SG_TABLESIZE); - if (IS_ERR(fr_desc->data_frpl)) { - isert_err("Failed to allocate data frpl err=%ld\n", - PTR_ERR(fr_desc->data_frpl)); - return PTR_ERR(fr_desc->data_frpl); - } - fr_desc->data_mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, ISCSI_ISER_SG_TABLESIZE, 0); if (IS_ERR(fr_desc->data_mr)) { isert_err("Failed to allocate data frmr err=%ld\n", PTR_ERR(fr_desc->data_mr)); - ret = PTR_ERR(fr_desc->data_mr); - goto err_data_frpl; + return PTR_ERR(fr_desc->data_mr); } fr_desc->ind |= ISERT_DATA_KEY_VALID; isert_dbg("Created fr_desc %p\n", fr_desc); return 0; - -err_data_frpl: - ib_free_fast_reg_page_list(fr_desc->data_frpl); - - return ret; } static int @@ -2521,45 +2492,6 @@ unmap_cmd: return ret; } -static int -isert_map_fr_pagelist(struct ib_device *ib_dev, - struct scatterlist *sg_start, int sg_nents, u64 *fr_pl) -{ - u64 start_addr, end_addr, page, chunk_start = 0; - struct scatterlist *tmp_sg; - int i = 0, new_chunk, last_ent, n_pages; - - n_pages = 0; - new_chunk = 1; - last_ent = sg_nents - 1; - for_each_sg(sg_start, tmp_sg, sg_nents, i) { - start_addr = ib_sg_dma_address(ib_dev, tmp_sg); - if (new_chunk) - chunk_start = start_addr; - end_addr = start_addr + ib_sg_dma_len(ib_dev, tmp_sg); - - isert_dbg("SGL[%d] dma_addr: 0x%llx len: %u\n", - i, 
(unsigned long long)tmp_sg->dma_address, - tmp_sg->length); - - if ((end_addr & ~PAGE_MASK) && i < last_ent) { - new_chunk = 0; - continue; - } - new_chunk = 1; - - page = chunk_start & PAGE_MASK; - do { - fr_pl[n_pages++] = page; - isert_dbg("Mapped page_list[%d] page_addr: 0x%llx\n", - n_pages - 1, page); - page += PAGE_SIZE; - } while (page < end_addr); - } - - return n_pages; -} - static inline void isert_inv_rkey(struct ib_send_wr *inv_wr, struct ib_mr *mr) { @@ -2585,11 +2517,9 @@ isert_fast_reg_mr(struct isert_conn *isert_conn, struct isert_device *device = isert_conn->device; struct ib_device *ib_dev = device->ib_device; struct ib_mr *mr; - struct ib_fast_reg_page_list *frpl; struct ib_send_wr fr_wr, inv_wr; struct ib_send_wr *bad_wr, *wr = NULL; - int ret, pagelist_len; - u32 page_off; + int ret; if (mem->dma_nents == 1) { sge->lkey = device->mr->lkey; @@ -2600,40 +2530,32 @@ isert_fast_reg_mr(struct isert_conn *isert_conn, return 0; } - if (ind == ISERT_DATA_KEY_VALID) { + if (ind == ISERT_DATA_KEY_VALID) /* Registering data buffer */ mr = fr_desc->data_mr; - frpl = fr_desc->data_frpl; - } else { + else /* Registering protection buffer */ mr = fr_desc->pi_ctx->prot_mr; - frpl = fr_desc->pi_ctx->prot_frpl; - } - - page_off = mem->offset % PAGE_SIZE; - - isert_dbg("Use fr_desc %p sg_nents %d offset %u\n", - fr_desc, mem->nents, mem->offset); - - pagelist_len = isert_map_fr_pagelist(ib_dev, mem->sg, mem->nents, - &frpl->page_list[0]); if (!(fr_desc->ind & ind)) { isert_inv_rkey(&inv_wr, mr); wr = &inv_wr; } + ret = ib_map_mr_sg(mr, mem->sg, mem->nents, IB_ACCESS_LOCAL_WRITE); + if (ret) { + isert_err("failed to map sg %p with %d entries\n", + mem->sg, mem->dma_nents); + return ret; + } + + isert_dbg("Use fr_desc %p sg_nents %d offset %u\n", + fr_desc, mem->nents, mem->offset); + /* Prepare FASTREG WR */ memset(&fr_wr, 0, sizeof(fr_wr)); - fr_wr.wr_id = ISER_FASTREG_LI_WRID; - fr_wr.opcode = IB_WR_FAST_REG_MR; - fr_wr.wr.fast_reg.iova_start = 
frpl->page_list[0] + page_off; - fr_wr.wr.fast_reg.page_list = frpl; - fr_wr.wr.fast_reg.page_list_len = pagelist_len; - fr_wr.wr.fast_reg.page_shift = PAGE_SHIFT; - fr_wr.wr.fast_reg.length = mem->len; - fr_wr.wr.fast_reg.rkey = mr->rkey; - fr_wr.wr.fast_reg.access_flags = IB_ACCESS_LOCAL_WRITE; + ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID, + false, &fr_wr); if (!wr) wr = &fr_wr; @@ -2648,8 +2570,8 @@ isert_fast_reg_mr(struct isert_conn *isert_conn, fr_desc->ind &= ~ind; sge->lkey = mr->lkey; - sge->addr = frpl->page_list[0] + page_off; - sge->length = mem->len; + sge->addr = mr->iova; + sge->length = mr->length; isert_dbg("sge: addr: 0x%llx length: %u lkey: %x\n", sge->addr, sge->length, sge->lkey); diff --git a/drivers/infiniband/ulp/isert/ib_isert.h b/drivers/infiniband/ulp/isert/ib_isert.h index 9ec23a78..a63fc6a 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.h +++ b/drivers/infiniband/ulp/isert/ib_isert.h @@ -84,14 +84,12 @@ enum isert_indicator { struct pi_context { struct ib_mr *prot_mr; - struct ib_fast_reg_page_list *prot_frpl; struct ib_mr *sig_mr; }; struct fast_reg_descriptor { struct list_head list; struct ib_mr *data_mr; - struct ib_fast_reg_page_list *data_frpl; u8 ind; struct pi_context *pi_ctx; }; -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <1437548143-24893-39-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-07-22 17:04 ` Christoph Hellwig [not found] ` <20150722170413.GE6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Christoph Hellwig @ 2015-07-22 17:04 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer > @@ -2585,11 +2517,9 @@ isert_fast_reg_mr(struct isert_conn *isert_conn, > struct isert_device *device = isert_conn->device; > struct ib_device *ib_dev = device->ib_device; > struct ib_mr *mr; > struct ib_send_wr fr_wr, inv_wr; > struct ib_send_wr *bad_wr, *wr = NULL; > + int ret; > > if (mem->dma_nents == 1) { > sge->lkey = device->mr->lkey; > @@ -2600,40 +2530,32 @@ isert_fast_reg_mr(struct isert_conn *isert_conn, > return 0; > } > > + if (ind == ISERT_DATA_KEY_VALID) > /* Registering data buffer */ > mr = fr_desc->data_mr; > + else > /* Registering protection buffer */ > mr = fr_desc->pi_ctx->prot_mr; > > if (!(fr_desc->ind & ind)) { > isert_inv_rkey(&inv_wr, mr); > wr = &inv_wr; > } > > + ret = ib_map_mr_sg(mr, mem->sg, mem->nents, IB_ACCESS_LOCAL_WRITE); > + if (ret) { > + isert_err("failed to map sg %p with %d entries\n", > + mem->sg, mem->dma_nents); > + return ret; > + } > + > + isert_dbg("Use fr_desc %p sg_nents %d offset %u\n", > + fr_desc, mem->nents, mem->offset); > + > /* Prepare FASTREG WR */ > memset(&fr_wr, 0, sizeof(fr_wr)); > + ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID, > + false, &fr_wr); Shouldn't ib_set_fastreg_wr take care of this memset? Also, instead of passing the signalled flag to it, it seems we might just set that or other flags later if we really want to.
> struct pi_context { > struct ib_mr *prot_mr; > - struct ib_fast_reg_page_list *prot_frpl; > struct ib_mr *sig_mr; > }; > > struct fast_reg_descriptor { > struct list_head list; > struct ib_mr *data_mr; > - struct ib_fast_reg_page_list *data_frpl; > u8 ind; > struct pi_context *pi_ctx; As a follow-on, it might be worth just killing off the separate pi_context structure here.
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <20150722170413.GE6443-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> @ 2015-07-22 17:33 ` Sagi Grimberg [not found] ` <55AFD3DC.8070508-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 17:33 UTC (permalink / raw) To: Christoph Hellwig, Sagi Grimberg Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/22/2015 8:04 PM, Christoph Hellwig wrote: >> @@ -2585,11 +2517,9 @@ isert_fast_reg_mr(struct isert_conn *isert_conn, >> struct isert_device *device = isert_conn->device; >> struct ib_device *ib_dev = device->ib_device; >> struct ib_mr *mr; >> struct ib_send_wr fr_wr, inv_wr; >> struct ib_send_wr *bad_wr, *wr = NULL; >> + int ret; >> >> if (mem->dma_nents == 1) { >> sge->lkey = device->mr->lkey; >> @@ -2600,40 +2530,32 @@ isert_fast_reg_mr(struct isert_conn *isert_conn, >> return 0; >> } >> >> + if (ind == ISERT_DATA_KEY_VALID) >> /* Registering data buffer */ >> mr = fr_desc->data_mr; >> + else >> /* Registering protection buffer */ >> mr = fr_desc->pi_ctx->prot_mr; >> >> if (!(fr_desc->ind & ind)) { >> isert_inv_rkey(&inv_wr, mr); >> wr = &inv_wr; >> } >> >> + ret = ib_map_mr_sg(mr, mem->sg, mem->nents, IB_ACCESS_LOCAL_WRITE); >> + if (ret) { >> + isert_err("failed to map sg %p with %d entries\n", >> + mem->sg, mem->dma_nents); >> + return ret; >> + } >> + >> + isert_dbg("Use fr_desc %p sg_nents %d offset %u\n", >> + fr_desc, mem->nents, mem->offset); >> + >> /* Prepare FASTREG WR */ >> memset(&fr_wr, 0, sizeof(fr_wr)); >> + ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID, >> + false, &fr_wr); > > Shouldn't ib_set_fastreg_wr take care of this memset? Also it seems > instead of the singalled flag to it we might just set that or > other flags later if we really want to. The reason I didn't put it in was that ib_send_wr is not a small struct (92 bytes IIRC). 
So I'm a bit reluctant to add an unconditional memset. Maybe it's better that the callers can carefully set it to save some cycles? > >> struct pi_context { >> struct ib_mr *prot_mr; >> - struct ib_fast_reg_page_list *prot_frpl; >> struct ib_mr *sig_mr; >> }; >> >> struct fast_reg_descriptor { >> struct list_head list; >> struct ib_mr *data_mr; >> - struct ib_fast_reg_page_list *data_frpl; >> u8 ind; >> struct pi_context *pi_ctx; > > As a follow on it might be worth to just kill off the separate > pi_context structure here. Yeah, we can do that.
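For reference, the trade-off Christoph and Sagi are weighing here — zero only the fields a fastreg WR is defined to use, instead of memset()ing the whole ~92-byte work request — can be sketched in standalone C. The structures and the `mock_set_fastreg_wr` helper below are simplified mocks, not the kernel's `ib_send_wr` or the `ib_set_fastreg_wr` proposed in this series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified mock of a send work request -- NOT the kernel's ib_send_wr;
 * field names here are illustrative only. */
struct mock_mr { uint32_t lkey; uint32_t rkey; };

struct mock_send_wr {
	struct mock_send_wr *next;
	uint64_t wr_id;
	int opcode;
	int send_flags;
	struct {
		struct mock_mr *mr;
		uint32_t key;
		int access;
	} fastreg;
	/* ...many more bytes a fastreg WR never reads... */
	unsigned char unrelated[64];
};

enum { MOCK_WR_FAST_REG = 13, MOCK_SEND_SIGNALED = 1 };

/* Initialize only the members a fastreg WR is defined to access,
 * avoiding an unconditional memset of the whole structure. */
static void mock_set_fastreg_wr(struct mock_mr *mr, uint32_t key,
				uint64_t wr_id, int signaled,
				struct mock_send_wr *wr)
{
	wr->next = NULL;
	wr->wr_id = wr_id;
	wr->opcode = MOCK_WR_FAST_REG;
	wr->send_flags = signaled ? MOCK_SEND_SIGNALED : 0;
	wr->fastreg.mr = mr;
	wr->fastreg.key = key;
	wr->fastreg.access = 0;
}
```

A consumer of the WR reads only these fields, so leaving `unrelated` uninitialized is safe by contract — that is the micro-optimization being debated.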
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <55AFD3DC.8070508-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-22 17:57 ` Jason Gunthorpe [not found] ` <20150722175755.GH26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-22 17:57 UTC (permalink / raw) To: Sagi Grimberg Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 08:33:16PM +0300, Sagi Grimberg wrote: > >> memset(&fr_wr, 0, sizeof(fr_wr)); > >>+ ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID, > >>+ false, &fr_wr); > > > >Shouldn't ib_set_fastreg_wr take care of this memset? Also it seems > >instead of the singalled flag to it we might just set that or > >other flags later if we really want to. Seems reasonable. If you want to micro-optimize, then just zero the few items that are defined to be accessed for fastreg; no need to zero the whole structure. In fact, you may have already done that, so just drop the memset entirely. > The reason I didn't put it in was that ib_send_wr is not a small struct > (92 bytes IIRC). So I'm a bit reluctant to add an unconditional memset. > Maybe it's better that the callers can carefully set it to save some > cycles? If you want to optimize this path, then Sean is right: move the post into the driver and stop pretending that ib_post_send is a performance API. ib_post_fastreg_wr would be a function that needs 3 register-passed arguments and does a simple copy to the driver's actual sendq. No 96-byte structure memset, no stack traffic, no conditional jumps. Jason
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <20150722175755.GH26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-23 10:27 ` Sagi Grimberg [not found] ` <55B0C18B.4080901-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-23 10:27 UTC (permalink / raw) To: Jason Gunthorpe Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/22/2015 8:57 PM, Jason Gunthorpe wrote: > On Wed, Jul 22, 2015 at 08:33:16PM +0300, Sagi Grimberg wrote: >>>> memset(&fr_wr, 0, sizeof(fr_wr)); >>>> + ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID, >>>> + false, &fr_wr); >>> >>> Shouldn't ib_set_fastreg_wr take care of this memset? Also it seems >>> instead of the singalled flag to it we might just set that or >>> other flags later if we really want to. > > Seems reasonable. > > If you want to micro optimize then just zero the few items that are > defined to be accessed for fastreg, no need to zero the whole > structure. Infact, you may have already done that, so just drop the > memset entirely. I will. > >> The reason I didn't put it in was that ib_send_wr is not a small struct >> (92 bytes IIRC). So I'm a bit reluctant to add an unconditional memset. >> Maybe it's better that the callers can carefully set it to save some >> cycles? > > If you want to optimize this path, then Sean is right, move the post > into the driver and stop pretending that ib_post_send is a performance > API. > > ib_post_fastreg_wr would be a function that needs 3 register passed > arguments and does a simple copy to the driver's actual sendq That will require to take the SQ lock and write a doorbell for each registration and post you want to do. I'm confident that constructing a post chain with a single sq lock acquire and a single doorbell will be much much better even with conditional jumps and memsets. 
svcrdma, isert (and iser - not upstream yet) are doing it. I think that others should do it too. My tests show that this makes a difference in small IO workloads.
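The single-lock/single-doorbell argument above can be modeled in plain C (mock structures, not driver code): posting a chain of N linked WRs costs one lock acquire and one doorbell, versus N of each when posting one WR at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Mock WR and send queue -- illustrative only, not kernel types. */
struct mock_wr {
	struct mock_wr *next;
	int posted;
};

struct mock_sq {
	int lock_acquires;
	int doorbells;
	int depth;
};

/* Post a whole chain: the lock is taken and the doorbell rung once
 * for the entire chain, not once per WR. */
static void post_chain(struct mock_sq *sq, struct mock_wr *head)
{
	struct mock_wr *wr;

	sq->lock_acquires++;		/* spin_lock(&sq->lock) */
	for (wr = head; wr; wr = wr->next) {
		wr->posted = 1;
		sq->depth++;
	}
	sq->doorbells++;		/* one doorbell write for the chain */
	/* spin_unlock(&sq->lock) */
}
```

Chaining a fastreg WR, the RDMA that uses it, and a later invalidate into one `post_chain()` call is the pattern being described for svcrdma/isert.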
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <55B0C18B.4080901-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-23 13:35 ` Chuck Lever 2015-07-23 16:31 ` Jason Gunthorpe 1 sibling, 0 replies; 142+ messages in thread From: Chuck Lever @ 2015-07-23 13:35 UTC (permalink / raw) To: Sagi Grimberg Cc: Jason Gunthorpe, Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Jul 23, 2015, at 6:27 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote: > On 7/22/2015 8:57 PM, Jason Gunthorpe wrote: >> On Wed, Jul 22, 2015 at 08:33:16PM +0300, Sagi Grimberg wrote: >>>>> memset(&fr_wr, 0, sizeof(fr_wr)); >>>>> + ib_set_fastreg_wr(mr, mr->lkey, ISER_FASTREG_LI_WRID, >>>>> + false, &fr_wr); >>>> >>>> Shouldn't ib_set_fastreg_wr take care of this memset? Also it seems >>>> instead of the singalled flag to it we might just set that or >>>> other flags later if we really want to. >> >> Seems reasonable. >> >> If you want to micro optimize then just zero the few items that are >> defined to be accessed for fastreg, no need to zero the whole >> structure. Infact, you may have already done that, so just drop the >> memset entirely. > > I will. > >> >>> The reason I didn't put it in was that ib_send_wr is not a small struct >>> (92 bytes IIRC). So I'm a bit reluctant to add an unconditional memset. >>> Maybe it's better that the callers can carefully set it to save some >>> cycles? >> >> If you want to optimize this path, then Sean is right, move the post >> into the driver and stop pretending that ib_post_send is a performance >> API. >> >> ib_post_fastreg_wr would be a function that needs 3 register passed >> arguments and does a simple copy to the driver's actual sendq > > That will require to take the SQ lock and write a doorbell for each > registration and post you want to do. 
I'm confident that constructing > a post chain with a single sq lock acquire and a single doorbell will > be much much better even with conditional jumps and memsets. I agree. xprtrdma uses several MRs per RPC. It would be more efficient to chain together several WRs and post once to deal with these, especially for HCAs/providers that have a shallow page_list depth. > svcrdma, isert (and iser - not upstream yet) are doing it. I think that > others should do it too. My tests shows that this makes a difference in > small IO workloads. -- Chuck Lever
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <55B0C18B.4080901-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 2015-07-23 13:35 ` Chuck Lever @ 2015-07-23 16:31 ` Jason Gunthorpe [not found] ` <20150723163124.GD25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-23 16:31 UTC (permalink / raw) To: Sagi Grimberg Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Thu, Jul 23, 2015 at 01:27:23PM +0300, Sagi Grimberg wrote: > >ib_post_fastreg_wr would be a function that needs 3 register passed > >arguments and does a simple copy to the driver's actual sendq > > That will require to take the SQ lock and write a doorbell for each > registration and post you want to do. I'm confident that constructing > a post chain with a single sq lock acquire and a single doorbell will > be much much better even with conditional jumps and memsets. You are still thinking at a micro level; the ULP should be working at a higher level and requesting the MR(s) and the actual work together so the driver can run the whole chain of posts without extra stack traffic, locking or doorbells. Jason
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <20150723163124.GD25174-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-23 16:59 ` Sagi Grimberg [not found] ` <55B11D84.102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-23 16:59 UTC (permalink / raw) To: Jason Gunthorpe Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/23/2015 7:31 PM, Jason Gunthorpe wrote: > On Thu, Jul 23, 2015 at 01:27:23PM +0300, Sagi Grimberg wrote: >>> ib_post_fastreg_wr would be a function that needs 3 register passed >>> arguments and does a simple copy to the driver's actual sendq >> >> That will require to take the SQ lock and write a doorbell for each >> registration and post you want to do. I'm confident that constructing >> a post chain with a single sq lock acquire and a single doorbell will >> be much much better even with conditional jumps and memsets. > > You are still thinking at a micro level, the ULP should be working at > a higher level and requesting the MR(s) and the actual work together > so the driver can run the the whole chain of posts without extra stack > traffic, locking or doorbells. But I'd also want to chain the subsequent RDMA(s) or SEND (with the rkey(s)) under the same post. I'm sorry, but the idea of handling memory region mapping (possibly more than one), detecting gaps and deciding on the strategy of what to do and who knows what else under the send queue lock doesn't seem like a good idea; it's complete overkill IMO. I don't mean to be negative about your ideas; I just don't think that doing all the work in the drivers is going to get us to a better place.
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <55B11D84.102-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2015-07-23 18:53 ` Jason Gunthorpe [not found] ` <20150723185334.GB31346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-23 18:53 UTC (permalink / raw) To: Sagi Grimberg Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Thu, Jul 23, 2015 at 07:59:48PM +0300, Sagi Grimberg wrote: > I don't mean to be negative about your ideas, I just don't think that > doing all the work in the drivers is going to get us to a better place. No worries, I'm hoping someone can put the pieces together and figure out how to share code for all the duplication we seem to have in the ULPs. The more I've looked at them, the more it seems like they get basic things wrong, like SQE accounting in NFS, dma flush ordering in NFS, rkey security in SRP/iSER.. Sharing code means we can fix those problems for good. Jason
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <20150723185334.GB31346-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-24 14:36 ` Chuck Lever [not found] ` <DE0226A1-A7FC-4618-91F1-FE34347C252A-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Chuck Lever @ 2015-07-24 14:36 UTC (permalink / raw) To: Jason Gunthorpe Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer On Jul 23, 2015, at 2:53 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Thu, Jul 23, 2015 at 07:59:48PM +0300, Sagi Grimberg wrote: >> I don't mean to be negative about your ideas, I just don't think that >> doing all the work in the drivers is going to get us to a better place. > > No worries, I'm hoping someone can put the peices together and figure > out how to code share all the duplication we seem to have in the ULPs. > > The more I've look at them, the more it seems like they get basic > things wrong, like SQE accouting in NFS, dma flush ordering in NFS, I have a work-in-progress prototype that addresses both of these issues. Unfinished, but operational: http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future Having this should give us time to analyze the performance impact of these changes, and to dial in an approach that aligns with your vision about the unified APIs that you and Sagi have been discussing. FRWR is seeing a 10-15% throughput reduction with 8-thread dbench, but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct read and write are negatively impacted. I don’t see any significant change in client CPU utilization, but have not yet examined changes in interrupt workload, nor have I done any spin lock or CPU bus traffic analysis. But none of this is as bad as I feared it could be. There are plenty of other areas that can recoup some or all of this loss eventually. 
I converted the RPC reply handler tasklet to a work queue context to allow sleeping. A new .ro_unmap_sync method is invoked after the RPC/RDMA header is parsed but before xprt_complete_rqst() wakes up the waiting RPC. .ro_unmap_sync is 100% synchronous. It does not return to the reply handler until the MRs are invalid and unmapped. For FMR, .ro_unmap_sync makes a list of the RPC’s MRs and passes that list to a single ib_unmap_fmr() call, then performs DMA unmap and releases the MRs. This is actually much more efficient than the current logic, which serially does an ib_unmap_fmr() for each MR the RPC owns. So FMR overall performs better with this change. For FRWR, .ro_unmap_sync builds a chain of LOCAL_INV WRs for the RPC’s MRs and posts that with a single ib_post_send(). The final WR in the chain is signaled. A kernel completion is used to wait for the LINV chain to complete. Then DMA unmap and MR release. This lengthens per-RPC latency for FRWR, because the LINVs are now fully accounted for in the RPC round-trip rather than being done asynchronously after the RPC completes. So here performance is closer to FMR, but is still better by a substantial margin. Because the next RPC cannot awaken until the last send completes, send queue accounting is based on RPC/RDMA credit flow control. I’m sure there are some details here that still need to be addressed, but this fixes the big problem with FRWR send queue accounting, which was that LOCAL_INV WRs would continue to consume SQEs while another RPC was allowed to start. I think switching to use s/g lists will be straightforward and could simplify the overall approach somewhat. > rkey security in SRP/iSER.. > > Sharing code means we can fix those problems for good. 
-- Chuck Lever
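Chuck's FRWR `.ro_unmap_sync` scheme — one LOCAL_INV per MR, linked into a chain with only the final WR signaled — can be modeled as below. These are mock types, not `ib_send_wr`; the real code posts the chain and waits on a kernel completion for the tail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock LOCAL_INV work request -- illustrative only. */
struct mock_linv_wr {
	struct mock_linv_wr *next;
	uint32_t rkey;
	int signaled;
};

/*
 * Link one LOCAL_INV WR per MR into a single chain.  Only the last WR
 * is signaled: because the send queue executes in order, one completion
 * for the tail proves every earlier invalidate has finished, so the
 * caller can DMA-unmap and release all the MRs after a single wait.
 */
static struct mock_linv_wr *
build_linv_chain(struct mock_linv_wr *wrs, const uint32_t *rkeys, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		wrs[i].rkey = rkeys[i];
		wrs[i].signaled = (i == n - 1);
		wrs[i].next = (i == n - 1) ? NULL : &wrs[i + 1];
	}
	return n ? &wrs[0] : NULL;
}
```

This is also why the approach fixes the SQE accounting problem: the RPC does not complete until the tail LOCAL_INV has, so its WRs cannot linger on the send queue after the next RPC starts.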
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <DE0226A1-A7FC-4618-91F1-FE34347C252A-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2015-07-24 16:26 ` Jason Gunthorpe [not found] ` <20150724162657.GA21473-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-24 16:26 UTC (permalink / raw) To: Chuck Lever Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote: > Unfinished, but operational: > > http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future Nice.. Can you spend some time and reflect on how some of this could be lowered into the core code? The FMR and FRWR sides have many similarities now.. > FRWR is seeing a 10-15% throughput reduction with 8-thread dbench, > but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct > read and write are negatively impacted. I'm not surprised, since invalidate is sync. I believe you need to incorporate SEND WITH INVALIDATE to substantially recover this overhead. It would be neat if the RQ could continue to advance while waiting for the invalidate.. That looks almost doable.. > I converted the RPC reply handler tasklet to a work queue context > to allow sleeping. A new .ro_unmap_sync method is invoked after > the RPC/RDMA header is parsed but before xprt_complete_rqst() > wakes up the waiting RPC. .. so the issue is that the RPC must be substantially parsed to learn which MR it is associated with to schedule the invalidate? > This is actually much more efficient than the current logic, > which serially does an ib_unmap_fmr() for each MR the RPC owns. > So FMR overall performs better with this change. Interesting.. > Because the next RPC cannot awaken until the last send completes, > send queue accounting is based on RPC/RDMA credit flow control.
So for FRWR the sync invalidate effectively guarantees all SQEs related to this RPC are flushed. That seems reasonable; if the number of SQEs and CQEs is properly sized in relation to the RPC slot count, it should be workable.. How do FMR and PHYS synchronize? > I’m sure there are some details here that still need to be > addressed, but this fixes the big problem with FRWR send queue > accounting, which was that LOCAL_INV WRs would continue to > consume SQEs while another RPC was allowed to start. Did you test without that artificial limit you mentioned before? I'm also wondering about this: > During some other testing I found that when a completion upcall > returns to the provider leaving CQEs still on the completion queue, > there is a non-zero probability that a completion will be lost. What does "lost" mean? The CQ is edge-triggered, so if you don't drain it you might not get another timely CQ callback (which is bad), but CQEs themselves should not be lost. Jason
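Jason's edge-triggered point is the classic drain/rearm/re-poll pattern (in the kernel, `ib_poll_cq()` plus `ib_req_notify_cq()`). A standalone mock shows why the extra poll after arming matters: a CQE that sneaks in between the last empty poll and the rearm gets consumed instead of sitting unnoticed until the next interrupt. The FIFO below stands in for a real CQ; none of these types are verbs API types.

```c
#include <assert.h>

/* Mock CQ as a fixed-size FIFO -- illustrative only, not ib_cq. */
#define CQ_CAP 16
struct mock_cq {
	int entries[CQ_CAP];
	int head, tail;
	int armed;
};

static void cq_push(struct mock_cq *cq, int id)
{
	cq->entries[cq->tail % CQ_CAP] = id;
	cq->tail++;
}

static int cq_poll(struct mock_cq *cq, int *out)
{
	if (cq->head == cq->tail)
		return 0;
	*out = cq->entries[cq->head % CQ_CAP];
	cq->head++;
	return 1;
}

/* Drain completely, rearm, then poll once more; loop if the re-poll
 * caught a racing CQE.  Returning with the CQ non-empty and unarmed is
 * exactly the "no further callback" hazard this loop avoids. */
static int cq_drain(struct mock_cq *cq)
{
	int n = 0, e;

	for (;;) {
		while (cq_poll(cq, &e))
			n++;		/* handle completion */
		cq->armed = 1;		/* ib_req_notify_cq() */
		if (!cq_poll(cq, &e))
			break;		/* empty and armed: safe to return */
		cq->armed = 0;		/* consumed one; keep draining */
		n++;
	}
	return n;
}
```

With this structure the upcall always leaves the CQ both empty and armed, so no CQE can be stranded without a future notification.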
* RE: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <20150724162657.GA21473-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-24 16:34 ` Steve Wise 2015-07-24 17:46 ` Chuck Lever 1 sibling, 0 replies; 142+ messages in thread From: Steve Wise @ 2015-07-24 16:34 UTC (permalink / raw) To: 'Jason Gunthorpe', 'Chuck Lever' Cc: 'Sagi Grimberg', 'Christoph Hellwig', 'linux-rdma', 'Liran Liss', 'Oren Duer' > -----Original Message----- > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Jason Gunthorpe > Sent: Friday, July 24, 2015 11:27 AM > To: Chuck Lever > Cc: Sagi Grimberg; Christoph Hellwig; linux-rdma; Liran Liss; Oren Duer > Subject: Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API > > On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote: > > > Unfinished, but operational: > > > > http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future > > Nice.. > > Can you spend some time and reflect on how some of this could be > lowered into the core code? The FMR and FRWR side have many > similarities now.. > > > FRWR is seeing a 10-15% throughput reduction with 8-thread dbench, > > but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct > > read and write are negatively impacted. > > I'm not surprised since invalidate is sync. I belive you need to > incorporate SEND WITH INVALIDATE to substantially recover this > overhead. > > It would be neat if the RQ could continue to advance while waiting for > the invalidate.. That looks almost doable.. > > > I converted the RPC reply handler tasklet to a work queue context > > to allow sleeping. A new .ro_unmap_sync method is invoked after > > the RPC/RDMA header is parsed but before xprt_complete_rqst() > > wakes up the waiting RPC. > > .. so the issue is the RPC must be substantially parsed to learn which > MR it is associated with to schedule the invalidate? 
> > > This is actually much more efficient than the current logic, > > which serially does an ib_unmap_fmr() for each MR the RPC owns. > > So FMR overall performs better with this change. > > Interesting.. > > > Because the next RPC cannot awaken until the last send completes, > > send queue accounting is based on RPC/RDMA credit flow control. > > So for FRWR the sync invalidate effectively guarentees all SQEs > related to this RPC are flushed. That seems reasonable, if the number > of SQEs and CQEs are properly sized in relation to the RPC slot count > it should be workable.. > > How does FMR and PHYS synchronize? > > > I’m sure there are some details here that still need to be > > addressed, but this fixes the big problem with FRWR send queue > > accounting, which was that LOCAL_INV WRs would continue to > > consume SQEs while another RPC was allowed to start. > > Did you test without that artificial limit you mentioned before? > > I'm also wondering about this: > > > During some other testing I found that when a completion upcall > > returns to the provider leaving CQEs still on the completion queue, > > there is a non-zero probability that a completion will be lost. > > What does lost mean? > > The CQ is edge triggered, so if you don't drain it you might not get > another timely CQ callback (which is bad), but CQEs themselves should > not be lost. > This condition (not fully draining the CQEs) is due to SQ flow control, yes? If so, then when the SQ resumes can it wake up the appropriate thread (simulating another CQE insertion)? -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <20150724162657.GA21473-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2015-07-24 16:34 ` Steve Wise @ 2015-07-24 17:46 ` Chuck Lever [not found] ` <903CDFB5-04FE-47B6-B044-E960E8A8BC4C-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Chuck Lever @ 2015-07-24 17:46 UTC (permalink / raw) To: Jason Gunthorpe Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer On Jul 24, 2015, at 12:26 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote: > >> Unfinished, but operational: >> >> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future > > Nice.. > > Can you spend some time and reflect on how some of this could be > lowered into the core code? The point of the prototype is to start thinking about this with actual data. :-) So I’m with you. > The FMR and FRWR side have many > similarities now.. >> FRWR is seeing a 10-15% throughput reduction with 8-thread dbench, >> but a 5% improvement with 16-thread fio IOPS. 4K and 8K direct >> read and write are negatively impacted. > > I'm not surprised since invalidate is sync. I belive you need to > incorporate SEND WITH INVALIDATE to substantially recover this > overhead. I tried to find another kernel ULP using SEND WITH INVALIDATE, but I didn’t see one. I assume you mean the NFS server would use this WR when replying, to knock down the RPC’s client MRs remotely? > It would be neat if the RQ could continue to advance while waiting for > the invalidate.. That looks almost doable.. The new reply handling work queue is not restricted to serial reply processing. Unlike the tasklet model, multiple RPC replies can be processed at once, and can run across all CPUs. The tasklet was global, shared across all RPC/RDMA receive queues on that client. 
AFAICT there is very little else that is shared between RPC replies. I think using a work queue instead may be a tiny bit slower for each RPC (perhaps due to additional context switching), but will allow much better scaling with the number of transports and mount points the client creates. I may not have understood your comment. >> I converted the RPC reply handler tasklet to a work queue context >> to allow sleeping. A new .ro_unmap_sync method is invoked after >> the RPC/RDMA header is parsed but before xprt_complete_rqst() >> wakes up the waiting RPC. > > .. so the issue is the RPC must be substantially parsed to learn which > MR it is associated with to schedule the invalidate? Only the RPC/RDMA header has to be parsed, but yes. The needed parsing is handled in rpcrdma_reply_handler right before the .ro_unmap_sync call. Parsing the RPC reply results is then done by the upper layer once xprt_complete_rqst() has run. >> This is actually much more efficient than the current logic, >> which serially does an ib_unmap_fmr() for each MR the RPC owns. >> So FMR overall performs better with this change. > > Interesting.. > >> Because the next RPC cannot awaken until the last send completes, >> send queue accounting is based on RPC/RDMA credit flow control. > > So for FRWR the sync invalidate effectively guarantees all SQEs > related to this RPC are flushed. That seems reasonable, if the number > of SQEs and CQEs are properly sized in relation to the RPC slot count > it should be workable.. Yes, both queues are sized in rpcrdma_ep_create() according to the slot count / credit limit. > How do FMR and PHYS synchronize? We still rely on timing there. The RPC's send buffer may be re-allocated by the next RPC if that RPC wants to send a bigger request than this one. Thus there is still a tiny but non-zero risk the HCA may not be done with that send buffer. Closing that hole is still on my to-do list. 
>> I’m sure there are some details here that still need to be >> addressed, but this fixes the big problem with FRWR send queue >> accounting, which was that LOCAL_INV WRs would continue to >> consume SQEs while another RPC was allowed to start. > > Did you test without that artificial limit you mentioned before? Yes. No problems now, the limit is removed in the last patch in that series. > I'm also wondering about this: > >> During some other testing I found that when a completion upcall >> returns to the provider leaving CQEs still on the completion queue, >> there is a non-zero probability that a completion will be lost. > > What does lost mean? Lost means a WC in the CQ is skipped by ib_poll_cq(). In other words, I expected that during the next upcall, ib_poll_cq() would return WCs that were not processed, starting with the last one on the CQ when my upcall handler returned. I found this by intentionally having the completion handler process only one or two WCs and then return. > The CQ is edge triggered, so if you don't drain it you might not get > another timely CQ callback (which is bad), but CQEs themselves should > not be lost. I’m not sure I fully understand this problem, it might even be my misunderstanding about ib_poll_cq(). But forcing the completion upcall handler to completely drain the CQ during each upcall prevents the issue. (Note, I don’t think fixing this is a pre-requisite for the synchronous invalidate work, but it just happened to be in the patch queue). -- Chuck Lever
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <903CDFB5-04FE-47B6-B044-E960E8A8BC4C-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2015-07-24 19:10 ` Jason Gunthorpe [not found] ` <20150724191003.GA26225-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2015-07-27 15:57 ` Chuck Lever 1 sibling, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-24 19:10 UTC (permalink / raw) To: Chuck Lever Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer On Fri, Jul 24, 2015 at 01:46:05PM -0400, Chuck Lever wrote: > > I'm not surprised since invalidate is sync. I believe you need to > > incorporate SEND WITH INVALIDATE to substantially recover this > > overhead. > > I tried to find another kernel ULP using SEND WITH INVALIDATE, but > I didn’t see one. I assume you mean the NFS server would use this > WR when replying, to knock down the RPC’s client MRs remotely? Yes. I think the issue with it not being used in the kernel is mainly to do with lack of standardization. The verb cannot be used unless both sides negotiate it and perhaps the older RDMA protocols have not been revised to include that. For simple testing purposes it shouldn't be too hard to force it to get an idea if it is worth pursuing. On the RECV work completion check if the right rkey was invalidated and skip the invalidation step. Presumably the HCA does all this internally very quickly.. > I may not have understood your comment. Okay, I didn't look closely at the entire series together.. > Only the RPC/RDMA header has to be parsed, but yes. The needed > parsing is handled in rpcrdma_reply_handler right before the > .ro_unmap_sync call. Right, okay, if this could be done in the rq callback itself rather than bounce to a wq and immediately turn around the needed invalidate posts you'd get back a little more overhead by reducing the time to turn it around... Then bounce to the wq to complete from the SQ callback? 
> > Did you test without that artificial limit you mentioned before? > > Yes. No problems now, the limit is removed in the last patch > in that series. Okay, so that was just overflowing the sq due to not accounting.. > >> During some other testing I found that when a completion upcall > >> returns to the provider leaving CQEs still on the completion queue, > >> there is a non-zero probability that a completion will be lost. > > > > What does lost mean? > > Lost means a WC in the CQ is skipped by ib_poll_cq(). > > In other words, I expected that during the next upcall, > ib_poll_cq() would return WCs that were not processed, starting > with the last one on the CQ when my upcall handler returned. Yes, this is what it should do. I wouldn't expect a timely upcall, but none should be lost. > I found this by intentionally having the completion handler > process only one or two WCs and then return. > > > The CQ is edge triggered, so if you don't drain it you might not get > > another timely CQ callback (which is bad), but CQEs themselves should > > not be lost. > > I’m not sure I fully understand this problem, it might > even be my misunderstanding about ib_poll_cq(). But forcing > the completion upcall handler to completely drain the CQ > during each upcall prevents the issue. CQEs should never be lost. The idea that you can completely drain the CQ during the upcall is inherently racy, so this cannot be the answer to whatever the problem is.. Is there any chance this is still an artifact of the lazy SQE flow control? The RDMA buffer SQE recycling is solved by the sync invalidate, but workloads that don't use RDMA buffers (ie SEND only) will still run without proper flow control... If you are totally certain a CQE was dropped by ib_poll_cq, and that the SQ is not overflowing by strict accounting, then I'd say driver problem, but the odds of having an undetected driver problem like that at this point seem somehow small... 
Jason
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <20150724191003.GA26225-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-24 19:59 ` Chuck Lever [not found] ` <A1A0BF6E-992A-4B34-8D24-EA8AA8D6983B-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Chuck Lever @ 2015-07-24 19:59 UTC (permalink / raw) To: Jason Gunthorpe Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer On Jul 24, 2015, at 3:10 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Fri, Jul 24, 2015 at 01:46:05PM -0400, Chuck Lever wrote: >>> I'm not surprised since invalidate is sync. I belive you need to >>> incorporate SEND WITH INVALIDATE to substantially recover this >>> overhead. >> >> I tried to find another kernel ULP using SEND WITH INVALIDATE, but >> I didn’t see one. I assume you mean the NFS server would use this >> WR when replying, to knock down the RPC’s client MRs remotely? > > Yes. I think the issue with it not being used in the kernel is mainly > to do with lack of standardization. The verb cannot be used unless > both sides negotiate it and perhaps the older RDMA protocols have not > been revised to include that. And RPC-over-RDMA version 1 does not have any way to signal that the server has invalidated the MRs. Such signaling would be a pre-requisite to allow the Linux NFS/RDMA client to interoperate with non-Linux NFS/RDMA servers that do not have such support. >> Only the RPC/RDMA header has to be parsed, but yes. The needed >> parsing is handled in rpcrdma_reply_handler right before the >> .ro_unmap_unsync call. > > Right, okay, if this could be done in the rq callback itself rather > than bounce to a wq and immediately turn around the needed invalidate > posts you'd get back a little more overhead by reducing the time to > turn it around... Then bounce to the wq to complete from the SQ > callback ? 
For FRWR, you could post LINV from the receive completion upcall handler, and handle the rest of the invalidation from the send completion upcall, then poke the RPC reply handler. But this wouldn’t work at all for FMR, whose unmap verb is synchronous, would it? I’m not sure we’d buy more than a few microseconds here, and the receive upcall is single-threaded. I’ll move the “lost WC” discussion to another thread. -- Chuck Lever
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <A1A0BF6E-992A-4B34-8D24-EA8AA8D6983B-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2015-07-24 20:24 ` Jason Gunthorpe [not found] ` <20150724202445.GA28033-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-24 20:24 UTC (permalink / raw) To: Chuck Lever Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer On Fri, Jul 24, 2015 at 03:59:06PM -0400, Chuck Lever wrote: > And RPC-over-RDMA version 1 does not have any way to signal that > the server has invalidated the MRs. Such signaling would be a > pre-requisite to allow the Linux NFS/RDMA client to interoperate > with non-Linux NFS/RDMA servers that do not have such support. You can implement client support immediately, nothing special is required. When processing a SEND WC check ex.invalidate_rkey and IB_WC_WITH_INVALIDATE. If that rkey matches the MR associated with that RPC slot then skip the invalidate. No protocol negotiation is required at that point. I am unclear what happens server side if the server starts issuing SEND_WITH_INVALIDATE to a client that doesn't expect it. The net result is a MR would be invalidated twice. I don't know if this is OK or not. If it is OK, then the server can probably just start using it as well without negotiation. Otherwise the client has to signal the server it supports it once at connection setup. > For FRWR, you could post LINV from the receive completion upcall > handler, and handle the rest of the invalidation from the send > completion upcall, then poke the RPC reply handler. Yes > But this wouldn’t work at all for FMR, whose unmap verb is > synchronous, would it? It could run the FMR unmap in a thread/workqueue/tasklet and then complete the RPC side from that context. Same basic idea, using your tasklet not the driver's sendq context. 
> I’m not sure we’d buy more than a few microseconds here, and > the receive upcall is single-threaded. Not sure on how that matches your performance goals, just remarking that launching the invalidate in the recv upcall and completing processing from the sendq upcall is the very best performance you can expect from this API. Jason
* RE: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <20150724202445.GA28033-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-24 22:13 ` Steve Wise 2015-07-24 22:44 ` Jason Gunthorpe 0 siblings, 1 reply; 142+ messages in thread From: Steve Wise @ 2015-07-24 22:13 UTC (permalink / raw) To: 'Jason Gunthorpe', 'Chuck Lever' Cc: 'Sagi Grimberg', 'Christoph Hellwig', 'linux-rdma', 'Liran Liss', 'Oren Duer' > -----Original Message----- > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Jason Gunthorpe > Sent: Friday, July 24, 2015 3:25 PM > To: Chuck Lever > Cc: Sagi Grimberg; Christoph Hellwig; linux-rdma; Liran Liss; Oren Duer > Subject: Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API > > On Fri, Jul 24, 2015 at 03:59:06PM -0400, Chuck Lever wrote: > > And RPC-over-RDMA version 1 does not have any way to signal that > > the server has invalidated the MRs. Such signaling would be a > > pre-requisite to allow the Linux NFS/RDMA client to interoperate > > with non-Linux NFS/RDMA servers that do not have such support. > > You can implement client support immediately, nothing special is > required. > > When processing a SEND WC check ex.invalidate_rkey and > IB_WC_WITH_INVALIDATE. If that rkey matches the MR associated with > that RPC slot then skip the invalidate. > > No protocol negotiation is required at that point. > > I am unclear what happens sever side if the server starts issuing > SEND_WITH_INVALIDATE to a client that doesn't expect it. The net > result is a MR would be invalidated twice. I don't know if this is OK > or not. > It is ok to invalidate an already-invalid MR. > If it is OK, then the server can probably just start using it as > well without negotiation. > > Otherwise the client has to signal the server it supports it once at > connection setup. 
> > > For FRWR, you could post LINV from the receive completion upcall > > handler, and handle the rest of the invalidation from the send > > completion upcall, then poke the RPC reply handler. > > Yes > > > But this wouldn’t work at all for FMR, whose unmap verb is > > synchronous, would it? > > It could run the FMR unmap in a thread/workqueue/tasklet and then > complete the RPC side from that context. Same basic idea, using your > tasklet not the driver's sendq context. > > > I’m not sure we’d buy more than a few microseconds here, and > > the receive upcall is single-threaded. > > Not sure on how that matches your performance goals, just remarking > that launching the invalidate in the recv upcall and completing > processing from the sendq upcall is the very best performance you can > expect from this API. > > Jason
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API 2015-07-24 22:13 ` Steve Wise @ 2015-07-24 22:44 ` Jason Gunthorpe 0 siblings, 0 replies; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-24 22:44 UTC (permalink / raw) To: Steve Wise Cc: 'Chuck Lever', 'Sagi Grimberg', 'Christoph Hellwig', 'linux-rdma', 'Liran Liss', 'Oren Duer' > > I am unclear what happens server side if the server starts issuing > > SEND_WITH_INVALIDATE to a client that doesn't expect it. The net > > result is a MR would be invalidated twice. I don't know if this is OK > > or not. > > It is ok to invalidate an already-invalid MR. Nice, ah but I forgot about the last issue.. A server must not send the SEND_WITH_INVALIDATE OP to a client HCA that does not support it in HW. At least on IB the operation code is different, so it will break.. So negotiation is needed.. Jason
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <903CDFB5-04FE-47B6-B044-E960E8A8BC4C-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 2015-07-24 19:10 ` Jason Gunthorpe @ 2015-07-27 15:57 ` Chuck Lever [not found] ` <8A2BC019-1DC0-4531-9659-3181EE9A4B43-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Chuck Lever @ 2015-07-27 15:57 UTC (permalink / raw) To: Jason Gunthorpe Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer On Jul 24, 2015, at 1:46 PM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote: > On Jul 24, 2015, at 12:26 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > >> On Fri, Jul 24, 2015 at 10:36:07AM -0400, Chuck Lever wrote: >> >>> Unfinished, but operational: >>> >>> http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=shortlog;h=refs/heads/nfs-rdma-future >> >> Nice.. >> >> Can you spend some time and reflect on how some of this could be >> lowered into the core code? > > The point of the prototype is to start thinking about this with > actual data. :-) So I’m with you. > > >> The FMR and FRWR side have many >> similarities now.. IMO ib_unmap_fmr is a very different animal from LOCAL_INV WR. ib_unmap_fmr is synchronous, provides no ordering guarantees with send queue operations, and does not depend on a connected QP to be available. You could emulate asynchronicity with a work queue but that still does not provide SQ ordering. There are few if any failure modes for ib_unmap_fmr. LOCAL_INV WR is asynchronous, provides strong ordering with other send queue operations, but does require a non-NULL QP in RTS to work. The failure modes are complex: without a QP in RTS, the post_send fails. If the QP leaves RTS while LOCAL_INV is in flight, the LINV flushes. MRs can be left in a state where the MR's rkey is not in sync with the HW, in which case a synchronous operation may be required to recover the MR. 
These are the reasons I elected to employ a synchronous invalidation model in the RPC reply handler. This model can be made to work adequately for both FMR and FRWR, provides proper DMA unmap ordering guarantees for both, and hides wonky transport disconnect recovery mechanics. The only downside is the performance cost. A generic MR invalidation API that buries underlying verb activity and guarantees proper DMA unmap ordering I think would have to be synchronous. In the long run, two things will change: first, FMR will eventually be deprecated; and second, ULPs will likely adopt SEND_WITH_INV. The complexion of MR invalidation could be vastly different in a few years: handled entirely by the target-side, and only verified by the initiator. Verification doesn't need to sleep, and the slow path (the target failed to invalidate) can be deferred. All that would be necessary at that point would be a synchronous invalidation API (synchronous in the sense that the invalidate is complete if the API returns without error). -- Chuck Lever
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <8A2BC019-1DC0-4531-9659-3181EE9A4B43-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2015-07-27 17:25 ` Jason Gunthorpe [not found] ` <20150727172510.GD18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-27 17:25 UTC (permalink / raw) To: Chuck Lever Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer On Mon, Jul 27, 2015 at 11:57:46AM -0400, Chuck Lever wrote: > IMO ib_unmap_fmr is a very different animal from LOCAL_INV WR. Sure, but how many of these properties does NFS actually care about, now that it is running the API properly? > ib_unmap_fmr is synchronous, provides no ordering guarantees with > send queue operations, and does not depend on a connected QP to > be available. You could emulate asynchronicity with a work queue > but that still does not provide SQ ordering. There are few if any > failure modes for ib_unmap_fmr. I'm having a hard time seeing how SQ ordering is important when the API is used properly. Once you explicitly order the DMA unmap after the invalidate completion you no longer need implicit SQ ordering Is there a way to combine SQ implicit ordering and the Linux DMA API together correctly? > flight, the LINV flushes. MRs can be left in a state where the > MR's rkey is not in sync with the HW, in which case a > synchronous operation may be required to recover the MR. The error handling seems like a trivial difference, a ib_recover_failed_qp_mr(mr); sort of call could resync everything after a QP blows up.. > The complexion of MR invalidation could be vastly different in > a few years: handled entirely by the target-side, and only > verified by the initiator. Verification doesn't need to sleep, > and the slow path (the target failed to invalidate) can be > deferred. 
The initiator still needs to have the ability to issue the invalidate if the target doesn't do it, so all the code still exists.. Even ignoring those issues, should we be talking about putting FMR under the new ib_alloc_mr and ib_map_mr interfaces? Would that help much even if the post and unmap flows are totally different? Jason
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <20150727172510.GD18348-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-28 20:06 ` Chuck Lever [not found] ` <B045BAC2-0360-4D97-A220-7DB52AF90BF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Chuck Lever @ 2015-07-28 20:06 UTC (permalink / raw) To: Jason Gunthorpe Cc: Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer On Jul 27, 2015, at 1:25 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote: > On Mon, Jul 27, 2015 at 11:57:46AM -0400, Chuck Lever wrote: >> IMO ib_unmap_fmr is a very different animal from LOCAL_INV WR. > > Sure, but how many of these properties does NFS actually care about, > now that it is running the API properly? > >> ib_unmap_fmr is synchronous, provides no ordering guarantees with >> send queue operations, and does not depend on a connected QP to >> be available. You could emulate asynchronicity with a work queue >> but that still does not provide SQ ordering. There are few if any >> failure modes for ib_unmap_fmr. > > I'm having a hard time seeing how SQ ordering is important when the > API is used properly. Once you explicitly order the DMA unmap after > the invalidate completion you no longer need implicit SQ ordering > > Is there a way to combine SQ implicit ordering and the Linux DMA API > together correctly? > >> flight, the LINV flushes. MRs can be left in a state where the >> MR's rkey is not in sync with the HW, in which case a >> synchronous operation may be required to recover the MR. > > The error handling seems like a trivial difference, a > ib_recover_failed_qp_mr(mr); sort of call could resync everything > after a QP blows up.. Out of interest, why does this need to be exposed to ULPs? I don't feel a ULP should have to deal with broken MRs following a transport disconnect. 
It falls in that category of things every ULP that supports FRWR has to do, and each has plenty of opportunity to get it wrong. >> The complexion of MR invalidation could be vastly different in >> a few years: handled entirely by the target-side, and only >> verified by the initiator. Verification doesn't need to sleep, >> and the slow path (the target failed to invalidate) can be >> deferred. > > The initiator still needs to have the ability to issue the invalidate > if the target doesn't do it, so all the code still exists.. > > Even ignoring those issues, should we be talking about putting FMR > under the new ib_alloc_mr and ib_map_mr interfaces? Would that help > much even if the post and unmap flows are totally different? My opinion is FMR should be separate from the new API. Some have expressed an interest in combining all kernel registration mechanisms under a single API, but they seem too different from each other to do that successfully. -- Chuck Lever
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API [not found] ` <B045BAC2-0360-4D97-A220-7DB52AF90BF7-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2015-07-29 6:32 ` Christoph Hellwig 0 siblings, 0 replies; 142+ messages in thread From: Christoph Hellwig @ 2015-07-29 6:32 UTC (permalink / raw) To: Chuck Lever Cc: Jason Gunthorpe, Sagi Grimberg, Christoph Hellwig, linux-rdma, Liran Liss, Oren Duer On Tue, Jul 28, 2015 at 04:06:23PM -0400, Chuck Lever wrote: > My opinion is FMR should be separate from the new API. Some have > expressed an interest in combining all kernel registration > mechanisms under a single API, but they seem too different from > each other to do that successfully. Hi Chuck, I think we can fit FMR partially under this API, e.g. alloc and map_sg fit in very well, but then instead of post and invalidate we'll need to call into slightly modified existing FMR pool APIs. I'd suggest postponing the issue for now; I'll prepare a prototype once we've finished the FR-side API.
* [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (37 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 38/43] iser-target: " Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg [not found] ` <1437548143-24893-40-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 6:55 ` [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration Sagi Grimberg ` (4 subsequent siblings) 43 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- include/rdma/ib_verbs.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index d543fee..cc83c39 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -133,6 +133,7 @@ enum ib_device_cap_flags { IB_DEVICE_MANAGED_FLOW_STEERING = (1<<29), IB_DEVICE_SIGNATURE_HANDOVER = (1<<30), IB_DEVICE_ON_DEMAND_PAGING = (1<<31), + IB_DEVICE_MAP_ARB_SG = (1ULL<<32), }; enum ib_signature_prot_cap { @@ -193,7 +194,7 @@ struct ib_device_attr { u32 hw_ver; int max_qp; int max_qp_wr; - int device_cap_flags; + u64 device_cap_flags; int max_sge; int max_sge_rd; int max_cq; @@ -556,6 +557,11 @@ __attribute_const__ int ib_rate_to_mult(enum ib_rate rate); */ __attribute_const__ int ib_rate_to_mbps(enum ib_rate rate); +enum ib_mr_flags { + IB_MR_MAP_ARB_SG = 1, +}; + + enum ib_mr_type { IB_MR_TYPE_FAST_REG, IB_MR_TYPE_SIGNATURE, -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support [not found] ` <1437548143-24893-40-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-07-22 17:05 ` Christoph Hellwig 2015-07-22 17:22 ` Jason Gunthorpe 1 sibling, 0 replies; 142+ messages in thread From: Christoph Hellwig @ 2015-07-22 17:05 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer > + IB_DEVICE_MAP_ARB_SG = (1ULL<<32), > +enum ib_mr_flags { > + IB_MR_MAP_ARB_SG = 1, > +}; > + s/ARB_SG/SG_GAPS/? Also please try to document new flags. I know the IB code currently doesn't do it, but starting a trend there would be very useful.
* Re: [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support [not found] ` <1437548143-24893-40-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 17:05 ` Christoph Hellwig @ 2015-07-22 17:22 ` Jason Gunthorpe [not found] ` <20150722172255.GD26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-22 17:22 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 09:55:39AM +0300, Sagi Grimberg wrote: > +enum ib_mr_flags { > + IB_MR_MAP_ARB_SG = 1, > +}; Something about this just seems ugly. We are back to what we were trying to avoid: Adding more types of MRs.. Is this really necessary? Do you really need to know the MR type when the MR is created, or can the adaptor change types on the fly during registration? iSER for example has a rarely used corner case where it needs this, but it just turns on the feature unconditionally right away. This incurs 2x the overhead in the MR allocations and who knows what performance impact on the adaptor side. It would be so much better if it could switch to this mode on a SG by SG list basis. Same for signature. In other words: It would be so much cleaner if ib_map_mr_sg set the MR type based on the need. Jason
* Re: [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support [not found] ` <20150722172255.GD26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-22 17:29 ` Sagi Grimberg 0 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 17:29 UTC (permalink / raw) To: Jason Gunthorpe, Sagi Grimberg Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/22/2015 8:22 PM, Jason Gunthorpe wrote: > On Wed, Jul 22, 2015 at 09:55:39AM +0300, Sagi Grimberg wrote: >> +enum ib_mr_flags { >> + IB_MR_MAP_ARB_SG = 1, >> +}; > > Something about this just seems ugly. We are back to what we were > trying to avoid: Adding more types of MRs.. > > Is this really necessary? Do you really need to know the MR type when > the MR is created, or can the adaptor change types on the fly during > registration? > > iSER for example has a rarely used corner case where it needs this, I can tell you that it's anything but a corner case. direct-io, bio merges, FS operations and PI are examples where most of the sg lists *will* be "gappy". Trust me, it's fairly common to see those... > but it just turns on the feature unconditionally right away. This > incures 2x the overhead in the MR allocations and who knows what > performance impact on the adaptor side. I ran various workloads with this, and performance seems to hold up. > > It would be so much better if it could switch to this mode on a SG by > SG list basis. It would, but unfortunately it can't.
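The "gappy" SG lists Sagi describes are the crux of this sub-thread: classic page-list fast registration can only describe a region whose interior entries start and end on page boundaries. A minimal user-space sketch of that alignment check (the struct and function here are illustrative stand-ins, not the kernel's struct scatterlist or iser's iser_data_buf_aligned_len):

```c
#include <assert.h>

/* Toy stand-in for one scatterlist entry: a DMA address and a length.
 * (The real struct scatterlist lives in the kernel; this only models
 * the property the thread is arguing about.) */
struct sg_ent {
	unsigned long long addr;
	unsigned int len;
};

/* A list is "gappy" if any interior boundary is not page aligned:
 * every entry except the last must end on a page boundary, and every
 * entry except the first must start on one. A plain page list (MTT)
 * cannot describe such a list, which is why iser without ARB_SG
 * support falls back to a bounce buffer. */
static int sg_list_is_gappy(const struct sg_ent *sg, int nents,
			    unsigned int page_size)
{
	int i;

	for (i = 0; i < nents; i++) {
		if (i > 0 && (sg[i].addr & (page_size - 1)))
			return 1;	/* interior entry starts mid-page */
		if (i < nents - 1 &&
		    ((sg[i].addr + sg[i].len) & (page_size - 1)))
			return 1;	/* interior entry ends mid-page */
	}
	return 0;
}
```

A direct-io read of 512 bytes followed by a page, for instance, produces a first entry that ends mid-page and therefore trips the check.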
* [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (38 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg [not found] ` <1437548143-24893-41-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2015-07-22 6:55 ` [PATCH WIP 41/43] mlx5: Add arbitrary sg list support Sagi Grimberg ` (3 subsequent siblings) 43 siblings, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx5/mlx5_ib.h | 6 ++- drivers/infiniband/hw/mlx5/mr.c | 71 ++++++++++++++++++++++++++++++------ 2 files changed, 64 insertions(+), 13 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 7017a1a..fb3ac22 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -315,11 +315,15 @@ enum mlx5_ib_mtt_access_flags { struct mlx5_ib_mr { struct ib_mr ibmr; - u64 *pl; + union { + __be64 *pl; + struct mlx5_klm *klms; + }; __be64 *mpl; dma_addr_t pl_map; int ndescs; int max_descs; + int access_mode; struct mlx5_core_mr mmr; struct ib_umem *umem; struct mlx5_shared_mr_info *smr_info; diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 7a030a2..45209c7 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -1168,6 +1168,40 @@ error: } static int +mlx5_alloc_klm_list(struct ib_device *device, + struct mlx5_ib_mr *mr, int ndescs) +{ + int size = sizeof(struct mlx5_klm) * ndescs; + + size += max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0); + mr->klms = kzalloc(size, GFP_KERNEL); + if (!mr->klms) + return 
-ENOMEM; + + mr->pl_map = dma_map_single(device->dma_device, mr->klms, + size, DMA_TO_DEVICE); + if (dma_mapping_error(device->dma_device, mr->pl_map)) + goto err; + + return 0; +err: + kfree(mr->klms); + + return -ENOMEM; +} + +static void +mlx5_free_klm_list(struct mlx5_ib_mr *mr) +{ + struct ib_device *device = mr->ibmr.device; + int size = mr->max_descs * sizeof(struct mlx5_klm); + + kfree(mr->klms); + dma_unmap_single(device->dma_device, mr->pl_map, size, DMA_TO_DEVICE); + mr->klms = NULL; +} + +static int mlx5_alloc_page_list(struct ib_device *device, struct mlx5_ib_mr *mr, int ndescs) { @@ -1222,7 +1256,10 @@ static int clean_mr(struct mlx5_ib_mr *mr) mr->sig = NULL; } - mlx5_free_page_list(mr); + if (mr->access_mode == MLX5_ACCESS_MODE_MTT) + mlx5_free_page_list(mr); + else + mlx5_free_klm_list(mr); if (!umred) { err = destroy_mkey(dev, mr); @@ -1293,10 +1330,10 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd, struct mlx5_ib_dev *dev = to_mdev(pd->device); struct mlx5_create_mkey_mbox_in *in; struct mlx5_ib_mr *mr; - int access_mode, err; - int ndescs = roundup(max_entries, 4); + int ndescs = ALIGN(max_entries, 4); + int err; - if (flags) + if (flags & ~IB_MR_MAP_ARB_SG) return ERR_PTR(-EINVAL); mr = kzalloc(sizeof(*mr), GFP_KERNEL); @@ -1315,13 +1352,20 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd, in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn); if (mr_type == IB_MR_TYPE_FAST_REG) { - access_mode = MLX5_ACCESS_MODE_MTT; - in->seg.log2_page_size = PAGE_SHIFT; + if (flags & IB_MR_MAP_ARB_SG) { + mr->access_mode = MLX5_ACCESS_MODE_KLM; - err = mlx5_alloc_page_list(pd->device, mr, ndescs); - if (err) - goto err_free_in; + err = mlx5_alloc_klm_list(pd->device, mr, ndescs); + if (err) + goto err_free_in; + } else { + mr->access_mode = MLX5_ACCESS_MODE_MTT; + in->seg.log2_page_size = PAGE_SHIFT; + err = mlx5_alloc_page_list(pd->device, mr, ndescs); + if (err) + goto err_free_in; + } mr->max_descs = ndescs; } else if (mr_type == IB_MR_TYPE_SIGNATURE) { 
u32 psv_index[2]; @@ -1341,7 +1385,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd, if (err) goto err_free_sig; - access_mode = MLX5_ACCESS_MODE_KLM; + mr->access_mode = MLX5_ACCESS_MODE_KLM; mr->sig->psv_memory.psv_idx = psv_index[0]; mr->sig->psv_wire.psv_idx = psv_index[1]; @@ -1355,7 +1399,7 @@ struct ib_mr *mlx5_ib_alloc_mr(struct ib_pd *pd, goto err_free_in; } - in->seg.flags = MLX5_PERM_UMR_EN | access_mode; + in->seg.flags = MLX5_PERM_UMR_EN | mr->access_mode; err = mlx5_core_create_mkey(dev->mdev, &mr->mmr, in, sizeof(*in), NULL, NULL, NULL); if (err) @@ -1379,7 +1423,10 @@ err_destroy_psv: mlx5_ib_warn(dev, "failed to destroy wire psv %d\n", mr->sig->psv_wire.psv_idx); } - mlx5_free_page_list(mr); + if (mr->access_mode == MLX5_ACCESS_MODE_MTT) + mlx5_free_page_list(mr); + else + mlx5_free_klm_list(mr); err_free_sig: kfree(mr->sig); err_free_in: -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration [not found] ` <1437548143-24893-41-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2015-07-22 17:30 ` Jason Gunthorpe [not found] ` <20150722173048.GF26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-22 17:30 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 09:55:40AM +0300, Sagi Grimberg wrote: > + size += max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0); > + mr->klms = kzalloc(size, GFP_KERNEL); > + if (!mr->klms) > + return -ENOMEM; > + > + mr->pl_map = dma_map_single(device->dma_device, mr->klms, > + size, DMA_TO_DEVICE); This is a misuse of the DMA API, you must call dma_map_single after the memory is set by the CPU, not before. The fast reg variant is using coherent allocations, which is OK.. Personally, I'd switch them both to map_single, then when copying the scatter list - Make sure the buffer is DMA unmapped - Copy - dma_map_single Unless there is some additional reason for the coherent allocation.. Jason
* Re: [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration [not found] ` <20150722173048.GF26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-23 9:25 ` Christoph Hellwig [not found] ` <20150723092532.GC32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 0 siblings, 1 reply; 142+ messages in thread From: Christoph Hellwig @ 2015-07-23 9:25 UTC (permalink / raw) To: Jason Gunthorpe Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 11:30:48AM -0600, Jason Gunthorpe wrote: > On Wed, Jul 22, 2015 at 09:55:40AM +0300, Sagi Grimberg wrote: > > + size += max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0); > > + mr->klms = kzalloc(size, GFP_KERNEL); > > + if (!mr->klms) > > + return -ENOMEM; > > + > > + mr->pl_map = dma_map_single(device->dma_device, mr->klms, > > + size, DMA_TO_DEVICE); > > This is a misuse of the DMA API, you must call dma_map_single after > the memory is set by the CPU, not before. > > The fast reg varient is using coherent allocations, which is OK.. It's fine as long as you dma_sync_*_for_{cpu,device} in the right places, which is what a lot of drivers do for longer held allocations. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration [not found] ` <20150723092532.GC32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> @ 2015-07-23 10:28 ` Sagi Grimberg 2015-07-23 16:04 ` Jason Gunthorpe 1 sibling, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-23 10:28 UTC (permalink / raw) To: Christoph Hellwig, Jason Gunthorpe Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/23/2015 12:25 PM, Christoph Hellwig wrote: > On Wed, Jul 22, 2015 at 11:30:48AM -0600, Jason Gunthorpe wrote: >> On Wed, Jul 22, 2015 at 09:55:40AM +0300, Sagi Grimberg wrote: >>> + size += max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0); >>> + mr->klms = kzalloc(size, GFP_KERNEL); >>> + if (!mr->klms) >>> + return -ENOMEM; >>> + >>> + mr->pl_map = dma_map_single(device->dma_device, mr->klms, >>> + size, DMA_TO_DEVICE); >> >> This is a misuse of the DMA API, you must call dma_map_single after >> the memory is set by the CPU, not before. >> >> The fast reg varient is using coherent allocations, which is OK.. > > It's fine as long as you dma_sync_*_for_{cpu,device} in the right > places, which is what a lot of drivers do for longer held allocations. OK. I'll fix that. Thanks. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration [not found] ` <20150723092532.GC32592-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 2015-07-23 10:28 ` Sagi Grimberg @ 2015-07-23 16:04 ` Jason Gunthorpe 1 sibling, 0 replies; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-23 16:04 UTC (permalink / raw) To: Christoph Hellwig Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Thu, Jul 23, 2015 at 02:25:32AM -0700, Christoph Hellwig wrote: > On Wed, Jul 22, 2015 at 11:30:48AM -0600, Jason Gunthorpe wrote: > > On Wed, Jul 22, 2015 at 09:55:40AM +0300, Sagi Grimberg wrote: > > > + size += max_t(int, MLX5_UMR_ALIGN - ARCH_KMALLOC_MINALIGN, 0); > > > + mr->klms = kzalloc(size, GFP_KERNEL); > > > + if (!mr->klms) > > > + return -ENOMEM; > > > + > > > + mr->pl_map = dma_map_single(device->dma_device, mr->klms, > > > + size, DMA_TO_DEVICE); > > > > This is a misuse of the DMA API, you must call dma_map_single after > > the memory is set by the CPU, not before. > > > > The fast reg varient is using coherent allocations, which is OK.. > > It's fine as long as you dma_sync_*_for_{cpu,device} in the right > places, which is what a lot of drivers do for longer held allocations. Right, that is the other better option. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
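For readers following the DMA API point settled above: the contract for a long-held dma_map_single() mapping is that every CPU update of the buffer is bracketed by dma_sync_single_for_cpu() before the writes and dma_sync_single_for_device() after them. A toy user-space model of that ordering (the dma_sync_* functions below are no-op stand-ins that only record call order; the real kernel functions take device, address, size and direction arguments):

```c
#include <assert.h>
#include <string.h>

/* Records the order of operations so the required bracketing can be
 * checked after the fact. */
static char log_buf[256];

static void record(const char *op)
{
	strcat(log_buf, op);
	strcat(log_buf, ";");
}

/* No-op stand-ins for the kernel streaming-DMA calls under discussion. */
static void dma_sync_single_for_cpu(void)    { record("sync_for_cpu"); }
static void dma_sync_single_for_device(void) { record("sync_for_device"); }

/* The pattern for a mapping held across many registrations: map once
 * at allocation time, then for each registration hand the buffer to
 * the CPU, fill it, and hand it back to the device. */
static void fill_page_list(unsigned long long *pl, int n)
{
	int i;

	dma_sync_single_for_cpu();	/* buffer now owned by the CPU */
	for (i = 0; i < n; i++) {
		pl[i] = 0x1000ull * i;	/* CPU writes the descriptors */
		record("cpu_write");
	}
	dma_sync_single_for_device();	/* buffer handed back to the device */
}
```

Without the second sync, a non-coherent architecture may post stale descriptor contents to the HCA, which is exactly the bug class the thread is warning about.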
* [PATCH WIP 41/43] mlx5: Add arbitrary sg list support [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (39 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 42/43] iser: Accept arbitrary sg lists mapping if the device supports it Sagi Grimberg ` (2 subsequent siblings) 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer If ib_alloc_mr is called with IB_MR_MAP_ARB_SG, the driver allocates a private klm list instead of a private page list, and sets the UMR wqe correctly when posting the fast registration. Also, expose the IB_DEVICE_MAP_ARB_SG device cap. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx5/main.c | 1 + drivers/infiniband/hw/mlx5/mr.c | 30 ++++++++++++++++++++++++++++++ drivers/infiniband/hw/mlx5/qp.c | 31 ++++++++++++++++++++++++------- 3 files changed, 55 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index a90ef7a..2402563 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -249,6 +249,7 @@ static int mlx5_ib_query_device(struct ib_device *ibdev, if (MLX5_CAP_GEN(mdev, xrc)) props->device_cap_flags |= IB_DEVICE_XRC; props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS; + props->device_cap_flags |= IB_DEVICE_MAP_ARB_SG; if (MLX5_CAP_GEN(mdev, sho)) { props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER; /* At this stage no support for signature handover */ diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 45209c7..836e717 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -1519,12 +1519,42 @@ done:
return ret; } +static int +mlx5_ib_sg_to_klms(struct mlx5_ib_mr *mr, + struct scatterlist *sgl, + unsigned short sg_nents) +{ + struct scatterlist *sg = sgl; + u32 lkey = mr->ibmr.device->local_dma_lkey; + int i; + + if (sg_nents > mr->max_descs) + return -EINVAL; + + mr->ibmr.iova = sg_dma_address(sg); + mr->ibmr.length = 0; + mr->ndescs = sg_nents; + + for (i = 0; i < sg_nents; i++) { + mr->klms[i].va = cpu_to_be64(sg_dma_address(sg)); + mr->klms[i].bcount = cpu_to_be32(sg_dma_len(sg)); + mr->klms[i].key = cpu_to_be32(lkey); + mr->ibmr.length += sg_dma_len(sg); + sg = sg_next(sg); + } + + return 0; +} + int mlx5_ib_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, unsigned short sg_nents) { struct mlx5_ib_mr *mr = to_mmr(ibmr); + if (mr->access_mode == MLX5_ACCESS_MODE_KLM) + return mlx5_ib_sg_to_klms(mr, sg, sg_nents); + return ib_sg_to_pages(sg, sg_nents, mr->max_descs, mr->pl, &mr->ndescs, &ibmr->length, &ibmr->iova); diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index f0a03aa..3fb0396 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1909,6 +1909,10 @@ static void set_fastreg_umr_seg(struct mlx5_wqe_umr_ctrl_seg *umr, { int ndescs = mr->ndescs; + if (mr->access_mode == MLX5_ACCESS_MODE_KLM) + /* KLMs take twice the size of MTTs */ + ndescs *= 2; + memset(umr, 0, sizeof(*umr)); umr->flags = MLX5_UMR_CHECK_NOT_FREE; umr->klm_octowords = get_klm_octo(ndescs); @@ -2012,15 +2016,21 @@ static void set_fastreg_mkey_seg(struct mlx5_mkey_seg *seg, { int ndescs = ALIGN(mr->ndescs, 8) >> 1; + if (mr->access_mode == MLX5_ACCESS_MODE_MTT) + seg->log2_page_size = PAGE_SHIFT; + else if (mr->access_mode == MLX5_ACCESS_MODE_KLM) + /* KLMs take twice the size of MTTs */ + ndescs *= 2; + + memset(seg, 0, sizeof(*seg)); - seg->flags = get_umr_flags(mr->ibmr.access) | MLX5_ACCESS_MODE_MTT; + seg->flags = get_umr_flags(mr->ibmr.access) | mr->access_mode; *writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | 
IB_ACCESS_REMOTE_WRITE); seg->qpn_mkey7_0 = cpu_to_be32((key & 0xff) | 0xffffff00); seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL); seg->start_addr = cpu_to_be64(mr->ibmr.iova); seg->len = cpu_to_be64(mr->ibmr.length); seg->xlt_oct_size = cpu_to_be32(ndescs); - seg->log2_page_size = PAGE_SHIFT; } static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr, @@ -2069,12 +2079,19 @@ static void set_fastreg_ds(struct mlx5_wqe_data_seg *dseg, struct mlx5_ib_pd *pd, int writ) { - u64 perm = MLX5_EN_RD | (writ ? MLX5_EN_WR : 0); - int bcount = sizeof(u64) * mr->ndescs; - int i; + int bcount; + + if (mr->access_mode == MLX5_ACCESS_MODE_MTT) { + u64 perm = MLX5_EN_RD | (writ ? MLX5_EN_WR : 0); + int i; + + bcount = sizeof(u64) * mr->ndescs; + for (i = 0; i < mr->ndescs; i++) + mr->mpl[i] = cpu_to_be64(mr->pl[i] | perm); + } else { + bcount = sizeof(struct mlx5_klm) * mr->ndescs; + } - for (i = 0; i < mr->ndescs; i++) - mr->mpl[i] = cpu_to_be64(mr->pl[i] | perm); dseg->addr = cpu_to_be64(mr->pl_map); dseg->byte_count = cpu_to_be32(ALIGN(bcount, 64)); -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 42/43] iser: Accept arbitrary sg lists mapping if the device supports it [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (40 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 41/43] mlx5: Add arbitrary sg list support Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 6:55 ` [PATCH WIP 43/43] iser: Move unaligned counter increment Sagi Grimberg 2015-07-22 17:10 ` [PATCH WIP 00/43] New fast registration API Christoph Hellwig 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer If the device supports arbitrary sg list mapping (device cap IB_DEVICE_MAP_ARB_SG set) we allocate the memory regions with IB_MR_MAP_ARB_SG and skip the bounce buffer workaround. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/ulp/iser/iser_memory.c | 4 ++++ drivers/infiniband/ulp/iser/iser_verbs.c | 20 ++++++++++++-------- 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 094cf8a..690f840 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -781,6 +781,10 @@ iser_handle_unaligned_buf(struct iscsi_iser_task *task, aligned_len = iser_data_buf_aligned_len(mem, device->ib_device, iser_conn->scsi_sg_tablesize); if (aligned_len != mem->dma_nents) { + if (device->dev_attr.device_cap_flags & IB_DEVICE_MAP_ARB_SG) + /* Arbitrary sg support, no need to bounce :) */ + return 0; + err = fall_to_bounce_buf(task, mem, dir); if (err) return err; diff --git a/drivers/infiniband/ulp/iser/iser_verbs.c b/drivers/infiniband/ulp/iser/iser_verbs.c index 332f784..978e283 100644 --- a/drivers/infiniband/ulp/iser/iser_verbs.c +++ b/drivers/infiniband/ulp/iser/iser_verbs.c @@ -281,14 +281,18 @@ void
iser_free_fmr_pool(struct ib_conn *ib_conn) } static int -iser_alloc_reg_res(struct ib_device *ib_device, +iser_alloc_reg_res(struct iser_device *device, struct ib_pd *pd, struct iser_reg_resources *res, unsigned int size) { int ret; + int flags = 0; - res->mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, size, 0); + if (device->dev_attr.device_cap_flags & IB_DEVICE_MAP_ARB_SG) + flags = IB_MR_MAP_ARB_SG; + + res->mr = ib_alloc_mr(pd, IB_MR_TYPE_FAST_REG, size, flags); if (IS_ERR(res->mr)) { ret = PTR_ERR(res->mr); iser_err("Failed to allocate ib_fast_reg_mr err=%d\n", ret); @@ -306,7 +310,7 @@ iser_free_reg_res(struct iser_reg_resources *rsc) } static int -iser_alloc_pi_ctx(struct ib_device *ib_device, +iser_alloc_pi_ctx(struct iser_device *device, struct ib_pd *pd, struct iser_fr_desc *desc, unsigned int size) @@ -320,7 +324,7 @@ iser_alloc_pi_ctx(struct ib_device *ib_device, pi_ctx = desc->pi_ctx; - ret = iser_alloc_reg_res(ib_device, pd, &pi_ctx->rsc, size); + ret = iser_alloc_reg_res(device, pd, &pi_ctx->rsc, size); if (ret) { iser_err("failed to allocate reg_resources\n"); goto alloc_reg_res_err; @@ -353,7 +357,7 @@ iser_free_pi_ctx(struct iser_pi_context *pi_ctx) } static struct iser_fr_desc * -iser_create_fastreg_desc(struct ib_device *ib_device, +iser_create_fastreg_desc(struct iser_device *device, struct ib_pd *pd, bool pi_enable, unsigned int size) @@ -365,12 +369,12 @@ iser_create_fastreg_desc(struct ib_device *ib_device, if (!desc) return ERR_PTR(-ENOMEM); - ret = iser_alloc_reg_res(ib_device, pd, &desc->rsc, size); + ret = iser_alloc_reg_res(device, pd, &desc->rsc, size); if (ret) goto reg_res_alloc_failure; if (pi_enable) { - ret = iser_alloc_pi_ctx(ib_device, pd, desc, size); + ret = iser_alloc_pi_ctx(device, pd, desc, size); if (ret) goto pi_ctx_alloc_failure; } @@ -403,7 +407,7 @@ int iser_alloc_fastreg_pool(struct ib_conn *ib_conn, spin_lock_init(&fr_pool->lock); fr_pool->size = 0; for (i = 0; i < cmds_max; i++) { - desc = 
iser_create_fastreg_desc(device->ib_device, device->pd, + desc = iser_create_fastreg_desc(device, device->pd, ib_conn->pi_support, size); if (IS_ERR(desc)) { ret = PTR_ERR(desc); -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* [PATCH WIP 43/43] iser: Move unaligned counter increment [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (41 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 42/43] iser: Accept arbitrary sg lists mapping if the device supports it Sagi Grimberg @ 2015-07-22 6:55 ` Sagi Grimberg 2015-07-22 17:10 ` [PATCH WIP 00/43] New fast registration API Christoph Hellwig 43 siblings, 0 replies; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 6:55 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Liran Liss, Oren Duer We don't always use bounce buffers, still we update this counter. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/ulp/iser/iser_memory.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/infiniband/ulp/iser/iser_memory.c b/drivers/infiniband/ulp/iser/iser_memory.c index 690f840..4d3dc1c 100644 --- a/drivers/infiniband/ulp/iser/iser_memory.c +++ b/drivers/infiniband/ulp/iser/iser_memory.c @@ -487,11 +487,8 @@ static int fall_to_bounce_buf(struct iscsi_iser_task *iser_task, struct iser_data_buf *mem, enum iser_data_dir cmd_dir) { - struct iscsi_conn *iscsi_conn = iser_task->iser_conn->iscsi_conn; struct iser_device *device = iser_task->iser_conn->ib_conn.device; - iscsi_conn->fmr_unalign_cnt++; - if (iser_debug_level > 0) iser_data_buf_dump(mem, device->ib_device); @@ -781,6 +778,7 @@ iser_handle_unaligned_buf(struct iscsi_iser_task *task, aligned_len = iser_data_buf_aligned_len(mem, device->ib_device, iser_conn->scsi_sg_tablesize); if (aligned_len != mem->dma_nents) { + iser_conn->iscsi_conn->fmr_unalign_cnt++; if (device->dev_attr.device_cap_flags & IB_DEVICE_MAP_ARB_SG) /* Arbitrary sg support, no need to bounce :) */ return 0; -- 1.8.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info 
at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 00/43] New fast registration API [not found] ` <1437548143-24893-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (42 preceding siblings ...) 2015-07-22 6:55 ` [PATCH WIP 43/43] iser: Move unaligned counter increment Sagi Grimberg @ 2015-07-22 17:10 ` Christoph Hellwig [not found] ` <20150722171023.GA18934-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 43 siblings, 1 reply; 142+ messages in thread From: Christoph Hellwig @ 2015-07-22 17:10 UTC (permalink / raw) To: Sagi Grimberg; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer Thanks Sagi, this looks pretty good in general, various nitpicks notwithstanding. The one thing I'm curious about is how we can support SRP with its multiple MR support without too much boilerplate code. One option would be to pass an array of MRs to the map routines, and while most callers would just pass in one it would handle multiple for those drivers that supply them.
* Re: [PATCH WIP 00/43] New fast registration API [not found] ` <20150722171023.GA18934-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> @ 2015-07-22 17:27 ` Jason Gunthorpe [not found] ` <20150722172702.GE26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 2015-07-22 17:42 ` Sagi Grimberg 1 sibling, 1 reply; 142+ messages in thread From: Jason Gunthorpe @ 2015-07-22 17:27 UTC (permalink / raw) To: Christoph Hellwig Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 10:10:23AM -0700, Christoph Hellwig wrote: > The one thing I'm curious about is how we can support SRP with it's > multiple MR support without too much boilerplate code. One option > would be that pass an array of MRs to the map routines, and while > most callers would just pass in one it would handle multiple for those > drivers that supply them. What is SRP trying to accomplish with that? The only reason that springs to mind is to emulate IB_MR_MAP_ARB_SG ? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 00/43] New fast registration API [not found] ` <20150722172702.GE26909-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2015-07-23 9:26 ` Christoph Hellwig 0 siblings, 0 replies; 142+ messages in thread From: Christoph Hellwig @ 2015-07-23 9:26 UTC (permalink / raw) To: Jason Gunthorpe Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On Wed, Jul 22, 2015 at 11:27:02AM -0600, Jason Gunthorpe wrote: > What is SRP trying to accomplish with that? > > The only reason that springs to mind is to emulate IB_MR_MAP_ARB_SG ? It's not emulating IB_MR_MAP_ARB_SG, it simply allows multiple memory registrations per I/O request. Be that to support gappy SGLs in a generic way, or to allow larger I/O sizes than the HCA MR size.
* Re: [PATCH WIP 00/43] New fast registration API [not found] ` <20150722171023.GA18934-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> 2015-07-22 17:27 ` Jason Gunthorpe @ 2015-07-22 17:42 ` Sagi Grimberg [not found] ` <55AFD608.401-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 1 sibling, 1 reply; 142+ messages in thread From: Sagi Grimberg @ 2015-07-22 17:42 UTC (permalink / raw) To: Christoph Hellwig, Sagi Grimberg Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer On 7/22/2015 8:10 PM, Christoph Hellwig wrote: > Thanks Sagi, > > this looks pretty good in general, various nitpicks nonwithstanding. > > The one thing I'm curious about is how we can support SRP with it's > multiple MR support without too much boilerplate code. One option > would be that pass an array of MRs to the map routines, and while > most callers would just pass in one it would handle multiple for those > drivers that supply them. We can do that, but I'd prefer not to pollute the API just for this single use case. What we can do, is add a pool API that would take care of that. But even then we might end up with different strategies as not all ULPs can use it the same way (protocol constraints)... Today SRP has this logic that registers multiple SG aligned partials. We can just have it pass a partial SG list to what we have today instead of building the page vectors... Or if we can come up with something that will keep the API trivial, we can take care of that too. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 142+ messages in thread
* Re: [PATCH WIP 00/43] New fast registration API
  2015-07-23  9:28 ` Christoph Hellwig
  0 siblings, 1 reply; 142+ messages in thread

From: Christoph Hellwig @ 2015-07-23 9:28 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On Wed, Jul 22, 2015 at 08:42:32PM +0300, Sagi Grimberg wrote:
> We can do that, but I'd prefer not to pollute the API just for this
> single use case. What we can do, is add a pool API that would take care
> of that. But even then we might end up with different strategies as not
> all ULPs can use it the same way (protocol constraints)...
>
> Today SRP has this logic that registers multiple SG aligned partials.
> We can just have it pass a partial SG list to what we have today instead
> of building the page vectors...
>
> Or if we can come up with something that will keep the API trivial, we
> can take care of that too.

Supporting an array or list of MRs seems pretty easy. If you ignore the
weird fallback to the physical DMA case when an MR fails, the SRP memory
registration code isn't significantly more complex than that in iSER,
for example. And I think NFS needs the same support as well, as it
allows using additional MRs when detecting a gap.
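The "additional MRs when detecting a gap" behaviour Christoph attributes to NFS can be illustrated with a small counting sketch. Hedged and hypothetical: `count_mrs_needed` and its gap rule (a boundary between SG entries where either side is not page-aligned) are simplifications in Python, not the xprtrdma code:

```python
PAGE = 4096  # assumed page size for the sketch

def count_mrs_needed(sg):
    """Walk an SG list (list of (addr, length) entries) and count how
    many MRs a gap-splitting strategy would consume: each interior
    boundary that is not page-aligned on both sides starts a new MR,
    because one FRWR page list can only express whole interior pages."""
    used = 1
    prev_end = None
    for addr, length in sg:
        if prev_end is not None and (prev_end % PAGE or addr % PAGE):
            used += 1  # gap detected: switch to an additional MR
        prev_end = addr + length
    return used
```

Note that physically discontiguous but page-aligned chunks still fit in one MR; only unaligned interior boundaries cost an extra registration.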
* Re: [PATCH WIP 00/43] New fast registration API
  2015-07-23 10:34 ` Sagi Grimberg
  0 siblings, 0 replies; 142+ messages in thread

From: Sagi Grimberg @ 2015-07-23 10:34 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

On 7/23/2015 12:28 PM, Christoph Hellwig wrote:
> On Wed, Jul 22, 2015 at 08:42:32PM +0300, Sagi Grimberg wrote:
>> We can do that, but I'd prefer not to pollute the API just for this
>> single use case. What we can do, is add a pool API that would take care
>> of that. But even then we might end up with different strategies as not
>> all ULPs can use it the same way (protocol constraints)...
>>
>> Today SRP has this logic that registers multiple SG aligned partials.
>> We can just have it pass a partial SG list to what we have today instead
>> of building the page vectors...
>>
>> Or if we can come up with something that will keep the API trivial, we
>> can take care of that too.
>
> Supporting an array or list of MRs seems pretty easy.

I'm missing the simplicity here...

> If you ignore the weird fallback to the physical DMA case when an MR
> fails, the SRP memory registration code isn't significantly more
> complex than that in iSER, for example. And I think NFS needs the same
> support as well, as it allows using additional MRs when detecting a
> gap.

This changes the semantics a bit: with it, we need to return how many
MRs were used for the registration. It will also make things a bit
sloppy, as the actual mapping is driven from the drivers (which use
their internal buffers). Don't you think a separate pool API is better
for addressing this?
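The "separate pool API" Sagi keeps coming back to could be as simple as a ULP-side free list of pre-allocated MRs, so the core map routine stays single-MR. A minimal Python sketch under that assumption; `MrPool` and its methods are hypothetical names, not a proposed kernel interface:

```python
class MrPool:
    """ULP-side pool of pre-allocated MRs: callers take as many as a
    given I/O needs and return them on completion, instead of the map
    routine itself growing an array-of-MRs parameter."""

    def __init__(self, size, alloc_fn):
        # alloc_fn stands in for the real per-MR allocation call
        self._free = [alloc_fn() for _ in range(size)]

    def get(self):
        """Take one MR, or None if the pool is exhausted (the caller
        then has to wait, fall back, or fail the request)."""
        return self._free.pop() if self._free else None

    def put(self, mr):
        """Return an MR to the pool once its registration is invalidated."""
        self._free.append(mr)
```

The design point: exhaustion policy and multi-MR bookkeeping stay in the ULP, which is exactly where the protocol constraints Sagi mentions differ.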
* Re: [PATCH WIP 38/43] iser-target: Port to new memory registration API
  2015-07-23  9:22 ` Christoph Hellwig
  0 siblings, 0 replies; 142+ messages in thread

From: Christoph Hellwig @ 2015-07-23 9:22 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Sagi Grimberg, Christoph Hellwig, Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Liran Liss, Oren Duer

> If you want to micro-optimize, then just zero the few items that are
> defined to be accessed for fastreg; no need to zero the whole
> structure. In fact, you may have already done that, so just drop the
> memset entirely.

Oh, indeed.

> If you want to optimize this path, then Sean is right: move the post
> into the driver and stop pretending that ib_post_send is a performance
> API.
>
> ib_post_fastreg_wr would be a function that takes 3 register-passed
> arguments and does a simple copy to the driver's actual sendq.

Now that sounds even better.
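Jason's `ib_post_fastreg_wr` idea, a dedicated entry point that skips building (and zeroing) a full `ib_send_wr`, can be modelled in a few lines. A Python simulation of the shape only; the parameter names, the dict-based queue, and the WQE layout are all invented for illustration, since no such function exists yet:

```python
def ib_post_fastreg_wr(qp, rkey, iova, access):
    """Hypothetical fast-registration post: the three registration
    parameters are copied straight into the driver's send queue entry,
    with no general-purpose work-request structure built (and hence no
    memset of one) on the way."""
    qp["sendq"].append({
        "op": "FAST_REG",   # fixed opcode: the helper only posts fastreg
        "rkey": rkey,
        "iova": iova,
        "access": access,
    })
    return 0  # 0 on success, mirroring kernel posting conventions
```

The contrast with `ib_post_send` is that the generic path must interpret an opcode and copy a large union; a dedicated helper knows the opcode statically and touches only the fields fastreg needs.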
end of thread, other threads: [~2015-08-21 18:08 UTC | newest]

Thread overview: 142+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-22  6:55 [PATCH WIP 00/43] New fast registration API Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 01/43] IB: Modify ib_create_mr API Sagi Grimberg
2015-07-22 16:34 ` Jason Gunthorpe
2015-07-22 16:44 ` Christoph Hellwig
2015-07-22 16:58 ` Sagi Grimberg
2015-07-22 19:05 ` Jason Gunthorpe
2015-07-23 10:07 ` Sagi Grimberg
2015-07-23 19:08 ` Jason Gunthorpe
2015-07-26  8:51 ` Sagi Grimberg
2015-07-22 16:59 ` Sagi Grimberg
2015-07-22 17:01 ` Jason Gunthorpe
2015-07-22 17:03 ` Sagi Grimberg
2015-07-23  0:57 ` Hefty, Sean
2015-07-23  9:30 ` Christoph Hellwig
2015-07-23 10:09 ` Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 02/43] IB/mlx4: Support ib_alloc_mr verb Sagi Grimberg
2015-07-22 16:58 ` Jason Gunthorpe
2015-07-22 17:22 ` Sagi Grimberg
2015-07-22 18:50 ` Steve Wise
2015-07-22 18:54 ` Jason Gunthorpe
2015-07-23 10:10 ` Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 03/43] ocrdma: Support ib_alloc_mr verb Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 04/43] iw_cxgb4: Support ib_alloc_mr verb Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 05/43] cxgb3: Support ib_alloc_mr verb Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 06/43] nes: Support ib_alloc_mr verb Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 07/43] qib: Support ib_alloc_mr verb Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 08/43] IB/iser: Convert to ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 09/43] iser-target: Convert to ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 10/43] IB/srp: Convert to ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 11/43] xprtrdma, svcrdma: Convert to ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 12/43] RDS: Convert to ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 13/43] mlx5: Drop mlx5_ib_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 14/43] mlx4: Drop mlx4_ib_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 15/43] ocrdma: Drop ocrdma_alloc_frmr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 16/43] qib: Drop qib_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 17/43] nes: Drop nes_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 18/43] cxgb4: Drop c4iw_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 19/43] cxgb3: Drop iwch_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 20/43] IB/core: Drop ib_alloc_fast_reg_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 21/43] mlx5: Allocate a private page list in ib_alloc_mr Sagi Grimberg
2015-07-22 16:46 ` Christoph Hellwig
2015-07-22 16:51 ` Sagi Grimberg
2015-07-28 10:57 ` Haggai Eran
2015-07-30  8:08 ` Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 22/43] mlx4: Allocate a private page list in ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 23/43] ocrdma: Allocate a private page list in ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 24/43] cxgb3: Allocate a private page list in ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 25/43] cxgb4: Allocate a private page list in ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 26/43] qib: Allocate a private page list in ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 27/43] nes: Allocate a private page list in ib_alloc_mr Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 28/43] IB/core: Introduce new fast registration API Sagi Grimberg
2015-07-22 16:50 ` Christoph Hellwig
2015-07-22 16:56 ` Sagi Grimberg
2015-07-22 17:44 ` Jason Gunthorpe
2015-07-23  9:19 ` Christoph Hellwig
2015-07-23 16:03 ` Jason Gunthorpe
2015-07-23 10:15 ` Sagi Grimberg
2015-07-23 17:55 ` Jason Gunthorpe
2015-07-26  9:37 ` Sagi Grimberg
2015-07-27 17:04 ` Jason Gunthorpe
2015-07-30  7:13 ` Sagi Grimberg
2015-07-30 16:36 ` Jason Gunthorpe
2015-07-30 16:39 ` Christoph Hellwig
2015-08-19 11:56 ` Sagi Grimberg
2015-08-19 12:52 ` Christoph Hellwig
2015-08-19 16:09 ` Sagi Grimberg
2015-08-19 16:58 ` Christoph Hellwig
2015-08-19 17:37 ` Jason Gunthorpe
2015-08-20 10:05 ` Sagi Grimberg
2015-08-20 19:04 ` Jason Gunthorpe
2015-08-21  6:34 ` Christoph Hellwig
2015-08-21 18:08 ` Jason Gunthorpe
2015-07-23 18:42 ` Jason Gunthorpe
2015-07-26  8:54 ` Sagi Grimberg
2015-07-22 18:02 ` Jason Gunthorpe
2015-07-23 10:19 ` Sagi Grimberg
2015-07-23 16:14 ` Jason Gunthorpe
2015-07-23 16:47 ` Sagi Grimberg
2015-07-23 18:51 ` Jason Gunthorpe
2015-07-26  9:45 ` Sagi Grimberg
2015-07-27 17:14 ` Jason Gunthorpe
2015-07-27 20:11 ` Steve Wise
2015-07-27 20:29 ` Jason Gunthorpe
2015-07-28 11:20 ` Haggai Eran
2015-07-22  6:55 ` [PATCH WIP 29/43] mlx5: Support the new memory registration API Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 30/43] mlx4: Support the new memory registration API Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 31/43] ocrdma: Support the new memory registration API Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 32/43] cxgb3: Support the new memory registration API Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 33/43] cxgb4: Support the new memory registration API Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 34/43] nes: Support the new memory registration API Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 35/43] qib: Support the new memory registration API Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 36/43] iser: Port to new fast registration api Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 37/43] xprtrdma: Port to new memory registration API Sagi Grimberg
2015-07-22 15:03 ` Chuck Lever
2015-07-22 15:41 ` Sagi Grimberg
2015-07-22 16:04 ` Chuck Lever
2015-07-23 10:42 ` Sagi Grimberg
2015-07-22 16:59 ` Christoph Hellwig
2015-07-22 19:21 ` Steve Wise
2015-07-23 10:20 ` Sagi Grimberg
2015-07-23 13:46 ` Steve Wise
2015-07-22  6:55 ` [PATCH WIP 38/43] iser-target: Port to new memory registration API Sagi Grimberg
2015-07-22 17:04 ` Christoph Hellwig
2015-07-22 17:33 ` Sagi Grimberg
2015-07-22 17:57 ` Jason Gunthorpe
2015-07-23 10:27 ` Sagi Grimberg
2015-07-23 13:35 ` Chuck Lever
2015-07-23 16:31 ` Jason Gunthorpe
2015-07-23 16:59 ` Sagi Grimberg
2015-07-23 18:53 ` Jason Gunthorpe
2015-07-24 14:36 ` Chuck Lever
2015-07-24 16:26 ` Jason Gunthorpe
2015-07-24 16:34 ` Steve Wise
2015-07-24 17:46 ` Chuck Lever
2015-07-24 19:10 ` Jason Gunthorpe
2015-07-24 19:59 ` Chuck Lever
2015-07-24 20:24 ` Jason Gunthorpe
2015-07-24 22:13 ` Steve Wise
2015-07-24 22:44 ` Jason Gunthorpe
2015-07-27 15:57 ` Chuck Lever
2015-07-27 17:25 ` Jason Gunthorpe
2015-07-28 20:06 ` Chuck Lever
2015-07-29  6:32 ` Christoph Hellwig
2015-07-22  6:55 ` [PATCH WIP 39/43] IB/core: Add arbitrary sg_list support Sagi Grimberg
2015-07-22 17:05 ` Christoph Hellwig
2015-07-22 17:22 ` Jason Gunthorpe
2015-07-22 17:29 ` Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 40/43] mlx5: Allocate private context for arbitrary scatterlist registration Sagi Grimberg
2015-07-22 17:30 ` Jason Gunthorpe
2015-07-23  9:25 ` Christoph Hellwig
2015-07-23 10:28 ` Sagi Grimberg
2015-07-23 16:04 ` Jason Gunthorpe
2015-07-22  6:55 ` [PATCH WIP 41/43] mlx5: Add arbitrary sg list support Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 42/43] iser: Accept arbitrary sg lists mapping if the device supports it Sagi Grimberg
2015-07-22  6:55 ` [PATCH WIP 43/43] iser: Move unaligned counter increment Sagi Grimberg
2015-07-22 17:10 ` [PATCH WIP 00/43] New fast registration API Christoph Hellwig
2015-07-22 17:27 ` Jason Gunthorpe
2015-07-23  9:26 ` Christoph Hellwig
2015-07-22 17:42 ` Sagi Grimberg
2015-07-23  9:28 ` Christoph Hellwig
2015-07-23 10:34 ` Sagi Grimberg
2015-07-23  9:22 [PATCH WIP 38/43] iser-target: Port to new memory registration API Christoph Hellwig