* [PATCH v1 0/4] NFS/RDMA client for v4.21 (part 1)
@ 2018-11-19 15:45 Chuck Lever
  2018-11-19 15:45 ` [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration Chuck Lever
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Chuck Lever @ 2018-11-19 15:45 UTC (permalink / raw)
To: linux-rdma; +Cc: linux-nfs

Hi-

These four patches change details of how the Linux RPC-over-RDMA client
implementation uses RDMA, so I'd like some scrutiny from the good people
on linux-rdma. I've got a dozen or so additional patches for v4.21
dealing with the higher layers, which I will post next week. I will aim
those at linux-nfs.

---

Chuck Lever (4):
      xprtrdma: Remove support for FMR memory registration
      xprtrdma: mrs_create off-by-one
      xprtrdma: Reduce max_frwr_depth
      xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)

 net/sunrpc/xprtrdma/Makefile    |    3 
 net/sunrpc/xprtrdma/fmr_ops.c   |  337 ---------------------------------------
 net/sunrpc/xprtrdma/frwr_ops.c  |   20 ++
 net/sunrpc/xprtrdma/rpc_rdma.c  |    6 -
 net/sunrpc/xprtrdma/verbs.c     |    6 -
 net/sunrpc/xprtrdma/xprt_rdma.h |    2 
 6 files changed, 19 insertions(+), 355 deletions(-)
 delete mode 100644 net/sunrpc/xprtrdma/fmr_ops.c

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread
* [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration 2018-11-19 15:45 [PATCH v1 0/4] NFS/RDMA client for v4.21 (part 1) Chuck Lever @ 2018-11-19 15:45 ` Chuck Lever 2018-11-19 16:16 ` Bart Van Assche 2018-11-19 15:45 ` [PATCH v1 2/4] xprtrdma: mrs_create off-by-one Chuck Lever ` (2 subsequent siblings) 3 siblings, 1 reply; 31+ messages in thread From: Chuck Lever @ 2018-11-19 15:45 UTC (permalink / raw) To: linux-rdma; +Cc: linux-nfs FMR is not supported on most recent RDMA devices. It is also slower and less secure than FRWR. As discussed during the RDMA BoF at LPC 2018, it is time to remove support for FMR in the NFS/RDMA client stack. NFS/RDMA server-side uses either local memory registration or FRWR. There is no change required there to deprecate FMR. There are a few Infiniband/RoCE devices in the kernel tree that do not support MEM_MGT_EXTENSIONS, and therefore will no longer support client-side NFS/RDMA. These are: - mthca - qib - usnic - hns (RoCE) Signed-off-by: Chuck Lever <chuck.lever@oracle.com> --- net/sunrpc/xprtrdma/Makefile | 3 net/sunrpc/xprtrdma/fmr_ops.c | 337 ----------------------------------------- net/sunrpc/xprtrdma/verbs.c | 6 - 3 files changed, 1 insertion(+), 345 deletions(-) delete mode 100644 net/sunrpc/xprtrdma/fmr_ops.c diff --git a/net/sunrpc/xprtrdma/Makefile b/net/sunrpc/xprtrdma/Makefile index 8bf19e1..8ed0377 100644 --- a/net/sunrpc/xprtrdma/Makefile +++ b/net/sunrpc/xprtrdma/Makefile @@ -1,8 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_SUNRPC_XPRT_RDMA) += rpcrdma.o -rpcrdma-y := transport.o rpc_rdma.o verbs.o \ - fmr_ops.o frwr_ops.o \ +rpcrdma-y := transport.o rpc_rdma.o verbs.o frwr_ops.o \ svc_rdma.o svc_rdma_backchannel.o svc_rdma_transport.o \ svc_rdma_sendto.o svc_rdma_recvfrom.o svc_rdma_rw.o \ module.o diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c deleted file mode 100644 index 7f5632c..0000000 --- a/net/sunrpc/xprtrdma/fmr_ops.c +++ /dev/null @@ -1,337 +0,0 @@ -// 
SPDX-License-Identifier: GPL-2.0 -/* - * Copyright (c) 2015, 2017 Oracle. All rights reserved. - * Copyright (c) 2003-2007 Network Appliance, Inc. All rights reserved. - */ - -/* Lightweight memory registration using Fast Memory Regions (FMR). - * Referred to sometimes as MTHCAFMR mode. - * - * FMR uses synchronous memory registration and deregistration. - * FMR registration is known to be fast, but FMR deregistration - * can take tens of usecs to complete. - */ - -/* Normal operation - * - * A Memory Region is prepared for RDMA READ or WRITE using the - * ib_map_phys_fmr verb (fmr_op_map). When the RDMA operation is - * finished, the Memory Region is unmapped using the ib_unmap_fmr - * verb (fmr_op_unmap). - */ - -#include <linux/sunrpc/svc_rdma.h> - -#include "xprt_rdma.h" -#include <trace/events/rpcrdma.h> - -#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) -# define RPCDBG_FACILITY RPCDBG_TRANS -#endif - -/* Maximum scatter/gather per FMR */ -#define RPCRDMA_MAX_FMR_SGES (64) - -/* Access mode of externally registered pages */ -enum { - RPCRDMA_FMR_ACCESS_FLAGS = IB_ACCESS_REMOTE_WRITE | - IB_ACCESS_REMOTE_READ, -}; - -bool -fmr_is_supported(struct rpcrdma_ia *ia) -{ - if (!ia->ri_device->alloc_fmr) { - pr_info("rpcrdma: 'fmr' mode is not supported by device %s\n", - ia->ri_device->name); - return false; - } - return true; -} - -static void -__fmr_unmap(struct rpcrdma_mr *mr) -{ - LIST_HEAD(l); - int rc; - - list_add(&mr->fmr.fm_mr->list, &l); - rc = ib_unmap_fmr(&l); - list_del(&mr->fmr.fm_mr->list); - if (rc) - pr_err("rpcrdma: final ib_unmap_fmr for %p failed %i\n", - mr, rc); -} - -/* Release an MR. 
- */ -static void -fmr_op_release_mr(struct rpcrdma_mr *mr) -{ - int rc; - - kfree(mr->fmr.fm_physaddrs); - kfree(mr->mr_sg); - - /* In case this one was left mapped, try to unmap it - * to prevent dealloc_fmr from failing with EBUSY - */ - __fmr_unmap(mr); - - rc = ib_dealloc_fmr(mr->fmr.fm_mr); - if (rc) - pr_err("rpcrdma: final ib_dealloc_fmr for %p returned %i\n", - mr, rc); - - kfree(mr); -} - -/* MRs are dynamically allocated, so simply clean up and release the MR. - * A replacement MR will subsequently be allocated on demand. - */ -static void -fmr_mr_recycle_worker(struct work_struct *work) -{ - struct rpcrdma_mr *mr = container_of(work, struct rpcrdma_mr, mr_recycle); - struct rpcrdma_xprt *r_xprt = mr->mr_xprt; - - trace_xprtrdma_mr_recycle(mr); - - trace_xprtrdma_mr_unmap(mr); - ib_dma_unmap_sg(r_xprt->rx_ia.ri_device, - mr->mr_sg, mr->mr_nents, mr->mr_dir); - - spin_lock(&r_xprt->rx_buf.rb_mrlock); - list_del(&mr->mr_all); - r_xprt->rx_stats.mrs_recycled++; - spin_unlock(&r_xprt->rx_buf.rb_mrlock); - fmr_op_release_mr(mr); -} - -static int -fmr_op_init_mr(struct rpcrdma_ia *ia, struct rpcrdma_mr *mr) -{ - static struct ib_fmr_attr fmr_attr = { - .max_pages = RPCRDMA_MAX_FMR_SGES, - .max_maps = 1, - .page_shift = PAGE_SHIFT - }; - - mr->fmr.fm_physaddrs = kcalloc(RPCRDMA_MAX_FMR_SGES, - sizeof(u64), GFP_KERNEL); - if (!mr->fmr.fm_physaddrs) - goto out_free; - - mr->mr_sg = kcalloc(RPCRDMA_MAX_FMR_SGES, - sizeof(*mr->mr_sg), GFP_KERNEL); - if (!mr->mr_sg) - goto out_free; - - sg_init_table(mr->mr_sg, RPCRDMA_MAX_FMR_SGES); - - mr->fmr.fm_mr = ib_alloc_fmr(ia->ri_pd, RPCRDMA_FMR_ACCESS_FLAGS, - &fmr_attr); - if (IS_ERR(mr->fmr.fm_mr)) - goto out_fmr_err; - - INIT_LIST_HEAD(&mr->mr_list); - INIT_WORK(&mr->mr_recycle, fmr_mr_recycle_worker); - return 0; - -out_fmr_err: - dprintk("RPC: %s: ib_alloc_fmr returned %ld\n", __func__, - PTR_ERR(mr->fmr.fm_mr)); - -out_free: - kfree(mr->mr_sg); - kfree(mr->fmr.fm_physaddrs); - return -ENOMEM; -} - -/* On success, 
sets: - * ep->rep_attr.cap.max_send_wr - * ep->rep_attr.cap.max_recv_wr - * cdata->max_requests - * ia->ri_max_segs - */ -static int -fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep, - struct rpcrdma_create_data_internal *cdata) -{ - int max_qp_wr; - - max_qp_wr = ia->ri_device->attrs.max_qp_wr; - max_qp_wr -= RPCRDMA_BACKWARD_WRS; - max_qp_wr -= 1; - if (max_qp_wr < RPCRDMA_MIN_SLOT_TABLE) - return -ENOMEM; - if (cdata->max_requests > max_qp_wr) - cdata->max_requests = max_qp_wr; - ep->rep_attr.cap.max_send_wr = cdata->max_requests; - ep->rep_attr.cap.max_send_wr += RPCRDMA_BACKWARD_WRS; - ep->rep_attr.cap.max_send_wr += 1; /* for ib_drain_sq */ - ep->rep_attr.cap.max_recv_wr = cdata->max_requests; - ep->rep_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS; - ep->rep_attr.cap.max_recv_wr += 1; /* for ib_drain_rq */ - - ia->ri_max_segs = max_t(unsigned int, 1, RPCRDMA_MAX_DATA_SEGS / - RPCRDMA_MAX_FMR_SGES); - ia->ri_max_segs += 2; /* segments for head and tail buffers */ - return 0; -} - -/* FMR mode conveys up to 64 pages of payload per chunk segment. - */ -static size_t -fmr_op_maxpages(struct rpcrdma_xprt *r_xprt) -{ - return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS, - RPCRDMA_MAX_HDR_SEGS * RPCRDMA_MAX_FMR_SGES); -} - -/* Use the ib_map_phys_fmr() verb to register a memory region - * for remote access via RDMA READ or RDMA WRITE. 
- */ -static struct rpcrdma_mr_seg * -fmr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, - int nsegs, bool writing, struct rpcrdma_mr **out) -{ - struct rpcrdma_mr_seg *seg1 = seg; - int len, pageoff, i, rc; - struct rpcrdma_mr *mr; - u64 *dma_pages; - - mr = rpcrdma_mr_get(r_xprt); - if (!mr) - return ERR_PTR(-EAGAIN); - - pageoff = offset_in_page(seg1->mr_offset); - seg1->mr_offset -= pageoff; /* start of page */ - seg1->mr_len += pageoff; - len = -pageoff; - if (nsegs > RPCRDMA_MAX_FMR_SGES) - nsegs = RPCRDMA_MAX_FMR_SGES; - for (i = 0; i < nsegs;) { - if (seg->mr_page) - sg_set_page(&mr->mr_sg[i], - seg->mr_page, - seg->mr_len, - offset_in_page(seg->mr_offset)); - else - sg_set_buf(&mr->mr_sg[i], seg->mr_offset, - seg->mr_len); - len += seg->mr_len; - ++seg; - ++i; - /* Check for holes */ - if ((i < nsegs && offset_in_page(seg->mr_offset)) || - offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len)) - break; - } - mr->mr_dir = rpcrdma_data_dir(writing); - - mr->mr_nents = ib_dma_map_sg(r_xprt->rx_ia.ri_device, - mr->mr_sg, i, mr->mr_dir); - if (!mr->mr_nents) - goto out_dmamap_err; - trace_xprtrdma_mr_map(mr); - - for (i = 0, dma_pages = mr->fmr.fm_physaddrs; i < mr->mr_nents; i++) - dma_pages[i] = sg_dma_address(&mr->mr_sg[i]); - rc = ib_map_phys_fmr(mr->fmr.fm_mr, dma_pages, mr->mr_nents, - dma_pages[0]); - if (rc) - goto out_maperr; - - mr->mr_handle = mr->fmr.fm_mr->rkey; - mr->mr_length = len; - mr->mr_offset = dma_pages[0] + pageoff; - - *out = mr; - return seg; - -out_dmamap_err: - pr_err("rpcrdma: failed to DMA map sg %p sg_nents %d\n", - mr->mr_sg, i); - rpcrdma_mr_put(mr); - return ERR_PTR(-EIO); - -out_maperr: - pr_err("rpcrdma: ib_map_phys_fmr %u@0x%llx+%i (%d) status %i\n", - len, (unsigned long long)dma_pages[0], - pageoff, mr->mr_nents, rc); - rpcrdma_mr_unmap_and_put(mr); - return ERR_PTR(-EIO); -} - -/* Post Send WR containing the RPC Call message. 
- */ -static int -fmr_op_send(struct rpcrdma_ia *ia, struct rpcrdma_req *req) -{ - return ib_post_send(ia->ri_id->qp, &req->rl_sendctx->sc_wr, NULL); -} - -/* Invalidate all memory regions that were registered for "req". - * - * Sleeps until it is safe for the host CPU to access the - * previously mapped memory regions. - * - * Caller ensures that @mrs is not empty before the call. This - * function empties the list. - */ -static void -fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct list_head *mrs) -{ - struct rpcrdma_mr *mr; - LIST_HEAD(unmap_list); - int rc; - - /* ORDER: Invalidate all of the req's MRs first - * - * ib_unmap_fmr() is slow, so use a single call instead - * of one call per mapped FMR. - */ - list_for_each_entry(mr, mrs, mr_list) { - dprintk("RPC: %s: unmapping fmr %p\n", - __func__, &mr->fmr); - trace_xprtrdma_mr_localinv(mr); - list_add_tail(&mr->fmr.fm_mr->list, &unmap_list); - } - r_xprt->rx_stats.local_inv_needed++; - rc = ib_unmap_fmr(&unmap_list); - if (rc) - goto out_release; - - /* ORDER: Now DMA unmap all of the req's MRs, and return - * them to the free MW list. 
- */ - while (!list_empty(mrs)) { - mr = rpcrdma_mr_pop(mrs); - list_del(&mr->fmr.fm_mr->list); - rpcrdma_mr_unmap_and_put(mr); - } - - return; - -out_release: - pr_err("rpcrdma: ib_unmap_fmr failed (%i)\n", rc); - - while (!list_empty(mrs)) { - mr = rpcrdma_mr_pop(mrs); - list_del(&mr->fmr.fm_mr->list); - rpcrdma_mr_recycle(mr); - } -} - -const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = { - .ro_map = fmr_op_map, - .ro_send = fmr_op_send, - .ro_unmap_sync = fmr_op_unmap_sync, - .ro_open = fmr_op_open, - .ro_maxpages = fmr_op_maxpages, - .ro_init_mr = fmr_op_init_mr, - .ro_release_mr = fmr_op_release_mr, - .ro_displayname = "fmr", - .ro_send_w_inv_ok = 0, -}; diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 3ddba94..a8d4def 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -434,12 +434,6 @@ break; } /*FALLTHROUGH*/ - case RPCRDMA_MTHCAFMR: - if (fmr_is_supported(ia)) { - ia->ri_ops = &rpcrdma_fmr_memreg_ops; - break; - } - /*FALLTHROUGH*/ default: pr_err("rpcrdma: Device %s does not support memreg mode %d\n", ia->ri_device->name, xprt_rdma_memreg_strategy); ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration 2018-11-19 15:45 ` [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration Chuck Lever @ 2018-11-19 16:16 ` Bart Van Assche 2018-11-19 19:09 ` Leon Romanovsky 2018-11-19 22:41 ` Jason Gunthorpe 0 siblings, 2 replies; 31+ messages in thread From: Bart Van Assche @ 2018-11-19 16:16 UTC (permalink / raw) To: Chuck Lever, linux-rdma Cc: linux-nfs, oulijun, umalhi, zhongjiang, venkata.s.dhanalakota On Mon, 2018-11-19 at 10:45 -0500, Chuck Lever wrote: > FMR is not supported on most recent RDMA devices. It is also slower > and less secure than FRWR. As discussed during the RDMA BoF at LPC > 2018, it is time to remove support for FMR in the NFS/RDMA client > stack. NFS/RDMA server-side uses either local memory registration or > FRWR. There is no change required there to deprecate FMR. > > There are a few Infiniband/RoCE devices in the kernel tree that do > not support MEM_MGT_EXTENSIONS, and therefore will no longer support > client-side NFS/RDMA. These are: > > - mthca > - qib > - usnic > - hns (RoCE) Can someone from Intel comment on how many of their customers rely on the qib driver? Can someone from Cisco comment on how many of their customers rely on the usnic driver? Can someone from Huawei comment on how many of their customers rely on the hns driver? I'm wondering whether I can remove FMR support from the SRP initiator driver. Thanks, Bart. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration 2018-11-19 16:16 ` Bart Van Assche @ 2018-11-19 19:09 ` Leon Romanovsky 2018-11-19 20:52 ` Bart Van Assche 2018-11-19 22:41 ` Jason Gunthorpe 1 sibling, 1 reply; 31+ messages in thread From: Leon Romanovsky @ 2018-11-19 19:09 UTC (permalink / raw) To: Bart Van Assche Cc: Chuck Lever, linux-rdma, linux-nfs, oulijun, umalhi, zhongjiang, venkata.s.dhanalakota [-- Attachment #1: Type: text/plain, Size: 1198 bytes --] On Mon, Nov 19, 2018 at 08:16:37AM -0800, Bart Van Assche wrote: > On Mon, 2018-11-19 at 10:45 -0500, Chuck Lever wrote: > > FMR is not supported on most recent RDMA devices. It is also slower > > and less secure than FRWR. As discussed during the RDMA BoF at LPC > > 2018, it is time to remove support for FMR in the NFS/RDMA client > > stack. NFS/RDMA server-side uses either local memory registration or > > FRWR. There is no change required there to deprecate FMR. > > > > There are a few Infiniband/RoCE devices in the kernel tree that do > > not support MEM_MGT_EXTENSIONS, and therefore will no longer support > > client-side NFS/RDMA. These are: > > > > - mthca > > - qib > > - usnic > > - hns (RoCE) > > Can someone from Intel comment on how many of their customers rely on the qib > driver? > > Can someone from Cisco comment on how many of their customers rely on the usnic > driver? > > Can someone from Huawei comment on how many of their customers rely on the hns > driver? > > I'm wondering whether I can remove FMR support from the SRP initiator driver. Bart, The more accurate question is "how many customers ... use latest, upstream kernel?" Thanks > > Thanks, > > Bart. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration 2018-11-19 19:09 ` Leon Romanovsky @ 2018-11-19 20:52 ` Bart Van Assche 2018-11-20 5:37 ` Leon Romanovsky 0 siblings, 1 reply; 31+ messages in thread From: Bart Van Assche @ 2018-11-19 20:52 UTC (permalink / raw) To: Leon Romanovsky Cc: Chuck Lever, linux-rdma, linux-nfs, oulijun, zhongjiang, venkata.s.dhanalakota On Mon, 2018-11-19 at 21:09 +0200, Leon Romanovsky wrote: > The more accurate question is "how many customers ... use latest, > upstream kernel?" Hi Leon, If this would not be clear: I am in favor of removing kernel drivers that are no longer in use. Recently I supported a proposal to remove the SCSI OSD driver from the kernel tree because that driver probably has not been used by anyone in the past five years. But I think that kernel drivers that are still in use should stay in the kernel tree, no matter which version is in use. Since the Linux kernel ABI is backwards compatible, with every Linux distro it is possible to replace the Linux kernel that was provided by the Linux distributor with a more recent upstream kernel. I think it would be a very unpleasant surprise for people who use one of the RDMA adapters that only support FMR if they would test a more recent kernel and if they would notice that its driver has been removed. Bart. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration 2018-11-19 20:52 ` Bart Van Assche @ 2018-11-20 5:37 ` Leon Romanovsky 0 siblings, 0 replies; 31+ messages in thread From: Leon Romanovsky @ 2018-11-20 5:37 UTC (permalink / raw) To: Bart Van Assche Cc: Chuck Lever, linux-rdma, linux-nfs, oulijun, zhongjiang, venkata.s.dhanalakota [-- Attachment #1: Type: text/plain, Size: 1277 bytes --] On Mon, Nov 19, 2018 at 12:52:10PM -0800, Bart Van Assche wrote: > On Mon, 2018-11-19 at 21:09 +0200, Leon Romanovsky wrote: > > The more accurate question is "how many customers ... use latest, > > upstream kernel?" > > Hi Leon, > > If this would not be clear: I am in favor of removing kernel drivers that > are no longer in use. Recently I supported a proposal to remove the SCSI > OSD driver from the kernel tree because that driver > probably has not been > used by anyone in the past five years. > > But I think that kernel drivers that are still in use should stay in the > kernel tree, no matter which version is in use. Since the Linux kernel ABI > is backwards compatible, with every Linux distro it is possible to replace > the Linux kernel that was provided by the Linux distributor with a more > recent upstream kernel. I think it would be a very unpleasant surprise for > people who use one of the RDMA adapters that only support FMR if they would > test a more recent kernel and if they would notice that its driver has been > removed. Bart, I know the theory behind promises to work on old HW, but the reality that no one really tests latest kernels on those devices. This article complains about it too: https://lwn.net/Articles/769468/ Thanks > > Bart. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 801 bytes --] ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration 2018-11-19 16:16 ` Bart Van Assche 2018-11-19 19:09 ` Leon Romanovsky @ 2018-11-19 22:41 ` Jason Gunthorpe 2018-11-19 22:56 ` Chuck Lever 2018-11-20 15:22 ` Dennis Dalessandro 1 sibling, 2 replies; 31+ messages in thread From: Jason Gunthorpe @ 2018-11-19 22:41 UTC (permalink / raw) To: Bart Van Assche Cc: Chuck Lever, linux-rdma, linux-nfs, oulijun, umalhi, zhongjiang, venkata.s.dhanalakota, Dennis Dalessandro On Mon, Nov 19, 2018 at 08:16:37AM -0800, Bart Van Assche wrote: > On Mon, 2018-11-19 at 10:45 -0500, Chuck Lever wrote: > > FMR is not supported on most recent RDMA devices. It is also slower > > and less secure than FRWR. As discussed during the RDMA BoF at LPC > > 2018, it is time to remove support for FMR in the NFS/RDMA client > > stack. NFS/RDMA server-side uses either local memory registration or > > FRWR. There is no change required there to deprecate FMR. > > > > There are a few Infiniband/RoCE devices in the kernel tree that do > > not support MEM_MGT_EXTENSIONS, and therefore will no longer support > > client-side NFS/RDMA. These are: > > > > - mthca > > - qib Ooh, qib was not in the list when we talked on this in plumbers. That does change things. Dennis: Could qib be changed? HFI1 supports FRWR right? > > - usnic > > - hns (RoCE) > > Can someone from Intel comment on how many of their customers rely on the qib > driver? > > Can someone from Cisco comment on how many of their customers rely on the usnic > driver? Ignore usnic, it is not a RDMA driver > Can someone from Huawei comment on how many of their customers rely on the hns > driver? Nor was this, this is an actively used driver.. Could hns do FRWR? It is very disappointing that such a new driver does not support FRWR. I guess it is worth stating again that FMR is deprecated. 
People are going to be disappointed if they try to submit new drivers that don't support FRWR :( Jason ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration 2018-11-19 22:41 ` Jason Gunthorpe @ 2018-11-19 22:56 ` Chuck Lever 2018-11-19 23:10 ` Jason Gunthorpe 2018-11-20 15:22 ` Dennis Dalessandro 1 sibling, 1 reply; 31+ messages in thread From: Chuck Lever @ 2018-11-19 22:56 UTC (permalink / raw) To: Jason Gunthorpe Cc: Bart Van Assche, linux-rdma, Linux NFS Mailing List, Lijun Ou, umalhi, zhongjiang, venkata.s.dhanalakota, Dennis Dalessandro > On Nov 19, 2018, at 5:41 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Mon, Nov 19, 2018 at 08:16:37AM -0800, Bart Van Assche wrote: >> On Mon, 2018-11-19 at 10:45 -0500, Chuck Lever wrote: >>> FMR is not supported on most recent RDMA devices. It is also slower >>> and less secure than FRWR. As discussed during the RDMA BoF at LPC >>> 2018, it is time to remove support for FMR in the NFS/RDMA client >>> stack. NFS/RDMA server-side uses either local memory registration or >>> FRWR. There is no change required there to deprecate FMR. >>> >>> There are a few Infiniband/RoCE devices in the kernel tree that do >>> not support MEM_MGT_EXTENSIONS, and therefore will no longer support >>> client-side NFS/RDMA. These are: >>> >>> - mthca >>> - qib > > Ooh, qib was not in the list when we talked on this in plumbers. That > does change things. > > Dennis: Could qib be changed? HFI1 supports FRWR right? > >>> - usnic >>> - hns (RoCE) >> >> Can someone from Intel comment on how many of their customers rely on the qib >> driver? >> >> Can someone from Cisco comment on how many of their customers rely on the usnic >> driver? > > Ignore usnic, it is not a RDMA driver > >> Can someone from Huawei comment on how many of their customers rely on the hns >> driver? > > Nor was this, this is an actively used driver.. > > Could hns do FRWR? It is very disappointing that such a new driver > does not support FRWR. > > I guess it is worth stating again that FMR is deprecated. 
People are > going to be disappointed if they try to submit new drivers that don't > support FRWR :( My methodology was to search for whether each driver asserts IB_DEVICE_MEM_MGT_EXTENSIONS in device_cap_flags. Interestingly, nes and ocrdma assert it, but qib and hns do not. -- Chuck Lever ^ permalink raw reply [flat|nested] 31+ messages in thread
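[Editor's note: the capability check described above can be sketched standalone. This is an illustrative userspace mirror of the methodology, not the kernel code; the flag value follows include/rdma/ib_verbs.h of this era and should be treated as an assumption, not authoritative.]

```c
#include <assert.h>
#include <stdint.h>

/* A device can register memory with FRWR only if it asserts
 * IB_DEVICE_MEM_MGT_EXTENSIONS in its device_cap_flags; this is the
 * bit the methodology above searched each driver for. The value here
 * mirrors include/rdma/ib_verbs.h and is for illustration only. */
#define IB_DEVICE_MEM_MGT_EXTENSIONS	(1ULL << 21)

static int frwr_is_supported(uint64_t device_cap_flags)
{
	return (device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) != 0;
}
```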
* Re: [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration 2018-11-19 22:56 ` Chuck Lever @ 2018-11-19 23:10 ` Jason Gunthorpe 0 siblings, 0 replies; 31+ messages in thread From: Jason Gunthorpe @ 2018-11-19 23:10 UTC (permalink / raw) To: Chuck Lever Cc: Bart Van Assche, linux-rdma, Linux NFS Mailing List, Lijun Ou, umalhi, zhongjiang, venkata.s.dhanalakota, Dennis Dalessandro On Mon, Nov 19, 2018 at 05:56:54PM -0500, Chuck Lever wrote: > > > > On Nov 19, 2018, at 5:41 PM, Jason Gunthorpe <jgg@ziepe.ca> wrote: > > > > On Mon, Nov 19, 2018 at 08:16:37AM -0800, Bart Van Assche wrote: > >> On Mon, 2018-11-19 at 10:45 -0500, Chuck Lever wrote: > >>> FMR is not supported on most recent RDMA devices. It is also slower > >>> and less secure than FRWR. As discussed during the RDMA BoF at LPC > >>> 2018, it is time to remove support for FMR in the NFS/RDMA client > >>> stack. NFS/RDMA server-side uses either local memory registration or > >>> FRWR. There is no change required there to deprecate FMR. > >>> > >>> There are a few Infiniband/RoCE devices in the kernel tree that do > >>> not support MEM_MGT_EXTENSIONS, and therefore will no longer support > >>> client-side NFS/RDMA. These are: > >>> > >>> - mthca > >>> - qib > > > > Ooh, qib was not in the list when we talked on this in plumbers. That > > does change things. > > > > Dennis: Could qib be changed? HFI1 supports FRWR right? > > > >>> - usnic > >>> - hns (RoCE) > >> > >> Can someone from Intel comment on how many of their customers rely on the qib > >> driver? > >> > >> Can someone from Cisco comment on how many of their customers rely on the usnic > >> driver? > > > > Ignore usnic, it is not a RDMA driver > > > >> Can someone from Huawei comment on how many of their customers rely on the hns > >> driver? > > > > Nor was this, this is an actively used driver.. > > > > Could hns do FRWR? It is very disappointing that such a new driver > > does not support FRWR. 
> > > > I guess it is worth stating again that FMR is deprecated. People are > > going to be disappointed if they try to submit new drivers that don't > > support FRWR :( > > My methodology was to search for whether each driver asserts > IB_DEVICE_MEM_MGT_EXTENSIONS in device_cap_flags. Drivers might be broken here, or we might need more flags. Searching for FWMR shows hits in hns, which is encouraging... Jason ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration 2018-11-19 22:41 ` Jason Gunthorpe 2018-11-19 22:56 ` Chuck Lever @ 2018-11-20 15:22 ` Dennis Dalessandro 1 sibling, 0 replies; 31+ messages in thread From: Dennis Dalessandro @ 2018-11-20 15:22 UTC (permalink / raw) To: Jason Gunthorpe, Bart Van Assche Cc: Chuck Lever, linux-rdma, linux-nfs, oulijun, umalhi, zhongjiang, venkata.s.dhanalakota On 11/19/2018 5:41 PM, Jason Gunthorpe wrote: > On Mon, Nov 19, 2018 at 08:16:37AM -0800, Bart Van Assche wrote: >> On Mon, 2018-11-19 at 10:45 -0500, Chuck Lever wrote: >>> FMR is not supported on most recent RDMA devices. It is also slower >>> and less secure than FRWR. As discussed during the RDMA BoF at LPC >>> 2018, it is time to remove support for FMR in the NFS/RDMA client >>> stack. NFS/RDMA server-side uses either local memory registration or >>> FRWR. There is no change required there to deprecate FMR. >>> >>> There are a few Infiniband/RoCE devices in the kernel tree that do >>> not support MEM_MGT_EXTENSIONS, and therefore will no longer support >>> client-side NFS/RDMA. These are: >>> >>> - mthca >>> - qib > > Ooh, qib was not in the list when we talked on this in plumbers. That > does change things. > > Dennis: Could qib be changed? HFI1 supports FRWR right? Turns out that qib can not support FRWR because that requires HW support for opcodes that qib just doesn't understand. hfi1 is good to go though. >>> - usnic >>> - hns (RoCE) >> >> Can someone from Intel comment on how many of their customers rely on the qib >> driver? There is a user base. I can't say how large though. But I have had discussions with mgmt and we aren't quite ready to leave qib out in the cold just yet. We have a lot of people out this week due to the US Thanksgiving holiday. Once people are back in the office on Monday I'll find out some more as to the impact of this change. -Denny ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v1 2/4] xprtrdma: mrs_create off-by-one 2018-11-19 15:45 [PATCH v1 0/4] NFS/RDMA client for v4.21 (part 1) Chuck Lever 2018-11-19 15:45 ` [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration Chuck Lever @ 2018-11-19 15:45 ` Chuck Lever 2018-11-19 15:46 ` [PATCH v1 3/4] xprtrdma: Reduce max_frwr_depth Chuck Lever 2018-11-19 15:46 ` [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) Chuck Lever 3 siblings, 0 replies; 31+ messages in thread From: Chuck Lever @ 2018-11-19 15:45 UTC (permalink / raw) To: linux-rdma; +Cc: linux-nfs With certain combinations of krb5i/p and MR size, one extra segment is needed per RPC. Fixes: c421ece68f6 ("xprtrdma: Create more MRs at a time") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> --- net/sunrpc/xprtrdma/frwr_ops.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index fc6378cc..448e063 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -242,7 +242,7 @@ ia->ri_max_segs = max_t(unsigned int, 1, RPCRDMA_MAX_DATA_SEGS / ia->ri_max_frwr_depth); - ia->ri_max_segs += 2; /* segments for head and tail buffers */ + ia->ri_max_segs += 3; /* segments for head and tail buffers */ return 0; } ^ permalink raw reply related [flat|nested] 31+ messages in thread
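[Editor's note: the segment budget this one-liner adjusts can be sketched as a standalone helper. This is an illustrative userspace mirror of the computation in frwr_op_open(), with the kernel's max_t() inlined; the names and constants are assumptions for illustration. Per the patch description, the third extra segment covers the additional per-RPC segment krb5i/krb5p can require, beyond head and tail.]

```c
#include <assert.h>

/* Mirror of the fixed computation: at least one MR's worth of data
 * segments, plus three extras (head buffer, tail buffer, and the
 * additional segment certain krb5i/p and MR-size combinations need). */
static unsigned int rpcrdma_max_segs(unsigned int max_data_segs,
				     unsigned int frwr_depth)
{
	unsigned int segs = max_data_segs / frwr_depth;

	if (segs < 1)
		segs = 1;
	return segs + 3;	/* head + tail + krb5i/p extra */
}
```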
* [PATCH v1 3/4] xprtrdma: Reduce max_frwr_depth 2018-11-19 15:45 [PATCH v1 0/4] NFS/RDMA client for v4.21 (part 1) Chuck Lever 2018-11-19 15:45 ` [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration Chuck Lever 2018-11-19 15:45 ` [PATCH v1 2/4] xprtrdma: mrs_create off-by-one Chuck Lever @ 2018-11-19 15:46 ` Chuck Lever 2018-11-19 15:46 ` [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) Chuck Lever 3 siblings, 0 replies; 31+ messages in thread From: Chuck Lever @ 2018-11-19 15:46 UTC (permalink / raw) To: linux-rdma; +Cc: linux-nfs Some devices advertise a large max_fast_reg_page_list_len capability, but perform optimally when MRs are significantly smaller than that depth -- probably when the MR itself is no larger than a page. By default, the RDMA R/W core API uses max_sge_rd as the maximum page depth for MRs. For some devices, the value of max_sge_rd is 1, which is also not optimal. Thus, when max_sge_rd is larger than 1, use that value. Otherwise use the value of the max_fast_reg_page_list_len attribute. I've tested this with a couple of devices, and it reproducibly improves the throughput of large I/Os by several percent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> --- net/sunrpc/xprtrdma/frwr_ops.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 448e063..49b314d 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -191,10 +191,17 @@ if (attrs->device_cap_flags & IB_DEVICE_SG_GAPS_REG) ia->ri_mrtype = IB_MR_TYPE_SG_GAPS; - ia->ri_max_frwr_depth = - min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS, - attrs->max_fast_reg_page_list_len); - dprintk("RPC: %s: device's max FR page list len = %u\n", + /* Quirk: Some devices advertise a large max_fast_reg_page_list_len + * capability, but perform optimally when the MRs are not larger + * than a page. 
+ */ + if (attrs->max_sge_rd > 1) + ia->ri_max_frwr_depth = attrs->max_sge_rd; + else + ia->ri_max_frwr_depth = attrs->max_fast_reg_page_list_len; + if (ia->ri_max_frwr_depth > RPCRDMA_MAX_DATA_SEGS) + ia->ri_max_frwr_depth = RPCRDMA_MAX_DATA_SEGS; + dprintk("RPC: %s: max FR page list depth = %u\n", __func__, ia->ri_max_frwr_depth); /* Add room for frwr register and invalidate WRs. ^ permalink raw reply related [flat|nested] 31+ messages in thread
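[Editor's note: the depth-selection policy in the hunk above can be sketched standalone. This is an illustrative userspace mirror of the patched logic, not the kernel code; the RPCRDMA_MAX_DATA_SEGS value below is assumed for illustration.]

```c
#include <assert.h>

/* Clamp value assumed for illustration; the real constant lives in
 * xprt_rdma.h. */
#define RPCRDMA_MAX_DATA_SEGS	64

/* Mirror of the quirk handling: prefer the device's max_sge_rd when
 * it is greater than 1, otherwise fall back to
 * max_fast_reg_page_list_len, then clamp to RPCRDMA_MAX_DATA_SEGS. */
static unsigned int frwr_depth(unsigned int max_sge_rd,
			       unsigned int max_fast_reg_page_list_len)
{
	unsigned int depth;

	depth = max_sge_rd > 1 ? max_sge_rd :
				 max_fast_reg_page_list_len;
	if (depth > RPCRDMA_MAX_DATA_SEGS)
		depth = RPCRDMA_MAX_DATA_SEGS;
	return depth;
}
```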
* [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 15:45 [PATCH v1 0/4] NFS/RDMA client for v4.21 (part 1) Chuck Lever ` (2 preceding siblings ...) 2018-11-19 15:46 ` [PATCH v1 3/4] xprtrdma: Reduce max_frwr_depth Chuck Lever @ 2018-11-19 15:46 ` Chuck Lever 2018-11-19 17:47 ` Olga Kornievskaia 2018-11-20 18:02 ` Anna Schumaker 3 siblings, 2 replies; 31+ messages in thread From: Chuck Lever @ 2018-11-19 15:46 UTC (permalink / raw) To: linux-rdma; +Cc: linux-nfs Place the associated RPC transaction's XID in the upper 32 bits of each RDMA segment's rdma_offset field. These bits are currently always zero. There are two reasons to do this: - The R_key only has 8 bits that are different from registration to registration. The XID adds more uniqueness to each RDMA segment to reduce the likelihood of a software bug on the server reading from or writing into memory it's not supposed to. - On-the-wire RDMA Read and Write operations do not otherwise carry any identifier that matches them up to an RPC. The XID in the upper 32 bits will act as an eye-catcher in network captures. 
Suggested-by: Tom Talpey <ttalpey@microsoft.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> --- net/sunrpc/xprtrdma/frwr_ops.c | 3 ++- net/sunrpc/xprtrdma/rpc_rdma.c | 6 +++--- net/sunrpc/xprtrdma/xprt_rdma.h | 2 +- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index 49b314d..3b260d2 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -344,7 +344,7 @@ */ static struct rpcrdma_mr_seg * frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, - int nsegs, bool writing, struct rpcrdma_mr **out) + int nsegs, bool writing, u32 xid, struct rpcrdma_mr **out) { struct rpcrdma_ia *ia = &r_xprt->rx_ia; bool holes_ok = ia->ri_mrtype == IB_MR_TYPE_SG_GAPS; @@ -398,6 +398,7 @@ if (unlikely(n != mr->mr_nents)) goto out_mapmr_err; + ibmr->iova |= ((u64)cpu_to_be32(xid)) << 32; key = (u8)(ibmr->rkey & 0x000000FF); ib_update_fast_reg_key(ibmr, ++key); diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c index 9f53e02..89a2db2 100644 --- a/net/sunrpc/xprtrdma/rpc_rdma.c +++ b/net/sunrpc/xprtrdma/rpc_rdma.c @@ -357,7 +357,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt, do { seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, - false, &mr); + false, rqst->rq_xid, &mr); if (IS_ERR(seg)) return PTR_ERR(seg); rpcrdma_mr_push(mr, &req->rl_registered); @@ -415,7 +415,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt, nchunks = 0; do { seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, - true, &mr); + true, rqst->rq_xid, &mr); if (IS_ERR(seg)) return PTR_ERR(seg); rpcrdma_mr_push(mr, &req->rl_registered); @@ -473,7 +473,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt, nchunks = 0; do { seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, - true, &mr); + true, rqst->rq_xid, &mr); if (IS_ERR(seg)) return PTR_ERR(seg); rpcrdma_mr_push(mr, &req->rl_registered); diff --git 
a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h index a13ccb6..2ae1ee2 100644 --- a/net/sunrpc/xprtrdma/xprt_rdma.h +++ b/net/sunrpc/xprtrdma/xprt_rdma.h @@ -472,7 +472,7 @@ struct rpcrdma_memreg_ops { struct rpcrdma_mr_seg * (*ro_map)(struct rpcrdma_xprt *, struct rpcrdma_mr_seg *, int, bool, - struct rpcrdma_mr **); + u32, struct rpcrdma_mr **); int (*ro_send)(struct rpcrdma_ia *ia, struct rpcrdma_req *req); void (*ro_reminv)(struct rpcrdma_rep *rep, ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 15:46 ` [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) Chuck Lever @ 2018-11-19 17:47 ` Olga Kornievskaia 2018-11-19 17:58 ` Chuck Lever 2018-11-20 18:02 ` Anna Schumaker 1 sibling, 1 reply; 31+ messages in thread From: Olga Kornievskaia @ 2018-11-19 17:47 UTC (permalink / raw) To: Chuck Lever; +Cc: linux-rdma, linux-nfs On Mon, Nov 19, 2018 at 10:46 AM Chuck Lever <chuck.lever@oracle.com> wrote: > > Place the associated RPC transaction's XID in the upper 32 bits of > each RDMA segment's rdma_offset field. These bits are currently > always zero. > > There are two reasons to do this: > > - The R_key only has 8 bits that are different from registration to > registration. The XID adds more uniqueness to each RDMA segment to > reduce the likelihood of a software bug on the server reading from > or writing into memory it's not supposed to. > > - On-the-wire RDMA Read and Write operations do not otherwise carry > any identifier that matches them up to an RPC. The XID in the > upper 32 bits will act as an eye-catcher in network captures. Is this just an "eye-catcher" or do you have plans to use it in Wireshark? If the latter, can we really do that? While a Linux implementation may do this, other (or even future Linux) implementations might not. Can we justify changing the Wireshark logic for it? 
> Suggested-by: Tom Talpey <ttalpey@microsoft.com> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > net/sunrpc/xprtrdma/frwr_ops.c | 3 ++- > net/sunrpc/xprtrdma/rpc_rdma.c | 6 +++--- > net/sunrpc/xprtrdma/xprt_rdma.h | 2 +- > 3 files changed, 6 insertions(+), 5 deletions(-) > > diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c > index 49b314d..3b260d2 100644 > --- a/net/sunrpc/xprtrdma/frwr_ops.c > +++ b/net/sunrpc/xprtrdma/frwr_ops.c > @@ -344,7 +344,7 @@ > */ > static struct rpcrdma_mr_seg * > frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, > - int nsegs, bool writing, struct rpcrdma_mr **out) > + int nsegs, bool writing, u32 xid, struct rpcrdma_mr **out) > { > struct rpcrdma_ia *ia = &r_xprt->rx_ia; > bool holes_ok = ia->ri_mrtype == IB_MR_TYPE_SG_GAPS; > @@ -398,6 +398,7 @@ > if (unlikely(n != mr->mr_nents)) > goto out_mapmr_err; > > + ibmr->iova |= ((u64)cpu_to_be32(xid)) << 32; > key = (u8)(ibmr->rkey & 0x000000FF); > ib_update_fast_reg_key(ibmr, ++key); > > diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c > index 9f53e02..89a2db2 100644 > --- a/net/sunrpc/xprtrdma/rpc_rdma.c > +++ b/net/sunrpc/xprtrdma/rpc_rdma.c > @@ -357,7 +357,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt, > > do { > seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, > - false, &mr); > + false, rqst->rq_xid, &mr); > if (IS_ERR(seg)) > return PTR_ERR(seg); > rpcrdma_mr_push(mr, &req->rl_registered); > @@ -415,7 +415,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt, > nchunks = 0; > do { > seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, > - true, &mr); > + true, rqst->rq_xid, &mr); > if (IS_ERR(seg)) > return PTR_ERR(seg); > rpcrdma_mr_push(mr, &req->rl_registered); > @@ -473,7 +473,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt, > nchunks = 0; > do { > seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, > - true, &mr); > + 
true, rqst->rq_xid, &mr); > if (IS_ERR(seg)) > return PTR_ERR(seg); > rpcrdma_mr_push(mr, &req->rl_registered); > diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h > index a13ccb6..2ae1ee2 100644 > --- a/net/sunrpc/xprtrdma/xprt_rdma.h > +++ b/net/sunrpc/xprtrdma/xprt_rdma.h > @@ -472,7 +472,7 @@ struct rpcrdma_memreg_ops { > struct rpcrdma_mr_seg * > (*ro_map)(struct rpcrdma_xprt *, > struct rpcrdma_mr_seg *, int, bool, > - struct rpcrdma_mr **); > + u32, struct rpcrdma_mr **); > int (*ro_send)(struct rpcrdma_ia *ia, > struct rpcrdma_req *req); > void (*ro_reminv)(struct rpcrdma_rep *rep, > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 17:47 ` Olga Kornievskaia @ 2018-11-19 17:58 ` Chuck Lever 2018-11-19 18:08 ` Olga Kornievskaia 0 siblings, 1 reply; 31+ messages in thread From: Chuck Lever @ 2018-11-19 17:58 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: linux-rdma, Linux NFS Mailing List > On Nov 19, 2018, at 12:47 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > > On Mon, Nov 19, 2018 at 10:46 AM Chuck Lever <chuck.lever@oracle.com> wrote: >> >> Place the associated RPC transaction's XID in the upper 32 bits of >> each RDMA segment's rdma_offset field. These bits are currently >> always zero. >> >> There are two reasons to do this: >> >> - The R_key only has 8 bits that are different from registration to >> registration. The XID adds more uniqueness to each RDMA segment to >> reduce the likelihood of a software bug on the server reading from >> or writing into memory it's not supposed to. >> >> - On-the-wire RDMA Read and Write operations do not otherwise carry >> any identifier that matches them up to an RPC. The XID in the >> upper 32 bits will act as an eye-catcher in network captures. > > Is this just an "eye-catcher" or do you have plans to use it in > wireshark? If the latter, then can we really do that? while a linux > implementation may do that, other (or even possibly future linux) > implementation might not do this. Can we justify changing the > wireshark logic for it? No plans to change the wireshark RPC-over-RDMA dissector. That would only be a valid thing to do if adding the XID were made part of the RPC-over-RDMA protocol via an RFC. -- Chuck Lever ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 17:58 ` Chuck Lever @ 2018-11-19 18:08 ` Olga Kornievskaia 2018-11-19 18:18 ` Chuck Lever 0 siblings, 1 reply; 31+ messages in thread From: Olga Kornievskaia @ 2018-11-19 18:08 UTC (permalink / raw) To: Chuck Lever; +Cc: linux-rdma, linux-nfs On Mon, Nov 19, 2018 at 12:59 PM Chuck Lever <chuck.lever@oracle.com> wrote: > > > > > On Nov 19, 2018, at 12:47 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > > > > On Mon, Nov 19, 2018 at 10:46 AM Chuck Lever <chuck.lever@oracle.com> wrote: > >> > >> Place the associated RPC transaction's XID in the upper 32 bits of > >> each RDMA segment's rdma_offset field. These bits are currently > >> always zero. > >> > >> There are two reasons to do this: > >> > >> - The R_key only has 8 bits that are different from registration to > >> registration. The XID adds more uniqueness to each RDMA segment to > >> reduce the likelihood of a software bug on the server reading from > >> or writing into memory it's not supposed to. > >> > >> - On-the-wire RDMA Read and Write operations do not otherwise carry > >> any identifier that matches them up to an RPC. The XID in the > >> upper 32 bits will act as an eye-catcher in network captures. > > > > Is this just an "eye-catcher" or do you have plans to use it in > > wireshark? If the latter, then can we really do that? while a linux > > implementation may do that, other (or even possibly future linux) > > implementation might not do this. Can we justify changing the > > wireshark logic for it? > > No plans to change the wireshark RPC-over-RDMA dissector. > That would only be a valid thing to do if adding the XID > were made part of the RPC-over-RDMA protocol via an RFC. Agreed. Can you also help me understand the proposal (as I'm still trying to figure why it is useful). You are proposing to modify the RDMA segments's RDMA offset field (I see top 6bits are indeed always 0). 
I don't see how adding that helps an RDMA read/write message which does not have an "offset" field in it be matched to a particular RPC. I don't believe we have (had) any issues matching the initial RC Send only that contains the RDMA_MSG to the RPC. > > > -- > Chuck Lever > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 18:08 ` Olga Kornievskaia @ 2018-11-19 18:18 ` Chuck Lever 2018-11-19 18:47 ` Olga Kornievskaia 0 siblings, 1 reply; 31+ messages in thread From: Chuck Lever @ 2018-11-19 18:18 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: linux-rdma, Linux NFS Mailing List > On Nov 19, 2018, at 1:08 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > > On Mon, Nov 19, 2018 at 12:59 PM Chuck Lever <chuck.lever@oracle.com> wrote: >> >> >> >>> On Nov 19, 2018, at 12:47 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>> >>> On Mon, Nov 19, 2018 at 10:46 AM Chuck Lever <chuck.lever@oracle.com> wrote: >>>> >>>> Place the associated RPC transaction's XID in the upper 32 bits of >>>> each RDMA segment's rdma_offset field. These bits are currently >>>> always zero. >>>> >>>> There are two reasons to do this: >>>> >>>> - The R_key only has 8 bits that are different from registration to >>>> registration. The XID adds more uniqueness to each RDMA segment to >>>> reduce the likelihood of a software bug on the server reading from >>>> or writing into memory it's not supposed to. >>>> >>>> - On-the-wire RDMA Read and Write operations do not otherwise carry >>>> any identifier that matches them up to an RPC. The XID in the >>>> upper 32 bits will act as an eye-catcher in network captures. >>> >>> Is this just an "eye-catcher" or do you have plans to use it in >>> wireshark? If the latter, then can we really do that? while a linux >>> implementation may do that, other (or even possibly future linux) >>> implementation might not do this. Can we justify changing the >>> wireshark logic for it? >> >> No plans to change the wireshark RPC-over-RDMA dissector. >> That would only be a valid thing to do if adding the XID >> were made part of the RPC-over-RDMA protocol via an RFC. > > Agreed. Can you also help me understand the proposal (as I'm still > trying to figure why it is useful). 
> > You are proposing to modify the RDMA segments's RDMA offset field (I > see top 6bits are indeed always 0). I don't see how adding that helps > an RDMA read/write message which does not have an "offset" field in it > be matched to a particular RPC. I don't believe we have (had) any > issues matching the initial RC Send only that contains the RDMA_MSG to > the RPC. The ULP has access to only the low order 8 bits of the R_key. The upper 24 bits are fixed for each MR. So for any given MR, there are only 256 unique R_key values. That means the same R_key will appear again quickly on the wire. The 64-bit offset field is set by the ULP, and can be essentially any arbitrary value. Most kernel ULPs use the iova of the registered memory. We only need the lower 32 bits for that. The purpose of adding junk to the offset is to make the offset unique to that RPC transaction, just like the R_key is. This helps make the RDMA segment co-ordinates (handle, length, offset) more unique and thus harder to spoof. We could use random numbers in that upper 32 bits, but we have something more handy: the RPC's XID. Now when you look at an RDMA Read or Write, the top 32 bits in each RDMA segment's offset match the XID of the RPC transaction that the RDMA operations go with. This is really a secondary benefit to the uniquifying effect above. -- Chuck Lever ^ permalink raw reply [flat|nested] 31+ messages in thread
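The 8-bit point above can be illustrated with a small stand-alone sketch (the helper name is hypothetical; it mirrors the `key = (u8)(ibmr->rkey & 0x000000FF); ib_update_fast_reg_key(ibmr, ++key);` pattern in frwr_op_map()), showing why a given MR can present at most 256 distinct R_key values:

```c
#include <stdint.h>

/* Only the low-order 8 bits of an R_key vary from registration to
 * registration; the upper 24 bits are fixed for the MR. Advancing
 * the 8-bit key the way frwr_op_map() does therefore cycles a given
 * MR through at most 256 distinct R_keys. Illustrative only. */
uint32_t next_rkey(uint32_t rkey)
{
        uint8_t key = (uint8_t)(rkey & 0x000000FF);

        ++key;                          /* wraps back to 0 after 0xFF */
        return (rkey & 0xFFFFFF00u) | key;
}
```

Advancing a key 256 times returns it to its starting value, which is the reuse-interval problem the extra 32 bits of XID are meant to mitigate.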
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 18:18 ` Chuck Lever @ 2018-11-19 18:47 ` Olga Kornievskaia 2018-11-19 18:58 ` Chuck Lever 0 siblings, 1 reply; 31+ messages in thread From: Olga Kornievskaia @ 2018-11-19 18:47 UTC (permalink / raw) To: Chuck Lever; +Cc: linux-rdma, linux-nfs On Mon, Nov 19, 2018 at 1:19 PM Chuck Lever <chuck.lever@oracle.com> wrote: > > > > > On Nov 19, 2018, at 1:08 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > > > > On Mon, Nov 19, 2018 at 12:59 PM Chuck Lever <chuck.lever@oracle.com> wrote: > >> > >> > >> > >>> On Nov 19, 2018, at 12:47 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > >>> > >>> On Mon, Nov 19, 2018 at 10:46 AM Chuck Lever <chuck.lever@oracle.com> wrote: > >>>> > >>>> Place the associated RPC transaction's XID in the upper 32 bits of > >>>> each RDMA segment's rdma_offset field. These bits are currently > >>>> always zero. > >>>> > >>>> There are two reasons to do this: > >>>> > >>>> - The R_key only has 8 bits that are different from registration to > >>>> registration. The XID adds more uniqueness to each RDMA segment to > >>>> reduce the likelihood of a software bug on the server reading from > >>>> or writing into memory it's not supposed to. > >>>> > >>>> - On-the-wire RDMA Read and Write operations do not otherwise carry > >>>> any identifier that matches them up to an RPC. The XID in the > >>>> upper 32 bits will act as an eye-catcher in network captures. > >>> > >>> Is this just an "eye-catcher" or do you have plans to use it in > >>> wireshark? If the latter, then can we really do that? while a linux > >>> implementation may do that, other (or even possibly future linux) > >>> implementation might not do this. Can we justify changing the > >>> wireshark logic for it? > >> > >> No plans to change the wireshark RPC-over-RDMA dissector. > >> That would only be a valid thing to do if adding the XID > >> were made part of the RPC-over-RDMA protocol via an RFC. 
> > > > Agreed. Can you also help me understand the proposal (as I'm still > > trying to figure why it is useful). > > > > You are proposing to modify the RDMA segments's RDMA offset field (I > > see top 6bits are indeed always 0). I don't see how adding that helps > > an RDMA read/write message which does not have an "offset" field in it > > be matched to a particular RPC. I don't believe we have (had) any > > issues matching the initial RC Send only that contains the RDMA_MSG to > > the RPC. > > The ULP has access to only the low order 8 bits of the R_key. The > upper 24 bits are fixed for each MR. So for any given MR, there are > only 256 unique R_key values. That means the same R_key will appear > again quickly on the wire. > > The 64-bit offset field is set by the ULP, and can be essentially > any arbitrary value. Most kernel ULPs use the iova of the registered > memory. We only need the lower 32 bits for that. > > The purpose of adding junk to the offset is to make the offset > unique to that RPC transaction, just like the R_key is. This helps > make the RDMA segment co-ordinates (handle, length, offset) more > unique and thus harder to spoof. Thank you for the explanation that makes sense. > We could use random numbers in that upper 32 bits, but we have > something more handy: the RPC's XID. > > Now when you look at an RDMA Read or Write, the top 32 bits in each > RDMA segment's offset match the XID of the RPC transaction that the > RDMA operations go with. This is really a secondary benefit to the > uniquifying effect above. I find the wording "no the wire RDMA read or write" misleading. Did you really mean it as "RDMA read or write" or do you mean "RDMA_MSG" or do you mean "NFS RDMA read or write"? Because RDMA offset is not a part of the RDMA read/write (first/middle/last) packet. That's what I'm hanged up on. > > > -- > Chuck Lever > > > ^ permalink raw reply [flat|nested] 31+ messages in thread

* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 18:47 ` Olga Kornievskaia @ 2018-11-19 18:58 ` Chuck Lever 2018-11-19 21:22 ` Olga Kornievskaia 0 siblings, 1 reply; 31+ messages in thread From: Chuck Lever @ 2018-11-19 18:58 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: linux-rdma, Linux NFS Mailing List > On Nov 19, 2018, at 1:47 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > > On Mon, Nov 19, 2018 at 1:19 PM Chuck Lever <chuck.lever@oracle.com> wrote: >> >> >> >>> On Nov 19, 2018, at 1:08 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>> >>> On Mon, Nov 19, 2018 at 12:59 PM Chuck Lever <chuck.lever@oracle.com> wrote: >>>> >>>> >>>> >>>>> On Nov 19, 2018, at 12:47 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>> >>>>> On Mon, Nov 19, 2018 at 10:46 AM Chuck Lever <chuck.lever@oracle.com> wrote: >>>>>> >>>>>> Place the associated RPC transaction's XID in the upper 32 bits of >>>>>> each RDMA segment's rdma_offset field. These bits are currently >>>>>> always zero. >>>>>> >>>>>> There are two reasons to do this: >>>>>> >>>>>> - The R_key only has 8 bits that are different from registration to >>>>>> registration. The XID adds more uniqueness to each RDMA segment to >>>>>> reduce the likelihood of a software bug on the server reading from >>>>>> or writing into memory it's not supposed to. >>>>>> >>>>>> - On-the-wire RDMA Read and Write operations do not otherwise carry >>>>>> any identifier that matches them up to an RPC. The XID in the >>>>>> upper 32 bits will act as an eye-catcher in network captures. >>>>> >>>>> Is this just an "eye-catcher" or do you have plans to use it in >>>>> wireshark? If the latter, then can we really do that? while a linux >>>>> implementation may do that, other (or even possibly future linux) >>>>> implementation might not do this. Can we justify changing the >>>>> wireshark logic for it? >>>> >>>> No plans to change the wireshark RPC-over-RDMA dissector. 
>>>> That would only be a valid thing to do if adding the XID >>>> were made part of the RPC-over-RDMA protocol via an RFC. >>> >>> Agreed. Can you also help me understand the proposal (as I'm still >>> trying to figure why it is useful). >>> >>> You are proposing to modify the RDMA segments's RDMA offset field (I >>> see top 6bits are indeed always 0). I don't see how adding that helps >>> an RDMA read/write message which does not have an "offset" field in it >>> be matched to a particular RPC. I don't believe we have (had) any >>> issues matching the initial RC Send only that contains the RDMA_MSG to >>> the RPC. >> >> The ULP has access to only the low order 8 bits of the R_key. The >> upper 24 bits are fixed for each MR. So for any given MR, there are >> only 256 unique R_key values. That means the same R_key will appear >> again quickly on the wire. >> >> The 64-bit offset field is set by the ULP, and can be essentially >> any arbitrary value. Most kernel ULPs use the iova of the registered >> memory. We only need the lower 32 bits for that. >> >> The purpose of adding junk to the offset is to make the offset >> unique to that RPC transaction, just like the R_key is. This helps >> make the RDMA segment co-ordinates (handle, length, offset) more >> unique and thus harder to spoof. > > Thank you for the explanation that makes sense. > >> We could use random numbers in that upper 32 bits, but we have >> something more handy: the RPC's XID. >> >> Now when you look at an RDMA Read or Write, the top 32 bits in each >> RDMA segment's offset match the XID of the RPC transaction that the >> RDMA operations go with. This is really a secondary benefit to the >> uniquifying effect above. > > I find the wording "no the wire RDMA read or write" misleading. Did > you really mean it as "RDMA read or write" or do you mean "RDMA_MSG" > or do you mean "NFS RDMA read or write"? Because RDMA offset is not a > part of the RDMA read/write (first/middle/last) packet. 
That's what > I'm hanged up on. Here's an RDMA Read request in a network capture I had at hand: No. Time Source Destination Protocol Length Info 228 22:31:06.203637 LID: 5 LID: 11 InfiniBand 42 RC RDMA Read Request QP=0x000240 Frame 228: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 1 Extensible Record Format InfiniBand Local Route Header Base Transport Header RETH - RDMA Extended Transport Header Virtual Address: 11104011393315758080 <<<<<< Remote Key: 1879114618 DMA Length: 4015 Invariant CRC: 0xd492a3e1 Variant CRC: 0x8736 The value of the Virtual Address field is what the RPC-over-RDMA protocol calls the Offset. The Read responses are matched to this request by their message sequence numbers, and this Read request is matched to the RPC Call by the XID in the top 32 bits of the Virtual Address. Likewise for an RDMA Write Only request: 188 22:31:06.201350 LID: 5 LID: 11 InfiniBand 162 RC RDMA Write Only QP=0x000240 Frame 188: 162 bytes on wire (1296 bits), 162 bytes captured (1296 bits) on interface 1 Extensible Record Format InfiniBand Local Route Header Base Transport Header RETH - RDMA Extended Transport Header Virtual Address: 10455493047213809920 <<<<<< Remote Key: 1879115386 DMA Length: 120 Invariant CRC: 0xe2e1b2cd Variant CRC: 0x676f Data (120 bytes) 0000 91 19 5f 87 00 00 00 01 00 00 00 00 00 00 00 00 .._............. 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................ 0020 00 00 00 02 00 00 01 ed 00 00 00 02 00 00 04 16 ................ 0030 00 00 00 64 00 00 00 00 00 00 00 28 00 00 00 00 ...d.......(.... 0040 00 00 00 00 00 00 00 00 00 00 00 00 54 9e e5 9d ............T... 0050 c6 d3 1a d2 00 00 00 00 00 00 63 1f 5b f2 2e 7a ..........c.[..z 0060 0b ae f9 ec 5b f2 2e 7a 0b 53 6c ef 5b f2 2e 7a ....[..z.Sl.[..z 0070 0b 53 6c ef 00 00 00 0c .Sl..... I believe RDMA Write First also has an RETH. The sender does not interleave RDMA Writes, so subsequent Middle and Last packets go with this RDMA Write First. 
-- Chuck Lever ^ permalink raw reply [flat|nested] 31+ messages in thread
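The packing described in this exchange, and the matching extraction one would perform when reading a capture, can be sketched stand-alone. This is illustrative user-space code, not the kernel implementation: htonl()/ntohl() stand in for the kernel's cpu_to_be32()/be32_to_cpu(), and it assumes, as the patch does, that the upper 32 bits of the registered iova start out zero:

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Plant the XID in the upper 32 bits of the offset, mirroring
 * "ibmr->iova |= ((u64)cpu_to_be32(xid)) << 32" from the patch.
 * Assumes the upper 32 bits of iova are already zero. */
uint64_t plant_xid(uint64_t iova, uint32_t xid)
{
        return iova | ((uint64_t)htonl(xid) << 32);
}

/* Recover the XID eye-catcher from a captured RETH Virtual Address,
 * e.g. the field Wireshark shows for an RDMA Read Request or RDMA
 * Write Only packet. */
uint32_t xid_from_va(uint64_t va)
{
        return ntohl((uint32_t)(va >> 32));
}
```

With this round trip, the low 32 bits of the Virtual Address remain usable as the DMA address, while the high 32 bits decode back to the RPC's XID.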
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 18:58 ` Chuck Lever @ 2018-11-19 21:22 ` Olga Kornievskaia 2018-11-19 21:32 ` Chuck Lever 0 siblings, 1 reply; 31+ messages in thread From: Olga Kornievskaia @ 2018-11-19 21:22 UTC (permalink / raw) To: Chuck Lever; +Cc: linux-rdma, linux-nfs On Mon, Nov 19, 2018 at 1:59 PM Chuck Lever <chuck.lever@oracle.com> wrote: > > > > > On Nov 19, 2018, at 1:47 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > > > > On Mon, Nov 19, 2018 at 1:19 PM Chuck Lever <chuck.lever@oracle.com> wrote: > >> > >> > >> > >>> On Nov 19, 2018, at 1:08 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > >>> > >>> On Mon, Nov 19, 2018 at 12:59 PM Chuck Lever <chuck.lever@oracle.com> wrote: > >>>> > >>>> > >>>> > >>>>> On Nov 19, 2018, at 12:47 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > >>>>> > >>>>> On Mon, Nov 19, 2018 at 10:46 AM Chuck Lever <chuck.lever@oracle.com> wrote: > >>>>>> > >>>>>> Place the associated RPC transaction's XID in the upper 32 bits of > >>>>>> each RDMA segment's rdma_offset field. These bits are currently > >>>>>> always zero. > >>>>>> > >>>>>> There are two reasons to do this: > >>>>>> > >>>>>> - The R_key only has 8 bits that are different from registration to > >>>>>> registration. The XID adds more uniqueness to each RDMA segment to > >>>>>> reduce the likelihood of a software bug on the server reading from > >>>>>> or writing into memory it's not supposed to. > >>>>>> > >>>>>> - On-the-wire RDMA Read and Write operations do not otherwise carry > >>>>>> any identifier that matches them up to an RPC. The XID in the > >>>>>> upper 32 bits will act as an eye-catcher in network captures. > >>>>> > >>>>> Is this just an "eye-catcher" or do you have plans to use it in > >>>>> wireshark? If the latter, then can we really do that? while a linux > >>>>> implementation may do that, other (or even possibly future linux) > >>>>> implementation might not do this. 
Can we justify changing the > >>>>> wireshark logic for it? > >>>> > >>>> No plans to change the wireshark RPC-over-RDMA dissector. > >>>> That would only be a valid thing to do if adding the XID > >>>> were made part of the RPC-over-RDMA protocol via an RFC. > >>> > >>> Agreed. Can you also help me understand the proposal (as I'm still > >>> trying to figure why it is useful). > >>> > >>> You are proposing to modify the RDMA segments's RDMA offset field (I > >>> see top 6bits are indeed always 0). I don't see how adding that helps > >>> an RDMA read/write message which does not have an "offset" field in it > >>> be matched to a particular RPC. I don't believe we have (had) any > >>> issues matching the initial RC Send only that contains the RDMA_MSG to > >>> the RPC. > >> > >> The ULP has access to only the low order 8 bits of the R_key. The > >> upper 24 bits are fixed for each MR. So for any given MR, there are > >> only 256 unique R_key values. That means the same R_key will appear > >> again quickly on the wire. > >> > >> The 64-bit offset field is set by the ULP, and can be essentially > >> any arbitrary value. Most kernel ULPs use the iova of the registered > >> memory. We only need the lower 32 bits for that. > >> > >> The purpose of adding junk to the offset is to make the offset > >> unique to that RPC transaction, just like the R_key is. This helps > >> make the RDMA segment co-ordinates (handle, length, offset) more > >> unique and thus harder to spoof. > > > > Thank you for the explanation that makes sense. > > > >> We could use random numbers in that upper 32 bits, but we have > >> something more handy: the RPC's XID. > >> > >> Now when you look at an RDMA Read or Write, the top 32 bits in each > >> RDMA segment's offset match the XID of the RPC transaction that the > >> RDMA operations go with. This is really a secondary benefit to the > >> uniquifying effect above. > > > > I find the wording "no the wire RDMA read or write" misleading. 
Did > > you really mean it as "RDMA read or write" or do you mean "RDMA_MSG" > > or do you mean "NFS RDMA read or write"? Because RDMA offset is not a > > part of the RDMA read/write (first/middle/last) packet. That's what > > I'm hanged up on. > > Here's an RDMA Read request in a network capture I had at hand: > > No. Time Source Destination Protocol Length Info > 228 22:31:06.203637 LID: 5 LID: 11 InfiniBand 42 RC RDMA Read Request QP=0x000240 > > Frame 228: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 1 > Extensible Record Format > InfiniBand > Local Route Header > Base Transport Header > RETH - RDMA Extended Transport Header > Virtual Address: 11104011393315758080 <<<<<< > Remote Key: 1879114618 > DMA Length: 4015 > Invariant CRC: 0xd492a3e1 > Variant CRC: 0x8736 > > The value of the Virtual Address field is what the RPC-over-RDMA > protocol calls the Offset. The Read responses are matched to this > request by their message sequence numbers, and this Read request is > matched to the RPC Call by the XID in the top 32 bits of the > Virtual Address. > > Likewise for an RDMA Write Only request: > > 188 22:31:06.201350 LID: 5 LID: 11 InfiniBand 162 RC RDMA Write Only QP=0x000240 > > Frame 188: 162 bytes on wire (1296 bits), 162 bytes captured (1296 bits) on interface 1 > Extensible Record Format > InfiniBand > Local Route Header > Base Transport Header > RETH - RDMA Extended Transport Header > Virtual Address: 10455493047213809920 <<<<<< > Remote Key: 1879115386 > DMA Length: 120 > Invariant CRC: 0xe2e1b2cd > Variant CRC: 0x676f > Data (120 bytes) > > 0000 91 19 5f 87 00 00 00 01 00 00 00 00 00 00 00 00 .._............. > 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................ > 0020 00 00 00 02 00 00 01 ed 00 00 00 02 00 00 04 16 ................ > 0030 00 00 00 64 00 00 00 00 00 00 00 28 00 00 00 00 ...d.......(.... > 0040 00 00 00 00 00 00 00 00 00 00 00 00 54 9e e5 9d ............T... 
> 0050 c6 d3 1a d2 00 00 00 00 00 00 63 1f 5b f2 2e 7a ..........c.[..z > 0060 0b ae f9 ec 5b f2 2e 7a 0b 53 6c ef 5b f2 2e 7a ....[..z.Sl.[..z > 0070 0b 53 6c ef 00 00 00 0c .Sl..... > > > I believe RDMA Write First also has an RETH. The sender does not > interleave RDMA Writes, so subsequent Middle and Last packets go > with this RDMA Write First. OK, I see now where I was confused: in an RDMA_MSG, Wireshark labels the field "RDMA offset", and in the RDMA Write First message it is labeled "Virtual address". Thank you for the explanation. Here's the next question (coming from Jorge): is it reasonable to assume that the top 32 bits are always zero? I have a network trace (from a 4.18-rc2 kernel) where they are not. RPC over RDMA XID: 0xa347cfa2 Version: 1 Flow Control: 128 Message Type: RDMA_MSG (0) Read list (count: 0) Write list (count: 1) Write chunk (1 segment) Write chunk segment count: 1 RDMA segment 0 RDMA handle: 0x4000076f RDMA length: 65676 RDMA offset: 0x0000001049973000 I don't believe 0xa347cfa2 can fit? > > > -- > Chuck Lever > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
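As a quick arithmetic check on the trace quoted above (an illustrative helper, not kernel code): the top 32 bits of that pre-patch offset come out to 0x10, so on that system the upper bits of the iova were not all zero, which is the substance of the question.

```c
#include <stdint.h>

/* Return the upper 32 bits of an RDMA offset, the bits the patch
 * proposes to fill with the XID. For the quoted pre-patch offset
 * 0x0000001049973000, these bits are 0x10, not zero. */
uint32_t offset_high32(uint64_t rdma_offset)
{
        return (uint32_t)(rdma_offset >> 32);
}
```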
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 21:22 ` Olga Kornievskaia @ 2018-11-19 21:32 ` Chuck Lever 2018-11-19 21:42 ` Mora, Jorge 0 siblings, 1 reply; 31+ messages in thread From: Chuck Lever @ 2018-11-19 21:32 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: linux-rdma, Linux NFS Mailing List > On Nov 19, 2018, at 4:22 PM, Olga Kornievskaia <aglo@umich.edu> wrote: > > On Mon, Nov 19, 2018 at 1:59 PM Chuck Lever <chuck.lever@oracle.com> wrote: >> >> >> >>> On Nov 19, 2018, at 1:47 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>> >>> On Mon, Nov 19, 2018 at 1:19 PM Chuck Lever <chuck.lever@oracle.com> wrote: >>>> >>>> >>>> >>>>> On Nov 19, 2018, at 1:08 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>> >>>>> On Mon, Nov 19, 2018 at 12:59 PM Chuck Lever <chuck.lever@oracle.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>>> On Nov 19, 2018, at 12:47 PM, Olga Kornievskaia <aglo@umich.edu> wrote: >>>>>>> >>>>>>> On Mon, Nov 19, 2018 at 10:46 AM Chuck Lever <chuck.lever@oracle.com> wrote: >>>>>>>> >>>>>>>> Place the associated RPC transaction's XID in the upper 32 bits of >>>>>>>> each RDMA segment's rdma_offset field. These bits are currently >>>>>>>> always zero. >>>>>>>> >>>>>>>> There are two reasons to do this: >>>>>>>> >>>>>>>> - The R_key only has 8 bits that are different from registration to >>>>>>>> registration. The XID adds more uniqueness to each RDMA segment to >>>>>>>> reduce the likelihood of a software bug on the server reading from >>>>>>>> or writing into memory it's not supposed to. >>>>>>>> >>>>>>>> - On-the-wire RDMA Read and Write operations do not otherwise carry >>>>>>>> any identifier that matches them up to an RPC. The XID in the >>>>>>>> upper 32 bits will act as an eye-catcher in network captures. >>>>>>> >>>>>>> Is this just an "eye-catcher" or do you have plans to use it in >>>>>>> wireshark? If the latter, then can we really do that? 
while a linux >>>>>>> implementation may do that, other (or even possibly future linux) >>>>>>> implementation might not do this. Can we justify changing the >>>>>>> wireshark logic for it? >>>>>> >>>>>> No plans to change the wireshark RPC-over-RDMA dissector. >>>>>> That would only be a valid thing to do if adding the XID >>>>>> were made part of the RPC-over-RDMA protocol via an RFC. >>>>> >>>>> Agreed. Can you also help me understand the proposal (as I'm still >>>>> trying to figure why it is useful). >>>>> >>>>> You are proposing to modify the RDMA segments's RDMA offset field (I >>>>> see top 6bits are indeed always 0). I don't see how adding that helps >>>>> an RDMA read/write message which does not have an "offset" field in it >>>>> be matched to a particular RPC. I don't believe we have (had) any >>>>> issues matching the initial RC Send only that contains the RDMA_MSG to >>>>> the RPC. >>>> >>>> The ULP has access to only the low order 8 bits of the R_key. The >>>> upper 24 bits are fixed for each MR. So for any given MR, there are >>>> only 256 unique R_key values. That means the same R_key will appear >>>> again quickly on the wire. >>>> >>>> The 64-bit offset field is set by the ULP, and can be essentially >>>> any arbitrary value. Most kernel ULPs use the iova of the registered >>>> memory. We only need the lower 32 bits for that. >>>> >>>> The purpose of adding junk to the offset is to make the offset >>>> unique to that RPC transaction, just like the R_key is. This helps >>>> make the RDMA segment co-ordinates (handle, length, offset) more >>>> unique and thus harder to spoof. >>> >>> Thank you for the explanation that makes sense. >>> >>>> We could use random numbers in that upper 32 bits, but we have >>>> something more handy: the RPC's XID. >>>> >>>> Now when you look at an RDMA Read or Write, the top 32 bits in each >>>> RDMA segment's offset match the XID of the RPC transaction that the >>>> RDMA operations go with. 
This is really a secondary benefit to the >>>> uniquifying effect above. >>> >>> I find the wording "no the wire RDMA read or write" misleading. Did >>> you really mean it as "RDMA read or write" or do you mean "RDMA_MSG" >>> or do you mean "NFS RDMA read or write"? Because RDMA offset is not a >>> part of the RDMA read/write (first/middle/last) packet. That's what >>> I'm hanged up on. >> >> Here's an RDMA Read request in a network capture I had at hand: >> >> No. Time Source Destination Protocol Length Info >> 228 22:31:06.203637 LID: 5 LID: 11 InfiniBand 42 RC RDMA Read Request QP=0x000240 >> >> Frame 228: 42 bytes on wire (336 bits), 42 bytes captured (336 bits) on interface 1 >> Extensible Record Format >> InfiniBand >> Local Route Header >> Base Transport Header >> RETH - RDMA Extended Transport Header >> Virtual Address: 11104011393315758080 <<<<<< >> Remote Key: 1879114618 >> DMA Length: 4015 >> Invariant CRC: 0xd492a3e1 >> Variant CRC: 0x8736 >> >> The value of the Virtual Address field is what the RPC-over-RDMA >> protocol calls the Offset. The Read responses are matched to this >> request by their message sequence numbers, and this Read request is >> matched to the RPC Call by the XID in the top 32 bits of the >> Virtual Address. >> >> Likewise for an RDMA Write Only request: >> >> 188 22:31:06.201350 LID: 5 LID: 11 InfiniBand 162 RC RDMA Write Only QP=0x000240 >> >> Frame 188: 162 bytes on wire (1296 bits), 162 bytes captured (1296 bits) on interface 1 >> Extensible Record Format >> InfiniBand >> Local Route Header >> Base Transport Header >> RETH - RDMA Extended Transport Header >> Virtual Address: 10455493047213809920 <<<<<< >> Remote Key: 1879115386 >> DMA Length: 120 >> Invariant CRC: 0xe2e1b2cd >> Variant CRC: 0x676f >> Data (120 bytes) >> >> 0000 91 19 5f 87 00 00 00 01 00 00 00 00 00 00 00 00 .._............. >> 0010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................ 
>> 0020 00 00 00 02 00 00 01 ed 00 00 00 02 00 00 04 16 ................ >> 0030 00 00 00 64 00 00 00 00 00 00 00 28 00 00 00 00 ...d.......(.... >> 0040 00 00 00 00 00 00 00 00 00 00 00 00 54 9e e5 9d ............T... >> 0050 c6 d3 1a d2 00 00 00 00 00 00 63 1f 5b f2 2e 7a ..........c.[..z >> 0060 0b ae f9 ec 5b f2 2e 7a 0b 53 6c ef 5b f2 2e 7a ....[..z.Sl.[..z >> 0070 0b 53 6c ef 00 00 00 0c .Sl..... >> >> >> I believe RDMA Write First also has an RETH. The sender does not >> interleave RDMA Writes, so subsequent Middle and Last packets go >> with this RDMA Write First. > > Ok I see now where I was confused, in RDMA_MSG in the wireshark it's > labeled "RDMA offset" and the in the RDMA write first message it's > labeled "Virtual address". Thank you for explanation. > > Here's the next question (coming from Jorge)? Is it reasonable to > assume that top 32bits are always zero? I have an network trace (from > 4.18-rc2 kernel) where they are not. > > RPC over RDMA > XID: 0xa347cfa2 > Version: 1 > Flow Control: 128 > Message Type: RDMA_MSG (0) > Read list (count: 0) > Write list (count: 1) > Write chunk (1 segment) > Write chunk segment count: 1 > RDMA segment 0 > RDMA handle: 0x4000076f > RDMA length: 65676 > RDMA offset: 0x0000001049973000 > > I don't believe 0xa347cfa2 can fit? I've been told by three independent RDMA experts that using the top 32 bits of the offset, even if they are not zero, should be OK to do. If it doesn't work, it's a device driver bug that needs to be fixed. -- Chuck Lever ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 21:32 ` Chuck Lever @ 2018-11-19 21:42 ` Mora, Jorge 2018-11-19 22:46 ` Jason Gunthorpe 0 siblings, 1 reply; 31+ messages in thread From: Mora, Jorge @ 2018-11-19 21:42 UTC (permalink / raw) To: Chuck Lever, Olga Kornievskaia; +Cc: linux-rdma, Linux NFS Mailing List Hello Chuck, I am confused. Isn't the whole purpose of RDMA to place the data directly into the memory location given by the virtual address or offset? What you are saying is that this offset is not the actual memory address, and so the driver must map this offset to the actual address? --Jorge On 11/19/18, 2:33 PM, "linux-nfs-owner@vger.kernel.org on behalf of Chuck Lever" <linux-nfs-owner@vger.kernel.org on behalf of chuck.lever@oracle.com> wrote: > The value of the Virtual Address field is what the RPC-over-RDMA > protocol calls the Offset. The Read responses are matched to this > request by their message sequence numbers, and this Read request is > matched to the RPC Call by the XID in the top 32 bits of the > Virtual Address. [...] > I've been told by three independent RDMA experts that using the top > 32 bits of the offset, even if they are not zero, should be OK to > do. If it doesn't work, it's a device driver bug that needs to be > fixed. > -- > Chuck Lever ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 21:42 ` Mora, Jorge @ 2018-11-19 22:46 ` Jason Gunthorpe 2018-11-20 2:45 ` Tom Talpey 0 siblings, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2018-11-19 22:46 UTC (permalink / raw) To: Mora, Jorge Cc: Chuck Lever, Olga Kornievskaia, linux-rdma, Linux NFS Mailing List On Mon, Nov 19, 2018 at 09:42:56PM +0000, Mora, Jorge wrote: > Hello Chuck, > > I am confused, is it the whole purpose of RDMA is to place the data > directly into the memory location given by the virtual address or > offset? What you are saying is that this offset is not the actual > memory address and so the driver must map this offset to the actual > address? All RDMA on the wire addresses are 'virtual' and always go through translation hardware to reach a physical memory address. Even if you have a virtual address that matches your physical address there will still be translation hardware to enforce security. The direct placement refers to not copying data, not to using physical memory addresses throughout the protocol design. So it is better to use a virtual address that doesn't disclose any information about the host's configuration to the network, such as physical locations of kernel memory. Jason ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 22:46 ` Jason Gunthorpe @ 2018-11-20 2:45 ` Tom Talpey 2018-11-20 3:09 ` Jason Gunthorpe 0 siblings, 1 reply; 31+ messages in thread From: Tom Talpey @ 2018-11-20 2:45 UTC (permalink / raw) To: Jason Gunthorpe, Mora, Jorge Cc: Chuck Lever, Olga Kornievskaia, linux-rdma, Linux NFS Mailing List On 11/19/2018 5:46 PM, Jason Gunthorpe wrote: > On Mon, Nov 19, 2018 at 09:42:56PM +0000, Mora, Jorge wrote: >> Hello Chuck, >> >> I am confused, is it the whole purpose of RDMA is to place the data >> directly into the memory location given by the virtual address or >> offset? What you are saying is that this offset is not the actual >> memory address and so the driver must map this offset to the actual >> address? > > All RDMA on the wire addresses are 'virtual' and always go through > translation hardware to reach a physical memory address. Even if you > have a virtual address that matches your physical address there will > still be translation hardware to enforce security. > > The direct placement refers to not copying data, not to using physical > memory addresses throughout the protocol design. > > So it is better to use a virtual address that doesn't disclose any > information about the host's configuration to the network, such as > physical locations of kernel memory. Agreed. And, this is a very subtle but very powerful aspect of FRWR registration, that the high-order bits of the "virtual address" can basically be any value of the requestor's (client's) choice. Note, the low-order bits, up to the maximum offset used by the upper layer (e.g. a few MB for NFS), must be reserved in order to allow RDMA segments to iterate over the range of bytes in the region. That's only 24 or so bits out of the available 64, here however. For other registration methods, this is not always the case. It is definitely best to leverage this important property for FRWR. 
SMB Direct in Windows does a similar swizzling, by the way. We have taken several different approaches to the format, over time. Tom. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-20 2:45 ` Tom Talpey @ 2018-11-20 3:09 ` Jason Gunthorpe 2018-11-20 3:25 ` Tom Talpey 0 siblings, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2018-11-20 3:09 UTC (permalink / raw) To: Tom Talpey Cc: Mora, Jorge, Chuck Lever, Olga Kornievskaia, linux-rdma, Linux NFS Mailing List On Mon, Nov 19, 2018 at 09:45:24PM -0500, Tom Talpey wrote: > On 11/19/2018 5:46 PM, Jason Gunthorpe wrote: > > On Mon, Nov 19, 2018 at 09:42:56PM +0000, Mora, Jorge wrote: > > > Hello Chuck, > > > > > > I am confused, is it the whole purpose of RDMA is to place the data > > > directly into the memory location given by the virtual address or > > > offset? What you are saying is that this offset is not the actual > > > memory address and so the driver must map this offset to the actual > > > address? > > > > All RDMA on the wire addresses are 'virtual' and always go through > > translation hardware to reach a physical memory address. Even if you > > have a virtual address that matches your physical address there will > > still be translation hardware to enforce security. > > > > The direct placement refers to not copying data, not to using physical > > memory addresses throughout the protocol design. > > > > So it is better to use a virtual address that doesn't disclose any > > information about the host's configuration to the network, such as > > physical locations of kernel memory. > > Agreed. And, this is a very subtle but very powerful aspect of FRWR > registration, that the high-order bits of the "virtual address" can > basically be any value of the requestor's (client's) choice. > > Note, the low-order bits, up to the maximum offset used by the upper > layer (e.g. a few MB for NFS), must be reserved in order to allow RDMA > segments to iterate over the range of bytes in the region. That's only > 24 or so bits out of the available 64, here however. Why does it depend on protocol? 
So long as the adaptor can compute VA + Offset = Physical, the upper bits can be changed. For many adaptors that means PAGE_SIZE or huge page size level alignment. Jason ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-20 3:09 ` Jason Gunthorpe @ 2018-11-20 3:25 ` Tom Talpey 2018-11-20 3:32 ` Jason Gunthorpe 0 siblings, 1 reply; 31+ messages in thread From: Tom Talpey @ 2018-11-20 3:25 UTC (permalink / raw) To: Jason Gunthorpe Cc: Mora, Jorge, Chuck Lever, Olga Kornievskaia, linux-rdma, Linux NFS Mailing List On 11/19/2018 10:09 PM, Jason Gunthorpe wrote: > On Mon, Nov 19, 2018 at 09:45:24PM -0500, Tom Talpey wrote: >> On 11/19/2018 5:46 PM, Jason Gunthorpe wrote: >>> On Mon, Nov 19, 2018 at 09:42:56PM +0000, Mora, Jorge wrote: >>>> Hello Chuck, >>>> >>>> I am confused, is it the whole purpose of RDMA is to place the data >>>> directly into the memory location given by the virtual address or >>>> offset? What you are saying is that this offset is not the actual >>>> memory address and so the driver must map this offset to the actual >>>> address? >>> >>> All RDMA on the wire addresses are 'virtual' and always go through >>> translation hardware to reach a physical memory address. Even if you >>> have a virtual address that matches your physical address there will >>> still be translation hardware to enforce security. >>> >>> The direct placement refers to not copying data, not to using physical >>> memory addresses throughout the protocol design. >>> >>> So it is better to use a virtual address that doesn't disclose any >>> information about the host's configuration to the network, such as >>> physical locations of kernel memory. >> >> Agreed. And, this is a very subtle but very powerful aspect of FRWR >> registration, that the high-order bits of the "virtual address" can >> basically be any value of the requestor's (client's) choice. >> >> Note, the low-order bits, up to the maximum offset used by the upper >> layer (e.g. a few MB for NFS), must be reserved in order to allow RDMA >> segments to iterate over the range of bytes in the region. 
That's only >> 24 or so bits out of the available 64, here however. > > Why does it depend on protocol? Because the maximum payload being RDMA'd is an upper layer matter. The idea here is to provide a certain number of "invariant" bits in the offset / virtual address. Chuck proposes reserving just 32 bits, leaving 32 bits of range, quite comfy. > So long as the adaptor can compute VA + Offset = Physical the upper > bits can be changed. Absolutely, but if the offset walks into the high-order invariant bits, they won't be invariant any longer. It would still work, but it would defeat the Wireshark trick that Chuck wants to play. > For many adaptors that means PAGE_SIZE or huge > page size level alignment.. I'm not sure I understand this. The FRWR hides the native page size, by providing the page frames in the local work request. Their size isn't known to the protocol. Tom. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-20 3:25 ` Tom Talpey @ 2018-11-20 3:32 ` Jason Gunthorpe 2018-11-20 3:38 ` Tom Talpey 0 siblings, 1 reply; 31+ messages in thread From: Jason Gunthorpe @ 2018-11-20 3:32 UTC (permalink / raw) To: Tom Talpey Cc: Mora, Jorge, Chuck Lever, Olga Kornievskaia, linux-rdma, Linux NFS Mailing List On Mon, Nov 19, 2018 at 10:25:37PM -0500, Tom Talpey wrote: > Absolutely, but if the offset walks into the high-order invariant bits, > they won't be invariant any longer. It would still work, but it would > defeat the wireshark trick that Chuck wants to play. Sure, OK, if you care about the bits being invariant, and not simply randomized :) > > For many adaptors that means PAGE_SIZE or huge > > page size level alignment.. > > I'm not sure I understand this. The FRWR hides the native page size, by > providing the page frames in the local work request. Their size isn't > known to the protocol. Ah, some HW has limitations in how it can construct a VA. In many implementations the low order bits of the VA must match the offset into the page. If the page is 4K that is 12 bits, if it is 2M that is 21 bits. So there should be restrictions on what start VA can be assigned to a FRWR. Jason ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-20 3:32 ` Jason Gunthorpe @ 2018-11-20 3:38 ` Tom Talpey 0 siblings, 0 replies; 31+ messages in thread From: Tom Talpey @ 2018-11-20 3:38 UTC (permalink / raw) To: Jason Gunthorpe Cc: Mora, Jorge, Chuck Lever, Olga Kornievskaia, linux-rdma, Linux NFS Mailing List On 11/19/2018 10:32 PM, Jason Gunthorpe wrote: > On Mon, Nov 19, 2018 at 10:25:37PM -0500, Tom Talpey wrote: > >> Absolutely, but if the offset walks into the high-order invariant bits, >> they won't be invariant any longer. It would still work, but it would >> defeat the wireshark trick that Chuck wants to play. > > Sure, OK, if you care about the bits being invarient, and not simply > randomized :) Well, that was Chuck's proposal, to use the XID. SMB Direct originally used a formula, then changed to randomized, and in future will likely change it again. The point is, it's the RDMA consumer's choice, not a verb, or adapter, or protocol, requirement. >>> For many adaptors that means PAGE_SIZE or huge >>> page size level alignment.. >> >> I'm not sure I understand this. The FRWR hides the native page size, by >> providing the page frames in the local work request. Their size isn't >> known to the protocol. > > Ah, some HW has limitations in how it can construct a VA. In many > implementation the low order bits of the VA must match the offset into > the page. If the page is 4K that is 12 bits, if it is 2M that is 21 > bits.. > > So there should be restrictions on what start VA can be assigned to a > FRWR. I think only the FBO (First Byte Offset) is really restricted. But indeed yes, the number of possible FBO bits is a function of the page size. So, if so, we agree - ok. Tom. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-19 15:46 ` [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) Chuck Lever 2018-11-19 17:47 ` Olga Kornievskaia @ 2018-11-20 18:02 ` Anna Schumaker 2018-11-20 18:07 ` Chuck Lever 1 sibling, 1 reply; 31+ messages in thread From: Anna Schumaker @ 2018-11-20 18:02 UTC (permalink / raw) To: Chuck Lever, linux-rdma; +Cc: linux-nfs Hi Chuck, On Mon, 2018-11-19 at 10:46 -0500, Chuck Lever wrote: > Place the associated RPC transaction's XID in the upper 32 bits of > each RDMA segment's rdma_offset field. These bits are currently > always zero. > > There are two reasons to do this: > > - The R_key only has 8 bits that are different from registration to > registration. The XID adds more uniqueness to each RDMA segment to > reduce the likelihood of a software bug on the server reading from > or writing into memory it's not supposed to. > > - On-the-wire RDMA Read and Write operations do not otherwise carry > any identifier that matches them up to an RPC. The XID in the > upper 32 bits will act as an eye-catcher in network captures. 
> > Suggested-by: Tom Talpey <ttalpey@microsoft.com> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > net/sunrpc/xprtrdma/frwr_ops.c | 3 ++- > net/sunrpc/xprtrdma/rpc_rdma.c | 6 +++--- > net/sunrpc/xprtrdma/xprt_rdma.h | 2 +- > 3 files changed, 6 insertions(+), 5 deletions(-) > > diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c > index 49b314d..3b260d2 100644 > --- a/net/sunrpc/xprtrdma/frwr_ops.c > +++ b/net/sunrpc/xprtrdma/frwr_ops.c > @@ -344,7 +344,7 @@ > */ > static struct rpcrdma_mr_seg * > frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, > - int nsegs, bool writing, struct rpcrdma_mr **out) > + int nsegs, bool writing, u32 xid, struct rpcrdma_mr **out) > { > struct rpcrdma_ia *ia = &r_xprt->rx_ia; > bool holes_ok = ia->ri_mrtype == IB_MR_TYPE_SG_GAPS; > @@ -398,6 +398,7 @@ > if (unlikely(n != mr->mr_nents)) > goto out_mapmr_err; > > + ibmr->iova |= ((u64)cpu_to_be32(xid)) << 32; My mount command hangs once we make this change (I got bored and killed it after about 5 minutes). This is with NFS over soft-RoCE in a kvm virtual machine, and I see the behavior with all NFS versions. I hope this helps! 
Anna > key = (u8)(ibmr->rkey & 0x000000FF); > ib_update_fast_reg_key(ibmr, ++key); > > diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c > index 9f53e02..89a2db2 100644 > --- a/net/sunrpc/xprtrdma/rpc_rdma.c > +++ b/net/sunrpc/xprtrdma/rpc_rdma.c > @@ -357,7 +357,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt > *r_xprt, > > do { > seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, > - false, &mr); > + false, rqst->rq_xid, &mr); > if (IS_ERR(seg)) > return PTR_ERR(seg); > rpcrdma_mr_push(mr, &req->rl_registered); > @@ -415,7 +415,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt > *r_xprt, > nchunks = 0; > do { > seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, > - true, &mr); > + true, rqst->rq_xid, &mr); > if (IS_ERR(seg)) > return PTR_ERR(seg); > rpcrdma_mr_push(mr, &req->rl_registered); > @@ -473,7 +473,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt > *r_xprt, > nchunks = 0; > do { > seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, > - true, &mr); > + true, rqst->rq_xid, &mr); > if (IS_ERR(seg)) > return PTR_ERR(seg); > rpcrdma_mr_push(mr, &req->rl_registered); > diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h > index a13ccb6..2ae1ee2 100644 > --- a/net/sunrpc/xprtrdma/xprt_rdma.h > +++ b/net/sunrpc/xprtrdma/xprt_rdma.h > @@ -472,7 +472,7 @@ struct rpcrdma_memreg_ops { > struct rpcrdma_mr_seg * > (*ro_map)(struct rpcrdma_xprt *, > struct rpcrdma_mr_seg *, int, bool, > - struct rpcrdma_mr **); > + u32, struct rpcrdma_mr **); > int (*ro_send)(struct rpcrdma_ia *ia, > struct rpcrdma_req *req); > void (*ro_reminv)(struct rpcrdma_rep *rep, > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) 2018-11-20 18:02 ` Anna Schumaker @ 2018-11-20 18:07 ` Chuck Lever [not found] ` <94ff7ec712e086bfdd9c217a5f97c293a07151b9.camel@gmail.com> 0 siblings, 1 reply; 31+ messages in thread From: Chuck Lever @ 2018-11-20 18:07 UTC (permalink / raw) To: Anna Schumaker; +Cc: linux-rdma, Linux NFS Mailing List > On Nov 20, 2018, at 1:02 PM, Anna Schumaker <schumaker.anna@gmail.com> wrote: > > Hi Chuck, > > On Mon, 2018-11-19 at 10:46 -0500, Chuck Lever wrote: >> Place the associated RPC transaction's XID in the upper 32 bits of >> each RDMA segment's rdma_offset field. These bits are currently >> always zero. >> >> There are two reasons to do this: >> >> - The R_key only has 8 bits that are different from registration to >> registration. The XID adds more uniqueness to each RDMA segment to >> reduce the likelihood of a software bug on the server reading from >> or writing into memory it's not supposed to. >> >> - On-the-wire RDMA Read and Write operations do not otherwise carry >> any identifier that matches them up to an RPC. The XID in the >> upper 32 bits will act as an eye-catcher in network captures. 
>> >> Suggested-by: Tom Talpey <ttalpey@microsoft.com> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >> --- >> net/sunrpc/xprtrdma/frwr_ops.c | 3 ++- >> net/sunrpc/xprtrdma/rpc_rdma.c | 6 +++--- >> net/sunrpc/xprtrdma/xprt_rdma.h | 2 +- >> 3 files changed, 6 insertions(+), 5 deletions(-) >> >> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c >> index 49b314d..3b260d2 100644 >> --- a/net/sunrpc/xprtrdma/frwr_ops.c >> +++ b/net/sunrpc/xprtrdma/frwr_ops.c >> @@ -344,7 +344,7 @@ >> */ >> static struct rpcrdma_mr_seg * >> frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg, >> - int nsegs, bool writing, struct rpcrdma_mr **out) >> + int nsegs, bool writing, u32 xid, struct rpcrdma_mr **out) >> { >> struct rpcrdma_ia *ia = &r_xprt->rx_ia; >> bool holes_ok = ia->ri_mrtype == IB_MR_TYPE_SG_GAPS; >> @@ -398,6 +398,7 @@ >> if (unlikely(n != mr->mr_nents)) >> goto out_mapmr_err; >> >> + ibmr->iova |= ((u64)cpu_to_be32(xid)) << 32; > > My mount command hangs once we make this change (I got bored and killed it after > about 5 minutes). This is with NFS over soft-RoCE in a kvm virtual machine, and > I see the behavior with all NFS versions. > > I hope this helps! OK. Can you capture ftrace output to show us where the hang occurs? I use this on the NFS client: # trace-cmd record -e sunrpc -e rpcrdma ^C # trace-cmd report | less (or > /tmp/trace.log) Thanks! 
> Anna > >> key = (u8)(ibmr->rkey & 0x000000FF); >> ib_update_fast_reg_key(ibmr, ++key); >> >> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c >> index 9f53e02..89a2db2 100644 >> --- a/net/sunrpc/xprtrdma/rpc_rdma.c >> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c >> @@ -357,7 +357,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt >> *r_xprt, >> >> do { >> seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, >> - false, &mr); >> + false, rqst->rq_xid, &mr); >> if (IS_ERR(seg)) >> return PTR_ERR(seg); >> rpcrdma_mr_push(mr, &req->rl_registered); >> @@ -415,7 +415,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt >> *r_xprt, >> nchunks = 0; >> do { >> seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, >> - true, &mr); >> + true, rqst->rq_xid, &mr); >> if (IS_ERR(seg)) >> return PTR_ERR(seg); >> rpcrdma_mr_push(mr, &req->rl_registered); >> @@ -473,7 +473,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt >> *r_xprt, >> nchunks = 0; >> do { >> seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs, >> - true, &mr); >> + true, rqst->rq_xid, &mr); >> if (IS_ERR(seg)) >> return PTR_ERR(seg); >> rpcrdma_mr_push(mr, &req->rl_registered); >> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h >> index a13ccb6..2ae1ee2 100644 >> --- a/net/sunrpc/xprtrdma/xprt_rdma.h >> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h >> @@ -472,7 +472,7 @@ struct rpcrdma_memreg_ops { >> struct rpcrdma_mr_seg * >> (*ro_map)(struct rpcrdma_xprt *, >> struct rpcrdma_mr_seg *, int, bool, >> - struct rpcrdma_mr **); >> + u32, struct rpcrdma_mr **); >> int (*ro_send)(struct rpcrdma_ia *ia, >> struct rpcrdma_req *req); >> void (*ro_reminv)(struct rpcrdma_rep *rep, -- Chuck Lever ^ permalink raw reply [flat|nested] 31+ messages in thread
[parent not found: <94ff7ec712e086bfdd9c217a5f97c293a07151b9.camel@gmail.com>]
* Re: [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)
  [not found] ` <94ff7ec712e086bfdd9c217a5f97c293a07151b9.camel@gmail.com>
@ 2018-11-20 21:31 ` Chuck Lever
  0 siblings, 0 replies; 31+ messages in thread
From: Chuck Lever @ 2018-11-20 21:31 UTC (permalink / raw)
To: Anna Schumaker; +Cc: linux-rdma, Linux NFS Mailing List

> On Nov 20, 2018, at 4:01 PM, Anna Schumaker <schumaker.anna@gmail.com> wrote:
>
> On Tue, 2018-11-20 at 13:07 -0500, Chuck Lever wrote:
>>> On Nov 20, 2018, at 1:02 PM, Anna Schumaker <schumaker.anna@gmail.com>
>>> wrote:
>>>
>>> Hi Chuck,
>>>
>>> On Mon, 2018-11-19 at 10:46 -0500, Chuck Lever wrote:
>>>> Place the associated RPC transaction's XID in the upper 32 bits of
>>>> each RDMA segment's rdma_offset field. These bits are currently
>>>> always zero.
>>>>
>>>> There are two reasons to do this:
>>>>
>>>> - The R_key only has 8 bits that are different from registration to
>>>>   registration. The XID adds more uniqueness to each RDMA segment to
>>>>   reduce the likelihood of a software bug on the server reading from
>>>>   or writing into memory it's not supposed to.
>>>>
>>>> - On-the-wire RDMA Read and Write operations do not otherwise carry
>>>>   any identifier that matches them up to an RPC. The XID in the
>>>>   upper 32 bits will act as an eye-catcher in network captures.
>>>>
>>>> Suggested-by: Tom Talpey <ttalpey@microsoft.com>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>  net/sunrpc/xprtrdma/frwr_ops.c  |    3 ++-
>>>>  net/sunrpc/xprtrdma/rpc_rdma.c  |    6 +++---
>>>>  net/sunrpc/xprtrdma/xprt_rdma.h |    2 +-
>>>>  3 files changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
>>>> index 49b314d..3b260d2 100644
>>>> --- a/net/sunrpc/xprtrdma/frwr_ops.c
>>>> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
>>>> @@ -344,7 +344,7 @@
>>>>   */
>>>>  static struct rpcrdma_mr_seg *
>>>>  frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
>>>> -	    int nsegs, bool writing, struct rpcrdma_mr **out)
>>>> +	    int nsegs, bool writing, u32 xid, struct rpcrdma_mr **out)
>>>>  {
>>>>  	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
>>>>  	bool holes_ok = ia->ri_mrtype == IB_MR_TYPE_SG_GAPS;
>>>> @@ -398,6 +398,7 @@
>>>>  	if (unlikely(n != mr->mr_nents))
>>>>  		goto out_mapmr_err;
>>>>
>>>> +	ibmr->iova |= ((u64)cpu_to_be32(xid)) << 32;
>>>
>>> My mount command hangs once we make this change (I got bored and killed it
>>> after about 5 minutes). This is with NFS over soft-RoCE in a kvm virtual
>>> machine, and I see the behavior with all NFS versions.
>>>
>>> I hope this helps!
>>
>> OK. Can you capture ftrace output to show us where the hang occurs?
>> I use this on the NFS client:
>>
>>  # trace-cmd record -e sunrpc -e rpcrdma
>>  ^C
>>  # trace-cmd report | less   (or > /tmp/trace.log)
>
> The trace is attached! I started the mount and walked away for a while.
> When I came back, it reported "mount.nfs: Connection timed out".

The first operation is EXCHANGE_ID, and the client expects a large
Reply so it registers a Reply chunk.
    mount.nfs-1325  [001]   450.674910: xprtrdma_reply_chunk: task:16530@10 mr=0xffff93643876cb00 4380@0xffffd37d38bec100:0x00001a61 (last)
    mount.nfs-1325  [001]   450.674912: xprtrdma_marshal: task:16530@10 xid=0x09c9d05d: hdr=48 xdr=232/0/0 inline/reply chunk
    mount.nfs-1325  [001]   450.674914: xprtrdma_post_send: req=0xffff936438a80000, 2 SGEs, status=0
    mount.nfs-1325  [001]   450.674928: xprt_transmit: peer=[192.168.100.215]:20049 xid=0x09c9d05d status=0
    mount.nfs-1325  [001]   450.674929: rpc_task_run_action: task:16530@10 flags=5a80 state=0015 status=0 action=call_transmit_status
    mount.nfs-1325  [001]   450.674930: rpc_task_sleep: task:16530@10 flags=5a80 state=0015 status=0 timeout=18000 queue=xprt_pending
    kworker/1:1H-177  [001]   450.676094: xprtrdma_wc_receive: cqe=0xffff936438d66600 0 bytes: WR_FLUSH_ERR (5/0x0)
    kworker/1:1H-177  [001]   450.676095: xprtrdma_wc_receive: cqe=0xffff936438d66400 0 bytes: WR_FLUSH_ERR (5/0x0)

The XID is 09c9d05d, but I don't see it in the offset, which is
0xffffd37d38bec100. Are you sure the patch is applied?

> I hope this helps!
> Anna
>
>>
>> Thanks!
>>
>>
>>> Anna
>>>
>>>>  	key = (u8)(ibmr->rkey & 0x000000FF);
>>>>  	ib_update_fast_reg_key(ibmr, ++key);
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> index 9f53e02..89a2db2 100644
>>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> @@ -357,7 +357,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>>>>
>>>>  	do {
>>>>  		seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
>>>> -						   false, &mr);
>>>> +						   false, rqst->rq_xid, &mr);
>>>>  		if (IS_ERR(seg))
>>>>  			return PTR_ERR(seg);
>>>>  		rpcrdma_mr_push(mr, &req->rl_registered);
>>>> @@ -415,7 +415,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>>>>  	nchunks = 0;
>>>>  	do {
>>>>  		seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
>>>> -						   true, &mr);
>>>> +						   true, rqst->rq_xid, &mr);
>>>>  		if (IS_ERR(seg))
>>>>  			return PTR_ERR(seg);
>>>>  		rpcrdma_mr_push(mr, &req->rl_registered);
>>>> @@ -473,7 +473,7 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>>>>  	nchunks = 0;
>>>>  	do {
>>>>  		seg = r_xprt->rx_ia.ri_ops->ro_map(r_xprt, seg, nsegs,
>>>> -						   true, &mr);
>>>> +						   true, rqst->rq_xid, &mr);
>>>>  		if (IS_ERR(seg))
>>>>  			return PTR_ERR(seg);
>>>>  		rpcrdma_mr_push(mr, &req->rl_registered);
>>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> index a13ccb6..2ae1ee2 100644
>>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> @@ -472,7 +472,7 @@ struct rpcrdma_memreg_ops {
>>>>  	struct rpcrdma_mr_seg *
>>>>  		(*ro_map)(struct rpcrdma_xprt *,
>>>>  			  struct rpcrdma_mr_seg *, int, bool,
>>>> -			  struct rpcrdma_mr **);
>>>> +			  u32, struct rpcrdma_mr **);
>>>>  	int		(*ro_send)(struct rpcrdma_ia *ia,
>>>>  			 	   struct rpcrdma_req *req);
>>>>  	void		(*ro_reminv)(struct rpcrdma_rep *rep,
>>
>> --
>> Chuck Lever
>
> <trace.log>

--
Chuck Lever

^ permalink raw reply	[flat|nested] 31+ messages in thread
end of thread, other threads:[~2018-11-20 21:31 UTC | newest]

Thread overview: 31+ messages
2018-11-19 15:45 [PATCH v1 0/4] NFS/RDMA client for v4.21 (part 1) Chuck Lever
2018-11-19 15:45 ` [PATCH v1 1/4] xprtrdma: Remove support for FMR memory registration Chuck Lever
2018-11-19 16:16   ` Bart Van Assche
2018-11-19 19:09     ` Leon Romanovsky
2018-11-19 20:52       ` Bart Van Assche
2018-11-20  5:37         ` Leon Romanovsky
2018-11-19 22:41     ` Jason Gunthorpe
2018-11-19 22:56       ` Chuck Lever
2018-11-19 23:10         ` Jason Gunthorpe
2018-11-20 15:22   ` Dennis Dalessandro
2018-11-19 15:45 ` [PATCH v1 2/4] xprtrdma: mrs_create off-by-one Chuck Lever
2018-11-19 15:46 ` [PATCH v1 3/4] xprtrdma: Reduce max_frwr_depth Chuck Lever
2018-11-19 15:46 ` [PATCH v1 4/4] xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) Chuck Lever
2018-11-19 17:47   ` Olga Kornievskaia
2018-11-19 17:58     ` Chuck Lever
2018-11-19 18:08       ` Olga Kornievskaia
2018-11-19 18:18         ` Chuck Lever
2018-11-19 18:47           ` Olga Kornievskaia
2018-11-19 18:58             ` Chuck Lever
2018-11-19 21:22               ` Olga Kornievskaia
2018-11-19 21:32                 ` Chuck Lever
2018-11-19 21:42   ` Mora, Jorge
2018-11-19 22:46     ` Jason Gunthorpe
2018-11-20  2:45   ` Tom Talpey
2018-11-20  3:09     ` Jason Gunthorpe
2018-11-20  3:25       ` Tom Talpey
2018-11-20  3:32         ` Jason Gunthorpe
2018-11-20  3:38           ` Tom Talpey
2018-11-20 18:02   ` Anna Schumaker
2018-11-20 18:07     ` Chuck Lever
     [not found]       ` <94ff7ec712e086bfdd9c217a5f97c293a07151b9.camel@gmail.com>
2018-11-20 21:31         ` Chuck Lever