All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Don Dutile <ddutile@redhat.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	"J. Bruce Fields" <bfields@redhat.com>
Subject: [PATCH 4.19 44/44] svcrdma: Remove max_sge check at connect time
Date: Wed, 13 Feb 2019 19:38:45 +0100	[thread overview]
Message-ID: <20190213183655.483627018@linuxfoundation.org> (raw)
In-Reply-To: <20190213183651.648060257@linuxfoundation.org>

4.19-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Chuck Lever <chuck.lever@oracle.com>

commit e248aa7be86e8179f20ac0931774ecd746f3f5bf upstream.

Two and a half years ago, the client was changed to use gathered
Send for larger inline messages, in commit 655fec6987b ("xprtrdma:
Use gathered Send for large inline messages"). Several fixes were
required because there are a few in-kernel device drivers whose
max_sge is 3, and these were broken by the change.

Apparently my memory is going, because some time later, I submitted
commit 25fd86eca11c ("svcrdma: Don't overrun the SGE array in
svc_rdma_send_ctxt"), and after that, commit f3c1fd0ee294 ("svcrdma:
Reduce max_send_sges"). These too incorrectly assumed in-kernel
device drivers would have more than a few Send SGEs available.

The fix for the server side is not the same. This is because the
fundamental problem on the server is that, whether or not the client
has provisioned a chunk for the RPC reply, the server must squeeze
even the most complex RPC replies into a single RDMA Send. Failing
in the send path because of Send SGE exhaustion should never be an
option.

Therefore, instead of failing when the send path runs out of SGEs,
switch to using a bounce buffer mechanism to handle RPC replies that
are too complex for the device to send directly. That allows us to
remove the max_sge check to enable drivers with small max_sge to
work again.

Reported-by: Don Dutile <ddutile@redhat.com>
Fixes: 25fd86eca11c ("svcrdma: Don't overrun the SGE array in ...")
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |  105 +++++++++++++++++++++++++++++--
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    9 --
 2 files changed, 102 insertions(+), 12 deletions(-)

--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -563,6 +563,99 @@ void svc_rdma_sync_reply_hdr(struct svcx
 				      DMA_TO_DEVICE);
 }
 
+/* If the xdr_buf has more elements than the device can
+ * transmit in a single RDMA Send, then the reply will
+ * have to be copied into a bounce buffer.
+ */
+static bool svc_rdma_pull_up_needed(struct svcxprt_rdma *rdma,
+				    struct xdr_buf *xdr,
+				    __be32 *wr_lst)
+{
+	int elements;
+
+	/* xdr->head */
+	elements = 1;
+
+	/* xdr->pages */
+	if (!wr_lst) {
+		unsigned int remaining;
+		unsigned long pageoff;
+
+		pageoff = xdr->page_base & ~PAGE_MASK;
+		remaining = xdr->page_len;
+		while (remaining) {
+			++elements;
+			remaining -= min_t(u32, PAGE_SIZE - pageoff,
+					   remaining);
+			pageoff = 0;
+		}
+	}
+
+	/* xdr->tail */
+	if (xdr->tail[0].iov_len)
+		++elements;
+
+	/* assume 1 SGE is needed for the transport header */
+	return elements >= rdma->sc_max_send_sges;
+}
+
+/* The device is not capable of sending the reply directly.
+ * Assemble the elements of @xdr into the transport header
+ * buffer.
+ */
+static int svc_rdma_pull_up_reply_msg(struct svcxprt_rdma *rdma,
+				      struct svc_rdma_send_ctxt *ctxt,
+				      struct xdr_buf *xdr, __be32 *wr_lst)
+{
+	unsigned char *dst, *tailbase;
+	unsigned int taillen;
+
+	dst = ctxt->sc_xprt_buf;
+	dst += ctxt->sc_sges[0].length;
+
+	memcpy(dst, xdr->head[0].iov_base, xdr->head[0].iov_len);
+	dst += xdr->head[0].iov_len;
+
+	tailbase = xdr->tail[0].iov_base;
+	taillen = xdr->tail[0].iov_len;
+	if (wr_lst) {
+		u32 xdrpad;
+
+		xdrpad = xdr_padsize(xdr->page_len);
+		if (taillen && xdrpad) {
+			tailbase += xdrpad;
+			taillen -= xdrpad;
+		}
+	} else {
+		unsigned int len, remaining;
+		unsigned long pageoff;
+		struct page **ppages;
+
+		ppages = xdr->pages + (xdr->page_base >> PAGE_SHIFT);
+		pageoff = xdr->page_base & ~PAGE_MASK;
+		remaining = xdr->page_len;
+		while (remaining) {
+			len = min_t(u32, PAGE_SIZE - pageoff, remaining);
+
+			memcpy(dst, page_address(*ppages), len);
+			remaining -= len;
+			dst += len;
+			pageoff = 0;
+		}
+	}
+
+	if (taillen)
+		memcpy(dst, tailbase, taillen);
+
+	ctxt->sc_sges[0].length += xdr->len;
+	ib_dma_sync_single_for_device(rdma->sc_pd->device,
+				      ctxt->sc_sges[0].addr,
+				      ctxt->sc_sges[0].length,
+				      DMA_TO_DEVICE);
+
+	return 0;
+}
+
 /* svc_rdma_map_reply_msg - Map the buffer holding RPC message
  * @rdma: controlling transport
  * @ctxt: send_ctxt for the Send WR
@@ -585,8 +678,10 @@ int svc_rdma_map_reply_msg(struct svcxpr
 	u32 xdr_pad;
 	int ret;
 
-	if (++ctxt->sc_cur_sge_no >= rdma->sc_max_send_sges)
-		return -EIO;
+	if (svc_rdma_pull_up_needed(rdma, xdr, wr_lst))
+		return svc_rdma_pull_up_reply_msg(rdma, ctxt, xdr, wr_lst);
+
+	++ctxt->sc_cur_sge_no;
 	ret = svc_rdma_dma_map_buf(rdma, ctxt,
 				   xdr->head[0].iov_base,
 				   xdr->head[0].iov_len);
@@ -617,8 +712,7 @@ int svc_rdma_map_reply_msg(struct svcxpr
 	while (remaining) {
 		len = min_t(u32, PAGE_SIZE - page_off, remaining);
 
-		if (++ctxt->sc_cur_sge_no >= rdma->sc_max_send_sges)
-			return -EIO;
+		++ctxt->sc_cur_sge_no;
 		ret = svc_rdma_dma_map_page(rdma, ctxt, *ppages++,
 					    page_off, len);
 		if (ret < 0)
@@ -632,8 +726,7 @@ int svc_rdma_map_reply_msg(struct svcxpr
 	len = xdr->tail[0].iov_len;
 tail:
 	if (len) {
-		if (++ctxt->sc_cur_sge_no >= rdma->sc_max_send_sges)
-			return -EIO;
+		++ctxt->sc_cur_sge_no;
 		ret = svc_rdma_dma_map_buf(rdma, ctxt, base, len);
 		if (ret < 0)
 			return ret;
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -478,12 +478,9 @@ static struct svc_xprt *svc_rdma_accept(
 	/* Transport header, head iovec, tail iovec */
 	newxprt->sc_max_send_sges = 3;
 	/* Add one SGE per page list entry */
-	newxprt->sc_max_send_sges += svcrdma_max_req_size / PAGE_SIZE;
-	if (newxprt->sc_max_send_sges > dev->attrs.max_send_sge) {
-		pr_err("svcrdma: too few Send SGEs available (%d needed)\n",
-		       newxprt->sc_max_send_sges);
-		goto errout;
-	}
+	newxprt->sc_max_send_sges += (svcrdma_max_req_size / PAGE_SIZE) + 1;
+	if (newxprt->sc_max_send_sges > dev->attrs.max_send_sge)
+		newxprt->sc_max_send_sges = dev->attrs.max_send_sge;
 	newxprt->sc_max_req_size = svcrdma_max_req_size;
 	newxprt->sc_max_requests = svcrdma_max_requests;
 	newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;



  parent reply	other threads:[~2019-02-13 18:44 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-13 18:38 [PATCH 4.19 00/44] 4.19.22-stable review Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 01/44] mtd: Make sure mtd->erasesize is valid even if the partition is of size 0 Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 02/44] mtd: spinand: Handle the case where PROGRAM LOAD does not reset the cache Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 03/44] mtd: spinand: Fix the error/cleanup path in spinand_init() Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 04/44] mtd: rawnand: gpmi: fix MX28 bus master lockup problem Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 05/44] libata: Add NOLPM quirk for SAMSUNG MZ7TE512HMHP-000L1 SSD Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 06/44] tools: iio: iio_generic_buffer: make num_loops signed Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 07/44] iio: adc: axp288: Fix TS-pin handling Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 08/44] iio: chemical: atlas-ph-sensor: correct IIO_TEMP values to millicelsius Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 09/44] iio: ti-ads8688: Update buffer allocation for timestamps Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 10/44] signal: Always notice exiting tasks Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 11/44] signal: Better detection of synchronous signals Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 12/44] misc: vexpress: Off by one in vexpress_syscfg_exec() Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 13/44] mei: me: add ice lake point device id Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 14/44] samples: mei: use /dev/mei0 instead of /dev/mei Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 15/44] debugfs: fix debugfs_rename parameter checking Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 16/44] pinctrl: sunxi: Correct number of IRQ banks on H6 main pin controller Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 17/44] pinctrl: cherryview: fix Strago DMI workaround Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 18/44] tracing: uprobes: Fix typo in pr_fmt string Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 19/44] mips: cm: reprime error cause Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 20/44] MIPS: OCTEON: dont set octeon_dma_bar_type if PCI is disabled Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 21/44] MIPS: VDSO: Use same -m%-float cflag as the kernel proper Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 22/44] mips: loongson64: remove unreachable(), fix loongson_poweroff() Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 23/44] MIPS: VDSO: Include $(ccflags-vdso) in o32,n32 .lds builds Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 24/44] ARM: iop32x/n2100: fix PCI IRQ mapping Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 25/44] ARM: tango: Improve ARCH_MULTIPLATFORM compatibility Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 26/44] ARM: dts: da850: fix interrupt numbers for clocksource Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 27/44] firmware: arm_scmi: provide the mandatory device release callback Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 28/44] powerpc/radix: Fix kernel crash with mremap() Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 29/44] mic: vop: Fix use-after-free on remove Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 30/44] mac80211: ensure that mgmt tx skbs have tailroom for encryption Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 31/44] drm/modes: Prevent division by zero htotal Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 32/44] drm/amd/powerplay: Fix missing break in switch Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 33/44] drm/i915: always return something on DDI clock selection Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 34/44] drm/vmwgfx: Fix setting of dma masks Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 35/44] drm/vmwgfx: Return error code from vmw_execbuf_copy_fence_user Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 36/44] [STABLE PATCH] SUNRPC: Always drop the XPRT_LOCK on XPRT_CLOSE_WAIT Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 37/44] xfrm: Make set-mark default behavior backward compatible Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 38/44] Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal" Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 39/44] libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive() Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 40/44] xfrm: refine validation of template and selector families Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 41/44] batman-adv: Avoid WARN on net_device without parent in netns Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 42/44] batman-adv: Force mac header to start of data on xmit Greg Kroah-Hartman
2019-02-13 18:38 ` [PATCH 4.19 43/44] svcrdma: Reduce max_send_sges Greg Kroah-Hartman
2019-02-13 18:38 ` Greg Kroah-Hartman [this message]
2019-02-14 16:46 ` [PATCH 4.19 00/44] 4.19.22-stable review Dan Rue
2019-02-14 19:17 ` Guenter Roeck
2019-02-14 22:23 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190213183655.483627018@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bfields@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=ddutile@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.