netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again
       [not found] <20210120012602.769683-1-sashal@kernel.org>
@ 2021-01-20  1:25 ` Sasha Levin
  2021-02-08 19:34   ` Trond Myklebust
  2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 21/45] r8152: Add Lenovo Powered USB-C Travel Hub Sasha Levin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Sasha Levin @ 2021-01-20  1:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Chuck Lever, Daire Byrne, Sasha Levin, linux-nfs, netdev

From: Chuck Lever <chuck.lever@oracle.com>

[ Upstream commit 4a85a6a3320b4a622315d2e0ea91a1d2b013bce4 ]

Daire Byrne reports a ~50% aggregrate throughput regression on his
Linux NFS server after commit da1661b93bf4 ("SUNRPC: Teach server to
use xprt_sock_sendmsg for socket sends"), which replaced
kernel_send_page() calls in NFSD's socket send path with calls to
sock_sendmsg() using iov_iter.

Investigation showed that tcp_sendmsg() was not using zero-copy to
send the xdr_buf's bvec pages, but instead was relying on memcpy.
This means copying every byte of a large NFS READ payload.

It looks like TLS sockets do indeed support a ->sendpage method,
so it's really not necessary to use xprt_sock_sendmsg() to support
TLS fully on the server. A mechanical reversion of da1661b93bf4 is
not possible at this point, but we can re-implement the server's
TCP socket sendmsg path using kernel_sendpage().

Reported-by: Daire Byrne <daire@dneg.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/sunrpc/svcsock.c | 86 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index c2752e2b9ce34..4404c491eb388 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1062,6 +1062,90 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
 	return 0;	/* record not complete */
 }
 
+static int svc_tcp_send_kvec(struct socket *sock, const struct kvec *vec,
+			      int flags)
+{
+	return kernel_sendpage(sock, virt_to_page(vec->iov_base),
+			       offset_in_page(vec->iov_base),
+			       vec->iov_len, flags);
+}
+
+/*
+ * kernel_sendpage() is used exclusively to reduce the number of
+ * copy operations in this path. Therefore the caller must ensure
+ * that the pages backing @xdr are unchanging.
+ *
+ * In addition, the logic assumes that * .bv_len is never larger
+ * than PAGE_SIZE.
+ */
+static int svc_tcp_sendmsg(struct socket *sock, struct msghdr *msg,
+			   struct xdr_buf *xdr, rpc_fraghdr marker,
+			   unsigned int *sentp)
+{
+	const struct kvec *head = xdr->head;
+	const struct kvec *tail = xdr->tail;
+	struct kvec rm = {
+		.iov_base	= &marker,
+		.iov_len	= sizeof(marker),
+	};
+	int flags, ret;
+
+	*sentp = 0;
+	xdr_alloc_bvec(xdr, GFP_KERNEL);
+
+	msg->msg_flags = MSG_MORE;
+	ret = kernel_sendmsg(sock, msg, &rm, 1, rm.iov_len);
+	if (ret < 0)
+		return ret;
+	*sentp += ret;
+	if (ret != rm.iov_len)
+		return -EAGAIN;
+
+	flags = head->iov_len < xdr->len ? MSG_MORE | MSG_SENDPAGE_NOTLAST : 0;
+	ret = svc_tcp_send_kvec(sock, head, flags);
+	if (ret < 0)
+		return ret;
+	*sentp += ret;
+	if (ret != head->iov_len)
+		goto out;
+
+	if (xdr->page_len) {
+		unsigned int offset, len, remaining;
+		struct bio_vec *bvec;
+
+		bvec = xdr->bvec;
+		offset = xdr->page_base;
+		remaining = xdr->page_len;
+		flags = MSG_MORE | MSG_SENDPAGE_NOTLAST;
+		while (remaining > 0) {
+			if (remaining <= PAGE_SIZE && tail->iov_len == 0)
+				flags = 0;
+			len = min(remaining, bvec->bv_len);
+			ret = kernel_sendpage(sock, bvec->bv_page,
+					      bvec->bv_offset + offset,
+					      len, flags);
+			if (ret < 0)
+				return ret;
+			*sentp += ret;
+			if (ret != len)
+				goto out;
+			remaining -= len;
+			offset = 0;
+			bvec++;
+		}
+	}
+
+	if (tail->iov_len) {
+		ret = svc_tcp_send_kvec(sock, tail, 0);
+		if (ret < 0)
+			return ret;
+		*sentp += ret;
+	}
+
+out:
+	return 0;
+}
+
 /**
  * svc_tcp_sendto - Send out a reply on a TCP socket
  * @rqstp: completed svc_rqst
@@ -1089,7 +1173,7 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
 	mutex_lock(&xprt->xpt_mutex);
 	if (svc_xprt_is_dead(xprt))
 		goto out_notconn;
-	err = xprt_sock_sendmsg(svsk->sk_sock, &msg, xdr, 0, marker, &sent);
+	err = svc_tcp_sendmsg(svsk->sk_sock, &msg, xdr, marker, &sent);
 	xdr_free_bvec(xdr);
 	trace_svcsock_tcp_send(xprt, err < 0 ? err : sent);
 	if (err < 0 || sent != (xdr->len + sizeof(marker)))
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH AUTOSEL 5.10 21/45] r8152: Add Lenovo Powered USB-C Travel Hub
       [not found] <20210120012602.769683-1-sashal@kernel.org>
  2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again Sasha Levin
@ 2021-01-20  1:25 ` Sasha Levin
  2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 27/45] net: stmmac: use __napi_schedule() for PREEMPT_RT Sasha Levin
  2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 28/45] net: stmmac: Fixed mtu channged by cache aligned Sasha Levin
  3 siblings, 0 replies; 12+ messages in thread
From: Sasha Levin @ 2021-01-20  1:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Leon Schuermann, Jakub Kicinski, Sasha Levin, linux-usb, netdev

From: Leon Schuermann <leon@is.currently.online>

[ Upstream commit cb82a54904a99df9e8f9e9d282046055dae5a730 ]

This USB-C Hub (17ef:721e) based on the Realtek RTL8153B chip used to
use the cdc_ether driver. However, using this driver, with the system
suspended the device constantly sends pause-frames as soon as the
receive buffer fills up. This causes issues with other devices, where
some Ethernet switches stop forwarding packets altogether.

Using the Realtek driver (r8152) fixes this issue. Pause frames are no
longer sent while the host system is suspended.

Signed-off-by: Leon Schuermann <leon@is.currently.online>
Tested-by: Leon Schuermann <leon@is.currently.online>
Link: https://lore.kernel.org/r/20210111190312.12589-2-leon@is.currently.online
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/usb/cdc_ether.c | 7 +++++++
 drivers/net/usb/r8152.c     | 1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index 8c1d61c2cbacb..6aaa0675c28a3 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -793,6 +793,13 @@ static const struct usb_device_id	products[] = {
 	.driver_info = 0,
 },
 
+/* Lenovo Powered USB-C Travel Hub (4X90S92381, based on Realtek RTL8153) */
+{
+	USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x721e, USB_CLASS_COMM,
+			USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
+	.driver_info = 0,
+},
+
 /* ThinkPad USB-C Dock Gen 2 (based on Realtek RTL8153) */
 {
 	USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0xa387, USB_CLASS_COMM,
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index b1770489aca51..88f177aca342e 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -6893,6 +6893,7 @@ static const struct usb_device_id rtl8152_table[] = {
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7205)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x720c)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7214)},
+	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x721e)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0xa387)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_LINKSYS, 0x0041)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_NVIDIA,  0x09ff)},
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH AUTOSEL 5.10 27/45] net: stmmac: use __napi_schedule() for PREEMPT_RT
       [not found] <20210120012602.769683-1-sashal@kernel.org>
  2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again Sasha Levin
  2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 21/45] r8152: Add Lenovo Powered USB-C Travel Hub Sasha Levin
@ 2021-01-20  1:25 ` Sasha Levin
  2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 28/45] net: stmmac: Fixed mtu channged by cache aligned Sasha Levin
  3 siblings, 0 replies; 12+ messages in thread
From: Sasha Levin @ 2021-01-20  1:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Seb Laveze, Jakub Kicinski, Sasha Levin, netdev, linux-stm32,
	linux-arm-kernel

From: Seb Laveze <sebastien.laveze@nxp.com>

[ Upstream commit 1f02efd1bb35bee95feed6aab46d1217f29d555b ]

Use of __napi_schedule_irqoff() is not safe with PREEMPT_RT in which
hard interrupts are not disabled while running the threaded interrupt.

Using __napi_schedule() works for both PREEMPT_RT and mainline Linux,
just at the cost of an additional check if interrupts are disabled for
mainline (since they are already disabled).

Similar to the fix done for enetc commit 215602a8d212 ("enetc: use
napi_schedule to be compatible with PREEMPT_RT")

Signed-off-by: Seb Laveze <sebastien.laveze@nxp.com>
Link: https://lore.kernel.org/r/20210112140121.1487619-1-sebastien.laveze@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index c33db79cdd0ad..cb39f6dbf72b8 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2158,7 +2158,7 @@ static int stmmac_napi_check(struct stmmac_priv *priv, u32 chan)
 			spin_lock_irqsave(&ch->lock, flags);
 			stmmac_disable_dma_irq(priv, priv->ioaddr, chan, 1, 0);
 			spin_unlock_irqrestore(&ch->lock, flags);
-			__napi_schedule_irqoff(&ch->rx_napi);
+			__napi_schedule(&ch->rx_napi);
 		}
 	}
 
@@ -2167,7 +2167,7 @@ static int stmmac_napi_check(struct stmmac_priv *priv, u32 chan)
 			spin_lock_irqsave(&ch->lock, flags);
 			stmmac_disable_dma_irq(priv, priv->ioaddr, chan, 0, 1);
 			spin_unlock_irqrestore(&ch->lock, flags);
-			__napi_schedule_irqoff(&ch->tx_napi);
+			__napi_schedule(&ch->tx_napi);
 		}
 	}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH AUTOSEL 5.10 28/45] net: stmmac: Fixed mtu channged by cache aligned
       [not found] <20210120012602.769683-1-sashal@kernel.org>
                   ` (2 preceding siblings ...)
  2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 27/45] net: stmmac: use __napi_schedule() for PREEMPT_RT Sasha Levin
@ 2021-01-20  1:25 ` Sasha Levin
  2021-01-20  6:08   ` Jakub Kicinski
  3 siblings, 1 reply; 12+ messages in thread
From: Sasha Levin @ 2021-01-20  1:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: David Wu, Jakub Kicinski, Sasha Levin, netdev, linux-stm32,
	linux-arm-kernel

From: David Wu <david.wu@rock-chips.com>

[ Upstream commit 5b55299eed78538cc4746e50ee97103a1643249c ]

Since the original mtu is not used when the mtu is updated,
the mtu is aligned with cache, this will get an incorrect.
For example, if you want to configure the mtu to be 1500,
but mtu 1536 is configured in fact.

Fixed: eaf4fac478077 ("net: stmmac: Do not accept invalid MTU values")
Signed-off-by: David Wu <david.wu@rock-chips.com>
Link: https://lore.kernel.org/r/20210113034109.27865-1-david.wu@rock-chips.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index cb39f6dbf72b8..b3d6d8e3f4de9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3996,6 +3996,7 @@ static int stmmac_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct stmmac_priv *priv = netdev_priv(dev);
 	int txfifosz = priv->plat->tx_fifo_size;
+	const int mtu = new_mtu;
 
 	if (txfifosz == 0)
 		txfifosz = priv->dma_cap.tx_fifo_size;
@@ -4013,7 +4014,7 @@ static int stmmac_change_mtu(struct net_device *dev, int new_mtu)
 	if ((txfifosz < new_mtu) || (new_mtu > BUF_SIZE_16KiB))
 		return -EINVAL;
 
-	dev->mtu = new_mtu;
+	dev->mtu = mtu;
 
 	netdev_update_features(dev);
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH AUTOSEL 5.10 28/45] net: stmmac: Fixed mtu channged by cache aligned
  2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 28/45] net: stmmac: Fixed mtu channged by cache aligned Sasha Levin
@ 2021-01-20  6:08   ` Jakub Kicinski
  2021-01-20 14:26     ` Sasha Levin
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Kicinski @ 2021-01-20  6:08 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, David Wu, netdev, linux-stm32, linux-arm-kernel

On Tue, 19 Jan 2021 20:25:45 -0500 Sasha Levin wrote:
> From: David Wu <david.wu@rock-chips.com>
> 
> [ Upstream commit 5b55299eed78538cc4746e50ee97103a1643249c ]
> 
> Since the original mtu is not used when the mtu is updated,
> the mtu is aligned with cache, this will get an incorrect.
> For example, if you want to configure the mtu to be 1500,
> but mtu 1536 is configured in fact.
> 
> Fixed: eaf4fac478077 ("net: stmmac: Do not accept invalid MTU values")
> Signed-off-by: David Wu <david.wu@rock-chips.com>
> Link: https://lore.kernel.org/r/20210113034109.27865-1-david.wu@rock-chips.com
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Sasha Levin <sashal@kernel.org>

This was applied 6 days ago, I thought you said you wait two weeks.
What am I missing?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH AUTOSEL 5.10 28/45] net: stmmac: Fixed mtu channged by cache aligned
  2021-01-20  6:08   ` Jakub Kicinski
@ 2021-01-20 14:26     ` Sasha Levin
  2021-01-21 14:39       ` [Linux-stm32] " Ahmad Fatoum
  0 siblings, 1 reply; 12+ messages in thread
From: Sasha Levin @ 2021-01-20 14:26 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: linux-kernel, stable, David Wu, netdev, linux-stm32, linux-arm-kernel

On Tue, Jan 19, 2021 at 10:08:15PM -0800, Jakub Kicinski wrote:
>On Tue, 19 Jan 2021 20:25:45 -0500 Sasha Levin wrote:
>> From: David Wu <david.wu@rock-chips.com>
>>
>> [ Upstream commit 5b55299eed78538cc4746e50ee97103a1643249c ]
>>
>> Since the original mtu is not used when the mtu is updated,
>> the mtu is aligned with cache, this will get an incorrect.
>> For example, if you want to configure the mtu to be 1500,
>> but mtu 1536 is configured in fact.
>>
>> Fixed: eaf4fac478077 ("net: stmmac: Do not accept invalid MTU values")
>> Signed-off-by: David Wu <david.wu@rock-chips.com>
>> Link: https://lore.kernel.org/r/20210113034109.27865-1-david.wu@rock-chips.com
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>
>This was applied 6 days ago, I thought you said you wait two weeks.
>What am I missing?

The "AUTOSEL" review cycle is an additional hurdle automatically
selected patches need to clear before being queued up. There are 7 days
between the day I sent the review for these and the first day I might
queue them up.

This mail isn't an indication that the patch has been added to the
queue, it's just an extra step to give folks time to object.

If you add up all the days you'll get >14 :)

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Linux-stm32] [PATCH AUTOSEL 5.10 28/45] net: stmmac: Fixed mtu channged by cache aligned
  2021-01-20 14:26     ` Sasha Levin
@ 2021-01-21 14:39       ` Ahmad Fatoum
  2021-01-21 16:02         ` Sasha Levin
  0 siblings, 1 reply; 12+ messages in thread
From: Ahmad Fatoum @ 2021-01-21 14:39 UTC (permalink / raw)
  To: Sasha Levin, Jakub Kicinski
  Cc: netdev, linux-kernel, stable, David Wu, linux-stm32, linux-arm-kernel

Hello Sasha,

On 20.01.21 15:26, Sasha Levin wrote:
> On Tue, Jan 19, 2021 at 10:08:15PM -0800, Jakub Kicinski wrote:
>> On Tue, 19 Jan 2021 20:25:45 -0500 Sasha Levin wrote:
>>> From: David Wu <david.wu@rock-chips.com>
>>>
>>> [ Upstream commit 5b55299eed78538cc4746e50ee97103a1643249c ]
>>>
>>> Since the original mtu is not used when the mtu is updated,
>>> the mtu is aligned with cache, this will get an incorrect.
>>> For example, if you want to configure the mtu to be 1500,
>>> but mtu 1536 is configured in fact.
>>>
>>> Fixed: eaf4fac478077 ("net: stmmac: Do not accept invalid MTU values")
>>> Signed-off-by: David Wu <david.wu@rock-chips.com>
>>> Link: https://lore.kernel.org/r/20210113034109.27865-1-david.wu@rock-chips.com
>>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>>
>> This was applied 6 days ago, I thought you said you wait two weeks.
>> What am I missing?
> 
> The "AUTOSEL" review cycle is an additional hurdle automatically
> selected patches need to clear before being queued up. There are 7 days
> between the day I sent the review for these and the first day I might
> queue them up.

I guess this could benefit from being documented in
Documentation/process/stable-kernel-rules.rst? Or is this documented
elsewhere?

> 
> This mail isn't an indication that the patch has been added to the
> queue, it's just an extra step to give folks time to object.
> 
> If you add up all the days you'll get >14 :)
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Linux-stm32] [PATCH AUTOSEL 5.10 28/45] net: stmmac: Fixed mtu channged by cache aligned
  2021-01-21 14:39       ` [Linux-stm32] " Ahmad Fatoum
@ 2021-01-21 16:02         ` Sasha Levin
  0 siblings, 0 replies; 12+ messages in thread
From: Sasha Levin @ 2021-01-21 16:02 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Jakub Kicinski, netdev, linux-kernel, stable, David Wu,
	linux-stm32, linux-arm-kernel

On Thu, Jan 21, 2021 at 03:39:22PM +0100, Ahmad Fatoum wrote:
>On 20.01.21 15:26, Sasha Levin wrote:
>> On Tue, Jan 19, 2021 at 10:08:15PM -0800, Jakub Kicinski wrote:
>>> This was applied 6 days ago, I thought you said you wait two weeks.
>>> What am I missing?
>>
>> The "AUTOSEL" review cycle is an additional hurdle automatically
>> selected patches need to clear before being queued up. There are 7 days
>> between the day I sent the review for these and the first day I might
>> queue them up.
>
>I guess this could benefit from being documented in
>Documentation/process/stable-kernel-rules.rst? Or is this documented
>elsewhere?

This is not documented because it's not part of the -stable process,
it's just the way I currently handle AUTOSEL stuff. The timeline
requirement for -stable is:

	"It or an equivalent fix must already exist in Linus' tree (upstream)"
-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again
  2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again Sasha Levin
@ 2021-02-08 19:34   ` Trond Myklebust
  2021-02-08 19:48     ` Chuck Lever
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2021-02-08 19:34 UTC (permalink / raw)
  To: sashal, stable, linux-kernel; +Cc: linux-nfs, daire, netdev, chuck.lever

On Tue, 2021-01-19 at 20:25 -0500, Sasha Levin wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> [ Upstream commit 4a85a6a3320b4a622315d2e0ea91a1d2b013bce4 ]
> 
> Daire Byrne reports a ~50% aggregrate throughput regression on his
> Linux NFS server after commit da1661b93bf4 ("SUNRPC: Teach server to
> use xprt_sock_sendmsg for socket sends"), which replaced
> kernel_send_page() calls in NFSD's socket send path with calls to
> sock_sendmsg() using iov_iter.
> 
> Investigation showed that tcp_sendmsg() was not using zero-copy to
> send the xdr_buf's bvec pages, but instead was relying on memcpy.
> This means copying every byte of a large NFS READ payload.
> 
> It looks like TLS sockets do indeed support a ->sendpage method,
> so it's really not necessary to use xprt_sock_sendmsg() to support
> TLS fully on the server. A mechanical reversion of da1661b93bf4 is
> not possible at this point, but we can re-implement the server's
> TCP socket sendmsg path using kernel_sendpage().
> 
> Reported-by: Daire Byrne <daire@dneg.com>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  net/sunrpc/svcsock.c | 86
> +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 85 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index c2752e2b9ce34..4404c491eb388 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1062,6 +1062,90 @@ static int svc_tcp_recvfrom(struct svc_rqst
> *rqstp)
>         return 0;       /* record not complete */
>  }
>  
> +static int svc_tcp_send_kvec(struct socket *sock, const struct kvec
> *vec,
> +                             int flags)
> +{
> +       return kernel_sendpage(sock, virt_to_page(vec->iov_base),
> +                              offset_in_page(vec->iov_base),
> +                              vec->iov_len, flags);

I'm having trouble with this line. This looks like it is trying to push
a slab page into kernel_sendpage(). What guarantees that the nfsd
thread won't call kfree() before the socket layer is done transmitting
the page?



-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again
  2021-02-08 19:34   ` Trond Myklebust
@ 2021-02-08 19:48     ` Chuck Lever
  2021-02-08 20:12       ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever @ 2021-02-08 19:48 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: sashal, stable, linux-kernel, Linux NFS Mailing List, daire, netdev



> On Feb 8, 2021, at 2:34 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Tue, 2021-01-19 at 20:25 -0500, Sasha Levin wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> 
>> [ Upstream commit 4a85a6a3320b4a622315d2e0ea91a1d2b013bce4 ]
>> 
>> Daire Byrne reports a ~50% aggregrate throughput regression on his
>> Linux NFS server after commit da1661b93bf4 ("SUNRPC: Teach server to
>> use xprt_sock_sendmsg for socket sends"), which replaced
>> kernel_send_page() calls in NFSD's socket send path with calls to
>> sock_sendmsg() using iov_iter.
>> 
>> Investigation showed that tcp_sendmsg() was not using zero-copy to
>> send the xdr_buf's bvec pages, but instead was relying on memcpy.
>> This means copying every byte of a large NFS READ payload.
>> 
>> It looks like TLS sockets do indeed support a ->sendpage method,
>> so it's really not necessary to use xprt_sock_sendmsg() to support
>> TLS fully on the server. A mechanical reversion of da1661b93bf4 is
>> not possible at this point, but we can re-implement the server's
>> TCP socket sendmsg path using kernel_sendpage().
>> 
>> Reported-by: Daire Byrne <daire@dneg.com>
>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>> ---
>>  net/sunrpc/svcsock.c | 86
>> +++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 85 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>> index c2752e2b9ce34..4404c491eb388 100644
>> --- a/net/sunrpc/svcsock.c
>> +++ b/net/sunrpc/svcsock.c
>> @@ -1062,6 +1062,90 @@ static int svc_tcp_recvfrom(struct svc_rqst
>> *rqstp)
>>         return 0;       /* record not complete */
>>  }
>>  
>> +static int svc_tcp_send_kvec(struct socket *sock, const struct kvec
>> *vec,
>> +                             int flags)
>> +{
>> +       return kernel_sendpage(sock, virt_to_page(vec->iov_base),
>> +                              offset_in_page(vec->iov_base),
>> +                              vec->iov_len, flags);

Thanks for your review!

> I'm having trouble with this line. This looks like it is trying to push
> a slab page into kernel_sendpage().

The head and tail kvec's in rq_res are not kmalloc'd, they are
backed by pages in rqstp->rq_pages[].


> What guarantees that the nfsd
> thread won't call kfree() before the socket layer is done transmitting
> the page?

If I understand correctly what Neil told us last week, the page
reference count on those pages is set up so that one of
svc_xprt_release() or the network layer does the final put_page(),
in a safe fashion.

Before da1661b93bf4 ("SUNRPC: Teach server to use xprt_sock_sendmsg
for socket sends"), the original svc_send_common() code did this:

-       /* send head */
-       if (slen == xdr->head[0].iov_len)
-               flags = 0;
-       len = kernel_sendpage(sock, headpage, headoffset,
-                                 xdr->head[0].iov_len, flags);
-       if (len != xdr->head[0].iov_len)
-               goto out;
-       slen -= xdr->head[0].iov_len;
-       if (slen == 0)
-               goto out;


--
Chuck Lever




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again
  2021-02-08 19:48     ` Chuck Lever
@ 2021-02-08 20:12       ` Trond Myklebust
  2021-02-08 20:17         ` Chuck Lever
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2021-02-08 20:12 UTC (permalink / raw)
  To: chuck.lever; +Cc: sashal, stable, linux-nfs, linux-kernel, daire, netdev

On Mon, 2021-02-08 at 19:48 +0000, Chuck Lever wrote:
> 
> 
> > On Feb 8, 2021, at 2:34 PM, Trond Myklebust <
> > trondmy@hammerspace.com> wrote:
> > 
> > On Tue, 2021-01-19 at 20:25 -0500, Sasha Levin wrote:
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > > 
> > > [ Upstream commit 4a85a6a3320b4a622315d2e0ea91a1d2b013bce4 ]
> > > 
> > > Daire Byrne reports a ~50% aggregrate throughput regression on
> > > his
> > > Linux NFS server after commit da1661b93bf4 ("SUNRPC: Teach server
> > > to
> > > use xprt_sock_sendmsg for socket sends"), which replaced
> > > kernel_send_page() calls in NFSD's socket send path with calls to
> > > sock_sendmsg() using iov_iter.
> > > 
> > > Investigation showed that tcp_sendmsg() was not using zero-copy
> > > to
> > > send the xdr_buf's bvec pages, but instead was relying on memcpy.
> > > This means copying every byte of a large NFS READ payload.
> > > 
> > > It looks like TLS sockets do indeed support a ->sendpage method,
> > > so it's really not necessary to use xprt_sock_sendmsg() to
> > > support
> > > TLS fully on the server. A mechanical reversion of da1661b93bf4
> > > is
> > > not possible at this point, but we can re-implement the server's
> > > TCP socket sendmsg path using kernel_sendpage().
> > > 
> > > Reported-by: Daire Byrne <daire@dneg.com>
> > > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > > ---
> > >  net/sunrpc/svcsock.c | 86
> > > +++++++++++++++++++++++++++++++++++++++++++-
> > >  1 file changed, 85 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > index c2752e2b9ce34..4404c491eb388 100644
> > > --- a/net/sunrpc/svcsock.c
> > > +++ b/net/sunrpc/svcsock.c
> > > @@ -1062,6 +1062,90 @@ static int svc_tcp_recvfrom(struct
> > > svc_rqst
> > > *rqstp)
> > >         return 0;       /* record not complete */
> > >  }
> > >  
> > > +static int svc_tcp_send_kvec(struct socket *sock, const struct
> > > kvec
> > > *vec,
> > > +                             int flags)
> > > +{
> > > +       return kernel_sendpage(sock, virt_to_page(vec->iov_base),
> > > +                              offset_in_page(vec->iov_base),
> > > +                              vec->iov_len, flags);
> 
> Thanks for your review!
> 
> > I'm having trouble with this line. This looks like it is trying to
> > push
> > a slab page into kernel_sendpage().
> 
> The head and tail kvec's in rq_res are not kmalloc'd, they are
> backed by pages in rqstp->rq_pages[].
> 
> 
> > What guarantees that the nfsd
> > thread won't call kfree() before the socket layer is done
> > transmitting
> > the page?
> 
> If I understand correctly what Neil told us last week, the page
> reference count on those pages is set up so that one of
> svc_xprt_release() or the network layer does the final put_page(),
> in a safe fashion.
> 
> Before da1661b93bf4 ("SUNRPC: Teach server to use xprt_sock_sendmsg
> for socket sends"), the original svc_send_common() code did this:
> 
> -       /* send head */
> -       if (slen == xdr->head[0].iov_len)
> -               flags = 0;
> -       len = kernel_sendpage(sock, headpage, headoffset,
> -                                 xdr->head[0].iov_len, flags);
> -       if (len != xdr->head[0].iov_len)
> -               goto out;
> -       slen -= xdr->head[0].iov_len;
> -       if (slen == 0)
> -               goto out;
> 
> 
> 

OK, so then only the argument kvec can be allocated on the slab (thanks
to  svc_deferred_recv)? Is that correct?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again
  2021-02-08 20:12       ` Trond Myklebust
@ 2021-02-08 20:17         ` Chuck Lever
  0 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever @ 2021-02-08 20:17 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: sashal, stable, Linux NFS Mailing List, linux-kernel, daire, netdev



> On Feb 8, 2021, at 3:12 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Mon, 2021-02-08 at 19:48 +0000, Chuck Lever wrote:
>> 
>> 
>>> On Feb 8, 2021, at 2:34 PM, Trond Myklebust <
>>> trondmy@hammerspace.com> wrote:
>>> 
>>> On Tue, 2021-01-19 at 20:25 -0500, Sasha Levin wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>> 
>>>> [ Upstream commit 4a85a6a3320b4a622315d2e0ea91a1d2b013bce4 ]
>>>> 
>>>> Daire Byrne reports a ~50% aggregrate throughput regression on
>>>> his
>>>> Linux NFS server after commit da1661b93bf4 ("SUNRPC: Teach server
>>>> to
>>>> use xprt_sock_sendmsg for socket sends"), which replaced
>>>> kernel_send_page() calls in NFSD's socket send path with calls to
>>>> sock_sendmsg() using iov_iter.
>>>> 
>>>> Investigation showed that tcp_sendmsg() was not using zero-copy
>>>> to
>>>> send the xdr_buf's bvec pages, but instead was relying on memcpy.
>>>> This means copying every byte of a large NFS READ payload.
>>>> 
>>>> It looks like TLS sockets do indeed support a ->sendpage method,
>>>> so it's really not necessary to use xprt_sock_sendmsg() to
>>>> support
>>>> TLS fully on the server. A mechanical reversion of da1661b93bf4
>>>> is
>>>> not possible at this point, but we can re-implement the server's
>>>> TCP socket sendmsg path using kernel_sendpage().
>>>> 
>>>> Reported-by: Daire Byrne <daire@dneg.com>
>>>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>>>> ---
>>>>  net/sunrpc/svcsock.c | 86
>>>> +++++++++++++++++++++++++++++++++++++++++++-
>>>>  1 file changed, 85 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>>> index c2752e2b9ce34..4404c491eb388 100644
>>>> --- a/net/sunrpc/svcsock.c
>>>> +++ b/net/sunrpc/svcsock.c
>>>> @@ -1062,6 +1062,90 @@ static int svc_tcp_recvfrom(struct
>>>> svc_rqst
>>>> *rqstp)
>>>>         return 0;       /* record not complete */
>>>>  }
>>>>  
>>>> +static int svc_tcp_send_kvec(struct socket *sock, const struct
>>>> kvec
>>>> *vec,
>>>> +                             int flags)
>>>> +{
>>>> +       return kernel_sendpage(sock, virt_to_page(vec->iov_base),
>>>> +                              offset_in_page(vec->iov_base),
>>>> +                              vec->iov_len, flags);
>> 
>> Thanks for your review!
>> 
>>> I'm having trouble with this line. This looks like it is trying to
>>> push
>>> a slab page into kernel_sendpage().
>> 
>> The head and tail kvec's in rq_res are not kmalloc'd, they are
>> backed by pages in rqstp->rq_pages[].
>> 
>> 
>>> What guarantees that the nfsd
>>> thread won't call kfree() before the socket layer is done
>>> transmitting
>>> the page?
>> 
>> If I understand correctly what Neil told us last week, the page
>> reference count on those pages is set up so that one of
>> svc_xprt_release() or the network layer does the final put_page(),
>> in a safe fashion.
>> 
>> Before da1661b93bf4 ("SUNRPC: Teach server to use xprt_sock_sendmsg
>> for socket sends"), the original svc_send_common() code did this:
>> 
>> -       /* send head */
>> -       if (slen == xdr->head[0].iov_len)
>> -               flags = 0;
>> -       len = kernel_sendpage(sock, headpage, headoffset,
>> -                                 xdr->head[0].iov_len, flags);
>> -       if (len != xdr->head[0].iov_len)
>> -               goto out;
>> -       slen -= xdr->head[0].iov_len;
>> -       if (slen == 0)
>> -               goto out;
>> 
>> 
>> 
> 
> OK, so then only the argument kvec can be allocated on the slab (thanks
> to  svc_deferred_recv)? Is that correct?

The RPC/RDMA Receive buffer is kmalloc'd, that would be used for
rq_arg.head/tail. But for TCP, I believe the head kvec is always
pulled out of rq_pages[].

svc_process() sets up rq_res.head this way:

1508         resv->iov_base = page_address(rqstp->rq_respages[0]);
1509         resv->iov_len = 0;

I would need to audit the code to confirm that rq_res.tail is never
kmalloc'd.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-02-08 20:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210120012602.769683-1-sashal@kernel.org>
2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 03/45] SUNRPC: Handle TCP socket sends with kernel_sendpage() again Sasha Levin
2021-02-08 19:34   ` Trond Myklebust
2021-02-08 19:48     ` Chuck Lever
2021-02-08 20:12       ` Trond Myklebust
2021-02-08 20:17         ` Chuck Lever
2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 21/45] r8152: Add Lenovo Powered USB-C Travel Hub Sasha Levin
2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 27/45] net: stmmac: use __napi_schedule() for PREEMPT_RT Sasha Levin
2021-01-20  1:25 ` [PATCH AUTOSEL 5.10 28/45] net: stmmac: Fixed mtu channged by cache aligned Sasha Levin
2021-01-20  6:08   ` Jakub Kicinski
2021-01-20 14:26     ` Sasha Levin
2021-01-21 14:39       ` [Linux-stm32] " Ahmad Fatoum
2021-01-21 16:02         ` Sasha Levin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).