* [PATCH v2 net-next 0/2] mptcp: adjust tcp rcvspace on rx
@ 2020-05-25 18:15 Florian Westphal
  2020-05-25 18:15 ` [PATCH v2 net-next 1/2] mptcp: adjust tcp rcvspace after moving skbs from ssk to sk queue Florian Westphal
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Florian Westphal @ 2020-05-25 18:15 UTC (permalink / raw)
  To: netdev; +Cc: matthieu.baerts, mathew.j.martineau, Paolo Abeni

These two patches improve mptcp throughput by making sure tcp grows
the receive buffer when we move skbs from subflow socket to the
mptcp socket.

The second patch moves mptcp receive buffer increase to the recvmsg
path, i.e. we only change its size when userspace processes/consumes
the data.  This is done by using the largest rcvbuf size of the active
subflows.



* [PATCH v2 net-next 1/2] mptcp: adjust tcp rcvspace after moving skbs from ssk to sk queue
  2020-05-25 18:15 [PATCH v2 net-next 0/2] mptcp: adjust tcp rcvspace on rx Florian Westphal
@ 2020-05-25 18:15 ` Florian Westphal
  2020-05-25 18:15 ` [PATCH v2 net-next 2/2] mptcp: move recbuf adjustment to recvmsg path Florian Westphal
  2020-05-27  3:28 ` [PATCH v2 net-next 0/2] mptcp: adjust tcp rcvspace on rx David Miller
  2 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2020-05-25 18:15 UTC (permalink / raw)
  To: netdev; +Cc: matthieu.baerts, mathew.j.martineau, Paolo Abeni, Florian Westphal

TCP does rcvbuf tuning when copying packets to userspace, e.g. in
tcp_read_sock().  In the case of mptcp, that function is only rarely used
(when discarding retransmitted duplicate data).

Instead, skbs are moved from the tcp rx queue to the mptcp socket rx
queue.
Adjust the subflow rcvbuf when we do so; it's the last spot where we can
adjust the ssk rcvbuf -- later on we only have the mptcp-level socket.

This greatly improves performance on mptcp bulk transfers.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v2: no change to this patch; added new patch 2/2

 net/mptcp/protocol.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ba9d3d5c625f..dbb86cbb9e77 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -248,6 +248,9 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 	*bytes = moved;
 
+	if (moved)
+		tcp_rcv_space_adjust(ssk);
+
 	return done;
 }
 
@@ -1263,6 +1266,7 @@ static int mptcp_init_sock(struct sock *sk)
 		return ret;
 
 	sk_sockets_allocated_inc(sk);
+	sk->sk_rcvbuf = sock_net(sk)->ipv4.sysctl_tcp_rmem[1];
 	sk->sk_sndbuf = sock_net(sk)->ipv4.sysctl_tcp_wmem[2];
 
 	return 0;
-- 
2.26.2



* [PATCH v2 net-next 2/2] mptcp: move recbuf adjustment to recvmsg path
  2020-05-25 18:15 [PATCH v2 net-next 0/2] mptcp: adjust tcp rcvspace on rx Florian Westphal
  2020-05-25 18:15 ` [PATCH v2 net-next 1/2] mptcp: adjust tcp rcvspace after moving skbs from ssk to sk queue Florian Westphal
@ 2020-05-25 18:15 ` Florian Westphal
  2020-05-26 16:07   ` Christoph Paasch
  2020-05-27  3:28 ` [PATCH v2 net-next 0/2] mptcp: adjust tcp rcvspace on rx David Miller
  2 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2020-05-25 18:15 UTC (permalink / raw)
  To: netdev; +Cc: matthieu.baerts, mathew.j.martineau, Paolo Abeni, Florian Westphal

From: Paolo Abeni <pabeni@redhat.com>

Place receive window tuning in the recvmsg path.
This makes sure the size is only increased when userspace consumes data.

Previously we would grow the sk receive buffer towards tcp_rmem[2];
now we do so only if userspace reads data.

Simply adjust the msk rcvbuf size to the largest receive buffer of any of
the existing subflows.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 This patch is new in v2.

 net/mptcp/protocol.c | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index dbb86cbb9e77..89a35c3fc499 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -190,13 +190,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 		return false;
 	}
 
-	if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
-		int rcvbuf = max(ssk->sk_rcvbuf, sk->sk_rcvbuf);
-
-		if (rcvbuf > sk->sk_rcvbuf)
-			sk->sk_rcvbuf = rcvbuf;
-	}
-
 	tp = tcp_sk(ssk);
 	do {
 		u32 map_remaining, offset;
@@ -933,6 +926,25 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
 	return moved > 0;
 }
 
+static void mptcp_rcv_space_adjust(struct mptcp_sock *msk)
+{
+	const struct mptcp_subflow_context *subflow;
+	struct sock *sk = (struct sock *)msk;
+	const struct sock *ssk;
+	int rcvbuf = 0;
+
+	if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
+		return;
+
+	mptcp_for_each_subflow(msk, subflow) {
+		ssk = mptcp_subflow_tcp_sock(subflow);
+		rcvbuf = max(ssk->sk_rcvbuf, rcvbuf);
+	}
+
+	if (rcvbuf)
+		sk->sk_rcvbuf = rcvbuf;
+}
+
 static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			 int nonblock, int flags, int *addr_len)
 {
@@ -962,6 +974,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
 	__mptcp_flush_join_list(msk);
 
+	mptcp_rcv_space_adjust(msk);
+
 	while (len > (size_t)copied) {
 		int bytes_read;
 
@@ -975,8 +989,10 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		copied += bytes_read;
 
 		if (skb_queue_empty(&sk->sk_receive_queue) &&
-		    __mptcp_move_skbs(msk))
+		    __mptcp_move_skbs(msk)) {
+			mptcp_rcv_space_adjust(msk);
 			continue;
+		}
 
 		/* only the master socket status is relevant here. The exit
 		 * conditions mirror closely tcp_recvmsg()
-- 
2.26.2



* Re: [PATCH v2 net-next 2/2] mptcp: move recbuf adjustment to recvmsg path
  2020-05-25 18:15 ` [PATCH v2 net-next 2/2] mptcp: move recbuf adjustment to recvmsg path Florian Westphal
@ 2020-05-26 16:07   ` Christoph Paasch
  2020-05-27 11:55     ` Florian Westphal
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Paasch @ 2020-05-26 16:07 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, Matthieu Baerts, Mat Martineau, Paolo Abeni

Hello,

On Mon, May 25, 2020 at 3:19 PM Florian Westphal <fw@strlen.de> wrote:
>
> From: Paolo Abeni <pabeni@redhat.com>
>
> Place receive window tuning in the recvmsg path.
> This makes sure the size is only increased when userspace consumes data.
>
> Previously we would grow the sk receive buffer towards tcp_rmem[2];
> now we do so only if userspace reads data.
>
> Simply adjust the msk rcvbuf size to the largest receive buffer of any of
> the existing subflows.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  This patch is new in v2.
>
>  net/mptcp/protocol.c | 32 ++++++++++++++++++++++++--------
>  1 file changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index dbb86cbb9e77..89a35c3fc499 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -190,13 +190,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
>                 return false;
>         }
>
> -       if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
> -               int rcvbuf = max(ssk->sk_rcvbuf, sk->sk_rcvbuf);
> -
> -               if (rcvbuf > sk->sk_rcvbuf)
> -                       sk->sk_rcvbuf = rcvbuf;
> -       }
> -
>         tp = tcp_sk(ssk);
>         do {
>                 u32 map_remaining, offset;
> @@ -933,6 +926,25 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
>         return moved > 0;
>  }
>
> +static void mptcp_rcv_space_adjust(struct mptcp_sock *msk)
> +{
> +       const struct mptcp_subflow_context *subflow;
> +       struct sock *sk = (struct sock *)msk;
> +       const struct sock *ssk;
> +       int rcvbuf = 0;
> +
> +       if (sk->sk_userlocks & SOCK_RCVBUF_LOCK)
> +               return;
> +
> +       mptcp_for_each_subflow(msk, subflow) {
> +               ssk = mptcp_subflow_tcp_sock(subflow);
> +               rcvbuf = max(ssk->sk_rcvbuf, rcvbuf);

tcp_rcv_space_adjust is called even when the app is not yet reading,
thus wouldn't this mean that we still end up with an ever-growing
window?
E.g., imagine an app that does not read at all at the beginning. The
call to tcp_rcv_space_adjust in patch 1/2 will make the subflow's
window grow. Now, the app comes and reads one byte. Then, the window
at MPTCP-level will jump regardless of how much the app actually read.

I think what is needed is to size the MPTCP-level window to 2 x the
amount of data read by the application within an RTT (maximum RTT
among all the active subflows). That way if an app reads 1 byte a
second, the window will remain low. While for a bulk-transfer it will
allow all subflows to receive at full speed [1].
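
A rough, untested sketch of what such DRS-style tuning could look like at
the MPTCP level -- the rcvq_space bookkeeping below is hypothetical (not an
existing msk field) and the helper is purely illustrative:

	/* hypothetical per-msk state: bytes copied to userspace in the
	 * current measurement window, plus the window start timestamp
	 */
	struct msk_rcvq_space {
		u32 copied;
		u64 time;
	};

	static void msk_rcv_space_adjust_sketch(struct mptcp_sock *msk,
						struct msk_rcvq_space *rs,
						int copied)
	{
		struct sock *sk = (struct sock *)msk;
		const struct mptcp_subflow_context *subflow;
		u32 rtt_us = 0;

		/* use the largest receive RTT estimate among the subflows */
		mptcp_for_each_subflow(msk, subflow) {
			const struct tcp_sock *tp =
				tcp_sk(mptcp_subflow_tcp_sock(subflow));

			rtt_us = max(rtt_us, tp->rcv_rtt_est.rtt_us);
		}

		rs->copied += copied;
		if (!rtt_us ||
		    tcp_stamp_us_delta(tcp_clock_us(), rs->time) < (rtt_us >> 3))
			return;

		if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) {
			/* 2x the data consumed per (maximum) RTT; a real
			 * implementation would also account for skb truesize
			 * overhead, as tcp_rcv_space_adjust() does
			 */
			int rcvbuf = min_t(int, rs->copied << 1,
					   sock_net(sk)->ipv4.sysctl_tcp_rmem[2]);

			if (rcvbuf > sk->sk_rcvbuf)
				sk->sk_rcvbuf = rcvbuf;
		}

		rs->copied = 0;
		rs->time = tcp_clock_us();
	}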

Or do you think that kind of tuning can be done in a follow-up patch?


Christoph

[1]: https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final125.pdf
-> Section 4.2


* Re: [PATCH v2 net-next 0/2] mptcp: adjust tcp rcvspace on rx
  2020-05-25 18:15 [PATCH v2 net-next 0/2] mptcp: adjust tcp rcvspace on rx Florian Westphal
  2020-05-25 18:15 ` [PATCH v2 net-next 1/2] mptcp: adjust tcp rcvspace after moving skbs from ssk to sk queue Florian Westphal
  2020-05-25 18:15 ` [PATCH v2 net-next 2/2] mptcp: move recbuf adjustment to recvmsg path Florian Westphal
@ 2020-05-27  3:28 ` David Miller
  2020-05-27 11:55   ` Florian Westphal
  2 siblings, 1 reply; 8+ messages in thread
From: David Miller @ 2020-05-27  3:28 UTC (permalink / raw)
  To: fw; +Cc: netdev, matthieu.baerts, mathew.j.martineau, pabeni

From: Florian Westphal <fw@strlen.de>
Date: Mon, 25 May 2020 20:15:06 +0200

> These two patches improve mptcp throughput by making sure tcp grows
> the receive buffer when we move skbs from subflow socket to the
> mptcp socket.
> 
> The second patch moves mptcp receive buffer increase to the recvmsg
> path, i.e. we only change its size when userspace processes/consumes
> the data.  This is done by using the largest rcvbuf size of the active
> subflows.

What's the follow-up wrt. Christoph's feedback on patch #2?



* Re: [PATCH v2 net-next 2/2] mptcp: move recbuf adjustment to recvmsg path
  2020-05-26 16:07   ` Christoph Paasch
@ 2020-05-27 11:55     ` Florian Westphal
  0 siblings, 0 replies; 8+ messages in thread
From: Florian Westphal @ 2020-05-27 11:55 UTC (permalink / raw)
  To: Christoph Paasch
  Cc: Florian Westphal, netdev, Matthieu Baerts, Mat Martineau, Paolo Abeni

Christoph Paasch <christoph.paasch@gmail.com> wrote:
> tcp_rcv_space_adjust is called even when the app is not yet reading,
> thus wouldn't this mean that we still end up with an ever-growing
> window?

The window is based on the available mptcp sk recvbuf.  When data is moved
from the ssk to the mptcp sk, the skb truesize is charged to the mptcp rmem.
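
Roughly, the charge happens when the skb changes owner.  A simplified,
illustrative-only version of the move (the real
__mptcp_move_skbs_from_subflow() also deals with mapping offsets):

	static void move_and_charge_sketch(struct sock *sk, struct sock *ssk,
					   struct sk_buff *skb)
	{
		/* detach from the subflow and charge skb->truesize to the
		 * mptcp socket's receive memory instead
		 */
		__skb_unlink(skb, &ssk->sk_receive_queue);
		skb_set_owner_r(skb, sk);	/* sk_rmem_alloc += truesize */
		__skb_queue_tail(&sk->sk_receive_queue, skb);
	}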

> E.g., imagine an app that does not read at all at the beginning. The
> call to tcp_rcv_space_adjust in patch 1/2 will make the subflow's
> window grow. Now, the app comes and reads one byte. Then, the window
> at MPTCP-level will jump regardless of how much the app actually read.

Yes, the rcvbufsz value will jump, regardless of how much the app
actually read.

> I think what is needed is to size the MPTCP-level window to 2 x the
> amount of data read by the application within an RTT (maximum RTT
> among all the active subflows). That way if an app reads 1 byte a
> second, the window will remain low. While for a bulk-transfer it will
> allow all subflows to receive at full speed [1].

Sounds like the idea to move skbs to the msk was a bad one?
Sorry, I don't see how I can make this work.

Even deferring tcp_rcv_space_adjust() until recv() time won't work,
given the data has already been pulled to the mptcp socket.

NOT calling tcp_rcv_space_adjust() at all might work, but that would
require something else entirely.  We would still have to adjust the
subflow rcvbufsz in this case, else we may announce a window that is
larger than the memory limit of the ssk (and we will end up dropping
data at tcp level if the worker can't move the skbs fast enough).
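
A minimal, untested sketch of that subflow adjustment -- propagate the msk
rcvbuf to every ssk so each subflow can buffer what the mptcp-level window
may invite:

	static void ssk_sync_rcvbuf_sketch(struct mptcp_sock *msk)
	{
		struct sock *sk = (struct sock *)msk;
		struct mptcp_subflow_context *subflow;

		mptcp_for_each_subflow(msk, subflow) {
			struct sock *ssk = mptcp_subflow_tcp_sock(subflow);

			/* grow, never shrink, and respect SO_RCVBUF */
			if (!(ssk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
			    ssk->sk_rcvbuf < sk->sk_rcvbuf)
				ssk->sk_rcvbuf = sk->sk_rcvbuf;
		}
	}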

> Or do you think that kind of tuning can be done in a follow-up patch?

This sounds completely different, so I don't think that makes sense.


* Re: [PATCH v2 net-next 0/2] mptcp: adjust tcp rcvspace on rx
  2020-05-27  3:28 ` [PATCH v2 net-next 0/2] mptcp: adjust tcp rcvspace on rx David Miller
@ 2020-05-27 11:55   ` Florian Westphal
  2020-05-27 18:29     ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Westphal @ 2020-05-27 11:55 UTC (permalink / raw)
  To: David Miller; +Cc: fw, netdev, matthieu.baerts, mathew.j.martineau, pabeni

David Miller <davem@davemloft.net> wrote:
> From: Florian Westphal <fw@strlen.de>
> Date: Mon, 25 May 2020 20:15:06 +0200
> 
> > These two patches improve mptcp throughput by making sure tcp grows
> > the receive buffer when we move skbs from subflow socket to the
> > mptcp socket.
> > 
> > The second patch moves mptcp receive buffer increase to the recvmsg
> > path, i.e. we only change its size when userspace processes/consumes
> > the data.  This is done by using the largest rcvbuf size of the active
> > subflows.
> 
> What's the follow-up wrt. Christoph's feedback on patch #2?

Please drop these patches; I have no idea (yet?) how to address it.


* Re: [PATCH v2 net-next 0/2] mptcp: adjust tcp rcvspace on rx
  2020-05-27 11:55   ` Florian Westphal
@ 2020-05-27 18:29     ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2020-05-27 18:29 UTC (permalink / raw)
  To: fw; +Cc: netdev, matthieu.baerts, mathew.j.martineau, pabeni

From: Florian Westphal <fw@strlen.de>
Date: Wed, 27 May 2020 13:55:45 +0200

> David Miller <davem@davemloft.net> wrote:
>> From: Florian Westphal <fw@strlen.de>
>> Date: Mon, 25 May 2020 20:15:06 +0200
>> 
>> > These two patches improve mptcp throughput by making sure tcp grows
>> > the receive buffer when we move skbs from subflow socket to the
>> > mptcp socket.
>> > 
>> > The second patch moves mptcp receive buffer increase to the recvmsg
>> > path, i.e. we only change its size when userspace processes/consumes
>> > the data.  This is done by using the largest rcvbuf size of the active
>> > subflows.
>> 
>> What's the follow-up wrt. Christoph's feedback on patch #2?
> 
> Please drop these patches, I have no idea (yet?) how to address it.

Ok, thank you.
