All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] tcp: select sane initial rcvq_space.space for big MSS
@ 2020-12-08 16:21 Eric Dumazet
  2020-12-08 20:45 ` Soheil Hassas Yeganeh
  2020-12-09  0:28 ` David Miller
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2020-12-08 16:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Soheil Hassas Yeganeh,
	Neal Cardwell, Yuchung Cheng, Hazem Mohamed Abuelfotoh

From: Eric Dumazet <edumazet@google.com>

Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.

This is no longer the case, and Hazem Mohamed Abuelfotoh reported
that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.

Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
of rcvq_space.space computation, while it can select later a smaller
value for tp->rcv_ssthresh and tp->window_clamp.

ss -temoi on receiver would show :

skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596

This means that TCP can not increase its window in tcp_grow_window(),
and that DRS can never kick.

Fix this by making sure that rcvq_space.space is not bigger than number of bytes
that can be held in TCP receive queue.

People unable/unwilling to change their kernel can work around this issue by
selecting a bigger tcp_rmem[1] value as in :

echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem

Based on an initial report and patch from Hazem Mohamed Abuelfotoh
 https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/

Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 389d1b34024854a9bdcbe861d4820d1bfb495e24..ef4bdb038a4bbbd949868a01dc855bba0e90b9ca 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -510,7 +510,6 @@ static void tcp_init_buffer_space(struct sock *sk)
 	if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK))
 		tcp_sndbuf_expand(sk);
 
-	tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss);
 	tcp_mstamp_refresh(tp);
 	tp->rcvq_space.time = tp->tcp_mstamp;
 	tp->rcvq_space.seq = tp->copied_seq;
@@ -534,6 +533,8 @@ static void tcp_init_buffer_space(struct sock *sk)
 
 	tp->rcv_ssthresh = min(tp->rcv_ssthresh, tp->window_clamp);
 	tp->snd_cwnd_stamp = tcp_jiffies32;
+	tp->rcvq_space.space = min3(tp->rcv_ssthresh, tp->rcv_wnd,
+				    (u32)TCP_INIT_CWND * tp->advmss);
 }
 
 /* 4. Recalculate window clamp after socket hit its memory bounds. */
-- 
2.29.2.576.ga3fc446d84-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net] tcp: select sane initial rcvq_space.space for big MSS
  2020-12-08 16:21 [PATCH net] tcp: select sane initial rcvq_space.space for big MSS Eric Dumazet
@ 2020-12-08 20:45 ` Soheil Hassas Yeganeh
  2020-12-09  0:28 ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: Soheil Hassas Yeganeh @ 2020-12-08 20:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, Eric Dumazet,
	Neal Cardwell, Yuchung Cheng, Hazem Mohamed Abuelfotoh

On Tue, Dec 8, 2020 at 11:21 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
> small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.
>
> This is no longer the case, and Hazem Mohamed Abuelfotoh reported
> that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.
>
> Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
> of rcvq_space.space computation, while it can select later a smaller
> value for tp->rcv_ssthresh and tp->window_clamp.
>
> ss -temoi on receiver would show :
>
> skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596
>
> This means that TCP can not increase its window in tcp_grow_window(),
> and that DRS can never kick.
>
> Fix this by making sure that rcvq_space.space is not bigger than number of bytes
> that can be held in TCP receive queue.
>
> People unable/unwilling to change their kernel can work around this issue by
> selecting a bigger tcp_rmem[1] value as in :
>
> echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem
>
> Based on an initial report and patch from Hazem Mohamed Abuelfotoh
>  https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/
>
> Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
> Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
> Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Acked-by: Soheil Hassas Yeganeh <soheil@google.com>

Nice catch and fix!  Thanks Eric!


> ---
>  net/ipv4/tcp_input.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 389d1b34024854a9bdcbe861d4820d1bfb495e24..ef4bdb038a4bbbd949868a01dc855bba0e90b9ca 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -510,7 +510,6 @@ static void tcp_init_buffer_space(struct sock *sk)
>         if (!(sk->sk_userlocks & SOCK_SNDBUF_LOCK))
>                 tcp_sndbuf_expand(sk);
>
> -       tp->rcvq_space.space = min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss);
>         tcp_mstamp_refresh(tp);
>         tp->rcvq_space.time = tp->tcp_mstamp;
>         tp->rcvq_space.seq = tp->copied_seq;
> @@ -534,6 +533,8 @@ static void tcp_init_buffer_space(struct sock *sk)
>
>         tp->rcv_ssthresh = min(tp->rcv_ssthresh, tp->window_clamp);
>         tp->snd_cwnd_stamp = tcp_jiffies32;
> +       tp->rcvq_space.space = min3(tp->rcv_ssthresh, tp->rcv_wnd,
> +                                   (u32)TCP_INIT_CWND * tp->advmss);
>  }
>
>  /* 4. Recalculate window clamp after socket hit its memory bounds. */
> --
> 2.29.2.576.ga3fc446d84-goog
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] tcp: select sane initial rcvq_space.space for big MSS
  2020-12-08 16:21 [PATCH net] tcp: select sane initial rcvq_space.space for big MSS Eric Dumazet
  2020-12-08 20:45 ` Soheil Hassas Yeganeh
@ 2020-12-09  0:28 ` David Miller
  2020-12-10 12:49   ` Mohamed Abuelfotoh, Hazem
  1 sibling, 1 reply; 5+ messages in thread
From: David Miller @ 2020-12-09  0:28 UTC (permalink / raw)
  To: eric.dumazet; +Cc: kuba, netdev, edumazet, soheil, ncardwell, ycheng, abuehaze

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue,  8 Dec 2020 08:21:31 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
> small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.
> 
> This is no longer the case, and Hazem Mohamed Abuelfotoh reported
> that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.
> 
> Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
> of rcvq_space.space computation, while it can select later a smaller
> value for tp->rcv_ssthresh and tp->window_clamp.
> 
> ss -temoi on receiver would show :
> 
> skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596
> 
> This means that TCP can not increase its window in tcp_grow_window(),
> and that DRS can never kick.
> 
> Fix this by making sure that rcvq_space.space is not bigger than number of bytes
> that can be held in TCP receive queue.
> 
> People unable/unwilling to change their kernel can work around this issue by
> selecting a bigger tcp_rmem[1] value as in :
> 
> echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem
> 
> Based on an initial report and patch from Hazem Mohamed Abuelfotoh
>  https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/
> 
> Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
> Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
> Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] tcp: select sane initial rcvq_space.space for big MSS
  2020-12-09  0:28 ` David Miller
@ 2020-12-10 12:49   ` Mohamed Abuelfotoh, Hazem
  2020-12-10 13:34     ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Mohamed Abuelfotoh, Hazem @ 2020-12-10 12:49 UTC (permalink / raw)
  To: David Miller, eric.dumazet
  Cc: kuba, netdev, edumazet, soheil, ncardwell, ycheng

Hi Eric,

I don't see the patch in the stable queue. Can we add it  to stable so we can cherry pick it  in Amazon Linux kernel?

Thank you.

Hazem

On 09/12/2020, 00:29, "David Miller" <davem@davemloft.net> wrote:

    CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.



    From: Eric Dumazet <eric.dumazet@gmail.com>
    Date: Tue,  8 Dec 2020 08:21:31 -0800

    > From: Eric Dumazet <edumazet@google.com>
    >
    > Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
    > small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.
    >
    > This is no longer the case, and Hazem Mohamed Abuelfotoh reported
    > that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.
    >
    > Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
    > of rcvq_space.space computation, while it can select later a smaller
    > value for tp->rcv_ssthresh and tp->window_clamp.
    >
    > ss -temoi on receiver would show :
    >
    > skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596
    >
    > This means that TCP can not increase its window in tcp_grow_window(),
    > and that DRS can never kick.
    >
    > Fix this by making sure that rcvq_space.space is not bigger than number of bytes
    > that can be held in TCP receive queue.
    >
    > People unable/unwilling to change their kernel can work around this issue by
    > selecting a bigger tcp_rmem[1] value as in :
    >
    > echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem
    >
    > Based on an initial report and patch from Hazem Mohamed Abuelfotoh
    >  https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/
    >
    > Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
    > Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
    > Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
    > Signed-off-by: Eric Dumazet <edumazet@google.com>

    Applied, thanks Eric.




Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284

Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] tcp: select sane initial rcvq_space.space for big MSS
  2020-12-10 12:49   ` Mohamed Abuelfotoh, Hazem
@ 2020-12-10 13:34     ` Eric Dumazet
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2020-12-10 13:34 UTC (permalink / raw)
  To: Mohamed Abuelfotoh, Hazem
  Cc: David Miller, eric.dumazet, kuba, netdev, soheil, ncardwell, ycheng

On Thu, Dec 10, 2020 at 1:49 PM Mohamed Abuelfotoh, Hazem
<abuehaze@amazon.com> wrote:
>
> Hi Eric,
>
> I don't see the patch in the stable queue. Can we add it  to stable so we can cherry pick it  in Amazon Linux kernel?
>

No need for stable tags , as documented in
https://elixir.bootlin.com/linux/v5.9/source/Documentation/networking/netdev-FAQ.rst#L147


> Thank you.
>
> Hazem
>
> On 09/12/2020, 00:29, "David Miller" <davem@davemloft.net> wrote:
>
>     CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
>     From: Eric Dumazet <eric.dumazet@gmail.com>
>     Date: Tue,  8 Dec 2020 08:21:31 -0800
>
>     > From: Eric Dumazet <edumazet@google.com>
>     >
>     > Before commit a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
>     > small tcp_rmem[1] values were overridden by tcp_fixup_rcvbuf() to accommodate various MSS.
>     >
>     > This is no longer the case, and Hazem Mohamed Abuelfotoh reported
>     > that DRS would not work for MTU 9000 endpoints receiving regular (1500 bytes) frames.
>     >
>     > Root cause is that tcp_init_buffer_space() uses tp->rcv_wnd for upper limit
>     > of rcvq_space.space computation, while it can select later a smaller
>     > value for tp->rcv_ssthresh and tp->window_clamp.
>     >
>     > ss -temoi on receiver would show :
>     >
>     > skmem:(r0,rb131072,t0,tb46080,f0,w0,o0,bl0,d0) rcv_space:62496 rcv_ssthresh:56596
>     >
>     > This means that TCP can not increase its window in tcp_grow_window(),
>     > and that DRS can never kick.
>     >
>     > Fix this by making sure that rcvq_space.space is not bigger than number of bytes
>     > that can be held in TCP receive queue.
>     >
>     > People unable/unwilling to change their kernel can work around this issue by
>     > selecting a bigger tcp_rmem[1] value as in :
>     >
>     > echo "4096 196608 6291456" >/proc/sys/net/ipv4/tcp_rmem
>     >
>     > Based on an initial report and patch from Hazem Mohamed Abuelfotoh
>     >  https://lore.kernel.org/netdev/20201204180622.14285-1-abuehaze@amazon.com/
>     >
>     > Fixes: a337531b942b ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
>     > Fixes: 041a14d26715 ("tcp: start receiver buffer autotuning sooner")
>     > Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>
>     > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
>     Applied, thanks Eric.
>
>
>
>
> Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284
>
> Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-12-10 13:35 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-08 16:21 [PATCH net] tcp: select sane initial rcvq_space.space for big MSS Eric Dumazet
2020-12-08 20:45 ` Soheil Hassas Yeganeh
2020-12-09  0:28 ` David Miller
2020-12-10 12:49   ` Mohamed Abuelfotoh, Hazem
2020-12-10 13:34     ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.