linux-kernel.vger.kernel.org archive mirror
* [PATCH net] tcp: note that tcp_rmem[1] has a limited range
@ 2022-01-04  0:37 Ivan Babrou
  2022-01-04  0:44 ` Stephen Hemminger
  0 siblings, 1 reply; 6+ messages in thread
From: Ivan Babrou @ 2022-01-04  0:37 UTC
  To: netdev
  Cc: linux-kernel, kernel-team, David S . Miller, Jakub Kicinski,
	Jonathan Corbet, Eric Dumazet, Ivan Babrou

The value of rcv_ssthresh is limited to tcp_rwin, which is limited
to 64k at the handshake time, since window scaling is not allowed there.

Let's add a note to the docs that increasing tcp_rmem[1] does not have
any effect on raising the initial value of rcv_ssthresh past 64k.

Link: https://lkml.org/lkml/2021/12/22/652
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
---
 Documentation/networking/ip-sysctl.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 2572eecc3e86..16528bc92e65 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -683,7 +683,8 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
 	default: initial size of receive buffer used by TCP sockets.
 	This value overrides net.core.rmem_default used by other protocols.
 	Default: 131072 bytes.
-	This value results in initial window of 65535.
+	This value results in initial window of 65535. Increasing this value
+	won't raise the initial advertised window above 65535.
 
 	max: maximal size of receive buffer allowed for automatically
 	selected receiver buffers for TCP socket. This value does not override
-- 
2.34.1



* Re: [PATCH net] tcp: note that tcp_rmem[1] has a limited range
  2022-01-04  0:37 [PATCH net] tcp: note that tcp_rmem[1] has a limited range Ivan Babrou
@ 2022-01-04  0:44 ` Stephen Hemminger
  2022-01-04  8:33   ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2022-01-04  0:44 UTC
  To: Ivan Babrou
  Cc: netdev, linux-kernel, kernel-team, David S . Miller,
	Jakub Kicinski, Jonathan Corbet, Eric Dumazet

On Mon,  3 Jan 2022 16:37:22 -0800
Ivan Babrou <ivan@cloudflare.com> wrote:

> The value of rcv_ssthresh is limited to tcp_rwin, which is limited
> to 64k at the handshake time, since window scaling is not allowed there.
> 
> Let's add a note to the docs that increasing tcp_rmem[1] does not have
> any effect on raising the initial value of rcv_ssthresh past 64k.
> 
> Link: https://lkml.org/lkml/2021/12/22/652
> Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
> ---
>  Documentation/networking/ip-sysctl.rst | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
> index 2572eecc3e86..16528bc92e65 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -683,7 +683,8 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
>  	default: initial size of receive buffer used by TCP sockets.
>  	This value overrides net.core.rmem_default used by other protocols.
>  	Default: 131072 bytes.
> -	This value results in initial window of 65535.
> +	This value results in initial window of 65535. Increasing this value
> +	won't raise the initial advertised window above 65535.
>  
>  	max: maximal size of receive buffer allowed for automatically
>  	selected receiver buffers for TCP socket. This value does not override

Why not add an error check or warning on writes to the sysctl?


* Re: [PATCH net] tcp: note that tcp_rmem[1] has a limited range
  2022-01-04  0:44 ` Stephen Hemminger
@ 2022-01-04  8:33   ` Eric Dumazet
  2022-01-06  4:20     ` Ivan Babrou
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2022-01-04  8:33 UTC
  To: Stephen Hemminger
  Cc: Ivan Babrou, netdev, LKML, kernel-team, David S . Miller,
	Jakub Kicinski, Jonathan Corbet

On Mon, Jan 3, 2022 at 4:44 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Mon,  3 Jan 2022 16:37:22 -0800
> Ivan Babrou <ivan@cloudflare.com> wrote:
>
> > The value of rcv_ssthresh is limited to tcp_rwin, which is limited
> > to 64k at the handshake time, since window scaling is not allowed there.
> >
> > Let's add a note to the docs that increasing tcp_rmem[1] does not have
> > any effect on raising the initial value of rcv_ssthresh past 64k.

I guess you have to define what the initial window is.

There seems to be some confusion between rcv_ssthresh and sk_rcvbuf.

If you want to document what rcv_ssthresh is and how it relates to sk_rcvbuf,
you probably need more than a few lines in Documentation/networking/ip-sysctl.rst.

>
> Why not add an error check or warning on writes to the sysctl?

Please do not. We set this sysctl to 0.5 MB.
DRS (dynamic right-sizing) is known to have quantization artifacts.


* Re: [PATCH net] tcp: note that tcp_rmem[1] has a limited range
  2022-01-04  8:33   ` Eric Dumazet
@ 2022-01-06  4:20     ` Ivan Babrou
  2022-01-06  8:25       ` Eric Dumazet
  0 siblings, 1 reply; 6+ messages in thread
From: Ivan Babrou @ 2022-01-06  4:20 UTC
  To: Eric Dumazet
  Cc: Stephen Hemminger, netdev, LKML, kernel-team, David S . Miller,
	Jakub Kicinski, Jonathan Corbet

On Tue, Jan 4, 2022 at 12:33 AM Eric Dumazet <edumazet@google.com> wrote:
> I guess you have to define what the initial window is.

What I mean here is the first window after scaling is allowed, that is,
the one that appears in the first non-SYN ACK.

> There seems to be some confusion between rcv_ssthresh and sk_rcvbuf.
>
> If you want to document what rcv_ssthresh is and how it relates to sk_rcvbuf,
> you probably need more than a few lines in Documentation/networking/ip-sysctl.rst.

I can't say I understand how buffer sizes grow and how rcv_ssthresh
and sk_rcvbuf interact well enough to document this properly.

All I want is to document the fact that no matter what you punch into
sysctls, you'll end up with an initial scaled window (defined above)
that's no higher than 64k. Let me know if this is incorrect and if
there's a way we can put this into words without going into too much
detail.

> Please do not. We set this sysctl to 0.5 MB.
> DRS (dynamic right-sizing) is known to have quantization artifacts.

Where can I read more about the quantization artifacts you mentioned?


* Re: [PATCH net] tcp: note that tcp_rmem[1] has a limited range
  2022-01-06  4:20     ` Ivan Babrou
@ 2022-01-06  8:25       ` Eric Dumazet
  2022-01-06 22:40         ` Ivan Babrou
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Dumazet @ 2022-01-06  8:25 UTC
  To: Ivan Babrou
  Cc: Stephen Hemminger, netdev, LKML, kernel-team, David S . Miller,
	Jakub Kicinski, Jonathan Corbet

On Wed, Jan 5, 2022 at 8:20 PM Ivan Babrou <ivan@cloudflare.com> wrote:
>
> On Tue, Jan 4, 2022 at 12:33 AM Eric Dumazet <edumazet@google.com> wrote:
> > I guess you have to define what the initial window is.
>
> What I mean here is the first window after scaling is allowed, that is,
> the one that appears in the first non-SYN ACK.
>
> > There seems to be some confusion between rcv_ssthresh and sk_rcvbuf.
> >
> > If you want to document what rcv_ssthresh is and how it relates to sk_rcvbuf,
> > you probably need more than a few lines in Documentation/networking/ip-sysctl.rst.
>
> I can't say I understand how buffer sizes grow and how rcv_ssthresh
> and sk_rcvbuf interact well enough to document this properly.
>
> All I want is to document the fact that no matter what you punch into
> sysctls, you'll end up with an initial scaled window (defined above)
> that's no higher than 64k. Let me know if this is incorrect and if
> there's a way we can put this into words without going into too much
> detail.

Just to clarify, normal TCP 3WHS has a final ACK packet, where window
scaling is enabled.

You describe a possible issue with passive connections.
Most of the time, servers want some kind of control before allowing a
remote peer to send MB of payload in the first round trip.

However, a typical connection starts with IW10 (RFC 6928), and standard
TCP congestion control would implement slow start, doubling the payload
at every round trip, so this is not an issue.

If you want to enable bigger than 65535 RWIN for passive connections,
this would violate standards and should be discussed first at IETF.

If you want to enable bigger than 65535 RWIN for passive connections
in a controlled environment, I suggest using an eBPF program to do so.


>
> > Please do not. We set this sysctl to 0.5 MB.
> > DRS (dynamic right-sizing) is known to have quantization artifacts.
>
> Where can I read more about the quantization artifacts you mentioned?

DRS is implemented in tcp_rcv_space_adjust()/tcp_rcv_rtt_update(),
you can look at git history to get plenty of details.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=c3916ad9320eed8eacd7c0b2cf7f881efceda892


* Re: [PATCH net] tcp: note that tcp_rmem[1] has a limited range
  2022-01-06  8:25       ` Eric Dumazet
@ 2022-01-06 22:40         ` Ivan Babrou
  0 siblings, 0 replies; 6+ messages in thread
From: Ivan Babrou @ 2022-01-06 22:40 UTC
  To: Eric Dumazet
  Cc: Stephen Hemminger, netdev, LKML, kernel-team, David S . Miller,
	Jakub Kicinski, Jonathan Corbet

On Thu, Jan 6, 2022 at 12:25 AM Eric Dumazet <edumazet@google.com> wrote:

> Just to clarify, normal TCP 3WHS has a final ACK packet, where window
> scaling is enabled.

Correct, yet this final ACK packet won't signal the initial scaled
window above 64k. That's what I'm trying to document, as it seems like
a useful thing to keep in mind. If this statement is incorrect, then
I'm definitely missing something very basic. Let me know if that's the
case.

> You describe a possible issue with passive connections.
> Most of the time, servers want some kind of control before allowing a
> remote peer to send MB of payload in the first round trip.

Let's focus purely on the client side of it. The client is willing to
receive the large payload (let's say 250K), yet it cannot signal this
fact to the server.

> However, a typical connection starts with IW10 (RFC 6928), and standard
> TCP congestion control would implement slow start, doubling the payload
> at every round trip, so this is not an issue.

It's not an issue on a low latency link, but when a latency sensitive
client is trying to retrieve something across a 300ms RTT link, extra
round trips to stretch the window add a lot of latency.

> If you want to enable bigger than 65535 RWIN for passive connections,
> this would violate standards and should be discussed first at IETF.

I understand this and I don't intend to do this.

> If you want to enable bigger than 65535 RWIN for passive connections
> in a controlled environment, I suggest using an eBPF program to do so.

Right, eBPF was your suggestion: https://lkml.org/lkml/2021/12/22/668

The intention of this patch is to say that you can't achieve this even
for active connections, with a client that is willing to advertise a
larger window in the first non-SYN ACK. Currently you cannot do this
even with eBPF, but I'm happy to add the support.


Thread overview: 6 messages
2022-01-04  0:37 [PATCH net] tcp: note that tcp_rmem[1] has a limited range Ivan Babrou
2022-01-04  0:44 ` Stephen Hemminger
2022-01-04  8:33   ` Eric Dumazet
2022-01-06  4:20     ` Ivan Babrou
2022-01-06  8:25       ` Eric Dumazet
2022-01-06 22:40         ` Ivan Babrou
