From: Eric Dumazet <edumazet@google.com>
To: Michal Kubecek <mkubecek@suse.cz>
Cc: Jiri Slaby <jirislaby@kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	 Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org,
	 Soheil Hassas Yeganeh <soheil@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	 Yuchung Cheng <ycheng@google.com>,
	eric.dumazet@gmail.com
Subject: Re: [PATCH net-next] tcp: get rid of sysctl_tcp_adv_win_scale
Date: Fri, 3 Nov 2023 10:53:22 +0100
Message-ID: <CANn89iKPdAVdPo1g15dEp3smAjM2rY0T25p3y2Dzu-poFk5kWA@mail.gmail.com>
In-Reply-To: <20231103092706.6rw2ehuigxfdvhlc@lion.mk-sys.cz>

On Fri, Nov 3, 2023 at 10:27 AM Michal Kubecek <mkubecek@suse.cz> wrote:
>
> On Fri, Nov 03, 2023 at 09:17:27AM +0100, Eric Dumazet wrote:
> >
> > It seems the test had some expectations.
> >
> > Setting a small (1-byte) RCVBUF/SNDBUF and yet expecting to send
> > 46080 bytes fast enough was not reasonable.
> > It might have relied on the fact that TCP sendmsg() can cook large GSO
> > packets, even if sk->sk_sndbuf is small.
> >
> > With tight memory settings, TCP may have to resort to RTO
> > timers (200ms by default) to recover from dropped packets.
>
> There seems to be one drop, but somehow the sender does not recover from
> it, even though the retransmit and following packets are acked quickly:
>
> 09:15:29.424017 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [S], seq 104649613, win 33280, options [mss 65495,sackOK,TS val 1319295278 ecr 0,nop,wscale 7], length 0
> 09:15:29.424024 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [S.], seq 1343383818, ack 104649614, win 585, options [mss 65495,sackOK,TS val 1319295278 ecr 1319295278,nop,wscale 0], length 0
> 09:15:29.424031 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 1, win 260, options [nop,nop,TS val 1319295278 ecr 1319295278], length 0
> 09:15:29.424155 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [.], seq 1:16641, ack 1, win 585, options [nop,nop,TS val 1319295279 ecr 1319295278], length 16640
> 09:15:29.424160 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 16641, win 130, options [nop,nop,TS val 1319295279 ecr 1319295279], length 0
> 09:15:29.424179 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 16641:33281, ack 1, win 585, options [nop,nop,TS val 1319295279 ecr 1319295279], length 16640
> 09:15:29.424183 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 16641, win 0, options [nop,nop,TS val 1319295279 ecr 1319295279], length 0
> 09:15:29.424280 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [P.], seq 1:12, ack 16641, win 16640, options [nop,nop,TS val 1319295279 ecr 1319295279], length 11
> 09:15:29.424284 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [.], ack 12, win 574, options [nop,nop,TS val 1319295279 ecr 1319295279], length 0
> 09:15:29.630272 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 16641:33281, ack 12, win 574, options [nop,nop,TS val 1319295485 ecr 1319295279], length 16640
> 09:15:29.630334 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 33281, win 2304, options [nop,nop,TS val 1319295485 ecr 1319295485], length 0
>
> 09:15:29.836938 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 33281:35585, ack 12, win 574, options [nop,nop,TS val 1319295691 ecr 1319295485], length 2304
> 09:15:29.836984 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 35585, win 2304, options [nop,nop,TS val 1319295691 ecr 1319295691], length 0
> 09:15:30.043606 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 35585:37889, ack 12, win 574, options [nop,nop,TS val 1319295898 ecr 1319295691], length 2304
> 09:15:30.043653 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 37889, win 2304, options [nop,nop,TS val 1319295898 ecr 1319295898], length 0
> 09:15:30.250270 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 37889:40193, ack 12, win 574, options [nop,nop,TS val 1319296105 ecr 1319295898], length 2304
> 09:15:30.250316 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 40193, win 2304, options [nop,nop,TS val 1319296105 ecr 1319296105], length 0
> 09:15:30.456932 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 40193:42497, ack 12, win 574, options [nop,nop,TS val 1319296311 ecr 1319296105], length 2304
> 09:15:30.456975 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 42497, win 2304, options [nop,nop,TS val 1319296311 ecr 1319296311], length 0
> 09:15:30.663598 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [P.], seq 42497:44801, ack 12, win 574, options [nop,nop,TS val 1319296518 ecr 1319296311], length 2304
> 09:15:30.663638 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [.], ack 44801, win 2304, options [nop,nop,TS val 1319296518 ecr 1319296518], length 0
> 09:15:30.663646 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [FP.], seq 44801:46081, ack 12, win 574, options [nop,nop,TS val 1319296518 ecr 1319296518], length 1280
> 09:15:30.663712 IP 127.0.0.1.40386 > 127.0.0.1.42483: Flags [F.], seq 12, ack 46082, win 2304, options [nop,nop,TS val 1319296518 ecr 1319296518], length 0
> 09:15:30.663724 IP 127.0.0.1.42483 > 127.0.0.1.40386: Flags [.], ack 13, win 573, options [nop,nop,TS val 1319296518 ecr 1319296518], length 0
>
> (window size values are scaled here). Part of the problem is that the
> receiver side sets SO_RCVBUF after connect(), so the window shrinks
> after the sender has already sent more data; when I move the bufsized()
> calls in the Python script before listen() and connect(), the test runs
> quickly.

This makes sense.
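For reference, here is a minimal sketch of the ordering Michal describes: buffers are sized before listen() and before connect(), not after the connection is established. It assumes a plain loopback transfer of 46080 bytes in which the accepting side sends and the connecting side receives, as in the traces. `set_tiny_buffers()` is a hypothetical stand-in for the test's bufsized() helper, whose real signature is not shown in this thread.

```python
import socket
import threading

PAYLOAD_LEN = 46080  # bytes transferred in the traces in this thread
TINY_BUF = 1         # requested size; the kernel clamps it to its own minimum


def set_tiny_buffers(sock: socket.socket) -> None:
    # Hypothetical stand-in for the test's bufsized() helper.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, TINY_BUF)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, TINY_BUF)


def run_transfer() -> int:
    """Send PAYLOAD_LEN bytes over loopback with tiny buffers; return bytes received."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_tiny_buffers(listener)  # sized before listen(), as in Michal's fix
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    listener.settimeout(30)
    port = listener.getsockname()[1]

    def sender() -> None:
        # The accepting side sends the data, like port 42483 in the traces.
        conn, _ = listener.accept()
        with conn:
            conn.sendall(b"x" * PAYLOAD_LEN)

    t = threading.Thread(target=sender, daemon=True)
    t.start()

    # The connecting side receives, like port 40386. Its buffers are sized
    # before connect(), so the receive buffer is already small when the SYN
    # (and its window scale) goes out, instead of shrinking after data is
    # already in flight.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_tiny_buffers(client)
    client.settimeout(30)
    client.connect(("127.0.0.1", port))
    received = 0
    while True:
        chunk = client.recv(65536)
        if not chunk:
            break
        received += len(chunk)
    client.close()
    t.join(timeout=30)
    listener.close()
    return received
```

run_transfer() should return 46080 either way. To see the difference Michal observed, time it with and without moving the client's set_tiny_buffers() call after connect(). The actual test script may differ in its details.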

Old kernels would instead have dropped a packet, without changing the test status:

09:49:49.390066 IP localhost.39710 > localhost.44173: Flags [S], seq 1464131415, win 65495, options [mss 65495,sackOK,TS val 578664891 ecr 0,nop,wscale 7], length 0
09:49:49.390078 IP localhost.44173 > localhost.39710: Flags [S.], seq 2322612108, ack 1464131416, win 1152, options [mss 65495,sackOK,TS val 578664891 ecr 578664891,nop,wscale 0], length 0
09:49:49.390088 IP localhost.39710 > localhost.44173: Flags [.], ack 1, win 512, options [nop,nop,TS val 578664891 ecr 578664891], length 0
09:49:49.390319 IP localhost.44173 > localhost.39710: Flags [.], seq 1:32769, ack 1, win 1152, options [nop,nop,TS val 578664892 ecr 578664891], length 32768
09:49:49.390325 IP localhost.39710 > localhost.44173: Flags [.], ack 32769, win 256, options [nop,nop,TS val 578664892 ecr 578664892], length 0
09:49:49.390355 IP localhost.44173 > localhost.39710: Flags [P.], seq 32769:46081, ack 1, win 1152, options [nop,nop,TS val 578664892 ecr 578664892], length 13312
<prior packet has been dropped by receiver>
09:49:49.390479 IP localhost.39710 > localhost.44173: Flags [P.], seq 1:12, ack 32769, win 256, options [nop,nop,TS val 578664892 ecr 578664892], length 11
09:49:49.390483 IP localhost.44173 > localhost.39710: Flags [.], ack 12, win 1141, options [nop,nop,TS val 578664892 ecr 578664892], length 0
09:49:49.390547 IP localhost.44173 > localhost.39710: Flags [F.], seq 46081, ack 12, win 1141, options [nop,nop,TS val 578664892 ecr 578664892], length 0
09:49:49.390552 IP localhost.39710 > localhost.44173: Flags [.], ack 32769, win 256, options [nop,nop,TS val 578664892 ecr 578664892,nop,nop,sack 1 {46081:46082}], length 0

<packet retransmit>
09:49:49.390562 IP localhost.44173 > localhost.39710: Flags [P.], seq 32769:46081, ack 12, win 1141, options [nop,nop,TS val 578664892 ecr 578664892], length 13312
09:49:49.390567 IP localhost.39710 > localhost.44173: Flags [.], ack 46082, win 152, options [nop,nop,TS val 578664892 ecr 578664892], length 0
09:49:49.390677 IP localhost.39710 > localhost.44173: Flags [F.], seq 12, ack 46082, win 152, options [nop,nop,TS val 578664892 ecr 578664892], length 0
09:49:49.390685 IP localhost.44173 > localhost.39710: Flags [.], ack 13, win 1141, options [nop,nop,TS val 578664892 ecr 578664892], length 0
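To put numbers on the difference between the two traces, here is a small sketch that computes the gaps between the sender's data segments. The timestamps are copied from the traces. In the new-kernel trace, each step waits roughly 206 ms, close to the 200 ms default RTO mentioned earlier. The thread does not establish which timer drives each later step. In the old-kernel trace, the dropped 32769:46081 segment is resent about 0.2 ms after its first transmission, right after the receiver's SACK for the FIN (46081:46082).

```python
from datetime import datetime

# Data-segment timestamps from the sender (port 42483) in Michal's trace.
NEW_KERNEL_SENDS = [
    "09:15:29.424179",  # 16641:33281, first transmission (dropped)
    "09:15:29.630272",  # 16641:33281, retransmission
    "09:15:29.836938",  # 33281:35585
    "09:15:30.043606",  # 35585:37889
    "09:15:30.250270",  # 37889:40193
    "09:15:30.456932",  # 40193:42497
    "09:15:30.663598",  # 42497:44801
]
# Old kernel: first transmission of 32769:46081 and its retransmission.
OLD_KERNEL_SENDS = ["09:49:49.390355", "09:49:49.390562"]


def parse(stamp: str) -> datetime:
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def gaps_ms(stamps: list[str]) -> list[float]:
    """Milliseconds between consecutive timestamps."""
    times = [parse(s) for s in stamps]
    return [(b - a).total_seconds() * 1000.0 for a, b in zip(times, times[1:])]


new_gaps = gaps_ms(NEW_KERNEL_SENDS)
old_gaps = gaps_ms(OLD_KERNEL_SENDS)
print("new kernel:", [round(g, 1) for g in new_gaps], "total", round(sum(new_gaps)), "ms")
print("old kernel:", [round(g, 3) for g in old_gaps], "ms")
```

The new-kernel case adds up to six waits, about 1.24 s in total. The old-kernel recovery costs about 0.2 ms.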


Retracting (shrinking) an already advertised TCP window has always been problematic.
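A window is retracted when a new ACK moves the right edge (ACK number plus scaled window) to the left of an edge the receiver previously advertised. The sketch below does that arithmetic on two consecutive ACKs from port 40386 in Michal's trace. It assumes the printed win values are the raw header field, to be shifted left by the wscale 7 that 40386 offered in its SYN. Michal only says the values are "scaled". The fact that 260 << 7 equals the SYN's unscaled 33280 is consistent with this reading.

```python
WSCALE = 7  # window scale offered by port 40386 in its SYN (first line of the trace)


def right_edge(ack: int, raw_win: int, wscale: int = WSCALE) -> int:
    """First sequence number beyond the window the peer may send into: ACK + scaled window."""
    return ack + (raw_win << wscale)


# Consistency check on the scaling assumption: the first post-handshake ACK
# ("win 260") should scale back to the unscaled SYN window ("win 33280").
assert (260 << WSCALE) == 33280

# Two consecutive ACKs from the receiver (port 40386), relative sequence numbers.
edge_before = right_edge(16641, 130)  # "ack 16641, win 130"
edge_after = right_edge(16641, 0)     # "ack 16641, win 0"

if edge_after < edge_before:
    print(f"window retracted by {edge_before - edge_after} bytes "
          f"({edge_after}..{edge_before} no longer acceptable)")
```

Under this assumption, the retracted range is exactly the 16640-byte segment 16641:33281 that the sender transmitted between those two ACKs, which is the one drop in Michal's trace.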

If we really want to be very gentle, we could add more logic, such as
shorter timer events for pathological cases like this one, but I am not
sure it is really worth it, especially when dealing with one million TCP
sockets in this state.
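To make the scaling concern concrete, here is a back-of-envelope sketch. It assumes, as a simplification, that each stuck socket keeps one timer re-armed at a fixed interval. The 10 ms interval is hypothetical and chosen only for illustration. The 200 ms value is the default RTO quoted earlier in the thread.

```python
SOCKETS = 1_000_000  # number of sockets in this state, from the text


def timer_events_per_second(interval_ms: float, sockets: int = SOCKETS) -> float:
    """Expirations per second if each socket re-arms one timer every interval_ms."""
    return sockets * 1000.0 / interval_ms


# 200 ms: default RTO quoted earlier; 10 ms: hypothetical "shorter timer".
for interval in (200.0, 10.0):
    print(f"{interval:>6.0f} ms -> {timer_events_per_second(interval):,.0f} events/s")
```

Going from 200 ms to 10 ms multiplies the timer load by 20, from 5 million to 100 million expirations per second across the million sockets.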


Thread overview: 23+ messages
2023-07-17 15:29 [PATCH net-next] tcp: get rid of sysctl_tcp_adv_win_scale Eric Dumazet
2023-07-17 16:52 ` Soheil Hassas Yeganeh
2023-07-17 17:17   ` Eric Dumazet
2023-07-17 17:20     ` Soheil Hassas Yeganeh
2023-07-19  2:10 ` patchwork-bot+netdevbpf
2023-07-20 15:41 ` Paolo Abeni
2023-07-20 15:50   ` Eric Dumazet
2023-11-03  6:10 ` Jiri Slaby
2023-11-03  6:56   ` Jiri Slaby
2023-11-03  7:07     ` Jiri Slaby
2023-11-03  8:17       ` Eric Dumazet
2023-11-03  9:27         ` Michal Kubecek
2023-11-03  9:53           ` Eric Dumazet [this message]
2023-11-03 10:14         ` Jiri Slaby
2023-11-03 10:27           ` Eric Dumazet
2023-11-03 11:07             ` Jiri Slaby
2024-04-02 15:28 ` shironeko
2024-04-02 15:50   ` Eric Dumazet
2024-04-02 16:23     ` Eric Dumazet
2024-04-02 16:29       ` shironeko
2024-04-06  0:22         ` shironeko
2024-04-06  6:11           ` Eric Dumazet
2024-04-06  7:07             ` Eric Dumazet
