From: Eric Dumazet <edumazet@google.com>
To: Neal Cardwell <ncardwell@google.com>
Cc: "Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	"ycheng@google.com" <ycheng@google.com>,
	"weiwan@google.com" <weiwan@google.com>,
	"Strohman, Andy" <astroh@amazon.com>,
	"Herrenschmidt, Benjamin" <benh@amazon.com>
Subject: Re: [PATCH net-next] tcp: optimise receiver buffer autotuning initialisation for high latency connections
Date: Mon, 7 Dec 2020 18:08:17 +0100	[thread overview]
Message-ID: <CANn89iJyw+EYiXLz_mYQQxdqnZn=vhmj9fj=0Qz0doyzZCsMnQ@mail.gmail.com> (raw)
In-Reply-To: <CADVnQymROUn6jQdPKxNr_Uc3KMqjX4t0M6=HC6rDxmZzZVv0=Q@mail.gmail.com>

On Mon, Dec 7, 2020 at 5:34 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Mon, Dec 7, 2020 at 11:23 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Mon, Dec 7, 2020 at 5:09 PM Mohamed Abuelfotoh, Hazem
> > <abuehaze@amazon.com> wrote:
> > >
> > >     >Since I cannot reproduce this problem with another NIC on x86, I
> > >     >really wonder if this is not an issue with the ENA driver on PowerPC
> > >     >perhaps?
> > >
> > >
> > > I am able to reproduce it on x86-based EC2 instances using the ENA,
> > > Xen netfront, or Intel ixgbevf driver on the receiver, so it is not
> > > specific to ENA. We were also able to easily reproduce it between two
> > > VMs running in VirtualBox on the same physical host, given the
> > > environment requirements I mentioned in my first e-mail.
> > >
> > > What's the RTT between the sender & receiver in your reproduction? Are you using bbr on the sender side?
> >
> >
> > 100ms RTT
> >
> > Which exact version of the Linux kernel are you using?
>
> Thanks for testing this, Eric. Would you be able to share the MTU
> config commands you used, and the tcpdump traces you get? I'm
> surprised that receive buffer autotuning would work for advmss of
> around 6500 or higher.

Autotuning might be delayed by one RTT, but that does not match the
numbers given by Mohamed (flows stuck at low speed).

Autotuning is a heuristic, and because it has one RTT of latency, it
is crucial to start from proper initial rcvmem values.
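
To make the mechanism concrete, here is a rough sketch of the idea
(illustrative only; the names and structure are mine, not the kernel's
tcp_rcv_space_adjust() code): roughly once per RTT, compare what the
application consumed during the last RTT with the current estimate and
grow the receive buffer when the flow is filling it. Until that first
adjustment happens, the initial rcvmem value is all the flow has.

/*
 * Rough sketch of the autotuning idea (illustrative only, not the
 * kernel's tcp_rcv_space_adjust() code).
 */
struct rcv_space_est {
	unsigned int space;	/* bytes we expect per RTT */
	unsigned int rcvbuf;	/* current receive buffer size */
};

static void rcv_space_adjust_sketch(struct rcv_space_est *est,
				    unsigned int copied_last_rtt,
				    unsigned int rmem_max)
{
	unsigned int newbuf;

	/* Flow did not use what we already granted: nothing to do. */
	if (copied_last_rtt <= est->space)
		return;

	/* Assume demand keeps growing; leave headroom for the next RTT. */
	est->space = 2 * copied_last_rtt;

	/* Account for per-skb overhead (~2x payload) and clamp to the
	 * administrator's limit (tcp_rmem[2]).
	 */
	newbuf = 2 * est->space;
	if (newbuf > rmem_max)
		newbuf = rmem_max;
	if (newbuf > est->rcvbuf)
		est->rcvbuf = newbuf;
}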

People using MTU=9000 should know they have to tune tcp_rmem[1]
accordingly, especially when using drivers that consume one page per
incoming MSS.


(mlx4 driver only uses ome 2048 bytes fragment for a 1500 MTU packet.
even with MTU set to 9000)
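
This matters because the socket receive buffer is charged skb->truesize
(roughly the memory the driver allocated for the packet), not the
payload size. A hypothetical helper to illustrate; the per-segment
numbers in the comments are assumptions of mine, not values taken from
any driver's source:

/*
 * Hypothetical helper (illustrative only): how much TCP payload a
 * given tcp_rmem[1] can really cover when every received segment is
 * charged at its truesize rather than its payload size.
 */
static unsigned int payload_capacity(unsigned int rcvbuf,
				     unsigned int payload_per_seg,
				     unsigned int truesize_per_seg)
{
	return (rcvbuf / truesize_per_seg) * payload_per_seg;
}

/*
 * Assumed example: a ~1448-byte payload segment backed by a full
 * 4096-byte page is charged almost 3x its payload, so
 * payload_capacity(131072, 1448, 4096) is only ~45 KB; with a
 * 2048-byte fragment (mlx4-style) the same tcp_rmem[1] covers
 * roughly twice as much payload.
 */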

I want to state again that using 536 bytes as a magic value makes no
sense to me.


For the record, Google has increased tcp_rmem[1] when switching to a bigger MTU.

The reason is simple: if we intend to receive 10 MSS, we should allow
for 90,000 bytes of payload, which means tcp_rmem[1] set to 180,000
(the extra factor of two accounts for per-skb overhead, since the
receive buffer is charged more than raw payload).
Because of the autotuning latency, doubling the value again is
advised: 360,000.
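
Spelled out (a throwaway sketch of the arithmetic above; MSS ~= 9000
is my simplifying assumption for an MTU-9000 path):

#include <stdio.h>

int main(void)
{
	unsigned int mss = 9000;	/* assumed ~MSS with MTU 9000 */
	unsigned int want_segs = 10;	/* we intend to receive 10 MSS */

	unsigned int payload = want_segs * mss; /*  90000 bytes of payload    */
	unsigned int rmem1 = 2 * payload;	/* 180000: per-skb overhead   */
	unsigned int advised = 2 * rmem1;	/* 360000: autotuning latency */

	printf("payload=%u tcp_rmem[1]=%u advised=%u\n",
	       payload, rmem1, advised);
	return 0;
}

In practice that means raising the middle value of net.ipv4.tcp_rmem
to something around 360000 on such a path.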

Another problem with kicking in autotuning too fast is that it might
allow bigger sk->sk_rcvbuf values even for small flows, opening more
attack surface to malicious peers.

I _think_ that if we want to allow admins to set a high MTU without
having to tune tcp_rmem[], we need something different from the
current proposal.


Thread overview: 22+ messages
2020-12-04 18:06 [PATCH net-next] tcp: optimise receiver buffer autotuning initialisation for high latency connections Hazem Mohamed Abuelfotoh
2020-12-04 18:19 ` Mohamed Abuelfotoh, Hazem
2020-12-04 18:41   ` Eric Dumazet
     [not found]     ` <3F02FF08-EDA6-4DFD-8D93-479A5B05E25A@amazon.com>
2020-12-07 15:25       ` Eric Dumazet
2020-12-07 16:09         ` Mohamed Abuelfotoh, Hazem
2020-12-07 16:22           ` Eric Dumazet
2020-12-07 16:33             ` Neal Cardwell
2020-12-07 17:08               ` Eric Dumazet [this message]
2020-12-07 20:09                 ` Mohamed Abuelfotoh, Hazem
2020-12-07 23:22                   ` Eric Dumazet
2020-12-07 17:16               ` Mohamed Abuelfotoh, Hazem
2020-12-07 17:27                 ` Eric Dumazet
2020-12-08 16:28                   ` Mohamed Abuelfotoh, Hazem
2020-12-08 16:30                     ` Mohamed Abuelfotoh, Hazem
2020-12-08 16:46                     ` Eric Dumazet
2020-12-07 16:34             ` Mohamed Abuelfotoh, Hazem
2020-12-07 17:46               ` Greg KH
2020-12-07 17:54                 ` Mohamed Abuelfotoh, Hazem
2020-12-04 19:10 ` Eric Dumazet
2020-12-04 21:28 ` Neal Cardwell
2020-12-07 11:46   ` [PATCH net] tcp: fix receive buffer autotuning to trigger for any valid advertised MSS Hazem Mohamed Abuelfotoh
2020-12-07 18:53     ` Jakub Kicinski
