From: Jakub Kicinski <kuba@kernel.org>
To: Eric Dumazet <edumazet@google.com>
Cc: David Miller <davem@davemloft.net>,
	David Ahern <dsahern@gmail.com>, Paolo Abeni <pabeni@redhat.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	netdev <netdev@vger.kernel.org>,
	Yuchung Cheng <ycheng@google.com>,
	Neal Cardwell <ncardwell@google.com>
Subject: Re: [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch
Date: Mon, 24 Jan 2022 16:02:40 -0800
Message-ID: <20220124160240.02a451bd@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>
In-Reply-To: <CANn89iJY=oDHY+Fe=u+GHeb07LCUC305rwLehsE2Wq1TcidP8Q@mail.gmail.com>

Sorry for the delay, I had to do some homework and run more tests.

On Fri, 21 Jan 2022 08:37:12 -0800 Eric Dumazet wrote:
> On Fri, Jan 21, 2022 at 7:15 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > We implemented SACK compress in TCP stack to avoid extra SACK being
> > > sent by the receiver.
> > >
> > > We have an extension of this SACK compression for TCP flows terminated
> > > by Google servers, since modern TCP stacks do not need the old rule of
> > > TCP_FASTRETRANS_THRESH DUPACK to start retransmits.
> > >
> > > Something like this pseudo code:
> > >
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index dc49a3d551eb919baf5ad812ef21698c5c7b9679..d72554ab70fd2e16ed60dc78a905f4aa1414f8c9 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -5494,7 +5494,8 @@ static void __tcp_ack_snd_check(struct sock *sk, int ofo_possible)
> > >         }
> > >         if (tp->dup_ack_counter < TCP_FASTRETRANS_THRESH) {
> > >                 tp->dup_ack_counter++;
> > > -               goto send_now;
> > > +               if (peer_is_using_old_rule_about_fastretrans(tp))
> > > +                       goto send_now;
> > >         }
> > >         tp->compressed_ack++;
> > >         if (hrtimer_is_queued(&tp->compressed_ack_timer))
> > >  
> >
> > Is this something we could upstream / test? peer_is_using.. does not
> > exist upstream.  
> 
> Sure, because we do not have a standardized way (at SYN SYNACK time)
> to advertise that the stack is not 10 years old.
> 
> This could be a per net-ns sysctl, or a per socket flag, or a per cgroup flag.
> 
> In our case, we do negotiate special TCP options, and allow these options
> only from internal communications.
> 
> (So we store this private bit in the socket itself)

This does not fix the problem, unfortunately. I still see TCP detecting
reordering based on SACK if re-transmits have higher TTL.
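
For readers following along, the GRO decision named in the Subject can be
sketched as a toy model. This is not the kernel code: struct toy_ip6_hdr,
enum toy_gro_verdict and toy_gro_match() are hypothetical names made up for
illustration, and the header is reduced to the few fields that matter here.
The assumption is that a retransmit differs from the held segment only in
hop_limit; treating it as another flow leaves the held segment queued in GRO,
so the two can reach TCP out of arrival order, while a flush delivers the
held segment first.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the IPv6 header fields compared here. */
struct toy_ip6_hdr {
	uint8_t saddr[16];
	uint8_t daddr[16];
	uint8_t nexthdr;
	uint8_t hop_limit;
};

enum toy_gro_verdict {
	TOY_GRO_MERGE,      /* same flow, headers compatible: coalesce */
	TOY_GRO_FLUSH,      /* same flow, but deliver the held packet first */
	TOY_GRO_OTHER_FLOW, /* not the same flow: held packet stays queued */
};

/*
 * Compare an incoming header against a held one.
 * flush_on_hop_limit == false models "hop_limit mismatch means different flow",
 * flush_on_hop_limit == true models "hop_limit mismatch means flush".
 */
enum toy_gro_verdict toy_gro_match(const struct toy_ip6_hdr *held,
				   const struct toy_ip6_hdr *in,
				   bool flush_on_hop_limit)
{
	if (memcmp(held->saddr, in->saddr, sizeof(held->saddr)) ||
	    memcmp(held->daddr, in->daddr, sizeof(held->daddr)) ||
	    held->nexthdr != in->nexthdr)
		return TOY_GRO_OTHER_FLOW;

	if (held->hop_limit != in->hop_limit)
		return flush_on_hop_limit ? TOY_GRO_FLUSH : TOY_GRO_OTHER_FLOW;

	return TOY_GRO_MERGE;
}
```

With two headers that differ only in hop_limit, the first mode returns
TOY_GRO_OTHER_FLOW and the second returns TOY_GRO_FLUSH; identical headers
merge in both modes.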

> > Coincidentally, speaking of sending SACKs, my initial testing was on
> > 5.12 kernels and there I saw what appeared to a lay person (me) like
> > missing ACKs. Receiver would receive segments:
> >
> > _AB_C_D_E
> >
> > where _ indicates loss. It'd SACK A, then generate the next SACK after E
> > (SACKing C D E), sender would rexmit A which makes receiver ACK all
> > the way to the end of B. Now sender thinks B arrived after CDE because
> > it was never sacked.
> >
> > Perhaps it was fixed by commit a29cb6914681 ("net: tcp better handling
> > of reordering then loss cases").. or it's a result of some out-of-tree
> > hack. I thought I'd mention it tho in case it immediately rings a bell
> > for anyone.  
> 
> Could all the missing SACK have been lost ?

I had tcpdump on both ends, but I can't repro any more with the GRO fix
applied. Maybe it was also related to that. Somehow.

> Writing a packetdrill test for this scenario should not be too hard.
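
Short of a packetdrill script, the sender-side inference in the _AB_C_D_E
scenario quoted above can be sketched as a toy scoreboard. This is a rough
model, not the kernel's logic: struct toy_seg and
toy_cum_ack_suggests_reordering() are hypothetical names, and the sketch
assumes the retransmission fills the hole in front of A. The rule modelled is
that a segment which was never SACKed and never retransmitted, but is
cumulatively ACKed below the highest SACKed segment, looks like it arrived
late.

```c
#include <stdbool.h>

/* Hypothetical scoreboard entry for one transmitted segment. */
struct toy_seg {
	char name;          /* '_' marks a segment lost on first transmission */
	bool sacked;        /* receiver reported it in a SACK block */
	bool retransmitted; /* sender sent it more than once */
};

/*
 * A cumulative ACK covers segs[0..acked). Return true if it covers a segment
 * that was neither SACKed nor retransmitted while a later segment was already
 * SACKed, which a sender would read as reordering rather than loss.
 */
bool toy_cum_ack_suggests_reordering(const struct toy_seg *segs, int nsegs,
				     int acked)
{
	int highest_sacked = -1;
	int i;

	for (i = 0; i < nsegs; i++)
		if (segs[i].sacked)
			highest_sacked = i;

	for (i = 0; i < acked && i < nsegs; i++)
		if (!segs[i].sacked && !segs[i].retransmitted &&
		    i < highest_sacked)
			return true;

	return false;
}
```

Filling the scoreboard with the leading hole marked retransmitted, A, C, D
and E marked sacked, and B left untouched, a cumulative ACK covering the
first three entries reports reordering; marking B as sacked makes it go away.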


Thread overview: 6+ messages
2022-01-21  1:19 [PATCH net] ipv6: gro: flush instead of assuming different flows on hop_limit mismatch Jakub Kicinski
2022-01-21  8:55 ` Eric Dumazet
2022-01-21 15:15   ` Jakub Kicinski
2022-01-21 16:37     ` Eric Dumazet
2022-01-25  0:02       ` Jakub Kicinski [this message]
2022-01-24 19:23 ` kernel test robot
