All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>,
	Alexander Duyck <alexander.duyck@gmail.com>,
	David Miller <davem@davemloft.net>,
	netdev <netdev@vger.kernel.org>, kernel-team <kernel-team@fb.com>,
	Neil Spring <ntspring@fb.com>, Yuchung Cheng <ycheng@google.com>
Subject: Re: [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN
Date: Fri, 5 Mar 2021 06:33:50 +0100	[thread overview]
Message-ID: <CANn89iK_MORZmk4waXn3vfHYDZfz4AdqfQ5hrEr0Pwo8DMZG4w@mail.gmail.com> (raw)
In-Reply-To: <CANn89iJKH37CMUuq66nERZQoMHFp+yuTe=yqxm1kf+RQ1RfHzw@mail.gmail.com>

On Fri, Mar 5, 2021 at 6:17 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Mar 5, 2021 at 12:27 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Thu, 4 Mar 2021 22:26:58 +0100 Eric Dumazet wrote:
> > > It would be nice if tun driver would have the ability to delay TX
> > > completions by N usecs,
> > > so that packetdrill tests could be used.
> > >
> > > It is probably not too hard to add such a feature.
> >
> > Add an ioctl to turn off the skb_orphan, queue the skbs in tun_do_read()
> > to free them from a timer?
>
> Yes, I  cooked a prototype like that a few hours ago before my night
> shift to launch our test suite.
>
> I yet have to add a sane limit to the number of delayed skbs so that
> syzbot does not oom its hosts ;)
>
> >
> > > I was testing this (note how I also removed the tcp_rearm_rto(sk) call)
> > >
> > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > > index 6f450e577975c7be9537338c8a4c0673d7d36c4c..9ef92ca55e530f76ad793d7342442c4ec06165f7
> > > 100644
> > > --- a/net/ipv4/tcp_input.c
> > > +++ b/net/ipv4/tcp_input.c
> > > @@ -6471,11 +6471,10 @@ static bool tcp_rcv_fastopen_synack(struct
> > > sock *sk, struct sk_buff *synack,
> > >         tcp_fastopen_cache_set(sk, mss, cookie, syn_drop, try_exp);
> > >
> > >         if (data) { /* Retransmit unacked data in SYN */
> > > -               skb_rbtree_walk_from(data) {
> > > -                       if (__tcp_retransmit_skb(sk, data, 1))
> > > -                               break;
> > > -               }
> > > -               tcp_rearm_rto(sk);
> > > +               skb_rbtree_walk_from(data)
> > > +                       tcp_mark_skb_lost(sk, data);
> > > +
> > > +               tcp_xmit_retransmit_queue(sk);
> > >                 NET_INC_STATS(sock_net(sk),
> > >                                 LINUX_MIB_TCPFASTOPENACTIVEFAIL);
> > >                 return true;
> >
> > AFAICT this works great now:
>
> Yes but some packetdrill tests fail, I have to check them (sponge link
> for Googlers), but at first glance we have more investigations.
>
> Ran 2003 tests: 1990 passed, 13 failed
> Sponge: http://sponge2/b0d4c652-3173-4837-86ef-5e8cc59730a1
>
> Failures in
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:cookie-disabled-prod-conn
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-icmp-unreach-frag-needed
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-icmp-unreach-frag-needed-with-seq
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-only-syn-acked
> //net/tcp/fastopen/client:ipv4-mapped-ipv6:syn-data-partial-or-over-ack
> //net/tcp/fastopen/client:ipv4:cookie-disabled-prod-conn
> //net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed
> //net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed-with-seq
> //net/tcp/fastopen/client:ipv4:syn-data-only-syn-acked
> //net/tcp/fastopen/client:ipv4:syn-data-partial-or-over-ack
> //net/tcp/fastopen/client:ipv6:cookie-disabled-prod-conn
> //net/tcp/fastopen/client:ipv6:syn-data-only-syn-acked
> //net/tcp/fastopen/client:ipv6:syn-data-partial-or-over-ack
>
> Showing test.log for
> //net/tcp/fastopen/client:ipv4:syn-data-icmp-unreach-frag-needed
>
> syn-data-icmp-unreach-frag-needed.pkt:33: error handling packet: live
> packet field ipv4_total_length: expected: 800 (0x320) vs actual: 1040
> (0x410)
> script packet: 0.090624 . 1:761(760) ack 1
> actual packet: 0.090619 P. 1:1001(1000) ack 1 win 256
>

Actually this might be a good thing : We now resend the whole data in
a TSO packet, now
we call the standard rtx queue mechanism, instead of sending only one MSS

I will look at other test failures today (Friday) before submitting a
patch series officially.

>
> Yes, it looks like Alex patch no longer works
>
> commit c31b70c9968fe9c4194d1b5d06d07596a3b680de
> Author: Alexander Duyck <alexanderduyck@fb.com>
> Date:   Sat Dec 12 12:31:24 2020 -0800
>
>     tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
>
>
>
> >
> > ==> TFO case ret:-16 (0) ca_state:0 skb:ffff8881d3513800!
> >   FREED swapper/5 -- skb 0xffff8881d3513800 freed after: 1us
> > -----
> > First:
> >         __tcp_retransmit_skb+1
> >         tcp_retransmit_skb+18
> >         tcp_xmit_retransmit_queue.part.70+339
> >         tcp_rcv_state_process+2491
> >         tcp_v6_do_rcv+405
> >         tcp_v6_rcv+2984
> >
> > Second:
> >         __tcp_retransmit_skb+1
> >         tcp_retransmit_skb+18
> >         tcp_xmit_retransmit_queue.part.70+339
> >         tcp_tsq_write.part.71+146
> >         tcp_tsq_handler+53
> >         tcp_tasklet_func+181
> >
> >  sk:0xffff8885adc16f00 skb:ffff8881d3513800 --- 61us acked:1
> >
> >
> > The other case where we hit RTO after __tcp_retransmit_skb() fails is:
> >
> > ==> non-TFO case ret:-11 (0) ca_state:3 skb:ffff8883d71dd400!
> > -----
> > First:
> >         __tcp_retransmit_skb+1
> >         tcp_retransmit_skb+18
> >         tcp_xmit_retransmit_queue.part.70+339
> >         tcp_ack+2270
> >         tcp_rcv_established+303
> >         tcp_v6_do_rcv+190
> >
> > Second:
> >         __tcp_retransmit_skb+1
> >         tcp_retransmit_skb+18
> >         tcp_retransmit_timer+716
> >         tcp_write_timer_handler+136
> >         tcp_write_timer+141
> >         call_timer_fn+43
> >
> >  sk:0xffff88801772d340 skb:ffff8883d71dd400 --- 51738us acked:47324
> >
> > Which I believe is this:
> >
> >         if (refcount_read(&sk->sk_wmem_alloc) >
> >             min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> >                   sk->sk_sndbuf))
> >                 return -EAGAIN;
> >
> > Because __tcp_retransmit_skb() seems to bail before
> > inet6_sk_rebuild_header gets called. Should we arm TSQ here as well?
>
> Yes, or make the limit slightly bigger since we now have the fclone
> more precise thing for standard drivers [1]
>
> if ((refcount_read(&sk->sk_wmem_alloc) >> 1) >
>     min_t(u32, sk->sk_wmem_queued,  sk->sk_sndbuf))
>    return -EAGAIN;
>
> [1] It is probably wise to keep this code because some drivers do call
> skb_orphan() in their ndo_start_xmit()

  reply	other threads:[~2021-03-05  5:34 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-02  6:07 [PATCH net] net: tcp: don't allocate fast clones for fastopen SYN Jakub Kicinski
2021-03-02  9:38 ` Eric Dumazet
2021-03-02 17:00   ` Jakub Kicinski
2021-03-02 17:02     ` Eric Dumazet
2021-03-02 20:52   ` Yuchung Cheng
2021-03-02 22:00     ` Jakub Kicinski
2021-03-03 21:35   ` Alexander Duyck
2021-03-04  0:07     ` Jakub Kicinski
2021-03-04  2:45       ` Alexander Duyck
2021-03-04 12:51         ` Eric Dumazet
2021-03-04 19:06           ` Jakub Kicinski
2021-03-04 19:41             ` Eric Dumazet
2021-03-04 20:18               ` Eric Dumazet
2021-03-04 21:08               ` Jonathan Lemon
2021-03-04 21:20                 ` Eric Dumazet
2021-03-04 21:26                   ` Eric Dumazet
2021-03-04 23:27                     ` Jakub Kicinski
2021-03-05  5:17                       ` Eric Dumazet
2021-03-05  5:33                         ` Eric Dumazet [this message]
2021-03-05  6:38                           ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANn89iK_MORZmk4waXn3vfHYDZfz4AdqfQ5hrEr0Pwo8DMZG4w@mail.gmail.com \
    --to=edumazet@google.com \
    --cc=alexander.duyck@gmail.com \
    --cc=davem@davemloft.net \
    --cc=jonathan.lemon@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=ntspring@fb.com \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.