All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <edumazet@google.com>
To: Neal Cardwell <ncardwell@google.com>
Cc: Jason Xing <kerneljasonxing@gmail.com>,
	David Miller <davem@davemloft.net>,
	Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	liweishi@kuaishou.com, Shujin Li <lishujin@kuaishou.com>
Subject: Re: [PATCH] tcp: fix TCP socks unreleased in BBR mode
Date: Wed, 3 Jun 2020 06:54:56 -0700	[thread overview]
Message-ID: <CANn89i+7-wE4xr5D9DpH+N-xkL1SB8oVghCKgz+CT5eG1ODQhA@mail.gmail.com> (raw)
In-Reply-To: <CADVnQy=RJfmzHR15DyWdydFAqSqVmFhaW4_cgYYAgnixEa5DNQ@mail.gmail.com>

On Wed, Jun 3, 2020 at 5:02 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Wed, Jun 3, 2020 at 1:44 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Jun 2, 2020 at 10:05 PM Jason Xing <kerneljasonxing@gmail.com> wrote:
> > >
> > > Hi Eric,
> > >
> > > I'm still trying to understand what you're saying before. Would this
> > > be better as following:
> > > 1) discard the tcp_internal_pacing() function.
> > > 2) remove where the tcp_internal_pacing() is called in the
> > > __tcp_transmit_skb() function.
> > >
> > > If we do so, we could avoid 'too late to give up pacing'. Meanwhile,
> > > should we introduce the tcp_wstamp_ns socket field as commit
> > > (864e5c090749) does?
> > >
> >
> > Please do not top-post on netdev mailing list.
> >
> >
> > I basically suggested double-checking which point in TCP could end up
> > calling tcp_internal_pacing()
> > while the timer was already armed.
> >
> > I guess this is mtu probing.
>
> Perhaps this could also happen from some of the retransmission code
> paths that don't use tcp_xmit_retransmit_queue()? Perhaps
> tcp_retransmit_timer() (RTO) and  tcp_send_loss_probe() TLP? It seems
> they could indirectly cause a call to __tcp_transmit_skb() and thus
> tcp_internal_pacing() without first checking if the pacing timer was
> already armed?

I feared this, (see recent commits about very low pacing rates) :/

I am not sure we need to properly fix all these points for old
kernels, since EDT model got rid of these problems.

Maybe we can try to extend the timer.

Something like :


diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cc4ba42052c21b206850594db6751810d8fc72b4..626b9f4f500f7e5270d8d59e6eb16dbfa3efbc7c
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -966,6 +966,8 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)

 static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 {
+       struct tcp_sock *tp = tcp_sk(sk);
+       ktime_t expire, now;
        u64 len_ns;
        u32 rate;

@@ -977,12 +979,29 @@ static void tcp_internal_pacing(struct sock *sk,
const struct sk_buff *skb)

        len_ns = (u64)skb->len * NSEC_PER_SEC;
        do_div(len_ns, rate);
-       hrtimer_start(&tcp_sk(sk)->pacing_timer,
-                     ktime_add_ns(ktime_get(), len_ns),
+
+       now = ktime_get();
+       /* If hrtimer is already armed, then our caller has not
+        * used tcp_pacing_check().
+        */
+       if (unlikely(hrtimer_is_queued(&tp->pacing_timer))) {
+               expire = hrtimer_get_softexpires(&tp->pacing_timer);
+               if (ktime_after(expire, now))
+                       now = expire;
+               if (hrtimer_try_to_cancel(&tp->pacing_timer) == 1)
+                       __sock_put(sk);
+       }
+       hrtimer_start(&tp->pacing_timer, ktime_add_ns(now, len_ns),
                      HRTIMER_MODE_ABS_PINNED_SOFT);
        sock_hold(sk);
 }

+static bool tcp_pacing_check(const struct sock *sk)
+{
+       return tcp_needs_internal_pacing(sk) &&
+              hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
+}
+
 static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
 {
        skb->skb_mstamp = tp->tcp_mstamp;
@@ -2117,6 +2136,9 @@ static int tcp_mtu_probe(struct sock *sk)
        if (!tcp_can_coalesce_send_queue_head(sk, probe_size))
                return -1;

+       if (tcp_pacing_check(sk))
+               return -1;
+
        /* We're allowed to probe.  Build it now. */
        nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC, false);
        if (!nskb)
@@ -2190,11 +2212,6 @@ static int tcp_mtu_probe(struct sock *sk)
        return -1;
 }

-static bool tcp_pacing_check(const struct sock *sk)
-{
-       return tcp_needs_internal_pacing(sk) &&
-              hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
-}

 /* TCP Small Queues :
  * Control number of packets in qdisc/devices to two packets / or ~1 ms.



>
> neal

  parent reply	other threads:[~2020-06-03 13:55 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-02  8:04 [PATCH] tcp: fix TCP socks unreleased in BBR mode kerneljasonxing
2020-06-02 13:05 ` Eric Dumazet
2020-06-03  1:53   ` Jason Xing
2020-06-03  2:14     ` David Miller
2020-06-03  2:29     ` Eric Dumazet
2020-06-03  2:41       ` Jason Xing
2020-06-03  2:43         ` Eric Dumazet
2020-06-03  2:48           ` Jason Xing
2020-06-03  5:05           ` Jason Xing
2020-06-03  5:44             ` Eric Dumazet
2020-06-03  6:32               ` Jason Xing
2020-06-03 12:02               ` Neal Cardwell
2020-06-03 13:51                 ` Jason Xing
2020-06-03 13:54                 ` Eric Dumazet [this message]
2020-06-03 14:07                   ` Neal Cardwell
2020-06-04  8:40                     ` Jason Xing
2020-06-04  9:00 ` [PATCH v2 4.19] " kerneljasonxing
2020-06-04 13:10   ` Eric Dumazet
2020-06-04 13:47     ` Jason Xing
2020-08-11 10:37       ` Jason Xing
2020-08-11 15:32         ` Eric Dumazet
2021-04-30 10:51           ` Jason Xing

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANn89i+7-wE4xr5D9DpH+N-xkL1SB8oVghCKgz+CT5eG1ODQhA@mail.gmail.com \
    --to=edumazet@google.com \
    --cc=davem@davemloft.net \
    --cc=kerneljasonxing@gmail.com \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lishujin@kuaishou.com \
    --cc=liweishi@kuaishou.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.