From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Dumazet
Date: Wed, 3 Jun 2020 06:54:56 -0700
Subject: Re: [PATCH] tcp: fix TCP socks unreleased in BBR mode
To: Neal Cardwell
Cc: Jason Xing, David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
 netdev, LKML, liweishi@kuaishou.com, Shujin Li
References: <20200602080425.93712-1-kerneljasonxing@gmail.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org
On Wed, Jun 3, 2020 at 5:02 AM Neal Cardwell wrote:
>
> On Wed, Jun 3, 2020 at 1:44 AM Eric Dumazet wrote:
> >
> > On Tue, Jun 2, 2020 at 10:05 PM Jason Xing wrote:
> > >
> > > Hi Eric,
> > >
> > > I'm still trying to understand what you said before. Would this
> > > be better as follows:
> > > 1) discard the tcp_internal_pacing() function.
> > > 2) remove where tcp_internal_pacing() is called in the
> > > __tcp_transmit_skb() function.
> > >
> > > If we do so, we could avoid 'too late to give up pacing'. Meanwhile,
> > > should we introduce the tcp_wstamp_ns socket field as commit
> > > 864e5c090749 does?
> > >
> >
> > Please do not top-post on the netdev mailing list.
> >
> > I basically suggested double-checking which point in TCP could end up
> > calling tcp_internal_pacing() while the timer was already armed.
> >
> > I guess this is MTU probing.
>
> Perhaps this could also happen from some of the retransmission code
> paths that don't use tcp_xmit_retransmit_queue()? Perhaps
> tcp_retransmit_timer() (RTO) and tcp_send_loss_probe() (TLP)? It seems
> they could indirectly cause a call to __tcp_transmit_skb() and thus
> tcp_internal_pacing() without first checking whether the pacing timer
> was already armed?

I feared this (see recent commits about very low pacing rates) :/

I am not sure we need to properly fix all these points for old kernels,
since the EDT model got rid of these problems.

Maybe we can try to extend the timer. Something like:

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cc4ba42052c21b206850594db6751810d8fc72b4..626b9f4f500f7e5270d8d59e6eb16dbfa3efbc7c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -966,6 +966,8 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)
 
 static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
+	ktime_t expire, now;
 	u64 len_ns;
 	u32 rate;
 
@@ -977,12 +979,29 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 
 	len_ns = (u64)skb->len * NSEC_PER_SEC;
 	do_div(len_ns, rate);
-	hrtimer_start(&tcp_sk(sk)->pacing_timer,
-		      ktime_add_ns(ktime_get(), len_ns),
+
+	now = ktime_get();
+	/* If hrtimer is already armed, then our caller has not
+	 * used tcp_pacing_check().
+	 */
+	if (unlikely(hrtimer_is_queued(&tp->pacing_timer))) {
+		expire = hrtimer_get_softexpires(&tp->pacing_timer);
+		if (ktime_after(expire, now))
+			now = expire;
+		if (hrtimer_try_to_cancel(&tp->pacing_timer) == 1)
+			__sock_put(sk);
+	}
+	hrtimer_start(&tp->pacing_timer, ktime_add_ns(now, len_ns),
 		      HRTIMER_MODE_ABS_PINNED_SOFT);
 	sock_hold(sk);
 }
 
+static bool tcp_pacing_check(const struct sock *sk)
+{
+	return tcp_needs_internal_pacing(sk) &&
+	       hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
+}
+
 static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
 {
 	skb->skb_mstamp = tp->tcp_mstamp;
@@ -2117,6 +2136,9 @@ static int tcp_mtu_probe(struct sock *sk)
 	if (!tcp_can_coalesce_send_queue_head(sk, probe_size))
 		return -1;
 
+	if (tcp_pacing_check(sk))
+		return -1;
+
 	/* We're allowed to probe.  Build it now. */
 	nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC, false);
 	if (!nskb)
@@ -2190,11 +2212,6 @@ static int tcp_mtu_probe(struct sock *sk)
 	return -1;
 }
 
-static bool tcp_pacing_check(const struct sock *sk)
-{
-	return tcp_needs_internal_pacing(sk) &&
-	       hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
-}
-
 /* TCP Small Queues :
  * Control number of packets in qdisc/devices to two packets / or ~1 ms.

>
> neal
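
For readers following the thread, here is a minimal sketch of the
caller-side guard Neal is describing. It is not part of Eric's patch
above; the helper name tcp_transmit_if_not_paced() is hypothetical.
The idea is only that an RTO/TLP path would consult tcp_pacing_check()
before reaching __tcp_transmit_skb(), the same way tcp_write_xmit()
and tcp_xmit_retransmit_queue() already do:

/* Hypothetical sketch, not part of the patch above: bail out of a
 * transmit while the pacing hrtimer is still queued.  Without such a
 * check, tcp_internal_pacing() re-arms the queued timer and takes an
 * extra sock_hold() for a timer that will only fire (and __sock_put())
 * once, leaking the socket reference the patch subject refers to.
 */
static int tcp_transmit_if_not_paced(struct sock *sk, struct sk_buff *skb)
{
	if (tcp_pacing_check(sk))
		return -EAGAIN;	/* pacing timer armed: caller retries later */

	return tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
}

Eric's patch takes the other approach: it handles the already-armed
case inside tcp_internal_pacing() itself, so even unguarded callers
extend the existing timer rather than leaking a reference.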