From mboxrd@z Thu Jan  1 00:00:00 1970
From: Eric Dumazet
Date: Wed, 3 Jun 2020 06:54:56 -0700
Subject: Re: [PATCH] tcp: fix TCP socks unreleased in BBR mode
To: Neal Cardwell
Cc: Jason Xing, David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
 netdev, LKML, liweishi@kuaishou.com, Shujin Li
References: <20200602080425.93712-1-kerneljasonxing@gmail.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org
On Wed, Jun 3, 2020 at 5:02 AM Neal Cardwell wrote:
>
> On Wed, Jun 3, 2020 at 1:44 AM Eric Dumazet wrote:
> >
> > On Tue, Jun 2, 2020 at 10:05 PM Jason Xing wrote:
> > >
> > > Hi Eric,
> > >
> > > I'm still trying to understand what you said before. Would this
> > > be better as follows:
> > > 1) discard the tcp_internal_pacing() function.
> > > 2) remove where tcp_internal_pacing() is called in the
> > > __tcp_transmit_skb() function.
> > >
> > > If we do so, we could avoid 'too late to give up pacing'. Meanwhile,
> > > should we introduce the tcp_wstamp_ns socket field as commit
> > > 864e5c090749 does?
> > >
> >
> > Please do not top-post on the netdev mailing list.
> >
> > I basically suggested double-checking which point in TCP could end up
> > calling tcp_internal_pacing() while the timer was already armed.
> >
> > I guess this is MTU probing.
>
> Perhaps this could also happen from some of the retransmission code
> paths that don't use tcp_xmit_retransmit_queue()? Perhaps
> tcp_retransmit_timer() (RTO) and tcp_send_loss_probe() (TLP)? It seems
> they could indirectly cause a call to __tcp_transmit_skb() and thus
> tcp_internal_pacing() without first checking whether the pacing timer
> was already armed?

I feared this (see recent commits about very low pacing rates) :/

I am not sure we need to properly fix all these points for old kernels,
since the EDT model got rid of these problems.

Maybe we can try to extend the timer. Something like:

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cc4ba42052c21b206850594db6751810d8fc72b4..626b9f4f500f7e5270d8d59e6eb16dbfa3efbc7c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -966,6 +966,8 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)
 
 static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
+	ktime_t expire, now;
 	u64 len_ns;
 	u32 rate;
 
@@ -977,12 +979,29 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 
 	len_ns = (u64)skb->len * NSEC_PER_SEC;
 	do_div(len_ns, rate);
-	hrtimer_start(&tcp_sk(sk)->pacing_timer,
-		      ktime_add_ns(ktime_get(), len_ns),
+
+	now = ktime_get();
+	/* If hrtimer is already armed, then our caller has not
+	 * used tcp_pacing_check().
+	 */
+	if (unlikely(hrtimer_is_queued(&tp->pacing_timer))) {
+		expire = hrtimer_get_softexpires(&tp->pacing_timer);
+		if (ktime_after(expire, now))
+			now = expire;
+		if (hrtimer_try_to_cancel(&tp->pacing_timer) == 1)
+			__sock_put(sk);
+	}
+	hrtimer_start(&tp->pacing_timer, ktime_add_ns(now, len_ns),
 		      HRTIMER_MODE_ABS_PINNED_SOFT);
 	sock_hold(sk);
 }
 
+static bool tcp_pacing_check(const struct sock *sk)
+{
+	return tcp_needs_internal_pacing(sk) &&
+	       hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
+}
+
 static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
 {
 	skb->skb_mstamp = tp->tcp_mstamp;
@@ -2117,6 +2136,9 @@ static int tcp_mtu_probe(struct sock *sk)
 	if (!tcp_can_coalesce_send_queue_head(sk, probe_size))
 		return -1;
 
+	if (tcp_pacing_check(sk))
+		return -1;
+
 	/* We're allowed to probe.  Build it now. */
 	nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC, false);
 	if (!nskb)
@@ -2190,11 +2212,6 @@ static int tcp_mtu_probe(struct sock *sk)
 	return -1;
 }
 
-static bool tcp_pacing_check(const struct sock *sk)
-{
-	return tcp_needs_internal_pacing(sk) &&
-	       hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
-}
-
 /* TCP Small Queues :
  * Control number of packets in qdisc/devices to two packets / or ~1 ms.

>
> neal
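
For readers following the thread, here is a minimal sketch of the
caller-side guard Neal is describing. It is not part of Eric's patch
above; the helper name tcp_transmit_if_not_paced() is hypothetical.
The idea is only that an RTO/TLP path would consult tcp_pacing_check()
before reaching __tcp_transmit_skb(), the same way tcp_write_xmit()
and tcp_xmit_retransmit_queue() already do:

/* Hypothetical sketch, not part of the patch above: bail out of a
 * transmit while the pacing hrtimer is still queued.  Without such a
 * check, tcp_internal_pacing() re-arms the queued timer and takes an
 * extra sock_hold() for a timer that will only fire (and __sock_put())
 * once, leaking the socket reference the patch subject refers to.
 */
static int tcp_transmit_if_not_paced(struct sock *sk, struct sk_buff *skb)
{
	if (tcp_pacing_check(sk))
		return -EAGAIN;	/* pacing timer armed: caller retries later */

	return tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
}

Eric's patch takes the other approach: it handles the already-armed
case inside tcp_internal_pacing() itself, so even unguarded callers
extend the existing timer rather than leaking a reference.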