From: Neal Cardwell
Date: Wed, 3 Jun 2020 10:07:55 -0400
Subject: Re: [PATCH] tcp: fix TCP socks unreleased in BBR mode
To: Eric Dumazet
Cc: Jason Xing, David Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
 netdev, LKML, liweishi@kuaishou.com, Shujin Li
References: <20200602080425.93712-1-kerneljasonxing@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 3, 2020 at 9:55 AM Eric Dumazet wrote:
>
> On Wed, Jun 3, 2020 at 5:02 AM Neal Cardwell wrote:
> >
> > On Wed, Jun 3, 2020 at 1:44 AM Eric Dumazet wrote:
> > >
> > > On Tue, Jun 2, 2020 at 10:05 PM Jason Xing wrote:
> > > >
> > > > Hi Eric,
> > > >
> > > > I'm still trying to understand what you're saying before. Would this
> > > > be better as follows:
> > > > 1) discard the tcp_internal_pacing() function.
> > > > 2) remove where the tcp_internal_pacing() is called in the
> > > > __tcp_transmit_skb() function.
> > > >
> > > > If we do so, we could avoid 'too late to give up pacing'. Meanwhile,
> > > > should we introduce the tcp_wstamp_ns socket field as commit
> > > > (864e5c090749) does?
> > > >
> > >
> > > Please do not top-post on the netdev mailing list.
> > >
> > > I basically suggested double-checking which point in TCP could end up
> > > calling tcp_internal_pacing() while the timer was already armed.
> > >
> > > I guess this is mtu probing.
> >
> > Perhaps this could also happen from some of the retransmission code
> > paths that don't use tcp_xmit_retransmit_queue()? Perhaps
> > tcp_retransmit_timer() (RTO) and tcp_send_loss_probe() (TLP)? It seems
> > they could indirectly cause a call to __tcp_transmit_skb() and thus
> > tcp_internal_pacing() without first checking whether the pacing timer
> > was already armed.
>
> I feared this (see recent commits about very low pacing rates) :/
>
> I am not sure we need to properly fix all these points for old
> kernels, since the EDT model got rid of these problems.

Agreed.

> Maybe we can try to extend the timer.

Sounds good.
> Something like :
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index cc4ba42052c21b206850594db6751810d8fc72b4..626b9f4f500f7e5270d8d59e6eb16dbfa3efbc7c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -966,6 +966,8 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)
>
>  static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
>  {
> +	struct tcp_sock *tp = tcp_sk(sk);
> +	ktime_t expire, now;
>  	u64 len_ns;
>  	u32 rate;
>
> @@ -977,12 +979,29 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
>
>  	len_ns = (u64)skb->len * NSEC_PER_SEC;
>  	do_div(len_ns, rate);
> -	hrtimer_start(&tcp_sk(sk)->pacing_timer,
> -		      ktime_add_ns(ktime_get(), len_ns),
> +
> +	now = ktime_get();
> +	/* If hrtimer is already armed, then our caller has not
> +	 * used tcp_pacing_check().
> +	 */
> +	if (unlikely(hrtimer_is_queued(&tp->pacing_timer))) {
> +		expire = hrtimer_get_softexpires(&tp->pacing_timer);
> +		if (ktime_after(expire, now))
> +			now = expire;
> +		if (hrtimer_try_to_cancel(&tp->pacing_timer) == 1)
> +			__sock_put(sk);
> +	}
> +	hrtimer_start(&tp->pacing_timer, ktime_add_ns(now, len_ns),
>  		      HRTIMER_MODE_ABS_PINNED_SOFT);
>  	sock_hold(sk);
>  }
>
> +static bool tcp_pacing_check(const struct sock *sk)
> +{
> +	return tcp_needs_internal_pacing(sk) &&
> +	       hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
> +}
> +
>  static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
>  {
>  	skb->skb_mstamp = tp->tcp_mstamp;
> @@ -2117,6 +2136,9 @@ static int tcp_mtu_probe(struct sock *sk)
>  	if (!tcp_can_coalesce_send_queue_head(sk, probe_size))
>  		return -1;
>
> +	if (tcp_pacing_check(sk))
> +		return -1;
> +
>  	/* We're allowed to probe.  Build it now. */
>  	nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC, false);
>  	if (!nskb)
> @@ -2190,11 +2212,6 @@ static int tcp_mtu_probe(struct sock *sk)
>  		return -1;
>  }
>
> -static bool tcp_pacing_check(const struct sock *sk)
> -{
> -	return tcp_needs_internal_pacing(sk) &&
> -	       hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
> -}
>
>  /* TCP Small Queues :
>   * Control number of packets in qdisc/devices to two packets / or ~1 ms.

Thanks for your fix, Eric. This fix looks good to me! I agree that this
fix is good enough for older kernels.

thanks,
neal