From: Dave Taht
Date: Sat, 15 Jan 2022 08:46:52 -0800
Subject: Re: [PATCH bpf-next] tcp: bpf: Add TCP_BPF_RCV_SSTHRESH for bpf_setsockopt
To: Ivan Babrou
Cc: bpf, Linux Kernel Network Developers, LKML, kernel-team,
    Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
    Martin KaFai Lau, Eric Dumazet
References: <20220111192952.49040-1-ivan@cloudflare.com>
X-Mailing-List: bpf@vger.kernel.org

On Fri, Jan 14, 2022 at 2:21 PM Ivan Babrou wrote:
>
> On Thu, Jan 13, 2022 at 9:44 PM Dave Taht wrote:
> > Yes, but with the caveats below. I'm fine with you just saying round trips,
> > and making this api possible.
> >
> > It would comfort me further if you could provide an actual scenario.
>
> The actual scenario is getting a response as quickly as possible on a
> fresh connection across long distances (200ms+ RTT). If an RPC
> response doesn't fit into the initial 64k of rcv_ssthresh, we end up
> requiring more roundtrips to receive the response. Some customers are
> very picky about the latency they measure and cutting the extra
> roundtrips made a very visible difference in the tests.
>
> > See also:
> >
> > https://datatracker.ietf.org/doc/html/rfc6928
> >
> > which predates packet pacing (are you using sch_fq?)
>
> We are using fq and bbr.
>
> > > Congestion window is a learned property, not a static number. You
> > > won't get a large initcwnd towards a poor connection.
> >
> > initcwnd is set globally or on a per route basis.

Like I said, retaining state from an existing connection as to the
window is ok. I think arbitrarily declaring a window like this for a
new connection is not.

> With TCP_BPF_IW the world is your oyster.

The oyster has to co-habit in this ocean with all the other life
there, and I would be comforted if your customer also tracked various
other TCP_INFO statistics, like RTT growth, loss, marks, and
retransmits, and was aware of not just the self harm inflicted but of
collateral damage. In fact, I really wish more people were
instrumenting everything with it; of late we've seen a lot of need for
TCP_NOTSENT_LOWAT in things like Apache Traffic Server in containers.
A simple one-line patch for a widely used app I can't talk about did
wonders for actual perceived throughput and responsiveness by the end
user.

Measuring from the receiver is far, far more important than measuring
from the sender. Collecting long-term statistics over many
connections, from the real world, matters too. I hope y'all have been
instrumenting your work as well as Google has, on these fronts.

I know that I'm getting old and crunchy and scarred by seeing so many
(corporate wifi mostly) networks over the last decade essentially in
congestion collapse!

https://blog.apnic.net/2020/01/22/bufferbloat-may-be-solved-but-its-not-over-yet/

I'm very happy with how well sch_fq + packet pacing works to mitigate
impulses like this, as well as with so many other things like BBR and
BQL, but pacing out != pacing in, and despite my fervent wish for more
FQ+AQM techniques on more bottleneck links also, we're not there yet.

I like very much that BPF is allowing rapid innovation, but with great
power comes great responsibility.
-- 
I tried to build a better future, a few times:
https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org

Dave Täht CEO, TekLibre, LLC