Message-ID: <64829c98-e4eb-6725-0fee-dc3c6681506f@bluematt.me>
Date: Wed, 28 Apr 2021 12:35:28 -0400
Subject: Re: [PATCH net-next] Reduce IP_FRAG_TIME fragment-reassembly timeout to 1s, from 30s
From: Matt Corallo
To: Eric Dumazet
Cc: Willy Tarreau, "David S. Miller", netdev, Alexey Kuznetsov, Hideaki YOSHIFUJI, Keyu Man
References: <0cb19f7e-a9b3-58f8-6119-0736010f1326@bluematt.me> <20210428141319.GA7645@1wt.eu> <055d0512-216c-9661-9dd4-007c46049265@bluematt.me>
X-Mailing-List: netdev@vger.kernel.org

On 4/28/21 11:38, Eric Dumazet wrote:
> On Wed, Apr 28, 2021 at 4:28 PM Matt Corallo wrote:
> I have been working in wifi environments (linux conferences) where RTT
> could reach 20 sec, and even 30 seconds, and this was in some very
> rich cities in the USA.
>
> Obviously, when a network is under provisioned by 50x factor, you
> _need_ more time to complete fragments.

It's also a trade-off: if you're in a hugely under-provisioned environment
with bufferbloat issues, you may have some fragments that need more time for
reassembly if they've gotten horribly reordered (though just having a
20-second RTT doesn't imply that fragments are going to be reordered by 20
seconds; more likely you'd see a small fraction of that), but you're also
likely to have more *lost* fragments, which can trigger the black-holing
behavior here. If you have some loss in the flow, it's very easy to hit 1Mbps
of lost fragments, and suddenly, instead of just giving fragments more time to
reassemble, you're black-holing them.

I'm not claiming I have the right trade-off here, and I'd love more input, but
at least in my experience trying to just occasionally send fragments on a
pretty standard DOCSIS modem, 30s is way off.

> For some reason, the crazy IP reassembly stuff comes every couple of
> years, and it is now a FAQ.
>
> The Internet has changed for the lucky ones, but some deployments are
> using 4Mbps satellite connectivity, shared by hundreds of people.

I'd think this is a great example of a case where you precisely *don't* want
such a low threshold for dropping all fragments. Note that in my specific
deployment (standard DOCSIS), we're talking about the same speed and network
as was available ten years ago; this isn't exactly an uncommon or particularly
fancy deployment. The real issue is applications which happily send 8MB of
fragments within a few seconds and suddenly find themselves completely
black-holed for 30 seconds, but this isn't going to just go away.

> I urge application designers to _not_ rely on doomed frags, even in
> controlled networks.

I'd love to, but we're talking about a default value for fragment reassembly.
At least in my subjective experience here, the conservative 30s timeout takes
things from "more time" to "completely blackhole", which feels like the wrong
tradeoff. At the end of the day, you indeed can't expect fragments to work
super well, and you assume some amount of loss; the goal is to minimize the
loss you see from them. Even if you have some reordering, you're unlikely to
see every fragment reordered (I guess you could imagine a horribly broken
qdisc; does such a thing exist in practice?) such that you always need 30s to
reassemble. Taking some loss to avoid making it so easy to completely
blackhole fragments seems like a reasonable tradeoff.

Matt
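
A back-of-the-envelope sketch of the arithmetic behind the "1Mbps of lost
fragments" figure above. It assumes (this is not stated in the thread) that
the reassembly memory limit is the Linux default for
net.ipv4.ipfrag_high_thresh, which to my knowledge is 4 MiB, and that every
incomplete datagram holds its memory for the full timeout. The function name
is made up for illustration, and real kernel accounting includes per-skb
overhead, so the actual rate needed is likely lower.

```python
def blackhole_loss_rate_bps(high_thresh_bytes: int, frag_timeout_s: float) -> float:
    """Sustained rate (bits/s) of never-completed fragment data that keeps
    the reassembly queue full, under the simplified model described above."""
    return high_thresh_bytes * 8 / frag_timeout_s


# Assumed default for net.ipv4.ipfrag_high_thresh (4 MiB); check your system.
HIGH_THRESH = 4 * 1024 * 1024

for timeout in (30, 1):
    rate = blackhole_loss_rate_bps(HIGH_THRESH, timeout)
    print(f"timeout={timeout}s -> {rate / 1e6:.2f} Mbit/s")
# timeout=30s -> 1.12 Mbit/s
# timeout=1s -> 33.55 Mbit/s
```

Under these assumptions, roughly 1.1 Mbit/s of orphaned fragments is enough to
keep the queue full with a 30s timeout, while a 1s timeout raises that to
about 33.5 Mbit/s.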