From: Václav Zindulka
Date: Tue, 19 May 2020 10:04:18 +0200
Subject: Re: iproute2: tc deletion freezes whole server
To: Cong Wang
Cc: Linux Kernel Network Developers
X-Mailing-List: netdev@vger.kernel.org

On Mon, May 18, 2020 at 8:22 PM Cong Wang wrote:
>
> On Mon, May 18, 2020 at 7:16 AM Václav Zindulka wrote:
> >
> > On Sun, May 17, 2020 at 9:35 PM Cong Wang wrote:
> > >
> > > On Fri, May 8, 2020 at 6:59 AM Václav Zindulka wrote:
> > > >
> > > > > > > I tried to emulate your test case in my VM, here is the script I use:
> > > > > > >
> > > > > > > ====
> > > > > > > ip li set dev dummy0 up
> > > > > > > tc qd add
> > > > > > > dev dummy0 root handle 1: htb default 1
> > > > > > > for i in `seq 1 1000`
> > > > > > > do
> > > > > > > tc class add dev dummy0 parent 1:0 classid 1:$i htb rate 1mbit ceil 1.5mbit
> > > > > > > tc qd add dev dummy0 parent 1:$i fq_codel
> > > > > > > done
> > > > > > >
> > > > > > > time tc qd del dev dummy0 root
> > > > > > > ====
> > > > > > >
> > > > > > > And this is the result:
> > > > > > >
> > > > > > > Before my patch:
> > > > > > > real 0m0.488s
> > > > > > > user 0m0.000s
> > > > > > > sys 0m0.325s
> > > > > > >
> > > > > > > After my patch:
> > > > > > > real 0m0.180s
> > > > > > > user 0m0.000s
> > > > > > > sys 0m0.132s
> > > > > >
> > > > > > My results with your test script.
> > > > > >
> > > > > > before patch:
> > > > > > /usr/bin/time -p tc qdisc del dev enp1s0f0 root
> > > > > > real 1.63
> > > > > > user 0.00
> > > > > > sys 1.63
> > > > > >
> > > > > > after patch:
> > > > > > /usr/bin/time -p tc qdisc del dev enp1s0f0 root
> > > > > > real 1.55
> > > > > > user 0.00
> > > > > > sys 1.54
> > > > > >
> > > > > > > This is an obvious improvement, so I have no idea why you didn't
> > > > > > > catch any difference.
> > > > > >
> > > > > > We use hfsc instead of htb. I don't know whether it may cause any
> > > > > > difference. I can provide you with my test scripts if necessary.
> > > > >
> > > > > Yeah, you can try to replace the htb with hfsc in my script,
> > > > > I didn't spend time to figure out hfsc parameters.
> > > >
> > > > class add dev dummy0 parent 1:0 classid 1:$i hfsc ls m1 0 d 0 m2
> > > > 13107200 ul m1 0 d 0 m2 13107200
> > > >
> > > > but it behaves the same as htb...
> > > >
> > > > > My point here is, if I can see the difference with merely 1000
> > > > > tc classes, you should see a bigger difference with hundreds
> > > > > of thousands of classes in your setup. So, I don't know why you
> > > > > saw a relatively smaller difference.
> > > >
> > > > I saw a relatively big difference.
> > > > It was about 1.5s faster on my huge
> > > > setup which is a lot. Yet maybe the problem is caused by something
> > >
> > > What percentage? IIUC, without patch it took you about 11s, so
> > > 1.5s faster means 13% improvement for you?
> >
> > My whole setup needs 22.17 seconds to delete with an unpatched kernel.
> > With your patches applied it is 21.08. So it varies between 1 - 1.5s.
> > Improvement is about 5 - 6%.
>
> Good to know.
>
> > > > else? I thought about tx/rx queues. RJ45 ports have up to 4 tx and rx
> > > > queues. SFP+ interfaces have much higher limits. 8 or even 64 possible
> > > > queues. I've tried to increase the number of queues using ethtool from
> > > > 4 to 8 and decreased to 2. But there was no difference. It was about
> > > > 1.62 - 1.63 with an unpatched kernel and about 1.55 - 1.58 with your
> > > > patches applied. I've tried it for ifb and RJ45 interfaces where it
> > > > took about 0.02 - 0.03 with an unpatched kernel and 0.05 with your
> > > > patches applied, which is strange, but it may be caused by the fact it
> > > > was very fast even before.
> > >
> > > That is odd. In fact, this is highly related to number of TX queues,
> > > because the existing code resets the qdisc's once for each TX
> > > queue, so the more TX queues you have, the more resets kernel
> > > will do, that is the more time it will take.
> >
> > Can't the problem be caused by the reset being done on active and inactive
> > queues every time? It would explain why it had no effect in decreasing
> > and increasing the number of active queues. Yet it doesn't explain why an
> > Intel card (82599ES) with 64 possible queues has exactly the same
> > problem as a Mellanox (ConnectX-4 LX) with 8 possible queues.
>
> Regardless of these queues, the qdisc's should be only reset once,
> because all of these queues point to the same instance of root
> qdisc in your case.
>
> [...]
> > With the attached patch I'm down to 1.7 seconds - more than 90%
> > improvement :-) Can you please check it and pass it to proper places?
> > According to debugging printk messages it empties only active queues.
>
> You can't change netdev_for_each_tx_queue(), it would certainly at least
> break netif_alloc_netdev_queues().

You are right, I didn't check this. But that is the only occurrence of
netdev_for_each_tx_queue() outside sch_generic.c. I've duplicated
netdev_for_each_tx_queue() as netdev_for_each_active_tx_queue(), which uses
real_num_tx_queues in its for loop, and altered sch_generic.c to replace
all the occurrences with the new helper. From my point of view it is easier
to fix the unnecessary excessive calls for unallocated queues this way. Yet
I totally understand your point of view, which would mean a cleaner
approach. I'm more than happy with that small fix. I'll help you test all
the patches necessary, yet our production will run on that dirty fix. You
know, my first kernel patch. :-D If you want I'll send it to you.

> Let me think how to fix this properly, I have some ideas and will provide
> you some patch(es) to test soon.

Sure, I'll wait. I have plenty of time now with the main problem fixed :-)
Thank you.