From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B3EA3C43381 for ; Tue, 19 Feb 2019 15:20:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 63D7520818 for ; Tue, 19 Feb 2019 15:20:46 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=Mellanox.com header.i=@Mellanox.com header.b="rS6lsexf" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728201AbfBSPUp (ORCPT ); Tue, 19 Feb 2019 10:20:45 -0500 Received: from mail-eopbgr80089.outbound.protection.outlook.com ([40.107.8.89]:18400 "EHLO EUR04-VI1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726546AbfBSPUo (ORCPT ); Tue, 19 Feb 2019 10:20:44 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=KDCgmBRld/OiPaaemCi548IHvg3B8M962FaNTEReihE=; b=rS6lsexf0N1pXk7vuX7U5r8S+W66bszq4pgcIWEy/go74JRTt1tU0xXCj1uNSjoikm1Te0/sj27bHNRLsAFvYl/sUsUXoG3EZiDDo1MPMiN+LXz9etb4pENBVE5+YYrwLiX/bz9GtioHgU9rUZQJTgX5G4srL2evaqeO9itWaig= Received: from VI1PR0502MB3647.eurprd05.prod.outlook.com (52.134.7.141) by VI1PR0502MB4032.eurprd05.prod.outlook.com (52.134.18.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1622.16; Tue, 19 Feb 2019 15:20:37 +0000 Received: from VI1PR0502MB3647.eurprd05.prod.outlook.com ([fe80::d058:d17:78fc:969a]) by VI1PR0502MB3647.eurprd05.prod.outlook.com ([fe80::d058:d17:78fc:969a%6]) with mapi id 15.20.1622.018; Tue, 19 Feb 2019 15:20:37 +0000 From: Vlad Buslov To: Cong Wang CC: Ido Schimmel , "netdev@vger.kernel.org" , "jhs@mojatatu.com" , "jiri@resnulli.us" , "davem@davemloft.net" , "ast@kernel.org" , "daniel@iogearbox.net" Subject: Re: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex Thread-Topic: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex Thread-Index: AQHUwefD/Cm+p7M7pEKHPk7WVBdPDaXfoUsAgAEF54CABfdpAIAAqksA Date: Tue, 19 Feb 2019 15:20:37 +0000 Message-ID: References: <20190211085548.7190-1-vladbu@mellanox.com> <20190211085548.7190-8-vladbu@mellanox.com> <20190214182442.GA19269@splinter> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-clientproxiedby: LO2P265CA0144.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:9f::36) To VI1PR0502MB3647.eurprd05.prod.outlook.com (2603:10a6:803:f::13) authentication-results: spf=none (sender IP is ) smtp.mailfrom=vladbu@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [37.142.13.130] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 304ec733-ead1-4f3b-3a56-08d6967dcd11 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0;PCL:0;RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600110)(711020)(4605104)(4618075)(2017052603328)(7153060)(7193020);SRVR:VI1PR0502MB4032; x-ms-traffictypediagnostic: VI1PR0502MB4032: x-microsoft-exchange-diagnostics: =?iso-8859-1?Q?1;VI1PR0502MB4032;23:pUdDPUWhVrO03Uysez4nLFVBwvOBP5pI7tyU0?= =?iso-8859-1?Q?z153aLRWsoT/ExoVjlI6u7tpOT82MZqoz+LySUlxFrfgirbK2iWa02XQZF?= =?iso-8859-1?Q?Bqir6MFqqTi1iHXqDqHJz2DTNcAqkC0tJsDStcHoZED8+dXkMYqT04s2Cr?= =?iso-8859-1?Q?Qq8BNdYORnBV3PulQuRbdyOYhZU4PNNvA5OL/2vHvA9vcR4K/qYj0Lq5XD?= =?iso-8859-1?Q?DsTloy4wYZrVTYCWDK6nuHDTyxk7gJUWxanttvI5qCpI5dm1nzqrGXQaOq?= =?iso-8859-1?Q?3zhXDEj706DRmiYHk/wmt3NdGQ/yqqFllaZCzA7wJIJtfRwLuW1QHKTkp5?= =?iso-8859-1?Q?pjLUW+x0WSDbZZP1gTN2BF+fVCG+I63fq6Ayik2OaTkTKHbf/QEuFG73t8?= =?iso-8859-1?Q?31TIfYgH3XvuyVtM5yKdB8njQ//4kpCHEikRxYWSNg5pJHANo6cko49sFt?= =?iso-8859-1?Q?uOqjEuiXFvgQ5GFVawWjofJmI/LlPN8vo2jdDSwc+jUcT23gWU8kHO32Ok?= =?iso-8859-1?Q?N3e12xovfCjYWl+fA5vKF8RB6jMy7ozHJjDGhkTF9zYDgbiUv3YW9kHI3f?= =?iso-8859-1?Q?LiEv0cUex3HMhM/kJxHdejoTtGCa5RoW4Hg76NO9PuT6cF5CLsx92PAe+c?= =?iso-8859-1?Q?K8H9vJw7QmZMTEiC1lbU3O3/ZR1PZlqbdLKJE/1Q+LDuC8nMp7UkTjr7oh?= =?iso-8859-1?Q?wWl+og3AI6dnqTmvrsW3ESQCUEkrG/8zGOK6xGTI4YBL9aWJEocmLUm42H?= =?iso-8859-1?Q?e8rRtNR8rn+luBjBpWHygyt793l7zqi0gLWOFCB31ZaMNErzGDvwBNzp3R?= =?iso-8859-1?Q?SH4pBtdL4kCTJuwzSgJT7IJ1/8WpMMJO1VGeaH11jujS4plkb7qUYOnjq8?= =?iso-8859-1?Q?5sTJuGnvfJo3oF3nZQTWbUjZJda8B9AinFrrsIyZu1942noi2FnthWqdTI?= =?iso-8859-1?Q?HDBGBWJJrw2n+WfAu+SwGj6HxFQVi4EARToTUXmW8PgslXfJR8jDuvkCNq?= =?iso-8859-1?Q?cErPLWu25s9bDeW0xGB5J6TU1DOQtRWJlhghe8Co+F6IekN0Mb4TabQ6lt?= =?iso-8859-1?Q?QeutfK++1tIBwJEpwOq17DjQhoZDy9rFdfhbStgAjwx7neJnFfcI1Lp/xI?= =?iso-8859-1?Q?uVp1fl4/rVyHUXEPRxwXCYae82cJQlV7CxtXreLWvr4FfSyvUpzcnrDV32?= =?iso-8859-1?Q?EtgDuNQshVCgko5mAToz3LXFZ5boyvNMTEyRm+yPpgzfTpFG7wv0IxCd0B?= =?iso-8859-1?Q?qEEn3qWZZrCQEDbs8Kbq21jsR8PoLAYAkH9OIJuEaNNw20O+9YAPbzyImh?= =?iso-8859-1?Q?mtoosOQGNpHhS1S0QGXEmLZvn?= x-microsoft-antispam-prvs: x-forefront-prvs: 09538D3531 x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(396003)(136003)(346002)(376002)(366004)(39860400002)(199004)(189003)(66066001)(81166006)(8676002)(316002)(7736002)(36756003)(6246003)(54906003)(476003)(2616005)(486006)(11346002)(446003)(305945005)(8936002)(6512007)(52116002)(14454004)(93886005)(4326008)(5660300002)(99286004)(81156014)(478600001)(68736007)(71190400001)(2906002)(71200400001)(53936002)(76176011)(6916009)(97736004)(105586002)(6436002)(106356001)(3846002)(6116002)(86362001)(26005)(186003)(14444005)(25786009)(256004)(229853002)(53546011)(6506007)(386003)(6486002)(102836004);DIR:OUT;SFP:1101;SCL:1;SRVR:VI1PR0502MB4032;H:VI1PR0502MB3647.eurprd05.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;A:1;MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: /HiQl1aZs9q4AjRmAgiRTFS3x4+pBQC7UED9ERSVfmArn7o8wjxL0mTqCyVF+rInOtdoZIcZxRrF1hOQcCn7CFbfr1B76Ijme7CXHEpGC0zhrtVQngbg2zdRNPZRDCTT1VQW4z4nxNp+qYa5DeGgdRW+5ztSLPg3OsTBw8LoaBWPK8a2w5vDV2R/TfYY4yxgizfvGXLdaleOHOkICcsfdd918rkR7jaebk+LWAObiq2TLtmPVslNgkLeYaDPL8+OlF3zpwJLYzx+QS5kBz3uWN9Xm66tOf4H1+n0NMtUOWJWYe4jvH6GM+to3+zFyj/qaWa9Tm2qt6tz7RVK0GbFFux0ITKOdyiEnPB19r2bk6cgy5M9a041Jpw2OKHIoNjKVniTtfTi0LAOcpUDE+b/eNgdluUp7oy4Ia2beTSkLCc= Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 304ec733-ead1-4f3b-3a56-08d6967dcd11 X-MS-Exchange-CrossTenant-originalarrivaltime: 19 Feb 2019 15:20:36.2691 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR0502MB4032 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Tue 19 Feb 2019 at 05:08, Cong Wang wrote: > On Fri, Feb 15, 2019 at 2:02 AM Vlad Buslov wrote: >> >> I looked at the code and problem seems to be matchall classifier >> specific. My implementation of unlocked cls API assumes that concurrent >> insertions are possible and checks for it when deleting "empty" tp. >> Since classifiers don't expose number of elements, the only way to test >> this is to do tp->walk() on them and assume that walk callback is called >> once per filter on every classifier. In your example new tp is created >> for second filter, filter insertion fails, number of elements on newly >> created tp is checked with tp->walk() before deleting it. However, >> matchall classifier always calls the tp->walk() callback once, even when >> it doesn't have a valid filter (in this case with NULL filter pointer). > > Again, this can be eliminated by just switching to normal > non-retry logic. This is yet another headache to review this > kind of unlock-and-retry logic, I have no idea why you are such > a big fan of it. The retry approach was suggested to me multiple times by Jiri on previous code reviews so I assumed it is preferred approach in such cases. I don't have a strong preference in this regard, but locking whole tp on filter update will remove any parallelism when updating same classifier instance concurrently. The goal of these changes is to allow parallel rule update and to achieve that I had to introduce some complexity into the code. Now let me explain why these two approaches result completely different performance in this case. Lets start with a list of most CPU-consuming parts in new filter creation process in descending order (raw data at the end of this mail): 1) Hardware offload - if available and no skip_hw. 2) Exts (actions) initalization - most expensive part even with single action, CPU usage increases with number of actions per filter. 3) cls API. 4) Flower classifier data structure initialization. Note that 1)+2) is ~80% of cost of creating a flower filter. So if we just lock the whole flower classifier instance during rule update we serialize 1, 2 and 4, and only cls API (~13% of CPU cost) can be executed concurrently. However, in proposed flower implementation hw offloading and action initialization code is called without any locks and tp->lock is only obtained when modifying flower data structures, which means that only 3) is serialized and everything else (87% of CPU cost) can be executed in parallel. First page of profiling data: Samples: 100K of event 'cycles:ppp', Event count (approx.): 11191878316 Children Self Command Shared Object Symbol + 84.71% 0.08% tc [kernel.vmlinux] [k] entry_SYSCALL_64_aft= er_hwframe + 84.62% 0.06% tc [kernel.vmlinux] [k] do_syscall_64 + 82.63% 0.01% tc libc-2.25.so [.] __libc_sendmsg + 82.37% 0.00% tc [kernel.vmlinux] [k] __sys_sendmsg + 82.37% 0.00% tc [kernel.vmlinux] [k] ___sys_sendmsg + 82.34% 0.00% tc [kernel.vmlinux] [k] sock_sendmsg + 82.34% 0.01% tc [kernel.vmlinux] [k] netlink_sendmsg + 82.15% 0.15% tc [kernel.vmlinux] [k] netlink_unicast + 82.10% 0.11% tc [kernel.vmlinux] [k] netlink_rcv_skb + 80.76% 0.22% tc [kernel.vmlinux] [k] rtnetlink_rcv_msg + 80.10% 0.24% tc [kernel.vmlinux] [k] tc_new_tfilter + 69.30% 2.11% tc [cls_flower] [k] fl_change + 33.56% 0.05% tc [kernel.vmlinux] [k] tcf_exts_validate + 33.50% 0.12% tc [kernel.vmlinux] [k] tcf_action_init + 33.30% 0.10% tc [kernel.vmlinux] [k] tcf_action_init_1 + 32.78% 0.11% tc [act_gact] [k] tcf_gact_init + 30.93% 0.16% tc [kernel.vmlinux] [k] tc_setup_cb_call + 29.96% 0.60% tc [mlx5_core] [k] mlx5e_configure_flow= er + 27.62% 0.23% tc [mlx5_core] [k] mlx5e_tc_add_nic_flo= w + 27.31% 0.45% tc [kernel.vmlinux] [k] tcf_idr_create + 25.45% 1.75% tc [kernel.vmlinux] [k] pcpu_alloc + 16.33% 0.07% tc [mlx5_core] [k] mlx5_cmd_exec + 16.26% 1.96% tc [mlx5_core] [k] cmd_exec + 14.28% 1.05% tc [mlx5_core] [k] mlx5_add_flow_rules + 14.02% 0.26% tc [kernel.vmlinux] [k] pcpu_alloc_area + 13.09% 0.13% tc [mlx5_core] [k] mlx5_fc_create + 9.77% 0.30% tc [mlx5_core] [k] add_rule_fg.isra.28 + 9.08% 0.84% tc [mlx5_core] [k] mlx5_cmd_set_fte + 8.90% 0.09% tc [mlx5_core] [k] mlx5_cmd_fc_alloc + 7.90% 0.12% tc [kernel.vmlinux] [k] tfilter_notify + 7.34% 0.61% tc [kernel.vmlinux] [k] __queue_work + 7.25% 0.26% tc [kernel.vmlinux] [k] tcf_fill_node + 6.73% 0.23% tc [kernel.vmlinux] [k] wait_for_completion_= timeout + 6.67% 0.20% tc [cls_flower] [k] fl_dump + 6.52% 5.93% tc [kernel.vmlinux] [k] memset_erms + 5.77% 0.49% tc [kernel.vmlinux] [k] schedule_timeout + 5.57% 1.29% tc [kernel.vmlinux] [k] try_to_wake_up + 5.50% 0.11% tc [kernel.vmlinux] [k] pcpu_block_update_hi= nt_alloc + 5.40% 0.85% tc [kernel.vmlinux] [k] pcpu_block_refresh_h= int + 5.28% 0.11% tc [kernel.vmlinux] [k] queue_work_on + 5.19% 4.96% tc [kernel.vmlinux] [k] find_next_bit + 4.77% 0.11% tc [kernel.vmlinux] [k] idr_alloc_u32 + 4.71% 0.10% tc [kernel.vmlinux] [k] schedule + 4.62% 0.30% tc [kernel.vmlinux] [k] __sched_text_start + 4.48% 4.41% tc [kernel.vmlinux] [k] idr_get_free + 4.19% 0.04% tc [kernel.vmlinux] [k] tcf_idr_check_alloc