From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97974C43381 for ; Tue, 19 Feb 2019 16:05:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 503E42147A for ; Tue, 19 Feb 2019 16:05:02 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=Mellanox.com header.i=@Mellanox.com header.b="D9ePq6Xe" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727048AbfBSQFB (ORCPT ); Tue, 19 Feb 2019 11:05:01 -0500 Received: from mail-eopbgr70058.outbound.protection.outlook.com ([40.107.7.58]:46209 "EHLO EUR04-HE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1725885AbfBSQFA (ORCPT ); Tue, 19 Feb 2019 11:05:00 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=W6DHP2YUYQYGzBHdNUUMMXjKcIGGVDBtaONo4FRrNdw=; b=D9ePq6Xe6ALmkH73MNfBM86pNjBtDES8LZy86lqUTiZermel6YZa6xu7W51pyX5HB3w6UdHfmwhvhSqzdUKng8+JdHguTn5nMX39SmOTJUidPAmsXaGhN91lPSupHehoBZK/JIkMtgpwfO5gmI1ow6Zjj7euv/FzVPnFBWdESeY= Received: from VI1PR0502MB3647.eurprd05.prod.outlook.com (52.134.7.141) by VI1PR0502MB2894.eurprd05.prod.outlook.com (10.175.24.140) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1622.16; Tue, 19 Feb 2019 16:04:54 +0000 Received: from VI1PR0502MB3647.eurprd05.prod.outlook.com ([fe80::d058:d17:78fc:969a]) by VI1PR0502MB3647.eurprd05.prod.outlook.com ([fe80::d058:d17:78fc:969a%6]) with mapi id 15.20.1622.018; Tue, 19 Feb 2019 16:04:54 +0000 From: Vlad Buslov To: Cong Wang CC: Linux Kernel Network Developers , Jamal Hadi Salim , Jiri Pirko , David Miller , Alexei Starovoitov , Daniel Borkmann Subject: Re: [PATCH net-next v4 05/17] net: sched: traverse chains in block with tcf_get_next_chain() Thread-Topic: [PATCH net-next v4 05/17] net: sched: traverse chains in block with tcf_get_next_chain() Thread-Index: AQHUwefCnpHBoUvaMEWLwFR2bAA8fKXhdauAgAPp5QCAAItvgIABauAA Date: Tue, 19 Feb 2019 16:04:54 +0000 Message-ID: References: <20190211085548.7190-1-vladbu@mellanox.com> <20190211085548.7190-6-vladbu@mellanox.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-clientproxiedby: LO2P265CA0457.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:a2::13) To VI1PR0502MB3647.eurprd05.prod.outlook.com (2603:10a6:803:f::13) authentication-results: spf=none (sender IP is ) smtp.mailfrom=vladbu@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [37.142.13.130] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: a3222ff6-9b52-47ac-c2fe-08d69683fcb0 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0;PCL:0;RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600110)(711020)(4605104)(4618075)(2017052603328)(7153060)(7193020);SRVR:VI1PR0502MB2894; x-ms-traffictypediagnostic: VI1PR0502MB2894: x-microsoft-exchange-diagnostics: =?iso-8859-1?Q?1;VI1PR0502MB2894;23:h3rqBfWmYKvMCHD6GtJXVKEPL3c4pCAbiPVeh?= =?iso-8859-1?Q?qxaMqQEpCnKKOeHb0m7eeDdhp+R9LDTbfqY/1Y+u0DPhQAA07TTIM3nANC?= =?iso-8859-1?Q?azjTWDdEaA7xuoG773NXeOQ0DWpNNhuD3NjwX0D0cmXMjgYRKHJ1dBYEv8?= =?iso-8859-1?Q?/daOxGT3s6gl6JpFzKF4IKhcgZStHtaUQjNiAlhgmWy5tLzbo5z6B8tZrh?= =?iso-8859-1?Q?PZX3WtouieE3KlxMxmNpKvMXzDo/wzXEvUCEDtyLH7CtKeOkFVlxBstnf7?= =?iso-8859-1?Q?mMvS6y1EaJ+kuYY8v/F5MGy34if9o0loc2t7Wf3yHtT4Geo5KprXdmh9E8?= =?iso-8859-1?Q?MjXwvuQWPK66RhckYlOD4/VZ6MVgl0gscUBI/mIBPv1A6btu0zcn/KIDuU?= =?iso-8859-1?Q?+4iW+N1n0jdnssy6QT3pMVs/xGAQOZxCw9Ff4+UaAAoJyzWCZ7ebW62aec?= =?iso-8859-1?Q?40qNdEg9NpSjQawnlc0/Wh4oLocQLik4I5qKGVZK33AjJbjwL4NGinS8Jt?= =?iso-8859-1?Q?0mezRVWoKTGccXt7h7rX+LPy/aqxxHEIRIOXkMvb+qIqikAe8QJdpLDB8n?= =?iso-8859-1?Q?IA6AYfO/KIaeg+qdcU5tynnN8ZG0jBb9OTACHw8f4EChPOlZ62S4IkjTr2?= =?iso-8859-1?Q?Mak2B3r4Pmd3w+bHjmiP+pc13OSfdtUx+F9epJ0X3pR7226UgNjUe3QOLF?= =?iso-8859-1?Q?VSCOBdvhLFgxX+TNyEBYIO7dxknCMA2CBfCdGJ85adqp0jDdS51AN0Q51x?= =?iso-8859-1?Q?Y7g13s9o9e5L2LREjqJuMrxqkA0TNfSaQZFCmhqxJlRYHZlzJvkm+7P8Yu?= =?iso-8859-1?Q?tA6d42bYT2RP32rvk68hqIVc8BpqmAYmrwMzPjSZrmtGzcDwEKX3R1Z77g?= =?iso-8859-1?Q?iw2hmK+wPfUd2q4M42Zn9gS9WjaA4tYB126lOkx26RCAbbd+t5EGLdAyZ7?= =?iso-8859-1?Q?//DIaJKAJMxPywi3dL8/mdBFkHh6c3cDPyCcVcZ1r6epbV3Ns2V2jc/+OY?= =?iso-8859-1?Q?pWZCIH/QDOP8em4Y9aGmEZlY448sjt9IleKmplMabT9BH1eEHkqqGy54wD?= =?iso-8859-1?Q?/IG2KVF42QUMB0mCLx+cJNVqrX+YMff3qkwxcgarA2y0geza9j0y7gsXBr?= =?iso-8859-1?Q?j/z2k0mgogYg2+CK7aW5q83thxZnmV0KKpf1Ka+QgwrAaNpalHeGT2R5iM?= =?iso-8859-1?Q?MQl4AcmwSyoVLzGHtdUiYV5X1dhL7e3f+ckmqVK1TS1gkL3fAxn5bDsiuR?= =?iso-8859-1?Q?875lpr69FWbhiJItjpD/PAUQoES19DGqulAGNiddgt22h1pncHiRxZUWQY?= =?iso-8859-1?Q?kaJBeQAxOiSrshcMqX9I+ee+f?= x-microsoft-antispam-prvs: x-forefront-prvs: 09538D3531 x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(396003)(366004)(39860400002)(346002)(136003)(376002)(189003)(199004)(99286004)(4326008)(93886005)(54906003)(76176011)(14454004)(6486002)(8936002)(486006)(476003)(446003)(2616005)(25786009)(81166006)(105586002)(229853002)(6436002)(11346002)(186003)(106356001)(478600001)(5660300002)(26005)(386003)(52116002)(6506007)(66066001)(256004)(14444005)(6512007)(36756003)(8676002)(53546011)(102836004)(305945005)(7736002)(2906002)(6246003)(86362001)(316002)(68736007)(53936002)(6916009)(71190400001)(81156014)(3846002)(6116002)(71200400001)(97736004);DIR:OUT;SFP:1101;SCL:1;SRVR:VI1PR0502MB2894;H:VI1PR0502MB3647.eurprd05.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;MX:1;A:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: GvRyK2ot8tw+DE63kpZxpOg6h46A3wG3G22lovu7ONk6y/mwhL5zAAIET1HRGl57BSAM1itCI18wuxdrxJ+7ncseFLTJAn995sfDBy8zDhawZyKWRb7jgibOGepdnK6fc3iXgF0EHTUebKN9m/M/Jo2p98/behWzKqU0hGrzHqYJLsVo5JZcqbFN+eWoiA3Bebv3JNN7CXHcRKFhJkR0spfv9loiBSvKT3IgPN6kAmCwDIeFzMVFUy8dEe6g1ZjNBA2xlM8Rt9urApmMPKCo74YvVklMBepccGBIK0DuEMtpE8NMdHMMUYYRn7OULVNVhMZdFG0GoSb0cju50DqkCE2Vn+AESnKgHw5rmFqW8Q4Ig+Ud8oCf8GQ+4QDrtWlyVgwn8L516yRVz2M+3/FQ0A57WcXYlQBr6CeM5A/ZQDE= Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: a3222ff6-9b52-47ac-c2fe-08d69683fcb0 X-MS-Exchange-CrossTenant-originalarrivaltime: 19 Feb 2019 16:04:53.5052 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR0502MB2894 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Mon 18 Feb 2019 at 18:26, Cong Wang wrote: > On Mon, Feb 18, 2019 at 2:07 AM Vlad Buslov wrote: >> >> Hi Cong, >> >> Thanks for reviewing! >> >> On Fri 15 Feb 2019 at 22:21, Cong Wang wrote: >> > (Sorry for joining this late.) >> > >> > On Mon, Feb 11, 2019 at 12:56 AM Vlad Buslov wro= te: >> >> @@ -2432,7 +2474,11 @@ static int tc_dump_chain(struct sk_buff *skb, = struct netlink_callback *cb) >> >> index_start =3D cb->args[0]; >> >> index =3D 0; >> >> >> >> - list_for_each_entry(chain, &block->chain_list, list) { >> >> + for (chain =3D __tcf_get_next_chain(block, NULL); >> >> + chain; >> >> + chain_prev =3D chain, >> >> + chain =3D __tcf_get_next_chain(block, chain), >> >> + tcf_chain_put(chain_prev)) { >> > >> > Why do you want to take the block->lock in each iteration >> > of the loop rather than taking once for the whole loop? >> >> This loop calls classifier ops callback in tc_chain_fill_node(). I don't >> call any classifier ops callbacks while holding block or chain lock in >> this change because the goal is to achieve fine-grained locking for data >> structures used by filter update path. Locking per-block or per-chain is >> much coarser than taking reference counters to parent structures and >> allowing classifiers to implement their own locking. > > That is the problem, when we have N filter chains in a block, you > lock and unlock mutex N times... And what __tcf_get_next_chain() > does is basically just retrieving the next entry in the list, so the > overhead of mutex is likely more than the list operation itself in > contention situation. > > Now I can see why you complained about mutex before, it is > how you use it, not actually its own problem. :) > >> >> In this case call to ops->tmplt_dump() is probably quite fast and its >> execution time doesn't depend on number of filters on the classifier, so >> releasing block->lock on each iteration doesn't provide much benefit, if >> at all. However, it is easier for me to reason about locking correctness >> in this refactoring by following a simple rule that no locks (besides >> rtnl mutex) can be held when calling classifier ops callbacks. > > Well, for me, a hierarchy locking is always simple when you take > them in the right order, that is locking the larger-scope lock first > and then smaller-scope one. > > The way you use the locking here is actually harder for me to > review, because it is hard to valid its atomicity when you unlock > the larger scope lock and re-take the smaller scope lock. You > use refcnt to ensure it will not go way, but that is still far from > guarantee of the atomicity. > > For example, tp->ops->change() which changes an existing > filter, I don't see you lock either block->lock or > chain->filter_chain_lock when calling it. How does it even work? Sorry for not describing it in more details in cover letter. Basic approach is that cls API obtains references to all necessary data structures (Qdisc, block, chain, tp) and calls classifier ops callbacks after releasing all the locks. This defers all locking and atomicity concerns to actual classifier implementations. In case of tp->ops->change() flower classifier uses tp->lock. In case of dump code (both chain and filter) there is no atomicity with or without my changes because rtnl lock is released after sending each skb to userspace and concurrent changes to tc structures can occur in between.