From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2920DC43381 for ; Mon, 25 Feb 2019 16:11:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D71F820C01 for ; Mon, 25 Feb 2019 16:11:29 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=Mellanox.com header.i=@Mellanox.com header.b="D8y5eGsP" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727950AbfBYQL2 (ORCPT ); Mon, 25 Feb 2019 11:11:28 -0500 Received: from mail-eopbgr60058.outbound.protection.outlook.com ([40.107.6.58]:51872 "EHLO EUR04-DB3-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727661AbfBYQL2 (ORCPT ); Mon, 25 Feb 2019 11:11:28 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=1iEtKX5xwJOpJYI7kwLbTLXW8lOe59dDaHrW7qmGh28=; b=D8y5eGsPzRNN44qEp667GO3SXl6UKRR+chKqIzPZU8yv5I9Giis1NiXpkvK8jsW3hpeOQsaj4/AIn7W6oJL5xUkS4d26ojKFbaz8LnL2TFZTkIn/t/03I7+ac/lnGDK6bcXN2MOhR2pWNs+9p7YaXEqYho56+94J/26QtpKSFCM= Received: from HE1PR0502MB3641.eurprd05.prod.outlook.com (10.167.127.11) by HE1PR0502MB3723.eurprd05.prod.outlook.com (10.167.127.33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1643.18; Mon, 25 Feb 2019 16:11:20 +0000 Received: from HE1PR0502MB3641.eurprd05.prod.outlook.com ([fe80::b03d:8cd4:d259:f749]) by HE1PR0502MB3641.eurprd05.prod.outlook.com ([fe80::b03d:8cd4:d259:f749%5]) with mapi id 15.20.1643.019; Mon, 25 Feb 2019 16:11:20 +0000 From: Vlad Buslov To: Cong Wang CC: Linux Kernel Network Developers , Jamal Hadi Salim , Jiri Pirko , David Miller Subject: Re: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference Thread-Topic: [PATCH net-next 01/12] net: sched: flower: don't check for rtnl on head dereference Thread-Index: AQHUxDmfnNJQQFjgm0S2EuLlSiVKMKXl8iaAgADz/YCAAmnzgIABQWoAgAGwuACABH4tAA== Date: Mon, 25 Feb 2019 16:11:20 +0000 Message-ID: References: <20190214074712.17846-1-vladbu@mellanox.com> <20190214074712.17846-2-vladbu@mellanox.com> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-clientproxiedby: LO2P265CA0266.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:a1::14) To HE1PR0502MB3641.eurprd05.prod.outlook.com (2603:10a6:7:85::11) authentication-results: spf=none (sender IP is ) smtp.mailfrom=vladbu@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [37.142.13.130] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 1083e85a-bfa6-49d0-67c3-08d69b3be0f6 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0;PCL:0;RULEID:(2390118)(7020095)(4652040)(8989299)(5600110)(711020)(4605104)(4618075)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328)(7153060)(7193020);SRVR:HE1PR0502MB3723; x-ms-traffictypediagnostic: HE1PR0502MB3723: x-microsoft-exchange-diagnostics: =?iso-8859-1?Q?1;HE1PR0502MB3723;23:WdJR1gGomJ6Zq4s1dB9qEaSwVCpLNjX2/dO8s?= =?iso-8859-1?Q?Y2GbKRncCMlqvVvRSn8SfF83u5OJfgwZrSGLXPvuDNDoHaVdbyGqUe0IOk?= =?iso-8859-1?Q?K5EaFWc/E+/m2787rR3Z8BNvV6IsdPBofdxTax6MTfAi74J0GwcqlvDGl2?= =?iso-8859-1?Q?i9bPM+4phpvu4mdpGxAzcLZxhih9+3ipYyyv3/0HPDGfP7YzrrWiHNOGPS?= =?iso-8859-1?Q?MbCqPGo9vspG8Ibi90V+UzqK9UNyO8DysF14tzI74OOk2qZ/QQdrt8tAmn?= =?iso-8859-1?Q?DUZ+IQpTaa9up719Z3EfNk6n7ObL1a/b5AS14bMK48d28rOwWWLS2qmY0q?= =?iso-8859-1?Q?4mCadIqZgdhBtM+BHuxfYALHXSoQ+ckT96RqU42W48OYtDvLKUAJcnkqwD?= =?iso-8859-1?Q?h3LztBq2RXHMlkp2/JXi2cHSkeyyYG7rLY4os2AFgVWcMg3jaT0NrjFQLt?= =?iso-8859-1?Q?4fNnPt5ZxJkgyMPNe/1YcWjElNWtvyKLWDYmmpNNYQ0Ru+pAlHT8fAesvL?= =?iso-8859-1?Q?/lSrzOQU6N3DMB+X0bTqZy0c3xq1kGhXAhB/WkiEpSK/5wr95REFljEHo6?= =?iso-8859-1?Q?cG2FdLuB5mUPbUNhiI7QBinfbz6fMxnRkEy0kPncll6MuAyXMbJd4AnkQh?= =?iso-8859-1?Q?7fl0bUz4T24p4gW4GZTuceV+imuCupHlHzXBpckbQNyALRmFbSq6HsGWcq?= =?iso-8859-1?Q?Dz/6fJ4NX76nFhj74RRCciWBP1iHqMaE65+hXPYyCF5rV7xlNXVpDlYx+P?= =?iso-8859-1?Q?Jg7+zfN8bCv6nq4eGn5+HJSIK4L+6t/8XjLARjATbTroguH5MY11/rlL0r?= =?iso-8859-1?Q?A39ocAuP2odrbm9azIhwu7ggUhsHTPIy+kf6saJXTFV6SggkbNEAAuI40S?= =?iso-8859-1?Q?XfURnmjT6bqlcfxBmKE1aGV8skRZ2qBrPwhYvBIo4TkBu/6bk5O1Fmh6Dj?= =?iso-8859-1?Q?wK1tQAyvBlQqCGAtQnogmU/QPdozlO1PRLLImknw1CViNMXRlS1MFlCzwB?= =?iso-8859-1?Q?Clwhjj4JYNY+YL69UasclREraSUBjal4+jGZfT/csPAbG6poKbMsYEgCVV?= =?iso-8859-1?Q?x7vXTvQaioqxtbQ2xsukxDAspswP3yUGhAwEcb5VUaR5xHgphJQqQv88pQ?= =?iso-8859-1?Q?v8py1vxTV19bI21Z9e8acnOYHWLsaGvjwwD729ThuffhcWu6JP3s0SwcNC?= =?iso-8859-1?Q?llpXGnOb6YSxX8iBDPsJJqIeD4wn+35EHbDv7HmLpbLs7WNin5kFf5/gZx?= =?iso-8859-1?Q?O5UOHtj4U9SgTZJDbnw2XgsyPEhuL7B2B2tklqXBGHcPqIn+BMRhodKdgG?= =?iso-8859-1?Q?flBoxUwf7DcTzYG7AUXTekrpzopjXrUNiK10GeAkdx3rTXw=3D=3D?= x-microsoft-antispam-prvs: x-forefront-prvs: 095972DF2F x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(366004)(136003)(376002)(39860400002)(396003)(346002)(54534003)(199004)(189003)(52116002)(99286004)(186003)(486006)(2616005)(476003)(446003)(11346002)(105586002)(71200400001)(71190400001)(106356001)(76176011)(6506007)(386003)(26005)(53546011)(102836004)(8676002)(8936002)(81166006)(68736007)(81156014)(6512007)(305945005)(7736002)(6436002)(6486002)(4326008)(3846002)(6116002)(229853002)(97736004)(25786009)(2906002)(6246003)(36756003)(6916009)(53936002)(316002)(54906003)(93886005)(478600001)(14454004)(66066001)(256004)(14444005)(5660300002)(86362001);DIR:OUT;SFP:1101;SCL:1;SRVR:HE1PR0502MB3723;H:HE1PR0502MB3641.eurprd05.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;A:1;MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: ByMqJcN8XK8y1FoU+I8PKPd96QOQxY/3t0m6zlC6bSH175yx3KQcHKHyi3DnOyRdS5TgJRGXI/DmrLlUWHfVDw/u7AnGmoWbSAfdUDNM2Dz6e5uJUQI5A/Wdo6ZvokBOCYODIF9Y9zZFLVlNeGsE4LyfWobr0/Qu6xYRSP6ZupYj8P7Czscx1K8PHgoaWdboD3DbubMc6OIa4zSXDXUaHWi4HqO7C/4HxiO/0ppb+AesKsmKVzatSPWqM4KJswvPqKPP6p44ulaQ8Ohx1Uj+bcG3KNAA9vxNpehHyVlbl14fjtbLbwA0vOQsDZ+JwaKShCsI2S4TvbdkG/9XaqT/w8Isyf2BArw64lLJ3Zu0HV78idZhtdg/qkfomaa+D84rCKsEbFgbC8TreBJEcy+HTo8PEkGW/CDoF2Pn22zbI2Y= Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 1083e85a-bfa6-49d0-67c3-08d69b3be0f6 X-MS-Exchange-CrossTenant-originalarrivaltime: 25 Feb 2019 16:11:19.0998 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR0502MB3723 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Fri 22 Feb 2019 at 19:32, Cong Wang wrote: > On Thu, Feb 21, 2019 at 9:45 AM Vlad Buslov wrote: >> >> >> On Wed 20 Feb 2019 at 22:33, Cong Wang wrote: >> > On Tue, Feb 19, 2019 at 1:46 AM Vlad Buslov wrot= e: >> >> >> >> >> >> On Mon 18 Feb 2019 at 19:08, Cong Wang wro= te: >> >> > On Wed, Feb 13, 2019 at 11:47 PM Vlad Buslov = wrote: >> >> >> >> >> >> Flower classifier only changes root pointer during init and destro= y. Cls >> >> >> API implements reference counting for tcf_proto, so there is no da= nger of >> >> >> concurrent access to tp when it is being destroyed, even without p= rotection >> >> >> provided by rtnl lock. >> >> > >> >> > How about atomicity? Refcnt doesn't guarantee atomicity, how do >> >> > you make sure two concurrent modifications are atomic? >> >> >> >> In order to guarantee atomicity I lock shared flower classifier data >> >> structures with tp->lock in following patches. >> > >> > Sure, I meant the atomicity of the _whole_ change, as you know >> > the TC filters are stored in hierarchical structures: a block, a chain= , >> > a tp struct, some type-specific data structure like a hash table. >> > >> > Locking tp only solves a partial of the atomicity here. Are you >> > going to restart the whole change from top down to the bottom? >> >> You mean in cases where there is a conflict? Yes, processing EAGAIN in >> tc_new_tfilter() involves releasing and re-obtaining all locks and >> references. > > Sure, restart only happens when a conflict is detected, this is > why I called it worst scenario. > > >> >> > >> > >> >> >> >> > >> >> > >> >> >> >> >> >> Implement new function fl_head_dereference() to dereference tp->ro= ot >> >> >> without checking for rtnl lock. Use it in all flower function that= obtain >> >> >> head pointer instead of rtnl_dereference(). >> >> >> >> >> > >> >> > So what lock protects RCU writers after this patch? >> >> >> >> I explained it in comment for fl_head_dereference(), but should have >> >> copied this information to changelog as well: >> >> Flower classifier only changes root pointer during init and destroy. >> >> Cls API implements reference counting for tcf_proto, so there is no >> >> danger of concurrent access to tp when it is being destroyed, even >> >> without protection provided by rtnl lock. >> > >> > So you are saying an RCU pointer is okay to deference without >> > any lock eve without RCU read lock, right? >> > >> > >> >> >> >> In initial version of this change I used tp->lock to protect tp->root >> >> access and verified it with lockdep, but during internal review Jiri >> >> noted that this is not needed in current flower implementation. >> > >> > Let's see what you have on top of your own branch >> > unlocked_flower_cong_1: >> > >> > 1458 static int fl_change(struct net *net, struct sk_buff *in_skb, >> > 1459 struct tcf_proto *tp, unsigned long base, >> > 1460 u32 handle, struct nlattr **tca, >> > 1461 void **arg, bool ovr, bool rtnl_held, >> > 1462 struct netlink_ext_ack *extack) >> > 1463 { >> > 1464 struct cls_fl_head *head =3D fl_head_dereference(tp); >> > >> > At the point of line 1464, there is no lock taken, tp->lock is taken >> > after it, block->lock or chain lock is already unlocked before ->chang= e(). >> > >> > So, what protects this RCU structure? According to RCU, it must be >> > either RCU read lock or some writer lock. I see none here. :( >> > >> > What am I missing? >> >> Initially I had flower implementation that protected tp->root access >> with tp->lock, re-obtaining lock and head reference each time it was >> used, sometimes multiple times per function (head reference is usually >> obtained in beginning of every flower API function and used multiple >> times afterwards). This complicated the code and reduced parallelism. >> However, in case of flower classifier tp->root is never reassigned after >> init, so essentially there is no "CU" part in this "RCU" pointer. This >> realization allowed me to refactor unlocked flower code to look much >> nicer and perform better. Am I missing something in flower classifier >> implementation? > > So if it is no longer RCU any more, why do you still use > rcu_dereference_protected()? That is, why not just deref it as a raw > pointer? > > And, I don't think I can buy your argument here. The RCU infrastructure > should not be changed even after your patches, the fast path is still > protocted by RCU read lock, while the slow path now is protected by > some smaller-scope locks. What makes cls_flower so unique that > it doesn't even need RCU here? tp->root is not reassigned but it is still > freed via RCU infra, that is in fl_destroy_sleepable(). > > Thanks. My cls API patch set introduced reference counting for tcf_proto structure. With that change tp->ops->destroy() (which calls fl_destroy() and fl_destroy_sleepable(), in case of flower classifier) is only called after last reference to tp is released. All slow path users of tp->ops must obtain reference to tp, so concurrent call to fl_destroy() is not possible. Before this change tcf_proto structure didn't have reference counting support and required users to obtain rtnl mutex before calling its ops callbacks. This was verified in flower by using rtnl_dereference to obtain tp->root.