From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 595C4C43381 for ; Thu, 21 Feb 2019 17:11:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 05FCE2077B for ; Thu, 21 Feb 2019 17:11:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=Mellanox.com header.i=@Mellanox.com header.b="VtTPIKx4" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728442AbfBURLO (ORCPT ); Thu, 21 Feb 2019 12:11:14 -0500 Received: from mail-eopbgr20051.outbound.protection.outlook.com ([40.107.2.51]:1984 "EHLO EUR02-VE1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728359AbfBURLN (ORCPT ); Thu, 21 Feb 2019 12:11:13 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=BwaWs9yf27qUwFXy0RywfM5zO0Ojeo/Mp/sYZHejCSE=; b=VtTPIKx4TmeOph6RDSIintBdYy5hL1dde2UALFeYnj3eibWRbBPIJJJ8JULW0Jg1O9W7ZxU0bGpdnuw0uEByBQB7SN0H6PwskqzTWOJ4wi3Tsju9LGSuzsfGUJ9NZAzFDAoD2CBnpKZw9I+BebUejGkB4W9fVXQherlTYkPbMYs= Received: from HE1PR0502MB3641.eurprd05.prod.outlook.com (10.167.127.11) by HE1PR0502MB3098.eurprd05.prod.outlook.com (10.175.29.144) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1643.14; Thu, 21 Feb 2019 17:11:07 +0000 Received: from HE1PR0502MB3641.eurprd05.prod.outlook.com ([fe80::b03d:8cd4:d259:f749]) by HE1PR0502MB3641.eurprd05.prod.outlook.com ([fe80::b03d:8cd4:d259:f749%5]) with mapi id 15.20.1643.014; Thu, 21 Feb 2019 17:11:07 +0000 From: Vlad Buslov To: Cong Wang CC: Ido Schimmel , "netdev@vger.kernel.org" , "jhs@mojatatu.com" , "jiri@resnulli.us" , "davem@davemloft.net" , "ast@kernel.org" , "daniel@iogearbox.net" Subject: Re: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex Thread-Topic: [PATCH net-next v4 07/17] net: sched: protect filter_chain list with filter_chain_lock mutex Thread-Index: AQHUwefD/Cm+p7M7pEKHPk7WVBdPDaXfoUsAgAEF54CABfdpAIAAqksAgAITdgCAATCtgA== Date: Thu, 21 Feb 2019 17:11:07 +0000 Message-ID: References: <20190211085548.7190-1-vladbu@mellanox.com> <20190211085548.7190-8-vladbu@mellanox.com> <20190214182442.GA19269@splinter> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-clientproxiedby: LO2P265CA0350.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:d::26) To HE1PR0502MB3641.eurprd05.prod.outlook.com (2603:10a6:7:85::11) authentication-results: spf=none (sender IP is ) smtp.mailfrom=vladbu@mellanox.com; x-ms-exchange-messagesentrepresentingtype: 1 x-originating-ip: [37.142.13.130] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 78035bb9-69fc-4ede-5926-08d6981f9176 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0;PCL:0;RULEID:(2390118)(7020095)(4652040)(8989299)(5600110)(711020)(4605104)(4618075)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328)(7153060)(7193020);SRVR:HE1PR0502MB3098; x-ms-traffictypediagnostic: HE1PR0502MB3098: x-microsoft-exchange-diagnostics: =?iso-8859-1?Q?1;HE1PR0502MB3098;23:FMo17Eytv/vEPvNpsKW+kn9rNd6v5p7WGrp8v?= =?iso-8859-1?Q?c+Rl4Y11wIV5ZfiLOXim7Hyjenr4HTUuo0AygKmzkW5/7KY0Mc4O97uJfR?= =?iso-8859-1?Q?VbU9Q6ZFRpyTWtXlyOmaQKZc2nsHzEuhkH6NfaX7E/ci7/oR+OHSNSIK/8?= =?iso-8859-1?Q?c0lFAt3OjUswULd1vrGK9/yVw2CrCnTJ3+e1IQ2sppV56nNU1WFTujh9W/?= =?iso-8859-1?Q?zEfjoXuBZxyMTBH/ACprLW20x9NBBCFf/Z0p8LZRAWbdFAtLmElrWtm61+?= =?iso-8859-1?Q?Vokatn3NT2VoepXOKLGY4ApDeJ+PYxGhElpxRGHsgc6DYoOKUSeHsVE81K?= =?iso-8859-1?Q?96kmItDVV02YNfzeXSiCJmGPGT7S3eCyehZqnzMZofdtdAxjN94oGa/kWT?= =?iso-8859-1?Q?N37zTobnfR4BctYTfr+kgSl8GJQ1SfwDhjH+sxwRT+qB4jxkASNlYBm1Kv?= =?iso-8859-1?Q?UOKlSC4X8QId6bkrSUFTt92IF4GFGdFlruTA3pmqi8mY6v3YCdaa4zZgi1?= =?iso-8859-1?Q?Lh+H0jT7lt0H2fIG8/A/3HehLLfI0dZoSSi3+dUL0Jj5RH831UMyTnUk6L?= =?iso-8859-1?Q?cf/YmJiG+7z9yM2vnIYOO8xJfeq2C7oy4aX1rIx3U85x2pizN5WMV4AcOl?= =?iso-8859-1?Q?ch9+fMeakMAsXZcb1mJmkkX3IJuGhB8YF3gziAN3/izaWFd8QWfourXujA?= =?iso-8859-1?Q?1r07rrL/kFb6lvgVvm4Ln9vestsAFaJB/DcFV7yVKT8cLNlufMGhBAIZXL?= =?iso-8859-1?Q?O44uSS1S5vo+8+Dpa/0SGhn3s1nWiHlus4G8b4cTntDA9UFu/ITsNNDoy5?= =?iso-8859-1?Q?JXodWv3hVeS+2NRGX3QATQ9YVgQ5n8MaogD0z5owiwSO2Ah7JApRiKDE6U?= =?iso-8859-1?Q?ky6l6vDjCS0fkbN9Cs7FW8ota3m7qvqGIZsYDTnHuNdfiR8xmG8KcHQKE1?= =?iso-8859-1?Q?veIIn7FhPv/pu3fuAMC4vCslJguV/3PJSgZTqSLsUIpwslUhYEunETgiF7?= =?iso-8859-1?Q?GucrIGRHx+mME9V55AhqEeB3JJInAfpMXA3ifJVTgTxsG5m10JNVMPGu6s?= =?iso-8859-1?Q?lzWTSPlqCpOCYLG529Dlbqj5kRqm/lFj9ip1BkrnSSoauwBS31Y5OOezVu?= =?iso-8859-1?Q?CRaEf1y+C9LmWf6olgfvTZt+YGw3o5ppdy2CYOF5GTNEkwf48m5WIqkiO5?= =?iso-8859-1?Q?lyX73H65c2j64pDp8CRtd1xqK92OHXRciartUFe4h2t4TvMaZHvDkspNSy?= =?iso-8859-1?Q?rx8zFzM1QLtpJnivT2sCHIudKIt4KIWtO4+rTPNxcYYwdDmKM+Kn/vAz/E?= =?iso-8859-1?Q?u5m6BTGMLAA1M1bpdcveF0Xcd?= x-microsoft-antispam-prvs: x-forefront-prvs: 09555FB1AD x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(396003)(39860400002)(346002)(376002)(136003)(366004)(189003)(199004)(71200400001)(486006)(6916009)(6436002)(25786009)(229853002)(99286004)(4326008)(2616005)(11346002)(71190400001)(476003)(446003)(86362001)(26005)(305945005)(186003)(7736002)(478600001)(68736007)(6512007)(6486002)(6246003)(105586002)(81156014)(8936002)(81166006)(3846002)(6116002)(14444005)(316002)(256004)(106356001)(36756003)(53936002)(102836004)(8676002)(97736004)(93886005)(14454004)(66066001)(5660300002)(2906002)(52116002)(54906003)(386003)(76176011)(6506007)(53546011);DIR:OUT;SFP:1101;SCL:1;SRVR:HE1PR0502MB3098;H:HE1PR0502MB3641.eurprd05.prod.outlook.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;MX:1;A:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: zHPX28zaFNAr+WRua/jQc0loJCYkqmYNdhOmFk73oZFMflMuLoz+JsE0HbGRQ3Kn/mHNbaiwVBe6E9S8Ocn+fUImoHbeq32kLFcg+yZWk8f+TEQzEzfts7tYKFH49bXprqD/ai3WuB+gE64yRACViPFzUH+X9aKFno6USrQZNlE//M5+reKfnZaenTqb+jDHk9MCPN2V1f51St9ervfU4UpF6fA9tvQMKgZC2V/6LpdXsvo33vGfQtmNB64a3pKD9T2iyA1S9umr3wKGAk1RejeY4mgGb4lCrDjTOc6MYbPxXNbc3jy8AMrJAbB5r7rTlL3KAz/76qxedwOP6Og8BdDfGCziCCnWlcwJ7T4GqC7HdswdqQM1i8sgMrM81yivIf17enR8Gxtk7Rd8dZZ5tTD9KKoUnsMhgMZowqXZENk= Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 78035bb9-69fc-4ede-5926-08d6981f9176 X-MS-Exchange-CrossTenant-originalarrivaltime: 21 Feb 2019 17:11:06.0586 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-Transport-CrossTenantHeadersStamped: HE1PR0502MB3098 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Wed 20 Feb 2019 at 23:00, Cong Wang wrote: > On Tue, Feb 19, 2019 at 7:20 AM Vlad Buslov wrote: >> >> >> On Tue 19 Feb 2019 at 05:08, Cong Wang wrote: >> > On Fri, Feb 15, 2019 at 2:02 AM Vlad Buslov wrot= e: >> >> >> >> I looked at the code and problem seems to be matchall classifier >> >> specific. My implementation of unlocked cls API assumes that concurre= nt >> >> insertions are possible and checks for it when deleting "empty" tp. >> >> Since classifiers don't expose number of elements, the only way to te= st >> >> this is to do tp->walk() on them and assume that walk callback is cal= led >> >> once per filter on every classifier. In your example new tp is create= d >> >> for second filter, filter insertion fails, number of elements on newl= y >> >> created tp is checked with tp->walk() before deleting it. However, >> >> matchall classifier always calls the tp->walk() callback once, even w= hen >> >> it doesn't have a valid filter (in this case with NULL filter pointer= ). >> > >> > Again, this can be eliminated by just switching to normal >> > non-retry logic. This is yet another headache to review this >> > kind of unlock-and-retry logic, I have no idea why you are such >> > a big fan of it. >> >> The retry approach was suggested to me multiple times by Jiri on >> previous code reviews so I assumed it is preferred approach in such >> cases. I don't have a strong preference in this regard, but locking >> whole tp on filter update will remove any parallelism when updating same >> classifier instance concurrently. The goal of these changes is to allow >> parallel rule update and to achieve that I had to introduce some >> complexity into the code. > > Yeah, but with unlock-and-retry it would waste more time when > retry occurs. So it can't be better in the worst scenario. > > The question is essentially that do we want to waste CPU cycles > when conflicts occurs or just block there until it is safe to enter > the critical section? > > And, is the retry bound? Is it possible that we would retry infinitely > as long as we time it correctly? > > >> >> Now let me explain why these two approaches result completely different >> performance in this case. Lets start with a list of most CPU-consuming >> parts in new filter creation process in descending order (raw data at >> the end of this mail): >> >> 1) Hardware offload - if available and no skip_hw. >> 2) Exts (actions) initalization - most expensive part even with single >> action, CPU usage increases with number of actions per filter. >> 3) cls API. >> 4) Flower classifier data structure initialization. >> >> Note that 1)+2) is ~80% of cost of creating a flower filter. So if we >> just lock the whole flower classifier instance during rule update we >> serialize 1, 2 and 4, and only cls API (~13% of CPU cost) can be >> executed concurrently. However, in proposed flower implementation hw >> offloading and action initialization code is called without any locks >> and tp->lock is only obtained when modifying flower data structures, >> which means that only 3) is serialized and everything else (87% of CPU >> cost) can be executed in parallel. > > What about when conflicts detected and retry the whole change? > And, of course, how often do conflicts happen? > > Thanks. I had similar concerns when designing this change. Lets look at two cases when this retry is needed. One process creates first filter on classifier and fails, while other processes are trying to concurrently add filter to same block/chain/tp: 1) Process obtains filter_chain_lock, performs unsuccessful tp lookup, releases the lock. 2) Calls tcf_chain_tp_insert_unique() which obtains filter_chain_lock, inserts new tp, releases the lock. 3) Calls tp->ops->change() that returns an error. 4) Calls tcf_chain_tp_delete_empty() which takes filter_chain_lock, verifie= s that no filters were added to tp concurrently, sets tp->deleting flag, removes tp from chain. This is supposed to be very rare occurrence because for retry to happen it not only requires concurrent insertions to same block/chain/tp, but also that tp with requested prio didn't exist before and no concurrent process succeeded in adding at least one filter to tp during step 3 before it is marked for deletion in step 4 (otherwise tcf_proto_check_delete() fails and concurrent threads don't perform retry). Another case is when last filter is being deleted while concurrent processes adding new filters to same block/chain/tp: 1) tc_del_tfilter() gets last filter with tp->ops->get() 2) Deletes it with tp->ops->delete()... 3) ... that return 'last' hint set to true. 4) Calls tcf_chain_tp_delete_empty() which takes filter_chain_lock, verifie= s that no filters were added to tp concurrently, sets tp->deleting flag, removes tp from chain. This case is also quite rare because it requires concurrent users to successfully lookup tp before tp->deleting is set to true and tp is removed from chain, but not create any new filters on tp during that time. After considering this I decided that it is not worth it to penalize common case of updating filters by completely removing parallelism when updates target same tp instance for such rare corner cases as described above. Now regarding forcing users to retry indefinitely. In later cases no more than one retry is possible because concurrent add processes create new tp on first retry. In former case multiple retries are possible, but to block concurrent users indefinitely would require malicious process to somehow always have priority when obtaining filter_chain_lock during steps 1, 2, then wait to allow all concurrent users to lookup the tp, then obtain filter_chain_lock in step 4 and initiate tp deletion before any of concurrent users that have a reference to this new tp instance can insert any single filter on it, then go back to step 1, obtain lock first and repeat. I don't see how this can be timed from userspace repeatedly as creating first filter on new tp involves multiple cycles of getting and releasing filter_chain_lock and each of them require attacker to "influence" kernel scheduler to behave in very specific fashion. Regards, Vlad