From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read it on net->init/->exit
To: Andrei Vagin
Cc: davem@davemloft.net, vyasevic@redhat.com, kstewart@linuxfoundation.org,
 pombredanne@nexb.com, vyasevich@gmail.com, mark.rutland@arm.com,
 gregkh@linuxfoundation.org, adobriyan@gmail.com, fw@strlen.de,
 nicolas.dichtel@6wind.com, xiyou.wangcong@gmail.com, roman.kapl@sysgo.com,
 paul@paul-moore.com, dsahern@gmail.com, daniel@iogearbox.net,
 lucien.xin@gmail.com, mschiffer@universe-factory.net, rshearma@brocade.com,
 linux-kernel@vger.kernel.org, netdev@vger.kernel.org, ebiederm@xmission.com,
 gorcunov@virtuozzo.com
References: <151066759055.14465.9783879083192000862.stgit@localhost.localdomain>
 <20171114174454.GA11452@outlook.office365.com>
From: Kirill Tkhai
Date: Tue, 14 Nov 2017 21:04:06 +0300
In-Reply-To: <20171114174454.GA11452@outlook.office365.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 14.11.2017 20:44, Andrei Vagin wrote:
> On Tue, Nov 14, 2017 at 04:53:33PM +0300, Kirill Tkhai wrote:
>> Currently a mutex is used to protect the pernet operations list.
>> It makes cleanup_net() execute the ->exit methods of the same set of
>> operations that was used at the time of ->init, even after the net
>> namespace is unlinked from net_namespace_list.
>>
>> But the problem is that synchronize_rcu() is needed after net is removed
>> from net_namespace_list:
>>
>> Destroy net_ns:
>>     cleanup_net()
>>       mutex_lock(&net_mutex)
>>       list_del_rcu(&net->list)
>>       synchronize_rcu()                  <--- Sleep there for ages
>>       list_for_each_entry_reverse(ops, &pernet_list, list)
>>         ops_exit_list(ops, &net_exit_list)
>>       list_for_each_entry_reverse(ops, &pernet_list, list)
>>         ops_free_list(ops, &net_exit_list)
>>       mutex_unlock(&net_mutex)
>>
>> This primitive is not fast, especially on systems with many processors
>> and/or when preemptible RCU is enabled in the config. So, for the whole
>> time that cleanup_net() is waiting for the RCU grace period, creation of
>> new net namespaces is not possible; the tasks that attempt it sleep on
>> the same mutex:
>>
>> Create net_ns:
>>     copy_net_ns()
>>       mutex_lock_killable(&net_mutex)    <--- Sleep there for ages
>>
>> The solution is to convert net_mutex to an rw_semaphore. Then the
>> pernet_operations::init/::exit methods, which modify the net-related data,
>> will require only down_read() locking, while down_write() will be used
>> for changing pernet_list.
>>
>> This gives a significant performance increase, as you can see below.
>> Sequential net namespace creation is measured in a loop, in a single
>> thread, without other tasks (single user mode):
>>
>> 1)
>> #define _GNU_SOURCE
>> #include <sched.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char *argv[])
>> {
>>         unsigned nr;
>>         if (argc < 2) {
>>                 fprintf(stderr, "Provide nr iterations arg\n");
>>                 return 1;
>>         }
>>         nr = atoi(argv[1]);
>>         while (nr-- > 0) {
>>                 if (unshare(CLONE_NEWNET)) {
>>                         perror("Can't unshare");
>>                         return 1;
>>                 }
>>         }
>>         return 0;
>> }
>>
>> Origin, 100000 unshare():
>> 0.03user 23.14system 1:39.85elapsed 23%CPU
>>
>> Patched, 100000 unshare():
>> 0.03user 67.49system 1:08.34elapsed 98%CPU
>>
>> 2)for i in {1..10000}; do unshare -n bash -c exit; done
>
> Hi Kirill,
>
> This mutex has another role. You know that net namespaces are destroyed
> asynchronously, and the net mutex guarantees that the backlog will not be
> big. If we have something in the backlog, we know that it will be handled
> before creating a new net ns.
>
> As far as I remember, net namespaces are created much faster than
> they are destroyed, so with these changes we can create a really big
> backlog, can't we?

I don't think limiting is a good goal, or a goal at all, for the mutex,
because it's very easy to create many net namespaces even with the mutex
in place. You may open /proc/[pid]/ns/net like a file, and the net_ns
counter will increment. Then do unshare(), and the mutex has no way to
protect against that. In general, a mutex can't limit the number of
anything; I've never seen a (good) example of that in the kernel.

As I see it, the real limit is enforced in inc_net_namespaces(), whose
counter is decremented after the RCU grace period in cleanup_net(), and
that has not changed.

> There was a discussion a few months ago:
> https://lists.onap.org/pipermail/containers/2016-October/037509.html
>
>
>>
>> Origin:
>> real 1m24,190s
>> user 0m6,225s
>> sys 0m15,132s
>
> Here you measure time of creating and destroying net namespaces.
>
>>
>> Patched:
>> real 0m18,235s (4.6 times faster)
>> user 0m4,544s
>> sys 0m13,796s
>
> But here you measure the time of creating namespaces and you know nothing
> about when they will be destroyed.

You're right, and I predict that the total time spent on the CPU will
remain the same, but the thing is that creation and destruction may now
be executed in parallel.
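
To illustrate the point about parallel execution, here is a minimal
user-space sketch, not kernel code and not part of the patch. It assumes a
POSIX pthread_rwlock_t as a stand-in for the kernel's rw_semaphore; the
names pernet_rwlock, ns_worker, pernet_ops_count and run_demo are
hypothetical. Two threads, modelling a ->init caller and a ->exit caller,
both take the lock for reading and meet at a barrier while still holding
it, which could not complete if the lock were an exclusive mutex. A list
change then takes the write side.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for the kernel's rw_semaphore. */
static pthread_rwlock_t pernet_rwlock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_barrier_t both_inside;
static int pernet_ops_count;	/* stands in for pernet_list */

/* Models a ->init or ->exit caller: it only needs the read side. */
static void *ns_worker(void *arg)
{
	(void)arg;
	pthread_rwlock_rdlock(&pernet_rwlock);
	/*
	 * Both workers must hold the read lock at the same time to get
	 * past this barrier; with an exclusive mutex this would deadlock.
	 */
	pthread_barrier_wait(&both_inside);
	pthread_rwlock_unlock(&pernet_rwlock);
	return NULL;
}

/* Runs two concurrent readers, then one exclusive list update. */
int run_demo(void)
{
	pthread_t creator, destroyer;
	int rc;

	pernet_ops_count = 0;
	rc = pthread_barrier_init(&both_inside, NULL, 2);
	assert(rc == 0);

	rc = pthread_create(&creator, NULL, ns_worker, NULL);
	assert(rc == 0);
	rc = pthread_create(&destroyer, NULL, ns_worker, NULL);
	assert(rc == 0);

	pthread_join(creator, NULL);
	pthread_join(destroyer, NULL);
	pthread_barrier_destroy(&both_inside);

	/* Models a change to pernet_list: this needs the write side. */
	pthread_rwlock_wrlock(&pernet_rwlock);
	pernet_ops_count++;
	pthread_rwlock_unlock(&pernet_rwlock);

	return pernet_ops_count;
}
```

Calling run_demo() from a small main() and building with "cc -pthread"
should return 1 once both readers have passed the barrier. The sketch only
shows the reader/writer split; it says nothing about the RCU grace period
or the backlog question raised above.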