Date: Wed, 8 Feb 2017 10:17:19 -0600 (CST)
From: Christoph Lameter
To: Michal Hocko
cc: Thomas Gleixner, Mel Gorman, Vlastimil Babka, Dmitry Vyukov, Tejun Heo,
    "linux-mm@kvack.org", LKML, Ingo Molnar, Peter Zijlstra, syzkaller,
    Andrew Morton
Subject: Re: mm: deadlock between get_online_cpus/pcpu_alloc
In-Reply-To: <20170208152106.GP5686@dhcp22.suse.cz>

On Wed, 8 Feb 2017, Michal Hocko wrote:
> I have no idea what you are trying to say and how this is related to the
> deadlock we are discussing
> here. We certainly do not need to add
> stop_machine the problem. And yeah, dropping get_online_cpus was
> possible after considering all fallouts.

This is not the first time that get_online_cpus() has caused problems
because of the need to support processor hotplug. Hotplugging does not
happen frequently (and that is lowballing it: on almost all systems the
frequency of hotplug events is actually zero), so the constant check is
useless overhead and causes trouble for development. In particular,
get_online_cpus() is often needed in sections that need to hold locks.
So let's get rid of it.

The severity of these problems and the rarity of processor hotplug
events would justify allowing processors to be added and removed only
through the stop_machine_xx mechanism. With that in place, the processor
masks could be used without synchronization, and locking all over the
kernel would become simpler.

This would likely even improve the hotplug code, because the simpler form
of synchronization (a piece of code that executes while the OS is in a
stopped state) would allow more significant changes to be made to the
software environment. For example, one could consider removing memory
segments and perhaps per-cpu segments as well.
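A rough userspace model of what the proposal means for readers, offered only as an illustration under stated assumptions: it uses POSIX threads and C11 atomics, and stop_machine_model(), quiescent_point(), swap_cpu(), run_demo() and the 4-worker setup are all hypothetical stand-ins, not kernel APIs. Readers test the online mask with no lock and no refcount; the only writer changes it while every worker is parked, so even a two-step update is never seen half-done.

```c
/* Userspace model only: every name here is hypothetical, not a kernel API.
 * Build with: cc -std=c11 -pthread */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NWORKERS 4

static unsigned long online_mask = 0x0FUL; /* CPUs 0-3 online; read with no lock */
static atomic_bool stop_requested;
static atomic_bool done;
static atomic_long violations;             /* reads that saw a half-done update */
static int parked;                         /* protected by lock */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

static int count_bits(unsigned long m)
{
	int n = 0;
	for (; m; m &= m - 1)
		n++;
	return n;
}

/* Workers call this between units of work. It costs one atomic load
 * unless a stop is pending, in which case the worker parks. */
static void quiescent_point(void)
{
	if (!atomic_load(&stop_requested))
		return;
	pthread_mutex_lock(&lock);
	parked++;
	pthread_cond_broadcast(&cv);          /* wake the stopper */
	while (atomic_load(&stop_requested))
		pthread_cond_wait(&cv, &lock);
	parked--;
	pthread_mutex_unlock(&lock);
}

/* Reader: checks the mask with no lock and no refcount. Every hotplug
 * step keeps exactly 4 CPUs online, so any other count is a torn read. */
static void *worker(void *unused)
{
	(void)unused;
	while (!atomic_load(&done)) {
		if (count_bits(online_mask) != 4)
			atomic_fetch_add(&violations, 1);
		quiescent_point();
	}
	return NULL;
}

/* Stand-in for stop_machine(): run fn only once all workers are parked.
 * Assumes a single caller (the main thread). */
static void stop_machine_model(void (*fn)(void *), void *arg)
{
	pthread_mutex_lock(&lock);
	atomic_store(&stop_requested, true);
	while (parked < NWORKERS)
		pthread_cond_wait(&cv, &lock);
	fn(arg);                               /* no reader can run here */
	atomic_store(&stop_requested, false);
	pthread_cond_broadcast(&cv);          /* release the workers */
	pthread_mutex_unlock(&lock);
}

/* Hypothetical hotplug step: take cpus[0] offline, bring cpus[1] online.
 * Between the two writes only 3 bits are set. */
static void swap_cpu(void *arg)
{
	const int *cpus = arg;
	online_mask &= ~(1UL << cpus[0]);
	online_mask |= 1UL << cpus[1];
}

/* Run `events` hotplug steps against NWORKERS lock-free readers and
 * return the final mask. */
static unsigned long run_demo(int events)
{
	pthread_t tid[NWORKERS];

	for (int i = 0; i < NWORKERS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int e = 0; e < events; e++) {
		/* Alternate CPU 0 and CPU 4 (illustrative numbers). */
		int cpus[2] = { e % 2 ? 4 : 0, e % 2 ? 0 : 4 };
		stop_machine_model(swap_cpu, cpus);
	}
	atomic_store(&done, true);
	for (int i = 0; i < NWORKERS; i++)
		pthread_join(tid[i], NULL);
	return online_mask;
}
```

With an even number of events, run_demo(200) should return 0x0F with violations left at 0: the intermediate 3-CPU mask inside swap_cpu() exists only while every reader is parked. The cost moves entirely to the writer, which must wait for all workers to reach a quiescent point; that is the trade-off being argued for when hotplug events are this rare. In the kernel, stop_machine() gives the corresponding guarantee by running its callback while the other online CPUs spin with interrupts disabled.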