From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: David Hildenbrand <david@redhat.com>, Andrew Morton <akpm@linux-foundation.org>, linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mel Gorman <mgorman@suse.de>, Vlastimil Babka <vbabka@suse.cz>, "Kirill A. Shutemov" <kirill@shutemov.name>, Christopher Lameter <cl@linux.com>, Michael Ellerman <mpe@ellerman.id.au>, Linus Torvalds <torvalds@linux-foundation.org>, Gautham R Shenoy <ego@linux.vnet.ibm.com>, Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Subject: Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
Date: Thu, 2 Jul 2020 20:02:27 +0530
Message-ID: <20200702143227.GE17918@linux.vnet.ibm.com>
In-Reply-To: <20200702084123.GC18446@dhcp22.suse.cz>

* Michal Hocko <mhocko@kernel.org> [2020-07-02 10:41:23]:

> On Thu 02-07-20 12:14:08, Srikar Dronamraju wrote:
> > * Michal Hocko <mhocko@kernel.org> [2020-07-01 14:21:10]:
> >
> > > > >>>>> The autonuma problem sounds interesting but again this patch doesn't
> > > > >>>>> really solve the underlying problem because I strongly suspect that the
> > > > >>>>> problem is still there when a numa node gets all its memory offline as
> > > > >>>>> mentioned above.
> > >
> > > I would really appreciate a feedback to these two as well.
> >
> > 1. Its not just numactl that's to be fixed but all tools/utilities that
> > depend on /sys/devices/system/node/online. Are we saying to not rely/believe
> > in the output given by the kernel but do further verification?
>
> No, what we are saying is that even an online node might have zero
> number of online pages/cpus. So the online status is not really
> something that matters. If people are confused by that output then user
> space tools can make their confusion go away. I really do not understand
> why the kernel should do any logic there.

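As an illustration of the user-space check being debated here, a tool could flag "online but empty" nodes by comparing the node-state masks the kernel exports under /sys/devices/system/node: online, has_cpu and has_memory. This is a minimal sketch under that assumption, not code from this patch series; the helper names are hypothetical, and treating a missing mask file as empty is an assumption for older kernels.

```python
from __future__ import annotations

from pathlib import Path

SYSFS_NODE = Path("/sys/devices/system/node")


def parse_node_list(text: str) -> set[int]:
    """Parse a kernel list string such as "0-1,3" into {0, 1, 3}."""
    nodes: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty mask is printed as an empty line
        lo, _, hi = part.partition("-")
        nodes.update(range(int(lo), int(hi or lo) + 1))
    return nodes


def empty_nodes(online: str, has_cpu: str, has_memory: str) -> list[int]:
    """Return online nodes listed in neither has_cpu nor has_memory."""
    busy = parse_node_list(has_cpu) | parse_node_list(has_memory)
    return sorted(parse_node_list(online) - busy)


def read_mask(name: str, root: Path = SYSFS_NODE) -> str:
    """Read one node-state mask; a missing file is treated as an empty mask."""
    path = root / name
    return path.read_text() if path.exists() else ""


if __name__ == "__main__":
    for nid in empty_nodes(read_mask("online"), read_mask("has_cpu"), read_mask("has_memory")):
        print(f"node{nid}: online, but has no CPUs and no memory")
```

Parsing is kept separate from the sysfs reads so the same logic can be checked against captured mask strings from an affected LPAR.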
The user-facing teams say they are getting queries from users who cannot understand, from the tools or sysfs files, why a node is online but has no attached resources. It's the amount of time being spent on these issues that triggered the patch. Initially, even I thought this was a non-issue.

>
> > Also how would the user space differentiate between the case where the
> > Kernel missed marking a node as offline to the case where the memory was
> > offlined on a cpuless node but node wasn't offline?.
>
> What I am arguing is that those two shouldn't be any different. Really!
>
> > 2. Regarding the autonuma, the case of offline memory is user/admin driven,
> > so if there is a performance hit, its something that's driven by his
> > user/admin actions. Also how often do we see users offline complete memory
> > of cpuless node on a 2 node system?
>
> How often do we see crippled HW configurations like that? Really if
> autonuma should be made more clever for one case it should recognize the
> other as well.
>

Let's take a 16-socket PowerVM system and assume that 32 LPARs are created on that system, i.e. 2 LPARs per socket. (PowerVM has the final say on how the LPARs are created.) In such a case, we can expect 30 of the 32 LPARs to face this problem; only the 2 LPARs that actually run on socket 0 would have the correct configuration.

> > >
> > > This begs a question whether ppc can do the same thing?
> >
> > Certainly ppc can be made to adapt to this situation but that would be a
> > workaround. Do we have a reason why we think node 0 is unique and special?
>
> It is not. As replied in other email in this thread. I would hope for
> having less hacks in the numa initialization. Cleaning up the mess is
> would be a lot of work and testing on all NUMA capable architectures.
> This is a heritage from the past I am afraid.
> All that I am arguing here is that your touch to the generic code with a
> very simple looking patch might have side effects which are pretty much
> impossible to review. Moreover it seems that nothing but ppc really needs
> this treatment. So fixing it in ppc specific code sounds much more safe.
>
> Normally I would really push for a generic solution but after getting
> burned several times in this area I do not dare anymore. The problem is
> not in the code complexity but in how spread it is in places where you
> do not expect side effects.
>

I do understand and respect your viewpoint.

> --
> Michal Hocko
> SUSE Labs

--
Thanks and Regards
Srikar Dronamraju
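For illustration only, the 30-of-32 figure in the 16-socket PowerVM example above can be reproduced with a deliberately simplified model. The model is an assumption, not a description of PowerVM: each LPAR draws its CPUs and memory from a single socket, the LPAR's only populated NUMA node has the same id as that socket, and generic code marks node 0 online unconditionally. Real placement is decided by the hypervisor and may differ; the function name is hypothetical.

```python
def lpars_with_empty_node0(sockets: int, lpars_per_socket: int) -> int:
    """Count LPARs in which node 0 is online but has no CPUs or memory.

    Simplified model (assumption): every LPAR is backed by exactly one socket,
    its only populated NUMA node has the same id as that socket, and node 0 is
    marked online regardless of whether anything is attached to it.
    """
    affected = 0
    for socket in range(sockets):
        for _ in range(lpars_per_socket):
            populated = {socket}          # node backed by this LPAR's socket
            online = populated | {0}      # node 0 is forced online
            if online - populated:        # some online node has no resources
                affected += 1
    return affected


if __name__ == "__main__":
    print(lpars_with_empty_node0(sockets=16, lpars_per_socket=2))  # 30
```

Under these assumptions only the LPARs on socket 0 escape, because their populated node is node 0 itself.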