From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Christopher Lameter <cl@linux.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Gautham R Shenoy <ego@linux.vnet.ibm.com>,
	Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Subject: Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
Date: Thu, 2 Jul 2020 20:02:27 +0530	[thread overview]
Message-ID: <20200702143227.GE17918@linux.vnet.ibm.com> (raw)
In-Reply-To: <20200702084123.GC18446@dhcp22.suse.cz>

* Michal Hocko <mhocko@kernel.org> [2020-07-02 10:41:23]:

> On Thu 02-07-20 12:14:08, Srikar Dronamraju wrote:
> > * Michal Hocko <mhocko@kernel.org> [2020-07-01 14:21:10]:
> > 
> > > > >>>>> The autonuma problem sounds interesting but again this patch doesn't
> > > > >>>>> really solve the underlying problem because I strongly suspect that the
> > > > >>>>> problem is still there when a numa node gets all its memory offline as
> > > > >>>>> mentioned above.
> > > 
> > > I would really appreciate feedback on these two as well.
> > 
> > 1. It's not just numactl that needs to be fixed but all tools/utilities that
> > depend on /sys/devices/system/node/online. Are we saying not to rely on /
> > believe the output given by the kernel, but to do further verification?
> 
> No, what we are saying is that even an online node might have zero
> number of online pages/cpus. So the online status is not really
> something that matters. If people are confused by that output then user
> space tools can make their confusion go away. I really do not understand
> why the kernel should do any logic there.

The user-facing teams say they are getting queries from users who cannot
work out from the tools/sysfs files why a node is online but has no attached
resources. It is the amount of time being spent on these issues that
triggered this patch. Initially even I thought this was a non-issue.
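
Just to show what each such tool would have to duplicate if it cannot trust
the online mask alone, here is a rough userspace sketch (standard sysfs node
layout, i.e. nodeN/cpulist and nodeN/meminfo; fixed scan range and minimal
error handling, so illustrative only, not a real implementation):

/*
 * Rough sketch only: cross-check each node under /sys/devices/system/node
 * for attached CPUs and memory instead of trusting "online" alone.
 * Assumes the standard sysfs node layout; scan range and error handling
 * are deliberately minimal.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int node_has_cpus(int nid)
{
	char path[128], buf[256] = "";
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/cpulist", nid);
	f = fopen(path, "r");
	if (!f)
		return 0;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';
	fclose(f);
	/* an empty cpulist is just a newline */
	return buf[0] != '\0' && buf[0] != '\n';
}

static long node_mem_kb(int nid)
{
	char path[128], line[256];
	long kb = 0;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/meminfo", nid);
	f = fopen(path, "r");
	if (!f)
		return 0;
	/* look for "Node <nid> MemTotal: <kb> kB" */
	while (fgets(line, sizeof(line), f)) {
		char *p = strstr(line, "MemTotal:");
		if (p) {
			sscanf(p, "MemTotal: %ld", &kb);
			break;
		}
	}
	fclose(f);
	return kb;
}

int main(void)
{
	int nid;

	for (nid = 0; nid < 64; nid++) {	/* small fixed range for the sketch */
		char path[128];

		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d", nid);
		if (access(path, F_OK) != 0)
			continue;	/* node directory not present */
		printf("node%d: cpus=%s mem=%ld kB\n", nid,
		       node_has_cpus(nid) ? "yes" : "no", node_mem_kb(nid));
	}
	return 0;
}

That per-tool cross-checking is exactly the extra verification I was asking
about above.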

> 
> > Also, how would user space differentiate between the case where the
> > kernel missed marking a node as offline and the case where the memory was
> > offlined on a cpuless node but the node wasn't marked offline?
> 
> What I am arguing is that those two shouldn't be any different. Really!
> 
> > 2. Regarding autonuma, the case of offline memory is user/admin driven,
> > so if there is a performance hit, it is driven by the user's/admin's own
> > actions. Also, how often do we see users offline the complete memory of a
> > cpuless node on a 2 node system?
> 
> How often do we see crippled HW configurations like that? Really if
> autonuma should be made more clever for one case it should recognize the
> other as well.
> 

Let's take a 16 socket PowerVM system and assume that 32 lpars are created
on that system, i.e. 2 lpars for each socket. (PowerVM has the final say on
how the lpars are created.) In such a case, we can expect 30 out of the 32
lpars to face this problem, with only the 2 lpars that actually run on
socket 0 having the correct configuration.

> > > 
> > > This begs a question whether ppc can do the same thing?
> > 
> > Certainly ppc can be made to adapt to this situation but that would be a
> > workaround. Do we have a reason why we think node 0 is unique and special?
> 
> It is not. As replied in another email in this thread, I would hope for
> having fewer hacks in the numa initialization. Cleaning up the mess
> would be a lot of work and testing on all NUMA capable architectures.
> This is a heritage from the past I am afraid. All that I am arguing here
> is that your touch to the generic code with a very simple looking patch
> might have side effects which are pretty much impossible to review.
> Moreover it seems that nothing but ppc really needs this treatment.
> So fixing it in ppc specific code sounds much more safe.
> 
> Normally I would really push for a generic solution but after getting
> burned several times in this area I do not dare anymore. The problem is
> not in the code complexity but in how spread it is in places where you
> do not expect side effects.
> 

I do understand and respect your viewpoint.

> -- 
> Michal Hocko
> SUSE Labs

-- 
Thanks and Regards
Srikar Dronamraju
