LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Mel Gorman <mgorman@suse.de>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Christopher Lameter <cl@linux.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
Date: Mon, 4 May 2020 11:37:12 +0200
Message-ID: <20200504093712.GL22838@dhcp22.suse.cz> (raw)
In-Reply-To: <20200430071820.GF19958@linux.vnet.ibm.com>

On Thu 30-04-20 12:48:20, Srikar Dronamraju wrote:
> * Michal Hocko <mhocko@kernel.org> [2020-04-29 14:22:11]:
> 
> > On Wed 29-04-20 07:11:45, Srikar Dronamraju wrote:
> > > > > 
> > > > > By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that Node 0 is
> > > > > always online.
> > > > > 
> > > > > ...
> > > > >
> > > > > --- a/mm/page_alloc.c
> > > > > +++ b/mm/page_alloc.c
> > > > > @@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy);
> > > > >   */
> > > > >  nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
> > > > >  	[N_POSSIBLE] = NODE_MASK_ALL,
> > > > > +#ifdef CONFIG_NUMA
> > > > > +	[N_ONLINE] = NODE_MASK_NONE,
> > > > > +#else
> > > > >  	[N_ONLINE] = { { [0] = 1UL } },
> > > > > -#ifndef CONFIG_NUMA
> > > > >  	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
> > > > >  #ifdef CONFIG_HIGHMEM
> > > > >  	[N_HIGH_MEMORY] = { { [0] = 1UL } },
> > > > 
> > > > So on all other NUMA machines, when does node 0 get marked online?
> > > > 
> > > > This change means that for some time during boot, such machines will
> > > > now be running with node 0 marked as offline.  What are the
> > > > implications of this?  Will something break?
> > > 
> > > Till the nodes are detected, marking Node 0 as online tends to be redundant.
> > > Because the system doesn't know if its a NUMA or a non-NUMA system.
> > > Once we detect the nodes, we online them immediately. Hence I don't see any
> > > side-effects or negative implications of this change.
> > > 
> > > However if I am missing anything, please do let me know.
> > > 
> > > >From my part, I have tested this on
> > > 1. Non-NUMA Single node but CPUs and memory coming from zero node.
> > > 2. Non-NUMA Single node but CPUs and memory coming from non-zero node.
> > > 3. NUMA Multi node but with CPUs and memory from node 0.
> > > 4. NUMA Multi node but with no CPUs and memory from node 0.
> > 
> > Have you tested on something else than ppc? Each arch does the NUMA
> > setup separately and this is a big mess. E.g. x86 marks even memory less
> > nodes (see init_memory_less_node) as online.
> > 
> 
> while I have predominantly tested on ppc, I did test on X86 with CONFIG_NUMA
> enabled/disabled on both single node and multi node machines.
> However, I dont have a cpuless/memoryless x86 system.

This should be able to emulate inside kvm, I believe.

> > Honestly I have hard time to evaluate the effect of this patch. It makes
> > some sense to assume all nodes offline before they get online but this
> > is a land mine territory.
> > 
> > I am also not sure what kind of problem this is going to address. You
> > have mentioned numa balancing without many details.
> 
> 1. On a machine with just one node with node number not being 0,
> the current setup will end up showing 2 online nodes. And when there are
> more than one online nodes, numa_balancing gets enabled.
> 
> Without patch
> $ grep numa /proc/vmstat
> numa_hit 95179
> numa_miss 0
> numa_foreign 0
> numa_interleave 3764
> numa_local 95179
> numa_other 0
> numa_pte_updates 1206973                 <----------
> numa_huge_pte_updates 4654                 <----------
> numa_hint_faults 19560                 <----------
> numa_hint_faults_local 19560                 <----------
> numa_pages_migrated 0
> 
> 
> With patch
> $ grep numa /proc/vmstat 
> numa_hit 322338756
> numa_miss 0
> numa_foreign 0
> numa_interleave 3790
> numa_local 322338756
> numa_other 0
> numa_pte_updates 0                 <----------
> numa_huge_pte_updates 0                 <----------
> numa_hint_faults 0                 <----------
> numa_hint_faults_local 0                 <----------
> numa_pages_migrated 0
> 
> So we have a redundant page hinting numa faults which we can avoid.

interesting. Does this lead to any observable differences? Btw. it would
be really great to describe how the online state influences the numa
balancing.

> 2. Few people have complained about existence of this dummy node when
> parsing lscpu and numactl o/p. They somehow start to think that the tools
> are reporting incorrectly or the kernel is not able to recognize resources
> connected to the node.

Please be more specific.
-- 
Michal Hocko
SUSE Labs

  reply index

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-28  9:38 [PATCH v2 0/3] Offline memoryless cpuless node 0 Srikar Dronamraju
2020-04-28  9:38 ` [PATCH v2 1/3] powerpc/numa: Set numa_node for all possible cpus Srikar Dronamraju
2020-04-28  9:38 ` [PATCH v2 2/3] powerpc/numa: Prefer node id queried from vphn Srikar Dronamraju
2020-04-29  6:52   ` Gautham R Shenoy
2020-04-30  4:34     ` Srikar Dronamraju
2020-04-28  9:38 ` [PATCH v2 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline Srikar Dronamraju
2020-04-28 23:59   ` Andrew Morton
2020-04-29  1:41     ` Srikar Dronamraju
2020-04-29 12:22       ` Michal Hocko
2020-04-30  7:18         ` Srikar Dronamraju
2020-05-04  9:37           ` Michal Hocko [this message]
2020-05-08 13:03             ` Srikar Dronamraju
2020-05-08 13:39               ` David Hildenbrand
2020-05-08 13:42                 ` David Hildenbrand
2020-05-11 17:47                   ` Srikar Dronamraju
2020-05-12  7:49                     ` David Hildenbrand
2020-05-12 10:42                       ` Srikar Dronamraju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200504093712.GL22838@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mgorman@suse.de \
    --cc=mpe@ellerman.id.au \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git