Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: "Michal Suchánek" <msuchanek@suse.de>
To: Michal Hocko <mhocko@kernel.org>
Cc: David Hildenbrand <david@redhat.com>,
	Gautham R Shenoy <ego@linux.vnet.ibm.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>,
	Mel Gorman <mgorman@suse.de>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Andrew Morton <akpm@linux-foundation.org>,
	linuxppc-dev@lists.ozlabs.org, Christopher Lameter <cl@linux.com>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
Date: Fri, 3 Jul 2020 11:10:01 +0200
Message-ID: <20200703091001.GJ21462@kitsune.suse.cz> (raw)
In-Reply-To: <20200701122110.GT2369@dhcp22.suse.cz>

On Wed, Jul 01, 2020 at 02:21:10PM +0200, Michal Hocko wrote:
> On Wed 01-07-20 13:30:57, David Hildenbrand wrote:
> > On 01.07.20 13:06, David Hildenbrand wrote:
> > > On 01.07.20 13:01, Srikar Dronamraju wrote:
> > >> * David Hildenbrand <david@redhat.com> [2020-07-01 12:15:54]:
> > >>
> > >>> On 01.07.20 12:04, Srikar Dronamraju wrote:
> > >>>> * Michal Hocko <mhocko@kernel.org> [2020-07-01 10:42:00]:
> > >>>>
> > >>>>>
> > >>>>>>
> > >>>>>> 2. Also existence of dummy node also leads to inconsistent information. The
> > >>>>>> number of online nodes is inconsistent with the information in the
> > >>>>>> device-tree and resource-dump
> > >>>>>>
> > >>>>>> 3. When the dummy node is present, single node non-Numa systems end up showing
> > >>>>>> up as NUMA systems and numa_balancing gets enabled. This will mean we take
> > >>>>>> the hit from the unnecessary numa hinting faults.
> > >>>>>
> > >>>>> I have to say that I dislike the node online/offline state and directly
> > >>>>> exporting that to the userspace. Users should only care whether the node
> > >>>>> has memory/cpus. Numa nodes can be online without any memory. Just
> > >>>>> offline all the present memory blocks but do not physically hot remove
> > >>>>> them and you are in the same situation. If users are confused by an
> > >>>>> output of tools like numactl -H then those could be updated and hide
> > >>>>> nodes without any memory&cpus.
> > >>>>>
> > >>>>> The autonuma problem sounds interesting but again this patch doesn't
> > >>>>> really solve the underlying problem because I strongly suspect that the
> > >>>>> problem is still there when a numa node gets all its memory offline as
> > >>>>> mentioned above.
> 
> I would really appreciate a feedback to these two as well.
> 
> > >>>>> While I completely agree that making node 0 special is wrong, I have
> > >>>>> still hard time to review this very simply looking patch because all the
> > >>>>> numa initialization is so spread around that this might just blow up
> > >>>>> at unexpected places. IIRC we have discussed testing in the previous
> > >>>>> version and David has provided a way to emulate these configurations
> > >>>>> on x86. Did you manage to use those instruction for additional testing
> > >>>>> on other than ppc architectures?
> > >>>>>
> > >>>>
> > >>>> I have tried all the steps that David mentioned and reported back at
> > >>>> https://lore.kernel.org/lkml/20200511174731.GD1961@linux.vnet.ibm.com/t/#u
> > >>>>
> > >>>> As a summary, David's steps are still not creating a memoryless/cpuless on
> > >>>> x86 VM.
> > >>>
> > >>> Now, that is wrong. You get a memoryless/cpuless node, which is *not
> > >>> online*. Once you hotplug some memory, it will switch online. Once you
> > >>> remove memory, it will switch back offline.
> > >>>
> > >>
> > >> Let me clarify, we are looking for a node 0 which is cpuless/memoryless at
> > >> boot.  The code in question tries to handle a cpuless/memoryless node 0 at
> > >> boot.
> > > 
> > > I was just correcting your statement, because it was wrong.
> > > 
> > > Could be that x86 code maps PXM 1 to node 0 because PXM 1 does neither
> > > have CPUs nor memory. That would imply that we can, in fact, never have
> > > node 0 offline during boot.
> > > 
> > 
> > Yep, looks like it.
> > 
> > [    0.009726] SRAT: PXM 1 -> APIC 0x00 -> Node 0
> > [    0.009727] SRAT: PXM 1 -> APIC 0x01 -> Node 0
> > [    0.009727] SRAT: PXM 1 -> APIC 0x02 -> Node 0
> > [    0.009728] SRAT: PXM 1 -> APIC 0x03 -> Node 0
> > [    0.009731] ACPI: SRAT: Node 0 PXM 1 [mem 0x00000000-0x0009ffff]
> > [    0.009732] ACPI: SRAT: Node 0 PXM 1 [mem 0x00100000-0xbfffffff]
> > [    0.009733] ACPI: SRAT: Node 0 PXM 1 [mem 0x100000000-0x13fffffff]
> 
> This begs a question whether ppc can do the same thing?
Or x86 stop doing it so that you can see on what node you are running?

What's the point of this indirection other than another way of avoiding
empty node 0?

Thanks

Michal


  parent reply index

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-24  9:28 [PATCH v5 0/3] Offline memoryless cpuless node 0 Srikar Dronamraju
2020-06-24  9:28 ` [PATCH v5 1/3] powerpc/numa: Set numa_node for all possible cpus Srikar Dronamraju
2020-06-24  9:48   ` Gautham R Shenoy
2020-06-24  9:28 ` [PATCH v5 2/3] powerpc/numa: Prefer node id queried from vphn Srikar Dronamraju
2020-06-24 10:29   ` Gautham R Shenoy
2020-06-24  9:28 ` [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline Srikar Dronamraju
2020-06-29 14:58   ` Christopher Lameter
2020-06-30  4:01     ` Srikar Dronamraju
2020-07-01 12:23       ` Michal Hocko
2020-07-01  8:42   ` Michal Hocko
2020-07-01 10:04     ` Srikar Dronamraju
2020-07-01 10:15       ` David Hildenbrand
2020-07-01 11:01         ` Srikar Dronamraju
2020-07-01 11:06           ` David Hildenbrand
2020-07-01 11:30             ` David Hildenbrand
2020-07-01 12:21               ` Michal Hocko
2020-07-02  6:44                 ` Srikar Dronamraju
2020-07-02  8:41                   ` Michal Hocko
2020-07-02 14:32                     ` Srikar Dronamraju
2020-07-03  9:10                 ` Michal Suchánek [this message]
2020-07-03  9:24                   ` Michal Hocko
2020-07-03 10:59                     ` Michal Hocko
2020-07-03 11:32                       ` David Hildenbrand
2020-07-03 11:46                         ` Michal Hocko
2020-07-03 12:58                       ` Srikar Dronamraju
2020-08-07  4:32                         ` Andrew Morton
2020-08-07  6:58                           ` David Hildenbrand
2020-08-07 10:04                             ` Michal Suchánek
2020-07-06 16:08                     ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200703091001.GJ21462@kitsune.suse.cz \
    --to=msuchanek@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=cl@linux.com \
    --cc=david@redhat.com \
    --cc=ego@linux.vnet.ibm.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=sathnaga@linux.vnet.ibm.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git