From: Michal Hocko <firstname.lastname@example.org>
To: David Hildenbrand <email@example.com>
Cc: "Michal Suchánek" <firstname.lastname@example.org>,
	"Gautham R Shenoy" <email@example.com>,
	"Srikar Dronamraju" <firstname.lastname@example.org>,
	"Linus Torvalds" <email@example.com>,
	firstname.lastname@example.org, email@example.com,
	"Satheesh Rajendran" <firstname.lastname@example.org>,
	"Mel Gorman" <email@example.com>,
	"Kirill A. Shutemov" <firstname.lastname@example.org>,
	"Andrew Morton" <email@example.com>,
	firstname.lastname@example.org,
	"Christopher Lameter" <email@example.com>,
	"Vlastimil Babka" <firstname.lastname@example.org>,
	"Andi Kleen" <email@example.com>
Subject: Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
Date: Fri, 3 Jul 2020 13:46:03 +0200
Message-ID: <20200703114603.GU18446@dhcp22.suse.cz>
In-Reply-To: <firstname.lastname@example.org>

On Fri 03-07-20 13:32:21, David Hildenbrand wrote:
> On 03.07.20 12:59, Michal Hocko wrote:
> > On Fri 03-07-20 11:24:17, Michal Hocko wrote:
> >> [Cc Andi]
> >>
> >> On Fri 03-07-20 11:10:01, Michal Suchanek wrote:
> >>> On Wed, Jul 01, 2020 at 02:21:10PM +0200, Michal Hocko wrote:
> >>>> On Wed 01-07-20 13:30:57, David Hildenbrand wrote:
> >> [...]
> >>>>> Yep, looks like it.
> >>>>>
> >>>>> [ 0.009726] SRAT: PXM 1 -> APIC 0x00 -> Node 0
> >>>>> [ 0.009727] SRAT: PXM 1 -> APIC 0x01 -> Node 0
> >>>>> [ 0.009727] SRAT: PXM 1 -> APIC 0x02 -> Node 0
> >>>>> [ 0.009728] SRAT: PXM 1 -> APIC 0x03 -> Node 0
> >>>>> [ 0.009731] ACPI: SRAT: Node 0 PXM 1 [mem 0x00000000-0x0009ffff]
> >>>>> [ 0.009732] ACPI: SRAT: Node 0 PXM 1 [mem 0x00100000-0xbfffffff]
> >>>>> [ 0.009733] ACPI: SRAT: Node 0 PXM 1 [mem 0x100000000-0x13fffffff]
> >>>>
> >>>> This begs a question whether ppc can do the same thing?
> >>> Or x86 stop doing it so that you can see on what node you are running?
> >>>
> >>> What's the point of this indirection other than another way of avoiding
> >>> empty node 0?
> >>
> >> Honestly, I do not have any idea. I've traced it down to
> >> Author: Andi Kleen <email@example.com>
> >> Date: Tue Jan 11 15:35:48 2005 -0800
> >>
> >>     [PATCH] x86_64: Fix ACPI SRAT NUMA parsing
> >>
> >>     Fix fallout from the recent nodemask_t changes. The node ids assigned
> >>     in the SRAT parser were off by one.
> >>
> >>     I added a new first_unset_node() function to nodemask.h to allocate
> >>     IDs sanely.
> >>
> >>     Signed-off-by: Andi Kleen <firstname.lastname@example.org>
> >>     Signed-off-by: Linus Torvalds <email@example.com>
> >>
> >> which doesn't really tell all that much. The historical baggage and a
> >> long term behavior which is not really trivial to fix I suspect.
> >
> > Thinking about this some more, this logic makes some sense afterall.
> > Especially in the world without memory hotplug which was very likely the
> > case back then. It is much better to have compact node mask rather than
> > sparse one. After all node numbers shouldn't really matter as long as
> > you have a clear mapping to the HW. I am not sure we export that
> > information (except for the kernel ring buffer) though.
> >
> > The memory hotplug changes that somehow because you can hotremove numa
> > nodes and therefore make the nodemask sparse but that is not a common
> > case. I am not sure what would happen if a completely new node was added
> > and its corresponding node was already used by the renumbered one
> > though. It would likely conflate the two I am afraid. But I am not sure
> > this is really possible with x86 and a lack of a bug report would
> > suggest that nobody is doing that at least.
> >
>
> I think the ACPI code takes care of properly mapping PXM to nodes.
>
> So if I start with PXM 0 empty and PXM 1 populated, I will get
> PXM 1 == node 0 as described. Once I hotplug something to PXM 0 in QEMU
>
> $ echo "object_add memory-backend-ram,id=mem0,size=1G" | sudo nc -U /var/tmp/monitor
> $ echo "device_add pc-dimm,id=dimm0,memdev=mem0,node=0" | sudo nc -U /var/tmp/monitor
>
> $ echo "info numa" | sudo nc -U /var/tmp/monitor
> QEMU 5.0.50 monitor - type 'help' for more information
> (qemu) info numa
> 2 nodes
> node 0 cpus:
> node 0 size: 1024 MB
> node 0 plugged: 1024 MB
> node 1 cpus: 0 1 2 3
> node 1 size: 4096 MB
> node 1 plugged: 0 MB

Thanks for double checking.

> I get in the guest:
>
> [ 50.174435] ------------[ cut here ]------------
> [ 50.175436] node 1 was absent from the node_possible_map
> [ 50.176844] WARNING: CPU: 0 PID: 7 at mm/memory_hotplug.c:1021 add_memory_resource+0x8c/0x290

This would mean that the ACPI code or whoever does the remaping is not
adding the new node into possible nodes.

[...]
> I remember that we added that check just recently (due to powerpc if I am not wrong).
> Not sure why that triggers here.

This was a misbehaving Qemu IIRC providing a garbage map.

-- 
Michal Hocko
SUSE Labs