From: Stefano Stabellini <sstabellini@kernel.org>
To: Wei Chen <Wei.Chen@arm.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
"xen-devel@lists.xenproject.org"
<xen-devel@lists.xenproject.org>,
"julien@xen.org" <julien@xen.org>,
Bertrand Marquis <Bertrand.Marquis@arm.com>,
"jbeulich@suse.com" <jbeulich@suse.com>,
"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
"roger.pau@citrix.com" <roger.pau@citrix.com>,
"wl@xen.org" <wl@xen.org>
Subject: RE: [PATCH 08/37] xen/x86: add detection of discontinous node memory range
Date: Mon, 27 Sep 2021 10:19:25 -0700 (PDT)
Message-ID: <alpine.DEB.2.21.2109271018220.5022@sstabellini-ThinkPad-T480s>
In-Reply-To: <DB9PR08MB685772C5CDE9DF885A063F479EA79@DB9PR08MB6857.eurprd08.prod.outlook.com>
On Mon, 27 Sep 2021, Wei Chen wrote:
> > -----Original Message-----
> > From: Stefano Stabellini <sstabellini@kernel.org>
> > Sent: September 27, 2021 13:05
> > To: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> > julien@xen.org; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> > jbeulich@suse.com; andrew.cooper3@citrix.com; roger.pau@citrix.com;
> > wl@xen.org
> > Subject: RE: [PATCH 08/37] xen/x86: add detection of discontinous node
> > memory range
> >
> > On Sun, 26 Sep 2021, Stefano Stabellini wrote:
> > > On Sun, 26 Sep 2021, Wei Chen wrote:
> > > > > -----Original Message-----
> > > > > From: Stefano Stabellini <sstabellini@kernel.org>
> > > > > > Sent: September 25, 2021 3:53
> > > > > To: Wei Chen <Wei.Chen@arm.com>
> > > > > Cc: Stefano Stabellini <sstabellini@kernel.org>; xen-
> > > > > devel@lists.xenproject.org; julien@xen.org; Bertrand Marquis
> > > > > <Bertrand.Marquis@arm.com>; jbeulich@suse.com;
> > andrew.cooper3@citrix.com;
> > > > > roger.pau@citrix.com; wl@xen.org
> > > > > Subject: RE: [PATCH 08/37] xen/x86: add detection of discontinous
> > node
> > > > > memory range
> > > > >
> > > > > On Fri, 24 Sep 2021, Wei Chen wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: Stefano Stabellini <sstabellini@kernel.org>
> > > > > > > > Sent: September 24, 2021 8:26
> > > > > > > To: Wei Chen <Wei.Chen@arm.com>
> > > > > > > Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org;
> > > > > julien@xen.org;
> > > > > > > Bertrand Marquis <Bertrand.Marquis@arm.com>; jbeulich@suse.com;
> > > > > > > andrew.cooper3@citrix.com; roger.pau@citrix.com; wl@xen.org
> > > > > > > Subject: Re: [PATCH 08/37] xen/x86: add detection of
> > discontinous node
> > > > > > > memory range
> > > > > > >
> > > > > > > CC'ing x86 maintainers
> > > > > > >
> > > > > > > On Thu, 23 Sep 2021, Wei Chen wrote:
> > > > > > > > > One NUMA node may contain several memory blocks. In the current
> > > > > > > > > Xen code, Xen maintains a node memory range for each node to
> > > > > > > > > cover all its memory blocks. The problem is that if the gap
> > > > > > > > > between two of a node's memory blocks contains memory blocks
> > > > > > > > > that don't belong to this node (remote memory blocks), the
> > > > > > > > > node's memory range will be expanded to cover these remote
> > > > > > > > > memory blocks.
> > > > > > > > >
> > > > > > > > > A node's memory range containing other nodes' memory is
> > > > > > > > > obviously not reasonable. It means the current NUMA code can
> > > > > > > > > only support nodes with continuous memory blocks. However, on a
> > > > > > > > > physical machine, the addresses of multiple nodes can be
> > > > > > > > > interleaved.
> > > > > > > > >
> > > > > > > > > So in this patch, we add code to detect discontinuous memory
> > > > > > > > > blocks for one node. NUMA initialization will fail and error
> > > > > > > > > messages will be printed when Xen detects such a hardware
> > > > > > > > > configuration.
> > > > > > >
> > > > > > > > At least on ARM, it is not just memory that can be interleaved,
> > > > > > > > but also MMIO regions. For instance:
> > > > > > >
> > > > > > > node0 bank0 0-0x1000000
> > > > > > > MMIO 0x1000000-0x1002000
> > > > > > > Hole 0x1002000-0x2000000
> > > > > > > node0 bank1 0x2000000-0x3000000
> > > > > > >
> > > > > > > > So I am not familiar with the SRAT format, but I think on ARM
> > > > > > > > the check would look different: we would just look for multiple
> > > > > > > > memory ranges under a device_type = "memory" node of a NUMA node
> > > > > > > > in device tree.
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > > Should I include or refine the above message in the commit log?
> > > > >
> > > > > Let me ask you a question first.
> > > > >
> > > > > With the NUMA implementation of this patch series, can we deal with
> > > > > cases where each node has multiple memory banks, not interleaved?
> > > >
> > > > Yes.
> > > >
> > > > > > As an example:
> > > > >
> > > > > node0: 0x0 - 0x10000000
> > > > > MMIO : 0x10000000 - 0x20000000
> > > > > node0: 0x20000000 - 0x30000000
> > > > > MMIO : 0x30000000 - 0x50000000
> > > > > node1: 0x50000000 - 0x60000000
> > > > > MMIO : 0x60000000 - 0x80000000
> > > > > node2: 0x80000000 - 0x90000000
> > > > >
> > > > >
> > > > > > I assume we can deal with this case simply by setting node0 memory
> > > > > > to 0x0-0x30000000 even if there is actually something else, a
> > > > > > device, that doesn't belong to node0 in between the two node0 banks?
> > > >
> > > > > This configuration is rare in SoC design, but it is not impossible.
> > >
> > > Definitely, I have seen it before.
> > >
> > >
> > > > > > Is it only the interleaving of other nodes' memory that causes
> > > > > > issues? In other words, is only the following a problematic
> > > > > > scenario?
> > > > >
> > > > > node0: 0x0 - 0x10000000
> > > > > MMIO : 0x10000000 - 0x20000000
> > > > > node1: 0x20000000 - 0x30000000
> > > > > MMIO : 0x30000000 - 0x50000000
> > > > > node0: 0x50000000 - 0x60000000
> > > > >
> > > > > Because node1 is in between the two ranges of node0?
> > > > >
> > > >
> > > > But only device_type="memory" can be added to allocation.
> > > > For mmio there are two cases:
> > > > 1. mmio doesn't have NUMA id property.
> > > > 2. mmio has NUMA id property, just like some PCIe controllers.
> > > > But we don’t need to handle these kinds of MMIO devices
> > > > in memory block parsing. Because we don't need to allocate
> > > > memory from these mmio ranges. And for accessing, we need
> > > > a NUMA-aware PCIe controller driver or a generic NUMA-aware
> > > > MMIO accessing APIs.
> > >
> > > Yes, I am not too worried about devices with a NUMA id property because
> > > they are less common and this series doesn't handle them at all, right?
> > > I imagine they would be treated like any other device without NUMA
> > > awareness.
> > >
> > > I am thinking about the case where the memory of each NUMA node is made
> > > of multiple banks. I understand that this patch adds an explicit check
> > > for cases where these banks are interleaving, however there are many
> > > other cases where NUMA memory nodes are *not* interleaving but they are
> > > > still made of multiple discontinuous banks, like in the two examples
> > > > above.
> > >
> > > My question is whether this patch series in its current form can handle
> > > the two cases above correctly. If so, I am wondering how it works given
> > > that we only have a single "start" and "size" parameter per node.
> > >
> > > On the other hand if this series cannot handle the two cases above, my
> > > question is whether it would fail explicitly or not. The new
> > > check is_node_memory_continuous doesn't seem to be able to catch them.
> >
> >
> > Looking at numa_update_node_memblks, it is clear that the code is meant
> > to increase the range of each numa node to cover even MMIO regions in
> > between memory banks. Also see the comment at the top of the file:
> >
> > * Assumes all memory regions belonging to a single proximity domain
> > * are in one chunk. Holes between them will be included in the node.
> >
> > So if there are multiple banks for each node, start and end are
> > stretched to cover the holes between them, and it works as long as
> > memory banks of different NUMA nodes don't interleave.
> >
> > > I would appreciate it if you could add an in-code comment to explain this
> > > on top of numa_update_node_memblks.
>
> Yes, I will do it.

Thank you.

> > Have you had a chance to test this? If not it would be fantastic if you
> > could give it a quick test to make sure it works as intended: for
> > instance by creating multiple memory banks for each NUMA node by
> > > splitting a real bank into two smaller banks with a hole in between in
> > device tree, just for the sake of testing.
>
> Yes, I have created some fake NUMA nodes in FVP device tree to test it.
> > The intertwining of nodes' address ranges can be detected.
>
> (XEN) SRAT: Node 0 0000000080000000-00000000ff000000
> (XEN) SRAT: Node 1 0000000880000000-00000008c0000000
> (XEN) NODE 0: (0000000080000000-00000008d0000000) intertwine with NODE 1 (0000000880000000-00000008c0000000)

Great, thanks! And what happens if there are multiple non-contiguous
memory banks per node that are *not* intertwined? Does that all work
correctly as expected?
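
To make the two cases in this thread concrete, here is a minimal sketch in C.
It is not the Xen implementation: struct node_span, span_add_bank(),
spans_overlap() and nodes_intertwined() are hypothetical names invented for
illustration. It only assumes what the thread describes: each node keeps a
single start/end pair that is stretched to cover every bank and the holes
between them, and a configuration is rejected when the stretched ranges of two
nodes overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node span: one start/end pair per node. */
struct node_span {
    uint64_t start;
    uint64_t end;   /* exclusive */
    bool valid;     /* false until the first bank is added */
};

/* Stretch the node's span to cover [start, end); holes are absorbed. */
static void span_add_bank(struct node_span *n, uint64_t start, uint64_t end)
{
    assert(start < end);

    if (!n->valid) {
        n->start = start;
        n->end = end;
        n->valid = true;
        return;
    }
    if (start < n->start)
        n->start = start;
    if (end > n->end)
        n->end = end;
}

/* Two half-open ranges overlap when each starts before the other ends. */
static bool spans_overlap(const struct node_span *a, const struct node_span *b)
{
    return a->valid && b->valid && a->start < b->end && b->start < a->end;
}

/* True if the stretched spans of any two nodes overlap (intertwined). */
static bool nodes_intertwined(const struct node_span *nodes, unsigned int nr)
{
    for (unsigned int i = 0; i < nr; i++)
        for (unsigned int j = i + 1; j < nr; j++)
            if (spans_overlap(&nodes[i], &nodes[j]))
                return true;
    return false;
}
```

With the first layout quoted above (node0 banks at 0x0-0x10000000 and
0x20000000-0x30000000, then node1 and node2), node0 is stretched to
0x0-0x30000000 and no spans overlap. With the second layout (node1 sitting
between the two node0 banks), node0 is stretched to 0x0-0x60000000, which
overlaps node1, so the sketch reports the nodes as intertwined.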