On Fri, 24 Sep 2021, Wei Chen wrote:
> > -----Original Message-----
> > From: Stefano Stabellini
> > Sent: 24 September 2021 8:26
> > To: Wei Chen
> > Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> > Bertrand Marquis ; jbeulich@suse.com;
> > andrew.cooper3@citrix.com; roger.pau@citrix.com; wl@xen.org
> > Subject: Re: [PATCH 08/37] xen/x86: add detection of discontinous node
> > memory range
> >
> > CC'ing x86 maintainers
> >
> > On Thu, 23 Sep 2021, Wei Chen wrote:
> > > One NUMA node may contain several memory blocks. In the current Xen
> > > code, Xen maintains a node memory range for each node to cover all
> > > its memory blocks. But here comes the problem: if, in the gap between
> > > one node's two memory blocks, there are memory blocks that don't
> > > belong to this node (remote memory blocks), this node's memory range
> > > will be expanded to cover these remote memory blocks.
> > >
> > > One node's memory range containing other nodes' memory is obviously
> > > not reasonable. This means the current NUMA code can only support
> > > nodes with continuous memory blocks. However, on a physical machine,
> > > the addresses of multiple nodes can be interleaved.
> > >
> > > So in this patch, we add code to detect discontinuous memory blocks
> > > for one node. NUMA initialization will fail and error messages will
> > > be printed when Xen detects such a hardware configuration.
> >
> > At least on ARM, it is not just memory that can be interleaved, but
> > also MMIO regions. For instance:
> >
> > node0 bank0 0-0x1000000
> > MMIO        0x1000000-0x1002000
> > Hole        0x1002000-0x2000000
> > node0 bank1 0x2000000-0x3000000
> >
> > So I am not familiar with the SRAT format, but I think on ARM the
> > check would look different: we would just look for multiple memory
> > ranges under a device_type = "memory" node of a NUMA node in device
> > tree.
> >
>
> Should I include/refine the above message in the commit log?

Let me ask you a question first.

With the NUMA implementation of this patch series, can we deal with
cases where each node has multiple memory banks, not interleaved?
As an example:

node0: 0x0        - 0x10000000
MMIO : 0x10000000 - 0x20000000
node0: 0x20000000 - 0x30000000
MMIO : 0x30000000 - 0x50000000
node1: 0x50000000 - 0x60000000
MMIO : 0x60000000 - 0x80000000
node2: 0x80000000 - 0x90000000

I assume we can deal with this case simply by setting node0's memory to
0x0-0x30000000, even if there is actually something else, a device,
that doesn't belong to node0 in between the two node0 banks?

Is it only other nodes' memory being interleaved that causes issues? In
other words, is the following the only problematic scenario?

node0: 0x0        - 0x10000000
MMIO : 0x10000000 - 0x20000000
node1: 0x20000000 - 0x30000000
MMIO : 0x30000000 - 0x50000000
node0: 0x50000000 - 0x60000000

Because node1 is in between the two ranges of node0?

I am asking these questions because it is certainly possible to have
multiple memory ranges for each NUMA node in device tree, either by
specifying multiple ranges with a single "reg" property, or by
specifying multiple memory nodes with the same numa-node-id.
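
To make that last point concrete, here is a rough sketch of the two
device tree representations I have in mind, reusing the two node0 banks
from the first example above. The unit addresses and the assumed
#address-cells = <2> / #size-cells = <2> layout are only illustrative,
not taken from any real board:

    /* Option 1: one memory node per bank, both tagged with the same
       numa-node-id */
    memory@0 {
        device_type = "memory";
        reg = <0x0 0x00000000 0x0 0x10000000>;
        numa-node-id = <0>;
    };

    memory@20000000 {
        device_type = "memory";
        reg = <0x0 0x20000000 0x0 0x10000000>;
        numa-node-id = <0>;
    };

    /* Option 2: a single memory node carrying both ranges in one
       "reg" property */
    memory@0 {
        device_type = "memory";
        reg = <0x0 0x00000000 0x0 0x10000000>,
              <0x0 0x20000000 0x0 0x10000000>;
        numa-node-id = <0>;
    };

Either way, node0 ends up with two banks separated by a hole, which is
what prompted the questions above.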