* [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64
@ 2021-08-11 10:23 Wei Chen
  2021-08-11 10:23 ` [XEN RFC PATCH 01/40] tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf Wei Chen
                   ` (42 more replies)
  0 siblings, 43 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

Xen's memory allocator and scheduler are NUMA-aware, but only
x86 has implemented the architecture APIs needed to support
NUMA. Arm has provided a set of fake architecture APIs to stay
compatible with the NUMA-aware memory allocator and
scheduler.

Arm systems worked well as single-node NUMA systems with
these fake APIs, because there were no multi-node NUMA
systems on Arm. In recent years, however, more and more Arm
devices have been built as multi-node NUMA systems, such as
the TX2, some Hisilicon chips and the Ampere Altra.

This creates a new problem. When Xen runs on these Arm
devices, it still treats them as single-node SMP systems. The
NUMA affinity capability of Xen's memory allocator and scheduler
becomes meaningless, because they rely on input data that does
not reflect the real NUMA layout.

Xen still assumes that the access time to all memory is the
same for all CPUs. In reality, Xen may allocate memory to a VM
from different NUMA nodes with different access speeds. This
difference can be amplified by workloads inside the VM, causing
performance instability and timeouts.

So in this patch series, we implement a set of NUMA APIs that use
the device tree to describe the NUMA layout. We reuse most of the
x86 NUMA code to create and maintain the mapping between
memory and CPUs, and to create the distance matrix between any
two NUMA nodes. Except for ACPI and some x86-specific code, we
have moved the rest of the code to common. In the next stage, when
we implement ACPI-based NUMA for Arm64, we may move the ACPI NUMA
code to common too, but at the current stage we keep it x86-only.

This patch series has been tested and boots well on one
Arm64 NUMA machine and one HPE x86 NUMA machine.

Hongda Deng (2):
  xen/arm: return default DMA bit width when platform is not set
  xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0

Wei Chen (38):
  tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf
  xen/arm: Print a 64-bit number in hex from early uart
  xen/x86: Initialize memnodemapsize while faking NUMA node
  xen: decouple NUMA from ACPI in Kconfig
  xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  xen/x86: Move NUMA memory node map functions to common
  xen/x86: Move numa_add_cpu_node to common
  xen/x86: Move NR_NODE_MEMBLKS macro to common
  xen/x86: Move NUMA nodes and memory block ranges to common
  xen/x86: Move numa_initmem_init to common
  xen/arm: introduce numa_set_node for Arm
  xen/arm: set NUMA nodes max number to 64 by default
  xen/x86: move NUMA API from x86 header to common header
  xen/arm: Create a fake NUMA node to use common code
  xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
  xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
  xen: fdt: Introduce a helper to check fdt node type
  xen/arm: implement node distance helpers for Arm64
  xen/arm: introduce device_tree_numa as a switch for device tree NUMA
  xen/arm: introduce a helper to parse device tree processor node
  xen/arm: introduce a helper to parse device tree memory node
  xen/arm: introduce a helper to parse device tree NUMA distance map
  xen/arm: unified entry to parse all NUMA data from device tree
  xen/arm: Add boot and secondary CPU to NUMA system
  xen/arm: build CPU NUMA node map while creating cpu_logical_map
  xen/x86: decouple nodes_cover_memory with E820 map
  xen/arm: implement Arm arch helpers Arm to get memory map info
  xen: move NUMA memory and CPU parsed nodemasks to common
  xen/x86: move nodes_cover_memory to common
  xen/x86: make acpi_scan_nodes to be neutral
  xen: export bad_srat and srat_disabled to extern
  xen: move numa_scan_nodes from x86 to common
  xen: enable numa_scan_nodes for device tree based NUMA
  xen/arm: keep guest still be NUMA unware
  xen: introduce an arch helper to do NUMA init failed fallback
  xen/arm: enable device tree based NUMA in system init
  xen/x86: move numa_setup to common to support NUMA switch in command
    line
  xen/x86: move dump_numa info hotkey to common

 tools/libs/util/libxlu_pci.c    |   3 +-
 xen/arch/arm/Kconfig            |  10 +
 xen/arch/arm/Makefile           |   2 +
 xen/arch/arm/arm64/head.S       |   9 +-
 xen/arch/arm/bootfdt.c          |   8 +-
 xen/arch/arm/domain_build.c     |  17 +-
 xen/arch/arm/efi/efi-boot.h     |  25 --
 xen/arch/arm/numa.c             | 162 +++++++++
 xen/arch/arm/numa_device_tree.c | 292 ++++++++++++++++
 xen/arch/arm/platform.c         |   4 +-
 xen/arch/arm/setup.c            |  14 +
 xen/arch/arm/smpboot.c          |  37 +-
 xen/arch/x86/Kconfig            |   2 +-
 xen/arch/x86/numa.c             | 421 +----------------------
 xen/arch/x86/srat.c             | 147 +-------
 xen/common/Kconfig              |   3 +
 xen/common/Makefile             |   1 +
 xen/common/libfdt/fdt_ro.c      |  15 +
 xen/common/numa.c               | 588 ++++++++++++++++++++++++++++++++
 xen/common/page_alloc.c         |   2 +-
 xen/drivers/acpi/Kconfig        |   3 +-
 xen/drivers/acpi/Makefile       |   2 +-
 xen/include/asm-arm/numa.h      |  33 ++
 xen/include/asm-arm/setup.h     |   6 +
 xen/include/asm-x86/acpi.h      |   4 -
 xen/include/asm-x86/config.h    |   1 -
 xen/include/asm-x86/numa.h      |  65 +---
 xen/include/asm-x86/setup.h     |   1 -
 xen/include/xen/libfdt/libfdt.h |  25 ++
 xen/include/xen/nodemask.h      |   2 +
 xen/include/xen/numa.h          |  80 +++++
 31 files changed, 1325 insertions(+), 659 deletions(-)
 create mode 100644 xen/arch/arm/numa.c
 create mode 100644 xen/arch/arm/numa_device_tree.c
 create mode 100644 xen/common/numa.c

-- 
2.25.1




* [XEN RFC PATCH 01/40] tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-11 10:49   ` Jan Beulich
  2021-08-11 10:23 ` [XEN RFC PATCH 02/40] xen/arm: Print a 64-bit number in hex from early uart Wei Chen
                   ` (41 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

| libxlu_pci.c: In function 'xlu_pci_parse_bdf':
| libxlu_pci.c:32:18: error: 'func' may be used uninitialized in this function [-Werror=maybe-uninitialized]
|    32 |     pcidev->func = func;
|       |     ~~~~~~~~~~~~~^~~~~~
| libxlu_pci.c:51:29: note: 'func' was declared here
|    51 |     unsigned dom, bus, dev, func, vslot = 0;
|       |                             ^~~~
| libxlu_pci.c:31:17: error: 'dev' may be used uninitialized in this function [-Werror=maybe-uninitialized]
|    31 |     pcidev->dev = dev;
|       |     ~~~~~~~~~~~~^~~~~
| libxlu_pci.c:51:24: note: 'dev' was declared here
|    51 |     unsigned dom, bus, dev, func, vslot = 0;
|       |                        ^~~
| libxlu_pci.c:30:17: error: 'bus' may be used uninitialized in this function [-Werror=maybe-uninitialized]
|    30 |     pcidev->bus = bus;
|       |     ~~~~~~~~~~~~^~~~~
| libxlu_pci.c:51:19: note: 'bus' was declared here
|    51 |     unsigned dom, bus, dev, func, vslot = 0;
|       |                   ^~~
| libxlu_pci.c:78:26: error: 'dom' may be used uninitialized in this function [-Werror=maybe-uninitialized]
|    78 |                 if ( dom & ~0xff )
|       |                      ~~~~^~~~~~~

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 tools/libs/util/libxlu_pci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/libs/util/libxlu_pci.c b/tools/libs/util/libxlu_pci.c
index 551d8e3aed..b38e9aab40 100644
--- a/tools/libs/util/libxlu_pci.c
+++ b/tools/libs/util/libxlu_pci.c
@@ -15,7 +15,7 @@ static int parse_bdf(libxl_device_pci *pci, const char *str, const char **endp)
 {
     const char *ptr = str;
     unsigned int colons = 0;
-    unsigned int domain, bus, dev, func;
+    unsigned int domain = 0, bus = 0, dev = 0, func = 0;
     int n;
 
     /* Count occurrences of ':' to detrmine presence/absence of the 'domain' */
@@ -28,7 +28,6 @@ static int parse_bdf(libxl_device_pci *pci, const char *str, const char **endp)
     ptr = str;
     switch (colons) {
     case 1:
-        domain = 0;
         if (sscanf(ptr, "%x:%x.%n", &bus, &dev, &n) != 2)
             return ERROR_INVAL;
         break;
-- 
2.25.1




* [XEN RFC PATCH 02/40] xen/arm: Print a 64-bit number in hex from early uart
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
  2021-08-11 10:23 ` [XEN RFC PATCH 01/40] tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-19 13:05   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 03/40] xen/x86: Initialize memnodemapsize while faking NUMA node Wei Chen
                   ` (40 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

The current putn function used for early printing can
only print the low 32 bits of an AArch64 register. This
loses important information when debugging with the
early console. For example:
(XEN) Bringing up CPU5
- CPU 0000000100000100 booting -
will be truncated to
(XEN) Bringing up CPU5
- CPU 00000100 booting -

In this patch, we increase the number of print loops and
the shift amount so that putn prints a 64-bit number.
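
As a rough illustration of the loop change, here is a C model of the
same idea (it is not the head.S code itself). It emits sixteen nybbles,
taking the top four bits each time and shifting left by four. The name
format_hex64 and the output buffer are assumptions made for this
sketch; the real putn sends each character to the early UART instead.

```c
#include <stdint.h>

/*
 * Hypothetical C model of the 64-bit putn loop: take the top nybble,
 * convert it to a hex digit, then shift the value left by four bits.
 * 'out' must have room for 16 digits plus the terminating NUL.
 */
void format_hex64(uint64_t value, char out[17])
{
    static const char hex[] = "0123456789abcdef";
    int i;

    for ( i = 0; i < 16; i++ )       /* 16 nybbles in 64 bits (was 8) */
    {
        out[i] = hex[value >> 60];   /* top nybble (shift was 28) */
        value <<= 4;                 /* roll through one nybble at a time */
    }
    out[16] = '\0';
}
```

With only 8 iterations and a shift of 28, the upper half of a value
such as 0x0000000100000100 would never reach the output, which matches
the truncation described above.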

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/arm64/head.S | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index aa1f88c764..b32639d7d6 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -862,17 +862,18 @@ puts:
         ret
 ENDPROC(puts)
 
-/* Print a 32-bit number in hex.  Specific to the PL011 UART.
+/* Print a 64-bit number in hex.  Specific to the PL011 UART.
  * x0: Number to print.
  * x23: Early UART base address
  * Clobbers x0-x3 */
+#define PRINT_MASK 0xf000000000000000
 putn:
         adr   x1, hex
-        mov   x3, #8
+        mov   x3, #16
 1:
         early_uart_ready x23, 2
-        and   x2, x0, #0xf0000000    /* Mask off the top nybble */
-        lsr   x2, x2, #28
+        and   x2, x0, #PRINT_MASK    /* Mask off the top nybble */
+        lsr   x2, x2, #60
         ldrb  w2, [x1, x2]           /* Convert to a char */
         early_uart_transmit x23, w2
         lsl   x0, x0, #4             /* Roll it through one nybble at a time */
-- 
2.25.1




* [XEN RFC PATCH 03/40] xen/x86: Initialize memnodemapsize while faking NUMA node
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
  2021-08-11 10:23 ` [XEN RFC PATCH 01/40] tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf Wei Chen
  2021-08-11 10:23 ` [XEN RFC PATCH 02/40] xen/arm: Print a 64-bit number in hex from early uart Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-12 15:32   ` Jan Beulich
  2021-08-11 10:23 ` [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set Wei Chen
                   ` (39 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

When the system turns NUMA off or lacks NUMA support,
Xen fakes a NUMA node to make the system work as a
single-node NUMA system.

In this case the memory node map doesn't need to be allocated
from boot pages, but we should still set memnodemapsize to the
array size of _memnodemap. Xen did not do this, so it should
have hit an assertion in phys_to_nid. But because x86 was using
an empty macro "VIRTUAL_BUG_ON" in place of ASSERT, this bug was
never triggered.

In this patch, we set memnodemapsize to ARRAY_SIZE(_memnodemap)
to fix it.
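
To see why the size matters, here is a minimal standalone model of a
bounds-checked node lookup. It is a sketch under assumptions:
lookup_nid and fake_single_node are hypothetical names, the map is
reduced to a plain byte array, and the model returns -1 at the point
where a real assertion would fire.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define BITS_PER_LONG 64   /* assumption: 64-bit build */

static uint8_t _memnodemap[64];
static uint8_t *memnodemap;
static unsigned long memnodemapsize;  /* stays 0 unless explicitly set */
static int memnode_shift;

/* Hypothetical model of faking one node that covers all memory. */
void fake_single_node(int set_size)
{
    memnode_shift = BITS_PER_LONG - 1;
    memnodemap = _memnodemap;         /* zero-filled: everything is node 0 */
    if ( set_size )
        memnodemapsize = ARRAY_SIZE(_memnodemap);   /* the fix */
}

/* Returns the node id, or -1 where an assertion would trigger. */
int lookup_nid(uint64_t pdx)
{
    uint64_t idx = pdx >> memnode_shift;

    if ( idx >= memnodemapsize )
        return -1;
    return memnodemap[idx];
}
```

Without the size assignment every index, including 0, fails the bounds
check; with it, all lookups resolve to node 0.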

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/x86/numa.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index f1066c59c7..d23f4f7919 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -270,6 +270,8 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
     /* setup dummy node covering all memory */
     memnode_shift = BITS_PER_LONG - 1;
     memnodemap = _memnodemap;
+    memnodemapsize = ARRAY_SIZE(_memnodemap);
+
     nodes_clear(node_online_map);
     node_set_online(0);
     for ( i = 0; i < nr_cpu_ids; i++ )
-- 
2.25.1




* [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (2 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 03/40] xen/x86: Initialize memnodemapsize while faking NUMA node Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-11 10:54   ` Jan Beulich
  2021-08-19 13:28   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 05/40] xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0 Wei Chen
                   ` (38 subsequent siblings)
  42 siblings, 2 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

From: Hongda Deng <Hongda.Deng@arm.com>

In the current code, arch_get_dma_bitsize returns 32 when platform
or platform->dma_bitsize is not set. This is not reasonable: on Arm
we are not required to reserve DMA memory, so dma_bitsize is always
left at 0. On a non-NUMA system, arch_get_dma_bitsize is not invoked,
so dma_bitsize is not overridden by this function. But on a NUMA
system, once more than one node is online, this function is
invoked and dma_bitsize is limited to 32. That means only the
first 4GB of memory can be used for DMA, which goes against our
hardware design: we don't have that kind of restriction in hardware.
Only a platform setting should override dma_bitsize. So in this
patch, we return the default dma_bitsize when platform or
platform->dma_bitsize is not set.
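
The fallback can be modelled in isolation as follows. This is a sketch,
not the Xen code: struct platform_desc is cut down to one field, and
get_dma_bitsize is a hypothetical variant that takes the platform as a
parameter so the three cases are easy to exercise.

```c
#include <stddef.h>

/* Simplified stand-in for Xen's platform descriptor (assumption). */
struct platform_desc {
    unsigned int dma_bitsize;   /* 0 means "not set" */
};

/* Default DMA width; 0 means no DMA restriction is requested. */
unsigned int dma_bitsize;

/* Hypothetical variant of arch_get_dma_bitsize with explicit input. */
unsigned int get_dma_bitsize(const struct platform_desc *platform)
{
    return ( platform && platform->dma_bitsize ) ? platform->dma_bitsize
                                                 : dma_bitsize;
}
```

A missing platform, or a platform that leaves dma_bitsize at 0, now
yields the default value instead of a hardcoded 32.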

Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Hongda Deng <Hongda.Deng@arm.com>
---
 xen/arch/arm/platform.c | 4 +++-
 xen/common/page_alloc.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/platform.c b/xen/arch/arm/platform.c
index 4db5bbb4c5..0a27fef9a4 100644
--- a/xen/arch/arm/platform.c
+++ b/xen/arch/arm/platform.c
@@ -27,6 +27,7 @@ extern const struct platform_desc _splatform[], _eplatform[];
 /* Pointer to the current platform description */
 static const struct platform_desc *platform;
 
+extern unsigned int dma_bitsize;
 
 static bool __init platform_is_compatible(const struct platform_desc *plat)
 {
@@ -157,7 +158,8 @@ bool platform_device_is_blacklisted(const struct dt_device_node *node)
 
 unsigned int arch_get_dma_bitsize(void)
 {
-    return ( platform && platform->dma_bitsize ) ? platform->dma_bitsize : 32;
+    return ( platform && platform->dma_bitsize ) ? platform->dma_bitsize
+                                                 : dma_bitsize;
 }
 
 /*
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 958ba0cd92..0f0cae5a4e 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -227,7 +227,7 @@ static bool __read_mostly scrub_debug;
  * Bit width of the DMA heap -- used to override NUMA-node-first.
  * allocation strategy, which can otherwise exhaust low memory.
  */
-static unsigned int dma_bitsize;
+unsigned int dma_bitsize;
 integer_param("dma_bits", dma_bitsize);
 
 /* Offlined page list, protected by heap_lock. */
-- 
2.25.1




* [XEN RFC PATCH 05/40] xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (3 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-19 13:32   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig Wei Chen
                   ` (37 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

From: Hongda Deng <Hongda.Deng@arm.com>

In the previous patch, we made arch_get_dma_bitsize return 0 when
dma_bitsize and platform->dma_bitsize are not set. But this
affects lowmem_bitsize in allocate_memory_11 for domain0,
because this function depends on lowmem_bitsize to allocate memory
below 4GB.

In the current code, when arch_get_dma_bitsize returns 0,
lowmem_bitsize is set to 0. In this case, we get the "No bank has
been allocated below 0-bit." message while allocating domain0
memory, and lowmem is set to false.

This behavior is inconsistent with what allocate_memory_11 did
before, and doesn't meet this function's requirements. So we check
the return value of arch_get_dma_bitsize before setting
lowmem_bitsize, to avoid setting lowmem_bitsize to 0 by mistake.
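
The resulting selection rule is small enough to state as a standalone
function. This is only a sketch of the rule; pick_lowmem_bitsize is a
hypothetical name that does not appear in the patch.

```c
/* Hypothetical helper mirroring the clamp applied to lowmem_bitsize. */
unsigned int pick_lowmem_bitsize(unsigned int dma_bits)
{
    if ( dma_bits == 0 )
        return 32;   /* no DMA width set: still target memory below 4GB */

    return dma_bits < 32 ? dma_bits : 32;   /* same as min(32U, dma_bits) */
}
```

A return value of 0 from arch_get_dma_bitsize therefore maps to 32,
while any non-zero width is clamped to at most 32 as before.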

Signed-off-by: Wei Chen <wei.chen@arm.com>
Signed-off-by: Hongda Deng <Hongda.Deng@arm.com>
---
 xen/arch/arm/domain_build.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 6c86d52781..cf341f349f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -265,9 +265,18 @@ static void __init allocate_memory_11(struct domain *d,
     int i;
 
     bool lowmem = true;
-    unsigned int lowmem_bitsize = min(32U, arch_get_dma_bitsize());
+    unsigned int lowmem_bitsize = arch_get_dma_bitsize();
     unsigned int bits;
 
+    /*
+       When dma_bitsize and platform->dma_bitsize are not set,
+       arch_get_dma_bitsize will return 0. That means this system
+       doesn't need to reserve memory for DMA. But in order to
+       meet above requirements, we still need to try to allocate
+       memory below 4GB for Dom0.
+    */
+    lowmem_bitsize = lowmem_bitsize ? min(32U, lowmem_bitsize) : 32U;
+
     /*
      * TODO: Implement memory bank allocation when DOM0 is not direct
      * mapped
-- 
2.25.1




* [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (4 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 05/40] xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0 Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-12 15:36   ` Jan Beulich
  2021-08-12 16:54   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API Wei Chen
                   ` (36 subsequent siblings)
  42 siblings, 2 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

The current Xen code only implements x86 ACPI-based NUMA support.
So in the Xen Kconfig system, NUMA is equivalent to ACPI_NUMA. x86
selects NUMA by default, and CONFIG_ACPI_NUMA is hardcoded in
config.h.

In this patch series, we introduce device tree based NUMA for
Arm. That means we will have two NUMA implementations, so in this
patch we decouple NUMA from ACPI-based NUMA in Kconfig and make NUMA
a common feature that device tree based NUMA can also select.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/x86/Kconfig         | 2 +-
 xen/common/Kconfig           | 3 +++
 xen/drivers/acpi/Kconfig     | 3 ++-
 xen/drivers/acpi/Makefile    | 2 +-
 xen/include/asm-x86/config.h | 1 -
 5 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 9b164db641..7414aef113 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -24,7 +24,7 @@ config X86
 	select HAS_UBSAN
 	select HAS_VPCI if HVM
 	select NEEDS_LIBELF
-	select NUMA
+	select ACPI_NUMA
 
 config ARCH_DEFCONFIG
 	string
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 0ddd18e11a..b1f1145613 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -67,6 +67,9 @@ config MEM_ACCESS
 config NEEDS_LIBELF
 	bool
 
+config NUMA
+	bool
+
 menu "Speculative hardening"
 
 config SPECULATIVE_HARDEN_ARRAY
diff --git a/xen/drivers/acpi/Kconfig b/xen/drivers/acpi/Kconfig
index b64d3731fb..e3f3d8f4b1 100644
--- a/xen/drivers/acpi/Kconfig
+++ b/xen/drivers/acpi/Kconfig
@@ -5,5 +5,6 @@ config ACPI
 config ACPI_LEGACY_TABLES_LOOKUP
 	bool
 
-config NUMA
+config ACPI_NUMA
 	bool
+	select NUMA
diff --git a/xen/drivers/acpi/Makefile b/xen/drivers/acpi/Makefile
index 4f8e97228e..2fc5230253 100644
--- a/xen/drivers/acpi/Makefile
+++ b/xen/drivers/acpi/Makefile
@@ -3,7 +3,7 @@ obj-y += utilities/
 obj-$(CONFIG_X86) += apei/
 
 obj-bin-y += tables.init.o
-obj-$(CONFIG_NUMA) += numa.o
+obj-$(CONFIG_ACPI_NUMA) += numa.o
 obj-y += osl.o
 obj-$(CONFIG_HAS_CPUFREQ) += pmstat.o
 
diff --git a/xen/include/asm-x86/config.h b/xen/include/asm-x86/config.h
index 883c2ef0df..9a6f0a6edf 100644
--- a/xen/include/asm-x86/config.h
+++ b/xen/include/asm-x86/config.h
@@ -31,7 +31,6 @@
 /* Intel P4 currently has largest cache line (L2 line size is 128 bytes). */
 #define CONFIG_X86_L1_CACHE_SHIFT 7
 
-#define CONFIG_ACPI_NUMA 1
 #define CONFIG_ACPI_SRAT 1
 #define CONFIG_ACPI_CSTATE 1
 
-- 
2.25.1




* [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (5 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-19 13:34   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 08/40] xen/x86: Move NUMA memory node map functions to common Wei Chen
                   ` (35 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

Only Arm64 supports NUMA; CONFIG_NUMA cannot be
enabled for Arm32. Even on Arm64, users can still disable
CONFIG_NUMA through the Kconfig option. In this case, keeping
the current fake NUMA API allows Arm code to keep working
with the NUMA-aware memory allocator and scheduler.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/include/asm-arm/numa.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index 31a6de4e23..ab9c4a2448 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -5,6 +5,8 @@
 
 typedef u8 nodeid_t;
 
+#if !defined(CONFIG_NUMA)
+
 /* Fake one node for now. See also node_online_map. */
 #define cpu_to_node(cpu) 0
 #define node_to_cpumask(node)   (cpu_online_map)
@@ -25,6 +27,8 @@ extern mfn_t first_valid_mfn;
 #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
 #define __node_distance(a, b) (20)
 
+#endif
+
 #endif /* __ARCH_ARM_NUMA_H */
 /*
  * Local variables:
-- 
2.25.1




* [XEN RFC PATCH 08/40] xen/x86: Move NUMA memory node map functions to common
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (6 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-23 17:47   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 09/40] xen/x86: Move numa_add_cpu_node " Wei Chen
                   ` (34 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

In later patches we will add NUMA support to Arm. Arm
NUMA support will use the same memory node map management
as x86, so this part of the code can be common. We therefore
move it from arch/x86 to common.
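
For readers unfamiliar with the memory node map, the following
standalone sketch models the idea behind the moved functions. It is a
simplification under assumptions: struct range, pick_shift and fill_map
are hypothetical names, page indexes are used directly instead of
paddr_to_pdx, and the overlap detection of the real code is left out.

```c
#include <stdint.h>

#define BITS_PER_LONG 64   /* assumption: 64-bit build */

struct range { uint64_t spdx, epdx; };   /* [spdx, epdx) in page indexes */

/* Lowest set bit across all start indexes gives the largest usable shift. */
int pick_shift(const struct range *r, int n)
{
    uint64_t bitfield = 0;
    int i, shift = 0;

    for ( i = 0; i < n; i++ )
        bitfield |= r[i].spdx;
    if ( n <= 1 || bitfield == 0 )
        return BITS_PER_LONG - 1;
    while ( !(bitfield & 1) )
    {
        bitfield >>= 1;
        shift++;
    }
    return shift;
}

/* Fill map[] so that map[pdx >> shift] is the node owning pdx. */
int fill_map(uint8_t *map, uint64_t size, const struct range *r, int n,
             int shift)
{
    uint64_t pdx;
    int i;

    for ( i = 0; i < n; i++ )
        for ( pdx = r[i].spdx; pdx < r[i].epdx; pdx += (uint64_t)1 << shift )
        {
            if ( (pdx >> shift) >= size )
                return -1;           /* map too small for this shift */
            map[pdx >> shift] = (uint8_t)i;
        }
    return 0;
}
```

Once the map is filled, finding the node of a page is a single shift
and array access, which is why the map is kept small by choosing the
largest shift that still separates the nodes.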

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/x86/numa.c        | 114 --------------------------------
 xen/common/Makefile        |   1 +
 xen/common/numa.c          | 131 +++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/numa.h |  29 --------
 xen/include/xen/numa.h     |  35 ++++++++++
 5 files changed, 167 insertions(+), 143 deletions(-)
 create mode 100644 xen/common/numa.c

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index d23f4f7919..a6211be121 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -29,14 +29,6 @@ custom_param("numa", numa_setup);
 /* from proto.h */
 #define round_up(x,y) ((((x)+(y))-1) & (~((y)-1)))
 
-struct node_data node_data[MAX_NUMNODES];
-
-/* Mapping from pdx to node id */
-int memnode_shift;
-static typeof(*memnodemap) _memnodemap[64];
-unsigned long memnodemapsize;
-u8 *memnodemap;
-
 nodeid_t cpu_to_node[NR_CPUS] __read_mostly = {
     [0 ... NR_CPUS-1] = NUMA_NO_NODE
 };
@@ -58,112 +50,6 @@ int srat_disabled(void)
     return numa_off || acpi_numa < 0;
 }
 
-/*
- * Given a shift value, try to populate memnodemap[]
- * Returns :
- * 1 if OK
- * 0 if memnodmap[] too small (of shift too small)
- * -1 if node overlap or lost ram (shift too big)
- */
-static int __init populate_memnodemap(const struct node *nodes,
-                                      int numnodes, int shift, nodeid_t *nodeids)
-{
-    unsigned long spdx, epdx;
-    int i, res = -1;
-
-    memset(memnodemap, NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap));
-    for ( i = 0; i < numnodes; i++ )
-    {
-        spdx = paddr_to_pdx(nodes[i].start);
-        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
-        if ( spdx >= epdx )
-            continue;
-        if ( (epdx >> shift) >= memnodemapsize )
-            return 0;
-        do {
-            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
-                return -1;
-
-            if ( !nodeids )
-                memnodemap[spdx >> shift] = i;
-            else
-                memnodemap[spdx >> shift] = nodeids[i];
-
-            spdx += (1UL << shift);
-        } while ( spdx < epdx );
-        res = 1;
-    }
-
-    return res;
-}
-
-static int __init allocate_cachealigned_memnodemap(void)
-{
-    unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap));
-    unsigned long mfn = mfn_x(alloc_boot_pages(size, 1));
-
-    memnodemap = mfn_to_virt(mfn);
-    mfn <<= PAGE_SHIFT;
-    size <<= PAGE_SHIFT;
-    printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
-           mfn, mfn + size);
-    memnodemapsize = size / sizeof(*memnodemap);
-
-    return 0;
-}
-
-/*
- * The LSB of all start and end addresses in the node map is the value of the
- * maximum possible shift.
- */
-static int __init extract_lsb_from_nodes(const struct node *nodes,
-                                         int numnodes)
-{
-    int i, nodes_used = 0;
-    unsigned long spdx, epdx;
-    unsigned long bitfield = 0, memtop = 0;
-
-    for ( i = 0; i < numnodes; i++ )
-    {
-        spdx = paddr_to_pdx(nodes[i].start);
-        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
-        if ( spdx >= epdx )
-            continue;
-        bitfield |= spdx;
-        nodes_used++;
-        if ( epdx > memtop )
-            memtop = epdx;
-    }
-    if ( nodes_used <= 1 )
-        i = BITS_PER_LONG - 1;
-    else
-        i = find_first_bit(&bitfield, sizeof(unsigned long)*8);
-    memnodemapsize = (memtop >> i) + 1;
-    return i;
-}
-
-int __init compute_hash_shift(struct node *nodes, int numnodes,
-                              nodeid_t *nodeids)
-{
-    int shift;
-
-    shift = extract_lsb_from_nodes(nodes, numnodes);
-    if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) )
-        memnodemap = _memnodemap;
-    else if ( allocate_cachealigned_memnodemap() )
-        return -1;
-    printk(KERN_DEBUG "NUMA: Using %d for the hash shift.\n", shift);
-
-    if ( populate_memnodemap(nodes, numnodes, shift, nodeids) != 1 )
-    {
-        printk(KERN_INFO "Your memory is not aligned you need to "
-               "rebuild your hypervisor with a bigger NODEMAPSIZE "
-               "shift=%d\n", shift);
-        return -1;
-    }
-
-    return shift;
-}
 /* initialize NODE_DATA given nodeid and start/end */
 void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
 { 
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 54de70d422..f8f667e90a 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -54,6 +54,7 @@ obj-y += wait.o
 obj-bin-y += warning.init.o
 obj-$(CONFIG_XENOPROF) += xenoprof.o
 obj-y += xmalloc_tlsf.o
+obj-$(CONFIG_NUMA) += numa.o
 
 obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma lzo unlzo unlz4 unzstd earlycpio,$(n).init.o)
 
diff --git a/xen/common/numa.c b/xen/common/numa.c
new file mode 100644
index 0000000000..e65b6a6676
--- /dev/null
+++ b/xen/common/numa.c
@@ -0,0 +1,131 @@
+/*
+ * Generic VM initialization for x86-64 NUMA setups.
+ * Copyright 2002,2003 Andi Kleen, SuSE Labs.
+ * Adapted for Xen: Ryan Harper <ryanh@us.ibm.com>
+ */
+
+#include <xen/mm.h>
+#include <xen/string.h>
+#include <xen/init.h>
+#include <xen/ctype.h>
+#include <xen/nodemask.h>
+#include <xen/numa.h>
+#include <xen/time.h>
+#include <xen/smp.h>
+#include <xen/pfn.h>
+#include <xen/sched.h>
+
+struct node_data node_data[MAX_NUMNODES];
+
+/* Mapping from pdx to node id */
+int memnode_shift;
+typeof(*memnodemap) _memnodemap[64];
+unsigned long memnodemapsize;
+u8 *memnodemap;
+
+/*
+ * Given a shift value, try to populate memnodemap[]
+ * Returns :
+ * 1 if OK
+ * 0 if memnodmap[] too small (of shift too small)
+ * -1 if node overlap or lost ram (shift too big)
+ */
+static int __init populate_memnodemap(const struct node *nodes,
+                                      int numnodes, int shift, nodeid_t *nodeids)
+{
+    unsigned long spdx, epdx;
+    int i, res = -1;
+
+    memset(memnodemap, NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap));
+    for ( i = 0; i < numnodes; i++ )
+    {
+        spdx = paddr_to_pdx(nodes[i].start);
+        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
+        if ( spdx >= epdx )
+            continue;
+        if ( (epdx >> shift) >= memnodemapsize )
+            return 0;
+        do {
+            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
+                return -1;
+
+            if ( !nodeids )
+                memnodemap[spdx >> shift] = i;
+            else
+                memnodemap[spdx >> shift] = nodeids[i];
+
+            spdx += (1UL << shift);
+        } while ( spdx < epdx );
+        res = 1;
+    }
+
+    return res;
+}
+
+static int __init allocate_cachealigned_memnodemap(void)
+{
+    unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap));
+    unsigned long mfn = mfn_x(alloc_boot_pages(size, 1));
+
+    memnodemap = mfn_to_virt(mfn);
+    mfn <<= PAGE_SHIFT;
+    size <<= PAGE_SHIFT;
+    printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
+           mfn, mfn + size);
+    memnodemapsize = size / sizeof(*memnodemap);
+
+    return 0;
+}
+
+/*
+ * The LSB of all start and end addresses in the node map is the value of the
+ * maximum possible shift.
+ */
+static int __init extract_lsb_from_nodes(const struct node *nodes,
+                                         int numnodes)
+{
+    int i, nodes_used = 0;
+    unsigned long spdx, epdx;
+    unsigned long bitfield = 0, memtop = 0;
+
+    for ( i = 0; i < numnodes; i++ )
+    {
+        spdx = paddr_to_pdx(nodes[i].start);
+        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
+        if ( spdx >= epdx )
+            continue;
+        bitfield |= spdx;
+        nodes_used++;
+        if ( epdx > memtop )
+            memtop = epdx;
+    }
+    if ( nodes_used <= 1 )
+        i = BITS_PER_LONG - 1;
+    else
+        i = find_first_bit(&bitfield, sizeof(unsigned long)*8);
+    memnodemapsize = (memtop >> i) + 1;
+    return i;
+}
+
+int __init compute_hash_shift(struct node *nodes, int numnodes,
+                              nodeid_t *nodeids)
+{
+    int shift;
+
+    shift = extract_lsb_from_nodes(nodes, numnodes);
+    if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) )
+        memnodemap = _memnodemap;
+    else if ( allocate_cachealigned_memnodemap() )
+        return -1;
+    printk(KERN_DEBUG "NUMA: Using %d for the hash shift.\n", shift);
+
+    if ( populate_memnodemap(nodes, numnodes, shift, nodeids) != 1 )
+    {
+        printk(KERN_INFO "Your memory is not aligned you need to "
+               "rebuild your hypervisor with a bigger NODEMAPSIZE "
+               "shift=%d\n", shift);
+        return -1;
+    }
+
+    return shift;
+}
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index bada2c0bb9..abe5617d01 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -26,7 +26,6 @@ extern int compute_hash_shift(struct node *nodes, int numnodes,
 extern nodeid_t pxm_to_node(unsigned int pxm);
 
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
-#define VIRTUAL_BUG_ON(x) 
 
 extern void numa_add_cpu(int cpu);
 extern void numa_init_array(void);
@@ -47,34 +46,6 @@ static inline void clear_node_cpumask(int cpu)
 	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
 }
 
-/* Simple perfect hash to map pdx to node numbers */
-extern int memnode_shift; 
-extern unsigned long memnodemapsize;
-extern u8 *memnodemap;
-
-struct node_data {
-    unsigned long node_start_pfn;
-    unsigned long node_spanned_pages;
-};
-
-extern struct node_data node_data[];
-
-static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
-{ 
-	nodeid_t nid;
-	VIRTUAL_BUG_ON((paddr_to_pdx(addr) >> memnode_shift) >= memnodemapsize);
-	nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift]; 
-	VIRTUAL_BUG_ON(nid >= MAX_NUMNODES || !node_data[nid]); 
-	return nid; 
-} 
-
-#define NODE_DATA(nid)		(&(node_data[nid]))
-
-#define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
-#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
-#define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
-				 NODE_DATA(nid)->node_spanned_pages)
-
 extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
 
 void srat_parse_regions(u64 addr);
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index 7aef1a88dc..39e8a4e00a 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -18,4 +18,39 @@
   (((d)->vcpu != NULL && (d)->vcpu[0] != NULL) \
    ? vcpu_to_node((d)->vcpu[0]) : NUMA_NO_NODE)
 
+/* The following content can be used when NUMA feature is enabled */
+#if defined(CONFIG_NUMA)
+
+/* Simple perfect hash to map pdx to node numbers */
+extern int memnode_shift;
+extern unsigned long memnodemapsize;
+extern u8 *memnodemap;
+extern typeof(*memnodemap) _memnodemap[64];
+
+struct node_data {
+    unsigned long node_start_pfn;
+    unsigned long node_spanned_pages;
+};
+
+extern struct node_data node_data[];
+#define VIRTUAL_BUG_ON(x)
+
+static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
+{
+	nodeid_t nid;
+	VIRTUAL_BUG_ON((paddr_to_pdx(addr) >> memnode_shift) >= memnodemapsize);
+	nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
+	VIRTUAL_BUG_ON(nid >= MAX_NUMNODES || !node_data[nid]);
+	return nid;
+}
+
+#define NODE_DATA(nid)		(&(node_data[nid]))
+
+#define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
+#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
+#define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
+				 NODE_DATA(nid)->node_spanned_pages)
+
+#endif /* CONFIG_NUMA */
+
 #endif /* _XEN_NUMA_H */
-- 
2.25.1
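
For readers following the memnodemap code moved above, here is a minimal
standalone sketch of the same idea: pick the largest shift that still
separates the ranges, fill a small table indexed by (index >> shift), and
look nodes up through it. It is not the Xen code. All sketch_* names, the
plain page-index ranges (instead of pdx values) and the fixed 64-entry
table are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NO_NODE  0xFF  /* hypothetical stand-in for NUMA_NO_NODE */
#define SKETCH_MAP_SIZE 64    /* mirrors the 64-entry static map */

/* Hypothetical range type: [start, end) in page-index units. */
struct sketch_range { uint64_t start, end; };

static uint8_t sketch_map[SKETCH_MAP_SIZE];

/* Largest usable shift: the lowest set bit across all range starts. */
static int sketch_max_shift(const struct sketch_range *r, int n)
{
    uint64_t bits = 0;
    int i, used = 0, shift = 0;

    for ( i = 0; i < n; i++ )
    {
        if ( r[i].start >= r[i].end )
            continue;               /* skip empty ranges */
        bits |= r[i].start;
        used++;
    }
    if ( used <= 1 || bits == 0 )
        return 63;                  /* one bucket covers everything */
    while ( !(bits & 1) )
    {
        bits >>= 1;
        shift++;
    }
    return shift;
}

/* Returns 1 if OK, 0 if the map is too small, -1 on overlap. */
static int sketch_populate(const struct sketch_range *r, int n, int shift)
{
    int i, res = -1;

    memset(sketch_map, SKETCH_NO_NODE, sizeof(sketch_map));
    for ( i = 0; i < n; i++ )
    {
        uint64_t s = r[i].start, e = r[i].end;

        if ( s >= e )
            continue;
        if ( (e >> shift) >= SKETCH_MAP_SIZE )
            return 0;
        do {
            if ( sketch_map[s >> shift] != SKETCH_NO_NODE )
                return -1;          /* two ranges share a bucket */
            sketch_map[s >> shift] = (uint8_t)i;
            s += (uint64_t)1 << shift;
        } while ( s < e );
        res = 1;
    }
    return res;
}

/* Table lookup, analogous in spirit to phys_to_nid(). */
static uint8_t sketch_lookup(uint64_t idx, int shift)
{
    assert((idx >> shift) < SKETCH_MAP_SIZE);
    return sketch_map[idx >> shift];
}
```

With two ranges [0x0000, 0x4000) and [0x4000, 0x8000), the sketch picks a
shift of 14 and needs only two table entries. Two ranges that share a
bucket make sketch_populate() return -1, which corresponds to the "node
overlap" case in the comment above populate_memnodemap().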




* [XEN RFC PATCH 09/40] xen/x86: Move numa_add_cpu_node to common
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (7 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 08/40] xen/x86: Move NUMA memory node map functions to common Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-23 17:54   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 10/40] xen/x86: Move NR_NODE_MEMBLKS macro " Wei Chen
                   ` (33 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

This function will be reused by Arm later, so move it from
arch/x86 to common. cpu_to_node and node_to_cpumask stay in
the x86 header file for now, because x86 and Arm implement
them differently. They will move to the common header file
when the Arm implementation changes in later patches.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
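As a rough illustration of what numa_add_cpu() does with the two tables
moved here, a minimal standalone sketch follows. It is not the Xen code:
the sketch_* names, the tiny table sizes and the use of a plain 32-bit
bitmask in place of cpumask_t are all assumptions. Unlike Xen, where
cpu_to_node[] starts out as NUMA_NO_NODE, the sketch table is
zero-initialized, so every CPU starts on node 0.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS  8   /* hypothetical, far smaller than NR_CPUS */
#define SKETCH_NR_NODES 4   /* hypothetical, far smaller than MAX_NUMNODES */

/* CPU -> node table and node -> CPU bitmask table. */
static uint8_t  sketch_cpu_to_node[SKETCH_NR_CPUS];
static uint32_t sketch_node_to_cpumask[SKETCH_NR_NODES];

/* Record which node a CPU belongs to. */
static void sketch_set_node(int cpu, uint8_t node)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    assert(node < SKETCH_NR_NODES);
    sketch_cpu_to_node[cpu] = node;
}

/* Set the CPU's bit in the mask of the node it is assigned to. */
static void sketch_add_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    sketch_node_to_cpumask[sketch_cpu_to_node[cpu]] |= UINT32_C(1) << cpu;
}
```

The point of the sketch is only that the function reads one table to
update the other, which is why both tables move to common along with it.
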
 xen/arch/x86/numa.c        |  9 ---------
 xen/common/numa.c          | 11 +++++++++++
 xen/include/asm-x86/numa.h |  1 -
 xen/include/xen/numa.h     |  2 ++
 4 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index a6211be121..f2626b3968 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -29,16 +29,12 @@ custom_param("numa", numa_setup);
 /* from proto.h */
 #define round_up(x,y) ((((x)+(y))-1) & (~((y)-1)))
 
-nodeid_t cpu_to_node[NR_CPUS] __read_mostly = {
-    [0 ... NR_CPUS-1] = NUMA_NO_NODE
-};
 /*
  * Keep BIOS's CPU2node information, should not be used for memory allocaion
  */
 nodeid_t apicid_to_node[MAX_LOCAL_APIC] = {
     [0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
 };
-cpumask_t node_to_cpumask[MAX_NUMNODES] __read_mostly;
 
 nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
 
@@ -167,11 +163,6 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
                     (u64)end_pfn << PAGE_SHIFT);
 }
 
-void numa_add_cpu(int cpu)
-{
-    cpumask_set_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
-} 
-
 void numa_set_node(int cpu, nodeid_t node)
 {
     cpu_to_node[cpu] = node;
diff --git a/xen/common/numa.c b/xen/common/numa.c
index e65b6a6676..9b6f23dfc1 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -23,6 +23,12 @@ typeof(*memnodemap) _memnodemap[64];
 unsigned long memnodemapsize;
 u8 *memnodemap;
 
+nodeid_t cpu_to_node[NR_CPUS] __read_mostly = {
+    [0 ... NR_CPUS-1] = NUMA_NO_NODE
+};
+
+cpumask_t node_to_cpumask[MAX_NUMNODES] __read_mostly;
+
 /*
  * Given a shift value, try to populate memnodemap[]
  * Returns :
@@ -129,3 +135,8 @@ int __init compute_hash_shift(struct node *nodes, int numnodes,
 
     return shift;
 }
+
+void numa_add_cpu(int cpu)
+{
+    cpumask_set_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
+}
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index abe5617d01..07ff78ea1b 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -27,7 +27,6 @@ extern nodeid_t pxm_to_node(unsigned int pxm);
 
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
 
-extern void numa_add_cpu(int cpu);
 extern void numa_init_array(void);
 extern bool numa_off;
 
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index 39e8a4e00a..f9769cba4b 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -51,6 +51,8 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
 #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
 				 NODE_DATA(nid)->node_spanned_pages)
 
+extern void numa_add_cpu(int cpu);
+
 #endif /* CONFIG_NUMA */
 
 #endif /* _XEN_NUMA_H */
-- 
2.25.1




* [XEN RFC PATCH 10/40] xen/x86: Move NR_NODE_MEMBLKS macro to common
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (8 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 09/40] xen/x86: Move numa_add_cpu_node " Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-23 17:58   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 11/40] xen/x86: Move NUMA nodes and memory block ranges " Wei Chen
                   ` (32 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

x86 ACPI is not the only user of this macro. Device tree based
NUMA also needs it to represent the maximum number of memory
blocks. So move it from the x86 ACPI header file to the common
NUMA header file.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/include/asm-x86/acpi.h | 1 -
 xen/include/xen/numa.h     | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/asm-x86/acpi.h b/xen/include/asm-x86/acpi.h
index 7032f3a001..d347500a3c 100644
--- a/xen/include/asm-x86/acpi.h
+++ b/xen/include/asm-x86/acpi.h
@@ -103,7 +103,6 @@ extern unsigned long acpi_wakeup_address;
 
 extern s8 acpi_numa;
 extern int acpi_scan_nodes(u64 start, u64 end);
-#define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
 
 extern struct acpi_sleep_info acpi_sinfo;
 #define acpi_video_flags bootsym(video_flags)
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index f9769cba4b..5af74b357f 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -11,6 +11,7 @@
 #define NUMA_NO_DISTANCE 0xFF
 
 #define MAX_NUMNODES    (1 << NODES_SHIFT)
+#define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
 
 #define vcpu_to_node(v) (cpu_to_node((v)->processor))
 
-- 
2.25.1




* [XEN RFC PATCH 11/40] xen/x86: Move NUMA nodes and memory block ranges to common
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (9 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 10/40] xen/x86: Move NR_NODE_MEMBLKS macro " Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-24 17:40   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 12/40] xen/x86: Move numa_initmem_init " Wei Chen
                   ` (31 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

These data structures and functions are used to create the
mapping between nodes and memory blocks. Device tree based
NUMA will reuse them, so move this part of the code from x86
to common.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
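To make the behaviour of the moved helpers easier to see, here is a
minimal standalone sketch of the overlap check and the range clamping.
It is not the Xen code: the sketch_* names are hypothetical, and the
functions take the block array or block as a parameter instead of using
the global node_memblk_range[] and nodes[] tables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical memory block: [start, end) in bytes. */
struct sketch_blk { uint64_t start, end; };

/* Index of the first non-empty block overlapping [start, end), or -1. */
static int sketch_conflict(const struct sketch_blk *blks, int n,
                           uint64_t start, uint64_t end)
{
    int i;

    for ( i = 0; i < n; i++ )
    {
        if ( blks[i].start == blks[i].end )
            continue;                   /* empty block never conflicts */
        if ( blks[i].end > start && blks[i].start < end )
            return i;
    }
    return -1;
}

/* Clamp a block to [start, end]; a block fully outside becomes empty. */
static void sketch_cutoff(struct sketch_blk *nd, uint64_t start, uint64_t end)
{
    assert(start <= end);
    if ( nd->start < start )
    {
        nd->start = start;
        if ( nd->end < nd->start )
            nd->start = nd->end;
    }
    if ( nd->end > end )
    {
        nd->end = end;
        if ( nd->start > nd->end )
            nd->start = nd->end;
    }
}
```

Ranges are half-open here, so a new block that starts exactly where an
existing one ends is not reported as a conflict.
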
 xen/arch/x86/srat.c        | 50 -------------------------------------
 xen/common/numa.c          | 51 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/numa.h |  8 ------
 xen/include/xen/numa.h     | 15 +++++++++++
 4 files changed, 66 insertions(+), 58 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 6b77b98201..6d68b8a614 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -26,7 +26,6 @@ static struct acpi_table_slit *__read_mostly acpi_slit;
 
 static nodemask_t memory_nodes_parsed __initdata;
 static nodemask_t processor_nodes_parsed __initdata;
-static struct node nodes[MAX_NUMNODES] __initdata;
 
 struct pxm2node {
 	unsigned pxm;
@@ -37,9 +36,6 @@ static struct pxm2node __read_mostly pxm2node[MAX_NUMNODES] =
 
 static unsigned node_to_pxm(nodeid_t n);
 
-static int num_node_memblks;
-static struct node node_memblk_range[NR_NODE_MEMBLKS];
-static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
 static __initdata DECLARE_BITMAP(memblk_hotplug, NR_NODE_MEMBLKS);
 
 static inline bool node_found(unsigned idx, unsigned pxm)
@@ -104,52 +100,6 @@ nodeid_t setup_node(unsigned pxm)
 	return node;
 }
 
-int valid_numa_range(u64 start, u64 end, nodeid_t node)
-{
-	int i;
-
-	for (i = 0; i < num_node_memblks; i++) {
-		struct node *nd = &node_memblk_range[i];
-
-		if (nd->start <= start && nd->end >= end &&
-			memblk_nodeid[i] == node)
-			return 1;
-	}
-
-	return 0;
-}
-
-static __init int conflicting_memblks(u64 start, u64 end)
-{
-	int i;
-
-	for (i = 0; i < num_node_memblks; i++) {
-		struct node *nd = &node_memblk_range[i];
-		if (nd->start == nd->end)
-			continue;
-		if (nd->end > start && nd->start < end)
-			return i;
-		if (nd->end == end && nd->start == start)
-			return i;
-	}
-	return -1;
-}
-
-static __init void cutoff_node(int i, u64 start, u64 end)
-{
-	struct node *nd = &nodes[i];
-	if (nd->start < start) {
-		nd->start = start;
-		if (nd->end < nd->start)
-			nd->start = nd->end;
-	}
-	if (nd->end > end) {
-		nd->end = end;
-		if (nd->start > nd->end)
-			nd->start = nd->end;
-	}
-}
-
 static __init void bad_srat(void)
 {
 	int i;
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 9b6f23dfc1..1facc8fe2b 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -29,6 +29,11 @@ nodeid_t cpu_to_node[NR_CPUS] __read_mostly = {
 
 cpumask_t node_to_cpumask[MAX_NUMNODES] __read_mostly;
 
+struct node nodes[MAX_NUMNODES] __initdata;
+int num_node_memblks;
+struct node node_memblk_range[NR_NODE_MEMBLKS];
+nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
+
 /*
  * Given a shift value, try to populate memnodemap[]
  * Returns :
@@ -136,6 +141,52 @@ int __init compute_hash_shift(struct node *nodes, int numnodes,
     return shift;
 }
 
+int valid_numa_range(u64 start, u64 end, nodeid_t node)
+{
+	int i;
+
+	for (i = 0; i < num_node_memblks; i++) {
+		struct node *nd = &node_memblk_range[i];
+
+		if (nd->start <= start && nd->end >= end &&
+			memblk_nodeid[i] == node)
+			return 1;
+	}
+
+	return 0;
+}
+
+int __init conflicting_memblks(u64 start, u64 end)
+{
+	int i;
+
+	for (i = 0; i < num_node_memblks; i++) {
+		struct node *nd = &node_memblk_range[i];
+		if (nd->start == nd->end)
+			continue;
+		if (nd->end > start && nd->start < end)
+			return i;
+		if (nd->end == end && nd->start == start)
+			return i;
+	}
+	return -1;
+}
+
+void __init cutoff_node(int i, u64 start, u64 end)
+{
+	struct node *nd = &nodes[i];
+	if (nd->start < start) {
+		nd->start = start;
+		if (nd->end < nd->start)
+			nd->start = nd->end;
+	}
+	if (nd->end > end) {
+		nd->end = end;
+		if (nd->start > nd->end)
+			nd->start = nd->end;
+	}
+}
+
 void numa_add_cpu(int cpu)
 {
     cpumask_set_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index 07ff78ea1b..e8a92ad9df 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -17,12 +17,6 @@ extern cpumask_t     node_to_cpumask[];
 #define node_to_first_cpu(node)  (__ffs(node_to_cpumask[node]))
 #define node_to_cpumask(node)    (node_to_cpumask[node])
 
-struct node { 
-	u64 start,end; 
-};
-
-extern int compute_hash_shift(struct node *nodes, int numnodes,
-			      nodeid_t *nodeids);
 extern nodeid_t pxm_to_node(unsigned int pxm);
 
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
@@ -45,8 +39,6 @@ static inline void clear_node_cpumask(int cpu)
 	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
 }
 
-extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
-
 void srat_parse_regions(u64 addr);
 extern u8 __node_distance(nodeid_t a, nodeid_t b);
 unsigned int arch_get_dma_bitsize(void);
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index 5af74b357f..67b79a73a3 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -54,6 +54,21 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
 
 extern void numa_add_cpu(int cpu);
 
+struct node {
+	u64 start,end;
+};
+
+extern struct node nodes[MAX_NUMNODES];
+extern int num_node_memblks;
+extern struct node node_memblk_range[NR_NODE_MEMBLKS];
+extern nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
+
+extern int compute_hash_shift(struct node *nodes, int numnodes,
+			      nodeid_t *nodeids);
+extern int conflicting_memblks(u64 start, u64 end);
+extern void cutoff_node(int i, u64 start, u64 end);
+extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
+
 #endif /* CONFIG_NUMA */
 
 #endif /* _XEN_NUMA_H */
-- 
2.25.1




* [XEN RFC PATCH 12/40] xen/x86: Move numa_initmem_init to common
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (10 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 11/40] xen/x86: Move NUMA nodes and memory block ranges " Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-25 10:21   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for Arm Wei Chen
                   ` (30 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

This function can be reused by Arm device tree based NUMA
support, so move it from x86 to common, together with its
related variables and functions: setup_node_bootmem,
numa_init_array and numa_emulation.

Now that numa_initmem_init lives in common, _memnodemap is
no longer used across files, so make it static again.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
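For reference, the round-robin assignment done by numa_init_array() can
be pictured with the minimal standalone sketch below. It is not the Xen
code: the sketch_* names are hypothetical, the online node map is a
plain 32-bit bitmask instead of nodemask_t, and the caller passes the
CPU table in explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_NODES 8     /* hypothetical node limit */
#define SKETCH_NO_NODE  0xFF  /* hypothetical stand-in for NUMA_NO_NODE */

/* Lowest-numbered online node; the map must not be empty. */
static int sketch_first_node(uint32_t online)
{
    int i;

    assert(online != 0);
    for ( i = 0; i < SKETCH_NR_NODES; i++ )
        if ( online & (UINT32_C(1) << i) )
            return i;
    return 0;
}

/* Next online node after 'node', wrapping around at the end. */
static int sketch_cycle_node(int node, uint32_t online)
{
    int i;

    assert(online != 0);
    for ( i = 1; i <= SKETCH_NR_NODES; i++ )
    {
        int cand = (node + i) % SKETCH_NR_NODES;

        if ( online & (UINT32_C(1) << cand) )
            return cand;
    }
    return node;
}

/* Give every CPU without a node the next online node in turn. */
static void sketch_init_array(uint8_t *cpu_to_node, int nr_cpus,
                              uint32_t online)
{
    int i, rr = sketch_first_node(online);

    for ( i = 0; i < nr_cpus; i++ )
    {
        if ( cpu_to_node[i] != SKETCH_NO_NODE )
            continue;           /* already assigned, keep it */
        cpu_to_node[i] = (uint8_t)rr;
        rr = sketch_cycle_node(rr, online);
    }
}
```

CPUs that already have a node keep it; only the unassigned ones are
spread over the online nodes.
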
 xen/arch/x86/numa.c         | 118 ----------------------------------
 xen/common/numa.c           | 122 +++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/numa.h  |   5 --
 xen/include/asm-x86/setup.h |   1 -
 xen/include/xen/numa.h      |   8 ++-
 5 files changed, 128 insertions(+), 126 deletions(-)

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index f2626b3968..6908738305 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -38,7 +38,6 @@ nodeid_t apicid_to_node[MAX_LOCAL_APIC] = {
 
 nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
 
-bool numa_off;
 s8 acpi_numa = 0;
 
 int srat_disabled(void)
@@ -46,123 +45,6 @@ int srat_disabled(void)
     return numa_off || acpi_numa < 0;
 }
 
-/* initialize NODE_DATA given nodeid and start/end */
-void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
-{ 
-    unsigned long start_pfn, end_pfn;
-
-    start_pfn = start >> PAGE_SHIFT;
-    end_pfn = end >> PAGE_SHIFT;
-
-    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
-    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
-
-    node_set_online(nodeid);
-} 
-
-void __init numa_init_array(void)
-{
-    int rr, i;
-
-    /* There are unfortunately some poorly designed mainboards around
-       that only connect memory to a single CPU. This breaks the 1:1 cpu->node
-       mapping. To avoid this fill in the mapping for all possible
-       CPUs, as the number of CPUs is not known yet.
-       We round robin the existing nodes. */
-    rr = first_node(node_online_map);
-    for ( i = 0; i < nr_cpu_ids; i++ )
-    {
-        if ( cpu_to_node[i] != NUMA_NO_NODE )
-            continue;
-        numa_set_node(i, rr);
-        rr = cycle_node(rr, node_online_map);
-    }
-}
-
-#ifdef CONFIG_NUMA_EMU
-static int numa_fake __initdata = 0;
-
-/* Numa emulation */
-static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
-{
-    int i;
-    struct node nodes[MAX_NUMNODES];
-    u64 sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake;
-
-    /* Kludge needed for the hash function */
-    if ( hweight64(sz) > 1 )
-    {
-        u64 x = 1;
-        while ( (x << 1) < sz )
-            x <<= 1;
-        if ( x < sz/2 )
-            printk(KERN_ERR "Numa emulation unbalanced. Complain to maintainer\n");
-        sz = x;
-    }
-
-    memset(&nodes,0,sizeof(nodes));
-    for ( i = 0; i < numa_fake; i++ )
-    {
-        nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz;
-        if ( i == numa_fake - 1 )
-            sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
-        nodes[i].end = nodes[i].start + sz;
-        printk(KERN_INFO "Faking node %d at %"PRIx64"-%"PRIx64" (%"PRIu64"MB)\n",
-               i,
-               nodes[i].start, nodes[i].end,
-               (nodes[i].end - nodes[i].start) >> 20);
-        node_set_online(i);
-    }
-    memnode_shift = compute_hash_shift(nodes, numa_fake, NULL);
-    if ( memnode_shift < 0 )
-    {
-        memnode_shift = 0;
-        printk(KERN_ERR "No NUMA hash function found. Emulation disabled.\n");
-        return -1;
-    }
-    for_each_online_node ( i )
-        setup_node_bootmem(i, nodes[i].start, nodes[i].end);
-    numa_init_array();
-
-    return 0;
-}
-#endif
-
-void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
-{ 
-    int i;
-
-#ifdef CONFIG_NUMA_EMU
-    if ( numa_fake && !numa_emulation(start_pfn, end_pfn) )
-        return;
-#endif
-
-#ifdef CONFIG_ACPI_NUMA
-    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
-         (u64)end_pfn << PAGE_SHIFT) )
-        return;
-#endif
-
-    printk(KERN_INFO "%s\n",
-           numa_off ? "NUMA turned off" : "No NUMA configuration found");
-
-    printk(KERN_INFO "Faking a node at %016"PRIx64"-%016"PRIx64"\n",
-           (u64)start_pfn << PAGE_SHIFT,
-           (u64)end_pfn << PAGE_SHIFT);
-    /* setup dummy node covering all memory */
-    memnode_shift = BITS_PER_LONG - 1;
-    memnodemap = _memnodemap;
-    memnodemapsize = ARRAY_SIZE(_memnodemap);
-
-    nodes_clear(node_online_map);
-    node_set_online(0);
-    for ( i = 0; i < nr_cpu_ids; i++ )
-        numa_set_node(i, 0);
-    cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
-    setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
-                    (u64)end_pfn << PAGE_SHIFT);
-}
-
 void numa_set_node(int cpu, nodeid_t node)
 {
     cpu_to_node[cpu] = node;
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 1facc8fe2b..26c0006d04 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -14,12 +14,13 @@
 #include <xen/smp.h>
 #include <xen/pfn.h>
 #include <xen/sched.h>
+#include <asm/acpi.h>
 
 struct node_data node_data[MAX_NUMNODES];
 
 /* Mapping from pdx to node id */
 int memnode_shift;
-typeof(*memnodemap) _memnodemap[64];
+static typeof(*memnodemap) _memnodemap[64];
 unsigned long memnodemapsize;
 u8 *memnodemap;
 
@@ -34,6 +35,8 @@ int num_node_memblks;
 struct node node_memblk_range[NR_NODE_MEMBLKS];
 nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
 
+bool numa_off;
+
 /*
  * Given a shift value, try to populate memnodemap[]
  * Returns :
@@ -191,3 +194,120 @@ void numa_add_cpu(int cpu)
 {
     cpumask_set_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
 }
+
+/* initialize NODE_DATA given nodeid and start/end */
+void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
+{
+    unsigned long start_pfn, end_pfn;
+
+    start_pfn = start >> PAGE_SHIFT;
+    end_pfn = end >> PAGE_SHIFT;
+
+    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
+    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
+
+    node_set_online(nodeid);
+}
+
+void __init numa_init_array(void)
+{
+    int rr, i;
+
+    /* There are unfortunately some poorly designed mainboards around
+       that only connect memory to a single CPU. This breaks the 1:1 cpu->node
+       mapping. To avoid this fill in the mapping for all possible
+       CPUs, as the number of CPUs is not known yet.
+       We round robin the existing nodes. */
+    rr = first_node(node_online_map);
+    for ( i = 0; i < nr_cpu_ids; i++ )
+    {
+        if ( cpu_to_node[i] != NUMA_NO_NODE )
+            continue;
+        numa_set_node(i, rr);
+        rr = cycle_node(rr, node_online_map);
+    }
+}
+
+#ifdef CONFIG_NUMA_EMU
+int numa_fake __initdata = 0;
+
+/* Numa emulation */
+static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
+{
+    int i;
+    struct node nodes[MAX_NUMNODES];
+    u64 sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake;
+
+    /* Kludge needed for the hash function */
+    if ( hweight64(sz) > 1 )
+    {
+        u64 x = 1;
+        while ( (x << 1) < sz )
+            x <<= 1;
+        if ( x < sz/2 )
+            printk(KERN_ERR "Numa emulation unbalanced. Complain to maintainer\n");
+        sz = x;
+    }
+
+    memset(&nodes,0,sizeof(nodes));
+    for ( i = 0; i < numa_fake; i++ )
+    {
+        nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz;
+        if ( i == numa_fake - 1 )
+            sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
+        nodes[i].end = nodes[i].start + sz;
+        printk(KERN_INFO "Faking node %d at %"PRIx64"-%"PRIx64" (%"PRIu64"MB)\n",
+               i,
+               nodes[i].start, nodes[i].end,
+               (nodes[i].end - nodes[i].start) >> 20);
+        node_set_online(i);
+    }
+    memnode_shift = compute_hash_shift(nodes, numa_fake, NULL);
+    if ( memnode_shift < 0 )
+    {
+        memnode_shift = 0;
+        printk(KERN_ERR "No NUMA hash function found. Emulation disabled.\n");
+        return -1;
+    }
+    for_each_online_node ( i )
+        setup_node_bootmem(i, nodes[i].start, nodes[i].end);
+    numa_init_array();
+
+    return 0;
+}
+#endif
+
+void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
+{
+    int i;
+
+#ifdef CONFIG_NUMA_EMU
+    if ( numa_fake && !numa_emulation(start_pfn, end_pfn) )
+        return;
+#endif
+
+#ifdef CONFIG_ACPI_NUMA
+    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
+         (u64)end_pfn << PAGE_SHIFT) )
+        return;
+#endif
+
+    printk(KERN_INFO "%s\n",
+           numa_off ? "NUMA turned off" : "No NUMA configuration found");
+
+    printk(KERN_INFO "Faking a node at %016"PRIx64"-%016"PRIx64"\n",
+           (u64)start_pfn << PAGE_SHIFT,
+           (u64)end_pfn << PAGE_SHIFT);
+    /* setup dummy node covering all memory */
+    memnode_shift = BITS_PER_LONG - 1;
+    memnodemap = _memnodemap;
+    memnodemapsize = ARRAY_SIZE(_memnodemap);
+
+    nodes_clear(node_online_map);
+    node_set_online(0);
+    for ( i = 0; i < nr_cpu_ids; i++ )
+        numa_set_node(i, 0);
+    cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
+    setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
+                    (u64)end_pfn << PAGE_SHIFT);
+}
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index e8a92ad9df..f8e4e15586 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -21,16 +21,11 @@ extern nodeid_t pxm_to_node(unsigned int pxm);
 
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
 
-extern void numa_init_array(void);
-extern bool numa_off;
-
-
 extern int srat_disabled(void);
 extern void numa_set_node(int cpu, nodeid_t node);
 extern nodeid_t setup_node(unsigned int pxm);
 extern void srat_detect_node(int cpu);
 
-extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
 extern nodeid_t apicid_to_node[];
 extern void init_cpu_to_node(void);
 
diff --git a/xen/include/asm-x86/setup.h b/xen/include/asm-x86/setup.h
index 24be46115d..63838ba2d1 100644
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -17,7 +17,6 @@ void early_time_init(void);
 
 void set_nr_cpu_ids(unsigned int max_cpus);
 
-void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
 void arch_init_memory(void);
 void subarch_init_memory(void);
 
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index 67b79a73a3..258a5cb3db 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -26,7 +26,6 @@
 extern int memnode_shift;
 extern unsigned long memnodemapsize;
 extern u8 *memnodemap;
-extern typeof(*memnodemap) _memnodemap[64];
 
 struct node_data {
     unsigned long node_start_pfn;
@@ -69,6 +68,13 @@ extern int conflicting_memblks(u64 start, u64 end);
 extern void cutoff_node(int i, u64 start, u64 end);
 extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
 
+extern void numa_init_array(void);
+extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
+extern bool numa_off;
+extern int numa_fake;
+
+extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
+
 #endif /* CONFIG_NUMA */
 
 #endif /* _XEN_NUMA_H */
-- 
2.25.1




* [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for Arm
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (11 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 12/40] xen/x86: Move numa_initmem_init " Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-25 10:36   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 14/40] xen/arm: set NUMA nodes max number to 64 by default Wei Chen
                   ` (29 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

This API assigns a CPU to a NUMA node. If NUMA is turned off
in the system configuration, or NUMA initialization fails,
node#0 is the only online NUMA node: node_online_map is
cleared and then only node#0 is set online. That will be done
in following patches. So we check node_online_map to avoid
assigning a CPU to an offline node.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
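The fallback rule described above can be pictured with the minimal
standalone sketch below. It is not the Xen code: the sketch_* names and
the 4-CPU table are hypothetical, and the online node map is a plain
64-bit bitmask instead of nodemask_t.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 64  /* hypothetical stand-in for MAX_NUMNODES */
#define SKETCH_NR_CPUS   4   /* hypothetical CPU count */

/* Bit n set means node n is online; node 0 is online by default. */
static uint64_t sketch_online_map = 1;
static uint8_t  sketch_cpu_node[SKETCH_NR_CPUS];

/* Assign a CPU to a node, falling back to node 0 for bad or offline ids. */
static void sketch_numa_set_node(int cpu, unsigned int nid)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    if ( nid >= SKETCH_MAX_NODES ||
         !(sketch_online_map & (UINT64_C(1) << nid)) )
        nid = 0;
    sketch_cpu_node[cpu] = (uint8_t)nid;
}
```

When only node 0 is online, as after a failed NUMA initialization, every
call ends up assigning node 0 regardless of the requested id.
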
 xen/arch/arm/Makefile      |  1 +
 xen/arch/arm/numa.c        | 31 +++++++++++++++++++++++++++++++
 xen/include/asm-arm/numa.h |  2 ++
 xen/include/asm-x86/numa.h |  1 -
 xen/include/xen/numa.h     |  1 +
 5 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/numa.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 3d3b97b5b4..6e3fb8033e 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -35,6 +35,7 @@ obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-y += mem_access.o
 obj-y += mm.o
 obj-y += monitor.o
+obj-$(CONFIG_NUMA) += numa.o
 obj-y += p2m.o
 obj-y += percpu.o
 obj-y += platform.o
diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
new file mode 100644
index 0000000000..1e30c5bb13
--- /dev/null
+++ b/xen/arch/arm/numa.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Arm Architecture support layer for NUMA.
+ *
+ * Copyright (C) 2021 Arm Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+#include <xen/init.h>
+#include <xen/nodemask.h>
+#include <xen/numa.h>
+
+void numa_set_node(int cpu, nodeid_t nid)
+{
+    if ( nid >= MAX_NUMNODES ||
+        !nodemask_test(nid, &node_online_map) )
+        nid = 0;
+
+    cpu_to_node[cpu] = nid;
+}
diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index ab9c4a2448..1162c702df 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -27,6 +27,8 @@ extern mfn_t first_valid_mfn;
 #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
 #define __node_distance(a, b) (20)
 
+#define numa_set_node(x, y) do { } while (0)
+
 #endif
 
 #endif /* __ARCH_ARM_NUMA_H */
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index f8e4e15586..69859b0a57 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -22,7 +22,6 @@ extern nodeid_t pxm_to_node(unsigned int pxm);
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
 
 extern int srat_disabled(void);
-extern void numa_set_node(int cpu, nodeid_t node);
 extern nodeid_t setup_node(unsigned int pxm);
 extern void srat_detect_node(int cpu);
 
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index 258a5cb3db..3972aa6b93 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -70,6 +70,7 @@ extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
 
 extern void numa_init_array(void);
 extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
+extern void numa_set_node(int cpu, nodeid_t node);
 extern bool numa_off;
 extern int numa_fake;
 
-- 
2.25.1




* [XEN RFC PATCH 14/40] xen/arm: set NUMA nodes max number to 64 by default
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (12 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for Arm Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-25 13:28   ` Julien Grall
  2021-08-11 10:23 ` [XEN RFC PATCH 15/40] xen/x86: move NUMA API from x86 header to common header Wei Chen
                   ` (28 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

Today's Arm64 systems can reach or exceed 16 NUMA nodes, so we set
the maximum number of NUMA nodes to 64, matching x86.
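
As a quick illustration of the arithmetic, here is a minimal sketch that
assumes the node limit is derived from the shift as a power of two
(1 << NODES_SHIFT). The EXAMPLE_* names and example_max_nodes() are
made up for this note and are not Xen symbols.

```c
#include <assert.h>

/* Illustrative names only; these are not Xen symbols. */
#define EXAMPLE_NODES_SHIFT   6
#define EXAMPLE_MAX_NUMNODES  (1u << EXAMPLE_NODES_SHIFT)

/* Return the node limit implied by a given shift value. */
static unsigned int example_max_nodes(unsigned int shift)
{
    assert(shift < 32);   /* keep the shift within the width of unsigned int */
    return 1u << shift;
}
```

With a shift of 6 the limit is 64 nodes, which still fits comfortably
in the u8 nodeid_t used by this header.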

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/include/asm-arm/numa.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index 1162c702df..b2982f9053 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -5,7 +5,15 @@
 
 typedef u8 nodeid_t;
 
-#if !defined(CONFIG_NUMA)
+#if defined(CONFIG_NUMA)
+
+/*
+ * Same as x86, we set the max number of NUMA nodes to 64 and
+ * set the max number of NUMA memory blocks to 128.
+ */
+#define NODES_SHIFT      6
+
+#else
 
 /* Fake one node for now. See also node_online_map. */
 #define cpu_to_node(cpu) 0
-- 
2.25.1




* [XEN RFC PATCH 15/40] xen/x86: move NUMA API from x86 header to common header
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (13 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 14/40] xen/arm: set NUMA nodes max number to 64 by default Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-11 10:23 ` [XEN RFC PATCH 16/40] xen/arm: Create a fake NUMA node to use common code Wei Chen
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

All functions and macros that depend on these NUMA APIs, such as
clear_node_cpumask, have been moved to the common header file. We
can therefore move the NUMA API from the x86 header file to the
common header file without triggering symbol-not-found errors when
functions in the arch NUMA header file depend on symbols in the
common NUMA header file.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/include/asm-x86/numa.h | 13 -------------
 xen/include/xen/numa.h     | 13 +++++++++++++
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index 69859b0a57..5a57a51e26 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -9,14 +9,6 @@ typedef u8 nodeid_t;
 
 extern int srat_rev;
 
-extern nodeid_t      cpu_to_node[NR_CPUS];
-extern cpumask_t     node_to_cpumask[];
-
-#define cpu_to_node(cpu)		(cpu_to_node[cpu])
-#define parent_node(node)		(node)
-#define node_to_first_cpu(node)  (__ffs(node_to_cpumask[node]))
-#define node_to_cpumask(node)    (node_to_cpumask[node])
-
 extern nodeid_t pxm_to_node(unsigned int pxm);
 
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
@@ -28,11 +20,6 @@ extern void srat_detect_node(int cpu);
 extern nodeid_t apicid_to_node[];
 extern void init_cpu_to_node(void);
 
-static inline void clear_node_cpumask(int cpu)
-{
-	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
-}
-
 void srat_parse_regions(u64 addr);
 extern u8 __node_distance(nodeid_t a, nodeid_t b);
 unsigned int arch_get_dma_bitsize(void);
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index 3972aa6b93..cb08d2eca9 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -22,6 +22,14 @@
 /* The following content can be used when NUMA feature is enabled */
 #if defined(CONFIG_NUMA)
 
+extern nodeid_t      cpu_to_node[NR_CPUS];
+extern cpumask_t     node_to_cpumask[];
+
+#define cpu_to_node(cpu)		(cpu_to_node[cpu])
+#define parent_node(node)		(node)
+#define node_to_first_cpu(node)  (__ffs(node_to_cpumask[node]))
+#define node_to_cpumask(node)    (node_to_cpumask[node])
+
 /* Simple perfect hash to map pdx to node numbers */
 extern int memnode_shift;
 extern unsigned long memnodemapsize;
@@ -76,6 +84,11 @@ extern int numa_fake;
 
 extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
 
+static inline void clear_node_cpumask(int cpu)
+{
+	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
+}
+
 #endif /* CONFIG_NUMA */
 
 #endif /* _XEN_NUMA_H */
-- 
2.25.1




* [XEN RFC PATCH 16/40] xen/arm: Create a fake NUMA node to use common code
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (14 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 15/40] xen/x86: move NUMA API from x86 header to common header Wei Chen
@ 2021-08-11 10:23 ` Wei Chen
  2021-08-26 23:10   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64 Wei Chen
                   ` (26 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:23 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

When CONFIG_NUMA is enabled for Arm, Xen switches to the common
NUMA API instead of the previous fake NUMA API. Before we parse NUMA
information from the device tree or the ACPI SRAT table, we need to
initialize the NUMA-related variables, such as cpu_to_node, as for a
single-node NUMA system.

So in this patch, we introduce a numa_init function to initialize
these data structures so that all resources belong to node#0.
This makes the new API return the same values as the fake API
did.
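
The RAM range handed to numa_initmem_init is the lowest bank start and
the highest bank end over all boot memory banks. A minimal standalone
sketch of that scan follows. struct example_bank and example_ram_span()
are hypothetical stand-ins for bootinfo.mem.bank[] and the loop in
numa_init, not Xen code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one entry of bootinfo.mem.bank[]. */
struct example_bank {
    uint64_t start;
    uint64_t size;
};

/*
 * Compute the lowest start address and the highest end address across
 * all banks. Returns 0 on success, -1 if there are no banks.
 */
static int example_ram_span(const struct example_bank *banks, size_t nr,
                            uint64_t *ram_start, uint64_t *ram_end)
{
    size_t i;

    assert(ram_start != NULL && ram_end != NULL);

    if ( nr == 0 )
        return -1;

    *ram_start = banks[0].start;
    *ram_end   = banks[0].start + banks[0].size;

    for ( i = 1; i < nr; i++ )
    {
        uint64_t bank_end = banks[i].start + banks[i].size;

        if ( banks[i].start < *ram_start )
            *ram_start = banks[i].start;
        if ( bank_end > *ram_end )
            *ram_end = bank_end;
    }

    return 0;
}
```

The banks do not need to be sorted or contiguous. Any holes between
banks are simply part of the covered range.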

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/numa.c        | 53 ++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/setup.c       |  8 ++++++
 xen/include/asm-arm/numa.h | 11 ++++++++
 3 files changed, 72 insertions(+)

diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
index 1e30c5bb13..566ad1e52b 100644
--- a/xen/arch/arm/numa.c
+++ b/xen/arch/arm/numa.c
@@ -20,6 +20,8 @@
 #include <xen/init.h>
 #include <xen/nodemask.h>
 #include <xen/numa.h>
+#include <xen/pfn.h>
+#include <asm/setup.h>
 
 void numa_set_node(int cpu, nodeid_t nid)
 {
@@ -29,3 +31,54 @@ void numa_set_node(int cpu, nodeid_t nid)
 
     cpu_to_node[cpu] = nid;
 }
+
+void __init numa_init(bool acpi_off)
+{
+    uint32_t idx;
+    paddr_t ram_start = ~0;
+    paddr_t ram_size = 0;
+    paddr_t ram_end = 0;
+
+    printk(XENLOG_WARNING
+        "NUMA has not been supported yet, NUMA off!\n");
+    /* Arm NUMA has not been implemented until this patch */
+    numa_off = true;
+
+    /*
+     * Set all cpu_to_node mapping to 0, this will make cpu_to_node
+     * function return 0 as previous fake cpu_to_node API.
+     */
+    for ( idx = 0; idx < NR_CPUS; idx++ )
+        cpu_to_node[idx] = 0;
+
+    /*
+     * Make node_to_cpumask, node_spanned_pages and node_start_pfn
+     * return as previous fake APIs.
+     */
+    for ( idx = 0; idx < MAX_NUMNODES; idx++ ) {
+        node_to_cpumask[idx] = cpu_online_map;
+        node_spanned_pages(idx) = (max_page - mfn_x(first_valid_mfn));
+        node_start_pfn(idx) = (mfn_x(first_valid_mfn));
+    }
+
+    /*
+     * Find the minimal and maximum address of RAM, NUMA will
+     * build a memory to node mapping table for the whole range.
+     */
+    ram_start = bootinfo.mem.bank[0].start;
+    ram_size  = bootinfo.mem.bank[0].size;
+    ram_end   = ram_start + ram_size;
+    for ( idx = 1 ; idx < bootinfo.mem.nr_banks; idx++ )
+    {
+        paddr_t bank_start = bootinfo.mem.bank[idx].start;
+        paddr_t bank_size = bootinfo.mem.bank[idx].size;
+        paddr_t bank_end = bank_start + bank_size;
+
+        ram_size  = ram_size + bank_size;
+        ram_start = min(ram_start, bank_start);
+        ram_end   = max(ram_end, bank_end);
+    }
+
+    numa_initmem_init(PFN_UP(ram_start), PFN_DOWN(ram_end));
+    return;
+}
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 63a908e325..3c58d2d441 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -30,6 +30,7 @@
 #include <xen/init.h>
 #include <xen/irq.h>
 #include <xen/mm.h>
+#include <xen/numa.h>
 #include <xen/param.h>
 #include <xen/softirq.h>
 #include <xen/keyhandler.h>
@@ -874,6 +875,13 @@ void __init start_xen(unsigned long boot_phys_offset,
     /* Parse the ACPI tables for possible boot-time configuration */
     acpi_boot_table_init();
 
+    /*
+     * Try to initialize the NUMA system. If this fails, the system
+     * falls back to a uniform system, i.e. one with a single
+     * NUMA node.
+     */
+    numa_init(acpi_disabled);
+
     end_boot_allocator();
 
     /*
diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index b2982f9053..bb495a24e1 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -13,6 +13,16 @@ typedef u8 nodeid_t;
  */
 #define NODES_SHIFT      6
 
+extern void numa_init(bool acpi_off);
+
+/*
+ * Temporary for fake NUMA node, when CPU, memory and distance
+ * matrix will be read from DTB or ACPI SRAT. The following
+ * symbols will be removed.
+ */
+extern mfn_t first_valid_mfn;
+#define __node_distance(a, b) (20)
+
 #else
 
 /* Fake one node for now. See also node_online_map. */
@@ -35,6 +45,7 @@ extern mfn_t first_valid_mfn;
 #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
 #define __node_distance(a, b) (20)
 
+#define numa_init(x) do { } while (0)
 #define numa_set_node(x, y) do { } while (0)
 
 #endif
-- 
2.25.1




* [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (15 preceding siblings ...)
  2021-08-11 10:23 ` [XEN RFC PATCH 16/40] xen/arm: Create a fake NUMA node to use common code Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-19 13:38   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI Wei Chen
                   ` (25 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

We need a Kconfig option to distinguish device tree based NUMA
from ACPI based NUMA. So we introduce a new Kconfig option,
DEVICE_TREE_NUMA, in this patch for Arm64.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/Kconfig | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index ecfa6822e4..678cc98ea3 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -33,6 +33,16 @@ config ACPI
 	  Advanced Configuration and Power Interface (ACPI) support for Xen is
 	  an alternative to device tree on ARM64.
 
+config DEVICE_TREE_NUMA
+	bool "NUMA (Non-Uniform Memory Access) Support (UNSUPPORTED)" if UNSUPPORTED
+	depends on ARM_64
+	select NUMA
+	---help---
+
+	  Non-Uniform Memory Access (NUMA) is a computer memory design used in
+	  multiprocessing, where the memory access time depends on the memory
+	  location relative to the processor.
+
 config GICV3
 	bool "GICv3 driver"
 	depends on ARM_64 && !NEW_VGIC
-- 
2.25.1




* [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (16 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64 Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-19 17:35   ` Julien Grall
  2021-08-26 23:24   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 19/40] xen: fdt: Introduce a helper to check fdt node type Wei Chen
                   ` (24 subsequent siblings)
  42 siblings, 2 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

EFI can get the memory map from the EFI system table, but the EFI
system table does not contain memory NUMA information. EFI depends
on the ACPI SRAT or the device tree memory nodes to obtain the NUMA
mapping of memory blocks.

In the current code, however, Xen deletes all memory nodes from the
device tree when it boots from EFI. So in a UEFI + DTB boot, the
numa-node-id of memory blocks is no longer available.

So in this patch, we keep the memory nodes in the device tree so
that the NUMA code can parse the memory numa-node-id later.

As a side effect, if early_scan_node still parsed boot memory
information, the memory ranges in the memory nodes would be added
to bootinfo.mem twice. So we have to prevent early_scan_node from
parsing memory nodes in an EFI boot.

As the EFI APIs can only be used on Arm64, we introduce a wrapper
in a header file to avoid #ifdef CONFIG_ARM_64/32 in the code block.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/bootfdt.c      |  8 +++++++-
 xen/arch/arm/efi/efi-boot.h | 25 -------------------------
 xen/include/asm-arm/setup.h |  6 ++++++
 3 files changed, 13 insertions(+), 26 deletions(-)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 476e32e0f5..7df149dbca 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -11,6 +11,7 @@
 #include <xen/lib.h>
 #include <xen/kernel.h>
 #include <xen/init.h>
+#include <xen/efi.h>
 #include <xen/device_tree.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/sort.h>
@@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
 {
     int rc = 0;
 
-    if ( device_tree_node_matches(fdt, node, "memory") )
+    /*
+     * If the system booted from EFI, bootinfo.mem has already been
+     * set by EFI, so we don't need to parse memory nodes from DTB.
+     */
+    if ( device_tree_node_matches(fdt, node, "memory") &&
+         !arch_efi_enabled(EFI_BOOT) )
         rc = process_memory_node(fdt, node, name, depth,
                                  address_cells, size_cells, &bootinfo.mem);
     else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
index cf9c37153f..d0a9987fa4 100644
--- a/xen/arch/arm/efi/efi-boot.h
+++ b/xen/arch/arm/efi/efi-boot.h
@@ -197,33 +197,8 @@ EFI_STATUS __init fdt_add_uefi_nodes(EFI_SYSTEM_TABLE *sys_table,
     int status;
     u32 fdt_val32;
     u64 fdt_val64;
-    int prev;
     int num_rsv;
 
-    /*
-     * Delete any memory nodes present.  The EFI memory map is the only
-     * memory description provided to Xen.
-     */
-    prev = 0;
-    for (;;)
-    {
-        const char *type;
-        int len;
-
-        node = fdt_next_node(fdt, prev, NULL);
-        if ( node < 0 )
-            break;
-
-        type = fdt_getprop(fdt, node, "device_type", &len);
-        if ( type && strncmp(type, "memory", len) == 0 )
-        {
-            fdt_del_node(fdt, node);
-            continue;
-        }
-
-        prev = node;
-    }
-
    /*
     * Delete all memory reserve map entries. When booting via UEFI,
     * kernel will use the UEFI memory map to find reserved regions.
diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
index c4b6af6029..e4fb5f0d49 100644
--- a/xen/include/asm-arm/setup.h
+++ b/xen/include/asm-arm/setup.h
@@ -123,6 +123,12 @@ void device_tree_get_reg(const __be32 **cell, u32 address_cells,
 u32 device_tree_get_u32(const void *fdt, int node,
                         const char *prop_name, u32 dflt);
 
+#if defined(CONFIG_ARM_64)
+#define arch_efi_enabled(x) efi_enabled(x)
+#else
+#define arch_efi_enabled(x) (0)
+#endif
+
 #endif
 /*
  * Local variables:
-- 
2.25.1




* [XEN RFC PATCH 19/40] xen: fdt: Introduce a helper to check fdt node type
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (17 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-25 13:39   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 20/40] xen/arm: implement node distance helpers for Arm64 Wei Chen
                   ` (23 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

In later patches, we will parse CPU and memory NUMA information
from the device tree. FDT uses the device_type property to identify
CPU nodes and memory nodes. So we introduce fdt_node_check_type
in this patch to avoid redundant code in subsequent patches.
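
The return convention of the new helper is easy to misread: 0 means
"matches", 1 means "has a device_type but it does not match", and a
negative value means "no device_type at all". Below is a minimal
standalone sketch of that convention. The example_* functions and
EXAMPLE_ERR_NOTFOUND are hypothetical and operate on a raw property
buffer instead of an FDT blob.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EXAMPLE_ERR_NOTFOUND 1   /* illustrative error code */

/*
 * Return 1 if the NUL-separated string list of listlen bytes contains
 * str as a complete element, 0 otherwise.
 */
static int example_stringlist_contains(const char *list, int listlen,
                                       const char *str)
{
    int len = (int)strlen(str);

    /* An element needs len bytes plus its terminating NUL. */
    while ( listlen > len )
    {
        const char *p;

        if ( memcmp(str, list, (size_t)len + 1) == 0 )
            return 1;

        p = memchr(list, '\0', (size_t)listlen);
        if ( !p )
            return 0;   /* malformed list: missing terminator */

        listlen -= (int)(p - list) + 1;
        list = p + 1;
    }

    return 0;
}

/* 0: type matches, 1: property present but no match, <0: no property. */
static int example_check_type(const char *prop, int proplen,
                              const char *type)
{
    assert(type != NULL);

    if ( !prop )
        return -EXAMPLE_ERR_NOTFOUND;

    return example_stringlist_contains(prop, proplen, type) ? 0 : 1;
}
```

A caller therefore tests for a match with "== 0", in the same way as
with fdt_node_check_compatible.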

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/common/libfdt/fdt_ro.c      | 15 +++++++++++++++
 xen/include/xen/libfdt/libfdt.h | 25 +++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/xen/common/libfdt/fdt_ro.c b/xen/common/libfdt/fdt_ro.c
index 36f9b480d1..ae7794d870 100644
--- a/xen/common/libfdt/fdt_ro.c
+++ b/xen/common/libfdt/fdt_ro.c
@@ -545,6 +545,21 @@ int fdt_node_check_compatible(const void *fdt, int nodeoffset,
 		return 1;
 }
 
+int fdt_node_check_type(const void *fdt, int nodeoffset,
+			      const char *type)
+{
+	const void *prop;
+	int len;
+
+	prop = fdt_getprop(fdt, nodeoffset, "device_type", &len);
+	if (!prop)
+		return len;
+	if (fdt_stringlist_contains(prop, len, type))
+		return 0;
+	else
+		return 1;
+}
+
 int fdt_node_offset_by_compatible(const void *fdt, int startoffset,
 				  const char *compatible)
 {
diff --git a/xen/include/xen/libfdt/libfdt.h b/xen/include/xen/libfdt/libfdt.h
index 7c75688a39..7e4930dbcd 100644
--- a/xen/include/xen/libfdt/libfdt.h
+++ b/xen/include/xen/libfdt/libfdt.h
@@ -799,6 +799,31 @@ int fdt_node_offset_by_phandle(const void *fdt, uint32_t phandle);
 int fdt_node_check_compatible(const void *fdt, int nodeoffset,
 			      const char *compatible);
 
+/**
+ * fdt_node_check_type: check a node's device_type property
+ * @fdt: pointer to the device tree blob
+ * @nodeoffset: offset of a tree node
+ * @type: string to match against
+ *
+ *
+ * fdt_node_check_type() returns 0 if the given node contains a 'device_type'
+ * property with the given string as one of its elements, it returns non-zero
+ * otherwise, or on error.
+ *
+ * returns:
+ *	0, if the node has a 'device_type' property listing the given string
+ *	1, if the node has a 'device_type' property, but it does not list
+ *		the given string
+ *	-FDT_ERR_NOTFOUND, if the given node has no 'device_type' property
+ * 	-FDT_ERR_BADOFFSET, if nodeoffset does not refer to a BEGIN_NODE tag
+ *	-FDT_ERR_BADMAGIC,
+ *	-FDT_ERR_BADVERSION,
+ *	-FDT_ERR_BADSTATE,
+ *	-FDT_ERR_BADSTRUCTURE, standard meanings
+ */
+int fdt_node_check_type(const void *fdt, int nodeoffset,
+			      const char *type);
+
 /**
  * fdt_node_offset_by_compatible - find nodes with a given 'compatible' value
  * @fdt: pointer to the device tree blob
-- 
2.25.1




* [XEN RFC PATCH 20/40] xen/arm: implement node distance helpers for Arm64
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (18 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 19/40] xen: fdt: Introduce a helper to check fdt node type Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-26 23:52   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 21/40] xen/arm: introduce device_tree_numa as a switch for device tree NUMA Wei Chen
                   ` (22 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

In the current Xen code, __node_distance is a fake API that always
returns NUMA_REMOTE_DISTANCE (20). Now we use a matrix to record
the distance between any two nodes. Accordingly, we provide a
numa_set_distance API in this patch to set the distance between
any two nodes.
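
The validation rules can be sketched standalone as below. This is an
illustration under assumptions, not the patch itself: the ex_* names
are hypothetical, the matrix is shrunk to 4 nodes, the setter returns
an error code instead of printing, and the diagonal defaults to the
local distance. The sketch fills the matrix in a loop because in C a
partial brace initializer such as { { 20 } } sets only element [0][0]
and zero-fills the rest.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAX_NODES        4      /* small for illustration */
#define EX_UDF_MAX          9      /* 0-9 are reserved values */
#define EX_LOCAL_DISTANCE   10
#define EX_REMOTE_DISTANCE  20
#define EX_NO_DISTANCE      0xFF   /* unreachable */

static uint8_t ex_distance[EX_MAX_NODES][EX_MAX_NODES];

/* Explicitly give every pair a default distance. */
static void ex_distance_init(void)
{
    unsigned int from, to;

    for ( from = 0; from < EX_MAX_NODES; from++ )
        for ( to = 0; to < EX_MAX_NODES; to++ )
            ex_distance[from][to] = (from == to) ? EX_LOCAL_DISTANCE
                                                 : EX_REMOTE_DISTANCE;
}

/* Returns 0 if the distance was recorded, -1 if it was rejected. */
static int ex_set_distance(unsigned int from, unsigned int to,
                           uint32_t distance)
{
    if ( from >= EX_MAX_NODES || to >= EX_MAX_NODES )
        return -1;

    /* Reject unreachable/overflowing, reserved and bad local values. */
    if ( distance >= EX_NO_DISTANCE || distance <= EX_UDF_MAX ||
         (from == to && distance != EX_LOCAL_DISTANCE) )
        return -1;

    assert(distance <= UINT8_MAX);
    ex_distance[from][to] = (uint8_t)distance;

    return 0;
}

/* Out-of-range nodes are unreachable unless both IDs are identical. */
static uint8_t ex_get_distance(unsigned int from, unsigned int to)
{
    if ( from >= EX_MAX_NODES || to >= EX_MAX_NODES )
        return from == to ? EX_LOCAL_DISTANCE : EX_NO_DISTANCE;

    return ex_distance[from][to];
}
```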

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/numa.c        | 44 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/numa.h | 12 ++++++++++-
 xen/include/asm-x86/numa.h |  1 -
 xen/include/xen/numa.h     |  2 +-
 4 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
index 566ad1e52b..f61a8df645 100644
--- a/xen/arch/arm/numa.c
+++ b/xen/arch/arm/numa.c
@@ -23,6 +23,11 @@
 #include <xen/pfn.h>
 #include <asm/setup.h>
 
+static uint8_t __read_mostly
+node_distance_map[MAX_NUMNODES][MAX_NUMNODES] = {
+    { NUMA_REMOTE_DISTANCE }
+};
+
 void numa_set_node(int cpu, nodeid_t nid)
 {
     if ( nid >= MAX_NUMNODES ||
@@ -32,6 +37,45 @@ void numa_set_node(int cpu, nodeid_t nid)
     cpu_to_node[cpu] = nid;
 }
 
+void __init numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance)
+{
+    if ( from >= MAX_NUMNODES || to >= MAX_NUMNODES )
+    {
+        printk(KERN_WARNING
+            "NUMA nodes are out of matrix, from=%u to=%u distance=%u\n",
+            from, to, distance);
+        return;
+    }
+
+    /* NUMA defines 0xff as an unreachable node and 0-9 are undefined */
+    if ( distance >= NUMA_NO_DISTANCE ||
+        (distance >= NUMA_DISTANCE_UDF_MIN &&
+         distance <= NUMA_DISTANCE_UDF_MAX) ||
+        (from == to && distance != NUMA_LOCAL_DISTANCE) )
+    {
+        printk(KERN_WARNING
+            "Invalid NUMA node distance, from=%u to=%u distance=%u\n",
+            from, to, distance);
+        return;
+    }
+
+    node_distance_map[from][to] = distance;
+}
+
+uint8_t __node_distance(nodeid_t from, nodeid_t to)
+{
+    /*
+     * Check whether the nodes are in the matrix range.
+     * When any node is out of range, except from and to nodes are the
+     * same, we treat them as unreachable (return 0xFF)
+     */
+    if ( from >= MAX_NUMNODES || to >= MAX_NUMNODES )
+        return from == to ? NUMA_LOCAL_DISTANCE : NUMA_NO_DISTANCE;
+
+    return node_distance_map[from][to];
+}
+EXPORT_SYMBOL(__node_distance);
+
 void __init numa_init(bool acpi_off)
 {
     uint32_t idx;
diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index bb495a24e1..559b028a01 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -12,8 +12,19 @@ typedef u8 nodeid_t;
  * set the number of NUMA memory block number to 128.
  */
 #define NODES_SHIFT      6
+/*
+ * In ACPI spec, 0-9 are the reserved values for node distance,
+ * 10 indicates local node distance, 20 indicates remote node
+ * distance. Set node distance map in device tree will follow
+ * the ACPI's definition.
+ */
+#define NUMA_DISTANCE_UDF_MIN   0
+#define NUMA_DISTANCE_UDF_MAX   9
+#define NUMA_LOCAL_DISTANCE     10
+#define NUMA_REMOTE_DISTANCE    20
 
 extern void numa_init(bool acpi_off);
+extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance);
 
 /*
  * Temporary for fake NUMA node, when CPU, memory and distance
@@ -21,7 +32,6 @@ extern void numa_init(bool acpi_off);
  * symbols will be removed.
  */
 extern mfn_t first_valid_mfn;
-#define __node_distance(a, b) (20)
 
 #else
 
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index 5a57a51e26..e0253c20b7 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -21,7 +21,6 @@ extern nodeid_t apicid_to_node[];
 extern void init_cpu_to_node(void);
 
 void srat_parse_regions(u64 addr);
-extern u8 __node_distance(nodeid_t a, nodeid_t b);
 unsigned int arch_get_dma_bitsize(void);
 
 #endif
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index cb08d2eca9..0475823b13 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -58,7 +58,7 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
 #define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
 #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
 				 NODE_DATA(nid)->node_spanned_pages)
-
+extern u8 __node_distance(nodeid_t a, nodeid_t b);
 extern void numa_add_cpu(int cpu);
 
 struct node {
-- 
2.25.1




* [XEN RFC PATCH 21/40] xen/arm: introduce device_tree_numa as a switch for device tree NUMA
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (19 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 20/40] xen/arm: implement node distance helpers for Arm64 Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-19 17:45   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node Wei Chen
                   ` (21 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

Like acpi_numa on x86, which acts as a switch for ACPI based NUMA, we
introduce device_tree_numa as a switch for Arm device tree based NUMA.
When the NUMA information in the device tree is invalid, this switch
is set to -1 and NUMA support for Arm is disabled, even if the user
has set numa_off=0.

We keep the bad_srat and srat_disabled function names because we will
reuse the node_covers_memory and acpi_scan_nodes code for Arm, and
that code calls these two functions. The device tree can also be
treated as one kind of static resource table, so the names remain
appropriate.
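
The switch described above behaves as a small tri-state flag. A minimal
sketch, using hypothetical ex_* stand-ins for numa_off and
device_tree_numa, and assuming the value 1 is used later in the series
to mean "parsed successfully":

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for numa_off and device_tree_numa. */
static bool ex_numa_off;
static signed char ex_dt_numa;   /* -1: invalid, 0: not parsed, 1: parsed */

/* NUMA is disabled if the user turned it off or the DT data is bad. */
static bool ex_numa_disabled(void)
{
    assert(ex_dt_numa >= -1 && ex_dt_numa <= 1);
    return ex_numa_off || ex_dt_numa < 0;
}

/* Record that the device tree NUMA information cannot be used. */
static void ex_mark_bad(void)
{
    ex_dt_numa = -1;
}
```

Once the flag is -1 it is sticky: clearing the user-facing off switch
does not re-enable NUMA.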

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/Makefile           |  1 +
 xen/arch/arm/numa_device_tree.c | 35 +++++++++++++++++++++++++++++++++
 xen/include/asm-arm/numa.h      |  2 ++
 3 files changed, 38 insertions(+)
 create mode 100644 xen/arch/arm/numa_device_tree.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 6e3fb8033e..13e1549be0 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -36,6 +36,7 @@ obj-y += mem_access.o
 obj-y += mm.o
 obj-y += monitor.o
 obj-$(CONFIG_NUMA) += numa.o
+obj-$(CONFIG_DEVICE_TREE_NUMA) += numa_device_tree.o
 obj-y += p2m.o
 obj-y += percpu.o
 obj-y += platform.o
diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
new file mode 100644
index 0000000000..1c74ad135d
--- /dev/null
+++ b/xen/arch/arm/numa_device_tree.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Arm Architecture support layer for NUMA.
+ *
+ * Copyright (C) 2021 Arm Ltd
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+#include <xen/init.h>
+#include <xen/nodemask.h>
+#include <xen/numa.h>
+
+s8 device_tree_numa = 0;
+
+int srat_disabled(void)
+{
+    return numa_off || device_tree_numa < 0;
+}
+
+void __init bad_srat(void)
+{
+    printk(KERN_ERR "DT: NUMA information is not used.\n");
+    device_tree_numa = -1;
+}
diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index 559b028a01..756ad82d07 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -23,6 +23,8 @@ typedef u8 nodeid_t;
 #define NUMA_LOCAL_DISTANCE     10
 #define NUMA_REMOTE_DISTANCE    20
 
+extern s8 device_tree_numa;
+
 extern void numa_init(bool acpi_off);
 extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance);
 
-- 
2.25.1




* [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (20 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 21/40] xen/arm: introduce device_tree_numa as a switch for device tree NUMA Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-19 18:09   ` Julien Grall
                     ` (3 more replies)
  2021-08-11 10:24 ` [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node Wei Chen
                   ` (20 subsequent siblings)
  42 siblings, 4 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

The processor NUMA ID is stored in the device tree's processor node
as "numa-node-id". We need a new helper to parse this ID from the
processor node. Once we have read the ID from a processor node, its
validity still needs to be checked. If we get an invalid NUMA ID
from any processor node, the NUMA information in the device tree is
marked as invalid.
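
The parse relies on a sentinel default: if "numa-node-id" is missing,
the getter returns the out-of-range value MAX_NUMNODES, so a missing
property and an oversized ID fail the same range check. A minimal
standalone sketch of that idea follows. The ex_* helpers are
hypothetical and read a raw big-endian property buffer instead of
going through device_tree_get_u32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_NUMNODES 64u   /* illustrative node limit */

/* Decode one big-endian 32-bit cell, as stored in a flattened DT. */
static uint32_t ex_be32_to_cpu(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return the property value, or dflt if it is missing or too short. */
static uint32_t ex_get_u32(const uint8_t *prop, size_t len, uint32_t dflt)
{
    if ( !prop || len < sizeof(uint32_t) )
        return dflt;

    return ex_be32_to_cpu(prop);
}

/* Returns the node ID on success, -1 if it is missing or out of range. */
static int ex_parse_cpu_nid(const uint8_t *prop, size_t len)
{
    /* Use an invalid ID as the default so one check covers both cases. */
    uint32_t nid = ex_get_u32(prop, len, EX_MAX_NUMNODES);

    if ( nid >= EX_MAX_NUMNODES )
        return -1;

    return (int)nid;
}
```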

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
index 1c74ad135d..37cc56acf3 100644
--- a/xen/arch/arm/numa_device_tree.c
+++ b/xen/arch/arm/numa_device_tree.c
@@ -20,16 +20,53 @@
 #include <xen/init.h>
 #include <xen/nodemask.h>
 #include <xen/numa.h>
+#include <xen/device_tree.h>
+#include <asm/setup.h>
 
 s8 device_tree_numa = 0;
+static nodemask_t processor_nodes_parsed __initdata;
 
-int srat_disabled(void)
+static int srat_disabled(void)
 {
     return numa_off || device_tree_numa < 0;
 }
 
-void __init bad_srat(void)
+static __init void bad_srat(void)
 {
     printk(KERN_ERR "DT: NUMA information is not used.\n");
     device_tree_numa = -1;
 }
+
+/* Callback for device tree processor affinity */
+static int __init dtb_numa_processor_affinity_init(nodeid_t node)
+{
+    if ( srat_disabled() )
+        return -EINVAL;
+    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
+        bad_srat();
+        return -EINVAL;
+    }
+
+    node_set(node, processor_nodes_parsed);
+
+    device_tree_numa = 1;
+    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
+
+    return 0;
+}
+
+/* Parse CPU NUMA node info */
+int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
+{
+    uint32_t nid;
+
+    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
+    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);
+    if ( nid >= MAX_NUMNODES )
+    {
+        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n", nid);
+        return -EINVAL;
+    }
+
+    return dtb_numa_processor_affinity_init(nid);
+}
-- 
2.25.1




* [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (21 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-25 13:48   ` Julien Grall
  2021-08-28  1:06   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map Wei Chen
                   ` (19 subsequent siblings)
  42 siblings, 2 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

The NUMA ID of each memory block is stored in the device tree memory
nodes as the "numa-node-id" property. Add a helper to parse and verify
this ID for each memory node.

To support memory affinity later on, the valid memory ranges and their
NUMA IDs are saved to tables.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
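The conflict rule this patch applies to memory ranges can be sketched
in isolation as follows. This is a simplified standalone model, not
the Xen code: MAX_BLKS, struct blk, ranges_overlap() and blk_add() are
hypothetical names, and unlike conflicting_memblks(), which reports
only the first overlapping block, the sketch checks a new range
against every saved block.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLKS 8   /* hypothetical, stands in for NR_NODE_MEMBLKS */

struct blk {
    uint64_t start, end;   /* half-open range [start, end) */
    unsigned int nid;
};

static struct blk blks[MAX_BLKS];
static unsigned int nr_blks;

/* Two half-open ranges overlap iff each starts before the other ends. */
static int ranges_overlap(uint64_t s1, uint64_t e1, uint64_t s2, uint64_t e2)
{
    return s1 < e2 && s2 < e1;
}

/*
 * Record [start, start + size) for node nid.
 * Returns 0 on success, -1 if the table is full or the range overlaps a
 * block that belongs to a different node. Same-node overlap is tolerated.
 */
static int blk_add(unsigned int nid, uint64_t start, uint64_t size)
{
    uint64_t end = start + size;
    unsigned int i;

    assert(size != 0 && end > start);   /* caller skips empty ranges */

    if ( nr_blks >= MAX_BLKS )
        return -1;

    for ( i = 0; i < nr_blks; i++ )
        if ( ranges_overlap(start, end, blks[i].start, blks[i].end) &&
             blks[i].nid != nid )
            return -1;

    blks[nr_blks].start = start;
    blks[nr_blks].end = end;
    blks[nr_blks].nid = nid;
    nr_blks++;

    return 0;
}
```

An overlap within one node is only worth a warning because the node's
span is the union of its blocks anyway, while an overlap across nodes
means the firmware data contradicts itself.
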
 xen/arch/arm/numa_device_tree.c | 130 ++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)

diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
index 37cc56acf3..bbe081dcd1 100644
--- a/xen/arch/arm/numa_device_tree.c
+++ b/xen/arch/arm/numa_device_tree.c
@@ -20,11 +20,13 @@
 #include <xen/init.h>
 #include <xen/nodemask.h>
 #include <xen/numa.h>
+#include <xen/libfdt/libfdt.h>
 #include <xen/device_tree.h>
 #include <asm/setup.h>
 
 s8 device_tree_numa = 0;
 static nodemask_t processor_nodes_parsed __initdata;
+static nodemask_t memory_nodes_parsed __initdata;
 
 static int srat_disabled(void)
 {
@@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
     return 0;
 }
 
+/* Callback for parsing of the memory regions affinity */
+static int __init dtb_numa_memory_affinity_init(nodeid_t node,
+                                paddr_t start, paddr_t size)
+{
+    struct node *nd;
+    paddr_t end;
+    int i;
+
+    if ( srat_disabled() )
+        return -EINVAL;
+
+    end = start + size;
+    if ( num_node_memblks >= NR_NODE_MEMBLKS )
+    {
+        dprintk(XENLOG_WARNING,
+                "Too many NUMA entries, try a bigger NR_NODE_MEMBLKS\n");
+        bad_srat();
+        return -EINVAL;
+    }
+
+    /* Check whether this area conflicts with an already saved memblk */
+    i = conflicting_memblks(start, end);
+    /* No conflicting memory block, we can save it for later usage */
+    if ( i < 0 )
+        goto save_memblk;
+
+    if ( memblk_nodeid[i] == node ) {
+        /*
+         * Overlaps with other memblk in the same node, warning here.
+         * This memblk will be merged with conflicted memblk later.
+         */
+        printk(XENLOG_WARNING
+               "DT: NUMA NODE %u (%"PRIx64
+               "-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
+               node, start, end,
+               node_memblk_range[i].start, node_memblk_range[i].end);
+    } else {
+        /*
+         * Conflict with memblk in other node, this is an error.
+         * The NUMA information is invalid, NUMA will be turned off.
+         */
+        printk(XENLOG_ERR
+               "DT: NUMA NODE %u (%"PRIx64"-%"
+               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
+               node, start, end, memblk_nodeid[i],
+               node_memblk_range[i].start, node_memblk_range[i].end);
+        bad_srat();
+        return -EINVAL;
+    }
+
+save_memblk:
+    nd = &nodes[node];
+    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
+        nd->start = start;
+        nd->end = end;
+    } else {
+        if ( start < nd->start )
+            nd->start = start;
+        if ( nd->end < end )
+            nd->end = end;
+    }
+
+    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
+           node, start, end);
+
+    node_memblk_range[num_node_memblks].start = start;
+    node_memblk_range[num_node_memblks].end = end;
+    memblk_nodeid[num_node_memblks] = node;
+    num_node_memblks++;
+
+    return 0;
+}
+
 /* Parse CPU NUMA node info */
 int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
 {
@@ -70,3 +145,58 @@ int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
 
     return dtb_numa_processor_affinity_init(nid);
 }
+
+/* Parse memory node NUMA info */
+int __init
+device_tree_parse_numa_memory_node(const void *fdt, int node,
+    const char *name, uint32_t addr_cells, uint32_t size_cells)
+{
+    uint32_t nid;
+    int ret = 0, len;
+    paddr_t addr, size;
+    const struct fdt_property *prop;
+    uint32_t idx, ranges;
+    const __be32 *addresses;
+
+    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
+    if ( nid >= MAX_NUMNODES )
+    {
+        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n", nid);
+        return -EINVAL;
+    }
+
+    prop = fdt_get_property(fdt, node, "reg", &len);
+    if ( !prop )
+    {
+        printk(XENLOG_WARNING
+               "fdt: node `%s': missing `reg' property\n", name);
+        return -EINVAL;
+    }
+
+    addresses = (const __be32 *)prop->data;
+    ranges = len / (sizeof(__be32)* (addr_cells + size_cells));
+    for ( idx = 0; idx < ranges; idx++ )
+    {
+        device_tree_get_reg(&addresses, addr_cells, size_cells, &addr, &size);
+        /* Skip zero size ranges */
+        if ( !size )
+            continue;
+
+        ret = dtb_numa_memory_affinity_init(nid, addr, size);
+        if ( ret ) {
+            printk(XENLOG_WARNING
+                   "NUMA: process range#%d addr = %lx size=%lx failed!\n",
+                   idx, addr, size);
+            return -EINVAL;
+        }
+    }
+
+    if ( idx == 0 )
+    {
+        printk(XENLOG_ERR
+               "bad property in memory node, idx=%d ret=%d\n", idx, ret);
+        return -EINVAL;
+    }
+
+    return 0;
+}
-- 
2.25.1




* [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (22 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-25 13:56   ` Julien Grall
  2021-08-31  0:48   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 25/40] xen/arm: unified entry to parse all NUMA data from device tree Wei Chen
                   ` (18 subsequent siblings)
  42 siblings, 2 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

A NUMA-aware device tree provides a "distance-map" node that describes
the distance between any two nodes. This patch introduces a new helper
to parse this distance map.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
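The parsing of "distance-matrix" can be sketched in isolation as
follows. This is a standalone model under stated assumptions, not the
Xen code: MAX_NODES, dist[][], be32_cell() and parse_distance_matrix()
are hypothetical names, LOCAL_DISTANCE is 10 as in the device tree
NUMA binding, and the sketch is stricter than the patch in two places:
it rejects a property whose length is not a whole number of triples,
and it bounds-checks the node IDs before storing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES      4u    /* hypothetical, stands in for MAX_NUMNODES */
#define LOCAL_DISTANCE 10u   /* local distance in the DT NUMA binding */

static uint32_t dist[MAX_NODES][MAX_NODES];

/* Read one big-endian 32-bit cell, as stored in a flattened device tree. */
static uint32_t be32_cell(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Parse len bytes of (from, to, distance) triples into dist[][].
 * Returns 0 on success, -1 on a malformed property or an invalid entry.
 */
static int parse_distance_matrix(const uint8_t *data, size_t len)
{
    size_t i;

    assert(data != NULL);

    /* Each entry is three 4-byte cells; reject empty or truncated input. */
    if ( len == 0 || len % 12 != 0 )
        return -1;

    for ( i = 0; i < len; i += 12 )
    {
        uint32_t from = be32_cell(data + i);
        uint32_t to = be32_cell(data + i + 4);
        uint32_t d = be32_cell(data + i + 8);

        if ( from >= MAX_NODES || to >= MAX_NODES )
            return -1;

        if ( (from == to && d != LOCAL_DISTANCE) ||
             (from != to && d <= LOCAL_DISTANCE) )
            return -1;

        dist[from][to] = d;

        /* Default the reverse direction only if nothing was set for it. */
        if ( dist[to][from] == 0 )
            dist[to][from] = d;
    }

    return 0;
}
```

Filling the reverse direction only while it is still unset lets a
device tree that lists just one direction work, and an explicit
reverse entry later in the matrix still overrides the default.
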
 xen/arch/arm/numa_device_tree.c | 67 +++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
index bbe081dcd1..6e0d1d3d9f 100644
--- a/xen/arch/arm/numa_device_tree.c
+++ b/xen/arch/arm/numa_device_tree.c
@@ -200,3 +200,70 @@ device_tree_parse_numa_memory_node(const void *fdt, int node,
 
     return 0;
 }
+
+/* Parse NUMA distance map v1 */
+int __init
+device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
+{
+    const struct fdt_property *prop;
+    const __be32 *matrix;
+    int entry_count, len, i;
+
+    printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
+
+    prop = fdt_get_property(fdt, node, "distance-matrix", &len);
+    if ( !prop )
+    {
+        printk(XENLOG_WARNING
+               "NUMA: No distance-matrix property in distance-map\n");
+
+        return -EINVAL;
+    }
+
+    if ( len % sizeof(uint32_t) != 0 )
+    {
+        printk(XENLOG_WARNING
+               "distance-matrix in node is not a multiple of u32\n");
+        return -EINVAL;
+    }
+
+    entry_count = len / sizeof(uint32_t);
+    if ( entry_count <= 0 )
+    {
+        printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
+
+        return -EINVAL;
+    }
+
+    matrix = (const __be32 *)prop->data;
+    for ( i = 0; i + 2 < entry_count; i += 3 )
+    {
+        uint32_t from, to, distance;
+
+        from = dt_read_number(matrix, 1);
+        matrix++;
+        to = dt_read_number(matrix, 1);
+        matrix++;
+        distance = dt_read_number(matrix, 1);
+        matrix++;
+
+        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
+            (from != to && distance <= NUMA_LOCAL_DISTANCE) )
+        {
+            printk(XENLOG_WARNING
+                   "Invalid nodes' distance from node#%d to node#%d = %d\n",
+                   from, to, distance);
+            return -EINVAL;
+        }
+
+        printk(XENLOG_INFO "NUMA: distance from node#%d to node#%d = %d\n",
+               from, to, distance);
+        numa_set_distance(from, to, distance);
+
+        /* Set default distance of node B->A same as A->B */
+        if ( to > from )
+            numa_set_distance(to, from, distance);
+    }
+
+    return 0;
+}
-- 
2.25.1




* [XEN RFC PATCH 25/40] xen/arm: unified entry to parse all NUMA data from device tree
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (23 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-31  0:54   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to NUMA system Wei Chen
                   ` (17 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

This API scans the whole device tree to parse the CPU node IDs, the
memory node IDs and the distance-map. early_scan_node already invokes
a handler to process memory nodes, but parsing the memory node ID
there would mean embedding NUMA parsing code in that handler, and we
would still need to scan the whole device tree to find the CPU NUMA
IDs and the distance-map. So the memory NUMA ID parsing is included
in this API as well. Another benefit is a single entry point for
parsing device tree NUMA data.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
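The single-pass dispatch described in the commit message can be
sketched in isolation as follows. This is a standalone model, not the
Xen code: struct dt_node_desc, struct scan_counts and
scan_numa_nodes() are hypothetical, and the sketch only counts nodes
where the real callback would call the three parse helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of a node: either property may be absent. */
struct dt_node_desc {
    const char *device_type;   /* NULL if the node has no device_type */
    const char *compatible;    /* NULL if the node has no compatible */
};

struct scan_counts {
    unsigned int cpus, memories, distance_maps;
};

/* One pass over all nodes, dispatching on what each node describes. */
static void scan_numa_nodes(const struct dt_node_desc *nodes, size_t nr,
                            struct scan_counts *out)
{
    size_t i;

    assert(nodes != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    for ( i = 0; i < nr; i++ )
    {
        const char *type = nodes[i].device_type;
        const char *compat = nodes[i].compatible;

        if ( type && strcmp(type, "cpu") == 0 )
            out->cpus++;            /* would parse the CPU numa-node-id */
        else if ( type && strcmp(type, "memory") == 0 )
            out->memories++;        /* would parse reg and numa-node-id */
        else if ( compat && strcmp(compat, "numa-distance-map-v1") == 0 )
            out->distance_maps++;   /* would parse distance-matrix */
    }
}
```

Keeping all three cases in one callback means one walk of the tree
yields all of the NUMA data, which is the "single entry point" benefit
mentioned above.
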
 xen/arch/arm/numa_device_tree.c | 31 ++++++++++++++++++++++++++++---
 xen/include/asm-arm/numa.h      |  1 +
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
index 6e0d1d3d9f..27ffb72f7b 100644
--- a/xen/arch/arm/numa_device_tree.c
+++ b/xen/arch/arm/numa_device_tree.c
@@ -131,7 +131,8 @@ save_memblk:
 }
 
 /* Parse CPU NUMA node info */
-int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
+static int __init
+device_tree_parse_numa_cpu_node(const void *fdt, int node)
 {
     uint32_t nid;
 
@@ -147,7 +148,7 @@ int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
 }
 
 /* Parse memory node NUMA info */
-int __init
+static int __init
 device_tree_parse_numa_memory_node(const void *fdt, int node,
     const char *name, uint32_t addr_cells, uint32_t size_cells)
 {
@@ -202,7 +203,7 @@ device_tree_parse_numa_memory_node(const void *fdt, int node,
 }
 
 /* Parse NUMA distance map v1 */
-int __init
+static int __init
 device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
 {
     const struct fdt_property *prop;
@@ -267,3 +268,27 @@ device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
 
     return 0;
 }
+
+static int __init fdt_scan_numa_nodes(const void *fdt,
+                int node, const char *uname, int depth,
+                u32 address_cells, u32 size_cells, void *data)
+{
+    int ret = 0;
+
+    if ( fdt_node_check_type(fdt, node, "cpu") == 0 )
+        ret = device_tree_parse_numa_cpu_node(fdt, node);
+    else if ( fdt_node_check_type(fdt, node, "memory") == 0 )
+        ret = device_tree_parse_numa_memory_node(fdt, node, uname,
+                                address_cells, size_cells);
+    else if ( fdt_node_check_compatible(fdt, node,
+                                "numa-distance-map-v1") == 0 )
+        ret = device_tree_parse_numa_distance_map_v1(fdt, node);
+
+    return ret;
+}
+
+/* Initialize NUMA from device tree */
+int __init numa_device_tree_init(const void *fdt)
+{
+    return device_tree_for_each_node(fdt, 0, fdt_scan_numa_nodes, NULL);
+}
diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index 756ad82d07..7a3588ac7f 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -26,6 +26,7 @@ typedef u8 nodeid_t;
 extern s8 device_tree_numa;
 
 extern void numa_init(bool acpi_off);
+extern int numa_device_tree_init(const void *fdt);
 extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance);
 
 /*
-- 
2.25.1




* [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to NUMA system
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (24 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 25/40] xen/arm: unified entry to parse all NUMA data from device tree Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-25 16:58   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 27/40] xen/arm: build CPU NUMA node map while creating cpu_logical_map Wei Chen
                   ` (16 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

When CPUs boot up, we have to add them to the NUMA system. At this
stage the NUMA data has not been parsed yet, but a fake NUMA node has
been created. So, with this patch, all CPUs are added to NUMA node#0.
After the NUMA data has been parsed from the device tree, each CPU
will be added to the correct NUMA node as described by that data.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/setup.c       | 6 ++++++
 xen/arch/arm/smpboot.c     | 6 ++++++
 xen/include/asm-arm/numa.h | 1 +
 3 files changed, 13 insertions(+)

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 3c58d2d441..7531989f21 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -918,6 +918,12 @@ void __init start_xen(unsigned long boot_phys_offset,
 
     processor_id();
 
+    /*
+     * If Xen is running on a NUMA off system, there will
+     * be a node#0 at least.
+     */
+    numa_add_cpu(0);
+
     smp_init_cpus();
     cpus = smp_get_max_cpus();
     printk(XENLOG_INFO "SMP: Allowing %u CPUs\n", cpus);
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index a1ee3146ef..aa78958c07 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -358,6 +358,12 @@ void start_secondary(void)
      */
     smp_wmb();
 
+    /*
+     * If Xen is running on a NUMA off system, there will
+     * be a node#0 at least.
+     */
+    numa_add_cpu(cpuid);
+
     /* Now report this CPU is up */
     cpumask_set_cpu(cpuid, &cpu_online_map);
 
diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index 7a3588ac7f..dd31324b0b 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -59,6 +59,7 @@ extern mfn_t first_valid_mfn;
 #define __node_distance(a, b) (20)
 
 #define numa_init(x) do { } while (0)
+#define numa_add_cpu(x) do { } while (0)
 #define numa_set_node(x, y) do { } while (0)
 
 #endif
-- 
2.25.1




* [XEN RFC PATCH 27/40] xen/arm: build CPU NUMA node map while creating cpu_logical_map
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (25 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to NUMA system Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-25 17:06   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 28/40] xen/x86: decouple nodes_cover_memory with E820 map Wei Chen
                   ` (15 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

Sometimes the CPU logical ID may differ from the physical CPU ID.
Xen uses the CPU logical ID at runtime, so we should use the CPU
logical ID to create the map between NUMA nodes and CPUs.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/smpboot.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index aa78958c07..dd5a45bffc 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -121,7 +121,12 @@ static void __init dt_smp_init_cpus(void)
     {
         [0 ... NR_CPUS - 1] = MPIDR_INVALID
     };
+    static nodeid_t node_map[NR_CPUS] __initdata =
+    {
+        [0 ... NR_CPUS - 1] = NUMA_NO_NODE
+    };
     bool bootcpu_valid = false;
+    uint32_t nid = 0;
     int rc;
 
     mpidr = boot_cpu_data.mpidr.bits & MPIDR_HWID_MASK;
@@ -172,6 +177,26 @@ static void __init dt_smp_init_cpus(void)
             continue;
         }
 
+#ifdef CONFIG_DEVICE_TREE_NUMA
+        /*
+         * When CONFIG_DEVICE_TREE_NUMA is set, try to fetch NUMA information
+         * from the CPU dts node, otherwise the nid is always 0.
+         */
+        if ( !dt_property_read_u32(cpu, "numa-node-id", &nid) )
+        {
+            printk(XENLOG_WARNING
+                "cpu[%d] dts path: %s: doesn't have numa information!\n",
+                cpuidx, dt_node_full_name(cpu));
+            /*
+             * In the early stage of NUMA initialization, when Xen finds any
+             * CPU dts node without numa-node-id info, NUMA will be treated
+             * as off and all CPUs will be set to a FAKE node 0. So if we
+             * fail to get numa-node-id here, we should set nid to 0.
+             */
+            nid = 0;
+        }
+#endif
+
         /*
          * 8 MSBs must be set to 0 in the DT since the reg property
          * defines the MPIDR[23:0]
@@ -231,9 +256,12 @@ static void __init dt_smp_init_cpus(void)
         {
             printk("cpu%d init failed (hwid %"PRIregister"): %d\n", i, hwid, rc);
             tmp_map[i] = MPIDR_INVALID;
+            node_map[i] = NUMA_NO_NODE;
         }
-        else
+        else {
             tmp_map[i] = hwid;
+            node_map[i] = nid;
+        }
     }
 
     if ( !bootcpu_valid )
@@ -249,6 +277,7 @@ static void __init dt_smp_init_cpus(void)
             continue;
         cpumask_set_cpu(i, &cpu_possible_map);
         cpu_logical_map(i) = tmp_map[i];
+        numa_set_node(i, node_map[i]);
     }
 }
 
-- 
2.25.1




* [XEN RFC PATCH 28/40] xen/x86: decouple nodes_cover_memory with E820 map
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (26 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 27/40] xen/arm: build CPU NUMA node map while creating cpu_logical_map Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-31  1:07   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 29/40] xen/arm: implement Arm arch helpers Arm to get memory map info Wei Chen
                   ` (14 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

We will reuse nodes_cover_memory for Arm to check its bootmem
info. So we introduce two arch helpers to get the number of memory
map entries and the range of a specified entry:
    arch_meminfo_get_nr_bank
    arch_meminfo_get_ram_bank_range

Based on these two helpers, nodes_cover_memory becomes
architecture independent.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
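The split between the two arch helpers and the common coverage check
can be sketched in isolation as follows. This is a standalone model,
not the Xen code: struct range, ram_banks[], meminfo_nr_banks(),
meminfo_bank_range() and spans_cover_memory() are hypothetical names,
the bank table stands in for the E820 map or bootinfo, and the node
spans are passed in as an array instead of being read from nodes[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
    uint64_t start, end;   /* half-open: [start, end) */
};

/* Hypothetical "arch" side: a fixed table standing in for the memory map. */
static const struct range ram_banks[] = {
    { 0x00000000, 0x40000000 },
    { 0x80000000, 0xC0000000 },
};

static unsigned int meminfo_nr_banks(void)
{
    return sizeof(ram_banks) / sizeof(ram_banks[0]);
}

/* Returns 0 and fills start/end for a valid bank, -1 otherwise. */
static int meminfo_bank_range(unsigned int bank, uint64_t *start,
                              uint64_t *end)
{
    assert(start != NULL && end != NULL);

    if ( bank >= meminfo_nr_banks() )
        return -1;

    *start = ram_banks[bank].start;
    *end = ram_banks[bank].end;

    return 0;
}

/* Common side: returns 1 if every RAM bank lies inside the node spans. */
static int spans_cover_memory(const struct range *spans,
                              unsigned int nr_spans)
{
    unsigned int i, j;

    for ( i = 0; i < meminfo_nr_banks(); i++ )
    {
        uint64_t start, end;
        int found;

        if ( meminfo_bank_range(i, &start, &end) )
            continue;

        /* Trim the bank from both ends by any span that overlaps it. */
        do {
            found = 0;
            for ( j = 0; j < nr_spans; j++ )
            {
                if ( start >= end || start >= spans[j].end ||
                     end <= spans[j].start )
                    continue;
                if ( start >= spans[j].start )
                {
                    start = spans[j].end;
                    found = 1;
                }
                if ( end <= spans[j].end )
                {
                    end = spans[j].start;
                    found = 1;
                }
            }
        } while ( found && start < end );

        if ( start < end )
            return 0;   /* part of this bank belongs to no node */
    }

    return 1;
}
```

The common loop only ever sees the memory map through the two helpers,
so swapping the table for another architecture's map does not touch
the coverage logic.
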
 xen/arch/x86/numa.c    | 18 ++++++++++++++++++
 xen/arch/x86/srat.c    |  8 +++-----
 xen/include/xen/numa.h |  4 ++++
 3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 6908738305..8b43be4aa7 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -128,6 +128,24 @@ unsigned int __init arch_get_dma_bitsize(void)
                  + PAGE_SHIFT, 32);
 }
 
+uint32_t __init arch_meminfo_get_nr_bank(void)
+{
+	return e820.nr_map;
+}
+
+int __init arch_meminfo_get_ram_bank_range(int bank,
+	unsigned long long *start, unsigned long long *end)
+{
+	if (e820.map[bank].type != E820_RAM || !start || !end) {
+		return -1;
+	}
+
+	*start = e820.map[bank].addr;
+	*end = e820.map[bank].addr + e820.map[bank].size;
+
+	return 0;
+}
+
 static void dump_numa(unsigned char key)
 {
     s_time_t now = NOW();
diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 6d68b8a614..2298353846 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -316,18 +316,16 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 static int __init nodes_cover_memory(void)
 {
 	int i;
+	uint32_t nr_banks = arch_meminfo_get_nr_bank();
 
-	for (i = 0; i < e820.nr_map; i++) {
+	for (i = 0; i < nr_banks; i++) {
 		int j, found;
 		unsigned long long start, end;
 
-		if (e820.map[i].type != E820_RAM) {
+		if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
 			continue;
 		}
 
-		start = e820.map[i].addr;
-		end = e820.map[i].addr + e820.map[i].size;
-
 		do {
 			found = 0;
 			for_each_node_mask(j, memory_nodes_parsed)
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index 0475823b13..6d18059bcd 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -89,6 +89,10 @@ static inline void clear_node_cpumask(int cpu)
 	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
 }
 
+extern uint32_t arch_meminfo_get_nr_bank(void);
+extern int arch_meminfo_get_ram_bank_range(int bank,
+    unsigned long long *start, unsigned long long *end);
+
 #endif /* CONFIG_NUMA */
 
 #endif /* _XEN_NUMA_H */
-- 
2.25.1




* [XEN RFC PATCH 29/40] xen/arm: implement Arm arch helpers Arm to get memory map info
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (27 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 28/40] xen/x86: decouple nodes_cover_memory with E820 map Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-25 17:09   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 30/40] xen: move NUMA memory and CPU parsed nodemasks to common Wei Chen
                   ` (13 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

These two helpers are architecture APIs that are required by
nodes_cover_memory.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/numa.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
index f61a8df645..6eebf8e8bc 100644
--- a/xen/arch/arm/numa.c
+++ b/xen/arch/arm/numa.c
@@ -126,3 +126,17 @@ void __init numa_init(bool acpi_off)
     numa_initmem_init(PFN_UP(ram_start), PFN_DOWN(ram_end));
     return;
 }
+
+uint32_t __init arch_meminfo_get_nr_bank(void)
+{
+	return bootinfo.mem.nr_banks;
+}
+
+int __init arch_meminfo_get_ram_bank_range(int bank,
+	unsigned long long *start, unsigned long long *end)
+{
+	*start = bootinfo.mem.bank[bank].start;
+	*end = bootinfo.mem.bank[bank].start + bootinfo.mem.bank[bank].size;
+
+	return 0;
+}
-- 
2.25.1




* [XEN RFC PATCH 30/40] xen: move NUMA memory and CPU parsed nodemasks to common
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (28 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 29/40] xen/arm: implement Arm arch helpers Arm to get memory map info Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-25 17:16   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 31/40] xen/x86: move nodes_cover_memory " Wei Chen
                   ` (12 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

Both memory_nodes_parsed and processor_nodes_parsed are used by
Arm and x86 to record the parsed NUMA memory and CPU nodes. So we
move them to common code.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/numa_device_tree.c | 2 --
 xen/arch/x86/srat.c             | 3 ---
 xen/common/numa.c               | 3 +++
 xen/include/xen/nodemask.h      | 2 ++
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
index 27ffb72f7b..f74b7f6427 100644
--- a/xen/arch/arm/numa_device_tree.c
+++ b/xen/arch/arm/numa_device_tree.c
@@ -25,8 +25,6 @@
 #include <asm/setup.h>
 
 s8 device_tree_numa = 0;
-static nodemask_t processor_nodes_parsed __initdata;
-static nodemask_t memory_nodes_parsed __initdata;
 
 static int srat_disabled(void)
 {
diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 2298353846..dd3aa30843 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -24,9 +24,6 @@
 
 static struct acpi_table_slit *__read_mostly acpi_slit;
 
-static nodemask_t memory_nodes_parsed __initdata;
-static nodemask_t processor_nodes_parsed __initdata;
-
 struct pxm2node {
 	unsigned pxm;
 	nodeid_t node;
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 26c0006d04..79ab250543 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -35,6 +35,9 @@ int num_node_memblks;
 struct node node_memblk_range[NR_NODE_MEMBLKS];
 nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
 
+nodemask_t memory_nodes_parsed __initdata;
+nodemask_t processor_nodes_parsed __initdata;
+
 bool numa_off;
 
 /*
diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h
index 1dd6c7458e..29ce5e28e7 100644
--- a/xen/include/xen/nodemask.h
+++ b/xen/include/xen/nodemask.h
@@ -276,6 +276,8 @@ static inline int __cycle_node(int n, const nodemask_t *maskp, int nbits)
  */
 
 extern nodemask_t node_online_map;
+extern nodemask_t memory_nodes_parsed;
+extern nodemask_t processor_nodes_parsed;
 
 #if MAX_NUMNODES > 1
 #define num_online_nodes()	nodes_weight(node_online_map)
-- 
2.25.1




* [XEN RFC PATCH 31/40] xen/x86: move nodes_cover_memory to common
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (29 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 30/40] xen: move NUMA memory and CPU parsed nodemasks to common Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-31  1:16   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 32/40] xen/x86: make acpi_scan_nodes to be neutral Wei Chen
                   ` (11 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

Not only ACPI NUMA but also Arm device tree based NUMA will use
nodes_cover_memory for its sanity check. So we move this function
from arch/x86 to common code.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/x86/srat.c    | 40 ----------------------------------------
 xen/common/numa.c      | 40 ++++++++++++++++++++++++++++++++++++++++
 xen/include/xen/numa.h |  1 +
 3 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index dd3aa30843..dcebc7adec 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -308,46 +308,6 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 	num_node_memblks++;
 }
 
-/* Sanity check to catch more bad SRATs (they are amazingly common).
-   Make sure the PXMs cover all memory. */
-static int __init nodes_cover_memory(void)
-{
-	int i;
-	uint32_t nr_banks = arch_meminfo_get_nr_bank();
-
-	for (i = 0; i < nr_banks; i++) {
-		int j, found;
-		unsigned long long start, end;
-
-		if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
-			continue;
-		}
-
-		do {
-			found = 0;
-			for_each_node_mask(j, memory_nodes_parsed)
-				if (start < nodes[j].end
-				    && end > nodes[j].start) {
-					if (start >= nodes[j].start) {
-						start = nodes[j].end;
-						found = 1;
-					}
-					if (end <= nodes[j].end) {
-						end = nodes[j].start;
-						found = 1;
-					}
-				}
-		} while (found && start < end);
-
-		if (start < end) {
-			printk(KERN_ERR "SRAT: No PXM for e820 range: "
-				"%016Lx - %016Lx\n", start, end);
-			return 0;
-		}
-	}
-	return 1;
-}
-
 void __init acpi_numa_arch_fixup(void) {}
 
 static uint64_t __initdata srat_region_mask;
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 79ab250543..74960885a6 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -193,6 +193,46 @@ void __init cutoff_node(int i, u64 start, u64 end)
 	}
 }
 
+/* Sanity check to catch more bad SRATs (they are amazingly common).
+   Make sure the PXMs cover all memory. */
+int __init nodes_cover_memory(void)
+{
+	int i;
+	uint32_t nr_banks = arch_meminfo_get_nr_bank();
+
+	for (i = 0; i < nr_banks; i++) {
+		int j, found;
+		unsigned long long start, end;
+
+		if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
+			continue;
+		}
+
+		do {
+			found = 0;
+			for_each_node_mask(j, memory_nodes_parsed)
+				if (start < nodes[j].end
+				    && end > nodes[j].start) {
+					if (start >= nodes[j].start) {
+						start = nodes[j].end;
+						found = 1;
+					}
+					if (end <= nodes[j].end) {
+						end = nodes[j].start;
+						found = 1;
+					}
+				}
+		} while (found && start < end);
+
+		if (start < end) {
+			printk(KERN_ERR "SRAT: No PXM for e820 range: "
+				"%016Lx - %016Lx\n", start, end);
+			return 0;
+		}
+	}
+	return 1;
+}
+
 void numa_add_cpu(int cpu)
 {
     cpumask_set_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index 6d18059bcd..094ab904c9 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -92,6 +92,7 @@ static inline void clear_node_cpumask(int cpu)
 extern uint32_t arch_meminfo_get_nr_bank(void);
 extern int arch_meminfo_get_ram_bank_range(int bank,
     unsigned long long *start, unsigned long long *end);
+extern int nodes_cover_memory(void);
 
 #endif /* CONFIG_NUMA */
 
-- 
2.25.1




* [XEN RFC PATCH 32/40] xen/x86: make acpi_scan_nodes to be neutral
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (30 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 31/40] xen/x86: move nodes_cover_memory " Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-27 14:08   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 33/40] xen: export bad_srat and srat_disabled to extern Wei Chen
                   ` (10 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

The code in acpi_scan_nodes can be reused for device tree based
NUMA, so rename acpi_scan_nodes to the neutral numa_scan_nodes.
As the acpi_numa variable is only available on ACPI based NUMA
systems, guard its use with CONFIG_ACPI_NUMA.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/x86/srat.c        | 4 +++-
 xen/common/numa.c          | 2 +-
 xen/include/asm-x86/acpi.h | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index dcebc7adec..3d4d90a622 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -362,7 +362,7 @@ void __init srat_parse_regions(u64 addr)
 }
 
 /* Use the information discovered above to actually set up the nodes. */
-int __init acpi_scan_nodes(u64 start, u64 end)
+int __init numa_scan_nodes(u64 start, u64 end)
 {
 	int i;
 	nodemask_t all_nodes_parsed;
@@ -371,8 +371,10 @@ int __init acpi_scan_nodes(u64 start, u64 end)
 	for (i = 0; i < MAX_NUMNODES; i++)
 		cutoff_node(i, start, end);
 
+#ifdef CONFIG_ACPI_NUMA
 	if (acpi_numa <= 0)
 		return -1;
+#endif
 
 	if (!nodes_cover_memory()) {
 		bad_srat();
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 74960885a6..4152bbe83b 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -330,7 +330,7 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_ACPI_NUMA
-    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
+    if ( !numa_off && !numa_scan_nodes((u64)start_pfn << PAGE_SHIFT,
          (u64)end_pfn << PAGE_SHIFT) )
         return;
 #endif
diff --git a/xen/include/asm-x86/acpi.h b/xen/include/asm-x86/acpi.h
index d347500a3c..33b71dfb3b 100644
--- a/xen/include/asm-x86/acpi.h
+++ b/xen/include/asm-x86/acpi.h
@@ -102,7 +102,7 @@ extern unsigned long acpi_wakeup_address;
 #define ARCH_HAS_POWER_INIT	1
 
 extern s8 acpi_numa;
-extern int acpi_scan_nodes(u64 start, u64 end);
+extern int numa_scan_nodes(u64 start, u64 end);
 
 extern struct acpi_sleep_info acpi_sinfo;
 #define acpi_video_flags bootsym(video_flags)
-- 
2.25.1




* [XEN RFC PATCH 33/40] xen: export bad_srat and srat_disabled to extern
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (31 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 32/40] xen/x86: make acpi_scan_nodes to be neutral Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-11 10:24 ` [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to common Wei Chen
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

These two functions are implemented by each architecture, but
common code outside of arch will invoke them. So drop the static
qualifier where present and declare them in xen/include/xen/numa.h.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/numa_device_tree.c | 4 ++--
 xen/arch/x86/srat.c             | 2 +-
 xen/include/asm-x86/numa.h      | 1 -
 xen/include/xen/numa.h          | 2 ++
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
index f74b7f6427..f7f2eeebc3 100644
--- a/xen/arch/arm/numa_device_tree.c
+++ b/xen/arch/arm/numa_device_tree.c
@@ -26,12 +26,12 @@
 
 s8 device_tree_numa = 0;
 
-static int srat_disabled(void)
+int srat_disabled(void)
 {
     return numa_off || device_tree_numa < 0;
 }
 
-static __init void bad_srat(void)
+__init void bad_srat(void)
 {
     printk(KERN_ERR "DT: NUMA information is not used.\n");
     device_tree_numa = -1;
diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 3d4d90a622..c979939fdd 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -97,7 +97,7 @@ nodeid_t setup_node(unsigned pxm)
 	return node;
 }
 
-static __init void bad_srat(void)
+__init void bad_srat(void)
 {
 	int i;
 	printk(KERN_ERR "SRAT: SRAT not used.\n");
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index e0253c20b7..e63869135c 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -13,7 +13,6 @@ extern nodeid_t pxm_to_node(unsigned int pxm);
 
 #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
 
-extern int srat_disabled(void);
 extern nodeid_t setup_node(unsigned int pxm);
 extern void srat_detect_node(int cpu);
 
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index 094ab904c9..490381bd13 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -75,6 +75,8 @@ extern int compute_hash_shift(struct node *nodes, int numnodes,
 extern int conflicting_memblks(u64 start, u64 end);
 extern void cutoff_node(int i, u64 start, u64 end);
 extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
+extern int srat_disabled(void);
+extern void bad_srat(void);
 
 extern void numa_init_array(void);
 extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
-- 
2.25.1




* [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to common
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (32 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 33/40] xen: export bad_srat and srat_disabled to extern Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-27 14:14   ` Julien Grall
  2021-08-31  1:26   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 35/40] xen: enable numa_scan_nodes for device tree based NUMA Wei Chen
                   ` (8 subsequent siblings)
  42 siblings, 2 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

After the preparation in the previous patches, numa_scan_nodes can
be used by both Arm and x86, so move this function from x86 to common.
As nodes_cover_memory will no longer be used across files, restore
its static attribute in this patch.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
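Note for reviewers: numa_scan_nodes depends on nodes_cover_memory,
which trims each RAM bank against the parsed node ranges until
nothing is left uncovered. The following is only a standalone sketch
of that interval logic, not the code in this patch. The names
struct range and bank_is_covered are hypothetical, and the arch
helpers that enumerate RAM banks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-node memory range; end is exclusive. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Return 1 if [start, end) is fully covered by the node ranges,
 * 0 otherwise. A node range that overlaps the head of the bank moves
 * start forward, one that overlaps the tail moves end backward, and
 * the scan repeats until no further trimming is possible.
 */
static int bank_is_covered(uint64_t start, uint64_t end,
                           const struct range *nodes, size_t nr_nodes)
{
    int found;

    assert(nodes != NULL || nr_nodes == 0);

    do {
        found = 0;
        for (size_t j = 0; j < nr_nodes; j++) {
            if (start < nodes[j].end && end > nodes[j].start) {
                if (start >= nodes[j].start) {
                    start = nodes[j].end;
                    found = 1;
                }
                if (end <= nodes[j].end) {
                    end = nodes[j].start;
                    found = 1;
                }
            }
        }
    } while (found && start < end);

    /* Anything left between start and end belongs to no node. */
    return start >= end;
}
```

A bank that extends past the last node range, or that straddles a gap
between two node ranges, is reported as not covered, which is the
case where numa_scan_nodes calls bad_srat and returns -1.
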
 xen/arch/x86/srat.c        | 52 ------------------------------------
 xen/common/numa.c          | 54 +++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/acpi.h |  3 ---
 xen/include/xen/numa.h     |  3 ++-
 4 files changed, 55 insertions(+), 57 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index c979939fdd..c9f019c307 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -361,58 +361,6 @@ void __init srat_parse_regions(u64 addr)
 	pfn_pdx_hole_setup(mask >> PAGE_SHIFT);
 }
 
-/* Use the information discovered above to actually set up the nodes. */
-int __init numa_scan_nodes(u64 start, u64 end)
-{
-	int i;
-	nodemask_t all_nodes_parsed;
-
-	/* First clean up the node list */
-	for (i = 0; i < MAX_NUMNODES; i++)
-		cutoff_node(i, start, end);
-
-#ifdef CONFIG_ACPI_NUMA
-	if (acpi_numa <= 0)
-		return -1;
-#endif
-
-	if (!nodes_cover_memory()) {
-		bad_srat();
-		return -1;
-	}
-
-	memnode_shift = compute_hash_shift(node_memblk_range, num_node_memblks,
-				memblk_nodeid);
-
-	if (memnode_shift < 0) {
-		printk(KERN_ERR
-		     "SRAT: No NUMA node hash function found. Contact maintainer\n");
-		bad_srat();
-		return -1;
-	}
-
-	nodes_or(all_nodes_parsed, memory_nodes_parsed, processor_nodes_parsed);
-
-	/* Finally register nodes */
-	for_each_node_mask(i, all_nodes_parsed)
-	{
-		u64 size = nodes[i].end - nodes[i].start;
-		if ( size == 0 )
-			printk(KERN_WARNING "SRAT: Node %u has no memory. "
-			       "BIOS Bug or mis-configured hardware?\n", i);
-
-		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
-	}
-	for (i = 0; i < nr_cpu_ids; i++) {
-		if (cpu_to_node[i] == NUMA_NO_NODE)
-			continue;
-		if (!nodemask_test(cpu_to_node[i], &processor_nodes_parsed))
-			numa_set_node(i, NUMA_NO_NODE);
-	}
-	numa_init_array();
-	return 0;
-}
-
 static unsigned node_to_pxm(nodeid_t n)
 {
 	unsigned i;
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 4152bbe83b..8ca13e27d1 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -195,7 +195,7 @@ void __init cutoff_node(int i, u64 start, u64 end)
 
 /* Sanity check to catch more bad SRATs (they are amazingly common).
    Make sure the PXMs cover all memory. */
-int __init nodes_cover_memory(void)
+static int __init nodes_cover_memory(void)
 {
 	int i;
 	uint32_t nr_banks = arch_meminfo_get_nr_bank();
@@ -271,6 +271,58 @@ void __init numa_init_array(void)
     }
 }
 
+/* Use the information discovered above to actually set up the nodes. */
+int __init numa_scan_nodes(u64 start, u64 end)
+{
+	int i;
+	nodemask_t all_nodes_parsed;
+
+	/* First clean up the node list */
+	for (i = 0; i < MAX_NUMNODES; i++)
+		cutoff_node(i, start, end);
+
+#ifdef CONFIG_ACPI_NUMA
+	if (acpi_numa <= 0)
+		return -1;
+#endif
+
+	if (!nodes_cover_memory()) {
+		bad_srat();
+		return -1;
+	}
+
+	memnode_shift = compute_hash_shift(node_memblk_range, num_node_memblks,
+				memblk_nodeid);
+
+	if (memnode_shift < 0) {
+		printk(KERN_ERR
+		     "SRAT: No NUMA node hash function found. Contact maintainer\n");
+		bad_srat();
+		return -1;
+	}
+
+	nodes_or(all_nodes_parsed, memory_nodes_parsed, processor_nodes_parsed);
+
+	/* Finally register nodes */
+	for_each_node_mask(i, all_nodes_parsed)
+	{
+		u64 size = nodes[i].end - nodes[i].start;
+		if ( size == 0 )
+			printk(KERN_WARNING "SRAT: Node %u has no memory. "
+			       "BIOS Bug or mis-configured hardware?\n", i);
+
+		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
+	}
+	for (i = 0; i < nr_cpu_ids; i++) {
+		if (cpu_to_node[i] == NUMA_NO_NODE)
+			continue;
+		if (!nodemask_test(cpu_to_node[i], &processor_nodes_parsed))
+			numa_set_node(i, NUMA_NO_NODE);
+	}
+	numa_init_array();
+	return 0;
+}
+
 #ifdef CONFIG_NUMA_EMU
 int numa_fake __initdata = 0;
 
diff --git a/xen/include/asm-x86/acpi.h b/xen/include/asm-x86/acpi.h
index 33b71dfb3b..2140461ff3 100644
--- a/xen/include/asm-x86/acpi.h
+++ b/xen/include/asm-x86/acpi.h
@@ -101,9 +101,6 @@ extern unsigned long acpi_wakeup_address;
 
 #define ARCH_HAS_POWER_INIT	1
 
-extern s8 acpi_numa;
-extern int numa_scan_nodes(u64 start, u64 end);
-
 extern struct acpi_sleep_info acpi_sinfo;
 #define acpi_video_flags bootsym(video_flags)
 struct xenpf_enter_acpi_sleep;
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index 490381bd13..b9b5d1ad88 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -81,8 +81,10 @@ extern void bad_srat(void);
 extern void numa_init_array(void);
 extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
 extern void numa_set_node(int cpu, nodeid_t node);
+extern int numa_scan_nodes(u64 start, u64 end);
 extern bool numa_off;
 extern int numa_fake;
+extern s8 acpi_numa;
 
 extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
 
@@ -94,7 +96,6 @@ static inline void clear_node_cpumask(int cpu)
 extern uint32_t arch_meminfo_get_nr_bank(void);
 extern int arch_meminfo_get_ram_bank_range(int bank,
     unsigned long long *start, unsigned long long *end);
-extern int nodes_cover_memory(void);
 
 #endif /* CONFIG_NUMA */
 
-- 
2.25.1




* [XEN RFC PATCH 35/40] xen: enable numa_scan_nodes for device tree based NUMA
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (33 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to common Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-27 14:19   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 36/40] xen/arm: keep guest still be NUMA unware Wei Chen
                   ` (7 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

Now, we can use the same function for ACPI and device tree based
NUMA to scan memory nodes.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/common/numa.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/numa.c b/xen/common/numa.c
index 8ca13e27d1..d15c2fc311 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -381,7 +381,7 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
         return;
 #endif
 
-#ifdef CONFIG_ACPI_NUMA
+#if defined(CONFIG_ACPI_NUMA) || defined(CONFIG_DEVICE_TREE_NUMA)
     if ( !numa_off && !numa_scan_nodes((u64)start_pfn << PAGE_SHIFT,
          (u64)end_pfn << PAGE_SHIFT) )
         return;
-- 
2.25.1




* [XEN RFC PATCH 36/40] xen/arm: keep guest still be NUMA unware
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (34 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 35/40] xen: enable numa_scan_nodes for device tree based NUMA Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-27 14:28   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 37/40] xen: introduce an arch helper to do NUMA init failed fallback Wei Chen
                   ` (6 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

We do not want to make Xen guests NUMA aware in this patch series.
So in this patch, when Xen creates the guest device tree binary, it
skips the NUMA distance matrix node and the numa-node-id property
in CPU and memory nodes.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/domain_build.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index cf341f349f..e62fa761bd 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -584,6 +584,10 @@ static int __init write_properties(struct domain *d, struct kernel_info *kinfo,
                 continue;
         }
 
+        /* Guest is numa unaware in current stage */
+        if ( dt_property_name_is_equal(prop, "numa-node-id") )
+            continue;
+
         res = fdt_property(kinfo->fdt, prop->name, prop_data, prop_len);
 
         if ( res )
@@ -1454,6 +1458,8 @@ static int __init handle_node(struct domain *d, struct kernel_info *kinfo,
         DT_MATCH_TYPE("memory"),
         /* The memory mapped timer is not supported by Xen. */
         DT_MATCH_COMPATIBLE("arm,armv7-timer-mem"),
+        /* Numa info doesn't need to be exposed to Domain-0 */
+        DT_MATCH_COMPATIBLE("numa-distance-map-v1"),
         { /* sentinel */ },
     };
     static const struct dt_device_match timer_matches[] __initconst =
-- 
2.25.1




* [XEN RFC PATCH 37/40] xen: introduce an arch helper to do NUMA init failed fallback
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (35 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 36/40] xen/arm: keep guest still be NUMA unware Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-27 14:30   ` Julien Grall
  2021-08-11 10:24 ` [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init Wei Chen
                   ` (5 subsequent siblings)
  42 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

When Xen fails to initialize NUMA, some architectures may need to
perform fallback actions. For example, in device tree based NUMA,
Arm needs to reset the distance between any two nodes.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
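Note for reviewers: the fallback amounts to rewriting the whole
distance matrix so that only the diagonal is "local". The following
is a minimal standalone sketch of that reset, not the Xen code. The
sketch_ names, the 4-node limit and the values 10 and 20 are
assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SKETCH_MAX_NODES        4
#define SKETCH_LOCAL_DISTANCE   10
#define SKETCH_REMOTE_DISTANCE  20

/* Hypothetical stand-in for the node distance matrix. */
static uint8_t sketch_distance[SKETCH_MAX_NODES][SKETCH_MAX_NODES];

static void sketch_set_distance(unsigned int from, unsigned int to,
                                uint8_t distance)
{
    assert(from < SKETCH_MAX_NODES && to < SKETCH_MAX_NODES);
    sketch_distance[from][to] = distance;
}

/* Discard whatever was parsed: local on the diagonal, remote elsewhere. */
static void sketch_reset_distances(void)
{
    for (unsigned int i = 0; i < SKETCH_MAX_NODES; i++)
        for (unsigned int j = 0; j < SKETCH_MAX_NODES; j++)
            sketch_set_distance(i, j,
                                (i == j) ? SKETCH_LOCAL_DISTANCE
                                         : SKETCH_REMOTE_DISTANCE);
}
```

Resetting every entry, rather than only the ones that were parsed,
means a partially populated matrix cannot leak into the single-node
fallback configuration.
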
 xen/arch/arm/numa.c        | 13 +++++++++++++
 xen/common/numa.c          |  3 +++
 xen/include/asm-arm/numa.h |  1 +
 xen/include/asm-x86/numa.h |  6 ++++++
 4 files changed, 23 insertions(+)

diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
index 6eebf8e8bc..2a18c97470 100644
--- a/xen/arch/arm/numa.c
+++ b/xen/arch/arm/numa.c
@@ -140,3 +140,16 @@ int __init arch_meminfo_get_ram_bank_range(int bank,
 
 	return 0;
 }
+
+void __init arch_numa_init_failed_fallback(void)
+{
+    int i, j;
+
+    /* Reset all node distance to remote_distance */
+    for ( i = 0; i < MAX_NUMNODES; i++ ) {
+        for ( j = 0; j < MAX_NUMNODES; j++ ) {
+            numa_set_distance(i, j,
+                (i == j) ? NUMA_LOCAL_DISTANCE : NUMA_REMOTE_DISTANCE);
+        }
+    }
+}
diff --git a/xen/common/numa.c b/xen/common/numa.c
index d15c2fc311..88f1594127 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -405,4 +405,7 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
     cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
     setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
                     (u64)end_pfn << PAGE_SHIFT);
+
+    /* architecture specified fallback operations */
+    arch_numa_init_failed_fallback();
 }
diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index dd31324b0b..a3982a94b6 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -28,6 +28,7 @@ extern s8 device_tree_numa;
 extern void numa_init(bool acpi_off);
 extern int numa_device_tree_init(const void *fdt);
 extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance);
+extern void arch_numa_init_failed_fallback(void);
 
 /*
  * Temporary for fake NUMA node, when CPU, memory and distance
diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
index e63869135c..26280b0f3a 100644
--- a/xen/include/asm-x86/numa.h
+++ b/xen/include/asm-x86/numa.h
@@ -22,4 +22,10 @@ extern void init_cpu_to_node(void);
 void srat_parse_regions(u64 addr);
 unsigned int arch_get_dma_bitsize(void);
 
+/* Dummy function for numa init failed in numa_initmem_init */
+static inline void arch_numa_init_failed_fallback(void)
+{
+    return;
+}
+
 #endif
-- 
2.25.1




* [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (36 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 37/40] xen: introduce an arch helper to do NUMA init failed fallback Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-27 14:32   ` Julien Grall
  2021-08-31  1:50   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to support NUMA switch in command line Wei Chen
                   ` (4 subsequent siblings)
  42 siblings, 2 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

Everything is ready, so we can remove the fake NUMA node and
depend on the device tree to create the NUMA system.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
 xen/arch/arm/numa.c        | 45 ++++++++++++++++++++++----------------
 xen/include/asm-arm/numa.h |  7 ------
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
index 2a18c97470..3b04220e60 100644
--- a/xen/arch/arm/numa.c
+++ b/xen/arch/arm/numa.c
@@ -18,6 +18,7 @@
  *
  */
 #include <xen/init.h>
+#include <xen/device_tree.h>
 #include <xen/nodemask.h>
 #include <xen/numa.h>
 #include <xen/pfn.h>
@@ -83,28 +84,34 @@ void __init numa_init(bool acpi_off)
     paddr_t ram_size = 0;
     paddr_t ram_end = 0;
 
-    printk(XENLOG_WARNING
-        "NUMA has not been supported yet, NUMA off!\n");
-    /* Arm NUMA has not been implemented until this patch */
-    numa_off = true;
+    /* NUMA has been turned off through Xen parameters */
+    if ( numa_off )
+        goto mem_init;
 
-    /*
-     * Set all cpu_to_node mapping to 0, this will make cpu_to_node
-     * function return 0 as previous fake cpu_to_node API.
-     */
-    for ( idx = 0; idx < NR_CPUS; idx++ )
-        cpu_to_node[idx] = 0;
-
-    /*
-     * Make node_to_cpumask, node_spanned_pages and node_start_pfn
-     * return as previous fake APIs.
-     */
-    for ( idx = 0; idx < MAX_NUMNODES; idx++ ) {
-        node_to_cpumask[idx] = cpu_online_map;
-        node_spanned_pages(idx) = (max_page - mfn_x(first_valid_mfn));
-        node_start_pfn(idx) = (mfn_x(first_valid_mfn));
+    /* Initialize NUMA from device tree when system is not ACPI booted */
+    if ( acpi_off )
+    {
+#ifdef CONFIG_DEVICE_TREE_NUMA
+        int ret = numa_device_tree_init(device_tree_flattened);
+        if ( !ret )
+            goto mem_init;
+        printk(XENLOG_WARNING
+               "Init NUMA from device tree failed, ret=%d\n", ret);
+#else
+        printk(XENLOG_WARNING
+               "CONFIG_DEVICE_TREE_NUMA is not set, NUMA off!\n");
+#endif
+        numa_off = true;
+    }
+    else
+    {
+        /* We don't support NUMA for ACPI boot currently */
+        printk(XENLOG_WARNING
+               "ACPI NUMA has not been supported yet, NUMA off!\n");
+        numa_off = true;
     }
 
+mem_init:
     /*
      * Find the minimal and maximum address of RAM, NUMA will
      * build a memory to node mapping table for the whole range.
diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
index a3982a94b6..425eb9aede 100644
--- a/xen/include/asm-arm/numa.h
+++ b/xen/include/asm-arm/numa.h
@@ -30,13 +30,6 @@ extern int numa_device_tree_init(const void *fdt);
 extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance);
 extern void arch_numa_init_failed_fallback(void);
 
-/*
- * Temporary for fake NUMA node, when CPU, memory and distance
- * matrix will be read from DTB or ACPI SRAT. The following
- * symbols will be removed.
- */
-extern mfn_t first_valid_mfn;
-
 #else
 
 /* Fake one node for now. See also node_online_map. */
-- 
2.25.1




* [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to support NUMA switch in command line
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (37 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-27 14:37   ` Julien Grall
  2021-08-31  1:53   ` Stefano Stabellini
  2021-08-11 10:24 ` [XEN RFC PATCH 40/40] xen/x86: move dump_numa info hotkey to common Wei Chen
                   ` (3 subsequent siblings)
  42 siblings, 2 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

Xen x86 provides a command line parameter "numa" as a switch for
users to turn NUMA on or off. As device tree based NUMA has been
enabled for Arm, this parameter can be reused by Arm. So in this
patch, we move this parameter to common.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
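Note for reviewers: the parser being moved matches option prefixes
with strncmp. The following standalone sketch models the "off", "on"
and "fake=" branches only. struct numa_opts, sketch_parse_numa and
the limit of 64 nodes are hypothetical, and the "noacpi" branch is
omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_NUMNODES 64   /* assumed limit, for illustration only */

/* Hypothetical container for the state the real parser keeps in globals. */
struct numa_opts {
    bool off;
    unsigned long fake;          /* 0 means no emulation requested */
};

/* Return 0 on success, -EINVAL for an unrecognised option string. */
static int sketch_parse_numa(const char *opt, struct numa_opts *out)
{
    assert(opt != NULL && out != NULL);

    if (!strncmp(opt, "off", 3)) {
        out->off = true;
    } else if (!strncmp(opt, "on", 2)) {
        out->off = false;
    } else if (!strncmp(opt, "fake=", 5)) {
        out->off = false;
        /* Base 0 accepts decimal, octal (0...) and hex (0x...) input. */
        out->fake = strtoul(opt + 5, NULL, 0);
        if (out->fake >= SKETCH_MAX_NUMNODES)
            out->fake = SKETCH_MAX_NUMNODES;
    } else {
        return -EINVAL;
    }

    return 0;
}
```

Because only a prefix is compared, a string such as "offline" is
accepted as "off". Moving the function to common code keeps that
behaviour unchanged.
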
 xen/arch/x86/numa.c    | 34 ----------------------------------
 xen/common/numa.c      | 35 ++++++++++++++++++++++++++++++++++-
 xen/include/xen/numa.h |  1 -
 3 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 8b43be4aa7..380d8ed6fd 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -11,7 +11,6 @@
 #include <xen/nodemask.h>
 #include <xen/numa.h>
 #include <xen/keyhandler.h>
-#include <xen/param.h>
 #include <xen/time.h>
 #include <xen/smp.h>
 #include <xen/pfn.h>
@@ -19,9 +18,6 @@
 #include <xen/sched.h>
 #include <xen/softirq.h>
 
-static int numa_setup(const char *s);
-custom_param("numa", numa_setup);
-
 #ifndef Dprintk
 #define Dprintk(x...)
 #endif
@@ -50,35 +46,6 @@ void numa_set_node(int cpu, nodeid_t node)
     cpu_to_node[cpu] = node;
 }
 
-/* [numa=off] */
-static __init int numa_setup(const char *opt)
-{
-    if ( !strncmp(opt,"off",3) )
-        numa_off = true;
-    else if ( !strncmp(opt,"on",2) )
-        numa_off = false;
-#ifdef CONFIG_NUMA_EMU
-    else if ( !strncmp(opt, "fake=", 5) )
-    {
-        numa_off = false;
-        numa_fake = simple_strtoul(opt+5,NULL,0);
-        if ( numa_fake >= MAX_NUMNODES )
-            numa_fake = MAX_NUMNODES;
-    }
-#endif
-#ifdef CONFIG_ACPI_NUMA
-    else if ( !strncmp(opt,"noacpi",6) )
-    {
-        numa_off = false;
-        acpi_numa = -1;
-    }
-#endif
-    else
-        return -EINVAL;
-
-    return 0;
-} 
-
 /*
  * Setup early cpu_to_node.
  *
@@ -287,4 +254,3 @@ static __init int register_numa_trigger(void)
     return 0;
 }
 __initcall(register_numa_trigger);
-
diff --git a/xen/common/numa.c b/xen/common/numa.c
index 88f1594127..c98eb8d571 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -14,8 +14,12 @@
 #include <xen/smp.h>
 #include <xen/pfn.h>
 #include <xen/sched.h>
+#include <xen/param.h>
 #include <asm/acpi.h>
 
+static int numa_setup(const char *s);
+custom_param("numa", numa_setup);
+
 struct node_data node_data[MAX_NUMNODES];
 
 /* Mapping from pdx to node id */
@@ -324,7 +328,7 @@ int __init numa_scan_nodes(u64 start, u64 end)
 }
 
 #ifdef CONFIG_NUMA_EMU
-int numa_fake __initdata = 0;
+static int numa_fake __initdata = 0;
 
 /* Numa emulation */
 static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
@@ -409,3 +413,32 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
     /* architecture specified fallback operations */
     arch_numa_init_failed_fallback();
 }
+
+/* [numa=off] */
+static __init int numa_setup(const char *opt)
+{
+    if ( !strncmp(opt,"off",3) )
+        numa_off = true;
+    else if ( !strncmp(opt,"on",2) )
+        numa_off = false;
+#ifdef CONFIG_NUMA_EMU
+    else if ( !strncmp(opt, "fake=", 5) )
+    {
+        numa_off = false;
+        numa_fake = simple_strtoul(opt+5,NULL,0);
+        if ( numa_fake >= MAX_NUMNODES )
+            numa_fake = MAX_NUMNODES;
+    }
+#endif
+#ifdef CONFIG_ACPI_NUMA
+    else if ( !strncmp(opt,"noacpi",6) )
+    {
+        numa_off = false;
+        acpi_numa = -1;
+    }
+#endif
+    else
+        return -EINVAL;
+
+    return 0;
+}
diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
index b9b5d1ad88..c647fef736 100644
--- a/xen/include/xen/numa.h
+++ b/xen/include/xen/numa.h
@@ -83,7 +83,6 @@ extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
 extern void numa_set_node(int cpu, nodeid_t node);
 extern int numa_scan_nodes(u64 start, u64 end);
 extern bool numa_off;
-extern int numa_fake;
 extern s8 acpi_numa;
 
 extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
-- 
2.25.1




* [XEN RFC PATCH 40/40] xen/x86: move dump_numa info hotkey to common
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (38 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to support NUMA switch in command line Wei Chen
@ 2021-08-11 10:24 ` Wei Chen
  2021-08-11 10:41 ` [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Jan Beulich
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-11 10:24 UTC (permalink / raw)
  To: wei.chen, xen-devel, sstabellini, julien, jbeulich; +Cc: Bertrand.Marquis

As device tree based NUMA has been enabled for Arm, x86 is no
longer the only architecture that needs to dump NUMA info through
the hotkey; Arm can use it as well.

In this patch, we move this hotkey to common. Arm can use it
to dump its NUMA information:

(XEN)  key 'u' (ascii '75') => dump NUMA info
(XEN) 'u' pressed -> dumping numa info (now = 8805901249990)
(XEN) NODE0 start->524288 size->520192 free->257673
(XEN) NODE1 start->8912896 size->524288 free->499676
(XEN) CPU0...1 -> NODE0
(XEN) CPU2...3 -> NODE1
(XEN) Memory location of each domain:
(XEN) Domain 0 (total: 262144):
(XEN)     Node 0: 262144
(XEN)     Node 1: 0

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
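Note for reviewers: the "CPU0...1 -> NODE0" lines in the sample
output come from collapsing runs of CPUs that map to the same node.
The following is a simplified standalone sketch of that grouping. It
assumes CPUs 0..nr_cpus-1 are all online, whereas dump_numa walks the
online mask and also splits a run on a gap in CPU numbering. All
sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append one run to buf; return the new length, or len once buf is full. */
static size_t sketch_emit_run(char *buf, size_t len, size_t used,
                              unsigned int first, unsigned int count,
                              int node)
{
    int w;

    if (used >= len)
        return len;

    if (count > 1)
        w = snprintf(buf + used, len - used, "CPU%u...%u -> NODE%d\n",
                     first, first + count - 1, node);
    else
        w = snprintf(buf + used, len - used, "CPU%u -> NODE%d\n",
                     first, node);

    if (w < 0 || (size_t)w >= len - used)
        return len;              /* error or truncated: stop emitting */

    return used + (size_t)w;
}

/* Format the CPU to node mapping, one line per run of equal node ids. */
static void sketch_format_cpu_ranges(const int *cpu_to_node,
                                     unsigned int nr_cpus,
                                     char *buf, size_t len)
{
    unsigned int first = 0, count = 0;
    size_t used = 0;

    assert(cpu_to_node != NULL && buf != NULL && len > 0);
    buf[0] = '\0';

    if (nr_cpus == 0)
        return;

    for (unsigned int i = 0; i < nr_cpus; i++) {
        if (count > 0 && cpu_to_node[i] != cpu_to_node[first]) {
            used = sketch_emit_run(buf, len, used, first, count,
                                   cpu_to_node[first]);
            first = i;
            count = 1;
        } else {
            ++count;
        }
    }

    /* Flush the final run. */
    sketch_emit_run(buf, len, used, first, count, cpu_to_node[first]);
}
```

With the mapping {0, 0, 1, 1} the sketch produces the same two CPU
lines as the sample output in the commit message.
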
 xen/arch/x86/numa.c | 142 -------------------------------------------
 xen/common/numa.c   | 144 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 144 insertions(+), 142 deletions(-)

diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 380d8ed6fd..322801cb17 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -112,145 +112,3 @@ int __init arch_meminfo_get_ram_bank_range(int bank,
 
 	return 0;
 }
-
-static void dump_numa(unsigned char key)
-{
-    s_time_t now = NOW();
-    unsigned int i, j, n;
-    struct domain *d;
-    struct page_info *page;
-    unsigned int page_num_node[MAX_NUMNODES];
-    const struct vnuma_info *vnuma;
-
-    printk("'%c' pressed -> dumping numa info (now = %"PRI_stime")\n", key,
-           now);
-
-    for_each_online_node ( i )
-    {
-        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
-
-        printk("NODE%u start->%lu size->%lu free->%lu\n",
-               i, node_start_pfn(i), node_spanned_pages(i),
-               avail_node_heap_pages(i));
-        /* sanity check phys_to_nid() */
-        if ( phys_to_nid(pa) != i )
-            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
-                   pa, phys_to_nid(pa), i);
-    }
-
-    j = cpumask_first(&cpu_online_map);
-    n = 0;
-    for_each_online_cpu ( i )
-    {
-        if ( i != j + n || cpu_to_node[j] != cpu_to_node[i] )
-        {
-            if ( n > 1 )
-                printk("CPU%u...%u -> NODE%d\n", j, j + n - 1, cpu_to_node[j]);
-            else
-                printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
-            j = i;
-            n = 1;
-        }
-        else
-            ++n;
-    }
-    if ( n > 1 )
-        printk("CPU%u...%u -> NODE%d\n", j, j + n - 1, cpu_to_node[j]);
-    else
-        printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
-
-    rcu_read_lock(&domlist_read_lock);
-
-    printk("Memory location of each domain:\n");
-    for_each_domain ( d )
-    {
-        process_pending_softirqs();
-
-        printk("Domain %u (total: %u):\n", d->domain_id, domain_tot_pages(d));
-
-        for_each_online_node ( i )
-            page_num_node[i] = 0;
-
-        spin_lock(&d->page_alloc_lock);
-        page_list_for_each(page, &d->page_list)
-        {
-            i = phys_to_nid(page_to_maddr(page));
-            page_num_node[i]++;
-        }
-        spin_unlock(&d->page_alloc_lock);
-
-        for_each_online_node ( i )
-            printk("    Node %u: %u\n", i, page_num_node[i]);
-
-        if ( !read_trylock(&d->vnuma_rwlock) )
-            continue;
-
-        if ( !d->vnuma )
-        {
-            read_unlock(&d->vnuma_rwlock);
-            continue;
-        }
-
-        vnuma = d->vnuma;
-        printk("     %u vnodes, %u vcpus, guest physical layout:\n",
-               vnuma->nr_vnodes, d->max_vcpus);
-        for ( i = 0; i < vnuma->nr_vnodes; i++ )
-        {
-            unsigned int start_cpu = ~0U;
-
-            if ( vnuma->vnode_to_pnode[i] == NUMA_NO_NODE )
-                printk("       %3u: pnode ???,", i);
-            else
-                printk("       %3u: pnode %3u,", i, vnuma->vnode_to_pnode[i]);
-
-            printk(" vcpus ");
-
-            for ( j = 0; j < d->max_vcpus; j++ )
-            {
-                if ( !(j & 0x3f) )
-                    process_pending_softirqs();
-
-                if ( vnuma->vcpu_to_vnode[j] == i )
-                {
-                    if ( start_cpu == ~0U )
-                    {
-                        printk("%d", j);
-                        start_cpu = j;
-                    }
-                }
-                else if ( start_cpu != ~0U )
-                {
-                    if ( j - 1 != start_cpu )
-                        printk("-%d ", j - 1);
-                    else
-                        printk(" ");
-                    start_cpu = ~0U;
-                }
-            }
-
-            if ( start_cpu != ~0U  && start_cpu != j - 1 )
-                printk("-%d", j - 1);
-
-            printk("\n");
-
-            for ( j = 0; j < vnuma->nr_vmemranges; j++ )
-            {
-                if ( vnuma->vmemrange[j].nid == i )
-                    printk("           %016"PRIx64" - %016"PRIx64"\n",
-                           vnuma->vmemrange[j].start,
-                           vnuma->vmemrange[j].end);
-            }
-        }
-
-        read_unlock(&d->vnuma_rwlock);
-    }
-
-    rcu_read_unlock(&domlist_read_lock);
-}
-
-static __init int register_numa_trigger(void)
-{
-    register_keyhandler('u', dump_numa, "dump NUMA info", 1);
-    return 0;
-}
-__initcall(register_numa_trigger);
diff --git a/xen/common/numa.c b/xen/common/numa.c
index c98eb8d571..eb1950c51a 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -14,7 +14,9 @@
 #include <xen/smp.h>
 #include <xen/pfn.h>
 #include <xen/sched.h>
+#include <xen/keyhandler.h>
 #include <xen/param.h>
+#include <xen/softirq.h>
 #include <asm/acpi.h>
 
 static int numa_setup(const char *s);
@@ -442,3 +444,145 @@ static __init int numa_setup(const char *opt)
 
     return 0;
 }
+
+static void dump_numa(unsigned char key)
+{
+    s_time_t now = NOW();
+    unsigned int i, j, n;
+    struct domain *d;
+    struct page_info *page;
+    unsigned int page_num_node[MAX_NUMNODES];
+    const struct vnuma_info *vnuma;
+
+    printk("'%c' pressed -> dumping numa info (now = %"PRI_stime")\n", key,
+           now);
+
+    for_each_online_node ( i )
+    {
+        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
+
+        printk("NODE%u start->%lu size->%lu free->%lu\n",
+               i, node_start_pfn(i), node_spanned_pages(i),
+               avail_node_heap_pages(i));
+        /* sanity check phys_to_nid() */
+        if ( phys_to_nid(pa) != i )
+            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
+                   pa, phys_to_nid(pa), i);
+    }
+
+    j = cpumask_first(&cpu_online_map);
+    n = 0;
+    for_each_online_cpu ( i )
+    {
+        if ( i != j + n || cpu_to_node[j] != cpu_to_node[i] )
+        {
+            if ( n > 1 )
+                printk("CPU%u...%u -> NODE%d\n", j, j + n - 1, cpu_to_node[j]);
+            else
+                printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
+            j = i;
+            n = 1;
+        }
+        else
+            ++n;
+    }
+    if ( n > 1 )
+        printk("CPU%u...%u -> NODE%d\n", j, j + n - 1, cpu_to_node[j]);
+    else
+        printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
+
+    rcu_read_lock(&domlist_read_lock);
+
+    printk("Memory location of each domain:\n");
+    for_each_domain ( d )
+    {
+        process_pending_softirqs();
+
+        printk("Domain %u (total: %u):\n", d->domain_id, domain_tot_pages(d));
+
+        for_each_online_node ( i )
+            page_num_node[i] = 0;
+
+        spin_lock(&d->page_alloc_lock);
+        page_list_for_each(page, &d->page_list)
+        {
+            i = phys_to_nid(page_to_maddr(page));
+            page_num_node[i]++;
+        }
+        spin_unlock(&d->page_alloc_lock);
+
+        for_each_online_node ( i )
+            printk("    Node %u: %u\n", i, page_num_node[i]);
+
+        if ( !read_trylock(&d->vnuma_rwlock) )
+            continue;
+
+        if ( !d->vnuma )
+        {
+            read_unlock(&d->vnuma_rwlock);
+            continue;
+        }
+
+        vnuma = d->vnuma;
+        printk("     %u vnodes, %u vcpus, guest physical layout:\n",
+               vnuma->nr_vnodes, d->max_vcpus);
+        for ( i = 0; i < vnuma->nr_vnodes; i++ )
+        {
+            unsigned int start_cpu = ~0U;
+
+            if ( vnuma->vnode_to_pnode[i] == NUMA_NO_NODE )
+                printk("       %3u: pnode ???,", i);
+            else
+                printk("       %3u: pnode %3u,", i, vnuma->vnode_to_pnode[i]);
+
+            printk(" vcpus ");
+
+            for ( j = 0; j < d->max_vcpus; j++ )
+            {
+                if ( !(j & 0x3f) )
+                    process_pending_softirqs();
+
+                if ( vnuma->vcpu_to_vnode[j] == i )
+                {
+                    if ( start_cpu == ~0U )
+                    {
+                        printk("%d", j);
+                        start_cpu = j;
+                    }
+                }
+                else if ( start_cpu != ~0U )
+                {
+                    if ( j - 1 != start_cpu )
+                        printk("-%d ", j - 1);
+                    else
+                        printk(" ");
+                    start_cpu = ~0U;
+                }
+            }
+
+            if ( start_cpu != ~0U  && start_cpu != j - 1 )
+                printk("-%d", j - 1);
+
+            printk("\n");
+
+            for ( j = 0; j < vnuma->nr_vmemranges; j++ )
+            {
+                if ( vnuma->vmemrange[j].nid == i )
+                    printk("           %016"PRIx64" - %016"PRIx64"\n",
+                           vnuma->vmemrange[j].start,
+                           vnuma->vmemrange[j].end);
+            }
+        }
+
+        read_unlock(&d->vnuma_rwlock);
+    }
+
+    rcu_read_unlock(&domlist_read_lock);
+}
+
+static __init int register_numa_trigger(void)
+{
+    register_keyhandler('u', dump_numa, "dump NUMA info", 1);
+    return 0;
+}
+__initcall(register_numa_trigger);
-- 
2.25.1




* Re: [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (39 preceding siblings ...)
  2021-08-11 10:24 ` [XEN RFC PATCH 40/40] xen/x86: move dump_numa info hotkey to common Wei Chen
@ 2021-08-11 10:41 ` Jan Beulich
  2021-08-13  2:33   ` Wei Chen
  2021-08-19 13:42 ` Julien Grall
  2021-08-26  0:09 ` Stefano Stabellini
  42 siblings, 1 reply; 196+ messages in thread
From: Jan Beulich @ 2021-08-11 10:41 UTC (permalink / raw)
  To: Wei Chen; +Cc: Bertrand.Marquis, sstabellini, julien, xen-devel

On 11.08.2021 12:23, Wei Chen wrote:
> Hongda Deng (2):
>   xen/arm: return default DMA bit width when platform is not set
>   xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0
> 
> Wei Chen (38):
>   tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf
>   xen/arm: Print a 64-bit number in hex from early uart
>   xen/x86: Initialize memnodemapsize while faking NUMA node
>   xen: decouple NUMA from ACPI in Kconfig
>   xen/arm: use !CONFIG_NUMA to keep fake NUMA API
>   xen/x86: Move NUMA memory node map functions to common
>   xen/x86: Move numa_add_cpu_node to common
>   xen/x86: Move NR_NODE_MEMBLKS macro to common
>   xen/x86: Move NUMA nodes and memory block ranges to common
>   xen/x86: Move numa_initmem_init to common
>   xen/arm: introduce numa_set_node for Arm
>   xen/arm: set NUMA nodes max number to 64 by default
>   xen/x86: move NUMA API from x86 header to common header
>   xen/arm: Create a fake NUMA node to use common code
>   xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
>   xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
>   xen: fdt: Introduce a helper to check fdt node type
>   xen/arm: implement node distance helpers for Arm64
>   xen/arm: introduce device_tree_numa as a switch for device tree NUMA
>   xen/arm: introduce a helper to parse device tree processor node
>   xen/arm: introduce a helper to parse device tree memory node
>   xen/arm: introduce a helper to parse device tree NUMA distance map
>   xen/arm: unified entry to parse all NUMA data from device tree
>   xen/arm: Add boot and secondary CPU to NUMA system
>   xen/arm: build CPU NUMA node map while creating cpu_logical_map
>   xen/x86: decouple nodes_cover_memory with E820 map
>   xen/arm: implement Arm arch helpers Arm to get memory map info
>   xen: move NUMA memory and CPU parsed nodemasks to common
>   xen/x86: move nodes_cover_memory to common
>   xen/x86: make acpi_scan_nodes to be neutral
>   xen: export bad_srat and srat_disabled to extern
>   xen: move numa_scan_nodes from x86 to common
>   xen: enable numa_scan_nodes for device tree based NUMA
>   xen/arm: keep guest still be NUMA unware
>   xen: introduce an arch helper to do NUMA init failed fallback
>   xen/arm: enable device tree based NUMA in system init
>   xen/x86: move numa_setup to common to support NUMA switch in command
>     line
>   xen/x86: move dump_numa info hotkey to common

May I please ask that you follow patch submission guidelines, in that
you send patches To: the list and Cc: relevant people. Furthermore I
doubt that I need to be on Cc: for all 40 of the patches.

Thanks and regards,
Jan




* Re: [XEN RFC PATCH 01/40] tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf
  2021-08-11 10:23 ` [XEN RFC PATCH 01/40] tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf Wei Chen
@ 2021-08-11 10:49   ` Jan Beulich
  2021-08-13  6:28     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Jan Beulich @ 2021-08-11 10:49 UTC (permalink / raw)
  To: Wei Chen; +Cc: Bertrand.Marquis, xen-devel, sstabellini, julien

On 11.08.2021 12:23, Wei Chen wrote:
> | libxlu_pci.c: In function 'xlu_pci_parse_bdf':
> | libxlu_pci.c:32:18: error: 'func' may be used uninitialized in this function [-Werror=maybe-uninitialized]
> |    32 |     pcidev->func = func;
> |       |     ~~~~~~~~~~~~~^~~~~~

I'm afraid I can't spot such an assignment in the file (nor the two
similar ones further down). All I can see is 

    pci->domain = domain;
    pci->bus = bus;
    pci->dev = dev;
    pci->func = func;

> | libxlu_pci.c:51:29: note: 'func' was declared here
> |    51 |     unsigned dom, bus, dev, func, vslot = 0;
> |       |                             ^~~~
> | libxlu_pci.c:31:17: error: 'dev' may be used uninitialized in this function [-Werror=maybe-uninitialized]
> |    31 |     pcidev->dev = dev;
> |       |     ~~~~~~~~~~~~^~~~~
> | libxlu_pci.c:51:24: note: 'dev' was declared here
> |    51 |     unsigned dom, bus, dev, func, vslot = 0;
> |       |                        ^~~
> | libxlu_pci.c:30:17: error: 'bus' may be used uninitialized in this function [-Werror=maybe-uninitialized]
> |    30 |     pcidev->bus = bus;
> |       |     ~~~~~~~~~~~~^~~~~
> | libxlu_pci.c:51:19: note: 'bus' was declared here
> |    51 |     unsigned dom, bus, dev, func, vslot = 0;
> |       |                   ^~~
> | libxlu_pci.c:78:26: error: 'dom' may be used uninitialized in this function [-Werror=maybe-uninitialized]
> |    78 |                 if ( dom & ~0xff )
> |       |                      ~~~~^~~~~~~

I'm afraid I also can't spot a variable named "dom", nor a sufficiently
similar if(). May I ask what code base these were observed with? Is the
change needed at all anymore?

> --- a/tools/libs/util/libxlu_pci.c
> +++ b/tools/libs/util/libxlu_pci.c
> @@ -15,7 +15,7 @@ static int parse_bdf(libxl_device_pci *pci, const char *str, const char **endp)
>  {
>      const char *ptr = str;
>      unsigned int colons = 0;
> -    unsigned int domain, bus, dev, func;
> +    unsigned int domain = 0, bus = 0, dev = 0, func = 0;
>      int n;
>  
>      /* Count occurrences of ':' to detrmine presence/absence of the 'domain' */
> @@ -28,7 +28,6 @@ static int parse_bdf(libxl_device_pci *pci, const char *str, const char **endp)
>      ptr = str;
>      switch (colons) {
>      case 1:
> -        domain = 0;
>          if (sscanf(ptr, "%x:%x.%n", &bus, &dev, &n) != 2)
>              return ERROR_INVAL;
>          break;
> 

Also - which compiler did you encounter this with?

Finally please don't forget to Cc maintainers.

Jan




* Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set
  2021-08-11 10:23 ` [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set Wei Chen
@ 2021-08-11 10:54   ` Jan Beulich
  2021-08-13  6:54     ` Wei Chen
  2021-08-19 13:28   ` Julien Grall
  1 sibling, 1 reply; 196+ messages in thread
From: Jan Beulich @ 2021-08-11 10:54 UTC (permalink / raw)
  To: Wei Chen; +Cc: Bertrand.Marquis, xen-devel, sstabellini, julien

On 11.08.2021 12:23, Wei Chen wrote:
> --- a/xen/arch/arm/platform.c
> +++ b/xen/arch/arm/platform.c
> @@ -27,6 +27,7 @@ extern const struct platform_desc _splatform[], _eplatform[];
>  /* Pointer to the current platform description */
>  static const struct platform_desc *platform;
>  
> +extern unsigned int dma_bitsize;

This is a no-go: Declarations need to live in a header which the producer
and all consumers include. Else ...

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -227,7 +227,7 @@ static bool __read_mostly scrub_debug;
>   * Bit width of the DMA heap -- used to override NUMA-node-first.
>   * allocation strategy, which can otherwise exhaust low memory.
>   */
> -static unsigned int dma_bitsize;
> +unsigned int dma_bitsize;

... a change here (of e.g. the type) will go unnoticed by the compiler,
and the consumer of the variable may no longer work correctly.

Also I'm afraid the description does not make clear why this variable
is what you want to use. Connected to this is the question why you need
to consume it on Arm in the first place, when x86 never had the need.

Jan




* Re: [XEN RFC PATCH 03/40] xen/x86: Initialize memnodemapsize while faking NUMA node
  2021-08-11 10:23 ` [XEN RFC PATCH 03/40] xen/x86: Initialize memnodemapsize while faking NUMA node Wei Chen
@ 2021-08-12 15:32   ` Jan Beulich
  2021-08-13  7:26     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Jan Beulich @ 2021-08-12 15:32 UTC (permalink / raw)
  To: Wei Chen; +Cc: Bertrand.Marquis, xen-devel, sstabellini, julien

On 11.08.2021 12:23, Wei Chen wrote:
> When system turns NUMA off or system lacks of NUMA support,
> Xen will fake a NUMA node to make system works as a single
> node NUMA system.
> 
> In this case the memory node map doesn't need to be allocated
> from boot pages. But we should set the memnodemapsize to the
> array size of _memnodemap. Xen hadn't done it, and Xen should
> assert in phys_to_nid. But because x86 was using an empty
> macro "VIRTUAL_BUG_ON" to replace ASSERT, this bug will not
> be triggered.

How about we promote VIRTUAL_BUG_ON() to expand to at least ASSERT()?

> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -270,6 +270,8 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
>      /* setup dummy node covering all memory */
>      memnode_shift = BITS_PER_LONG - 1;
>      memnodemap = _memnodemap;
> +    memnodemapsize = ARRAY_SIZE(_memnodemap);

But this doesn't reflect reality then, does it? We'd rather want to
set the size to 1, I would think.
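
A minimal sketch of why a single entry would suffice, under stated
assumptions: the names prefixed sketch_ are hypothetical stand-ins rather
than the Xen symbols, the pdx conversion is simplified to a plain page
shift, and a shift of 63 stands in for BITS_PER_LONG - 1 on a 64-bit
build. With that shift every address selects index 0, so a bounds check
against a size of 1 holds, whereas a size left at 0 would trip it:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the variables discussed above. */
#define SKETCH_PAGE_SHIFT 12

static uint8_t sketch_memnodemap_storage[64];   /* plays the role of _memnodemap */
static const uint8_t *sketch_memnodemap = sketch_memnodemap_storage;
static const unsigned int sketch_memnode_shift = 63;  /* BITS_PER_LONG - 1, assumed 64-bit */
static unsigned long sketch_memnodemapsize = 1;       /* the size suggested in the review */

/* Index into the node map for a physical address (simplified: no pdx compression). */
static unsigned long sketch_memnode_index(uint64_t paddr)
{
    return (unsigned long)((paddr >> SKETCH_PAGE_SHIFT) >> sketch_memnode_shift);
}

/* Lookup with the bounds check that an ASSERT()-backed VIRTUAL_BUG_ON() would perform. */
static uint8_t sketch_phys_to_nid(uint64_t paddr)
{
    unsigned long idx = sketch_memnode_index(paddr);

    assert(idx < sketch_memnodemapsize);  /* would fire if the size were left at 0 */
    return sketch_memnodemap[idx];
}
```

Only element 0 of the backing array is ever read in this setup, which is
the sense in which a size of ARRAY_SIZE(_memnodemap) would overstate what
is actually in use.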

Jan




* Re: [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig
  2021-08-11 10:23 ` [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig Wei Chen
@ 2021-08-12 15:36   ` Jan Beulich
  2021-08-13  7:27     ` Wei Chen
  2021-08-12 16:54   ` Julien Grall
  1 sibling, 1 reply; 196+ messages in thread
From: Jan Beulich @ 2021-08-12 15:36 UTC (permalink / raw)
  To: Wei Chen; +Cc: Bertrand.Marquis, xen-devel, sstabellini, julien

On 11.08.2021 12:23, Wei Chen wrote:
> --- a/xen/arch/x86/Kconfig
> +++ b/xen/arch/x86/Kconfig
> @@ -24,7 +24,7 @@ config X86
>  	select HAS_UBSAN
>  	select HAS_VPCI if HVM
>  	select NEEDS_LIBELF
> -	select NUMA
> +	select ACPI_NUMA

We try to keep this alphabetically sorted, so please move up the
replacement line. Then
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan




* Re: [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig
  2021-08-11 10:23 ` [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig Wei Chen
  2021-08-12 15:36   ` Jan Beulich
@ 2021-08-12 16:54   ` Julien Grall
  2021-08-13  7:28     ` Wei Chen
  1 sibling, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-12 16:54 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> In current Xen code only implments x86 ACPI-based NUMA support.

s/implments/implements/

> So in Xen Kconfig system, NUMA equals to ACPI_NUMA. x86 selects
> NUMA by default, and CONFIG_ACPI_NUMA is hardcode in config.h.
> 
> In this patch series, we introduced device tree based NUMA for
> Arm.

The concept of a patch series only applies to the ML. Once checked in,
this is only a series of commits. So I would write:

"In a follow-up patch, we will introduce support for NUMA using the 
Device-Tree".

>  That means we will have two NUMA implemetations, so in this

s/implemetations/implementations/

> patch we decouple NUMA from ACPI based NUMA in Kconfig. Make NUMA
> as a common feature, that device tree based NUMA also can select it.

Cheers,

-- 
Julien Grall



* RE: [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64
  2021-08-11 10:41 ` [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Jan Beulich
@ 2021-08-13  2:33   ` Wei Chen
  2021-08-13  6:53     ` Jan Beulich
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-13  2:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Bertrand Marquis, sstabellini, julien, xen-devel

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: August 11, 2021 18:42
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; sstabellini@kernel.org;
> julien@xen.org; xen-devel@lists.xenproject.org
> Subject: Re: [XEN RFC PATCH 00/40] Add device tree based NUMA support to
> Arm64
> 
> On 11.08.2021 12:23, Wei Chen wrote:
> > Hongda Deng (2):
> >   xen/arm: return default DMA bit width when platform is not set
> >   xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0
> >
> > Wei Chen (38):
> >   tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf
> >   xen/arm: Print a 64-bit number in hex from early uart
> >   xen/x86: Initialize memnodemapsize while faking NUMA node
> >   xen: decouple NUMA from ACPI in Kconfig
> >   xen/arm: use !CONFIG_NUMA to keep fake NUMA API
> >   xen/x86: Move NUMA memory node map functions to common
> >   xen/x86: Move numa_add_cpu_node to common
> >   xen/x86: Move NR_NODE_MEMBLKS macro to common
> >   xen/x86: Move NUMA nodes and memory block ranges to common
> >   xen/x86: Move numa_initmem_init to common
> >   xen/arm: introduce numa_set_node for Arm
> >   xen/arm: set NUMA nodes max number to 64 by default
> >   xen/x86: move NUMA API from x86 header to common header
> >   xen/arm: Create a fake NUMA node to use common code
> >   xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
> >   xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
> >   xen: fdt: Introduce a helper to check fdt node type
> >   xen/arm: implement node distance helpers for Arm64
> >   xen/arm: introduce device_tree_numa as a switch for device tree NUMA
> >   xen/arm: introduce a helper to parse device tree processor node
> >   xen/arm: introduce a helper to parse device tree memory node
> >   xen/arm: introduce a helper to parse device tree NUMA distance map
> >   xen/arm: unified entry to parse all NUMA data from device tree
> >   xen/arm: Add boot and secondary CPU to NUMA system
> >   xen/arm: build CPU NUMA node map while creating cpu_logical_map
> >   xen/x86: decouple nodes_cover_memory with E820 map
> >   xen/arm: implement Arm arch helpers Arm to get memory map info
> >   xen: move NUMA memory and CPU parsed nodemasks to common
> >   xen/x86: move nodes_cover_memory to common
> >   xen/x86: make acpi_scan_nodes to be neutral
> >   xen: export bad_srat and srat_disabled to extern
> >   xen: move numa_scan_nodes from x86 to common
> >   xen: enable numa_scan_nodes for device tree based NUMA
> >   xen/arm: keep guest still be NUMA unware
> >   xen: introduce an arch helper to do NUMA init failed fallback
> >   xen/arm: enable device tree based NUMA in system init
> >   xen/x86: move numa_setup to common to support NUMA switch in command
> >     line
> >   xen/x86: move dump_numa info hotkey to common
> 
> May I please ask that you follow patch submission guidelines, in that
> you send patches To: the list and Cc: relevant people. Furthermore I
> doubt that I need to be on Cc: for all 40 of the patches.
> 

Thanks for the reminder. Before sending this series, I spent some time
considering whether to put you in Cc or To. I found that you are listed
as a maintainer for the x86 architecture and x86 memory management, and
this series contains some changes that affect x86, so I added you to the
To list. Obviously I misunderstood the guidelines. I will add you to the
Cc list in the next version.

> Thanks and regards,
> Jan



* RE: [XEN RFC PATCH 01/40] tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf
  2021-08-11 10:49   ` Jan Beulich
@ 2021-08-13  6:28     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-13  6:28 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Bertrand Marquis, xen-devel, sstabellini, julien

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: August 11, 2021 18:50
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
> devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org
> Subject: Re: [XEN RFC PATCH 01/40] tools: Fix -Werror=maybe-uninitialized
> for xlu_pci_parse_bdf
> 
> On 11.08.2021 12:23, Wei Chen wrote:
> > | libxlu_pci.c: In function 'xlu_pci_parse_bdf':
> > | libxlu_pci.c:32:18: error: 'func' may be used uninitialized in this
> function [-Werror=maybe-uninitialized]
> > |    32 |     pcidev->func = func;
> > |       |     ~~~~~~~~~~~~~^~~~~~
> 
> I'm afraid I can't spot such an assignment in the file (nor the two
> similar ones further down). All I can see is
> 
>     pci->domain = domain;
>     pci->bus = bus;
>     pci->dev = dev;
>     pci->func = func;
> 

Sorry, I forgot to update my commit log to match the latest code base.
When I revert this change in my current code, I can no longer reproduce
the error. I'm not sure whether that is because I upgraded my build
environment. Give me some time: if I can reproduce it, I will update the
commit log in the next version. If the patch is no longer needed, I will
remove it from this series.

> > | libxlu_pci.c:51:29: note: 'func' was declared here
> > |    51 |     unsigned dom, bus, dev, func, vslot = 0;
> > |       |                             ^~~~
> > | libxlu_pci.c:31:17: error: 'dev' may be used uninitialized in this
> function [-Werror=maybe-uninitialized]
> > |    31 |     pcidev->dev = dev;
> > |       |     ~~~~~~~~~~~~^~~~~
> > | libxlu_pci.c:51:24: note: 'dev' was declared here
> > |    51 |     unsigned dom, bus, dev, func, vslot = 0;
> > |       |                        ^~~
> > | libxlu_pci.c:30:17: error: 'bus' may be used uninitialized in this
> function [-Werror=maybe-uninitialized]
> > |    30 |     pcidev->bus = bus;
> > |       |     ~~~~~~~~~~~~^~~~~
> > | libxlu_pci.c:51:19: note: 'bus' was declared here
> > |    51 |     unsigned dom, bus, dev, func, vslot = 0;
> > |       |                   ^~~
> > | libxlu_pci.c:78:26: error: 'dom' may be used uninitialized in this
> function [-Werror=maybe-uninitialized]
> > |    78 |                 if ( dom & ~0xff )
> > |       |                      ~~~~^~~~~~~
> 
> I'm afraid I also can't spot a variable named "dom", nor a sufficiently
> similar if(). May I ask what code base these were observed with? Is the
> change needed at all anymore?
> 

Same as above.

> > --- a/tools/libs/util/libxlu_pci.c
> > +++ b/tools/libs/util/libxlu_pci.c
> > @@ -15,7 +15,7 @@ static int parse_bdf(libxl_device_pci *pci, const char
> *str, const char **endp)
> >  {
> >      const char *ptr = str;
> >      unsigned int colons = 0;
> > -    unsigned int domain, bus, dev, func;
> > +    unsigned int domain = 0, bus = 0, dev = 0, func = 0;
> >      int n;
> >
> >      /* Count occurrences of ':' to detrmine presence/absence of the
> 'domain' */
> > @@ -28,7 +28,6 @@ static int parse_bdf(libxl_device_pci *pci, const char
> *str, const char **endp)
> >      ptr = str;
> >      switch (colons) {
> >      case 1:
> > -        domain = 0;
> >          if (sscanf(ptr, "%x:%x.%n", &bus, &dev, &n) != 2)
> >              return ERROR_INVAL;
> >          break;
> >
> 
> Also - which compiler did you encounter this with?
> 
> Finally please don't forget to Cc maintainers.
> 

If this patch is still needed, I will do so in the next version.

> Jan



* Re: [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64
  2021-08-13  2:33   ` Wei Chen
@ 2021-08-13  6:53     ` Jan Beulich
  0 siblings, 0 replies; 196+ messages in thread
From: Jan Beulich @ 2021-08-13  6:53 UTC (permalink / raw)
  To: Wei Chen; +Cc: Bertrand Marquis, sstabellini, julien, xen-devel

On 13.08.2021 04:33, Wei Chen wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: August 11, 2021 18:42
>>
>> On 11.08.2021 12:23, Wei Chen wrote:
>>> Hongda Deng (2):
>>>   xen/arm: return default DMA bit width when platform is not set
>>>   xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0
>>>
>>> Wei Chen (38):
>>>   tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf
>>>   xen/arm: Print a 64-bit number in hex from early uart
>>>   xen/x86: Initialize memnodemapsize while faking NUMA node
>>>   xen: decouple NUMA from ACPI in Kconfig
>>>   xen/arm: use !CONFIG_NUMA to keep fake NUMA API
>>>   xen/x86: Move NUMA memory node map functions to common
>>>   xen/x86: Move numa_add_cpu_node to common
>>>   xen/x86: Move NR_NODE_MEMBLKS macro to common
>>>   xen/x86: Move NUMA nodes and memory block ranges to common
>>>   xen/x86: Move numa_initmem_init to common
>>>   xen/arm: introduce numa_set_node for Arm
>>>   xen/arm: set NUMA nodes max number to 64 by default
>>>   xen/x86: move NUMA API from x86 header to common header
>>>   xen/arm: Create a fake NUMA node to use common code
>>>   xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
>>>   xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
>>>   xen: fdt: Introduce a helper to check fdt node type
>>>   xen/arm: implement node distance helpers for Arm64
>>>   xen/arm: introduce device_tree_numa as a switch for device tree NUMA
>>>   xen/arm: introduce a helper to parse device tree processor node
>>>   xen/arm: introduce a helper to parse device tree memory node
>>>   xen/arm: introduce a helper to parse device tree NUMA distance map
>>>   xen/arm: unified entry to parse all NUMA data from device tree
>>>   xen/arm: Add boot and secondary CPU to NUMA system
>>>   xen/arm: build CPU NUMA node map while creating cpu_logical_map
>>>   xen/x86: decouple nodes_cover_memory with E820 map
>>>   xen/arm: implement Arm arch helpers Arm to get memory map info
>>>   xen: move NUMA memory and CPU parsed nodemasks to common
>>>   xen/x86: move nodes_cover_memory to common
>>>   xen/x86: make acpi_scan_nodes to be neutral
>>>   xen: export bad_srat and srat_disabled to extern
>>>   xen: move numa_scan_nodes from x86 to common
>>>   xen: enable numa_scan_nodes for device tree based NUMA
>>>   xen/arm: keep guest still be NUMA unware
>>>   xen: introduce an arch helper to do NUMA init failed fallback
>>>   xen/arm: enable device tree based NUMA in system init
>>>   xen/x86: move numa_setup to common to support NUMA switch in command
>>>     line
>>>   xen/x86: move dump_numa info hotkey to common
>>
>> May I please ask that you follow patch submission guidelines, in that
>> you send patches To: the list and Cc: relevant people. Furthermore I
>> doubt that I need to be on Cc: for all 40 of the patches.
>>
> 
> Thanks for your reminder. Before I sent this series, I had paid
> sometime to consider CC or TO you, I found you are in the X86 Arch,
> x86 memory management maintainer lists. And in this patch series,
> I have done some changes that affects x86, so I added you in TO list.
> Obviously, my understanding had some mistake. I will add you to CC
> list in next version.

And then on a patch-by-patch basis please, unless you see a specific
need to also Cc me on certain Arm-only patches. Thanks.

Jan




* RE: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set
  2021-08-11 10:54   ` Jan Beulich
@ 2021-08-13  6:54     ` Wei Chen
  2021-08-13  6:56       ` Jan Beulich
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-13  6:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Bertrand Marquis, xen-devel, sstabellini, julien

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: August 11, 2021 18:54
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
> devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org
> Subject: Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width
> when platform is not set
> 
> On 11.08.2021 12:23, Wei Chen wrote:
> > --- a/xen/arch/arm/platform.c
> > +++ b/xen/arch/arm/platform.c
> > @@ -27,6 +27,7 @@ extern const struct platform_desc _splatform[],
> _eplatform[];
> >  /* Pointer to the current platform description */
> >  static const struct platform_desc *platform;
> >
> > +extern unsigned int dma_bitsize;
> 
> This is a no-go: Declarations need to live in a header which the producer
> and all consumers include. Else ...

OK, I will move it into a header.

> 
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -227,7 +227,7 @@ static bool __read_mostly scrub_debug;
> >   * Bit width of the DMA heap -- used to override NUMA-node-first.
> >   * allocation strategy, which can otherwise exhaust low memory.
> >   */
> > -static unsigned int dma_bitsize;
> > +unsigned int dma_bitsize;
> 
> ... a change here (of e.g. the type) will go unnoticed by the compiler,
> and the consumer of the variable may no longer work correctly.
> 

Sorry, this comment is not entirely clear to me.

> Also I'm afraid the description does not make clear why this variable
> is what you want to use. Connected to this is the question why you need
> to consume it on Arm in the first place, when x86 never had the need.
> 

Different Arm platforms may have different DMA bit widths. My original
idea was that, if an Arm platform doesn't provide any DMA bitsize info,
arch_get_dma_bitsize would return the system DMA bitsize (dma_bitsize).
But your comment made me think again: my current change contains a
circular dependency. dma_bitsize is a high-level variable whose value
depends on the Xen boot command line or on arch_get_dma_bitsize, so I
can't use it as an input to arch_get_dma_bitsize.

So in the next version I will drop the dma_bitsize changes and keep only
the change that makes arch_get_dma_bitsize return 0 when the platform
hasn't specified a DMA bitsize, like this:

unsigned int arch_get_dma_bitsize(void)
 {
-    return ( platform && platform->dma_bitsize ) ? platform->dma_bitsize : 32;
+    return ( platform && platform->dma_bitsize ) ? platform->dma_bitsize
+                                                 : 0;
 }

Thanks,
Wei Chen

> Jan



* Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set
  2021-08-13  6:54     ` Wei Chen
@ 2021-08-13  6:56       ` Jan Beulich
  0 siblings, 0 replies; 196+ messages in thread
From: Jan Beulich @ 2021-08-13  6:56 UTC (permalink / raw)
  To: Wei Chen; +Cc: Bertrand Marquis, xen-devel, sstabellini, julien

On 13.08.2021 08:54, Wei Chen wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 11 August 2021 18:54
>>
>> On 11.08.2021 12:23, Wei Chen wrote:
>>> --- a/xen/arch/arm/platform.c
>>> +++ b/xen/arch/arm/platform.c
>>> @@ -27,6 +27,7 @@ extern const struct platform_desc _splatform[],
>> _eplatform[];
>>>  /* Pointer to the current platform description */
>>>  static const struct platform_desc *platform;
>>>
>>> +extern unsigned int dma_bitsize;
>>
>> This is a no-go: Declarations need to live in a header which the producer
>> and all consumers include. Else ...
> 
> Ok, I will place it to a header.
> 
>>
>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -227,7 +227,7 @@ static bool __read_mostly scrub_debug;
>>>   * Bit width of the DMA heap -- used to override NUMA-node-first.
>>>   * allocation strategy, which can otherwise exhaust low memory.
>>>   */
>>> -static unsigned int dma_bitsize;
>>> +unsigned int dma_bitsize;
>>
>> ... a change here (of e.g. the type) will go unnoticed by the compiler,
>> and the consumer of the variable may no longer work correctly.
>>
> 
> Sorry, I am not very clear about this comment.

I've merely been trying to explain _why_ the declaration needs to be
in a header.
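
A minimal sketch of the pattern, collapsed into one file for illustration. The file names in the comments and the helper get_dma_bits() are hypothetical and not taken from the Xen tree. With a single shared declaration, changing the type at the definition produces a conflicting-types compile error instead of a silent mismatch:

```c
/* ---- dma.h: hypothetical header included by producer and consumers ---- */
extern unsigned int dma_bitsize;

/* ---- page_alloc.c (producer), which includes dma.h ---- */
/*
 * Because the declaration above is visible here, changing this definition
 * to e.g. "unsigned long dma_bitsize;" is rejected by the compiler.
 * With a private "extern" copied into a consumer's .c file, the same
 * change would compile cleanly and the consumer would misread the object.
 */
unsigned int dma_bitsize;

/* ---- platform.c (consumer), which includes dma.h ---- */
unsigned int get_dma_bits(void)   /* hypothetical consumer */
{
    return dma_bitsize;
}
```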

Jan




* RE: [XEN RFC PATCH 03/40] xen/x86: Initialize memnodemapsize while faking NUMA node
  2021-08-12 15:32   ` Jan Beulich
@ 2021-08-13  7:26     ` Wei Chen
  2021-08-13  8:29       ` Jan Beulich
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-13  7:26 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Bertrand Marquis, xen-devel, sstabellini, julien

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 12 August 2021 23:33
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
> devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org
> Subject: Re: [XEN RFC PATCH 03/40] xen/x86: Initialize memnodemapsize
> while faking NUMA node
> 
> On 11.08.2021 12:23, Wei Chen wrote:
> > When system turns NUMA off or system lacks of NUMA support,
> > Xen will fake a NUMA node to make system works as a single
> > node NUMA system.
> >
> > In this case the memory node map doesn't need to be allocated
> > from boot pages. But we should set the memnodemapsize to the
> > array size of _memnodemap. Xen hadn't done it, and Xen should
> > assert in phys_to_nid. But because x86 was using an empty
> > macro "VIRTUAL_BUG_ON" to replace ASSERT, this bug will not
> > be triggered.
> 
> How about we promote VIRTUAL_BUG_ON() to expand to at least ASSERT()?
> 

That would be good. Frankly, we discovered this because we used ASSERT
in Arm and then noticed that x86 was using VIRTUAL_BUG_ON.

> > --- a/xen/arch/x86/numa.c
> > +++ b/xen/arch/x86/numa.c
> > @@ -270,6 +270,8 @@ void __init numa_initmem_init(unsigned long
> start_pfn, unsigned long end_pfn)
> >      /* setup dummy node covering all memory */
> >      memnode_shift = BITS_PER_LONG - 1;
> >      memnodemap = _memnodemap;
> > +    memnodemapsize = ARRAY_SIZE(_memnodemap);
> 
> But this doesn't reflect reality then, does it? We'd rather want to
> set the size to 1, I would think.
> 

Yes, you're right. We actually only use one slot. Furthermore,
memnodemap[0] may be set in acpi_scan_nodes, but acpi_scan_nodes doesn't
reset memnodemap when it fails. I think maybe we can add:
    memnodemap[0] = 0;
    memnodemapsize = 1;
What do you think?

> Jan



* RE: [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig
  2021-08-12 15:36   ` Jan Beulich
@ 2021-08-13  7:27     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-13  7:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Bertrand Marquis, xen-devel, sstabellini, julien

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 12 August 2021 23:37
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
> devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org
> Subject: Re: [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig
> 
> On 11.08.2021 12:23, Wei Chen wrote:
> > --- a/xen/arch/x86/Kconfig
> > +++ b/xen/arch/x86/Kconfig
> > @@ -24,7 +24,7 @@ config X86
> >  	select HAS_UBSAN
> >  	select HAS_VPCI if HVM
> >  	select NEEDS_LIBELF
> > -	select NUMA
> > +	select ACPI_NUMA
> 
> We try to keep this alphabetically sorted, so please move up the
> replacement line. Then
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 

Thanks, I will do it in the next version.

> Jan



* RE: [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig
  2021-08-12 16:54   ` Julien Grall
@ 2021-08-13  7:28     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-13  7:28 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 13 August 2021 0:55
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 06/40] xen: decouple NUMA from ACPI in Kconfig
> 
> Hi Wei,
> 
> On 11/08/2021 11:23, Wei Chen wrote:
> > In current Xen code only implments x86 ACPI-based NUMA support.
> 
> s/implments/implements/

Got it.

> 
> > So in Xen Kconfig system, NUMA equals to ACPI_NUMA. x86 selects
> > NUMA by default, and CONFIG_ACPI_NUMA is hardcode in config.h.
> >
> > In this patch series, we introduced device tree based NUMA for
> > Arm.
> 
> The concept of patch series only applies to the ML. Once checked-in this
> is only a series of commit. So I would write:
> 
> "In a follow-up patch, we will introduce support for NUMA using the
> Device-Tree".
> 
> >  That means we will have two NUMA implemetations, so in this
> 
> s/implemetations/implementations/
> 

Thanks, I will update the commit log in the next version.

> > patch we decouple NUMA from ACPI based NUMA in Kconfig. Make NUMA
> > as a common feature, that device tree based NUMA also can select it.
> 
> Cheers,
> 
> --
> Julien Grall


* Re: [XEN RFC PATCH 03/40] xen/x86: Initialize memnodemapsize while faking NUMA node
  2021-08-13  7:26     ` Wei Chen
@ 2021-08-13  8:29       ` Jan Beulich
  0 siblings, 0 replies; 196+ messages in thread
From: Jan Beulich @ 2021-08-13  8:29 UTC (permalink / raw)
  To: Wei Chen; +Cc: Bertrand Marquis, xen-devel, sstabellini, julien

On 13.08.2021 09:26, Wei Chen wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 12 August 2021 23:33
>>
>> On 11.08.2021 12:23, Wei Chen wrote:
>>> --- a/xen/arch/x86/numa.c
>>> +++ b/xen/arch/x86/numa.c
>>> @@ -270,6 +270,8 @@ void __init numa_initmem_init(unsigned long
>> start_pfn, unsigned long end_pfn)
>>>      /* setup dummy node covering all memory */
>>>      memnode_shift = BITS_PER_LONG - 1;
>>>      memnodemap = _memnodemap;
>>> +    memnodemapsize = ARRAY_SIZE(_memnodemap);
>>
>> But this doesn't reflect reality then, does it? We'd rather want to
>> set the size to 1, I would think.
>>
> 
> Yes, you're right. Actually, we just only used 1 slot. But furthermore,
> memnodemap[0] may be set in acpi_scan_nodes, but acpi_scan_nodes doesn't
> reset memnodemap when it failed. I think maybe we can add:
>     memnodemap[0] = 0;
>     memnodemapsize = 1;
> How do you think about it?

Well, yes, if data may have been put there, then resetting of course
makes sense.
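
A simplified, self-contained model of the fake-node setup being discussed. It is a sketch under assumptions: the array size of 64 is arbitrary, the lookup indexes by raw address rather than by whatever index Xen actually uses, and assert() stands in for ASSERT(). It shows why a size of 1 matches reality once the shift is BITS_PER_LONG - 1, and why slot 0 is cleared explicitly:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

typedef uint8_t nodeid_t;

static nodeid_t _memnodemap[64];      /* size is an assumption of this sketch */
static nodeid_t *memnodemap;
static unsigned long memnodemapsize;
static int memnode_shift;

/* Set up a dummy node covering all memory. */
void fake_single_node(void)
{
    memnode_shift = BITS_PER_LONG - 1;
    memnodemap = _memnodemap;
    memnodemap[0] = 0;    /* drop anything a failed firmware scan left behind */
    memnodemapsize = 1;   /* only slot 0 is ever indexed with this shift */
}

/* Valid for addresses whose top bit is clear, which then all map to slot 0. */
nodeid_t phys_to_nid(unsigned long addr)
{
    unsigned long idx = addr >> memnode_shift;

    assert(idx < memnodemapsize);     /* fires if the size was never set */
    return memnodemap[idx];
}
```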

Jan




* Re: [XEN RFC PATCH 02/40] xen/arm: Print a 64-bit number in hex from early uart
  2021-08-11 10:23 ` [XEN RFC PATCH 02/40] xen/arm: Print a 64-bit number in hex from early uart Wei Chen
@ 2021-08-19 13:05   ` Julien Grall
  2021-08-20  1:13     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 13:05 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> Current putn function that is using for early print
> only can print low 32-bit of AArch64 register. This
> will lose some important messages while debugging
> with early console. For example:
> (XEN) Bringing up CPU5
> - CPU 0000000100000100 booting -
> Will be truncated to
> (XEN) Bringing up CPU5
> - CPU 00000100 booting -
> 
> In this patch, we increased the print loops and shift
> bits to make putn print 64-bit number.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>

Acked-by: Julien Grall <jgrall@amazon.com>

> ---
>   xen/arch/arm/arm64/head.S | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index aa1f88c764..b32639d7d6 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -862,17 +862,18 @@ puts:
>           ret
>   ENDPROC(puts)
>   
> -/* Print a 32-bit number in hex.  Specific to the PL011 UART.
> +/* Print a 64-bit number in hex.  Specific to the PL011 UART.

As you modify the line, can you take the opportunity to write:

/*
  * Print a 64-bit...

And also drop the second sentence, as the code has not been PL011-specific
for quite a while now.

>    * x0: Number to print.
>    * x23: Early UART base address
>    * Clobbers x0-x3 */
> +#define PRINT_MASK 0xf000000000000000
>   putn:
>           adr   x1, hex
> -        mov   x3, #8
> +        mov   x3, #16
>   1:
>           early_uart_ready x23, 2
> -        and   x2, x0, #0xf0000000    /* Mask off the top nybble */
> -        lsr   x2, x2, #28
> +        and   x2, x0, #PRINT_MASK    /* Mask off the top nybble */
> +        lsr   x2, x2, #60
>           ldrb  w2, [x1, x2]           /* Convert to a char */
>           early_uart_transmit x23, w2
>           lsl   x0, x0, #4             /* Roll it through one nybble at a time */
> 
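
For readers less familiar with the assembly, the patched loop can be modelled in C roughly as below. This is only a sketch: putn_model() and its output buffer are hypothetical stand-ins for the UART transmit, and the lowercase digit table is an assumption about the contents of the "hex" label.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Model of the 64-bit putn loop: emit 16 hex digits, most significant
 * first. "out" must have room for 17 bytes (16 digits plus a NUL).
 */
void putn_model(uint64_t x0, char *out)
{
    static const char hex[] = "0123456789abcdef";   /* assumed digit table */

    assert(out != NULL);

    for (int i = 0; i < 16; i++) {                  /* mov x3, #16 */
        /* and x2, x0, #PRINT_MASK ; lsr x2, x2, #60 */
        uint64_t nybble = (x0 & 0xf000000000000000ULL) >> 60;

        out[i] = hex[nybble];                       /* ldrb w2, [x1, x2] */
        x0 <<= 4;                                   /* lsl x0, x0, #4 */
    }
    out[16] = '\0';
}
```

With the previous values (8 iterations, mask 0xf0000000, shift 28) only the low 32 bits are printed, which is the truncation shown in the commit message.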

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set
  2021-08-11 10:23 ` [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set Wei Chen
  2021-08-11 10:54   ` Jan Beulich
@ 2021-08-19 13:28   ` Julien Grall
  2021-08-20  2:04     ` Wei Chen
  1 sibling, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 13:28 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi,

On 11/08/2021 11:23, Wei Chen wrote:
> From: Hongda Deng <Hongda.Deng@arm.com>
> 
> In current code, arch_get_dma_bitsize will return 32 when platorm
> or platform->dma_bitsize is not set. It's not resonable, for Arm,

s/resonable/reasonable/

> we don't require to reserve DMA memory. So we set dma_bitsize always
> be 0. In NO-NUMA system, arch_get_dma_bitsize will not be invoked,
> so dma_bitsize will not be overrided by this function. 

arch_get_dma_bitsize() is also used to allocate dom0 memory. We need to
be able to allocate some DMA-able memory that can be used by every device.

> But in NUMA
> system, once the online nodes are greater than 1, this function will
> be invoked. The dma_bitsize will be limited to 32. That means, only
> first 4GB memory can be used for DMA. But that's against our hardware
> design. We don't have that kind of restriction on hardware.

What do you mean by "hardware design"? Are you referring to the server 
you boot Xen on?

Anyway, there are plenty of platforms out there that have devices which can't
DMA into memory above 32-bit. On RPI, this is even lower (30-bit).

So I would be cautious about changing the default limit.

At the moment, the only place on Arm where we need DMA-able memory is 
for dom0. This is allocated at boot and can't change afterwards (for now).

So I would explore removing the NUMA check or dropping the DMA zone. FAOD,
both suggestions are for Arm only. For x86, they need to be kept.

> Only
> platform setting can override dma_bitsize. So in this patch, we
> return default dma_bitsize, when platform and platorm->dma_bitsize
> are not set.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Signed-off-by: Hongda Deng <Hongda.Deng@arm.com>
> ---
>   xen/arch/arm/platform.c | 4 +++-
>   xen/common/page_alloc.c | 2 +-
>   2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/platform.c b/xen/arch/arm/platform.c
> index 4db5bbb4c5..0a27fef9a4 100644
> --- a/xen/arch/arm/platform.c
> +++ b/xen/arch/arm/platform.c
> @@ -27,6 +27,7 @@ extern const struct platform_desc _splatform[], _eplatform[];
>   /* Pointer to the current platform description */
>   static const struct platform_desc *platform;
>   
> +extern unsigned int dma_bitsize;
>   
>   static bool __init platform_is_compatible(const struct platform_desc *plat)
>   {
> @@ -157,7 +158,8 @@ bool platform_device_is_blacklisted(const struct dt_device_node *node)
>   
>   unsigned int arch_get_dma_bitsize(void)
>   {
> -    return ( platform && platform->dma_bitsize ) ? platform->dma_bitsize : 32;
> +    return ( platform && platform->dma_bitsize ) ? platform->dma_bitsize
> +                                                 : dma_bitsize;
>   }
>   
>   /*
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 958ba0cd92..0f0cae5a4e 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -227,7 +227,7 @@ static bool __read_mostly scrub_debug;
>    * Bit width of the DMA heap -- used to override NUMA-node-first.
>    * allocation strategy, which can otherwise exhaust low memory.
>    */
> -static unsigned int dma_bitsize;
> +unsigned int dma_bitsize;
>   integer_param("dma_bits", dma_bitsize);
>   
>   /* Offlined page list, protected by heap_lock. */
> 

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 05/40] xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0
  2021-08-11 10:23 ` [XEN RFC PATCH 05/40] xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0 Wei Chen
@ 2021-08-19 13:32   ` Julien Grall
  2021-08-20  2:05     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 13:32 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi,

I guess this patch may be dropped after my comment on patch #4. I will 
comment just on the process.

On 11/08/2021 11:23, Wei Chen wrote:
> From: Hongda Deng <Hongda.Deng@arm.com>
> 
> In previous patch, we make arch_get_dma_bitsize return 0 when
> dma_bitsize and platform->dma_bitsize are not set. But this
> will affect lowmem_bitsize in allocate_memory_11 for domain0.
> Because this function depends lowmem_bitsize to allocate memory
> below 4GB.
> 
> In current code, when arch_get_dma_bitsize return 0, lowmem_bitsize
> will be set to 0. In this case, we will get "No bank has been
> allocated below 0-bit." message while allocating domain0 memory.
> And the lowmem will be set to false.
> 
> This behavior is inconsistent with what allocate_memory_11 done
> before, and doesn't meet this functions requirements. So we
> check arch_get_dma_bitsize's return value before set lowmem_bitsize.
> Avoid setting lowmem_bitsize to 0 by mistake.

In general, we want to avoid breaking bisection within a series. This 
means that this patch should be before patch #4.

> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Signed-off-by: Hongda Deng <Hongda.Deng@arm.com>
> ---
>   xen/arch/arm/domain_build.c | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 6c86d52781..cf341f349f 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -265,9 +265,18 @@ static void __init allocate_memory_11(struct domain *d,
>       int i;
>   
>       bool lowmem = true;
> -    unsigned int lowmem_bitsize = min(32U, arch_get_dma_bitsize());
> +    unsigned int lowmem_bitsize = arch_get_dma_bitsize();
>       unsigned int bits;
>   
> +    /*
> +       When dma_bitsize and platform->dma_bitsize are not set,
> +       arch_get_dma_bitsize will return 0. That means this system
> +       doesn't need to reserve memory for DMA. But in order to
> +       meet above requirements, we still need to try to allocate
> +       memory below 4GB for Dom0.
> +    */

The coding style for comments is:

/*
  * A
  * B
  */

> +    lowmem_bitsize = lowmem_bitsize ? min(32U, lowmem_bitsize) : 32U;
> +
>       /*
>        * TODO: Implement memory bank allocation when DOM0 is not direct
>        * mapped
> 
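
The selection logic in the hunk above, pulled out into a standalone function purely for illustration. pick_lowmem_bitsize() is a hypothetical name; in the patch the expression is open-coded in allocate_memory_11().

```c
/*
 * Sketch of the proposed lowmem_bitsize selection:
 *  - 0 from arch_get_dma_bitsize() means "no DMA restriction known",
 *    so fall back to 32 and keep trying to allocate below 4GB;
 *  - otherwise cap the reported width at 32.
 */
unsigned int pick_lowmem_bitsize(unsigned int arch_dma_bitsize)
{
    if (arch_dma_bitsize == 0)
        return 32U;

    return arch_dma_bitsize < 32U ? arch_dma_bitsize : 32U;
}
```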

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  2021-08-11 10:23 ` [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API Wei Chen
@ 2021-08-19 13:34   ` Julien Grall
  2021-08-20  2:08     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 13:34 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> Only Arm64 supports NUMA, the CONFIG_NUMA could not be
> enabled for Arm32.

What do you mean by "could not be enabled"?

> Even in Arm64, users still can disable
> the CONFIG_NUMA through Kconfig option. In this case, keep
> current fake NUMA API, will make Arm code still can work
> with NUMA aware memory allocation and scheduler.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/include/asm-arm/numa.h | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> index 31a6de4e23..ab9c4a2448 100644
> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -5,6 +5,8 @@
>   
>   typedef u8 nodeid_t;
>   
> +#if !defined(CONFIG_NUMA)

NIT: We tend to use #ifndef rather than #if !defined(...)

> +
>   /* Fake one node for now. See also node_online_map. */
>   #define cpu_to_node(cpu) 0
>   #define node_to_cpumask(node)   (cpu_online_map)
> @@ -25,6 +27,8 @@ extern mfn_t first_valid_mfn;
>   #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
>   #define __node_distance(a, b) (20)
>   
> +#endif
> +
>   #endif /* __ARCH_ARM_NUMA_H */
>   /*
>    * Local variables:
> 

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
  2021-08-11 10:24 ` [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64 Wei Chen
@ 2021-08-19 13:38   ` Julien Grall
  2021-08-20  2:30     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 13:38 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi,

On 11/08/2021 11:24, Wei Chen wrote:
> We need a Kconfig option to distinguish with ACPI based
> NUMA. So we introduce the new Kconfig option:
> DEVICE_TREE_NUMA in this patch for Arm64.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/Kconfig | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index ecfa6822e4..678cc98ea3 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -33,6 +33,16 @@ config ACPI
>   	  Advanced Configuration and Power Interface (ACPI) support for Xen is
>   	  an alternative to device tree on ARM64.
>   
> +config DEVICE_TREE_NUMA

The name suggests that NUMA should only be enabled for Device-Tree... 
But the description looks generic.

However, I think the user should only have the choice to say whether 
they want NUMA to be enabled or not. We should not give them the choice 
to enable/disable the parsing for DT/ACPI.

So we should have a generic config that will then select DT (and ACPI in 
the future).

> +	bool "NUMA (Non-Uniform Memory Access) Support (UNSUPPORTED)" if UNSUPPORTED
> +	depends on ARM_64
> +	select NUMA
> +	---help---
> +
> +	  Non-Uniform Memory Access (NUMA) is a computer memory design used in
> +	  multiprocessing, where the memory access time depends on the memory
> +	  location relative to the processor.
> +

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (40 preceding siblings ...)
  2021-08-11 10:41 ` [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Jan Beulich
@ 2021-08-19 13:42 ` Julien Grall
  2021-08-19 14:05   ` Bertrand Marquis
  2021-08-26  0:09 ` Stefano Stabellini
  42 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 13:42 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> Xen memory allocation and scheduler modules are NUMA aware.
> But actually, on x86 has implemented the architecture APIs
> to support NUMA. Arm was providing a set of fake architecture
> APIs to make it compatible with NUMA awared memory allocation
> and scheduler.
> 
> Arm system was working well as a single node NUMA system with
> these fake APIs, because we didn't have multiple nodes NUMA
> system on Arm. But in recent years, more and more Arm devices
> support multiple nodes NUMA system. Like TX2, some Hisilicon
> chips and the Ampere Altra.

All the platforms you mention here are servers (so mainly ACPI). 
However, this series is adding DT support.

Could you outline the long term plan for DT? Is it going to be used on 
production HW?

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64
  2021-08-19 13:42 ` Julien Grall
@ 2021-08-19 14:05   ` Bertrand Marquis
  2021-08-19 17:11     ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Bertrand Marquis @ 2021-08-19 14:05 UTC (permalink / raw)
  To: Julien Grall; +Cc: Wei Chen, xen-devel, sstabellini, jbeulich

Hi Julien,

> On 19 Aug 2021, at 14:42, Julien Grall <julien@xen.org> wrote:
> 
> Hi Wei,
> 
> On 11/08/2021 11:23, Wei Chen wrote:
>> Xen memory allocation and scheduler modules are NUMA aware.
>> But actually, on x86 has implemented the architecture APIs
>> to support NUMA. Arm was providing a set of fake architecture
>> APIs to make it compatible with NUMA awared memory allocation
>> and scheduler.
>> Arm system was working well as a single node NUMA system with
>> these fake APIs, because we didn't have multiple nodes NUMA
>> system on Arm. But in recent years, more and more Arm devices
>> support multiple nodes NUMA system. Like TX2, some Hisilicon
>> chips and the Ampere Altra.
> 
> All the platforms you mention here are servers (so mainly ACPI). However, this series is adding DT support.
> 
> Could you outline the long term plan for DT? Is it going to be used on production HW?

Yes, we already are, and we will continue to use this on production HW.
Some embedded hardware will make use of NUMA (as some embedded functions do require lots of computing power).
We are experimenting with that right now using those patches.

Cheers
Bertrand

> 
> Cheers,
> 
> -- 
> Julien Grall




* Re: [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64
  2021-08-19 14:05   ` Bertrand Marquis
@ 2021-08-19 17:11     ` Julien Grall
  0 siblings, 0 replies; 196+ messages in thread
From: Julien Grall @ 2021-08-19 17:11 UTC (permalink / raw)
  To: Bertrand Marquis; +Cc: Wei Chen, xen-devel, sstabellini, jbeulich



On 19/08/2021 15:05, Bertrand Marquis wrote:
> Hi Julien,

Hi Bertrand,

>> On 19 Aug 2021, at 14:42, Julien Grall <julien@xen.org> wrote:
>>
>> Hi Wei,
>>
>> On 11/08/2021 11:23, Wei Chen wrote:
>>> Xen memory allocation and scheduler modules are NUMA aware.
>>> But actually, on x86 has implemented the architecture APIs
>>> to support NUMA. Arm was providing a set of fake architecture
>>> APIs to make it compatible with NUMA awared memory allocation
>>> and scheduler.
>>> Arm system was working well as a single node NUMA system with
>>> these fake APIs, because we didn't have multiple nodes NUMA
>>> system on Arm. But in recent years, more and more Arm devices
>>> support multiple nodes NUMA system. Like TX2, some Hisilicon
>>> chips and the Ampere Altra.
>>
>> All the platforms you mention here are servers (so mainly ACPI). However, this series is adding DT support.
>>
>> Could you outline the long term plan for DT? Is it going to be used on production HW?
> 
> Yes we are already and will continue to use this on production HW.
> Some embedded hardware will have some usage of NUMA (as some embedded functions do require lots of computing power).

Interesting! Thank you for the clarifications.

> We are doing experiments of that right now using those patches.

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
  2021-08-11 10:24 ` [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI Wei Chen
@ 2021-08-19 17:35   ` Julien Grall
  2021-08-20  2:18     ` Wei Chen
  2021-08-26 23:24   ` Stefano Stabellini
  1 sibling, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 17:35 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> EFI can get memory map from EFI system table. But EFI system
> table doesn't contain memory NUMA information, EFI depends on
> ACPI SRAT or device tree memory node to parse memory blocks'
> NUMA mapping.
> 
> But in current code, when Xen is booting from EFI, it will
> delete all memory nodes in device tree. So in UEFI + DTB
> boot, we don't have numa-node-id for memory blocks any more.
> 
> So in this patch, we will keep memory nodes in device tree for
> NUMA code to parse memory numa-node-id later.
> 
> As a side effect, if we still parse boot memory information in
> early_scan_node, bootmem.info will calculate memory ranges in
> memory nodes twice. So we have to prvent early_scan_node to

s/prvent/prevent/

> parse memory nodes in EFI boot.
> 
> As EFI APIs only can be used in Arm64, so we introduced a wrapper
> in header file to prevent #ifdef CONFIG_ARM_64/32 in code block.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/bootfdt.c      |  8 +++++++-
>   xen/arch/arm/efi/efi-boot.h | 25 -------------------------
>   xen/include/asm-arm/setup.h |  6 ++++++
>   3 files changed, 13 insertions(+), 26 deletions(-)
> 
> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> index 476e32e0f5..7df149dbca 100644
> --- a/xen/arch/arm/bootfdt.c
> +++ b/xen/arch/arm/bootfdt.c
> @@ -11,6 +11,7 @@
>   #include <xen/lib.h>
>   #include <xen/kernel.h>
>   #include <xen/init.h>
> +#include <xen/efi.h>
>   #include <xen/device_tree.h>
>   #include <xen/libfdt/libfdt.h>
>   #include <xen/sort.h>
> @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
>   {
>       int rc = 0;
>   
> -    if ( device_tree_node_matches(fdt, node, "memory") )
> +    /*
> +     * If system boot from EFI, bootinfo.mem has been set by EFI,

"If the system boots". Although, I would suggest writing:

"If Xen has been booted via UEFI, the memory banks will already be 
populated. So we should skip the parsing."

> +     * so we don't need to parse memory node from DTB.
> +     */
> +    if ( device_tree_node_matches(fdt, node, "memory") &&
> +         !arch_efi_enabled(EFI_BOOT) )

arch_efi_enabled() is going to be less expensive than 
device_tree_node_matches(). So I would suggest to re-order the operands.
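
A small sketch of why the operand order matters, relying only on the short-circuit behaviour of &&. All names here are hypothetical stand-ins: set_efi_boot() replaces arch_efi_enabled(EFI_BOOT), and node_matches_stub() is a simplified prefix match, not the real device_tree_node_matches().

```c
#include <stdbool.h>
#include <string.h>

static bool efi_boot;                 /* stand-in for the EFI_BOOT flag */
static unsigned int expensive_calls;  /* counts string comparisons */

void set_efi_boot(bool on) { efi_boot = on; }
unsigned int expensive_call_count(void) { return expensive_calls; }

/* Simplified stand-in for the node-name match (string comparison). */
bool node_matches_stub(const char *name, const char *match)
{
    expensive_calls++;
    return strncmp(name, match, strlen(match)) == 0;
}

/* Cheap flag test first: the comparison is skipped on EFI boots. */
bool should_parse_memory(const char *name)
{
    return !efi_boot && node_matches_stub(name, "memory");
}
```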

>           rc = process_memory_node(fdt, node, name, depth,
>                                    address_cells, size_cells, &bootinfo.mem);
>       else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
> diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
> index cf9c37153f..d0a9987fa4 100644
> --- a/xen/arch/arm/efi/efi-boot.h
> +++ b/xen/arch/arm/efi/efi-boot.h
> @@ -197,33 +197,8 @@ EFI_STATUS __init fdt_add_uefi_nodes(EFI_SYSTEM_TABLE *sys_table,
>       int status;
>       u32 fdt_val32;
>       u64 fdt_val64;
> -    int prev;
>       int num_rsv;
>   
> -    /*
> -     * Delete any memory nodes present.  The EFI memory map is the only
> -     * memory description provided to Xen.
> -     */
> -    prev = 0;
> -    for (;;)
> -    {
> -        const char *type;
> -        int len;
> -
> -        node = fdt_next_node(fdt, prev, NULL);
> -        if ( node < 0 )
> -            break;
> -
> -        type = fdt_getprop(fdt, node, "device_type", &len);
> -        if ( type && strncmp(type, "memory", len) == 0 )
> -        {
> -            fdt_del_node(fdt, node);
> -            continue;
> -        }
> -
> -        prev = node;
> -    }
> -
>      /*
>       * Delete all memory reserve map entries. When booting via UEFI,
>       * kernel will use the UEFI memory map to find reserved regions.
> diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
> index c4b6af6029..e4fb5f0d49 100644
> --- a/xen/include/asm-arm/setup.h
> +++ b/xen/include/asm-arm/setup.h
> @@ -123,6 +123,12 @@ void device_tree_get_reg(const __be32 **cell, u32 address_cells,
>   u32 device_tree_get_u32(const void *fdt, int node,
>                           const char *prop_name, u32 dflt);
>   
> +#if defined(CONFIG_ARM_64)
> +#define arch_efi_enabled(x) efi_enabled(x)
> +#else
> +#define arch_efi_enabled(x) (0)
> +#endif

I would prefer if we introduce CONFIG_EFI that would stub efi_enabled
for architectures not supporting EFI.
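
Something along these lines, as a rough sketch only. The header placement, the exact prototype, and the EFI_BOOT value used here are assumptions made for illustration; should_skip_dt_memory() is a hypothetical caller.

```c
#include <stdbool.h>

#define EFI_BOOT 0   /* placeholder feature index for this sketch */

#ifdef CONFIG_EFI
/* Real implementation provided by the EFI code. */
bool efi_enabled(unsigned int feature);
#else
/* Stub for architectures (or configurations) without EFI support. */
static inline bool efi_enabled(unsigned int feature)
{
    (void)feature;
    return false;
}
#endif

/* Callers then need neither a per-arch wrapper nor an #ifdef. */
bool should_skip_dt_memory(void)
{
    return efi_enabled(EFI_BOOT);
}
```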

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 21/40] xen/arm: introduce device_tree_numa as a switch for device tree NUMA
  2021-08-11 10:24 ` [XEN RFC PATCH 21/40] xen/arm: introduce device_tree_numa as a switch for device tree NUMA Wei Chen
@ 2021-08-19 17:45   ` Julien Grall
  2021-08-20  2:21     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 17:45 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> Like acpi_numa in x86 as a switch for ACPI based NUMA, we introduce
> device_tree_numa as a switch for Arm device tree based NUMA. When
> NUMA information in device tree is invalid, this switch will be set
> to -1, then NUMA support for Arm will be disabled, even if user set
> numa_off=0.

The hypervisor will never use both ACPI and DT at runtime. In fact...

> 
> Keep using bad_srat and srat_disabled functions name, because we will
> reuse node_covers_memory and acpi_scan_nodes code for Arm.

... given that both functions will be called from the common code, it 
will be a lot more difficult to add ACPI afterwards.

So I think we should either rename acpi_numa to something more generic 
(maybe fw_numa) or convert numa_off to a tri-state.

This will allow the code to be mostly common.
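
To illustrate the tri-state option, here is a minimal sketch. All names
in it (enum numa_mode, numa_status, numa_fw_disabled(), numa_fw_bad(),
numa_fw_parsed()) are hypothetical and only meant to show the idea:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical tri-state replacing the separate numa_off, acpi_numa and
 * device_tree_numa switches.
 */
enum numa_mode {
    NUMA_MODE_OFF, /* disabled by the user or because of bad firmware data */
    NUMA_MODE_ON,  /* enabled, firmware description not parsed yet */
    NUMA_MODE_FW,  /* enabled, using firmware (ACPI SRAT or DT) data */
};

static enum numa_mode numa_status = NUMA_MODE_ON;

/* Firmware-agnostic counterpart of srat_disabled(). */
static bool numa_fw_disabled(void)
{
    return numa_status == NUMA_MODE_OFF;
}

/* Firmware-agnostic counterpart of bad_srat(). */
static void numa_fw_bad(void)
{
    numa_status = NUMA_MODE_OFF;
}

/* Record that valid firmware NUMA data was parsed. */
static void numa_fw_parsed(void)
{
    /* Callers are expected to check numa_fw_disabled() first. */
    assert(numa_status != NUMA_MODE_OFF);
    numa_status = NUMA_MODE_FW;
}
```

Neither the ACPI nor the DT parser would then need its own global, and
the common code would only ever look at one variable.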

> These
> functions are using these two API names. And, as device tree can be
> treated as one kind of static resource table. So we keep these two
> function names.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/Makefile           |  1 +
>   xen/arch/arm/numa_device_tree.c | 35 +++++++++++++++++++++++++++++++++
>   xen/include/asm-arm/numa.h      |  2 ++
>   3 files changed, 38 insertions(+)
>   create mode 100644 xen/arch/arm/numa_device_tree.c
> 
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 6e3fb8033e..13e1549be0 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -36,6 +36,7 @@ obj-y += mem_access.o
>   obj-y += mm.o
>   obj-y += monitor.o
>   obj-$(CONFIG_NUMA) += numa.o
> +obj-$(CONFIG_DEVICE_TREE_NUMA) += numa_device_tree.o
>   obj-y += p2m.o
>   obj-y += percpu.o
>   obj-y += platform.o
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> new file mode 100644
> index 0000000000..1c74ad135d
> --- /dev/null
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -0,0 +1,35 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Arm Architecture support layer for NUMA.
> + *
> + * Copyright (C) 2021 Arm Ltd
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + *
> + */
> +#include <xen/init.h>
> +#include <xen/nodemask.h>
> +#include <xen/numa.h>
> +
> +s8 device_tree_numa = 0;
> +
> +int srat_disabled(void)

You export this one and ...

> +void __init bad_srat(void)

... this one without providing a prototype.

Looking at the rest of the series... they will be turned static in the
next patch (#22) but then re-exported in patch #33.

In general, we should refrain from modifying code that was added earlier
in the same series, unless this cannot be avoided because of the way the
patches are split (e.g. code clean-up followed by code movement).

In this case, the helpers should be exported from the start.

> +{
> +    printk(KERN_ERR "DT: NUMA information is not used.\n");
> +    device_tree_numa = -1;
> +}
> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> index 559b028a01..756ad82d07 100644
> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -23,6 +23,8 @@ typedef u8 nodeid_t;
>   #define NUMA_LOCAL_DISTANCE     10
>   #define NUMA_REMOTE_DISTANCE    20
>   
> +extern s8 device_tree_numa;
> +
>   extern void numa_init(bool acpi_off);
>   extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance);
>   
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-11 10:24 ` [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node Wei Chen
@ 2021-08-19 18:09   ` Julien Grall
  2021-08-23  8:42     ` Wei Chen
  2021-08-19 18:10   ` Julien Grall
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 18:09 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> Processor NUMA ID information is stored in device tree's processor
> node as "numa-node-id". We need a new helper to parse this ID from
> processor node. If we get this ID from processor node, this ID's
> validity still need to be checked. Once we got a invalid NUMA ID
> from any processor node, the device tree will be marked as NUMA
> information invalid.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
>   1 file changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> index 1c74ad135d..37cc56acf3 100644
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -20,16 +20,53 @@
>   #include <xen/init.h>
>   #include <xen/nodemask.h>
>   #include <xen/numa.h>
> +#include <xen/device_tree.h>
> +#include <asm/setup.h>
>   
>   s8 device_tree_numa = 0;
> +static nodemask_t processor_nodes_parsed __initdata;
>   
> -int srat_disabled(void)
> +static int srat_disabled(void)
>   {
>       return numa_off || device_tree_numa < 0;
>   }
>   
> -void __init bad_srat(void)
> +static __init void bad_srat(void)
>   {
>       printk(KERN_ERR "DT: NUMA information is not used.\n");
>       device_tree_numa = -1;
>   }
> +
> +/* Callback for device tree processor affinity */
> +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> +{
> +    if ( srat_disabled() )
> +        return -EINVAL;
> +    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
> +		bad_srat();
> +		return -EINVAL;

You seem to have a mix of soft and hard tabs in this file. Was a lot of
the code copied directly from Linux? If not, then the file should be
using the Xen coding style.

> +	}
> +
> +    node_set(node, processor_nodes_parsed);
> +
> +    device_tree_numa = 1;
> +    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
> +
> +    return 0;
> +}
> +
> +/* Parse CPU NUMA node info */
> +int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)

AFAICT, you are going to turn this helper static in a follow-up patch. 
This is a bad practice. Instead, the function should be static from the 
beginning. If it is not possible, then you should re-order the code.

In this case, I think you can add the boilerplate to parse the NUMA 
information (patch #25) here and then extend it in each patch.


> +{
> +    uint32_t nid;
> +
> +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
> +    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);
> +    if ( nid >= MAX_NUMNODES )
> +    {
> +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n", nid);
> +        return -EINVAL;
> +    }
> +
> +    return dtb_numa_processor_affinity_init(nid);
> +}
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-11 10:24 ` [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node Wei Chen
  2021-08-19 18:09   ` Julien Grall
@ 2021-08-19 18:10   ` Julien Grall
  2021-08-23  8:47     ` Wei Chen
  2021-08-19 18:13   ` Julien Grall
  2021-08-27  0:06   ` Stefano Stabellini
  3 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 18:10 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

On 11/08/2021 11:24, Wei Chen wrote:
> Processor NUMA ID information is stored in device tree's processor
> node as "numa-node-id". We need a new helper to parse this ID from
> processor node. If we get this ID from processor node, this ID's
> validity still need to be checked. Once we got a invalid NUMA ID
> from any processor node, the device tree will be marked as NUMA
> information invalid.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
>   1 file changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> index 1c74ad135d..37cc56acf3 100644
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -20,16 +20,53 @@
>   #include <xen/init.h>
>   #include <xen/nodemask.h>
>   #include <xen/numa.h>
> +#include <xen/device_tree.h>
> +#include <asm/setup.h>
>   
>   s8 device_tree_numa = 0;
> +static nodemask_t processor_nodes_parsed __initdata;
>   
> -int srat_disabled(void)
> +static int srat_disabled(void)
>   {
>       return numa_off || device_tree_numa < 0;
>   }
>   
> -void __init bad_srat(void)
> +static __init void bad_srat(void)
>   {
>       printk(KERN_ERR "DT: NUMA information is not used.\n");
>       device_tree_numa = -1;
>   }
> +
> +/* Callback for device tree processor affinity */
> +static int __init dtb_numa_processor_affinity_init(nodeid_t node)

I forgot to mention: it seems odd that some of the function names start
with dtb_* while others start with device_tree_*. Is there any particular
reason for the difference in naming?

> +{
> +    if ( srat_disabled() )
> +        return -EINVAL;
> +    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
> +		bad_srat();
> +		return -EINVAL;
> +	}
> +
> +    node_set(node, processor_nodes_parsed);
> +
> +    device_tree_numa = 1;
> +    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
> +
> +    return 0;
> +}
> +
> +/* Parse CPU NUMA node info */
> +int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
> +{
> +    uint32_t nid;
> +
> +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
> +    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);
> +    if ( nid >= MAX_NUMNODES )
> +    {
> +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n", nid);
> +        return -EINVAL;
> +    }
> +
> +    return dtb_numa_processor_affinity_init(nid);
> +}
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-11 10:24 ` [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node Wei Chen
  2021-08-19 18:09   ` Julien Grall
  2021-08-19 18:10   ` Julien Grall
@ 2021-08-19 18:13   ` Julien Grall
  2021-08-20  2:23     ` Wei Chen
  2021-08-27  0:06   ` Stefano Stabellini
  3 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-19 18:13 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> Processor NUMA ID information is stored in device tree's processor
> node as "numa-node-id". We need a new helper to parse this ID from
> processor node. If we get this ID from processor node, this ID's
> validity still need to be checked. Once we got a invalid NUMA ID
> from any processor node, the device tree will be marked as NUMA
> information invalid.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
>   1 file changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> index 1c74ad135d..37cc56acf3 100644
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -20,16 +20,53 @@
>   #include <xen/init.h>
>   #include <xen/nodemask.h>
>   #include <xen/numa.h>
> +#include <xen/device_tree.h>

Nothing in this file seems to depend on xen/device_tree.h. So why do you 
need to include it?

> +#include <asm/setup.h>
>   
>   s8 device_tree_numa = 0;
> +static nodemask_t processor_nodes_parsed __initdata;
>   
> -int srat_disabled(void)
> +static int srat_disabled(void)
>   {
>       return numa_off || device_tree_numa < 0;
>   }
>   
> -void __init bad_srat(void)
> +static __init void bad_srat(void)
>   {
>       printk(KERN_ERR "DT: NUMA information is not used.\n");
>       device_tree_numa = -1;
>   }
> +
> +/* Callback for device tree processor affinity */
> +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> +{
> +    if ( srat_disabled() )
> +        return -EINVAL;
> +    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
> +		bad_srat();
> +		return -EINVAL;
> +	}
> +
> +    node_set(node, processor_nodes_parsed);
> +
> +    device_tree_numa = 1;
> +    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
> +
> +    return 0;
> +}
> +
> +/* Parse CPU NUMA node info */
> +int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
> +{
> +    uint32_t nid;
> +
> +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
> +    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);
> +    if ( nid >= MAX_NUMNODES )
> +    {
> +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n", nid);
> +        return -EINVAL;
> +    }
> +
> +    return dtb_numa_processor_affinity_init(nid);
> +}
> 

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 02/40] xen/arm: Print a 64-bit number in hex from early uart
  2021-08-19 13:05   ` Julien Grall
@ 2021-08-20  1:13     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-20  1:13 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 19 August 2021 21:05
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 02/40] xen/arm: Print a 64-bit number in hex
> from early uart
> 
> Hi Wei,
> 
> On 11/08/2021 11:23, Wei Chen wrote:
> > Current putn function that is using for early print
> > only can print low 32-bit of AArch64 register. This
> > will lose some important messages while debugging
> > with early console. For example:
> > (XEN) Bringing up CPU5
> > - CPU 0000000100000100 booting -
> > Will be truncated to
> > (XEN) Bringing up CPU5
> > - CPU 00000100 booting -
> >
> > In this patch, we increased the print loops and shift
> > bits to make putn print 64-bit number.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> 
> Acked-by: Julien Grall <jgrall@amazon.com>
> 
> > ---
> >   xen/arch/arm/arm64/head.S | 9 +++++----
> >   1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index aa1f88c764..b32639d7d6 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -862,17 +862,18 @@ puts:
> >           ret
> >   ENDPROC(puts)
> >
> > -/* Print a 32-bit number in hex.  Specific to the PL011 UART.
> > +/* Print a 64-bit number in hex.  Specific to the PL011 UART.
> 
> As you modify the line, can you take the opportunity to write:
> 
> /*
>   * Print a 64-bit...
> 
> And also drop the second sentence as it the code has not been PL011
> specific for quite a while now.
> 

OK, I will do that in the next version.

> >    * x0: Number to print.
> >    * x23: Early UART base address
> >    * Clobbers x0-x3 */
> > +#define PRINT_MASK 0xf000000000000000
> >   putn:
> >           adr   x1, hex
> > -        mov   x3, #8
> > +        mov   x3, #16
> >   1:
> >           early_uart_ready x23, 2
> > -        and   x2, x0, #0xf0000000    /* Mask off the top nybble */
> > -        lsr   x2, x2, #28
> > +        and   x2, x0, #PRINT_MASK    /* Mask off the top nybble */
> > +        lsr   x2, x2, #60
> >           ldrb  w2, [x1, x2]           /* Convert to a char */
> >           early_uart_transmit x23, w2
> >           lsl   x0, x0, #4             /* Roll it through one nybble at
> a time */
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set
  2021-08-19 13:28   ` Julien Grall
@ 2021-08-20  2:04     ` Wei Chen
  2021-08-20  8:20       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-20  2:04 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 19 August 2021 21:28
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width
> when platform is not set
> 
> Hi,
> 
> On 11/08/2021 11:23, Wei Chen wrote:
> > From: Hongda Deng <Hongda.Deng@arm.com>
> >
> > In current code, arch_get_dma_bitsize will return 32 when platorm
> > or platform->dma_bitsize is not set. It's not resonable, for Arm,
> 
> s/resonable/reasonable/
> 

Ok

> > we don't require to reserve DMA memory. So we set dma_bitsize always
> > be 0. In NO-NUMA system, arch_get_dma_bitsize will not be invoked,
> > so dma_bitsize will not be overrided by this function.
> 
> arch_get_dma_bitsize() is also used to allocate dom0 memory. We need to
> be able to allocate some DMA-able memory that can be used by every devices.
> 
> > But in NUMA
> > system, once the online nodes are greater than 1, this function will
> > be invoked. The dma_bitsize will be limited to 32. That means, only
> > first 4GB memory can be used for DMA. But that's against our hardware
> > design. We don't have that kind of restriction on hardware.
> 
> What do you mean by "hardware design"? Are you referring to the server
> you boot Xen on?
> 

Yes. I will change it to more neutral wording, something like:
"But that may not reflect the real DMA capability of some hardware, which
may not have this kind of restriction."?


> Anyway, there are plenty of platform out that have devices which can't
> DMA into memory above 32-bit. On RPI, this is even lower (30-bit).
> 
> So I would be cautious to change the default limit.
> 

How about returning 0 when the platform doesn't specify a limit?
In my opinion, arbitrarily returning 32 on AArch64 doesn't make sense.
But as you mentioned, if Xen is running on a platform with a DMA
limitation that is not listed among Xen's supported platforms, Xen cannot
get the DMA limit from platform->dma_bitsize. In that case, returning 0
will also cause issues.

> At the moment, the only place on Arm where we need DMA-able memory is
> for dom0. This is allocated at boot and can't change afterwards (for now).
> 

For Dom0, should we squash patch #5 into this patch?

> So I would explore to remove the NUMA check for drop the DMA zone. FAOD,
> both suggestion are for Arm only. For x86, they need to be kept.
> 

Without introducing a new flag, such as lowmem_for_dma, it's a little
hard to skip the NUMA node check, unless we crudely add #ifdef ARCH to
common code, which is not what we want to see ...
      if ( !dma_bitsize && (num_online_nodes() > 1) )
          dma_bitsize = arch_get_dma_bitsize();
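
One way to avoid both the flag and the #ifdef might be a per-arch hook.
A rough sketch under stated assumptions: arch_want_default_dmazone() and
pick_dma_bitsize() are hypothetical names for this example, and the
is_x86 parameter merely stands in for what would be separate per-arch
definitions of the hook.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-arch hook: x86 keeps the existing behaviour, Arm
 * never asks for a default DMA zone.
 */
static bool arch_want_default_dmazone(bool is_x86, unsigned int online_nodes)
{
    return is_x86 && online_nodes > 1;
}

/* Model of the common-code decision quoted above. */
static unsigned int pick_dma_bitsize(unsigned int cmdline_bits, bool is_x86,
                                     unsigned int online_nodes,
                                     unsigned int arch_bits)
{
    assert(arch_bits <= 64);

    /* A value given on the command line always wins. */
    if ( !cmdline_bits && arch_want_default_dmazone(is_x86, online_nodes) )
        return arch_bits;

    return cmdline_bits;
}
```

With such a hook, arch_get_dma_bitsize() could keep returning 32 for the
Dom0 allocation while multi-node Arm systems would not get a DMA zone.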

> > Only
> > platform setting can override dma_bitsize. So in this patch, we
> > return default dma_bitsize, when platform and platorm->dma_bitsize
> > are not set.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > Signed-off-by: Hongda Deng <Hongda.Deng@arm.com>
> > ---
> >   xen/arch/arm/platform.c | 4 +++-
> >   xen/common/page_alloc.c | 2 +-
> >   2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/arch/arm/platform.c b/xen/arch/arm/platform.c
> > index 4db5bbb4c5..0a27fef9a4 100644
> > --- a/xen/arch/arm/platform.c
> > +++ b/xen/arch/arm/platform.c
> > @@ -27,6 +27,7 @@ extern const struct platform_desc _splatform[],
> _eplatform[];
> >   /* Pointer to the current platform description */
> >   static const struct platform_desc *platform;
> >
> > +extern unsigned int dma_bitsize;
> >
> >   static bool __init platform_is_compatible(const struct platform_desc
> *plat)
> >   {
> > @@ -157,7 +158,8 @@ bool platform_device_is_blacklisted(const struct
> dt_device_node *node)
> >
> >   unsigned int arch_get_dma_bitsize(void)
> >   {
> > -    return ( platform && platform->dma_bitsize ) ? platform-
> >dma_bitsize : 32;
> > +    return ( platform && platform->dma_bitsize ) ? platform-
> >dma_bitsize
> > +                                                 : dma_bitsize;
> >   }
> >
> >   /*
> > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > index 958ba0cd92..0f0cae5a4e 100644
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -227,7 +227,7 @@ static bool __read_mostly scrub_debug;
> >    * Bit width of the DMA heap -- used to override NUMA-node-first.
> >    * allocation strategy, which can otherwise exhaust low memory.
> >    */
> > -static unsigned int dma_bitsize;
> > +unsigned int dma_bitsize;
> >   integer_param("dma_bits", dma_bitsize);
> >
> >   /* Offlined page list, protected by heap_lock. */
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 05/40] xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0
  2021-08-19 13:32   ` Julien Grall
@ 2021-08-20  2:05     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-20  2:05 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 19 August 2021 21:32
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 05/40] xen/arm: Fix lowmem_bitsize when
> arch_get_dma_bitsize return 0
> 
> Hi,
> 
> I guess this patch may be dropped after my comment on patch #4. I will
> comment just on the process.
> 

Ok

> On 11/08/2021 11:23, Wei Chen wrote:
> > From: Hongda Deng <Hongda.Deng@arm.com>
> >
> > In previous patch, we make arch_get_dma_bitsize return 0 when
> > dma_bitsize and platform->dma_bitsize are not set. But this
> > will affect lowmem_bitsize in allocate_memory_11 for domain0.
> > Because this function depends lowmem_bitsize to allocate memory
> > below 4GB.
> >
> > In current code, when arch_get_dma_bitsize return 0, lowmem_bitsize
> > will be set to 0. In this case, we will get "No bank has been
> > allocated below 0-bit." message while allocating domain0 memory.
> > And the lowmem will be set to false.
> >
> > This behavior is inconsistent with what allocate_memory_11 done
> > before, and doesn't meet this functions requirements. So we
> > check arch_get_dma_bitsize's return value before set lowmem_bitsize.
> > Avoid setting lowmem_bitsize to 0 by mistake.
> 
> In general, we want to avoid breaking bisection within a series. This
> means that this patch should be before patch #4.
> 
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > Signed-off-by: Hongda Deng <Hongda.Deng@arm.com>
> > ---
> >   xen/arch/arm/domain_build.c | 11 ++++++++++-
> >   1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index 6c86d52781..cf341f349f 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -265,9 +265,18 @@ static void __init allocate_memory_11(struct domain
> *d,
> >       int i;
> >
> >       bool lowmem = true;
> > -    unsigned int lowmem_bitsize = min(32U, arch_get_dma_bitsize());
> > +    unsigned int lowmem_bitsize = arch_get_dma_bitsize();
> >       unsigned int bits;
> >
> > +    /*
> > +       When dma_bitsize and platform->dma_bitsize are not set,
> > +       arch_get_dma_bitsize will return 0. That means this system
> > +       doesn't need to reserve memory for DMA. But in order to
> > +       meet above requirements, we still need to try to allocate
> > +       memory below 4GB for Dom0.
> > +    */
> 
> The coding style for comments is:
> 
> /*
>   * A
>   * B
>   */
> 

I will fix it.

> > +    lowmem_bitsize = lowmem_bitsize ? min(32U, lowmem_bitsize) : 32U;
> > +
> >       /*
> >        * TODO: Implement memory bank allocation when DOM0 is not direct
> >        * mapped
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  2021-08-19 13:34   ` Julien Grall
@ 2021-08-20  2:08     ` Wei Chen
  2021-08-20  8:23       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-20  2:08 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 19 August 2021 21:34
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake
> NUMA API
> 
> Hi Wei,
> 
> On 11/08/2021 11:23, Wei Chen wrote:
> > Only Arm64 supports NUMA, the CONFIG_NUMA could not be
> > enabled for Arm32.
> 
> What do you mean by "could not be enabled"?

I have not seen any Arm32 hardware that supports NUMA, so I think
we don't need to support NUMA on Arm32. In that case, this Kconfig
option cannot be enabled on Arm32.

> 
> > Even in Arm64, users still can disable
> > the CONFIG_NUMA through Kconfig option. In this case, keep
> > current fake NUMA API, will make Arm code still can work
> > with NUMA aware memory allocation and scheduler.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/include/asm-arm/numa.h | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > index 31a6de4e23..ab9c4a2448 100644
> > --- a/xen/include/asm-arm/numa.h
> > +++ b/xen/include/asm-arm/numa.h
> > @@ -5,6 +5,8 @@
> >
> >   typedef u8 nodeid_t;
> >
> > +#if !defined(CONFIG_NUMA)
> 
> NIT: We tend to use #ifndef rather than #if !defined(...)
> 

OK, I will update the related places in this series.

> > +
> >   /* Fake one node for now. See also node_online_map. */
> >   #define cpu_to_node(cpu) 0
> >   #define node_to_cpumask(node)   (cpu_online_map)
> > @@ -25,6 +27,8 @@ extern mfn_t first_valid_mfn;
> >   #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
> >   #define __node_distance(a, b) (20)
> >
> > +#endif
> > +
> >   #endif /* __ARCH_ARM_NUMA_H */
> >   /*
> >    * Local variables:
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
  2021-08-19 17:35   ` Julien Grall
@ 2021-08-20  2:18     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-20  2:18 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 1:35
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for
> NUMA when boot from EFI
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > EFI can get memory map from EFI system table. But EFI system
> > table doesn't contain memory NUMA information, EFI depends on
> > ACPI SRAT or device tree memory node to parse memory blocks'
> > NUMA mapping.
> >
> > But in current code, when Xen is booting from EFI, it will
> > delete all memory nodes in device tree. So in UEFI + DTB
> > boot, we don't have numa-node-id for memory blocks any more.
> >
> > So in this patch, we will keep memory nodes in device tree for
> > NUMA code to parse memory numa-node-id later.
> >
> > As a side effect, if we still parse boot memory information in
> > early_scan_node, bootmem.info will calculate memory ranges in
> > memory nodes twice. So we have to prvent early_scan_node to
> 
> s/prvent/prevent/
> 

Ok

> > parse memory nodes in EFI boot.
> >
> > As EFI APIs only can be used in Arm64, so we introduced a wrapper
> > in header file to prevent #ifdef CONFIG_ARM_64/32 in code block.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/bootfdt.c      |  8 +++++++-
> >   xen/arch/arm/efi/efi-boot.h | 25 -------------------------
> >   xen/include/asm-arm/setup.h |  6 ++++++
> >   3 files changed, 13 insertions(+), 26 deletions(-)
> >
> > diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> > index 476e32e0f5..7df149dbca 100644
> > --- a/xen/arch/arm/bootfdt.c
> > +++ b/xen/arch/arm/bootfdt.c
> > @@ -11,6 +11,7 @@
> >   #include <xen/lib.h>
> >   #include <xen/kernel.h>
> >   #include <xen/init.h>
> > +#include <xen/efi.h>
> >   #include <xen/device_tree.h>
> >   #include <xen/libfdt/libfdt.h>
> >   #include <xen/sort.h>
> > @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
> >   {
> >       int rc = 0;
> >
> > -    if ( device_tree_node_matches(fdt, node, "memory") )
> > +    /*
> > +     * If system boot from EFI, bootinfo.mem has been set by EFI,
> 
> "If the system boot". Although, I would suggest to write:
> 
> "If Xen has been booted via UEFI, the memory banks will already be
> populated. So we should skip the parsing."
> 

Yes, that would be better. I will change it in the next version.

> > +     * so we don't need to parse memory node from DTB.
> > +     */
> > +    if ( device_tree_node_matches(fdt, node, "memory") &&
> > +         !arch_efi_enabled(EFI_BOOT) )
> 
> arch_efi_enabled() is going to be less expensive than
> device_tree_node_matches(). So I would suggest to re-order the operands.
> 

yes.

> >           rc = process_memory_node(fdt, node, name, depth,
> >                                    address_cells, size_cells,
> &bootinfo.mem);
> >       else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
> > diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
> > index cf9c37153f..d0a9987fa4 100644
> > --- a/xen/arch/arm/efi/efi-boot.h
> > +++ b/xen/arch/arm/efi/efi-boot.h
> > @@ -197,33 +197,8 @@ EFI_STATUS __init
> fdt_add_uefi_nodes(EFI_SYSTEM_TABLE *sys_table,
> >       int status;
> >       u32 fdt_val32;
> >       u64 fdt_val64;
> > -    int prev;
> >       int num_rsv;
> >
> > -    /*
> > -     * Delete any memory nodes present.  The EFI memory map is the only
> > -     * memory description provided to Xen.
> > -     */
> > -    prev = 0;
> > -    for (;;)
> > -    {
> > -        const char *type;
> > -        int len;
> > -
> > -        node = fdt_next_node(fdt, prev, NULL);
> > -        if ( node < 0 )
> > -            break;
> > -
> > -        type = fdt_getprop(fdt, node, "device_type", &len);
> > -        if ( type && strncmp(type, "memory", len) == 0 )
> > -        {
> > -            fdt_del_node(fdt, node);
> > -            continue;
> > -        }
> > -
> > -        prev = node;
> > -    }
> > -
> >      /*
> >       * Delete all memory reserve map entries. When booting via UEFI,
> >       * kernel will use the UEFI memory map to find reserved regions.
> > diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
> > index c4b6af6029..e4fb5f0d49 100644
> > --- a/xen/include/asm-arm/setup.h
> > +++ b/xen/include/asm-arm/setup.h
> > @@ -123,6 +123,12 @@ void device_tree_get_reg(const __be32 **cell, u32
> address_cells,
> >   u32 device_tree_get_u32(const void *fdt, int node,
> >                           const char *prop_name, u32 dflt);
> >
> > +#if defined(CONFIG_ARM_64)
> > +#define arch_efi_enabled(x) efi_enabled(x)
> > +#else
> > +#define arch_efi_enabled(x) (0)
> > +#endif
> 
> I would prefer if we introduce CONFIG_EFI that would stub efi_enabled
> for architecture not supporting EFI.
> 

Yes, that's a good idea.
I also feel a little awkward with the current arch helpers.
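
A rough sketch of how such a stub could look, for discussion only. It
assumes a CONFIG_EFI Kconfig symbol, which does not exist yet, and the
EFI_BOOT value below is a placeholder for this sketch rather than the real
definition:

```c
#include <stdbool.h>

/*
 * Hypothetical: CONFIG_EFI would be set by Kconfig on architectures that
 * support EFI. It is left undefined here to show the stub path.
 */

#define EFI_BOOT 0 /* placeholder feature id for this sketch */

#ifdef CONFIG_EFI
/* The real implementation would live in the EFI code. */
bool efi_enabled(unsigned int feature);
#else
/* Stub: without EFI support, every feature query reports "not enabled". */
static inline bool efi_enabled(unsigned int feature)
{
    (void)feature;
    return false;
}
#endif
```

With this, callers could use efi_enabled() directly and the arch-specific
arch_efi_enabled() wrapper would no longer be needed.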

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 21/40] xen/arm: introduce device_tree_numa as a switch for device tree NUMA
  2021-08-19 17:45   ` Julien Grall
@ 2021-08-20  2:21     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-20  2:21 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 1:45
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 21/40] xen/arm: introduce device_tree_numa as
> a switch for device tree NUMA
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > Like acpi_numa in x86 as a switch for ACPI based NUMA, we introduce
> > device_tree_numa as a switch for Arm device tree based NUMA. When
> > NUMA information in device tree is invalid, this switch will be set
> > to -1, then NUMA support for Arm will be disabled, even if user set
> > numa_off=0.
> 
> The hypervisor will never use both ACPI and DT at runtime. In fact...
> 

Yes.

> >
> > Keep using bad_srat and srat_disabled functions name, because we will
> > reuse node_covers_memory and acpi_scan_nodes code for Arm.
> 
> ... given that both functions will be called from the common code, it
> will be a lot more difficult to add ACPI afterwards.
> 
> So I think we should either rename acpi_numa to something more generic
> (maybe fw_numa) or convert numa_off to a tri-state.
> 
> This will allow to have the code mostly common.
> 

I will try to address these in the next version.
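
To check my understanding of the tri-state option, here is a minimal
sketch. All names (numa_mode, numa_status, numa_disabled, numa_fw_bad,
numa_fw_parsed) are hypothetical and only meant to illustrate the idea of
a single firmware-agnostic state replacing numa_off plus acpi_numa /
device_tree_numa:

```c
#include <stdbool.h>

/* Hypothetical tri-state; not existing Xen code. */
enum numa_mode {
    NUMA_MODE_OFF,   /* disabled by the user or by invalid firmware data */
    NUMA_MODE_ON,    /* enabled, firmware NUMA data not parsed yet */
    NUMA_MODE_FW_OK, /* firmware (ACPI or DT) NUMA data parsed and valid */
};

static enum numa_mode numa_status = NUMA_MODE_ON;

/* Firmware-agnostic counterpart of srat_disabled(). */
static bool numa_disabled(void)
{
    return numa_status == NUMA_MODE_OFF;
}

/* Firmware-agnostic counterpart of bad_srat(): invalid data turns NUMA off. */
static void numa_fw_bad(void)
{
    numa_status = NUMA_MODE_OFF;
}

/* Called after a node was parsed successfully; never re-enables NUMA. */
static void numa_fw_parsed(void)
{
    if ( numa_status != NUMA_MODE_OFF )
        numa_status = NUMA_MODE_FW_OK;
}
```

Both the ACPI and the device tree parsers could then call the same helpers
from common code.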

> > These
> > functions are using these two API names. And, as device tree can be
> > treated as one kind of static resource table. So we keep these two
> > function names.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/Makefile           |  1 +
> >   xen/arch/arm/numa_device_tree.c | 35 +++++++++++++++++++++++++++++++++
> >   xen/include/asm-arm/numa.h      |  2 ++
> >   3 files changed, 38 insertions(+)
> >   create mode 100644 xen/arch/arm/numa_device_tree.c
> >
> > diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> > index 6e3fb8033e..13e1549be0 100644
> > --- a/xen/arch/arm/Makefile
> > +++ b/xen/arch/arm/Makefile
> > @@ -36,6 +36,7 @@ obj-y += mem_access.o
> >   obj-y += mm.o
> >   obj-y += monitor.o
> >   obj-$(CONFIG_NUMA) += numa.o
> > +obj-$(CONFIG_DEVICE_TREE_NUMA) += numa_device_tree.o
> >   obj-y += p2m.o
> >   obj-y += percpu.o
> >   obj-y += platform.o
> > diff --git a/xen/arch/arm/numa_device_tree.c
> b/xen/arch/arm/numa_device_tree.c
> > new file mode 100644
> > index 0000000000..1c74ad135d
> > --- /dev/null
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -0,0 +1,35 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Arm Architecture support layer for NUMA.
> > + *
> > + * Copyright (C) 2021 Arm Ltd
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> > + *
> > + */
> > +#include <xen/init.h>
> > +#include <xen/nodemask.h>
> > +#include <xen/numa.h>
> > +
> > +s8 device_tree_numa = 0;
> > +
> > +int srat_disabled(void)
> 
> You export this one and ...
> 
> > +void __init bad_srat(void)
> 
> ... this one without providing in a prototype.
> 
> Looking at the rest of the series... they will be turned static in the
> next patch (#21) but then re-exported in patch #33.
> 
> In general, we should refrain to modify code that was added in the same
> patch unless it is not possible for split reason (e.g code clean-up and
> then code movement).
> 
> In this case, the helpers should be exported from now.
> 

Ok.

> > +{
> > +    printk(KERN_ERR "DT: NUMA information is not used.\n");
> > +    device_tree_numa = -1;
> > +}
> > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > index 559b028a01..756ad82d07 100644
> > --- a/xen/include/asm-arm/numa.h
> > +++ b/xen/include/asm-arm/numa.h
> > @@ -23,6 +23,8 @@ typedef u8 nodeid_t;
> >   #define NUMA_LOCAL_DISTANCE     10
> >   #define NUMA_REMOTE_DISTANCE    20
> >
> > +extern s8 device_tree_numa;
> > +
> >   extern void numa_init(bool acpi_off);
> >   extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t
> distance);
> >
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-19 18:13   ` Julien Grall
@ 2021-08-20  2:23     ` Wei Chen
  2021-08-20  8:44       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-20  2:23 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 2:13
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse
> device tree processor node
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > Processor NUMA ID information is stored in device tree's processor
> > node as "numa-node-id". We need a new helper to parse this ID from
> > processor node. If we get this ID from processor node, this ID's
> > validity still need to be checked. Once we got a invalid NUMA ID
> > from any processor node, the device tree will be marked as NUMA
> > information invalid.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
> >   1 file changed, 39 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/arch/arm/numa_device_tree.c
> b/xen/arch/arm/numa_device_tree.c
> > index 1c74ad135d..37cc56acf3 100644
> > --- a/xen/arch/arm/numa_device_tree.c
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -20,16 +20,53 @@
> >   #include <xen/init.h>
> >   #include <xen/nodemask.h>
> >   #include <xen/numa.h>
> > +#include <xen/device_tree.h>
> 
> Nothing in this file seems to depend on xen/device_tree.h. So why do you
> need to include it?
> 

As I recall, without this header file the use of device_tree_get_u32 in
this patch fails to compile.

> > +#include <asm/setup.h>
> >
> >   s8 device_tree_numa = 0;
> > +static nodemask_t processor_nodes_parsed __initdata;
> >
> > -int srat_disabled(void)
> > +static int srat_disabled(void)
> >   {
> >       return numa_off || device_tree_numa < 0;
> >   }
> >
> > -void __init bad_srat(void)
> > +static __init void bad_srat(void)
> >   {
> >       printk(KERN_ERR "DT: NUMA information is not used.\n");
> >       device_tree_numa = -1;
> >   }
> > +
> > +/* Callback for device tree processor affinity */
> > +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> > +{
> > +    if ( srat_disabled() )
> > +        return -EINVAL;
> > +    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
> > +		bad_srat();
> > +		return -EINVAL;
> > +	}
> > +
> > +    node_set(node, processor_nodes_parsed);
> > +
> > +    device_tree_numa = 1;
> > +    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
> > +
> > +    return 0;
> > +}
> > +
> > +/* Parse CPU NUMA node info */
> > +int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
> > +{
> > +    uint32_t nid;
> > +
> > +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
> > +    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);
> > +    if ( nid >= MAX_NUMNODES )
> > +    {
> > +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n",
> nid);
> > +        return -EINVAL;
> > +    }
> > +
> > +    return dtb_numa_processor_affinity_init(nid);
> > +}
> >
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
  2021-08-19 13:38   ` Julien Grall
@ 2021-08-20  2:30     ` Wei Chen
  2021-08-20  8:41       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-20  2:30 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 19 August 2021 21:38
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA
> Kconfig for arm64
> 
> Hi,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > We need a Kconfig option to distinguish with ACPI based
> > NUMA. So we introduce the new Kconfig option:
> > DEVICE_TREE_NUMA in this patch for Arm64.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/Kconfig | 10 ++++++++++
> >   1 file changed, 10 insertions(+)
> >
> > diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> > index ecfa6822e4..678cc98ea3 100644
> > --- a/xen/arch/arm/Kconfig
> > +++ b/xen/arch/arm/Kconfig
> > @@ -33,6 +33,16 @@ config ACPI
> >   	  Advanced Configuration and Power Interface (ACPI) support for Xen
> is
> >   	  an alternative to device tree on ARM64.
> >
> > +config DEVICE_TREE_NUMA
> 
> The name suggests that NUMA should only be enabled for Device-Tree...
> But the description looks generic.
> 
> However, I think the user should only have the choice to say whether
> they want NUMA to be enabled or not. We should not give them the choice
> to enable/disable the parsing for DT/ACPI.
> 
> So we should have a generic config that will then select DT (and ACPI in
> the future).
> 

How about we select DT_NUMA by default on Arm64, and have DT_NUMA select
NUMA, as we did for x86 in patch #6? And remove the description?

If we make generic NUMA a selectable option and rely on NUMA to select
DT or ACPI NUMA, that seems quite different from the existing logic?

> > +	bool "NUMA (Non-Uniform Memory Access) Support (UNSUPPORTED)" if
> UNSUPPORTED
> > +	depends on ARM_64
> > +	select NUMA
> > +	---help---
> > +
> > +	  Non-Uniform Memory Access (NUMA) is a computer memory design used
> in
> > +	  multiprocessing, where the memory access time depends on the
> memory
> > +	  location relative to the processor.
> > +
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set
  2021-08-20  2:04     ` Wei Chen
@ 2021-08-20  8:20       ` Julien Grall
  2021-08-20  9:37         ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-20  8:20 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis



On 20/08/2021 03:04, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 19 August 2021 21:28
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width
>> when platform is not set
>>
>> Hi,
>>
>> On 11/08/2021 11:23, Wei Chen wrote:
>>> From: Hongda Deng <Hongda.Deng@arm.com>
>>>
>>> In current code, arch_get_dma_bitsize will return 32 when platorm
>>> or platform->dma_bitsize is not set. It's not resonable, for Arm,
>>
>> s/resonable/reasonable/
>>
> 
> Ok
> 
>>> we don't require to reserve DMA memory. So we set dma_bitsize always
>>> be 0. In NO-NUMA system, arch_get_dma_bitsize will not be invoked,
>>> so dma_bitsize will not be overrided by this function.
>>
>> arch_get_dma_bitsize() is also used to allocate dom0 memory. We need to
>> be able to allocate some DMA-able memory that can be used by every devices.
>>
>>> But in NUMA
>>> system, once the online nodes are greater than 1, this function will
>>> be invoked. The dma_bitsize will be limited to 32. That means, only
>>> first 4GB memory can be used for DMA. But that's against our hardware
>>> design. We don't have that kind of restriction on hardware.
>>
>> What do you mean by "hardware design"? Are you referring to the server
>> you boot Xen on?
>>
> 
> Yes. I will change it to some neutral words. something like:
> "But that could not reflect some hardware's real DMA ability. They may not
> have kind of restriction on hardware." ?

The thing is, DMA ability is not about the platform itself. It is more
about the devices (this could just be a PCI card you plugged in). What
you seem to suggest is that no-one will ever plug such a card into your
platform. Is that correct?

> 
> 
>> Anyway, there are plenty of platform out that have devices which can't
>> DMA into memory above 32-bit. On RPI, this is even lower (30-bit).
>>
>> So I would be cautious to change the default limit.
>>
> 
> How about return 0 when platform doesn't specify the limit?
> In my opinion, arbitrary to give 32 on AArch64 doesn't make sense.
We have to care about the common use-case. We added this restriction a
few years ago because we had a few reports of users whose devices were
only capable of 32-bit DMA.

> But as you mentioned, if Xen is running on a platform with DMA limitation,
> but we have not listed this platform in Xen supported list, Xen cannot
> get DMA limit from platform->dma_bitsize. In this case, return 0 will
> also cause some issue.
> 
>> At the moment, the only place on Arm where we need DMA-able memory is
>> for dom0. This is allocated at boot and can't change afterwards (for now).
>>
> 
> For Dom0, we squash the patch#5 into this patch?

Let me answer with another question. Why should we modify the Arm code 
rather than the common code? IOW...

> 
>> So I would explore to remove the NUMA check for drop the DMA zone. FAOD,
>> both suggestion are for Arm only. For x86, they need to be kept.
>>
> 
> Without introducing new flag, such as lowmem_for_dma, it's a little
> hard to skip the numa node check. Unless we crudely add #ifdef ARCH to
> common code, which is not what we want to see ...
>        if ( !dma_bitsize && (num_online_nodes() > 1) )
>            dma_bitsize = arch_get_dma_bitsize();

... Why do you think we need this check on Arm when NUMA is enabled?

We can discuss how to remove it once this is answered.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  2021-08-20  2:08     ` Wei Chen
@ 2021-08-20  8:23       ` Julien Grall
  2021-08-20 10:24         ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-20  8:23 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis



On 20/08/2021 03:08, Wei Chen wrote:
> Hi Julien,

Hi Wei,

> 
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 19 August 2021 21:34
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake
>> NUMA API
>>
>> Hi Wei,
>>
>> On 11/08/2021 11:23, Wei Chen wrote:
>>> Only Arm64 supports NUMA, the CONFIG_NUMA could not be
>>> enabled for Arm32.
>>
>> What do you mean by "could not be enabled"?
> 
> I have not seen any Arm32 hardware support NUMA, so I think
> we don't need to support Arm32 NUMA.

I understand that there may not be any 32-bit platform with NUMA, and it
is fine to state that in the commit message. However...

> In this case, this Kconfig
> option could not be enabled on Arm32.

... you continue to say "couldn't be enabled" without clarifying whether
this means that the build will break or that this was just not tested
because you don't have any platform.

To put it differently, the code for NUMA looks bitness neutral. So I
cannot really see what prevents us from potentially using it on Arm 32-bit.

> 
>>
>>> Even in Arm64, users still can disable
>>> the CONFIG_NUMA through Kconfig option. In this case, keep
>>> current fake NUMA API, will make Arm code still can work
>>> with NUMA aware memory allocation and scheduler.
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>    xen/include/asm-arm/numa.h | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
>>> index 31a6de4e23..ab9c4a2448 100644
>>> --- a/xen/include/asm-arm/numa.h
>>> +++ b/xen/include/asm-arm/numa.h
>>> @@ -5,6 +5,8 @@
>>>
>>>    typedef u8 nodeid_t;
>>>
>>> +#if !defined(CONFIG_NUMA)
>>
>> NIT: We tend to use #ifndef rather than #if !defined(...)
>>
> 
> OK, I will change related changes in this series.
> 
>>> +
>>>    /* Fake one node for now. See also node_online_map. */
>>>    #define cpu_to_node(cpu) 0
>>>    #define node_to_cpumask(node)   (cpu_online_map)
>>> @@ -25,6 +27,8 @@ extern mfn_t first_valid_mfn;
>>>    #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
>>>    #define __node_distance(a, b) (20)
>>>
>>> +#endif
>>> +
>>>    #endif /* __ARCH_ARM_NUMA_H */
>>>    /*
>>>     * Local variables:
>>>
>>
>> Cheers,
>>
>> --
>> Julien Grall

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
  2021-08-20  2:30     ` Wei Chen
@ 2021-08-20  8:41       ` Julien Grall
  2021-08-20 10:49         ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-20  8:41 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

On 20/08/2021 03:30, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 19 August 2021 21:38
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA
>> Kconfig for arm64
>>
>> Hi,
>>
>> On 11/08/2021 11:24, Wei Chen wrote:
>>> We need a Kconfig option to distinguish with ACPI based
>>> NUMA. So we introduce the new Kconfig option:
>>> DEVICE_TREE_NUMA in this patch for Arm64.
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>    xen/arch/arm/Kconfig | 10 ++++++++++
>>>    1 file changed, 10 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
>>> index ecfa6822e4..678cc98ea3 100644
>>> --- a/xen/arch/arm/Kconfig
>>> +++ b/xen/arch/arm/Kconfig
>>> @@ -33,6 +33,16 @@ config ACPI
>>>    	  Advanced Configuration and Power Interface (ACPI) support for Xen
>> is
>>>    	  an alternative to device tree on ARM64.
>>>
>>> +config DEVICE_TREE_NUMA
>>
>> The name suggests that NUMA should only be enabled for Device-Tree...
>> But the description looks generic.
>>
>> However, I think the user should only have the choice to say whether
>> they want NUMA to be enabled or not. We should not give them the choice
>> to enable/disable the parsing for DT/ACPI.
>>
>> So we should have a generic config that will then select DT (and ACPI in
>> the future).
>>
> 
> How about we select DT_NUMA by default on Arm64, and have DT_NUMA select
> NUMA, as we did for x86 in patch #6? And remove the description?
I would rather not make NUMA supported by default on Arm64. Instead, we
should go through the same process as other new features and gate it
behind UNSUPPORTED until it is mature enough.

> 
> If we make generic NUMA a selectable option and rely on NUMA to select
> DT or ACPI NUMA, that seems quite different from the existing logic?

I am a bit confused. You just added logic to select NUMA from ACPI,
right? So are you talking about a different logic?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-20  2:23     ` Wei Chen
@ 2021-08-20  8:44       ` Julien Grall
  2021-08-20 11:53         ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-20  8:44 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis



On 20/08/2021 03:23, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 20 August 2021 2:13
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse
>> device tree processor node
>>
>> Hi Wei,
>>
>> On 11/08/2021 11:24, Wei Chen wrote:
>>> Processor NUMA ID information is stored in device tree's processor
>>> node as "numa-node-id". We need a new helper to parse this ID from
>>> processor node. If we get this ID from processor node, this ID's
>>> validity still need to be checked. Once we got a invalid NUMA ID
>>> from any processor node, the device tree will be marked as NUMA
>>> information invalid.
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>    xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
>>>    1 file changed, 39 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/numa_device_tree.c
>> b/xen/arch/arm/numa_device_tree.c
>>> index 1c74ad135d..37cc56acf3 100644
>>> --- a/xen/arch/arm/numa_device_tree.c
>>> +++ b/xen/arch/arm/numa_device_tree.c
>>> @@ -20,16 +20,53 @@
>>>    #include <xen/init.h>
>>>    #include <xen/nodemask.h>
>>>    #include <xen/numa.h>
>>> +#include <xen/device_tree.h>
>>
>> Nothing in this file seems to depend on xen/device_tree.h. So why do you
>> need to include it?
>>
> 
> As I recall, without this header file the use of device_tree_get_u32 in
> this patch fails to compile.

I looked at the prototype of device_tree_get_u32() and I can't find how 
it depends on bits from device_tree.h. Can you paste the compilation error?

> 
>>> +#include <asm/setup.h>
>>>
>>>    s8 device_tree_numa = 0;
>>> +static nodemask_t processor_nodes_parsed __initdata;
>>>
>>> -int srat_disabled(void)
>>> +static int srat_disabled(void)
>>>    {
>>>        return numa_off || device_tree_numa < 0;
>>>    }
>>>
>>> -void __init bad_srat(void)
>>> +static __init void bad_srat(void)
>>>    {
>>>        printk(KERN_ERR "DT: NUMA information is not used.\n");
>>>        device_tree_numa = -1;
>>>    }
>>> +
>>> +/* Callback for device tree processor affinity */
>>> +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
>>> +{
>>> +    if ( srat_disabled() )
>>> +        return -EINVAL;
>>> +    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
>>> +		bad_srat();
>>> +		return -EINVAL;
>>> +	}
>>> +
>>> +    node_set(node, processor_nodes_parsed);
>>> +
>>> +    device_tree_numa = 1;
>>> +    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +/* Parse CPU NUMA node info */
>>> +int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
>>> +{
>>> +    uint32_t nid;
>>> +
>>> +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
>>> +    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);
>>> +    if ( nid >= MAX_NUMNODES )
>>> +    {
>>> +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n",
>> nid);
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    return dtb_numa_processor_affinity_init(nid);
>>> +}
>>>
>>
>> --
>> Julien Grall

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set
  2021-08-20  8:20       ` Julien Grall
@ 2021-08-20  9:37         ` Wei Chen
  2021-08-20 11:18           ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-20  9:37 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 16:20
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width
> when platform is not set
> 
> 
> 
> On 20/08/2021 03:04, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 19 August 2021 21:28
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit
> width
> >> when platform is not set
> >>
> >> Hi,
> >>
> >> On 11/08/2021 11:23, Wei Chen wrote:
> >>> From: Hongda Deng <Hongda.Deng@arm.com>
> >>>
> >>> In current code, arch_get_dma_bitsize will return 32 when platorm
> >>> or platform->dma_bitsize is not set. It's not resonable, for Arm,
> >>
> >> s/resonable/reasonable/
> >>
> >
> > Ok
> >
> >>> we don't require to reserve DMA memory. So we set dma_bitsize always
> >>> be 0. In NO-NUMA system, arch_get_dma_bitsize will not be invoked,
> >>> so dma_bitsize will not be overrided by this function.
> >>
> >> arch_get_dma_bitsize() is also used to allocate dom0 memory. We need to
> >> be able to allocate some DMA-able memory that can be used by every
> devices.
> >>
> >>> But in NUMA
> >>> system, once the online nodes are greater than 1, this function will
> >>> be invoked. The dma_bitsize will be limited to 32. That means, only
> >>> first 4GB memory can be used for DMA. But that's against our hardware
> >>> design. We don't have that kind of restriction on hardware.
> >>
> >> What do you mean by "hardware design"? Are you referring to the server
> >> you boot Xen on?
> >>
> >
> > Yes. I will change it to some neutral words. something like:
> > "But that could not reflect some hardware's real DMA ability. They may
> not
> > have kind of restriction on hardware." ?
> 
> The thing is, DMA ability is not about the platform itself. It is more
> about the devices (this could just be a PCI card you plugged in). What
> you seem to suggest is that no-one will ever plug such a card into your
> platform. Is that correct?
> 

OK, I understand now. Let's keep 32-bit as the default value. But even in
this case, what about DMA-16 devices? Although such devices are very rare,
they still exist : )
Anyway, keeping arch_get_dma_bitsize as it is may be the best approach
for now.


> >
> >
> >> Anyway, there are plenty of platform out that have devices which can't
> >> DMA into memory above 32-bit. On RPI, this is even lower (30-bit).
> >>
> >> So I would be cautious to change the default limit.
> >>
> >
> > How about return 0 when platform doesn't specify the limit?
> > In my opinion, arbitrary to give 32 on AArch64 doesn't make sense.
> We have to care about the common use-case. We added this restriction a
> few years ago because we had a few reports of users whose devices were
> only capable of 32-bit DMA.
> 
> > But as you mentioned, if Xen is running on a platform with DMA
> limitation,
> > but we have not listed this platform in Xen supported list, Xen cannot
> > get DMA limit from platform->dma_bitsize. In this case, return 0 will
> > also cause some issue.
> >
> >> At the moment, the only place on Arm where we need DMA-able memory is
> >> for dom0. This is allocated at boot and can't change afterwards (for
> now).
> >>
> >
> > For Dom0, we squash the patch#5 into this patch?
> 
> Let me answer with another question. Why should we modify the Arm code
> rather than the common code? IOW...

As I answered above, let's keep arch_get_dma_bitsize as it is; then we
don't need to modify the Arm code.

> 
> >
> >> So I would explore to remove the NUMA check for drop the DMA zone. FAOD,
> >> both suggestion are for Arm only. For x86, they need to be kept.
> >>
> >
> > Without introducing new flag, such as lowmem_for_dma, it's a little
> > hard to skip the numa node check. Unless we crudely add #ifdef ARCH to
> > common code, which is not what we want to see ...
> >        if ( !dma_bitsize && (num_online_nodes() > 1) )
> >            dma_bitsize = arch_get_dma_bitsize();
> 
> ... Why do you think we need this check on Arm when NUMA is enabled?
> 

I don't think Arm needs it. What I meant was to introduce a flag that
disables this check for Arm and for other architectures that don't need it.
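
As a starting point, the flag could take the form of a per-arch hook. This
is only a sketch: arch_want_default_dmazone() and setup_dma_zone() are
hypothetical names, and the stand-in helpers below exist only so the
example compiles on its own:

```c
#include <stdbool.h>

/* Stand-ins for this sketch; not the real Xen definitions. */
static unsigned int dma_bitsize;
static unsigned int online_nodes = 2;

static unsigned int num_online_nodes(void) { return online_nodes; }
static unsigned int arch_get_dma_bitsize(void) { return 32; }

/* Hypothetical per-arch hook: Arm would return false, x86 true. */
static bool arch_want_default_dmazone(void) { return false; }

/* Common code: only set up a default DMA zone if the arch asks for one. */
static void setup_dma_zone(void)
{
    if ( !dma_bitsize && arch_want_default_dmazone() &&
         num_online_nodes() > 1 )
        dma_bitsize = arch_get_dma_bitsize();
}
```

With the hook returning false, dma_bitsize stays 0 even when more than one
node is online, and the common code avoids any #ifdef on the architecture.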

> We can discuss how to remove it once this is answered.
> 

I think we can start to discuss it.

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  2021-08-20  8:23       ` Julien Grall
@ 2021-08-20 10:24         ` Wei Chen
  2021-08-20 11:24           ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-20 10:24 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 16:24
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake
> NUMA API
> 
> 
> 
> On 20/08/2021 03:08, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 19 August 2021 21:34
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep
> fake
> >> NUMA API
> >>
> >> Hi Wei,
> >>
> >> On 11/08/2021 11:23, Wei Chen wrote:
> >>> Only Arm64 supports NUMA, the CONFIG_NUMA could not be
> >>> enabled for Arm32.
> >>
> >> What do you mean by "could not be enabled"?
> >
> > I have not seen any Arm32 hardware support NUMA, so I think
> > we don't need to support Arm32 NUMA.
> 
> I understand that there may not be any 32-bit platform with NUMA, and it
> is fine to state that in the commit message. However...
> 
> > In this case, this Kconfig
> > option could not be enabled on Arm32.
> 
> ... you continue to say "couldn't be enabled" without clarifying whether
> this means that the build will break or that this was just not tested
> because you don't have any platform.

Ok, I understand your concern. Yes, my wording could lead to misunderstanding.
If we enable CONFIG_NUMA on Arm32, Arm32 needs to implement some code to
support the common NUMA code. Otherwise the Arm32 build will fail.
I have not tried to implement that code for Arm32. And since I found no
Arm32 machine that supports NUMA, I wanted Arm32 to keep using the fake
NUMA API as before.

> 
> To put it differently, the code for NUMA looks bitness neutral. So I
> cannot really see what prevents us from potentially using it on Arm 32-bit.
> 

Yes, you're right, it's neutral. But do we really need to add code to an
architecture that may never use it? And how can we test this code? Before
writing this patch, I checked Linux and found that OF_NUMA is only selected
by Arm64, not Arm32. But if you feel we need to add it for Arm32, I have no
problem with that.

> >
> >>
> >>> Even in Arm64, users still can disable
> >>> the CONFIG_NUMA through Kconfig option. In this case, keep
> >>> current fake NUMA API, will make Arm code still can work
> >>> with NUMA aware memory allocation and scheduler.
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> ---
> >>>    xen/include/asm-arm/numa.h | 4 ++++
> >>>    1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> >>> index 31a6de4e23..ab9c4a2448 100644
> >>> --- a/xen/include/asm-arm/numa.h
> >>> +++ b/xen/include/asm-arm/numa.h
> >>> @@ -5,6 +5,8 @@
> >>>
> >>>    typedef u8 nodeid_t;
> >>>
> >>> +#if !defined(CONFIG_NUMA)
> >>
> >> NIT: We tend to use #ifndef rather than #if !defined(...)
> >>
> >
> > OK, I will change related changes in this series.
> >
> >>> +
> >>>    /* Fake one node for now. See also node_online_map. */
> >>>    #define cpu_to_node(cpu) 0
> >>>    #define node_to_cpumask(node)   (cpu_online_map)
> >>> @@ -25,6 +27,8 @@ extern mfn_t first_valid_mfn;
> >>>    #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
> >>>    #define __node_distance(a, b) (20)
> >>>
> >>> +#endif
> >>> +
> >>>    #endif /* __ARCH_ARM_NUMA_H */
> >>>    /*
> >>>     * Local variables:
> >>>
> >>
> >> Cheers,
> >>
> >> --
> >> Julien Grall
> 
> Cheers,
> 
> --
> Julien Grall


* RE: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
  2021-08-20  8:41       ` Julien Grall
@ 2021-08-20 10:49         ` Wei Chen
  2021-08-20 11:28           ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-20 10:49 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 16:41
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA
> Kconfig for arm64
> 
> On 20/08/2021 03:30, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 19 August 2021 21:38
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA
> >> Kconfig for arm64
> >>
> >> Hi,
> >>
> >> On 11/08/2021 11:24, Wei Chen wrote:
> >>> We need a Kconfig option to distinguish with ACPI based
> >>> NUMA. So we introduce the new Kconfig option:
> >>> DEVICE_TREE_NUMA in this patch for Arm64.
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> ---
> >>>    xen/arch/arm/Kconfig | 10 ++++++++++
> >>>    1 file changed, 10 insertions(+)
> >>>
> >>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> >>> index ecfa6822e4..678cc98ea3 100644
> >>> --- a/xen/arch/arm/Kconfig
> >>> +++ b/xen/arch/arm/Kconfig
> >>> @@ -33,6 +33,16 @@ config ACPI
> >>>    	  Advanced Configuration and Power Interface (ACPI) support
> for Xen
> >> is
> >>>    	  an alternative to device tree on ARM64.
> >>>
> >>> +config DEVICE_TREE_NUMA
> >>
> >> The name suggests that NUMA should only be enabled for Device-Tree...
> >> But the description looks generic.
> >>
> >> However, I think the user should only have the choice to say whether
> >> they want NUMA to be enabled or not. We should not give them the choice
> >> to enable/disable the parsing for DT/ACPI.
> >>
> >> So we should have a generic config that will then select DT (and ACPI
> in
> >> the future).
> >>
> >
> > How about we select DT_NUMA default on Arm64. And DT_NUMA select NUMA
> > like what we have done in patch#6 in x86? And remove the description?
> I would rather not make NUMA supported by default on Arm64. Instead, we
> should go throught the same process as other new features and gate it
> behind UNSUPPORTED until it is mature enough.
> 

Ok. I agree with this.

> >
> > If we make generic NUMA as a selectable option, and depends on
> > NUMA to select DT or ACPI NUMA. It seems to be quite different from
> > the existing logic?
> 
> I am a bit confused. You added just logic to select NUMA from ACPI,
> right? So are you talking about a different logic?
> 

No, I didn't want a different one. I thought you wanted it that way.
Obviously, I misunderstood your comments.

Can I understand your previous comments as follows:
1. We should have a generic config that will then select DT and ACPI:
   Because we already have CONFIG_NUMA in the common layer, we need to
   add another one for Arm, like CONFIG_ARM_NUMA?
   And in this option, we can select CONFIG_DEVICE_TREE_NUMA
   automatically if device tree is enabled. If CONFIG_ACPI
   is enabled, we will select CONFIG_ACPI_NUMA too (in the
   future).
   In Xen code, the DT_NUMA and ACPI_NUMA code can co-exist, and Xen
   will check the system's ACPI support status to decide whether to
   use DT_NUMA or ACPI_NUMA?


> Cheers,
> 
> --
> Julien Grall


* Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set
  2021-08-20  9:37         ` Wei Chen
@ 2021-08-20 11:18           ` Julien Grall
  2021-08-20 11:58             ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-20 11:18 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand Marquis

On 20/08/2021 10:37, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 20 August 2021 16:20
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width
>> when platform is not set
>>
>>
>>
>> On 20/08/2021 03:04, Wei Chen wrote:
>>> Hi Julien,
>>
>> Hi Wei,
>>
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 19 August 2021 21:28
>>>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>>>> sstabellini@kernel.org; jbeulich@suse.com
>>>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>>>> Subject: Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit
>> width
>>>> when platform is not set
>>>>
>>>> Hi,
>>>>
>>>> On 11/08/2021 11:23, Wei Chen wrote:
>>>>> From: Hongda Deng <Hongda.Deng@arm.com>
>>>>>
>>>>> In current code, arch_get_dma_bitsize will return 32 when platorm
>>>>> or platform->dma_bitsize is not set. It's not resonable, for Arm,
>>>>
>>>> s/resonable/reasonable/
>>>>
>>>
>>> Ok
>>>
>>>>> we don't require to reserve DMA memory. So we set dma_bitsize always
>>>>> be 0. In NO-NUMA system, arch_get_dma_bitsize will not be invoked,
>>>>> so dma_bitsize will not be overrided by this function.
>>>>
>>>> arch_get_dma_bitsize() is also used to allocate dom0 memory. We need to
>>>> be able to allocate some DMA-able memory that can be used by every
>> devices.
>>>>
>>>>> But in NUMA
>>>>> system, once the online nodes are greater than 1, this function will
>>>>> be invoked. The dma_bitsize will be limited to 32. That means, only
>>>>> first 4GB memory can be used for DMA. But that's against our hardware
>>>>> design. We don't have that kind of restriction on hardware.
>>>>
>>>> What do you mean by "hardware design"? Are you referring to the server
>>>> you boot Xen on?
>>>>
>>>
>>> Yes. I will change it to some neutral words. something like:
>>> "But that could not reflect some hardware's real DMA ability. They may
>> not
>>> have kind of restriction on hardware." ?
>>
>> The thing is DMA ability is not about the platform itself. It is more
>> about the devices (this could just be a PCI card you just plugged). What
>> you seem to suggest is no-one will ever plug such card on your platform.
>> Is that correct?
>>
> 
> OK, I understand now. Let's keep 32-bit as default value, but even in this
> case, how about DMA-16 devices? Although these devices are very rare, they
> still exist : )

I haven't heard anyone reporting issues with them on Xen on Arm. So I 
assume that either it works or no-one is using them.

My main point is we need to care about the common use case. 32-bit DMA
devices are still a thing and have caused trouble for some of our users
(e.g. NXP).

If tomorrow someone reports an issue with a 16-bit DMA device, then we can
consider our options for handling it.

>>>> So I would explore to remove the NUMA check for drop the DMA zone. FAOD,
>>>> both suggestion are for Arm only. For x86, they need to be kept.
>>>>
>>>
>>> Without introducing new flag, such as lowmem_for_dma, it's a little
>>> hard to skip the numa node check. Unless we crudely add #ifdef ARCH to
>>> common code, which is not what we want to see ...
>>>         if ( !dma_bitsize && (num_online_nodes() > 1) )
>>>             dma_bitsize = arch_get_dma_bitsize();
>>
>> ... Why do you think we need this check on Arm when NUMA is enabled?
>>
> 
> I didn't think Arm needs, what I said is introduce a flag to disable
> this check for Arm or other Architectures that they don't need this check.
> 
>> We can discuss how to remove it once this is answered.
>>
> 
> I think we can start to discuss it.

How about replacing the second part of the check with a new helper 
arch_have_default_dma_zone() (or a different name)?
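
To make the idea concrete, here is a minimal standalone sketch. It is not
Xen code: the stubs for num_online_nodes() and arch_get_dma_bitsize(), the
x86_/arm_ prefixed variants and pick_dma_bitsize() are all hypothetical names
used only for illustration. It assumes x86 keeps today's behaviour (a DMA
zone once more than one node is online) and Arm opts out.

```c
#include <stdbool.h>

/* Stand-ins for Xen internals so the sketch compiles on its own. */
static unsigned int online_nodes = 1;
static unsigned int num_online_nodes(void) { return online_nodes; }
static unsigned int arch_get_dma_bitsize(void) { return 32; }

/* Hypothetical x86 variant: keep the existing "more than one node" rule. */
static bool x86_have_default_dma_zone(void)
{
    return num_online_nodes() > 1;
}

/* Hypothetical Arm variant: never set up a default DMA zone. */
static bool arm_have_default_dma_zone(void)
{
    return false;
}

typedef bool (*dma_zone_hook_t)(void);

/*
 * Common-code check with the NUMA test replaced by the arch helper.
 * A non-zero dma_bitsize (e.g. set on the command line) is left untouched.
 */
static unsigned int pick_dma_bitsize(unsigned int dma_bitsize,
                                     dma_zone_hook_t arch_have_default_dma_zone)
{
    if ( !dma_bitsize && arch_have_default_dma_zone() )
        dma_bitsize = arch_get_dma_bitsize();

    return dma_bitsize;
}
```

In a real tree each architecture would provide a single
arch_have_default_dma_zone() rather than passing a function pointer; the
pointer is only there so both variants can be exercised in one file.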

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  2021-08-20 10:24         ` Wei Chen
@ 2021-08-20 11:24           ` Julien Grall
  2021-08-20 12:23             ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-20 11:24 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand Marquis



On 20/08/2021 11:24, Wei Chen wrote:
> Hi Julien,

Hi Wei,

> 
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 20 August 2021 16:24
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake
>> NUMA API
>>
>>
>>
>> On 20/08/2021 03:08, Wei Chen wrote:
>>> Hi Julien,
>>
>> Hi Wei,
>>
>>>
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 19 August 2021 21:34
>>>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>>>> sstabellini@kernel.org; jbeulich@suse.com
>>>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>>>> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep
>> fake
>>>> NUMA API
>>>>
>>>> Hi Wei,
>>>>
>>>> On 11/08/2021 11:23, Wei Chen wrote:
>>>>> Only Arm64 supports NUMA, the CONFIG_NUMA could not be
>>>>> enabled for Arm32.
>>>>
>>>> What do you mean by "could not be enabled"?
>>>
>>> I have not seen any Arm32 hardware support NUMA, so I think
>>> we don't need to support Arm32 NUMA.
>>
>> I understand that there may not be 32-bit platform with NUMA. And that's
>> fine stating that in the commit message. However...
>>
>>> In this case, this Kconfig
>>> option could not be enabled on Arm32.
>>
>> ... you continue to say "couldn't be enabled" without clarifying whether
>> this mean that the build will break or this was just not tested because
>> you don't have any platform.
> 
> Ok, I understand your concern. Yes, my words would lead to mis-understanding.
> If we make CONFIG_NUMA enabled in Arm32, it need Arm32 to implement some
> code to support NUMA common code. Otherwise the Arm32 build will failed.

When I skimmed through the series, most of the code seemed to be either
in common or in arm (bitness neutral). So I am not quite sure why it
would not build. Do you have more details?

> I have not tried to implement those code for Arm32. And I found there is
> no Arm32 machine support NUMA, so I wanted Arm32 to use fake NUMA API
> as before.
> 
>>
>> To put it differently, the code for NUMA looks bitness neutral. So I
>> cannot really see what prevents us from potentially using it on Arm 32-bit.
>>
> 
> Yes, you're right, it's neutral. But do we really need to add code to an
> ARCH that it may never use?

Technically you already added the code because arch/arm/ is common
between arm32 and arm64. My only ask is to not make the new config
depend on arm64. If you only build-test it, that's fine because...

> And how can we test this code?

... I don't expect any of the code to be an issue on arm32, as the code
should mostly be arch neutral.

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
  2021-08-20 10:49         ` Wei Chen
@ 2021-08-20 11:28           ` Julien Grall
  2021-08-20 12:25             ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-20 11:28 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand Marquis



On 20/08/2021 11:49, Wei Chen wrote:
> Hi Julien,

Hi Wei,

> 
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 20 August 2021 16:41
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA
>> Kconfig for arm64
>>
>> On 20/08/2021 03:30, Wei Chen wrote:
>>> Hi Julien,
>>
>> Hi Wei,
>>
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 19 August 2021 21:38
>>>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>>>> sstabellini@kernel.org; jbeulich@suse.com
>>>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>>>> Subject: Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA
>>>> Kconfig for arm64
>>>>
>>>> Hi,
>>>>
>>>> On 11/08/2021 11:24, Wei Chen wrote:
>>>>> We need a Kconfig option to distinguish with ACPI based
>>>>> NUMA. So we introduce the new Kconfig option:
>>>>> DEVICE_TREE_NUMA in this patch for Arm64.
>>>>>
>>>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>>>> ---
>>>>>     xen/arch/arm/Kconfig | 10 ++++++++++
>>>>>     1 file changed, 10 insertions(+)
>>>>>
>>>>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
>>>>> index ecfa6822e4..678cc98ea3 100644
>>>>> --- a/xen/arch/arm/Kconfig
>>>>> +++ b/xen/arch/arm/Kconfig
>>>>> @@ -33,6 +33,16 @@ config ACPI
>>>>>     	  Advanced Configuration and Power Interface (ACPI) support
>> for Xen
>>>> is
>>>>>     	  an alternative to device tree on ARM64.
>>>>>
>>>>> +config DEVICE_TREE_NUMA
>>>>
>>>> The name suggests that NUMA should only be enabled for Device-Tree...
>>>> But the description looks generic.
>>>>
>>>> However, I think the user should only have the choice to say whether
>>>> they want NUMA to be enabled or not. We should not give them the choice
>>>> to enable/disable the parsing for DT/ACPI.
>>>>
>>>> So we should have a generic config that will then select DT (and ACPI
>> in
>>>> the future).
>>>>
>>>
>>> How about we select DT_NUMA default on Arm64. And DT_NUMA select NUMA
>>> like what we have done in patch#6 in x86? And remove the description?
>> I would rather not make NUMA supported by default on Arm64. Instead, we
>> should go throught the same process as other new features and gate it
>> behind UNSUPPORTED until it is mature enough.
>>
> 
> Ok. I agree with this.
> 
>>>
>>> If we make generic NUMA as a selectable option, and depends on
>>> NUMA to select DT or ACPI NUMA. It seems to be quite different from
>>> the existing logic?
>>
>> I am a bit confused. You added just logic to select NUMA from ACPI,
>> right? So are you talking about a different logic?
>>
> 
> No, I didn't want a different one. I thought you wanted it that way.
> Obviously, I mis-understanded your comments.
> 
> Can I understand your previous comments like following:
> 1. We should have a generic config that will then select DT and ACPI:
>     Because we already have CONFIG_NUMA in common layer. So we need to
>     add another one for Arm like CONFIG_ARM_NUMA?

I think so.

>     And in this option, we can select CONFIG_DEVICE_TREE_NUMA
>     automatically if device tree is enabled. If CONFIG_ACPI
>     is enabled, we will select CONFIG_ACPI_NUMA too (in the
>     future)
>     In Xen code, DT_NUMA and ACPI_NUMA code can co-exist, Xen

Distributions should not have to build a different Xen for DT and ACPI.
So it is more that they *must* co-exist.

>     will check the system ACPI support status to decide to use
>     DT_NUMA or ACPI_NUMA?

Yes. A user should only have to say "I want to use NUMA". It is up to Xen
to figure out whether we need to compile the support for DT and/or ACPI.

Once we have support for ACPI, it doesn't make a lot of sense for the
users to say "I want to compile with DT and ACPI but I only want NUMA
when using DT".
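
As a rough illustration of the runtime side, assuming both parsers are
compiled in, the choice could be reduced to something like the sketch below.
The enum, the function name and its two parameters are hypothetical and do
not exist in Xen; in practice the "booted with ACPI" input would come from
whatever Xen already uses to tell the two boot methods apart.

```c
#include <stdbool.h>

/* Hypothetical: which firmware description the NUMA layout is read from. */
enum numa_fw_source {
    NUMA_FW_NONE, /* NUMA not requested by the user */
    NUMA_FW_DT,   /* parse the device tree */
    NUMA_FW_ACPI, /* parse the ACPI tables */
};

/*
 * The user only answers "do I want NUMA?". Which parser runs is derived
 * from how the system booted, never from a second user-visible option.
 */
static enum numa_fw_source numa_pick_source(bool numa_enabled,
                                            bool booted_with_acpi)
{
    if ( !numa_enabled )
        return NUMA_FW_NONE;

    return booted_with_acpi ? NUMA_FW_ACPI : NUMA_FW_DT;
}
```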

Cheers,

-- 
Julien Grall



* RE: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-20  8:44       ` Julien Grall
@ 2021-08-20 11:53         ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-20 11:53 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 16:44
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse
> device tree processor node
> 
> 
> 
> On 20/08/2021 03:23, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 20 August 2021 2:13
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse
> >> device tree processor node
> >>
> >> Hi Wei,
> >>
> >> On 11/08/2021 11:24, Wei Chen wrote:
> >>> Processor NUMA ID information is stored in device tree's processor
> >>> node as "numa-node-id". We need a new helper to parse this ID from
> >>> processor node. If we get this ID from processor node, this ID's
> >>> validity still need to be checked. Once we got a invalid NUMA ID
> >>> from any processor node, the device tree will be marked as NUMA
> >>> information invalid.
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> ---
> >>>    xen/arch/arm/numa_device_tree.c | 41
> +++++++++++++++++++++++++++++++--
> >>>    1 file changed, 39 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/xen/arch/arm/numa_device_tree.c
> >> b/xen/arch/arm/numa_device_tree.c
> >>> index 1c74ad135d..37cc56acf3 100644
> >>> --- a/xen/arch/arm/numa_device_tree.c
> >>> +++ b/xen/arch/arm/numa_device_tree.c
> >>> @@ -20,16 +20,53 @@
> >>>    #include <xen/init.h>
> >>>    #include <xen/nodemask.h>
> >>>    #include <xen/numa.h>
> >>> +#include <xen/device_tree.h>
> >>
> >> Nothing in this file seems to depend on xen/device_tree.h. So why do
> you
> >> need to include it?
> >>
> >
> > I remember that without this header file, device_tree_get_u32 in this
> patch
> > will cause compiling failed.
> 
> I looked at the prototype of device_tree_get_u32() and I can't find how
> it depends on bits from device_tree.h. Can you paste the compilation error?
> 

I tested it again. This header file is only needed by a later patch, which
fails to build without it:
numa_device_tree.c: In function ‘device_tree_parse_numa_distance_map_v1’:
numa_device_tree.c:243:16: error: implicit declaration of function ‘dt_read_number’ [-Werror=implicit-function-declaration]
  243 |         from = dt_read_number(matrix, 1);
      |                ^~~~~~~~~~~~~~
numa_device_tree.c:243:16: error: nested extern declaration of ‘dt_read_number’ [-Werror=nested-externs]
cc1: all warnings being treated as errors

I will move it to another patch.
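
For context, the helper named in the error combines big-endian 32-bit
device tree cells into one number. The sketch below is a standalone
approximation of that behaviour, not the Xen implementation: read_be_cells()
is a hypothetical name, and it works on raw bytes so it does not depend on
the host's endianness or on any Xen header.

```c
#include <stdint.h>

/*
 * Combine `cells` consecutive big-endian 32-bit cells starting at `p`
 * into a single value. Meaningful for 1 or 2 cells; more would overflow
 * the 64-bit result.
 */
static uint64_t read_be_cells(const uint8_t *p, int cells)
{
    uint64_t r = 0;

    while ( cells-- > 0 )
    {
        uint32_t cell = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                        ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];

        r = (r << 32) | cell;
        p += 4;
    }

    return r;
}
```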

> >
> >>> +#include <asm/setup.h>
> >>>
> >>>    s8 device_tree_numa = 0;
> >>> +static nodemask_t processor_nodes_parsed __initdata;
> >>>
> >>> -int srat_disabled(void)
> >>> +static int srat_disabled(void)
> >>>    {
> >>>        return numa_off || device_tree_numa < 0;
> >>>    }
> >>>
> >>> -void __init bad_srat(void)
> >>> +static __init void bad_srat(void)
> >>>    {
> >>>        printk(KERN_ERR "DT: NUMA information is not used.\n");
> >>>        device_tree_numa = -1;
> >>>    }
> >>> +
> >>> +/* Callback for device tree processor affinity */
> >>> +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> >>> +{
> >>> +    if ( srat_disabled() )
> >>> +        return -EINVAL;
> >>> +    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
> >>> +		bad_srat();
> >>> +		return -EINVAL;
> >>> +	}
> >>> +
> >>> +    node_set(node, processor_nodes_parsed);
> >>> +
> >>> +    device_tree_numa = 1;
> >>> +    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
> >>> +
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +/* Parse CPU NUMA node info */
> >>> +int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
> >>> +{
> >>> +    uint32_t nid;
> >>> +
> >>> +    nid = device_tree_get_u32(fdt, node, "numa-node-id",
> MAX_NUMNODES);
> >>> +    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);
> >>> +    if ( nid >= MAX_NUMNODES )
> >>> +    {
> >>> +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n",
> >> nid);
> >>> +        return -EINVAL;
> >>> +    }
> >>> +
> >>> +    return dtb_numa_processor_affinity_init(nid);
> >>> +}
> >>>
> >>
> >> --
> >> Julien Grall
> 
> Cheers,
> 
> --
> Julien Grall


* RE: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width when platform is not set
  2021-08-20 11:18           ` Julien Grall
@ 2021-08-20 11:58             ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-20 11:58 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 19:18
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit width
> when platform is not set
> 
> On 20/08/2021 10:37, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 20 August 2021 16:20
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit
> width
> >> when platform is not set
> >>
> >>
> >>
> >> On 20/08/2021 03:04, Wei Chen wrote:
> >>> Hi Julien,
> >>
> >> Hi Wei,
> >>
> >>>> -----Original Message-----
> >>>> From: Julien Grall <julien@xen.org>
> >>>> Sent: 19 August 2021 21:28
> >>>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >>>> sstabellini@kernel.org; jbeulich@suse.com
> >>>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >>>> Subject: Re: [XEN RFC PATCH 04/40] xen/arm: return default DMA bit
> >> width
> >>>> when platform is not set
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 11/08/2021 11:23, Wei Chen wrote:
> >>>>> From: Hongda Deng <Hongda.Deng@arm.com>
> >>>>>
> >>>>> In current code, arch_get_dma_bitsize will return 32 when platorm
> >>>>> or platform->dma_bitsize is not set. It's not resonable, for Arm,
> >>>>
> >>>> s/resonable/reasonable/
> >>>>
> >>>
> >>> Ok
> >>>
> >>>>> we don't require to reserve DMA memory. So we set dma_bitsize always
> >>>>> be 0. In NO-NUMA system, arch_get_dma_bitsize will not be invoked,
> >>>>> so dma_bitsize will not be overrided by this function.
> >>>>
> >>>> arch_get_dma_bitsize() is also used to allocate dom0 memory. We need
> to
> >>>> be able to allocate some DMA-able memory that can be used by every
> >> devices.
> >>>>
> >>>>> But in NUMA
> >>>>> system, once the online nodes are greater than 1, this function will
> >>>>> be invoked. The dma_bitsize will be limited to 32. That means, only
> >>>>> first 4GB memory can be used for DMA. But that's against our
> hardware
> >>>>> design. We don't have that kind of restriction on hardware.
> >>>>
> >>>> What do you mean by "hardware design"? Are you referring to the
> server
> >>>> you boot Xen on?
> >>>>
> >>>
> >>> Yes. I will change it to some neutral words. something like:
> >>> "But that could not reflect some hardware's real DMA ability. They may
> >> not
> >>> have kind of restriction on hardware." ?
> >>
> >> The thing is DMA ability is not about the platform itself. It is more
> >> about the devices (this could just be a PCI card you just plugged).
> What
> >> you seem to suggest is no-one will ever plug such card on your platform.
> >> Is that correct?
> >>
> >
> > OK, I understand now. Let's keep 32-bit as default value, but even in
> this
> > case, how about DMA-16 devices? Although these devices are very rare,
> they
> > still exist : )
> 
> I haven't heard anyone reporting issues with them on Xen on Arm. So I
> assume that either it works or no-one is using them.
> 
> My main point is we need to care about the common use case. 32-bit DMA
> device is still a thing and caused trouble to some of our users (e.g. NXP).
> 
> If tomorrow, someone report issue with 16-bit DMA device, then we can
> consider our options how to handle.
> 
> >>>> So I would explore to remove the NUMA check for drop the DMA zone.
> FAOD,
> >>>> both suggestion are for Arm only. For x86, they need to be kept.
> >>>>
> >>>
> >>> Without introducing new flag, such as lowmem_for_dma, it's a little
> >>> hard to skip the numa node check. Unless we crudely add #ifdef ARCH to
> >>> common code, which is not what we want to see ...
> >>>         if ( !dma_bitsize && (num_online_nodes() > 1) )
> >>>             dma_bitsize = arch_get_dma_bitsize();
> >>
> >> ... Why do you think we need this check on Arm when NUMA is enabled?
> >>
> >
> > I didn't think Arm needs, what I said is introduce a flag to disable
> > this check for Arm or other Architectures that they don't need this
> check.
> >
> >> We can discuss how to remove it once this is answered.
> >>
> >
> > I think we can start to discuss it.
> 
> How about replacing the second part of the check with a new helper
> arch_have_default_dma_zone() (or a different name)?

This seems like an approach worth trying. I will follow it when preparing
the next version.

> 
> Cheers,
> 
> --
> Julien Grall


* RE: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  2021-08-20 11:24           ` Julien Grall
@ 2021-08-20 12:23             ` Wei Chen
  2021-08-20 14:41               ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-20 12:23 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 19:24
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake
> NUMA API
> 
> 
> 
> On 20/08/2021 11:24, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 20 August 2021 16:24
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep
> fake
> >> NUMA API
> >>
> >>
> >>
> >> On 20/08/2021 03:08, Wei Chen wrote:
> >>> Hi Julien,
> >>
> >> Hi Wei,
> >>
> >>>
> >>>> -----Original Message-----
> >>>> From: Julien Grall <julien@xen.org>
> >>>> Sent: 19 August 2021 21:34
> >>>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >>>> sstabellini@kernel.org; jbeulich@suse.com
> >>>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >>>> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep
> >> fake
> >>>> NUMA API
> >>>>
> >>>> Hi Wei,
> >>>>
> >>>> On 11/08/2021 11:23, Wei Chen wrote:
> >>>>> Only Arm64 supports NUMA, the CONFIG_NUMA could not be
> >>>>> enabled for Arm32.
> >>>>
> >>>> What do you mean by "could not be enabled"?
> >>>
> >>> I have not seen any Arm32 hardware support NUMA, so I think
> >>> we don't need to support Arm32 NUMA.
> >>
> >> I understand that there may not be 32-bit platform with NUMA. And
> that's
> >> fine stating that in the commit message. However...
> >>
> >>> In this case, this Kconfig
> >>> option could not be enabled on Arm32.
> >>
> >> ... you continue to say "couldn't be enabled" without clarifying
> whether
> >> this mean that the build will break or this was just not tested because
> >> you don't have any platform.
> >
> > Ok, I understand your concern. Yes, my words would lead to mis-
> understanding.
> > If we make CONFIG_NUMA enabled in Arm32, it need Arm32 to implement some
> > code to support NUMA common code. Otherwise the Arm32 build will failed.
> 
> When I skimmed through the series, most of the code seems to be either
> in common, arm (bitness neutral). So I am not quite too sure why it
> would not build. Do you have more details?
> 

It did not build because I had enabled NUMA for Arm32 without enabling
the device_tree_numa option.

I have tested it again and, yes, simply enabling both device_tree_numa
and NUMA for Arm32 builds an image successfully.

So I think it's OK to enable this on Arm32, and I will do so in the next
version. But can we still keep these fake APIs, so that Xen still works
for users who don't want to enable NUMA? I will also remove "could not be
enabled for Arm32" from the commit log.
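
To show what I mean by keeping the fake APIs, here is a cut-down,
standalone sketch of the header layout. Only cpu_to_node() and
__node_distance() are taken from the patch; the prototypes in the #else
branch and the two small wrapper functions are hypothetical and are there
only so the fragment can be compiled and exercised without the rest of Xen.

```c
#include <stdint.h>

typedef uint8_t nodeid_t;

#ifndef CONFIG_NUMA

/* Fake one node: every CPU is on node 0, every distance is 20. */
#define cpu_to_node(cpu) 0
#define __node_distance(a, b) (20)

#else

/* Hypothetical prototypes; the real NUMA code would provide these. */
nodeid_t cpu_to_node(unsigned int cpu);
uint8_t __node_distance(nodeid_t a, nodeid_t b);

#endif

/* Callers look the same whether or not CONFIG_NUMA is set. */
static nodeid_t node_of_cpu(unsigned int cpu)
{
    (void)cpu; /* unused when the fake macro is in effect */
    return cpu_to_node(cpu);
}

static unsigned int distance_between(nodeid_t a, nodeid_t b)
{
    (void)a;
    (void)b;
    return __node_distance(a, b);
}
```

With CONFIG_NUMA undefined, callers such as the memory allocator and the
scheduler see a single node 0 and keep working without any NUMA parsing.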

> > I have not tried to implement those code for Arm32. And I found there is
> > no Arm32 machine support NUMA, so I wanted Arm32 to use fake NUMA API
> > as before.
> >
> >>
> >> To put it differently, the code for NUMA looks bitness neutral. So I
> >> cannot really see what prevents us from potentially using it on Arm 32-bit.
> >>
> >
> > Yes, you're right, it's neutral. But do we really need to add code to an
> > ARCH that it may never use?
> 
> Technically you already added the code because arch/arm/ is common
> between arm32 and arm64. My only ask is to not make the new config
> depends on arm64. If you only build test it that fine because...
> 
> And how can we test this code?
> 
> I don't expect any of the code to be an issue on arm32 as the code
> should mostly be arch neutral.

I mean, we don't have an Arm32 NUMA machine to test on, so I don't know
whether the code works well on Arm32 NUMA or not. I can only verify it
on non-NUMA Arm32 and make sure this code does not break existing
machines.

> 
> Cheers,
> 
> --
> Julien Grall


* RE: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
  2021-08-20 11:28           ` Julien Grall
@ 2021-08-20 12:25             ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-20 12:25 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 19:29
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA
> Kconfig for arm64
> 
> 
> 
> On 20/08/2021 11:49, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 20 August 2021 16:41
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 17/40] xen/arm: Introduce DEVICE_TREE_NUMA
> >> Kconfig for arm64
> >>
> >> On 20/08/2021 03:30, Wei Chen wrote:
> >>> Hi Julien,
> >>
> >> Hi Wei,
> >>
> >>>> -----Original Message-----
> >>>> From: Julien Grall <julien@xen.org>
> >>>> Sent: 19 August 2021 21:38
> >>>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >>>> sstabellini@kernel.org; jbeulich@suse.com
> >>>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >>>> Subject: Re: [XEN RFC PATCH 17/40] xen/arm: Introduce
> DEVICE_TREE_NUMA
> >>>> Kconfig for arm64
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 11/08/2021 11:24, Wei Chen wrote:
> >>>>> We need a Kconfig option to distinguish with ACPI based
> >>>>> NUMA. So we introduce the new Kconfig option:
> >>>>> DEVICE_TREE_NUMA in this patch for Arm64.
> >>>>>
> >>>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>>>> ---
> >>>>>     xen/arch/arm/Kconfig | 10 ++++++++++
> >>>>>     1 file changed, 10 insertions(+)
> >>>>>
> >>>>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> >>>>> index ecfa6822e4..678cc98ea3 100644
> >>>>> --- a/xen/arch/arm/Kconfig
> >>>>> +++ b/xen/arch/arm/Kconfig
> >>>>> @@ -33,6 +33,16 @@ config ACPI
> >>>>>     	  Advanced Configuration and Power Interface (ACPI) support
> >> for Xen
> >>>> is
> >>>>>     	  an alternative to device tree on ARM64.
> >>>>>
> >>>>> +config DEVICE_TREE_NUMA
> >>>>
> >>>> The name suggests that NUMA should only be enabled for Device-Tree...
> >>>> But the description looks generic.
> >>>>
> >>>> However, I think the user should only have the choice to say whether
> >>>> they want NUMA to be enabled or not. We should not give them the
> choice
> >>>> to enable/disable the parsing for DT/ACPI.
> >>>>
> >>>> So we should have a generic config that will then select DT (and ACPI
> >> in
> >>>> the future).
> >>>>
> >>>
> >>> How about we select DT_NUMA default on Arm64. And DT_NUMA select NUMA
> >>> like what we have done in patch#6 in x86? And remove the description?
> >> I would rather not make NUMA supported by default on Arm64. Instead, we
> >> should go throught the same process as other new features and gate it
> >> behind UNSUPPORTED until it is mature enough.
> >>
> >
> > Ok. I agree with this.
> >
> >>>
> >>> If we make generic NUMA as a selectable option, and depends on
> >>> NUMA to select DT or ACPI NUMA. It seems to be quite different from
> >>> the existing logic?
> >>
> >> I am a bit confused. You added just logic to select NUMA from ACPI,
> >> right? So are you talking about a different logic?
> >>
> >
> > No, I didn't want a different one. I thought you wanted it that way.
> > Obviously, I mis-understanded your comments.
> >
> > Can I understand your previous comments like following:
> > 1. We should have a generic config that will then select DT and ACPI:
> >     Because we already have CONFIG_NUMA in common layer. So we need to
> >     add another one for Arm like CONFIG_ARM_NUMA?
> 
> I think so.
> 
> >     And in this option, we can select CONFIG_DEVICE_TREE_NUMA
> >     automatically if device tree is enabled. If CONFIG_ACPI
> >     is enabled, we will select CONFIG_ACPI_NUMA too (in the
> >     future)
> >     In Xen code, DT_NUMA and ACPI_NUMA code can co-exist, Xen
> 
> Distributions should not have to build a different Xen for DT and ACPI.
> So it is more they *must* co-exist.
> 
> >     will check the system ACPI support status to decide to use
> >     DT_NUMA or ACPI_NUMA?
> 
> Yes. A user should only have to say "I want to use NUMA". This is Xen to
> figure out whether we need to compile the support for DT and/or ACPI.
> 
> Once we have support for ACPI, it doesn't make a lot of sense for the
> users to say "I want to compile with DT and ACPI but I only want NUMA
> when using DT".
> 

I am glad we are now on the same page. OK, I will change the Kconfig
like this in the next version.
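
The runtime choice agreed on above could be sketched as follows. The
enum and the function name numa_pick_source() are hypothetical and exist
only for this illustration; in Xen the decision would be driven by the
real ACPI state rather than by a boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the firmware table used for NUMA parsing. */
enum numa_fw_source {
    NUMA_FW_NONE,   /* NUMA not requested by the user */
    NUMA_FW_DT,     /* parse the device tree */
    NUMA_FW_ACPI,   /* parse the ACPI tables */
};

/*
 * Pick the NUMA information source at boot. Both parsers are assumed to
 * be compiled in; the user only says whether NUMA is wanted, and the
 * source follows from how the system was booted.
 */
static enum numa_fw_source numa_pick_source(bool numa_enabled,
                                            bool acpi_in_use)
{
    if ( !numa_enabled )
        return NUMA_FW_NONE;

    return acpi_in_use ? NUMA_FW_ACPI : NUMA_FW_DT;
}
```

The point of the design is that one Xen binary serves both DT and ACPI
systems, so the user-visible choice is limited to "NUMA or no NUMA".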

> Cheers,
> 
> --
> Julien Grall


* Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  2021-08-20 12:23             ` Wei Chen
@ 2021-08-20 14:41               ` Julien Grall
  0 siblings, 0 replies; 196+ messages in thread
From: Julien Grall @ 2021-08-20 14:41 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand Marquis

On 20/08/2021 13:23, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 20 August 2021 19:24
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep fake
>> NUMA API
>>
>>
>>
>> On 20/08/2021 11:24, Wei Chen wrote:
>>> Hi Julien,
>>
>> Hi Wei,
>>
>>>
>>>> -----Original Message-----
>>>> From: Julien Grall <julien@xen.org>
>>>> Sent: 20 August 2021 16:24
>>>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>>>> sstabellini@kernel.org; jbeulich@suse.com
>>>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>>>> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep
>> fake
>>>> NUMA API
>>>>
>>>>
>>>>
>>>> On 20/08/2021 03:08, Wei Chen wrote:
>>>>> Hi Julien,
>>>>
>>>> Hi Wei,
>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Julien Grall <julien@xen.org>
>>>>>> Sent: 19 August 2021 21:34
>>>>>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>>>>>> sstabellini@kernel.org; jbeulich@suse.com
>>>>>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>>>>>> Subject: Re: [XEN RFC PATCH 07/40] xen/arm: use !CONFIG_NUMA to keep
>>>> fake
>>>>>> NUMA API
>>>>>>
>>>>>> Hi Wei,
>>>>>>
>>>>>> On 11/08/2021 11:23, Wei Chen wrote:
>>>>>>> Only Arm64 supports NUMA, the CONFIG_NUMA could not be
>>>>>>> enabled for Arm32.
>>>>>>
>>>>>> What do you mean by "could not be enabled"?
>>>>>
>>>>> I have not seen any Arm32 hardware support NUMA, so I think
>>>>> we don't need to support Arm32 NUMA.
>>>>
>>>> I understand that there may not be 32-bit platform with NUMA. And
>> that's
>>>> fine stating that in the commit message. However...
>>>>
>>>>> In this case, this Kconfig
>>>>> option could not be enabled on Arm32.
>>>>
>>>> ... you continue to say "couldn't be enabled" without clarifying
>> whether
>>>> this mean that the build will break or this was just not tested because
>>>> you don't have any platform.
>>>
>>> Ok, I understand your concern. Yes, my words would lead to mis-
>> understanding.
>>> If we make CONFIG_NUMA enabled in Arm32, it need Arm32 to implement some
>>> code to support NUMA common code. Otherwise the Arm32 build will failed.
>>
>> When I skimmed through the series, most of the code seems to be either
>> in common, arm (bitness neutral). So I am not quite too sure why it
>> would not build. Do you have more details?
>>
> 
> It could not build because I have not tried to enable device_tree_numa
> option for Arm32 but enabled NUMA for arm32.
> 
> I have tested it again, yes, simple enable device_tree_numa and NUMA
> for arm32 can build a image successfully.
> 
> So, I think it's OK to enable this on Arm32, and I will do it in next
> version. But, can we still keep these FAKE APIs? If user don't want to
> enable NUMA they still can make Xen work? 

Yes, we still need to keep the FAKE APIs. I was only commenting about 
the wording in the commit message.

> And I will remove "could not
> enable for Arm32" from commit log.
> 
>>> I have not tried to implement those code for Arm32. And I found there is
>>> no Arm32 machine support NUMA, so I wanted Arm32 to use fake NUMA API
>>> as before.
>>>
>>>>
>>>> To put it differently, the code for NUMA looks bitness neutral. So I
>>>> cannot really see what prevents us from potentially using it on Arm 32-bit.
>>>>
>>>
>>> Yes, you're right, it's neutral. But do we really need to add code to an
>>> ARCH that it may never use?
>>
>> Technically you already added the code because arch/arm/ is common
>> between arm32 and arm64. My only ask is to not make the new config
>> depends on arm64. If you only build test it that fine because...
>>
>> And how can we test this code?
>>
>> I don't expect any of the code to be an issue on arm32 as the code
>> should mostly be arch neutral.
> 
> I mean, we don't have Arm32 NUMA machine to test, I don't know
> the code works well on Arm32 NUMA or not. I only can verify them
> on non-NUMA arm32, and make sure this code will not break existed
> machines.

I understood you don't have any Arm32 NUMA machine. But I don't see the 
lack of testing as an issue because the code doesn't look to contain 
bits that may rely on arm64. So there are very limited reasons for the 
code to break on arm32.

If we really want to test it, then it should be feasible to fake the 
NUMA node in the DT. However, I don't expect you to do it.

Cheers,

-- 
Julien Grall



* RE: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-19 18:09   ` Julien Grall
@ 2021-08-23  8:42     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-23  8:42 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 2:10
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse
> device tree processor node
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > Processor NUMA ID information is stored in device tree's processor
> > node as "numa-node-id". We need a new helper to parse this ID from
> > processor node. If we get this ID from processor node, this ID's
> > validity still need to be checked. Once we got a invalid NUMA ID
> > from any processor node, the device tree will be marked as NUMA
> > information invalid.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
> >   1 file changed, 39 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/arch/arm/numa_device_tree.c
> b/xen/arch/arm/numa_device_tree.c
> > index 1c74ad135d..37cc56acf3 100644
> > --- a/xen/arch/arm/numa_device_tree.c
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -20,16 +20,53 @@
> >   #include <xen/init.h>
> >   #include <xen/nodemask.h>
> >   #include <xen/numa.h>
> > +#include <xen/device_tree.h>
> > +#include <asm/setup.h>
> >
> >   s8 device_tree_numa = 0;
> > +static nodemask_t processor_nodes_parsed __initdata;
> >
> > -int srat_disabled(void)
> > +static int srat_disabled(void)
> >   {
> >       return numa_off || device_tree_numa < 0;
> >   }
> >
> > -void __init bad_srat(void)
> > +static __init void bad_srat(void)
> >   {
> >       printk(KERN_ERR "DT: NUMA information is not used.\n");
> >       device_tree_numa = -1;
> >   }
> > +
> > +/* Callback for device tree processor affinity */
> > +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> > +{
> > +    if ( srat_disabled() )
> > +        return -EINVAL;
> > +    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
> > +		bad_srat();
> > +		return -EINVAL;
> 
> You seem to have a mix of soft and hard tab in this file. Is there a lot
> of the code that was directly copied from Linux? If not, then the file
> should be using Xen coding style.
> 

I copied some code from x86, and the x86 code is in Linux style.
So yes, I should adjust it to the Xen coding style.
I will do it in the next version.
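
For reference, the processor affinity callback from the quoted hunk would
look roughly like this after conversion to Xen coding style. The typedefs,
the constants, and the 64-bit mask standing in for nodemask_t are
assumptions made so the sketch compiles outside Xen, and the printk()
calls are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins so the sketch builds outside Xen; the values are assumptions. */
typedef uint8_t nodeid_t;
#define NUMA_NO_NODE  0xFF
#define MAX_NUMNODES  64

static bool numa_off;
static int8_t device_tree_numa;
static uint64_t processor_nodes_parsed; /* simplified nodemask_t */

static bool srat_disabled(void)
{
    return numa_off || device_tree_numa < 0;
}

static void bad_srat(void)
{
    /* Mark the device tree NUMA information as unusable. */
    device_tree_numa = -1;
}

/* Same logic as the quoted hunk: 4-space indent, braces on their own line. */
static int dtb_numa_processor_affinity_init(nodeid_t node)
{
    if ( srat_disabled() )
        return -EINVAL;

    if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES )
    {
        bad_srat();
        return -EINVAL;
    }

    processor_nodes_parsed |= 1ULL << node;
    device_tree_numa = 1;

    return 0;
}
```

Note that once one invalid node ID has been seen, every later call fails
as well, because srat_disabled() then returns true.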

> > +	}
> > +
> > +    node_set(node, processor_nodes_parsed);
> > +
> > +    device_tree_numa = 1;
> > +    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
> > +
> > +    return 0;
> > +}
> > +
> > +/* Parse CPU NUMA node info */
> > +int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
> 
> AFAICT, you are going to turn this helper static in a follow-up patch.
> This is a bad practice. Instead, the function should be static from the
> beginning. If it is not possible, then you should re-order the code.
> 
> In this case, I think you can add the boilerplate to parse the NUMA
> information (patch #25) here and then extend it in each patch.
> 
> 

That makes sense, I will try to address it in the next version.

> > +{
> > +    uint32_t nid;
> > +
> > +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
> > +    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);
> > +    if ( nid >= MAX_NUMNODES )
> > +    {
> > +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n",
> nid);
> > +        return -EINVAL;
> > +    }
> > +
> > +    return dtb_numa_processor_affinity_init(nid);
> > +}
> >
> 
> Cheers,
> 
> --
> Julien Grall


* RE: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-19 18:10   ` Julien Grall
@ 2021-08-23  8:47     ` Wei Chen
  2021-08-23 10:59       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-23  8:47 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 20 August 2021 2:11
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse
> device tree processor node
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > Processor NUMA ID information is stored in device tree's processor
> > node as "numa-node-id". We need a new helper to parse this ID from
> > processor node. If we get this ID from processor node, this ID's
> > validity still need to be checked. Once we got a invalid NUMA ID
> > from any processor node, the device tree will be marked as NUMA
> > information invalid.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
> >   1 file changed, 39 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/arch/arm/numa_device_tree.c
> b/xen/arch/arm/numa_device_tree.c
> > index 1c74ad135d..37cc56acf3 100644
> > --- a/xen/arch/arm/numa_device_tree.c
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -20,16 +20,53 @@
> >   #include <xen/init.h>
> >   #include <xen/nodemask.h>
> >   #include <xen/numa.h>
> > +#include <xen/device_tree.h>
> > +#include <asm/setup.h>
> >
> >   s8 device_tree_numa = 0;
> > +static nodemask_t processor_nodes_parsed __initdata;
> >
> > -int srat_disabled(void)
> > +static int srat_disabled(void)
> >   {
> >       return numa_off || device_tree_numa < 0;
> >   }
> >
> > -void __init bad_srat(void)
> > +static __init void bad_srat(void)
> >   {
> >       printk(KERN_ERR "DT: NUMA information is not used.\n");
> >       device_tree_numa = -1;
> >   }
> > +
> > +/* Callback for device tree processor affinity */
> > +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> 
> I forgot to answer. It seems odd that some of the function names start
> with dtb_* while other starts device_tree_*. Any particular reason for
> that difference of naming?
> 

Yes. At the very beginning, I wanted to keep the device_tree_ prefix for
functions that handle the dtb file, and to use the dtb_ prefix in place of
acpi to indicate that a function is the device tree version of the NUMA
implementation.

If that's not a good reason, I will unify all prefixes to device_tree_
in the next version. What do you think?

> > +{
> > +    if ( srat_disabled() )
> > +        return -EINVAL;
> > +    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
> > +		bad_srat();
> > +		return -EINVAL;
> > +	}
> > +
> > +    node_set(node, processor_nodes_parsed);
> > +
> > +    device_tree_numa = 1;
> > +    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
> > +
> > +    return 0;
> > +}
> > +
> > +/* Parse CPU NUMA node info */
> > +int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
> > +{
> > +    uint32_t nid;
> > +
> > +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
> > +    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);
> > +    if ( nid >= MAX_NUMNODES )
> > +    {
> > +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n",
> nid);
> > +        return -EINVAL;
> > +    }
> > +
> > +    return dtb_numa_processor_affinity_init(nid);
> > +}
> >
> 
> Cheers,
> 
> --
> Julien Grall


* Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-23  8:47     ` Wei Chen
@ 2021-08-23 10:59       ` Julien Grall
  2021-08-24  4:09         ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-23 10:59 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand Marquis



On 23/08/2021 09:47, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 20 August 2021 2:11
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse
>> device tree processor node
>>
>> On 11/08/2021 11:24, Wei Chen wrote:
>>> Processor NUMA ID information is stored in device tree's processor
>>> node as "numa-node-id". We need a new helper to parse this ID from
>>> processor node. If we get this ID from processor node, this ID's
>>> validity still need to be checked. Once we got a invalid NUMA ID
>>> from any processor node, the device tree will be marked as NUMA
>>> information invalid.
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>    xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
>>>    1 file changed, 39 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/xen/arch/arm/numa_device_tree.c
>> b/xen/arch/arm/numa_device_tree.c
>>> index 1c74ad135d..37cc56acf3 100644
>>> --- a/xen/arch/arm/numa_device_tree.c
>>> +++ b/xen/arch/arm/numa_device_tree.c
>>> @@ -20,16 +20,53 @@
>>>    #include <xen/init.h>
>>>    #include <xen/nodemask.h>
>>>    #include <xen/numa.h>
>>> +#include <xen/device_tree.h>
>>> +#include <asm/setup.h>
>>>
>>>    s8 device_tree_numa = 0;
>>> +static nodemask_t processor_nodes_parsed __initdata;
>>>
>>> -int srat_disabled(void)
>>> +static int srat_disabled(void)
>>>    {
>>>        return numa_off || device_tree_numa < 0;
>>>    }
>>>
>>> -void __init bad_srat(void)
>>> +static __init void bad_srat(void)
>>>    {
>>>        printk(KERN_ERR "DT: NUMA information is not used.\n");
>>>        device_tree_numa = -1;
>>>    }
>>> +
>>> +/* Callback for device tree processor affinity */
>>> +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
>>
>> I forgot to answer. It seems odd that some of the function names start
>> with dtb_* while other starts device_tree_*. Any particular reason for
>> that difference of naming?
>>
> 
> yes, in the very beginning, I want to keep device_tree_ prefix for
> functions that will handle dtb file. And use dtb_ prefix to replace
> acpi, to indicate, this function is device tree version numa implementation.

Thanks for the clarification. The difference between "dtb" and 
"device_tree" is quite subtle: the former refers to the binary while 
the latter refers to the format. Most readers are likely to infer 
that they mean the same thing, so I think this will bring more confusion.

> 
> If that's not the right reason, I will unify all prefix to device_tree_
> in next version. How do you think about it?

AFAICT, your parsing functions will always start with 
"device_tree_parse_". I would prefer if the set replacing the ACPI 
helpers start with "device_tree_".

If you are concerned about the length of the function names, then I would 
suggest prefixing all the functions with "fdt" (we are dealing with the 
flattened DT after all) or "dt".

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 08/40] xen/x86: Move NUMA memory node map functions to common
  2021-08-11 10:23 ` [XEN RFC PATCH 08/40] xen/x86: Move NUMA memory node map functions to common Wei Chen
@ 2021-08-23 17:47   ` Julien Grall
  2021-08-24  4:07     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-23 17:47 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> In the later patches we will add NUMA support to Arm. Arm
> NUMA support will follow current memory node map management
> as x86. So this part of code can be common, in this case,
> we move this part of code from arch/x86 to common.

I would add "No functional changes intended" to make clear this patch is 
only moving code.
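
For readers following along, the node map being moved is a small perfect
hash from page index to node ID. Below is a simplified standalone sketch
of it. The identity paddr_to_pdx(), the fixed 64-slot map, the dropped
nodeids remapping, and setting memnode_shift inside the populate function
are all simplifications made for this illustration and are not part of
the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUMA_NO_NODE 0xFF
#define PAGE_SHIFT   12
#define MAP_SLOTS    64

struct node { uint64_t start, end; };   /* [start, end) in bytes */

static uint8_t memnodemap[MAP_SLOTS];
static int memnode_shift;

/* Identity pdx for the sketch; Xen's paddr_to_pdx() may compress holes. */
static unsigned long paddr_to_pdx(uint64_t paddr)
{
    return (unsigned long)(paddr >> PAGE_SHIFT);
}

/*
 * Fill one slot per 2^shift pages.
 * Returns 1 if OK, 0 if the map is too small, -1 if two nodes share a slot.
 */
static int populate_memnodemap(const struct node *nodes, int numnodes,
                               int shift)
{
    int i, res = -1;

    memset(memnodemap, NUMA_NO_NODE, sizeof(memnodemap));
    for ( i = 0; i < numnodes; i++ )
    {
        unsigned long spdx = paddr_to_pdx(nodes[i].start);
        unsigned long epdx = paddr_to_pdx(nodes[i].end - 1) + 1;

        if ( spdx >= epdx )
            continue;
        if ( (epdx >> shift) >= MAP_SLOTS )
            return 0;
        do {
            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
                return -1;
            memnodemap[spdx >> shift] = (uint8_t)i;
            spdx += 1UL << shift;
        } while ( spdx < epdx );
        res = 1;
    }

    /* Remember the shift only when the map was populated successfully. */
    if ( res == 1 )
        memnode_shift = shift;

    return res;
}

/* Lookup is a shift and an array index, which is why the shift matters. */
static uint8_t phys_to_nid(uint64_t addr)
{
    return memnodemap[paddr_to_pdx(addr) >> memnode_shift];
}
```

With two 1 GiB nodes starting at address 0, a shift of 18 gives each node
its own slot, while a shift of 19 makes both nodes land in slot 0 and is
rejected as an overlap.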

> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/x86/numa.c        | 114 --------------------------------
>   xen/common/Makefile        |   1 +
>   xen/common/numa.c          | 131 +++++++++++++++++++++++++++++++++++++
>   xen/include/asm-x86/numa.h |  29 --------
>   xen/include/xen/numa.h     |  35 ++++++++++
>   5 files changed, 167 insertions(+), 143 deletions(-)
>   create mode 100644 xen/common/numa.c
> 
> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> index d23f4f7919..a6211be121 100644
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -29,14 +29,6 @@ custom_param("numa", numa_setup);
>   /* from proto.h */
>   #define round_up(x,y) ((((x)+(y))-1) & (~((y)-1)))
>   
> -struct node_data node_data[MAX_NUMNODES];
> -
> -/* Mapping from pdx to node id */
> -int memnode_shift;
> -static typeof(*memnodemap) _memnodemap[64];
> -unsigned long memnodemapsize;
> -u8 *memnodemap;
> -
>   nodeid_t cpu_to_node[NR_CPUS] __read_mostly = {
>       [0 ... NR_CPUS-1] = NUMA_NO_NODE
>   };
> @@ -58,112 +50,6 @@ int srat_disabled(void)
>       return numa_off || acpi_numa < 0;
>   }
>   
> -/*
> - * Given a shift value, try to populate memnodemap[]
> - * Returns :
> - * 1 if OK
> - * 0 if memnodmap[] too small (of shift too small)
> - * -1 if node overlap or lost ram (shift too big)
> - */
> -static int __init populate_memnodemap(const struct node *nodes,
> -                                      int numnodes, int shift, nodeid_t *nodeids)
> -{
> -    unsigned long spdx, epdx;
> -    int i, res = -1;
> -
> -    memset(memnodemap, NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap));
> -    for ( i = 0; i < numnodes; i++ )
> -    {
> -        spdx = paddr_to_pdx(nodes[i].start);
> -        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
> -        if ( spdx >= epdx )
> -            continue;
> -        if ( (epdx >> shift) >= memnodemapsize )
> -            return 0;
> -        do {
> -            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
> -                return -1;
> -
> -            if ( !nodeids )
> -                memnodemap[spdx >> shift] = i;
> -            else
> -                memnodemap[spdx >> shift] = nodeids[i];
> -
> -            spdx += (1UL << shift);
> -        } while ( spdx < epdx );
> -        res = 1;
> -    }
> -
> -    return res;
> -}
> -
> -static int __init allocate_cachealigned_memnodemap(void)
> -{
> -    unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap));
> -    unsigned long mfn = mfn_x(alloc_boot_pages(size, 1));
> -
> -    memnodemap = mfn_to_virt(mfn);
> -    mfn <<= PAGE_SHIFT;
> -    size <<= PAGE_SHIFT;
> -    printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
> -           mfn, mfn + size);
> -    memnodemapsize = size / sizeof(*memnodemap);
> -
> -    return 0;
> -}
> -
> -/*
> - * The LSB of all start and end addresses in the node map is the value of the
> - * maximum possible shift.
> - */
> -static int __init extract_lsb_from_nodes(const struct node *nodes,
> -                                         int numnodes)
> -{
> -    int i, nodes_used = 0;
> -    unsigned long spdx, epdx;
> -    unsigned long bitfield = 0, memtop = 0;
> -
> -    for ( i = 0; i < numnodes; i++ )
> -    {
> -        spdx = paddr_to_pdx(nodes[i].start);
> -        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
> -        if ( spdx >= epdx )
> -            continue;
> -        bitfield |= spdx;
> -        nodes_used++;
> -        if ( epdx > memtop )
> -            memtop = epdx;
> -    }
> -    if ( nodes_used <= 1 )
> -        i = BITS_PER_LONG - 1;
> -    else
> -        i = find_first_bit(&bitfield, sizeof(unsigned long)*8);
> -    memnodemapsize = (memtop >> i) + 1;
> -    return i;
> -}
> -
> -int __init compute_hash_shift(struct node *nodes, int numnodes,
> -                              nodeid_t *nodeids)
> -{
> -    int shift;
> -
> -    shift = extract_lsb_from_nodes(nodes, numnodes);
> -    if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) )
> -        memnodemap = _memnodemap;
> -    else if ( allocate_cachealigned_memnodemap() )
> -        return -1;
> -    printk(KERN_DEBUG "NUMA: Using %d for the hash shift.\n", shift);
> -
> -    if ( populate_memnodemap(nodes, numnodes, shift, nodeids) != 1 )
> -    {
> -        printk(KERN_INFO "Your memory is not aligned you need to "
> -               "rebuild your hypervisor with a bigger NODEMAPSIZE "
> -               "shift=%d\n", shift);
> -        return -1;
> -    }
> -
> -    return shift;
> -}
>   /* initialize NODE_DATA given nodeid and start/end */
>   void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
>   {
> diff --git a/xen/common/Makefile b/xen/common/Makefile
> index 54de70d422..f8f667e90a 100644
> --- a/xen/common/Makefile
> +++ b/xen/common/Makefile
> @@ -54,6 +54,7 @@ obj-y += wait.o
>   obj-bin-y += warning.init.o
>   obj-$(CONFIG_XENOPROF) += xenoprof.o
>   obj-y += xmalloc_tlsf.o
> +obj-$(CONFIG_NUMA) += numa.o

AFAICT, the Makefile lists the files in alphabetical order. So 
please add numa.o in the correct position.

>   
>   obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma lzo unlzo unlz4 unzstd earlycpio,$(n).init.o)
>   
> diff --git a/xen/common/numa.c b/xen/common/numa.c
> new file mode 100644
> index 0000000000..e65b6a6676
> --- /dev/null
> +++ b/xen/common/numa.c
> @@ -0,0 +1,131 @@
> +/*
> + * Generic VM initialization for x86-64 NUMA setups.
> + * Copyright 2002,2003 Andi Kleen, SuSE Labs.
> + * Adapted for Xen: Ryan Harper <ryanh@us.ibm.com>
> + */
> +
> +#include <xen/mm.h>
> +#include <xen/string.h>
> +#include <xen/init.h>
> +#include <xen/ctype.h>

You don't seem to use any helpers/types directly defined by at least 
this header...

> +#include <xen/nodemask.h>
> +#include <xen/numa.h>
> +#include <xen/time.h>

... this one and ...

> +#include <xen/smp.h>

... this one. Can you check the list of headers and include only the 
minimum? If the dependency is required by another header, then I think 
that dependency should be moved into the header requiring it.

> +#include <xen/pfn.h>
> +#include <xen/sched.h>

Please sort the includes in alphabetical order.

> +
> +struct node_data node_data[MAX_NUMNODES];
> +
> +/* Mapping from pdx to node id */
> +int memnode_shift;
> +typeof(*memnodemap) _memnodemap[64];
> +unsigned long memnodemapsize;
> +u8 *memnodemap;
> +
> +/*
> + * Given a shift value, try to populate memnodemap[]
> + * Returns :
> + * 1 if OK
> + * 0 if memnodmap[] too small (of shift too small)
> + * -1 if node overlap or lost ram (shift too big)
> + */
> +static int __init populate_memnodemap(const struct node *nodes,
> +                                      int numnodes, int shift, nodeid_t *nodeids)
> +{
> +    unsigned long spdx, epdx;
> +    int i, res = -1;
> +
> +    memset(memnodemap, NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap));
> +    for ( i = 0; i < numnodes; i++ )
> +    {
> +        spdx = paddr_to_pdx(nodes[i].start);
> +        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
> +        if ( spdx >= epdx )
> +            continue;
> +        if ( (epdx >> shift) >= memnodemapsize )
> +            return 0;
> +        do {
> +            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
> +                return -1;
> +
> +            if ( !nodeids )
> +                memnodemap[spdx >> shift] = i;
> +            else
> +                memnodemap[spdx >> shift] = nodeids[i];
> +
> +            spdx += (1UL << shift);
> +        } while ( spdx < epdx );
> +        res = 1;
> +    }
> +
> +    return res;
> +}
> +
> +static int __init allocate_cachealigned_memnodemap(void)
> +{
> +    unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap));
> +    unsigned long mfn = mfn_x(alloc_boot_pages(size, 1));
> +
> +    memnodemap = mfn_to_virt(mfn);
> +    mfn <<= PAGE_SHIFT;
> +    size <<= PAGE_SHIFT;
> +    printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
> +           mfn, mfn + size);
> +    memnodemapsize = size / sizeof(*memnodemap);
> +
> +    return 0;
> +}
> +
> +/*
> + * The LSB of all start and end addresses in the node map is the value of the
> + * maximum possible shift.
> + */
> +static int __init extract_lsb_from_nodes(const struct node *nodes,
> +                                         int numnodes)
> +{
> +    int i, nodes_used = 0;
> +    unsigned long spdx, epdx;
> +    unsigned long bitfield = 0, memtop = 0;
> +
> +    for ( i = 0; i < numnodes; i++ )
> +    {
> +        spdx = paddr_to_pdx(nodes[i].start);
> +        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
> +        if ( spdx >= epdx )
> +            continue;
> +        bitfield |= spdx;
> +        nodes_used++;
> +        if ( epdx > memtop )
> +            memtop = epdx;
> +    }
> +    if ( nodes_used <= 1 )
> +        i = BITS_PER_LONG - 1;
> +    else
> +        i = find_first_bit(&bitfield, sizeof(unsigned long)*8);
> +    memnodemapsize = (memtop >> i) + 1;
> +    return i;
> +}
> +
> +int __init compute_hash_shift(struct node *nodes, int numnodes,
> +                              nodeid_t *nodeids)
> +{
> +    int shift;
> +
> +    shift = extract_lsb_from_nodes(nodes, numnodes);
> +    if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) )
> +        memnodemap = _memnodemap;
> +    else if ( allocate_cachealigned_memnodemap() )
> +        return -1;
> +    printk(KERN_DEBUG "NUMA: Using %d for the hash shift.\n", shift);
> +
> +    if ( populate_memnodemap(nodes, numnodes, shift, nodeids) != 1 )
> +    {
> +        printk(KERN_INFO "Your memory is not aligned you need to "
> +               "rebuild your hypervisor with a bigger NODEMAPSIZE "
> +               "shift=%d\n", shift);
> +        return -1;
> +    }
> +
> +    return shift;
> +}
> diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> index bada2c0bb9..abe5617d01 100644
> --- a/xen/include/asm-x86/numa.h
> +++ b/xen/include/asm-x86/numa.h
> @@ -26,7 +26,6 @@ extern int compute_hash_shift(struct node *nodes, int numnodes,
>   extern nodeid_t pxm_to_node(unsigned int pxm);
>   
>   #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
> -#define VIRTUAL_BUG_ON(x)
>   
>   extern void numa_add_cpu(int cpu);
>   extern void numa_init_array(void);
> @@ -47,34 +46,6 @@ static inline void clear_node_cpumask(int cpu)
>   	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
>   }
>   
> -/* Simple perfect hash to map pdx to node numbers */
> -extern int memnode_shift;
> -extern unsigned long memnodemapsize;
> -extern u8 *memnodemap;
> -
> -struct node_data {
> -    unsigned long node_start_pfn;
> -    unsigned long node_spanned_pages;
> -};
> -
> -extern struct node_data node_data[];
> -
> -static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
> -{
> -	nodeid_t nid;
> -	VIRTUAL_BUG_ON((paddr_to_pdx(addr) >> memnode_shift) >= memnodemapsize);
> -	nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
> -	VIRTUAL_BUG_ON(nid >= MAX_NUMNODES || !node_data[nid]);
> -	return nid;
> -}
> -
> -#define NODE_DATA(nid)		(&(node_data[nid]))
> -
> -#define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
> -#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
> -#define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
> -				 NODE_DATA(nid)->node_spanned_pages)
> -
>   extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
>   
>   void srat_parse_regions(u64 addr);
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index 7aef1a88dc..39e8a4e00a 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -18,4 +18,39 @@
>     (((d)->vcpu != NULL && (d)->vcpu[0] != NULL) \
>      ? vcpu_to_node((d)->vcpu[0]) : NUMA_NO_NODE)
>   
> +/* The following content can be used when NUMA feature is enabled */
> +#if defined(CONFIG_NUMA)

Please use #ifdef CONFIG_NUMA

> +
> +/* Simple perfect hash to map pdx to node numbers */
> +extern int memnode_shift;
> +extern unsigned long memnodemapsize;
> +extern u8 *memnodemap;
> +extern typeof(*memnodemap) _memnodemap[64];

AFAICT, this will be turned static again in a follow-up patch. Can
this be avoided?

> +
> +struct node_data {
> +    unsigned long node_start_pfn;
> +    unsigned long node_spanned_pages;
> +};
> +
> +extern struct node_data node_data[];
> +#define VIRTUAL_BUG_ON(x)
> +
> +static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
> +{
> +	nodeid_t nid;
> +	VIRTUAL_BUG_ON((paddr_to_pdx(addr) >> memnode_shift) >= memnodemapsize);
> +	nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
> +	VIRTUAL_BUG_ON(nid >= MAX_NUMNODES || !node_data[nid]);
> +	return nid;
> +}
> +
> +#define NODE_DATA(nid)		(&(node_data[nid]))
> +
> +#define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
> +#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
> +#define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
> +				 NODE_DATA(nid)->node_spanned_pages)
> +
> +#endif /* CONFIG_NUMA */
> +
>   #endif /* _XEN_NUMA_H */
> 

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 09/40] xen/x86: Move numa_add_cpu_node to common
  2021-08-11 10:23 ` [XEN RFC PATCH 09/40] xen/x86: Move numa_add_cpu_node " Wei Chen
@ 2021-08-23 17:54   ` Julien Grall
  2021-08-24  4:18     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-23 17:54 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> This function will be reused by Arm later, so we move it
> from arch/x86 to common. But we keep cpu_to_node and
> node_to_cpumask to x86 header file. Because cpu_to_node and
> node_to_cpumask have different implementation for x86 and Arm.
> We will move them to common header file when we change the Arm
> implementation in later patches.

AFAICT, the Arm helpers are gated by !CONFIG_NUMA and the ones in common
code will be gated by CONFIG_NUMA. So I don't quite understand why
they can't be moved now. Can you clarify?
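
To illustrate the gating I have in mind, a rough standalone sketch follows.
The demo values, the stub macro body and demo_node_of() are assumptions made
for the example, not the actual Arm or x86 definitions.

```c
#include <stdint.h>

typedef uint8_t nodeid_t;
#define NR_CPUS      4      /* demo value */
#define NUMA_NO_NODE 0xFF

#ifdef CONFIG_NUMA
/* Common definition, shared by every architecture that selects NUMA. */
static nodeid_t cpu_to_node[NR_CPUS] = {
    NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE
};
#define cpu_to_node(cpu) (cpu_to_node[cpu])
#else
/* Fake single-node fallback, in the spirit of the current Arm stubs. */
#define cpu_to_node(cpu) ((void)(cpu), (nodeid_t)0)
#endif

/* Callers look the same under either configuration. */
static nodeid_t demo_node_of(unsigned int cpu)
{
    return cpu_to_node(cpu);
}
```

Because the two definitions never coexist in one build, the common one can
live in the common header from the start.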

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 10/40] xen/x86: Move NR_NODE_MEMBLKS macro to common
  2021-08-11 10:23 ` [XEN RFC PATCH 10/40] xen/x86: Move NR_NODE_MEMBLKS macro " Wei Chen
@ 2021-08-23 17:58   ` Julien Grall
  0 siblings, 0 replies; 196+ messages in thread
From: Julien Grall @ 2021-08-23 17:58 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> Not only x86 ACPI need this macro. Device tree based NUMA
> also needs this macro to present max memory block number.

AFAICT, a memory range described in DT cannot be split across multiple
nodes. So I think we want to define NR_NODE_MEMBLKS as NR_MEM_BANKS.

> So we move it from x86 ACPI header file to common NUMA
> header file.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/include/asm-x86/acpi.h | 1 -
>   xen/include/xen/numa.h     | 1 +
>   2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/include/asm-x86/acpi.h b/xen/include/asm-x86/acpi.h
> index 7032f3a001..d347500a3c 100644
> --- a/xen/include/asm-x86/acpi.h
> +++ b/xen/include/asm-x86/acpi.h
> @@ -103,7 +103,6 @@ extern unsigned long acpi_wakeup_address;
>   
>   extern s8 acpi_numa;
>   extern int acpi_scan_nodes(u64 start, u64 end);
> -#define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
>   
>   extern struct acpi_sleep_info acpi_sinfo;
>   #define acpi_video_flags bootsym(video_flags)
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index f9769cba4b..5af74b357f 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -11,6 +11,7 @@
>   #define NUMA_NO_DISTANCE 0xFF
>   
>   #define MAX_NUMNODES    (1 << NODES_SHIFT)
> +#define NR_NODE_MEMBLKS (MAX_NUMNODES*2)
>   
>   #define vcpu_to_node(v) (cpu_to_node((v)->processor))
>   
> 

Cheers,

-- 
Julien Grall



* RE: [XEN RFC PATCH 08/40] xen/x86: Move NUMA memory node map functions to common
  2021-08-23 17:47   ` Julien Grall
@ 2021-08-24  4:07     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-24  4:07 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 24 August 2021 1:47
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 08/40] xen/x86: Move NUMA memory node map
> functions to common
> 
> Hi Wei,
> 
> On 11/08/2021 11:23, Wei Chen wrote:
> > In the later patches we will add NUMA support to Arm. Arm
> > NUMA support will follow current memory node map management
> > as x86. So this part of code can be common, in this case,
> > we move this part of code from arch/x86 to common.
> 
> I would add "No functional changes intended" to make clear this patch is
> only moving code.

Ok, I will do it.

> 
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/x86/numa.c        | 114 --------------------------------
> >   xen/common/Makefile        |   1 +
> >   xen/common/numa.c          | 131 +++++++++++++++++++++++++++++++++++++
> >   xen/include/asm-x86/numa.h |  29 --------
> >   xen/include/xen/numa.h     |  35 ++++++++++
> >   5 files changed, 167 insertions(+), 143 deletions(-)
> >   create mode 100644 xen/common/numa.c
> >
> > diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> > index d23f4f7919..a6211be121 100644
> > --- a/xen/arch/x86/numa.c
> > +++ b/xen/arch/x86/numa.c
> > @@ -29,14 +29,6 @@ custom_param("numa", numa_setup);
> >   /* from proto.h */
> >   #define round_up(x,y) ((((x)+(y))-1) & (~((y)-1)))
> >
> > -struct node_data node_data[MAX_NUMNODES];
> > -
> > -/* Mapping from pdx to node id */
> > -int memnode_shift;
> > -static typeof(*memnodemap) _memnodemap[64];
> > -unsigned long memnodemapsize;
> > -u8 *memnodemap;
> > -
> >   nodeid_t cpu_to_node[NR_CPUS] __read_mostly = {
> >       [0 ... NR_CPUS-1] = NUMA_NO_NODE
> >   };
> > @@ -58,112 +50,6 @@ int srat_disabled(void)
> >       return numa_off || acpi_numa < 0;
> >   }
> >
> > -/*
> > - * Given a shift value, try to populate memnodemap[]
> > - * Returns :
> > - * 1 if OK
> > - * 0 if memnodmap[] too small (of shift too small)
> > - * -1 if node overlap or lost ram (shift too big)
> > - */
> > -static int __init populate_memnodemap(const struct node *nodes,
> > -                                      int numnodes, int shift, nodeid_t *nodeids)
> > -{
> > -    unsigned long spdx, epdx;
> > -    int i, res = -1;
> > -
> > -    memset(memnodemap, NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap));
> > -    for ( i = 0; i < numnodes; i++ )
> > -    {
> > -        spdx = paddr_to_pdx(nodes[i].start);
> > -        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
> > -        if ( spdx >= epdx )
> > -            continue;
> > -        if ( (epdx >> shift) >= memnodemapsize )
> > -            return 0;
> > -        do {
> > -            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
> > -                return -1;
> > -
> > -            if ( !nodeids )
> > -                memnodemap[spdx >> shift] = i;
> > -            else
> > -                memnodemap[spdx >> shift] = nodeids[i];
> > -
> > -            spdx += (1UL << shift);
> > -        } while ( spdx < epdx );
> > -        res = 1;
> > -    }
> > -
> > -    return res;
> > -}
> > -
> > -static int __init allocate_cachealigned_memnodemap(void)
> > -{
> > -    unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap));
> > -    unsigned long mfn = mfn_x(alloc_boot_pages(size, 1));
> > -
> > -    memnodemap = mfn_to_virt(mfn);
> > -    mfn <<= PAGE_SHIFT;
> > -    size <<= PAGE_SHIFT;
> > -    printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
> > -           mfn, mfn + size);
> > -    memnodemapsize = size / sizeof(*memnodemap);
> > -
> > -    return 0;
> > -}
> > -
> > -/*
> > - * The LSB of all start and end addresses in the node map is the value of the
> > - * maximum possible shift.
> > - */
> > -static int __init extract_lsb_from_nodes(const struct node *nodes,
> > -                                         int numnodes)
> > -{
> > -    int i, nodes_used = 0;
> > -    unsigned long spdx, epdx;
> > -    unsigned long bitfield = 0, memtop = 0;
> > -
> > -    for ( i = 0; i < numnodes; i++ )
> > -    {
> > -        spdx = paddr_to_pdx(nodes[i].start);
> > -        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
> > -        if ( spdx >= epdx )
> > -            continue;
> > -        bitfield |= spdx;
> > -        nodes_used++;
> > -        if ( epdx > memtop )
> > -            memtop = epdx;
> > -    }
> > -    if ( nodes_used <= 1 )
> > -        i = BITS_PER_LONG - 1;
> > -    else
> > -        i = find_first_bit(&bitfield, sizeof(unsigned long)*8);
> > -    memnodemapsize = (memtop >> i) + 1;
> > -    return i;
> > -}
> > -
> > -int __init compute_hash_shift(struct node *nodes, int numnodes,
> > -                              nodeid_t *nodeids)
> > -{
> > -    int shift;
> > -
> > -    shift = extract_lsb_from_nodes(nodes, numnodes);
> > -    if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) )
> > -        memnodemap = _memnodemap;
> > -    else if ( allocate_cachealigned_memnodemap() )
> > -        return -1;
> > -    printk(KERN_DEBUG "NUMA: Using %d for the hash shift.\n", shift);
> > -
> > -    if ( populate_memnodemap(nodes, numnodes, shift, nodeids) != 1 )
> > -    {
> > -        printk(KERN_INFO "Your memory is not aligned you need to "
> > -               "rebuild your hypervisor with a bigger NODEMAPSIZE "
> > -               "shift=%d\n", shift);
> > -        return -1;
> > -    }
> > -
> > -    return shift;
> > -}
> >   /* initialize NODE_DATA given nodeid and start/end */
> >   void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
> >   {
> > diff --git a/xen/common/Makefile b/xen/common/Makefile
> > index 54de70d422..f8f667e90a 100644
> > --- a/xen/common/Makefile
> > +++ b/xen/common/Makefile
> > @@ -54,6 +54,7 @@ obj-y += wait.o
> >   obj-bin-y += warning.init.o
> >   obj-$(CONFIG_XENOPROF) += xenoprof.o
> >   obj-y += xmalloc_tlsf.o
> > +obj-$(CONFIG_NUMA) += numa.o
> 
> AFAICT, the Makefile is listing the file in alphabetical order. So
> please add numa.o in the correct position.
> 

Thanks for the reminder, I will fix it.

> >
> >   obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma lzo unlzo unlz4 unzstd earlycpio,$(n).init.o)
> >
> > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > new file mode 100644
> > index 0000000000..e65b6a6676
> > --- /dev/null
> > +++ b/xen/common/numa.c
> > @@ -0,0 +1,131 @@
> > +/*
> > + * Generic VM initialization for x86-64 NUMA setups.
> > + * Copyright 2002,2003 Andi Kleen, SuSE Labs.
> > + * Adapted for Xen: Ryan Harper <ryanh@us.ibm.com>
> > + */
> > +
> > +#include <xen/mm.h>
> > +#include <xen/string.h>
> > +#include <xen/init.h>
> > +#include <xen/ctype.h>
> 
> You don't seem to use any helpers/types directly defined by at least
> this header...
> 
> > +#include <xen/nodemask.h>
> > +#include <xen/numa.h>
> > +#include <xen/time.h>
> 
> ... this one and ...
> 
> > +#include <xen/smp.h>
> 
> ... this one. Can you check the list of headers and introduce the
> minimum? If the dependency is required by another headers, then I think
> that dependency should be moved in the header requiring it.
> 

I will check it in the next version. If it isn't needed, I will remove it.

> > +#include <xen/pfn.h>
> > +#include <xen/sched.h>
> 
> Please sort the includes in alphabetical order.
> 

OK

> > +
> > +struct node_data node_data[MAX_NUMNODES];
> > +
> > +/* Mapping from pdx to node id */
> > +int memnode_shift;
> > +typeof(*memnodemap) _memnodemap[64];
> > +unsigned long memnodemapsize;
> > +u8 *memnodemap;
> > +
> > +/*
> > + * Given a shift value, try to populate memnodemap[]
> > + * Returns :
> > + * 1 if OK
> > + * 0 if memnodmap[] too small (of shift too small)
> > + * -1 if node overlap or lost ram (shift too big)
> > + */
> > +static int __init populate_memnodemap(const struct node *nodes,
> > +                                      int numnodes, int shift, nodeid_t *nodeids)
> > +{
> > +    unsigned long spdx, epdx;
> > +    int i, res = -1;
> > +
> > +    memset(memnodemap, NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap));
> > +    for ( i = 0; i < numnodes; i++ )
> > +    {
> > +        spdx = paddr_to_pdx(nodes[i].start);
> > +        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
> > +        if ( spdx >= epdx )
> > +            continue;
> > +        if ( (epdx >> shift) >= memnodemapsize )
> > +            return 0;
> > +        do {
> > +            if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
> > +                return -1;
> > +
> > +            if ( !nodeids )
> > +                memnodemap[spdx >> shift] = i;
> > +            else
> > +                memnodemap[spdx >> shift] = nodeids[i];
> > +
> > +            spdx += (1UL << shift);
> > +        } while ( spdx < epdx );
> > +        res = 1;
> > +    }
> > +
> > +    return res;
> > +}
> > +
> > +static int __init allocate_cachealigned_memnodemap(void)
> > +{
> > +    unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap));
> > +    unsigned long mfn = mfn_x(alloc_boot_pages(size, 1));
> > +
> > +    memnodemap = mfn_to_virt(mfn);
> > +    mfn <<= PAGE_SHIFT;
> > +    size <<= PAGE_SHIFT;
> > +    printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
> > +           mfn, mfn + size);
> > +    memnodemapsize = size / sizeof(*memnodemap);
> > +
> > +    return 0;
> > +}
> > +
> > +/*
> > + * The LSB of all start and end addresses in the node map is the value of the
> > + * maximum possible shift.
> > + */
> > +static int __init extract_lsb_from_nodes(const struct node *nodes,
> > +                                         int numnodes)
> > +{
> > +    int i, nodes_used = 0;
> > +    unsigned long spdx, epdx;
> > +    unsigned long bitfield = 0, memtop = 0;
> > +
> > +    for ( i = 0; i < numnodes; i++ )
> > +    {
> > +        spdx = paddr_to_pdx(nodes[i].start);
> > +        epdx = paddr_to_pdx(nodes[i].end - 1) + 1;
> > +        if ( spdx >= epdx )
> > +            continue;
> > +        bitfield |= spdx;
> > +        nodes_used++;
> > +        if ( epdx > memtop )
> > +            memtop = epdx;
> > +    }
> > +    if ( nodes_used <= 1 )
> > +        i = BITS_PER_LONG - 1;
> > +    else
> > +        i = find_first_bit(&bitfield, sizeof(unsigned long)*8);
> > +    memnodemapsize = (memtop >> i) + 1;
> > +    return i;
> > +}
> > +
> > +int __init compute_hash_shift(struct node *nodes, int numnodes,
> > +                              nodeid_t *nodeids)
> > +{
> > +    int shift;
> > +
> > +    shift = extract_lsb_from_nodes(nodes, numnodes);
> > +    if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) )
> > +        memnodemap = _memnodemap;
> > +    else if ( allocate_cachealigned_memnodemap() )
> > +        return -1;
> > +    printk(KERN_DEBUG "NUMA: Using %d for the hash shift.\n", shift);
> > +
> > +    if ( populate_memnodemap(nodes, numnodes, shift, nodeids) != 1 )
> > +    {
> > +        printk(KERN_INFO "Your memory is not aligned you need to "
> > +               "rebuild your hypervisor with a bigger NODEMAPSIZE "
> > +               "shift=%d\n", shift);
> > +        return -1;
> > +    }
> > +
> > +    return shift;
> > +}
> > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> > index bada2c0bb9..abe5617d01 100644
> > --- a/xen/include/asm-x86/numa.h
> > +++ b/xen/include/asm-x86/numa.h
> > @@ -26,7 +26,6 @@ extern int compute_hash_shift(struct node *nodes, int numnodes,
> >   extern nodeid_t pxm_to_node(unsigned int pxm);
> >
> >   #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
> > -#define VIRTUAL_BUG_ON(x)
> >
> >   extern void numa_add_cpu(int cpu);
> >   extern void numa_init_array(void);
> > @@ -47,34 +46,6 @@ static inline void clear_node_cpumask(int cpu)
> >   	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> >   }
> >
> > -/* Simple perfect hash to map pdx to node numbers */
> > -extern int memnode_shift;
> > -extern unsigned long memnodemapsize;
> > -extern u8 *memnodemap;
> > -
> > -struct node_data {
> > -    unsigned long node_start_pfn;
> > -    unsigned long node_spanned_pages;
> > -};
> > -
> > -extern struct node_data node_data[];
> > -
> > -static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
> > -{
> > -	nodeid_t nid;
> > -	VIRTUAL_BUG_ON((paddr_to_pdx(addr) >> memnode_shift) >= memnodemapsize);
> > -	nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
> > -	VIRTUAL_BUG_ON(nid >= MAX_NUMNODES || !node_data[nid]);
> > -	return nid;
> > -}
> > -
> > -#define NODE_DATA(nid)		(&(node_data[nid]))
> > -
> > -#define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
> > -#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
> > -#define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
> > -				 NODE_DATA(nid)->node_spanned_pages)
> > -
> >   extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
> >
> >   void srat_parse_regions(u64 addr);
> > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> > index 7aef1a88dc..39e8a4e00a 100644
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -18,4 +18,39 @@
> >     (((d)->vcpu != NULL && (d)->vcpu[0] != NULL) \
> >      ? vcpu_to_node((d)->vcpu[0]) : NUMA_NO_NODE)
> >
> > +/* The following content can be used when NUMA feature is enabled */
> > +#if defined(CONFIG_NUMA)
> 
> Please use #ifdef CONFIG_NUMA
> 
> > +
> > +/* Simple perfect hash to map pdx to node numbers */
> > +extern int memnode_shift;
> > +extern unsigned long memnodemapsize;
> > +extern u8 *memnodemap;
> > +extern typeof(*memnodemap) _memnodemap[64];
> 
> AFAICT, this will be turned static again in a follow-up patch. Can
> this be avoided?
> 

I will try it in the next version.

> > +
> > +struct node_data {
> > +    unsigned long node_start_pfn;
> > +    unsigned long node_spanned_pages;
> > +};
> > +
> > +extern struct node_data node_data[];
> > +#define VIRTUAL_BUG_ON(x)
> > +
> > +static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
> > +{
> > +	nodeid_t nid;
> > +	VIRTUAL_BUG_ON((paddr_to_pdx(addr) >> memnode_shift) >= memnodemapsize);
> > +	nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift];
> > +	VIRTUAL_BUG_ON(nid >= MAX_NUMNODES || !node_data[nid]);
> > +	return nid;
> > +}
> > +
> > +#define NODE_DATA(nid)		(&(node_data[nid]))
> > +
> > +#define node_start_pfn(nid)	(NODE_DATA(nid)->node_start_pfn)
> > +#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
> > +#define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
> > +				 NODE_DATA(nid)->node_spanned_pages)
> > +
> > +#endif /* CONFIG_NUMA */
> > +
> >   #endif /* _XEN_NUMA_H */
> >
> 
> Cheers,
> 
> --
> Julien Grall


* RE: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-23 10:59       ` Julien Grall
@ 2021-08-24  4:09         ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-24  4:09 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 23 August 2021 18:59
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse
> device tree processor node
> 
> 
> 
> On 23/08/2021 09:47, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 20 August 2021 2:11
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse
> >> device tree processor node
> >>
> >> On 11/08/2021 11:24, Wei Chen wrote:
> >>> Processor NUMA ID information is stored in device tree's processor
> >>> node as "numa-node-id". We need a new helper to parse this ID from
> >>> processor node. If we get this ID from processor node, this ID's
> >>> validity still need to be checked. Once we got a invalid NUMA ID
> >>> from any processor node, the device tree will be marked as NUMA
> >>> information invalid.
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> ---
> >>>    xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
> >>>    1 file changed, 39 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> >>> index 1c74ad135d..37cc56acf3 100644
> >>> --- a/xen/arch/arm/numa_device_tree.c
> >>> +++ b/xen/arch/arm/numa_device_tree.c
> >>> @@ -20,16 +20,53 @@
> >>>    #include <xen/init.h>
> >>>    #include <xen/nodemask.h>
> >>>    #include <xen/numa.h>
> >>> +#include <xen/device_tree.h>
> >>> +#include <asm/setup.h>
> >>>
> >>>    s8 device_tree_numa = 0;
> >>> +static nodemask_t processor_nodes_parsed __initdata;
> >>>
> >>> -int srat_disabled(void)
> >>> +static int srat_disabled(void)
> >>>    {
> >>>        return numa_off || device_tree_numa < 0;
> >>>    }
> >>>
> >>> -void __init bad_srat(void)
> >>> +static __init void bad_srat(void)
> >>>    {
> >>>        printk(KERN_ERR "DT: NUMA information is not used.\n");
> >>>        device_tree_numa = -1;
> >>>    }
> >>> +
> >>> +/* Callback for device tree processor affinity */
> >>> +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> >>
> >> I forgot to answer. It seems odd that some of the function names start
> >> with dtb_* while other starts device_tree_*. Any particular reason for
> >> that difference of naming?
> >>
> >
> > Yes. In the very beginning, I wanted to keep the device_tree_ prefix for
> > functions that handle the dtb file, and use the dtb_ prefix in place of
> > acpi to indicate that a function is the device tree version of the NUMA
> > implementation.
> 
> Thanks for the clarification. The difference between "dtb" and
> "device_tree" is quite subtle: the former refers to the binary while
> the latter refers to the format. Most readers are likely to infer
> they mean the same, so I think this will bring more confusion.
> 

Thanks for the clarification.

> >
> > If that's not the right reason, I will unify all prefixes to device_tree_
> > in the next version. What do you think?
> 
> AFAICT, your parsing functions will always start with
> "device_tree_parse_". I would prefer if the set replacing the ACPI
> helpers start with "device_tree_".
> 
> If you are concerned about the length of the function name, then I would
> suggest to prefix all the functions with "fdt" (We are dealing with the
> flattened DT after all) or "dt".

That makes sense, I will do it.

> 
> Cheers,
> 
> --
> Julien Grall


* RE: [XEN RFC PATCH 09/40] xen/x86: Move numa_add_cpu_node to common
  2021-08-23 17:54   ` Julien Grall
@ 2021-08-24  4:18     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-24  4:18 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, jbeulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 24 August 2021 1:54
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 09/40] xen/x86: Move numa_add_cpu_node to
> common
> 
> Hi Wei,
> 
> On 11/08/2021 11:23, Wei Chen wrote:
> > This function will be reused by Arm later, so we move it
> > from arch/x86 to common. But we keep cpu_to_node and
> > node_to_cpumask to x86 header file. Because cpu_to_node and
> > node_to_cpumask have different implementation for x86 and Arm.
> > We will move them to common header file when we change the Arm
> > implementation in later patches.
> 
> AFAICT, the Arm helpers are gated by !CONFIG_NUMA and the ones in common
> code will be gated by CONFIG_NUMA. So I don't quite understand why
> they can't be moved now. Can you clarify?
> 

Yes, you're right. Now that we have introduced the !CONFIG_NUMA gating,
we can move node_to_cpumask and cpu_to_node in this patch too. I will
fix it in the next version.

> Cheers,
> 
> --
> Julien Grall


* Re: [XEN RFC PATCH 11/40] xen/x86: Move NUMA nodes and memory block ranges to common
  2021-08-11 10:23 ` [XEN RFC PATCH 11/40] xen/x86: Move NUMA nodes and memory block ranges " Wei Chen
@ 2021-08-24 17:40   ` Julien Grall
  2021-08-25  0:57     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-24 17:40 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> These data structures and functions are used to create the
> mapping between node and memory blocks. In device tree based
> NUMA, we will reuse these data structures and functions, so
> we move this part of code from x86 to common.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/x86/srat.c        | 50 -------------------------------------
>   xen/common/numa.c          | 51 ++++++++++++++++++++++++++++++++++++++
>   xen/include/asm-x86/numa.h |  8 ------
>   xen/include/xen/numa.h     | 15 +++++++++++
>   4 files changed, 66 insertions(+), 58 deletions(-)
> 
> diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> index 6b77b98201..6d68b8a614 100644
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -26,7 +26,6 @@ static struct acpi_table_slit *__read_mostly acpi_slit;
>   
>   static nodemask_t memory_nodes_parsed __initdata;
>   static nodemask_t processor_nodes_parsed __initdata;
> -static struct node nodes[MAX_NUMNODES] __initdata;
>   
>   struct pxm2node {
>   	unsigned pxm;
> @@ -37,9 +36,6 @@ static struct pxm2node __read_mostly pxm2node[MAX_NUMNODES] =
>   
>   static unsigned node_to_pxm(nodeid_t n);
>   
> -static int num_node_memblks;
> -static struct node node_memblk_range[NR_NODE_MEMBLKS];
> -static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
>   static __initdata DECLARE_BITMAP(memblk_hotplug, NR_NODE_MEMBLKS);
>   
>   static inline bool node_found(unsigned idx, unsigned pxm)
> @@ -104,52 +100,6 @@ nodeid_t setup_node(unsigned pxm)
>   	return node;
>   }
>   
> -int valid_numa_range(u64 start, u64 end, nodeid_t node)
> -{
> -	int i;
> -
> -	for (i = 0; i < num_node_memblks; i++) {
> -		struct node *nd = &node_memblk_range[i];
> -
> -		if (nd->start <= start && nd->end >= end &&
> -			memblk_nodeid[i] == node)
> -			return 1;
> -	}
> -
> -	return 0;
> -}
> -
> -static __init int conflicting_memblks(u64 start, u64 end)
> -{
> -	int i;
> -
> -	for (i = 0; i < num_node_memblks; i++) {
> -		struct node *nd = &node_memblk_range[i];
> -		if (nd->start == nd->end)
> -			continue;
> -		if (nd->end > start && nd->start < end)
> -			return i;
> -		if (nd->end == end && nd->start == start)
> -			return i;
> -	}
> -	return -1;
> -}
> -
> -static __init void cutoff_node(int i, u64 start, u64 end)
> -{
> -	struct node *nd = &nodes[i];
> -	if (nd->start < start) {
> -		nd->start = start;
> -		if (nd->end < nd->start)
> -			nd->start = nd->end;
> -	}
> -	if (nd->end > end) {
> -		nd->end = end;
> -		if (nd->start > nd->end)
> -			nd->start = nd->end;
> -	}
> -}
> -
>   static __init void bad_srat(void)
>   {
>   	int i;
> diff --git a/xen/common/numa.c b/xen/common/numa.c
> index 9b6f23dfc1..1facc8fe2b 100644
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -29,6 +29,11 @@ nodeid_t cpu_to_node[NR_CPUS] __read_mostly = {
>   
>   cpumask_t node_to_cpumask[MAX_NUMNODES] __read_mostly;
>   
> +struct node nodes[MAX_NUMNODES] __initdata;
> +int num_node_memblks;
> +struct node node_memblk_range[NR_NODE_MEMBLKS];
> +nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
> +
>   /*
>    * Given a shift value, try to populate memnodemap[]
>    * Returns :
> @@ -136,6 +141,52 @@ int __init compute_hash_shift(struct node *nodes, int numnodes,
>       return shift;
>   }
>   
> +int valid_numa_range(u64 start, u64 end, nodeid_t node)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_node_memblks; i++) {
> +		struct node *nd = &node_memblk_range[i];
> +
> +		if (nd->start <= start && nd->end >= end &&
> +			memblk_nodeid[i] == node)
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +int __init conflicting_memblks(u64 start, u64 end)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_node_memblks; i++) {
> +		struct node *nd = &node_memblk_range[i];
> +		if (nd->start == nd->end)
> +			continue;
> +		if (nd->end > start && nd->start < end)
> +			return i;
> +		if (nd->end == end && nd->start == start)
> +			return i;
> +	}
> +	return -1;
> +}
> +
> +void __init cutoff_node(int i, u64 start, u64 end)
> +{
> +	struct node *nd = &nodes[i];
> +	if (nd->start < start) {
> +		nd->start = start;
> +		if (nd->end < nd->start)
> +			nd->start = nd->end;
> +	}
> +	if (nd->end > end) {
> +		nd->end = end;
> +		if (nd->start > nd->end)
> +			nd->start = nd->end;
> +	}
> +}
> +
>   void numa_add_cpu(int cpu)
>   {
>       cpumask_set_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> index 07ff78ea1b..e8a92ad9df 100644
> --- a/xen/include/asm-x86/numa.h
> +++ b/xen/include/asm-x86/numa.h
> @@ -17,12 +17,6 @@ extern cpumask_t     node_to_cpumask[];
>   #define node_to_first_cpu(node)  (__ffs(node_to_cpumask[node]))
>   #define node_to_cpumask(node)    (node_to_cpumask[node])
>   
> -struct node {
> -	u64 start,end;
> -};
> -
> -extern int compute_hash_shift(struct node *nodes, int numnodes,
> -			      nodeid_t *nodeids);
>   extern nodeid_t pxm_to_node(unsigned int pxm);
>   
>   #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
> @@ -45,8 +39,6 @@ static inline void clear_node_cpumask(int cpu)
>   	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
>   }
>   
> -extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
> -
>   void srat_parse_regions(u64 addr);
>   extern u8 __node_distance(nodeid_t a, nodeid_t b);
>   unsigned int arch_get_dma_bitsize(void);
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index 5af74b357f..67b79a73a3 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -54,6 +54,21 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
>   
>   extern void numa_add_cpu(int cpu);
>   
> +struct node {
> +	u64 start,end;
> +};
> +
> +extern struct node nodes[MAX_NUMNODES];
> +extern int num_node_memblks;
> +extern struct node node_memblk_range[NR_NODE_MEMBLKS];
> +extern nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];

I am not overly happy that the 4 variables above are now exported.
Looking at the code, they are (only?) used in arch-specific code for
acpi_numa_memory_affinity_init() and dtb_numa_memory_affinity_init().

The bits touching the variables look quite similar between the two
functions. The main differences seem to be the messages in printk() and
the hotplug bits.

So I think we should attempt to abstract the code. IIRC, we discussed
some of the ways to abstract it when Vijay Kilari attempted to add
support for NUMA (see [1]). It might be worth having a look to see if
you can re-use some of the ideas.
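
As a rough illustration of the kind of abstraction meant here (a sketch
only, not code from this series or from Vijay's): the arrays stay
file-local and both the ACPI and the device tree parsers call one common
helper. The name numa_add_memblk() is hypothetical, the types are
simplified stand-ins for the Xen ones, and rejecting every overlap is a
simplification of what the SRAT code does today.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the Xen definitions, sized for illustration. */
#define NR_NODE_MEMBLKS 8
typedef uint8_t nodeid_t;
struct node { uint64_t start, end; };

/* Kept file-local: callers go through the helper, not the arrays. */
static int num_node_memblks;
static struct node node_memblk_range[NR_NODE_MEMBLKS];
static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];

/* Overlap test modelled on conflicting_memblks(): ranges are half-open. */
static int conflicting_memblks(uint64_t start, uint64_t end)
{
    for (int i = 0; i < num_node_memblks; i++) {
        const struct node *nd = &node_memblk_range[i];

        if (nd->start == nd->end)
            continue;               /* empty block, nothing to overlap */
        if (nd->end > start && nd->start < end)
            return i;
    }
    return -1;
}

/*
 * Hypothetical common entry point: record [start, end) as belonging to
 * 'node'. Returns false if the range is inverted, the table is full, or
 * the range overlaps a known block. The caller (ACPI or DT parser)
 * prints its own firmware-specific message and handles hotplug.
 */
static bool numa_add_memblk(nodeid_t node, uint64_t start, uint64_t end)
{
    if (start > end || num_node_memblks >= NR_NODE_MEMBLKS)
        return false;
    if (conflicting_memblks(start, end) >= 0)
        return false;

    node_memblk_range[num_node_memblks].start = start;
    node_memblk_range[num_node_memblks].end = end;
    memblk_nodeid[num_node_memblks] = node;
    num_node_memblks++;
    return true;
}
```

With something along these lines, only the helper would need declaring
in xen/numa.h and the four variables could remain static.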

> +
> +extern int compute_hash_shift(struct node *nodes, int numnodes,
> +			      nodeid_t *nodeids);
> +extern int conflicting_memblks(u64 start, u64 end);
> +extern void cutoff_node(int i, u64 start, u64 end);
> +extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
> +
>   #endif /* CONFIG_NUMA */
>   
>   #endif /* _XEN_NUMA_H */
> 

[1] 
https://lore.kernel.org/xen-devel/1500378106-2620-1-git-send-email-vijay.kilari@gmail.com/

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 11/40] xen/x86: Move NUMA nodes and memory block ranges to common
  2021-08-24 17:40   ` Julien Grall
@ 2021-08-25  0:57     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-25  0:57 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 25 August 2021 1:41
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 11/40] xen/x86: Move NUMA nodes and memory
> block ranges to common
> 
> Hi Wei,
> 
> On 11/08/2021 11:23, Wei Chen wrote:
> > These data structures and functions are used to create the
> > mapping between node and memory blocks. In device tree based
> > NUMA, we will reuse these data structures and functions, so
> > we move this part of code from x86 to common.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/x86/srat.c        | 50 -------------------------------------
> >   xen/common/numa.c          | 51 ++++++++++++++++++++++++++++++++++++++
> >   xen/include/asm-x86/numa.h |  8 ------
> >   xen/include/xen/numa.h     | 15 +++++++++++
> >   4 files changed, 66 insertions(+), 58 deletions(-)
> >
> > diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> > index 6b77b98201..6d68b8a614 100644
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -26,7 +26,6 @@ static struct acpi_table_slit *__read_mostly acpi_slit;
> >
> >   static nodemask_t memory_nodes_parsed __initdata;
> >   static nodemask_t processor_nodes_parsed __initdata;
> > -static struct node nodes[MAX_NUMNODES] __initdata;
> >
> >   struct pxm2node {
> >   	unsigned pxm;
> > @@ -37,9 +36,6 @@ static struct pxm2node __read_mostly
> pxm2node[MAX_NUMNODES] =
> >
> >   static unsigned node_to_pxm(nodeid_t n);
> >
> > -static int num_node_memblks;
> > -static struct node node_memblk_range[NR_NODE_MEMBLKS];
> > -static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
> >   static __initdata DECLARE_BITMAP(memblk_hotplug, NR_NODE_MEMBLKS);
> >
> >   static inline bool node_found(unsigned idx, unsigned pxm)
> > @@ -104,52 +100,6 @@ nodeid_t setup_node(unsigned pxm)
> >   	return node;
> >   }
> >
> > -int valid_numa_range(u64 start, u64 end, nodeid_t node)
> > -{
> > -	int i;
> > -
> > -	for (i = 0; i < num_node_memblks; i++) {
> > -		struct node *nd = &node_memblk_range[i];
> > -
> > -		if (nd->start <= start && nd->end >= end &&
> > -			memblk_nodeid[i] == node)
> > -			return 1;
> > -	}
> > -
> > -	return 0;
> > -}
> > -
> > -static __init int conflicting_memblks(u64 start, u64 end)
> > -{
> > -	int i;
> > -
> > -	for (i = 0; i < num_node_memblks; i++) {
> > -		struct node *nd = &node_memblk_range[i];
> > -		if (nd->start == nd->end)
> > -			continue;
> > -		if (nd->end > start && nd->start < end)
> > -			return i;
> > -		if (nd->end == end && nd->start == start)
> > -			return i;
> > -	}
> > -	return -1;
> > -}
> > -
> > -static __init void cutoff_node(int i, u64 start, u64 end)
> > -{
> > -	struct node *nd = &nodes[i];
> > -	if (nd->start < start) {
> > -		nd->start = start;
> > -		if (nd->end < nd->start)
> > -			nd->start = nd->end;
> > -	}
> > -	if (nd->end > end) {
> > -		nd->end = end;
> > -		if (nd->start > nd->end)
> > -			nd->start = nd->end;
> > -	}
> > -}
> > -
> >   static __init void bad_srat(void)
> >   {
> >   	int i;
> > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > index 9b6f23dfc1..1facc8fe2b 100644
> > --- a/xen/common/numa.c
> > +++ b/xen/common/numa.c
> > @@ -29,6 +29,11 @@ nodeid_t cpu_to_node[NR_CPUS] __read_mostly = {
> >
> >   cpumask_t node_to_cpumask[MAX_NUMNODES] __read_mostly;
> >
> > +struct node nodes[MAX_NUMNODES] __initdata;
> > +int num_node_memblks;
> > +struct node node_memblk_range[NR_NODE_MEMBLKS];
> > +nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
> > +
> >   /*
> >    * Given a shift value, try to populate memnodemap[]
> >    * Returns :
> > @@ -136,6 +141,52 @@ int __init compute_hash_shift(struct node *nodes,
> int numnodes,
> >       return shift;
> >   }
> >
> > +int valid_numa_range(u64 start, u64 end, nodeid_t node)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < num_node_memblks; i++) {
> > +		struct node *nd = &node_memblk_range[i];
> > +
> > +		if (nd->start <= start && nd->end >= end &&
> > +			memblk_nodeid[i] == node)
> > +			return 1;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +int __init conflicting_memblks(u64 start, u64 end)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < num_node_memblks; i++) {
> > +		struct node *nd = &node_memblk_range[i];
> > +		if (nd->start == nd->end)
> > +			continue;
> > +		if (nd->end > start && nd->start < end)
> > +			return i;
> > +		if (nd->end == end && nd->start == start)
> > +			return i;
> > +	}
> > +	return -1;
> > +}
> > +
> > +void __init cutoff_node(int i, u64 start, u64 end)
> > +{
> > +	struct node *nd = &nodes[i];
> > +	if (nd->start < start) {
> > +		nd->start = start;
> > +		if (nd->end < nd->start)
> > +			nd->start = nd->end;
> > +	}
> > +	if (nd->end > end) {
> > +		nd->end = end;
> > +		if (nd->start > nd->end)
> > +			nd->start = nd->end;
> > +	}
> > +}
> > +
> >   void numa_add_cpu(int cpu)
> >   {
> >       cpumask_set_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> > index 07ff78ea1b..e8a92ad9df 100644
> > --- a/xen/include/asm-x86/numa.h
> > +++ b/xen/include/asm-x86/numa.h
> > @@ -17,12 +17,6 @@ extern cpumask_t     node_to_cpumask[];
> >   #define node_to_first_cpu(node)  (__ffs(node_to_cpumask[node]))
> >   #define node_to_cpumask(node)    (node_to_cpumask[node])
> >
> > -struct node {
> > -	u64 start,end;
> > -};
> > -
> > -extern int compute_hash_shift(struct node *nodes, int numnodes,
> > -			      nodeid_t *nodeids);
> >   extern nodeid_t pxm_to_node(unsigned int pxm);
> >
> >   #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
> > @@ -45,8 +39,6 @@ static inline void clear_node_cpumask(int cpu)
> >   	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> >   }
> >
> > -extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
> > -
> >   void srat_parse_regions(u64 addr);
> >   extern u8 __node_distance(nodeid_t a, nodeid_t b);
> >   unsigned int arch_get_dma_bitsize(void);
> > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> > index 5af74b357f..67b79a73a3 100644
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -54,6 +54,21 @@ static inline __attribute__((pure)) nodeid_t
> phys_to_nid(paddr_t addr)
> >
> >   extern void numa_add_cpu(int cpu);
> >
> > +struct node {
> > +	u64 start,end;
> > +};
> > +
> > +extern struct node nodes[MAX_NUMNODES];
> > +extern int num_node_memblks;
> > +extern struct node node_memblk_range[NR_NODE_MEMBLKS];
> > +extern nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
> 
> I am not overly happy that the 4 variables above are now exported.
> Looking at the code, they are (only?) used in arch-specific code for
> acpi_numa_memory_affinity_init() and dtb_numa_memory_affinity_init().
> 
> The bits touching the variables look quite similar between the two
> functions. The main differences seem to be the messages in printk() and
> the hotplug bits.
> 
> So I think we should attempt to abstract the code. IIRC, we discussed
> some of the ways to abstract it when Vijay Kilari attempted to add
> support for NUMA (see [1]). It might be worth having a look to see if
> you can re-use some of the ideas.

OK, I will look at that thread. If the ideas are useful, I will adopt
them in the next version.

> 
> > +
> > +extern int compute_hash_shift(struct node *nodes, int numnodes,
> > +			      nodeid_t *nodeids);
> > +extern int conflicting_memblks(u64 start, u64 end);
> > +extern void cutoff_node(int i, u64 start, u64 end);
> > +extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
> > +
> >   #endif /* CONFIG_NUMA */
> >
> >   #endif /* _XEN_NUMA_H */
> >
> 
> [1]
> https://lore.kernel.org/xen-devel/1500378106-2620-1-git-send-email-vijay.kilari@gmail.com/
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 12/40] xen/x86: Move numa_initmem_init to common
  2021-08-11 10:23 ` [XEN RFC PATCH 12/40] xen/x86: Move numa_initmem_init " Wei Chen
@ 2021-08-25 10:21   ` Julien Grall
  2021-08-25 11:15     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 10:21 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> This function can be reused by Arm device tree based
> NUMA support. So we move it from x86 to common, as well
> as its related variables and functions:
> setup_node_bootmem, numa_init_array and numa_emulation.
> 
> As numa_initmem_init has been moved to common, _memnodemap
> is no longer used across files. We can restore _memnodemap
> to static.

As we discussed on a previous patch, we should try to avoid this kind of 
dance. I can help to find a split that would achieve that.

> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/x86/numa.c         | 118 ----------------------------------
>   xen/common/numa.c           | 122 +++++++++++++++++++++++++++++++++++-
>   xen/include/asm-x86/numa.h  |   5 --
>   xen/include/asm-x86/setup.h |   1 -
>   xen/include/xen/numa.h      |   8 ++-
>   5 files changed, 128 insertions(+), 126 deletions(-)
> 
> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> index f2626b3968..6908738305 100644
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -38,7 +38,6 @@ nodeid_t apicid_to_node[MAX_LOCAL_APIC] = {
>   
>   nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
>   
> -bool numa_off;
>   s8 acpi_numa = 0;
>   
>   int srat_disabled(void)
> @@ -46,123 +45,6 @@ int srat_disabled(void)
>       return numa_off || acpi_numa < 0;
>   }
>   
> -/* initialize NODE_DATA given nodeid and start/end */
> -void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
> -{
> -    unsigned long start_pfn, end_pfn;
> -
> -    start_pfn = start >> PAGE_SHIFT;
> -    end_pfn = end >> PAGE_SHIFT;
> -
> -    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
> -    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
> -
> -    node_set_online(nodeid);
> -}
> -
> -void __init numa_init_array(void)
> -{
> -    int rr, i;
> -
> -    /* There are unfortunately some poorly designed mainboards around
> -       that only connect memory to a single CPU. This breaks the 1:1 cpu->node
> -       mapping. To avoid this fill in the mapping for all possible
> -       CPUs, as the number of CPUs is not known yet.
> -       We round robin the existing nodes. */
> -    rr = first_node(node_online_map);
> -    for ( i = 0; i < nr_cpu_ids; i++ )
> -    {
> -        if ( cpu_to_node[i] != NUMA_NO_NODE )
> -            continue;
> -        numa_set_node(i, rr);
> -        rr = cycle_node(rr, node_online_map);
> -    }
> -}
> -
> -#ifdef CONFIG_NUMA_EMU
> -static int numa_fake __initdata = 0;
> -
> -/* Numa emulation */
> -static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
> -{
> -    int i;
> -    struct node nodes[MAX_NUMNODES];
> -    u64 sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake;
> -
> -    /* Kludge needed for the hash function */
> -    if ( hweight64(sz) > 1 )
> -    {
> -        u64 x = 1;
> -        while ( (x << 1) < sz )
> -            x <<= 1;
> -        if ( x < sz/2 )
> -            printk(KERN_ERR "Numa emulation unbalanced. Complain to maintainer\n");
> -        sz = x;
> -    }
> -
> -    memset(&nodes,0,sizeof(nodes));
> -    for ( i = 0; i < numa_fake; i++ )
> -    {
> -        nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz;
> -        if ( i == numa_fake - 1 )
> -            sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
> -        nodes[i].end = nodes[i].start + sz;
> -        printk(KERN_INFO "Faking node %d at %"PRIx64"-%"PRIx64" (%"PRIu64"MB)\n",
> -               i,
> -               nodes[i].start, nodes[i].end,
> -               (nodes[i].end - nodes[i].start) >> 20);
> -        node_set_online(i);
> -    }
> -    memnode_shift = compute_hash_shift(nodes, numa_fake, NULL);
> -    if ( memnode_shift < 0 )
> -    {
> -        memnode_shift = 0;
> -        printk(KERN_ERR "No NUMA hash function found. Emulation disabled.\n");
> -        return -1;
> -    }
> -    for_each_online_node ( i )
> -        setup_node_bootmem(i, nodes[i].start, nodes[i].end);
> -    numa_init_array();
> -
> -    return 0;
> -}
> -#endif
> -
> -void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
> -{
> -    int i;
> -
> -#ifdef CONFIG_NUMA_EMU
> -    if ( numa_fake && !numa_emulation(start_pfn, end_pfn) )
> -        return;
> -#endif
> -
> -#ifdef CONFIG_ACPI_NUMA
> -    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
> -         (u64)end_pfn << PAGE_SHIFT) )
> -        return;
> -#endif
> -
> -    printk(KERN_INFO "%s\n",
> -           numa_off ? "NUMA turned off" : "No NUMA configuration found");
> -
> -    printk(KERN_INFO "Faking a node at %016"PRIx64"-%016"PRIx64"\n",
> -           (u64)start_pfn << PAGE_SHIFT,
> -           (u64)end_pfn << PAGE_SHIFT);
> -    /* setup dummy node covering all memory */
> -    memnode_shift = BITS_PER_LONG - 1;
> -    memnodemap = _memnodemap;
> -    memnodemapsize = ARRAY_SIZE(_memnodemap);
> -
> -    nodes_clear(node_online_map);
> -    node_set_online(0);
> -    for ( i = 0; i < nr_cpu_ids; i++ )
> -        numa_set_node(i, 0);
> -    cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
> -    setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
> -                    (u64)end_pfn << PAGE_SHIFT);
> -}
> -
>   void numa_set_node(int cpu, nodeid_t node)
>   {
>       cpu_to_node[cpu] = node;
> diff --git a/xen/common/numa.c b/xen/common/numa.c
> index 1facc8fe2b..26c0006d04 100644
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -14,12 +14,13 @@
>   #include <xen/smp.h>
>   #include <xen/pfn.h>
>   #include <xen/sched.h>

NIT: We tend to add a newline between <xen/...> headers and <asm/...>
headers.

> +#include <asm/acpi.h>
>   
>   struct node_data node_data[MAX_NUMNODES];
>   
>   /* Mapping from pdx to node id */
>   int memnode_shift;
> -typeof(*memnodemap) _memnodemap[64];
> +static typeof(*memnodemap) _memnodemap[64];
>   unsigned long memnodemapsize;
>   u8 *memnodemap;
>   
> @@ -34,6 +35,8 @@ int num_node_memblks;
>   struct node node_memblk_range[NR_NODE_MEMBLKS];
>   nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
>   
> +bool numa_off;
> +
>   /*
>    * Given a shift value, try to populate memnodemap[]
>    * Returns :
> @@ -191,3 +194,120 @@ void numa_add_cpu(int cpu)
>   {
>       cpumask_set_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
>   }
> +
> +/* initialize NODE_DATA given nodeid and start/end */
> +void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)

From an abstract PoV, start and end should be paddr_t. This should be
done in a separate patch though.

> +{
> +    unsigned long start_pfn, end_pfn;
> +
> +    start_pfn = start >> PAGE_SHIFT;
> +    end_pfn = end >> PAGE_SHIFT;
> +
> +    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
> +    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
> +
> +    node_set_online(nodeid);
> +}
> +
> +void __init numa_init_array(void)
> +{
> +    int rr, i;
> +
> +    /* There are unfortunately some poorly designed mainboards around
> +       that only connect memory to a single CPU. This breaks the 1:1 cpu->node
> +       mapping. To avoid this fill in the mapping for all possible
> +       CPUs, as the number of CPUs is not known yet.
> +       We round robin the existing nodes. */
> +    rr = first_node(node_online_map);
> +    for ( i = 0; i < nr_cpu_ids; i++ )
> +    {
> +        if ( cpu_to_node[i] != NUMA_NO_NODE )
> +            continue;
> +        numa_set_node(i, rr);
> +        rr = cycle_node(rr, node_online_map);
> +    }
> +}
> +
> +#ifdef CONFIG_NUMA_EMU
> +int numa_fake __initdata = 0;
> +
> +/* Numa emulation */
> +static int __init numa_emulation(u64 start_pfn, u64 end_pfn)

Here, this should be either "unsigned long" or ideally "mfn_t". 
Although, if you use "unsigned long", you will need to...

> +{
> +    int i;
> +    struct node nodes[MAX_NUMNODES];
> +    u64 sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake;

... cast "(end_pfn - start_pfn)" to uint64_t or use pfn_to_paddr().
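
To make the concern concrete, here is a small standalone sketch (not Xen
code) of what goes wrong when the shift is done in a 32-bit type, as
"unsigned long" is on Arm32. The ulong32_t typedef and the span_bytes*
helper names are made up for the example; pfn_to_paddr() is modelled on
the Xen macro of the same name, which widens before shifting.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Models a 32-bit "unsigned long", as on Arm32. */
typedef uint32_t ulong32_t;
typedef uint64_t paddr_t;

/* Widen first, then shift, so no high bits are lost. */
static inline paddr_t pfn_to_paddr(ulong32_t pfn)
{
    return (paddr_t)pfn << PAGE_SHIFT;
}

/* Problematic variant: the shift is done in 32 bits and wraps. */
static inline paddr_t span_bytes_truncated(ulong32_t start_pfn,
                                           ulong32_t end_pfn)
{
    return (ulong32_t)((end_pfn - start_pfn) << PAGE_SHIFT);
}

/* Safe variant: the page count is widened before the shift. */
static inline paddr_t span_bytes(ulong32_t start_pfn, ulong32_t end_pfn)
{
    assert(end_pfn >= start_pfn);
    return pfn_to_paddr(end_pfn - start_pfn);
}
```

For an 8GB span (0x200000 pages), span_bytes() gives 0x200000000 while
the truncated variant wraps to 0, which would then be divided by
numa_fake.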

> +
> +    /* Kludge needed for the hash function */
> +    if ( hweight64(sz) > 1 )
> +    {
> +        u64 x = 1;
> +        while ( (x << 1) < sz )
> +            x <<= 1;
> +        if ( x < sz/2 )
> +            printk(KERN_ERR "Numa emulation unbalanced. Complain to maintainer\n");
> +        sz = x;
> +    }
> +
> +    memset(&nodes,0,sizeof(nodes));
> +    for ( i = 0; i < numa_fake; i++ )
> +    {
> +        nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz;
> +        if ( i == numa_fake - 1 )
> +            sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
> +        nodes[i].end = nodes[i].start + sz;
> +        printk(KERN_INFO "Faking node %d at %"PRIx64"-%"PRIx64" (%"PRIu64"MB)\n",
> +               i,
> +               nodes[i].start, nodes[i].end,
> +               (nodes[i].end - nodes[i].start) >> 20);
> +        node_set_online(i);
> +    }
> +    memnode_shift = compute_hash_shift(nodes, numa_fake, NULL);
> +    if ( memnode_shift < 0 )
> +    {
> +        memnode_shift = 0;
> +        printk(KERN_ERR "No NUMA hash function found. Emulation disabled.\n");
> +        return -1;
> +    }
> +    for_each_online_node ( i )
> +        setup_node_bootmem(i, nodes[i].start, nodes[i].end);
> +    numa_init_array();
> +
> +    return 0;
> +}
> +#endif
> +
> +void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
> +{
> +    int i;
> +
> +#ifdef CONFIG_NUMA_EMU
> +    if ( numa_fake && !numa_emulation(start_pfn, end_pfn) )
> +        return;
> +#endif
> +
> +#ifdef CONFIG_ACPI_NUMA
> +    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
> +         (u64)end_pfn << PAGE_SHIFT) )

(u64)v << PAGE_SHIFT should be switched to use pfn_to_paddr() or 
mfn_to_paddr() if you decide to make start_pfn and end_pfn typesafe.

> +        return;
> +#endif
> +
> +    printk(KERN_INFO "%s\n",
> +           numa_off ? "NUMA turned off" : "No NUMA configuration found");
> +
> +    printk(KERN_INFO "Faking a node at %016"PRIx64"-%016"PRIx64"\n",
> +           (u64)start_pfn << PAGE_SHIFT,
> +           (u64)end_pfn << PAGE_SHIFT);

Same remark here. PRIx64 would also have to be switched to PRIpaddr.
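
A minimal sketch of the typesafe direction, assuming start and end are
turned into mfn_t. The definitions below are simplified stand-ins for
the Xen ones (in Xen, mfn_t comes from the TYPE_SAFE machinery and
PRIpaddr is defined per architecture), and format_fake_node() is a
hypothetical helper standing in for the printk() call.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
typedef uint64_t paddr_t;
#define PRIpaddr "016" PRIx64   /* stand-in for the per-arch definition */

/* Wrapping the frame number in a struct stops it mixing with integers. */
typedef struct { unsigned long mfn; } mfn_t;

static inline mfn_t _mfn(unsigned long m) { return (mfn_t){ m }; }
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }

/* Widen to paddr_t before shifting, as pfn_to_paddr() does. */
static inline paddr_t mfn_to_paddr(mfn_t m)
{
    return (paddr_t)mfn_x(m) << PAGE_SHIFT;
}

/* Formats the "Faking a node" message; returns snprintf()'s result. */
static int format_fake_node(char *buf, size_t len, mfn_t start, mfn_t end)
{
    return snprintf(buf, len, "Faking a node at %"PRIpaddr"-%"PRIpaddr,
                    mfn_to_paddr(start), mfn_to_paddr(end));
}
```

Note that the width moves into PRIpaddr, so the format string becomes
%"PRIpaddr" rather than %016"PRIx64".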

> +    /* setup dummy node covering all memory */
> +    memnode_shift = BITS_PER_LONG - 1;
> +    memnodemap = _memnodemap;
> +    memnodemapsize = ARRAY_SIZE(_memnodemap);
> +
> +    nodes_clear(node_online_map);
> +    node_set_online(0);
> +    for ( i = 0; i < nr_cpu_ids; i++ )
> +        numa_set_node(i, 0);
> +    cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
> +    setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
> +                    (u64)end_pfn << PAGE_SHIFT);
> +}
> diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> index e8a92ad9df..f8e4e15586 100644
> --- a/xen/include/asm-x86/numa.h
> +++ b/xen/include/asm-x86/numa.h
> @@ -21,16 +21,11 @@ extern nodeid_t pxm_to_node(unsigned int pxm);
>   
>   #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
>   
> -extern void numa_init_array(void);
> -extern bool numa_off;
> -
> -
>   extern int srat_disabled(void);
>   extern void numa_set_node(int cpu, nodeid_t node);
>   extern nodeid_t setup_node(unsigned int pxm);
>   extern void srat_detect_node(int cpu);
>   
> -extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
>   extern nodeid_t apicid_to_node[];
>   extern void init_cpu_to_node(void);
>   
> diff --git a/xen/include/asm-x86/setup.h b/xen/include/asm-x86/setup.h
> index 24be46115d..63838ba2d1 100644
> --- a/xen/include/asm-x86/setup.h
> +++ b/xen/include/asm-x86/setup.h
> @@ -17,7 +17,6 @@ void early_time_init(void);
>   
>   void set_nr_cpu_ids(unsigned int max_cpus);
>   
> -void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
>   void arch_init_memory(void);
>   void subarch_init_memory(void);
>   
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index 67b79a73a3..258a5cb3db 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -26,7 +26,6 @@
>   extern int memnode_shift;
>   extern unsigned long memnodemapsize;
>   extern u8 *memnodemap;
> -extern typeof(*memnodemap) _memnodemap[64];
>   
>   struct node_data {
>       unsigned long node_start_pfn;
> @@ -69,6 +68,13 @@ extern int conflicting_memblks(u64 start, u64 end);
>   extern void cutoff_node(int i, u64 start, u64 end);
>   extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
>   
> +extern void numa_init_array(void);
> +extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
> +extern bool numa_off;
> +extern int numa_fake;
> +
> +extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
> +
>   #endif /* CONFIG_NUMA */
>   
>   #endif /* _XEN_NUMA_H */
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for Arm
  2021-08-11 10:23 ` [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for Arm Wei Chen
@ 2021-08-25 10:36   ` Julien Grall
  2021-08-25 12:07     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 10:36 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> This API is used to assign a CPU to a NUMA node. If NUMA is
> configured off or NUMA initialization fails, only node#0 is
> set online. This will be done in following patches: when NUMA
> is turned off or init fails, node_online_map will be cleared
> and node#0 set online. So we use node_online_map to prevent
> assigning a CPU to an offline node.

IMHO numa_set_node() should behave exactly the same way on x86 and Arm
because this is going to be used by the common code.

From the commit message, I don't quite understand why the check is
necessary on Arm but not on x86. Can you clarify it?

> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/Makefile      |  1 +
>   xen/arch/arm/numa.c        | 31 +++++++++++++++++++++++++++++++
>   xen/include/asm-arm/numa.h |  2 ++
>   xen/include/asm-x86/numa.h |  1 -
>   xen/include/xen/numa.h     |  1 +
>   5 files changed, 35 insertions(+), 1 deletion(-)
>   create mode 100644 xen/arch/arm/numa.c
> 
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 3d3b97b5b4..6e3fb8033e 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -35,6 +35,7 @@ obj-$(CONFIG_LIVEPATCH) += livepatch.o
>   obj-y += mem_access.o
>   obj-y += mm.o
>   obj-y += monitor.o
> +obj-$(CONFIG_NUMA) += numa.o
>   obj-y += p2m.o
>   obj-y += percpu.o
>   obj-y += platform.o
> diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> new file mode 100644
> index 0000000000..1e30c5bb13
> --- /dev/null
> +++ b/xen/arch/arm/numa.c
> @@ -0,0 +1,31 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Arm Architecture support layer for NUMA.
> + *
> + * Copyright (C) 2021 Arm Ltd
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> + *
> + */
> +#include <xen/init.h>
> +#include <xen/nodemask.h>
> +#include <xen/numa.h>
> +
> +void numa_set_node(int cpu, nodeid_t nid)
> +{
> +    if ( nid >= MAX_NUMNODES ||
> +        !nodemask_test(nid, &node_online_map) )
> +        nid = 0;
> +
> +    cpu_to_node[cpu] = nid;
> +}

I think numa_set_node() should be implemented in common code.

> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> index ab9c4a2448..1162c702df 100644
> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -27,6 +27,8 @@ extern mfn_t first_valid_mfn;
>   #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
>   #define __node_distance(a, b) (20)
>   
> +#define numa_set_node(x, y) do { } while (0)

I would define it in xen/numa.h so other architectures can take
advantage of it. Also, please use a static inline helper so the
arguments are evaluated.
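
To illustrate the difference (a standalone sketch, with the _macro and
_inline suffixes and the next_cpu() helper invented for the example):
with the empty macro the arguments disappear before compilation, so they
are neither type-checked nor evaluated, whereas a static inline stub
keeps both.

```c
#include <stdint.h>

typedef uint8_t nodeid_t;

/* Macro stub as in the patch: both arguments vanish at preprocessing. */
#define numa_set_node_macro(cpu, node) do { } while (0)

/* Inline stub: arguments are type-checked and evaluated exactly once. */
static inline void numa_set_node_inline(int cpu, nodeid_t node)
{
    (void)cpu;
    (void)node;
}

/* Helper with a visible side effect, to observe argument evaluation. */
static int cpus_handed_out;

static int next_cpu(void)
{
    return cpus_handed_out++;
}
```

Calling numa_set_node_macro(next_cpu(), 0) leaves cpus_handed_out
untouched, while numa_set_node_inline(next_cpu(), 0) increments it, so
behaviour would not silently change between NUMA and !NUMA builds.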

> +
>   #endif
>   
>   #endif /* __ARCH_ARM_NUMA_H */
> diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> index f8e4e15586..69859b0a57 100644
> --- a/xen/include/asm-x86/numa.h
> +++ b/xen/include/asm-x86/numa.h
> @@ -22,7 +22,6 @@ extern nodeid_t pxm_to_node(unsigned int pxm);
>   #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
>   
>   extern int srat_disabled(void);
> -extern void numa_set_node(int cpu, nodeid_t node);
>   extern nodeid_t setup_node(unsigned int pxm);
>   extern void srat_detect_node(int cpu);
>   
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index 258a5cb3db..3972aa6b93 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -70,6 +70,7 @@ extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
>   
>   extern void numa_init_array(void);
>   extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
> +extern void numa_set_node(int cpu, nodeid_t node);
>   extern bool numa_off;
>   extern int numa_fake;
>   
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 12/40] xen/x86: Move numa_initmem_init to common
  2021-08-25 10:21   ` Julien Grall
@ 2021-08-25 11:15     ` Wei Chen
  2021-08-25 13:26       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-25 11:15 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 25 August 2021 18:22
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 12/40] xen/x86: Move numa_initmem_init to
> common
> 
> Hi Wei,
> 
> On 11/08/2021 11:23, Wei Chen wrote:
> > This function can be reused by Arm device tree based
> > NUMA support. So we move it from x86 to common, as well
> > as its related variables and functions:
> > setup_node_bootmem, numa_init_array and numa_emulation.
> >
> > As numa_initmem_init has been moved to common, _memnodemap
> > is no longer used across files. We can restore _memnodemap
> > to static.
> 
> As we discussed on a previous patch, we should try to avoid this kind of
> dance. I can help to find a split that would achieve that.
> 

Yes, thanks!

> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/x86/numa.c         | 118 ----------------------------------
> >   xen/common/numa.c           | 122 +++++++++++++++++++++++++++++++++++-
> >   xen/include/asm-x86/numa.h  |   5 --
> >   xen/include/asm-x86/setup.h |   1 -
> >   xen/include/xen/numa.h      |   8 ++-
> >   5 files changed, 128 insertions(+), 126 deletions(-)
> >
> > diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> > index f2626b3968..6908738305 100644
> > --- a/xen/arch/x86/numa.c
> > +++ b/xen/arch/x86/numa.c
> > @@ -38,7 +38,6 @@ nodeid_t apicid_to_node[MAX_LOCAL_APIC] = {
> >
> >   nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
> >
> > -bool numa_off;
> >   s8 acpi_numa = 0;
> >
> >   int srat_disabled(void)
> > @@ -46,123 +45,6 @@ int srat_disabled(void)
> >       return numa_off || acpi_numa < 0;
> >   }
> >
> > -/* initialize NODE_DATA given nodeid and start/end */
> > -void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
> > -{
> > -    unsigned long start_pfn, end_pfn;
> > -
> > -    start_pfn = start >> PAGE_SHIFT;
> > -    end_pfn = end >> PAGE_SHIFT;
> > -
> > -    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
> > -    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
> > -
> > -    node_set_online(nodeid);
> > -}
> > -
> > -void __init numa_init_array(void)
> > -{
> > -    int rr, i;
> > -
> > -    /* There are unfortunately some poorly designed mainboards around
> > -       that only connect memory to a single CPU. This breaks the 1:1
> cpu->node
> > -       mapping. To avoid this fill in the mapping for all possible
> > -       CPUs, as the number of CPUs is not known yet.
> > -       We round robin the existing nodes. */
> > -    rr = first_node(node_online_map);
> > -    for ( i = 0; i < nr_cpu_ids; i++ )
> > -    {
> > -        if ( cpu_to_node[i] != NUMA_NO_NODE )
> > -            continue;
> > -        numa_set_node(i, rr);
> > -        rr = cycle_node(rr, node_online_map);
> > -    }
> > -}
> > -
> > -#ifdef CONFIG_NUMA_EMU
> > -static int numa_fake __initdata = 0;
> > -
> > -/* Numa emulation */
> > -static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
> > -{
> > -    int i;
> > -    struct node nodes[MAX_NUMNODES];
> > -    u64 sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake;
> > -
> > -    /* Kludge needed for the hash function */
> > -    if ( hweight64(sz) > 1 )
> > -    {
> > -        u64 x = 1;
> > -        while ( (x << 1) < sz )
> > -            x <<= 1;
> > -        if ( x < sz/2 )
> > -            printk(KERN_ERR "Numa emulation unbalanced. Complain to
> maintainer\n");
> > -        sz = x;
> > -    }
> > -
> > -    memset(&nodes,0,sizeof(nodes));
> > -    for ( i = 0; i < numa_fake; i++ )
> > -    {
> > -        nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz;
> > -        if ( i == numa_fake - 1 )
> > -            sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
> > -        nodes[i].end = nodes[i].start + sz;
> > -        printk(KERN_INFO "Faking node %d at %"PRIx64"-%"PRIx64"
> (%"PRIu64"MB)\n",
> > -               i,
> > -               nodes[i].start, nodes[i].end,
> > -               (nodes[i].end - nodes[i].start) >> 20);
> > -        node_set_online(i);
> > -    }
> > -    memnode_shift = compute_hash_shift(nodes, numa_fake, NULL);
> > -    if ( memnode_shift < 0 )
> > -    {
> > -        memnode_shift = 0;
> > -        printk(KERN_ERR "No NUMA hash function found. Emulation
> disabled.\n");
> > -        return -1;
> > -    }
> > -    for_each_online_node ( i )
> > -        setup_node_bootmem(i, nodes[i].start, nodes[i].end);
> > -    numa_init_array();
> > -
> > -    return 0;
> > -}
> > -#endif
> > -
> > -void __init numa_initmem_init(unsigned long start_pfn, unsigned long
> end_pfn)
> > -{
> > -    int i;
> > -
> > -#ifdef CONFIG_NUMA_EMU
> > -    if ( numa_fake && !numa_emulation(start_pfn, end_pfn) )
> > -        return;
> > -#endif
> > -
> > -#ifdef CONFIG_ACPI_NUMA
> > -    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
> > -         (u64)end_pfn << PAGE_SHIFT) )
> > -        return;
> > -#endif
> > -
> > -    printk(KERN_INFO "%s\n",
> > -           numa_off ? "NUMA turned off" : "No NUMA configuration
> found");
> > -
> > -    printk(KERN_INFO "Faking a node at %016"PRIx64"-%016"PRIx64"\n",
> > -           (u64)start_pfn << PAGE_SHIFT,
> > -           (u64)end_pfn << PAGE_SHIFT);
> > -    /* setup dummy node covering all memory */
> > -    memnode_shift = BITS_PER_LONG - 1;
> > -    memnodemap = _memnodemap;
> > -    memnodemapsize = ARRAY_SIZE(_memnodemap);
> > -
> > -    nodes_clear(node_online_map);
> > -    node_set_online(0);
> > -    for ( i = 0; i < nr_cpu_ids; i++ )
> > -        numa_set_node(i, 0);
> > -    cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
> > -    setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
> > -                    (u64)end_pfn << PAGE_SHIFT);
> > -}
> > -
> >   void numa_set_node(int cpu, nodeid_t node)
> >   {
> >       cpu_to_node[cpu] = node;
> > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > index 1facc8fe2b..26c0006d04 100644
> > --- a/xen/common/numa.c
> > +++ b/xen/common/numa.c
> > @@ -14,12 +14,13 @@
> >   #include <xen/smp.h>
> >   #include <xen/pfn.h>
> >   #include <xen/sched.h>
> 
> NIT: We tend to add a newline between <xen/...> headers and <asm/...>
> headers.
> 

got it

> > +#include <asm/acpi.h>
> >
> >   struct node_data node_data[MAX_NUMNODES];
> >
> >   /* Mapping from pdx to node id */
> >   int memnode_shift;
> > -typeof(*memnodemap) _memnodemap[64];
> > +static typeof(*memnodemap) _memnodemap[64];
> >   unsigned long memnodemapsize;
> >   u8 *memnodemap;
> >
> > @@ -34,6 +35,8 @@ int num_node_memblks;
> >   struct node node_memblk_range[NR_NODE_MEMBLKS];
> >   nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
> >
> > +bool numa_off;
> > +
> >   /*
> >    * Given a shift value, try to populate memnodemap[]
> >    * Returns :
> > @@ -191,3 +194,120 @@ void numa_add_cpu(int cpu)
> >   {
> >       cpumask_set_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> >   }
> > +
> > +/* initialize NODE_DATA given nodeid and start/end */
> > +void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
> 
>  From an abstract PoV, start and end should be paddr_t. This should be
> done on a separate patch though.
> 

Ok.

> > +{
> > +    unsigned long start_pfn, end_pfn;
> > +
> > +    start_pfn = start >> PAGE_SHIFT;
> > +    end_pfn = end >> PAGE_SHIFT;
> > +
> > +    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
> > +    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
> > +
> > +    node_set_online(nodeid);
> > +}
> > +
> > +void __init numa_init_array(void)
> > +{
> > +    int rr, i;
> > +
> > +    /* There are unfortunately some poorly designed mainboards around
> > +       that only connect memory to a single CPU. This breaks the 1:1
> cpu->node
> > +       mapping. To avoid this fill in the mapping for all possible
> > +       CPUs, as the number of CPUs is not known yet.
> > +       We round robin the existing nodes. */
> > +    rr = first_node(node_online_map);
> > +    for ( i = 0; i < nr_cpu_ids; i++ )
> > +    {
> > +        if ( cpu_to_node[i] != NUMA_NO_NODE )
> > +            continue;
> > +        numa_set_node(i, rr);
> > +        rr = cycle_node(rr, node_online_map);
> > +    }
> > +}
> > +
> > +#ifdef CONFIG_NUMA_EMU
> > +int numa_fake __initdata = 0;
> > +
> > +/* Numa emulation */
> > +static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
> 
> Here, this should be either "unsigned long" or ideally "mfn_t".
> Although, if you use "unsigned long", you will need to...
> 

Do we need a separate patch to do it?

> > +{
> > +    int i;
> > +    struct node nodes[MAX_NUMNODES];
> > +    u64 sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake;
> 
> ... cast "(end_pfn - start_pfn)" to uint64_t or use pfn_to_paddr().
> 

Ok
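
For reference, the reason for the cast can be shown with a minimal
standalone sketch. This is not the Xen code: pfn32_t is a hypothetical
stand-in for a 32-bit "unsigned long", and the two helper names are
made up for illustration. Shifting before widening drops the high bits;
widening first keeps them.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Assumption: models "unsigned long" on a 32-bit build. */
typedef uint32_t pfn32_t;
typedef uint64_t paddr_t;

/* Shift happens in 32 bits, so bits above bit 31 are lost. */
static paddr_t span_bytes_unsafe(pfn32_t start_pfn, pfn32_t end_pfn)
{
    assert(end_pfn >= start_pfn);
    return (pfn32_t)((end_pfn - start_pfn) << PAGE_SHIFT);
}

/* Widen to 64 bits first, then shift: no truncation. */
static paddr_t span_bytes_safe(pfn32_t start_pfn, pfn32_t end_pfn)
{
    assert(end_pfn >= start_pfn);
    return (paddr_t)(end_pfn - start_pfn) << PAGE_SHIFT;
}
```

With an 8 GiB span (0x200000 pages of 4 KiB), the first helper wraps
around while the second returns the full byte count.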

> > +
> > +    /* Kludge needed for the hash function */
> > +    if ( hweight64(sz) > 1 )
> > +    {
> > +        u64 x = 1;
> > +        while ( (x << 1) < sz )
> > +            x <<= 1;
> > +        if ( x < sz/2 )
> > +            printk(KERN_ERR "Numa emulation unbalanced. Complain to
> maintainer\n");
> > +        sz = x;
> > +    }
> > +
> > +    memset(&nodes,0,sizeof(nodes));
> > +    for ( i = 0; i < numa_fake; i++ )
> > +    {
> > +        nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz;
> > +        if ( i == numa_fake - 1 )
> > +            sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
> > +        nodes[i].end = nodes[i].start + sz;
> > +        printk(KERN_INFO "Faking node %d at %"PRIx64"-%"PRIx64"
> (%"PRIu64"MB)\n",
> > +               i,
> > +               nodes[i].start, nodes[i].end,
> > +               (nodes[i].end - nodes[i].start) >> 20);
> > +        node_set_online(i);
> > +    }
> > +    memnode_shift = compute_hash_shift(nodes, numa_fake, NULL);
> > +    if ( memnode_shift < 0 )
> > +    {
> > +        memnode_shift = 0;
> > +        printk(KERN_ERR "No NUMA hash function found. Emulation
> disabled.\n");
> > +        return -1;
> > +    }
> > +    for_each_online_node ( i )
> > +        setup_node_bootmem(i, nodes[i].start, nodes[i].end);
> > +    numa_init_array();
> > +
> > +    return 0;
> > +}
> > +#endif
> > +
> > +void __init numa_initmem_init(unsigned long start_pfn, unsigned long
> end_pfn)
> > +{
> > +    int i;
> > +
> > +#ifdef CONFIG_NUMA_EMU
> > +    if ( numa_fake && !numa_emulation(start_pfn, end_pfn) )
> > +        return;
> > +#endif
> > +
> > +#ifdef CONFIG_ACPI_NUMA
> > +    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
> > +         (u64)end_pfn << PAGE_SHIFT) )
> 
> (u64)v << PAGE_SHIFT should be switched to use pfn_to_paddr() or
> mfn_to_paddr() if you decide to make start_pfn and end_pfn typesafe.
> 

Do we still need a separate patch to change it before the move?

> > +        return;
> > +#endif
> > +
> > +    printk(KERN_INFO "%s\n",
> > +           numa_off ? "NUMA turned off" : "No NUMA configuration
> found");
> > +
> > +    printk(KERN_INFO "Faking a node at %016"PRIx64"-%016"PRIx64"\n",
> > +           (u64)start_pfn << PAGE_SHIFT,
> > +           (u64)end_pfn << PAGE_SHIFT);
> 
> Same remark here. PRIx64 would also have to be switched to PRIpaddr.
> 

Hmm, it seems I'd better use a separate patch to do the PRIpaddr cleanup
before moving the code.

> > +    /* setup dummy node covering all memory */
> > +    memnode_shift = BITS_PER_LONG - 1;
> > +    memnodemap = _memnodemap;
> > +    memnodemapsize = ARRAY_SIZE(_memnodemap);
> > +
> > +    nodes_clear(node_online_map);
> > +    node_set_online(0);
> > +    for ( i = 0; i < nr_cpu_ids; i++ )
> > +        numa_set_node(i, 0);
> > +    cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
> > +    setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
> > +                    (u64)end_pfn << PAGE_SHIFT);
> > +}
> > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> > index e8a92ad9df..f8e4e15586 100644
> > --- a/xen/include/asm-x86/numa.h
> > +++ b/xen/include/asm-x86/numa.h
> > @@ -21,16 +21,11 @@ extern nodeid_t pxm_to_node(unsigned int pxm);
> >
> >   #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
> >
> > -extern void numa_init_array(void);
> > -extern bool numa_off;
> > -
> > -
> >   extern int srat_disabled(void);
> >   extern void numa_set_node(int cpu, nodeid_t node);
> >   extern nodeid_t setup_node(unsigned int pxm);
> >   extern void srat_detect_node(int cpu);
> >
> > -extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
> >   extern nodeid_t apicid_to_node[];
> >   extern void init_cpu_to_node(void);
> >
> > diff --git a/xen/include/asm-x86/setup.h b/xen/include/asm-x86/setup.h
> > index 24be46115d..63838ba2d1 100644
> > --- a/xen/include/asm-x86/setup.h
> > +++ b/xen/include/asm-x86/setup.h
> > @@ -17,7 +17,6 @@ void early_time_init(void);
> >
> >   void set_nr_cpu_ids(unsigned int max_cpus);
> >
> > -void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
> >   void arch_init_memory(void);
> >   void subarch_init_memory(void);
> >
> > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> > index 67b79a73a3..258a5cb3db 100644
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -26,7 +26,6 @@
> >   extern int memnode_shift;
> >   extern unsigned long memnodemapsize;
> >   extern u8 *memnodemap;
> > -extern typeof(*memnodemap) _memnodemap[64];
> >
> >   struct node_data {
> >       unsigned long node_start_pfn;
> > @@ -69,6 +68,13 @@ extern int conflicting_memblks(u64 start, u64 end);
> >   extern void cutoff_node(int i, u64 start, u64 end);
> >   extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
> >
> > +extern void numa_init_array(void);
> > +extern void numa_initmem_init(unsigned long start_pfn, unsigned long
> end_pfn);
> > +extern bool numa_off;
> > +extern int numa_fake;
> > +
> > +extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
> > +
> >   #endif /* CONFIG_NUMA */
> >
> >   #endif /* _XEN_NUMA_H */
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for Arm
  2021-08-25 10:36   ` Julien Grall
@ 2021-08-25 12:07     ` Wei Chen
  2021-08-25 13:24       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-25 12:07 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 25 August 2021 18:37
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for
> Arm
> 
> Hi Wei,
> 
> On 11/08/2021 11:23, Wei Chen wrote:
> > This API is used to assign a CPU to a NUMA node. If NUMA is
> > configured off or NUMA initialization fails, only node#0 will
> > be online. This will be done in following patches. When NUMA is
> > turned off or init fails, node_online_map will be cleared and
> > node#0 set online. So we use node_online_map to prevent
> > assigning a CPU to an offline node.
> 
> IMHO numa_set_node() should behave exactly the same way on x86 and Arm
> because this is going to be used by the common code.
> 
>  From the commit message, I don't quite understand why the check is
> necessary on Arm but not on x86. Can you clarify it?
> 

Yes. In patch#27, in the dt_smp_init_cpus function in smpboot.c,
we parse the CPU numa-node-id from the dtb CPU node. If we get
a valid node ID for a CPU, we invoke numa_set_node to create
the CPU-NODE map. But in our testing, we found that when NUMA
init failed, numa_set_node could still set a CPU to an offline
or invalid NODE. So we're using node_online_map to prevent
this behavior. Otherwise we would have to check node_online_map
everywhere before calling numa_set_node.

x86 is actually doing the same thing, but it handles the
node_online_map check outside of numa_set_node:

void __init init_cpu_to_node(void)
{
    unsigned int i;
    nodeid_t node;

    for ( i = 0; i < nr_cpu_ids; i++ )
    {
        u32 apicid = x86_cpu_to_apicid[i];
        if ( apicid == BAD_APICID )
            continue;
        node = apicid < MAX_LOCAL_APIC ? apicid_to_node[apicid] : NUMA_NO_NODE;
        if ( node == NUMA_NO_NODE || !node_online(node) )
            node = 0;
        numa_set_node(i, node);
    }
}


> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/Makefile      |  1 +
> >   xen/arch/arm/numa.c        | 31 +++++++++++++++++++++++++++++++
> >   xen/include/asm-arm/numa.h |  2 ++
> >   xen/include/asm-x86/numa.h |  1 -
> >   xen/include/xen/numa.h     |  1 +
> >   5 files changed, 35 insertions(+), 1 deletion(-)
> >   create mode 100644 xen/arch/arm/numa.c
> >
> > diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> > index 3d3b97b5b4..6e3fb8033e 100644
> > --- a/xen/arch/arm/Makefile
> > +++ b/xen/arch/arm/Makefile
> > @@ -35,6 +35,7 @@ obj-$(CONFIG_LIVEPATCH) += livepatch.o
> >   obj-y += mem_access.o
> >   obj-y += mm.o
> >   obj-y += monitor.o
> > +obj-$(CONFIG_NUMA) += numa.o
> >   obj-y += p2m.o
> >   obj-y += percpu.o
> >   obj-y += platform.o
> > diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> > new file mode 100644
> > index 0000000000..1e30c5bb13
> > --- /dev/null
> > +++ b/xen/arch/arm/numa.c
> > @@ -0,0 +1,31 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Arm Architecture support layer for NUMA.
> > + *
> > + * Copyright (C) 2021 Arm Ltd
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program. If not, see <http://www.gnu.org/licenses/>.
> > + *
> > + */
> > +#include <xen/init.h>
> > +#include <xen/nodemask.h>
> > +#include <xen/numa.h>
> > +
> > +void numa_set_node(int cpu, nodeid_t nid)
> > +{
> > +    if ( nid >= MAX_NUMNODES ||
> > +        !nodemask_test(nid, &node_online_map) )
> > +        nid = 0;
> > +
> > +    cpu_to_node[cpu] = nid;
> > +}
> I think numa_set_node() will want to be implemented in common code.
> 

See my comment above. If this is OK for x86, then yes, we can do it
in common code.

> > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > index ab9c4a2448..1162c702df 100644
> > --- a/xen/include/asm-arm/numa.h
> > +++ b/xen/include/asm-arm/numa.h
> > @@ -27,6 +27,8 @@ extern mfn_t first_valid_mfn;
> >   #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
> >   #define __node_distance(a, b) (20)
> >
> > +#define numa_set_node(x, y) do { } while (0)
> 
> I would define it in xen/numa.h so other arch can take advantage of it.
> Also, please use a static inline helper so the arguments are evaluated.
> 

Ok
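
To make the difference concrete, here is a minimal standalone sketch
(not the actual Xen header; the _macro/_inline suffixes and next_node()
are hypothetical names used only for this illustration). An empty macro
stub discards its arguments at preprocessing time, so they are neither
evaluated nor type-checked, whereas a static inline stub evaluates and
checks them.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t nodeid_t;

/* Macro stub: arguments vanish before the compiler ever sees them. */
#define numa_set_node_macro(cpu, node) do { } while (0)

/* Inline stub: arguments are evaluated and type-checked, then ignored. */
static inline void numa_set_node_inline(int cpu, nodeid_t node)
{
    (void)cpu;
    (void)node;
}

/* Counts how many times an argument expression was really evaluated. */
static int calls;

static nodeid_t next_node(void)
{
    calls++;
    return 0;
}
```

Calling the macro stub with next_node() as an argument leaves the
counter untouched; calling the inline stub increments it once.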

> > +
> >   #endif
> >
> >   #endif /* __ARCH_ARM_NUMA_H */
> > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> > index f8e4e15586..69859b0a57 100644
> > --- a/xen/include/asm-x86/numa.h
> > +++ b/xen/include/asm-x86/numa.h
> > @@ -22,7 +22,6 @@ extern nodeid_t pxm_to_node(unsigned int pxm);
> >   #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT))
> >
> >   extern int srat_disabled(void);
> > -extern void numa_set_node(int cpu, nodeid_t node);
> >   extern nodeid_t setup_node(unsigned int pxm);
> >   extern void srat_detect_node(int cpu);
> >
> > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> > index 258a5cb3db..3972aa6b93 100644
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -70,6 +70,7 @@ extern int valid_numa_range(u64 start, u64 end,
> nodeid_t node);
> >
> >   extern void numa_init_array(void);
> >   extern void numa_initmem_init(unsigned long start_pfn, unsigned long
> end_pfn);
> > +extern void numa_set_node(int cpu, nodeid_t node);
> >   extern bool numa_off;
> >   extern int numa_fake;
> >
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for Arm
  2021-08-25 12:07     ` Wei Chen
@ 2021-08-25 13:24       ` Julien Grall
  2021-08-26  5:13         ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 13:24 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand Marquis



On 25/08/2021 13:07, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 25 August 2021 18:37
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for
>> Arm
>>
>> Hi Wei,
>>
>> On 11/08/2021 11:23, Wei Chen wrote:
>>> This API is used to assign a CPU to a NUMA node. If NUMA is
>>> configured off or NUMA initialization fails, only node#0 will
>>> be online. This will be done in following patches. When NUMA is
>>> turned off or init fails, node_online_map will be cleared and
>>> node#0 set online. So we use node_online_map to prevent
>>> assigning a CPU to an offline node.
>>
>> IMHO numa_set_node() should behave exactly the same way on x86 and Arm
>> because this is going to be used by the common code.
>>
>>   From the commit message, I don't quite understand why the check is
>> necessary on Arm but not on x86. Can you clarify it?
>>
> 
> Yes. In patch#27, in the dt_smp_init_cpus function in smpboot.c,
> we parse the CPU numa-node-id from the dtb CPU node. If we get
> a valid node ID for a CPU, we invoke numa_set_node to create
> the CPU-NODE map. But in our testing, we found that when NUMA
> init failed, numa_set_node could still set a CPU to an offline
> or invalid NODE. So we're using node_online_map to prevent
> this behavior. Otherwise we would have to check node_online_map
> everywhere before calling numa_set_node.

What do you mean by invalid NODE? Is it 0xFF (NUMA_NO_NODE)?

> 
> x86 is actually doing the same thing, but it handles the
> node_online_map check outside of numa_set_node:

Right...

>> I think numa_set_node() will want to be implemented in common code.
>>
> 
> See my comment above. If this is OK for x86, then yes, we can do it
> in common code.

... on x86, this check is performed outside of numa_set_node() for one 
caller whereas on Arm you are adding it in numa_set_node().

For example, numa_set_node() can be called with NUMA_NO_NODE. On x86, we 
would set cpu_to_node[] to that value. However, if I am not mistaken, on 
Arm we would set the value to 0.

This will change the behavior of users of cpu_to_node() later on (such
as XEN_SYSCTL_cputopoinfo).

NUMA is not something architecture specific, so I don't think the
implementation should differ here.

In this case, I think numa_set_node() shouldn't check if the node is 
valid. Instead, the caller should take care of it if it is important.
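
As a rough standalone sketch of that split (the types, constants and
the set_cpu_node_checked() helper below are stand-ins assumed for
illustration, not the real Xen definitions): numa_set_node() stores
whatever it is given, exactly as on x86, and a caller that cares about
validity falls back to node 0 itself, the way init_cpu_to_node() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the Xen definitions; values are assumptions. */
typedef uint8_t nodeid_t;
#define NUMA_NO_NODE 0xFF
#define MAX_NUMNODES 64
#define NR_CPUS      8

static nodeid_t cpu_to_node[NR_CPUS];
static uint64_t node_online_bits = 1;   /* only node 0 online at start */

static bool node_online(nodeid_t nid)
{
    return nid < MAX_NUMNODES && ((node_online_bits >> nid) & 1);
}

/* Common behaviour: no validation, NUMA_NO_NODE is stored as-is. */
static void numa_set_node(unsigned int cpu, nodeid_t node)
{
    assert(cpu < NR_CPUS);
    cpu_to_node[cpu] = node;
}

/* Hypothetical caller-side wrapper: validate first, then set. */
static void set_cpu_node_checked(unsigned int cpu, nodeid_t node)
{
    if ( node == NUMA_NO_NODE || !node_online(node) )
        node = 0;
    numa_set_node(cpu, node);
}
```

With this split, a caller such as the device tree CPU parsing would use
the checked path, while other users can still record NUMA_NO_NODE.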

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 12/40] xen/x86: Move numa_initmem_init to common
  2021-08-25 11:15     ` Wei Chen
@ 2021-08-25 13:26       ` Julien Grall
  0 siblings, 0 replies; 196+ messages in thread
From: Julien Grall @ 2021-08-25 13:26 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand Marquis



On 25/08/2021 12:15, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 25 August 2021 18:22
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 12/40] xen/x86: Move numa_initmem_init to
>> common
>>
>> Hi Wei,
>>
>> On 11/08/2021 11:23, Wei Chen wrote:
>>> This function can be reused by Arm device tree based
>>> NUMA support. So we move it from x86 to common, as well
>>> as its related variables and functions:
>>> setup_node_bootmem, numa_init_array and numa_emulation.
>>>
>>> As numa_initmem_init has been moved to common, _memnodemap
>>> is not used cross files. We can restore _memnodemap to
>>> static.
>>
>> As we discussed on a previous patch, we should try to avoid this kind of
>> dance. I can help to find a split that would achieve that.
>>
> 
> Yes, thanks!
> 
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>    xen/arch/x86/numa.c         | 118 ----------------------------------
>>>    xen/common/numa.c           | 122 +++++++++++++++++++++++++++++++++++-
>>>    xen/include/asm-x86/numa.h  |   5 --
>>>    xen/include/asm-x86/setup.h |   1 -
>>>    xen/include/xen/numa.h      |   8 ++-
>>>    5 files changed, 128 insertions(+), 126 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
>>> index f2626b3968..6908738305 100644
>>> --- a/xen/arch/x86/numa.c
>>> +++ b/xen/arch/x86/numa.c
>>> @@ -38,7 +38,6 @@ nodeid_t apicid_to_node[MAX_LOCAL_APIC] = {
>>>
>>>    nodemask_t __read_mostly node_online_map = { { [0] = 1UL } };
>>>
>>> -bool numa_off;
>>>    s8 acpi_numa = 0;
>>>
>>>    int srat_disabled(void)
>>> @@ -46,123 +45,6 @@ int srat_disabled(void)
>>>        return numa_off || acpi_numa < 0;
>>>    }
>>>
>>> -/* initialize NODE_DATA given nodeid and start/end */
>>> -void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
>>> -{
>>> -    unsigned long start_pfn, end_pfn;
>>> -
>>> -    start_pfn = start >> PAGE_SHIFT;
>>> -    end_pfn = end >> PAGE_SHIFT;
>>> -
>>> -    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
>>> -    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
>>> -
>>> -    node_set_online(nodeid);
>>> -}
>>> -
>>> -void __init numa_init_array(void)
>>> -{
>>> -    int rr, i;
>>> -
>>> -    /* There are unfortunately some poorly designed mainboards around
>>> -       that only connect memory to a single CPU. This breaks the 1:1
>> cpu->node
>>> -       mapping. To avoid this fill in the mapping for all possible
>>> -       CPUs, as the number of CPUs is not known yet.
>>> -       We round robin the existing nodes. */
>>> -    rr = first_node(node_online_map);
>>> -    for ( i = 0; i < nr_cpu_ids; i++ )
>>> -    {
>>> -        if ( cpu_to_node[i] != NUMA_NO_NODE )
>>> -            continue;
>>> -        numa_set_node(i, rr);
>>> -        rr = cycle_node(rr, node_online_map);
>>> -    }
>>> -}
>>> -
>>> -#ifdef CONFIG_NUMA_EMU
>>> -static int numa_fake __initdata = 0;
>>> -
>>> -/* Numa emulation */
>>> -static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
>>> -{
>>> -    int i;
>>> -    struct node nodes[MAX_NUMNODES];
>>> -    u64 sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake;
>>> -
>>> -    /* Kludge needed for the hash function */
>>> -    if ( hweight64(sz) > 1 )
>>> -    {
>>> -        u64 x = 1;
>>> -        while ( (x << 1) < sz )
>>> -            x <<= 1;
>>> -        if ( x < sz/2 )
>>> -            printk(KERN_ERR "Numa emulation unbalanced. Complain to
>> maintainer\n");
>>> -        sz = x;
>>> -    }
>>> -
>>> -    memset(&nodes,0,sizeof(nodes));
>>> -    for ( i = 0; i < numa_fake; i++ )
>>> -    {
>>> -        nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz;
>>> -        if ( i == numa_fake - 1 )
>>> -            sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
>>> -        nodes[i].end = nodes[i].start + sz;
>>> -        printk(KERN_INFO "Faking node %d at %"PRIx64"-%"PRIx64"
>> (%"PRIu64"MB)\n",
>>> -               i,
>>> -               nodes[i].start, nodes[i].end,
>>> -               (nodes[i].end - nodes[i].start) >> 20);
>>> -        node_set_online(i);
>>> -    }
>>> -    memnode_shift = compute_hash_shift(nodes, numa_fake, NULL);
>>> -    if ( memnode_shift < 0 )
>>> -    {
>>> -        memnode_shift = 0;
>>> -        printk(KERN_ERR "No NUMA hash function found. Emulation
>> disabled.\n");
>>> -        return -1;
>>> -    }
>>> -    for_each_online_node ( i )
>>> -        setup_node_bootmem(i, nodes[i].start, nodes[i].end);
>>> -    numa_init_array();
>>> -
>>> -    return 0;
>>> -}
>>> -#endif
>>> -
>>> -void __init numa_initmem_init(unsigned long start_pfn, unsigned long
>> end_pfn)
>>> -{
>>> -    int i;
>>> -
>>> -#ifdef CONFIG_NUMA_EMU
>>> -    if ( numa_fake && !numa_emulation(start_pfn, end_pfn) )
>>> -        return;
>>> -#endif
>>> -
>>> -#ifdef CONFIG_ACPI_NUMA
>>> -    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
>>> -         (u64)end_pfn << PAGE_SHIFT) )
>>> -        return;
>>> -#endif
>>> -
>>> -    printk(KERN_INFO "%s\n",
>>> -           numa_off ? "NUMA turned off" : "No NUMA configuration
>> found");
>>> -
>>> -    printk(KERN_INFO "Faking a node at %016"PRIx64"-%016"PRIx64"\n",
>>> -           (u64)start_pfn << PAGE_SHIFT,
>>> -           (u64)end_pfn << PAGE_SHIFT);
>>> -    /* setup dummy node covering all memory */
>>> -    memnode_shift = BITS_PER_LONG - 1;
>>> -    memnodemap = _memnodemap;
>>> -    memnodemapsize = ARRAY_SIZE(_memnodemap);
>>> -
>>> -    nodes_clear(node_online_map);
>>> -    node_set_online(0);
>>> -    for ( i = 0; i < nr_cpu_ids; i++ )
>>> -        numa_set_node(i, 0);
>>> -    cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
>>> -    setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
>>> -                    (u64)end_pfn << PAGE_SHIFT);
>>> -}
>>> -
>>>    void numa_set_node(int cpu, nodeid_t node)
>>>    {
>>>        cpu_to_node[cpu] = node;
>>> diff --git a/xen/common/numa.c b/xen/common/numa.c
>>> index 1facc8fe2b..26c0006d04 100644
>>> --- a/xen/common/numa.c
>>> +++ b/xen/common/numa.c
>>> @@ -14,12 +14,13 @@
>>>    #include <xen/smp.h>
>>>    #include <xen/pfn.h>
>>>    #include <xen/sched.h>
>>
>> NIT: We tend to add a newline between <xen/...> headers and <asm/...>
>> headers.
>>
> 
> got it
> 
>>> +#include <asm/acpi.h>
>>>
>>>    struct node_data node_data[MAX_NUMNODES];
>>>
>>>    /* Mapping from pdx to node id */
>>>    int memnode_shift;
>>> -typeof(*memnodemap) _memnodemap[64];
>>> +static typeof(*memnodemap) _memnodemap[64];
>>>    unsigned long memnodemapsize;
>>>    u8 *memnodemap;
>>>
>>> @@ -34,6 +35,8 @@ int num_node_memblks;
>>>    struct node node_memblk_range[NR_NODE_MEMBLKS];
>>>    nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
>>>
>>> +bool numa_off;
>>> +
>>>    /*
>>>     * Given a shift value, try to populate memnodemap[]
>>>     * Returns :
>>> @@ -191,3 +194,120 @@ void numa_add_cpu(int cpu)
>>>    {
>>>        cpumask_set_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
>>>    }
>>> +
>>> +/* initialize NODE_DATA given nodeid and start/end */
>>> +void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
>>
>>   From an abstract PoV, start and end should be paddr_t. This should be
>> done on a separate patch though.
>>
> 
> Ok.
> 
>>> +{
>>> +    unsigned long start_pfn, end_pfn;
>>> +
>>> +    start_pfn = start >> PAGE_SHIFT;
>>> +    end_pfn = end >> PAGE_SHIFT;
>>> +
>>> +    NODE_DATA(nodeid)->node_start_pfn = start_pfn;
>>> +    NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
>>> +
>>> +    node_set_online(nodeid);
>>> +}
>>> +
>>> +void __init numa_init_array(void)
>>> +{
>>> +    int rr, i;
>>> +
>>> +    /* There are unfortunately some poorly designed mainboards around
>>> +       that only connect memory to a single CPU. This breaks the 1:1
>> cpu->node
>>> +       mapping. To avoid this fill in the mapping for all possible
>>> +       CPUs, as the number of CPUs is not known yet.
>>> +       We round robin the existing nodes. */
>>> +    rr = first_node(node_online_map);
>>> +    for ( i = 0; i < nr_cpu_ids; i++ )
>>> +    {
>>> +        if ( cpu_to_node[i] != NUMA_NO_NODE )
>>> +            continue;
>>> +        numa_set_node(i, rr);
>>> +        rr = cycle_node(rr, node_online_map);
>>> +    }
>>> +}
>>> +
>>> +#ifdef CONFIG_NUMA_EMU
>>> +int numa_fake __initdata = 0;
>>> +
>>> +/* Numa emulation */
>>> +static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
>>
>> Here, this should be either "unsigned long" or ideally "mfn_t".
>> Although, if you use "unsigned long", you will need to...
>>
> 
> Do we need a separate patch to do it?

I would prefer if the cleanups are done separately, as this makes it
easier to review the code movement.

> 
>>> +{
>>> +    int i;
>>> +    struct node nodes[MAX_NUMNODES];
>>> +    u64 sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake;
>>
>> ... cast "(end_pfn - start_pfn)" to uint64_t or use pfn_to_paddr().
>>
> 
> Ok
> 
>>> +
>>> +    /* Kludge needed for the hash function */
>>> +    if ( hweight64(sz) > 1 )
>>> +    {
>>> +        u64 x = 1;
>>> +        while ( (x << 1) < sz )
>>> +            x <<= 1;
>>> +        if ( x < sz/2 )
>>> +            printk(KERN_ERR "Numa emulation unbalanced. Complain to
>> maintainer\n");
>>> +        sz = x;
>>> +    }
>>> +
>>> +    memset(&nodes,0,sizeof(nodes));
>>> +    for ( i = 0; i < numa_fake; i++ )
>>> +    {
>>> +        nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz;
>>> +        if ( i == numa_fake - 1 )
>>> +            sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
>>> +        nodes[i].end = nodes[i].start + sz;
>>> +        printk(KERN_INFO "Faking node %d at %"PRIx64"-%"PRIx64"
>> (%"PRIu64"MB)\n",
>>> +               i,
>>> +               nodes[i].start, nodes[i].end,
>>> +               (nodes[i].end - nodes[i].start) >> 20);
>>> +        node_set_online(i);
>>> +    }
>>> +    memnode_shift = compute_hash_shift(nodes, numa_fake, NULL);
>>> +    if ( memnode_shift < 0 )
>>> +    {
>>> +        memnode_shift = 0;
>>> +        printk(KERN_ERR "No NUMA hash function found. Emulation
>> disabled.\n");
>>> +        return -1;
>>> +    }
>>> +    for_each_online_node ( i )
>>> +        setup_node_bootmem(i, nodes[i].start, nodes[i].end);
>>> +    numa_init_array();
>>> +
>>> +    return 0;
>>> +}
>>> +#endif
>>> +
>>> +void __init numa_initmem_init(unsigned long start_pfn, unsigned long
>> end_pfn)
>>> +{
>>> +    int i;
>>> +
>>> +#ifdef CONFIG_NUMA_EMU
>>> +    if ( numa_fake && !numa_emulation(start_pfn, end_pfn) )
>>> +        return;
>>> +#endif
>>> +
>>> +#ifdef CONFIG_ACPI_NUMA
>>> +    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
>>> +         (u64)end_pfn << PAGE_SHIFT) )
>>
>> (u64)v << PAGE_SHIFT should be switched to use pfn_to_paddr() or
>> mfn_to_paddr() if you decide to make start_pfn and end_pfn typesafe.
>>
> 
> Do we still need a separate patch to change it before the move?

Yes.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 14/40] xen/arm: set NUMA nodes max number to 64 by default
  2021-08-11 10:23 ` [XEN RFC PATCH 14/40] xen/arm: set NUMA nodes max number to 64 by default Wei Chen
@ 2021-08-25 13:28   ` Julien Grall
  2021-08-25 13:36     ` Jan Beulich
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 13:28 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:23, Wei Chen wrote:
> Today's Arm64 systems can reach or exceed 16 NUMA nodes, so
> we set the number to 64 to match with x86.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/include/asm-arm/numa.h | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> index 1162c702df..b2982f9053 100644
> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -5,7 +5,15 @@
>   
>   typedef u8 nodeid_t;
>   
> -#if !defined(CONFIG_NUMA)
> +#if defined(CONFIG_NUMA)
> +
> +/*
> + * Same as x86, we set the max number of NUMA nodes to 64 and
> + * set the number of NUMA memory block number to 128.
> + */

Such a comment can rot easily if x86 decides to bump their values. But 
given the value is the same, I think it would make sense to move the 
define to xen/numa.h.

> +#define NODES_SHIFT      6
> +
> +#else
>   
>   /* Fake one node for now. See also node_online_map. */
>   #define cpu_to_node(cpu) 0
> 

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 14/40] xen/arm: set NUMA nodes max number to 64 by default
  2021-08-25 13:28   ` Julien Grall
@ 2021-08-25 13:36     ` Jan Beulich
  2021-08-26  2:26       ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Jan Beulich @ 2021-08-25 13:36 UTC (permalink / raw)
  To: Julien Grall, Wei Chen; +Cc: Bertrand.Marquis, xen-devel, sstabellini

On 25.08.2021 15:28, Julien Grall wrote:
> On 11/08/2021 11:23, Wei Chen wrote:
>> --- a/xen/include/asm-arm/numa.h
>> +++ b/xen/include/asm-arm/numa.h
>> @@ -5,7 +5,15 @@
>>   
>>   typedef u8 nodeid_t;
>>   
>> -#if !defined(CONFIG_NUMA)
>> +#if defined(CONFIG_NUMA)
>> +
>> +/*
>> + * Same as x86, we set the max number of NUMA nodes to 64 and
>> + * set the number of NUMA memory block number to 128.
>> + */
> 
> Such comment can rot easily if x86 decides to bump there values. But 
> given the value is the same, I think it would make sense to move the 
> define to xen/numa.h.

To be honest - if this gets moved, please at least consider making it
a proper Kconfig setting. Just as the number of CPUs can be
configured, the number of nodes should be possible to choose by the
build manager. Of course - if it's not too much trouble ...

Jan




* Re: [XEN RFC PATCH 19/40] xen: fdt: Introduce a helper to check fdt node type
  2021-08-11 10:24 ` [XEN RFC PATCH 19/40] xen: fdt: Introduce a helper to check fdt node type Wei Chen
@ 2021-08-25 13:39   ` Julien Grall
  2021-08-26  6:00     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 13:39 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand.Marquis, Jan Beulich

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> In later patches, we will parse CPU and memory NUMA information
> from device tree. FDT is using device type property to indicate
> CPU nodes and memory nodes. So we introduce fdt_node_check_type
> in this patch to avoid redundant code in subsequent patches.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/common/libfdt/fdt_ro.c      | 15 +++++++++++++++
>   xen/include/xen/libfdt/libfdt.h | 25 +++++++++++++++++++++++++

This is meant to be a verbatim copy of libfdt. So I am not entirely in 
favor of adding a new function without it being upstreamed to 
libfdt first.

>   2 files changed, 40 insertions(+)
> 
> diff --git a/xen/common/libfdt/fdt_ro.c b/xen/common/libfdt/fdt_ro.c
> index 36f9b480d1..ae7794d870 100644
> --- a/xen/common/libfdt/fdt_ro.c
> +++ b/xen/common/libfdt/fdt_ro.c
> @@ -545,6 +545,21 @@ int fdt_node_check_compatible(const void *fdt, int nodeoffset,
>   		return 1;
>   }
>   
> +int fdt_node_check_type(const void *fdt, int nodeoffset,
> +			      const char *type)
> +{
> +	const void *prop;
> +	int len;
> +
> +	prop = fdt_getprop(fdt, nodeoffset, "device_type", &len);
> +	if (!prop)
> +		return len;
> +	if (fdt_stringlist_contains(prop, len, type))

The "device_type" property is not a list of strings. So I am a bit 
confused why you are using this helper. Shouldn't we simply check that 
the property value and type match?
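
As a minimal sketch of the direct comparison, assuming the property 
value is a single NUL-terminated string: device_type_matches() is a 
hypothetical name, not a libfdt function, and in the patch 'prop' and 
'len' would be whatever fdt_getprop() returned.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libfdt): true only if the property
 * value is exactly the NUL-terminated string 'type'. 'len' is the
 * property length in bytes, including the trailing NUL.
 */
static bool device_type_matches(const void *prop, int len, const char *type)
{
    size_t type_len = strlen(type);

    /* Reject a missing property or any length other than strlen + NUL. */
    if (!prop || len < 0 || (size_t)len != type_len + 1)
        return false;

    /* Compare the string and its terminator in one go. */
    return memcmp(prop, type, type_len + 1) == 0;
}
```

Unlike a string-list lookup, this rejects a value that merely contains 
'type' as one of several entries.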

> +		return 0;
> +	else
> +		return 1;
> +}
> +
>   int fdt_node_offset_by_compatible(const void *fdt, int startoffset,
>   				  const char *compatible)
>   {
> diff --git a/xen/include/xen/libfdt/libfdt.h b/xen/include/xen/libfdt/libfdt.h
> index 7c75688a39..7e4930dbcd 100644
> --- a/xen/include/xen/libfdt/libfdt.h
> +++ b/xen/include/xen/libfdt/libfdt.h
> @@ -799,6 +799,31 @@ int fdt_node_offset_by_phandle(const void *fdt, uint32_t phandle);
>   int fdt_node_check_compatible(const void *fdt, int nodeoffset,
>   			      const char *compatible);
>   
> +/**
> + * fdt_node_check_type: check a node's device_type property
> + * @fdt: pointer to the device tree blob
> + * @nodeoffset: offset of a tree node
> + * @type: string to match against
> + *
> + *
> + * fdt_node_check_type() returns 0 if the given node contains a 'device_type'
> + * property with the given string as one of its elements, it returns non-zero
> + * otherwise, or on error.
> + *
> + * returns:
> + *	0, if the node has a 'device_type' property listing the given string
> + *	1, if the node has a 'device_type' property, but it does not list
> + *		the given string
> + *	-FDT_ERR_NOTFOUND, if the given node has no 'device_type' property
> + * 	-FDT_ERR_BADOFFSET, if nodeoffset does not refer to a BEGIN_NODE tag
> + *	-FDT_ERR_BADMAGIC,
> + *	-FDT_ERR_BADVERSION,
> + *	-FDT_ERR_BADSTATE,
> + *	-FDT_ERR_BADSTRUCTURE, standard meanings
> + */
> +int fdt_node_check_type(const void *fdt, int nodeoffset,
> +			      const char *type);
> +
>   /**
>    * fdt_node_offset_by_compatible - find nodes with a given 'compatible' value
>    * @fdt: pointer to the device tree blob
> 

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
  2021-08-11 10:24 ` [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node Wei Chen
@ 2021-08-25 13:48   ` Julien Grall
  2021-08-26  6:35     ` Wei Chen
  2021-08-28  1:06   ` Stefano Stabellini
  1 sibling, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 13:48 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> Memory blocks' NUMA ID information is stored in device tree's
> memory nodes as "numa-node-id". We need a new helper to parse
> and verify this ID from memory nodes.
> 
> In order to support memory affinity in later use, the valid
> memory ranges and NUMA ID will be saved to tables.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/numa_device_tree.c | 130 ++++++++++++++++++++++++++++++++
>   1 file changed, 130 insertions(+)
> 
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> index 37cc56acf3..bbe081dcd1 100644
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -20,11 +20,13 @@
>   #include <xen/init.h>
>   #include <xen/nodemask.h>
>   #include <xen/numa.h>
> +#include <xen/libfdt/libfdt.h>
>   #include <xen/device_tree.h>
>   #include <asm/setup.h>
>   
>   s8 device_tree_numa = 0;
>   static nodemask_t processor_nodes_parsed __initdata;
> +static nodemask_t memory_nodes_parsed __initdata;
>   
>   static int srat_disabled(void)
>   {
> @@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
>       return 0;
>   }
>   
> +/* Callback for parsing of the memory regions affinity */
> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> +                                paddr_t start, paddr_t size)
> +{

The implementation of this function is quite similar to the ACPI 
version. Can this be abstracted?

> +    struct node *nd;
> +    paddr_t end;
> +    int i;
> +
> +    if ( srat_disabled() )
> +        return -EINVAL;
> +
> +    end = start + size;
> +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
> +    {
> +        dprintk(XENLOG_WARNING,
> +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> +        bad_srat();
> +        return -EINVAL;
> +    }
> +
> +    /* It is fine to add this area to the nodes data it will be used later */
> +    i = conflicting_memblks(start, end);
> +    /* No conflicting memory block, we can save it for later usage */;
> +    if ( i < 0 )
> +        goto save_memblk;
> +
> +    if ( memblk_nodeid[i] == node ) {

Xen coding style is using:

if ( ... )
{

Note that I may not comment on all the occurrences, so please check the 
other places.

> +        /*
> +         * Overlaps with other memblk in the same node, warning here.
> +         * This memblk will be merged with conflicted memblk later.
> +         */
> +        printk(XENLOG_WARNING
> +               "DT: NUMA NODE %u (%"PRIx64
> +               "-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
> +               node, start, end,
> +               node_memblk_range[i].start, node_memblk_range[i].end);
> +    } else {
> +        /*
> +         * Conflict with memblk in other node, this is an error.
> +         * The NUMA information is invalid, NUMA will be turn off.
> +         */
> +        printk(XENLOG_ERR
> +               "DT: NUMA NODE %u (%"PRIx64"-%"
> +               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
> +               node, start, end, memblk_nodeid[i],
> +               node_memblk_range[i].start, node_memblk_range[i].end);
> +        bad_srat();
> +        return -EINVAL;
> +    }
> +
> +save_memblk:
> +    nd = &nodes[node];
> +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
> +        nd->start = start;
> +        nd->end = end;
> +    } else {
> +        if ( start < nd->start )
> +            nd->start = start;
> +        if ( nd->end < end )
> +            nd->end = end;
> +    }
> +
> +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
> +           node, start, end);
> +
> +    node_memblk_range[num_node_memblks].start = start;
> +    node_memblk_range[num_node_memblks].end = end;
> +    memblk_nodeid[num_node_memblks] = node;
> +    num_node_memblks++;
> +
> +    return 0;
> +}
> +
>   /* Parse CPU NUMA node info */
>   int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
>   {
> @@ -70,3 +145,58 @@ int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
>   
>       return dtb_numa_processor_affinity_init(nid);
>   }
> +
> +/* Parse memory node NUMA info */
> +int __init
> +device_tree_parse_numa_memory_node(const void *fdt, int node,
> +    const char *name, uint32_t addr_cells, uint32_t size_cells)

This is pretty much a copy of process_memory_node(). Can we consider 
collecting the NUMA ID from there? If not, can we at least abstract the 
code?

> +{
> +    uint32_t nid;
> +    int ret = 0, len;
> +    paddr_t addr, size;
> +    const struct fdt_property *prop;
> +    uint32_t idx, ranges;
> +    const __be32 *addresses;
> +
> +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
> +    if ( nid >= MAX_NUMNODES )
> +    {
> +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n", nid);
> +        return -EINVAL;
> +    }
> +
> +    prop = fdt_get_property(fdt, node, "reg", &len);
> +    if ( !prop )
> +    {
> +        printk(XENLOG_WARNING
> +               "fdt: node `%s': missing `reg' property\n", name);
> +        return -EINVAL;
> +    }
> +
> +    addresses = (const __be32 *)prop->data;
> +    ranges = len / (sizeof(__be32)* (addr_cells + size_cells));
> +    for ( idx = 0; idx < ranges; idx++ )
> +    {
> +        device_tree_get_reg(&addresses, addr_cells, size_cells, &addr, &size);
> +        /* Skip zero size ranges */
> +        if ( !size )
> +            continue;
> +
> +        ret = dtb_numa_memory_affinity_init(nid, addr, size);
> +        if ( ret ) {
> +            printk(XENLOG_WARNING
> +                   "NUMA: process range#%d addr = %lx size=%lx failed!\n",

s/%d/%u/ as idx is an unsigned int
s/%lx/%"PRI_paddr"/ as addr and size are paddr_t.

> +                   idx, addr, size);
> +            return -EINVAL;
> +        }
> +    }
> +
> +    if ( idx == 0 )
> +    {
> +        printk(XENLOG_ERR
> +               "bad property in memory node, idx=%d ret=%d\n", idx, ret);
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}
> 

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map
  2021-08-11 10:24 ` [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map Wei Chen
@ 2021-08-25 13:56   ` Julien Grall
  2021-08-26  7:01     ` Wei Chen
  2021-08-31  0:48   ` Stefano Stabellini
  1 sibling, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 13:56 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand.Marquis, Jan Beulich

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> A NUMA aware device tree will provide a "distance-map" node to
> describe distance between any two nodes. This patch introduce a

s/introduce/introduces/

> new helper to parse this distance map.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/numa_device_tree.c | 67 +++++++++++++++++++++++++++++++++
>   1 file changed, 67 insertions(+)
> 
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> index bbe081dcd1..6e0d1d3d9f 100644
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -200,3 +200,70 @@ device_tree_parse_numa_memory_node(const void *fdt, int node,
>   
>       return 0;
>   }
> +
> +/* Parse NUMA distance map v1 */
> +int __init
> +device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
> +{
> +    const struct fdt_property *prop;
> +    const __be32 *matrix;
> +    int entry_count, len, i;

entry_count and i should be unsigned. len unfortunately can't because 
fdt_get_property expects a signed int.

> +
> +    printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
> +
> +    prop = fdt_get_property(fdt, node, "distance-matrix", &len);
> +    if ( !prop )
> +    {
> +        printk(XENLOG_WARNING
> +               "NUMA: No distance-matrix property in distance-map\n");
> +
> +        return -EINVAL;
> +    }
> +
> +    if ( len % sizeof(uint32_t) != 0 )
> +    {
> +        printk(XENLOG_WARNING
> +               "distance-matrix in node is not a multiple of u32\n");
> +        return -EINVAL;
> +    }
> +
> +    entry_count = len / sizeof(uint32_t);
> +    if ( entry_count <= 0 )

I understand that entry_count may be 0. But I can't see how it can be 
negative as the property len cannot be (even if it is a signed type). So 
I think this wants to be "== 0".

> +    {
> +        printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
> +
> +        return -EINVAL;
> +    }
> +
> +    matrix = (const __be32 *)prop->data;
> +    for ( i = 0; i + 2 < entry_count; i += 3 )
> +    {
> +        uint32_t from, to, distance;
> +
> +        from = dt_read_number(matrix, 1);
> +        matrix++;

You can use dt_next_cell() which will update the pointer for you.
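
A rough standalone sketch of the read-and-advance pattern: next_cell() 
below is only a stand-in written to mimic dt_next_cell(), and 
parse_distance_matrix() and struct distance_entry are hypothetical 
names. The sketch also assumes the property must be a whole number of 
(from, to, distance) triples, which is stricter than the check in the 
patch.

```c
#include <stdint.h>

/*
 * Stand-in for dt_next_cell(): read 'cells' big-endian 32-bit cells
 * and advance the cursor past them.
 */
static uint64_t next_cell(int cells, const uint8_t **cursor)
{
    uint64_t value = 0;
    int i;

    for (i = 0; i < cells * 4; i++)
        value = (value << 8) | *(*cursor)++;

    return value;
}

/* Hypothetical container for one decoded matrix entry. */
struct distance_entry {
    uint32_t from, to, distance;
};

/*
 * Decode 'len' bytes of (from, to, distance) triples into 'out'.
 * Returns the number of entries, or -1 if the property is empty, is
 * not a whole number of triples, or does not fit in 'out'.
 */
static int parse_distance_matrix(const uint8_t *data, unsigned int len,
                                 struct distance_entry *out,
                                 unsigned int max_entries)
{
    unsigned int count, i;

    if (len == 0 || len % (3 * sizeof(uint32_t)) != 0)
        return -1;

    count = len / (3 * sizeof(uint32_t));
    if (count > max_entries)
        return -1;

    for (i = 0; i < count; i++) {
        /* Each call consumes one cell, so no manual pointer bump. */
        out[i].from = (uint32_t)next_cell(1, &data);
        out[i].to = (uint32_t)next_cell(1, &data);
        out[i].distance = (uint32_t)next_cell(1, &data);
    }

    return (int)count;
}
```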

> +        to = dt_read_number(matrix, 1);
> +        matrix++;
> +        distance = dt_read_number(matrix, 1);
> +        matrix++;
> +
> +        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> +            (from != to && distance <= NUMA_LOCAL_DISTANCE) )
> +        {
> +            printk(XENLOG_WARNING
> +                   "Invalid nodes' distance from node#%d to node#%d = %d\n",
> +                   from, to, distance);
> +            return -EINVAL;
> +        }
> +
> +        printk(XENLOG_INFO "NUMA: distance from node#%d to node#%d = %d\n",
> +               from, to, distance);
> +        numa_set_distance(from, to, distance);
> +
> +        /* Set default distance of node B->A same as A->B */
> +        if (to > from)
> +             numa_set_distance(to, from, distance);
> +    }
> +
> +    return 0;
> +}
> 

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to NUMA system
  2021-08-11 10:24 ` [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to NUMA system Wei Chen
@ 2021-08-25 16:58   ` Julien Grall
  2021-08-26  7:24     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 16:58 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> When cpu boot up, we have add them to NUMA system. In current
> stage, we have not parsed the NUMA data, but we have created
> a fake NUMA node. So, in this patch, all CPU will be added
> to NUMA node#0. After the NUMA data has been parsed from device
> tree, the CPU will be added to correct NUMA node as the NUMA
> data described.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/setup.c       | 6 ++++++
>   xen/arch/arm/smpboot.c     | 6 ++++++
>   xen/include/asm-arm/numa.h | 1 +
>   3 files changed, 13 insertions(+)
> 
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 3c58d2d441..7531989f21 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -918,6 +918,12 @@ void __init start_xen(unsigned long boot_phys_offset,
>   
>       processor_id();
>   
> +    /*
> +     * If Xen is running on a NUMA off system, there will
> +     * be a node#0 at least.
> +     */
> +    numa_add_cpu(0);
> +
>       smp_init_cpus();
>       cpus = smp_get_max_cpus();
>       printk(XENLOG_INFO "SMP: Allowing %u CPUs\n", cpus);
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index a1ee3146ef..aa78958c07 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -358,6 +358,12 @@ void start_secondary(void)
>        */
>       smp_wmb();
>   
> +    /*
> +     * If Xen is running on a NUMA off system, there will
> +     * be a node#0 at least.
> +     */
> +    numa_add_cpu(cpuid);
> +

On x86, numa_add_cpu() will be called before the pCPU is brought up. I 
am not quite sure why we are doing it differently here. Can you 
clarify it?

>       /* Now report this CPU is up */
>       cpumask_set_cpu(cpuid, &cpu_online_map);
>   
> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> index 7a3588ac7f..dd31324b0b 100644
> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -59,6 +59,7 @@ extern mfn_t first_valid_mfn;
>   #define __node_distance(a, b) (20)
>   
>   #define numa_init(x) do { } while (0)
> +#define numa_add_cpu(x) do { } while (0)

This is a stub for a common helper. So I think this wants to be moved 
to the !CONFIG_NUMA section of xen/numa.h.

>   #define numa_set_node(x, y) do { } while (0)
>   
>   #endif
>

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 27/40] xen/arm: build CPU NUMA node map while creating cpu_logical_map
  2021-08-11 10:24 ` [XEN RFC PATCH 27/40] xen/arm: build CPU NUMA node map while creating cpu_logical_map Wei Chen
@ 2021-08-25 17:06   ` Julien Grall
  2021-08-26  7:26     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 17:06 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand.Marquis, Jan Beulich

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> Sometimes, CPU logical ID maybe different with physical CPU ID.
> Xen is using CPU logial ID for runtime usage, so we should use
> CPU logical ID to create map between NUMA node and CPU.

This commit message gives the impression that you are trying to fix a 
bug. However, what you are explaining is the reason why the code will 
use the logical ID rather than physical ID.

I think the commit message should explain what the patch is doing. You 
can then add an explanation why you are using the CPU logical ID. 
Something like "Note we are storing the CPU logical ID because...".


> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/smpboot.c | 31 ++++++++++++++++++++++++++++++-
>   1 file changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index aa78958c07..dd5a45bffc 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -121,7 +121,12 @@ static void __init dt_smp_init_cpus(void)
>       {
>           [0 ... NR_CPUS - 1] = MPIDR_INVALID
>       };
> +    static nodeid_t node_map[NR_CPUS] __initdata =
> +    {
> +        [0 ... NR_CPUS - 1] = NUMA_NO_NODE
> +    };
>       bool bootcpu_valid = false;
> +    uint32_t nid = 0;
>       int rc;
>   
>       mpidr = boot_cpu_data.mpidr.bits & MPIDR_HWID_MASK;
> @@ -172,6 +177,26 @@ static void __init dt_smp_init_cpus(void)
>               continue;
>           }
>   
> +#ifdef CONFIG_DEVICE_TREE_NUMA
> +        /*
> +         *  When CONFIG_DEVICE_TREE_NUMA is set, try to fetch numa infomation
> +         * from CPU dts node, otherwise the nid is always 0.
> +         */
> +        if ( !dt_property_read_u32(cpu, "numa-node-id", &nid) )

You can avoid the #ifdef by writing:

if ( IS_ENABLED(CONFIG_DEVICE_TREE_NUMA) && ... )

However, I would use CONFIG_NUMA because this code is already DT 
specific. So we can shorten the name a bit.

> +        {
> +            printk(XENLOG_WARNING
> +                "cpu[%d] dts path: %s: doesn't have numa infomation!\n",

s/infomation/information/

> +                cpuidx, dt_node_full_name(cpu));
> +            /*
> +             * The the early stage of NUMA initialization, when Xen found any

s/The/During/?

> +             * CPU dts node doesn't have numa-node-id info, the NUMA will be
> +             * treated as off, all CPU will be set to a FAKE node 0. So if we
> +             * get numa-node-id failed here, we should set nid to 0.
> +             */
> +            nid = 0;
> +        }
> +#endif
> +
>           /*
>            * 8 MSBs must be set to 0 in the DT since the reg property
>            * defines the MPIDR[23:0]
> @@ -231,9 +256,12 @@ static void __init dt_smp_init_cpus(void)
>           {
>               printk("cpu%d init failed (hwid %"PRIregister"): %d\n", i, hwid, rc);
>               tmp_map[i] = MPIDR_INVALID;
> +            node_map[i] = NUMA_NO_NODE;
>           }
> -        else
> +        else {
>               tmp_map[i] = hwid;
> +            node_map[i] = nid;
> +        }
>       }
>   
>       if ( !bootcpu_valid )
> @@ -249,6 +277,7 @@ static void __init dt_smp_init_cpus(void)
>               continue;
>           cpumask_set_cpu(i, &cpu_possible_map);
>           cpu_logical_map(i) = tmp_map[i];
> +        numa_set_node(i, node_map[i]);
>       }
>   }
>    >

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 29/40] xen/arm: implement Arm arch helpers Arm to get memory map info
  2021-08-11 10:24 ` [XEN RFC PATCH 29/40] xen/arm: implement Arm arch helpers Arm to get memory map info Wei Chen
@ 2021-08-25 17:09   ` Julien Grall
  2021-08-26  7:27     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 17:09 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> These two helpers are architecture APIs that are required by
> nodes_cover_memory.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/numa.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> index f61a8df645..6eebf8e8bc 100644
> --- a/xen/arch/arm/numa.c
> +++ b/xen/arch/arm/numa.c
> @@ -126,3 +126,17 @@ void __init numa_init(bool acpi_off)
>       numa_initmem_init(PFN_UP(ram_start), PFN_DOWN(ram_end));
>       return;
>   }
> +
> +uint32_t __init arch_meminfo_get_nr_bank(void)
> +{
> +	return bootinfo.mem.nr_banks;
> +}
> +
> +int __init arch_meminfo_get_ram_bank_range(int bank,
> +	unsigned long long *start, unsigned long long *end)

They are physical addresses, so we should use "paddr_t" as on systems 
such as 32-bit Arm, "unsigned long" is not enough to cover all the 
physical addresses.

As you change the type, I would also suggest changing the bank from an 
int to an unsigned int.
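
Roughly what I have in mind, as a standalone sketch: the paddr_t 
typedef, the banks[] table and get_ram_bank_range() are stand-ins for 
illustration only, and the bounds check is an extra that is not in the 
patch.

```c
#include <stdint.h>

/* Assumption: stand-in for Xen's paddr_t, 64 bits wide everywhere. */
typedef uint64_t paddr_t;

struct membank {
    paddr_t start;
    paddr_t size;
};

#define NR_BANKS 2

/* Made-up layout; the second bank sits above the 32-bit limit. */
static const struct membank banks[NR_BANKS] = {
    { 0x080000000ULL, 0x080000000ULL },
    { 0x880000000ULL, 0x100000000ULL },
};

/*
 * Hypothetical equivalent of arch_meminfo_get_ram_bank_range() using
 * paddr_t and an unsigned bank index.
 */
static int get_ram_bank_range(unsigned int bank, paddr_t *start,
                              paddr_t *end)
{
    if (bank >= NR_BANKS)
        return -1;

    *start = banks[bank].start;
    *end = banks[bank].start + banks[bank].size;

    return 0;
}
```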

> +{
> +	*start = bootinfo.mem.bank[bank].start;
> +	*end = bootinfo.mem.bank[bank].start + bootinfo.mem.bank[bank].size;
> +
> +	return 0;
> +}
> 

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 30/40] xen: move NUMA memory and CPU parsed nodemasks to common
  2021-08-11 10:24 ` [XEN RFC PATCH 30/40] xen: move NUMA memory and CPU parsed nodemasks to common Wei Chen
@ 2021-08-25 17:16   ` Julien Grall
  2021-08-26  7:29     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-25 17:16 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> Both memory_nodes_parsed and processor_nodes_parsed are using
> for Arm and x86 to record parded NUMA memory and CPU. So we
> move them to common.

Looking at the usage, they both call:

numa_set...(..., bitmap)

So rather than exporting the two helpers, could we simply add helpers to 
abstract it?


> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/numa_device_tree.c | 2 --
>   xen/arch/x86/srat.c             | 3 ---
>   xen/common/numa.c               | 3 +++
>   xen/include/xen/nodemask.h      | 2 ++
>   4 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> index 27ffb72f7b..f74b7f6427 100644
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -25,8 +25,6 @@
>   #include <asm/setup.h>
>   
>   s8 device_tree_numa = 0;
> -static nodemask_t processor_nodes_parsed __initdata;
> -static nodemask_t memory_nodes_parsed __initdata;

This is code that was introduced in a previous patch. In general, it is 
better to do the rework first and then add the new code. This makes 
the series easier to follow as the code added is not changed.

>   
>   static int srat_disabled(void)
>   {
> diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> index 2298353846..dd3aa30843 100644
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -24,9 +24,6 @@
>   
>   static struct acpi_table_slit *__read_mostly acpi_slit;
>   
> -static nodemask_t memory_nodes_parsed __initdata;
> -static nodemask_t processor_nodes_parsed __initdata;
> -
>   struct pxm2node {
>   	unsigned pxm;
>   	nodeid_t node;
> diff --git a/xen/common/numa.c b/xen/common/numa.c
> index 26c0006d04..79ab250543 100644
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -35,6 +35,9 @@ int num_node_memblks;
>   struct node node_memblk_range[NR_NODE_MEMBLKS];
>   nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
>   
> +nodemask_t memory_nodes_parsed __initdata;
> +nodemask_t processor_nodes_parsed __initdata;
> +
>   bool numa_off;
>   
>   /*
> diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h
> index 1dd6c7458e..29ce5e28e7 100644
> --- a/xen/include/xen/nodemask.h
> +++ b/xen/include/xen/nodemask.h
> @@ -276,6 +276,8 @@ static inline int __cycle_node(int n, const nodemask_t *maskp, int nbits)
>    */
>   
>   extern nodemask_t node_online_map;
> +extern nodemask_t memory_nodes_parsed;
> +extern nodemask_t processor_nodes_parsed;
>   
>   #if MAX_NUMNODES > 1
>   #define num_online_nodes()	nodes_weight(node_online_map)
> 

Cheers,

-- 
Julien Grall



* Re: [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64
  2021-08-11 10:23 [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64 Wei Chen
                   ` (41 preceding siblings ...)
  2021-08-19 13:42 ` Julien Grall
@ 2021-08-26  0:09 ` Stefano Stabellini
  2021-08-26  7:31   ` Wei Chen
  42 siblings, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-26  0:09 UTC (permalink / raw)
  To: Wei Chen
  Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis,
	andrew.cooper3

Thanks for the big contribution!

I just wanted to let you know that the series passed all the gitlab-ci
build tests without issues.

The runtime tests originally failed due to unrelated problems (there was
a Debian testing upgrade that broke Gitlab-CI). I fixed the underlying
issue and restarted the failed tests, and now they pass.

This is the pipeline:
https://gitlab.com/xen-project/patchew/xen/-/pipelines/351484940

There are still two runtime x86 tests that fail but I don't think the
failures are related to your series.


On Wed, 11 Aug 2021, Wei Chen wrote:
> Xen memory allocation and scheduler modules are NUMA aware.
> But actually, on x86 has implemented the architecture APIs
> to support NUMA. Arm was providing a set of fake architecture
> APIs to make it compatible with NUMA awared memory allocation
> and scheduler.
> 
> Arm system was working well as a single node NUMA system with
> these fake APIs, because we didn't have multiple nodes NUMA
> system on Arm. But in recent years, more and more Arm devices
> support multiple nodes NUMA system. Like TX2, some Hisilicon
> chips and the Ampere Altra.
> 
> So now we have a new problem. When Xen is running on these Arm
> devices, Xen still treat them as single node SMP systems. The
> NUMA affinity capability of Xen memory allocation and scheduler
> becomes meaningless. Because they rely on input data that does
> not reflect real NUMA layout.
> 
> Xen still think the access time for all of the memory is the
> same for all CPUs. However, Xen may allocate memory to a VM
> from different NUMA nodes with different access speeds. This
> difference can be amplified in workloads inside VM, causing
> performance instability and timeouts. 
> 
> So in this patch series, we implement a set of NUMA API to use
> device tree to describe the NUMA layout. We reuse most of the
> code of x86 NUMA to create and maintain the mapping between
> memory and CPU, create the matrix between any two NUMA nodes.
> Except ACPI and some x86 specified code, we have moved other
> code to common. In next stage, when we implement ACPI based
> NUMA for Arm64, we may move the ACPI NUMA code to common too,
> but in current stage, we keep it as x86 only.
> 
> This patch serires has been tested and booted well on one
> Arm64 NUMA machine and one HPE x86 NUMA machine.
> 
> Hongda Deng (2):
>   xen/arm: return default DMA bit width when platform is not set
>   xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0
> 
> Wei Chen (38):
>   tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf
>   xen/arm: Print a 64-bit number in hex from early uart
>   xen/x86: Initialize memnodemapsize while faking NUMA node
>   xen: decouple NUMA from ACPI in Kconfig
>   xen/arm: use !CONFIG_NUMA to keep fake NUMA API
>   xen/x86: Move NUMA memory node map functions to common
>   xen/x86: Move numa_add_cpu_node to common
>   xen/x86: Move NR_NODE_MEMBLKS macro to common
>   xen/x86: Move NUMA nodes and memory block ranges to common
>   xen/x86: Move numa_initmem_init to common
>   xen/arm: introduce numa_set_node for Arm
>   xen/arm: set NUMA nodes max number to 64 by default
>   xen/x86: move NUMA API from x86 header to common header
>   xen/arm: Create a fake NUMA node to use common code
>   xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
>   xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
>   xen: fdt: Introduce a helper to check fdt node type
>   xen/arm: implement node distance helpers for Arm64
>   xen/arm: introduce device_tree_numa as a switch for device tree NUMA
>   xen/arm: introduce a helper to parse device tree processor node
>   xen/arm: introduce a helper to parse device tree memory node
>   xen/arm: introduce a helper to parse device tree NUMA distance map
>   xen/arm: unified entry to parse all NUMA data from device tree
>   xen/arm: Add boot and secondary CPU to NUMA system
>   xen/arm: build CPU NUMA node map while creating cpu_logical_map
>   xen/x86: decouple nodes_cover_memory with E820 map
>   xen/arm: implement Arm arch helpers Arm to get memory map info
>   xen: move NUMA memory and CPU parsed nodemasks to common
>   xen/x86: move nodes_cover_memory to common
>   xen/x86: make acpi_scan_nodes to be neutral
>   xen: export bad_srat and srat_disabled to extern
>   xen: move numa_scan_nodes from x86 to common
>   xen: enable numa_scan_nodes for device tree based NUMA
>   xen/arm: keep guest still be NUMA unware
>   xen: introduce an arch helper to do NUMA init failed fallback
>   xen/arm: enable device tree based NUMA in system init
>   xen/x86: move numa_setup to common to support NUMA switch in command
>     line
>   xen/x86: move dump_numa info hotkey to common
> 
>  tools/libs/util/libxlu_pci.c    |   3 +-
>  xen/arch/arm/Kconfig            |  10 +
>  xen/arch/arm/Makefile           |   2 +
>  xen/arch/arm/arm64/head.S       |   9 +-
>  xen/arch/arm/bootfdt.c          |   8 +-
>  xen/arch/arm/domain_build.c     |  17 +-
>  xen/arch/arm/efi/efi-boot.h     |  25 --
>  xen/arch/arm/numa.c             | 162 +++++++++
>  xen/arch/arm/numa_device_tree.c | 292 ++++++++++++++++
>  xen/arch/arm/platform.c         |   4 +-
>  xen/arch/arm/setup.c            |  14 +
>  xen/arch/arm/smpboot.c          |  37 +-
>  xen/arch/x86/Kconfig            |   2 +-
>  xen/arch/x86/numa.c             | 421 +----------------------
>  xen/arch/x86/srat.c             | 147 +-------
>  xen/common/Kconfig              |   3 +
>  xen/common/Makefile             |   1 +
>  xen/common/libfdt/fdt_ro.c      |  15 +
>  xen/common/numa.c               | 588 ++++++++++++++++++++++++++++++++
>  xen/common/page_alloc.c         |   2 +-
>  xen/drivers/acpi/Kconfig        |   3 +-
>  xen/drivers/acpi/Makefile       |   2 +-
>  xen/include/asm-arm/numa.h      |  33 ++
>  xen/include/asm-arm/setup.h     |   6 +
>  xen/include/asm-x86/acpi.h      |   4 -
>  xen/include/asm-x86/config.h    |   1 -
>  xen/include/asm-x86/numa.h      |  65 +---
>  xen/include/asm-x86/setup.h     |   1 -
>  xen/include/xen/libfdt/libfdt.h |  25 ++
>  xen/include/xen/nodemask.h      |   2 +
>  xen/include/xen/numa.h          |  80 +++++
>  31 files changed, 1325 insertions(+), 659 deletions(-)
>  create mode 100644 xen/arch/arm/numa.c
>  create mode 100644 xen/arch/arm/numa_device_tree.c
>  create mode 100644 xen/common/numa.c
> 
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 14/40] xen/arm: set NUMA nodes max number to 64 by default
  2021-08-25 13:36     ` Jan Beulich
@ 2021-08-26  2:26       ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-26  2:26 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall; +Cc: Bertrand Marquis, xen-devel, sstabellini

Hi Jan, Julien,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 25 August 2021 21:36
> To: Julien Grall <julien@xen.org>; Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; xen-
> devel@lists.xenproject.org; sstabellini@kernel.org
> Subject: Re: [XEN RFC PATCH 14/40] xen/arm: set NUMA nodes max number to
> 64 by default
> 
> On 25.08.2021 15:28, Julien Grall wrote:
> > On 11/08/2021 11:23, Wei Chen wrote:
> >> --- a/xen/include/asm-arm/numa.h
> >> +++ b/xen/include/asm-arm/numa.h
> >> @@ -5,7 +5,15 @@
> >>
> >>   typedef u8 nodeid_t;
> >>
> >> -#if !defined(CONFIG_NUMA)
> >> +#if defined(CONFIG_NUMA)
> >> +
> >> +/*
> >> + * Same as x86, we set the max number of NUMA nodes to 64 and
> >> + * set the number of NUMA memory block number to 128.
> >> + */
> >
> > Such a comment can rot easily if x86 decides to bump their values. But
> > given the value is the same, I think it would make sense to move the
> > define to xen/numa.h.
> 
> To be honest - if this gets moved, please at least consider making it
> a proper Kconfig setting. Just as much as the number of CPUs can be
> configured, the number of nodes should be possible to choose by the
> build manager. Of course - if it's not too much trouble ...
> 

Ok

> Jan


^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for Arm
  2021-08-25 13:24       ` Julien Grall
@ 2021-08-26  5:13         ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-26  5:13 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 25 August 2021 21:24
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for
> Arm
> 
> 
> 
> On 25/08/2021 13:07, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 25 August 2021 18:37
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 13/40] xen/arm: introduce numa_set_node for
> >> Arm
> >>
> >> Hi Wei,
> >>
> >> On 11/08/2021 11:23, Wei Chen wrote:
> >>> This API is used to assign a CPU to a NUMA node. If the system
> >>> is configured with NUMA off or NUMA initialization fails, only
> >>> node#0 will be online. This will be done
> >>> in following patches. When NUMA is turned off or init fails,
> >>> node_online_map will be cleared and only node#0 set online. So we
> >>> use node_online_map to prevent assigning a CPU to an offline node.
> >>
> >> IMHO numa_set_node() should behave exactly the same way on x86 and Arm
> >> because this is going to be used by the common code.
> >>
> >>   From the commit message, I don't quite understand why the check is
> >> necessary on Arm but not on x86. Can you clarify it?
> >>
> >
> > Yes, in patch#27, in the dt_smp_init_cpus function in smpboot.c,
> > we parse the CPU numa-node-id from the dtb CPU node. If we get
> > a valid node ID for a CPU, we invoke numa_set_node to
> > create the CPU-NODE map. But in our testing, we found that when NUMA
> > init fails, numa_set_node can still set a CPU to an offline
> > or invalid NODE. So we're using node_online_map to prevent
> > this behavior. Otherwise we would have to check node_online_map
> > everywhere before we call numa_set_node.
> 
> What do you mean by invalid NODE? Is it 0xFF (NUMA_NO_NODE)?

No, I mean wrong content in the device tree. For example, a dtb
that sets a wrong numa-node-id in a CPU dt-node.

> 
> >
> > x86 actually does the same, but it performs the node_online_map
> > check outside of numa_set_node:
> 
> Right...
> 
> >> I think numa_set_node() will want to be implemented in common code.
> >>
> >
> > See my above comment. If x86 is ok, I think yes, we can do it
> > in common code.
> 
> ... on x86, this check is performed outside of numa_set_node() for one
> caller whereas on Arm you are adding it in numa_set_node().
> 
> For example, numa_set_node() can be called with NUMA_NO_NODE. On x86, we
> would set cpu_to_node[] to that value. However, if I am not mistaken, on
> Arm we would set the value to 0.
> 
> This will change the behavior of users of cpu_to_node() later on (such
> as XEN_SYSCTL_cputopoinfo).
> 
> NUMA is not something architecture specific, so I don't think the
> implementation should differ here.
> 
> In this case, I think numa_set_node() shouldn't check if the node is
> valid. Instead, the caller should take care of it if it is important.
> 

Yes, I agree. I will change it in the next version.
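
A minimal sketch of the caller-side check discussed above, not the final
code. It assumes simplified stand-ins: node_online[] replaces
node_online_map, set_cpu_node_checked() is a hypothetical helper name, the
array sizes are illustrative, and falling back to node 0 for a bad ID is an
assumption based on the fake-node behaviour described in this series.
numa_set_node() itself stores whatever it is given, as on x86.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS       8      /* illustrative size only */
#define MAX_NUMNODES  4      /* illustrative size only */
#define NUMA_NO_NODE  0xFF

typedef uint8_t nodeid_t;

/* Simplified stand-ins for the real per-CPU map and node_online_map. */
nodeid_t cpu_to_node[NR_CPUS];
bool node_online[MAX_NUMNODES];

/* As on x86: store whatever the caller passes, no validation here. */
void numa_set_node(unsigned int cpu, nodeid_t node)
{
    cpu_to_node[cpu] = node;
}

/*
 * Hypothetical caller-side helper: sanitise a node ID read from the
 * device tree before handing it to numa_set_node(). A node that is out
 * of range or offline falls back to node 0 (assumption).
 */
void set_cpu_node_checked(unsigned int cpu, uint32_t nid)
{
    if ( nid >= MAX_NUMNODES || !node_online[nid] )
        nid = 0;

    numa_set_node(cpu, (nodeid_t)nid);
}
```

With this split, a caller that passes NUMA_NO_NODE directly to
numa_set_node() still gets NUMA_NO_NODE back from cpu_to_node[], so users
such as XEN_SYSCTL_cputopoinfo would see the same value on both
architectures.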

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 19/40] xen: fdt: Introduce a helper to check fdt node type
  2021-08-25 13:39   ` Julien Grall
@ 2021-08-26  6:00     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-26  6:00 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 25 August 2021 21:39
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Jan Beulich
> <jbeulich@suse.com>
> Subject: Re: [XEN RFC PATCH 19/40] xen: fdt: Introduce a helper to check
> fdt node type
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > In later patches, we will parse CPU and memory NUMA information
> > from device tree. FDT is using device type property to indicate
> > CPU nodes and memory nodes. So we introduce fdt_node_check_type
> > in this patch to avoid redundant code in subsequent patches.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/common/libfdt/fdt_ro.c      | 15 +++++++++++++++
> >   xen/include/xen/libfdt/libfdt.h | 25 +++++++++++++++++++++++++
> 
> This is meant to be a verbatim copy of libfdt. So I am not entirely in
> favor of adding a new function without it having been upstreamed to
> libfdt first.
> 

Oh, if this change would need to be upstreamed to libfdt first, I think I
had better drop it from libfdt. We can implement the type check
somewhere else, and I don't want to introduce a dependency on an external
upstream repo in this series.

> >   2 files changed, 40 insertions(+)
> >
> > diff --git a/xen/common/libfdt/fdt_ro.c b/xen/common/libfdt/fdt_ro.c
> > index 36f9b480d1..ae7794d870 100644
> > --- a/xen/common/libfdt/fdt_ro.c
> > +++ b/xen/common/libfdt/fdt_ro.c
> > @@ -545,6 +545,21 @@ int fdt_node_check_compatible(const void *fdt, int
> nodeoffset,
> >   		return 1;
> >   }
> >
> > +int fdt_node_check_type(const void *fdt, int nodeoffset,
> > +			      const char *type)
> > +{
> > +	const void *prop;
> > +	int len;
> > +
> > +	prop = fdt_getprop(fdt, nodeoffset, "device_type", &len);
> > +	if (!prop)
> > +		return len;
> > +	if (fdt_stringlist_contains(prop, len, type))
> 
> The "device_type" property is not a list of strings. So I am a bit confused why
> you are using this helper. Shouldn't we simply check that the property
> value and type matches?
> 

Yes, I think you're right. This function was derived from
fdt_node_check_compatible, and I forgot to replace fdt_stringlist_contains.
And, as I replied above, we can simplify the check. I will implement it
outside of libfdt.
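
A minimal sketch of such a simplified check, assuming the property value
and length have already been fetched (for example with fdt_getprop()).
The helper name dt_device_type_matches() is hypothetical. It compares the
whole value against a single NUL-terminated string instead of treating it
as a string list.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if a "device_type" property value
 * (prop, len bytes) is exactly the single NUL-terminated string `type`.
 */
bool dt_device_type_matches(const void *prop, int len, const char *type)
{
    size_t type_len = strlen(type);

    if ( !prop || len <= 0 )
        return false;

    /* The length must cover the string plus its terminating NUL. */
    if ( (size_t)len != type_len + 1 )
        return false;

    return memcmp(prop, type, type_len + 1) == 0;
}
```

Unlike a string-list lookup, a value such as "cpu\0memory" does not match
"memory" here, because the property is expected to hold one string only.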

> > +		return 0;
> > +	else
> > +		return 1;
> > +}
> > +
> >   int fdt_node_offset_by_compatible(const void *fdt, int startoffset,
> >   				  const char *compatible)
> >   {
> > diff --git a/xen/include/xen/libfdt/libfdt.h
> b/xen/include/xen/libfdt/libfdt.h
> > index 7c75688a39..7e4930dbcd 100644
> > --- a/xen/include/xen/libfdt/libfdt.h
> > +++ b/xen/include/xen/libfdt/libfdt.h
> > @@ -799,6 +799,31 @@ int fdt_node_offset_by_phandle(const void *fdt,
> uint32_t phandle);
> >   int fdt_node_check_compatible(const void *fdt, int nodeoffset,
> >   			      const char *compatible);
> >
> > +/**
> > + * fdt_node_check_type: check a node's device_type property
> > + * @fdt: pointer to the device tree blob
> > + * @nodeoffset: offset of a tree node
> > + * @type: string to match against
> > + *
> > + *
> > + * fdt_node_check_type() returns 0 if the given node contains a
> 'device_type'
> > + * property with the given string as one of its elements, it returns
> non-zero
> > + * otherwise, or on error.
> > + *
> > + * returns:
> > + *	0, if the node has a 'device_type' property listing the given string
> > + *	1, if the node has a 'device_type' property, but it does not list
> > + *		the given string
> > + *	-FDT_ERR_NOTFOUND, if the given node has no 'device_type' property
> > + * 	-FDT_ERR_BADOFFSET, if nodeoffset does not refer to a
> BEGIN_NODE tag
> > + *	-FDT_ERR_BADMAGIC,
> > + *	-FDT_ERR_BADVERSION,
> > + *	-FDT_ERR_BADSTATE,
> > + *	-FDT_ERR_BADSTRUCTURE, standard meanings
> > + */
> > +int fdt_node_check_type(const void *fdt, int nodeoffset,
> > +			      const char *type);
> > +
> >   /**
> >    * fdt_node_offset_by_compatible - find nodes with a given
> 'compatible' value
> >    * @fdt: pointer to the device tree blob
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
  2021-08-25 13:48   ` Julien Grall
@ 2021-08-26  6:35     ` Wei Chen
  2021-08-26  8:21       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-26  6:35 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 25 August 2021 21:49
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> device tree memory node
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > Memory blocks' NUMA ID information is stored in device tree's
> > memory nodes as "numa-node-id". We need a new helper to parse
> > and verify this ID from memory nodes.
> >
> > In order to support memory affinity in later use, the valid
> > memory ranges and NUMA ID will be saved to tables.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/numa_device_tree.c | 130 ++++++++++++++++++++++++++++++++
> >   1 file changed, 130 insertions(+)
> >
> > diff --git a/xen/arch/arm/numa_device_tree.c
> b/xen/arch/arm/numa_device_tree.c
> > index 37cc56acf3..bbe081dcd1 100644
> > --- a/xen/arch/arm/numa_device_tree.c
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -20,11 +20,13 @@
> >   #include <xen/init.h>
> >   #include <xen/nodemask.h>
> >   #include <xen/numa.h>
> > +#include <xen/libfdt/libfdt.h>
> >   #include <xen/device_tree.h>
> >   #include <asm/setup.h>
> >
> >   s8 device_tree_numa = 0;
> >   static nodemask_t processor_nodes_parsed __initdata;
> > +static nodemask_t memory_nodes_parsed __initdata;
> >
> >   static int srat_disabled(void)
> >   {
> > @@ -55,6 +57,79 @@ static int __init
> dtb_numa_processor_affinity_init(nodeid_t node)
> >       return 0;
> >   }
> >
> > +/* Callback for parsing of the memory regions affinity */
> > +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> > +                                paddr_t start, paddr_t size)
> > +{
> 
> The implementation of this function is quite similar to the ACPI
> version. Can this be abstracted?

In my draft, I tried to merge the ACPI and DTB versions into one
function. I had to introduce a number of "if else" branches to distinguish
ACPI from DTB, especially for ACPI hotplug. The function became very messy,
and the benefits were not enough to make up for the mess, so I gave up.

But, yes, maybe we can abstract some common code into helper functions that
can be reused by both functions. I will try that in the next version.
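
One candidate for such a shared helper is the node span update at the end
of the function, which does not depend on where the range came from. A
minimal sketch, assuming a hypothetical helper name
numa_extend_node_span() and a simplified paddr_t; the `first` flag stands
in for the result of node_test_and_set() on memory_nodes_parsed:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t;   /* simplified stand-in */

struct node {
    paddr_t start, end;
};

/*
 * Hypothetical shared helper: grow the span of a node so that it covers
 * [start, end). `first` is true when no range has been recorded for this
 * node yet, in which case the span is simply initialised.
 */
void numa_extend_node_span(struct node *nd, bool first,
                           paddr_t start, paddr_t end)
{
    if ( first )
    {
        nd->start = start;
        nd->end = end;
        return;
    }

    if ( start < nd->start )
        nd->start = start;
    if ( nd->end < end )
        nd->end = end;
}
```

The ACPI-only parts (hotplug handling) would stay in the ACPI caller, so
the helper needs no "if else" on the firmware type.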


> 
> > +    struct node *nd;
> > +    paddr_t end;
> > +    int i;
> > +
> > +    if ( srat_disabled() )
> > +        return -EINVAL;
> > +
> > +    end = start + size;
> > +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
> > +    {
> > +        dprintk(XENLOG_WARNING,
> > +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> > +        bad_srat();
> > +        return -EINVAL;
> > +    }
> > +
> > +    /* It is fine to add this area to the nodes data it will be used
> later */
> > +    i = conflicting_memblks(start, end);
> > +    /* No conflicting memory block, we can save it for later usage */;
> > +    if ( i < 0 )
> > +        goto save_memblk;
> > +
> > +    if ( memblk_nodeid[i] == node ) {
> 
> Xen coding style is using:
> 
> if ( ... )
> {
> 
> Note that I may not comment on all the occurrences, so please check the
> other places.
> 

Ok.


> > +        /*
> > +         * Overlaps with other memblk in the same node, warning here.
> > +         * This memblk will be merged with conflicted memblk later.
> > +         */
> > +        printk(XENLOG_WARNING
> > +               "DT: NUMA NODE %u (%"PRIx64
> > +               "-%"PRIx64") overlaps with itself (%"PRIx64"-
> %"PRIx64")\n",
> > +               node, start, end,
> > +               node_memblk_range[i].start, node_memblk_range[i].end);
> > +    } else {
> > +        /*
> > +         * Conflict with memblk in other node, this is an error.
> > +         * The NUMA information is invalid, NUMA will be turn off.
> > +         */
> > +        printk(XENLOG_ERR
> > +               "DT: NUMA NODE %u (%"PRIx64"-%"
> > +               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
> > +               node, start, end, memblk_nodeid[i],
> > +               node_memblk_range[i].start, node_memblk_range[i].end);
> > +        bad_srat();
> > +        return -EINVAL;
> > +    }
> > +
> > +save_memblk:
> > +    nd = &nodes[node];
> > +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
> > +        nd->start = start;
> > +        nd->end = end;
> > +    } else {
> > +        if ( start < nd->start )
> > +            nd->start = start;
> > +        if ( nd->end < end )
> > +            nd->end = end;
> > +    }
> > +
> > +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
> > +           node, start, end);
> > +
> > +    node_memblk_range[num_node_memblks].start = start;
> > +    node_memblk_range[num_node_memblks].end = end;
> > +    memblk_nodeid[num_node_memblks] = node;
> > +    num_node_memblks++;
> > +
> > +    return 0;
> > +}
> > +
> >   /* Parse CPU NUMA node info */
> >   int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
> >   {
> > @@ -70,3 +145,58 @@ int __init device_tree_parse_numa_cpu_node(const
> void *fdt, int node)
> >
> >       return dtb_numa_processor_affinity_init(nid);
> >   }
> > +
> > +/* Parse memory node NUMA info */
> > +int __init
> > +device_tree_parse_numa_memory_node(const void *fdt, int node,
> > +    const char *name, uint32_t addr_cells, uint32_t size_cells)
> 
> This is pretty much a copy of process_memory_node(). Can we consider to
> collect the NUMA ID from there? If not, can we at least abstract the code?
> 

I tried to parse the NUMA ID in process_memory_node or in early_scan_node.
But I found that this change would scatter the NUMA init code across
different places. I prefer a unified NUMA init entry, because it should be
easier to maintain: when we want to disable NUMA, we just need to disable it
in one place. But you're right, we can still do some code abstraction.
I will do it in the next version.

> > +{
> > +    uint32_t nid;
> > +    int ret = 0, len;
> > +    paddr_t addr, size;
> > +    const struct fdt_property *prop;
> > +    uint32_t idx, ranges;
> > +    const __be32 *addresses;
> > +
> > +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
> > +    if ( nid >= MAX_NUMNODES )
> > +    {
> > +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n",
> nid);
> > +        return -EINVAL;
> > +    }
> > +
> > +    prop = fdt_get_property(fdt, node, "reg", &len);
> > +    if ( !prop )
> > +    {
> > +        printk(XENLOG_WARNING
> > +               "fdt: node `%s': missing `reg' property\n", name);
> > +        return -EINVAL;
> > +    }
> > +
> > +    addresses = (const __be32 *)prop->data;
> > +    ranges = len / (sizeof(__be32)* (addr_cells + size_cells));
> > +    for ( idx = 0; idx < ranges; idx++ )
> > +    {
> > +        device_tree_get_reg(&addresses, addr_cells, size_cells, &addr,
> &size);
> > +        /* Skip zero size ranges */
> > +        if ( !size )
> > +            continue;
> > +
> > +        ret = dtb_numa_memory_affinity_init(nid, addr, size);
> > +        if ( ret ) {
> > +            printk(XENLOG_WARNING
> > +                   "NUMA: process range#%d addr = %lx size=%lx
> failed!\n",
> 
> s/%d/%u/ as idx is an unsigned int
> s/%lx/%"PRI_paddr"/ as addr and size are paddr_t.
> 

OK
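
A minimal sketch of the corrected format string, written against standard
C so it can be compiled outside Xen. The PRI_paddr definition below is a
stand-in for the one in the Xen headers (its exact expansion here is an
assumption), and format_range_failure() is a hypothetical wrapper around
snprintf() used only to make the result visible.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t paddr_t;          /* simplified stand-in */

/* Stand-in for Xen's PRI_paddr; the real one comes from Xen headers. */
#define PRI_paddr "016" PRIx64

/*
 * Hypothetical wrapper: format the warning with %u for the unsigned
 * index and PRI_paddr for the paddr_t address and size.
 */
int format_range_failure(char *buf, size_t buflen, unsigned int idx,
                         paddr_t addr, paddr_t size)
{
    return snprintf(buf, buflen,
                    "NUMA: process range#%u addr = %" PRI_paddr
                    " size=%" PRI_paddr " failed!\n",
                    idx, addr, size);
}
```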

> > +                   idx, addr, size);
> > +            return -EINVAL;
> > +        }
> > +    }
> > +
> > +    if ( idx == 0 )
> > +    {
> > +        printk(XENLOG_ERR
> > +               "bad property in memory node, idx=%d ret=%d\n", idx,
> ret);
> > +        return -EINVAL;
> > +    }
> > +
> > +    return 0;
> > +}
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map
  2021-08-25 13:56   ` Julien Grall
@ 2021-08-26  7:01     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-26  7:01 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 25 August 2021 21:56
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Jan Beulich
> <jbeulich@suse.com>
> Subject: Re: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse
> device tree NUMA distance map
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > A NUMA aware device tree will provide a "distance-map" node to
> > describe distance between any two nodes. This patch introduce a
> 
> s/introduce/introduces/

OK

> 
> > new helper to parse this distance map.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/numa_device_tree.c | 67 +++++++++++++++++++++++++++++++++
> >   1 file changed, 67 insertions(+)
> >
> > diff --git a/xen/arch/arm/numa_device_tree.c
> b/xen/arch/arm/numa_device_tree.c
> > index bbe081dcd1..6e0d1d3d9f 100644
> > --- a/xen/arch/arm/numa_device_tree.c
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -200,3 +200,70 @@ device_tree_parse_numa_memory_node(const void *fdt,
> int node,
> >
> >       return 0;
> >   }
> > +
> > +/* Parse NUMA distance map v1 */
> > +int __init
> > +device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
> > +{
> > +    const struct fdt_property *prop;
> > +    const __be32 *matrix;
> > +    int entry_count, len, i;
> 
> entry_count and i should be unsigned. len unfortunately can't because
> fdt_get_property expects a signed int.
> 

OK

> > +
> > +    printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
> > +
> > +    prop = fdt_get_property(fdt, node, "distance-matrix", &len);
> > +    if ( !prop )
> > +    {
> > +        printk(XENLOG_WARNING
> > +               "NUMA: No distance-matrix property in distance-map\n");
> > +
> > +        return -EINVAL;
> > +    }
> > +
> > +    if ( len % sizeof(uint32_t) != 0 )
> > +    {
> > +        printk(XENLOG_WARNING
> > +               "distance-matrix in node is not a multiple of u32\n");
> > +        return -EINVAL;
> > +    }
> > +
> > +    entry_count = len / sizeof(uint32_t);
> > +    if ( entry_count <= 0 )
> 
> I understand that entry_count may be 0. But I can't see how it can be
> negative as the property len cannot be (even if it is a signed type). So
> I think this wants to be "== 0".
> 

According to the comment on fdt_get_property, len can be negative when prop
is NULL. But, yes, the prop == NULL check is handled before this code, so a
negative len will not reach here. I am OK to change it to "== 0".
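
A minimal, self-contained sketch of the triplet parsing with the review
comments applied (unsigned counters, "== 0" check). It is not the Xen
code: read_be32() replaces dt_read_number()/dt_next_cell(),
node_distance[] replaces numa_set_distance(), the value 10 for
NUMA_LOCAL_DISTANCE and the small MAX_NUMNODES are assumptions, and the
node range check exists only to protect the sketch's table.

```c
#include <stddef.h>
#include <stdint.h>

#define NUMA_LOCAL_DISTANCE 10  /* assumed value */
#define MAX_NUMNODES        4   /* illustrative size only */

/* Stand-in for the table filled by numa_set_distance(). */
uint32_t node_distance[MAX_NUMNODES][MAX_NUMNODES];

/* Read one big-endian 32-bit cell, as stored in a flattened device tree. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical parser: `data` holds `len` bytes of <from to distance>
 * triplets. Returns 0 on success, -1 on malformed input. As in the
 * patch, trailing cells that do not form a full triplet are ignored.
 */
int parse_distance_matrix(const uint8_t *data, size_t len)
{
    size_t entry_count, i;

    if ( len % sizeof(uint32_t) != 0 )
        return -1;

    entry_count = len / sizeof(uint32_t);
    if ( entry_count == 0 )
        return -1;

    for ( i = 0; i + 2 < entry_count; i += 3 )
    {
        uint32_t from = read_be32(data + 4 * i);
        uint32_t to = read_be32(data + 4 * (i + 1));
        uint32_t distance = read_be32(data + 4 * (i + 2));

        /* Sketch-only bounds check for the fixed-size table. */
        if ( from >= MAX_NUMNODES || to >= MAX_NUMNODES )
            return -1;

        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
             (from != to && distance <= NUMA_LOCAL_DISTANCE) )
            return -1;

        node_distance[from][to] = distance;

        /* Default the distance of B->A to the same value as A->B. */
        if ( to > from )
            node_distance[to][from] = distance;
    }

    return 0;
}
```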

> > +    {
> > +        printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
> > +
> > +        return -EINVAL;
> > +    }
> > +
> > +    matrix = (const __be32 *)prop->data;
> > +    for ( i = 0; i + 2 < entry_count; i += 3 )
> > +    {
> > +        uint32_t from, to, distance;
> > +
> > +        from = dt_read_number(matrix, 1);
> > +        matrix++;
> 
> You can use dt_next_cell() which will update the pointer for you.
> 

Thanks

> > +        to = dt_read_number(matrix, 1);
> > +        matrix++;
> > +        distance = dt_read_number(matrix, 1);
> > +        matrix++;
> > +
> > +        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> > +            (from != to && distance <= NUMA_LOCAL_DISTANCE) )
> > +        {
> > +            printk(XENLOG_WARNING
> > +                   "Invalid nodes' distance from node#%d to node#%d
> = %d\n",
> > +                   from, to, distance);
> > +            return -EINVAL;
> > +        }
> > +
> > +        printk(XENLOG_INFO "NUMA: distance from node#%d to node#%d
> = %d\n",
> > +               from, to, distance);
> > +        numa_set_distance(from, to, distance);
> > +
> > +        /* Set default distance of node B->A same as A->B */
> > +        if (to > from)
> > +             numa_set_distance(to, from, distance);
> > +    }
> > +
> > +    return 0;
> > +}
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to NUMA system
  2021-08-25 16:58   ` Julien Grall
@ 2021-08-26  7:24     ` Wei Chen
  2021-08-26  8:49       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-26  7:24 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 26 August 2021 0:58
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to
> NUMA system
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > When CPUs boot up, we have to add them to the NUMA system. At the
> > current stage, we have not parsed the NUMA data, but we have created
> > a fake NUMA node. So, in this patch, all CPUs will be added
> > to NUMA node#0. After the NUMA data has been parsed from the device
> > tree, the CPUs will be added to the correct NUMA nodes as the NUMA
> > data describes.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/setup.c       | 6 ++++++
> >   xen/arch/arm/smpboot.c     | 6 ++++++
> >   xen/include/asm-arm/numa.h | 1 +
> >   3 files changed, 13 insertions(+)
> >
> > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> > index 3c58d2d441..7531989f21 100644
> > --- a/xen/arch/arm/setup.c
> > +++ b/xen/arch/arm/setup.c
> > @@ -918,6 +918,12 @@ void __init start_xen(unsigned long
> boot_phys_offset,
> >
> >       processor_id();
> >
> > +    /*
> > +     * If Xen is running on a NUMA off system, there will
> > +     * be a node#0 at least.
> > +     */
> > +    numa_add_cpu(0);
> > +
> >       smp_init_cpus();
> >       cpus = smp_get_max_cpus();
> >       printk(XENLOG_INFO "SMP: Allowing %u CPUs\n", cpus);
> > diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> > index a1ee3146ef..aa78958c07 100644
> > --- a/xen/arch/arm/smpboot.c
> > +++ b/xen/arch/arm/smpboot.c
> > @@ -358,6 +358,12 @@ void start_secondary(void)
> >        */
> >       smp_wmb();
> >
> > +    /*
> > +     * If Xen is running on a NUMA off system, there will
> > +     * be a node#0 at least.
> > +     */
> > +    numa_add_cpu(cpuid);
> > +
> 
> On x86, numa_add_cpu() will be called before the pCPU is brought up. I
> am not quite sure why we are doing it differently here. Can you
> clarify it?

Of course we can invoke numa_add_cpu before cpu_up, as x86 does. But in my
tests, I found that when bringing up a CPU fails, that CPU is still added to
NUMA. Although this does not affect the execution of the code (because the
CPU is offline), I don't think adding an offline CPU to NUMA makes sense.



> 
> >       /* Now report this CPU is up */
> >       cpumask_set_cpu(cpuid, &cpu_online_map);
> >
> > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > index 7a3588ac7f..dd31324b0b 100644
> > --- a/xen/include/asm-arm/numa.h
> > +++ b/xen/include/asm-arm/numa.h
> > @@ -59,6 +59,7 @@ extern mfn_t first_valid_mfn;
> >   #define __node_distance(a, b) (20)
> >
> >   #define numa_init(x) do { } while (0)
> > +#define numa_add_cpu(x) do { } while (0)
> 
> This is a stub for a common helper. So I think this wants to be moved
> into the !CONFIG_NUMA section of xen/numa.h.
> 

OK

> >   #define numa_set_node(x, y) do { } while (0)
> >
> >   #endif
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 27/40] xen/arm: build CPU NUMA node map while creating cpu_logical_map
  2021-08-25 17:06   ` Julien Grall
@ 2021-08-26  7:26     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-26  7:26 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 26 August 2021 1:07
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Jan Beulich
> <jbeulich@suse.com>
> Subject: Re: [XEN RFC PATCH 27/40] xen/arm: build CPU NUMA node map while
> creating cpu_logical_map
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > Sometimes, the CPU logical ID may be different from the physical CPU ID.
> > Xen is using the CPU logical ID for runtime usage, so we should use
> > the CPU logical ID to create the map between NUMA node and CPU.
> 
> This commit message gives the impression that you are trying to fix a
> bug. However, what you are explaining is the reason why the code will
> use the logical ID rather than physical ID.
> 
> I think the commit message should explain what the patch is doing. You
> can then add an explanation why you are using the CPU logical ID.
> Something like "Note we are storing the CPU logical ID because...".
> 
> 

Ok

> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/smpboot.c | 31 ++++++++++++++++++++++++++++++-
> >   1 file changed, 30 insertions(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> > index aa78958c07..dd5a45bffc 100644
> > --- a/xen/arch/arm/smpboot.c
> > +++ b/xen/arch/arm/smpboot.c
> > @@ -121,7 +121,12 @@ static void __init dt_smp_init_cpus(void)
> >       {
> >           [0 ... NR_CPUS - 1] = MPIDR_INVALID
> >       };
> > +    static nodeid_t node_map[NR_CPUS] __initdata =
> > +    {
> > +        [0 ... NR_CPUS - 1] = NUMA_NO_NODE
> > +    };
> >       bool bootcpu_valid = false;
> > +    uint32_t nid = 0;
> >       int rc;
> >
> >       mpidr = boot_cpu_data.mpidr.bits & MPIDR_HWID_MASK;
> > @@ -172,6 +177,26 @@ static void __init dt_smp_init_cpus(void)
> >               continue;
> >           }
> >
> > +#ifdef CONFIG_DEVICE_TREE_NUMA
> > +        /*
> > +         *  When CONFIG_DEVICE_TREE_NUMA is set, try to fetch numa
> infomation
> > +         * from CPU dts node, otherwise the nid is always 0.
> > +         */
> > +        if ( !dt_property_read_u32(cpu, "numa-node-id", &nid) )
> 
> You can avoid the #ifdef by writing:
> 
> if ( IS_ENABLED(CONFIG_DEVICE_TREE_NUMA) && ... )
> 
> However, I would use CONFIG_NUMA because this code is already DT
> specific. So we can shorten the name a bit.
> 

OK

> > +        {
> > +            printk(XENLOG_WARNING
> > +                "cpu[%d] dts path: %s: doesn't have numa infomation!\n",
> 
> s/infomation/information/

OK

> 
> > +                cpuidx, dt_node_full_name(cpu));
> > +            /*
> > +             * The the early stage of NUMA initialization, when Xen found any
> 
> s/The/During/?

Oh, yes, I will fix it.

> 
> > +             * CPU dts node doesn't have numa-node-id info, the NUMA will be
> > +             * treated as off, all CPU will be set to a FAKE node 0. So if we
> > +             * get numa-node-id failed here, we should set nid to 0.
> > +             */
> > +            nid = 0;
> > +        }
> > +#endif
> > +
> >           /*
> >            * 8 MSBs must be set to 0 in the DT since the reg property
> >            * defines the MPIDR[23:0]
> > @@ -231,9 +256,12 @@ static void __init dt_smp_init_cpus(void)
> >           {
> >               printk("cpu%d init failed (hwid %"PRIregister"): %d\n", i, hwid, rc);
> >               tmp_map[i] = MPIDR_INVALID;
> > +            node_map[i] = NUMA_NO_NODE;
> >           }
> > -        else
> > +        else {
> >               tmp_map[i] = hwid;
> > +            node_map[i] = nid;
> > +        }
> >       }
> >
> >       if ( !bootcpu_valid )
> > @@ -249,6 +277,7 @@ static void __init dt_smp_init_cpus(void)
> >               continue;
> >           cpumask_set_cpu(i, &cpu_possible_map);
> >           cpu_logical_map(i) = tmp_map[i];
> > +        numa_set_node(i, node_map[i]);
> >       }
> >   }
> >    >
> 
> Cheers,
> 
> --
> Julien Grall
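
A minimal sketch of the IS_ENABLED() pattern suggested in the review above, which replaces the #ifdef block with an ordinary condition so that both branches are always compiled. The macro here is a simplified stand-in written for this example, not Xen's own definition, and fake_cpu_node, fake_read_numa_node_id() and cpu_node_id() are hypothetical names standing in for the device tree node and dt_property_read_u32().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build setting; Kconfig would normally provide this. */
#define CONFIG_NUMA 1

/*
 * Simplified stand-in for IS_ENABLED(): expands to 1 when the option is
 * defined to 1 and to 0 when it is not defined at all.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)
#define IS_ENABLED(option) IS_DEFINED_1(option)

/* Hypothetical stand-in for a CPU device tree node. */
struct fake_cpu_node {
    bool has_numa_node_id;
    uint32_t numa_node_id;
};

/* Returns true and fills *out when the "numa-node-id" property exists. */
static bool fake_read_numa_node_id(const struct fake_cpu_node *cpu,
                                   uint32_t *out)
{
    assert(cpu != NULL && out != NULL);

    if ( !cpu->has_numa_node_id )
        return false;

    *out = cpu->numa_node_id;
    return true;
}

/* Returns the node ID to record for this CPU, falling back to node 0. */
static uint32_t cpu_node_id(const struct fake_cpu_node *cpu)
{
    uint32_t nid = 0;

    /* No #ifdef: the compiler drops the branch when the option is off. */
    if ( IS_ENABLED(CONFIG_NUMA) && !fake_read_numa_node_id(cpu, &nid) )
        nid = 0; /* property missing: the CPU goes to the fake node 0 */

    return nid;
}
```

With the option disabled, the read is never attempted and every CPU reports node 0, which matches the behaviour described in the patch comment.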

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 29/40] xen/arm: implement Arm arch helpers Arm to get memory map info
  2021-08-25 17:09   ` Julien Grall
@ 2021-08-26  7:27     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-26  7:27 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021年8月26日 1:10
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 29/40] xen/arm: implement Arm arch helpers Arm
> to get memory map info
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > These two helpers are architecture APIs that are required by
> > nodes_cover_memory.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/numa.c | 14 ++++++++++++++
> >   1 file changed, 14 insertions(+)
> >
> > diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> > index f61a8df645..6eebf8e8bc 100644
> > --- a/xen/arch/arm/numa.c
> > +++ b/xen/arch/arm/numa.c
> > @@ -126,3 +126,17 @@ void __init numa_init(bool acpi_off)
> >       numa_initmem_init(PFN_UP(ram_start), PFN_DOWN(ram_end));
> >       return;
> >   }
> > +
> > +uint32_t __init arch_meminfo_get_nr_bank(void)
> > +{
> > +	return bootinfo.mem.nr_banks;
> > +}
> > +
> > +int __init arch_meminfo_get_ram_bank_range(int bank,
> > +	unsigned long long *start, unsigned long long *end)
> 
> They are physical addresses, so we should use "paddr_t", as on systems such
> as 32-bit Arm, "unsigned long" is not enough to cover all the physical
> addresses.
> 
> As you change the type, I would also suggest to change the bank from an
> int to an unsigned int.
> 

I will fix them in next version.

> > +{
> > +	*start = bootinfo.mem.bank[bank].start;
> > +	*end = bootinfo.mem.bank[bank].start + bootinfo.mem.bank[bank].size;
> > +
> > +	return 0;
> > +}
> >
> 
> Cheers,
> 
> --
> Julien Grall
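
A rough sketch of how the two helpers could look after the review comments above, with paddr_t for the addresses and an unsigned int bank index. The types and the fake_mem table are stand-ins for Xen's bootinfo, paddr_t is assumed to be 64 bits wide, and the bounds check is an addition that was not requested in the review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for Xen's types and boot data (assumptions for this sketch). */
typedef uint64_t paddr_t; /* wide enough for addresses above 4 GiB */
#define NR_MEM_BANKS 4

struct membank {
    paddr_t start;
    paddr_t size;
};

struct meminfo {
    unsigned int nr_banks;
    struct membank bank[NR_MEM_BANKS];
};

static struct meminfo fake_mem = {
    .nr_banks = 2,
    .bank = {
        { 0x080000000ULL, 0x080000000ULL }, /* 2 GiB starting at 2 GiB */
        { 0x880000000ULL, 0x180000000ULL }, /* 6 GiB above the 4 GiB mark */
    },
};

static unsigned int meminfo_get_nr_bank(void)
{
    return fake_mem.nr_banks;
}

/* Returns 0 and fills [*start, *end) for a valid bank, -1 otherwise. */
static int meminfo_get_ram_bank_range(unsigned int bank,
                                      paddr_t *start, paddr_t *end)
{
    assert(start != NULL && end != NULL);

    if ( bank >= fake_mem.nr_banks )
        return -1;

    *start = fake_mem.bank[bank].start;
    *end = *start + fake_mem.bank[bank].size;

    return 0;
}
```

The second bank ends above 4 GiB, which is the case a 32-bit "unsigned long" could not represent.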

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 30/40] xen: move NUMA memory and CPU parsed nodemasks to common
  2021-08-25 17:16   ` Julien Grall
@ 2021-08-26  7:29     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-26  7:29 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021年8月26日 1:17
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 30/40] xen: move NUMA memory and CPU parsed
> nodemasks to common
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > Both memory_nodes_parsed and processor_nodes_parsed are used
> > by Arm and x86 to record parsed NUMA memory and CPU nodes. So we
> > move them to common.
> 
> Looking at the usage, they both call:
> 
> numa_set...(..., bitmap)
> 
> So rather than exporting the two helpers, could we simply add helpers to
> abstract it?
> 

I will try to fix it in next version.

> 
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/numa_device_tree.c | 2 --
> >   xen/arch/x86/srat.c             | 3 ---
> >   xen/common/numa.c               | 3 +++
> >   xen/include/xen/nodemask.h      | 2 ++
> >   4 files changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> > index 27ffb72f7b..f74b7f6427 100644
> > --- a/xen/arch/arm/numa_device_tree.c
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -25,8 +25,6 @@
> >   #include <asm/setup.h>
> >
> >   s8 device_tree_numa = 0;
> > -static nodemask_t processor_nodes_parsed __initdata;
> > -static nodemask_t memory_nodes_parsed __initdata;
> 
> This is code that was introduced in a previous patch. In general, it is
> better to do the rework first and then add the new code. This makes
> the series easier to follow, as the added code is not changed again.
> 

Yes, I will fix it in next version.

> >
> >   static int srat_disabled(void)
> >   {
> > diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> > index 2298353846..dd3aa30843 100644
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -24,9 +24,6 @@
> >
> >   static struct acpi_table_slit *__read_mostly acpi_slit;
> >
> > -static nodemask_t memory_nodes_parsed __initdata;
> > -static nodemask_t processor_nodes_parsed __initdata;
> > -
> >   struct pxm2node {
> >   	unsigned pxm;
> >   	nodeid_t node;
> > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > index 26c0006d04..79ab250543 100644
> > --- a/xen/common/numa.c
> > +++ b/xen/common/numa.c
> > @@ -35,6 +35,9 @@ int num_node_memblks;
> >   struct node node_memblk_range[NR_NODE_MEMBLKS];
> >   nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
> >
> > +nodemask_t memory_nodes_parsed __initdata;
> > +nodemask_t processor_nodes_parsed __initdata;
> > +
> >   bool numa_off;
> >
> >   /*
> > diff --git a/xen/include/xen/nodemask.h b/xen/include/xen/nodemask.h
> > index 1dd6c7458e..29ce5e28e7 100644
> > --- a/xen/include/xen/nodemask.h
> > +++ b/xen/include/xen/nodemask.h
> > @@ -276,6 +276,8 @@ static inline int __cycle_node(int n, const nodemask_t *maskp, int nbits)
> >    */
> >
> >   extern nodemask_t node_online_map;
> > +extern nodemask_t memory_nodes_parsed;
> > +extern nodemask_t processor_nodes_parsed;
> >
> >   #if MAX_NUMNODES > 1
> >   #define num_online_nodes()	nodes_weight(node_online_map)
> >
> 
> Cheers,
> 
> --
> Julien Grall
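
One possible reading of the suggestion above, sketched under assumptions: the two masks stay private to the common file and callers go through small helpers instead of touching exported variables. The helper names and the one-word nodemask_t are hypothetical simplifications, not Xen's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NUMNODES 64
typedef uint8_t nodeid_t;

/* Simplified mask: a single 64-bit word covers all MAX_NUMNODES nodes. */
typedef struct {
    uint64_t bits;
} nodemask_t;

/* The masks are no longer exported; only the helpers below touch them. */
static nodemask_t memory_nodes_parsed;
static nodemask_t processor_nodes_parsed;

static void mask_set(nodemask_t *mask, nodeid_t node)
{
    assert(node < MAX_NUMNODES);
    mask->bits |= UINT64_C(1) << node;
}

static bool mask_test(const nodemask_t *mask, nodeid_t node)
{
    return node < MAX_NUMNODES && ((mask->bits >> node) & 1);
}

/* Hypothetical helpers that arch code (SRAT or device tree) would call. */
static void numa_set_memory_node_parsed(nodeid_t node)
{
    mask_set(&memory_nodes_parsed, node);
}

static void numa_set_processor_node_parsed(nodeid_t node)
{
    mask_set(&processor_nodes_parsed, node);
}

static bool numa_memory_node_parsed(nodeid_t node)
{
    return mask_test(&memory_nodes_parsed, node);
}

static bool numa_processor_node_parsed(nodeid_t node)
{
    return mask_test(&processor_nodes_parsed, node);
}
```

Keeping the storage static means the header only needs function prototypes, so neither srat.c nor numa_device_tree.c depends on how the masks are represented.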

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 00/40] Add device tree based NUMA support to Arm64
  2021-08-26  0:09 ` Stefano Stabellini
@ 2021-08-26  7:31   ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-26  7:31 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: xen-devel, julien, jbeulich, Bertrand Marquis, andrew.cooper3

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 2021年8月26日 8:09
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> andrew.cooper3@citrix.com
> Subject: Re: [XEN RFC PATCH 00/40] Add device tree based NUMA support to
> Arm64
> 
> Thanks for the big contribution!
> 
> I just wanted to let you know that the series passed all the gitlab-ci
> build tests without issues.
> 
> The runtime tests originally failed due to unrelated problems (there was
> a Debian testing upgrade that broke Gitlab-CI.) I fixed the underlying
> issue and restarted the failed tests, and now they pass.
> 
> This is the pipeline:
> https://gitlab.com/xen-project/patchew/xen/-/pipelines/351484940
> 
> There are still two runtime x86 tests that fail but I don't think the
> failures are related to your series.
> 
> 

Thanks for testing this series : )

> On Wed, 11 Aug 2021, Wei Chen wrote:
> > Xen memory allocation and scheduler modules are NUMA aware.
> > But so far, only x86 has implemented the architecture APIs
> > to support NUMA. Arm was providing a set of fake architecture
> > APIs to make it compatible with the NUMA-aware memory allocation
> > and scheduler.
> >
> > Arm system was working well as a single node NUMA system with
> > these fake APIs, because we didn't have multiple nodes NUMA
> > system on Arm. But in recent years, more and more Arm devices
> > support multiple nodes NUMA system. Like TX2, some Hisilicon
> > chips and the Ampere Altra.
> >
> > So now we have a new problem. When Xen is running on these Arm
> > devices, Xen still treats them as single node SMP systems. The
> > NUMA affinity capability of Xen memory allocation and scheduler
> > becomes meaningless, because they rely on input data that does
> > not reflect the real NUMA layout.
> >
> > Xen still thinks the access time for all of the memory is the
> > same for all CPUs. However, Xen may allocate memory to a VM
> > from different NUMA nodes with different access speeds. This
> > difference can be amplified in workloads inside VM, causing
> > performance instability and timeouts.
> >
> > So in this patch series, we implement a set of NUMA API to use
> > device tree to describe the NUMA layout. We reuse most of the
> > code of x86 NUMA to create and maintain the mapping between
> > memory and CPU, create the matrix between any two NUMA nodes.
> > Except ACPI and some x86 specified code, we have moved other
> > code to common. In next stage, when we implement ACPI based
> > NUMA for Arm64, we may move the ACPI NUMA code to common too,
> > but in current stage, we keep it as x86 only.
> >
> > This patch series has been tested and booted well on one
> > Arm64 NUMA machine and one HPE x86 NUMA machine.
> >
> > Hongda Deng (2):
> >   xen/arm: return default DMA bit width when platform is not set
> >   xen/arm: Fix lowmem_bitsize when arch_get_dma_bitsize return 0
> >
> > Wei Chen (38):
> >   tools: Fix -Werror=maybe-uninitialized for xlu_pci_parse_bdf
> >   xen/arm: Print a 64-bit number in hex from early uart
> >   xen/x86: Initialize memnodemapsize while faking NUMA node
> >   xen: decouple NUMA from ACPI in Kconfig
> >   xen/arm: use !CONFIG_NUMA to keep fake NUMA API
> >   xen/x86: Move NUMA memory node map functions to common
> >   xen/x86: Move numa_add_cpu_node to common
> >   xen/x86: Move NR_NODE_MEMBLKS macro to common
> >   xen/x86: Move NUMA nodes and memory block ranges to common
> >   xen/x86: Move numa_initmem_init to common
> >   xen/arm: introduce numa_set_node for Arm
> >   xen/arm: set NUMA nodes max number to 64 by default
> >   xen/x86: move NUMA API from x86 header to common header
> >   xen/arm: Create a fake NUMA node to use common code
> >   xen/arm: Introduce DEVICE_TREE_NUMA Kconfig for arm64
> >   xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
> >   xen: fdt: Introduce a helper to check fdt node type
> >   xen/arm: implement node distance helpers for Arm64
> >   xen/arm: introduce device_tree_numa as a switch for device tree NUMA
> >   xen/arm: introduce a helper to parse device tree processor node
> >   xen/arm: introduce a helper to parse device tree memory node
> >   xen/arm: introduce a helper to parse device tree NUMA distance map
> >   xen/arm: unified entry to parse all NUMA data from device tree
> >   xen/arm: Add boot and secondary CPU to NUMA system
> >   xen/arm: build CPU NUMA node map while creating cpu_logical_map
> >   xen/x86: decouple nodes_cover_memory with E820 map
> >   xen/arm: implement Arm arch helpers Arm to get memory map info
> >   xen: move NUMA memory and CPU parsed nodemasks to common
> >   xen/x86: move nodes_cover_memory to common
> >   xen/x86: make acpi_scan_nodes to be neutral
> >   xen: export bad_srat and srat_disabled to extern
> >   xen: move numa_scan_nodes from x86 to common
> >   xen: enable numa_scan_nodes for device tree based NUMA
> >   xen/arm: keep guest still be NUMA unware
> >   xen: introduce an arch helper to do NUMA init failed fallback
> >   xen/arm: enable device tree based NUMA in system init
> >   xen/x86: move numa_setup to common to support NUMA switch in command
> >     line
> >   xen/x86: move dump_numa info hotkey to common
> >
> >  tools/libs/util/libxlu_pci.c    |   3 +-
> >  xen/arch/arm/Kconfig            |  10 +
> >  xen/arch/arm/Makefile           |   2 +
> >  xen/arch/arm/arm64/head.S       |   9 +-
> >  xen/arch/arm/bootfdt.c          |   8 +-
> >  xen/arch/arm/domain_build.c     |  17 +-
> >  xen/arch/arm/efi/efi-boot.h     |  25 --
> >  xen/arch/arm/numa.c             | 162 +++++++++
> >  xen/arch/arm/numa_device_tree.c | 292 ++++++++++++++++
> >  xen/arch/arm/platform.c         |   4 +-
> >  xen/arch/arm/setup.c            |  14 +
> >  xen/arch/arm/smpboot.c          |  37 +-
> >  xen/arch/x86/Kconfig            |   2 +-
> >  xen/arch/x86/numa.c             | 421 +----------------------
> >  xen/arch/x86/srat.c             | 147 +-------
> >  xen/common/Kconfig              |   3 +
> >  xen/common/Makefile             |   1 +
> >  xen/common/libfdt/fdt_ro.c      |  15 +
> >  xen/common/numa.c               | 588 ++++++++++++++++++++++++++++++++
> >  xen/common/page_alloc.c         |   2 +-
> >  xen/drivers/acpi/Kconfig        |   3 +-
> >  xen/drivers/acpi/Makefile       |   2 +-
> >  xen/include/asm-arm/numa.h      |  33 ++
> >  xen/include/asm-arm/setup.h     |   6 +
> >  xen/include/asm-x86/acpi.h      |   4 -
> >  xen/include/asm-x86/config.h    |   1 -
> >  xen/include/asm-x86/numa.h      |  65 +---
> >  xen/include/asm-x86/setup.h     |   1 -
> >  xen/include/xen/libfdt/libfdt.h |  25 ++
> >  xen/include/xen/nodemask.h      |   2 +
> >  xen/include/xen/numa.h          |  80 +++++
> >  31 files changed, 1325 insertions(+), 659 deletions(-)
> >  create mode 100644 xen/arch/arm/numa.c
> >  create mode 100644 xen/arch/arm/numa_device_tree.c
> >  create mode 100644 xen/common/numa.c
> >
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
  2021-08-26  6:35     ` Wei Chen
@ 2021-08-26  8:21       ` Julien Grall
  2021-08-26 11:54         ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-26  8:21 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand Marquis



On 26/08/2021 07:35, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 2021年8月25日 21:49
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
>> device tree memory node
>>
>> Hi Wei,
>>
>> On 11/08/2021 11:24, Wei Chen wrote:
>>> Memory blocks' NUMA ID information is stored in device tree's
>>> memory nodes as "numa-node-id". We need a new helper to parse
>>> and verify this ID from memory nodes.
>>>
>>> In order to support memory affinity in later use, the valid
>>> memory ranges and NUMA ID will be saved to tables.
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>    xen/arch/arm/numa_device_tree.c | 130 ++++++++++++++++++++++++++++++++
>>>    1 file changed, 130 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
>>> index 37cc56acf3..bbe081dcd1 100644
>>> --- a/xen/arch/arm/numa_device_tree.c
>>> +++ b/xen/arch/arm/numa_device_tree.c
>>> @@ -20,11 +20,13 @@
>>>    #include <xen/init.h>
>>>    #include <xen/nodemask.h>
>>>    #include <xen/numa.h>
>>> +#include <xen/libfdt/libfdt.h>
>>>    #include <xen/device_tree.h>
>>>    #include <asm/setup.h>
>>>
>>>    s8 device_tree_numa = 0;
>>>    static nodemask_t processor_nodes_parsed __initdata;
>>> +static nodemask_t memory_nodes_parsed __initdata;
>>>
>>>    static int srat_disabled(void)
>>>    {
>>> @@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
>>>        return 0;
>>>    }
>>>
>>> +/* Callback for parsing of the memory regions affinity */
>>> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
>>> +                                paddr_t start, paddr_t size)
>>> +{
>>
>> The implementation of this function is quite similar to the ACPI
>> version. Can this be abstracted?
> 
> In my draft, I had tried to merge ACPI and DTB versions in one
> function. I introduced a number of "if else" to distinguish ACPI
> from DTB, especially ACPI hotplug. The function seems very messy.
> Not enough benefits to make up for the mess, so I gave up.

I think you can get away from distinguishing between ACPI and DT in 
that helper:
   * ma->flags & ACPI_SRAT_MEM_HOTPLUGGABLE could be replaced by an 
argument indicating whether the region is hotpluggable (this would 
always be false for DT)
   * Access to memblk_hotplug can be stubbed (in the future we may want 
to consider memory hotplug even on Arm).

Do you still have the "if else" version? If so can you post it?

Cheers,

-- 
Julien Grall
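
The two points above could be sketched roughly as follows: a single firmware-neutral helper takes a "hotplug" argument, the ACPI caller derives it from the SRAT flags, and the device tree caller always passes false. All names, the table layout and the value given to ACPI_SRAT_MEM_HOTPLUGGABLE are assumptions made for this example, not the code from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t;
typedef uint8_t nodeid_t;

#define NR_NODE_MEMBLKS 8
#define ACPI_SRAT_MEM_HOTPLUGGABLE (1u << 1) /* stand-in flag value */

struct memblk {
    paddr_t start;
    paddr_t end;
    nodeid_t node;
    bool hotplug;
};

static struct memblk memblks[NR_NODE_MEMBLKS];
static unsigned int num_memblks;

/* Common helper: knows nothing about ACPI or device tree. */
static int numa_memory_affinity_add(nodeid_t node, paddr_t start,
                                    paddr_t size, bool hotplug)
{
    assert(num_memblks <= NR_NODE_MEMBLKS);

    if ( size == 0 || num_memblks == NR_NODE_MEMBLKS )
        return -1;

    memblks[num_memblks].start = start;
    memblks[num_memblks].end = start + size;
    memblks[num_memblks].node = node;
    memblks[num_memblks].hotplug = hotplug;
    num_memblks++;

    return 0;
}

/* ACPI caller: the hotplug state comes from the SRAT entry flags. */
static int acpi_memory_affinity(nodeid_t node, paddr_t start, paddr_t size,
                                uint32_t flags)
{
    return numa_memory_affinity_add(node, start, size,
                                    (flags & ACPI_SRAT_MEM_HOTPLUGGABLE) != 0);
}

/* Device tree caller: regions are never hotpluggable for now. */
static int dtb_memory_affinity(nodeid_t node, paddr_t start, paddr_t size)
{
    return numa_memory_affinity_add(node, start, size, false);
}
```

With this split, the "if else" on the firmware type disappears from the shared code; only the thin wrappers differ.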


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to NUMA system
  2021-08-26  7:24     ` Wei Chen
@ 2021-08-26  8:49       ` Julien Grall
  2021-08-26  9:39         ` Jan Beulich
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-26  8:49 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand Marquis



On 26/08/2021 08:24, Wei Chen wrote:
> Hi Julien,

Hi Wei,

>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 2021年8月26日 0:58
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to
>> NUMA system
>>
>> Hi Wei,
>>
>> On 11/08/2021 11:24, Wei Chen wrote:
>>> When CPUs boot up, we have to add them to the NUMA system. At the
>>> current stage, we have not parsed the NUMA data, but we have created
>>> a fake NUMA node. So, in this patch, all CPUs will be added
>>> to NUMA node#0. After the NUMA data has been parsed from the device
>>> tree, the CPUs will be added to the correct NUMA node as the NUMA
>>> data describes.
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>    xen/arch/arm/setup.c       | 6 ++++++
>>>    xen/arch/arm/smpboot.c     | 6 ++++++
>>>    xen/include/asm-arm/numa.h | 1 +
>>>    3 files changed, 13 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
>>> index 3c58d2d441..7531989f21 100644
>>> --- a/xen/arch/arm/setup.c
>>> +++ b/xen/arch/arm/setup.c
>>> @@ -918,6 +918,12 @@ void __init start_xen(unsigned long boot_phys_offset,
>>>
>>>        processor_id();
>>>
>>> +    /*
>>> +     * If Xen is running on a NUMA off system, there will
>>> +     * be a node#0 at least.
>>> +     */
>>> +    numa_add_cpu(0);
>>> +
>>>        smp_init_cpus();
>>>        cpus = smp_get_max_cpus();
>>>        printk(XENLOG_INFO "SMP: Allowing %u CPUs\n", cpus);
>>> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
>>> index a1ee3146ef..aa78958c07 100644
>>> --- a/xen/arch/arm/smpboot.c
>>> +++ b/xen/arch/arm/smpboot.c
>>> @@ -358,6 +358,12 @@ void start_secondary(void)
>>>         */
>>>        smp_wmb();
>>>
>>> +    /*
>>> +     * If Xen is running on a NUMA off system, there will
>>> +     * be a node#0 at least.
>>> +     */
>>> +    numa_add_cpu(cpuid);
>>> +
>>
>> On x86, numa_add_cpu() will be called before the pCPU is brought up. I
>> am not quite too sure why we are doing it differently here. Can you
>> clarify it?
> 
> Of course we can invoke numa_add_cpu before cpu_up, as x86 does. But in my
> tests, I found that when CPU bring-up failed, the CPU was still added to
> NUMA. Although this does not affect the execution of the code (because the
> CPU is offline), I don't think adding an offline CPU to NUMA makes sense.

Right, but again, why do you want to solve the problem on Arm and not 
x86? After all, NUMA is not architecture specific (in fact you move most 
of the code in common).

In fact, the risk is that someone may read arch/x86 and not realize that the 
CPU is not in the node until late on Arm.

So I think we should call numa_add_cpu() around the same place on all 
the architectures.

If you think the current position on x86 is not correct, then it should 
be changed as well. However, I don't know the story behind the 
position of the call on x86. You may want to ask the x86 maintainers.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to NUMA system
  2021-08-26  8:49       ` Julien Grall
@ 2021-08-26  9:39         ` Jan Beulich
  2021-08-26 12:08           ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Jan Beulich @ 2021-08-26  9:39 UTC (permalink / raw)
  To: Julien Grall, Wei Chen; +Cc: Bertrand Marquis, xen-devel, sstabellini

On 26.08.2021 10:49, Julien Grall wrote:
> On 26/08/2021 08:24, Wei Chen wrote:
>>> -----Original Message-----
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 2021年8月26日 0:58
>>> On 11/08/2021 11:24, Wei Chen wrote:
>>>> --- a/xen/arch/arm/smpboot.c
>>>> +++ b/xen/arch/arm/smpboot.c
>>>> @@ -358,6 +358,12 @@ void start_secondary(void)
>>>>         */
>>>>        smp_wmb();
>>>>
>>>> +    /*
>>>> +     * If Xen is running on a NUMA off system, there will
>>>> +     * be a node#0 at least.
>>>> +     */
>>>> +    numa_add_cpu(cpuid);
>>>> +
>>>
>>> On x86, numa_add_cpu() will be called before the pCPU is brought up. I
>>> am not quite too sure why we are doing it differently here. Can you
>>> clarify it?
>>
>> Of course we can invoke numa_add_cpu before cpu_up, as x86 does. But in my
>> tests, I found that when CPU bring-up failed, the CPU was still added to
>> NUMA. Although this does not affect the execution of the code (because the
>> CPU is offline), I don't think adding an offline CPU to NUMA makes sense.
> 
> Right, but again, why do you want to solve the problem on Arm and not 
> x86? After all, NUMA is not architecture specific (in fact you move most 
> of the code in common).
> 
> In fact, the risk is that someone may read arch/x86 and not realize that the 
> CPU is not in the node until late on Arm.
> 
> So I think we should call numa_add_cpu() around the same place on all 
> the architectures.

FWIW: +1

Jan



^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
  2021-08-26  8:21       ` Julien Grall
@ 2021-08-26 11:54         ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-26 11:54 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021年8月26日 16:22
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> device tree memory node
> 
> 
> 
> On 26/08/2021 07:35, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 2021年8月25日 21:49
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> >> device tree memory node
> >>
> >> Hi Wei,
> >>
> >> On 11/08/2021 11:24, Wei Chen wrote:
> >>> Memory blocks' NUMA ID information is stored in device tree's
> >>> memory nodes as "numa-node-id". We need a new helper to parse
> >>> and verify this ID from memory nodes.
> >>>
> >>> In order to support memory affinity in later use, the valid
> >>> memory ranges and NUMA ID will be saved to tables.
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> ---
> >>>    xen/arch/arm/numa_device_tree.c | 130 ++++++++++++++++++++++++++++++++
> >>>    1 file changed, 130 insertions(+)
> >>>
> >>> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> >>> index 37cc56acf3..bbe081dcd1 100644
> >>> --- a/xen/arch/arm/numa_device_tree.c
> >>> +++ b/xen/arch/arm/numa_device_tree.c
> >>> @@ -20,11 +20,13 @@
> >>>    #include <xen/init.h>
> >>>    #include <xen/nodemask.h>
> >>>    #include <xen/numa.h>
> >>> +#include <xen/libfdt/libfdt.h>
> >>>    #include <xen/device_tree.h>
> >>>    #include <asm/setup.h>
> >>>
> >>>    s8 device_tree_numa = 0;
> >>>    static nodemask_t processor_nodes_parsed __initdata;
> >>> +static nodemask_t memory_nodes_parsed __initdata;
> >>>
> >>>    static int srat_disabled(void)
> >>>    {
> >>> @@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> >>>        return 0;
> >>>    }
> >>>
> >>> +/* Callback for parsing of the memory regions affinity */
> >>> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> >>> +                                paddr_t start, paddr_t size)
> >>> +{
> >>
> >> The implementation of this function is quite similar to the ACPI
> >> version. Can this be abstracted?
> >
> > In my draft, I had tried to merge ACPI and DTB versions in one
> > function. I introduced a number of "if else" to distinguish ACPI
> > from DTB, especially ACPI hotplug. The function seems very messy.
> > Not enough benefits to make up for the mess, so I gave up.
> 
> I think you can get away from distinguishing between ACPI and DT in
> that helper:
>    * ma->flags & ACPI_SRAT_MEM_HOTPLUGGABLE could be replaced by an
> argument indicating whether the region is hotpluggable (this would
> always be false for DT)
>    * Access to memblk_hotplug can be stubbed (in the future we may want
> to consider memory hotplug even on Arm).
> 
> Do you still have the "if else" version? If so can you post it?
> 

I only tried that while drafting. Because I was not satisfied
with the changes, I did not save them as a patch.

I think your suggestions are worth trying again. I will do it
in the next version.


> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to NUMA system
  2021-08-26  9:39         ` Jan Beulich
@ 2021-08-26 12:08           ` Wei Chen
  2021-08-26 12:26             ` Jan Beulich
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-26 12:08 UTC (permalink / raw)
  To: Jan Beulich, Julien Grall; +Cc: Bertrand Marquis, xen-devel, sstabellini

Hi Jan, Julien,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2021年8月26日 17:40
> To: Julien Grall <julien@xen.org>; Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>;
> xen-devel@lists.xenproject.org; sstabellini@kernel.org
> Subject: Re: [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to
> NUMA system
> 
> On 26.08.2021 10:49, Julien Grall wrote:
> > On 26/08/2021 08:24, Wei Chen wrote:
> >>> -----Original Message-----
> >>> From: Julien Grall <julien@xen.org>
> >>> Sent: 2021年8月26日 0:58
> >>> On 11/08/2021 11:24, Wei Chen wrote:
> >>>> --- a/xen/arch/arm/smpboot.c
> >>>> +++ b/xen/arch/arm/smpboot.c
> >>>> @@ -358,6 +358,12 @@ void start_secondary(void)
> >>>>         */
> >>>>        smp_wmb();
> >>>>
> >>>> +    /*
> >>>> +     * If Xen is running on a NUMA off system, there will
> >>>> +     * be a node#0 at least.
> >>>> +     */
> >>>> +    numa_add_cpu(cpuid);
> >>>> +
> >>>
> >>> On x86, numa_add_cpu() will be called before the pCPU is brought up. I
> >>> am not quite too sure why we are doing it differently here. Can you
> >>> clarify it?
> >>
> >> Of course we can invoke numa_add_cpu before cpu_up, as x86 does. But in my
> >> tests, I found that when CPU bring-up failed, the CPU was still added to
> >> NUMA. Although this does not affect the execution of the code (because the
> >> CPU is offline), I don't think adding an offline CPU to NUMA makes sense.
> >
> > Right, but again, why do you want to solve the problem on Arm and not
> > x86? After all, NUMA is not architecture specific (in fact you move most
> > of the code in common).
> >

I am not very familiar with x86, so when I was composing this patch series,
I always thought that if I could solve something inside the Arm arch code, I
would solve it there. That seems a bit conservative, and inappropriate for
solving this problem.

> > In fact, the risk is that someone may read arch/x86 and not realize that the
> > CPU is not in the node until late on Arm.
> >
> > So I think we should call numa_add_cpu() around the same place on all
> > the architectures.
> 
> FWIW: +1

I agree. As Jan is in this discussion: how about following x86's current
numa_add_cpu behavior in __start_xen, but adding some code to revert
numa_add_cpu when cpu_up fails (on both Arm and x86)?

> 
> Jan


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 26/40] xen/arm: Add boot and secondary CPU to NUMA system
  2021-08-26 12:08           ` Wei Chen
@ 2021-08-26 12:26             ` Jan Beulich
  0 siblings, 0 replies; 196+ messages in thread
From: Jan Beulich @ 2021-08-26 12:26 UTC (permalink / raw)
  To: Wei Chen; +Cc: Bertrand Marquis, xen-devel, sstabellini, Julien Grall

On 26.08.2021 14:08, Wei Chen wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2021年8月26日 17:40
>>
>> On 26.08.2021 10:49, Julien Grall wrote:
>>> Right, but again, why do you want to solve the problem on Arm and not
>>> x86? After all, NUMA is not architecture specific (in fact you move most
>>> of the code in common).
>>>
> 
> I am not very familiar with x86, so when I was composing this patch series,
> I always thought that if I could solve something inside the Arm arch code, I
> would solve it there. That seems a bit conservative, and inappropriate for
> solving this problem.
> 
>>> In fact, the risk is that someone may read arch/x86 and not realize that the
>>> CPU is not in the node until late on Arm.
>>>
>>> So I think we should call numa_add_cpu() around the same place on all
>>> the architectures.
>>
>> FWIW: +1
> 
> I agree. As Jan is in this discussion: how about following x86's current
> numa_add_cpu behavior in __start_xen, but adding some code to revert
> numa_add_cpu when cpu_up fails (on both Arm and x86)?

Sure - if we don't clean up properly on x86 on a failure path, I'm all
for having that fixed.

Jan
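
A small sketch of the approach agreed on above: add the CPU to its node before bring-up, as x86 is described as doing, and undo that if bring-up fails. The bitmask tables, fake_cpu_up() and numa_remove_cpu() are hypothetical stand-ins written for this example; the thread does not say what the revert helper will be called or how it will be implemented.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8
#define MAX_NUMNODES 4
typedef uint8_t nodeid_t;

static nodeid_t cpu_to_node[NR_CPUS];
static uint32_t node_to_cpumask[MAX_NUMNODES]; /* one bit per CPU */
static uint32_t broken_cpus;                   /* CPUs whose bring-up fails */

static void numa_add_cpu(unsigned int cpu)
{
    node_to_cpumask[cpu_to_node[cpu]] |= 1u << cpu;
}

/* Hypothetical revert helper for the failure path. */
static void numa_remove_cpu(unsigned int cpu)
{
    node_to_cpumask[cpu_to_node[cpu]] &= ~(1u << cpu);
}

/* Stand-in for cpu_up(): returns 0 on success, -1 on failure. */
static int fake_cpu_up(unsigned int cpu)
{
    return ((broken_cpus >> cpu) & 1) ? -1 : 0;
}

static int bring_up_cpu(unsigned int cpu)
{
    int rc;

    assert(cpu < NR_CPUS);

    numa_add_cpu(cpu);        /* same position on every architecture */

    rc = fake_cpu_up(cpu);
    if ( rc )
        numa_remove_cpu(cpu); /* do not leave an offline CPU in the node */

    return rc;
}
```

This keeps the call order identical across architectures while still addressing the concern about offline CPUs remaining in a node's CPU mask.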



^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 16/40] xen/arm: Create a fake NUMA node to use common code
  2021-08-11 10:23 ` [XEN RFC PATCH 16/40] xen/arm: Create a fake NUMA node to use common code Wei Chen
@ 2021-08-26 23:10   ` Stefano Stabellini
  2021-08-27  1:15     ` Wei Chen
  2021-08-27  6:18     ` Jan Beulich
  0 siblings, 2 replies; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-26 23:10 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> When CONFIG_NUMA is enabled for Arm, Xen will switch to use common
> NUMA API instead of previous fake NUMA API. Before we parse NUMA
> information from device tree or ACPI SRAT table, we need to init
> the NUMA related variables, like cpu_to_node, as single node NUMA
> system.
> 
> So in this patch, we introduce a numa_init function to
> initialize these data structures so that all resources belong to node#0.
> This will make the new API return the same values as the fake API
> has done.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/arm/numa.c        | 53 ++++++++++++++++++++++++++++++++++++++
>  xen/arch/arm/setup.c       |  8 ++++++
>  xen/include/asm-arm/numa.h | 11 ++++++++
>  3 files changed, 72 insertions(+)
> 
> diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> index 1e30c5bb13..566ad1e52b 100644
> --- a/xen/arch/arm/numa.c
> +++ b/xen/arch/arm/numa.c
> @@ -20,6 +20,8 @@
>  #include <xen/init.h>
>  #include <xen/nodemask.h>
>  #include <xen/numa.h>
> +#include <xen/pfn.h>
> +#include <asm/setup.h>
>  
>  void numa_set_node(int cpu, nodeid_t nid)
>  {
> @@ -29,3 +31,54 @@ void numa_set_node(int cpu, nodeid_t nid)
>  
>      cpu_to_node[cpu] = nid;
>  }
> +
> +void __init numa_init(bool acpi_off)
> +{
> +    uint32_t idx;
> +    paddr_t ram_start = ~0;
> +    paddr_t ram_size = 0;
> +    paddr_t ram_end = 0;
> +
> +    printk(XENLOG_WARNING
> +        "NUMA has not been supported yet, NUMA off!\n");

NIT: please align


> +    /* Arm NUMA has not been implemented until this patch */

"Arm NUMA is not implemented yet"


> +    numa_off = true;
> +
> +    /*
> +     * Set all cpu_to_node mapping to 0, this will make cpu_to_node
> +     * function return 0 as previous fake cpu_to_node API.
> +     */
> +    for ( idx = 0; idx < NR_CPUS; idx++ )
> +        cpu_to_node[idx] = 0;
> +
> +    /*
> +     * Make node_to_cpumask, node_spanned_pages and node_start_pfn
> +     * return as previous fake APIs.
> +     */
> +    for ( idx = 0; idx < MAX_NUMNODES; idx++ ) {
> +        node_to_cpumask[idx] = cpu_online_map;
> +        node_spanned_pages(idx) = (max_page - mfn_x(first_valid_mfn));
> +        node_start_pfn(idx) = (mfn_x(first_valid_mfn));
> +    }

I just want to note that this works because MAX_NUMNODES is 1. If
MAX_NUMNODES was > 1 then it would be wrong to set node_to_cpumask,
node_spanned_pages and node_start_pfn for all nodes to the same values.

It might be worth writing something about it in the in-code comment.


> +    /*
> +     * Find the minimal and maximum address of RAM, NUMA will
> +     * build a memory to node mapping table for the whole range.
> +     */
> +    ram_start = bootinfo.mem.bank[0].start;
> +    ram_size  = bootinfo.mem.bank[0].size;
> +    ram_end   = ram_start + ram_size;
> +    for ( idx = 1 ; idx < bootinfo.mem.nr_banks; idx++ )
> +    {
> +        paddr_t bank_start = bootinfo.mem.bank[idx].start;
> +        paddr_t bank_size = bootinfo.mem.bank[idx].size;
> +        paddr_t bank_end = bank_start + bank_size;
> +
> +        ram_size  = ram_size + bank_size;

ram_size is updated but not utilized
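
A minimal sketch of the same scan with the unused accumulator dropped, so that
only the lowest start and highest end are tracked. It is a standalone
illustration: struct membank here is a simplified stand-in for the bootinfo
bank type, and ram_span() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Simplified stand-in for one bootinfo.mem bank. */
struct membank {
    paddr_t start;
    paddr_t size;
};

/* Compute the lowest start and the highest end over all banks. */
static void ram_span(const struct membank *bank, unsigned int nr_banks,
                     paddr_t *ram_start, paddr_t *ram_end)
{
    unsigned int idx;

    assert(nr_banks > 0);

    *ram_start = bank[0].start;
    *ram_end   = bank[0].start + bank[0].size;

    for ( idx = 1; idx < nr_banks; idx++ )
    {
        paddr_t bank_end = bank[idx].start + bank[idx].size;

        if ( bank[idx].start < *ram_start )
            *ram_start = bank[idx].start;
        if ( bank_end > *ram_end )
            *ram_end = bank_end;
    }
}
```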


> +        ram_start = min(ram_start, bank_start);
> +        ram_end   = max(ram_end, bank_end);
> +    }
> +
> +    numa_initmem_init(PFN_UP(ram_start), PFN_DOWN(ram_end));
> +    return;
> +}
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 63a908e325..3c58d2d441 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -30,6 +30,7 @@
>  #include <xen/init.h>
>  #include <xen/irq.h>
>  #include <xen/mm.h>
> +#include <xen/numa.h>
>  #include <xen/param.h>
>  #include <xen/softirq.h>
>  #include <xen/keyhandler.h>
> @@ -874,6 +875,13 @@ void __init start_xen(unsigned long boot_phys_offset,
>      /* Parse the ACPI tables for possible boot-time configuration */
>      acpi_boot_table_init();
>  
> +    /*
> +     * Try to initialize NUMA system, if failed, the system will
> +     * fallback to uniform system which means system has only 1
> +     * NUMA node.
> +     */
> +    numa_init(acpi_disabled);
> +
>      end_boot_allocator();
>  
>      /*
> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> index b2982f9053..bb495a24e1 100644
> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -13,6 +13,16 @@ typedef u8 nodeid_t;
>   */
>  #define NODES_SHIFT      6
>  
> +extern void numa_init(bool acpi_off);
> +
> +/*
> + * Temporary for fake NUMA node, when CPU, memory and distance
> + * matrix will be read from DTB or ACPI SRAT. The following
> + * symbols will be removed.
> + */
> +extern mfn_t first_valid_mfn;
> +#define __node_distance(a, b) (20)
> +
>  #else
>  
>  /* Fake one node for now. See also node_online_map. */
> @@ -35,6 +45,7 @@ extern mfn_t first_valid_mfn;
>  #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
>  #define __node_distance(a, b) (20)
>  
> +#define numa_init(x) do { } while (0)
>  #define numa_set_node(x, y) do { } while (0)
>  
>  #endif
> -- 
> 2.25.1
> 



* Re: [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
  2021-08-11 10:24 ` [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI Wei Chen
  2021-08-19 17:35   ` Julien Grall
@ 2021-08-26 23:24   ` Stefano Stabellini
  2021-08-27  7:41     ` Julien Grall
  2021-08-27  9:23     ` Wei Chen
  1 sibling, 2 replies; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-26 23:24 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> EFI can get memory map from EFI system table. But EFI system
> table doesn't contain memory NUMA information, EFI depends on
> ACPI SRAT or device tree memory node to parse memory blocks'
> NUMA mapping.
> 
> But in current code, when Xen is booting from EFI, it will
> delete all memory nodes in device tree. So in UEFI + DTB
> boot, we don't have numa-node-id for memory blocks any more.
> 
> So in this patch, we will keep memory nodes in device tree for
> NUMA code to parse memory numa-node-id later.
> 
> As a side effect, if we still parse boot memory information in
> early_scan_node, bootinfo.mem will calculate memory ranges in
> memory nodes twice. So we have to prevent early_scan_node from
> parsing memory nodes in EFI boot.
> 
> As EFI APIs can only be used on Arm64, we introduce a wrapper
> in a header file to avoid #ifdef CONFIG_ARM_64/32 in the code block.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/arm/bootfdt.c      |  8 +++++++-
>  xen/arch/arm/efi/efi-boot.h | 25 -------------------------
>  xen/include/asm-arm/setup.h |  6 ++++++
>  3 files changed, 13 insertions(+), 26 deletions(-)
> 
> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> index 476e32e0f5..7df149dbca 100644
> --- a/xen/arch/arm/bootfdt.c
> +++ b/xen/arch/arm/bootfdt.c
> @@ -11,6 +11,7 @@
>  #include <xen/lib.h>
>  #include <xen/kernel.h>
>  #include <xen/init.h>
> +#include <xen/efi.h>
>  #include <xen/device_tree.h>
>  #include <xen/libfdt/libfdt.h>
>  #include <xen/sort.h>
> @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
>  {
>      int rc = 0;
>  
> -    if ( device_tree_node_matches(fdt, node, "memory") )
> +    /*
> +     * If system boot from EFI, bootinfo.mem has been set by EFI,
> +     * so we don't need to parse memory node from DTB.
> +     */
> +    if ( device_tree_node_matches(fdt, node, "memory") &&
> +         !arch_efi_enabled(EFI_BOOT) )
>          rc = process_memory_node(fdt, node, name, depth,
>                                   address_cells, size_cells, &bootinfo.mem);
>      else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )


If we are going to use the device tree info for the numa nodes (and
related memory) does it make sense to still rely on the EFI tables for
the memory map?

I wonder if we should just use device tree for memory and ignore EFI
instead. Do you know what Linux does in this regard?



* Re: [XEN RFC PATCH 20/40] xen/arm: implement node distance helpers for Arm64
  2021-08-11 10:24 ` [XEN RFC PATCH 20/40] xen/arm: implement node distance helpers for Arm64 Wei Chen
@ 2021-08-26 23:52   ` Stefano Stabellini
  2021-08-27  9:30     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-26 23:52 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> In current Xen code, __node_distance is a fake API, it always
> returns NUMA_REMOTE_DISTANCE(20). Now we use a matrix to record
> the distance between any two nodes. Accordingly, we provide a
> set_node_distance API to set the distance for any two nodes in
> this patch.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/arm/numa.c        | 44 ++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/numa.h | 12 ++++++++++-
>  xen/include/asm-x86/numa.h |  1 -
>  xen/include/xen/numa.h     |  2 +-
>  4 files changed, 56 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> index 566ad1e52b..f61a8df645 100644
> --- a/xen/arch/arm/numa.c
> +++ b/xen/arch/arm/numa.c
> @@ -23,6 +23,11 @@
>  #include <xen/pfn.h>
>  #include <asm/setup.h>
>  
> +static uint8_t __read_mostly
> +node_distance_map[MAX_NUMNODES][MAX_NUMNODES] = {
> +    { NUMA_REMOTE_DISTANCE }
> +};
> +
>  void numa_set_node(int cpu, nodeid_t nid)
>  {
>      if ( nid >= MAX_NUMNODES ||
> @@ -32,6 +37,45 @@ void numa_set_node(int cpu, nodeid_t nid)
>      cpu_to_node[cpu] = nid;
>  }
>  
> +void __init numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance)
> +{
> +    if ( from >= MAX_NUMNODES || to >= MAX_NUMNODES )
> +    {
> +        printk(KERN_WARNING
> +            "NUMA nodes are out of matrix, from=%u to=%u distance=%u\n",
> +            from, to, distance);

NIT: please align. Example:

printk(KERN_WARNING
       "NUMA nodes are out of matrix, from=%u to=%u distance=%u\n",

Also please use PRIu32 for uint32_t. Probably should use PRIu8 for
nodeids.
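
A minimal sketch of the suggested format specifiers, using the standard C
<inttypes.h> macros and snprintf() in place of printk() so it can be built
outside Xen. format_distance_warning() is a hypothetical helper; nodeid_t is
assumed to be an 8-bit unsigned type, as in the header touched by this series.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef uint8_t nodeid_t;

/* Format the out-of-range warning with fixed-width format macros. */
static int format_distance_warning(char *buf, size_t len, nodeid_t from,
                                   nodeid_t to, uint32_t distance)
{
    assert(buf != NULL && len > 0);

    return snprintf(buf, len,
                    "NUMA nodes are out of matrix, from=%" PRIu8
                    " to=%" PRIu8 " distance=%" PRIu32 "\n",
                    from, to, distance);
}
```

The macros expand to the right conversion for each fixed-width type, so the
format string stays correct if the underlying integer types differ between
architectures.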


> +        return;
> +    }
> +
> +    /* NUMA defines 0xff as an unreachable node and 0-9 are undefined */
> +    if ( distance >= NUMA_NO_DISTANCE ||
> +        (distance >= NUMA_DISTANCE_UDF_MIN &&
> +         distance <= NUMA_DISTANCE_UDF_MAX) ||
> +        (from == to && distance != NUMA_LOCAL_DISTANCE) )
> +    {
> +        printk(KERN_WARNING
> +            "Invalid NUMA node distance, from:%d to:%d distance=%d\n",
> +            from, to, distance);

NIT: please align

Also you used %u before for nodeids, which is better because from and to
are unsigned. Distance should be uint32_t.


> +        return;
> +    }
> +
> +    node_distance_map[from][to] = distance;

Shouldn't we also be setting:

    node_distance_map[to][from] = distance;

?
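
A minimal sketch of a setter that records the distance in both directions,
assuming distances are meant to be symmetric. It is a standalone toy model:
set_distance_symmetric() and distance_map_init() are hypothetical names, the
matrix is sized for four nodes only, and the validity checks mirror the ones
in the hunk above.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NUMNODES         4
#define NUMA_LOCAL_DISTANCE  10
#define NUMA_REMOTE_DISTANCE 20
#define NUMA_NO_DISTANCE     0xFF

typedef uint8_t nodeid_t;

static uint8_t node_distance_map[MAX_NUMNODES][MAX_NUMNODES];

/* Start from "local on the diagonal, remote everywhere else". */
static void distance_map_init(void)
{
    unsigned int i, j;

    for ( i = 0; i < MAX_NUMNODES; i++ )
        for ( j = 0; j < MAX_NUMNODES; j++ )
            node_distance_map[i][j] = (i == j) ? NUMA_LOCAL_DISTANCE
                                               : NUMA_REMOTE_DISTANCE;
}

/* Record a distance in both directions; 0 on success, -1 if rejected. */
static int set_distance_symmetric(nodeid_t from, nodeid_t to,
                                  uint32_t distance)
{
    if ( from >= MAX_NUMNODES || to >= MAX_NUMNODES )
        return -1;

    /* 0-9 are reserved, 0xFF means unreachable, self-distance must be 10. */
    if ( distance >= NUMA_NO_DISTANCE || distance < NUMA_LOCAL_DISTANCE ||
         (from == to && distance != NUMA_LOCAL_DISTANCE) )
        return -1;

    /* Validated above, so narrowing to uint8_t loses nothing. */
    assert(distance <= UINT8_MAX);

    node_distance_map[from][to] = distance;
    node_distance_map[to][from] = distance;

    return 0;
}
```

If a platform can describe asymmetric distances, the second store would have
to become conditional, for example only when the reverse entry has not been
given explicitly.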


> +}
> +
> +uint8_t __node_distance(nodeid_t from, nodeid_t to)
> +{
> +    /*
> +     * Check whether the nodes are in the matrix range.
> +     * When any node is out of range, except from and to nodes are the
> +     * same, we treat them as unreachable (return 0xFF)
> +     */
> +    if ( from >= MAX_NUMNODES || to >= MAX_NUMNODES )
> +        return from == to ? NUMA_LOCAL_DISTANCE : NUMA_NO_DISTANCE;
> +
> +    return node_distance_map[from][to];
> +}
> +EXPORT_SYMBOL(__node_distance);
> +
>  void __init numa_init(bool acpi_off)
>  {
>      uint32_t idx;
> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> index bb495a24e1..559b028a01 100644
> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -12,8 +12,19 @@ typedef u8 nodeid_t;
>   * set the number of NUMA memory block number to 128.
>   */
>  #define NODES_SHIFT      6
> +/*
> + * In ACPI spec, 0-9 are the reserved values for node distance,
> + * 10 indicates local node distance, 20 indicates remote node
> + * distance. Set node distance map in device tree will follow
> + * the ACPI's definition.
> + */
> +#define NUMA_DISTANCE_UDF_MIN   0
> +#define NUMA_DISTANCE_UDF_MAX   9
> +#define NUMA_LOCAL_DISTANCE     10
> +#define NUMA_REMOTE_DISTANCE    20
>  
>  extern void numa_init(bool acpi_off);
> +extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance);
>  
>  /*
>   * Temporary for fake NUMA node, when CPU, memory and distance
> @@ -21,7 +32,6 @@ extern void numa_init(bool acpi_off);
>   * symbols will be removed.
>   */
>  extern mfn_t first_valid_mfn;
> -#define __node_distance(a, b) (20)
>  
>  #else
>  
> diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> index 5a57a51e26..e0253c20b7 100644
> --- a/xen/include/asm-x86/numa.h
> +++ b/xen/include/asm-x86/numa.h
> @@ -21,7 +21,6 @@ extern nodeid_t apicid_to_node[];
>  extern void init_cpu_to_node(void);
>  
>  void srat_parse_regions(u64 addr);
> -extern u8 __node_distance(nodeid_t a, nodeid_t b);
>  unsigned int arch_get_dma_bitsize(void);
>  
>  #endif
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index cb08d2eca9..0475823b13 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -58,7 +58,7 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
>  #define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
>  #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
>  				 NODE_DATA(nid)->node_spanned_pages)
> -
> +extern u8 __node_distance(nodeid_t a, nodeid_t b);
>  extern void numa_add_cpu(int cpu);
>  
>  struct node {
> -- 
> 2.25.1
> 



* Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-11 10:24 ` [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node Wei Chen
                     ` (2 preceding siblings ...)
  2021-08-19 18:13   ` Julien Grall
@ 2021-08-27  0:06   ` Stefano Stabellini
  2021-08-27  9:31     ` Wei Chen
  3 siblings, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-27  0:06 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> Processor NUMA ID information is stored in device tree's processor
> node as "numa-node-id". We need a new helper to parse this ID from
> processor node. If we get this ID from processor node, this ID's
> validity still needs to be checked. Once we get an invalid NUMA ID
> from any processor node, the device tree NUMA information will be
> marked as invalid.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
>  1 file changed, 39 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> index 1c74ad135d..37cc56acf3 100644
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -20,16 +20,53 @@
>  #include <xen/init.h>
>  #include <xen/nodemask.h>
>  #include <xen/numa.h>
> +#include <xen/device_tree.h>
> +#include <asm/setup.h>
>  
>  s8 device_tree_numa = 0;
> +static nodemask_t processor_nodes_parsed __initdata;
>  
> -int srat_disabled(void)
> +static int srat_disabled(void)
>  {
>      return numa_off || device_tree_numa < 0;
>  }
>  
> -void __init bad_srat(void)
> +static __init void bad_srat(void)
>  {
>      printk(KERN_ERR "DT: NUMA information is not used.\n");
>      device_tree_numa = -1;
>  }
> +
> +/* Callback for device tree processor affinity */
> +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> +{
> +    if ( srat_disabled() )
> +        return -EINVAL;
> +    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
> +		bad_srat();
> +		return -EINVAL;
> +	}
> +
> +    node_set(node, processor_nodes_parsed);
> +
> +    device_tree_numa = 1;
> +    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
> +
> +    return 0;
> +}
> +
> +/* Parse CPU NUMA node info */
> +int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
> +{
> +    uint32_t nid;
> +
> +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
> +    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);

Given that this is not actually a warning (is it?) then I would move it
to XENLOG_INFO


> +    if ( nid >= MAX_NUMNODES )
> +    {
> +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n", nid);

This could be XENLOG_ERR


> +        return -EINVAL;
> +    }
> +
> +    return dtb_numa_processor_affinity_init(nid);
> +}




* RE: [XEN RFC PATCH 16/40] xen/arm: Create a fake NUMA node to use common code
  2021-08-26 23:10   ` Stefano Stabellini
@ 2021-08-27  1:15     ` Wei Chen
  2021-08-27  6:18     ` Jan Beulich
  1 sibling, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-27  1:15 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, julien, Bertrand Marquis

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 27 August 2021 7:10
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 16/40] xen/arm: Create a fake NUMA node to use
> common code
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > When CONFIG_NUMA is enabled for Arm, Xen will switch to use common
> > NUMA API instead of previous fake NUMA API. Before we parse NUMA
> > information from device tree or ACPI SRAT table, we need to init
> > the NUMA related variables, like cpu_to_node, as single node NUMA
> > system.
> >
> > So in this patch, we introduce a numa_init function to
> > initialize these data structures so that all resources belong to node#0.
> > This will make the new API return the same values as the fake API
> > has done.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >  xen/arch/arm/numa.c        | 53 ++++++++++++++++++++++++++++++++++++++
> >  xen/arch/arm/setup.c       |  8 ++++++
> >  xen/include/asm-arm/numa.h | 11 ++++++++
> >  3 files changed, 72 insertions(+)
> >
> > diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> > index 1e30c5bb13..566ad1e52b 100644
> > --- a/xen/arch/arm/numa.c
> > +++ b/xen/arch/arm/numa.c
> > @@ -20,6 +20,8 @@
> >  #include <xen/init.h>
> >  #include <xen/nodemask.h>
> >  #include <xen/numa.h>
> > +#include <xen/pfn.h>
> > +#include <asm/setup.h>
> >
> >  void numa_set_node(int cpu, nodeid_t nid)
> >  {
> > @@ -29,3 +31,54 @@ void numa_set_node(int cpu, nodeid_t nid)
> >
> >      cpu_to_node[cpu] = nid;
> >  }
> > +
> > +void __init numa_init(bool acpi_off)
> > +{
> > +    uint32_t idx;
> > +    paddr_t ram_start = ~0;
> > +    paddr_t ram_size = 0;
> > +    paddr_t ram_end = 0;
> > +
> > +    printk(XENLOG_WARNING
> > +        "NUMA has not been supported yet, NUMA off!\n");
> 
> NIT: please align
> 


OK

> 
> > +    /* Arm NUMA has not been implemented until this patch */
> 
> "Arm NUMA is not implemented yet"
> 

OK

> 
> > +    numa_off = true;
> > +
> > +    /*
> > +     * Set all cpu_to_node mapping to 0, this will make cpu_to_node
> > +     * function return 0 as previous fake cpu_to_node API.
> > +     */
> > +    for ( idx = 0; idx < NR_CPUS; idx++ )
> > +        cpu_to_node[idx] = 0;
> > +
> > +    /*
> > +     * Make node_to_cpumask, node_spanned_pages and node_start_pfn
> > +     * return as previous fake APIs.
> > +     */
> > +    for ( idx = 0; idx < MAX_NUMNODES; idx++ ) {
> > +        node_to_cpumask[idx] = cpu_online_map;
> > +        node_spanned_pages(idx) = (max_page - mfn_x(first_valid_mfn));
> > +        node_start_pfn(idx) = (mfn_x(first_valid_mfn));
> > +    }
> 
> I just want to note that this works because MAX_NUMNODES is 1. If
> MAX_NUMNODES was > 1 then it would be wrong to set node_to_cpumask,
> node_spanned_pages and node_start_pfn for all nodes to the same values.
> 
> It might be worth writing something about it in the in-code comment.
> 

OK, I will do it.

> 
> > +    /*
> > +     * Find the minimal and maximum address of RAM, NUMA will
> > +     * build a memory to node mapping table for the whole range.
> > +     */
> > +    ram_start = bootinfo.mem.bank[0].start;
> > +    ram_size  = bootinfo.mem.bank[0].size;
> > +    ram_end   = ram_start + ram_size;
> > +    for ( idx = 1 ; idx < bootinfo.mem.nr_banks; idx++ )
> > +    {
> > +        paddr_t bank_start = bootinfo.mem.bank[idx].start;
> > +        paddr_t bank_size = bootinfo.mem.bank[idx].size;
> > +        paddr_t bank_end = bank_start + bank_size;
> > +
> > +        ram_size  = ram_size + bank_size;
> 
> ram_size is updated but not utilized
> 

Ok, I will remove it.

> 
> > +        ram_start = min(ram_start, bank_start);
> > +        ram_end   = max(ram_end, bank_end);
> > +    }
> > +
> > +    numa_initmem_init(PFN_UP(ram_start), PFN_DOWN(ram_end));
> > +    return;
> > +}
> > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> > index 63a908e325..3c58d2d441 100644
> > --- a/xen/arch/arm/setup.c
> > +++ b/xen/arch/arm/setup.c
> > @@ -30,6 +30,7 @@
> >  #include <xen/init.h>
> >  #include <xen/irq.h>
> >  #include <xen/mm.h>
> > +#include <xen/numa.h>
> >  #include <xen/param.h>
> >  #include <xen/softirq.h>
> >  #include <xen/keyhandler.h>
> > @@ -874,6 +875,13 @@ void __init start_xen(unsigned long
> boot_phys_offset,
> >      /* Parse the ACPI tables for possible boot-time configuration */
> >      acpi_boot_table_init();
> >
> > +    /*
> > +     * Try to initialize NUMA system, if failed, the system will
> > +     * fallback to uniform system which means system has only 1
> > +     * NUMA node.
> > +     */
> > +    numa_init(acpi_disabled);
> > +
> >      end_boot_allocator();
> >
> >      /*
> > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > index b2982f9053..bb495a24e1 100644
> > --- a/xen/include/asm-arm/numa.h
> > +++ b/xen/include/asm-arm/numa.h
> > @@ -13,6 +13,16 @@ typedef u8 nodeid_t;
> >   */
> >  #define NODES_SHIFT      6
> >
> > +extern void numa_init(bool acpi_off);
> > +
> > +/*
> > + * Temporary for fake NUMA node, when CPU, memory and distance
> > + * matrix will be read from DTB or ACPI SRAT. The following
> > + * symbols will be removed.
> > + */
> > +extern mfn_t first_valid_mfn;
> > +#define __node_distance(a, b) (20)
> > +
> >  #else
> >
> >  /* Fake one node for now. See also node_online_map. */
> > @@ -35,6 +45,7 @@ extern mfn_t first_valid_mfn;
> >  #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
> >  #define __node_distance(a, b) (20)
> >
> > +#define numa_init(x) do { } while (0)
> >  #define numa_set_node(x, y) do { } while (0)
> >
> >  #endif
> > --
> > 2.25.1
> >


* Re: [XEN RFC PATCH 16/40] xen/arm: Create a fake NUMA node to use common code
  2021-08-26 23:10   ` Stefano Stabellini
  2021-08-27  1:15     ` Wei Chen
@ 2021-08-27  6:18     ` Jan Beulich
  2021-08-27  9:32       ` Wei Chen
  1 sibling, 1 reply; 196+ messages in thread
From: Jan Beulich @ 2021-08-27  6:18 UTC (permalink / raw)
  To: Stefano Stabellini, Wei Chen; +Cc: xen-devel, julien, Bertrand.Marquis

On 27.08.2021 01:10, Stefano Stabellini wrote:
> On Wed, 11 Aug 2021, Wei Chen wrote:
>> @@ -29,3 +31,54 @@ void numa_set_node(int cpu, nodeid_t nid)
>>  
>>      cpu_to_node[cpu] = nid;
>>  }
>> +
>> +void __init numa_init(bool acpi_off)
>> +{
>> +    uint32_t idx;
>> +    paddr_t ram_start = ~0;
>> +    paddr_t ram_size = 0;
>> +    paddr_t ram_end = 0;
>> +
>> +    printk(XENLOG_WARNING
>> +        "NUMA has not been supported yet, NUMA off!\n");
> 
> NIT: please align
> 
> 
>> +    /* Arm NUMA has not been implemented until this patch */
> 
> "Arm NUMA is not implemented yet"
> 
> 
>> +    numa_off = true;
>> +
>> +    /*
>> +     * Set all cpu_to_node mapping to 0, this will make cpu_to_node
>> +     * function return 0 as previous fake cpu_to_node API.
>> +     */
>> +    for ( idx = 0; idx < NR_CPUS; idx++ )
>> +        cpu_to_node[idx] = 0;
>> +
>> +    /*
>> +     * Make node_to_cpumask, node_spanned_pages and node_start_pfn
>> +     * return as previous fake APIs.
>> +     */
>> +    for ( idx = 0; idx < MAX_NUMNODES; idx++ ) {
>> +        node_to_cpumask[idx] = cpu_online_map;
>> +        node_spanned_pages(idx) = (max_page - mfn_x(first_valid_mfn));
>> +        node_start_pfn(idx) = (mfn_x(first_valid_mfn));
>> +    }
> 
> I just want to note that this works because MAX_NUMNODES is 1. If
> MAX_NUMNODES was > 1 then it would be wrong to set node_to_cpumask,
> node_spanned_pages and node_start_pfn for all nodes to the same values.
> 
> It might be worth writing something about it in the in-code comment.

Plus perhaps BUILD_BUG_ON(MAX_NUMNODES != 1), so the issue is actually
noticed at build time once the constant gets changed?
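
A minimal sketch of such a guard, written so it builds outside Xen. The
BUILD_BUG_ON() below is a stand-in based on C11 _Static_assert, not Xen's own
definition, and fake_numa_init() with its two arrays is a hypothetical,
simplified version of the loop under review.

```c
#include <assert.h>

#define MAX_NUMNODES 1

/* Stand-in for BUILD_BUG_ON(): fail the build when cond is true. */
#define BUILD_BUG_ON(cond) _Static_assert(!(cond), #cond)

static unsigned long node_start_pfn[MAX_NUMNODES];
static unsigned long node_spanned_pages[MAX_NUMNODES];

static void fake_numa_init(unsigned long first_pfn, unsigned long max_pfn)
{
    unsigned int idx;

    /* The loop gives every node the full range: only valid for one node. */
    BUILD_BUG_ON(MAX_NUMNODES != 1);
    assert(first_pfn <= max_pfn);

    for ( idx = 0; idx < MAX_NUMNODES; idx++ )
    {
        node_start_pfn[idx] = first_pfn;
        node_spanned_pages[idx] = max_pfn - first_pfn;
    }
}
```

Raising MAX_NUMNODES in this sketch makes compilation fail at the guard, which
forces whoever changes the constant to revisit the loop.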

Jan




* Re: [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
  2021-08-26 23:24   ` Stefano Stabellini
@ 2021-08-27  7:41     ` Julien Grall
  2021-08-27 23:10       ` Stefano Stabellini
  2021-08-27  9:23     ` Wei Chen
  1 sibling, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-27  7:41 UTC (permalink / raw)
  To: Stefano Stabellini, Wei Chen; +Cc: xen-devel, jbeulich, Bertrand.Marquis

Hi Stefano,

On 27/08/2021 00:24, Stefano Stabellini wrote:
> On Wed, 11 Aug 2021, Wei Chen wrote:
>> EFI can get memory map from EFI system table. But EFI system
>> table doesn't contain memory NUMA information, EFI depends on
>> ACPI SRAT or device tree memory node to parse memory blocks'
>> NUMA mapping.
>>
>> But in current code, when Xen is booting from EFI, it will
>> delete all memory nodes in device tree. So in UEFI + DTB
>> boot, we don't have numa-node-id for memory blocks any more.
>>
>> So in this patch, we will keep memory nodes in device tree for
>> NUMA code to parse memory numa-node-id later.
>>
>> As a side effect, if we still parse boot memory information in
>> early_scan_node, bootinfo.mem will calculate memory ranges in
>> memory nodes twice. So we have to prevent early_scan_node from
>> parsing memory nodes in EFI boot.
>>
>> As EFI APIs can only be used on Arm64, we introduce a wrapper
>> in a header file to avoid #ifdef CONFIG_ARM_64/32 in the code block.
>>
>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>> ---
>>   xen/arch/arm/bootfdt.c      |  8 +++++++-
>>   xen/arch/arm/efi/efi-boot.h | 25 -------------------------
>>   xen/include/asm-arm/setup.h |  6 ++++++
>>   3 files changed, 13 insertions(+), 26 deletions(-)
>>
>> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
>> index 476e32e0f5..7df149dbca 100644
>> --- a/xen/arch/arm/bootfdt.c
>> +++ b/xen/arch/arm/bootfdt.c
>> @@ -11,6 +11,7 @@
>>   #include <xen/lib.h>
>>   #include <xen/kernel.h>
>>   #include <xen/init.h>
>> +#include <xen/efi.h>
>>   #include <xen/device_tree.h>
>>   #include <xen/libfdt/libfdt.h>
>>   #include <xen/sort.h>
>> @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
>>   {
>>       int rc = 0;
>>   
>> -    if ( device_tree_node_matches(fdt, node, "memory") )
>> +    /*
>> +     * If system boot from EFI, bootinfo.mem has been set by EFI,
>> +     * so we don't need to parse memory node from DTB.
>> +     */
>> +    if ( device_tree_node_matches(fdt, node, "memory") &&
>> +         !arch_efi_enabled(EFI_BOOT) )
>>           rc = process_memory_node(fdt, node, name, depth,
>>                                    address_cells, size_cells, &bootinfo.mem);
>>       else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
> 
> 
> If we are going to use the device tree info for the numa nodes (and
> related memory) does it make sense to still rely on the EFI tables for
> the memory map?

Yes. AFAIK, when booting using EFI, the Device-Tree may not contain all
the reserved regions. Furthermore, we are still too early to know
whether we boot using ACPI or DT.

> 
> I wonder if we should just use device tree for memory and ignore EFI
> instead. Do you know what Linux does in this regard?

I looked at Linux when I first reviewed this patch because I was
wondering what happens if the DT and UEFI map disagree.

Linux and Xen are the same after this patch:
   1) The memory map is coming from UEFI map
   2) NUMA ID is coming from the DT
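
A minimal sketch of how the two sources listed above can be combined: the
region comes from the UEFI memory map, and the node ID is looked up in the DT
memory nodes. This is a toy model, not Xen or Linux code. The struct layouts,
region_to_nid() and the "fully covered" matching rule are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

#define NUMA_NO_NODE 0xFF

/* A usable RAM region as reported by the UEFI memory map. */
struct efi_region {
    paddr_t start, size;
};

/* A DT memory node range together with its "numa-node-id" value. */
struct dt_mem_node {
    paddr_t start, size;
    uint8_t nid;
};

/* Return the node ID of the DT memory range that fully covers the region. */
static uint8_t region_to_nid(const struct efi_region *r,
                             const struct dt_mem_node *dt,
                             unsigned int nr_dt)
{
    unsigned int i;

    assert(r != NULL && dt != NULL);

    for ( i = 0; i < nr_dt; i++ )
        if ( r->start >= dt[i].start &&
             r->start + r->size <= dt[i].start + dt[i].size )
            return dt[i].nid;

    /* The two maps disagree: no DT memory node covers this region. */
    return NUMA_NO_NODE;
}
```

A region that no DT memory node covers returns NUMA_NO_NODE here, which is one
possible way to surface a disagreement between the two maps.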

The commit that introduced the change in Linux is:

commit 500899c2cc3e3f06140373b587a69d30650f2d9d
Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Apr 8 15:50:23 2016 -0700

     efi: ARM/arm64: ignore DT memory nodes instead of removing them

     There are two problems with the UEFI stub DT memory node removal
     routine:
     - it deletes nodes as it traverses the tree, which happens to work
       but is not supported, as deletion invalidates the node iterator;
     - deleting memory nodes entirely may discard annotations in the form
       of additional properties on the nodes.

     Since the discovery of DT memory nodes occurs strictly before the
     UEFI init sequence, we can simply clear the memblock memory table
     before parsing the UEFI memory map. This way, it is no longer
     necessary to remove the nodes, so we can remove that logic from the
     stub as well.

     Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
     Acked-by: Steve Capper <steve.capper@arm.com>
     Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
     Signed-off-by: David Daney <david.daney@cavium.com>
     Signed-off-by: Will Deacon <will.deacon@arm.com>

Cheers,

-- 
Julien Grall



* RE: [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
  2021-08-26 23:24   ` Stefano Stabellini
  2021-08-27  7:41     ` Julien Grall
@ 2021-08-27  9:23     ` Wei Chen
  1 sibling, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-27  9:23 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, julien, Bertrand Marquis

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 27 August 2021 7:25
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for
> NUMA when boot from EFI
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > EFI can get memory map from EFI system table. But EFI system
> > table doesn't contain memory NUMA information, EFI depends on
> > ACPI SRAT or device tree memory node to parse memory blocks'
> > NUMA mapping.
> >
> > But in current code, when Xen is booting from EFI, it will
> > delete all memory nodes in device tree. So in UEFI + DTB
> > boot, we don't have numa-node-id for memory blocks any more.
> >
> > So in this patch, we will keep memory nodes in device tree for
> > NUMA code to parse memory numa-node-id later.
> >
> > As a side effect, if we still parse boot memory information in
> > early_scan_node, bootinfo.mem will calculate memory ranges in
> > memory nodes twice. So we have to prevent early_scan_node from
> > parsing memory nodes in EFI boot.
> >
> > As EFI APIs can only be used on Arm64, we introduce a wrapper
> > in a header file to avoid #ifdef CONFIG_ARM_64/32 in the code block.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >  xen/arch/arm/bootfdt.c      |  8 +++++++-
> >  xen/arch/arm/efi/efi-boot.h | 25 -------------------------
> >  xen/include/asm-arm/setup.h |  6 ++++++
> >  3 files changed, 13 insertions(+), 26 deletions(-)
> >
> > diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> > index 476e32e0f5..7df149dbca 100644
> > --- a/xen/arch/arm/bootfdt.c
> > +++ b/xen/arch/arm/bootfdt.c
> > @@ -11,6 +11,7 @@
> >  #include <xen/lib.h>
> >  #include <xen/kernel.h>
> >  #include <xen/init.h>
> > +#include <xen/efi.h>
> >  #include <xen/device_tree.h>
> >  #include <xen/libfdt/libfdt.h>
> >  #include <xen/sort.h>
> > @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
> >  {
> >      int rc = 0;
> >
> > -    if ( device_tree_node_matches(fdt, node, "memory") )
> > +    /*
> > +     * If system boot from EFI, bootinfo.mem has been set by EFI,
> > +     * so we don't need to parse memory node from DTB.
> > +     */
> > +    if ( device_tree_node_matches(fdt, node, "memory") &&
> > +         !arch_efi_enabled(EFI_BOOT) )
> >          rc = process_memory_node(fdt, node, name, depth,
> >                                   address_cells, size_cells,
> &bootinfo.mem);
> >      else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
> 
> 
> If we are going to use the device tree info for the numa nodes (and
> related memory) does it make sense to still rely on the EFI tables for
> the memory map?
> 
> I wonder if we should just use device tree for memory and ignore EFI
> instead. Do you know what Linux does in this regard?

We don't use the device tree for the memory map on EFI boot. We just rely on
the device tree to provide memory NUMA node info, because the EFI system table
doesn't contain this kind of data.

I had a quick look at Linux. The Linux EFI stub has an update_fdt function in
drivers/firmware/efi/libstub/fdt.c. In this function, the EFI stub only deletes
reserved memory nodes. Usable memory nodes are not touched in this function.

Before Linux's efi_init, early_init_dt_scan is invoked. This means that if the
EFI stub doesn't remove the normal memory nodes, these nodes will be found and
added to memblock by early_init_dt_add_memory_arch().

Later, in efi_init, the memory descriptors from the EFI system table are also
parsed, and the memory blocks are added to memblock by
early_init_dt_add_memory_arch(). So the duplicated memory ranges are
merged/ignored in memblock_add.

In Linux NUMA init, if ACPI is off, the NUMA info is parsed from the DT.

So I think that in EFI boot, we can use the system table to parse memory, but
use the DT to parse NUMA info. It doesn't seem particularly strange : )
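
To illustrate why adding the same memory twice (once from the DT, once from
the EFI memory map) can be harmless when the range list merges on insert, here
is a minimal sketch. It is not the memblock implementation; range_list,
range_add and range_total are hypothetical names used only for illustration,
and the fixed-size array is an assumption to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 16   /* hypothetical capacity, for illustration only */

struct range {
    uint64_t start;     /* inclusive */
    uint64_t end;       /* exclusive */
};

struct range_list {
    struct range r[MAX_RANGES];
    size_t nr;
};

/*
 * Add [start, end) to the list, absorbing every range that overlaps or
 * touches it. Adding a range that is already present leaves the list
 * unchanged. Returns 0 on success, -1 if the list is full.
 */
int range_add(struct range_list *l, uint64_t start, uint64_t end)
{
    size_t i = 0;

    if ( start >= end )
        return 0;

    while ( i < l->nr )
    {
        if ( l->r[i].start <= end && start <= l->r[i].end )
        {
            /* Grow the new range and drop the absorbed entry. */
            if ( l->r[i].start < start )
                start = l->r[i].start;
            if ( l->r[i].end > end )
                end = l->r[i].end;
            l->r[i] = l->r[--l->nr];
            i = 0;  /* the grown range may now touch earlier entries */
        }
        else
            i++;
    }

    if ( l->nr == MAX_RANGES )
        return -1;

    l->r[l->nr].start = start;
    l->r[l->nr].end = end;
    l->nr++;

    return 0;
}

/* Total number of bytes covered by the list. */
uint64_t range_total(const struct range_list *l)
{
    uint64_t sum = 0;
    size_t i;

    for ( i = 0; i < l->nr; i++ )
        sum += l->r[i].end - l->r[i].start;

    return sum;
}
```

With such a merging insert, registering a bank from the DT and then again from
the EFI memory map leaves a single entry. Xen's bootinfo.mem bank array does
not merge like this, which is why the patch skips the DT memory nodes in
early_scan_node on EFI boot.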

Cheers,
Wei Chen




^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 20/40] xen/arm: implement node distance helpers for Arm64
  2021-08-26 23:52   ` Stefano Stabellini
@ 2021-08-27  9:30     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-27  9:30 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, julien, Bertrand Marquis

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 27 August 2021 7:52
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 20/40] xen/arm: implement node distance
> helpers for Arm64
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > In current Xen code, __node_distance is a fake API, it always
> > returns NUMA_REMOTE_DISTANCE(20). Now we use a matrix to record
> > the distance between any two nodes. Accordingly, we provide a
> > set_node_distance API to set the distance for any two nodes in
> > this patch.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >  xen/arch/arm/numa.c        | 44 ++++++++++++++++++++++++++++++++++++++
> >  xen/include/asm-arm/numa.h | 12 ++++++++++-
> >  xen/include/asm-x86/numa.h |  1 -
> >  xen/include/xen/numa.h     |  2 +-
> >  4 files changed, 56 insertions(+), 3 deletions(-)
> >
> > diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> > index 566ad1e52b..f61a8df645 100644
> > --- a/xen/arch/arm/numa.c
> > +++ b/xen/arch/arm/numa.c
> > @@ -23,6 +23,11 @@
> >  #include <xen/pfn.h>
> >  #include <asm/setup.h>
> >
> > +static uint8_t __read_mostly
> > +node_distance_map[MAX_NUMNODES][MAX_NUMNODES] = {
> > +    { NUMA_REMOTE_DISTANCE }
> > +};
> > +
> >  void numa_set_node(int cpu, nodeid_t nid)
> >  {
> >      if ( nid >= MAX_NUMNODES ||
> > @@ -32,6 +37,45 @@ void numa_set_node(int cpu, nodeid_t nid)
> >      cpu_to_node[cpu] = nid;
> >  }
> >
> > +void __init numa_set_distance(nodeid_t from, nodeid_t to, uint32_t
> distance)
> > +{
> > +    if ( from >= MAX_NUMNODES || to >= MAX_NUMNODES )
> > +    {
> > +        printk(KERN_WARNING
> > +            "NUMA nodes are out of matrix, from=%u to=%u distance=%u\n",
> > +            from, to, distance);
> 
> NIT: please align. Example:
> 
> printk(KERN_WARNING
>        "NUMA nodes are out of matrix, from=%u to=%u distance=%u\n",
> 
> Also please use PRIu32 for uint32_t. Probably should use PRIu8 for
> nodeids.
> 

OK

> 
> > +        return;
> > +    }
> > +
> > +    /* NUMA defines 0xff as an unreachable node and 0-9 are undefined
> */
> > +    if ( distance >= NUMA_NO_DISTANCE ||
> > +        (distance >= NUMA_DISTANCE_UDF_MIN &&
> > +         distance <= NUMA_DISTANCE_UDF_MAX) ||
> > +        (from == to && distance != NUMA_LOCAL_DISTANCE) )
> > +    {
> > +        printk(KERN_WARNING
> > +            "Invalid NUMA node distance, from:%d to:%d distance=%d\n",
> > +            from, to, distance);
> 
> NIT: please align
> 
> Also you used %u before for nodeids, which is better because from and to
> are unsigned. Distance should be uint32_t.
> 

OK

> 
> > +        return;
> > +    }
> > +
> > +    node_distance_map[from][to] = distance;
> 
> Shouldn't we also be setting:
> 
>     node_distance_map[to][from] = distance;
> 
> ?
> 

No, we want numa_set_distance to set a single direction only. The reverse
entry, "node_distance_map[to][from] = distance", is handled in the caller.

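A minimal sketch of that split, assuming a caller that mirrors an entry only
when the reverse direction has not been provided yet. The names set_distance,
record_pair, distance_map and the constants are hypothetical stand-ins, not
the code from this series, and the "0 means unset" convention is an assumption
of the sketch (0 is a reserved distance value).

```c
#include <stdint.h>

#define MAX_NODES        4      /* hypothetical small matrix */
#define LOCAL_DISTANCE   10
#define NO_DISTANCE      0xFF

/* Zero-initialised; 0 is reserved, so it doubles as "not set yet". */
uint8_t distance_map[MAX_NODES][MAX_NODES];

/* Set exactly one direction. Returns 0 on success, -1 on invalid input. */
int set_distance(unsigned int from, unsigned int to, uint32_t distance)
{
    if ( from >= MAX_NODES || to >= MAX_NODES )
        return -1;

    /* 0-9 are reserved, 0xFF means unreachable, same node must be local. */
    if ( distance >= NO_DISTANCE || distance < LOCAL_DISTANCE ||
         (from == to && distance != LOCAL_DISTANCE) )
        return -1;

    distance_map[from][to] = (uint8_t)distance;

    return 0;
}

/* Caller policy: mirror the entry unless the opposite one already exists. */
int record_pair(unsigned int from, unsigned int to, uint32_t distance)
{
    if ( set_distance(from, to, distance) )
        return -1;

    if ( distance_map[to][from] == 0 )
        return set_distance(to, from, distance);

    return 0;
}
```

Keeping the setter single-direction lets a distance map that lists both
directions with different values be recorded as written, while a map that
lists only one direction still ends up symmetric.
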
> 
> > +}
> > +
> > +uint8_t __node_distance(nodeid_t from, nodeid_t to)
> > +{
> > +    /*
> > +     * Check whether the nodes are in the matrix range.
> > +     * When any node is out of range, except from and to nodes are the
> > +     * same, we treat them as unreachable (return 0xFF)
> > +     */
> > +    if ( from >= MAX_NUMNODES || to >= MAX_NUMNODES )
> > +        return from == to ? NUMA_LOCAL_DISTANCE : NUMA_NO_DISTANCE;
> > +
> > +    return node_distance_map[from][to];
> > +}
> > +EXPORT_SYMBOL(__node_distance);
> > +
> >  void __init numa_init(bool acpi_off)
> >  {
> >      uint32_t idx;
> > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > index bb495a24e1..559b028a01 100644
> > --- a/xen/include/asm-arm/numa.h
> > +++ b/xen/include/asm-arm/numa.h
> > @@ -12,8 +12,19 @@ typedef u8 nodeid_t;
> >   * set the number of NUMA memory block number to 128.
> >   */
> >  #define NODES_SHIFT      6
> > +/*
> > + * In ACPI spec, 0-9 are the reserved values for node distance,
> > + * 10 indicates local node distance, 20 indicates remote node
> > + * distance. Set node distance map in device tree will follow
> > + * the ACPI's definition.
> > + */
> > +#define NUMA_DISTANCE_UDF_MIN   0
> > +#define NUMA_DISTANCE_UDF_MAX   9
> > +#define NUMA_LOCAL_DISTANCE     10
> > +#define NUMA_REMOTE_DISTANCE    20
> >
> >  extern void numa_init(bool acpi_off);
> > +extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t
> distance);
> >
> >  /*
> >   * Temporary for fake NUMA node, when CPU, memory and distance
> > @@ -21,7 +32,6 @@ extern void numa_init(bool acpi_off);
> >   * symbols will be removed.
> >   */
> >  extern mfn_t first_valid_mfn;
> > -#define __node_distance(a, b) (20)
> >
> >  #else
> >
> > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> > index 5a57a51e26..e0253c20b7 100644
> > --- a/xen/include/asm-x86/numa.h
> > +++ b/xen/include/asm-x86/numa.h
> > @@ -21,7 +21,6 @@ extern nodeid_t apicid_to_node[];
> >  extern void init_cpu_to_node(void);
> >
> >  void srat_parse_regions(u64 addr);
> > -extern u8 __node_distance(nodeid_t a, nodeid_t b);
> >  unsigned int arch_get_dma_bitsize(void);
> >
> >  #endif
> > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> > index cb08d2eca9..0475823b13 100644
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -58,7 +58,7 @@ static inline __attribute__((pure)) nodeid_t
> phys_to_nid(paddr_t addr)
> >  #define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
> >  #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
> >  				 NODE_DATA(nid)->node_spanned_pages)
> > -
> > +extern u8 __node_distance(nodeid_t a, nodeid_t b);
> >  extern void numa_add_cpu(int cpu);
> >
> >  struct node {
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse device tree processor node
  2021-08-27  0:06   ` Stefano Stabellini
@ 2021-08-27  9:31     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-27  9:31 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, julien, Bertrand Marquis

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 27 August 2021 8:06
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 22/40] xen/arm: introduce a helper to parse
> device tree processor node
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > Processor NUMA ID information is stored in device tree's processor
> > node as "numa-node-id". We need a new helper to parse this ID from
> > processor node. If we get this ID from processor node, this ID's
> > validity still need to be checked. Once we got a invalid NUMA ID
> > from any processor node, the device tree will be marked as NUMA
> > information invalid.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >  xen/arch/arm/numa_device_tree.c | 41 +++++++++++++++++++++++++++++++--
> >  1 file changed, 39 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/arch/arm/numa_device_tree.c
> b/xen/arch/arm/numa_device_tree.c
> > index 1c74ad135d..37cc56acf3 100644
> > --- a/xen/arch/arm/numa_device_tree.c
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -20,16 +20,53 @@
> >  #include <xen/init.h>
> >  #include <xen/nodemask.h>
> >  #include <xen/numa.h>
> > +#include <xen/device_tree.h>
> > +#include <asm/setup.h>
> >
> >  s8 device_tree_numa = 0;
> > +static nodemask_t processor_nodes_parsed __initdata;
> >
> > -int srat_disabled(void)
> > +static int srat_disabled(void)
> >  {
> >      return numa_off || device_tree_numa < 0;
> >  }
> >
> > -void __init bad_srat(void)
> > +static __init void bad_srat(void)
> >  {
> >      printk(KERN_ERR "DT: NUMA information is not used.\n");
> >      device_tree_numa = -1;
> >  }
> > +
> > +/* Callback for device tree processor affinity */
> > +static int __init dtb_numa_processor_affinity_init(nodeid_t node)
> > +{
> > +    if ( srat_disabled() )
> > +        return -EINVAL;
> > +    else if ( node == NUMA_NO_NODE || node >= MAX_NUMNODES ) {
> > +		bad_srat();
> > +		return -EINVAL;
> > +	}
> > +
> > +    node_set(node, processor_nodes_parsed);
> > +
> > +    device_tree_numa = 1;
> > +    printk(KERN_INFO "DT: NUMA node %u processor parsed\n", node);
> > +
> > +    return 0;
> > +}
> > +
> > +/* Parse CPU NUMA node info */
> > +int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
> > +{
> > +    uint32_t nid;
> > +
> > +    nid = device_tree_get_u32(fdt, node, "numa-node-id", MAX_NUMNODES);
> > +    printk(XENLOG_WARNING "CPU on NUMA node:%u\n", nid);
> 
> Given that this is not actually a warning (is it?) then I would move it
> to XENLOG_INFO
> 
> 


OK

> > +    if ( nid >= MAX_NUMNODES )
> > +    {
> > +        printk(XENLOG_WARNING "Node id %u exceeds maximum value\n",
> nid);
> 
> This could be XENLOG_ERR
> 

OK

> 
> > +        return -EINVAL;
> > +    }
> > +
> > +    return dtb_numa_processor_affinity_init(nid);
> > +}


^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 16/40] xen/arm: Create a fake NUMA node to use common code
  2021-08-27  6:18     ` Jan Beulich
@ 2021-08-27  9:32       ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-27  9:32 UTC (permalink / raw)
  To: Jan Beulich, Stefano Stabellini; +Cc: xen-devel, julien, Bertrand Marquis

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 27 August 2021 14:18
> To: Stefano Stabellini <sstabellini@kernel.org>; Wei Chen
> <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; julien@xen.org; Bertrand Marquis
> <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 16/40] xen/arm: Create a fake NUMA node to use
> common code
> 
> On 27.08.2021 01:10, Stefano Stabellini wrote:
> > On Wed, 11 Aug 2021, Wei Chen wrote:
> >> @@ -29,3 +31,54 @@ void numa_set_node(int cpu, nodeid_t nid)
> >>
> >>      cpu_to_node[cpu] = nid;
> >>  }
> >> +
> >> +void __init numa_init(bool acpi_off)
> >> +{
> >> +    uint32_t idx;
> >> +    paddr_t ram_start = ~0;
> >> +    paddr_t ram_size = 0;
> >> +    paddr_t ram_end = 0;
> >> +
> >> +    printk(XENLOG_WARNING
> >> +        "NUMA has not been supported yet, NUMA off!\n");
> >
> > NIT: please align
> >
> >
> >> +    /* Arm NUMA has not been implemented until this patch */
> >
> > "Arm NUMA is not implemented yet"
> >
> >
> >> +    numa_off = true;
> >> +
> >> +    /*
> >> +     * Set all cpu_to_node mapping to 0, this will make cpu_to_node
> >> +     * function return 0 as previous fake cpu_to_node API.
> >> +     */
> >> +    for ( idx = 0; idx < NR_CPUS; idx++ )
> >> +        cpu_to_node[idx] = 0;
> >> +
> >> +    /*
> >> +     * Make node_to_cpumask, node_spanned_pages and node_start_pfn
> >> +     * return as previous fake APIs.
> >> +     */
> >> +    for ( idx = 0; idx < MAX_NUMNODES; idx++ ) {
> >> +        node_to_cpumask[idx] = cpu_online_map;
> >> +        node_spanned_pages(idx) = (max_page - mfn_x(first_valid_mfn));
> >> +        node_start_pfn(idx) = (mfn_x(first_valid_mfn));
> >> +    }
> >
> > I just want to note that this works because MAX_NUMNODES is 1. If
> > MAX_NUMNODES was > 1 then it would be wrong to set node_to_cpumask,
> > node_spanned_pages and node_start_pfn for all nodes to the same values.
> >
> > It might be worth writing something about it in the in-code comment.
> 
> Plus perhaps BUILD_BUG_ON(MAX_NUMNODES != 1), so the issue is actually
> noticed at build time once the constant gets changed?
> 

That would be better. I will use it in the next version.
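
A minimal sketch of the idea in standard C11, assuming MAX_NUMNODES is 1 at
this point in the series. BUILD_BUG_ON_SKETCH is a hypothetical stand-in built
on static_assert, not Xen's BUILD_BUG_ON, and fake_node / fake_numa_init are
illustrative names only.

```c
#include <assert.h>   /* static_assert (C11) */

#define MAX_NUMNODES 1   /* assumed value while only the fake node exists */

/* Hypothetical stand-in: fail the build when cond is true. */
#define BUILD_BUG_ON_SKETCH(cond) static_assert(!(cond), #cond)

struct fake_node {
    unsigned long start_pfn;
    unsigned long spanned_pages;
};

struct fake_node fake_nodes[MAX_NUMNODES];

void fake_numa_init(unsigned long first_pfn, unsigned long max_pfn)
{
    unsigned int idx;

    /*
     * Giving every node the same range is only correct for a single
     * node, so refuse to build once the constant changes.
     */
    BUILD_BUG_ON_SKETCH(MAX_NUMNODES != 1);

    for ( idx = 0; idx < MAX_NUMNODES; idx++ )
    {
        fake_nodes[idx].start_pfn = first_pfn;
        fake_nodes[idx].spanned_pages = max_pfn - first_pfn;
    }
}
```

Changing MAX_NUMNODES to any other value makes this translation unit fail to
compile, so the assumption is caught at build time rather than at boot.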

> Jan


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 32/40] xen/x86: make acpi_scan_nodes to be neutral
  2021-08-11 10:24 ` [XEN RFC PATCH 32/40] xen/x86: make acpi_scan_nodes to be neutral Wei Chen
@ 2021-08-27 14:08   ` Julien Grall
  2021-08-28  2:11     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-27 14:08 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> The code in acpi_scan_nodes can be reused for device tree based
> NUMA. So we rename acpi_scan_nodes to numa_scan_nodes for a neutral
> function name. As acpi_numa variable is available in ACPI based NUMA
> system only, we use CONFIG_ACPI_NUMA to protect it.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/x86/srat.c        | 4 +++-
>   xen/common/numa.c          | 2 +-
>   xen/include/asm-x86/acpi.h | 2 +-
>   3 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> index dcebc7adec..3d4d90a622 100644
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -362,7 +362,7 @@ void __init srat_parse_regions(u64 addr)
>   }
>   
>   /* Use the information discovered above to actually set up the nodes. */
> -int __init acpi_scan_nodes(u64 start, u64 end)
> +int __init numa_scan_nodes(u64 start, u64 end)
>   {
>   	int i;
>   	nodemask_t all_nodes_parsed;
> @@ -371,8 +371,10 @@ int __init acpi_scan_nodes(u64 start, u64 end)
>   	for (i = 0; i < MAX_NUMNODES; i++)
>   		cutoff_node(i, start, end);
>   
> +#ifdef CONFIG_ACPI_NUMA
>   	if (acpi_numa <= 0)
>   		return -1;
> +#endif

Looking at the follow-up patches, I find it a bit odd that there is a check
for ACPI but none added for DT. Can you explain why?

However, I think this check is going to impair the work to support both
ACPI and DT on Arm, because acpi_numa would end up being 0, so you would
bail out here.

With that in mind, I think this check needs to either go away or be replaced
by something that is firmware-agnostic.
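
As a rough sketch of what a firmware-agnostic check could look like, assuming
a single state variable that whichever parser ran (ACPI or DT) updates. All
names here (numa_fw_state, numa_fw_set_ok, numa_fw_set_bad, numa_fw_usable,
scan_nodes_sketch) are hypothetical and do not exist in Xen.

```c
#include <stdbool.h>

/* Hypothetical state shared by every firmware parser. */
enum numa_fw_state {
    NUMA_FW_BAD = -1,      /* firmware data was found to be invalid */
    NUMA_FW_UNKNOWN = 0,   /* nothing parsed yet */
    NUMA_FW_OK = 1,        /* at least one valid entry parsed */
};

enum numa_fw_state numa_fw_state = NUMA_FW_UNKNOWN;

/* Called by a parser after a valid entry; a bad state is sticky. */
void numa_fw_set_ok(void)
{
    if ( numa_fw_state != NUMA_FW_BAD )
        numa_fw_state = NUMA_FW_OK;
}

/* Called by a parser when it rejects the firmware data. */
void numa_fw_set_bad(void)
{
    numa_fw_state = NUMA_FW_BAD;
}

bool numa_fw_usable(void)
{
    return numa_fw_state == NUMA_FW_OK;
}

/* The scan bails out on one check, with no CONFIG_ACPI_NUMA #ifdef. */
int scan_nodes_sketch(void)
{
    if ( !numa_fw_usable() )
        return -1;

    /* ... set up the nodes from the parsed information ... */
    return 0;
}
```

The common code then never needs to know which firmware interface supplied
the data, which is the property the #ifdef in this patch lacks.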

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to common
  2021-08-11 10:24 ` [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to common Wei Chen
@ 2021-08-27 14:14   ` Julien Grall
  2021-08-28  2:12     ` Wei Chen
  2021-08-31  1:26   ` Stefano Stabellini
  1 sibling, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-27 14:14 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> diff --git a/xen/include/asm-x86/acpi.h b/xen/include/asm-x86/acpi.h
> index 33b71dfb3b..2140461ff3 100644
> --- a/xen/include/asm-x86/acpi.h
> +++ b/xen/include/asm-x86/acpi.h
> @@ -101,9 +101,6 @@ extern unsigned long acpi_wakeup_address;
>   
>   #define ARCH_HAS_POWER_INIT	1
>   
> -extern s8 acpi_numa;
> -extern int numa_scan_nodes(u64 start, u64 end);
> -
>   extern struct acpi_sleep_info acpi_sinfo;
>   #define acpi_video_flags bootsym(video_flags)
>   struct xenpf_enter_acpi_sleep;
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index 490381bd13..b9b5d1ad88 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -81,8 +81,10 @@ extern void bad_srat(void);
>   extern void numa_init_array(void);
>   extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
>   extern void numa_set_node(int cpu, nodeid_t node);
> +extern int numa_scan_nodes(u64 start, u64 end);

AFAICT, by the end of the series, the function is only called by the 
common code. So this should be static.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 35/40] xen: enable numa_scan_nodes for device tree based NUMA
  2021-08-11 10:24 ` [XEN RFC PATCH 35/40] xen: enable numa_scan_nodes for device tree based NUMA Wei Chen
@ 2021-08-27 14:19   ` Julien Grall
  2021-08-28  2:13     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-27 14:19 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> Now, we can use the same function for ACPI and device tree based
> NUMA to scan memory nodes.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/common/numa.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/xen/common/numa.c b/xen/common/numa.c
> index 8ca13e27d1..d15c2fc311 100644
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -381,7 +381,7 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
>           return;
>   #endif
>   
> -#ifdef CONFIG_ACPI_NUMA
> +#if defined(CONFIG_ACPI_NUMA) || defined(CONFIG_DEVICE_TREE_NUMA)

numa.c is only built when CONFIG_NUMA is set. I don't think CONFIG_NUMA
will ever be set if neither CONFIG_ACPI_NUMA nor CONFIG_DEVICE_TREE_NUMA is
set. So do we actually need this #ifdef?

>       if ( !numa_off && !numa_scan_nodes((u64)start_pfn << PAGE_SHIFT,
>            (u64)end_pfn << PAGE_SHIFT) )
>           return;
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 36/40] xen/arm: keep guest still be NUMA unware
  2021-08-11 10:24 ` [XEN RFC PATCH 36/40] xen/arm: keep guest still be NUMA unware Wei Chen
@ 2021-08-27 14:28   ` Julien Grall
  2021-08-28  2:19     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-27 14:28 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> We have not wanted to make Xen guest be NUMA aware in this patch
> series. 

The concept of a patch series ceases to exist once we merge the code. So
how about:

"The NUMA information provided in the host Device-Tree is only for Xen.
For dom0, we want to hide it as it may be different (for now, dom0
is still not aware of NUMA)".

> So in this patch, Xen will skip NUMA distance matrix node
> and skip the numa-node-id property in CPU node and memory node,
> when Xen is creating guest device tree binary.

The CPU and memory nodes are recreated from scratch for the domain. So 
we already skip the property numa-node-id. However...

> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/domain_build.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index cf341f349f..e62fa761bd 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -584,6 +584,10 @@ static int __init write_properties(struct domain *d, struct kernel_info *kinfo,
>                   continue;
>           }
>   
> +        /* Guest is numa unaware in current stage */
> +        if ( dt_property_name_is_equal(prop, "numa-node-id") )
> +            continue;

... your code is doing more than skipping the property for the two nodes
you mentioned. Can the property exist in other nodes?

> +
>           res = fdt_property(kinfo->fdt, prop->name, prop_data, prop_len);
>   
>           if ( res )
> @@ -1454,6 +1458,8 @@ static int __init handle_node(struct domain *d, struct kernel_info *kinfo,
>           DT_MATCH_TYPE("memory"),
>           /* The memory mapped timer is not supported by Xen. */
>           DT_MATCH_COMPATIBLE("arm,armv7-timer-mem"),
> +        /* Numa info doesn't need to be exposed to Domain-0 */
> +        DT_MATCH_COMPATIBLE("numa-distance-map-v1"),
>           { /* sentinel */ },
>       };
>       static const struct dt_device_match timer_matches[] __initconst =
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 37/40] xen: introduce an arch helper to do NUMA init failed fallback
  2021-08-11 10:24 ` [XEN RFC PATCH 37/40] xen: introduce an arch helper to do NUMA init failed fallback Wei Chen
@ 2021-08-27 14:30   ` Julien Grall
  2021-08-28  3:09     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-27 14:30 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi,

On 11/08/2021 11:24, Wei Chen wrote:
> When Xen fails to initialize NUMA, some architectures may need to
> perform fallback actions. For example, in device tree based NUMA, Arm
> needs to reset the distance between any two nodes.

From the description here, I don't understand why we need to reset the
distance for Arm but not x86. In fact...

> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/arm/numa.c        | 13 +++++++++++++
>   xen/common/numa.c          |  3 +++
>   xen/include/asm-arm/numa.h |  1 +
>   xen/include/asm-x86/numa.h |  6 ++++++
>   4 files changed, 23 insertions(+)
> 
> diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> index 6eebf8e8bc..2a18c97470 100644
> --- a/xen/arch/arm/numa.c
> +++ b/xen/arch/arm/numa.c
> @@ -140,3 +140,16 @@ int __init arch_meminfo_get_ram_bank_range(int bank,
>   
>   	return 0;
>   }
> +
> +void __init arch_numa_init_failed_fallback(void)
> +{
> +    int i, j;
> +
> +    /* Reset all node distance to remote_distance */
> +    for ( i = 0; i < MAX_NUMNODES; i++ ) {
> +        for ( j = 0; j < MAX_NUMNODES; j++ ) {
> +            numa_set_distance(i, j,
> +                (i == j) ? NUMA_LOCAL_DISTANCE : NUMA_REMOTE_DISTANCE);
> +        }
> +    }
> +}

... this implementation looks fairly generic. So can you explain why we 
need it on Arm but not x86?

> diff --git a/xen/common/numa.c b/xen/common/numa.c
> index d15c2fc311..88f1594127 100644
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -405,4 +405,7 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
>       cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
>       setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
>                       (u64)end_pfn << PAGE_SHIFT);
> +
> +    /* architecture specified fallback operations */
> +    arch_numa_init_failed_fallback();
>   }
> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> index dd31324b0b..a3982a94b6 100644
> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -28,6 +28,7 @@ extern s8 device_tree_numa;
>   extern void numa_init(bool acpi_off);
>   extern int numa_device_tree_init(const void *fdt);
>   extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance);
> +extern void arch_numa_init_failed_fallback(void);
>   
>   /*
>    * Temporary for fake NUMA node, when CPU, memory and distance
> diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> index e63869135c..26280b0f3a 100644
> --- a/xen/include/asm-x86/numa.h
> +++ b/xen/include/asm-x86/numa.h
> @@ -22,4 +22,10 @@ extern void init_cpu_to_node(void);
>   void srat_parse_regions(u64 addr);
>   unsigned int arch_get_dma_bitsize(void);
>   
> +/* Dummy function for numa init failed in numa_initmem_init */
> +static inline void arch_numa_init_failed_fallback(void)
> +{
> +    return;

NIT: The return is pointless.

> +}
> +
>   #endif
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init
  2021-08-11 10:24 ` [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init Wei Chen
@ 2021-08-27 14:32   ` Julien Grall
  2021-08-28  3:17     ` Wei Chen
  2021-08-31  1:50   ` Stefano Stabellini
  1 sibling, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-27 14:32 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> Everything is ready, so we can remove the fake NUMA node and
> depend on the device tree to create the NUMA system.

So you just added code a few patches earlier that is now completely
rewritten. Can you please re-order this series so that this doesn't happen?

This may mean that CONFIG_NUMA is only selected late in this series.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to support NUMA switch in command line
  2021-08-11 10:24 ` [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to support NUMA switch in command line Wei Chen
@ 2021-08-27 14:37   ` Julien Grall
  2021-08-28  3:22     ` Wei Chen
  2021-08-31  1:53   ` Stefano Stabellini
  1 sibling, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-27 14:37 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini, jbeulich; +Cc: Bertrand.Marquis

Hi Wei,

On 11/08/2021 11:24, Wei Chen wrote:
> Xen x86 has created a command line parameter "numa" as NUMA switch for
> user to turn on/off NUMA. As device tree based NUMA has been enabled
> for Arm, this parameter can be reused by Arm. So in this patch, we move
> this parameter to common.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>   xen/arch/x86/numa.c    | 34 ----------------------------------
>   xen/common/numa.c      | 35 ++++++++++++++++++++++++++++++++++-
>   xen/include/xen/numa.h |  1 -
>   3 files changed, 34 insertions(+), 36 deletions(-)
> 
> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> index 8b43be4aa7..380d8ed6fd 100644
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -11,7 +11,6 @@
>   #include <xen/nodemask.h>
>   #include <xen/numa.h>
>   #include <xen/keyhandler.h>
> -#include <xen/param.h>
>   #include <xen/time.h>
>   #include <xen/smp.h>
>   #include <xen/pfn.h>
> @@ -19,9 +18,6 @@
>   #include <xen/sched.h>
>   #include <xen/softirq.h>
>   
> -static int numa_setup(const char *s);
> -custom_param("numa", numa_setup);
> -
>   #ifndef Dprintk
>   #define Dprintk(x...)
>   #endif
> @@ -50,35 +46,6 @@ void numa_set_node(int cpu, nodeid_t node)
>       cpu_to_node[cpu] = node;
>   }
>   
> -/* [numa=off] */
> -static __init int numa_setup(const char *opt)
> -{
> -    if ( !strncmp(opt,"off",3) )
> -        numa_off = true;
> -    else if ( !strncmp(opt,"on",2) )
> -        numa_off = false;
> -#ifdef CONFIG_NUMA_EMU
> -    else if ( !strncmp(opt, "fake=", 5) )
> -    {
> -        numa_off = false;
> -        numa_fake = simple_strtoul(opt+5,NULL,0);
> -        if ( numa_fake >= MAX_NUMNODES )
> -            numa_fake = MAX_NUMNODES;
> -    }
> -#endif
> -#ifdef CONFIG_ACPI_NUMA
> -    else if ( !strncmp(opt,"noacpi",6) )
> -    {
> -        numa_off = false;
> -        acpi_numa = -1;
> -    }
> -#endif
> -    else
> -        return -EINVAL;
> -
> -    return 0;
> -}
> -
>   /*
>    * Setup early cpu_to_node.
>    *
> @@ -287,4 +254,3 @@ static __init int register_numa_trigger(void)
>       return 0;
>   }
>   __initcall(register_numa_trigger);
> -
> diff --git a/xen/common/numa.c b/xen/common/numa.c
> index 88f1594127..c98eb8d571 100644
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -14,8 +14,12 @@
>   #include <xen/smp.h>
>   #include <xen/pfn.h>
>   #include <xen/sched.h>
> +#include <xen/param.h>
>   #include <asm/acpi.h>
>   
> +static int numa_setup(const char *s);
> +custom_param("numa", numa_setup);
> +
>   struct node_data node_data[MAX_NUMNODES];
>   
>   /* Mapping from pdx to node id */
> @@ -324,7 +328,7 @@ int __init numa_scan_nodes(u64 start, u64 end)
>   }
>   
>   #ifdef CONFIG_NUMA_EMU
> -int numa_fake __initdata = 0;
> +static int numa_fake __initdata = 0;
>   
>   /* Numa emulation */
>   static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
> @@ -409,3 +413,32 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
>       /* architecture specified fallback operations */
>       arch_numa_init_failed_fallback();
>   }
> +
> +/* [numa=off] */

The documentation also needs to be updated to reflect the fact that this
option is now architecture-agnostic.

> +static __init int numa_setup(const char *opt)
> +{
> +    if ( !strncmp(opt,"off",3) )
> +        numa_off = true;
> +    else if ( !strncmp(opt,"on",2) )
> +        numa_off = false;
> +#ifdef CONFIG_NUMA_EMU
> +    else if ( !strncmp(opt, "fake=", 5) )
> +    {
> +        numa_off = false;
> +        numa_fake = simple_strtoul(opt+5,NULL,0);
> +        if ( numa_fake >= MAX_NUMNODES )
> +            numa_fake = MAX_NUMNODES;
> +    }
> +#endif
> +#ifdef CONFIG_ACPI_NUMA
> +    else if ( !strncmp(opt,"noacpi",6) )
> +    {
> +        numa_off = false;
> +        acpi_numa = -1;
> +    }
> +#endif

Looking at this code, I am not quite sure I understand the
difference between "numa=noacpi" and "numa=off".

In fact, I am tempted to say this option should disappear, because it
is odd to have a firmware-specific option just for ACPI but not for DT. Even
if we had one for each, this would make things a bit more complicated for
the admin.
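
To make the difference visible as the quoted code expresses it, here is a
minimal sketch of the option handling. Note the assumptions: numa_opts and
parse_numa_opt are hypothetical names, ignore_firmware stands in for
"acpi_numa = -1", the "fake=" case is left out, and the sketch uses exact
matching with strcmp where the patch uses prefix matching with strncmp.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the state the option changes. */
struct numa_opts {
    bool numa_off;          /* NUMA support disabled entirely */
    bool ignore_firmware;   /* NUMA stays on, firmware tables are ignored */
};

/* Returns 0 on success, -EINVAL for an unknown value. */
int parse_numa_opt(const char *opt, struct numa_opts *o)
{
    if ( !strcmp(opt, "off") )
        o->numa_off = true;
    else if ( !strcmp(opt, "on") )
        o->numa_off = false;
    else if ( !strcmp(opt, "noacpi") )
    {
        /* Unlike "off", this leaves NUMA enabled. */
        o->numa_off = false;
        o->ignore_firmware = true;
    }
    else
        return -EINVAL;

    return 0;
}
```

So "numa=off" turns NUMA off, while "numa=noacpi" keeps NUMA on but discards
the ACPI-provided layout. A DT-based system has no equivalent of the second
switch, which is the asymmetry raised above.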

> +    else
> +        return -EINVAL;
> +
> +    return 0;
> +}
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index b9b5d1ad88..c647fef736 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -83,7 +83,6 @@ extern void numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn);
>   extern void numa_set_node(int cpu, nodeid_t node);
>   extern int numa_scan_nodes(u64 start, u64 end);
>   extern bool numa_off;
> -extern int numa_fake;
>   extern s8 acpi_numa;
>   
>   extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
> 

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 18/40] xen/arm: Keep memory nodes in dtb for NUMA when boot from EFI
  2021-08-27  7:41     ` Julien Grall
@ 2021-08-27 23:10       ` Stefano Stabellini
  0 siblings, 0 replies; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-27 23:10 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Wei Chen, xen-devel, jbeulich, Bertrand.Marquis

On Fri, 27 Aug 2021, Julien Grall wrote:
> Hi Stefano,
> 
> On 27/08/2021 00:24, Stefano Stabellini wrote:
> > On Wed, 11 Aug 2021, Wei Chen wrote:
> > > EFI can get memory map from EFI system table. But EFI system
> > > table doesn't contain memory NUMA information, EFI depends on
> > > ACPI SRAT or device tree memory node to parse memory blocks'
> > > NUMA mapping.
> > > 
> > > But in current code, when Xen is booting from EFI, it will
> > > delete all memory nodes in device tree. So in UEFI + DTB
> > > boot, we don't have numa-node-id for memory blocks any more.
> > > 
> > > So in this patch, we will keep memory nodes in device tree for
> > > NUMA code to parse memory numa-node-id later.
> > > 
> > > As a side effect, if we still parse boot memory information in
> > > early_scan_node, bootinfo.mem will account for the memory ranges in
> > > memory nodes twice. So we have to prevent early_scan_node from
> > > parsing memory nodes in EFI boot.
> > > 
> > > As EFI APIs can only be used on Arm64, we introduce a wrapper
> > > in the header file to avoid #ifdef CONFIG_ARM_64/32 in the code block.
> > > 
> > > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > > ---
> > >   xen/arch/arm/bootfdt.c      |  8 +++++++-
> > >   xen/arch/arm/efi/efi-boot.h | 25 -------------------------
> > >   xen/include/asm-arm/setup.h |  6 ++++++
> > >   3 files changed, 13 insertions(+), 26 deletions(-)
> > > 
> > > diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
> > > index 476e32e0f5..7df149dbca 100644
> > > --- a/xen/arch/arm/bootfdt.c
> > > +++ b/xen/arch/arm/bootfdt.c
> > > @@ -11,6 +11,7 @@
> > >   #include <xen/lib.h>
> > >   #include <xen/kernel.h>
> > >   #include <xen/init.h>
> > > +#include <xen/efi.h>
> > >   #include <xen/device_tree.h>
> > >   #include <xen/libfdt/libfdt.h>
> > >   #include <xen/sort.h>
> > > @@ -335,7 +336,12 @@ static int __init early_scan_node(const void *fdt,
> > >   {
> > >       int rc = 0;
> > >   -    if ( device_tree_node_matches(fdt, node, "memory") )
> > > +    /*
> > > +     * If system boot from EFI, bootinfo.mem has been set by EFI,
> > > +     * so we don't need to parse memory node from DTB.
> > > +     */
> > > +    if ( device_tree_node_matches(fdt, node, "memory") &&
> > > +         !arch_efi_enabled(EFI_BOOT) )
> > >           rc = process_memory_node(fdt, node, name, depth,
> > >                                    address_cells, size_cells,
> > > &bootinfo.mem);
> > >       else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
> > 
> > 
> > If we are going to use the device tree info for the numa nodes (and
> > related memory) does it make sense to still rely on the EFI tables for
> > the memory map?
> 
> Yes. AFAIK, when booting using EFI, the Device-Tree may not contain all the
> reserved regions. Furthermore, we are still too early to know whether we boot
> using ACPI or DT.
> 
> > 
> > I wonder if we should just use device tree for memory and ignore EFI
> > instead. Do you know what Linux does in this regard?
> I looked at Linux when I first reviewed this patch because I was wondering
> what happens if the DT and the UEFI map disagree.
> 
> Linux and Xen are the same after this patch:
>   1) The memory map is coming from UEFI map
>   2) NUMA ID is coming from the DT
> 
> The commit that introduced the change in Linux is:
[...]

Thanks both you and Wei for the investigation :-)


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
  2021-08-11 10:24 ` [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node Wei Chen
  2021-08-25 13:48   ` Julien Grall
@ 2021-08-28  1:06   ` Stefano Stabellini
  2021-08-28  3:56     ` Wei Chen
  1 sibling, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-28  1:06 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> Memory blocks' NUMA ID information is stored in device tree's
> memory nodes as "numa-node-id". We need a new helper to parse
> and verify this ID from memory nodes.
> 
> In order to support memory affinity in later use, the valid
> memory ranges and NUMA ID will be saved to tables.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/arm/numa_device_tree.c | 130 ++++++++++++++++++++++++++++++++
>  1 file changed, 130 insertions(+)
> 
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> index 37cc56acf3..bbe081dcd1 100644
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -20,11 +20,13 @@
>  #include <xen/init.h>
>  #include <xen/nodemask.h>
>  #include <xen/numa.h>
> +#include <xen/libfdt/libfdt.h>
>  #include <xen/device_tree.h>
>  #include <asm/setup.h>
>  
>  s8 device_tree_numa = 0;
>  static nodemask_t processor_nodes_parsed __initdata;
> +static nodemask_t memory_nodes_parsed __initdata;
>  
>  static int srat_disabled(void)
>  {
> @@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
>      return 0;
>  }
>  
> +/* Callback for parsing of the memory regions affinity */
> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> +                                paddr_t start, paddr_t size)
> +{
> +    struct node *nd;
> +    paddr_t end;
> +    int i;
> +
> +    if ( srat_disabled() )
> +        return -EINVAL;
> +
> +    end = start + size;
> +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
> +    {
> +        dprintk(XENLOG_WARNING,
> +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> +        bad_srat();
> +        return -EINVAL;
> +    }
> +
> +    /* It is fine to add this area to the nodes data it will be used later */
> +    i = conflicting_memblks(start, end);
> +    /* No conflicting memory block, we can save it for later usage */;
> +    if ( i < 0 )
> +        goto save_memblk;
> +
> +    if ( memblk_nodeid[i] == node ) {
> +        /*
> +         * Overlaps with other memblk in the same node, warning here.
> +         * This memblk will be merged with conflicted memblk later.
> +         */
> +        printk(XENLOG_WARNING
> +               "DT: NUMA NODE %u (%"PRIx64
> +               "-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
> +               node, start, end,
> +               node_memblk_range[i].start, node_memblk_range[i].end);
> +    } else {
> +        /*
> +         * Conflict with memblk in other node, this is an error.
> +         * The NUMA information is invalid, NUMA will be turn off.
> +         */
> +        printk(XENLOG_ERR
> +               "DT: NUMA NODE %u (%"PRIx64"-%"
> +               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
> +               node, start, end, memblk_nodeid[i],
> +               node_memblk_range[i].start, node_memblk_range[i].end);
> +        bad_srat();
> +        return -EINVAL;
> +    }
> +
> +save_memblk:
> +    nd = &nodes[node];
> +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
> +        nd->start = start;
> +        nd->end = end;
> +    } else {
> +        if ( start < nd->start )
> +            nd->start = start;
> +        if ( nd->end < end )
> +            nd->end = end;
> +    }
> +
> +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
> +           node, start, end);
> +
> +    node_memblk_range[num_node_memblks].start = start;
> +    node_memblk_range[num_node_memblks].end = end;
> +    memblk_nodeid[num_node_memblks] = node;
> +    num_node_memblks++;


Is it possible to have non-contiguous ranges of memory for a single NUMA
node?

Looking at the DT bindings and Linux implementation, it seems possible.
Here, it seems that node_memblk_range/memblk_nodeid could handle it,
but nodes couldn't.
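
To illustrate the concern, here is a small standalone sketch. The types,
names and addresses are made up for the example and are not the Xen
structures; only the min/max update mirrors the nodes[] handling in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 2   /* hypothetical: just enough for the example */

/* One [start, end) span per node, like the nodes[] update above. */
struct span {
    uint64_t start, end;
    bool valid;
};

static struct span spans[MAX_NODES];

/* Grow the node's span so that it covers [start, end). */
static void span_add(unsigned int node, uint64_t start, uint64_t end)
{
    struct span *s;

    assert(node < MAX_NODES);
    s = &spans[node];

    if ( !s->valid )
    {
        s->start = start;
        s->end = end;
        s->valid = true;
        return;
    }
    if ( start < s->start )
        s->start = start;
    if ( s->end < end )
        s->end = end;
}

/* True if addr falls inside the span recorded for node. */
static bool span_contains(unsigned int node, uint64_t addr)
{
    assert(node < MAX_NODES);
    return spans[node].valid &&
           addr >= spans[node].start && addr < spans[node].end;
}

/* Hypothetical layout: node 0 owns two banks with node 1 in between. */
static void build_example(void)
{
    span_add(0, 0x00000000ULL, 0x40000000ULL);
    span_add(1, 0x40000000ULL, 0x80000000ULL);
    span_add(0, 0x80000000ULL, 0xc0000000ULL);
}
```

After build_example(), the span for node 0 is 0x0-0xc0000000, so an
address such as 0x50000000 is inside the spans of both node 0 and node 1.
A per-block table does not have this problem because each block keeps its
own range.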


^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 32/40] xen/x86: make acpi_scan_nodes to be neutral
  2021-08-27 14:08   ` Julien Grall
@ 2021-08-28  2:11     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-28  2:11 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021-08-27 22:09
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 32/40] xen/x86: make acpi_scan_nodes to be
> neutral
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > The code in acpi_scan_nodes can be reused for device tree based
> > NUMA. So we rename acpi_scan_nodes to numa_scan_nodes for a neutral
> > function name. As the acpi_numa variable is available only on ACPI based
> > NUMA systems, we use CONFIG_ACPI_NUMA to protect it.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/x86/srat.c        | 4 +++-
> >   xen/common/numa.c          | 2 +-
> >   xen/include/asm-x86/acpi.h | 2 +-
> >   3 files changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> > index dcebc7adec..3d4d90a622 100644
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -362,7 +362,7 @@ void __init srat_parse_regions(u64 addr)
> >   }
> >
> >   /* Use the information discovered above to actually set up the nodes.
> */
> > -int __init acpi_scan_nodes(u64 start, u64 end)
> > +int __init numa_scan_nodes(u64 start, u64 end)
> >   {
> >   	int i;
> >   	nodemask_t all_nodes_parsed;
> > @@ -371,8 +371,10 @@ int __init acpi_scan_nodes(u64 start, u64 end)
> >   	for (i = 0; i < MAX_NUMNODES; i++)
> >   		cutoff_node(i, start, end);
> >
> > +#ifdef CONFIG_ACPI_NUMA
> >   	if (acpi_numa <= 0)
> >   		return -1;
> > +#endif
> 
> Looking at the follow-up patches, I find it a bit odd that there is a check
> for ACPI but there is none added for DT. Can you explain why?
> 

Oh, I forgot the DT check. Simply adding a DT check here does not seem
like a good idea, though: once Arm supports ACPI NUMA,
CONFIG_ACPI_NUMA and CONFIG_DEVICE_TREE_NUMA can be selected at
the same time, but only one of acpi_numa and dtb_numa can be > 0.

> However, I think this check is going to impair the work to support both
> ACPI and DT on Arm because acpi_numa would end up to be 0 so you would
> bail out here.
> 
> With that in mind, I think this check needs to either go away or be replaced
> by something that is firmware agnostic.

Yes, we have discussed something like fw_numa before.
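
As a rough sketch of that idea only: every name below is hypothetical and
none of it is existing Xen code. A single firmware-agnostic state could
replace the "acpi_numa <= 0" test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: which firmware table, if any, supplied the NUMA layout. */
enum fw_numa_source {
    FW_NUMA_NONE,   /* nothing parsed yet */
    FW_NUMA_ACPI,   /* layout came from ACPI */
    FW_NUMA_DT,     /* layout came from device tree */
    FW_NUMA_BAD,    /* parsing failed or firmware NUMA was disabled */
};

static enum fw_numa_source fw_numa = FW_NUMA_NONE;

/* Record a parsed source; a second source must not override the first. */
static void fw_numa_set(enum fw_numa_source src)
{
    assert(src == FW_NUMA_ACPI || src == FW_NUMA_DT);
    if ( fw_numa == FW_NUMA_NONE )
        fw_numa = src;
}

/* Hypothetical counterpart of bad_srat(): mark the data unusable. */
static void fw_numa_mark_bad(void)
{
    fw_numa = FW_NUMA_BAD;
}

/* Would stand in for the "acpi_numa <= 0" check in numa_scan_nodes(). */
static bool fw_numa_usable(void)
{
    return fw_numa == FW_NUMA_ACPI || fw_numa == FW_NUMA_DT;
}
```

With this shape the common code never needs to know which parser ran, and
having both CONFIG_ACPI_NUMA and CONFIG_DEVICE_TREE_NUMA enabled is not a
problem because only one source can be recorded.
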
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to common
  2021-08-27 14:14   ` Julien Grall
@ 2021-08-28  2:12     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-28  2:12 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021-08-27 22:14
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to
> common
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > diff --git a/xen/include/asm-x86/acpi.h b/xen/include/asm-x86/acpi.h
> > index 33b71dfb3b..2140461ff3 100644
> > --- a/xen/include/asm-x86/acpi.h
> > +++ b/xen/include/asm-x86/acpi.h
> > @@ -101,9 +101,6 @@ extern unsigned long acpi_wakeup_address;
> >
> >   #define ARCH_HAS_POWER_INIT	1
> >
> > -extern s8 acpi_numa;
> > -extern int numa_scan_nodes(u64 start, u64 end);
> > -
> >   extern struct acpi_sleep_info acpi_sinfo;
> >   #define acpi_video_flags bootsym(video_flags)
> >   struct xenpf_enter_acpi_sleep;
> > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> > index 490381bd13..b9b5d1ad88 100644
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -81,8 +81,10 @@ extern void bad_srat(void);
> >   extern void numa_init_array(void);
> >   extern void numa_initmem_init(unsigned long start_pfn, unsigned long
> end_pfn);
> >   extern void numa_set_node(int cpu, nodeid_t node);
> > +extern int numa_scan_nodes(u64 start, u64 end);
> 
> AFAICT, by the end of the series, the function is only called by the
> common code. So this should be static.
> 

OK

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 35/40] xen: enable numa_scan_nodes for device tree based NUMA
  2021-08-27 14:19   ` Julien Grall
@ 2021-08-28  2:13     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-28  2:13 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021-08-27 22:19
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 35/40] xen: enable numa_scan_nodes for device
> tree based NUMA
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > Now, we can use the same function for ACPI and device tree based
> > NUMA to scan memory nodes.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/common/numa.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > index 8ca13e27d1..d15c2fc311 100644
> > --- a/xen/common/numa.c
> > +++ b/xen/common/numa.c
> > @@ -381,7 +381,7 @@ void __init numa_initmem_init(unsigned long
> start_pfn, unsigned long end_pfn)
> >           return;
> >   #endif
> >
> > -#ifdef CONFIG_ACPI_NUMA
> > +#if defined(CONFIG_ACPI_NUMA) || defined(CONFIG_DEVICE_TREE_NUMA)
> 
> numa.c is only built when CONFIG_NUMA is set. I don't think CONFIG_NUMA
> will ever be set if neither CONFIG_ACPI_NUMA nor CONFIG_DEVICE_TREE_NUMA
> is set. So do we actually need this #ifdef?
> 

Yes, you're right. This check should be removed.


> >       if ( !numa_off && !numa_scan_nodes((u64)start_pfn << PAGE_SHIFT,
> >            (u64)end_pfn << PAGE_SHIFT) )
> >           return;
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 36/40] xen/arm: keep guest still be NUMA unware
  2021-08-27 14:28   ` Julien Grall
@ 2021-08-28  2:19     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-28  2:19 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021-08-27 22:28
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 36/40] xen/arm: keep guest still be NUMA
> unware
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > We have not wanted to make Xen guest be NUMA aware in this patch
> > series.
> 
> The concept of a patch series ceases to exist once we merge the code. So
> how about:
> 
> "The NUMA information provided in the host Device-Tree is only for Xen.
> For dom0, we want to hide it as it may be different (for now, dom0
> is still not aware of NUMA)."
> 

Thanks, I will do it.

> > So in this patch, Xen will skip NUMA distance matrix node
> > and skip the numa-node-id property in CPU node and memory node,
> > when Xen is creating guest device tree binary.
> 
> The CPU and memory nodes are recreated from scratch for the domain. So
> we already skip the property numa-node-id. However...
> 
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/domain_build.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index cf341f349f..e62fa761bd 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -584,6 +584,10 @@ static int __init write_properties(struct domain *d,
> struct kernel_info *kinfo,
> >                   continue;
> >           }
> >
> > +        /* Guest is numa unaware in current stage */
> > +        if ( dt_property_name_is_equal(prop, "numa-node-id") )
> > +            continue;
> 
> ... your code is doing more than skipping the property for the two nodes
> you mentioned. Can the property exist in other nodes?

Some devices like PCIe may have numa-node-id. In the future, more
devices may have a NUMA property.
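
That is why the filter is keyed on the property name rather than on the
node. A minimal standalone sketch, where skip_for_domain() is a
hypothetical name and a plain string comparison is assumed for the match:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True for properties that should not be copied into the domain's DT. */
static bool skip_for_domain(const char *prop_name)
{
    assert(prop_name != NULL);

    /* Exact match only, so unrelated properties are still copied. */
    return strcmp(prop_name, "numa-node-id") == 0;
}
```

Because the check runs for every property that is copied, it catches
numa-node-id on a PCIe node just as it would on any other device node.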

> 
> > +
> >           res = fdt_property(kinfo->fdt, prop->name, prop_data,
> prop_len);
> >
> >           if ( res )
> > @@ -1454,6 +1458,8 @@ static int __init handle_node(struct domain *d,
> struct kernel_info *kinfo,
> >           DT_MATCH_TYPE("memory"),
> >           /* The memory mapped timer is not supported by Xen. */
> >           DT_MATCH_COMPATIBLE("arm,armv7-timer-mem"),
> > +        /* Numa info doesn't need to be exposed to Domain-0 */
> > +        DT_MATCH_COMPATIBLE("numa-distance-map-v1"),
> >           { /* sentinel */ },
> >       };
> >       static const struct dt_device_match timer_matches[] __initconst =
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 37/40] xen: introduce an arch helper to do NUMA init failed fallback
  2021-08-27 14:30   ` Julien Grall
@ 2021-08-28  3:09     ` Wei Chen
  2021-08-28  3:45       ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-28  3:09 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, Jan Beulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021-08-27 22:30
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 37/40] xen: introduce an arch helper to do
> NUMA init failed fallback
> 
> Hi,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > When Xen fails to initialize NUMA, some architectures may need to
> > take fallback actions. For example, in device tree based NUMA, Arm
> > needs to reset the distance between any two nodes.
> 
>  From the description here, I don't understand why we need to reset the
> distance for Arm but not x86. In fact...
> 
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/arm/numa.c        | 13 +++++++++++++
> >   xen/common/numa.c          |  3 +++
> >   xen/include/asm-arm/numa.h |  1 +
> >   xen/include/asm-x86/numa.h |  6 ++++++
> >   4 files changed, 23 insertions(+)
> >
> > diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> > index 6eebf8e8bc..2a18c97470 100644
> > --- a/xen/arch/arm/numa.c
> > +++ b/xen/arch/arm/numa.c
> > @@ -140,3 +140,16 @@ int __init arch_meminfo_get_ram_bank_range(int bank,
> >
> >   	return 0;
> >   }
> > +
> > +void __init arch_numa_init_failed_fallback(void)
> > +{
> > +    int i, j;
> > +
> > +    /* Reset all node distance to remote_distance */
> > +    for ( i = 0; i < MAX_NUMNODES; i++ ) {
> > +        for ( j = 0; j < MAX_NUMNODES; j++ ) {
> > +            numa_set_distance(i, j,
> > +                (i == j) ? NUMA_LOCAL_DISTANCE : NUMA_REMOTE_DISTANCE);
> > +        }
> > +    }
> > +}
> 
> ... this implementation looks fairly generic. So can you explain why we
> need it on Arm but not x86?
> 

This implementation is DT only; x86 uses acpi_slit instead.
For now, I am not quite sure whether ACPI needs a fallback or not.
Or, to put it another way, I don't know how to implement the fallback
for ACPI. I planned to solve that in the Arm ACPI NUMA work, so I left
an empty helper for x86.
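
For reference, the reset itself only touches the distance table. Here is
a standalone sketch with a small stand-in table; the table size and the
two distance values (10 for local, 20 for remote) are assumptions of the
sketch, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_NODES        4    /* assumption: stand-in for MAX_NUMNODES */
#define LOCAL_DISTANCE  10   /* assumption: NUMA_LOCAL_DISTANCE */
#define REMOTE_DISTANCE 20   /* assumption: NUMA_REMOTE_DISTANCE */

static uint8_t distance[NR_NODES][NR_NODES];

/* Stand-in for numa_set_distance() on the small table. */
static void set_distance(unsigned int from, unsigned int to, uint8_t d)
{
    assert(from < NR_NODES && to < NR_NODES);
    distance[from][to] = d;
}

/* Same loop shape as arch_numa_init_failed_fallback() in the patch. */
static void reset_distances(void)
{
    unsigned int i, j;

    for ( i = 0; i < NR_NODES; i++ )
        for ( j = 0; j < NR_NODES; j++ )
            set_distance(i, j,
                         (i == j) ? LOCAL_DISTANCE : REMOTE_DISTANCE);
}
```

Whatever was stored before, after reset_distances() the diagonal holds
the local distance and every other entry holds the remote distance.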

@Jan Beulich Could you give me some suggestions about the x86 fallback?


> > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > index d15c2fc311..88f1594127 100644
> > --- a/xen/common/numa.c
> > +++ b/xen/common/numa.c
> > @@ -405,4 +405,7 @@ void __init numa_initmem_init(unsigned long
> start_pfn, unsigned long end_pfn)
> >       cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
> >       setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
> >                       (u64)end_pfn << PAGE_SHIFT);
> > +
> > +    /* architecture specified fallback operations */
> > +    arch_numa_init_failed_fallback();
> >   }
> > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > index dd31324b0b..a3982a94b6 100644
> > --- a/xen/include/asm-arm/numa.h
> > +++ b/xen/include/asm-arm/numa.h
> > @@ -28,6 +28,7 @@ extern s8 device_tree_numa;
> >   extern void numa_init(bool acpi_off);
> >   extern int numa_device_tree_init(const void *fdt);
> >   extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t
> distance);
> > +extern void arch_numa_init_failed_fallback(void);
> >
> >   /*
> >    * Temporary for fake NUMA node, when CPU, memory and distance
> > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> > index e63869135c..26280b0f3a 100644
> > --- a/xen/include/asm-x86/numa.h
> > +++ b/xen/include/asm-x86/numa.h
> > @@ -22,4 +22,10 @@ extern void init_cpu_to_node(void);
> >   void srat_parse_regions(u64 addr);
> >   unsigned int arch_get_dma_bitsize(void);
> >
> > +/* Dummy function for numa init failed in numa_initmem_init */
> > +static inline void arch_numa_init_failed_fallback(void)
> > +{
> > +    return;
> 
> NIT: The return is pointless.
> 

OK

> > +}
> > +
> >   #endif
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init
  2021-08-27 14:32   ` Julien Grall
@ 2021-08-28  3:17     ` Wei Chen
  2021-08-28 10:45       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-28  3:17 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021-08-27 22:33
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA
> in system init
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > Everything is ready, so we can remove the fake NUMA node and
> > depend on the device tree to create the NUMA system.
> 
> So you just added code a few patches earlier that is now completely
> rewritten. Can you please re-order this series so this doesn't happen?
> 
> This may mean that CONFIG_NUMA is only selected late in this series.
> 

The reason I did it this way is that my original concerns were:
1. Once CONFIG_NUMA is introduced, users may accidentally end up using a
   code base that stops at that commit.
2. If users select CONFIG_NUMA but not all of the NUMA data is initialized
   properly, the system may not work properly.
3. So I created the fake node to initialize the NUMA data before we parse
   the real data from the DTB.
4. That way, users can run with CONFIG_NUMA enabled even before this
   series is complete.

It seems I was overthinking this. If these concerns are unnecessary, I am
OK to re-order this series.

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to support NUMA switch in command line
  2021-08-27 14:37   ` Julien Grall
@ 2021-08-28  3:22     ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-28  3:22 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini, Jan Beulich; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021-08-27 22:38
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; jbeulich@suse.com
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to
> support NUMA switch in command line
> 
> Hi Wei,
> 
> On 11/08/2021 11:24, Wei Chen wrote:
> > Xen x86 has created a command line parameter "numa" as NUMA switch for
> > user to turn on/off NUMA. As device tree based NUMA has been enabled
> > for Arm, this parameter can be reused by Arm. So in this patch, we move
> > this parameter to common.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >   xen/arch/x86/numa.c    | 34 ----------------------------------
> >   xen/common/numa.c      | 35 ++++++++++++++++++++++++++++++++++-
> >   xen/include/xen/numa.h |  1 -
> >   3 files changed, 34 insertions(+), 36 deletions(-)
> >
> > diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> > index 8b43be4aa7..380d8ed6fd 100644
> > --- a/xen/arch/x86/numa.c
> > +++ b/xen/arch/x86/numa.c
> > @@ -11,7 +11,6 @@
> >   #include <xen/nodemask.h>
> >   #include <xen/numa.h>
> >   #include <xen/keyhandler.h>
> > -#include <xen/param.h>
> >   #include <xen/time.h>
> >   #include <xen/smp.h>
> >   #include <xen/pfn.h>
> > @@ -19,9 +18,6 @@
> >   #include <xen/sched.h>
> >   #include <xen/softirq.h>
> >
> > -static int numa_setup(const char *s);
> > -custom_param("numa", numa_setup);
> > -
> >   #ifndef Dprintk
> >   #define Dprintk(x...)
> >   #endif
> > @@ -50,35 +46,6 @@ void numa_set_node(int cpu, nodeid_t node)
> >       cpu_to_node[cpu] = node;
> >   }
> >
> > -/* [numa=off] */
> > -static __init int numa_setup(const char *opt)
> > -{
> > -    if ( !strncmp(opt,"off",3) )
> > -        numa_off = true;
> > -    else if ( !strncmp(opt,"on",2) )
> > -        numa_off = false;
> > -#ifdef CONFIG_NUMA_EMU
> > -    else if ( !strncmp(opt, "fake=", 5) )
> > -    {
> > -        numa_off = false;
> > -        numa_fake = simple_strtoul(opt+5,NULL,0);
> > -        if ( numa_fake >= MAX_NUMNODES )
> > -            numa_fake = MAX_NUMNODES;
> > -    }
> > -#endif
> > -#ifdef CONFIG_ACPI_NUMA
> > -    else if ( !strncmp(opt,"noacpi",6) )
> > -    {
> > -        numa_off = false;
> > -        acpi_numa = -1;
> > -    }
> > -#endif
> > -    else
> > -        return -EINVAL;
> > -
> > -    return 0;
> > -}
> > -
> >   /*
> >    * Setup early cpu_to_node.
> >    *
> > @@ -287,4 +254,3 @@ static __init int register_numa_trigger(void)
> >       return 0;
> >   }
> >   __initcall(register_numa_trigger);
> > -
> > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > index 88f1594127..c98eb8d571 100644
> > --- a/xen/common/numa.c
> > +++ b/xen/common/numa.c
> > @@ -14,8 +14,12 @@
> >   #include <xen/smp.h>
> >   #include <xen/pfn.h>
> >   #include <xen/sched.h>
> > +#include <xen/param.h>
> >   #include <asm/acpi.h>
> >
> > +static int numa_setup(const char *s);
> > +custom_param("numa", numa_setup);
> > +
> >   struct node_data node_data[MAX_NUMNODES];
> >
> >   /* Mapping from pdx to node id */
> > @@ -324,7 +328,7 @@ int __init numa_scan_nodes(u64 start, u64 end)
> >   }
> >
> >   #ifdef CONFIG_NUMA_EMU
> > -int numa_fake __initdata = 0;
> > +static int numa_fake __initdata = 0;
> >
> >   /* Numa emulation */
> >   static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
> > @@ -409,3 +413,32 @@ void __init numa_initmem_init(unsigned long
> start_pfn, unsigned long end_pfn)
> >       /* architecture specified fallback operations */
> >       arch_numa_init_failed_fallback();
> >   }
> > +
> > +/* [numa=off] */
> 
> The documentation also needs to be updated to reflect the fact that this
> option is now architecture-agnostic.
> 

OK, I will update the related documentation in the next version.

> > +static __init int numa_setup(const char *opt)
> > +{
> > +    if ( !strncmp(opt,"off",3) )
> > +        numa_off = true;
> > +    else if ( !strncmp(opt,"on",2) )
> > +        numa_off = false;
> > +#ifdef CONFIG_NUMA_EMU
> > +    else if ( !strncmp(opt, "fake=", 5) )
> > +    {
> > +        numa_off = false;
> > +        numa_fake = simple_strtoul(opt+5,NULL,0);
> > +        if ( numa_fake >= MAX_NUMNODES )
> > +            numa_fake = MAX_NUMNODES;
> > +    }
> > +#endif
> > +#ifdef CONFIG_ACPI_NUMA
> > +    else if ( !strncmp(opt,"noacpi",6) )
> > +    {
> > +        numa_off = false;
> > +        acpi_numa = -1;
> > +    }
> > +#endif
> 
> Looking at this code, I am not quite sure I understand the
> difference between "numa=noacpi" and "numa=off".
> 
> In fact, I am tempted to say this option should disappear because it
> is odd to have a firmware-specific option just for ACPI but not DT. Even
> if we had one for each, this would make things a bit more complicated for
> the admin.
> 

Yes, I agree. I will look for a proper way to address it in the next version.
It would help if the x86 maintainers could give some background on these
two options.

> > +    else
> > +        return -EINVAL;
> > +
> > +    return 0;
> > +}
> > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> > index b9b5d1ad88..c647fef736 100644
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -83,7 +83,6 @@ extern void numa_initmem_init(unsigned long start_pfn,
> unsigned long end_pfn);
> >   extern void numa_set_node(int cpu, nodeid_t node);
> >   extern int numa_scan_nodes(u64 start, u64 end);
> >   extern bool numa_off;
> > -extern int numa_fake;
> >   extern s8 acpi_numa;
> >
> >   extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
> >
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 37/40] xen: introduce an arch helper to do NUMA init failed fallback
  2021-08-28  3:09     ` Wei Chen
@ 2021-08-28  3:45       ` Wei Chen
  2021-08-30  9:52         ` Jan Beulich
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-28  3:45 UTC (permalink / raw)
  To: Wei Chen, Julien Grall, xen-devel, sstabellini, Jan Beulich
  Cc: Bertrand Marquis

Hi Julien, Jan

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Wei
> Chen
> Sent: 2021-08-28 11:09
> To: Julien Grall <julien@xen.org>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org; Jan Beulich <jbeulich@suse.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: RE: [XEN RFC PATCH 37/40] xen: introduce an arch helper to do
> NUMA init failed fallback
> 
> Hi Julien,
> 
> > -----Original Message-----
> > From: Julien Grall <julien@xen.org>
> > Sent: 2021-08-27 22:30
> > To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> > sstabellini@kernel.org; jbeulich@suse.com
> > Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> > Subject: Re: [XEN RFC PATCH 37/40] xen: introduce an arch helper to do
> > NUMA init failed fallback
> >
> > Hi,
> >
> > On 11/08/2021 11:24, Wei Chen wrote:
> > > When Xen fails to initialize NUMA, some architectures may need to
> > > take fallback actions. For example, in device tree based NUMA, Arm
> > > needs to reset the distance between any two nodes.
> >
> >  From the description here, I don't understand why we need to reset the
> > distance for Arm but not x86. In fact...
> >
> > >
> > > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > > ---
> > >   xen/arch/arm/numa.c        | 13 +++++++++++++
> > >   xen/common/numa.c          |  3 +++
> > >   xen/include/asm-arm/numa.h |  1 +
> > >   xen/include/asm-x86/numa.h |  6 ++++++
> > >   4 files changed, 23 insertions(+)
> > >
> > > diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> > > index 6eebf8e8bc..2a18c97470 100644
> > > --- a/xen/arch/arm/numa.c
> > > +++ b/xen/arch/arm/numa.c
> > > @@ -140,3 +140,16 @@ int __init arch_meminfo_get_ram_bank_range(int
> bank,
> > >
> > >   	return 0;
> > >   }
> > > +
> > > +void __init arch_numa_init_failed_fallback(void)
> > > +{
> > > +    int i, j;
> > > +
> > > +    /* Reset all node distance to remote_distance */
> > > +    for ( i = 0; i < MAX_NUMNODES; i++ ) {
> > > +        for ( j = 0; j < MAX_NUMNODES; j++ ) {
> > > +            numa_set_distance(i, j,
> > > +                (i == j) ? NUMA_LOCAL_DISTANCE :
> NUMA_REMOTE_DISTANCE);
> > > +        }
> > > +    }
> > > +}
> >
> > ... this implementation looks fairly generic. So can you explain why we
> > need it on Arm but not x86?
> >
> 
> This implementation is DT only, for x86, it's using acpi_slit.
> For now, I am not quit sure ACPI need to do fallback or not.
> Or say in another way, I don't know how to implement the fallback
> for ACPI. I planned to solve it in Arm ACPI version NUMA, so I left
> an empty helper for x86.
> 
> @Jan Beulich Could you give me some suggestion about x86 fallback?
> 
> 

I had a quick look at Linux. When an architecture's NUMA init fails,
numa_free_distance is invoked to revert numa_distance.


> > > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > > index d15c2fc311..88f1594127 100644
> > > --- a/xen/common/numa.c
> > > +++ b/xen/common/numa.c
> > > @@ -405,4 +405,7 @@ void __init numa_initmem_init(unsigned long
> > start_pfn, unsigned long end_pfn)
> > >       cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
> > >       setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
> > >                       (u64)end_pfn << PAGE_SHIFT);
> > > +
> > > +    /* architecture specified fallback operations */
> > > +    arch_numa_init_failed_fallback();
> > >   }
> > > diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> > > index dd31324b0b..a3982a94b6 100644
> > > --- a/xen/include/asm-arm/numa.h
> > > +++ b/xen/include/asm-arm/numa.h
> > > @@ -28,6 +28,7 @@ extern s8 device_tree_numa;
> > >   extern void numa_init(bool acpi_off);
> > >   extern int numa_device_tree_init(const void *fdt);
> > >   extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t
> > distance);
> > > +extern void arch_numa_init_failed_fallback(void);
> > >
> > >   /*
> > >    * Temporary for fake NUMA node, when CPU, memory and distance
> > > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h
> > > index e63869135c..26280b0f3a 100644
> > > --- a/xen/include/asm-x86/numa.h
> > > +++ b/xen/include/asm-x86/numa.h
> > > @@ -22,4 +22,10 @@ extern void init_cpu_to_node(void);
> > >   void srat_parse_regions(u64 addr);
> > >   unsigned int arch_get_dma_bitsize(void);
> > >
> > > +/* Dummy function for numa init failed in numa_initmem_init */
> > > +static inline void arch_numa_init_failed_fallback(void)
> > > +{
> > > +    return;
> >
> > NIT: The return is pointless.
> >
> 
> OK
> 
> > > +}
> > > +
> > >   #endif
> > >
> >
> > Cheers,
> >
> > --
> > Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
  2021-08-28  1:06   ` Stefano Stabellini
@ 2021-08-28  3:56     ` Wei Chen
  2021-08-28 10:33       ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-28  3:56 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, julien, Bertrand Marquis, Jan Beulich

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 2021年8月28日 9:06
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> device tree memory node
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > Memory blocks' NUMA ID information is stored in device tree's
> > memory nodes as "numa-node-id". We need a new helper to parse
> > and verify this ID from memory nodes.
> >
> > In order to support memory affinity in later use, the valid
> > memory ranges and NUMA ID will be saved to tables.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >  xen/arch/arm/numa_device_tree.c | 130 ++++++++++++++++++++++++++++++++
> >  1 file changed, 130 insertions(+)
> >
> > diff --git a/xen/arch/arm/numa_device_tree.c
> b/xen/arch/arm/numa_device_tree.c
> > index 37cc56acf3..bbe081dcd1 100644
> > --- a/xen/arch/arm/numa_device_tree.c
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -20,11 +20,13 @@
> >  #include <xen/init.h>
> >  #include <xen/nodemask.h>
> >  #include <xen/numa.h>
> > +#include <xen/libfdt/libfdt.h>
> >  #include <xen/device_tree.h>
> >  #include <asm/setup.h>
> >
> >  s8 device_tree_numa = 0;
> >  static nodemask_t processor_nodes_parsed __initdata;
> > +static nodemask_t memory_nodes_parsed __initdata;
> >
> >  static int srat_disabled(void)
> >  {
> > @@ -55,6 +57,79 @@ static int __init
> dtb_numa_processor_affinity_init(nodeid_t node)
> >      return 0;
> >  }
> >
> > +/* Callback for parsing of the memory regions affinity */
> > +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> > +                                paddr_t start, paddr_t size)
> > +{
> > +    struct node *nd;
> > +    paddr_t end;
> > +    int i;
> > +
> > +    if ( srat_disabled() )
> > +        return -EINVAL;
> > +
> > +    end = start + size;
> > +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
> > +    {
> > +        dprintk(XENLOG_WARNING,
> > +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> > +        bad_srat();
> > +        return -EINVAL;
> > +    }
> > +
> > +    /* It is fine to add this area to the nodes data it will be used
> later */
> > +    i = conflicting_memblks(start, end);
> > +    /* No conflicting memory block, we can save it for later usage */;
> > +    if ( i < 0 )
> > +        goto save_memblk;
> > +
> > +    if ( memblk_nodeid[i] == node ) {
> > +        /*
> > +         * Overlaps with other memblk in the same node, warning here.
> > +         * This memblk will be merged with conflicted memblk later.
> > +         */
> > +        printk(XENLOG_WARNING
> > +               "DT: NUMA NODE %u (%"PRIx64
> > +               "-%"PRIx64") overlaps with itself (%"PRIx64"-
> %"PRIx64")\n",
> > +               node, start, end,
> > +               node_memblk_range[i].start, node_memblk_range[i].end);
> > +    } else {
> > +        /*
> > +         * Conflict with memblk in other node, this is an error.
> > +         * The NUMA information is invalid, NUMA will be turn off.
> > +         */
> > +        printk(XENLOG_ERR
> > +               "DT: NUMA NODE %u (%"PRIx64"-%"
> > +               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
> > +               node, start, end, memblk_nodeid[i],
> > +               node_memblk_range[i].start, node_memblk_range[i].end);
> > +        bad_srat();
> > +        return -EINVAL;
> > +    }
> > +
> > +save_memblk:
> > +    nd = &nodes[node];
> > +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
> > +        nd->start = start;
> > +        nd->end = end;
> > +    } else {
> > +        if ( start < nd->start )
> > +            nd->start = start;
> > +        if ( nd->end < end )
> > +            nd->end = end;
> > +    }
> > +
> > +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
> > +           node, start, end);
> > +
> > +    node_memblk_range[num_node_memblks].start = start;
> > +    node_memblk_range[num_node_memblks].end = end;
> > +    memblk_nodeid[num_node_memblks] = node;
> > +    num_node_memblks++;
> 
> 
> Is it possible to have non-contigous ranges of memory for a single NUMA
> node?
> 
> Looking at the DT bindings and Linux implementation, it seems possible.
> Here, it seems that node_memblk_range/memblk_nodeid could handle it,
> but nodes couldn't.

Yes, you're right. I copied this code from x86 ACPI NUMA. Does ACPI allow
non-contiguous ranges of memory for a single NUMA node too? If yes, I think
this will affect x86 ACPI NUMA too. In the next version, we plan to merge
dtb_numa_memory_affinity_init and acpi_numa_memory_affinity_init into a
neutral function, so we can fix them at the same time.

If not, maybe we have to keep separate implementations for dtb and ACPI here.
Anyway, thanks for pointing this out. I will look into the latest Linux
implementation.
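
To make the concern concrete, here is a minimal, self-contained sketch (not
Xen code) of why one start/end pair per node cannot describe a node with
non-contiguous memory, while a per-block table can. All names (add_block,
node_of_addr_by_block, spans_containing, build_example) and the 1 GiB layout
are assumptions made up for illustration; only the idea of a per-node span
kept next to a per-block table is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

struct range { paddr_t start, end; };   /* half-open: [start, end) */

#define MAX_BLKS  8
#define MAX_NODES 2

/* Per-block table, similar in spirit to node_memblk_range/memblk_nodeid. */
static struct range blk_range[MAX_BLKS];
static unsigned int blk_node[MAX_BLKS];
static size_t nr_blks;

/* Per-node span, similar in spirit to nodes[]: one start/end per node. */
static struct range node_span[MAX_NODES];
static bool node_seen[MAX_NODES];

static void add_block(unsigned int node, paddr_t start, paddr_t end)
{
    assert(node < MAX_NODES && nr_blks < MAX_BLKS);

    blk_range[nr_blks] = (struct range){ start, end };
    blk_node[nr_blks] = node;
    nr_blks++;

    if ( !node_seen[node] )
    {
        node_seen[node] = true;
        node_span[node] = (struct range){ start, end };
        return;
    }

    /* Widen the span; any hole between blocks is silently swallowed. */
    if ( start < node_span[node].start )
        node_span[node].start = start;
    if ( end > node_span[node].end )
        node_span[node].end = end;
}

/* Exact lookup through the block table; -1 if addr is in no block. */
static int node_of_addr_by_block(paddr_t addr)
{
    for ( size_t i = 0; i < nr_blks; i++ )
        if ( addr >= blk_range[i].start && addr < blk_range[i].end )
            return (int)blk_node[i];

    return -1;
}

/* Number of per-node spans that claim addr. */
static unsigned int spans_containing(paddr_t addr)
{
    unsigned int n = 0;

    for ( unsigned int node = 0; node < MAX_NODES; node++ )
        if ( node_seen[node] &&
             addr >= node_span[node].start && addr < node_span[node].end )
            n++;

    return n;
}

/* Node 0 owns [0, 1G) and [2G, 3G); node 1 owns [1G, 2G). */
static void build_example(void)
{
    nr_blks = 0;
    node_seen[0] = node_seen[1] = false;

    add_block(0, 0x00000000ULL, 0x40000000ULL);
    add_block(1, 0x40000000ULL, 0x80000000ULL);
    add_block(0, 0x80000000ULL, 0xC0000000ULL);
}
```

With this layout, an address inside node 1's block is also covered by node 0's
span, so anything that trusts the span alone sees the address claimed by both
nodes. The block table still resolves it unambiguously.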

Cheers,
Wei Chen



^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
  2021-08-28  3:56     ` Wei Chen
@ 2021-08-28 10:33       ` Julien Grall
  2021-08-28 13:58         ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-28 10:33 UTC (permalink / raw)
  To: Wei Chen, Stefano Stabellini; +Cc: xen-devel, Bertrand Marquis, Jan Beulich

Hi Wei,

On 28/08/2021 04:56, Wei Chen wrote:
>> -----Original Message-----
>> From: Stefano Stabellini <sstabellini@kernel.org>
>> Sent: 2021年8月28日 9:06
>> To: Wei Chen <Wei.Chen@arm.com>
>> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
>> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
>> device tree memory node
>>
>> On Wed, 11 Aug 2021, Wei Chen wrote:
>>> Memory blocks' NUMA ID information is stored in device tree's
>>> memory nodes as "numa-node-id". We need a new helper to parse
>>> and verify this ID from memory nodes.
>>>
>>> In order to support memory affinity in later use, the valid
>>> memory ranges and NUMA ID will be saved to tables.
>>>
>>> Signed-off-by: Wei Chen <wei.chen@arm.com>
>>> ---
>>>   xen/arch/arm/numa_device_tree.c | 130 ++++++++++++++++++++++++++++++++
>>>   1 file changed, 130 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/numa_device_tree.c
>> b/xen/arch/arm/numa_device_tree.c
>>> index 37cc56acf3..bbe081dcd1 100644
>>> --- a/xen/arch/arm/numa_device_tree.c
>>> +++ b/xen/arch/arm/numa_device_tree.c
>>> @@ -20,11 +20,13 @@
>>>   #include <xen/init.h>
>>>   #include <xen/nodemask.h>
>>>   #include <xen/numa.h>
>>> +#include <xen/libfdt/libfdt.h>
>>>   #include <xen/device_tree.h>
>>>   #include <asm/setup.h>
>>>
>>>   s8 device_tree_numa = 0;
>>>   static nodemask_t processor_nodes_parsed __initdata;
>>> +static nodemask_t memory_nodes_parsed __initdata;
>>>
>>>   static int srat_disabled(void)
>>>   {
>>> @@ -55,6 +57,79 @@ static int __init
>> dtb_numa_processor_affinity_init(nodeid_t node)
>>>       return 0;
>>>   }
>>>
>>> +/* Callback for parsing of the memory regions affinity */
>>> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
>>> +                                paddr_t start, paddr_t size)
>>> +{
>>> +    struct node *nd;
>>> +    paddr_t end;
>>> +    int i;
>>> +
>>> +    if ( srat_disabled() )
>>> +        return -EINVAL;
>>> +
>>> +    end = start + size;
>>> +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
>>> +    {
>>> +        dprintk(XENLOG_WARNING,
>>> +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
>>> +        bad_srat();
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    /* It is fine to add this area to the nodes data it will be used
>> later */
>>> +    i = conflicting_memblks(start, end);
>>> +    /* No conflicting memory block, we can save it for later usage */;
>>> +    if ( i < 0 )
>>> +        goto save_memblk;
>>> +
>>> +    if ( memblk_nodeid[i] == node ) {
>>> +        /*
>>> +         * Overlaps with other memblk in the same node, warning here.
>>> +         * This memblk will be merged with conflicted memblk later.
>>> +         */
>>> +        printk(XENLOG_WARNING
>>> +               "DT: NUMA NODE %u (%"PRIx64
>>> +               "-%"PRIx64") overlaps with itself (%"PRIx64"-
>> %"PRIx64")\n",
>>> +               node, start, end,
>>> +               node_memblk_range[i].start, node_memblk_range[i].end);
>>> +    } else {
>>> +        /*
>>> +         * Conflict with memblk in other node, this is an error.
>>> +         * The NUMA information is invalid, NUMA will be turn off.
>>> +         */
>>> +        printk(XENLOG_ERR
>>> +               "DT: NUMA NODE %u (%"PRIx64"-%"
>>> +               PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
>>> +               node, start, end, memblk_nodeid[i],
>>> +               node_memblk_range[i].start, node_memblk_range[i].end);
>>> +        bad_srat();
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +save_memblk:
>>> +    nd = &nodes[node];
>>> +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
>>> +        nd->start = start;
>>> +        nd->end = end;
>>> +    } else {
>>> +        if ( start < nd->start )
>>> +            nd->start = start;
>>> +        if ( nd->end < end )
>>> +            nd->end = end;
>>> +    }
>>> +
>>> +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
>>> +           node, start, end);
>>> +
>>> +    node_memblk_range[num_node_memblks].start = start;
>>> +    node_memblk_range[num_node_memblks].end = end;
>>> +    memblk_nodeid[num_node_memblks] = node;
>>> +    num_node_memblks++;
>>
>>
>> Is it possible to have non-contigous ranges of memory for a single NUMA
>> node?
>>
>> Looking at the DT bindings and Linux implementation, it seems possible.
>> Here, it seems that node_memblk_range/memblk_nodeid could handle it,
>> but nodes couldn't.
> 
> Yes, you're right. I copied this code for x86 ACPI NUMA. Does ACPI allow
> non-contiguous ranges of memory for a single NUMA node too? 

I couldn't find any restriction for ACPI, although I only briefly
looked at the spec.

> If yes, I think
> this will affect x86 ACPI NUMA too. In next version, we plan to merge
> dtb_numa_memory_affinity_init and acpi_numa_memory_affinity_init into a
> neutral function. So we can fix them at the same time.
> 
> If not, maybe we have to keep the diversity for dtb and ACPI here.

I am not entirely sure what you mean. Are you saying that if ACPI doesn't
allow non-contiguous ranges of memory, then we should keep the
implementations separate?

If so, then I disagree with that. It is fine to have code that supports 
more than what a firmware table supports. The main benefit is less code 
and therefore less long term maintenance (with the current solution we 
would need to check both the ACPI and DT implementation if there is a 
bug in one).
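
As a rough illustration of the "one neutral function, two thin front ends"
shape, here is a self-contained sketch. It is not the Xen implementation:
numa_add_memblk, dt_memory_affinity, acpi_memory_affinity and struct
fake_srat_mem are hypothetical names, and the table size is deliberately tiny.
The point is only that the overlap check lives in one place, so a bug there is
fixed once for both firmware interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

typedef uint64_t paddr_t;
typedef uint8_t nodeid_t;

#define NR_NODE_MEMBLKS 4   /* small on purpose for the sketch */

struct memblk { paddr_t start, end; nodeid_t node; };   /* [start, end) */

static struct memblk memblks[NR_NODE_MEMBLKS];
static unsigned int num_memblks;

/*
 * Firmware-neutral part (hypothetical): record [start, end) for a node.
 * Overlap with a block owned by a different node is rejected; blocks of
 * the same node may be non-contiguous.
 */
static int numa_add_memblk(nodeid_t node, paddr_t start, paddr_t end)
{
    if ( start >= end || num_memblks >= NR_NODE_MEMBLKS )
        return -EINVAL;

    for ( unsigned int i = 0; i < num_memblks; i++ )
        if ( start < memblks[i].end && memblks[i].start < end &&
             memblks[i].node != node )
            return -EINVAL;

    memblks[num_memblks++] = (struct memblk){ start, end, node };

    return 0;
}

/* DT front end: the memory node gives a (start, size) pair. */
static int dt_memory_affinity(nodeid_t node, paddr_t start, paddr_t size)
{
    return numa_add_memblk(node, start, start + size);
}

/* Stand-in for a firmware table entry; the layout is made up. */
struct fake_srat_mem {
    paddr_t base, length;
    uint32_t proximity_domain;
};

/* ACPI-like front end: translate the table entry, then share the logic. */
static int acpi_memory_affinity(const struct fake_srat_mem *ma)
{
    return numa_add_memblk((nodeid_t)ma->proximity_domain,
                           ma->base, ma->base + ma->length);
}
```

Either front end only converts its own input format; whether a given firmware
table can actually describe non-contiguous ranges does not change the shared
code.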

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init
  2021-08-28  3:17     ` Wei Chen
@ 2021-08-28 10:45       ` Julien Grall
  2021-08-28 14:02         ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Julien Grall @ 2021-08-28 10:45 UTC (permalink / raw)
  To: Wei Chen, xen-devel, sstabellini; +Cc: Bertrand Marquis



On 28/08/2021 04:17, Wei Chen wrote:
> Hi Julien,

Hi Wei,

> 
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 2021年8月27日 22:33
>> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
>> sstabellini@kernel.org; jbeulich@suse.com
>> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
>> Subject: Re: [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA
>> in system init
>>
>> Hi Wei,
>>
>> On 11/08/2021 11:24, Wei Chen wrote:
>>> Everything is ready, we can remove the fake NUMA node and
>>> depends on device tree to create NUMA system.
>>
>> So you just added code a few patches before that are now completely
>> rewritten. Can you please re-order this series so it doesn't happen?
>>
>> This may mean that CONFIG_NUMA is only selected until late in this series.
>>
> 
> Why I did like this is because my original concerns are:
> 1. When I introduced the CONFIG_NUMA. Users will be using a code base on
>     this commit by accident.
> 2. If users select CONFIG_NUMA, but not all NUMA data are not initialized
>     properly. The system may not work properly.

We have to make sure we don't break any existing use case when writing a
new feature. However, a user should not expect a new feature to work if
they are using a random commit in the middle of the series.

This is also why I suggested that maybe CONFIG_NUMA is only selected for 
Arm towards the end of the series. So you reduce the risk of someone 
selecting it.

> 3. So I created the fake node to initialize the NUMA data, before we parser
>     real data from DTB.
> 4. In this case, user can work well with CONFIG_NUMA is enabled, without
>     this series is completed.

The flip side is that you are adding more load on the reviewers because
there is extra code. The series is already quite big (40 patches), so any
way to ease the review will definitely be appreciated.

Another possible way to ease the review is to move the patches that
rework/move code to the beginning of the series and leave the Arm part
for the second part of the series. This way, we can start to merge the
series without waiting for the Arm bits to be completed.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
  2021-08-28 10:33       ` Julien Grall
@ 2021-08-28 13:58         ` Wei Chen
  2021-09-08  7:34           ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-28 13:58 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini; +Cc: xen-devel, Bertrand Marquis, Jan Beulich

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021年8月28日 18:34
> To: Wei Chen <Wei.Chen@arm.com>; Stefano Stabellini
> <sstabellini@kernel.org>
> Cc: xen-devel@lists.xenproject.org; Bertrand Marquis
> <Bertrand.Marquis@arm.com>; Jan Beulich <jbeulich@suse.com>
> Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> device tree memory node
> 
> Hi Wei,
> 
> On 28/08/2021 04:56, Wei Chen wrote:
> >> -----Original Message-----
> >> From: Stefano Stabellini <sstabellini@kernel.org>
> >> Sent: 2021年8月28日 9:06
> >> To: Wei Chen <Wei.Chen@arm.com>
> >> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org;
> julien@xen.org;
> >> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
> >> device tree memory node
> >>
> >> On Wed, 11 Aug 2021, Wei Chen wrote:
> >>> Memory blocks' NUMA ID information is stored in device tree's
> >>> memory nodes as "numa-node-id". We need a new helper to parse
> >>> and verify this ID from memory nodes.
> >>>
> >>> In order to support memory affinity in later use, the valid
> >>> memory ranges and NUMA ID will be saved to tables.
> >>>
> >>> Signed-off-by: Wei Chen <wei.chen@arm.com>
> >>> ---
> >>>   xen/arch/arm/numa_device_tree.c | 130
> ++++++++++++++++++++++++++++++++
> >>>   1 file changed, 130 insertions(+)
> >>>
> >>> diff --git a/xen/arch/arm/numa_device_tree.c
> >> b/xen/arch/arm/numa_device_tree.c
> >>> index 37cc56acf3..bbe081dcd1 100644
> >>> --- a/xen/arch/arm/numa_device_tree.c
> >>> +++ b/xen/arch/arm/numa_device_tree.c
> >>> @@ -20,11 +20,13 @@
> >>>   #include <xen/init.h>
> >>>   #include <xen/nodemask.h>
> >>>   #include <xen/numa.h>
> >>> +#include <xen/libfdt/libfdt.h>
> >>>   #include <xen/device_tree.h>
> >>>   #include <asm/setup.h>
> >>>
> >>>   s8 device_tree_numa = 0;
> >>>   static nodemask_t processor_nodes_parsed __initdata;
> >>> +static nodemask_t memory_nodes_parsed __initdata;
> >>>
> >>>   static int srat_disabled(void)
> >>>   {
> >>> @@ -55,6 +57,79 @@ static int __init
> >> dtb_numa_processor_affinity_init(nodeid_t node)
> >>>       return 0;
> >>>   }
> >>>
> >>> +/* Callback for parsing of the memory regions affinity */
> >>> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
> >>> +                                paddr_t start, paddr_t size)
> >>> +{
> >>> +    struct node *nd;
> >>> +    paddr_t end;
> >>> +    int i;
> >>> +
> >>> +    if ( srat_disabled() )
> >>> +        return -EINVAL;
> >>> +
> >>> +    end = start + size;
> >>> +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
> >>> +    {
> >>> +        dprintk(XENLOG_WARNING,
> >>> +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
> >>> +        bad_srat();
> >>> +        return -EINVAL;
> >>> +    }
> >>> +
> >>> +    /* It is fine to add this area to the nodes data it will be used
> >> later */
> >>> +    i = conflicting_memblks(start, end);
> >>> +    /* No conflicting memory block, we can save it for later usage */;
> >>> +    if ( i < 0 )
> >>> +        goto save_memblk;
> >>> +
> >>> +    if ( memblk_nodeid[i] == node ) {
> >>> +        /*
> >>> +         * Overlaps with other memblk in the same node, warning here.
> >>> +         * This memblk will be merged with conflicted memblk later.
> >>> +         */
> >>> +        printk(XENLOG_WARNING
> >>> +               "DT: NUMA NODE %u (%"PRIx64
> >>> +               "-%"PRIx64") overlaps with itself (%"PRIx64"-
> >> %"PRIx64")\n",
> >>> +               node, start, end,
> >>> +               node_memblk_range[i].start, node_memblk_range[i].end);
> >>> +    } else {
> >>> +        /*
> >>> +         * Conflict with memblk in other node, this is an error.
> >>> +         * The NUMA information is invalid, NUMA will be turn off.
> >>> +         */
> >>> +        printk(XENLOG_ERR
> >>> +               "DT: NUMA NODE %u (%"PRIx64"-%"
> >>> +               PRIx64") overlaps with NODE %u (%"PRIx64"-
> %"PRIx64")\n",
> >>> +               node, start, end, memblk_nodeid[i],
> >>> +               node_memblk_range[i].start, node_memblk_range[i].end);
> >>> +        bad_srat();
> >>> +        return -EINVAL;
> >>> +    }
> >>> +
> >>> +save_memblk:
> >>> +    nd = &nodes[node];
> >>> +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
> >>> +        nd->start = start;
> >>> +        nd->end = end;
> >>> +    } else {
> >>> +        if ( start < nd->start )
> >>> +            nd->start = start;
> >>> +        if ( nd->end < end )
> >>> +            nd->end = end;
> >>> +    }
> >>> +
> >>> +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
> >>> +           node, start, end);
> >>> +
> >>> +    node_memblk_range[num_node_memblks].start = start;
> >>> +    node_memblk_range[num_node_memblks].end = end;
> >>> +    memblk_nodeid[num_node_memblks] = node;
> >>> +    num_node_memblks++;
> >>
> >>
> >> Is it possible to have non-contigous ranges of memory for a single NUMA
> >> node?
> >>
> >> Looking at the DT bindings and Linux implementation, it seems possible.
> >> Here, it seems that node_memblk_range/memblk_nodeid could handle it,
> >> but nodes couldn't.
> >
> > Yes, you're right. I copied this code for x86 ACPI NUMA. Does ACPI allow
> > non-contiguous ranges of memory for a single NUMA node too?
> 
> I couldn't find any restriction for ACPI. Although, I only briefly
> looked at the spec.
> 
> > If yes, I think
> > this will affect x86 ACPI NUMA too. In next version, we plan to merge
> > dtb_numa_memory_affinity_init and acpi_numa_memory_affinity_init into a
> > neutral function. So we can fix them at the same time.
> >
> > If not, maybe we have to keep the diversity for dtb and ACPI here.
> 
> I am not entirely sure what you mean. Are you saying if ACPI doesn't
> allow non-contiguous ranges of memory, then we should keep the
> implementation separated?
> 
> If so, then I disagree with that. It is fine to have code that supports
> more than what a firmware table supports. The main benefit is less code
> and therefore less long term maintenance (with the current solution we
> would need to check both the ACPI and DT implementation if there is a
> bug in one).
> 

Yes, I agree.

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init
  2021-08-28 10:45       ` Julien Grall
@ 2021-08-28 14:02         ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-28 14:02 UTC (permalink / raw)
  To: Julien Grall, xen-devel, sstabellini; +Cc: Bertrand Marquis

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2021年8月28日 18:45
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> sstabellini@kernel.org
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA
> in system init
> 
> 
> 
> On 28/08/2021 04:17, Wei Chen wrote:
> > Hi Julien,
> 
> Hi Wei,
> 
> >
> >> -----Original Message-----
> >> From: Julien Grall <julien@xen.org>
> >> Sent: 2021年8月27日 22:33
> >> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org;
> >> sstabellini@kernel.org; jbeulich@suse.com
> >> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>
> >> Subject: Re: [XEN RFC PATCH 38/40] xen/arm: enable device tree based
> NUMA
> >> in system init
> >>
> >> Hi Wei,
> >>
> >> On 11/08/2021 11:24, Wei Chen wrote:
> >>> Everything is ready, we can remove the fake NUMA node and
> >>> depends on device tree to create NUMA system.
> >>
> >> So you just added code a few patches before that are now completely
> >> rewritten. Can you please re-order this series so it doesn't happen?
> >>
> >> This may mean that CONFIG_NUMA is only selected until late in this
> series.
> >>
> >
> > Why I did like this is because my original concerns are:
> > 1. When I introduced the CONFIG_NUMA. Users will be using a code base on
> >     this commit by accident.
> > 2. If users select CONFIG_NUMA, but not all NUMA data are not
> initialized
> >     properly. The system may not work properly.
> 
> We have to make sure we don't break any existing use case when writing a
> new feature. However, a user should not expect a new feature to work it
> is using a random commit in the middle of the series.
> 
> This is also why I suggested that maybe CONFIG_NUMA is only selected for
> Arm towards the end of the series. So you reduce the risk of someone
> selecting it.
> 

Thanks for this clarification.

> > 3. So I created the fake node to initialize the NUMA data, before we
> parser
> >     real data from DTB.
> > 4. In this case, user can work well with CONFIG_NUMA is enabled, without
> >     this series is completed.
> 
> The flip side is you are adding more load on the reviewers because there
> are extra code. The series is already quite big (40 patches), any way to
> ease the review will definitely be appreciated.
> 
> Another possible way to ease the review is to move the patch that
> rework/move code at the beginning of the series and leave the Arm part
> for the second part of the series. This way, we can start to merge the
> series without waiting for the Arm bits to be completed.
> 

Yes, I will try to re-order the patches in this way in the next version.

> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 196+ messages in thread

* Re: [XEN RFC PATCH 37/40] xen: introduce an arch helper to do NUMA init failed fallback
  2021-08-28  3:45       ` Wei Chen
@ 2021-08-30  9:52         ` Jan Beulich
  2021-08-30 10:38           ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Jan Beulich @ 2021-08-30  9:52 UTC (permalink / raw)
  To: Wei Chen; +Cc: Bertrand Marquis, Julien Grall, xen-devel, sstabellini

On 28.08.2021 05:45, Wei Chen wrote:
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Wei
>> Chen
>> Sent: 2021年8月28日 11:09
>>
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 2021年8月27日 22:30
>>>
>>>> --- a/xen/arch/arm/numa.c
>>>> +++ b/xen/arch/arm/numa.c
>>>> @@ -140,3 +140,16 @@ int __init arch_meminfo_get_ram_bank_range(int
>> bank,
>>>>
>>>>   	return 0;
>>>>   }
>>>> +
>>>> +void __init arch_numa_init_failed_fallback(void)
>>>> +{
>>>> +    int i, j;
>>>> +
>>>> +    /* Reset all node distance to remote_distance */
>>>> +    for ( i = 0; i < MAX_NUMNODES; i++ ) {
>>>> +        for ( j = 0; j < MAX_NUMNODES; j++ ) {
>>>> +            numa_set_distance(i, j,
>>>> +                (i == j) ? NUMA_LOCAL_DISTANCE :
>> NUMA_REMOTE_DISTANCE);
>>>> +        }
>>>> +    }
>>>> +}
>>>
>>> ... this implementation looks fairly generic. So can you explain why we
>>> need it on Arm but not x86?
>>>
>>
>> This implementation is DT only, for x86, it's using acpi_slit.
>> For now, I am not quit sure ACPI need to do fallback or not.
>> Or say in another way, I don't know how to implement the fallback
>> for ACPI. I planned to solve it in Arm ACPI version NUMA, so I left
>> an empty helper for x86.
>>
>> @Jan Beulich Could you give me some suggestion about x86 fallback?
>>
>>
> 
> I have a quick look into Linux. When Arch do numa init failed,
> the numa_free_distance will be invoked to revert numa_distance.

Does this matter in the first place? Don't we fall back to single
node mode, in which case the sole entry of the distance table
will say "local" anyway?

Jan



^ permalink raw reply	[flat|nested] 196+ messages in thread

* RE: [XEN RFC PATCH 37/40] xen: introduce an arch helper to do NUMA init failed fallback
  2021-08-30  9:52         ` Jan Beulich
@ 2021-08-30 10:38           ` Wei Chen
  0 siblings, 0 replies; 196+ messages in thread
From: Wei Chen @ 2021-08-30 10:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Bertrand Marquis, Julien Grall, xen-devel, sstabellini

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2021年8月30日 17:52
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: Bertrand Marquis <Bertrand.Marquis@arm.com>; Julien Grall
> <julien@xen.org>; xen-devel@lists.xenproject.org; sstabellini@kernel.org
> Subject: Re: [XEN RFC PATCH 37/40] xen: introduce an arch helper to do
> NUMA init failed fallback
> 
> On 28.08.2021 05:45, Wei Chen wrote:
> >> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
> Wei
> >> Chen
> >> Sent: 2021年8月28日 11:09
> >>
> >>> From: Julien Grall <julien@xen.org>
> >>> Sent: 2021年8月27日 22:30
> >>>
> >>>> --- a/xen/arch/arm/numa.c
> >>>> +++ b/xen/arch/arm/numa.c
> >>>> @@ -140,3 +140,16 @@ int __init arch_meminfo_get_ram_bank_range(int
> >> bank,
> >>>>
> >>>>   	return 0;
> >>>>   }
> >>>> +
> >>>> +void __init arch_numa_init_failed_fallback(void)
> >>>> +{
> >>>> +    int i, j;
> >>>> +
> >>>> +    /* Reset all node distance to remote_distance */
> >>>> +    for ( i = 0; i < MAX_NUMNODES; i++ ) {
> >>>> +        for ( j = 0; j < MAX_NUMNODES; j++ ) {
> >>>> +            numa_set_distance(i, j,
> >>>> +                (i == j) ? NUMA_LOCAL_DISTANCE :
> >> NUMA_REMOTE_DISTANCE);
> >>>> +        }
> >>>> +    }
> >>>> +}
> >>>
> >>> ... this implementation looks fairly generic. So can you explain why
> we
> >>> need it on Arm but not x86?
> >>>
> >>
> >> This implementation is DT only, for x86, it's using acpi_slit.
> >> For now, I am not quit sure ACPI need to do fallback or not.
> >> Or say in another way, I don't know how to implement the fallback
> >> for ACPI. I planned to solve it in Arm ACPI version NUMA, so I left
> >> an empty helper for x86.
> >>
> >> @Jan Beulich Could you give me some suggestion about x86 fallback?
> >>
> >>
> >
> > I have a quick look into Linux. When Arch do numa init failed,
> > the numa_free_distance will be invoked to revert numa_distance.
> 
> Does this matter in the first place? Don't we fall back to single
> node mode, in which case the sole entry of the distance table
> will say "local" anyway?
> 

Thank you for offering another way of looking at this. Yes, once NUMA init
has failed, the system will fall back to single node mode. If we call
__node_distance normally, we will never pass two different nodes to
this function. Even if we don't revert the values in the distance table,
we will not trigger the "node_a != node_b" condition, so we will
always get "local" from __node_distance.

But for the sake of completeness, I tend to revert the data when
initialization has failed.

Cheers,
Wei Chen

> Jan



* Re: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map
  2021-08-11 10:24 ` [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map Wei Chen
  2021-08-25 13:56   ` Julien Grall
@ 2021-08-31  0:48   ` Stefano Stabellini
  2021-08-31 10:17     ` Wei Chen
  1 sibling, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-31  0:48 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> A NUMA-aware device tree will provide a "distance-map" node to
> describe the distance between any two nodes. This patch introduces a
> new helper to parse this distance map.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/arm/numa_device_tree.c | 67 +++++++++++++++++++++++++++++++++
>  1 file changed, 67 insertions(+)
> 
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> index bbe081dcd1..6e0d1d3d9f 100644
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -200,3 +200,70 @@ device_tree_parse_numa_memory_node(const void *fdt, int node,
>  
>      return 0;
>  }
> +
> +/* Parse NUMA distance map v1 */
> +int __init
> +device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
> +{
> +    const struct fdt_property *prop;
> +    const __be32 *matrix;
> +    int entry_count, len, i;
> +
> +    printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
> +
> +    prop = fdt_get_property(fdt, node, "distance-matrix", &len);
> +    if ( !prop )
> +    {
> +        printk(XENLOG_WARNING
> +               "NUMA: No distance-matrix property in distance-map\n");
> +
> +        return -EINVAL;
> +    }
> +
> +    if ( len % sizeof(uint32_t) != 0 )
> +    {
> +        printk(XENLOG_WARNING
> +               "distance-matrix in node is not a multiple of u32\n");
> +        return -EINVAL;
> +    }
> +
> +    entry_count = len / sizeof(uint32_t);
> +    if ( entry_count <= 0 )
> +    {
> +        printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
> +
> +        return -EINVAL;
> +    }
> +
> +    matrix = (const __be32 *)prop->data;
> +    for ( i = 0; i + 2 < entry_count; i += 3 )
> +    {
> +        uint32_t from, to, distance;
> +
> +        from = dt_read_number(matrix, 1);
> +        matrix++;
> +        to = dt_read_number(matrix, 1);
> +        matrix++;
> +        distance = dt_read_number(matrix, 1);
> +        matrix++;
> +
> +        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> +            (from != to && distance <= NUMA_LOCAL_DISTANCE) )
> +        {
> +            printk(XENLOG_WARNING
> +                   "Invalid nodes' distance from node#%d to node#%d = %d\n",
> +                   from, to, distance);
> +            return -EINVAL;
> +        }
> +
> +        printk(XENLOG_INFO "NUMA: distance from node#%d to node#%d = %d\n",
> +               from, to, distance);
> +        numa_set_distance(from, to, distance);
> +
> +        /* Set default distance of node B->A same as A->B */
> +        if (to > from)
> +             numa_set_distance(to, from, distance);

I am a bit unsure about these last 2 lines: why call numa_set_distance
in the opposite direction only when to > from? Wouldn't it be OK to
always do both:

numa_set_distance(from, to, distance);
numa_set_distance(to, from, distance);

?


But in any case, I have a different suggestion. The binding states that
"distances are equal in either direction". Unfortunately, it also has an
example (at the end of the document) where only one direction is
expressed.

So my suggestion is to parse it as follows:

- call numa_set_distance just once from
  device_tree_parse_numa_distance_map_v1

- in numa_set_distance:
    - set node_distance_map[from][to] = distance;
    - check node_distance_map[to][from]
          - if unset, node_distance_map[to][from] = distance;
          - if already set to the same value, return success;
          - if already set to a different value, return error;
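
To make the steps above concrete, here is a minimal sketch, not the
code from the series. It assumes numa_set_distance returns an int (the
patch declares it void), and MAX_NUMNODES = 4, DISTANCE_UNSET and the
bounds check are all hypothetical choices made only for illustration:

```c
#include <errno.h>
#include <stdint.h>

#define MAX_NUMNODES   4   /* illustrative size only */
#define DISTANCE_UNSET 0   /* hypothetical "not set yet" marker */

static uint32_t node_distance_map[MAX_NUMNODES][MAX_NUMNODES];

/*
 * Record from->to, then mirror it into to->from unless the reverse
 * entry was already set to a conflicting value.
 */
static int numa_set_distance(unsigned int from, unsigned int to,
                             uint32_t distance)
{
    if ( from >= MAX_NUMNODES || to >= MAX_NUMNODES )
        return -EINVAL;

    node_distance_map[from][to] = distance;

    if ( node_distance_map[to][from] == DISTANCE_UNSET )
        node_distance_map[to][from] = distance;   /* unset: mirror it */
    else if ( node_distance_map[to][from] != distance )
        return -EINVAL;                           /* asymmetric map */

    return 0;                                     /* same value: fine */
}
```

With this shape the parser would only need one call per entry, and an
asymmetric distance-matrix would be reported as an error instead of
being silently overwritten.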



* Re: [XEN RFC PATCH 25/40] xen/arm: unified entry to parse all NUMA data from device tree
  2021-08-11 10:24 ` [XEN RFC PATCH 25/40] xen/arm: unified entry to parse all NUMA data from device tree Wei Chen
@ 2021-08-31  0:54   ` Stefano Stabellini
  2021-08-31 17:47     ` Julien Grall
  0 siblings, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-31  0:54 UTC (permalink / raw)
  To: Wei Chen, julien; +Cc: xen-devel, sstabellini, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> In this API, we scan the whole device tree to parse the CPU node id,
> memory node id and distance-map. early_scan_node already invokes a
> handler to process memory nodes, but if we wanted to parse the memory
> node id in that handler, we would have to embed NUMA parsing code in it.
> We would still need to scan the whole device tree to find the CPU NUMA
> id and the distance-map. So we include memory NUMA id parsing in this
> API too. Another benefit is that we have a single entry point for
> parsing device tree NUMA data.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/arm/numa_device_tree.c | 31 ++++++++++++++++++++++++++++---
>  xen/include/asm-arm/numa.h      |  1 +
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
> index 6e0d1d3d9f..27ffb72f7b 100644
> --- a/xen/arch/arm/numa_device_tree.c
> +++ b/xen/arch/arm/numa_device_tree.c
> @@ -131,7 +131,8 @@ save_memblk:
>  }
>  
>  /* Parse CPU NUMA node info */
> -int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
> +static int __init
> +device_tree_parse_numa_cpu_node(const void *fdt, int node)
>  {
>      uint32_t nid;
>  
> @@ -147,7 +148,7 @@ int __init device_tree_parse_numa_cpu_node(const void *fdt, int node)
>  }
>  
>  /* Parse memory node NUMA info */
> -int __init
> +static int __init
>  device_tree_parse_numa_memory_node(const void *fdt, int node,
>      const char *name, uint32_t addr_cells, uint32_t size_cells)
>  {
> @@ -202,7 +203,7 @@ device_tree_parse_numa_memory_node(const void *fdt, int node,
>  }
>  
>  /* Parse NUMA distance map v1 */
> -int __init
> +static int __init
>  device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
>  {
>      const struct fdt_property *prop;
> @@ -267,3 +268,27 @@ device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
>  
>      return 0;
>  }
> +
> +static int __init fdt_scan_numa_nodes(const void *fdt,
> +                int node, const char *uname, int depth,
> +                u32 address_cells, u32 size_cells, void *data)
> +{
> +    int ret = 0;
> +
> +    if ( fdt_node_check_type(fdt, node, "cpu") == 0 )
> +        ret = device_tree_parse_numa_cpu_node(fdt, node);
> +    else if ( fdt_node_check_type(fdt, node, "memory") == 0 )
> +        ret = device_tree_parse_numa_memory_node(fdt, node, uname,
> +                                address_cells, size_cells);
> +    else if ( fdt_node_check_compatible(fdt, node,
> +                                "numa-distance-map-v1") == 0 )
> +        ret = device_tree_parse_numa_distance_map_v1(fdt, node);
> +
> +    return ret;
> +}

Julien, do you have an opinion on whether it might be worth reusing the
existing early_scan_node function for this, to avoid another full FDT
scan (i.e. another call to device_tree_for_each_node)?



* Re: [XEN RFC PATCH 28/40] xen/x86: decouple nodes_cover_memory with E820 map
  2021-08-11 10:24 ` [XEN RFC PATCH 28/40] xen/x86: decouple nodes_cover_memory with E820 map Wei Chen
@ 2021-08-31  1:07   ` Stefano Stabellini
  2021-08-31 10:19     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-31  1:07 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> We will reuse nodes_cover_memory for Arm to check its bootmem
> info. So we introduce two arch helpers to get the number of entries
> in the memory map and the range of a specified entry:
>     arch_meminfo_get_nr_bank
>     arch_meminfo_get_ram_bank_range
> 
> With these two helpers, nodes_cover_memory becomes
> architecture independent.

You might want to note that the only change from an x86 perspective is
the additional checks:

  !start || !end


> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/x86/numa.c    | 18 ++++++++++++++++++
>  xen/arch/x86/srat.c    |  8 +++-----
>  xen/include/xen/numa.h |  4 ++++
>  3 files changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> index 6908738305..8b43be4aa7 100644
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -128,6 +128,24 @@ unsigned int __init arch_get_dma_bitsize(void)
>                   + PAGE_SHIFT, 32);
>  }
>  
> +uint32_t __init arch_meminfo_get_nr_bank(void)
> +{
> +	return e820.nr_map;
> +}
> +
> +int __init arch_meminfo_get_ram_bank_range(int bank,
> +	unsigned long long *start, unsigned long long *end)
> +{
> +	if (e820.map[bank].type != E820_RAM || !start || !end) {
> +		return -1;
> +	}
> +
> +	*start = e820.map[bank].addr;
> +	*end = e820.map[bank].addr + e820.map[bank].size;
> +
> +	return 0;
> +}
> +
>  static void dump_numa(unsigned char key)
>  {
>      s_time_t now = NOW();
> diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> index 6d68b8a614..2298353846 100644
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -316,18 +316,16 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
>  static int __init nodes_cover_memory(void)
>  {
>  	int i;
> +	uint32_t nr_banks = arch_meminfo_get_nr_bank();
>  
> -	for (i = 0; i < e820.nr_map; i++) {
> +	for (i = 0; i < nr_banks; i++) {
>  		int j, found;
>  		unsigned long long start, end;
>  
> -		if (e820.map[i].type != E820_RAM) {
> +		if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
>  			continue;
>  		}
>  
> -		start = e820.map[i].addr;
> -		end = e820.map[i].addr + e820.map[i].size;
> -
>  		do {
>  			found = 0;
>  			for_each_node_mask(j, memory_nodes_parsed)
> diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> index 0475823b13..6d18059bcd 100644
> --- a/xen/include/xen/numa.h
> +++ b/xen/include/xen/numa.h
> @@ -89,6 +89,10 @@ static inline void clear_node_cpumask(int cpu)
>  	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
>  }
>  
> +extern uint32_t arch_meminfo_get_nr_bank(void);
> +extern int arch_meminfo_get_ram_bank_range(int bank,
> +    unsigned long long *start, unsigned long long *end);
> +
>  #endif /* CONFIG_NUMA */
>  
>  #endif /* _XEN_NUMA_H */
> -- 
> 2.25.1
> 



* Re: [XEN RFC PATCH 31/40] xen/x86: move nodes_cover_memory to common
  2021-08-11 10:24 ` [XEN RFC PATCH 31/40] xen/x86: move nodes_cover_memory " Wei Chen
@ 2021-08-31  1:16   ` Stefano Stabellini
  2021-08-31 13:43     ` Wei Chen
  0 siblings, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-31  1:16 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> Not only ACPI NUMA, but also Arm device tree based NUMA
> will use nodes_cover_memory as a sanity check. So we move
> this function from arch/x86 to common.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/x86/srat.c    | 40 ----------------------------------------
>  xen/common/numa.c      | 40 ++++++++++++++++++++++++++++++++++++++++
>  xen/include/xen/numa.h |  1 +
>  3 files changed, 41 insertions(+), 40 deletions(-)
> 
> diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> index dd3aa30843..dcebc7adec 100644
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -308,46 +308,6 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
>  	num_node_memblks++;
>  }
>  
> -/* Sanity check to catch more bad SRATs (they are amazingly common).
> -   Make sure the PXMs cover all memory. */
> -static int __init nodes_cover_memory(void)
> -{
> -	int i;
> -	uint32_t nr_banks = arch_meminfo_get_nr_bank();
> -
> -	for (i = 0; i < nr_banks; i++) {
> -		int j, found;
> -		unsigned long long start, end;
> -
> -		if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
> -			continue;
> -		}
> -
> -		do {
> -			found = 0;
> -			for_each_node_mask(j, memory_nodes_parsed)
> -				if (start < nodes[j].end
> -				    && end > nodes[j].start) {
> -					if (start >= nodes[j].start) {
> -						start = nodes[j].end;
> -						found = 1;
> -					}
> -					if (end <= nodes[j].end) {
> -						end = nodes[j].start;
> -						found = 1;
> -					}
> -				}
> -		} while (found && start < end);
> -
> -		if (start < end) {
> -			printk(KERN_ERR "SRAT: No PXM for e820 range: "
> -				"%016Lx - %016Lx\n", start, end);
> -			return 0;
> -		}
> -	}
> -	return 1;
> -}
> -
>  void __init acpi_numa_arch_fixup(void) {}
>  
>  static uint64_t __initdata srat_region_mask;
> diff --git a/xen/common/numa.c b/xen/common/numa.c
> index 79ab250543..74960885a6 100644
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -193,6 +193,46 @@ void __init cutoff_node(int i, u64 start, u64 end)
>  	}
>  }
>  
> +/* Sanity check to catch more bad SRATs (they are amazingly common).
> +   Make sure the PXMs cover all memory. */
> +int __init nodes_cover_memory(void)
> +{
> +	int i;
> +	uint32_t nr_banks = arch_meminfo_get_nr_bank();
> +
> +	for (i = 0; i < nr_banks; i++) {
> +		int j, found;
> +		unsigned long long start, end;
> +
> +		if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
> +			continue;
> +		}
> +
> +		do {
> +			found = 0;
> +			for_each_node_mask(j, memory_nodes_parsed)
> +				if (start < nodes[j].end
> +				    && end > nodes[j].start) {
> +					if (start >= nodes[j].start) {
> +						start = nodes[j].end;
> +						found = 1;
> +					}
> +					if (end <= nodes[j].end) {
> +						end = nodes[j].start;
> +						found = 1;
> +					}
> +				}
> +		} while (found && start < end);
> +
> +		if (start < end) {
> +			printk(KERN_ERR "SRAT: No PXM for e820 range: "
> +				"%016Lx - %016Lx\n", start, end);

I don't know if you are already doing this in a later patch, but the
message shouldn't say e820 as it doesn't exist on all architectures.
Maybe "for address range" or "for memory range" would suffice.

Normally we don't make changes together with code movement, but in this
case I think it would be OK.



* Re: [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to common
  2021-08-11 10:24 ` [XEN RFC PATCH 34/40] xen: move numa_scan_nodes from x86 to common Wei Chen
  2021-08-27 14:14   ` Julien Grall
@ 2021-08-31  1:26   ` Stefano Stabellini
  2021-08-31 13:43     ` Wei Chen
  1 sibling, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-31  1:26 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> After the preparation done in the previous patches, numa_scan_nodes
> can be used by both Arm and x86. So we move this function from x86 to
> common. As nodes_cover_memory will not be used across files, we restore
> its static attribute in this patch.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/x86/srat.c        | 52 ------------------------------------
>  xen/common/numa.c          | 54 +++++++++++++++++++++++++++++++++++++-
>  xen/include/asm-x86/acpi.h |  3 ---
>  xen/include/xen/numa.h     |  3 ++-
>  4 files changed, 55 insertions(+), 57 deletions(-)
> 
> diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> index c979939fdd..c9f019c307 100644
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -361,58 +361,6 @@ void __init srat_parse_regions(u64 addr)
>  	pfn_pdx_hole_setup(mask >> PAGE_SHIFT);
>  }
>  
> -/* Use the information discovered above to actually set up the nodes. */
> -int __init numa_scan_nodes(u64 start, u64 end)
> -{
> -	int i;
> -	nodemask_t all_nodes_parsed;
> -
> -	/* First clean up the node list */
> -	for (i = 0; i < MAX_NUMNODES; i++)
> -		cutoff_node(i, start, end);
> -
> -#ifdef CONFIG_ACPI_NUMA
> -	if (acpi_numa <= 0)
> -		return -1;
> -#endif
> -
> -	if (!nodes_cover_memory()) {
> -		bad_srat();
> -		return -1;
> -	}
> -
> -	memnode_shift = compute_hash_shift(node_memblk_range, num_node_memblks,
> -				memblk_nodeid);
> -
> -	if (memnode_shift < 0) {
> -		printk(KERN_ERR
> -		     "SRAT: No NUMA node hash function found. Contact maintainer\n");
> -		bad_srat();
> -		return -1;
> -	}
> -
> -	nodes_or(all_nodes_parsed, memory_nodes_parsed, processor_nodes_parsed);
> -
> -	/* Finally register nodes */
> -	for_each_node_mask(i, all_nodes_parsed)
> -	{
> -		u64 size = nodes[i].end - nodes[i].start;
> -		if ( size == 0 )
> -			printk(KERN_WARNING "SRAT: Node %u has no memory. "
> -			       "BIOS Bug or mis-configured hardware?\n", i);
> -
> -		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
> -	}
> -	for (i = 0; i < nr_cpu_ids; i++) {
> -		if (cpu_to_node[i] == NUMA_NO_NODE)
> -			continue;
> -		if (!nodemask_test(cpu_to_node[i], &processor_nodes_parsed))
> -			numa_set_node(i, NUMA_NO_NODE);
> -	}
> -	numa_init_array();
> -	return 0;
> -}
> -
>  static unsigned node_to_pxm(nodeid_t n)
>  {
>  	unsigned i;
> diff --git a/xen/common/numa.c b/xen/common/numa.c
> index 4152bbe83b..8ca13e27d1 100644
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -195,7 +195,7 @@ void __init cutoff_node(int i, u64 start, u64 end)
>  
>  /* Sanity check to catch more bad SRATs (they are amazingly common).
>     Make sure the PXMs cover all memory. */
> -int __init nodes_cover_memory(void)
> +static int __init nodes_cover_memory(void)
>  {
>  	int i;
>  	uint32_t nr_banks = arch_meminfo_get_nr_bank();
> @@ -271,6 +271,58 @@ void __init numa_init_array(void)
>      }
>  }
>  
> +/* Use the information discovered above to actually set up the nodes. */
> +int __init numa_scan_nodes(u64 start, u64 end)
> +{
> +	int i;
> +	nodemask_t all_nodes_parsed;
> +
> +	/* First clean up the node list */
> +	for (i = 0; i < MAX_NUMNODES; i++)
> +		cutoff_node(i, start, end);
> +
> +#ifdef CONFIG_ACPI_NUMA
> +	if (acpi_numa <= 0)
> +		return -1;
> +#endif
> +
> +	if (!nodes_cover_memory()) {
> +		bad_srat();
> +		return -1;
> +	}
> +
> +	memnode_shift = compute_hash_shift(node_memblk_range, num_node_memblks,
> +				memblk_nodeid);
> +
> +	if (memnode_shift < 0) {
> +		printk(KERN_ERR
> +		     "SRAT: No NUMA node hash function found. Contact maintainer\n");
> +		bad_srat();
> +		return -1;
> +	}
> +
> +	nodes_or(all_nodes_parsed, memory_nodes_parsed, processor_nodes_parsed);
> +
> +	/* Finally register nodes */
> +	for_each_node_mask(i, all_nodes_parsed)
> +	{
> +		u64 size = nodes[i].end - nodes[i].start;
> +		if ( size == 0 )
> +			printk(KERN_WARNING "SRAT: Node %u has no memory. "
> +			       "BIOS Bug or mis-configured hardware?\n", i);

Not all archs have a BIOS so I'd say "firmware bug". Like last time, we
usually don't do code changes together with code movement, but in this
case it might be OK. I am also happy with a separate patch to adjust the
two comments (this one and the one in the previous patch).



* Re: [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init
  2021-08-11 10:24 ` [XEN RFC PATCH 38/40] xen/arm: enable device tree based NUMA in system init Wei Chen
  2021-08-27 14:32   ` Julien Grall
@ 2021-08-31  1:50   ` Stefano Stabellini
  2021-08-31 13:43     ` Wei Chen
  1 sibling, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-31  1:50 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> Everything is ready, so we can remove the fake NUMA node and
> depend on the device tree to create the NUMA system.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/arm/numa.c        | 45 ++++++++++++++++++++++----------------
>  xen/include/asm-arm/numa.h |  7 ------
>  2 files changed, 26 insertions(+), 26 deletions(-)
> 
> diff --git a/xen/arch/arm/numa.c b/xen/arch/arm/numa.c
> index 2a18c97470..3b04220e60 100644
> --- a/xen/arch/arm/numa.c
> +++ b/xen/arch/arm/numa.c
> @@ -18,6 +18,7 @@
>   *
>   */
>  #include <xen/init.h>
> +#include <xen/device_tree.h>
>  #include <xen/nodemask.h>
>  #include <xen/numa.h>
>  #include <xen/pfn.h>
> @@ -83,28 +84,34 @@ void __init numa_init(bool acpi_off)
>      paddr_t ram_size = 0;
>      paddr_t ram_end = 0;
>  
> -    printk(XENLOG_WARNING
> -        "NUMA has not been supported yet, NUMA off!\n");
> -    /* Arm NUMA has not been implemented until this patch */
> -    numa_off = true;
> +    /* NUMA has been turned off through Xen parameters */
> +    if ( numa_off )
> +        goto mem_init;
>  
> -    /*
> -     * Set all cpu_to_node mapping to 0, this will make cpu_to_node
> -     * function return 0 as previous fake cpu_to_node API.
> -     */
> -    for ( idx = 0; idx < NR_CPUS; idx++ )
> -        cpu_to_node[idx] = 0;
> -
> -    /*
> -     * Make node_to_cpumask, node_spanned_pages and node_start_pfn
> -     * return as previous fake APIs.
> -     */
> -    for ( idx = 0; idx < MAX_NUMNODES; idx++ ) {
> -        node_to_cpumask[idx] = cpu_online_map;
> -        node_spanned_pages(idx) = (max_page - mfn_x(first_valid_mfn));
> -        node_start_pfn(idx) = (mfn_x(first_valid_mfn));
> +    /* Initialize NUMA from device tree when system is not ACPI booted */
> +    if ( acpi_off )
> +    {
> +#ifdef CONFIG_DEVICE_TREE_NUMA
> +        int ret = numa_device_tree_init(device_tree_flattened);
> +        if ( !ret )
> +            goto mem_init;
> +        printk(XENLOG_WARNING
> +               "Init NUMA from device tree failed, ret=%d\n", ret);
> +#else
> +        printk(XENLOG_WARNING
> +               "CONFIG_DEVICE_TREE_NUMA is not set, NUMA off!\n");

I don't think we want to see this warning every time at boot when
CONFIG_DEVICE_TREE_NUMA is off. I'd set it to XENLOG_DEBUG or remove it.

Also, given that we have many stub functions in
xen/include/asm-arm/numa.h already, maybe we could also have a stub
function for numa_device_tree_init so that we won't need an #ifdef
CONFIG_DEVICE_TREE_NUMA here.
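
As a rough illustration of the stub idea, something along these lines
could work. This is only a sketch: the stub's return value (-EINVAL),
the numa_init_from_dt caller and the local numa_off variable are
assumptions made for the example, not code from the series:

```c
#include <errno.h>
#include <stdbool.h>

/* Define CONFIG_DEVICE_TREE_NUMA to model a build with DT NUMA enabled. */
#ifdef CONFIG_DEVICE_TREE_NUMA
int numa_device_tree_init(const void *fdt);
#else
/*
 * Stub for builds without device tree NUMA. Returning an error makes
 * the caller take its normal "init failed" path. The exact error code
 * is an assumption.
 */
static inline int numa_device_tree_init(const void *fdt)
{
    (void)fdt;
    return -EINVAL;
}
#endif

static bool numa_off;

/* Hypothetical caller: no #ifdef is needed at the call site. */
static void numa_init_from_dt(const void *fdt)
{
    if ( numa_device_tree_init(fdt) != 0 )
        numa_off = true;
}
```

The call site then reads the same whether or not the option is set,
and the config-specific part stays in the header.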


> +#endif
> +        numa_off = true;
> +    }
> +    else
> +    {
> +        /* We don't support NUMA for ACPI boot currently */
> +        printk(XENLOG_WARNING
> +               "ACPI NUMA has not been supported yet, NUMA off!\n");
> +        numa_off = true;
>      }
>  
> +mem_init:
>      /*
>       * Find the minimal and maximum address of RAM, NUMA will
>       * build a memory to node mapping table for the whole range.
> diff --git a/xen/include/asm-arm/numa.h b/xen/include/asm-arm/numa.h
> index a3982a94b6..425eb9aede 100644
> --- a/xen/include/asm-arm/numa.h
> +++ b/xen/include/asm-arm/numa.h
> @@ -30,13 +30,6 @@ extern int numa_device_tree_init(const void *fdt);
>  extern void numa_set_distance(nodeid_t from, nodeid_t to, uint32_t distance);
>  extern void arch_numa_init_failed_fallback(void);
>  
> -/*
> - * Temporary for fake NUMA node, when CPU, memory and distance
> - * matrix will be read from DTB or ACPI SRAT. The following
> - * symbols will be removed.
> - */
> -extern mfn_t first_valid_mfn;
> -
>  #else
>  
>  /* Fake one node for now. See also node_online_map. */
> -- 
> 2.25.1
> 



* Re: [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to support NUMA switch in command line
  2021-08-11 10:24 ` [XEN RFC PATCH 39/40] xen/x86: move numa_setup to common to support NUMA switch in command line Wei Chen
  2021-08-27 14:37   ` Julien Grall
@ 2021-08-31  1:53   ` Stefano Stabellini
  2021-08-31 13:44     ` Wei Chen
  1 sibling, 1 reply; 196+ messages in thread
From: Stefano Stabellini @ 2021-08-31  1:53 UTC (permalink / raw)
  To: Wei Chen; +Cc: xen-devel, sstabellini, julien, jbeulich, Bertrand.Marquis

On Wed, 11 Aug 2021, Wei Chen wrote:
> Xen x86 provides a command line parameter "numa" as a NUMA switch that
> lets the user turn NUMA on or off. As device tree based NUMA has been
> enabled for Arm, this parameter can be reused by Arm. So in this patch,
> we move this parameter to common.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> ---
>  xen/arch/x86/numa.c    | 34 ----------------------------------
>  xen/common/numa.c      | 35 ++++++++++++++++++++++++++++++++++-
>  xen/include/xen/numa.h |  1 -
>  3 files changed, 34 insertions(+), 36 deletions(-)
> 
> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> index 8b43be4aa7..380d8ed6fd 100644
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -11,7 +11,6 @@
>  #include <xen/nodemask.h>
>  #include <xen/numa.h>
>  #include <xen/keyhandler.h>
> -#include <xen/param.h>
>  #include <xen/time.h>
>  #include <xen/smp.h>
>  #include <xen/pfn.h>
> @@ -19,9 +18,6 @@
>  #include <xen/sched.h>
>  #include <xen/softirq.h>
>  
> -static int numa_setup(const char *s);
> -custom_param("numa", numa_setup);
> -
>  #ifndef Dprintk
>  #define Dprintk(x...)
>  #endif
> @@ -50,35 +46,6 @@ void numa_set_node(int cpu, nodeid_t node)
>      cpu_to_node[cpu] = node;
>  }
>  
> -/* [numa=off] */
> -static __init int numa_setup(const char *opt)
> -{
> -    if ( !strncmp(opt,"off",3) )
> -        numa_off = true;
> -    else if ( !strncmp(opt,"on",2) )
> -        numa_off = false;
> -#ifdef CONFIG_NUMA_EMU
> -    else if ( !strncmp(opt, "fake=", 5) )
> -    {
> -        numa_off = false;
> -        numa_fake = simple_strtoul(opt+5,NULL,0);
> -        if ( numa_fake >= MAX_NUMNODES )
> -            numa_fake = MAX_NUMNODES;
> -    }
> -#endif
> -#ifdef CONFIG_ACPI_NUMA
> -    else if ( !strncmp(opt,"noacpi",6) )
> -    {
> -        numa_off = false;
> -        acpi_numa = -1;
> -    }
> -#endif
> -    else
> -        return -EINVAL;
> -
> -    return 0;
> -} 
> -
>  /*
>   * Setup early cpu_to_node.
>   *
> @@ -287,4 +254,3 @@ static __init int register_numa_trigger(void)
>      return 0;
>  }
>  __initcall(register_numa_trigger);
> -

spurious change



* RE: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse device tree NUMA distance map
  2021-08-31  0:48   ` Stefano Stabellini
@ 2021-08-31 10:17     ` Wei Chen
  2021-08-31 21:36       ` Stefano Stabellini
  0 siblings, 1 reply; 196+ messages in thread
From: Wei Chen @ 2021-08-31 10:17 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, julien, Bertrand Marquis

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 31 August 2021 8:48
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 24/40] xen/arm: introduce a helper to parse
> device tree NUMA distance map
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > A NUMA-aware device tree will provide a "distance-map" node to
> > describe the distance between any two nodes. This patch introduces a
> > new helper to parse this distance map.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >  xen/arch/arm/numa_device_tree.c | 67 +++++++++++++++++++++++++++++++++
> >  1 file changed, 67 insertions(+)
> >
> > diff --git a/xen/arch/arm/numa_device_tree.c
> b/xen/arch/arm/numa_device_tree.c
> > index bbe081dcd1..6e0d1d3d9f 100644
> > --- a/xen/arch/arm/numa_device_tree.c
> > +++ b/xen/arch/arm/numa_device_tree.c
> > @@ -200,3 +200,70 @@ device_tree_parse_numa_memory_node(const void *fdt,
> int node,
> >
> >      return 0;
> >  }
> > +
> > +/* Parse NUMA distance map v1 */
> > +int __init
> > +device_tree_parse_numa_distance_map_v1(const void *fdt, int node)
> > +{
> > +    const struct fdt_property *prop;
> > +    const __be32 *matrix;
> > +    int entry_count, len, i;
> > +
> > +    printk(XENLOG_INFO "NUMA: parsing numa-distance-map\n");
> > +
> > +    prop = fdt_get_property(fdt, node, "distance-matrix", &len);
> > +    if ( !prop )
> > +    {
> > +        printk(XENLOG_WARNING
> > +               "NUMA: No distance-matrix property in distance-map\n");
> > +
> > +        return -EINVAL;
> > +    }
> > +
> > +    if ( len % sizeof(uint32_t) != 0 )
> > +    {
> > +        printk(XENLOG_WARNING
> > +               "distance-matrix in node is not a multiple of u32\n");
> > +        return -EINVAL;
> > +    }
> > +
> > +    entry_count = len / sizeof(uint32_t);
> > +    if ( entry_count <= 0 )
> > +    {
> > +        printk(XENLOG_WARNING "NUMA: Invalid distance-matrix\n");
> > +
> > +        return -EINVAL;
> > +    }
> > +
> > +    matrix = (const __be32 *)prop->data;
> > +    for ( i = 0; i + 2 < entry_count; i += 3 )
> > +    {
> > +        uint32_t from, to, distance;
> > +
> > +        from = dt_read_number(matrix, 1);
> > +        matrix++;
> > +        to = dt_read_number(matrix, 1);
> > +        matrix++;
> > +        distance = dt_read_number(matrix, 1);
> > +        matrix++;
> > +
> > +        if ( (from == to && distance != NUMA_LOCAL_DISTANCE) ||
> > +            (from != to && distance <= NUMA_LOCAL_DISTANCE) )
> > +        {
> > +            printk(XENLOG_WARNING
> > +                   "Invalid nodes' distance from node#%d to node#%d
> = %d\n",
> > +                   from, to, distance);
> > +            return -EINVAL;
> > +        }
> > +
> > +        printk(XENLOG_INFO "NUMA: distance from node#%d to node#%d
> = %d\n",
> > +               from, to, distance);
> > +        numa_set_distance(from, to, distance);
> > +
> > +        /* Set default distance of node B->A same as A->B */
> > +        if (to > from)
> > +             numa_set_distance(to, from, distance);
> 
> I am a bit unsure about this last 2 lines: why calling numa_set_distance
> in the opposite direction only when to > from? Wouldn't it be OK to
> always do both:
> 
> numa_set_distance(from, to, distance);
> numa_set_distance(to, from, distance);
> 
> ?
> 
I borrowed this code from Linux, but here is my understanding:

First, some notes from Documentation/devicetree/bindings/numa.txt:
1. Each entry represents the distance from the first node to the second
node. The distances are equal in either direction.
2. distance-matrix should have entries in lexicographical ascending
order of nodes.

Here is an example of a distance-map node in a DTB:
Sample#1, full list:
		distance-map {
			 compatible = "numa-distance-map-v1";
			 distance-matrix = <0 0  10>,
					   <0 1  20>,
					   <0 2  40>,
					   <0 3  20>,
					   <1 0  20>,
					   <1 1  10>,
					   <1 2  20>,
					   <1 3  40>,
					   <2 0  40>,
					   <2 1  20>,
					   <2 2  10>,
					   <2 3  20>,
					   <3 0  20>,
					   <3 1  40>,
					   <3 2  20>,
					   <3 3  10>;
		};

Calling numa_set_distance(to, from, ...) only when "to > from" prevents
Xen from calling numa_set_distance(0, 1, 20) again while it is setting
the distance for <1 0 20>. However, numa_set_distance(1, 0, 20) is still
called twice.

Normally, the distance-map node is optimized as in sample#2 below, where
all redundant entries are removed:
Sample#2, partial list:
		distance-map {
			 compatible = "numa-distance-map-v1";
			 distance-matrix = <0 0  10>,
					   <0 1  20>,
					   <0 2  40>,
					   <0 3  20>,
					   <1 1  10>,
					   <1 2  20>,
					   <1 3  40>,
					   <2 2  10>,
					   <2 3  20>,
					   <3 3  10>;
		};

There is no "from > to" entry in this map, but the partial map can still
set the distances for all pairs, and numa_set_distance(1, 0, 20) will be
called only once.
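
To make this concrete, here is a minimal, self-contained sketch of the
parsing loop applied to sample#2. It is not the Xen code: the table,
the call counter `set_calls`, `struct distance_entry` and
`parse_distance_matrix` are hypothetical stand-ins, and
`numa_set_distance` here is a simplified mock that writes one direction
only. It shows that, with the "to > from" check, every pair is set
exactly once.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 4

/* Hypothetical stand-ins for the distance table and a call counter. */
static uint8_t node_distance_map[MAX_NODES][MAX_NODES];
static unsigned int set_calls[MAX_NODES][MAX_NODES];

/* Mock setter: writes exactly one direction, nothing implicit. */
static void numa_set_distance(unsigned int from, unsigned int to,
                              uint8_t distance)
{
    node_distance_map[from][to] = distance;
    set_calls[from][to]++;
}

/* One <from to distance> triple of the distance-matrix property. */
struct distance_entry {
    unsigned int from, to;
    uint8_t distance;
};

static void parse_distance_matrix(const struct distance_entry *e, size_t n)
{
    for ( size_t i = 0; i < n; i++ )
    {
        numa_set_distance(e[i].from, e[i].to, e[i].distance);

        /* Set default distance of node B->A same as A->B */
        if ( e[i].to > e[i].from )
            numa_set_distance(e[i].to, e[i].from, e[i].distance);
    }
}

/* Sample#2: only entries with from <= to are present. */
static const struct distance_entry sample2[] = {
    { 0, 0, 10 }, { 0, 1, 20 }, { 0, 2, 40 }, { 0, 3, 20 },
    { 1, 1, 10 }, { 1, 2, 20 }, { 1, 3, 40 },
    { 2, 2, 10 }, { 2, 3, 20 },
    { 3, 3, 10 },
};
```

Feeding the full list of sample#1 through the same loop would also fill
the whole table, but the lower-triangle pairs would each be written
twice.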


> But in any case, I have a different suggestion. The binding states that
> "distances are equal in either direction". Also it has an example where
> only one direction is expressed unfortunately (at the end of the
> document).
> 

Oh, I should have read this comment first; then I would not have posted
the comment above : )

> So my suggestion is to parse it as follows:
> 
> - call numa_set_distance just once from
>   device_tree_parse_numa_distance_map_v1
> 
> - in numa_set_distance:
>     - set node_distance_map[from][to] = distance;
>     - check node_distance_map[to][from]
>           - if unset, node_distance_map[to][from] = distance;
>           - if already set to the same value, return success;
>           - if already set to a different value, return error;

I don't really like this implementation. I want numa_set_distance to
behave just as its name says, without any implicit operations. Otherwise,
unless users read the implementation of this function before using it,
they probably won't know that it does so many things.
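
For reference, the behaviour suggested above could be sketched roughly
as follows. This is only an illustration under stated assumptions: the
name `numa_set_distance_symmetric`, the `MAX_NODES` bound and the use of
0 as the "unset" marker are all hypothetical and not taken from the
patch.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_NODES      4
#define DISTANCE_UNSET 0   /* assumption: 0 is never a valid distance */

/* Hypothetical stand-in for the distance table. */
static uint8_t node_distance_map[MAX_NODES][MAX_NODES];

/*
 * Set from->to and, implicitly, keep to->from consistent:
 *  - if to->from is unset, mirror the distance into it;
 *  - if it already holds the same value, succeed;
 *  - if it holds a different value, report an error.
 */
static int numa_set_distance_symmetric(unsigned int from, unsigned int to,
                                       uint8_t distance)
{
    if ( from >= MAX_NODES || to >= MAX_NODES )
        return -EINVAL;

    node_distance_map[from][to] = distance;

    if ( node_distance_map[to][from] == DISTANCE_UNSET )
        node_distance_map[to][from] = distance;
    else if ( node_distance_map[to][from] != distance )
        return -EINVAL;

    return 0;
}
```

The reverse-direction write and the consistency check are the implicit
operations in question: a caller sees a single call that may modify two
table entries or fail because of an entry it did not name.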

Cheers,
Wei Chen



* RE: [XEN RFC PATCH 28/40] xen/x86: decouple nodes_cover_memory with E820 map
  2021-08-31  1:07   ` Stefano Stabellini
@ 2021-08-31 10:19     ` Wei Chen
From: Wei Chen @ 2021-08-31 10:19 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, julien, Bertrand Marquis

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 31 August 2021 9:08
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 28/40] xen/x86: decouple nodes_cover_memory
> with E820 map
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > We will reuse nodes_cover_memory for Arm to check its bootmem
> > info. So we introduce two arch helpers to get the number of
> > memory map entries and the range of a specified entry:
> >     arch_meminfo_get_nr_bank
> >     arch_meminfo_get_ram_bank_range
> >
> > Based on the above two helpers, we make nodes_cover_memory
> > architecture independent.
> 
> You might want to note that the only change from an x86 perspective is
> the additional checks:
> 
>   !start || !end
> 

Thanks, I will add it.

> 
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >  xen/arch/x86/numa.c    | 18 ++++++++++++++++++
> >  xen/arch/x86/srat.c    |  8 +++-----
> >  xen/include/xen/numa.h |  4 ++++
> >  3 files changed, 25 insertions(+), 5 deletions(-)
> >
> > diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> > index 6908738305..8b43be4aa7 100644
> > --- a/xen/arch/x86/numa.c
> > +++ b/xen/arch/x86/numa.c
> > @@ -128,6 +128,24 @@ unsigned int __init arch_get_dma_bitsize(void)
> >                   + PAGE_SHIFT, 32);
> >  }
> >
> > +uint32_t __init arch_meminfo_get_nr_bank(void)
> > +{
> > +	return e820.nr_map;
> > +}
> > +
> > +int __init arch_meminfo_get_ram_bank_range(int bank,
> > +	unsigned long long *start, unsigned long long *end)
> > +{
> > +	if (e820.map[bank].type != E820_RAM || !start || !end) {
> > +		return -1;
> > +	}
> > +
> > +	*start = e820.map[bank].addr;
> > +	*end = e820.map[bank].addr + e820.map[bank].size;
> > +
> > +	return 0;
> > +}
> > +
> >  static void dump_numa(unsigned char key)
> >  {
> >      s_time_t now = NOW();
> > diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> > index 6d68b8a614..2298353846 100644
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -316,18 +316,16 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
> >  static int __init nodes_cover_memory(void)
> >  {
> >  	int i;
> > +	uint32_t nr_banks = arch_meminfo_get_nr_bank();
> >
> > -	for (i = 0; i < e820.nr_map; i++) {
> > +	for (i = 0; i < nr_banks; i++) {
> >  		int j, found;
> >  		unsigned long long start, end;
> >
> > -		if (e820.map[i].type != E820_RAM) {
> > +		if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
> >  			continue;
> >  		}
> >
> > -		start = e820.map[i].addr;
> > -		end = e820.map[i].addr + e820.map[i].size;
> > -
> >  		do {
> >  			found = 0;
> >  			for_each_node_mask(j, memory_nodes_parsed)
> > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h
> > index 0475823b13..6d18059bcd 100644
> > --- a/xen/include/xen/numa.h
> > +++ b/xen/include/xen/numa.h
> > @@ -89,6 +89,10 @@ static inline void clear_node_cpumask(int cpu)
> >  	cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]);
> >  }
> >
> > +extern uint32_t arch_meminfo_get_nr_bank(void);
> > +extern int arch_meminfo_get_ram_bank_range(int bank,
> > +    unsigned long long *start, unsigned long long *end);
> > +
> >  #endif /* CONFIG_NUMA */
> >
> >  #endif /* _XEN_NUMA_H */
> > --
> > 2.25.1
> >
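
As an illustration of how another architecture could satisfy the same
two-helper contract, here is a minimal sketch backed by a plain bank
table instead of the E820 map. The `struct membank` layout, the
`mem_info` table and its contents are hypothetical stand-ins, not the
actual Arm bootinfo code; only the two helper prototypes come from the
patch above. The range check on `bank` is an extra guard added for the
sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a boot-time memory bank table. */
struct membank {
    uint64_t start;
    uint64_t size;
};

#define NR_MEM_BANKS 4

static struct {
    unsigned int nr_banks;
    struct membank bank[NR_MEM_BANKS];
} mem_info = {
    .nr_banks = 2,
    .bank = {
        { 0x080000000ULL, 0x40000000ULL },
        { 0x880000000ULL, 0x80000000ULL },
    },
};

uint32_t arch_meminfo_get_nr_bank(void)
{
    return mem_info.nr_banks;
}

/* Returns 0 and fills [*start, *end) on success, -1 otherwise. */
int arch_meminfo_get_ram_bank_range(int bank,
    unsigned long long *start, unsigned long long *end)
{
    if ( bank < 0 || (unsigned int)bank >= mem_info.nr_banks ||
         !start || !end )
        return -1;

    *start = mem_info.bank[bank].start;
    *end = mem_info.bank[bank].start + mem_info.bank[bank].size;

    return 0;
}
```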


* RE: [XEN RFC PATCH 31/40] xen/x86: move nodes_cover_memory to common
  2021-08-31  1:16   ` Stefano Stabellini
@ 2021-08-31 13:43     ` Wei Chen
From: Wei Chen @ 2021-08-31 13:43 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, julien, Bertrand Marquis

Hi Stefano,

> -----Original Message-----
> From: Stefano Stabellini <sstabellini@kernel.org>
> Sent: 31 August 2021 9:17
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
> jbeulich@suse.com; Bertrand Marquis <Bertrand.Marquis@arm.com>
> Subject: Re: [XEN RFC PATCH 31/40] xen/x86: move nodes_cover_memory to
> common
> 
> On Wed, 11 Aug 2021, Wei Chen wrote:
> > Not only ACPI NUMA, but also Arm device tree based NUMA
> > will use nodes_cover_memory to do a sanity check. So we move
> > this function from arch/x86 to common.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > ---
> >  xen/arch/x86/srat.c    | 40 ----------------------------------------
> >  xen/common/numa.c      | 40 ++++++++++++++++++++++++++++++++++++++++
> >  xen/include/xen/numa.h |  1 +
> >  3 files changed, 41 insertions(+), 40 deletions(-)
> >
> > diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> > index dd3aa30843..dcebc7adec 100644
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -308,46 +308,6 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
> >  	num_node_memblks++;
> >  }
> >
> > -/* Sanity check to catch more bad SRATs (they are amazingly common).
> > -   Make sure the PXMs cover all memory. */
> > -static int __init nodes_cover_memory(void)
> > -{
> > -	int i;
> > -	uint32_t nr_banks = arch_meminfo_get_nr_bank();
> > -
> > -	for (i = 0; i < nr_banks; i++) {
> > -		int j, found;
> > -		unsigned long long start, end;
> > -
> > -		if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
> > -			continue;
> > -		}
> > -
> > -		do {
> > -			found = 0;
> > -			for_each_node_mask(j, memory_nodes_parsed)
> > -				if (start < nodes[j].end
> > -				    && end > nodes[j].start) {
> > -					if (start >= nodes[j].start) {
> > -						start = nodes[j].end;
> > -						found = 1;
> > -					}
> > -					if (end <= nodes[j].end) {
> > -						end = nodes[j].start;
> > -						found = 1;
> > -					}
> > -				}
> > -		} while (found && start < end);
> > -
> > -		if (start < end) {
> > -			printk(KERN_ERR "SRAT: No PXM for e820 range: "
> > -				"%016Lx - %016Lx\n", start, end);
> > -			return 0;
> > -		}
> > -	}
> > -	return 1;
> > -}
> > -
> >  void __init acpi_numa_arch_fixup(void) {}
> >
> >  static uint64_t __initdata srat_region_mask;
> > diff --git a/xen/common/numa.c b/xen/common/numa.c
> > index 79ab250543..74960885a6 100644
> > --- a/xen/common/numa.c
> > +++ b/xen/common/numa.c
> > @@ -193,6 +193,46 @@ void __init cutoff_node(int i, u64 start, u64 end)
> >  	}
> >  }
> >
> > +/* Sanity check to catch more bad SRATs (they are amazingly common).
> > +   Make sure the PXMs cover all memory. */
> > +int __init nodes_cover_memory(void)
> > +{
> > +	int i;
> > +	uint32_t nr_banks = arch_meminfo_get_nr_bank();
> > +
> > +	for (i = 0; i < nr_banks; i++) {
> > +		int j, found;
> > +		unsigned long long start, end;
> > +
> > +		if (arch_meminfo_get_ram_bank_range(i, &start, &end)) {
> > +			continue;
> > +		}
> > +
> > +		do {
> > +			found = 0;
> > +			for_each_node_mask(j, memory_nodes_parsed)
> > +				if (start < nodes[j].end
> > +				    && end > nodes[j].start) {
> > +					if (start >= nodes[j].start) {
> > +						start = nodes[j].end;
> > +						found = 1;
> > +					}
> > +					if (end <= nodes[j].end) {
> > +						end = nodes[j].start;
> > +						found = 1;
> > +					}
> > +				}
> > +		} while (found && start < end);
> > +
> > +		if (start < end) {
> > +			printk(KERN_ERR "SRAT: No PXM for e820 range: "
> > +				"%016Lx - %016Lx\n", start, end);
> 
> I don't know if you are already doing this in a later patch but the
> message shouldn't say e820 as it doesn't exist on all architectures.
> Maybe "for address range" or "for memory range" would suffice.
> 
> Normally we don't make changes together with code movement but in this
> case I think it would be OK.

OK, I will do it in the next version.
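
As a rough sketch of what that could look like, the coverage check for a
single memory range with an architecture-neutral message is shown below.
This is an illustration, not the next version of the patch:
`struct range`, `range_covered`, the use of printf and the exact message
wording are assumptions made so that the example is self-contained. The
loop body follows the logic of nodes_cover_memory quoted above.

```c
#include <stddef.h>
#include <stdio.h>

/* Half-open address range [start, end). Hypothetical helper type. */
struct range {
    unsigned long long start, end;
};

/* Return 1 if [start, end) is fully covered by the node ranges. */
static int range_covered(unsigned long long start, unsigned long long end,
                         const struct range *nodes, size_t nr_nodes)
{
    int found;

    do {
        found = 0;
        for ( size_t j = 0; j < nr_nodes; j++ )
        {
            if ( start < nodes[j].end && end > nodes[j].start )
            {
                /* Node covers the head of the range: trim the head. */
                if ( start >= nodes[j].start )
                {
                    start = nodes[j].end;
                    found = 1;
                }
                /* Node covers the tail of the range: trim the tail. */
                if ( end <= nodes[j].end )
                {
                    end = nodes[j].start;
                    found = 1;
                }
            }
        }
    } while ( found && start < end );

    if ( start < end )
    {
        /* No mention of e820: the wording works on any architecture. */
        printf("NUMA: No node for memory range: %016llx - %016llx\n",
               start, end);
        return 0;
    }

    return 1;
}
```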
