All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1
@ 2022-05-11  1:46 Wei Chen
  2022-05-11  1:46 ` [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart Wei Chen
                   ` (8 more replies)
  0 siblings, 9 replies; 29+ messages in thread
From: Wei Chen @ 2022-05-11  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap

(The Arm device tree based NUMA support patch set contains 35
patches. In order to make stuff easier for reviewers, I split
them into 3 parts:
1. Preparation. I have re-sorted the patch series. And moved
   independent patches to the head of the series.
2. Move generically usable code from x86 to common.
3. Add new code to support Arm.

This series only contains the first part patches.)

Xen memory allocation and scheduler modules are NUMA aware.
But actually, on x86 has implemented the architecture APIs
to support NUMA. Arm was providing a set of fake architecture
APIs to make it compatible with NUMA awared memory allocation
and scheduler.

Arm system was working well as a single node NUMA system with
these fake APIs, because we didn't have multiple nodes NUMA
system on Arm. But in recent years, more and more Arm devices
support multiple nodes NUMA system.

So now we have a new problem. When Xen is running on these Arm
devices, Xen still treat them as single node SMP systems. The
NUMA affinity capability of Xen memory allocation and scheduler
becomes meaningless. Because they rely on input data that does
not reflect real NUMA layout.

Xen still think the access time for all of the memory is the
same for all CPUs. However, Xen may allocate memory to a VM
from different NUMA nodes with different access speeds. This
difference can be amplified in workloads inside VM, causing
performance instability and timeouts. 

So in this patch series, we implement a set of NUMA API to use
device tree to describe the NUMA layout. We reuse most of the
code of x86 NUMA to create and maintain the mapping between
memory and CPU, create the matrix between any two NUMA nodes.
Except ACPI and some x86 specified code, we have moved other
code to common. In next stage, when we implement ACPI based
NUMA for Arm64, we may move the ACPI NUMA code to common too,
but in current stage, we keep it as x86 only.

This patch serires has been tested and booted well on one
Arm64 NUMA machine and one HPE x86 NUMA machine.

---
Part1 v2->v3:
1. Rework EFI stub patch:
   1.1. Add existed file check, if exists a regular stub files,
        the common/stub files' links will be ignored.
   1.2. Keep stub.c in x86/efi to include common/efi/stub.c
   1.3. Restore efi_compat_xxx stub functions to x86/efi.c.
        Other architectures will not use efi_compat_xxx.
   1.4. Remove ARM_EFI dependency from ARM_64.
   1.5. Add comment for adding stub.o to EFIOBJ-y.
   1.6. Merge patch#2 and patch#3 to one patch.
2. Rename arch_have_default_dmazone to arch_want_default_dmazone
3. Use uint64_t for size in acpi_scan_nodes, make it be
   consistent with numa_emulation.
4. Merge the interleaves checking code from a separate function
   to conflicting_memblks.
5. Use INFO level for node's without memory log message.
6. Move "xen/x86: Use ASSERT instead of VIRTUAL_BUG_ON for
   phys_to_nid" to part#2.
Part1 v1->v2:
1. Move independent patches from later to early of this series.
2. Drop the copy of EFI stub.c from Arm. Share common codes of
   x86 EFI stub for Arm.
3. Use CONFIG_ARM_EFI to replace CONFIG_EFI and remove help text
   and make CONFIG_ARM_EFI invisible.
4. Use ASSERT to replace VIRTUAL_BUG_ON in phys_to_nid.
5. Move MAX_NUMNODES from xen/numa.h to asm/numa.h for x86.
6. Extend the description of Arm's workaround for reserve DMA
   allocations to avoid the same discussion every time for
   arch_have_default_dmazone.
7. Update commit messages.

Wei Chen (9):
  xen/arm: Print a 64-bit number in hex from early uart
  xen: reuse x86 EFI stub functions for Arm
  xen/arm: Keep memory nodes in device tree when Xen boots from EFI
  xen: introduce an arch helper for default dma zone status
  xen: decouple NUMA from ACPI in Kconfig
  xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  xen/x86: use paddr_t for addresses in NUMA node structure
  xen/x86: add detection of memory interleaves for different nodes
  xen/x86: use INFO level for node's without memory log message

 xen/arch/arm/Kconfig              |   4 +
 xen/arch/arm/Makefile             |   2 +-
 xen/arch/arm/arm64/head.S         |  12 +--
 xen/arch/arm/bootfdt.c            |   8 +-
 xen/arch/arm/efi/Makefile         |   8 ++
 xen/arch/arm/efi/efi-boot.h       |  25 ------
 xen/arch/arm/include/asm/numa.h   |   6 ++
 xen/arch/x86/Kconfig              |   2 +-
 xen/arch/x86/efi/stub.c           |  32 +------
 xen/arch/x86/include/asm/config.h |   1 -
 xen/arch/x86/include/asm/numa.h   |   9 +-
 xen/arch/x86/numa.c               |  32 ++++---
 xen/arch/x86/srat.c               | 137 +++++++++++++++++++++---------
 xen/common/Kconfig                |   3 +
 xen/common/efi/efi-common.mk      |   3 +-
 xen/common/efi/stub.c             |  32 +++++++
 xen/common/page_alloc.c           |   2 +-
 xen/drivers/acpi/Kconfig          |   3 +-
 xen/drivers/acpi/Makefile         |   2 +-
 19 files changed, 193 insertions(+), 130 deletions(-)
 create mode 100644 xen/common/efi/stub.c

-- 
2.25.1



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart
  2022-05-11  1:46 [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1 Wei Chen
@ 2022-05-11  1:46 ` Wei Chen
  2022-05-16 17:14   ` Julien Grall
  2022-05-11  1:46 ` [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm Wei Chen
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 29+ messages in thread
From: Wei Chen @ 2022-05-11  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Jiamei Xie, Julien Grall

Current putn function that is using for early print
only can print low 32-bit of AArch64 register. This
will lose some important messages while debugging
with early console. For example:
(XEN) Bringing up CPU5
- CPU 0000000100000100 booting -
Will be truncated to
(XEN) Bringing up CPU5
- CPU 00000100 booting -

In this patch, we increased the print loops and shift
bits to make putn print 64-bit number.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
---
v2->v3:
Add Tb from Jiamei.
---
 xen/arch/arm/arm64/head.S | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 1fd35a8390..109ae7de0c 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -872,17 +872,19 @@ puts:
         ret
 ENDPROC(puts)
 
-/* Print a 32-bit number in hex.  Specific to the PL011 UART.
+/*
+ * Print a 64-bit number in hex.
  * x0: Number to print.
  * x23: Early UART base address
- * Clobbers x0-x3 */
+ * Clobbers x0-x3
+ */
 putn:
         adr   x1, hex
-        mov   x3, #8
+        mov   x3, #16
 1:
         early_uart_ready x23, 2
-        and   x2, x0, #0xf0000000    /* Mask off the top nybble */
-        lsr   x2, x2, #28
+        and   x2, x0, #(0xf<<60)     /* Mask off the top nybble */
+        lsr   x2, x2, #60
         ldrb  w2, [x1, x2]           /* Convert to a char */
         early_uart_transmit x23, w2
         lsl   x0, x0, #4             /* Roll it through one nybble at a time */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
  2022-05-11  1:46 [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1 Wei Chen
  2022-05-11  1:46 ` [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart Wei Chen
@ 2022-05-11  1:46 ` Wei Chen
  2022-05-13  0:33   ` Stefano Stabellini
  2022-05-18 13:04   ` Jan Beulich
  2022-05-11  1:46 ` [PATCH v3 3/9] xen/arm: Keep memory nodes in device tree when Xen boots from EFI Wei Chen
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Wei Chen @ 2022-05-11  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Jiamei Xie

x86 is using compiler feature testing to decide EFI build
enable or not. When EFI build is disabled, x86 will use an
efi/stub.c file to replace efi/runtime.c for build objects.
Following this idea, we introduce a stub file for Arm, but
use CONFIG_ARM_EFI to decide EFI build enable or not.

And the most functions in x86 EFI stub.c can be reused for
other architectures, like Arm. So we move them to common
and keep the x86 specific function in x86/efi/stub.c.

To avoid the symbol link conflict error when linking common
stub files to x86/efi. We add a regular file check in efi
stub files' link script. Depends on this check we can bypass
the link behaviors for existed stub files in x86/efi.

As there is no Arm specific EFI stub function for Arm in
current stage, Arm still can use the existed symbol link
method for EFI stub files.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
---
v2 -> v3:
1. Add existed file check, if a regular stub files,
   the common/stub files' link will be ignored.
2. Keep stub.c in x86/efi to include common/efi/stub.c
3. Restore efi_compat_xxx stub functions to x86/efi.c.
   Other architectures will not use efi_compat_xxx.
4. Remove ARM_EFI dependency from ARM_64.
5. Add comment for adding stub.o to EFIOBJ-y.
6. Merge patch#2 and patch#3 to one patch.
v1 -> v2:
1. Drop the copy of stub.c from Arm EFI.
2. Share common codes of x86 EFI stub for other architectures.
3. Use CONFIG_ARM_EFI to replace CONFIG_EFI
4. Remove help text and make CONFIG_ARM_EFI invisible.
5. Merge one following patch:
   xen/arm: introduce a stub file for non-EFI architectures
6. Use the common stub.c instead of creating new one.
---
 xen/arch/arm/Kconfig         |  4 ++++
 xen/arch/arm/Makefile        |  2 +-
 xen/arch/arm/efi/Makefile    |  8 ++++++++
 xen/arch/x86/efi/stub.c      | 32 +-------------------------------
 xen/common/efi/efi-common.mk |  3 ++-
 xen/common/efi/stub.c        | 32 ++++++++++++++++++++++++++++++++
 6 files changed, 48 insertions(+), 33 deletions(-)
 create mode 100644 xen/common/efi/stub.c

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index ecfa6822e4..8a16d43bd5 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -6,6 +6,7 @@ config ARM_64
 	def_bool y
 	depends on !ARM_32
 	select 64BIT
+	select ARM_EFI
 	select HAS_FAST_MULTIPLY
 
 config ARM
@@ -33,6 +34,9 @@ config ACPI
 	  Advanced Configuration and Power Interface (ACPI) support for Xen is
 	  an alternative to device tree on ARM64.
 
+config ARM_EFI
+	bool
+
 config GICV3
 	bool "GICv3 driver"
 	depends on ARM_64 && !NEW_VGIC
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 1d862351d1..bb7a6151c1 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -1,6 +1,5 @@
 obj-$(CONFIG_ARM_32) += arm32/
 obj-$(CONFIG_ARM_64) += arm64/
-obj-$(CONFIG_ARM_64) += efi/
 obj-$(CONFIG_ACPI) += acpi/
 obj-$(CONFIG_HAS_PCI) += pci/
 ifneq ($(CONFIG_NO_PLAT),y)
@@ -20,6 +19,7 @@ obj-y += domain.o
 obj-y += domain_build.init.o
 obj-y += domctl.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
+obj-y += efi/
 obj-y += gic.o
 obj-y += gic-v2.o
 obj-$(CONFIG_GICV3) += gic-v3.o
diff --git a/xen/arch/arm/efi/Makefile b/xen/arch/arm/efi/Makefile
index 4313c39066..dffe72e589 100644
--- a/xen/arch/arm/efi/Makefile
+++ b/xen/arch/arm/efi/Makefile
@@ -1,4 +1,12 @@
 include $(srctree)/common/efi/efi-common.mk
 
+ifeq ($(CONFIG_ARM_EFI),y)
 obj-y += $(EFIOBJ-y)
 obj-$(CONFIG_ACPI) +=  efi-dom0.init.o
+else
+# Add stub.o to EFIOBJ-y to re-use the clean-files in
+# efi-common.mk. Otherwise the link of stub.c in arm/efi
+# will not be cleaned in "make clean".
+EFIOBJ-y += stub.o
+obj-y += stub.o
+endif
diff --git a/xen/arch/x86/efi/stub.c b/xen/arch/x86/efi/stub.c
index 9984932626..f2365bc041 100644
--- a/xen/arch/x86/efi/stub.c
+++ b/xen/arch/x86/efi/stub.c
@@ -1,7 +1,5 @@
 #include <xen/efi.h>
-#include <xen/errno.h>
 #include <xen/init.h>
-#include <xen/lib.h>
 #include <asm/asm_defns.h>
 #include <asm/efibind.h>
 #include <asm/page.h>
@@ -10,6 +8,7 @@
 #include <efi/eficon.h>
 #include <efi/efidevp.h>
 #include <efi/efiapi.h>
+#include "../../../common/efi/stub.c"
 
 /*
  * Here we are in EFI stub. EFI calls are not supported due to lack
@@ -45,11 +44,6 @@ void __init noreturn efi_multiboot2(EFI_HANDLE ImageHandle,
     unreachable();
 }
 
-bool efi_enabled(unsigned int feature)
-{
-    return false;
-}
-
 void __init efi_init_memory(void) { }
 
 bool efi_boot_mem_unused(unsigned long *start, unsigned long *end)
@@ -62,32 +56,8 @@ bool efi_boot_mem_unused(unsigned long *start, unsigned long *end)
 
 void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t l4e) { }
 
-bool efi_rs_using_pgtables(void)
-{
-    return false;
-}
-
-unsigned long efi_get_time(void)
-{
-    BUG();
-    return 0;
-}
-
-void efi_halt_system(void) { }
-void efi_reset_system(bool warm) { }
-
-int efi_get_info(uint32_t idx, union xenpf_efi_info *info)
-{
-    return -ENOSYS;
-}
-
 int efi_compat_get_info(uint32_t idx, union compat_pf_efi_info *)
     __attribute__((__alias__("efi_get_info")));
 
-int efi_runtime_call(struct xenpf_efi_runtime_call *op)
-{
-    return -ENOSYS;
-}
-
 int efi_compat_runtime_call(struct compat_pf_efi_runtime_call *)
     __attribute__((__alias__("efi_runtime_call")));
diff --git a/xen/common/efi/efi-common.mk b/xen/common/efi/efi-common.mk
index 4298ceaee7..3a5b9958b3 100644
--- a/xen/common/efi/efi-common.mk
+++ b/xen/common/efi/efi-common.mk
@@ -9,7 +9,8 @@ CFLAGS-y += -iquote $(srcdir)
 # e.g.: It transforms "dir/foo/bar" into successively
 #       "dir foo bar", ".. .. ..", "../../.."
 $(obj)/%.c: $(srctree)/common/efi/%.c FORCE
-	$(Q)ln -nfs $(subst $(space),/,$(patsubst %,..,$(subst /, ,$(obj))))/source/common/efi/$(<F) $@
+	$(Q)test -f $@ || \
+	ln -nfs $(subst $(space),/,$(patsubst %,..,$(subst /, ,$(obj))))/source/common/efi/$(<F) $@
 
 clean-files += $(patsubst %.o, %.c, $(EFIOBJ-y:.init.o=.o) $(EFIOBJ-))
 
diff --git a/xen/common/efi/stub.c b/xen/common/efi/stub.c
new file mode 100644
index 0000000000..15694632c2
--- /dev/null
+++ b/xen/common/efi/stub.c
@@ -0,0 +1,32 @@
+#include <xen/efi.h>
+#include <xen/errno.h>
+#include <xen/lib.h>
+
+bool efi_enabled(unsigned int feature)
+{
+    return false;
+}
+
+bool efi_rs_using_pgtables(void)
+{
+    return false;
+}
+
+unsigned long efi_get_time(void)
+{
+    BUG();
+    return 0;
+}
+
+void efi_halt_system(void) { }
+void efi_reset_system(bool warm) { }
+
+int efi_get_info(uint32_t idx, union xenpf_efi_info *info)
+{
+    return -ENOSYS;
+}
+
+int efi_runtime_call(struct xenpf_efi_runtime_call *op)
+{
+    return -ENOSYS;
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 3/9] xen/arm: Keep memory nodes in device tree when Xen boots from EFI
  2022-05-11  1:46 [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1 Wei Chen
  2022-05-11  1:46 ` [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart Wei Chen
  2022-05-11  1:46 ` [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm Wei Chen
@ 2022-05-11  1:46 ` Wei Chen
  2022-05-11  1:46 ` [PATCH v3 4/9] xen: introduce an arch helper for default dma zone status Wei Chen
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Wei Chen @ 2022-05-11  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Jiamei Xie

In current code, when Xen is booting from EFI, it will delete
all memory nodes in device tree. This would work well in current
stage, because Xen can get memory map from EFI system table.
However, EFI system table cannot completely replace memory nodes
of device tree. EFI system table doesn't contain memory NUMA
information. Xen depends on ACPI SRAT or device tree memory nodes
to parse memory blocks' NUMA mapping. So in EFI + DTB boot, Xen
doesn't have any method to get numa-node-id for memory blocks any
more. This makes device tree based NUMA support become impossible
for Xen in EFI + DTB boot.

So in this patch, we will keep memory nodes in device tree for
NUMA code to parse memory numa-node-id later.

As a side effect, if we still parse boot memory information in
early_scan_node, bootmem.info will calculate memory ranges in
memory nodes twice. So we have to prevent early_scan_node to
parse memory nodes in EFI boot.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
v2 -> v3:
1. Add Rb.
v1 -> v2:
1. Move this patch from later to early of this series.
2. Refine commit message.
---
 xen/arch/arm/bootfdt.c      |  8 +++++++-
 xen/arch/arm/efi/efi-boot.h | 25 -------------------------
 2 files changed, 7 insertions(+), 26 deletions(-)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 29671c8df0..ec81a45de9 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -11,6 +11,7 @@
 #include <xen/lib.h>
 #include <xen/kernel.h>
 #include <xen/init.h>
+#include <xen/efi.h>
 #include <xen/device_tree.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/sort.h>
@@ -367,7 +368,12 @@ static int __init early_scan_node(const void *fdt,
 {
     int rc = 0;
 
-    if ( device_tree_node_matches(fdt, node, "memory") )
+    /*
+     * If Xen has been booted via UEFI, the memory banks are
+     * populated. So we should skip the parsing.
+     */
+    if ( !efi_enabled(EFI_BOOT) &&
+         device_tree_node_matches(fdt, node, "memory") )
         rc = process_memory_node(fdt, node, name, depth,
                                  address_cells, size_cells, &bootinfo.mem);
     else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
index e452b687d8..59d93c24a1 100644
--- a/xen/arch/arm/efi/efi-boot.h
+++ b/xen/arch/arm/efi/efi-boot.h
@@ -231,33 +231,8 @@ EFI_STATUS __init fdt_add_uefi_nodes(EFI_SYSTEM_TABLE *sys_table,
     int status;
     u32 fdt_val32;
     u64 fdt_val64;
-    int prev;
     int num_rsv;
 
-    /*
-     * Delete any memory nodes present.  The EFI memory map is the only
-     * memory description provided to Xen.
-     */
-    prev = 0;
-    for (;;)
-    {
-        const char *type;
-        int len;
-
-        node = fdt_next_node(fdt, prev, NULL);
-        if ( node < 0 )
-            break;
-
-        type = fdt_getprop(fdt, node, "device_type", &len);
-        if ( type && strncmp(type, "memory", len) == 0 )
-        {
-            fdt_del_node(fdt, node);
-            continue;
-        }
-
-        prev = node;
-    }
-
    /*
     * Delete all memory reserve map entries. When booting via UEFI,
     * kernel will use the UEFI memory map to find reserved regions.
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 4/9] xen: introduce an arch helper for default dma zone status
  2022-05-11  1:46 [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (2 preceding siblings ...)
  2022-05-11  1:46 ` [PATCH v3 3/9] xen/arm: Keep memory nodes in device tree when Xen boots from EFI Wei Chen
@ 2022-05-11  1:46 ` Wei Chen
  2022-05-13  0:32   ` Stefano Stabellini
  2022-05-18 13:09   ` Jan Beulich
  2022-05-11  1:46 ` [PATCH v3 5/9] xen: decouple NUMA from ACPI in Kconfig Wei Chen
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 29+ messages in thread
From: Wei Chen @ 2022-05-11  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu, Roger Pau Monné,
	Jiamei Xie

In current code, when Xen is running in a multiple nodes
NUMA system, it will set dma_bitsize in end_boot_allocator
to reserve some low address memory as DMA zone.

There are some x86 implications in the implementation.
Because on x86, memory starts from 0. On a multiple-nodes
NUMA system, if a single node contains the majority or all
of the DMA memory, x86 prefers to give out memory from
non-local allocations rather than exhausting the DMA memory
ranges. Hence x86 uses dma_bitsize to set aside some largely
arbitrary amount of memory for DMA zone. The allocations
from DMA zone would happen only after exhausting all other
nodes' memory.

But the implications are not shared across all architectures.
For example, Arm cannot guarantee the availability of memory
below a certain boundary for DMA limited-capability devices
either. But currently, Arm doesn't need a reserved DMA zone
in Xen. Because there is no DMA device in Xen. And for guests,
Xen Arm only allows Dom0 to have DMA operations without IOMMU.
Xen will try to allocate memory under 4GB or memory range that
is limited by dma_bitsize for Dom0 in boot time. For DomU, even
Xen can passthrough devices to DomU without IOMMU, but Xen Arm
doesn't guarantee their DMA operations. So, Xen Arm doesn't
need a reserved DMA zone to provide DMA memory for guests.

In this patch, we introduce an arch_want_default_dmazone helper
for different architectures to determine whether they need to
set dma_bitsize for DMA zone reservation or not.

At the same time, when x86 Xen is built with CONFIG_PV=n could
probably leverage this new helper to actually not trigger DMA
zone reservation.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
---
v2 -> v3:
1. Add Tb.
2. Rename arch_have_default_dmazone to arch_want_default_dmazone.
v1 -> v2:
1. Extend the description of Arm's workaround for reserve DMA
   allocations to avoid the same discussion every time.
2. Use a macro to define arch_have_default_dmazone, because
   it's little hard to make x86 version to static inline.
   Use a macro will also avoid add __init for this function.
3. Change arch_have_default_dmazone return value from
   unsigned int to bool.
4. Un-addressed comment: make arch_have_default_dmazone
   of x86 to be static inline. Because, if we move
   arch_have_default_dmazone to x86/asm/numa.h, it depends
   on nodemask.h to provide num_online_nodes. But nodemask.h
   needs numa.h to provide MAX_NUMANODES. This will cause a
   loop dependency. And this function can only be used in
   end_boot_allocator, in Xen initialization. So I think,
   compared to the changes introduced by inline, it doesn't
   mean much.
---
 xen/arch/arm/include/asm/numa.h | 1 +
 xen/arch/x86/include/asm/numa.h | 1 +
 xen/common/page_alloc.c         | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/include/asm/numa.h b/xen/arch/arm/include/asm/numa.h
index 31a6de4e23..e4c4d89192 100644
--- a/xen/arch/arm/include/asm/numa.h
+++ b/xen/arch/arm/include/asm/numa.h
@@ -24,6 +24,7 @@ extern mfn_t first_valid_mfn;
 #define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
 #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
 #define __node_distance(a, b) (20)
+#define arch_want_default_dmazone() (false)
 
 #endif /* __ARCH_ARM_NUMA_H */
 /*
diff --git a/xen/arch/x86/include/asm/numa.h b/xen/arch/x86/include/asm/numa.h
index bada2c0bb9..5d8385f2e1 100644
--- a/xen/arch/x86/include/asm/numa.h
+++ b/xen/arch/x86/include/asm/numa.h
@@ -74,6 +74,7 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
 #define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
 #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
 				 NODE_DATA(nid)->node_spanned_pages)
+#define arch_want_default_dmazone() (num_online_nodes() > 1)
 
 extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 319029140f..b3bddc719b 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1889,7 +1889,7 @@ void __init end_boot_allocator(void)
     }
     nr_bootmem_regions = 0;
 
-    if ( !dma_bitsize && (num_online_nodes() > 1) )
+    if ( !dma_bitsize && arch_want_default_dmazone() )
         dma_bitsize = arch_get_dma_bitsize();
 
     printk("Domain heap initialised");
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 5/9] xen: decouple NUMA from ACPI in Kconfig
  2022-05-11  1:46 [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (3 preceding siblings ...)
  2022-05-11  1:46 ` [PATCH v3 4/9] xen: introduce an arch helper for default dma zone status Wei Chen
@ 2022-05-11  1:46 ` Wei Chen
  2022-05-11  1:46 ` [PATCH v3 6/9] xen/arm: use !CONFIG_NUMA to keep fake NUMA API Wei Chen
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Wei Chen @ 2022-05-11  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Jiamei Xie

In current Xen code only implements x86 ACPI-based NUMA support.
So in Xen Kconfig system, NUMA equals to ACPI_NUMA. x86 selects
NUMA by default, and CONFIG_ACPI_NUMA is hardcode in config.h.

In a follow-up patch, we will introduce support for NUMA using
the device tree. That means we will have two NUMA implementations,
so in this patch we decouple NUMA from ACPI based NUMA in Kconfig.
Make NUMA as a common feature, that device tree based NUMA also
can select it.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v2 -> v3:
Add Tb.
v1 -> v2:
No change.
---
 xen/arch/x86/Kconfig              | 2 +-
 xen/arch/x86/include/asm/config.h | 1 -
 xen/common/Kconfig                | 3 +++
 xen/drivers/acpi/Kconfig          | 3 ++-
 xen/drivers/acpi/Makefile         | 2 +-
 5 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 06d6fbc864..1e31edc99f 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -6,6 +6,7 @@ config X86
 	def_bool y
 	select ACPI
 	select ACPI_LEGACY_TABLES_LOOKUP
+	select ACPI_NUMA
 	select ALTERNATIVE_CALL
 	select ARCH_SUPPORTS_INT128
 	select CORE_PARKING
@@ -26,7 +27,6 @@ config X86
 	select HAS_UBSAN
 	select HAS_VPCI if HVM
 	select NEEDS_LIBELF
-	select NUMA
 
 config ARCH_DEFCONFIG
 	string
diff --git a/xen/arch/x86/include/asm/config.h b/xen/arch/x86/include/asm/config.h
index de20642524..07bcd15831 100644
--- a/xen/arch/x86/include/asm/config.h
+++ b/xen/arch/x86/include/asm/config.h
@@ -31,7 +31,6 @@
 /* Intel P4 currently has largest cache line (L2 line size is 128 bytes). */
 #define CONFIG_X86_L1_CACHE_SHIFT 7
 
-#define CONFIG_ACPI_NUMA 1
 #define CONFIG_ACPI_SRAT 1
 #define CONFIG_ACPI_CSTATE 1
 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index d921c74d61..d65add3fc6 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -70,6 +70,9 @@ config MEM_ACCESS
 config NEEDS_LIBELF
 	bool
 
+config NUMA
+	bool
+
 config STATIC_MEMORY
 	bool "Static Allocation Support (UNSUPPORTED)" if UNSUPPORTED
 	depends on ARM
diff --git a/xen/drivers/acpi/Kconfig b/xen/drivers/acpi/Kconfig
index b64d3731fb..e3f3d8f4b1 100644
--- a/xen/drivers/acpi/Kconfig
+++ b/xen/drivers/acpi/Kconfig
@@ -5,5 +5,6 @@ config ACPI
 config ACPI_LEGACY_TABLES_LOOKUP
 	bool
 
-config NUMA
+config ACPI_NUMA
 	bool
+	select NUMA
diff --git a/xen/drivers/acpi/Makefile b/xen/drivers/acpi/Makefile
index 4f8e97228e..2fc5230253 100644
--- a/xen/drivers/acpi/Makefile
+++ b/xen/drivers/acpi/Makefile
@@ -3,7 +3,7 @@ obj-y += utilities/
 obj-$(CONFIG_X86) += apei/
 
 obj-bin-y += tables.init.o
-obj-$(CONFIG_NUMA) += numa.o
+obj-$(CONFIG_ACPI_NUMA) += numa.o
 obj-y += osl.o
 obj-$(CONFIG_HAS_CPUFREQ) += pmstat.o
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 6/9] xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  2022-05-11  1:46 [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (4 preceding siblings ...)
  2022-05-11  1:46 ` [PATCH v3 5/9] xen: decouple NUMA from ACPI in Kconfig Wei Chen
@ 2022-05-11  1:46 ` Wei Chen
  2022-05-11  1:46 ` [PATCH v3 7/9] xen/x86: use paddr_t for addresses in NUMA node structure Wei Chen
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 29+ messages in thread
From: Wei Chen @ 2022-05-11  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Jiamei Xie

We have introduced CONFIG_NUMA in a previous patch. And this
option is enabled only on x86 at the current stage. In a follow
up patch, we will enable this option for Arm. But we still
want users to be able to disable the CONFIG_NUMA via Kconfig. In
this case, keep the fake NUMA API, will make Arm code still
able to work with NUMA aware memory allocation and scheduler.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
v2 -> v3:
Add Tb.
v1 -> v2:
No change.
---
 xen/arch/arm/include/asm/numa.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/arm/include/asm/numa.h b/xen/arch/arm/include/asm/numa.h
index e4c4d89192..268a9db055 100644
--- a/xen/arch/arm/include/asm/numa.h
+++ b/xen/arch/arm/include/asm/numa.h
@@ -5,6 +5,8 @@
 
 typedef u8 nodeid_t;
 
+#ifndef CONFIG_NUMA
+
 /* Fake one node for now. See also node_online_map. */
 #define cpu_to_node(cpu) 0
 #define node_to_cpumask(node)   (cpu_online_map)
@@ -24,6 +26,9 @@ extern mfn_t first_valid_mfn;
 #define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
 #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
 #define __node_distance(a, b) (20)
+
+#endif
+
 #define arch_want_default_dmazone() (false)
 
 #endif /* __ARCH_ARM_NUMA_H */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 7/9] xen/x86: use paddr_t for addresses in NUMA node structure
  2022-05-11  1:46 [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (5 preceding siblings ...)
  2022-05-11  1:46 ` [PATCH v3 6/9] xen/arm: use !CONFIG_NUMA to keep fake NUMA API Wei Chen
@ 2022-05-11  1:46 ` Wei Chen
  2022-05-18 13:13   ` Jan Beulich
  2022-05-11  1:46 ` [PATCH v3 8/9] xen/x86: add detection of memory interleaves for different nodes Wei Chen
  2022-05-11  1:46 ` [PATCH v3 9/9] xen/x86: use INFO level for node's without memory log message Wei Chen
  8 siblings, 1 reply; 29+ messages in thread
From: Wei Chen @ 2022-05-11  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie

NUMA node structure "struct node" is using u64 as node memory
range. In order to make other architectures can reuse this
NUMA node relative code, we replace the u64 to paddr_t. And
use pfn_to_paddr and paddr_to_pfn to replace explicit shift
operations. The relate PRIx64 in print messages have been
replaced by PRIpaddr at the same time. And some being-phased-out
types like u64 in the lines we have touched also have been
converted to uint64_t or unsigned long.

Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v2 -> v3:
1. Use uint64_t for size in acpi_scan_nodes, make it be
   consistent with numa_emulation.
2. Add Tb.
v1 -> v2:
1. Drop useless cast.
2. Use initializers of the variables.
3. Replace u64 by uint64_t.
4. Use unsigned long for start_pfn and end_pfn.
---
 xen/arch/x86/include/asm/numa.h |  8 ++++----
 xen/arch/x86/numa.c             | 32 +++++++++++++++-----------------
 xen/arch/x86/srat.c             | 25 +++++++++++++------------
 3 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/xen/arch/x86/include/asm/numa.h b/xen/arch/x86/include/asm/numa.h
index 5d8385f2e1..c32ccffde3 100644
--- a/xen/arch/x86/include/asm/numa.h
+++ b/xen/arch/x86/include/asm/numa.h
@@ -18,7 +18,7 @@ extern cpumask_t     node_to_cpumask[];
 #define node_to_cpumask(node)    (node_to_cpumask[node])
 
 struct node { 
-	u64 start,end; 
+	paddr_t start, end;
 };
 
 extern int compute_hash_shift(struct node *nodes, int numnodes,
@@ -38,7 +38,7 @@ extern void numa_set_node(int cpu, nodeid_t node);
 extern nodeid_t setup_node(unsigned int pxm);
 extern void srat_detect_node(int cpu);
 
-extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
+extern void setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end);
 extern nodeid_t apicid_to_node[];
 extern void init_cpu_to_node(void);
 
@@ -76,9 +76,9 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
 				 NODE_DATA(nid)->node_spanned_pages)
 #define arch_want_default_dmazone() (num_online_nodes() > 1)
 
-extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
+extern int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node);
 
-void srat_parse_regions(u64 addr);
+void srat_parse_regions(paddr_t addr);
 extern u8 __node_distance(nodeid_t a, nodeid_t b);
 unsigned int arch_get_dma_bitsize(void);
 
diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 680b7d9002..627ae8aa95 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -162,12 +162,10 @@ int __init compute_hash_shift(struct node *nodes, int numnodes,
     return shift;
 }
 /* initialize NODE_DATA given nodeid and start/end */
-void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
-{ 
-    unsigned long start_pfn, end_pfn;
-
-    start_pfn = start >> PAGE_SHIFT;
-    end_pfn = end >> PAGE_SHIFT;
+void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
+{
+    unsigned long start_pfn = paddr_to_pfn(start);
+    unsigned long end_pfn = paddr_to_pfn(end);
 
     NODE_DATA(nodeid)->node_start_pfn = start_pfn;
     NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
@@ -198,11 +196,12 @@ void __init numa_init_array(void)
 static int numa_fake __initdata = 0;
 
 /* Numa emulation */
-static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
+static int __init numa_emulation(unsigned long start_pfn,
+                                 unsigned long end_pfn)
 {
     int i;
     struct node nodes[MAX_NUMNODES];
-    u64 sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake;
+    uint64_t sz = pfn_to_paddr(end_pfn - start_pfn) / numa_fake;
 
     /* Kludge needed for the hash function */
     if ( hweight64(sz) > 1 )
@@ -218,9 +217,9 @@ static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
     memset(&nodes,0,sizeof(nodes));
     for ( i = 0; i < numa_fake; i++ )
     {
-        nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz;
+        nodes[i].start = pfn_to_paddr(start_pfn) + i * sz;
         if ( i == numa_fake - 1 )
-            sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
+            sz = pfn_to_paddr(end_pfn) - nodes[i].start;
         nodes[i].end = nodes[i].start + sz;
         printk(KERN_INFO "Faking node %d at %"PRIx64"-%"PRIx64" (%"PRIu64"MB)\n",
                i,
@@ -246,6 +245,8 @@ static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
 void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
 { 
     int i;
+    paddr_t start = pfn_to_paddr(start_pfn);
+    paddr_t end = pfn_to_paddr(end_pfn);
 
 #ifdef CONFIG_NUMA_EMU
     if ( numa_fake && !numa_emulation(start_pfn, end_pfn) )
@@ -253,17 +254,15 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_ACPI_NUMA
-    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
-         (u64)end_pfn << PAGE_SHIFT) )
+    if ( !numa_off && !acpi_scan_nodes(start, end) )
         return;
 #endif
 
     printk(KERN_INFO "%s\n",
            numa_off ? "NUMA turned off" : "No NUMA configuration found");
 
-    printk(KERN_INFO "Faking a node at %016"PRIx64"-%016"PRIx64"\n",
-           (u64)start_pfn << PAGE_SHIFT,
-           (u64)end_pfn << PAGE_SHIFT);
+    printk(KERN_INFO "Faking a node at %"PRIpaddr"-%"PRIpaddr"\n",
+           start, end);
     /* setup dummy node covering all memory */
     memnode_shift = BITS_PER_LONG - 1;
     memnodemap = _memnodemap;
@@ -276,8 +275,7 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
     for ( i = 0; i < nr_cpu_ids; i++ )
         numa_set_node(i, 0);
     cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
-    setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
-                    (u64)end_pfn << PAGE_SHIFT);
+    setup_node_bootmem(0, start, end);
 }
 
 void numa_add_cpu(int cpu)
diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index cfe24c7e78..8ffe43bdfe 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -104,7 +104,7 @@ nodeid_t setup_node(unsigned pxm)
 	return node;
 }
 
-int valid_numa_range(u64 start, u64 end, nodeid_t node)
+int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node)
 {
 	int i;
 
@@ -119,7 +119,7 @@ int valid_numa_range(u64 start, u64 end, nodeid_t node)
 	return 0;
 }
 
-static __init int conflicting_memblks(u64 start, u64 end)
+static __init int conflicting_memblks(paddr_t start, paddr_t end)
 {
 	int i;
 
@@ -135,7 +135,7 @@ static __init int conflicting_memblks(u64 start, u64 end)
 	return -1;
 }
 
-static __init void cutoff_node(int i, u64 start, u64 end)
+static __init void cutoff_node(int i, paddr_t start, paddr_t end)
 {
 	struct node *nd = &nodes[i];
 	if (nd->start < start) {
@@ -275,7 +275,7 @@ acpi_numa_processor_affinity_init(const struct acpi_srat_cpu_affinity *pa)
 void __init
 acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 {
-	u64 start, end;
+	paddr_t start, end;
 	unsigned pxm;
 	nodeid_t node;
 	int i;
@@ -318,7 +318,7 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 		bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
 		                !test_bit(i, memblk_hotplug);
 
-		printk("%sSRAT: PXM %u (%"PRIx64"-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
+		printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
 		       mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
 		       node_memblk_range[i].start, node_memblk_range[i].end);
 		if (mismatch) {
@@ -327,7 +327,7 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 		}
 	} else {
 		printk(KERN_ERR
-		       "SRAT: PXM %u (%"PRIx64"-%"PRIx64") overlaps with PXM %u (%"PRIx64"-%"PRIx64")\n",
+		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
 		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
 		       node_memblk_range[i].start, node_memblk_range[i].end);
 		bad_srat();
@@ -346,7 +346,7 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 				nd->end = end;
 		}
 	}
-	printk(KERN_INFO "SRAT: Node %u PXM %u %"PRIx64"-%"PRIx64"%s\n",
+	printk(KERN_INFO "SRAT: Node %u PXM %u %"PRIpaddr"-%"PRIpaddr"%s\n",
 	       node, pxm, start, end,
 	       ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE ? " (hotplug)" : "");
 
@@ -369,7 +369,7 @@ static int __init nodes_cover_memory(void)
 
 	for (i = 0; i < e820.nr_map; i++) {
 		int j, found;
-		unsigned long long start, end;
+		paddr_t start, end;
 
 		if (e820.map[i].type != E820_RAM) {
 			continue;
@@ -396,7 +396,7 @@ static int __init nodes_cover_memory(void)
 
 		if (start < end) {
 			printk(KERN_ERR "SRAT: No PXM for e820 range: "
-				"%016Lx - %016Lx\n", start, end);
+				"%"PRIpaddr" - %"PRIpaddr"\n", start, end);
 			return 0;
 		}
 	}
@@ -432,7 +432,7 @@ static int __init cf_check srat_parse_region(
 	return 0;
 }
 
-void __init srat_parse_regions(u64 addr)
+void __init srat_parse_regions(paddr_t addr)
 {
 	u64 mask;
 	unsigned int i;
@@ -457,7 +457,7 @@ void __init srat_parse_regions(u64 addr)
 }
 
 /* Use the information discovered above to actually set up the nodes. */
-int __init acpi_scan_nodes(u64 start, u64 end)
+int __init acpi_scan_nodes(paddr_t start, paddr_t end)
 {
 	int i;
 	nodemask_t all_nodes_parsed;
@@ -489,7 +489,8 @@ int __init acpi_scan_nodes(u64 start, u64 end)
 	/* Finally register nodes */
 	for_each_node_mask(i, all_nodes_parsed)
 	{
-		u64 size = nodes[i].end - nodes[i].start;
+		uint64_t size = nodes[i].end - nodes[i].start;
+
 		if ( size == 0 )
 			printk(KERN_WARNING "SRAT: Node %u has no memory. "
 			       "BIOS Bug or mis-configured hardware?\n", i);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 8/9] xen/x86: add detection of memory interleaves for different nodes
  2022-05-11  1:46 [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (6 preceding siblings ...)
  2022-05-11  1:46 ` [PATCH v3 7/9] xen/x86: use paddr_t for addresses in NUMA node structure Wei Chen
@ 2022-05-11  1:46 ` Wei Chen
  2022-05-18 13:30   ` Jan Beulich
  2022-05-11  1:46 ` [PATCH v3 9/9] xen/x86: use INFO level for node's without memory log message Wei Chen
  8 siblings, 1 reply; 29+ messages in thread
From: Wei Chen @ 2022-05-11  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie

One NUMA node may contain several memory blocks. In current Xen
code, Xen will maintain a node memory range for each node to cover
all its memory blocks. But here comes the problem, in the gap of
one node's two memory blocks, if there are some memory blocks don't
belong to this node (remote memory blocks). This node's memory range
will be expanded to cover these remote memory blocks.

One node's memory range contains other nodes' memory, this is
obviously not very reasonable. This means current NUMA code only
can support node has no interleaved memory blocks. However, on a
physical machine, the addresses of multiple nodes can be interleaved.

So in this patch, we add code to detect memory interleaves of
different nodes. NUMA initialization will be failed and error
messages will be printed when Xen detect such hardware configuration.

Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
---
v2 -> v3:
1. Merge the check code from a separate function to
   conflicting_memblks. This will reduce the loop
   times of node memory blocks.
2. Use an enumeration to indicate conflict check status.
3. Use a pointer to get conflict memory block id.
v1 -> v2:
1. Update the description to say we're after is no memory
   interleaves of different nodes.
2. Only update node range when it passes the interleave check.
3. Don't use full upper-case for "node".
---
 xen/arch/x86/srat.c | 115 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 86 insertions(+), 29 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 8ffe43bdfe..53835ae3eb 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -42,6 +42,12 @@ static struct node node_memblk_range[NR_NODE_MEMBLKS];
 static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
 static __initdata DECLARE_BITMAP(memblk_hotplug, NR_NODE_MEMBLKS);
 
+enum conflicts {
+	NO_CONFLICT = 0,
+	ERR_OVERLAP,
+	ERR_INTERLEAVE,
+};
+
 static inline bool node_found(unsigned idx, unsigned pxm)
 {
 	return ((pxm2node[idx].pxm == pxm) &&
@@ -119,20 +125,43 @@ int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node)
 	return 0;
 }
 
-static __init int conflicting_memblks(paddr_t start, paddr_t end)
+static enum conflicts __init
+conflicting_memblks(nodeid_t nid, paddr_t start, paddr_t end,
+		    paddr_t nd_start, paddr_t nd_end, int *mblkid)
 {
 	int i;
 
+	/*
+	 * Scan all recorded nodes' memory blocks to check conflicts:
+	 * Overlap or interleave.
+	 */
 	for (i = 0; i < num_node_memblks; i++) {
 		struct node *nd = &node_memblk_range[i];
+		*mblkid = i;
+
+		/* Skip 0 bytes node memory block. */
 		if (nd->start == nd->end)
 			continue;
+		/*
+		 * Use memblk range to check memblk overlaps, include the
+		 * self-overlap case.
+		 */
 		if (nd->end > start && nd->start < end)
-			return i;
+			return ERR_OVERLAP;
 		if (nd->end == end && nd->start == start)
-			return i;
+			return ERR_OVERLAP;
+		/*
+		 * Use node memory range to check whether new range contains
+		 * memory from other nodes - interleave check. We just need
+		 * to check full contains situation. Because overlaps have
+		 * been checked above.
+		 */
+	        if (nid != memblk_nodeid[i] &&
+		    (nd_start < nd->start && nd->end < nd_end))
+			return ERR_INTERLEAVE;
 	}
-	return -1;
+
+	return NO_CONFLICT;
 }
 
 static __init void cutoff_node(int i, paddr_t start, paddr_t end)
@@ -275,6 +304,9 @@ acpi_numa_processor_affinity_init(const struct acpi_srat_cpu_affinity *pa)
 void __init
 acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 {
+	enum conflicts status;
+	struct node *nd;
+	paddr_t nd_start, nd_end;
 	paddr_t start, end;
 	unsigned pxm;
 	nodeid_t node;
@@ -310,42 +342,67 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 		bad_srat();
 		return;
 	}
+
+	/*
+	 * For the node that already has some memory blocks, we will
+	 * expand the node memory range temporarily to check memory
+	 * interleaves with other nodes. We will not use this node
+	 * temp memory range to check overlaps, because it will mask
+	 * the overlaps in same node.
+	 *
+	 * Node with 0 bytes memory doesn't need this expandsion.
+	 */
+	nd_start = start;
+	nd_end = end;
+	nd = &nodes[node];
+	if (nd->start != nd->end) {
+		if (nd_start > nd->start)
+			nd_start = nd->start;
+
+		if (nd_end < end)
+			nd_end = nd->end;
+	}
+
 	/* It is fine to add this area to the nodes data it will be used later*/
-	i = conflicting_memblks(start, end);
-	if (i < 0)
-		/* everything fine */;
-	else if (memblk_nodeid[i] == node) {
-		bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
-		                !test_bit(i, memblk_hotplug);
-
-		printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
-		       mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
-		       node_memblk_range[i].start, node_memblk_range[i].end);
-		if (mismatch) {
+	status = conflicting_memblks(node, start, end, nd_start, nd_end, &i);
+	if (status == ERR_OVERLAP) {
+		if (memblk_nodeid[i] == node) {
+			bool mismatch = !(ma->flags &
+					ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
+			                !test_bit(i, memblk_hotplug);
+
+			printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
+			       mismatch ? KERN_ERR : KERN_WARNING, pxm, start,
+			       end, node_memblk_range[i].start,
+			       node_memblk_range[i].end);
+			if (mismatch) {
+				bad_srat();
+				return;
+			}
+		} else {
+			printk(KERN_ERR
+			       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
+			       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
+			       node_memblk_range[i].start,
+			       node_memblk_range[i].end);
 			bad_srat();
 			return;
 		}
-	} else {
+	} else if (status == ERR_INTERLEAVE) {
 		printk(KERN_ERR
-		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
-		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
+		       "SRAT: Node %u: (%"PRIpaddr"-%"PRIpaddr") interleaves with node %u memblk (%"PRIpaddr"-%"PRIpaddr")\n",
+		       node, nd_start, nd_end, memblk_nodeid[i],
 		       node_memblk_range[i].start, node_memblk_range[i].end);
 		bad_srat();
 		return;
 	}
-	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
-		struct node *nd = &nodes[node];
 
-		if (!node_test_and_set(node, memory_nodes_parsed)) {
-			nd->start = start;
-			nd->end = end;
-		} else {
-			if (start < nd->start)
-				nd->start = start;
-			if (nd->end < end)
-				nd->end = end;
-		}
+	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
+		node_set(node, memory_nodes_parsed);
+		nd->start = nd_start;
+		nd->end = nd_end;
 	}
+
 	printk(KERN_INFO "SRAT: Node %u PXM %u %"PRIpaddr"-%"PRIpaddr"%s\n",
 	       node, pxm, start, end,
 	       ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE ? " (hotplug)" : "");
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v3 9/9] xen/x86: use INFO level for node's without memory log message
  2022-05-11  1:46 [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (7 preceding siblings ...)
  2022-05-11  1:46 ` [PATCH v3 8/9] xen/x86: add detection of memory interleaves for different nodes Wei Chen
@ 2022-05-11  1:46 ` Wei Chen
  2022-05-18 13:33   ` Jan Beulich
  8 siblings, 1 reply; 29+ messages in thread
From: Wei Chen @ 2022-05-11  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu

In previous code, Xen was using KERN_WARNING for log message
when Xen found a node without memory. Xen will print this
warning message, and said that this may be an BIOS Bug or
mis-configured hardware. But actually, this warning is bogus,
because in an NUMA setting, nodes can only have processors,
and with 0 bytes memory. So it is unreasonable to warn about
BIOS or hardware corruption based on the detection of node
with 0 bytes memory.

So in this patch, we remove the warning messages, but just
keep an info message to info users that there is one or more
nodes with 0 bytes memory in the system.

Signed-off-by: Wei Chen <wei.chen@arm.com>
---
v2 -> v3:
new commit.
---
 xen/arch/x86/srat.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 53835ae3eb..acaebad2a2 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -549,8 +549,7 @@ int __init acpi_scan_nodes(paddr_t start, paddr_t end)
 		uint64_t size = nodes[i].end - nodes[i].start;
 
 		if ( size == 0 )
-			printk(KERN_WARNING "SRAT: Node %u has no memory. "
-			       "BIOS Bug or mis-configured hardware?\n", i);
+			printk(KERN_INFO "SRAT: Node %u has no memory.\n", i);
 
 		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
 	}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 4/9] xen: introduce an arch helper for default dma zone status
  2022-05-11  1:46 ` [PATCH v3 4/9] xen: introduce an arch helper for default dma zone status Wei Chen
@ 2022-05-13  0:32   ` Stefano Stabellini
  2022-05-18 13:09   ` Jan Beulich
  1 sibling, 0 replies; 29+ messages in thread
From: Stefano Stabellini @ 2022-05-13  0:32 UTC (permalink / raw)
  To: Wei Chen
  Cc: xen-devel, nd, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Andrew Cooper,
	George Dunlap, Jan Beulich, Wei Liu, Roger Pau Monné,
	Jiamei Xie

On Wed, 11 May 2022, Wei Chen wrote:
> In current code, when Xen is running in a multiple nodes
> NUMA system, it will set dma_bitsize in end_boot_allocator
> to reserve some low address memory as DMA zone.
> 
> There are some x86 implications in the implementation.
> Because on x86, memory starts from 0. On a multiple-nodes
> NUMA system, if a single node contains the majority or all
> of the DMA memory, x86 prefers to give out memory from
> non-local allocations rather than exhausting the DMA memory
> ranges. Hence x86 uses dma_bitsize to set aside some largely
> arbitrary amount of memory for DMA zone. The allocations
> from DMA zone would happen only after exhausting all other
> nodes' memory.
> 
> But the implications are not shared across all architectures.
> For example, Arm cannot guarantee the availability of memory
> below a certain boundary for DMA limited-capability devices
> either. But currently, Arm doesn't need a reserved DMA zone
> in Xen. Because there is no DMA device in Xen. And for guests,
> Xen Arm only allows Dom0 to have DMA operations without IOMMU.
> Xen will try to allocate memory under 4GB or memory range that
> is limited by dma_bitsize for Dom0 in boot time. For DomU, even
> Xen can passthrough devices to DomU without IOMMU, but Xen Arm
> doesn't guarantee their DMA operations. So, Xen Arm doesn't
> need a reserved DMA zone to provide DMA memory for guests.
> 
> In this patch, we introduce an arch_want_default_dmazone helper
> for different architectures to determine whether they need to
> set dma_bitsize for DMA zone reservation or not.
> 
> At the same time, when x86 Xen is built with CONFIG_PV=n could
> probably leverage this new helper to actually not trigger DMA
> zone reservation.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Tested-by: Jiamei Xie <jiamei.xie@arm.com>

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>


> ---
> v2 -> v3:
> 1. Add Tb.
> 2. Rename arch_have_default_dmazone to arch_want_default_dmazone.
> v1 -> v2:
> 1. Extend the description of Arm's workaround for reserve DMA
>    allocations to avoid the same discussion every time.
> 2. Use a macro to define arch_have_default_dmazone, because
>    it's little hard to make x86 version to static inline.
>    Use a macro will also avoid add __init for this function.
> 3. Change arch_have_default_dmazone return value from
>    unsigned int to bool.
> 4. Un-addressed comment: make arch_have_default_dmazone
>    of x86 to be static inline. Because, if we move
>    arch_have_default_dmazone to x86/asm/numa.h, it depends
>    on nodemask.h to provide num_online_nodes. But nodemask.h
>    needs numa.h to provide MAX_NUMANODES. This will cause a
>    loop dependency. And this function can only be used in
>    end_boot_allocator, in Xen initialization. So I think,
>    compared to the changes introduced by inline, it doesn't
>    mean much.
> ---
>  xen/arch/arm/include/asm/numa.h | 1 +
>  xen/arch/x86/include/asm/numa.h | 1 +
>  xen/common/page_alloc.c         | 2 +-
>  3 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/include/asm/numa.h b/xen/arch/arm/include/asm/numa.h
> index 31a6de4e23..e4c4d89192 100644
> --- a/xen/arch/arm/include/asm/numa.h
> +++ b/xen/arch/arm/include/asm/numa.h
> @@ -24,6 +24,7 @@ extern mfn_t first_valid_mfn;
>  #define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
>  #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
>  #define __node_distance(a, b) (20)
> +#define arch_want_default_dmazone() (false)
>  
>  #endif /* __ARCH_ARM_NUMA_H */
>  /*
> diff --git a/xen/arch/x86/include/asm/numa.h b/xen/arch/x86/include/asm/numa.h
> index bada2c0bb9..5d8385f2e1 100644
> --- a/xen/arch/x86/include/asm/numa.h
> +++ b/xen/arch/x86/include/asm/numa.h
> @@ -74,6 +74,7 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
>  #define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
>  #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
>  				 NODE_DATA(nid)->node_spanned_pages)
> +#define arch_want_default_dmazone() (num_online_nodes() > 1)
>  
>  extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
>  
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 319029140f..b3bddc719b 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1889,7 +1889,7 @@ void __init end_boot_allocator(void)
>      }
>      nr_bootmem_regions = 0;
>  
> -    if ( !dma_bitsize && (num_online_nodes() > 1) )
> +    if ( !dma_bitsize && arch_want_default_dmazone() )
>          dma_bitsize = arch_get_dma_bitsize();
>  
>      printk("Domain heap initialised");
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
  2022-05-11  1:46 ` [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm Wei Chen
@ 2022-05-13  0:33   ` Stefano Stabellini
  2022-05-18 13:04   ` Jan Beulich
  1 sibling, 0 replies; 29+ messages in thread
From: Stefano Stabellini @ 2022-05-13  0:33 UTC (permalink / raw)
  To: Wei Chen
  Cc: xen-devel, nd, Stefano Stabellini, Julien Grall,
	Bertrand Marquis, Volodymyr Babchuk, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Jiamei Xie

On Wed, 11 May 2022, Wei Chen wrote:
> x86 is using compiler feature testing to decide EFI build
> enable or not. When EFI build is disabled, x86 will use an
> efi/stub.c file to replace efi/runtime.c for build objects.
> Following this idea, we introduce a stub file for Arm, but
> use CONFIG_ARM_EFI to decide EFI build enable or not.
> 
> And the most functions in x86 EFI stub.c can be reused for
> other architectures, like Arm. So we move them to common
> and keep the x86 specific function in x86/efi/stub.c.
> 
> To avoid the symbol link conflict error when linking common
> stub files to x86/efi. We add a regular file check in efi
> stub files' link script. Depends on this check we can bypass
> the link behaviors for existed stub files in x86/efi.
> 
> As there is no Arm specific EFI stub function for Arm in
> current stage, Arm still can use the existed symbol link
> method for EFI stub files.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Tested-by: Jiamei Xie <jiamei.xie@arm.com>

The ARM side looks OK to me, I'll let Jan comment on the x86 side.

> ---
> v2 -> v3:
> 1. Add existed file check, if a regular stub files,
>    the common/stub files' link will be ignored.
> 2. Keep stub.c in x86/efi to include common/efi/stub.c
> 3. Restore efi_compat_xxx stub functions to x86/efi.c.
>    Other architectures will not use efi_compat_xxx.
> 4. Remove ARM_EFI dependency from ARM_64.
> 5. Add comment for adding stub.o to EFIOBJ-y.
> 6. Merge patch#2 and patch#3 to one patch.
> v1 -> v2:
> 1. Drop the copy of stub.c from Arm EFI.
> 2. Share common codes of x86 EFI stub for other architectures.
> 3. Use CONFIG_ARM_EFI to replace CONFIG_EFI
> 4. Remove help text and make CONFIG_ARM_EFI invisible.
> 5. Merge one following patch:
>    xen/arm: introduce a stub file for non-EFI architectures
> 6. Use the common stub.c instead of creating new one.
> ---
>  xen/arch/arm/Kconfig         |  4 ++++
>  xen/arch/arm/Makefile        |  2 +-
>  xen/arch/arm/efi/Makefile    |  8 ++++++++
>  xen/arch/x86/efi/stub.c      | 32 +-------------------------------
>  xen/common/efi/efi-common.mk |  3 ++-
>  xen/common/efi/stub.c        | 32 ++++++++++++++++++++++++++++++++
>  6 files changed, 48 insertions(+), 33 deletions(-)
>  create mode 100644 xen/common/efi/stub.c
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index ecfa6822e4..8a16d43bd5 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -6,6 +6,7 @@ config ARM_64
>  	def_bool y
>  	depends on !ARM_32
>  	select 64BIT
> +	select ARM_EFI
>  	select HAS_FAST_MULTIPLY
>  
>  config ARM
> @@ -33,6 +34,9 @@ config ACPI
>  	  Advanced Configuration and Power Interface (ACPI) support for Xen is
>  	  an alternative to device tree on ARM64.
>  
> +config ARM_EFI
> +	bool
> +
>  config GICV3
>  	bool "GICv3 driver"
>  	depends on ARM_64 && !NEW_VGIC
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index 1d862351d1..bb7a6151c1 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -1,6 +1,5 @@
>  obj-$(CONFIG_ARM_32) += arm32/
>  obj-$(CONFIG_ARM_64) += arm64/
> -obj-$(CONFIG_ARM_64) += efi/
>  obj-$(CONFIG_ACPI) += acpi/
>  obj-$(CONFIG_HAS_PCI) += pci/
>  ifneq ($(CONFIG_NO_PLAT),y)
> @@ -20,6 +19,7 @@ obj-y += domain.o
>  obj-y += domain_build.init.o
>  obj-y += domctl.o
>  obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> +obj-y += efi/
>  obj-y += gic.o
>  obj-y += gic-v2.o
>  obj-$(CONFIG_GICV3) += gic-v3.o
> diff --git a/xen/arch/arm/efi/Makefile b/xen/arch/arm/efi/Makefile
> index 4313c39066..dffe72e589 100644
> --- a/xen/arch/arm/efi/Makefile
> +++ b/xen/arch/arm/efi/Makefile
> @@ -1,4 +1,12 @@
>  include $(srctree)/common/efi/efi-common.mk
>  
> +ifeq ($(CONFIG_ARM_EFI),y)
>  obj-y += $(EFIOBJ-y)
>  obj-$(CONFIG_ACPI) +=  efi-dom0.init.o
> +else
> +# Add stub.o to EFIOBJ-y to re-use the clean-files in
> +# efi-common.mk. Otherwise the link of stub.c in arm/efi
> +# will not be cleaned in "make clean".
> +EFIOBJ-y += stub.o
> +obj-y += stub.o
> +endif
> diff --git a/xen/arch/x86/efi/stub.c b/xen/arch/x86/efi/stub.c
> index 9984932626..f2365bc041 100644
> --- a/xen/arch/x86/efi/stub.c
> +++ b/xen/arch/x86/efi/stub.c
> @@ -1,7 +1,5 @@
>  #include <xen/efi.h>
> -#include <xen/errno.h>
>  #include <xen/init.h>
> -#include <xen/lib.h>
>  #include <asm/asm_defns.h>
>  #include <asm/efibind.h>
>  #include <asm/page.h>
> @@ -10,6 +8,7 @@
>  #include <efi/eficon.h>
>  #include <efi/efidevp.h>
>  #include <efi/efiapi.h>
> +#include "../../../common/efi/stub.c"
>  
>  /*
>   * Here we are in EFI stub. EFI calls are not supported due to lack
> @@ -45,11 +44,6 @@ void __init noreturn efi_multiboot2(EFI_HANDLE ImageHandle,
>      unreachable();
>  }
>  
> -bool efi_enabled(unsigned int feature)
> -{
> -    return false;
> -}
> -
>  void __init efi_init_memory(void) { }
>  
>  bool efi_boot_mem_unused(unsigned long *start, unsigned long *end)
> @@ -62,32 +56,8 @@ bool efi_boot_mem_unused(unsigned long *start, unsigned long *end)
>  
>  void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t l4e) { }
>  
> -bool efi_rs_using_pgtables(void)
> -{
> -    return false;
> -}
> -
> -unsigned long efi_get_time(void)
> -{
> -    BUG();
> -    return 0;
> -}
> -
> -void efi_halt_system(void) { }
> -void efi_reset_system(bool warm) { }
> -
> -int efi_get_info(uint32_t idx, union xenpf_efi_info *info)
> -{
> -    return -ENOSYS;
> -}
> -
>  int efi_compat_get_info(uint32_t idx, union compat_pf_efi_info *)
>      __attribute__((__alias__("efi_get_info")));
>  
> -int efi_runtime_call(struct xenpf_efi_runtime_call *op)
> -{
> -    return -ENOSYS;
> -}
> -
>  int efi_compat_runtime_call(struct compat_pf_efi_runtime_call *)
>      __attribute__((__alias__("efi_runtime_call")));
> diff --git a/xen/common/efi/efi-common.mk b/xen/common/efi/efi-common.mk
> index 4298ceaee7..3a5b9958b3 100644
> --- a/xen/common/efi/efi-common.mk
> +++ b/xen/common/efi/efi-common.mk
> @@ -9,7 +9,8 @@ CFLAGS-y += -iquote $(srcdir)
>  # e.g.: It transforms "dir/foo/bar" into successively
>  #       "dir foo bar", ".. .. ..", "../../.."
>  $(obj)/%.c: $(srctree)/common/efi/%.c FORCE
> -	$(Q)ln -nfs $(subst $(space),/,$(patsubst %,..,$(subst /, ,$(obj))))/source/common/efi/$(<F) $@
> +	$(Q)test -f $@ || \
> +	ln -nfs $(subst $(space),/,$(patsubst %,..,$(subst /, ,$(obj))))/source/common/efi/$(<F) $@
>  
>  clean-files += $(patsubst %.o, %.c, $(EFIOBJ-y:.init.o=.o) $(EFIOBJ-))
>  
> diff --git a/xen/common/efi/stub.c b/xen/common/efi/stub.c
> new file mode 100644
> index 0000000000..15694632c2
> --- /dev/null
> +++ b/xen/common/efi/stub.c
> @@ -0,0 +1,32 @@
> +#include <xen/efi.h>
> +#include <xen/errno.h>
> +#include <xen/lib.h>
> +
> +bool efi_enabled(unsigned int feature)
> +{
> +    return false;
> +}
> +
> +bool efi_rs_using_pgtables(void)
> +{
> +    return false;
> +}
> +
> +unsigned long efi_get_time(void)
> +{
> +    BUG();
> +    return 0;
> +}
> +
> +void efi_halt_system(void) { }
> +void efi_reset_system(bool warm) { }
> +
> +int efi_get_info(uint32_t idx, union xenpf_efi_info *info)
> +{
> +    return -ENOSYS;
> +}
> +
> +int efi_runtime_call(struct xenpf_efi_runtime_call *op)
> +{
> +    return -ENOSYS;
> +}
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart
  2022-05-11  1:46 ` [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart Wei Chen
@ 2022-05-16 17:14   ` Julien Grall
  2022-05-17  1:21     ` Wei Chen
  2022-06-17  3:27     ` Henry Wang
  0 siblings, 2 replies; 29+ messages in thread
From: Julien Grall @ 2022-05-16 17:14 UTC (permalink / raw)
  To: Wei Chen, xen-devel
  Cc: nd, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Jiamei Xie, Julien Grall

Hi,

On 11/05/2022 02:46, Wei Chen wrote:
> Current putn function that is using for early print
> only can print low 32-bit of AArch64 register. This
> will lose some important messages while debugging
> with early console. For example:
> (XEN) Bringing up CPU5
> - CPU 0000000100000100 booting -
> Will be truncated to
> (XEN) Bringing up CPU5
> - CPU 00000100 booting -
> 
> In this patch, we increased the print loops and shift
> bits to make putn print 64-bit number.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Tested-by: Jiamei Xie <jiamei.xie@arm.com>
> Acked-by: Julien Grall <jgrall@amazon.com>

I have committed this patch.

Patch #3 looks to be suitably acked but I am not sure whether it can be 
committed before #2. So I didn't commit it.

Please let me know if it can be.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart
  2022-05-16 17:14   ` Julien Grall
@ 2022-05-17  1:21     ` Wei Chen
  2022-06-17  3:27     ` Henry Wang
  1 sibling, 0 replies; 29+ messages in thread
From: Wei Chen @ 2022-05-17  1:21 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: nd, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Jiamei Xie, Julien Grall

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 2022年5月17日 1:15
> To: Wei Chen <Wei.Chen@arm.com>; xen-devel@lists.xenproject.org
> Cc: nd <nd@arm.com>; Stefano Stabellini <sstabellini@kernel.org>; Bertrand
> Marquis <Bertrand.Marquis@arm.com>; Volodymyr Babchuk
> <Volodymyr_Babchuk@epam.com>; Jiamei Xie <Jiamei.Xie@arm.com>; Julien
> Grall <jgrall@amazon.com>
> Subject: Re: [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from
> early uart
> 
> Hi,
> 
> On 11/05/2022 02:46, Wei Chen wrote:
> > Current putn function that is using for early print
> > only can print low 32-bit of AArch64 register. This
> > will lose some important messages while debugging
> > with early console. For example:
> > (XEN) Bringing up CPU5
> > - CPU 0000000100000100 booting -
> > Will be truncated to
> > (XEN) Bringing up CPU5
> > - CPU 00000100 booting -
> >
> > In this patch, we increased the print loops and shift
> > bits to make putn print 64-bit number.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > Tested-by: Jiamei Xie <jiamei.xie@arm.com>
> > Acked-by: Julien Grall <jgrall@amazon.com>
> 
> I have committed this patch.
> 
> Patch #3 looks to be suitably acked but I am not sure whether it can be
> committed before #2. So I didn't commit it.
> 

No, it depends on patch#2 to provide EFI stubs for Arm32, otherwise
Arm32 will be failed on building.

Cheers,
Wei Chen

> Please let me know if it can be.
> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
  2022-05-11  1:46 ` [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm Wei Chen
  2022-05-13  0:33   ` Stefano Stabellini
@ 2022-05-18 13:04   ` Jan Beulich
  2022-05-19  2:36     ` Wei Chen
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-05-18 13:04 UTC (permalink / raw)
  To: Wei Chen
  Cc: nd, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie, xen-devel

On 11.05.2022 03:46, Wei Chen wrote:
> x86 is using compiler feature testing to decide EFI build
> enable or not. When EFI build is disabled, x86 will use an
> efi/stub.c file to replace efi/runtime.c for build objects.
> Following this idea, we introduce a stub file for Arm, but
> use CONFIG_ARM_EFI to decide EFI build enable or not.
> 
> And the most functions in x86 EFI stub.c can be reused for
> other architectures, like Arm. So we move them to common
> and keep the x86 specific function in x86/efi/stub.c.
> 
> To avoid the symbol link conflict error when linking common
> stub files to x86/efi. We add a regular file check in efi
> stub files' link script. Depends on this check we can bypass
> the link behaviors for existed stub files in x86/efi.
> 
> As there is no Arm specific EFI stub function for Arm in
> current stage, Arm still can use the existed symbol link
> method for EFI stub files.

Wouldn't it be better to mandate that every arch has its stub.c,
and in the Arm one all you'd do (for now) is #include the common
one? (But see also below.)

> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -1,6 +1,5 @@
>  obj-$(CONFIG_ARM_32) += arm32/
>  obj-$(CONFIG_ARM_64) += arm64/
> -obj-$(CONFIG_ARM_64) += efi/
>  obj-$(CONFIG_ACPI) += acpi/
>  obj-$(CONFIG_HAS_PCI) += pci/
>  ifneq ($(CONFIG_NO_PLAT),y)
> @@ -20,6 +19,7 @@ obj-y += domain.o
>  obj-y += domain_build.init.o
>  obj-y += domctl.o
>  obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> +obj-y += efi/
>  obj-y += gic.o
>  obj-y += gic-v2.o
>  obj-$(CONFIG_GICV3) += gic-v3.o
> diff --git a/xen/arch/arm/efi/Makefile b/xen/arch/arm/efi/Makefile
> index 4313c39066..dffe72e589 100644
> --- a/xen/arch/arm/efi/Makefile
> +++ b/xen/arch/arm/efi/Makefile
> @@ -1,4 +1,12 @@
>  include $(srctree)/common/efi/efi-common.mk
>  
> +ifeq ($(CONFIG_ARM_EFI),y)
>  obj-y += $(EFIOBJ-y)
>  obj-$(CONFIG_ACPI) +=  efi-dom0.init.o
> +else
> +# Add stub.o to EFIOBJ-y to re-use the clean-files in
> +# efi-common.mk. Otherwise the link of stub.c in arm/efi
> +# will not be cleaned in "make clean".
> +EFIOBJ-y += stub.o
> +obj-y += stub.o
> +endif

I realize Stefano indicated he's happy with the Arm side, but I still
wonder: What use is the stub on Arm32? Even further - once you have a
config option (rather than x86'es build-time check plus x86'es dual-
purposing of all object files), why do you need a stub in the first
place? You ought to be able to deal with things via inline functions
and macros, I would think.

> --- a/xen/common/efi/efi-common.mk
> +++ b/xen/common/efi/efi-common.mk
> @@ -9,7 +9,8 @@ CFLAGS-y += -iquote $(srcdir)
>  # e.g.: It transforms "dir/foo/bar" into successively
>  #       "dir foo bar", ".. .. ..", "../../.."
>  $(obj)/%.c: $(srctree)/common/efi/%.c FORCE
> -	$(Q)ln -nfs $(subst $(space),/,$(patsubst %,..,$(subst /, ,$(obj))))/source/common/efi/$(<F) $@
> +	$(Q)test -f $@ || \
> +	ln -nfs $(subst $(space),/,$(patsubst %,..,$(subst /, ,$(obj))))/source/common/efi/$(<F) $@

Please can you indent the "ln" to match "test", such that it's easily
visible (without paying attention to line continuation characters)
that these two lines are a single command?

Jan



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 4/9] xen: introduce an arch helper for default dma zone status
  2022-05-11  1:46 ` [PATCH v3 4/9] xen: introduce an arch helper for default dma zone status Wei Chen
  2022-05-13  0:32   ` Stefano Stabellini
@ 2022-05-18 13:09   ` Jan Beulich
  1 sibling, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2022-05-18 13:09 UTC (permalink / raw)
  To: Wei Chen
  Cc: nd, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Wei Liu,
	Roger Pau Monné,
	Jiamei Xie, xen-devel

On 11.05.2022 03:46, Wei Chen wrote:
> In current code, when Xen is running in a multiple nodes
> NUMA system, it will set dma_bitsize in end_boot_allocator
> to reserve some low address memory as DMA zone.
> 
> There are some x86 implications in the implementation.
> Because on x86, memory starts from 0. On a multiple-nodes
> NUMA system, if a single node contains the majority or all
> of the DMA memory, x86 prefers to give out memory from
> non-local allocations rather than exhausting the DMA memory
> ranges. Hence x86 uses dma_bitsize to set aside some largely
> arbitrary amount of memory for DMA zone. The allocations
> from DMA zone would happen only after exhausting all other
> nodes' memory.
> 
> But the implications are not shared across all architectures.
> For example, Arm cannot guarantee the availability of memory
> below a certain boundary for DMA limited-capability devices
> either. But currently, Arm doesn't need a reserved DMA zone
> in Xen. Because there is no DMA device in Xen. And for guests,
> Xen Arm only allows Dom0 to have DMA operations without IOMMU.
> Xen will try to allocate memory under 4GB or memory range that
> is limited by dma_bitsize for Dom0 in boot time. For DomU, even
> Xen can passthrough devices to DomU without IOMMU, but Xen Arm
> doesn't guarantee their DMA operations. So, Xen Arm doesn't
> need a reserved DMA zone to provide DMA memory for guests.
> 
> In this patch, we introduce an arch_want_default_dmazone helper
> for different architectures to determine whether they need to
> set dma_bitsize for DMA zone reservation or not.
> 
> At the same time, when x86 Xen is built with CONFIG_PV=n could
> probably leverage this new helper to actually not trigger DMA
> zone reservation.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Tested-by: Jiamei Xie <jiamei.xie@arm.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 7/9] xen/x86: use paddr_t for addresses in NUMA node structure
  2022-05-11  1:46 ` [PATCH v3 7/9] xen/x86: use paddr_t for addresses in NUMA node structure Wei Chen
@ 2022-05-18 13:13   ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2022-05-18 13:13 UTC (permalink / raw)
  To: Wei Chen
  Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, Jiamei Xie, xen-devel

On 11.05.2022 03:46, Wei Chen wrote:
> NUMA node structure "struct node" is using u64 as node memory
> range. In order to make other architectures can reuse this
> NUMA node relative code, we replace the u64 to paddr_t. And
> use pfn_to_paddr and paddr_to_pfn to replace explicit shift
> operations. The relate PRIx64 in print messages have been
> replaced by PRIpaddr at the same time. And some being-phased-out
> types like u64 in the lines we have touched also have been
> converted to uint64_t or unsigned long.
> 
> Tested-by: Jiamei Xie <jiamei.xie@arm.com>
> Signed-off-by: Wei Chen <wei.chen@arm.com>

Acked-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 8/9] xen/x86: add detection of memory interleaves for different nodes
  2022-05-11  1:46 ` [PATCH v3 8/9] xen/x86: add detection of memory interleaves for different nodes Wei Chen
@ 2022-05-18 13:30   ` Jan Beulich
  2022-05-19  3:37     ` Wei Chen
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-05-18 13:30 UTC (permalink / raw)
  To: Wei Chen
  Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, Jiamei Xie, xen-devel

On 11.05.2022 03:46, Wei Chen wrote:
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -42,6 +42,12 @@ static struct node node_memblk_range[NR_NODE_MEMBLKS];
>  static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
>  static __initdata DECLARE_BITMAP(memblk_hotplug, NR_NODE_MEMBLKS);
>  
> +enum conflicts {
> +	NO_CONFLICT = 0,

No need for the "= 0".

> +	ERR_OVERLAP,

While this at least can be an error (the self-overlap case is merely
warned about), ...

> +	ERR_INTERLEAVE,

... I don't think this is, and hence I'd recommend to drop "ERR_".

> @@ -119,20 +125,43 @@ int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node)
>  	return 0;
>  }
>  
> -static __init int conflicting_memblks(paddr_t start, paddr_t end)
> +static enum conflicts __init
> +conflicting_memblks(nodeid_t nid, paddr_t start, paddr_t end,
> +		    paddr_t nd_start, paddr_t nd_end, int *mblkid)

Why "int"? Can the value passed back be negative?

>  {
>  	int i;
>  
> +	/*
> +	 * Scan all recorded nodes' memory blocks to check conflicts:
> +	 * Overlap or interleave.
> +	 */
>  	for (i = 0; i < num_node_memblks; i++) {
>  		struct node *nd = &node_memblk_range[i];
> +		*mblkid = i;

Style: Please maintain a blank line between declaration(s) and
statement(s).

> @@ -310,42 +342,67 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
>  		bad_srat();
>  		return;
>  	}
> +
> +	/*
> +	 * For the node that already has some memory blocks, we will
> +	 * expand the node memory range temporarily to check memory
> +	 * interleaves with other nodes. We will not use this node
> +	 * temp memory range to check overlaps, because it will mask
> +	 * the overlaps in same node.
> +	 *
> +	 * Node with 0 bytes memory doesn't need this expandsion.
> +	 */
> +	nd_start = start;
> +	nd_end = end;
> +	nd = &nodes[node];
> +	if (nd->start != nd->end) {
> +		if (nd_start > nd->start)
> +			nd_start = nd->start;
> +
> +		if (nd_end < end)

Did you mean nd->end here on the right side of < ? By intentionally
not adding "default:" in the body, you then also allow the compiler
to point out that addition of a new enumerator also needs handling
here.

> +			nd_end = nd->end;
> +	}
> +
>  	/* It is fine to add this area to the nodes data it will be used later*/
> -	i = conflicting_memblks(start, end);
> -	if (i < 0)
> -		/* everything fine */;
> -	else if (memblk_nodeid[i] == node) {
> -		bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> -		                !test_bit(i, memblk_hotplug);
> -
> -		printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> -		       mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
> -		       node_memblk_range[i].start, node_memblk_range[i].end);
> -		if (mismatch) {
> +	status = conflicting_memblks(node, start, end, nd_start, nd_end, &i);
> +	if (status == ERR_OVERLAP) {

Please use switch(status) when checking enumerated values.

> +		if (memblk_nodeid[i] == node) {
> +			bool mismatch = !(ma->flags &
> +					ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> +			                !test_bit(i, memblk_hotplug);

Style: The middle line wants indenting by two more characters.

> +
> +			printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> +			       mismatch ? KERN_ERR : KERN_WARNING, pxm, start,
> +			       end, node_memblk_range[i].start,
> +			       node_memblk_range[i].end);
> +			if (mismatch) {
> +				bad_srat();
> +				return;
> +			}
> +		} else {
> +			printk(KERN_ERR
> +			       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> +			       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> +			       node_memblk_range[i].start,
> +			       node_memblk_range[i].end);
>  			bad_srat();
>  			return;
>  		}
> -	} else {
> +	} else if (status == ERR_INTERLEAVE) {
>  		printk(KERN_ERR
> -		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> -		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> +		       "SRAT: Node %u: (%"PRIpaddr"-%"PRIpaddr") interleaves with node %u memblk (%"PRIpaddr"-%"PRIpaddr")\n",
> +		       node, nd_start, nd_end, memblk_nodeid[i],

Please log pxm (not node) here just like is done in the overlap case.
The remote node ID will then require converting to PXM, of course.

Jan



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 9/9] xen/x86: use INFO level for node's without memory log message
  2022-05-11  1:46 ` [PATCH v3 9/9] xen/x86: use INFO level for node's without memory log message Wei Chen
@ 2022-05-18 13:33   ` Jan Beulich
  2022-05-19  2:37     ` Wei Chen
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-05-18 13:33 UTC (permalink / raw)
  To: Wei Chen; +Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

On 11.05.2022 03:46, Wei Chen wrote:
> In previous code, Xen was using KERN_WARNING for log message
> when Xen found a node without memory. Xen will print this
> warning message, and said that this may be an BIOS Bug or
> mis-configured hardware. But actually, this warning is bogus,
> because in an NUMA setting, nodes can only have processors,
> and with 0 bytes memory. So it is unreasonable to warn about
> BIOS or hardware corruption based on the detection of node
> with 0 bytes memory.
> 
> So in this patch, we remove the warning messages, but just
> keep an info message to info users that there is one or more
> nodes with 0 bytes memory in the system.
> 
> Signed-off-by: Wei Chen <wei.chen@arm.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
preferably with ...

> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -549,8 +549,7 @@ int __init acpi_scan_nodes(paddr_t start, paddr_t end)
>  		uint64_t size = nodes[i].end - nodes[i].start;
>  
>  		if ( size == 0 )
> -			printk(KERN_WARNING "SRAT: Node %u has no memory. "
> -			       "BIOS Bug or mis-configured hardware?\n", i);
> +			printk(KERN_INFO "SRAT: Node %u has no memory.\n", i);

... the full stop also dropped (and maybe the upper-case N converted to
lower-case).

Jan



^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
  2022-05-18 13:04   ` Jan Beulich
@ 2022-05-19  2:36     ` Wei Chen
  2022-05-19  6:11       ` Jan Beulich
                         ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Wei Chen @ 2022-05-19  2:36 UTC (permalink / raw)
  To: Jan Beulich, Stefano Stabellini, Julien Grall, Bertrand Marquis
  Cc: nd, Volodymyr Babchuk, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie, xen-devel

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年5月18日 21:05
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Stefano Stabellini <sstabellini@kernel.org>; Julien
> Grall <julien@xen.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>; Wei
> Liu <wl@xen.org>; Jiamei Xie <Jiamei.Xie@arm.com>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
> 
> On 11.05.2022 03:46, Wei Chen wrote:
> > x86 is using compiler feature testing to decide EFI build
> > enable or not. When EFI build is disabled, x86 will use an
> > efi/stub.c file to replace efi/runtime.c for build objects.
> > Following this idea, we introduce a stub file for Arm, but
> > use CONFIG_ARM_EFI to decide EFI build enable or not.
> >
> > And the most functions in x86 EFI stub.c can be reused for
> > other architectures, like Arm. So we move them to common
> > and keep the x86 specific function in x86/efi/stub.c.
> >
> > To avoid the symbol link conflict error when linking common
> > stub files to x86/efi. We add a regular file check in efi
> > stub files' link script. Depends on this check we can bypass
> > the link behaviors for existed stub files in x86/efi.
> >
> > As there is no Arm specific EFI stub function for Arm in
> > current stage, Arm still can use the existed symbol link
> > method for EFI stub files.
> 
> Wouldn't it be better to mandate that every arch has its stub.c,
> and in the Arm one all you'd do (for now) is #include the common
> one? (But see also below.)
>

Personally, I don't like to include a C file into another C file.
But I am OK as long as the Arm maintainers agree.
@Stefano Stabellini @Bertrand Marquis @Julien Grall

> > --- a/xen/arch/arm/Makefile
> > +++ b/xen/arch/arm/Makefile
> > @@ -1,6 +1,5 @@
> >  obj-$(CONFIG_ARM_32) += arm32/
> >  obj-$(CONFIG_ARM_64) += arm64/
> > -obj-$(CONFIG_ARM_64) += efi/
> >  obj-$(CONFIG_ACPI) += acpi/
> >  obj-$(CONFIG_HAS_PCI) += pci/
> >  ifneq ($(CONFIG_NO_PLAT),y)
> > @@ -20,6 +19,7 @@ obj-y += domain.o
> >  obj-y += domain_build.init.o
> >  obj-y += domctl.o
> >  obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> > +obj-y += efi/
> >  obj-y += gic.o
> >  obj-y += gic-v2.o
> >  obj-$(CONFIG_GICV3) += gic-v3.o
> > diff --git a/xen/arch/arm/efi/Makefile b/xen/arch/arm/efi/Makefile
> > index 4313c39066..dffe72e589 100644
> > --- a/xen/arch/arm/efi/Makefile
> > +++ b/xen/arch/arm/efi/Makefile
> > @@ -1,4 +1,12 @@
> >  include $(srctree)/common/efi/efi-common.mk
> >
> > +ifeq ($(CONFIG_ARM_EFI),y)
> >  obj-y += $(EFIOBJ-y)
> >  obj-$(CONFIG_ACPI) +=  efi-dom0.init.o
> > +else
> > +# Add stub.o to EFIOBJ-y to re-use the clean-files in
> > +# efi-common.mk. Otherwise the link of stub.c in arm/efi
> > +# will not be cleaned in "make clean".
> > +EFIOBJ-y += stub.o
> > +obj-y += stub.o
> > +endif
> 
> I realize Stefano indicated he's happy with the Arm side, but I still
> wonder: What use is the stub on Arm32? Even further - once you have a
> config option (rather than x86'es build-time check plus x86'es dual-
> purposing of all object files), why do you need a stub in the first
> place? You ought to be able to deal with things via inline functions
> and macros, I would think.
> 

We will use efi_enabled() on some common codes of Arm. In the last
version, I had used static inline function, but that will need an
CONFIG_EFI in xen/efi.h to gate the definitions of EFI functions,
otherwise we just can implement the efi_enabled in non-static-inline
way. Or use another name to wrapper efi_enabled. (patch#20, 21)
But as x86 has its own way to decide EFI build or not, the CONFIG_EFI
has been rejected. In this case, we use CONFIG_ARM_EFI for Arm itself.

For CONFIG_ARM_EFI, it's impossible to be used in xen/efi.h to gate
definitions. So if I want to use macros or static-inline functions,
I need to use #ifdef CONFIG_ARM_EFI in everywhere to gate xen/efi.h.
Or use another header file to warpper xen/efi.h.

> > --- a/xen/common/efi/efi-common.mk
> > +++ b/xen/common/efi/efi-common.mk
> > @@ -9,7 +9,8 @@ CFLAGS-y += -iquote $(srcdir)
> >  # e.g.: It transforms "dir/foo/bar" into successively
> >  #       "dir foo bar", ".. .. ..", "../../.."
> >  $(obj)/%.c: $(srctree)/common/efi/%.c FORCE
> > -	$(Q)ln -nfs $(subst $(space),/,$(patsubst %,..,$(subst
> /, ,$(obj))))/source/common/efi/$(<F) $@
> > +	$(Q)test -f $@ || \
> > +	ln -nfs $(subst $(space),/,$(patsubst %,..,$(subst
> /, ,$(obj))))/source/common/efi/$(<F) $@
> 
> Please can you indent the "ln" to match "test", such that it's easily
> visible (without paying attention to line continuation characters)
> that these two lines are a single command?
> 

Yeah, of course, I will do it.

> Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v3 9/9] xen/x86: use INFO level for node's without memory log message
  2022-05-18 13:33   ` Jan Beulich
@ 2022-05-19  2:37     ` Wei Chen
  0 siblings, 0 replies; 29+ messages in thread
From: Wei Chen @ 2022-05-19  2:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, xen-devel

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年5月18日 21:34
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v3 9/9] xen/x86: use INFO level for node's without
> memory log message
> 
> On 11.05.2022 03:46, Wei Chen wrote:
> > In previous code, Xen was using KERN_WARNING for log message
> > when Xen found a node without memory. Xen will print this
> > warning message, and said that this may be an BIOS Bug or
> > mis-configured hardware. But actually, this warning is bogus,
> > because in an NUMA setting, nodes can only have processors,
> > and with 0 bytes memory. So it is unreasonable to warn about
> > BIOS or hardware corruption based on the detection of node
> > with 0 bytes memory.
> >
> > So in this patch, we remove the warning messages, but just
> > keep an info message to info users that there is one or more
> > nodes with 0 bytes memory in the system.
> >
> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> preferably with ...
> 
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -549,8 +549,7 @@ int __init acpi_scan_nodes(paddr_t start, paddr_t
> end)
> >  		uint64_t size = nodes[i].end - nodes[i].start;
> >
> >  		if ( size == 0 )
> > -			printk(KERN_WARNING "SRAT: Node %u has no memory. "
> > -			       "BIOS Bug or mis-configured hardware?\n", i);
> > +			printk(KERN_INFO "SRAT: Node %u has no memory.\n", i);
> 
> ... the full stop also dropped (and maybe the upper-case N converted to
> lower-case).
> 

Ok, I will do it in next version.

> Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v3 8/9] xen/x86: add detection of memory interleaves for different nodes
  2022-05-18 13:30   ` Jan Beulich
@ 2022-05-19  3:37     ` Wei Chen
  2022-05-19  6:17       ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Wei Chen @ 2022-05-19  3:37 UTC (permalink / raw)
  To: Jan Beulich
  Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, Jiamei Xie, xen-devel

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年5月18日 21:31
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; Jiamei Xie
> <Jiamei.Xie@arm.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v3 8/9] xen/x86: add detection of memory interleaves
> for different nodes
> 
> On 11.05.2022 03:46, Wei Chen wrote:
> > --- a/xen/arch/x86/srat.c
> > +++ b/xen/arch/x86/srat.c
> > @@ -42,6 +42,12 @@ static struct node node_memblk_range[NR_NODE_MEMBLKS];
> >  static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
> >  static __initdata DECLARE_BITMAP(memblk_hotplug, NR_NODE_MEMBLKS);
> >
> > +enum conflicts {
> > +	NO_CONFLICT = 0,
> 
> No need for the "= 0".
> 

Ack.

> > +	ERR_OVERLAP,
> 
> While this at least can be an error (the self-overlap case is merely
> warned about), ...
> 
> > +	ERR_INTERLEAVE,
> 
> ... I don't think this is, and hence I'd recommend to drop "ERR_".
> 

Oh, yes. I all drop it for all above enumerations.

> > @@ -119,20 +125,43 @@ int valid_numa_range(paddr_t start, paddr_t end,
> nodeid_t node)
> >  	return 0;
> >  }
> >
> > -static __init int conflicting_memblks(paddr_t start, paddr_t end)
> > +static enum conflicts __init
> > +conflicting_memblks(nodeid_t nid, paddr_t start, paddr_t end,
> > +		    paddr_t nd_start, paddr_t nd_end, int *mblkid)
> 
> Why "int"? Can the value passed back be negative?
> 

The caller "acpi_numa_memory_affinity_init" defines int for node memory
block id, and as a return value at the same time. So I haven't changed it.
As we don't use this "int" for return value any more, I agree, it will
never be negative, I would fix it.

> >  {
> >  	int i;
> >
> > +	/*
> > +	 * Scan all recorded nodes' memory blocks to check conflicts:
> > +	 * Overlap or interleave.
> > +	 */
> >  	for (i = 0; i < num_node_memblks; i++) {
> >  		struct node *nd = &node_memblk_range[i];
> > +		*mblkid = i;
> 
> Style: Please maintain a blank line between declaration(s) and
> statement(s).
> 

Ok.

> > @@ -310,42 +342,67 @@ acpi_numa_memory_affinity_init(const struct
> acpi_srat_mem_affinity *ma)
> >  		bad_srat();
> >  		return;
> >  	}
> > +
> > +	/*
> > +	 * For the node that already has some memory blocks, we will
> > +	 * expand the node memory range temporarily to check memory
> > +	 * interleaves with other nodes. We will not use this node
> > +	 * temp memory range to check overlaps, because it will mask
> > +	 * the overlaps in same node.
> > +	 *
> > +	 * Node with 0 bytes memory doesn't need this expandsion.
> > +	 */
> > +	nd_start = start;
> > +	nd_end = end;
> > +	nd = &nodes[node];
> > +	if (nd->start != nd->end) {
> > +		if (nd_start > nd->start)
> > +			nd_start = nd->start;
> > +
> > +		if (nd_end < end)
> 
> Did you mean nd->end here on the right side of < ? By intentionally

Oh! thanks for pointing out this one! Yes, right side should be nd->end.

> not adding "default:" in the body, you then also allow the compiler
> to point out that addition of a new enumerator also needs handling
> here.
> 

Did you mean, we need to add if ... else ... in this block? If yes,
is it ok to update this block like:
	if (nd->start != nd->end) {
		nd_start = min(nd_start, nd->start);
		nd_end = max(nd_end, nd->end);
	}
?

> > +			nd_end = nd->end;
> > +	}
> > +
> >  	/* It is fine to add this area to the nodes data it will be used
> later*/
> > -	i = conflicting_memblks(start, end);
> > -	if (i < 0)
> > -		/* everything fine */;
> > -	else if (memblk_nodeid[i] == node) {
> > -		bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> > -		                !test_bit(i, memblk_hotplug);
> > -
> > -		printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with
> itself (%"PRIpaddr"-%"PRIpaddr")\n",
> > -		       mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
> > -		       node_memblk_range[i].start, node_memblk_range[i].end);
> > -		if (mismatch) {
> > +	status = conflicting_memblks(node, start, end, nd_start, nd_end, &i);
> > +	if (status == ERR_OVERLAP) {
> 
> Please use switch(status) when checking enumerated values.
> 

Ok, I will do it.

> > +		if (memblk_nodeid[i] == node) {
> > +			bool mismatch = !(ma->flags &
> > +					ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> > +			                !test_bit(i, memblk_hotplug);
> 
> Style: The middle line wants indenting by two more characters.
> 

Yes, I will do it.

> > +
> > +			printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr")
> overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> > +			       mismatch ? KERN_ERR : KERN_WARNING, pxm, start,
> > +			       end, node_memblk_range[i].start,
> > +			       node_memblk_range[i].end);
> > +			if (mismatch) {
> > +				bad_srat();
> > +				return;
> > +			}
> > +		} else {
> > +			printk(KERN_ERR
> > +			       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps
> with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> > +			       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> > +			       node_memblk_range[i].start,
> > +			       node_memblk_range[i].end);
> >  			bad_srat();
> >  			return;
> >  		}
> > -	} else {
> > +	} else if (status == ERR_INTERLEAVE) {
> >  		printk(KERN_ERR
> > -		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with
> PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> > -		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> > +		       "SRAT: Node %u: (%"PRIpaddr"-%"PRIpaddr") interleaves
> with node %u memblk (%"PRIpaddr"-%"PRIpaddr")\n",
> > +		       node, nd_start, nd_end, memblk_nodeid[i],
> 
> Please log pxm (not node) here just like is done in the overlap case.
> The remote node ID will then require converting to PXM, of course.
> 

Ok, will use PXM here. But I have question for upcoming changes, if we
move this part of code to common. As device tree NUMA doesn't have
PXM concept (even I can use a fake node_to_pxm to do 1:1 mapping), so
can we still use PXM here?

> Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
  2022-05-19  2:36     ` Wei Chen
@ 2022-05-19  6:11       ` Jan Beulich
  2022-05-20  9:08       ` Bertrand Marquis
  2022-05-20  9:46       ` Julien Grall
  2 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2022-05-19  6:11 UTC (permalink / raw)
  To: Wei Chen
  Cc: nd, Volodymyr Babchuk, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie, xen-devel, Stefano Stabellini, Julien Grall,
	Bertrand Marquis

On 19.05.2022 04:36, Wei Chen wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年5月18日 21:05
>>
>> On 11.05.2022 03:46, Wei Chen wrote:
>>> x86 is using compiler feature testing to decide EFI build
>>> enable or not. When EFI build is disabled, x86 will use an
>>> efi/stub.c file to replace efi/runtime.c for build objects.
>>> Following this idea, we introduce a stub file for Arm, but
>>> use CONFIG_ARM_EFI to decide EFI build enable or not.
>>>
>>> And the most functions in x86 EFI stub.c can be reused for
>>> other architectures, like Arm. So we move them to common
>>> and keep the x86 specific function in x86/efi/stub.c.
>>>
>>> To avoid the symbol link conflict error when linking common
>>> stub files to x86/efi. We add a regular file check in efi
>>> stub files' link script. Depends on this check we can bypass
>>> the link behaviors for existed stub files in x86/efi.
>>>
>>> As there is no Arm specific EFI stub function for Arm in
>>> current stage, Arm still can use the existed symbol link
>>> method for EFI stub files.
>>
>> Wouldn't it be better to mandate that every arch has its stub.c,
>> and in the Arm one all you'd do (for now) is #include the common
>> one? (But see also below.)
>>
> 
> Personally, I don't like to include a C file into another C file.
> But I am OK as long as the Arm maintainers agree.
> @Stefano Stabellini @Bertrand Marquis @Julien Grall

Well - an alternative is to follow the boot.c model: Have a per-arch
stub.h which the common stub.c includes.

>>> --- a/xen/arch/arm/Makefile
>>> +++ b/xen/arch/arm/Makefile
>>> @@ -1,6 +1,5 @@
>>>  obj-$(CONFIG_ARM_32) += arm32/
>>>  obj-$(CONFIG_ARM_64) += arm64/
>>> -obj-$(CONFIG_ARM_64) += efi/
>>>  obj-$(CONFIG_ACPI) += acpi/
>>>  obj-$(CONFIG_HAS_PCI) += pci/
>>>  ifneq ($(CONFIG_NO_PLAT),y)
>>> @@ -20,6 +19,7 @@ obj-y += domain.o
>>>  obj-y += domain_build.init.o
>>>  obj-y += domctl.o
>>>  obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
>>> +obj-y += efi/
>>>  obj-y += gic.o
>>>  obj-y += gic-v2.o
>>>  obj-$(CONFIG_GICV3) += gic-v3.o
>>> diff --git a/xen/arch/arm/efi/Makefile b/xen/arch/arm/efi/Makefile
>>> index 4313c39066..dffe72e589 100644
>>> --- a/xen/arch/arm/efi/Makefile
>>> +++ b/xen/arch/arm/efi/Makefile
>>> @@ -1,4 +1,12 @@
>>>  include $(srctree)/common/efi/efi-common.mk
>>>
>>> +ifeq ($(CONFIG_ARM_EFI),y)
>>>  obj-y += $(EFIOBJ-y)
>>>  obj-$(CONFIG_ACPI) +=  efi-dom0.init.o
>>> +else
>>> +# Add stub.o to EFIOBJ-y to re-use the clean-files in
>>> +# efi-common.mk. Otherwise the link of stub.c in arm/efi
>>> +# will not be cleaned in "make clean".
>>> +EFIOBJ-y += stub.o
>>> +obj-y += stub.o
>>> +endif
>>
>> I realize Stefano indicated he's happy with the Arm side, but I still
>> wonder: What use is the stub on Arm32? Even further - once you have a
>> config option (rather than x86'es build-time check plus x86'es dual-
>> purposing of all object files), why do you need a stub in the first
>> place? You ought to be able to deal with things via inline functions
>> and macros, I would think.
>>
> 
> We will use efi_enabled() on some common codes of Arm. In the last
> version, I had used static inline function, but that will need an
> CONFIG_EFI in xen/efi.h to gate the definitions of EFI functions,
> otherwise we just can implement the efi_enabled in non-static-inline
> way. Or use another name to wrapper efi_enabled. (patch#20, 21)
> But as x86 has its own way to decide EFI build or not, the CONFIG_EFI
> has been rejected. In this case, we use CONFIG_ARM_EFI for Arm itself.
> 
> For CONFIG_ARM_EFI, it's impossible to be used in xen/efi.h to gate
> definitions. So if I want to use macros or static-inline functions,
> I need to use #ifdef CONFIG_ARM_EFI in everywhere to gate xen/efi.h.
> Or use another header file to warpper xen/efi.h.

Actually I wouldn't mind CONFIG_EFI - eventually we may want to make
this a use-visible option even on x86. Sooner or later (I hope) we're
going to bump the tool chain base line requirements, at which point
the build time check on x86 can at least partly go away. Plus there
we support EFI in two different ways, and CONFIG_EFI would hence have
a meaning even when the build time check fails. The one thing that I
wouldn't view as reasonable is a prompt-less EFI option, which x86
would (not entirely correctly) select unconditionally.

Jan



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 8/9] xen/x86: add detection of memory interleaves for different nodes
  2022-05-19  3:37     ` Wei Chen
@ 2022-05-19  6:17       ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2022-05-19  6:17 UTC (permalink / raw)
  To: Wei Chen
  Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, Jiamei Xie, xen-devel

On 19.05.2022 05:37, Wei Chen wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年5月18日 21:31
>>
>> On 11.05.2022 03:46, Wei Chen wrote:
>>> @@ -310,42 +342,67 @@ acpi_numa_memory_affinity_init(const struct
>> acpi_srat_mem_affinity *ma)
>>>  		bad_srat();
>>>  		return;
>>>  	}
>>> +
>>> +	/*
>>> +	 * For the node that already has some memory blocks, we will
>>> +	 * expand the node memory range temporarily to check memory
>>> +	 * interleaves with other nodes. We will not use this node
>>> +	 * temp memory range to check overlaps, because it will mask
>>> +	 * the overlaps in same node.
>>> +	 *
>>> +	 * Node with 0 bytes memory doesn't need this expandsion.
>>> +	 */
>>> +	nd_start = start;
>>> +	nd_end = end;
>>> +	nd = &nodes[node];
>>> +	if (nd->start != nd->end) {
>>> +		if (nd_start > nd->start)
>>> +			nd_start = nd->start;
>>> +
>>> +		if (nd_end < end)
>>
>> Did you mean nd->end here on the right side of < ? By intentionally
> 
> Oh! thanks for pointing out this one! Yes, right side should be nd->end.
> 
>> not adding "default:" in the body, you then also allow the compiler
>> to point out that addition of a new enumerator also needs handling
>> here.
>>
> 
> Did you mean, we need to add if ... else ... in this block? If yes,
> is it ok to update this block like:
> 	if (nd->start != nd->end) {
> 		nd_start = min(nd_start, nd->start);
> 		nd_end = max(nd_end, nd->end);
> 	}
> ?

No. I attached this part about "default:" late in the process of writing
the reply, and I did put it in the wrong spot. I'm sorry. It really was
meant to go ...

>>> +			nd_end = nd->end;
>>> +	}
>>> +
>>>  	/* It is fine to add this area to the nodes data it will be used
>> later*/
>>> -	i = conflicting_memblks(start, end);
>>> -	if (i < 0)
>>> -		/* everything fine */;
>>> -	else if (memblk_nodeid[i] == node) {
>>> -		bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
>>> -		                !test_bit(i, memblk_hotplug);
>>> -
>>> -		printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with
>> itself (%"PRIpaddr"-%"PRIpaddr")\n",
>>> -		       mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
>>> -		       node_memblk_range[i].start, node_memblk_range[i].end);
>>> -		if (mismatch) {
>>> +	status = conflicting_memblks(node, start, end, nd_start, nd_end, &i);
>>> +	if (status == ERR_OVERLAP) {
>>
>> Please use switch(status) when checking enumerated values.
>>
> 
> Ok, I will do it.

... here, explaining the request to use switch().

>>> +			printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr")
>> overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
>>> +			       mismatch ? KERN_ERR : KERN_WARNING, pxm, start,
>>> +			       end, node_memblk_range[i].start,
>>> +			       node_memblk_range[i].end);
>>> +			if (mismatch) {
>>> +				bad_srat();
>>> +				return;
>>> +			}
>>> +		} else {
>>> +			printk(KERN_ERR
>>> +			       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps
>> with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
>>> +			       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
>>> +			       node_memblk_range[i].start,
>>> +			       node_memblk_range[i].end);
>>>  			bad_srat();
>>>  			return;
>>>  		}
>>> -	} else {
>>> +	} else if (status == ERR_INTERLEAVE) {
>>>  		printk(KERN_ERR
>>> -		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with
>> PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
>>> -		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
>>> +		       "SRAT: Node %u: (%"PRIpaddr"-%"PRIpaddr") interleaves
>> with node %u memblk (%"PRIpaddr"-%"PRIpaddr")\n",
>>> +		       node, nd_start, nd_end, memblk_nodeid[i],
>>
>> Please log pxm (not node) here just like is done in the overlap case.
>> The remote node ID will then require converting to PXM, of course.
>>
> 
> Ok, will use PXM here. But I have question for upcoming changes, if we
> move this part of code to common. As device tree NUMA doesn't have
> PXM concept (even I can use a fake node_to_pxm to do 1:1 mapping), so
> can we still use PXM here?

This will want properly abstracting once made common, yes. What the correct
model on Arm/DT is I can't really tell. But my (earlier voiced) request
remains: What is logged should by referring the firmware provided values,
not Xen-internal ones. Otherwise someone reading the log cannot easily
know / derive what's wrong where.

Jan



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
  2022-05-19  2:36     ` Wei Chen
  2022-05-19  6:11       ` Jan Beulich
@ 2022-05-20  9:08       ` Bertrand Marquis
  2022-05-20  9:46       ` Julien Grall
  2 siblings, 0 replies; 29+ messages in thread
From: Bertrand Marquis @ 2022-05-20  9:08 UTC (permalink / raw)
  To: Wei Chen
  Cc: Jan Beulich, Stefano Stabellini, Julien Grall, nd,
	Volodymyr Babchuk, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie, xen-devel

Hi,

> On 19 May 2022, at 03:36, Wei Chen <Wei.Chen@arm.com> wrote:
> 
> Hi Jan,
> 
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年5月18日 21:05
>> To: Wei Chen <Wei.Chen@arm.com>
>> Cc: nd <nd@arm.com>; Stefano Stabellini <sstabellini@kernel.org>; Julien
>> Grall <julien@xen.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>; Wei
>> Liu <wl@xen.org>; Jiamei Xie <Jiamei.Xie@arm.com>; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
>> 
>> On 11.05.2022 03:46, Wei Chen wrote:
>>> x86 is using compiler feature testing to decide EFI build
>>> enable or not. When EFI build is disabled, x86 will use an
>>> efi/stub.c file to replace efi/runtime.c for build objects.
>>> Following this idea, we introduce a stub file for Arm, but
>>> use CONFIG_ARM_EFI to decide EFI build enable or not.
>>> 
>>> And the most functions in x86 EFI stub.c can be reused for
>>> other architectures, like Arm. So we move them to common
>>> and keep the x86 specific function in x86/efi/stub.c.
>>> 
>>> To avoid the symbol link conflict error when linking common
>>> stub files to x86/efi. We add a regular file check in efi
>>> stub files' link script. Depends on this check we can bypass
>>> the link behaviors for existed stub files in x86/efi.
>>> 
>>> As there is no Arm specific EFI stub function for Arm in
>>> current stage, Arm still can use the existed symbol link
>>> method for EFI stub files.
>> 
>> Wouldn't it be better to mandate that every arch has its stub.c,
>> and in the Arm one all you'd do (for now) is #include the common
>> one? (But see also below.)
>> 
> 
> Personally, I don't like to include a C file into another C file.
> But I am OK as long as the Arm maintainers agree.
> @Stefano Stabellini @Bertrand Marquis @Julien Grall

I agree with Wei here and if we are realistic the current way the EFI code works needs a redesign anyway and asking him to change this in this serie is not right.
So I am OK with Wei solution.

Cheers
Bertrand


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
  2022-05-19  2:36     ` Wei Chen
  2022-05-19  6:11       ` Jan Beulich
  2022-05-20  9:08       ` Bertrand Marquis
@ 2022-05-20  9:46       ` Julien Grall
  2 siblings, 0 replies; 29+ messages in thread
From: Julien Grall @ 2022-05-20  9:46 UTC (permalink / raw)
  To: Wei Chen, Jan Beulich, Stefano Stabellini, Bertrand Marquis
  Cc: nd, Volodymyr Babchuk, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie, xen-devel

Hi Wei,

On 19/05/2022 03:36, Wei Chen wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年5月18日 21:05
>> To: Wei Chen <Wei.Chen@arm.com>
>> Cc: nd <nd@arm.com>; Stefano Stabellini <sstabellini@kernel.org>; Julien
>> Grall <julien@xen.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
>> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Andrew Cooper
>> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>; Wei
>> Liu <wl@xen.org>; Jiamei Xie <Jiamei.Xie@arm.com>; xen-
>> devel@lists.xenproject.org
>> Subject: Re: [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm
>>
>> On 11.05.2022 03:46, Wei Chen wrote:
>>> x86 is using compiler feature testing to decide EFI build
>>> enable or not. When EFI build is disabled, x86 will use an
>>> efi/stub.c file to replace efi/runtime.c for build objects.
>>> Following this idea, we introduce a stub file for Arm, but
>>> use CONFIG_ARM_EFI to decide EFI build enable or not.
>>>
>>> And the most functions in x86 EFI stub.c can be reused for
>>> other architectures, like Arm. So we move them to common
>>> and keep the x86 specific function in x86/efi/stub.c.
>>>
>>> To avoid the symbol link conflict error when linking common
>>> stub files to x86/efi. We add a regular file check in efi
>>> stub files' link script. Depends on this check we can bypass
>>> the link behaviors for existed stub files in x86/efi.
>>>
>>> As there is no Arm specific EFI stub function for Arm in
>>> current stage, Arm still can use the existed symbol link
>>> method for EFI stub files.
>>
>> Wouldn't it be better to mandate that every arch has its stub.c,
>> and in the Arm one all you'd do (for now) is #include the common
>> one? (But see also below.)
>>
> 
> Personally, I don't like to include a C file into another C file.

Same here. I know we already use it in EFI but I am not in favor to add 
more of them.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart
  2022-05-16 17:14   ` Julien Grall
  2022-05-17  1:21     ` Wei Chen
@ 2022-06-17  3:27     ` Henry Wang
  2022-06-17  8:41       ` Julien Grall
  1 sibling, 1 reply; 29+ messages in thread
From: Henry Wang @ 2022-06-17  3:27 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: nd, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Jiamei Xie, Wei Chen, Julien Grall

Hi Julien,

> -----Original Message-----
> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
> Julien Grall
> 
> Hi,
> 
> I have committed this patch.
> 
> Patch #3 looks to be suitably acked but I am not sure whether it can be
> committed before #2. So I didn't commit it.
> 
> Please let me know if it can be.

IIUC, the latest series (v6) [1] is properly acked and reviewed for the whole
series, so I think v6 of this series is ready to be merged. Sending this as a
gentle reminder :) 

[1] https://patchwork.kernel.org/project/xen-devel/list/?series=649092

Kind regards,
Henry

> 
> Cheers,
> 
> --
> Julien Grall


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart
  2022-06-17  3:27     ` Henry Wang
@ 2022-06-17  8:41       ` Julien Grall
  2022-06-17  9:10         ` Henry Wang
  0 siblings, 1 reply; 29+ messages in thread
From: Julien Grall @ 2022-06-17  8:41 UTC (permalink / raw)
  To: Henry Wang, xen-devel
  Cc: nd, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Jiamei Xie, Wei Chen, Julien Grall



On 17/06/2022 04:27, Henry Wang wrote:
> Hi Julien,

Hi Henry,

>> -----Original Message-----
>> From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of
>> Julien Grall
>>
>> Hi,
>>
>> I have committed this patch.
>>
>> Patch #3 looks to be suitably acked but I am not sure whether it can be
>> committed before #2. So I didn't commit it.
>>
>> Please let me know if it can be.
> 
> IIUC, the latest series (v6) [1] is properly acked and reviewed for the whole
> series, so I think v6 of this series is ready to be merged. Sending this as a
> gentle reminder :)

Thanks for the reminder. My comment above was specifically referring to 
patches in v3. If the patches are from a new version, can I suggest to 
ping on the exact version?

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart
  2022-06-17  8:41       ` Julien Grall
@ 2022-06-17  9:10         ` Henry Wang
  0 siblings, 0 replies; 29+ messages in thread
From: Henry Wang @ 2022-06-17  9:10 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: nd, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk,
	Jiamei Xie, Wei Chen, Julien Grall

Hi Julien,

> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> On 17/06/2022 04:27, Henry Wang wrote:
> > Hi Julien,
> Hi Henry,
> >>
> >> Hi,
> >>
> >> I have committed this patch.
> >>
> >> Patch #3 looks to be suitably acked but I am not sure whether it can be
> >> committed before #2. So I didn't commit it.
> >>
> >> Please let me know if it can be.
> >
> > IIUC, the latest series (v6) [1] is properly acked and reviewed for the
> whole
> > series, so I think v6 of this series is ready to be merged. Sending this as a
> > gentle reminder :)
> 
> Thanks for the reminder. My comment above was specifically referring to
> patches in v3. If the patches are from a new version, can I suggest to
> ping on the exact version?

Oh of course, my bad - I thought that email didn't received reply so I tried to
continue the discussion there. Thanks for the suggestion! I will keep that in
mind from now :))

Kind regards,
Henry

> 
> Cheers,
> 
> --
> Julien Grall

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2022-06-17  9:11 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-11  1:46 [PATCH v3 0/9] Device tree based NUMA support for Arm - Part#1 Wei Chen
2022-05-11  1:46 ` [PATCH v3 1/9] xen/arm: Print a 64-bit number in hex from early uart Wei Chen
2022-05-16 17:14   ` Julien Grall
2022-05-17  1:21     ` Wei Chen
2022-06-17  3:27     ` Henry Wang
2022-06-17  8:41       ` Julien Grall
2022-06-17  9:10         ` Henry Wang
2022-05-11  1:46 ` [PATCH v3 2/9] xen: reuse x86 EFI stub functions for Arm Wei Chen
2022-05-13  0:33   ` Stefano Stabellini
2022-05-18 13:04   ` Jan Beulich
2022-05-19  2:36     ` Wei Chen
2022-05-19  6:11       ` Jan Beulich
2022-05-20  9:08       ` Bertrand Marquis
2022-05-20  9:46       ` Julien Grall
2022-05-11  1:46 ` [PATCH v3 3/9] xen/arm: Keep memory nodes in device tree when Xen boots from EFI Wei Chen
2022-05-11  1:46 ` [PATCH v3 4/9] xen: introduce an arch helper for default dma zone status Wei Chen
2022-05-13  0:32   ` Stefano Stabellini
2022-05-18 13:09   ` Jan Beulich
2022-05-11  1:46 ` [PATCH v3 5/9] xen: decouple NUMA from ACPI in Kconfig Wei Chen
2022-05-11  1:46 ` [PATCH v3 6/9] xen/arm: use !CONFIG_NUMA to keep fake NUMA API Wei Chen
2022-05-11  1:46 ` [PATCH v3 7/9] xen/x86: use paddr_t for addresses in NUMA node structure Wei Chen
2022-05-18 13:13   ` Jan Beulich
2022-05-11  1:46 ` [PATCH v3 8/9] xen/x86: add detection of memory interleaves for different nodes Wei Chen
2022-05-18 13:30   ` Jan Beulich
2022-05-19  3:37     ` Wei Chen
2022-05-19  6:17       ` Jan Beulich
2022-05-11  1:46 ` [PATCH v3 9/9] xen/x86: use INFO level for node's without memory log message Wei Chen
2022-05-18 13:33   ` Jan Beulich
2022-05-19  2:37     ` Wei Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.