All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/8] Device tree based NUMA support for Arm - Part#1
@ 2022-05-23  6:25 Wei Chen
  2022-05-23  6:25 ` [PATCH v4 1/8] xen: reuse x86 EFI stub functions for Arm Wei Chen
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Wei Chen @ 2022-05-23  6:25 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, George Dunlap

(The Arm device tree based NUMA support patch set contains 35
patches. In order to make stuff easier for reviewers, I split
them into 3 parts:
1. Preparation. I have re-sorted the patch series. And moved
   independent patches to the head of the series.
2. Move generically usable code from x86 to common.
3. Add new code to support Arm.

This series only contains the first part patches.)

Xen memory allocation and scheduler modules are NUMA aware.
But actually, on x86 has implemented the architecture APIs
to support NUMA. Arm was providing a set of fake architecture
APIs to make it compatible with NUMA awared memory allocation
and scheduler.

Arm system was working well as a single node NUMA system with
these fake APIs, because we didn't have multiple nodes NUMA
system on Arm. But in recent years, more and more Arm devices
support multiple nodes NUMA system.

So now we have a new problem. When Xen is running on these Arm
devices, Xen still treat them as single node SMP systems. The
NUMA affinity capability of Xen memory allocation and scheduler
becomes meaningless. Because they rely on input data that does
not reflect real NUMA layout.

Xen still think the access time for all of the memory is the
same for all CPUs. However, Xen may allocate memory to a VM
from different NUMA nodes with different access speeds. This
difference can be amplified in workloads inside VM, causing
performance instability and timeouts.

So in this patch series, we implement a set of NUMA API to use
device tree to describe the NUMA layout. We reuse most of the
code of x86 NUMA to create and maintain the mapping between
memory and CPU, create the matrix between any two NUMA nodes.
Except ACPI and some x86 specified code, we have moved other
code to common. In next stage, when we implement ACPI based
NUMA for Arm64, we may move the ACPI NUMA code to common too,
but in current stage, we keep it as x86 only.

This patch serires has been tested and booted well on one
Arm64 NUMA machine and one HPE x86 NUMA machine.

---
Part1 v3->v4:
1. Add indent to make ln and test to be aligned in EFI
   common makefile.
2. Drop "ERR" prefix for node conflict check enumeration,
   and remove init value.
3. Use "switch case" for enumeration, and add "default:"
4. Use "PXM" in log messages.
5. Use unsigned int for node memory block id.
6. Fix some code-style comments.
7. Use "nd->end" in node range expansion check.
Part1 v2->v3:
1. Rework EFI stub patch:
   1.1. Add existed file check, if exists a regular stub files,
        the common/stub files' links will be ignored.
   1.2. Keep stub.c in x86/efi to include common/efi/stub.c
   1.3. Restore efi_compat_xxx stub functions to x86/efi.c.
        Other architectures will not use efi_compat_xxx.
   1.4. Remove ARM_EFI dependency from ARM_64.
   1.5. Add comment for adding stub.o to EFIOBJ-y.
   1.6. Merge patch#2 and patch#3 to one patch.
2. Rename arch_have_default_dmazone to arch_want_default_dmazone
3. Use uint64_t for size in acpi_scan_nodes, make it be
   consistent with numa_emulation.
4. Merge the interleaves checking code from a separate function
   to conflicting_memblks.
5. Use INFO level for node's without memory log message.
6. Move "xen/x86: Use ASSERT instead of VIRTUAL_BUG_ON for
   phys_to_nid" to part#2.
Part1 v1->v2:
1. Move independent patches from later to early of this series.
2. Drop the copy of EFI stub.c from Arm. Share common codes of
   x86 EFI stub for Arm.
3. Use CONFIG_ARM_EFI to replace CONFIG_EFI and remove help text
   and make CONFIG_ARM_EFI invisible.
4. Use ASSERT to replace VIRTUAL_BUG_ON in phys_to_nid.
5. Move MAX_NUMNODES from xen/numa.h to asm/numa.h for x86.
6. Extend the description of Arm's workaround for reserve DMA
   allocations to avoid the same discussion every time for
   arch_have_default_dmazone.
7. Update commit messages.

Wei Chen (8):
  xen: reuse x86 EFI stub functions for Arm
  xen/arm: Keep memory nodes in device tree when Xen boots from EFI
  xen: introduce an arch helper for default dma zone status
  xen: decouple NUMA from ACPI in Kconfig
  xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  xen/x86: use paddr_t for addresses in NUMA node structure
  xen/x86: add detection of memory interleaves for different nodes
  xen/x86: use INFO level for node's without memory log message

 xen/arch/arm/Kconfig              |   4 +
 xen/arch/arm/Makefile             |   2 +-
 xen/arch/arm/bootfdt.c            |   8 +-
 xen/arch/arm/efi/Makefile         |   8 ++
 xen/arch/arm/efi/efi-boot.h       |  25 -----
 xen/arch/arm/include/asm/numa.h   |   6 ++
 xen/arch/x86/Kconfig              |   2 +-
 xen/arch/x86/efi/stub.c           |  32 +------
 xen/arch/x86/include/asm/config.h |   1 -
 xen/arch/x86/include/asm/numa.h   |   9 +-
 xen/arch/x86/numa.c               |  32 +++----
 xen/arch/x86/srat.c               | 154 ++++++++++++++++++++++--------
 xen/common/Kconfig                |   3 +
 xen/common/efi/efi-common.mk      |   3 +-
 xen/common/efi/stub.c             |  32 +++++++
 xen/common/page_alloc.c           |   2 +-
 xen/drivers/acpi/Kconfig          |   3 +-
 xen/drivers/acpi/Makefile         |   2 +-
 18 files changed, 201 insertions(+), 127 deletions(-)
 create mode 100644 xen/common/efi/stub.c

-- 
2.25.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v4 1/8] xen: reuse x86 EFI stub functions for Arm
  2022-05-23  6:25 [PATCH v4 0/8] Device tree based NUMA support for Arm - Part#1 Wei Chen
@ 2022-05-23  6:25 ` Wei Chen
  2022-05-23  7:10   ` Jan Beulich
  2022-05-23  6:25 ` [PATCH v4 2/8] xen/arm: Keep memory nodes in device tree when Xen boots from EFI Wei Chen
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Wei Chen @ 2022-05-23  6:25 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu, Jiamei Xie

x86 is using compiler feature testing to decide EFI build
enable or not. When EFI build is disabled, x86 will use an
efi/stub.c file to replace efi/runtime.c for build objects.
Following this idea, we introduce a stub file for Arm, but
use CONFIG_ARM_EFI to decide EFI build enable or not.

And the most functions in x86 EFI stub.c can be reused for
other architectures, like Arm. So we move them to common
and keep the x86 specific function in x86/efi/stub.c.

To avoid the symbol link conflict error when linking common
stub files to x86/efi. We add a regular file check in efi
stub files' link script. Depends on this check we can bypass
the link behaviors for existed stub files in x86/efi.

As there is no Arm specific EFI stub function for Arm in
current stage, Arm still can use the existed symbol link
method for EFI stub files.

Change-Id: Idf19db1ada609d05fc0c0c3b0e1e8687c9d6ac71
Issue-Id: SCM-2240
Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
---
v3 -> v4:
1. Add indent to make ln and test to be aligned.
v2 -> v3:
1. Add existed file check, if a regular stub files,
   the common/stub files' link will be ignored.
2. Keep stub.c in x86/efi to include common/efi/stub.c
3. Restore efi_compat_xxx stub functions to x86/efi.c.
   Other architectures will not use efi_compat_xxx.
4. Remove ARM_EFI dependency from ARM_64.
5. Add comment for adding stub.o to EFIOBJ-y.
6. Merge patch#2 and patch#3 to one patch.
v1 -> v2:
1. Drop the copy of stub.c from Arm EFI.
2. Share common codes of x86 EFI stub for other architectures.
3. Use CONFIG_ARM_EFI to replace CONFIG_EFI
4. Remove help text and make CONFIG_ARM_EFI invisible.
5. Merge one following patch:
   xen/arm: introduce a stub file for non-EFI architectures
6. Use the common stub.c instead of creating new one.
---
 xen/arch/arm/Kconfig         |  4 ++++
 xen/arch/arm/Makefile        |  2 +-
 xen/arch/arm/efi/Makefile    |  8 ++++++++
 xen/arch/x86/efi/stub.c      | 32 +-------------------------------
 xen/common/efi/efi-common.mk |  3 ++-
 xen/common/efi/stub.c        | 32 ++++++++++++++++++++++++++++++++
 6 files changed, 48 insertions(+), 33 deletions(-)
 create mode 100644 xen/common/efi/stub.c

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index ecfa6822e4..8a16d43bd5 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -6,6 +6,7 @@ config ARM_64
 	def_bool y
 	depends on !ARM_32
 	select 64BIT
+	select ARM_EFI
 	select HAS_FAST_MULTIPLY
 
 config ARM
@@ -33,6 +34,9 @@ config ACPI
 	  Advanced Configuration and Power Interface (ACPI) support for Xen is
 	  an alternative to device tree on ARM64.
 
+config ARM_EFI
+	bool
+
 config GICV3
 	bool "GICv3 driver"
 	depends on ARM_64 && !NEW_VGIC
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 1d862351d1..bb7a6151c1 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -1,6 +1,5 @@
 obj-$(CONFIG_ARM_32) += arm32/
 obj-$(CONFIG_ARM_64) += arm64/
-obj-$(CONFIG_ARM_64) += efi/
 obj-$(CONFIG_ACPI) += acpi/
 obj-$(CONFIG_HAS_PCI) += pci/
 ifneq ($(CONFIG_NO_PLAT),y)
@@ -20,6 +19,7 @@ obj-y += domain.o
 obj-y += domain_build.init.o
 obj-y += domctl.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
+obj-y += efi/
 obj-y += gic.o
 obj-y += gic-v2.o
 obj-$(CONFIG_GICV3) += gic-v3.o
diff --git a/xen/arch/arm/efi/Makefile b/xen/arch/arm/efi/Makefile
index 4313c39066..dffe72e589 100644
--- a/xen/arch/arm/efi/Makefile
+++ b/xen/arch/arm/efi/Makefile
@@ -1,4 +1,12 @@
 include $(srctree)/common/efi/efi-common.mk
 
+ifeq ($(CONFIG_ARM_EFI),y)
 obj-y += $(EFIOBJ-y)
 obj-$(CONFIG_ACPI) +=  efi-dom0.init.o
+else
+# Add stub.o to EFIOBJ-y to re-use the clean-files in
+# efi-common.mk. Otherwise the link of stub.c in arm/efi
+# will not be cleaned in "make clean".
+EFIOBJ-y += stub.o
+obj-y += stub.o
+endif
diff --git a/xen/arch/x86/efi/stub.c b/xen/arch/x86/efi/stub.c
index 9984932626..f2365bc041 100644
--- a/xen/arch/x86/efi/stub.c
+++ b/xen/arch/x86/efi/stub.c
@@ -1,7 +1,5 @@
 #include <xen/efi.h>
-#include <xen/errno.h>
 #include <xen/init.h>
-#include <xen/lib.h>
 #include <asm/asm_defns.h>
 #include <asm/efibind.h>
 #include <asm/page.h>
@@ -10,6 +8,7 @@
 #include <efi/eficon.h>
 #include <efi/efidevp.h>
 #include <efi/efiapi.h>
+#include "../../../common/efi/stub.c"
 
 /*
  * Here we are in EFI stub. EFI calls are not supported due to lack
@@ -45,11 +44,6 @@ void __init noreturn efi_multiboot2(EFI_HANDLE ImageHandle,
     unreachable();
 }
 
-bool efi_enabled(unsigned int feature)
-{
-    return false;
-}
-
 void __init efi_init_memory(void) { }
 
 bool efi_boot_mem_unused(unsigned long *start, unsigned long *end)
@@ -62,32 +56,8 @@ bool efi_boot_mem_unused(unsigned long *start, unsigned long *end)
 
 void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t l4e) { }
 
-bool efi_rs_using_pgtables(void)
-{
-    return false;
-}
-
-unsigned long efi_get_time(void)
-{
-    BUG();
-    return 0;
-}
-
-void efi_halt_system(void) { }
-void efi_reset_system(bool warm) { }
-
-int efi_get_info(uint32_t idx, union xenpf_efi_info *info)
-{
-    return -ENOSYS;
-}
-
 int efi_compat_get_info(uint32_t idx, union compat_pf_efi_info *)
     __attribute__((__alias__("efi_get_info")));
 
-int efi_runtime_call(struct xenpf_efi_runtime_call *op)
-{
-    return -ENOSYS;
-}
-
 int efi_compat_runtime_call(struct compat_pf_efi_runtime_call *)
     __attribute__((__alias__("efi_runtime_call")));
diff --git a/xen/common/efi/efi-common.mk b/xen/common/efi/efi-common.mk
index 4298ceaee7..ec2c34f198 100644
--- a/xen/common/efi/efi-common.mk
+++ b/xen/common/efi/efi-common.mk
@@ -9,7 +9,8 @@ CFLAGS-y += -iquote $(srcdir)
 # e.g.: It transforms "dir/foo/bar" into successively
 #       "dir foo bar", ".. .. ..", "../../.."
 $(obj)/%.c: $(srctree)/common/efi/%.c FORCE
-	$(Q)ln -nfs $(subst $(space),/,$(patsubst %,..,$(subst /, ,$(obj))))/source/common/efi/$(<F) $@
+	$(Q)test -f $@ || \
+	    ln -nfs $(subst $(space),/,$(patsubst %,..,$(subst /, ,$(obj))))/source/common/efi/$(<F) $@
 
 clean-files += $(patsubst %.o, %.c, $(EFIOBJ-y:.init.o=.o) $(EFIOBJ-))
 
diff --git a/xen/common/efi/stub.c b/xen/common/efi/stub.c
new file mode 100644
index 0000000000..15694632c2
--- /dev/null
+++ b/xen/common/efi/stub.c
@@ -0,0 +1,32 @@
+#include <xen/efi.h>
+#include <xen/errno.h>
+#include <xen/lib.h>
+
+bool efi_enabled(unsigned int feature)
+{
+    return false;
+}
+
+bool efi_rs_using_pgtables(void)
+{
+    return false;
+}
+
+unsigned long efi_get_time(void)
+{
+    BUG();
+    return 0;
+}
+
+void efi_halt_system(void) { }
+void efi_reset_system(bool warm) { }
+
+int efi_get_info(uint32_t idx, union xenpf_efi_info *info)
+{
+    return -ENOSYS;
+}
+
+int efi_runtime_call(struct xenpf_efi_runtime_call *op)
+{
+    return -ENOSYS;
+}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 2/8] xen/arm: Keep memory nodes in device tree when Xen boots from EFI
  2022-05-23  6:25 [PATCH v4 0/8] Device tree based NUMA support for Arm - Part#1 Wei Chen
  2022-05-23  6:25 ` [PATCH v4 1/8] xen: reuse x86 EFI stub functions for Arm Wei Chen
@ 2022-05-23  6:25 ` Wei Chen
  2022-05-23  6:25 ` [PATCH v4 3/8] xen: introduce an arch helper for default dma zone status Wei Chen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Wei Chen @ 2022-05-23  6:25 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Jiamei Xie

In current code, when Xen is booting from EFI, it will delete
all memory nodes in device tree. This would work well in current
stage, because Xen can get memory map from EFI system table.
However, EFI system table cannot completely replace memory nodes
of device tree. EFI system table doesn't contain memory NUMA
information. Xen depends on ACPI SRAT or device tree memory nodes
to parse memory blocks' NUMA mapping. So in EFI + DTB boot, Xen
doesn't have any method to get numa-node-id for memory blocks any
more. This makes device tree based NUMA support become impossible
for Xen in EFI + DTB boot.

So in this patch, we will keep memory nodes in device tree for
NUMA code to parse memory numa-node-id later.

As a side effect, if we still parse boot memory information in
early_scan_node, bootmem.info will calculate memory ranges in
memory nodes twice. So we have to prevent early_scan_node to
parse memory nodes in EFI boot.

Change-Id: Ia1acab203887114c9f31cb3eeeb38794868efc9d
Issue-Id: SCM-2240
Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
v3 -> v4:
1. No change.
v2 -> v3:
1. Add Rb.
v1 -> v2:
1. Move this patch from later to early of this series.
2. Refine commit message.
---
 xen/arch/arm/bootfdt.c      |  8 +++++++-
 xen/arch/arm/efi/efi-boot.h | 25 -------------------------
 2 files changed, 7 insertions(+), 26 deletions(-)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 29671c8df0..ec81a45de9 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -11,6 +11,7 @@
 #include <xen/lib.h>
 #include <xen/kernel.h>
 #include <xen/init.h>
+#include <xen/efi.h>
 #include <xen/device_tree.h>
 #include <xen/libfdt/libfdt.h>
 #include <xen/sort.h>
@@ -367,7 +368,12 @@ static int __init early_scan_node(const void *fdt,
 {
     int rc = 0;
 
-    if ( device_tree_node_matches(fdt, node, "memory") )
+    /*
+     * If Xen has been booted via UEFI, the memory banks are
+     * populated. So we should skip the parsing.
+     */
+    if ( !efi_enabled(EFI_BOOT) &&
+         device_tree_node_matches(fdt, node, "memory") )
         rc = process_memory_node(fdt, node, name, depth,
                                  address_cells, size_cells, &bootinfo.mem);
     else if ( depth == 1 && !dt_node_cmp(name, "reserved-memory") )
diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
index e452b687d8..59d93c24a1 100644
--- a/xen/arch/arm/efi/efi-boot.h
+++ b/xen/arch/arm/efi/efi-boot.h
@@ -231,33 +231,8 @@ EFI_STATUS __init fdt_add_uefi_nodes(EFI_SYSTEM_TABLE *sys_table,
     int status;
     u32 fdt_val32;
     u64 fdt_val64;
-    int prev;
     int num_rsv;
 
-    /*
-     * Delete any memory nodes present.  The EFI memory map is the only
-     * memory description provided to Xen.
-     */
-    prev = 0;
-    for (;;)
-    {
-        const char *type;
-        int len;
-
-        node = fdt_next_node(fdt, prev, NULL);
-        if ( node < 0 )
-            break;
-
-        type = fdt_getprop(fdt, node, "device_type", &len);
-        if ( type && strncmp(type, "memory", len) == 0 )
-        {
-            fdt_del_node(fdt, node);
-            continue;
-        }
-
-        prev = node;
-    }
-
    /*
     * Delete all memory reserve map entries. When booting via UEFI,
     * kernel will use the UEFI memory map to find reserved regions.
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 3/8] xen: introduce an arch helper for default dma zone status
  2022-05-23  6:25 [PATCH v4 0/8] Device tree based NUMA support for Arm - Part#1 Wei Chen
  2022-05-23  6:25 ` [PATCH v4 1/8] xen: reuse x86 EFI stub functions for Arm Wei Chen
  2022-05-23  6:25 ` [PATCH v4 2/8] xen/arm: Keep memory nodes in device tree when Xen boots from EFI Wei Chen
@ 2022-05-23  6:25 ` Wei Chen
  2022-05-23  6:25 ` [PATCH v4 4/8] xen: decouple NUMA from ACPI in Kconfig Wei Chen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Wei Chen @ 2022-05-23  6:25 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Wei Liu, Roger Pau Monné,
	Jiamei Xie

In current code, when Xen is running in a multiple nodes
NUMA system, it will set dma_bitsize in end_boot_allocator
to reserve some low address memory as DMA zone.

There are some x86 implications in the implementation.
Because on x86, memory starts from 0. On a multiple-nodes
NUMA system, if a single node contains the majority or all
of the DMA memory, x86 prefers to give out memory from
non-local allocations rather than exhausting the DMA memory
ranges. Hence x86 uses dma_bitsize to set aside some largely
arbitrary amount of memory for DMA zone. The allocations
from DMA zone would happen only after exhausting all other
nodes' memory.

But the implications are not shared across all architectures.
For example, Arm cannot guarantee the availability of memory
below a certain boundary for DMA limited-capability devices
either. But currently, Arm doesn't need a reserved DMA zone
in Xen. Because there is no DMA device in Xen. And for guests,
Xen Arm only allows Dom0 to have DMA operations without IOMMU.
Xen will try to allocate memory under 4GB or memory range that
is limited by dma_bitsize for Dom0 in boot time. For DomU, even
Xen can passthrough devices to DomU without IOMMU, but Xen Arm
doesn't guarantee their DMA operations. So, Xen Arm doesn't
need a reserved DMA zone to provide DMA memory for guests.

In this patch, we introduce an arch_want_default_dmazone helper
for different architectures to determine whether they need to
set dma_bitsize for DMA zone reservation or not.

At the same time, when x86 Xen is built with CONFIG_PV=n could
probably leverage this new helper to actually not trigger DMA
zone reservation.

Issue-Id: SCM-2240
Change-Id: I08c5d463995322ecaf7e6755d417c814825010eb
Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v3 -> v4:
1. Add Acked-by.
v2 -> v3:
1. Add Tb.
2. Rename arch_have_default_dmazone to arch_want_default_dmazone.
v1 -> v2:
1. Extend the description of Arm's workaround for reserve DMA
   allocations to avoid the same discussion every time.
2. Use a macro to define arch_have_default_dmazone, because
   it's little hard to make x86 version to static inline.
   Use a macro will also avoid add __init for this function.
3. Change arch_have_default_dmazone return value from
   unsigned int to bool.
4. Un-addressed comment: make arch_have_default_dmazone
   of x86 to be static inline. Because, if we move
   arch_have_default_dmazone to x86/asm/numa.h, it depends
   on nodemask.h to provide num_online_nodes. But nodemask.h
   needs numa.h to provide MAX_NUMANODES. This will cause a
   loop dependency. And this function can only be used in
   end_boot_allocator, in Xen initialization. So I think,
   compared to the changes introduced by inline, it doesn't
   mean much.
---
 xen/arch/arm/include/asm/numa.h | 1 +
 xen/arch/x86/include/asm/numa.h | 1 +
 xen/common/page_alloc.c         | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/include/asm/numa.h b/xen/arch/arm/include/asm/numa.h
index 31a6de4e23..e4c4d89192 100644
--- a/xen/arch/arm/include/asm/numa.h
+++ b/xen/arch/arm/include/asm/numa.h
@@ -24,6 +24,7 @@ extern mfn_t first_valid_mfn;
 #define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
 #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
 #define __node_distance(a, b) (20)
+#define arch_want_default_dmazone() (false)
 
 #endif /* __ARCH_ARM_NUMA_H */
 /*
diff --git a/xen/arch/x86/include/asm/numa.h b/xen/arch/x86/include/asm/numa.h
index bada2c0bb9..5d8385f2e1 100644
--- a/xen/arch/x86/include/asm/numa.h
+++ b/xen/arch/x86/include/asm/numa.h
@@ -74,6 +74,7 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
 #define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
 #define node_end_pfn(nid)       (NODE_DATA(nid)->node_start_pfn + \
 				 NODE_DATA(nid)->node_spanned_pages)
+#define arch_want_default_dmazone() (num_online_nodes() > 1)
 
 extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 319029140f..b3bddc719b 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1889,7 +1889,7 @@ void __init end_boot_allocator(void)
     }
     nr_bootmem_regions = 0;
 
-    if ( !dma_bitsize && (num_online_nodes() > 1) )
+    if ( !dma_bitsize && arch_want_default_dmazone() )
         dma_bitsize = arch_get_dma_bitsize();
 
     printk("Domain heap initialised");
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 4/8] xen: decouple NUMA from ACPI in Kconfig
  2022-05-23  6:25 [PATCH v4 0/8] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (2 preceding siblings ...)
  2022-05-23  6:25 ` [PATCH v4 3/8] xen: introduce an arch helper for default dma zone status Wei Chen
@ 2022-05-23  6:25 ` Wei Chen
  2022-05-23  6:25 ` [PATCH v4 5/8] xen/arm: use !CONFIG_NUMA to keep fake NUMA API Wei Chen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Wei Chen @ 2022-05-23  6:25 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini,
	Jiamei Xie

In current Xen code only implements x86 ACPI-based NUMA support.
So in Xen Kconfig system, NUMA equals to ACPI_NUMA. x86 selects
NUMA by default, and CONFIG_ACPI_NUMA is hardcode in config.h.

In a follow-up patch, we will introduce support for NUMA using
the device tree. That means we will have two NUMA implementations,
so in this patch we decouple NUMA from ACPI based NUMA in Kconfig.
Make NUMA as a common feature, that device tree based NUMA also
can select it.

Change-Id: I926c73c10aa9bb8e1e7ff77596afb11d3ea18363
Issue-Id: SCM-2240
Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v3 -> v4:
no change.
v2 -> v3:
Add Tb.
v1 -> v2:
No change.
---
 xen/arch/x86/Kconfig              | 2 +-
 xen/arch/x86/include/asm/config.h | 1 -
 xen/common/Kconfig                | 3 +++
 xen/drivers/acpi/Kconfig          | 3 ++-
 xen/drivers/acpi/Makefile         | 2 +-
 5 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 06d6fbc864..1e31edc99f 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -6,6 +6,7 @@ config X86
 	def_bool y
 	select ACPI
 	select ACPI_LEGACY_TABLES_LOOKUP
+	select ACPI_NUMA
 	select ALTERNATIVE_CALL
 	select ARCH_SUPPORTS_INT128
 	select CORE_PARKING
@@ -26,7 +27,6 @@ config X86
 	select HAS_UBSAN
 	select HAS_VPCI if HVM
 	select NEEDS_LIBELF
-	select NUMA
 
 config ARCH_DEFCONFIG
 	string
diff --git a/xen/arch/x86/include/asm/config.h b/xen/arch/x86/include/asm/config.h
index de20642524..07bcd15831 100644
--- a/xen/arch/x86/include/asm/config.h
+++ b/xen/arch/x86/include/asm/config.h
@@ -31,7 +31,6 @@
 /* Intel P4 currently has largest cache line (L2 line size is 128 bytes). */
 #define CONFIG_X86_L1_CACHE_SHIFT 7
 
-#define CONFIG_ACPI_NUMA 1
 #define CONFIG_ACPI_SRAT 1
 #define CONFIG_ACPI_CSTATE 1
 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index d921c74d61..d65add3fc6 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -70,6 +70,9 @@ config MEM_ACCESS
 config NEEDS_LIBELF
 	bool
 
+config NUMA
+	bool
+
 config STATIC_MEMORY
 	bool "Static Allocation Support (UNSUPPORTED)" if UNSUPPORTED
 	depends on ARM
diff --git a/xen/drivers/acpi/Kconfig b/xen/drivers/acpi/Kconfig
index b64d3731fb..e3f3d8f4b1 100644
--- a/xen/drivers/acpi/Kconfig
+++ b/xen/drivers/acpi/Kconfig
@@ -5,5 +5,6 @@ config ACPI
 config ACPI_LEGACY_TABLES_LOOKUP
 	bool
 
-config NUMA
+config ACPI_NUMA
 	bool
+	select NUMA
diff --git a/xen/drivers/acpi/Makefile b/xen/drivers/acpi/Makefile
index 4f8e97228e..2fc5230253 100644
--- a/xen/drivers/acpi/Makefile
+++ b/xen/drivers/acpi/Makefile
@@ -3,7 +3,7 @@ obj-y += utilities/
 obj-$(CONFIG_X86) += apei/
 
 obj-bin-y += tables.init.o
-obj-$(CONFIG_NUMA) += numa.o
+obj-$(CONFIG_ACPI_NUMA) += numa.o
 obj-y += osl.o
 obj-$(CONFIG_HAS_CPUFREQ) += pmstat.o
 
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 5/8] xen/arm: use !CONFIG_NUMA to keep fake NUMA API
  2022-05-23  6:25 [PATCH v4 0/8] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (3 preceding siblings ...)
  2022-05-23  6:25 ` [PATCH v4 4/8] xen: decouple NUMA from ACPI in Kconfig Wei Chen
@ 2022-05-23  6:25 ` Wei Chen
  2022-05-23  6:25 ` [PATCH v4 6/8] xen/x86: use paddr_t for addresses in NUMA node structure Wei Chen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Wei Chen @ 2022-05-23  6:25 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Jiamei Xie

We have introduced CONFIG_NUMA in a previous patch. And this
option is enabled only on x86 at the current stage. In a follow
up patch, we will enable this option for Arm. But we still
want users to be able to disable the CONFIG_NUMA via Kconfig. In
this case, keep the fake NUMA API, will make Arm code still
able to work with NUMA aware memory allocation and scheduler.

Change-Id: I595b4c84f61f0b827ad9dfbeef03aae30f4752f0
Issue-Id: SCM-2240
Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
v3 -> v4:
no change
v2 -> v3:
Add Tb.
v1 -> v2:
No change.
---
 xen/arch/arm/include/asm/numa.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/arm/include/asm/numa.h b/xen/arch/arm/include/asm/numa.h
index e4c4d89192..268a9db055 100644
--- a/xen/arch/arm/include/asm/numa.h
+++ b/xen/arch/arm/include/asm/numa.h
@@ -5,6 +5,8 @@
 
 typedef u8 nodeid_t;
 
+#ifndef CONFIG_NUMA
+
 /* Fake one node for now. See also node_online_map. */
 #define cpu_to_node(cpu) 0
 #define node_to_cpumask(node)   (cpu_online_map)
@@ -24,6 +26,9 @@ extern mfn_t first_valid_mfn;
 #define node_spanned_pages(nid) (max_page - mfn_x(first_valid_mfn))
 #define node_start_pfn(nid) (mfn_x(first_valid_mfn))
 #define __node_distance(a, b) (20)
+
+#endif
+
 #define arch_want_default_dmazone() (false)
 
 #endif /* __ARCH_ARM_NUMA_H */
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 6/8] xen/x86: use paddr_t for addresses in NUMA node structure
  2022-05-23  6:25 [PATCH v4 0/8] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (4 preceding siblings ...)
  2022-05-23  6:25 ` [PATCH v4 5/8] xen/arm: use !CONFIG_NUMA to keep fake NUMA API Wei Chen
@ 2022-05-23  6:25 ` Wei Chen
  2022-05-23  6:25 ` [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes Wei Chen
  2022-05-23  6:25 ` [PATCH v4 8/8] xen/x86: use INFO level for node's without memory log message Wei Chen
  7 siblings, 0 replies; 18+ messages in thread
From: Wei Chen @ 2022-05-23  6:25 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie

NUMA node structure "struct node" is using u64 as node memory
range. In order to make other architectures can reuse this
NUMA node relative code, we replace the u64 to paddr_t. And
use pfn_to_paddr and paddr_to_pfn to replace explicit shift
operations. The relate PRIx64 in print messages have been
replaced by PRIpaddr at the same time. And some being-phased-out
types like u64 in the lines we have touched also have been
converted to uint64_t or unsigned long.

Change-Id: Iac64ff398c10f6e642d40d06fc57e28806a1a4ff
Issue-Id: SCM-2240
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Signed-off-by: Wei Chen <wei.chen@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
v3 -> v4:
1. Add Tb.
v2 -> v3:
1. Use uint64_t for size in acpi_scan_nodes, make it be
   consistent with numa_emulation.
2. Add Tb.
v1 -> v2:
1. Drop useless cast.
2. Use initializers of the variables.
3. Replace u64 by uint64_t.
4. Use unsigned long for start_pfn and end_pfn.
---
 xen/arch/x86/include/asm/numa.h |  8 ++++----
 xen/arch/x86/numa.c             | 32 +++++++++++++++-----------------
 xen/arch/x86/srat.c             | 25 +++++++++++++------------
 3 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/xen/arch/x86/include/asm/numa.h b/xen/arch/x86/include/asm/numa.h
index 5d8385f2e1..c32ccffde3 100644
--- a/xen/arch/x86/include/asm/numa.h
+++ b/xen/arch/x86/include/asm/numa.h
@@ -18,7 +18,7 @@ extern cpumask_t     node_to_cpumask[];
 #define node_to_cpumask(node)    (node_to_cpumask[node])
 
 struct node { 
-	u64 start,end; 
+	paddr_t start, end;
 };
 
 extern int compute_hash_shift(struct node *nodes, int numnodes,
@@ -38,7 +38,7 @@ extern void numa_set_node(int cpu, nodeid_t node);
 extern nodeid_t setup_node(unsigned int pxm);
 extern void srat_detect_node(int cpu);
 
-extern void setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end);
+extern void setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end);
 extern nodeid_t apicid_to_node[];
 extern void init_cpu_to_node(void);
 
@@ -76,9 +76,9 @@ static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr)
 				 NODE_DATA(nid)->node_spanned_pages)
 #define arch_want_default_dmazone() (num_online_nodes() > 1)
 
-extern int valid_numa_range(u64 start, u64 end, nodeid_t node);
+extern int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node);
 
-void srat_parse_regions(u64 addr);
+void srat_parse_regions(paddr_t addr);
 extern u8 __node_distance(nodeid_t a, nodeid_t b);
 unsigned int arch_get_dma_bitsize(void);
 
diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 680b7d9002..627ae8aa95 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -162,12 +162,10 @@ int __init compute_hash_shift(struct node *nodes, int numnodes,
     return shift;
 }
 /* initialize NODE_DATA given nodeid and start/end */
-void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end)
-{ 
-    unsigned long start_pfn, end_pfn;
-
-    start_pfn = start >> PAGE_SHIFT;
-    end_pfn = end >> PAGE_SHIFT;
+void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
+{
+    unsigned long start_pfn = paddr_to_pfn(start);
+    unsigned long end_pfn = paddr_to_pfn(end);
 
     NODE_DATA(nodeid)->node_start_pfn = start_pfn;
     NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
@@ -198,11 +196,12 @@ void __init numa_init_array(void)
 static int numa_fake __initdata = 0;
 
 /* Numa emulation */
-static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
+static int __init numa_emulation(unsigned long start_pfn,
+                                 unsigned long end_pfn)
 {
     int i;
     struct node nodes[MAX_NUMNODES];
-    u64 sz = ((end_pfn - start_pfn)<<PAGE_SHIFT) / numa_fake;
+    uint64_t sz = pfn_to_paddr(end_pfn - start_pfn) / numa_fake;
 
     /* Kludge needed for the hash function */
     if ( hweight64(sz) > 1 )
@@ -218,9 +217,9 @@ static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
     memset(&nodes,0,sizeof(nodes));
     for ( i = 0; i < numa_fake; i++ )
     {
-        nodes[i].start = (start_pfn<<PAGE_SHIFT) + i*sz;
+        nodes[i].start = pfn_to_paddr(start_pfn) + i * sz;
         if ( i == numa_fake - 1 )
-            sz = (end_pfn<<PAGE_SHIFT) - nodes[i].start;
+            sz = pfn_to_paddr(end_pfn) - nodes[i].start;
         nodes[i].end = nodes[i].start + sz;
         printk(KERN_INFO "Faking node %d at %"PRIx64"-%"PRIx64" (%"PRIu64"MB)\n",
                i,
@@ -246,6 +245,8 @@ static int __init numa_emulation(u64 start_pfn, u64 end_pfn)
 void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
 { 
     int i;
+    paddr_t start = pfn_to_paddr(start_pfn);
+    paddr_t end = pfn_to_paddr(end_pfn);
 
 #ifdef CONFIG_NUMA_EMU
     if ( numa_fake && !numa_emulation(start_pfn, end_pfn) )
@@ -253,17 +254,15 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
 #endif
 
 #ifdef CONFIG_ACPI_NUMA
-    if ( !numa_off && !acpi_scan_nodes((u64)start_pfn << PAGE_SHIFT,
-         (u64)end_pfn << PAGE_SHIFT) )
+    if ( !numa_off && !acpi_scan_nodes(start, end) )
         return;
 #endif
 
     printk(KERN_INFO "%s\n",
            numa_off ? "NUMA turned off" : "No NUMA configuration found");
 
-    printk(KERN_INFO "Faking a node at %016"PRIx64"-%016"PRIx64"\n",
-           (u64)start_pfn << PAGE_SHIFT,
-           (u64)end_pfn << PAGE_SHIFT);
+    printk(KERN_INFO "Faking a node at %"PRIpaddr"-%"PRIpaddr"\n",
+           start, end);
     /* setup dummy node covering all memory */
     memnode_shift = BITS_PER_LONG - 1;
     memnodemap = _memnodemap;
@@ -276,8 +275,7 @@ void __init numa_initmem_init(unsigned long start_pfn, unsigned long end_pfn)
     for ( i = 0; i < nr_cpu_ids; i++ )
         numa_set_node(i, 0);
     cpumask_copy(&node_to_cpumask[0], cpumask_of(0));
-    setup_node_bootmem(0, (u64)start_pfn << PAGE_SHIFT,
-                    (u64)end_pfn << PAGE_SHIFT);
+    setup_node_bootmem(0, start, end);
 }
 
 void numa_add_cpu(int cpu)
diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index cfe24c7e78..8ffe43bdfe 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -104,7 +104,7 @@ nodeid_t setup_node(unsigned pxm)
 	return node;
 }
 
-int valid_numa_range(u64 start, u64 end, nodeid_t node)
+int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node)
 {
 	int i;
 
@@ -119,7 +119,7 @@ int valid_numa_range(u64 start, u64 end, nodeid_t node)
 	return 0;
 }
 
-static __init int conflicting_memblks(u64 start, u64 end)
+static __init int conflicting_memblks(paddr_t start, paddr_t end)
 {
 	int i;
 
@@ -135,7 +135,7 @@ static __init int conflicting_memblks(u64 start, u64 end)
 	return -1;
 }
 
-static __init void cutoff_node(int i, u64 start, u64 end)
+static __init void cutoff_node(int i, paddr_t start, paddr_t end)
 {
 	struct node *nd = &nodes[i];
 	if (nd->start < start) {
@@ -275,7 +275,7 @@ acpi_numa_processor_affinity_init(const struct acpi_srat_cpu_affinity *pa)
 void __init
 acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 {
-	u64 start, end;
+	paddr_t start, end;
 	unsigned pxm;
 	nodeid_t node;
 	int i;
@@ -318,7 +318,7 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 		bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
 		                !test_bit(i, memblk_hotplug);
 
-		printk("%sSRAT: PXM %u (%"PRIx64"-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
+		printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
 		       mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
 		       node_memblk_range[i].start, node_memblk_range[i].end);
 		if (mismatch) {
@@ -327,7 +327,7 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 		}
 	} else {
 		printk(KERN_ERR
-		       "SRAT: PXM %u (%"PRIx64"-%"PRIx64") overlaps with PXM %u (%"PRIx64"-%"PRIx64")\n",
+		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
 		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
 		       node_memblk_range[i].start, node_memblk_range[i].end);
 		bad_srat();
@@ -346,7 +346,7 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 				nd->end = end;
 		}
 	}
-	printk(KERN_INFO "SRAT: Node %u PXM %u %"PRIx64"-%"PRIx64"%s\n",
+	printk(KERN_INFO "SRAT: Node %u PXM %u %"PRIpaddr"-%"PRIpaddr"%s\n",
 	       node, pxm, start, end,
 	       ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE ? " (hotplug)" : "");
 
@@ -369,7 +369,7 @@ static int __init nodes_cover_memory(void)
 
 	for (i = 0; i < e820.nr_map; i++) {
 		int j, found;
-		unsigned long long start, end;
+		paddr_t start, end;
 
 		if (e820.map[i].type != E820_RAM) {
 			continue;
@@ -396,7 +396,7 @@ static int __init nodes_cover_memory(void)
 
 		if (start < end) {
 			printk(KERN_ERR "SRAT: No PXM for e820 range: "
-				"%016Lx - %016Lx\n", start, end);
+				"%"PRIpaddr" - %"PRIpaddr"\n", start, end);
 			return 0;
 		}
 	}
@@ -432,7 +432,7 @@ static int __init cf_check srat_parse_region(
 	return 0;
 }
 
-void __init srat_parse_regions(u64 addr)
+void __init srat_parse_regions(paddr_t addr)
 {
 	u64 mask;
 	unsigned int i;
@@ -457,7 +457,7 @@ void __init srat_parse_regions(u64 addr)
 }
 
 /* Use the information discovered above to actually set up the nodes. */
-int __init acpi_scan_nodes(u64 start, u64 end)
+int __init acpi_scan_nodes(paddr_t start, paddr_t end)
 {
 	int i;
 	nodemask_t all_nodes_parsed;
@@ -489,7 +489,8 @@ int __init acpi_scan_nodes(u64 start, u64 end)
 	/* Finally register nodes */
 	for_each_node_mask(i, all_nodes_parsed)
 	{
-		u64 size = nodes[i].end - nodes[i].start;
+		uint64_t size = nodes[i].end - nodes[i].start;
+
 		if ( size == 0 )
 			printk(KERN_WARNING "SRAT: Node %u has no memory. "
 			       "BIOS Bug or mis-configured hardware?\n", i);
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes
  2022-05-23  6:25 [PATCH v4 0/8] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (5 preceding siblings ...)
  2022-05-23  6:25 ` [PATCH v4 6/8] xen/x86: use paddr_t for addresses in NUMA node structure Wei Chen
@ 2022-05-23  6:25 ` Wei Chen
  2022-05-30  1:46   ` Henry Wang
  2022-05-31 13:21   ` Jan Beulich
  2022-05-23  6:25 ` [PATCH v4 8/8] xen/x86: use INFO level for node's without memory log message Wei Chen
  7 siblings, 2 replies; 18+ messages in thread
From: Wei Chen @ 2022-05-23  6:25 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie

One NUMA node may contain several memory blocks. In current Xen
code, Xen will maintain a node memory range for each node to cover
all its memory blocks. But here comes the problem, in the gap of
one node's two memory blocks, if there are some memory blocks don't
belong to this node (remote memory blocks). This node's memory range
will be expanded to cover these remote memory blocks.

One node's memory range contains other nodes' memory, this is
obviously not very reasonable. This means current NUMA code only
can support node has no interleaved memory blocks. However, on a
physical machine, the addresses of multiple nodes can be interleaved.

So in this patch, we add code to detect memory interleaves of
different nodes. NUMA initialization will be failed and error
messages will be printed when Xen detect such hardware configuration.

Change-Id: Ia7ff9a9128ecbe3eb4dddd1307ae8fbe65575ccf
Issue-Id: SCM-2240
Signed-off-by: Wei Chen <wei.chen@arm.com>
Tested-by: Jiamei Xie <jiamei.xie@arm.com>
---
v3 -> v4:
1. Drop "ERR" prefix for enumeration, and remove init value.
2. Use "switch case" for enumeration, and add "default:"
3. Use "PXM" in log messages.
4. Use unsigned int for node memory block id.
5. Fix some code-style comments.
6. Use "nd->end" in node range expansion check.
v2 -> v3:
1. Merge the check code from a separate function to
   conflicting_memblks. This will reduce the loop
   times of node memory blocks.
2. Use an enumeration to indicate conflict check status.
3. Use a pointer to get conflict memory block id.
v1 -> v2:
1. Update the description to say we're after is no memory
   interleaves of different nodes.
2. Only update node range when it passes the interleave check.
3. Don't use full upper-case for "node".
---
 xen/arch/x86/srat.c | 132 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 101 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 8ffe43bdfe..a831df7648 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -42,6 +42,12 @@ static struct node node_memblk_range[NR_NODE_MEMBLKS];
 static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
 static __initdata DECLARE_BITMAP(memblk_hotplug, NR_NODE_MEMBLKS);
 
+enum conflicts {
+	NO_CONFLICT,
+	OVERLAP,
+	INTERLEAVE,
+};
+
 static inline bool node_found(unsigned idx, unsigned pxm)
 {
 	return ((pxm2node[idx].pxm == pxm) &&
@@ -119,20 +125,45 @@ int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node)
 	return 0;
 }
 
-static __init int conflicting_memblks(paddr_t start, paddr_t end)
+static
+enum conflicts __init conflicting_memblks(nodeid_t nid, paddr_t start,
+					  paddr_t end, paddr_t nd_start,
+					  paddr_t nd_end, unsigned int *mblkid)
 {
-	int i;
+	unsigned int i;
 
+	/*
+	 * Scan all recorded nodes' memory blocks to check conflicts:
+	 * Overlap or interleave.
+	 */
 	for (i = 0; i < num_node_memblks; i++) {
 		struct node *nd = &node_memblk_range[i];
+
+		*mblkid = i;
+
+		/* Skip 0 bytes node memory block. */
 		if (nd->start == nd->end)
 			continue;
+		/*
+		 * Use memblk range to check memblk overlaps, include the
+		 * self-overlap case.
+		 */
 		if (nd->end > start && nd->start < end)
-			return i;
+			return OVERLAP;
 		if (nd->end == end && nd->start == start)
-			return i;
+			return OVERLAP;
+		/*
+		 * Use node memory range to check whether new range contains
+		 * memory from other nodes - interleave check. We just need
+		 * to check full contains situation. Because overlaps have
+		 * been checked above.
+		 */
+	        if (nid != memblk_nodeid[i] &&
+		    (nd_start < nd->start && nd->end < nd_end))
+			return INTERLEAVE;
 	}
-	return -1;
+
+	return NO_CONFLICT;
 }
 
 static __init void cutoff_node(int i, paddr_t start, paddr_t end)
@@ -275,10 +306,13 @@ acpi_numa_processor_affinity_init(const struct acpi_srat_cpu_affinity *pa)
 void __init
 acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 {
+	enum conflicts status;
+	struct node *nd;
+	paddr_t nd_start, nd_end;
 	paddr_t start, end;
 	unsigned pxm;
 	nodeid_t node;
-	int i;
+	unsigned int i;
 
 	if (srat_disabled())
 		return;
@@ -310,42 +344,78 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
 		bad_srat();
 		return;
 	}
+
+	/*
+	 * For the node that already has some memory blocks, we will
+	 * expand the node memory range temporarily to check memory
+	 * interleaves with other nodes. We will not use this node
+	 * temp memory range to check overlaps, because it will mask
+	 * the overlaps in same node.
+	 *
+	 * Node with 0 bytes memory doesn't need this expandsion.
+	 */
+	nd_start = start;
+	nd_end = end;
+	nd = &nodes[node];
+	if (nd->start != nd->end) {
+		if (nd_start > nd->start)
+			nd_start = nd->start;
+
+		if (nd_end < nd->end)
+			nd_end = nd->end;
+	}
+
 	/* It is fine to add this area to the nodes data it will be used later*/
-	i = conflicting_memblks(start, end);
-	if (i < 0)
-		/* everything fine */;
-	else if (memblk_nodeid[i] == node) {
-		bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
-		                !test_bit(i, memblk_hotplug);
-
-		printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
-		       mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
-		       node_memblk_range[i].start, node_memblk_range[i].end);
-		if (mismatch) {
+	status = conflicting_memblks(node, start, end, nd_start, nd_end, &i);
+	switch(status)
+	{
+	case OVERLAP:
+	{
+		if (memblk_nodeid[i] == node) {
+			bool mismatch = !(ma->flags &
+					  ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
+			                !test_bit(i, memblk_hotplug);
+
+			printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
+			       mismatch ? KERN_ERR : KERN_WARNING, pxm, start,
+			       end, node_memblk_range[i].start,
+			       node_memblk_range[i].end);
+			if (mismatch) {
+				bad_srat();
+				return;
+			}
+			break;
+		} else {
+			printk(KERN_ERR
+			       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
+			       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
+			       node_memblk_range[i].start,
+			       node_memblk_range[i].end);
 			bad_srat();
 			return;
 		}
-	} else {
+	}
+
+	case INTERLEAVE:
+	{
 		printk(KERN_ERR
-		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
-		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
+		       "SRAT: PXM %u: (%"PRIpaddr"-%"PRIpaddr") interleaves with PXM %u memblk (%"PRIpaddr"-%"PRIpaddr")\n",
+		       node, nd_start, nd_end, node_to_pxm(memblk_nodeid[i]),
 		       node_memblk_range[i].start, node_memblk_range[i].end);
 		bad_srat();
 		return;
 	}
-	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
-		struct node *nd = &nodes[node];
 
-		if (!node_test_and_set(node, memory_nodes_parsed)) {
-			nd->start = start;
-			nd->end = end;
-		} else {
-			if (start < nd->start)
-				nd->start = start;
-			if (nd->end < end)
-				nd->end = end;
-		}
+	default:
+		break;
+	}
+
+	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
+		node_set(node, memory_nodes_parsed);
+		nd->start = nd_start;
+		nd->end = nd_end;
 	}
+
 	printk(KERN_INFO "SRAT: Node %u PXM %u %"PRIpaddr"-%"PRIpaddr"%s\n",
 	       node, pxm, start, end,
 	       ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE ? " (hotplug)" : "");
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 8/8] xen/x86: use INFO level for node's without memory log message
  2022-05-23  6:25 [PATCH v4 0/8] Device tree based NUMA support for Arm - Part#1 Wei Chen
                   ` (6 preceding siblings ...)
  2022-05-23  6:25 ` [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes Wei Chen
@ 2022-05-23  6:25 ` Wei Chen
  7 siblings, 0 replies; 18+ messages in thread
From: Wei Chen @ 2022-05-23  6:25 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu

In previous code, Xen was using KERN_WARNING for log message
when Xen found a node without memory. Xen will print this
warning message, and said that this may be an BIOS Bug or
mis-configured hardware. But actually, this warning is bogus,
because in an NUMA setting, nodes can only have processors,
and with 0 bytes memory. So it is unreasonable to warn about
BIOS or hardware corruption based on the detection of node
with 0 bytes memory.

So in this patch, we remove the warning messages, but just
keep an info message to info users that there is one or more
nodes with 0 bytes memory in the system.

Issue-Id: SCM-2240
Change-Id: I922a5f17e8d7e9d250a70eb3f703dabe4698027a
Signed-off-by: Wei Chen <wei.chen@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v3 -> v4:
1. Remove full stop and use lower-case for node.
2. Add Rb.
v2 -> v3:
new commit.
---
 xen/arch/x86/srat.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index a831df7648..5bd6279920 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -562,8 +562,7 @@ int __init acpi_scan_nodes(paddr_t start, paddr_t end)
 		uint64_t size = nodes[i].end - nodes[i].start;
 
 		if ( size == 0 )
-			printk(KERN_WARNING "SRAT: Node %u has no memory. "
-			       "BIOS Bug or mis-configured hardware?\n", i);
+			printk(KERN_INFO "SRAT: node %u has no memory\n", i);
 
 		setup_node_bootmem(i, nodes[i].start, nodes[i].end);
 	}
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 1/8] xen: reuse x86 EFI stub functions for Arm
  2022-05-23  6:25 ` [PATCH v4 1/8] xen: reuse x86 EFI stub functions for Arm Wei Chen
@ 2022-05-23  7:10   ` Jan Beulich
  2022-05-23  7:19     ` Wei Chen
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-05-23  7:10 UTC (permalink / raw)
  To: Wei Chen
  Cc: nd, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie, xen-devel

On 23.05.2022 08:25, Wei Chen wrote:
> x86 is using compiler feature testing to decide EFI build
> enable or not. When EFI build is disabled, x86 will use an
> efi/stub.c file to replace efi/runtime.c for build objects.
> Following this idea, we introduce a stub file for Arm, but
> use CONFIG_ARM_EFI to decide EFI build enable or not.
> 
> And the most functions in x86 EFI stub.c can be reused for
> other architectures, like Arm. So we move them to common
> and keep the x86 specific function in x86/efi/stub.c.
> 
> To avoid the symbol link conflict error when linking common
> stub files to x86/efi. We add a regular file check in efi
> stub files' link script. Depends on this check we can bypass
> the link behaviors for existed stub files in x86/efi.
> 
> As there is no Arm specific EFI stub function for Arm in
> current stage, Arm still can use the existed symbol link
> method for EFI stub files.
> 
> Change-Id: Idf19db1ada609d05fc0c0c3b0e1e8687c9d6ac71
> Issue-Id: SCM-2240

I don't think these two lines belong in an upstream submission (I
checked patch 2 and at least there they are two similar lines).

> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Tested-by: Jiamei Xie <jiamei.xie@arm.com>

While I'm not really happy with the Arm side, it's only the other
parts which this is applicable to anyway (with the stray tags
dropped):
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan



^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v4 1/8] xen: reuse x86 EFI stub functions for Arm
  2022-05-23  7:10   ` Jan Beulich
@ 2022-05-23  7:19     ` Wei Chen
  0 siblings, 0 replies; 18+ messages in thread
From: Wei Chen @ 2022-05-23  7:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: nd, Stefano Stabellini, Julien Grall, Bertrand Marquis,
	Volodymyr Babchuk, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie, xen-devel

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年5月23日 15:11
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Stefano Stabellini <sstabellini@kernel.org>; Julien
> Grall <julien@xen.org>; Bertrand Marquis <Bertrand.Marquis@arm.com>;
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; Roger Pau Monné <roger.pau@citrix.com>; Wei
> Liu <wl@xen.org>; Jiamei Xie <Jiamei.Xie@arm.com>; xen-
> devel@lists.xenproject.org
> Subject: Re: [PATCH v4 1/8] xen: reuse x86 EFI stub functions for Arm
> 
> On 23.05.2022 08:25, Wei Chen wrote:
> > x86 is using compiler feature testing to decide EFI build
> > enable or not. When EFI build is disabled, x86 will use an
> > efi/stub.c file to replace efi/runtime.c for build objects.
> > Following this idea, we introduce a stub file for Arm, but
> > use CONFIG_ARM_EFI to decide EFI build enable or not.
> >
> > And the most functions in x86 EFI stub.c can be reused for
> > other architectures, like Arm. So we move them to common
> > and keep the x86 specific function in x86/efi/stub.c.
> >
> > To avoid the symbol link conflict error when linking common
> > stub files to x86/efi. We add a regular file check in efi
> > stub files' link script. Depends on this check we can bypass
> > the link behaviors for existed stub files in x86/efi.
> >
> > As there is no Arm specific EFI stub function for Arm in
> > current stage, Arm still can use the existed symbol link
> > method for EFI stub files.
> >
> > Change-Id: Idf19db1ada609d05fc0c0c3b0e1e8687c9d6ac71
> > Issue-Id: SCM-2240
> 
> I don't think these two lines belong in an upstream submission (I
> checked patch 2 and at least there they are two similar lines).
> 

Ah, sorry, I had selected the wrong directory after I ran the scripts.
But the patch content is the same.

> > Signed-off-by: Wei Chen <wei.chen@arm.com>
> > Tested-by: Jiamei Xie <jiamei.xie@arm.com>
> 
> While I'm not really happy with the Arm side, it's only the other
> parts which this is applicable to anyway (with the stray tags
> dropped):
> Acked-by: Jan Beulich <jbeulich@suse.com>
> 

Thanks!

> Jan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes
  2022-05-23  6:25 ` [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes Wei Chen
@ 2022-05-30  1:46   ` Henry Wang
  2022-05-31 13:21   ` Jan Beulich
  1 sibling, 0 replies; 18+ messages in thread
From: Henry Wang @ 2022-05-30  1:46 UTC (permalink / raw)
  To: xen-devel
  Cc: nd, Wei Chen, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, Jiamei Xie

Hi,

It seems that this patch is the only one that is not properly acked and
reviewed in this series. So this is a gentle ping asking if everyone is happy
with this patch or are there any more actions that should be taken from the
author. Thanks!

Kind regards,
Henry

> -----Original Message-----
> Subject: [PATCH v4 7/8] xen/x86: add detection of memory interleaves for
> different nodes
> 
> One NUMA node may contain several memory blocks. In current Xen
> code, Xen will maintain a node memory range for each node to cover
> all its memory blocks. But here comes the problem, in the gap of
> one node's two memory blocks, if there are some memory blocks don't
> belong to this node (remote memory blocks). This node's memory range
> will be expanded to cover these remote memory blocks.
> 
> One node's memory range contains other nodes' memory, this is
> obviously not very reasonable. This means current NUMA code only
> can support node has no interleaved memory blocks. However, on a
> physical machine, the addresses of multiple nodes can be interleaved.
> 
> So in this patch, we add code to detect memory interleaves of
> different nodes. NUMA initialization will be failed and error
> messages will be printed when Xen detect such hardware configuration.
> 
> Change-Id: Ia7ff9a9128ecbe3eb4dddd1307ae8fbe65575ccf
> Issue-Id: SCM-2240
> Signed-off-by: Wei Chen <wei.chen@arm.com>
> Tested-by: Jiamei Xie <jiamei.xie@arm.com>
> ---
> v3 -> v4:
> 1. Drop "ERR" prefix for enumeration, and remove init value.
> 2. Use "switch case" for enumeration, and add "default:"
> 3. Use "PXM" in log messages.
> 4. Use unsigned int for node memory block id.
> 5. Fix some code-style comments.
> 6. Use "nd->end" in node range expansion check.
> v2 -> v3:
> 1. Merge the check code from a separate function to
>    conflicting_memblks. This will reduce the loop
>    times of node memory blocks.
> 2. Use an enumeration to indicate conflict check status.
> 3. Use a pointer to get conflict memory block id.
> v1 -> v2:
> 1. Update the description to say we're after is no memory
>    interleaves of different nodes.
> 2. Only update node range when it passes the interleave check.
> 3. Don't use full upper-case for "node".
> ---
>  xen/arch/x86/srat.c | 132 +++++++++++++++++++++++++++++++++-----------
>  1 file changed, 101 insertions(+), 31 deletions(-)
> 
> diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
> index 8ffe43bdfe..a831df7648 100644
> --- a/xen/arch/x86/srat.c
> +++ b/xen/arch/x86/srat.c
> @@ -42,6 +42,12 @@ static struct node
> node_memblk_range[NR_NODE_MEMBLKS];
>  static nodeid_t memblk_nodeid[NR_NODE_MEMBLKS];
>  static __initdata DECLARE_BITMAP(memblk_hotplug,
> NR_NODE_MEMBLKS);
> 
> +enum conflicts {
> +	NO_CONFLICT,
> +	OVERLAP,
> +	INTERLEAVE,
> +};
> +
>  static inline bool node_found(unsigned idx, unsigned pxm)
>  {
>  	return ((pxm2node[idx].pxm == pxm) &&
> @@ -119,20 +125,45 @@ int valid_numa_range(paddr_t start, paddr_t end,
> nodeid_t node)
>  	return 0;
>  }
> 
> -static __init int conflicting_memblks(paddr_t start, paddr_t end)
> +static
> +enum conflicts __init conflicting_memblks(nodeid_t nid, paddr_t start,
> +					  paddr_t end, paddr_t nd_start,
> +					  paddr_t nd_end, unsigned int
> *mblkid)
>  {
> -	int i;
> +	unsigned int i;
> 
> +	/*
> +	 * Scan all recorded nodes' memory blocks to check conflicts:
> +	 * Overlap or interleave.
> +	 */
>  	for (i = 0; i < num_node_memblks; i++) {
>  		struct node *nd = &node_memblk_range[i];
> +
> +		*mblkid = i;
> +
> +		/* Skip 0 bytes node memory block. */
>  		if (nd->start == nd->end)
>  			continue;
> +		/*
> +		 * Use memblk range to check memblk overlaps, include the
> +		 * self-overlap case.
> +		 */
>  		if (nd->end > start && nd->start < end)
> -			return i;
> +			return OVERLAP;
>  		if (nd->end == end && nd->start == start)
> -			return i;
> +			return OVERLAP;
> +		/*
> +		 * Use node memory range to check whether new range
> contains
> +		 * memory from other nodes - interleave check. We just
> need
> +		 * to check full contains situation. Because overlaps have
> +		 * been checked above.
> +		 */
> +	        if (nid != memblk_nodeid[i] &&
> +		    (nd_start < nd->start && nd->end < nd_end))
> +			return INTERLEAVE;
>  	}
> -	return -1;
> +
> +	return NO_CONFLICT;
>  }
> 
>  static __init void cutoff_node(int i, paddr_t start, paddr_t end)
> @@ -275,10 +306,13 @@ acpi_numa_processor_affinity_init(const struct
> acpi_srat_cpu_affinity *pa)
>  void __init
>  acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
>  {
> +	enum conflicts status;
> +	struct node *nd;
> +	paddr_t nd_start, nd_end;
>  	paddr_t start, end;
>  	unsigned pxm;
>  	nodeid_t node;
> -	int i;
> +	unsigned int i;
> 
>  	if (srat_disabled())
>  		return;
> @@ -310,42 +344,78 @@ acpi_numa_memory_affinity_init(const struct
> acpi_srat_mem_affinity *ma)
>  		bad_srat();
>  		return;
>  	}
> +
> +	/*
> +	 * For the node that already has some memory blocks, we will
> +	 * expand the node memory range temporarily to check memory
> +	 * interleaves with other nodes. We will not use this node
> +	 * temp memory range to check overlaps, because it will mask
> +	 * the overlaps in same node.
> +	 *
> +	 * Node with 0 bytes memory doesn't need this expandsion.
> +	 */
> +	nd_start = start;
> +	nd_end = end;
> +	nd = &nodes[node];
> +	if (nd->start != nd->end) {
> +		if (nd_start > nd->start)
> +			nd_start = nd->start;
> +
> +		if (nd_end < nd->end)
> +			nd_end = nd->end;
> +	}
> +
>  	/* It is fine to add this area to the nodes data it will be used later*/
> -	i = conflicting_memblks(start, end);
> -	if (i < 0)
> -		/* everything fine */;
> -	else if (memblk_nodeid[i] == node) {
> -		bool mismatch = !(ma->flags &
> ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> -		                !test_bit(i, memblk_hotplug);
> -
> -		printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr")
> overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> -		       mismatch ? KERN_ERR : KERN_WARNING, pxm, start,
> end,
> -		       node_memblk_range[i].start,
> node_memblk_range[i].end);
> -		if (mismatch) {
> +	status = conflicting_memblks(node, start, end, nd_start, nd_end,
> &i);
> +	switch(status)
> +	{
> +	case OVERLAP:
> +	{
> +		if (memblk_nodeid[i] == node) {
> +			bool mismatch = !(ma->flags &
> +
> ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> +			                !test_bit(i, memblk_hotplug);
> +
> +			printk("%sSRAT: PXM %u (%"PRIpaddr"-
> %"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> +			       mismatch ? KERN_ERR : KERN_WARNING, pxm,
> start,
> +			       end, node_memblk_range[i].start,
> +			       node_memblk_range[i].end);
> +			if (mismatch) {
> +				bad_srat();
> +				return;
> +			}
> +			break;
> +		} else {
> +			printk(KERN_ERR
> +			       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr")
> overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> +			       pxm, start, end,
> node_to_pxm(memblk_nodeid[i]),
> +			       node_memblk_range[i].start,
> +			       node_memblk_range[i].end);
>  			bad_srat();
>  			return;
>  		}
> -	} else {
> +	}
> +
> +	case INTERLEAVE:
> +	{
>  		printk(KERN_ERR
> -		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps
> with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> -		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> +		       "SRAT: PXM %u: (%"PRIpaddr"-%"PRIpaddr")
> interleaves with PXM %u memblk (%"PRIpaddr"-%"PRIpaddr")\n",
> +		       node, nd_start, nd_end,
> node_to_pxm(memblk_nodeid[i]),
>  		       node_memblk_range[i].start,
> node_memblk_range[i].end);
>  		bad_srat();
>  		return;
>  	}
> -	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
> -		struct node *nd = &nodes[node];
> 
> -		if (!node_test_and_set(node, memory_nodes_parsed)) {
> -			nd->start = start;
> -			nd->end = end;
> -		} else {
> -			if (start < nd->start)
> -				nd->start = start;
> -			if (nd->end < end)
> -				nd->end = end;
> -		}
> +	default:
> +		break;
> +	}
> +
> +	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
> +		node_set(node, memory_nodes_parsed);
> +		nd->start = nd_start;
> +		nd->end = nd_end;
>  	}
> +
>  	printk(KERN_INFO "SRAT: Node %u PXM %u %"PRIpaddr"-
> %"PRIpaddr"%s\n",
>  	       node, pxm, start, end,
>  	       ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE ? " (hotplug)" :
> "");
> --
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes
  2022-05-23  6:25 ` [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes Wei Chen
  2022-05-30  1:46   ` Henry Wang
@ 2022-05-31 13:21   ` Jan Beulich
  2022-06-01  2:53     ` Wei Chen
  2022-06-02  4:10     ` Wei Chen
  1 sibling, 2 replies; 18+ messages in thread
From: Jan Beulich @ 2022-05-31 13:21 UTC (permalink / raw)
  To: Wei Chen
  Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, Jiamei Xie, xen-devel

On 23.05.2022 08:25, Wei Chen wrote:
> @@ -119,20 +125,45 @@ int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node)
>  	return 0;
>  }
>  
> -static __init int conflicting_memblks(paddr_t start, paddr_t end)
> +static
> +enum conflicts __init conflicting_memblks(nodeid_t nid, paddr_t start,
> +					  paddr_t end, paddr_t nd_start,
> +					  paddr_t nd_end, unsigned int *mblkid)
>  {
> -	int i;
> +	unsigned int i;
>  
> +	/*
> +	 * Scan all recorded nodes' memory blocks to check conflicts:
> +	 * Overlap or interleave.
> +	 */
>  	for (i = 0; i < num_node_memblks; i++) {
>  		struct node *nd = &node_memblk_range[i];
> +
> +		*mblkid = i;
> +
> +		/* Skip 0 bytes node memory block. */
>  		if (nd->start == nd->end)
>  			continue;
> +		/*
> +		 * Use memblk range to check memblk overlaps, include the
> +		 * self-overlap case.
> +		 */
>  		if (nd->end > start && nd->start < end)
> -			return i;
> +			return OVERLAP;
>  		if (nd->end == end && nd->start == start)
> -			return i;
> +			return OVERLAP;

Knowing that nd's range is non-empty, is this 2nd condition actually
needed here? (Such an adjustment, if you decided to make it and if not
split out to a separate patch, would need calling out in the
description.)

> +		/*
> +		 * Use node memory range to check whether new range contains
> +		 * memory from other nodes - interleave check. We just need
> +		 * to check full contains situation. Because overlaps have
> +		 * been checked above.
> +		 */
> +	        if (nid != memblk_nodeid[i] &&
> +		    (nd_start < nd->start && nd->end < nd_end))
> +			return INTERLEAVE;

Doesn't this need to be <= in both cases (albeit I think one of the two
expressions would want switching around, to better line up with the
earlier one, visible in context further up).

> @@ -275,10 +306,13 @@ acpi_numa_processor_affinity_init(const struct acpi_srat_cpu_affinity *pa)
>  void __init
>  acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
>  {
> +	enum conflicts status;

I don't think you need this local variable.

> @@ -310,42 +344,78 @@ acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
>  		bad_srat();
>  		return;
>  	}
> +
> +	/*
> +	 * For the node that already has some memory blocks, we will
> +	 * expand the node memory range temporarily to check memory
> +	 * interleaves with other nodes. We will not use this node
> +	 * temp memory range to check overlaps, because it will mask
> +	 * the overlaps in same node.
> +	 *
> +	 * Node with 0 bytes memory doesn't need this expandsion.
> +	 */
> +	nd_start = start;
> +	nd_end = end;
> +	nd = &nodes[node];
> +	if (nd->start != nd->end) {
> +		if (nd_start > nd->start)
> +			nd_start = nd->start;
> +
> +		if (nd_end < nd->end)
> +			nd_end = nd->end;
> +	}
> +
>  	/* It is fine to add this area to the nodes data it will be used later*/
> -	i = conflicting_memblks(start, end);
> -	if (i < 0)
> -		/* everything fine */;
> -	else if (memblk_nodeid[i] == node) {
> -		bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> -		                !test_bit(i, memblk_hotplug);
> -
> -		printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> -		       mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
> -		       node_memblk_range[i].start, node_memblk_range[i].end);
> -		if (mismatch) {
> +	status = conflicting_memblks(node, start, end, nd_start, nd_end, &i);
> +	switch(status)
> +	{

Style: Missing blank before ( and the brace goes on the same line here
(Linux style).

> +	case OVERLAP:
> +	{

Please omit braces at case labels unless you need a new scope to declare
variables.

> +		if (memblk_nodeid[i] == node) {
> +			bool mismatch = !(ma->flags &
> +					  ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> +			                !test_bit(i, memblk_hotplug);
> +
> +			printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> +			       mismatch ? KERN_ERR : KERN_WARNING, pxm, start,
> +			       end, node_memblk_range[i].start,
> +			       node_memblk_range[i].end);
> +			if (mismatch) {
> +				bad_srat();
> +				return;
> +			}
> +			break;
> +		} else {
> +			printk(KERN_ERR
> +			       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> +			       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> +			       node_memblk_range[i].start,
> +			       node_memblk_range[i].end);
>  			bad_srat();
>  			return;
>  		}

To limit indentation depth, on of the two sides of the conditional can
be moved out, by omitting the unnecessary "else". To reduce the diff
it may be worthwhile to invert the if() condition, allowing the (then
implicit) "else" case to remain (almost) unchanged from the original.

> -	} else {
> +	}
> +
> +	case INTERLEAVE:
> +	{
>  		printk(KERN_ERR
> -		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> -		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> +		       "SRAT: PXM %u: (%"PRIpaddr"-%"PRIpaddr") interleaves with PXM %u memblk (%"PRIpaddr"-%"PRIpaddr")\n",
> +		       node, nd_start, nd_end, node_to_pxm(memblk_nodeid[i]),

Hmm, you have PXM in the log message text, but you still pass "node" as
first argument.

Since you're touching all these messages, could I ask you to convert
all ranges to proper mathematical interval representation? I.e.
[start,end) here aiui as the end addresses look to be non-inclusive.

>  		       node_memblk_range[i].start, node_memblk_range[i].end);
>  		bad_srat();
>  		return;
>  	}
> -	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
> -		struct node *nd = &nodes[node];
>  
> -		if (!node_test_and_set(node, memory_nodes_parsed)) {
> -			nd->start = start;
> -			nd->end = end;
> -		} else {
> -			if (start < nd->start)
> -				nd->start = start;
> -			if (nd->end < end)
> -				nd->end = end;
> -		}
> +	default:
> +		break;

This wants to be "case NO_CONFLICT:", such that the compiler would
warn if a new enumerator appears without adding code here. (An
alternative - which personally I don't like - would be to put
ASSERT_UNREACHABLE() in the default: case. The downside is that
then the issue would only be noticeable at runtime.)

Jan



^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes
  2022-05-31 13:21   ` Jan Beulich
@ 2022-06-01  2:53     ` Wei Chen
  2022-06-01  6:32       ` Jan Beulich
  2022-06-02  4:10     ` Wei Chen
  1 sibling, 1 reply; 18+ messages in thread
From: Wei Chen @ 2022-06-01  2:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, Jiamei Xie, xen-devel

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年5月31日 21:21
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; Jiamei Xie
> <Jiamei.Xie@arm.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v4 7/8] xen/x86: add detection of memory interleaves
> for different nodes
> 
> On 23.05.2022 08:25, Wei Chen wrote:
> > @@ -119,20 +125,45 @@ int valid_numa_range(paddr_t start, paddr_t end,
> nodeid_t node)
> >  	return 0;
> >  }
> >
> > -static __init int conflicting_memblks(paddr_t start, paddr_t end)
> > +static
> > +enum conflicts __init conflicting_memblks(nodeid_t nid, paddr_t start,
> > +					  paddr_t end, paddr_t nd_start,
> > +					  paddr_t nd_end, unsigned int *mblkid)
> >  {
> > -	int i;
> > +	unsigned int i;
> >
> > +	/*
> > +	 * Scan all recorded nodes' memory blocks to check conflicts:
> > +	 * Overlap or interleave.
> > +	 */
> >  	for (i = 0; i < num_node_memblks; i++) {
> >  		struct node *nd = &node_memblk_range[i];
> > +
> > +		*mblkid = i;
> > +
> > +		/* Skip 0 bytes node memory block. */
> >  		if (nd->start == nd->end)
> >  			continue;
> > +		/*
> > +		 * Use memblk range to check memblk overlaps, include the
> > +		 * self-overlap case.
> > +		 */
> >  		if (nd->end > start && nd->start < end)
> > -			return i;
> > +			return OVERLAP;
> >  		if (nd->end == end && nd->start == start)
> > -			return i;
> > +			return OVERLAP;
> 
> Knowing that nd's range is non-empty, is this 2nd condition actually
> needed here? (Such an adjustment, if you decided to make it and if not
> split out to a separate patch, would need calling out in the
> description.)

The 2nd condition here, you meant is "(nd->end == end && nd->start == start)"
or just "nd->start == start" after "&&"?

My understanding is the first case, "(nd->end == end && nd->start == start)"
will be covered by "(nd->end > start && nd->start < end)". If so, I'll remove
it in the next version and add some descriptions in the commit log and code
comment.

> 
> > +		/*
> > +		 * Use node memory range to check whether new range contains
> > +		 * memory from other nodes - interleave check. We just need
> > +		 * to check full contains situation. Because overlaps have
> > +		 * been checked above.
> > +		 */
> > +	        if (nid != memblk_nodeid[i] &&
> > +		    (nd_start < nd->start && nd->end < nd_end))
> > +			return INTERLEAVE;
> 
> Doesn't this need to be <= in both cases (albeit I think one of the two
> expressions would want switching around, to better line up with the
> earlier one, visible in context further up).
> 

Yes, I will add "="in both cases. But for switching around, I also
wanted to make a better line up. But if nid == memblk_nodeid[i],
the check of (nd_start < nd->start && nd->end < nd_end) is meaningless.
I'll adjust their order in the next version if you think this is
acceptable.

> > @@ -275,10 +306,13 @@ acpi_numa_processor_affinity_init(const struct
> acpi_srat_cpu_affinity *pa)
> >  void __init
> >  acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
> >  {
> > +	enum conflicts status;
> 
> I don't think you need this local variable.
> 

Why I don't need this one? Did you mean I can use
switch (conflicting_memblks(...)) directly?

> > @@ -310,42 +344,78 @@ acpi_numa_memory_affinity_init(const struct
> acpi_srat_mem_affinity *ma)
> >  		bad_srat();
> >  		return;
> >  	}
> > +
> > +	/*
> > +	 * For the node that already has some memory blocks, we will
> > +	 * expand the node memory range temporarily to check memory
> > +	 * interleaves with other nodes. We will not use this node
> > +	 * temp memory range to check overlaps, because it will mask
> > +	 * the overlaps in same node.
> > +	 *
> > +	 * Node with 0 bytes memory doesn't need this expandsion.
> > +	 */
> > +	nd_start = start;
> > +	nd_end = end;
> > +	nd = &nodes[node];
> > +	if (nd->start != nd->end) {
> > +		if (nd_start > nd->start)
> > +			nd_start = nd->start;
> > +
> > +		if (nd_end < nd->end)
> > +			nd_end = nd->end;
> > +	}
> > +
> >  	/* It is fine to add this area to the nodes data it will be used
> later*/
> > -	i = conflicting_memblks(start, end);
> > -	if (i < 0)
> > -		/* everything fine */;
> > -	else if (memblk_nodeid[i] == node) {
> > -		bool mismatch = !(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> > -		                !test_bit(i, memblk_hotplug);
> > -
> > -		printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with
> itself (%"PRIpaddr"-%"PRIpaddr")\n",
> > -		       mismatch ? KERN_ERR : KERN_WARNING, pxm, start, end,
> > -		       node_memblk_range[i].start, node_memblk_range[i].end);
> > -		if (mismatch) {
> > +	status = conflicting_memblks(node, start, end, nd_start, nd_end, &i);
> > +	switch(status)
> > +	{
> 
> Style: Missing blank before ( and the brace goes on the same line here
> (Linux style).
> 

Ok.

> > +	case OVERLAP:
> > +	{
> 
> Please omit braces at case labels unless you need a new scope to declare
> variables.
> 

OK.

> > +		if (memblk_nodeid[i] == node) {
> > +			bool mismatch = !(ma->flags &
> > +					  ACPI_SRAT_MEM_HOT_PLUGGABLE) !=
> > +			                !test_bit(i, memblk_hotplug);
> > +
> > +			printk("%sSRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr")
> overlaps with itself (%"PRIpaddr"-%"PRIpaddr")\n",
> > +			       mismatch ? KERN_ERR : KERN_WARNING, pxm, start,
> > +			       end, node_memblk_range[i].start,
> > +			       node_memblk_range[i].end);
> > +			if (mismatch) {
> > +				bad_srat();
> > +				return;
> > +			}
> > +			break;
> > +		} else {
> > +			printk(KERN_ERR
> > +			       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps
> with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> > +			       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> > +			       node_memblk_range[i].start,
> > +			       node_memblk_range[i].end);
> >  			bad_srat();
> >  			return;
> >  		}
> 
> To limit indentation depth, on of the two sides of the conditional can
> be moved out, by omitting the unnecessary "else". To reduce the diff
> it may be worthwhile to invert the if() condition, allowing the (then
> implicit) "else" case to remain (almost) unchanged from the original.
> 

I will adjust them in next version.

> > -	} else {
> > +	}
> > +
> > +	case INTERLEAVE:
> > +	{
> >  		printk(KERN_ERR
> > -		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with
> PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
> > -		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
> > +		       "SRAT: PXM %u: (%"PRIpaddr"-%"PRIpaddr") interleaves
> with PXM %u memblk (%"PRIpaddr"-%"PRIpaddr")\n",
> > +		       node, nd_start, nd_end, node_to_pxm(memblk_nodeid[i]),
> 
> Hmm, you have PXM in the log message text, but you still pass "node" as
> first argument.
> 

I will fix it.

> Since you're touching all these messages, could I ask you to convert
> all ranges to proper mathematical interval representation? I.e.
> [start,end) here aiui as the end addresses look to be non-inclusive.
> 

Sure, I will do it.

> >  		       node_memblk_range[i].start, node_memblk_range[i].end);
> >  		bad_srat();
> >  		return;
> >  	}
> > -	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
> > -		struct node *nd = &nodes[node];
> >
> > -		if (!node_test_and_set(node, memory_nodes_parsed)) {
> > -			nd->start = start;
> > -			nd->end = end;
> > -		} else {
> > -			if (start < nd->start)
> > -				nd->start = start;
> > -			if (nd->end < end)
> > -				nd->end = end;
> > -		}
> > +	default:
> > +		break;
> 
> This wants to be "case NO_CONFLICT:", such that the compiler would
> warn if a new enumerator appears without adding code here. (An
> alternative - which personally I don't like - would be to put
> ASSERT_UNREACHABLE() in the default: case. The downside is that
> then the issue would only be noticeable at runtime.)
> 

Thanks, I will adjust it to:
	case NO_CONFLICT:
		break;
	default:
		ASSERT_UNREACHABLE();
in next version.


> Jan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes
  2022-06-01  2:53     ` Wei Chen
@ 2022-06-01  6:32       ` Jan Beulich
  2022-06-02  2:20         ` Wei Chen
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Beulich @ 2022-06-01  6:32 UTC (permalink / raw)
  To: Wei Chen
  Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, Jiamei Xie, xen-devel

On 01.06.2022 04:53, Wei Chen wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 2022年5月31日 21:21
>>
>> On 23.05.2022 08:25, Wei Chen wrote:
>>> @@ -119,20 +125,45 @@ int valid_numa_range(paddr_t start, paddr_t end,
>> nodeid_t node)
>>>  	return 0;
>>>  }
>>>
>>> -static __init int conflicting_memblks(paddr_t start, paddr_t end)
>>> +static
>>> +enum conflicts __init conflicting_memblks(nodeid_t nid, paddr_t start,
>>> +					  paddr_t end, paddr_t nd_start,
>>> +					  paddr_t nd_end, unsigned int *mblkid)
>>>  {
>>> -	int i;
>>> +	unsigned int i;
>>>
>>> +	/*
>>> +	 * Scan all recorded nodes' memory blocks to check conflicts:
>>> +	 * Overlap or interleave.
>>> +	 */
>>>  	for (i = 0; i < num_node_memblks; i++) {
>>>  		struct node *nd = &node_memblk_range[i];
>>> +
>>> +		*mblkid = i;
>>> +
>>> +		/* Skip 0 bytes node memory block. */
>>>  		if (nd->start == nd->end)
>>>  			continue;
>>> +		/*
>>> +		 * Use memblk range to check memblk overlaps, include the
>>> +		 * self-overlap case.
>>> +		 */
>>>  		if (nd->end > start && nd->start < end)
>>> -			return i;
>>> +			return OVERLAP;
>>>  		if (nd->end == end && nd->start == start)
>>> -			return i;
>>> +			return OVERLAP;
>>
>> Knowing that nd's range is non-empty, is this 2nd condition actually
>> needed here? (Such an adjustment, if you decided to make it and if not
>> split out to a separate patch, would need calling out in the
>> description.)
> 
> The 2nd condition here, you meant is "(nd->end == end && nd->start == start)"
> or just "nd->start == start" after "&&"?

No, I mean the entire 2nd if().

>>> +		/*
>>> +		 * Use node memory range to check whether new range contains
>>> +		 * memory from other nodes - interleave check. We just need
>>> +		 * to check full contains situation. Because overlaps have
>>> +		 * been checked above.
>>> +		 */
>>> +	        if (nid != memblk_nodeid[i] &&
>>> +		    (nd_start < nd->start && nd->end < nd_end))
>>> +			return INTERLEAVE;
>>
>> Doesn't this need to be <= in both cases (albeit I think one of the two
>> expressions would want switching around, to better line up with the
>> earlier one, visible in context further up).
>>
> 
> Yes, I will add "="in both cases. But for switching around, I also
> wanted to make a better line up. But if nid == memblk_nodeid[i],
> the check of (nd_start < nd->start && nd->end < nd_end) is meaningless.
> I'll adjust their order in the next version if you think this is
> acceptable.

I wasn't referring to the "nid != memblk_nodeid[i]" part at all. What
I'm after is for the two range checks to come as close as possible to
what the other range check does. (Which, as I notice only now, would
include the dropping of the unnecessary inner pair of parentheses.)
E.g. (there are other variations of it)

	        if (nid != memblk_nodeid[i] &&
		    nd->start >= nd_start && nd->end <= nd_end)
			return INTERLEAVE;

>>> @@ -275,10 +306,13 @@ acpi_numa_processor_affinity_init(const struct
>> acpi_srat_cpu_affinity *pa)
>>>  void __init
>>>  acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity *ma)
>>>  {
>>> +	enum conflicts status;
>>
>> I don't think you need this local variable.
>>
> 
> Why I don't need this one? Did you mean I can use
> switch (conflicting_memblks(...)) directly?

Yes. Why could this not be possible?

>>>  		       node_memblk_range[i].start, node_memblk_range[i].end);
>>>  		bad_srat();
>>>  		return;
>>>  	}
>>> -	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
>>> -		struct node *nd = &nodes[node];
>>>
>>> -		if (!node_test_and_set(node, memory_nodes_parsed)) {
>>> -			nd->start = start;
>>> -			nd->end = end;
>>> -		} else {
>>> -			if (start < nd->start)
>>> -				nd->start = start;
>>> -			if (nd->end < end)
>>> -				nd->end = end;
>>> -		}
>>> +	default:
>>> +		break;
>>
>> This wants to be "case NO_CONFLICT:", such that the compiler would
>> warn if a new enumerator appears without adding code here. (An
>> alternative - which personally I don't like - would be to put
>> ASSERT_UNREACHABLE() in the default: case. The downside is that
>> then the issue would only be noticeable at runtime.)
>>
> 
> Thanks, I will adjust it to:
> 	case NO_CONFLICT:
> 		break;
> 	default:
> 		ASSERT_UNREACHABLE();
> in next version.

As said - I consider this form less desirable, as it'll defer
noticing of an issue from build-time to runtime. If you think that
form is better, may I ask why?

Jan



^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes
  2022-06-01  6:32       ` Jan Beulich
@ 2022-06-02  2:20         ` Wei Chen
  0 siblings, 0 replies; 18+ messages in thread
From: Wei Chen @ 2022-06-02  2:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, Jiamei Xie, xen-devel

Hi Jan,

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 2022年6月1日 14:32
> To: Wei Chen <Wei.Chen@arm.com>
> Cc: nd <nd@arm.com>; Andrew Cooper <andrew.cooper3@citrix.com>; Roger Pau
> Monné <roger.pau@citrix.com>; Wei Liu <wl@xen.org>; Jiamei Xie
> <Jiamei.Xie@arm.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v4 7/8] xen/x86: add detection of memory interleaves
> for different nodes
> 
> On 01.06.2022 04:53, Wei Chen wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 2022年5月31日 21:21
> >>
> >> On 23.05.2022 08:25, Wei Chen wrote:
> >>> @@ -119,20 +125,45 @@ int valid_numa_range(paddr_t start, paddr_t end,
> >> nodeid_t node)
> >>>  	return 0;
> >>>  }
> >>>
> >>> -static __init int conflicting_memblks(paddr_t start, paddr_t end)
> >>> +static
> >>> +enum conflicts __init conflicting_memblks(nodeid_t nid, paddr_t start,
> >>> +					  paddr_t end, paddr_t nd_start,
> >>> +					  paddr_t nd_end, unsigned int *mblkid)
> >>>  {
> >>> -	int i;
> >>> +	unsigned int i;
> >>>
> >>> +	/*
> >>> +	 * Scan all recorded nodes' memory blocks to check conflicts:
> >>> +	 * Overlap or interleave.
> >>> +	 */
> >>>  	for (i = 0; i < num_node_memblks; i++) {
> >>>  		struct node *nd = &node_memblk_range[i];
> >>> +
> >>> +		*mblkid = i;
> >>> +
> >>> +		/* Skip 0 bytes node memory block. */
> >>>  		if (nd->start == nd->end)
> >>>  			continue;
> >>> +		/*
> >>> +		 * Use memblk range to check memblk overlaps, include the
> >>> +		 * self-overlap case.
> >>> +		 */
> >>>  		if (nd->end > start && nd->start < end)
> >>> -			return i;
> >>> +			return OVERLAP;
> >>>  		if (nd->end == end && nd->start == start)
> >>> -			return i;
> >>> +			return OVERLAP;
> >>
> >> Knowing that nd's range is non-empty, is this 2nd condition actually
> >> needed here? (Such an adjustment, if you decided to make it and if not
> >> split out to a separate patch, would need calling out in the
> >> description.)
> >
> > The 2nd condition here, you meant is "(nd->end == end && nd->start ==
> start)"
> > or just "nd->start == start" after "&&"?
> 
> No, I mean the entire 2nd if().
> 

OK.

> >>> +		/*
> >>> +		 * Use node memory range to check whether new range contains
> >>> +		 * memory from other nodes - interleave check. We just need
> >>> +		 * to check full contains situation. Because overlaps have
> >>> +		 * been checked above.
> >>> +		 */
> >>> +	        if (nid != memblk_nodeid[i] &&
> >>> +		    (nd_start < nd->start && nd->end < nd_end))
> >>> +			return INTERLEAVE;
> >>
> >> Doesn't this need to be <= in both cases (albeit I think one of the two
> >> expressions would want switching around, to better line up with the
> >> earlier one, visible in context further up).
> >>
> >
> > Yes, I will add "="in both cases. But for switching around, I also
> > wanted to make a better line up. But if nid == memblk_nodeid[i],
> > the check of (nd_start < nd->start && nd->end < nd_end) is meaningless.
> > I'll adjust their order in the next version if you think this is
> > acceptable.
> 
> I wasn't referring to the "nid != memblk_nodeid[i]" part at all. What
> I'm after is for the two range checks to come as close as possible to
> what the other range check does. (Which, as I notice only now, would
> include the dropping of the unnecessary inner pair of parentheses.)
> E.g. (there are other variations of it)
> 
> 	        if (nid != memblk_nodeid[i] &&
> 		    nd->start >= nd_start && nd->end <= nd_end)
> 			return INTERLEAVE;
> 

Oh, thanks. I had thought too much, I will drop them.

> >>> @@ -275,10 +306,13 @@ acpi_numa_processor_affinity_init(const struct
> >> acpi_srat_cpu_affinity *pa)
> >>>  void __init
> >>>  acpi_numa_memory_affinity_init(const struct acpi_srat_mem_affinity
> *ma)
> >>>  {
> >>> +	enum conflicts status;
> >>
> >> I don't think you need this local variable.
> >>
> >
> > Why I don't need this one? Did you mean I can use
> > switch (conflicting_memblks(...)) directly?
> 
> Yes. Why could this not be possible?
> 

Ok.

> >>>  		       node_memblk_range[i].start, node_memblk_range[i].end);
> >>>  		bad_srat();
> >>>  		return;
> >>>  	}
> >>> -	if (!(ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE)) {
> >>> -		struct node *nd = &nodes[node];
> >>>
> >>> -		if (!node_test_and_set(node, memory_nodes_parsed)) {
> >>> -			nd->start = start;
> >>> -			nd->end = end;
> >>> -		} else {
> >>> -			if (start < nd->start)
> >>> -				nd->start = start;
> >>> -			if (nd->end < end)
> >>> -				nd->end = end;
> >>> -		}
> >>> +	default:
> >>> +		break;
> >>
> >> This wants to be "case NO_CONFLICT:", such that the compiler would
> >> warn if a new enumerator appears without adding code here. (An
> >> alternative - which personally I don't like - would be to put
> >> ASSERT_UNREACHABLE() in the default: case. The downside is that
> >> then the issue would only be noticeable at runtime.)
> >>
> >
> > Thanks, I will adjust it to:
> > 	case NO_CONFLICT:
> > 		break;
> > 	default:
> > 		ASSERT_UNREACHABLE();
> > in next version.
> 
> As said - I consider this form less desirable, as it'll defer
> noticing of an issue from build-time to runtime. If you think that
> form is better, may I ask why?
> 

Ok. I will drop the default. I had mis-understood your comment.

> Jan


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes
  2022-05-31 13:21   ` Jan Beulich
  2022-06-01  2:53     ` Wei Chen
@ 2022-06-02  4:10     ` Wei Chen
  2022-06-02  8:32       ` Jan Beulich
  1 sibling, 1 reply; 18+ messages in thread
From: Wei Chen @ 2022-06-02  4:10 UTC (permalink / raw)
  To: Jan Beulich
  Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, Jiamei Xie, xen-devel

Hi Jan,

On 2022/5/31 21:21, Jan Beulich wrote:
> On 23.05.2022 08:25, Wei Chen wrote:
>> @@ -119,20 +125,45 @@ int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node)
>>   	return 0;

> 
> To limit indentation depth, on of the two sides of the conditional can
> be moved out, by omitting the unnecessary "else". To reduce the diff
> it may be worthwhile to invert the if() condition, allowing the (then
> implicit) "else" case to remain (almost) unchanged from the original.
> 
>> -	} else {
>> +	}
>> +
>> +	case INTERLEAVE:
>> +	{
>>   		printk(KERN_ERR
>> -		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
>> -		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
>> +		       "SRAT: PXM %u: (%"PRIpaddr"-%"PRIpaddr") interleaves with PXM %u memblk (%"PRIpaddr"-%"PRIpaddr")\n",
>> +		       node, nd_start, nd_end, node_to_pxm(memblk_nodeid[i]),
> 
> Hmm, you have PXM in the log message text, but you still pass "node" as
> first argument.
> 
> Since you're touching all these messages, could I ask you to convert
> all ranges to proper mathematical interval representation? I.e.
> [start,end) here aiui as the end addresses look to be non-inclusive.
> 

Sorry, I want to confirm with you about this comment again. Now the 
messages look like:
(XEN) NUMA: PXM 0: (0000000080000000-00000008d8000000) interleaves...

So I want to know, is it [0000000080000000-00000008d8000000) or
(0000000080000000-00000008d7ffffff) addressed your comment?
Literally, I think it's case#1?

Thanks,
Wei Chen

> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes
  2022-06-02  4:10     ` Wei Chen
@ 2022-06-02  8:32       ` Jan Beulich
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Beulich @ 2022-06-02  8:32 UTC (permalink / raw)
  To: Wei Chen
  Cc: nd, Andrew Cooper, Roger Pau Monné, Wei Liu, Jiamei Xie, xen-devel

On 02.06.2022 06:10, Wei Chen wrote:
> Hi Jan,
> 
> On 2022/5/31 21:21, Jan Beulich wrote:
>> On 23.05.2022 08:25, Wei Chen wrote:
>>> @@ -119,20 +125,45 @@ int valid_numa_range(paddr_t start, paddr_t end, nodeid_t node)
>>>   	return 0;
> 
>>
>> To limit indentation depth, on of the two sides of the conditional can
>> be moved out, by omitting the unnecessary "else". To reduce the diff
>> it may be worthwhile to invert the if() condition, allowing the (then
>> implicit) "else" case to remain (almost) unchanged from the original.
>>
>>> -	} else {
>>> +	}
>>> +
>>> +	case INTERLEAVE:
>>> +	{
>>>   		printk(KERN_ERR
>>> -		       "SRAT: PXM %u (%"PRIpaddr"-%"PRIpaddr") overlaps with PXM %u (%"PRIpaddr"-%"PRIpaddr")\n",
>>> -		       pxm, start, end, node_to_pxm(memblk_nodeid[i]),
>>> +		       "SRAT: PXM %u: (%"PRIpaddr"-%"PRIpaddr") interleaves with PXM %u memblk (%"PRIpaddr"-%"PRIpaddr")\n",
>>> +		       node, nd_start, nd_end, node_to_pxm(memblk_nodeid[i]),
>>
>> Hmm, you have PXM in the log message text, but you still pass "node" as
>> first argument.
>>
>> Since you're touching all these messages, could I ask you to convert
>> all ranges to proper mathematical interval representation? I.e.
>> [start,end) here aiui as the end addresses look to be non-inclusive.
>>
> 
> Sorry, I want to confirm with you about this comment again. Now the 
> messages look like:
> (XEN) NUMA: PXM 0: (0000000080000000-00000008d8000000) interleaves...
> 
> So I want to know, is it [0000000080000000-00000008d8000000) or
> (0000000080000000-00000008d7ffffff) addressed your comment?
> Literally, I think it's case#1?

The former or [0000000080000000-00000008d7ffffff]. Parentheses stand for
exclusive boundaries, while square brackets stand for inclusive ones.

Jan



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-06-02  8:32 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-23  6:25 [PATCH v4 0/8] Device tree based NUMA support for Arm - Part#1 Wei Chen
2022-05-23  6:25 ` [PATCH v4 1/8] xen: reuse x86 EFI stub functions for Arm Wei Chen
2022-05-23  7:10   ` Jan Beulich
2022-05-23  7:19     ` Wei Chen
2022-05-23  6:25 ` [PATCH v4 2/8] xen/arm: Keep memory nodes in device tree when Xen boots from EFI Wei Chen
2022-05-23  6:25 ` [PATCH v4 3/8] xen: introduce an arch helper for default dma zone status Wei Chen
2022-05-23  6:25 ` [PATCH v4 4/8] xen: decouple NUMA from ACPI in Kconfig Wei Chen
2022-05-23  6:25 ` [PATCH v4 5/8] xen/arm: use !CONFIG_NUMA to keep fake NUMA API Wei Chen
2022-05-23  6:25 ` [PATCH v4 6/8] xen/x86: use paddr_t for addresses in NUMA node structure Wei Chen
2022-05-23  6:25 ` [PATCH v4 7/8] xen/x86: add detection of memory interleaves for different nodes Wei Chen
2022-05-30  1:46   ` Henry Wang
2022-05-31 13:21   ` Jan Beulich
2022-06-01  2:53     ` Wei Chen
2022-06-01  6:32       ` Jan Beulich
2022-06-02  2:20         ` Wei Chen
2022-06-02  4:10     ` Wei Chen
2022-06-02  8:32       ` Jan Beulich
2022-05-23  6:25 ` [PATCH v4 8/8] xen/x86: use INFO level for node's without memory log message Wei Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.