* [PATCH v5] hw/arm/virt: Don't create device-tree node for empty NUMA node
@ 2021-10-15 12:42 ` Gavin Shan
From: Gavin Shan @ 2021-10-15 12:42 UTC
  To: qemu-arm
  Cc: robh, drjones, qemu-riscv, ehabkost, peter.maydell, qemu-devel,
	shan.gavin, imammedo

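Empty NUMA nodes, where no memory resides, are allowed. For
example, the following command line specifies two empty NUMA nodes
(2 and 3). With it, QEMU fails to boot because of conflicting
device-tree node names, as the following error message indicates.
Because an empty node doesn't advance the base address of the next
memory node, nodes 2 and 3 both try to create /memory@80000000.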

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
  -accel kvm -machine virt,gic-version=host               \
  -cpu host -smp 4,sockets=2,cores=2,threads=1            \
  -m 1024M,slots=16,maxmem=64G                            \
  -object memory-backend-ram,id=mem0,size=512M            \
  -object memory-backend-ram,id=mem1,size=512M            \
  -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
  -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
  -numa node,nodeid=2                                     \
  -numa node,nodeid=3
    :
  qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: FDT_ERR_EXISTS

As specified by the Linux device-tree binding document, device-tree
nodes shouldn't be generated for these empty NUMA nodes. However, the
corresponding NUMA node IDs should be included in the distance map.
Memory hotplug through the device tree isn't supported on ARM64 so
far, so there is no need to require the user to provide a distance
map. Furthermore, the default distance map Linux generates may be
sufficient. So this simply skips populating device-tree nodes for
empty NUMA nodes to avoid the error, allowing QEMU to start
successfully.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
v5: Improved commit log and comments as Drew suggested.
---
 hw/arm/boot.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 57efb61ee4..74ad397b1f 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -599,10 +599,23 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
     }
     g_strfreev(node_path);
 
+    /*
+     * We drop all the memory nodes which correspond to empty NUMA nodes
+     * from the device tree, because the Linux NUMA binding document
+     * states they should not be generated. Linux will get the NUMA node
+     * IDs of the empty NUMA nodes from the distance map if they are needed.
+     * This means QEMU users may be obliged to provide command lines which
+     * configure distance maps when the empty NUMA node IDs are needed and
+     * Linux's default distance map isn't sufficient.
+     */
     if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
         mem_base = binfo->loader_start;
         for (i = 0; i < ms->numa_state->num_nodes; i++) {
             mem_len = ms->numa_state->nodes[i].node_mem;
+            if (!mem_len) {
+                continue;
+            }
+
             rc = fdt_add_memory_node(fdt, acells, mem_base,
                                      scells, mem_len, i);
             if (rc < 0) {
-- 
2.23.0




* Re: [PATCH v5] hw/arm/virt: Don't create device-tree node for empty NUMA node
  2021-10-15 12:42 ` Gavin Shan
@ 2021-10-15 13:11   ` Andrew Jones
From: Andrew Jones @ 2021-10-15 13:11 UTC
  To: Gavin Shan
  Cc: robh, qemu-riscv, ehabkost, peter.maydell, qemu-devel, qemu-arm,
	shan.gavin, imammedo

On Fri, Oct 15, 2021 at 08:42:46PM +0800, Gavin Shan wrote:
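> Empty NUMA nodes, where no memory resides, are allowed. For
> example, the following command line specifies two empty NUMA nodes
> (2 and 3). With it, QEMU fails to boot because of conflicting
> device-tree node names, as the following error message indicates.
> Because an empty node doesn't advance the base address of the next
> memory node, nodes 2 and 3 both try to create /memory@80000000.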
> 
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>   -accel kvm -machine virt,gic-version=host               \
>   -cpu host -smp 4,sockets=2,cores=2,threads=1            \
>   -m 1024M,slots=16,maxmem=64G                            \
>   -object memory-backend-ram,id=mem0,size=512M            \
>   -object memory-backend-ram,id=mem1,size=512M            \
>   -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
>   -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
>   -numa node,nodeid=2                                     \
>   -numa node,nodeid=3
>     :
>   qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: FDT_ERR_EXISTS
> 
> As specified by the Linux device-tree binding document, device-tree
> nodes shouldn't be generated for these empty NUMA nodes. However, the
> corresponding NUMA node IDs should be included in the distance map.
> Memory hotplug through the device tree isn't supported on ARM64 so
> far, so there is no need to require the user to provide a distance
> map. Furthermore, the default distance map Linux generates may be
> sufficient. So this simply skips populating device-tree nodes for
> empty NUMA nodes to avoid the error, allowing QEMU to start
> successfully.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
> v5: Improved commit log and comments as Drew suggested.
> ---
>  hw/arm/boot.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index 57efb61ee4..74ad397b1f 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -599,10 +599,23 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>      }
>      g_strfreev(node_path);
>  
> +    /*
> +     * We drop all the memory nodes which correspond to empty NUMA nodes
> +     * from the device tree, because the Linux NUMA binding document
> +     * states they should not be generated. Linux will get the NUMA node
> +     * IDs of the empty NUMA nodes from the distance map if they are needed.
> +     * This means QEMU users may be obliged to provide command lines which
> +     * configure distance maps when the empty NUMA node IDs are needed and
> +     * Linux's default distance map isn't sufficient.
> +     */
>      if (ms->numa_state != NULL && ms->numa_state->num_nodes > 0) {
>          mem_base = binfo->loader_start;
>          for (i = 0; i < ms->numa_state->num_nodes; i++) {
>              mem_len = ms->numa_state->nodes[i].node_mem;
> +            if (!mem_len) {
> +                continue;
> +            }
> +
>              rc = fdt_add_memory_node(fdt, acells, mem_base,
>                                       scells, mem_len, i);
>              if (rc < 0) {
> -- 
> 2.23.0
>

Reviewed-by: Andrew Jones <drjones@redhat.com>




* Re: [PATCH v5] hw/arm/virt: Don't create device-tree node for empty NUMA node
  2021-10-15 12:42 ` Gavin Shan
@ 2021-10-15 23:31 ` Richard Henderson
From: Richard Henderson @ 2021-10-15 23:31 UTC
  To: Gavin Shan, qemu-arm
  Cc: robh, drjones, qemu-riscv, ehabkost, peter.maydell, qemu-devel,
	shan.gavin, imammedo

On 10/15/21 5:42 AM, Gavin Shan wrote:
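> Empty NUMA nodes, where no memory resides, are allowed. For
> example, the following command line specifies two empty NUMA nodes
> (2 and 3). With it, QEMU fails to boot because of conflicting
> device-tree node names, as the following error message indicates.
> Because an empty node doesn't advance the base address of the next
> memory node, nodes 2 and 3 both try to create /memory@80000000.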
> 
>    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>    -accel kvm -machine virt,gic-version=host               \
>    -cpu host -smp 4,sockets=2,cores=2,threads=1            \
>    -m 1024M,slots=16,maxmem=64G                            \
>    -object memory-backend-ram,id=mem0,size=512M            \
>    -object memory-backend-ram,id=mem1,size=512M            \
>    -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
>    -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
>    -numa node,nodeid=2                                     \
>    -numa node,nodeid=3
>      :
>    qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: FDT_ERR_EXISTS
> 
> As specified by the Linux device-tree binding document, device-tree
> nodes shouldn't be generated for these empty NUMA nodes. However, the
> corresponding NUMA node IDs should be included in the distance map.
> Memory hotplug through the device tree isn't supported on ARM64 so
> far, so there is no need to require the user to provide a distance
> map. Furthermore, the default distance map Linux generates may be
> sufficient. So this simply skips populating device-tree nodes for
> empty NUMA nodes to avoid the error, allowing QEMU to start
> successfully.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
> v5: Improved commit log and comments as Drew suggested.

Queued to target-arm, thanks.


r~


