* [PATCH 0/2] hw/arm/virt: Fix qemu booting failure on device-tree
@ 2021-10-06 10:22 Gavin Shan
  2021-10-06 10:22 ` [PATCH 1/2] numa: Set default distance map if needed Gavin Shan
  2021-10-06 10:22 ` [PATCH 2/2] hw/arm/virt: Don't create device-tree node for empty NUMA node Gavin Shan
  0 siblings, 2 replies; 24+ messages in thread
From: Gavin Shan @ 2021-10-06 10:22 UTC
  To: qemu-arm; +Cc: peter.maydell, drjones, qemu-devel, shan.gavin, ehabkost

Empty NUMA nodes, where no memory resides, are allowed on the ARM64 virt
platform. However, QEMU fails to boot because the device-tree can't be
populated: the device-tree nodes for these empty NUMA nodes end up with
conflicting names. For example, QEMU fails to boot and reports the
following error when the command line below is used.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
  -accel kvm -machine virt,gic-version=host               \
  -cpu host -smp 4,sockets=2,cores=2,threads=1            \
  -m 1024M,slots=16,maxmem=64G                            \
  -object memory-backend-ram,id=mem0,size=512M            \
  -object memory-backend-ram,id=mem1,size=512M            \
  -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
  -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
  -numa node,nodeid=2                                     \
  -numa node,nodeid=3                                     \
    :
  qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: FDT_ERR_EXISTS

The latest device-tree specification doesn't indicate how the device-tree
nodes should be populated for these empty NUMA nodes. The proposed way
to handle this is documented in the Linux kernel. The Linux kernel patches
have been acknowledged and should be merged upstream soon.
  
  https://lkml.org/lkml/2021/9/27/31

This series follows the suggestion included in the Linux kernel
patches to resolve the QEMU boot failure: the device-tree nodes
corresponding to the empty NUMA nodes aren't created, but their
NUMA IDs and distances are included in the distance-map
device-tree node.
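
To illustrate the last point: in the Linux NUMA binding, the distance-map
node carries the matrix as (source, destination, distance) triples, so a
memory-less node still shows up there by its ID. The sketch below flattens
a square matrix into that form. It is not QEMU code:
sketch_flatten_distances() and SKETCH_MAX_NODES are hypothetical names,
and the big-endian cell encoding that a real flattened device tree needs
is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 8   /* hypothetical bound for this sketch */

/*
 * Flatten an n x n distance matrix (row-major, dist[src * n + dst]) into
 * (src, dst, distance) triples. "out" must have room for n * n * 3 cells.
 * Returns the number of cells written.
 */
static size_t sketch_flatten_distances(const uint8_t *dist, size_t n,
                                       uint32_t *out)
{
    size_t idx = 0;

    assert(n <= SKETCH_MAX_NODES);
    for (size_t src = 0; src < n; src++) {
        for (size_t dst = 0; dst < n; dst++) {
            out[idx++] = (uint32_t)src;
            out[idx++] = (uint32_t)dst;
            out[idx++] = dist[src * n + dst];
        }
    }
    return idx;
}
```

With the four nodes from the command line above, this produces 16 triples,
including the ones that mention the empty nodes 2 and 3.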

Gavin Shan (2):
  numa: Set default distance map if needed
  hw/arm/virt: Don't create device-tree node for empty NUMA node

 hw/arm/boot.c  |  4 ++++
 hw/core/numa.c | 13 +++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

-- 
2.23.0




* [PATCH 1/2] numa: Set default distance map if needed
  2021-10-06 10:22 [PATCH 0/2] hw/arm/virt: Fix qemu booting failure on device-tree Gavin Shan
@ 2021-10-06 10:22 ` Gavin Shan
  2021-10-06 10:35   ` Andrew Jones
  2021-10-12  9:40   ` Igor Mammedov
  2021-10-06 10:22 ` [PATCH 2/2] hw/arm/virt: Don't create device-tree node for empty NUMA node Gavin Shan
  1 sibling, 2 replies; 24+ messages in thread
From: Gavin Shan @ 2021-10-06 10:22 UTC
  To: qemu-arm; +Cc: peter.maydell, drjones, qemu-devel, shan.gavin, ehabkost

The following option is used to specify the distance map:

  -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>

The user may not provide this option, in which case the distance
map isn't populated or exposed to the platform. On the other hand,
empty NUMA nodes, where no memory resides, are allowed on the ARM64
virt platform. The device-tree nodes corresponding to these empty
NUMA nodes aren't populated, but their NUMA IDs should be included
in the "/distance-map" device-tree node so that the kernel can
probe them properly when device-tree is used.

So when the user doesn't specify a distance map, we need to generate
the default one, where the local and remote distances are 10 and 20
respectively. This adds an extra parameter to the existing
complete_init_numa_distance() to generate the default distance map
in this case.
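
The fill rule described above can be modelled outside of QEMU as follows.
This is a minimal standalone sketch, not the patched function: the sketch_
names and the fixed array bound are assumptions made for illustration, and
10 and 20 are the local and remote defaults named in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_NODES        8   /* hypothetical bound for this sketch */
#define SKETCH_DISTANCE_LOCAL  10   /* distance from a node to itself */
#define SKETCH_DISTANCE_REMOTE 20   /* default distance between two nodes */

/*
 * Fill in every distance that is still 0 (unset):
 *  - the diagonal always gets the local distance;
 *  - with is_default set, off-diagonal entries get the remote default;
 *  - otherwise an unset entry is mirrored from the opposite direction.
 */
static void sketch_complete_distances(
    uint8_t dist[SKETCH_MAX_NODES][SKETCH_MAX_NODES],
    int num_nodes, bool is_default)
{
    assert(num_nodes >= 0 && num_nodes <= SKETCH_MAX_NODES);
    for (int src = 0; src < num_nodes; src++) {
        for (int dst = 0; dst < num_nodes; dst++) {
            if (dist[src][dst] != 0) {
                continue;   /* already provided, leave it alone */
            }
            if (src == dst) {
                dist[src][dst] = SKETCH_DISTANCE_LOCAL;
            } else if (is_default) {
                dist[src][dst] = SKETCH_DISTANCE_REMOTE;
            } else {
                dist[src][dst] = dist[dst][src];
            }
        }
    }
}
```

Calling it with is_default set on an all-zero matrix yields 10 on the
diagonal and 20 everywhere else, regardless of whether a node has memory.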

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/core/numa.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 510d096a88..fdb3a4aeca 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
     }
 }
 
-static void complete_init_numa_distance(MachineState *ms)
+static void complete_init_numa_distance(MachineState *ms, bool is_default)
 {
     int src, dst;
     NodeInfo *numa_info = ms->numa_state->nodes;
@@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
             if (numa_info[src].distance[dst] == 0) {
                 if (src == dst) {
                     numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
+                } else if (is_default) {
+                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
                 } else {
                     numa_info[src].distance[dst] = numa_info[dst].distance[src];
                 }
@@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
          * A->B != distance B->A, then that means the distance table is
          * asymmetric. In this case, the distances for both directions
          * of all node pairs are required.
+         *
+         * The default node pair distances, which are 10 and 20 for the
+         * local and remote nodes respectively, are provided if user doesn't
+         * specify any node pair distances.
          */
         if (ms->numa_state->have_numa_distance) {
             /* Validate enough NUMA distance information was provided. */
             validate_numa_distance(ms);
 
             /* Validation succeeded, now fill in any missing distances. */
-            complete_init_numa_distance(ms);
+            complete_init_numa_distance(ms, false);
+        } else {
+            complete_init_numa_distance(ms, true);
+            ms->numa_state->have_numa_distance = true;
         }
     }
 }
-- 
2.23.0




* [PATCH 2/2] hw/arm/virt: Don't create device-tree node for empty NUMA node
  2021-10-06 10:22 [PATCH 0/2] hw/arm/virt: Fix qemu booting failure on device-tree Gavin Shan
  2021-10-06 10:22 ` [PATCH 1/2] numa: Set default distance map if needed Gavin Shan
@ 2021-10-06 10:22 ` Gavin Shan
  2021-10-06 10:36   ` Andrew Jones
  1 sibling, 1 reply; 24+ messages in thread
From: Gavin Shan @ 2021-10-06 10:22 UTC
  To: qemu-arm; +Cc: peter.maydell, drjones, qemu-devel, shan.gavin, ehabkost

Empty NUMA nodes, where no memory resides, are allowed. For
example, the following command line specifies two empty NUMA nodes.
With it, QEMU fails to boot because of conflicting device-tree
node names, as the following error message indicates.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
  -accel kvm -machine virt,gic-version=host               \
  -cpu host -smp 4,sockets=2,cores=2,threads=1            \
  -m 1024M,slots=16,maxmem=64G                            \
  -object memory-backend-ram,id=mem0,size=512M            \
  -object memory-backend-ram,id=mem1,size=512M            \
  -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
  -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
  -numa node,nodeid=2                                     \
  -numa node,nodeid=3
    :
  qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: FDT_ERR_EXISTS

As specified by the Linux device-tree binding document, device-tree
nodes shouldn't be generated for these empty NUMA nodes. However,
the corresponding NUMA node IDs should be included in the distance
map device-tree node. This patch skips populating the device-tree
nodes for empty NUMA nodes to avoid the error, so that QEMU can
start successfully.
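
The name conflict and the effect of the skip can be reproduced with a
small standalone model. This is a sketch under assumptions, not the code
in hw/arm/boot.c: the sketch_ helpers are hypothetical, and the RAM base
of 0x40000000 is inferred from the error message above (0x80000000 minus
the 1024M of populated memory).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_NODES 8   /* hypothetical bound for this sketch */
#define SKETCH_NAME_LEN  32

/*
 * Build "/memory@<base>" names, advancing the base by each node's size.
 * With skip_empty set, nodes without memory are skipped, which is the
 * behaviour the patch adds. Returns the number of names written.
 */
static int sketch_memory_node_names(uint64_t base, const uint64_t *node_mem,
                                    int num_nodes, bool skip_empty,
                                    char names[][SKETCH_NAME_LEN])
{
    int count = 0;

    assert(num_nodes >= 0 && num_nodes <= SKETCH_MAX_NODES);
    for (int i = 0; i < num_nodes; i++) {
        if (skip_empty && node_mem[i] == 0) {
            continue;
        }
        snprintf(names[count], SKETCH_NAME_LEN, "/memory@%" PRIx64, base);
        count++;
        base += node_mem[i];   /* an empty node leaves the base unchanged */
    }
    return count;
}

/* Report whether any two names are identical. */
static bool sketch_has_duplicate(char names[][SKETCH_NAME_LEN], int count)
{
    for (int i = 0; i < count; i++) {
        for (int j = i + 1; j < count; j++) {
            if (strcmp(names[i], names[j]) == 0) {
                return true;
            }
        }
    }
    return false;
}
```

For node sizes of 512M, 512M, 0 and 0, the unskipped run names both empty
nodes /memory@80000000, while the skipped run emits only /memory@40000000
and /memory@60000000.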

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/arm/boot.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 57efb61ee4..4e5898fcdc 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -603,6 +603,10 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
         mem_base = binfo->loader_start;
         for (i = 0; i < ms->numa_state->num_nodes; i++) {
             mem_len = ms->numa_state->nodes[i].node_mem;
+            if (!mem_len) {
+                continue;
+            }
+
             rc = fdt_add_memory_node(fdt, acells, mem_base,
                                      scells, mem_len, i);
             if (rc < 0) {
-- 
2.23.0




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-06 10:22 ` [PATCH 1/2] numa: Set default distance map if needed Gavin Shan
@ 2021-10-06 10:35   ` Andrew Jones
  2021-10-06 11:03     ` Gavin Shan
  2021-10-12  9:40   ` Igor Mammedov
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Jones @ 2021-10-06 10:35 UTC
  To: Gavin Shan; +Cc: peter.maydell, qemu-devel, qemu-arm, shan.gavin, ehabkost

On Wed, Oct 06, 2021 at 06:22:08PM +0800, Gavin Shan wrote:
> The following option is used to specify the distance map. It's
> possible the option isn't provided by user. In this case, the
> distance map isn't populated and exposed to platform. On the
> other hand, the empty NUMA node, where no memory resides, is
> allowed on ARM64 virt platform. For these empty NUMA nodes,
> their corresponding device-tree nodes aren't populated, but
> their NUMA IDs should be included in the "/distance-map"
> device-tree node, so that kernel can probe them properly if
> device-tree is used.
> 
>   -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> 
> So when user doesn't specify distance map, we need to generate
> the default distance map, where the local and remote distances
> are 10 and 20 respectively. This adds an extra parameter to the
> existing complete_init_numa_distance() to generate the default
> distance map for this case.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  hw/core/numa.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 510d096a88..fdb3a4aeca 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
>      }
>  }
>  
> -static void complete_init_numa_distance(MachineState *ms)
> +static void complete_init_numa_distance(MachineState *ms, bool is_default)
>  {
>      int src, dst;
>      NodeInfo *numa_info = ms->numa_state->nodes;
> @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
>              if (numa_info[src].distance[dst] == 0) {
>                  if (src == dst) {
>                      numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
> +                } else if (is_default) {
> +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
>                  } else {
>                      numa_info[src].distance[dst] = numa_info[dst].distance[src];
>                  }
> @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
>           * A->B != distance B->A, then that means the distance table is
>           * asymmetric. In this case, the distances for both directions
>           * of all node pairs are required.
> +         *
> +         * The default node pair distances, which are 10 and 20 for the
> +         * local and remote nodes respectively, are provided if user doesn't
> +         * specify any node pair distances.
>           */
>          if (ms->numa_state->have_numa_distance) {
>              /* Validate enough NUMA distance information was provided. */
>              validate_numa_distance(ms);
>  
>              /* Validation succeeded, now fill in any missing distances. */
> -            complete_init_numa_distance(ms);
> +            complete_init_numa_distance(ms, false);
> +        } else {
> +            complete_init_numa_distance(ms, true);
> +            ms->numa_state->have_numa_distance = true;
>          }
>      }
>  }
> -- 
> 2.23.0
>

With this patch we'll always generate a distance map when there's a numa
config now. Is there any reason a user would not want to do that? I.e.
should we still give the user the choice of presenting a distance map?
Also, does the addition of a distance map in DTs for compat machine types
matter?

Otherwise patch looks good to me.

Thanks,
drew




* Re: [PATCH 2/2] hw/arm/virt: Don't create device-tree node for empty NUMA node
  2021-10-06 10:22 ` [PATCH 2/2] hw/arm/virt: Don't create device-tree node for empty NUMA node Gavin Shan
@ 2021-10-06 10:36   ` Andrew Jones
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Jones @ 2021-10-06 10:36 UTC
  To: Gavin Shan; +Cc: peter.maydell, qemu-devel, qemu-arm, shan.gavin, ehabkost

On Wed, Oct 06, 2021 at 06:22:09PM +0800, Gavin Shan wrote:
> The empty NUMA node, where no memory resides, are allowed. For
> example, the following command line specifies two empty NUMA nodes.
> With this, QEMU fails to boot because of the conflicting device-tree
> node names, as the following error message indicates.
> 
>   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>   -accel kvm -machine virt,gic-version=host               \
>   -cpu host -smp 4,sockets=2,cores=2,threads=1            \
>   -m 1024M,slots=16,maxmem=64G                            \
>   -object memory-backend-ram,id=mem0,size=512M            \
>   -object memory-backend-ram,id=mem1,size=512M            \
>   -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
>   -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
>   -numa node,nodeid=2                                     \
>   -numa node,nodeid=3
>     :
>   qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: FDT_ERR_EXISTS
> 
> As specified by linux device-tree binding document, the device-tree
> nodes for these empty NUMA nodes shouldn't be generated. However,
> the corresponding NUMA node IDs should be included in the distance
> map device-tree node. This skips populating the device-tree nodes
> for these empty NUMA nodes to avoid the error, so that QEMU can be
> started successfully.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  hw/arm/boot.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/arm/boot.c b/hw/arm/boot.c
> index 57efb61ee4..4e5898fcdc 100644
> --- a/hw/arm/boot.c
> +++ b/hw/arm/boot.c
> @@ -603,6 +603,10 @@ int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
>          mem_base = binfo->loader_start;
>          for (i = 0; i < ms->numa_state->num_nodes; i++) {
>              mem_len = ms->numa_state->nodes[i].node_mem;
> +            if (!mem_len) {
> +                continue;
> +            }
> +
>              rc = fdt_add_memory_node(fdt, acells, mem_base,
>                                       scells, mem_len, i);
>              if (rc < 0) {
> -- 
> 2.23.0
>

Reviewed-by: Andrew Jones <drjones@redhat.com>




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-06 10:35   ` Andrew Jones
@ 2021-10-06 11:03     ` Gavin Shan
  2021-10-06 11:56       ` Andrew Jones
  0 siblings, 1 reply; 24+ messages in thread
From: Gavin Shan @ 2021-10-06 11:03 UTC
  To: Andrew Jones; +Cc: peter.maydell, qemu-devel, qemu-arm, shan.gavin, ehabkost

Hi Drew,

On 10/6/21 9:35 PM, Andrew Jones wrote:
> On Wed, Oct 06, 2021 at 06:22:08PM +0800, Gavin Shan wrote:
>> The following option is used to specify the distance map. It's
>> possible the option isn't provided by user. In this case, the
>> distance map isn't populated and exposed to platform. On the
>> other hand, the empty NUMA node, where no memory resides, is
>> allowed on ARM64 virt platform. For these empty NUMA nodes,
>> their corresponding device-tree nodes aren't populated, but
>> their NUMA IDs should be included in the "/distance-map"
>> device-tree node, so that kernel can probe them properly if
>> device-tree is used.
>>
>>    -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
>>
>> So when user doesn't specify distance map, we need to generate
>> the default distance map, where the local and remote distances
>> are 10 and 20 respectively. This adds an extra parameter to the
>> existing complete_init_numa_distance() to generate the default
>> distance map for this case.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   hw/core/numa.c | 13 +++++++++++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>> index 510d096a88..fdb3a4aeca 100644
>> --- a/hw/core/numa.c
>> +++ b/hw/core/numa.c
>> @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
>>       }
>>   }
>>   
>> -static void complete_init_numa_distance(MachineState *ms)
>> +static void complete_init_numa_distance(MachineState *ms, bool is_default)
>>   {
>>       int src, dst;
>>       NodeInfo *numa_info = ms->numa_state->nodes;
>> @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
>>               if (numa_info[src].distance[dst] == 0) {
>>                   if (src == dst) {
>>                       numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
>> +                } else if (is_default) {
>> +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
>>                   } else {
>>                       numa_info[src].distance[dst] = numa_info[dst].distance[src];
>>                   }
>> @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
>>            * A->B != distance B->A, then that means the distance table is
>>            * asymmetric. In this case, the distances for both directions
>>            * of all node pairs are required.
>> +         *
>> +         * The default node pair distances, which are 10 and 20 for the
>> +         * local and remote nodes respectively, are provided if user doesn't
>> +         * specify any node pair distances.
>>            */
>>           if (ms->numa_state->have_numa_distance) {
>>               /* Validate enough NUMA distance information was provided. */
>>               validate_numa_distance(ms);
>>   
>>               /* Validation succeeded, now fill in any missing distances. */
>> -            complete_init_numa_distance(ms);
>> +            complete_init_numa_distance(ms, false);
>> +        } else {
>> +            complete_init_numa_distance(ms, true);
>> +            ms->numa_state->have_numa_distance = true;
>>           }
>>       }
>>   }
>> -- 
>> 2.23.0
>>
> 
> With this patch we'll always generate a distance map when there's a numa
> config now. Is there any reason a user would not want to do that? I.e.
> should we still give the user the choice of presenting a distance map?
> Also, does the addition of a distance map in DTs for compat machine types
> matter?
> 
> Otherwise patch looks good to me.
> 

Users needn't specify the distance map when the kernel's default one
is wanted: Linux uses distances of 10 and 20 for local and remote
nodes on all architectures and machines. The following option can
still be used to specify the distance map.

   -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>

Where empty NUMA nodes are concerned, the distance map is mandatory
because their NUMA IDs are identified from it. So we always generate
the distance map, as this patch does :)

Thanks,
Gavin





* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-06 11:03     ` Gavin Shan
@ 2021-10-06 11:56       ` Andrew Jones
  2021-10-07 23:51         ` Gavin Shan
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Jones @ 2021-10-06 11:56 UTC
  To: Gavin Shan; +Cc: peter.maydell, qemu-devel, qemu-arm, shan.gavin, ehabkost

On Wed, Oct 06, 2021 at 10:03:25PM +1100, Gavin Shan wrote:
> Hi Drew,
> 
> On 10/6/21 9:35 PM, Andrew Jones wrote:
> > On Wed, Oct 06, 2021 at 06:22:08PM +0800, Gavin Shan wrote:
> > > The following option is used to specify the distance map. It's
> > > possible the option isn't provided by user. In this case, the
> > > distance map isn't populated and exposed to platform. On the
> > > other hand, the empty NUMA node, where no memory resides, is
> > > allowed on ARM64 virt platform. For these empty NUMA nodes,
> > > their corresponding device-tree nodes aren't populated, but
> > > their NUMA IDs should be included in the "/distance-map"
> > > device-tree node, so that kernel can probe them properly if
> > > device-tree is used.
> > > 
> > >    -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > 
> > > So when user doesn't specify distance map, we need to generate
> > > the default distance map, where the local and remote distances
> > > are 10 and 20 respectively. This adds an extra parameter to the
> > > existing complete_init_numa_distance() to generate the default
> > > distance map for this case.
> > > 
> > > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > > ---
> > >   hw/core/numa.c | 13 +++++++++++--
> > >   1 file changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/core/numa.c b/hw/core/numa.c
> > > index 510d096a88..fdb3a4aeca 100644
> > > --- a/hw/core/numa.c
> > > +++ b/hw/core/numa.c
> > > @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
> > >       }
> > >   }
> > > -static void complete_init_numa_distance(MachineState *ms)
> > > +static void complete_init_numa_distance(MachineState *ms, bool is_default)
> > >   {
> > >       int src, dst;
> > >       NodeInfo *numa_info = ms->numa_state->nodes;
> > > @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
> > >               if (numa_info[src].distance[dst] == 0) {
> > >                   if (src == dst) {
> > >                       numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
> > > +                } else if (is_default) {
> > > +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
> > >                   } else {
> > >                       numa_info[src].distance[dst] = numa_info[dst].distance[src];
> > >                   }
> > > @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
> > >            * A->B != distance B->A, then that means the distance table is
> > >            * asymmetric. In this case, the distances for both directions
> > >            * of all node pairs are required.
> > > +         *
> > > +         * The default node pair distances, which are 10 and 20 for the
> > > +         * local and remote nodes respectively, are provided if user doesn't
> > > +         * specify any node pair distances.
> > >            */
> > >           if (ms->numa_state->have_numa_distance) {
> > >               /* Validate enough NUMA distance information was provided. */
> > >               validate_numa_distance(ms);
> > >               /* Validation succeeded, now fill in any missing distances. */
> > > -            complete_init_numa_distance(ms);
> > > +            complete_init_numa_distance(ms, false);
> > > +        } else {
> > > +            complete_init_numa_distance(ms, true);
> > > +            ms->numa_state->have_numa_distance = true;
> > >           }
> > >       }
> > >   }
> > > -- 
> > > 2.23.0
> > > 
> > 
> > With this patch we'll always generate a distance map when there's a numa
> > config now. Is there any reason a user would not want to do that? I.e.
> > should we still give the user the choice of presenting a distance map?
> > Also, does the addition of a distance map in DTs for compat machine types
> > matter?
> > 
> > Otherwise patch looks good to me.
> > 
> 
> Users needn't specify the distance map when the default one in kernel,
> whose distances are 10 and 20 for local and remote nodes in linux for
> all architectures and machines, is used. The following option is still
> usable to specify the distance map.
> 
>   -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> 
> When the empty NUMA nodes are concerned, the distance map is mandatory
> because their NUMA IDs are identified from there. So we always generate
> the distance map as this patch does :)
>

Yup, I knew all that already :-) I'm asking if we want to ensure the user
can still control whether or not this distance map is generated at all. If
a user doesn't want empty numa nodes or a distance map, then, with this
patch, they cannot avoid the map's generation. That configurability
question also relates to machine compatibility. Do we want to start
generating this distance map on old, numa configured machine types? This
patch will do that too.

But, it might be OK to just start generating this new DT node for all numa
configured machine types and not allow the user to opt out. I do know that
we allow hardware descriptions to be changed without compat code.  Also, a
disable-auto-distance-map option may be considered useless and therefore
not worth maintaining. The conservative in me says it's worth debating
these things first though.

(Note, empty numa nodes have never worked with QEMU, so it's OK to start
 erroring out when empty numa nodes and a disable-auto-distance-map option
 are given together.)

Thanks,
drew




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-06 11:56       ` Andrew Jones
@ 2021-10-07 23:51         ` Gavin Shan
  2021-10-08  6:07           ` Andrew Jones
  0 siblings, 1 reply; 24+ messages in thread
From: Gavin Shan @ 2021-10-07 23:51 UTC
  To: Andrew Jones; +Cc: peter.maydell, qemu-devel, qemu-arm, shan.gavin, ehabkost

Hi Drew,

On 10/6/21 10:56 PM, Andrew Jones wrote:
> On Wed, Oct 06, 2021 at 10:03:25PM +1100, Gavin Shan wrote:
>> On 10/6/21 9:35 PM, Andrew Jones wrote:
>>> On Wed, Oct 06, 2021 at 06:22:08PM +0800, Gavin Shan wrote:
>>>> The following option is used to specify the distance map. It's
>>>> possible the option isn't provided by user. In this case, the
>>>> distance map isn't populated and exposed to platform. On the
>>>> other hand, the empty NUMA node, where no memory resides, is
>>>> allowed on ARM64 virt platform. For these empty NUMA nodes,
>>>> their corresponding device-tree nodes aren't populated, but
>>>> their NUMA IDs should be included in the "/distance-map"
>>>> device-tree node, so that kernel can probe them properly if
>>>> device-tree is used.
>>>>
>>>>     -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
>>>>
>>>> So when user doesn't specify distance map, we need to generate
>>>> the default distance map, where the local and remote distances
>>>> are 10 and 20 respectively. This adds an extra parameter to the
>>>> existing complete_init_numa_distance() to generate the default
>>>> distance map for this case.
>>>>
>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>> ---
>>>>    hw/core/numa.c | 13 +++++++++++--
>>>>    1 file changed, 11 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>>>> index 510d096a88..fdb3a4aeca 100644
>>>> --- a/hw/core/numa.c
>>>> +++ b/hw/core/numa.c
>>>> @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
>>>>        }
>>>>    }
>>>> -static void complete_init_numa_distance(MachineState *ms)
>>>> +static void complete_init_numa_distance(MachineState *ms, bool is_default)
>>>>    {
>>>>        int src, dst;
>>>>        NodeInfo *numa_info = ms->numa_state->nodes;
>>>> @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
>>>>                if (numa_info[src].distance[dst] == 0) {
>>>>                    if (src == dst) {
>>>>                        numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
>>>> +                } else if (is_default) {
>>>> +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
>>>>                    } else {
>>>>                        numa_info[src].distance[dst] = numa_info[dst].distance[src];
>>>>                    }
>>>> @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
>>>>             * A->B != distance B->A, then that means the distance table is
>>>>             * asymmetric. In this case, the distances for both directions
>>>>             * of all node pairs are required.
>>>> +         *
>>>> +         * The default node pair distances, which are 10 and 20 for the
>>>> +         * local and remote nodes respectively, are provided if user doesn't
>>>> +         * specify any node pair distances.
>>>>             */
>>>>            if (ms->numa_state->have_numa_distance) {
>>>>                /* Validate enough NUMA distance information was provided. */
>>>>                validate_numa_distance(ms);
>>>>                /* Validation succeeded, now fill in any missing distances. */
>>>> -            complete_init_numa_distance(ms);
>>>> +            complete_init_numa_distance(ms, false);
>>>> +        } else {
>>>> +            complete_init_numa_distance(ms, true);
>>>> +            ms->numa_state->have_numa_distance = true;
>>>>            }
>>>>        }
>>>>    }
>>>> -- 
>>>> 2.23.0
>>>>
>>>
>>> With this patch we'll always generate a distance map when there's a numa
>>> config now. Is there any reason a user would not want to do that? I.e.
>>> should we still give the user the choice of presenting a distance map?
>>> Also, does the addition of a distance map in DTs for compat machine types
>>> matter?
>>>
>>> Otherwise patch looks good to me.
>>>
>>
>> Users needn't specify the distance map when the default one in kernel,
>> whose distances are 10 and 20 for local and remote nodes in linux for
>> all architectures and machines, is used. The following option is still
>> usable to specify the distance map.
>>
>>    -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
>>
>> When the empty NUMA nodes are concerned, the distance map is mandatory
>> because their NUMA IDs are identified from there. So we always generate
>> the distance map as this patch does :)
>>
> 
> Yup, I knew all that already :-) I'm asking if we want to ensure the user
> can still control whether or not this distance map is generated at all. If
> a user doesn't want empty numa nodes or a distance map, then, with this
> patch, they cannot avoid the map's generation. That configurability
> question also relates to machine compatibility. Do we want to start
> generating this distance map on old, numa configured machine types? This
> patch will do that too.
> 
> But, it might be OK to just start generating this new DT node for all numa
> configured machine types and not allow the user to opt out. I do know that
> we allow hardware descriptions to be changed without compat code.  Also, a
> disable-auto-distance-map option may be considered useless and therefore
> not worth maintaining. The conservative in me says it's worth debating
> these things first though.
> 
> (Note, empty numa nodes have never worked with QEMU, so it's OK to start
>   erroring out when empty numa nodes and a disable-auto-distance-map option
>   are given together.)
> 

Sorry for the delay. I didn't fully understand "machine compatibility" even
after checking the surrounding code. Could you please provide more details?
I'm not sure whether the enforced distance map for empty NUMA nodes will
cause any issues.

Yes, empty NUMA nodes have never worked with QEMU when device-tree is used.
We still need to figure out a way to support memory hotplug through
device-tree, similar to what IBM's pSeries platform has. However, it
works when the ACPI table is used. Taking the following command line
as an example, the hot-added memory is always put into the last NUMA
node (3). The last NUMA node can be an empty node after changing the
code so that the exported ACPI SRAT table includes the empty NUMA nodes.

    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
    -accel kvm -machine virt,gic-version=host               \
    -cpu host -smp 4,sockets=2,cores=2,threads=1            \
    -m 1024M,slots=16,maxmem=64G                            \
    -object memory-backend-ram,id=mem0,size=512M            \
    -object memory-backend-ram,id=mem1,size=512M            \
    -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
    -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
    -numa node,nodeid=2                                     \
    -numa node,nodeid=3
      :
      :
    guest# cat /sys/devices/system/node/node3/meminfo | grep MemTotal
    Node 3 MemTotal:              0 kB
    (qemu) object_add memory-backend-ram,id=hpmem0,size=1G
    (qemu) device_add pc-dimm,id=dimm1,memdev=hpmem0,node=3
    guest# cat /sys/devices/system/node/node3/meminfo | grep MemTotal
    Node 3 MemTotal:        1048576 kB

Thanks,
Gavin






* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-07 23:51         ` Gavin Shan
@ 2021-10-08  6:07           ` Andrew Jones
  2021-10-12  6:13             ` Gavin Shan
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Jones @ 2021-10-08  6:07 UTC (permalink / raw)
  To: Gavin Shan; +Cc: peter.maydell, qemu-devel, qemu-arm, shan.gavin, ehabkost

On Fri, Oct 08, 2021 at 10:51:24AM +1100, Gavin Shan wrote:
> Hi Drew,
> 
> On 10/6/21 10:56 PM, Andrew Jones wrote:
> > On Wed, Oct 06, 2021 at 10:03:25PM +1100, Gavin Shan wrote:
> > > On 10/6/21 9:35 PM, Andrew Jones wrote:
> > > > On Wed, Oct 06, 2021 at 06:22:08PM +0800, Gavin Shan wrote:
> > > > > The following option is used to specify the distance map. It's
> > > > > possible the option isn't provided by user. In this case, the
> > > > > distance map isn't populated and exposed to platform. On the
> > > > > other hand, the empty NUMA node, where no memory resides, is
> > > > > allowed on ARM64 virt platform. For these empty NUMA nodes,
> > > > > their corresponding device-tree nodes aren't populated, but
> > > > > their NUMA IDs should be included in the "/distance-map"
> > > > > device-tree node, so that kernel can probe them properly if
> > > > > device-tree is used.
> > > > > 
> > > > >     -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > > > 
> > > > > So when user doesn't specify distance map, we need to generate
> > > > > the default distance map, where the local and remote distances
> > > > > are 10 and 20 separately. This adds an extra parameter to the
> > > > > exiting complete_init_numa_distance() to generate the default
> > > > > distance map for this case.
> > > > > 
> > > > > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > > > > ---
> > > > >    hw/core/numa.c | 13 +++++++++++--
> > > > >    1 file changed, 11 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/hw/core/numa.c b/hw/core/numa.c
> > > > > index 510d096a88..fdb3a4aeca 100644
> > > > > --- a/hw/core/numa.c
> > > > > +++ b/hw/core/numa.c
> > > > > @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
> > > > >        }
> > > > >    }
> > > > > -static void complete_init_numa_distance(MachineState *ms)
> > > > > +static void complete_init_numa_distance(MachineState *ms, bool is_default)
> > > > >    {
> > > > >        int src, dst;
> > > > >        NodeInfo *numa_info = ms->numa_state->nodes;
> > > > > @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
> > > > >                if (numa_info[src].distance[dst] == 0) {
> > > > >                    if (src == dst) {
> > > > >                        numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
> > > > > +                } else if (is_default) {
> > > > > +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
> > > > >                    } else {
> > > > >                        numa_info[src].distance[dst] = numa_info[dst].distance[src];
> > > > >                    }
> > > > > @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
> > > > >             * A->B != distance B->A, then that means the distance table is
> > > > >             * asymmetric. In this case, the distances for both directions
> > > > >             * of all node pairs are required.
> > > > > +         *
> > > > > +         * The default node pair distances, which are 10 and 20 for the
> > > > > +         * local and remote nodes separatly, are provided if user doesn't
> > > > > +         * specify any node pair distances.
> > > > >             */
> > > > >            if (ms->numa_state->have_numa_distance) {
> > > > >                /* Validate enough NUMA distance information was provided. */
> > > > >                validate_numa_distance(ms);
> > > > >                /* Validation succeeded, now fill in any missing distances. */
> > > > > -            complete_init_numa_distance(ms);
> > > > > +            complete_init_numa_distance(ms, false);
> > > > > +        } else {
> > > > > +            complete_init_numa_distance(ms, true);
> > > > > +            ms->numa_state->have_numa_distance = true;
> > > > >            }
> > > > >        }
> > > > >    }
> > > > > -- 
> > > > > 2.23.0
> > > > > 
> > > > 
> > > > With this patch we'll always generate a distance map when there's a numa
> > > > config now. Is there any reason a user would not want to do that? I.e.
> > > > should we still give the user the choice of presenting a distance map?
> > > > Also, does the addition of a distance map in DTs for compat machine types
> > > > matter?
> > > > 
> > > > Otherwise patch looks good to me.
> > > > 
> > > 
> > > Users needn't specify the distance map when the default one in kernel,
> > > whose distances are 10 and 20 for local and remote nodes in linux for
> > > all architectures and machines, is used. The following option is still
> > > usable to specify the distance map.
> > > 
> > >    -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > 
> > > When the empty NUMA nodes are concerned, the distance map is mandatory
> > > because their NUMA IDs are identified from there. So we always generate
> > > the distance map as this patch does :)
> > > 
> > 
> > Yup, I knew all that already :-) I'm asking if we want to ensure the user
> > can still control whether or not this distance map is generated at all. If
> > a user doesn't want empty numa nodes or a distance map, then, with this
> > patch, they cannot avoid the map's generation. That configurability
> > question also relates to machine compatibility. Do we want to start
> > generating this distance map on old, numa configured machine types? This
> > patch will do that too.
> > 
> > But, it might be OK to just start generating this new DT node for all numa
> > configured machine types and not allow the user to opt out. I do know that
> > we allow hardware descriptions to be changed without compat code.  Also, a
> > disable-auto-distance-map option may be considered useless and therefore
> > not worth maintaining. The conservative in me says it's worth debating
> > these things first though.
> > 
> > (Note, empty numa nodes have never worked with QEMU, so it's OK to start
> >   erroring out when empty numa nodes and a disable-auto-distance-map option
> >   are given together.)
> > 
> 
> Sorry for the delay. I didn't fully understand "machine compatibility" even
> after checking the code around. Could you please provide more details? I'm
> not sure if the enforced distance-map for empty NUMA nodes will cause any
> issues?

In QEMU, VMs currently booting/running on machine type X should not notice
when QEMU has been updated and they are still booted with machine type X.
That's what the "compat machine types" stuff means and what I'm referring
to above. I think it may be fine to boot a VM that never had a
distance-map before on an updated QEMU with machine type X and suddenly
get a distance-map, because we claim this is similar to a firmware update
that will change hardware descriptions on reboot. We expect guest kernels
to be tolerant of that. That said, there's always some risk, so we need
to consciously make that decision. Also, if we choose to expose a switch
to disable the auto-distance-map to the user, then it's pretty trivial
to automatically set that on older machine types in order to avoid the
concern. So, do we think we need to expose a disable-auto-distance-map
type of option? Or would that be a useless burden? Also, if the decision
is to not worry about it, then the commit message should be updated to
add the rationale for that decision.
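
To make the compat idea concrete, here is a minimal sketch of how such a
switch could gate the default map. Everything in it is an assumption for
illustration: the struct, the no_auto_distance_map flag and the helper are
hypothetical names, not existing QEMU code, and the machine type names
only stand for "older" and "newer" versions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a versioned machine class (not QEMU's own type). */
struct machine_class {
    const char *name;
    bool no_auto_distance_map;   /* hypothetical compat switch */
};

/*
 * Decide whether a default distance map should be generated.
 * A map given by the user always wins; otherwise the compat switch
 * lets older machine types keep their previous behaviour.
 */
static bool want_auto_distance_map(const struct machine_class *mc,
                                   bool user_gave_distances)
{
    if (user_gave_distances) {
        return false;   /* nothing to generate, the user's map is used */
    }
    return !mc->no_auto_distance_map;
}
```

With something like this, an older machine type would set the flag in its
compat options and keep booting without a distance-map, while newer ones
would get the default map.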

> 
> Yes, the empty NUMA node never worked with QEMU if device-tree is used.
> We still need to figure out a way to support memory hotplug through
> device-tree, similar thing as to what IBM's pSeries platform has.

That's for the guest kernel to figure out. I doubt it'll be a high
priority, though, because, as you've shown below, memory hotplug works
with ACPI, which is what Arm servers use. I don't expect smaller DT
platforms to care much about memory hotplug.

Thanks,
drew

> However, it works when ACPI table is used. Taking the following
> command line as an example, the hot-added memory is always put
> into the last NUMA node (3). The last NUMA node can be empty node
> after changing the code to allow to export ACPI SRAT table to include
> the empty NUMA nodes.
> 
>    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>    -accel kvm -machine virt,gic-version=host               \
>    -cpu host -smp 4,sockets=2,cores=2,threads=1            \
>    -m 1024M,slots=16,maxmem=64G                            \
>    -object memory-backend-ram,id=mem0,size=512M            \
>    -object memory-backend-ram,id=mem1,size=512M            \
>    -numa node,nodeid=0,cpus=0-1,memdev=mem0                \
>    -numa node,nodeid=1,cpus=2-3,memdev=mem1                \
>    -numa node,nodeid=2                                     \
>    -numa node,nodeid=3
>      :
>      :
>    guest# cat /sys/devices/system/node/node3/meminfo | grep MemTotal
>    Node 3 MemTotal:              0 kB
>    (qemu) object_add memory-backend-ram,id=hpmem0,size=1G
>    (qemu) device_add pc-dimm,id=dimm1,memdev=hpmem0,node=3
>    guest# cat /sys/devices/system/node/node3/meminfo | grep MemTotal
>    Node 3 MemTotal:        1048576 kB
> 
> Thanks,
> Gavin
> 
> 
> 




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-08  6:07           ` Andrew Jones
@ 2021-10-12  6:13             ` Gavin Shan
  0 siblings, 0 replies; 24+ messages in thread
From: Gavin Shan @ 2021-10-12  6:13 UTC (permalink / raw)
  To: Andrew Jones; +Cc: peter.maydell, qemu-devel, qemu-arm, shan.gavin, ehabkost

Hi Drew,

On 10/8/21 5:07 PM, Andrew Jones wrote:
> On Fri, Oct 08, 2021 at 10:51:24AM +1100, Gavin Shan wrote:
>> On 10/6/21 10:56 PM, Andrew Jones wrote:
>>> On Wed, Oct 06, 2021 at 10:03:25PM +1100, Gavin Shan wrote:
>>>> On 10/6/21 9:35 PM, Andrew Jones wrote:
>>>>> On Wed, Oct 06, 2021 at 06:22:08PM +0800, Gavin Shan wrote:
>>>>>> The following option is used to specify the distance map. It's
>>>>>> possible the option isn't provided by user. In this case, the
>>>>>> distance map isn't populated and exposed to platform. On the
>>>>>> other hand, the empty NUMA node, where no memory resides, is
>>>>>> allowed on ARM64 virt platform. For these empty NUMA nodes,
>>>>>> their corresponding device-tree nodes aren't populated, but
>>>>>> their NUMA IDs should be included in the "/distance-map"
>>>>>> device-tree node, so that kernel can probe them properly if
>>>>>> device-tree is used.
>>>>>>
>>>>>>      -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
>>>>>>
>>>>>> So when user doesn't specify distance map, we need to generate
>>>>>> the default distance map, where the local and remote distances
>>>>>> are 10 and 20 separately. This adds an extra parameter to the
>>>>>> exiting complete_init_numa_distance() to generate the default
>>>>>> distance map for this case.
>>>>>>
>>>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>>>> ---
>>>>>>     hw/core/numa.c | 13 +++++++++++--
>>>>>>     1 file changed, 11 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>>>>>> index 510d096a88..fdb3a4aeca 100644
>>>>>> --- a/hw/core/numa.c
>>>>>> +++ b/hw/core/numa.c
>>>>>> @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
>>>>>>         }
>>>>>>     }
>>>>>> -static void complete_init_numa_distance(MachineState *ms)
>>>>>> +static void complete_init_numa_distance(MachineState *ms, bool is_default)
>>>>>>     {
>>>>>>         int src, dst;
>>>>>>         NodeInfo *numa_info = ms->numa_state->nodes;
>>>>>> @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
>>>>>>                 if (numa_info[src].distance[dst] == 0) {
>>>>>>                     if (src == dst) {
>>>>>>                         numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
>>>>>> +                } else if (is_default) {
>>>>>> +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
>>>>>>                     } else {
>>>>>>                         numa_info[src].distance[dst] = numa_info[dst].distance[src];
>>>>>>                     }
>>>>>> @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
>>>>>>              * A->B != distance B->A, then that means the distance table is
>>>>>>              * asymmetric. In this case, the distances for both directions
>>>>>>              * of all node pairs are required.
>>>>>> +         *
>>>>>> +         * The default node pair distances, which are 10 and 20 for the
>>>>>> +         * local and remote nodes separatly, are provided if user doesn't
>>>>>> +         * specify any node pair distances.
>>>>>>              */
>>>>>>             if (ms->numa_state->have_numa_distance) {
>>>>>>                 /* Validate enough NUMA distance information was provided. */
>>>>>>                 validate_numa_distance(ms);
>>>>>>                 /* Validation succeeded, now fill in any missing distances. */
>>>>>> -            complete_init_numa_distance(ms);
>>>>>> +            complete_init_numa_distance(ms, false);
>>>>>> +        } else {
>>>>>> +            complete_init_numa_distance(ms, true);
>>>>>> +            ms->numa_state->have_numa_distance = true;
>>>>>>             }
>>>>>>         }
>>>>>>     }
>>>>>> -- 
>>>>>> 2.23.0
>>>>>>
>>>>>
>>>>> With this patch we'll always generate a distance map when there's a numa
>>>>> config now. Is there any reason a user would not want to do that? I.e.
>>>>> should we still give the user the choice of presenting a distance map?
>>>>> Also, does the addition of a distance map in DTs for compat machine types
>>>>> matter?
>>>>>
>>>>> Otherwise patch looks good to me.
>>>>>
>>>>
>>>> Users needn't specify the distance map when the default one in kernel,
>>>> whose distances are 10 and 20 for local and remote nodes in linux for
>>>> all architectures and machines, is used. The following option is still
>>>> usable to specify the distance map.
>>>>
>>>>     -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
>>>>
>>>> When the empty NUMA nodes are concerned, the distance map is mandatory
>>>> because their NUMA IDs are identified from there. So we always generate
>>>> the distance map as this patch does :)
>>>>
>>>
>>> Yup, I knew all that already :-) I'm asking if we want to ensure the user
>>> can still control whether or not this distance map is generated at all. If
>>> a user doesn't want empty numa nodes or a distance map, then, with this
>>> patch, they cannot avoid the map's generation. That configurability
>>> question also relates to machine compatibility. Do we want to start
>>> generating this distance map on old, numa configured machine types? This
>>> patch will do that too.
>>>
>>> But, it might be OK to just start generating this new DT node for all numa
>>> configured machine types and not allow the user to opt out. I do know that
>>> we allow hardware descriptions to be changed without compat code.  Also, a
>>> disable-auto-distance-map option may be considered useless and therefore
>>> not worth maintaining. The conservative in me says it's worth debating
>>> these things first though.
>>>
>>> (Note, empty numa nodes have never worked with QEMU, so it's OK to start
>>>    erroring out when empty numa nodes and a disable-auto-distance-map option
>>>    are given together.)
>>>
>>
>> Sorry for the delay. I didn't fully understand "machine compatibility" even
>> after checking the code around. Could you please provide more details? I'm
>> not sure if the enforced distance-map for empty NUMA nodes will cause any
>> issues?
> 
> On QEMU, currently booting/running VMs on machine type X should not notice
> when QEMU has been updated and they are still boot with machine type X.
> That's what the "compat machine types" stuff means and what I'm referring
> to above. I think it may be fine to boot a VM that never had a
> distance-map before on an updated QEMU with machine type X and suddenly
> get a distance-map, because we claim this is similar to a firmware update
> that will change hardware descriptions on reboot. We expect guest kernels
> to be tolerant of that. That said, there's always some risk, so we need
> to consciously make that decision. Also, if we choose to expose a switch
> to disable to the auto-distance-map to the user, then it's pretty trivial
> to automatically set that on older machine types in order to avoid the
> concern. So, do we think we need to expose a disable-auto-distance-map
> type of option? Or would that be a useless burden? Also, if the decision
> is to not worry about it, then the commit message should be updated to
> add the rationale for that decision.
> 
>>
>> Yes, the empty NUMA node never worked with QEMU if device-tree is used.
>> We still need to figure out a way to support memory hotplug through
>> device-tree, similar thing as to what IBM's pSeries platform has.
> 
> That's for the guest kernel to figure out. I doubt it'll be a high
> priority, though, because, as you've shown below, memory hotplug works
> with ACPI, which is what Arm servers use. I don't expect smaller DT
> platforms to care much about memory hotplug.
> 

[...]

Thanks for the detailed explanation about the compat machine type
issue. I don't think we need to introduce the switch to disable the
distance map, as I think the guest is tolerant in this case. There
are two cases for machine type X booting when the distance map is
missing: the distance map is either not parsed at all, or it is expected
to be correct and complete. If we provide a correct and complete
(default) distance map, machine type X won't be affected. So I will amend
the changelog to explain why we don't need the switch in v2, to be posted
shortly.
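
As a rough illustration of what "correct and complete (default) distance
map" means here, the following standalone sketch mirrors the fill logic of
the patch on a plain array. It is not the QEMU code itself: NB_NODES, the
DISTANCE_* names and complete_distances() are made up for the example,
with 10 and 20 taken from the local/remote defaults discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

#define NB_NODES        4    /* illustrative node count */
#define DISTANCE_LOCAL  10   /* distance from a node to itself */
#define DISTANCE_REMOTE 20   /* default distance to any other node */

/*
 * Fill in every distance that is still 0 (meaning "not provided").
 * With use_default set, remote entries get the default value;
 * otherwise they are mirrored from the opposite direction.
 */
static void complete_distances(uint8_t dist[NB_NODES][NB_NODES],
                               bool use_default)
{
    for (int src = 0; src < NB_NODES; src++) {
        for (int dst = 0; dst < NB_NODES; dst++) {
            if (dist[src][dst] != 0) {
                continue;                      /* keep explicit values */
            }
            if (src == dst) {
                dist[src][dst] = DISTANCE_LOCAL;
            } else if (use_default) {
                dist[src][dst] = DISTANCE_REMOTE;
            } else {
                dist[src][dst] = dist[dst][src];
            }
        }
    }
}
```

Starting from an all-zero table and use_default set, every node pair ends
up with either 10 or 20, so no entry is left unspecified for the guest.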

Yup, supporting memory hotplug through device-tree is a different topic.
I agree it's not urgent to support it, as ACPI is required or even
mandatory to boot Linux ARM64 servers. Embedded systems would
be different, but people might not be concerned about memory hotplug
on embedded systems.

Thanks,
Gavin




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-06 10:22 ` [PATCH 1/2] numa: Set default distance map if needed Gavin Shan
  2021-10-06 10:35   ` Andrew Jones
@ 2021-10-12  9:40   ` Igor Mammedov
  2021-10-12 10:31     ` Gavin Shan
  2021-10-12 10:37     ` Andrew Jones
  1 sibling, 2 replies; 24+ messages in thread
From: Igor Mammedov @ 2021-10-12  9:40 UTC (permalink / raw)
  To: Gavin Shan
  Cc: peter.maydell, drjones, ehabkost, qemu-devel, qemu-arm, shan.gavin

On Wed,  6 Oct 2021 18:22:08 +0800
Gavin Shan <gshan@redhat.com> wrote:

> The following option is used to specify the distance map. It's
> possible the option isn't provided by user. In this case, the
> distance map isn't populated and exposed to platform. On the
> other hand, the empty NUMA node, where no memory resides, is
> allowed on ARM64 virt platform. For these empty NUMA nodes,
> their corresponding device-tree nodes aren't populated, but
> their NUMA IDs should be included in the "/distance-map"
> device-tree node, so that kernel can probe them properly if
> device-tree is used.
> 
>   -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> 
> So when user doesn't specify distance map, we need to generate
> the default distance map, where the local and remote distances
> are 10 and 20 separately. This adds an extra parameter to the
> exiting complete_init_numa_distance() to generate the default
> distance map for this case.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>


How about erroring out if the distance map is required but
not provided by the user explicitly, and asking the user to fix
the command line?

The reasoning behind this is that defaults are hard to maintain,
will require compat hacks, and become road blocks down
the road.
The approach I was taking with generic NUMA code is deprecating
defaults and replacing them with sanity checks, which bail
out on incorrect configuration and ask the user to correct the
command line. Hence I dislike the approach taken in this patch.

If you really wish to provide a default, push it out of
generic code into ARM-specific code
(then I won't oppose it that much (I think PPC does
some magic like this)).
Also, the behavior seems to be ARM-specific, so generic
NUMA code isn't the place for it anyway.
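
A sanity check of the kind described above could look roughly like the
sketch below. It is only an illustration under stated assumptions: struct
numa_node and check_distance_config() are hypothetical names, and real
code would report an error message and exit instead of returning -1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a NUMA node: only its memory size. */
struct numa_node {
    uint64_t mem_size;   /* 0 means the node is empty */
};

/*
 * Return 0 if the configuration is acceptable, or -1 if an empty node
 * is present while no distance map was given on the command line.
 */
static int check_distance_config(const struct numa_node *nodes,
                                 size_t nb_nodes,
                                 bool have_numa_distance)
{
    if (have_numa_distance) {
        return 0;        /* explicit distances were provided */
    }
    for (size_t i = 0; i < nb_nodes; i++) {
        if (nodes[i].mem_size == 0) {
            return -1;   /* empty node needs an explicit distance map */
        }
    }
    return 0;
}
```

Configurations without empty nodes keep working unchanged, and only the
combination "empty node, no distances" is rejected.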

> ---
>  hw/core/numa.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/core/numa.c b/hw/core/numa.c
> index 510d096a88..fdb3a4aeca 100644
> --- a/hw/core/numa.c
> +++ b/hw/core/numa.c
> @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
>      }
>  }
>  
> -static void complete_init_numa_distance(MachineState *ms)
> +static void complete_init_numa_distance(MachineState *ms, bool is_default)
>  {
>      int src, dst;
>      NodeInfo *numa_info = ms->numa_state->nodes;
> @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
>              if (numa_info[src].distance[dst] == 0) {
>                  if (src == dst) {
>                      numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
> +                } else if (is_default) {
> +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
>                  } else {
>                      numa_info[src].distance[dst] = numa_info[dst].distance[src];
>                  }
> @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
>           * A->B != distance B->A, then that means the distance table is
>           * asymmetric. In this case, the distances for both directions
>           * of all node pairs are required.
> +         *
> +         * The default node pair distances, which are 10 and 20 for the
> +         * local and remote nodes separatly, are provided if user doesn't
> +         * specify any node pair distances.
>           */
>          if (ms->numa_state->have_numa_distance) {
>              /* Validate enough NUMA distance information was provided. */
>              validate_numa_distance(ms);
>  
>              /* Validation succeeded, now fill in any missing distances. */
> -            complete_init_numa_distance(ms);
> +            complete_init_numa_distance(ms, false);
> +        } else {
> +            complete_init_numa_distance(ms, true);
> +            ms->numa_state->have_numa_distance = true;
>          }
>      }
>  }




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12  9:40   ` Igor Mammedov
@ 2021-10-12 10:31     ` Gavin Shan
  2021-10-12 11:18       ` Igor Mammedov
  2021-10-12 11:48       ` Andrew Jones
  2021-10-12 10:37     ` Andrew Jones
  1 sibling, 2 replies; 24+ messages in thread
From: Gavin Shan @ 2021-10-12 10:31 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, drjones, ehabkost, qemu-devel, qemu-arm, shan.gavin

Hi Igor,

On 10/12/21 8:40 PM, Igor Mammedov wrote:
> On Wed,  6 Oct 2021 18:22:08 +0800
> Gavin Shan <gshan@redhat.com> wrote:
> 
>> The following option is used to specify the distance map. It's
>> possible the option isn't provided by user. In this case, the
>> distance map isn't populated and exposed to platform. On the
>> other hand, the empty NUMA node, where no memory resides, is
>> allowed on ARM64 virt platform. For these empty NUMA nodes,
>> their corresponding device-tree nodes aren't populated, but
>> their NUMA IDs should be included in the "/distance-map"
>> device-tree node, so that kernel can probe them properly if
>> device-tree is used.
>>
>>    -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
>>
>> So when user doesn't specify distance map, we need to generate
>> the default distance map, where the local and remote distances
>> are 10 and 20 separately. This adds an extra parameter to the
>> exiting complete_init_numa_distance() to generate the default
>> distance map for this case.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
> 
> 
> how about error-ing out if distance map is required but
> not provided by user explicitly and asking user to fix
> command line?
> 
> Reasoning behind this that defaults are hard to maintain
> and will require compat hacks and being raod blocks down
> the road.
> Approach I was taking with generic NUMA code, is deprecating
> defaults and replacing them with sanity checks, which bail
> out on incorrect configuration and ask user to correct command line.
> Hence I dislike approach taken in this patch.
> 
> If you really wish to provide default, push it out of
> generic code into ARM specific one
> (then I won't oppose it that much (I think PPC does
> some magic like this))
> Also behavior seems to be ARM specific so generic
> NUMA code isn't a place for it anyways
> 

Thanks for your comments.

Yep, let's move the logic into hw/arm/virt in v3, because I think simply
erroring out will block existing configurations where the distance
map isn't provided by the user. After moving the logic to hw/arm/virt,
this patch is consistent with PATCH[02/02] and only the specific
platform is affected.


>> ---
>>   hw/core/numa.c | 13 +++++++++++--
>>   1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/core/numa.c b/hw/core/numa.c
>> index 510d096a88..fdb3a4aeca 100644
>> --- a/hw/core/numa.c
>> +++ b/hw/core/numa.c
>> @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
>>       }
>>   }
>>   
>> -static void complete_init_numa_distance(MachineState *ms)
>> +static void complete_init_numa_distance(MachineState *ms, bool is_default)
>>   {
>>       int src, dst;
>>       NodeInfo *numa_info = ms->numa_state->nodes;
>> @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
>>               if (numa_info[src].distance[dst] == 0) {
>>                   if (src == dst) {
>>                       numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
>> +                } else if (is_default) {
>> +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
>>                   } else {
>>                       numa_info[src].distance[dst] = numa_info[dst].distance[src];
>>                   }
>> @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
>>            * A->B != distance B->A, then that means the distance table is
>>            * asymmetric. In this case, the distances for both directions
>>            * of all node pairs are required.
>> +         *
>> +         * The default node pair distances, which are 10 and 20 for the
>> +         * local and remote nodes separatly, are provided if user doesn't
>> +         * specify any node pair distances.
>>            */
>>           if (ms->numa_state->have_numa_distance) {
>>               /* Validate enough NUMA distance information was provided. */
>>               validate_numa_distance(ms);
>>   
>>               /* Validation succeeded, now fill in any missing distances. */
>> -            complete_init_numa_distance(ms);
>> +            complete_init_numa_distance(ms, false);
>> +        } else {
>> +            complete_init_numa_distance(ms, true);
>> +            ms->numa_state->have_numa_distance = true;
>>           }
>>       }
>>   }
> 

Thanks,
Gavin




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12  9:40   ` Igor Mammedov
  2021-10-12 10:31     ` Gavin Shan
@ 2021-10-12 10:37     ` Andrew Jones
  2021-10-12 12:27       ` Igor Mammedov
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Jones @ 2021-10-12 10:37 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, Gavin Shan, ehabkost, qemu-devel, qemu-arm, shan.gavin

On Tue, Oct 12, 2021 at 11:40:16AM +0200, Igor Mammedov wrote:
> On Wed,  6 Oct 2021 18:22:08 +0800
> Gavin Shan <gshan@redhat.com> wrote:
> 
> > The following option is used to specify the distance map. It's
> > possible the option isn't provided by user. In this case, the
> > distance map isn't populated and exposed to platform. On the
> > other hand, the empty NUMA node, where no memory resides, is
> > allowed on ARM64 virt platform. For these empty NUMA nodes,
> > their corresponding device-tree nodes aren't populated, but
> > their NUMA IDs should be included in the "/distance-map"
> > device-tree node, so that kernel can probe them properly if
> > device-tree is used.
> > 
> >   -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > 
> > So when user doesn't specify distance map, we need to generate
> > the default distance map, where the local and remote distances
> > are 10 and 20 separately. This adds an extra parameter to the
> > exiting complete_init_numa_distance() to generate the default
> > distance map for this case.
> > 
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
> 
> 
> how about error-ing out if distance map is required but
> not provided by user explicitly and asking user to fix
> command line?
> 
> Reasoning behind this that defaults are hard to maintain
> and will require compat hacks and being raod blocks down
> the road.
> Approach I was taking with generic NUMA code, is deprecating
> defaults and replacing them with sanity checks, which bail
> out on incorrect configuration and ask user to correct command line.
> Hence I dislike approach taken in this patch.
> 
> If you really wish to provide default, push it out of
> generic code into ARM specific one
> (then I won't oppose it that much (I think PPC does
> some magic like this))
> Also behavior seems to be ARM specific so generic
> NUMA code isn't a place for it anyways

The distance-map DT node and the default 10/20 distance-map values
aren't arch-specific. RISCV is using it too.

I'm on the fence with this. I see erroring-out to require users
to provide explicit command lines as a good thing, but I also
see it as potentially an unnecessary burden for those that want
the default map anyway. The optional nature of the distance-map
node and the specification of the default map are described here [1]

[1] Linux source: Documentation/devicetree/bindings/numa.txt

So, my r-b stands for this patch, but I also wouldn't complain
about respinning it to error out instead. I would complain about
moving the logic to Arm specific code, though, since RISCV would
then need to duplicate it.
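
For reference, the binding in [1] describes the distance-map node
(compatible "numa-distance-map-v1") with a "distance-matrix" property made
of <src dst distance> triples. A minimal sketch of building the default
matrix in that layout, assuming a hypothetical helper name and leaving out
the big-endian conversion that a real flattened DT needs:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write the default distance matrix as <src dst distance> triples into
 * cells[], which must hold 3 * nb_nodes * nb_nodes entries.
 * Returns the number of cells written. Values are kept in host byte
 * order here; device-tree cells would have to be big-endian.
 */
static size_t build_default_matrix(uint32_t *cells, size_t nb_nodes)
{
    size_t n = 0;

    for (size_t src = 0; src < nb_nodes; src++) {
        for (size_t dst = 0; dst < nb_nodes; dst++) {
            cells[n++] = (uint32_t)src;
            cells[n++] = (uint32_t)dst;
            cells[n++] = (src == dst) ? 10 : 20;   /* local : remote */
        }
    }
    return n;
}
```

For two nodes this gives <0 0 10>, <0 1 20>, <1 0 20>, <1 1 10>, and
nothing in it depends on the architecture.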

Thanks,
drew

> 
> > ---
> >  hw/core/numa.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/hw/core/numa.c b/hw/core/numa.c
> > index 510d096a88..fdb3a4aeca 100644
> > --- a/hw/core/numa.c
> > +++ b/hw/core/numa.c
> > @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
> >      }
> >  }
> >  
> > -static void complete_init_numa_distance(MachineState *ms)
> > +static void complete_init_numa_distance(MachineState *ms, bool is_default)
> >  {
> >      int src, dst;
> >      NodeInfo *numa_info = ms->numa_state->nodes;
> > @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
> >              if (numa_info[src].distance[dst] == 0) {
> >                  if (src == dst) {
> >                      numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
> > +                } else if (is_default) {
> > +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
> >                  } else {
> >                      numa_info[src].distance[dst] = numa_info[dst].distance[src];
> >                  }
> > @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
> >           * A->B != distance B->A, then that means the distance table is
> >           * asymmetric. In this case, the distances for both directions
> >           * of all node pairs are required.
> > +         *
> > +         * The default node pair distances, which are 10 and 20 for the
> > +         * local and remote nodes separatly, are provided if user doesn't
> > +         * specify any node pair distances.
> >           */
> >          if (ms->numa_state->have_numa_distance) {
> >              /* Validate enough NUMA distance information was provided. */
> >              validate_numa_distance(ms);
> >  
> >              /* Validation succeeded, now fill in any missing distances. */
> > -            complete_init_numa_distance(ms);
> > +            complete_init_numa_distance(ms, false);
> > +        } else {
> > +            complete_init_numa_distance(ms, true);
> > +            ms->numa_state->have_numa_distance = true;
> >          }
> >      }
> >  }
> 




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 10:31     ` Gavin Shan
@ 2021-10-12 11:18       ` Igor Mammedov
  2021-10-12 11:48       ` Andrew Jones
  1 sibling, 0 replies; 24+ messages in thread
From: Igor Mammedov @ 2021-10-12 11:18 UTC (permalink / raw)
  To: Gavin Shan
  Cc: peter.maydell, drjones, ehabkost, qemu-devel, qemu-arm, shan.gavin

On Tue, 12 Oct 2021 21:31:55 +1100
Gavin Shan <gshan@redhat.com> wrote:

> Hi Igor,
> 
> On 10/12/21 8:40 PM, Igor Mammedov wrote:
> > On Wed,  6 Oct 2021 18:22:08 +0800
> > Gavin Shan <gshan@redhat.com> wrote:
> >   
> >> The following option is used to specify the distance map. It's
> >> possible the option isn't provided by user. In this case, the
> >> distance map isn't populated and exposed to platform. On the
> >> other hand, the empty NUMA node, where no memory resides, is
> >> allowed on ARM64 virt platform. For these empty NUMA nodes,
> >> their corresponding device-tree nodes aren't populated, but
> >> their NUMA IDs should be included in the "/distance-map"
> >> device-tree node, so that kernel can probe them properly if
> >> device-tree is used.
> >>
> >>    -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> >>
> >> So when user doesn't specify distance map, we need to generate
> >> the default distance map, where the local and remote distances
> >> are 10 and 20 separately. This adds an extra parameter to the
> >> exiting complete_init_numa_distance() to generate the default
> >> distance map for this case.
> >>
> >> Signed-off-by: Gavin Shan <gshan@redhat.com>  
> > 
> > 
> > how about error-ing out if distance map is required but
> > not provided by user explicitly and asking user to fix
> > command line?
> > 
> > Reasoning behind this that defaults are hard to maintain
> > and will require compat hacks and being raod blocks down
> > the road.
> > Approach I was taking with generic NUMA code, is deprecating
> > defaults and replacing them with sanity checks, which bail
> > out on incorrect configuration and ask user to correct command line.
> > Hence I dislike approach taken in this patch.
> > 
> > If you really wish to provide default, push it out of
> > generic code into ARM specific one
> > (then I won't oppose it that much (I think PPC does
> > some magic like this))
> > Also behavior seems to be ARM specific so generic
> > NUMA code isn't a place for it anyways
> >   
> 
> Thanks for your comments.
> 
> Yep, Lets move the logic into hw/arm/virt in v3 because I think simply
> error-ing out will block the existing configuration where the distance
> map isn't provided by user.

That can be solved by deprecating the broken config, with temporary
compat code to keep the behavior for old machine types. All of that
can then be removed in 2 releases, leaving us with the explicit
option only.

> After moving the logic to hw/arm/virt,
> this patch is consistent with PATCH[02/02] and the specific platform
> is affected only.
> 
> 
> >> ---
> >>   hw/core/numa.c | 13 +++++++++++--
> >>   1 file changed, 11 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/core/numa.c b/hw/core/numa.c
> >> index 510d096a88..fdb3a4aeca 100644
> >> --- a/hw/core/numa.c
> >> +++ b/hw/core/numa.c
> >> @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
> >>       }
> >>   }
> >>   
> >> -static void complete_init_numa_distance(MachineState *ms)
> >> +static void complete_init_numa_distance(MachineState *ms, bool is_default)
> >>   {
> >>       int src, dst;
> >>       NodeInfo *numa_info = ms->numa_state->nodes;
> >> @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
> >>               if (numa_info[src].distance[dst] == 0) {
> >>                   if (src == dst) {
> >>                       numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
> >> +                } else if (is_default) {
> >> +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
> >>                   } else {
> >>                       numa_info[src].distance[dst] = numa_info[dst].distance[src];
> >>                   }
> >> @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
> >>            * A->B != distance B->A, then that means the distance table is
> >>            * asymmetric. In this case, the distances for both directions
> >>            * of all node pairs are required.
> >> +         *
> >> +         * The default node pair distances, which are 10 and 20 for the
> >> +         * local and remote nodes separatly, are provided if user doesn't
> >> +         * specify any node pair distances.
> >>            */
> >>           if (ms->numa_state->have_numa_distance) {
> >>               /* Validate enough NUMA distance information was provided. */
> >>               validate_numa_distance(ms);
> >>   
> >>               /* Validation succeeded, now fill in any missing distances. */
> >> -            complete_init_numa_distance(ms);
> >> +            complete_init_numa_distance(ms, false);
> >> +        } else {
> >> +            complete_init_numa_distance(ms, true);
> >> +            ms->numa_state->have_numa_distance = true;
> >>           }
> >>       }
> >>   }  
> >   
> 
> Thanks,
> Gavin
> 




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 10:31     ` Gavin Shan
  2021-10-12 11:18       ` Igor Mammedov
@ 2021-10-12 11:48       ` Andrew Jones
  2021-10-12 12:34         ` Igor Mammedov
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Jones @ 2021-10-12 11:48 UTC (permalink / raw)
  To: Gavin Shan
  Cc: peter.maydell, ehabkost, qemu-devel, qemu-arm, shan.gavin, Igor Mammedov

On Tue, Oct 12, 2021 at 09:31:55PM +1100, Gavin Shan wrote:
> Hi Igor,
> 
> On 10/12/21 8:40 PM, Igor Mammedov wrote:
> > On Wed,  6 Oct 2021 18:22:08 +0800
> > Gavin Shan <gshan@redhat.com> wrote:
> > 
> > > The following option is used to specify the distance map. It's
> > > possible the option isn't provided by user. In this case, the
> > > distance map isn't populated and exposed to platform. On the
> > > other hand, the empty NUMA node, where no memory resides, is
> > > allowed on ARM64 virt platform. For these empty NUMA nodes,
> > > their corresponding device-tree nodes aren't populated, but
> > > their NUMA IDs should be included in the "/distance-map"
> > > device-tree node, so that kernel can probe them properly if
> > > device-tree is used.
> > > 
> > >    -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > 
> > > So when user doesn't specify distance map, we need to generate
> > > the default distance map, where the local and remote distances
> > > are 10 and 20 separately. This adds an extra parameter to the
> > > exiting complete_init_numa_distance() to generate the default
> > > distance map for this case.
> > > 
> > > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > 
> > 
> > how about error-ing out if distance map is required but
> > not provided by user explicitly and asking user to fix
> > command line?
> > 
> > Reasoning behind this that defaults are hard to maintain
> > and will require compat hacks and being raod blocks down
> > the road.
> > Approach I was taking with generic NUMA code, is deprecating
> > defaults and replacing them with sanity checks, which bail
> > out on incorrect configuration and ask user to correct command line.
> > Hence I dislike approach taken in this patch.
> > 
> > If you really wish to provide default, push it out of
> > generic code into ARM specific one
> > (then I won't oppose it that much (I think PPC does
> > some magic like this))
> > Also behavior seems to be ARM specific so generic
> > NUMA code isn't a place for it anyways
> > 
> 
> Thanks for your comments.
> 
> Yep, Lets move the logic into hw/arm/virt in v3 because I think simply
> error-ing out will block the existing configuration where the distance
> map isn't provided by user. After moving the logic to hw/arm/virt,
> this patch is consistent with PATCH[02/02] and the specific platform
> is affected only.

Please don't move anything NUMA DT generic to hw/arm/virt. If the spec
isn't arch-specific, then the modeling shouldn't be either.

If you want to error-out for all configs missing the distance map, then
you'll need compat code. If you only want to error-out for configs that
have empty NUMA nodes and are missing a distance map, then you don't
need compat code, because those configs never worked before anyway.

Thanks,
drew

> 
> 
> > > ---
> > >   hw/core/numa.c | 13 +++++++++++--
> > >   1 file changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/core/numa.c b/hw/core/numa.c
> > > index 510d096a88..fdb3a4aeca 100644
> > > --- a/hw/core/numa.c
> > > +++ b/hw/core/numa.c
> > > @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
> > >       }
> > >   }
> > > -static void complete_init_numa_distance(MachineState *ms)
> > > +static void complete_init_numa_distance(MachineState *ms, bool is_default)
> > >   {
> > >       int src, dst;
> > >       NodeInfo *numa_info = ms->numa_state->nodes;
> > > @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
> > >               if (numa_info[src].distance[dst] == 0) {
> > >                   if (src == dst) {
> > >                       numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
> > > +                } else if (is_default) {
> > > +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
> > >                   } else {
> > >                       numa_info[src].distance[dst] = numa_info[dst].distance[src];
> > >                   }
> > > @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
> > >            * A->B != distance B->A, then that means the distance table is
> > >            * asymmetric. In this case, the distances for both directions
> > >            * of all node pairs are required.
> > > +         *
> > > +         * The default node pair distances, which are 10 and 20 for the
> > > +         * local and remote nodes separatly, are provided if user doesn't
> > > +         * specify any node pair distances.
> > >            */
> > >           if (ms->numa_state->have_numa_distance) {
> > >               /* Validate enough NUMA distance information was provided. */
> > >               validate_numa_distance(ms);
> > >               /* Validation succeeded, now fill in any missing distances. */
> > > -            complete_init_numa_distance(ms);
> > > +            complete_init_numa_distance(ms, false);
> > > +        } else {
> > > +            complete_init_numa_distance(ms, true);
> > > +            ms->numa_state->have_numa_distance = true;
> > >           }
> > >       }
> > >   }
> > 
> 
> Thanks,
> Gavin
> 




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 10:37     ` Andrew Jones
@ 2021-10-12 12:27       ` Igor Mammedov
  2021-10-12 13:13         ` Andrew Jones
  0 siblings, 1 reply; 24+ messages in thread
From: Igor Mammedov @ 2021-10-12 12:27 UTC (permalink / raw)
  To: Andrew Jones
  Cc: peter.maydell, Gavin Shan, ehabkost, qemu-devel, qemu-arm, shan.gavin

On Tue, 12 Oct 2021 12:37:54 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Tue, Oct 12, 2021 at 11:40:16AM +0200, Igor Mammedov wrote:
> > On Wed,  6 Oct 2021 18:22:08 +0800
> > Gavin Shan <gshan@redhat.com> wrote:
> >   
> > > The following option is used to specify the distance map. It's
> > > possible the option isn't provided by user. In this case, the
> > > distance map isn't populated and exposed to platform. On the
> > > other hand, the empty NUMA node, where no memory resides, is
> > > allowed on ARM64 virt platform. For these empty NUMA nodes,
> > > their corresponding device-tree nodes aren't populated, but
> > > their NUMA IDs should be included in the "/distance-map"
> > > device-tree node, so that kernel can probe them properly if
> > > device-tree is used.
> > > 
> > >   -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > 
> > > So when user doesn't specify distance map, we need to generate
> > > the default distance map, where the local and remote distances
> > > are 10 and 20 separately. This adds an extra parameter to the
> > > exiting complete_init_numa_distance() to generate the default
> > > distance map for this case.
> > > 
> > > Signed-off-by: Gavin Shan <gshan@redhat.com>  
> > 
> > 
> > how about error-ing out if distance map is required but
> > not provided by user explicitly and asking user to fix
> > command line?
> > 
> > Reasoning behind this that defaults are hard to maintain
> > and will require compat hacks and being raod blocks down
> > the road.
> > Approach I was taking with generic NUMA code, is deprecating
> > defaults and replacing them with sanity checks, which bail
> > out on incorrect configuration and ask user to correct command line.
> > Hence I dislike approach taken in this patch.
> > 
> > If you really wish to provide default, push it out of
> > generic code into ARM specific one
> > (then I won't oppose it that much (I think PPC does
> > some magic like this))
> > Also behavior seems to be ARM specific so generic
> > NUMA code isn't a place for it anyways  
> 
> The distance-map DT node and the default 10/20 distance-map values
> aren't arch-specific. RISCV is using it too.
> 
> I'm on the fence with this. I see erroring-out to require users
> to provide explicit command lines as a good thing, but I also
> see it as potentially an unnecessary burden for those that want
> the default map anyway. The optional nature of the distance-map
> node and the specification of the default map is here [1]
> 
> [1] Linux source: Documentation/devicetree/bindings/numa.txt

Looking at the proposed Linux patches [ https://lkml.org/lkml/2021/9/27/31 ],
using the optional distance table as the source for numa-node-ids
looks like a hack around the kernel's inability to fish them out
from CPU and/or PCI nodes (using those nodes as the source should
cover the memory-less node use case).

I consider including the optional node a policy decision.
So the user shall include it explicitly on the QEMU command line
if necessary (that works just fine for x86), or the guest OS
can make up defaults on its own in the absence of data.

> So, my r-b stands for this patch, but I also wouldn't complain
> about respinning it to error out instead.

> I would complain about
> moving the logic to Arm specific code, though, since RISCV would
> then need to duplicate it.

Instead of putting workarounds in QEMU and then making them generic,
I'd prefer to:
 1. make QEMU able to generate a DT with memory-less nodes
 2. fix the guest to get numa-node-id from CPU/PCI nodes if a
    memory node isn't present, or use ACPI tables, which can
    describe memory-less NUMA nodes, if fixing how the DT is
    parsed is infeasible.

> Thanks,
> drew
> 
> >   
> > > ---
> > >  hw/core/numa.c | 13 +++++++++++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/core/numa.c b/hw/core/numa.c
> > > index 510d096a88..fdb3a4aeca 100644
> > > --- a/hw/core/numa.c
> > > +++ b/hw/core/numa.c
> > > @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
> > >      }
> > >  }
> > >  
> > > -static void complete_init_numa_distance(MachineState *ms)
> > > +static void complete_init_numa_distance(MachineState *ms, bool is_default)
> > >  {
> > >      int src, dst;
> > >      NodeInfo *numa_info = ms->numa_state->nodes;
> > > @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
> > >              if (numa_info[src].distance[dst] == 0) {
> > >                  if (src == dst) {
> > >                      numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
> > > +                } else if (is_default) {
> > > +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
> > >                  } else {
> > >                      numa_info[src].distance[dst] = numa_info[dst].distance[src];
> > >                  }
> > > @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
> > >           * A->B != distance B->A, then that means the distance table is
> > >           * asymmetric. In this case, the distances for both directions
> > >           * of all node pairs are required.
> > > +         *
> > > +         * The default node pair distances, which are 10 and 20 for the
> > > +         * local and remote nodes separatly, are provided if user doesn't
> > > +         * specify any node pair distances.
> > >           */
> > >          if (ms->numa_state->have_numa_distance) {
> > >              /* Validate enough NUMA distance information was provided. */
> > >              validate_numa_distance(ms);
> > >  
> > >              /* Validation succeeded, now fill in any missing distances. */
> > > -            complete_init_numa_distance(ms);
> > > +            complete_init_numa_distance(ms, false);
> > > +        } else {
> > > +            complete_init_numa_distance(ms, true);
> > > +            ms->numa_state->have_numa_distance = true;
> > >          }
> > >      }
> > >  }  
> >   
> 




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 11:48       ` Andrew Jones
@ 2021-10-12 12:34         ` Igor Mammedov
  2021-10-12 13:05           ` Andrew Jones
  0 siblings, 1 reply; 24+ messages in thread
From: Igor Mammedov @ 2021-10-12 12:34 UTC (permalink / raw)
  To: Andrew Jones
  Cc: peter.maydell, Gavin Shan, ehabkost, qemu-devel, qemu-arm, shan.gavin

On Tue, 12 Oct 2021 13:48:02 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Tue, Oct 12, 2021 at 09:31:55PM +1100, Gavin Shan wrote:
> > Hi Igor,
> > 
> > On 10/12/21 8:40 PM, Igor Mammedov wrote:  
> > > On Wed,  6 Oct 2021 18:22:08 +0800
> > > Gavin Shan <gshan@redhat.com> wrote:
> > >   
> > > > The following option is used to specify the distance map. It's
> > > > possible the option isn't provided by user. In this case, the
> > > > distance map isn't populated and exposed to platform. On the
> > > > other hand, the empty NUMA node, where no memory resides, is
> > > > allowed on ARM64 virt platform. For these empty NUMA nodes,
> > > > their corresponding device-tree nodes aren't populated, but
> > > > their NUMA IDs should be included in the "/distance-map"
> > > > device-tree node, so that kernel can probe them properly if
> > > > device-tree is used.
> > > > 
> > > >    -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > > 
> > > > So when user doesn't specify distance map, we need to generate
> > > > the default distance map, where the local and remote distances
> > > > are 10 and 20 separately. This adds an extra parameter to the
> > > > exiting complete_init_numa_distance() to generate the default
> > > > distance map for this case.
> > > > 
> > > > Signed-off-by: Gavin Shan <gshan@redhat.com>  
> > > 
> > > 
> > > how about error-ing out if distance map is required but
> > > not provided by user explicitly and asking user to fix
> > > command line?
> > > 
> > > Reasoning behind this that defaults are hard to maintain
> > > and will require compat hacks and being raod blocks down
> > > the road.
> > > Approach I was taking with generic NUMA code, is deprecating
> > > defaults and replacing them with sanity checks, which bail
> > > out on incorrect configuration and ask user to correct command line.
> > > Hence I dislike approach taken in this patch.
> > > 
> > > If you really wish to provide default, push it out of
> > > generic code into ARM specific one
> > > (then I won't oppose it that much (I think PPC does
> > > some magic like this))
> > > Also behavior seems to be ARM specific so generic
> > > NUMA code isn't a place for it anyways
> > >   
> > 
> > Thanks for your comments.
> > 
> > Yep, Lets move the logic into hw/arm/virt in v3 because I think simply
> > error-ing out will block the existing configuration where the distance
> > map isn't provided by user. After moving the logic to hw/arm/virt,
> > this patch is consistent with PATCH[02/02] and the specific platform
> > is affected only.  
> 
> Please don't move anything NUMA DT generic to hw/arm/virt. If the spec
> isn't arch-specific, then the modeling shouldn't be either.


> If you want to error-out for all configs missing the distance map, then
> you'll need compat code.

> If you only want to error-out for configs that
> have empty NUMA nodes and are missing a distance map, then you don't
> need compat code, because those configs never worked before anyway.

I think memory-less configs without a distance map worked just fine for x86.

After looking at this thread all over again, it seems to me that using
the distance map as a source of NUMA IDs is a mistake.


> 
> Thanks,
> drew
> 
> > 
> >   
> > > > ---
> > > >   hw/core/numa.c | 13 +++++++++++--
> > > >   1 file changed, 11 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/hw/core/numa.c b/hw/core/numa.c
> > > > index 510d096a88..fdb3a4aeca 100644
> > > > --- a/hw/core/numa.c
> > > > +++ b/hw/core/numa.c
> > > > @@ -594,7 +594,7 @@ static void validate_numa_distance(MachineState *ms)
> > > >       }
> > > >   }
> > > > -static void complete_init_numa_distance(MachineState *ms)
> > > > +static void complete_init_numa_distance(MachineState *ms, bool is_default)
> > > >   {
> > > >       int src, dst;
> > > >       NodeInfo *numa_info = ms->numa_state->nodes;
> > > > @@ -609,6 +609,8 @@ static void complete_init_numa_distance(MachineState *ms)
> > > >               if (numa_info[src].distance[dst] == 0) {
> > > >                   if (src == dst) {
> > > >                       numa_info[src].distance[dst] = NUMA_DISTANCE_MIN;
> > > > +                } else if (is_default) {
> > > > +                    numa_info[src].distance[dst] = NUMA_DISTANCE_DEFAULT;
> > > >                   } else {
> > > >                       numa_info[src].distance[dst] = numa_info[dst].distance[src];
> > > >                   }
> > > > @@ -716,13 +718,20 @@ void numa_complete_configuration(MachineState *ms)
> > > >            * A->B != distance B->A, then that means the distance table is
> > > >            * asymmetric. In this case, the distances for both directions
> > > >            * of all node pairs are required.
> > > > +         *
> > > > +         * The default node pair distances, which are 10 and 20 for the
> > > > +         * local and remote nodes separatly, are provided if user doesn't
> > > > +         * specify any node pair distances.
> > > >            */
> > > >           if (ms->numa_state->have_numa_distance) {
> > > >               /* Validate enough NUMA distance information was provided. */
> > > >               validate_numa_distance(ms);
> > > >               /* Validation succeeded, now fill in any missing distances. */
> > > > -            complete_init_numa_distance(ms);
> > > > +            complete_init_numa_distance(ms, false);
> > > > +        } else {
> > > > +            complete_init_numa_distance(ms, true);
> > > > +            ms->numa_state->have_numa_distance = true;
> > > >           }
> > > >       }
> > > >   }  
> > >   
> > 
> > Thanks,
> > Gavin
> >   
> 




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 12:34         ` Igor Mammedov
@ 2021-10-12 13:05           ` Andrew Jones
  2021-10-12 22:59             ` Gavin Shan
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Jones @ 2021-10-12 13:05 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, Gavin Shan, ehabkost, qemu-devel, qemu-arm, shan.gavin

On Tue, Oct 12, 2021 at 02:34:30PM +0200, Igor Mammedov wrote:
> On Tue, 12 Oct 2021 13:48:02 +0200
> Andrew Jones <drjones@redhat.com> wrote:
> 
> > On Tue, Oct 12, 2021 at 09:31:55PM +1100, Gavin Shan wrote:
> > > Hi Igor,
> > > 
> > > On 10/12/21 8:40 PM, Igor Mammedov wrote:  
> > > > On Wed,  6 Oct 2021 18:22:08 +0800
> > > > Gavin Shan <gshan@redhat.com> wrote:
> > > >   
> > > > > The following option is used to specify the distance map. It's
> > > > > possible the option isn't provided by user. In this case, the
> > > > > distance map isn't populated and exposed to platform. On the
> > > > > other hand, the empty NUMA node, where no memory resides, is
> > > > > allowed on ARM64 virt platform. For these empty NUMA nodes,
> > > > > their corresponding device-tree nodes aren't populated, but
> > > > > their NUMA IDs should be included in the "/distance-map"
> > > > > device-tree node, so that kernel can probe them properly if
> > > > > device-tree is used.
> > > > > 
> > > > >    -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > > > 
> > > > > So when user doesn't specify distance map, we need to generate
> > > > > the default distance map, where the local and remote distances
> > > > > are 10 and 20 separately. This adds an extra parameter to the
> > > > > exiting complete_init_numa_distance() to generate the default
> > > > > distance map for this case.
> > > > > 
> > > > > Signed-off-by: Gavin Shan <gshan@redhat.com>  
> > > > 
> > > > 
> > > > how about error-ing out if distance map is required but
> > > > not provided by user explicitly and asking user to fix
> > > > command line?
> > > > 
> > > > Reasoning behind this that defaults are hard to maintain
> > > > and will require compat hacks and being raod blocks down
> > > > the road.
> > > > Approach I was taking with generic NUMA code, is deprecating
> > > > defaults and replacing them with sanity checks, which bail
> > > > out on incorrect configuration and ask user to correct command line.
> > > > Hence I dislike approach taken in this patch.
> > > > 
> > > > If you really wish to provide default, push it out of
> > > > generic code into ARM specific one
> > > > (then I won't oppose it that much (I think PPC does
> > > > some magic like this))
> > > > Also behavior seems to be ARM specific so generic
> > > > NUMA code isn't a place for it anyways
> > > >   
> > > 
> > > Thanks for your comments.
> > > 
> > > Yep, Lets move the logic into hw/arm/virt in v3 because I think simply
> > > error-ing out will block the existing configuration where the distance
> > > map isn't provided by user. After moving the logic to hw/arm/virt,
> > > this patch is consistent with PATCH[02/02] and the specific platform
> > > is affected only.  
> > 
> > Please don't move anything NUMA DT generic to hw/arm/virt. If the spec
> > isn't arch-specific, then the modeling shouldn't be either.
> 
> 
> > If you want to error-out for all configs missing the distance map, then
> > you'll need compat code.
> 
> > If you only want to error-out for configs that
> > have empty NUMA nodes and are missing a distance map, then you don't
> > need compat code, because those configs never worked before anyway.
> 
> I think memory-less configs without distance map worked for x86 just fine.

Ah, yes, we should make the condition for erroring-out be

 have-memoryless-nodes && !have-distance-map && generate-DT

ACPI-only architectures, such as x86, don't need to care about this.
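
As a rough sketch of that condition in C (the struct and function names are hypothetical and only mirror the three terms above; this is not an existing QEMU interface):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a NUMA configuration; not a QEMU structure. */
struct numa_config_summary {
    bool have_memoryless_nodes;  /* at least one node has no memory */
    bool have_distance_map;      /* user passed -numa dist,... options */
    bool generate_dt;            /* the machine generates a device-tree */
};

/*
 * Return true only for configs that never worked before: empty NUMA
 * nodes, no distance map, and a device-tree that has to describe them.
 */
static bool numa_config_must_error_out(const struct numa_config_summary *cfg)
{
    assert(cfg != NULL);

    return cfg->have_memoryless_nodes &&
           !cfg->have_distance_map &&
           cfg->generate_dt;
}
```

With this shape, a memory-less config without a distance map on a machine that does not generate a DT is still accepted, so no compat code would be needed for it.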

> 
> After looking at this thread all over again it seems to me that using
> distance map as a source of numa ids is a mistake.

You'll have to discuss that with Rob Herring, as that was his proposal.
He'll expect a counterproposal though, which we don't have...

Thanks,
drew




* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 12:27       ` Igor Mammedov
@ 2021-10-12 13:13         ` Andrew Jones
  2021-10-12 13:53           ` Igor Mammedov
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Jones @ 2021-10-12 13:13 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, Gavin Shan, ehabkost, qemu-devel, qemu-arm, shan.gavin

On Tue, Oct 12, 2021 at 02:27:54PM +0200, Igor Mammedov wrote:
> On Tue, 12 Oct 2021 12:37:54 +0200
> Andrew Jones <drjones@redhat.com> wrote:
> 
> > On Tue, Oct 12, 2021 at 11:40:16AM +0200, Igor Mammedov wrote:
> > > On Wed,  6 Oct 2021 18:22:08 +0800
> > > Gavin Shan <gshan@redhat.com> wrote:
> > >   
> > > > The following option is used to specify the distance map. It's
> > > > possible the option isn't provided by user. In this case, the
> > > > distance map isn't populated and exposed to platform. On the
> > > > other hand, the empty NUMA node, where no memory resides, is
> > > > allowed on ARM64 virt platform. For these empty NUMA nodes,
> > > > their corresponding device-tree nodes aren't populated, but
> > > > their NUMA IDs should be included in the "/distance-map"
> > > > device-tree node, so that kernel can probe them properly if
> > > > device-tree is used.
> > > > 
> > > >   -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > > 
> > > > So when user doesn't specify distance map, we need to generate
> > > > the default distance map, where the local and remote distances
> > > > are 10 and 20 separately. This adds an extra parameter to the
> > > > exiting complete_init_numa_distance() to generate the default
> > > > distance map for this case.
> > > > 
> > > > Signed-off-by: Gavin Shan <gshan@redhat.com>  
> > > 
> > > 
> > > how about error-ing out if distance map is required but
> > > not provided by user explicitly and asking user to fix
> > > command line?
> > > 
> > > Reasoning behind this that defaults are hard to maintain
> > > and will require compat hacks and being raod blocks down
> > > the road.
> > > Approach I was taking with generic NUMA code, is deprecating
> > > defaults and replacing them with sanity checks, which bail
> > > out on incorrect configuration and ask user to correct command line.
> > > Hence I dislike approach taken in this patch.
> > > 
> > > If you really wish to provide default, push it out of
> > > generic code into ARM specific one
> > > (then I won't oppose it that much (I think PPC does
> > > some magic like this))
> > > Also behavior seems to be ARM specific so generic
> > > NUMA code isn't a place for it anyways  
> > 
> > The distance-map DT node and the default 10/20 distance-map values
> > aren't arch-specific. RISCV is using it too.
> > 
> > I'm on the fence with this. I see erroring-out to require users
> > to provide explicit command lines as a good thing, but I also
> > see it as potentially an unnecessary burden for those that want
> > the default map anyway. The optional nature of the distance-map
> > node and the specification of the default map is here [1]
> > 
> > [1] Linux source: Documentation/devicetree/bindings/numa.txt
> 
> Looking at proposed linux patches [ https://lkml.org/lkml/2021/9/27/31 ],
> using optional distance table as source for numa-node-ids,
> looks like a hack around kernel's inability to fish them out
> from CPU &| PCI nodes (using those nodes as source should
> cover memory-less node use-case).
> 
> I consider including optional node as a policy decision.
> So user shall include it explicitly on QEMU command line
> if necessary (that works just fine for x86), or guest OS
> can make up defaults on its own in absence of data.

OK, so erroring-out on configs that must provide distance-maps, rather
than automatically generating them for all configs is better.

> 
> > So, my r-b stands for this patch, but I also wouldn't complain
> > about respinning it to error out instead.
> 
> > I would complain about
> > moving the logic to Arm specific code, though, since RISCV would
> > then need to duplicate it.
> 
> Instead of putting workaround in QEMU and then making them generic,
> I'd prefer to:
>  1. make QEMU to be able generate DT with memory-less nodes

How? DT syntax doesn't allow this, because each node needs a unique
name which is derived from its base address, which an empty numa
node doesn't have.
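
To illustrate the collision, here is a minimal sketch, not QEMU's actual
code. It assumes a generator that names each node /memory@<base> and
advances the base by the node's size; first_name_collision() and its
parameters are hypothetical names made up for this example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8
#define NAME_LEN  32

/*
 * Hypothetical helper: derive a "/memory@<base>" name for each NUMA node,
 * advancing the base address by each node's memory size.
 * Returns the index of the first node whose name duplicates an earlier
 * one, or -1 if all names are unique.
 */
int first_name_collision(uint64_t ram_base, const uint64_t *node_size,
                         int nb_nodes)
{
    char names[MAX_NODES][NAME_LEN];
    uint64_t base = ram_base;

    assert(nb_nodes >= 0 && nb_nodes <= MAX_NODES);
    for (int i = 0; i < nb_nodes; i++) {
        snprintf(names[i], NAME_LEN, "/memory@%" PRIx64, base);
        for (int j = 0; j < i; j++) {
            if (strcmp(names[i], names[j]) == 0) {
                return i;
            }
        }
        /* An empty node adds nothing, so the next node reuses this base. */
        base += node_size[i];
    }
    return -1;
}
```

Assuming RAM starts at 0x40000000, node sizes of 512M, 512M, 0 and 0 (the
layout from the cover letter) give nodes 2 and 3 the same name,
/memory@80000000, so the function returns 3.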

>  2. fix guest to get numa-node-id from CPU/PCI nodes if
>     memory node isn't present,

I'm not sure that's possible with DT. If it is, then proposing it
upstream to Linux DT maintainers would be the next step.

> or use ACPI tables which can
>     describe memory-less NUMA nodes if fixing how DT is
>     parsed unfeasible.

We use ACPI already for our guests, but we also generate a DT (which
edk2 consumes). We can't generate a valid DT when empty numa nodes
are put on the command line unless we follow a DT spec saying how
to do that. The current spec says we should have a distance-map
that contains those nodes.
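
For reference, a rough sketch of what a default distance map with the
10/20 values looks like. This is an illustration only;
fill_default_distances(), MAX_NODES and the DIST_* names are assumptions
for the example, not the helper touched by the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES   8
#define DIST_LOCAL  10   /* distance from a node to itself */
#define DIST_REMOTE 20   /* distance between two different nodes */

/*
 * Hypothetical helper: fill an nb_nodes x nb_nodes matrix with the
 * default distances, covering every node ID including empty nodes.
 */
void fill_default_distances(uint8_t dist[MAX_NODES][MAX_NODES], int nb_nodes)
{
    assert(nb_nodes > 0 && nb_nodes <= MAX_NODES);
    for (int src = 0; src < nb_nodes; src++) {
        for (int dst = 0; dst < nb_nodes; dst++) {
            dist[src][dst] = (src == dst) ? DIST_LOCAL : DIST_REMOTE;
        }
    }
}
```

Because the matrix is indexed by node ID rather than by memory region,
empty nodes show up in it like any other node.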

Thanks,
drew



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 13:13         ` Andrew Jones
@ 2021-10-12 13:53           ` Igor Mammedov
  2021-10-12 23:32             ` Gavin Shan
  2021-10-13  6:29             ` Andrew Jones
  0 siblings, 2 replies; 24+ messages in thread
From: Igor Mammedov @ 2021-10-12 13:53 UTC (permalink / raw)
  To: Andrew Jones
  Cc: peter.maydell, Gavin Shan, ehabkost, robh, qemu-devel, qemu-arm,
	shan.gavin

On Tue, 12 Oct 2021 15:13:08 +0200
Andrew Jones <drjones@redhat.com> wrote:

> On Tue, Oct 12, 2021 at 02:27:54PM +0200, Igor Mammedov wrote:
> > On Tue, 12 Oct 2021 12:37:54 +0200
> > Andrew Jones <drjones@redhat.com> wrote:
> >   
> > > On Tue, Oct 12, 2021 at 11:40:16AM +0200, Igor Mammedov wrote:  
> > > > On Wed,  6 Oct 2021 18:22:08 +0800
> > > > Gavin Shan <gshan@redhat.com> wrote:
> > > >     
> > > > > The following option is used to specify the distance map. It's
> > > > > possible the option isn't provided by user. In this case, the
> > > > > distance map isn't populated and exposed to platform. On the
> > > > > other hand, the empty NUMA node, where no memory resides, is
> > > > > allowed on ARM64 virt platform. For these empty NUMA nodes,
> > > > > their corresponding device-tree nodes aren't populated, but
> > > > > their NUMA IDs should be included in the "/distance-map"
> > > > > device-tree node, so that kernel can probe them properly if
> > > > > device-tree is used.
> > > > > 
> > > > >   -numa,dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> > > > > 
> > > > > So when user doesn't specify distance map, we need to generate
> > > > > the default distance map, where the local and remote distances
> > > > > are 10 and 20 respectively. This adds an extra parameter to the
> > > > > existing complete_init_numa_distance() to generate the default
> > > > > distance map for this case.
> > > > > 
> > > > > Signed-off-by: Gavin Shan <gshan@redhat.com>    
> > > > 
> > > > 
> > > > how about error-ing out if distance map is required but
> > > > not provided by user explicitly and asking user to fix
> > > > command line?
> > > > 
> > > > Reasoning behind this is that defaults are hard to maintain
> > > > and will require compat hacks and become roadblocks down
> > > > the road.
> > > > Approach I was taking with generic NUMA code, is deprecating
> > > > defaults and replacing them with sanity checks, which bail
> > > > out on incorrect configuration and ask user to correct command line.
> > > > Hence I dislike approach taken in this patch.
> > > > 
> > > > If you really wish to provide default, push it out of
> > > > generic code into ARM specific one
> > > > (then I won't oppose it that much (I think PPC does
> > > > some magic like this))
> > > > Also behavior seems to be ARM specific so generic
> > > > NUMA code isn't a place for it anyways    
> > > 
> > > The distance-map DT node and the default 10/20 distance-map values
> > > aren't arch-specific. RISCV is using it too.
> > > 
> > > I'm on the fence with this. I see erroring-out to require users
> > > to provide explicit command lines as a good thing, but I also
> > > see it as potentially an unnecessary burden for those that want
> > > the default map anyway. The optional nature of the distance-map
> > > node and the specification of the default map is here [1]
> > > 
> > > [1] Linux source: Documentation/devicetree/bindings/numa.txt  
> > 
> > Looking at proposed linux patches [ https://lkml.org/lkml/2021/9/27/31 ],
> > using optional distance table as source for numa-node-ids,
> > looks like a hack around kernel's inability to fish them out
> > from CPU &| PCI nodes (using those nodes as source should
> > cover memory-less node use-case).
> > 
> > I consider including optional node as a policy decision.
> > So user shall include it explicitly on QEMU command line
> > if necessary (that works just fine for x86), or guest OS
> > can make up defaults on its own in absence of data.  
> 
> OK, so erroring-out on configs that must provide distance-maps, rather
> than automatically generating them for all configs is better.
> 
> >   
> > > So, my r-b stands for this patch, but I also wouldn't complain
> > > about respinning it to error out instead.  
> >   
> > > I would complain about
> > > moving the logic to Arm specific code, though, since RISCV would
> > > then need to duplicate it.  
> > 
> > Instead of putting workaround in QEMU and then making them generic,
> > I'd prefer to:
> >  1. make QEMU to be able generate DT with memory-less nodes  
> 
> How? DT syntax doesn't allow this, because each node needs a unique
> name which is derived from its base address, which an empty numa
you are talking about memory@foo nodes, aren't you?

> node doesn't have.

Looking at Documentation/devicetree/bindings/numa.txt:

The mem/cpu/pci nodes also carry a numa-node-id property,
so the idea is to collect IDs from all sources that are present
instead of abusing the distance map.
 
That would allow QEMU to skip memory@foo nodes for
memory-less NUMA nodes, because they obviously do not exist
and there is no way to describe them using 'memory' nodes.
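
A rough sketch of that idea, purely for illustration: the
collect_node_ids() helper and the bitmask representation are assumptions
made up for this example, not existing kernel or QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: fold the numa-node-id values found on one kind of
 * DT node (memory, CPU or PCI) into a bitmask of known NUMA node IDs.
 * Call it once per source, passing the mask accumulated so far.
 */
uint64_t collect_node_ids(const int *ids, int count, uint64_t seen)
{
    for (int i = 0; i < count; i++) {
        assert(ids[i] >= 0 && ids[i] < 64);
        seen |= UINT64_C(1) << ids[i];
    }
    return seen;
}
```

For example, memory nodes reporting IDs {0, 1} and CPU nodes reporting
{0, 0, 1, 2} yield the mask 0x7, so node 2 is discovered through its CPU
even though it has no memory. A node referenced by none of the sources
would not appear in the mask.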

> >  2. fix guest to get numa-node-id from CPU/PCI nodes if
> >     memory node isn't present,  
> 
> I'm not sure that's possible with DT. If it is, then proposing it
> upstream to Linux DT maintainers would be the next step.
Added Rob to CC.

> 
> > or use ACPI tables which can
> >     describe memory-less NUMA nodes if fixing how DT is
> >     parsed unfeasible.  
> 
> We use ACPI already for our guests, but we also generate a DT (which
> edk2 consumes). We can't generate a valid DT when empty numa nodes
Does edk2 actually use NUMA info from QEMU?

> are put on the command line unless we follow a DT spec saying how
> to do that. The current spec says we should have a distance-map
> that contains those nodes.

Can you point to the spec, and the place within it, please?
 
> Thanks,
> drew
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 13:05           ` Andrew Jones
@ 2021-10-12 22:59             ` Gavin Shan
  0 siblings, 0 replies; 24+ messages in thread
From: Gavin Shan @ 2021-10-12 22:59 UTC (permalink / raw)
  To: Andrew Jones, Igor Mammedov
  Cc: peter.maydell, qemu-arm, qemu-devel, shan.gavin, ehabkost

Hi Drew and Igor,

On 10/13/21 12:05 AM, Andrew Jones wrote:
> On Tue, Oct 12, 2021 at 02:34:30PM +0200, Igor Mammedov wrote:
>> On Tue, 12 Oct 2021 13:48:02 +0200
>>> On Tue, Oct 12, 2021 at 09:31:55PM +1100, Gavin Shan wrote:
>>>> On 10/12/21 8:40 PM, Igor Mammedov wrote:
>>>>> On Wed,  6 Oct 2021 18:22:08 +0800
>>>>> Gavin Shan <gshan@redhat.com> wrote:
>>>>>    
>>>>>> The following option is used to specify the distance map. It's
>>>>>> possible the option isn't provided by user. In this case, the
>>>>>> distance map isn't populated and exposed to platform. On the
>>>>>> other hand, the empty NUMA node, where no memory resides, is
>>>>>> allowed on ARM64 virt platform. For these empty NUMA nodes,
>>>>>> their corresponding device-tree nodes aren't populated, but
>>>>>> their NUMA IDs should be included in the "/distance-map"
>>>>>> device-tree node, so that kernel can probe them properly if
>>>>>> device-tree is used.
>>>>>>
>>>>>>     -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
>>>>>>
>>>>>> So when the user doesn't specify a distance map, we need to generate
>>>>>> the default distance map, where the local and remote distances
>>>>>> are 10 and 20 respectively. This adds an extra parameter to the
>>>>>> existing complete_init_numa_distance() to generate the default
>>>>>> distance map for this case.
>>>>>>
>>>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>>>
>>>>>
>>>>> how about error-ing out if distance map is required but
>>>>> not provided by user explicitly and asking user to fix
>>>>> command line?
>>>>>
>>>>> Reasoning behind this is that defaults are hard to maintain
>>>>> and will require compat hacks and become roadblocks down
>>>>> the road.
>>>>> Approach I was taking with generic NUMA code, is deprecating
>>>>> defaults and replacing them with sanity checks, which bail
>>>>> out on incorrect configuration and ask user to correct command line.
>>>>> Hence I dislike approach taken in this patch.
>>>>>
>>>>> If you really wish to provide default, push it out of
>>>>> generic code into ARM specific one
>>>>> (then I won't oppose it that much (I think PPC does
>>>>> some magic like this))
>>>>> Also behavior seems to be ARM specific so generic
>>>>> NUMA code isn't a place for it anyways
>>>>>    
>>>>
>>>> Thanks for your comments.
>>>>
>>>> Yep, Lets move the logic into hw/arm/virt in v3 because I think simply
>>>> error-ing out will block the existing configuration where the distance
>>>> map isn't provided by user. After moving the logic to hw/arm/virt,
>>>> this patch is consistent with PATCH[02/02] and the specific platform
>>>> is affected only.
>>>
>>> Please don't move anything NUMA DT generic to hw/arm/virt. If the spec
>>> isn't arch-specific, then the modeling shouldn't be either.
>>
>>
>>> If you want to error-out for all configs missing the distance map, then
>>> you'll need compat code.
>>
>>> If you only want to error-out for configs that
>>> have empty NUMA nodes and are missing a distance map, then you don't
>>> need compat code, because those configs never worked before anyway.
>>
>> I think memory-less configs without distance map worked for x86 just fine.
> 
> Ah, yes, we should make the condition for erroring-out be
> 
>   have-memoryless-nodes && !have-distance-map && generate-DT
> 
> ACPI only architectures, x86, don't need to care about this.
> 

Sure, I will change the code accordingly in v3. Thanks for discussing
it through with Igor :)
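
As a rough sketch of the condition (an illustration only, not the v3
code; must_reject_config() and its parameters are hypothetical names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: reject the configuration only when there is at
 * least one memory-less NUMA node, no user-provided distance map, and
 * the machine generates a device tree.
 */
bool must_reject_config(const uint64_t *node_mem, int nb_nodes,
                        bool have_distance_map, bool generate_dt)
{
    bool have_memoryless_nodes = false;

    assert(nb_nodes >= 0);
    for (int i = 0; i < nb_nodes; i++) {
        if (node_mem[i] == 0) {
            have_memoryless_nodes = true;
            break;
        }
    }
    return have_memoryless_nodes && !have_distance_map && generate_dt;
}
```

With this shape, configurations that worked before (no empty nodes, or
ACPI-only machines) are not affected, so no compat code should be needed.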

>>
>> After looking at this thread all over again it seems to me that using
>> distance map as a source of numa ids is a mistake.
> 
> You'll have to discuss that with Rob Herring, as that was his proposal.
> He'll expect a counterproposal though, which we don't have...
> 

However, getting the NUMA node IDs from the PCI host bridge and CPUs isn't
working out. I will explain in another thread.

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 13:53           ` Igor Mammedov
@ 2021-10-12 23:32             ` Gavin Shan
  2021-10-13  9:32               ` Igor Mammedov
  2021-10-13  6:29             ` Andrew Jones
  1 sibling, 1 reply; 24+ messages in thread
From: Gavin Shan @ 2021-10-12 23:32 UTC (permalink / raw)
  To: Igor Mammedov, Andrew Jones
  Cc: robh, ehabkost, peter.maydell, qemu-devel, qemu-arm, shan.gavin

Hi Igor,

On 10/13/21 12:53 AM, Igor Mammedov wrote:
> On Tue, 12 Oct 2021 15:13:08 +0200
> Andrew Jones <drjones@redhat.com> wrote: 
>> On Tue, Oct 12, 2021 at 02:27:54PM +0200, Igor Mammedov wrote:
>>> On Tue, 12 Oct 2021 12:37:54 +0200
>>> Andrew Jones <drjones@redhat.com> wrote:   
>>>> On Tue, Oct 12, 2021 at 11:40:16AM +0200, Igor Mammedov wrote:
>>>>> On Wed,  6 Oct 2021 18:22:08 +0800
>>>>> Gavin Shan <gshan@redhat.com> wrote:
>>>>>      
>>>>>> The following option is used to specify the distance map. It's
>>>>>> possible the option isn't provided by user. In this case, the
>>>>>> distance map isn't populated and exposed to platform. On the
>>>>>> other hand, the empty NUMA node, where no memory resides, is
>>>>>> allowed on ARM64 virt platform. For these empty NUMA nodes,
>>>>>> their corresponding device-tree nodes aren't populated, but
>>>>>> their NUMA IDs should be included in the "/distance-map"
>>>>>> device-tree node, so that kernel can probe them properly if
>>>>>> device-tree is used.
>>>>>>
>>>>>>    -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
>>>>>>
>>>>>> So when the user doesn't specify a distance map, we need to generate
>>>>>> the default distance map, where the local and remote distances
>>>>>> are 10 and 20 respectively. This adds an extra parameter to the
>>>>>> existing complete_init_numa_distance() to generate the default
>>>>>> distance map for this case.
>>>>>>
>>>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>>>
>>>>>
>>>>> how about error-ing out if distance map is required but
>>>>> not provided by user explicitly and asking user to fix
>>>>> command line?
>>>>>
>>>>> Reasoning behind this is that defaults are hard to maintain
>>>>> and will require compat hacks and become roadblocks down
>>>>> the road.
>>>>> Approach I was taking with generic NUMA code, is deprecating
>>>>> defaults and replacing them with sanity checks, which bail
>>>>> out on incorrect configuration and ask user to correct command line.
>>>>> Hence I dislike approach taken in this patch.
>>>>>
>>>>> If you really wish to provide default, push it out of
>>>>> generic code into ARM specific one
>>>>> (then I won't oppose it that much (I think PPC does
>>>>> some magic like this))
>>>>> Also behavior seems to be ARM specific so generic
>>>>> NUMA code isn't a place for it anyways
>>>>
>>>> The distance-map DT node and the default 10/20 distance-map values
>>>> aren't arch-specific. RISCV is using it too.
>>>>
>>>> I'm on the fence with this. I see erroring-out to require users
>>>> to provide explicit command lines as a good thing, but I also
>>>> see it as potentially an unnecessary burden for those that want
>>>> the default map anyway. The optional nature of the distance-map
>>>> node and the specification of the default map is here [1]
>>>>
>>>> [1] Linux source: Documentation/devicetree/bindings/numa.txt
>>>
>>> Looking at proposed linux patches [ https://lkml.org/lkml/2021/9/27/31 ],
>>> using optional distance table as source for numa-node-ids,
>>> looks like a hack around kernel's inability to fish them out
>>> from CPU &| PCI nodes (using those nodes as source should
>>> cover memory-less node use-case).
>>>
>>> I consider including optional node as a policy decision.
>>> So user shall include it explicitly on QEMU command line
>>> if necessary (that works just fine for x86), or guest OS
>>> can make up defaults on its own in absence of data.
>>
>> OK, so erroring-out on configs that must provide distance-maps, rather
>> than automatically generating them for all configs is better.
>>
>>>    
>>>> So, my r-b stands for this patch, but I also wouldn't complain
>>>> about respinning it to error out instead.
>>>    
>>>> I would complain about
>>>> moving the logic to Arm specific code, though, since RISCV would
>>>> then need to duplicate it.
>>>
>>> Instead of putting workaround in QEMU and then making them generic,
>>> I'd prefer to:
>>>   1. make QEMU to be able generate DT with memory-less nodes
>>
>> How? DT syntax doesn't allow this, because each node needs a unique
>> name which is derived from its base address, which an empty numa
> you are talking about memory@foo nodes, aren't you?
> 
>> node doesn't have.
> 
> Looking at Documentation/devicetree/bindings/numa.txt
> 
> mem/cpu/pci nodes also contain numa-node-id attribute,
> so idea is to collect IDs from all present sources
> instead of abusing distance map.
>   
> That would allow QEMU to skip memory@foo elements for
> memory-less nodes because they obviously do not exist
> and there is no way to describe them using 'memory' nodes.
> 

I don't think it's feasible, because it's hard to derive NUMA node IDs
from these kinds of sources. Apart from mem/cpu/pci, NUMA node IDs
can also be attached to platform devices, which are sometimes vendor
specific. Other types of devices that I'm not aware of could carry
NUMA node IDs as well.

Besides, things become more complicated when hotplug is considered.
For example, a hot-added CPU may be associated with a NUMA node that
doesn't exist yet. The CPU hot-add fails until the associated NUMA node
is initialized, which means the CPU/memory hotplug paths would have to
be reworked.

So the point is to derive the NUMA node IDs from a limited set of
sources: mem/cpu/distance-map. The distance-map is optional in the
current Linux implementation.

>>>   2. fix guest to get numa-node-id from CPU/PCI nodes if
>>>      memory node isn't present,
>>
>> I'm not sure that's possible with DT. If it is, then proposing it
>> upstream to Linux DT maintainers would be the next step.
> Added Rob to CC.
> 

As explained above.

>>
>>> or use ACPI tables which can
>>>      describe memory-less NUMA nodes if fixing how DT is
>>>      parsed unfeasible.
>>
>> We use ACPI already for our guests, but we also generate a DT (which
>> edk2 consumes). We can't generate a valid DT when empty numa nodes
> does edk2 actually uses numa info from QEMU?
> 
>> are put on the command line unless we follow a DT spec saying how
>> to do that. The current spec says we should have a distance-map
>> that contains those nodes.
> 
> can you point out to the spec and place within it, pls?
>

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20211012&id=58ae0b51506802713aa0e9956d1853ba4c722c98
("Documentation, dt, numa: Add note to empty NUMA node")

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 13:53           ` Igor Mammedov
  2021-10-12 23:32             ` Gavin Shan
@ 2021-10-13  6:29             ` Andrew Jones
  1 sibling, 0 replies; 24+ messages in thread
From: Andrew Jones @ 2021-10-13  6:29 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: peter.maydell, Gavin Shan, ehabkost, robh, qemu-devel, qemu-arm,
	shan.gavin

On Tue, Oct 12, 2021 at 03:53:21PM +0200, Igor Mammedov wrote:
...
> > > 
> > > Instead of putting workaround in QEMU and then making them generic,
> > > I'd prefer to:
> > >  1. make QEMU to be able generate DT with memory-less nodes  
> > 
> > How? DT syntax doesn't allow this, because each node needs a unique
> > name which is derived from its base address, which an empty numa
> you are talking about memory@foo nodes, aren't you?

yes, memory@<address> nodes

...
> > 
> > > or use ACPI tables which can
> > >     describe memory-less NUMA nodes if fixing how DT is
> > >     parsed unfeasible.  
> > 
> > We use ACPI already for our guests, but we also generate a DT (which
> > edk2 consumes). We can't generate a valid DT when empty numa nodes
> does edk2 actually uses numa info from QEMU?
>

edk2 doesn't use it, but I'd prefer we generate a DT which describes
the user's input and what the ACPI tables describe. Maybe someday it
won't be possible, but so far we've managed.

Thanks,
drew



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] numa: Set default distance map if needed
  2021-10-12 23:32             ` Gavin Shan
@ 2021-10-13  9:32               ` Igor Mammedov
  0 siblings, 0 replies; 24+ messages in thread
From: Igor Mammedov @ 2021-10-13  9:32 UTC (permalink / raw)
  To: Gavin Shan
  Cc: peter.maydell, Andrew Jones, ehabkost, robh, qemu-devel,
	qemu-arm, shan.gavin

On Wed, 13 Oct 2021 10:32:18 +1100
Gavin Shan <gshan@redhat.com> wrote:

> Hi Igor,
> 
> On 10/13/21 12:53 AM, Igor Mammedov wrote:
> > On Tue, 12 Oct 2021 15:13:08 +0200
> > Andrew Jones <drjones@redhat.com> wrote:   
> >> On Tue, Oct 12, 2021 at 02:27:54PM +0200, Igor Mammedov wrote:  
> >>> On Tue, 12 Oct 2021 12:37:54 +0200
> >>> Andrew Jones <drjones@redhat.com> wrote:     
> >>>> On Tue, Oct 12, 2021 at 11:40:16AM +0200, Igor Mammedov wrote:  
> >>>>> On Wed,  6 Oct 2021 18:22:08 +0800
> >>>>> Gavin Shan <gshan@redhat.com> wrote:
> >>>>>        
> >>>>>> The following option is used to specify the distance map. It's
> >>>>>> possible the option isn't provided by user. In this case, the
> >>>>>> distance map isn't populated and exposed to platform. On the
> >>>>>> other hand, the empty NUMA node, where no memory resides, is
> >>>>>> allowed on ARM64 virt platform. For these empty NUMA nodes,
> >>>>>> their corresponding device-tree nodes aren't populated, but
> >>>>>> their NUMA IDs should be included in the "/distance-map"
> >>>>>> device-tree node, so that kernel can probe them properly if
> >>>>>> device-tree is used.
> >>>>>>
> >>>>>>    -numa dist,src=<numa_id>,dst=<numa_id>,val=<distance>
> >>>>>>
> >>>>>> So when the user doesn't specify a distance map, we need to generate
> >>>>>> the default distance map, where the local and remote distances
> >>>>>> are 10 and 20 respectively. This adds an extra parameter to the
> >>>>>> existing complete_init_numa_distance() to generate the default
> >>>>>> distance map for this case.
> >>>>>>
> >>>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>  
> >>>>>
> >>>>>
> >>>>> how about error-ing out if distance map is required but
> >>>>> not provided by user explicitly and asking user to fix
> >>>>> command line?
> >>>>>
> >>>>> Reasoning behind this is that defaults are hard to maintain
> >>>>> and will require compat hacks and become roadblocks down
> >>>>> the road.
> >>>>> Approach I was taking with generic NUMA code, is deprecating
> >>>>> defaults and replacing them with sanity checks, which bail
> >>>>> out on incorrect configuration and ask user to correct command line.
> >>>>> Hence I dislike approach taken in this patch.
> >>>>>
> >>>>> If you really wish to provide default, push it out of
> >>>>> generic code into ARM specific one
> >>>>> (then I won't oppose it that much (I think PPC does
> >>>>> some magic like this))
> >>>>> Also behavior seems to be ARM specific so generic
> >>>>> NUMA code isn't a place for it anyways  
> >>>>
> >>>> The distance-map DT node and the default 10/20 distance-map values
> >>>> aren't arch-specific. RISCV is using it too.
> >>>>
> >>>> I'm on the fence with this. I see erroring-out to require users
> >>>> to provide explicit command lines as a good thing, but I also
> >>>> see it as potentially an unnecessary burden for those that want
> >>>> the default map anyway. The optional nature of the distance-map
> >>>> node and the specification of the default map is here [1]
> >>>>
> >>>> [1] Linux source: Documentation/devicetree/bindings/numa.txt  
> >>>
> >>> Looking at proposed linux patches [ https://lkml.org/lkml/2021/9/27/31 ],
> >>> using optional distance table as source for numa-node-ids,
> >>> looks like a hack around kernel's inability to fish them out
> >>> from CPU &| PCI nodes (using those nodes as source should
> >>> cover memory-less node use-case).
> >>>
> >>> I consider including optional node as a policy decision.
> >>> So user shall include it explicitly on QEMU command line
> >>> if necessary (that works just fine for x86), or guest OS
> >>> can make up defaults on its own in absence of data.  
> >>
> >> OK, so erroring-out on configs that must provide distance-maps, rather
> >> than automatically generating them for all configs is better.
> >>  
> >>>      
> >>>> So, my r-b stands for this patch, but I also wouldn't complain
> >>>> about respinning it to error out instead.  
> >>>      
> >>>> I would complain about
> >>>> moving the logic to Arm specific code, though, since RISCV would
> >>>> then need to duplicate it.  
> >>>
> >>> Instead of putting workaround in QEMU and then making them generic,
> >>> I'd prefer to:
> >>>   1. make QEMU to be able generate DT with memory-less nodes  
> >>
> >> How? DT syntax doesn't allow this, because each node needs a unique
> >> name which is derived from its base address, which an empty numa  
> > you are talking about memory@foo nodes, aren't you?
> >   
> >> node doesn't have.  
> > 
> > Looking at Documentation/devicetree/bindings/numa.txt
> > 
> > mem/cpu/pci nodes also contain numa-node-id attribute,
> > so idea is to collect IDs from all present sources
> > instead of abusing distance map.
> >   
> > That would allow QEMU to skip memory@foo elements for
> > memory-less nodes because they obviously do not exist
> > and there is no way to describe them using 'memory' nodes.
> >   
> 
> I don't think it's feasible because it's hard to elaborate NUMA node IDs
> from this sort of sources. Apart from mem/cpu/pci, the NUMA node IDs
> can be included into platform devices, which could be vendor specific
> sometimes. Other type of devices, which I don't know, could include
> NUMA node IDs either.

Most likely the mem/cpu(/pci) nodes are sufficient to get the node IDs
(as they can be node-forming entities; I'm not sure about PCI).
So forcing QEMU to provide yet another optional node
to describe NUMA IDs that could be figured out from nodes that are
already present doesn't look like a good idea. Sure, it's
a simple approach from the guest kernel's point of view (though I doubt
it's any harder to parse cpu nodes in addition to memory ones
to get the NUMA IDs), but otherwise it puts unnecessary
restrictions on QEMU. As for hotplug, see the comment below.

 
> Besides, things become more complicated when hotplug is considered.
> For example, the hot-added CPU is associated with a non-existing
> NUMA node. The CPU hot-add fails until the associated NUMA node
> is initialized. This means CPU/mem hotplug have to be twisted.

Is hotplug even a thing with device tree (I don't think so)?
With QEMU we use ACPI for hotplug, so from the arm/virt machine's
point of view we probably do not care about theoretical
hotplug with device-tree.


> So the point is to elaborate the NUMA node IDs from the limited
> source: mem/cpu/distance-map. The distance-map is optional in
> current Linux implementation.
> 
> >>>   2. fix guest to get numa-node-id from CPU/PCI nodes if
> >>>      memory node isn't present,  
> >>
> >> I'm not sure that's possible with DT. If it is, then proposing it
> >> upstream to Linux DT maintainers would be the next step.  
> > Added Rob to CC.
> >   
> 
> As explained above.
> 
> >>  
> >>> or use ACPI tables which can
> >>>      describe memory-less NUMA nodes if fixing how DT is
> >>>      parsed unfeasible.  
> >>
> >> We use ACPI already for our guests, but we also generate a DT (which
> >> edk2 consumes). We can't generate a valid DT when empty numa nodes  
> > does edk2 actually uses numa info from QEMU?
> >   
> >> are put on the command line unless we follow a DT spec saying how
> >> to do that. The current spec says we should have a distance-map
> >> that contains those nodes.  
> > 
> > can you point out to the spec and place within it, pls?
> >  
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20211012&id=58ae0b51506802713aa0e9956d1853ba4c722c98
> ("Documentation, dt, numa: Add note to empty NUMA node")

That's not something set in stone; it's a description of
a possible implementation introduced for the sake of
this patch, and it can be changed.

> Thanks,
> Gavin
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-10-13  9:49 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-06 10:22 [PATCH 0/2] hw/arm/virt: Fix qemu booting failure on device-tree Gavin Shan
2021-10-06 10:22 ` [PATCH 1/2] numa: Set default distance map if needed Gavin Shan
2021-10-06 10:35   ` Andrew Jones
2021-10-06 11:03     ` Gavin Shan
2021-10-06 11:56       ` Andrew Jones
2021-10-07 23:51         ` Gavin Shan
2021-10-08  6:07           ` Andrew Jones
2021-10-12  6:13             ` Gavin Shan
2021-10-12  9:40   ` Igor Mammedov
2021-10-12 10:31     ` Gavin Shan
2021-10-12 11:18       ` Igor Mammedov
2021-10-12 11:48       ` Andrew Jones
2021-10-12 12:34         ` Igor Mammedov
2021-10-12 13:05           ` Andrew Jones
2021-10-12 22:59             ` Gavin Shan
2021-10-12 10:37     ` Andrew Jones
2021-10-12 12:27       ` Igor Mammedov
2021-10-12 13:13         ` Andrew Jones
2021-10-12 13:53           ` Igor Mammedov
2021-10-12 23:32             ` Gavin Shan
2021-10-13  9:32               ` Igor Mammedov
2021-10-13  6:29             ` Andrew Jones
2021-10-06 10:22 ` [PATCH 2/2] hw/arm/virt: Don't create device-tree node for empty NUMA node Gavin Shan
2021-10-06 10:36   ` Andrew Jones
