All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix couple of issues with AMD topology
@ 2020-06-08 20:18 Babu Moger
  2020-06-08 20:18 ` [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug Babu Moger
  2020-06-08 20:18 ` [PATCH 2/2] i386: Simplify CPUID_8000_001E for AMD Babu Moger
  0 siblings, 2 replies; 13+ messages in thread
From: Babu Moger @ 2020-06-08 20:18 UTC (permalink / raw)
  To: pbonzini, rth, ehabkost, mst, marcel.apfelbaum; +Cc: qemu-devel

This series fixes couple of issues with recent topology related code.
1. Fix uninitialized memory with -device and CPU hotplug
2. Simplify CPUID_8000_001E to generalize the support for higher
   number of cores and nodes

Here are the threads discussing the issue.
https://lore.kernel.org/qemu-devel/20200602175212.GH577771@habkost.net/
https://lore.kernel.org/qemu-devel/20200602175212.GH577771@habkost.net/
---

Babu Moger (2):
      hw/386: Fix uninitialized memory with -device and CPU hotplug
      i386: Simplify CPUID_8000_001E for AMD


 hw/i386/pc.c               |    2 +
 include/hw/i386/topology.h |   11 ++++++
 target/i386/cpu.c          |   78 +++++++++++++++++++++-----------------------
 3 files changed, 50 insertions(+), 41 deletions(-)

--


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-08 20:18 [PATCH 0/2] Fix couple of issues with AMD topology Babu Moger
@ 2020-06-08 20:18 ` Babu Moger
  2020-06-16 10:59   ` Igor Mammedov
  2020-06-08 20:18 ` [PATCH 2/2] i386: Simplify CPUID_8000_001E for AMD Babu Moger
  1 sibling, 1 reply; 13+ messages in thread
From: Babu Moger @ 2020-06-08 20:18 UTC (permalink / raw)
  To: pbonzini, rth, ehabkost, mst, marcel.apfelbaum; +Cc: qemu-devel

Noticed the following command failure while testing CPU hotplug.

$ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
  cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
  cpu,core-id=0,socket-id=1,thread-id=0

  qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-id=1,
  thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
  with APIC ID 21855, valid index range 0:1

This happens because APIC ID is calculated using uninitialized memory.
This is happening after the addition of new field node_id in X86CPUTopoIDs
structure. The node_id field is uninitialized while calling
apicid_from_topo_ids. The problem is discussed in the thread below.
https://lore.kernel.org/qemu-devel/20200602171838.GG577771@habkost.net/

Fix the problem by initializing the node_id properly.

Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 hw/i386/pc.c               |    2 ++
 include/hw/i386/topology.h |   11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 2128f3d6fe..974cc30891 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
         topo_ids.die_id = cpu->die_id;
         topo_ids.core_id = cpu->core_id;
         topo_ids.smt_id = cpu->thread_id;
+        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type) ?
+                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;
         cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info, &topo_ids);
     }
 
diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
index 07239f95f4..ee4deb84c4 100644
--- a/include/hw/i386/topology.h
+++ b/include/hw/i386/topology.h
@@ -140,6 +140,17 @@ static inline unsigned apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)
            apicid_node_width_epyc(topo_info);
 }
 
+static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo *topo_info,
+                                            const X86CPUTopoIDs *topo_ids)
+{
+    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
+    unsigned cores_per_node = DIV_ROUND_UP((topo_info->dies_per_pkg *
+                                            topo_info->cores_per_die *
+                                            topo_info->threads_per_core),
+                                            nr_nodes);
+
+    return (topo_ids->core_id / cores_per_node) % nr_nodes;
+}
 /*
  * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
  *



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] i386: Simplify CPUID_8000_001E for AMD
  2020-06-08 20:18 [PATCH 0/2] Fix couple of issues with AMD topology Babu Moger
  2020-06-08 20:18 ` [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug Babu Moger
@ 2020-06-08 20:18 ` Babu Moger
  1 sibling, 0 replies; 13+ messages in thread
From: Babu Moger @ 2020-06-08 20:18 UTC (permalink / raw)
  To: pbonzini, rth, ehabkost, mst, marcel.apfelbaum; +Cc: qemu-devel

apic_id contains all the information required to build
CPUID_8000_001E. core_id and node_id is already part of
apic_id generated by x86_topo_ids_from_apicid_epyc.
Also remove the restriction on number bits on core_id and
node_id.

Remove all the hardcoded values and replace with generalized
fields.

Refer the Processor Programming Reference (PPR) documentation
available from the bugzilla Link below.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
 target/i386/cpu.c |   78 +++++++++++++++++++++++++----------------------------
 1 file changed, 37 insertions(+), 41 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3733d9a279..166d2f2bde 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -388,57 +388,53 @@ static void encode_topo_cpuid8000001e(X86CPUTopoInfo *topo_info, X86CPU *cpu,
 {
     X86CPUTopoIDs topo_ids = {0};
     unsigned long nodes = MAX(topo_info->nodes_per_pkg, 1);
-    int shift;
 
     x86_topo_ids_from_apicid_epyc(cpu->apic_id, topo_info, &topo_ids);
 
     *eax = cpu->apic_id;
+
     /*
+     * CPUID_Fn8000001E_EBX [Core Identifiers] (CoreId)
+     * Read-only. Reset: 0000_XXXXh.
+     * See Core::X86::Cpuid::ExtApicId.
+     * Core::X86::Cpuid::CoreId_lthree[1:0]_core[3:0]_thread[1:0];
      * CPUID_Fn8000001E_EBX
-     * 31:16 Reserved
-     * 15:8  Threads per core (The number of threads per core is
-     *       Threads per core + 1)
-     *  7:0  Core id (see bit decoding below)
-     *       SMT:
-     *           4:3 node id
-     *             2 Core complex id
-     *           1:0 Core id
-     *       Non SMT:
-     *           5:4 node id
-     *             3 Core complex id
-     *           1:0 Core id
+     * Bits Description
+     * 31:16 Reserved.
+     * 15:8 ThreadsPerCore: threads per core. Read-only. Reset: XXh.
+     *      The number of threads per core is ThreadsPerCore+1.
+     *  7:0 CoreId: core ID. Read-only. Reset: XXh.
+     *
+     *  NOTE: CoreId is already part of apic_id. Just use it. We can
+     *  use all the 8 bits to represent the core_id here.
      */
-    *ebx = ((topo_info->threads_per_core - 1) << 8) | (topo_ids.node_id << 3) |
-            (topo_ids.core_id);
+    *ebx = ((topo_info->threads_per_core - 1) << 8) | (topo_ids.core_id & 0xFF);
+
     /*
+     * CPUID_Fn8000001E_ECX [Node Identifiers] (NodeId)
+     * Read-only. Reset: 0000_0XXXh.
+     * Core::X86::Cpuid::NodeId_lthree[1:0]_core[3:0]_thread[1:0];
      * CPUID_Fn8000001E_ECX
-     * 31:11 Reserved
-     * 10:8  Nodes per processor (Nodes per processor is number of nodes + 1)
-     *  7:0  Node id (see bit decoding below)
-     *         2  Socket id
-     *       1:0  Node id
+     * Bits Description
+     * 31:11 Reserved.
+     * 10:8 NodesPerProcessor: Node per processor. Read-only. Reset: XXXb.
+     *      ValidValues:
+     *      Value Description
+     *      000b  1 node per processor.
+     *      001b  2 nodes per processor.
+     *      010b Reserved.
+     *      011b 4 nodes per processor.
+     *      111b-100b Reserved.
+     *  7:0 NodeId: Node ID. Read-only. Reset: XXh.
+     *
+     * NOTE: Hardware reserves 3 bits for number of nodes per processor.
+     * But users can create more nodes than the actual hardware can
+     * support. To genaralize we can use all the upper 8 bits for nodes.
+     * NodeId is combination of node and socket_id which is already decoded
+     * in apic_id. Just use it by shifting.
      */
-    if (nodes <= 4) {
-        *ecx = ((nodes - 1) << 8) | (topo_ids.pkg_id << 2) | topo_ids.node_id;
-    } else {
-        /*
-         * Node id fix up. Actual hardware supports up to 4 nodes. But with
-         * more than 32 cores, we may end up with more than 4 nodes.
-         * Node id is a combination of socket id and node id. Only requirement
-         * here is that this number should be unique accross the system.
-         * Shift the socket id to accommodate more nodes. We dont expect both
-         * socket id and node id to be big number at the same time. This is not
-         * an ideal config but we need to to support it. Max nodes we can have
-         * is 32 (255/8) with 8 cores per node and 255 max cores. We only need
-         * 5 bits for nodes. Find the left most set bit to represent the total
-         * number of nodes. find_last_bit returns last set bit(0 based). Left
-         * shift(+1) the socket id to represent all the nodes.
-         */
-        nodes -= 1;
-        shift = find_last_bit(&nodes, 8);
-        *ecx = (nodes << 8) | (topo_ids.pkg_id << (shift + 1)) |
-               topo_ids.node_id;
-    }
+    *ecx = ((nodes - 1) << 8) |
+            ((cpu->apic_id >> apicid_node_offset_epyc(topo_info)) & 0xFF);
     *edx = 0;
 }
 



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-08 20:18 ` [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug Babu Moger
@ 2020-06-16 10:59   ` Igor Mammedov
  2020-06-16 17:18     ` Babu Moger
  0 siblings, 1 reply; 13+ messages in thread
From: Igor Mammedov @ 2020-06-16 10:59 UTC (permalink / raw)
  To: Babu Moger; +Cc: ehabkost, mst, qemu-devel, pbonzini, rth

On Mon, 08 Jun 2020 15:18:50 -0500
Babu Moger <babu.moger@amd.com> wrote:

> Noticed the following command failure while testing CPU hotplug.
> 
> $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
>   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
>   cpu,core-id=0,socket-id=1,thread-id=0
> 
>   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-id=1,
>   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
>   with APIC ID 21855, valid index range 0:1
> 
> This happens because APIC ID is calculated using uninitialized memory.
> This is happening after the addition of new field node_id in X86CPUTopoIDs
> structure. The node_id field is uninitialized while calling
> apicid_from_topo_ids. The problem is discussed in the thread below.
> https://lore.kernel.org/qemu-devel/20200602171838.GG577771@habkost.net/
> 
> Fix the problem by initializing the node_id properly.
> 
> Signed-off-by: Babu Moger <babu.moger@amd.com>
> ---
>  hw/i386/pc.c               |    2 ++
>  include/hw/i386/topology.h |   11 +++++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 2128f3d6fe..974cc30891 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
>          topo_ids.die_id = cpu->die_id;
>          topo_ids.core_id = cpu->core_id;
>          topo_ids.smt_id = cpu->thread_id;
> +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type) ?
> +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;

I'd rather not calculate some default value here,
this is the branch where we check user provided topology info and error out asking
to provide missing bits.

I also wonder if we should force user to specify numa nodes on CLI if EPYC cpu is used.
(i.e. I'm assuming that EPYC always requires numa)

>          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info, &topo_ids);
>      }
>  
> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> index 07239f95f4..ee4deb84c4 100644
> --- a/include/hw/i386/topology.h
> +++ b/include/hw/i386/topology.h
> @@ -140,6 +140,17 @@ static inline unsigned apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)
>             apicid_node_width_epyc(topo_info);
>  }
>  
> +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo *topo_info,
> +                                            const X86CPUTopoIDs *topo_ids)
> +{
> +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> +    unsigned cores_per_node = DIV_ROUND_UP((topo_info->dies_per_pkg *
> +                                            topo_info->cores_per_die *
> +                                            topo_info->threads_per_core),
> +                                            nr_nodes);
> +
> +    return (topo_ids->core_id / cores_per_node) % nr_nodes;
what if nr_nodes == 0?

> +}
>  /*
>   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
>   *
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-16 10:59   ` Igor Mammedov
@ 2020-06-16 17:18     ` Babu Moger
  2020-06-24 13:47       ` Igor Mammedov
  0 siblings, 1 reply; 13+ messages in thread
From: Babu Moger @ 2020-06-16 17:18 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: ehabkost, mst, qemu-devel, pbonzini, rth



> -----Original Message-----
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Tuesday, June 16, 2020 5:59 AM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
> mst@redhat.com; marcel.apfelbaum@gmail.com; qemu-devel@nongnu.org
> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> hotplug
> 
> On Mon, 08 Jun 2020 15:18:50 -0500
> Babu Moger <babu.moger@amd.com> wrote:
> 
> > Noticed the following command failure while testing CPU hotplug.
> >
> > $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
> >   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
> >   cpu,core-id=0,socket-id=1,thread-id=0
> >
> >   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-id=1,
> >   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
> >   with APIC ID 21855, valid index range 0:1
> >
> > This happens because APIC ID is calculated using uninitialized memory.
> > This is happening after the addition of new field node_id in X86CPUTopoIDs
> > structure. The node_id field is uninitialized while calling
> > apicid_from_topo_ids. The problem is discussed in the thread below.
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker
> nel.org%2Fqemu-
> devel%2F20200602171838.GG577771%40habkost.net%2F&amp;data=02%7C01
> %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d
> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233&amp
> ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&amp;r
> eserved=0
> >
> > Fix the problem by initializing the node_id properly.
> >
> > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > ---
> >  hw/i386/pc.c               |    2 ++
> >  include/hw/i386/topology.h |   11 +++++++++++
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 2128f3d6fe..974cc30891 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler
> *hotplug_dev,
> >          topo_ids.die_id = cpu->die_id;
> >          topo_ids.core_id = cpu->core_id;
> >          topo_ids.smt_id = cpu->thread_id;
> > +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type)
> ?
> > +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;
> 
> I'd rather not calculate some default value here,
> this is the branch where we check user provided topology info and error out
> asking
> to provide missing bits.
Noticed that cpu->node_id is initialized to 0xFF(NUMA_NODE_UNASSIGNED).
We can initialize cpu->node_id to default node like how we do it in
x86_get_default_cpu_node_id.  We can use it to initialize topo_ids.node_id.
This is consistent with other fields core_id, die_id etc..


> 
> I also wonder if we should force user to specify numa nodes on CLI if EPYC cpu is
> used.
> (i.e. I'm assuming that EPYC always requires numa)

That is not true. Without numa all the cpus will be configured under one
default numa node 0. Like we do it using x86_get_default_cpu_node_id.

> 
> >          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info, &topo_ids);
> >      }
> >
> > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> > index 07239f95f4..ee4deb84c4 100644
> > --- a/include/hw/i386/topology.h
> > +++ b/include/hw/i386/topology.h
> > @@ -140,6 +140,17 @@ static inline unsigned
> apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)
> >             apicid_node_width_epyc(topo_info);
> >  }
> >
> > +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo *topo_info,
> > +                                            const X86CPUTopoIDs *topo_ids)
> > +{
> > +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> > +    unsigned cores_per_node = DIV_ROUND_UP((topo_info->dies_per_pkg *
> > +                                            topo_info->cores_per_die *
> > +                                            topo_info->threads_per_core),
> > +                                            nr_nodes);
> > +
> > +    return (topo_ids->core_id / cores_per_node) % nr_nodes;
> what if nr_nodes == 0?
> 
> > +}
> >  /*
> >   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
> >   *
> >
> >



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-16 17:18     ` Babu Moger
@ 2020-06-24 13:47       ` Igor Mammedov
  2020-06-24 17:35         ` Babu Moger
  0 siblings, 1 reply; 13+ messages in thread
From: Igor Mammedov @ 2020-06-24 13:47 UTC (permalink / raw)
  To: Babu Moger; +Cc: qemu-devel, pbonzini, rth, ehabkost, mst

On Tue, 16 Jun 2020 12:18:56 -0500
Babu Moger <babu.moger@amd.com> wrote:

> > -----Original Message-----
> > From: Igor Mammedov <imammedo@redhat.com>
> > Sent: Tuesday, June 16, 2020 5:59 AM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
> > mst@redhat.com; marcel.apfelbaum@gmail.com; qemu-devel@nongnu.org
> > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> > hotplug
> > 
> > On Mon, 08 Jun 2020 15:18:50 -0500
> > Babu Moger <babu.moger@amd.com> wrote:
> >   
> > > Noticed the following command failure while testing CPU hotplug.
> > >
> > > $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
> > >   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
> > >   cpu,core-id=0,socket-id=1,thread-id=0
> > >
> > >   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-id=1,
> > >   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
> > >   with APIC ID 21855, valid index range 0:1
> > >
> > > This happens because APIC ID is calculated using uninitialized memory.
> > > This is happening after the addition of new field node_id in X86CPUTopoIDs
> > > structure. The node_id field is uninitialized while calling
> > > apicid_from_topo_ids. The problem is discussed in the thread below.
> > >  
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker
> > nel.org%2Fqemu-
> > devel%2F20200602171838.GG577771%40habkost.net%2F&amp;data=02%7C01
> > %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d
> > d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233&amp
> > ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&amp;r
> > eserved=0  
> > >
> > > Fix the problem by initializing the node_id properly.
> > >
> > > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > > ---
> > >  hw/i386/pc.c               |    2 ++
> > >  include/hw/i386/topology.h |   11 +++++++++++
> > >  2 files changed, 13 insertions(+)
> > >
> > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > index 2128f3d6fe..974cc30891 100644
> > > --- a/hw/i386/pc.c
> > > +++ b/hw/i386/pc.c
> > > @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler  
> > *hotplug_dev,  
> > >          topo_ids.die_id = cpu->die_id;
> > >          topo_ids.core_id = cpu->core_id;
> > >          topo_ids.smt_id = cpu->thread_id;
> > > +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms->cpu_type)  
> > ?  
> > > +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;  
> > 
> > I'd rather not calculate some default value here,
> > this is the branch where we check user provided topology info and error out
> > asking
> > to provide missing bits.  
> Noticed that cpu->node_id is initialized to 0xFF(NUMA_NODE_UNASSIGNED).
> We can initialize cpu->node_id to default node like how we do it in
> x86_get_default_cpu_node_id.  We can use it to initialize topo_ids.node_id.
> This is consistent with other fields core_id, die_id etc..
> 
> > 
> > I also wonder if we should force user to specify numa nodes on CLI if EPYC cpu is
> > used.
> > (i.e. I'm assuming that EPYC always requires numa)  
> 
> That is not true. Without numa all the cpus will be configured under one
> default numa node 0. Like we do it using x86_get_default_cpu_node_id.

get_default_cpu_node_id() which is making things up, is going to be removed
eventually in favor of asking user to provide numa mapping explicitly on CLI.

now if it's always only node 0, why do we need to calculate it then,
why not just assing 0 directly?

what if we have several sockets, would all vCPUs still have node-id = 0?

Another question is if we have CPUs with only 1 numa node set on all of them,
does it require all other machinery like ACPI SRAT table defined or it's just
internal CPU impl. detail?


> >   
> > >          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info, &topo_ids);
> > >      }
> > >
> > > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> > > index 07239f95f4..ee4deb84c4 100644
> > > --- a/include/hw/i386/topology.h
> > > +++ b/include/hw/i386/topology.h
> > > @@ -140,6 +140,17 @@ static inline unsigned  
> > apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)  
> > >             apicid_node_width_epyc(topo_info);
> > >  }
> > >
> > > +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo *topo_info,
> > > +                                            const X86CPUTopoIDs *topo_ids)
> > > +{
> > > +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> > > +    unsigned cores_per_node = DIV_ROUND_UP((topo_info->dies_per_pkg *
> > > +                                            topo_info->cores_per_die *
> > > +                                            topo_info->threads_per_core),
> > > +                                            nr_nodes);
> > > +
> > > +    return (topo_ids->core_id / cores_per_node) % nr_nodes;  
> > what if nr_nodes == 0?
> >   
> > > +}
> > >  /*
> > >   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
> > >   *
> > >
> > >  
> 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-24 13:47       ` Igor Mammedov
@ 2020-06-24 17:35         ` Babu Moger
  2020-06-25 15:18           ` Igor Mammedov
  0 siblings, 1 reply; 13+ messages in thread
From: Babu Moger @ 2020-06-24 17:35 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, pbonzini, rth, ehabkost, mst



> -----Original Message-----
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Wednesday, June 24, 2020 8:48 AM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> pbonzini@redhat.com; rth@twiddle.net
> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> hotplug
> 
> On Tue, 16 Jun 2020 12:18:56 -0500
> Babu Moger <babu.moger@amd.com> wrote:
> 
> > > -----Original Message-----
> > > From: Igor Mammedov <imammedo@redhat.com>
> > > Sent: Tuesday, June 16, 2020 5:59 AM
> > > To: Moger, Babu <Babu.Moger@amd.com>
> > > Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
> > > mst@redhat.com; marcel.apfelbaum@gmail.com; qemu-devel@nongnu.org
> > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and
> CPU
> > > hotplug
> > >
> > > On Mon, 08 Jun 2020 15:18:50 -0500
> > > Babu Moger <babu.moger@amd.com> wrote:
> > >
> > > > Noticed the following command failure while testing CPU hotplug.
> > > >
> > > > $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
> > > >   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
> > > >   cpu,core-id=0,socket-id=1,thread-id=0
> > > >
> > > >   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-id=1,
> > > >   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
> > > >   with APIC ID 21855, valid index range 0:1
> > > >
> > > > This happens because APIC ID is calculated using uninitialized memory.
> > > > This is happening after the addition of new field node_id in
> X86CPUTopoIDs
> > > > structure. The node_id field is uninitialized while calling
> > > > apicid_from_topo_ids. The problem is discussed in the thread below.
> > > >
> > >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker
> > > nel.org%2Fqemu-
> > >
> devel%2F20200602171838.GG577771%40habkost.net%2F&amp;data=02%7C01
> > >
> %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d
> > >
> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233&amp
> > >
> ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&amp;r
> > > eserved=0
> > > >
> > > > Fix the problem by initializing the node_id properly.
> > > >
> > > > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > > > ---
> > > >  hw/i386/pc.c               |    2 ++
> > > >  include/hw/i386/topology.h |   11 +++++++++++
> > > >  2 files changed, 13 insertions(+)
> > > >
> > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > > index 2128f3d6fe..974cc30891 100644
> > > > --- a/hw/i386/pc.c
> > > > +++ b/hw/i386/pc.c
> > > > @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler
> > > *hotplug_dev,
> > > >          topo_ids.die_id = cpu->die_id;
> > > >          topo_ids.core_id = cpu->core_id;
> > > >          topo_ids.smt_id = cpu->thread_id;
> > > > +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms-
> >cpu_type)
> > > ?
> > > > +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;
> > >
> > > I'd rather not calculate some default value here,
> > > this is the branch where we check user provided topology info and error out
> > > asking
> > > to provide missing bits.
> > Noticed that cpu->node_id is initialized to 0xFF(NUMA_NODE_UNASSIGNED).
> > We can initialize cpu->node_id to default node like how we do it in
> > x86_get_default_cpu_node_id.  We can use it to initialize topo_ids.node_id.
> > This is consistent with other fields core_id, die_id etc..
> >
> > >
> > > I also wonder if we should force user to specify numa nodes on CLI if EPYC
> cpu is
> > > used.
> > > (i.e. I'm assuming that EPYC always requires numa)
> >
> > That is not true. Without numa all the cpus will be configured under one
> > default numa node 0. Like we do it using x86_get_default_cpu_node_id.
> 
> get_default_cpu_node_id() which is making things up, is going to be removed
> eventually in favor of asking user to provide numa mapping explicitly on CLI.

That will be good going forward.

> 
> now if it's always only node 0, why do we need to calculate it then,
> why not just assing 0 directly?
> 
> what if we have several sockets, would all vCPUs still have node-id = 0?

If there are several nodes then socket id becomes node id.

> 
> Another question is if we have CPUs with only 1 numa node set on all of them,
> does it require all other machinery like ACPI SRAT table defined or it's just
> internal CPU impl. detail?

I am not very sure about it. To me it looks like it is just internal cpu
implementation detail.

I think we have two options here.

1. Update the patch to initialize the node_id the way it is done
get_default_cpu_node_id.
2. Ask the user to pass the node_id while hot plugging the device. This is
a clean solution. Will require some data structure changes.

Let me know if you see any other option.

> 
> 
> > >
> > > >          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info,
> &topo_ids);
> > > >      }
> > > >
> > > > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> > > > index 07239f95f4..ee4deb84c4 100644
> > > > --- a/include/hw/i386/topology.h
> > > > +++ b/include/hw/i386/topology.h
> > > > @@ -140,6 +140,17 @@ static inline unsigned
> > > apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)
> > > >             apicid_node_width_epyc(topo_info);
> > > >  }
> > > >
> > > > +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo
> *topo_info,
> > > > +                                            const X86CPUTopoIDs *topo_ids)
> > > > +{
> > > > +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> > > > +    unsigned cores_per_node = DIV_ROUND_UP((topo_info->dies_per_pkg
> *
> > > > +                                            topo_info->cores_per_die *
> > > > +                                            topo_info->threads_per_core),
> > > > +                                            nr_nodes);
> > > > +
> > > > +    return (topo_ids->core_id / cores_per_node) % nr_nodes;
> > > what if nr_nodes == 0?
> > >
> > > > +}
> > > >  /*
> > > >   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
> > > >   *
> > > >
> > > >
> >
> >



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-24 17:35         ` Babu Moger
@ 2020-06-25 15:18           ` Igor Mammedov
  2020-06-25 16:41             ` Babu Moger
  0 siblings, 1 reply; 13+ messages in thread
From: Igor Mammedov @ 2020-06-25 15:18 UTC (permalink / raw)
  To: Babu Moger; +Cc: qemu-devel, pbonzini, rth, ehabkost, mst

On Wed, 24 Jun 2020 12:35:59 -0500
Babu Moger <babu.moger@amd.com> wrote:

> > -----Original Message-----
> > From: Igor Mammedov <imammedo@redhat.com>
> > Sent: Wednesday, June 24, 2020 8:48 AM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> > pbonzini@redhat.com; rth@twiddle.net
> > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> > hotplug
> > 
> > On Tue, 16 Jun 2020 12:18:56 -0500
> > Babu Moger <babu.moger@amd.com> wrote:
> >   
> > > > -----Original Message-----
> > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > Sent: Tuesday, June 16, 2020 5:59 AM
> > > > To: Moger, Babu <Babu.Moger@amd.com>
> > > > Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
> > > > mst@redhat.com; marcel.apfelbaum@gmail.com; qemu-devel@nongnu.org
> > > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and  
> > CPU  
> > > > hotplug
> > > >
> > > > On Mon, 08 Jun 2020 15:18:50 -0500
> > > > Babu Moger <babu.moger@amd.com> wrote:
> > > >  
> > > > > Noticed the following command failure while testing CPU hotplug.
> > > > >
> > > > > $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
> > > > >   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
> > > > >   cpu,core-id=0,socket-id=1,thread-id=0
> > > > >
> > > > >   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-id=1,
> > > > >   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
> > > > >   with APIC ID 21855, valid index range 0:1
> > > > >
> > > > > This happens because APIC ID is calculated using uninitialized memory.
> > > > > This is happening after the addition of new field node_id in  
> > X86CPUTopoIDs  
> > > > > structure. The node_id field is uninitialized while calling
> > > > > apicid_from_topo_ids. The problem is discussed in the thread below.
> > > > >  
> > > >  
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker  
> > > > nel.org%2Fqemu-
> > > >  
> > devel%2F20200602171838.GG577771%40habkost.net%2F&amp;data=02%7C01  
> > > >  
> > %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d  
> > > >  
> > d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233&amp  
> > > >  
> > ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&amp;r  
> > > > eserved=0  
> > > > >
> > > > > Fix the problem by initializing the node_id properly.
> > > > >
> > > > > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > > > > ---
> > > > >  hw/i386/pc.c               |    2 ++
> > > > >  include/hw/i386/topology.h |   11 +++++++++++
> > > > >  2 files changed, 13 insertions(+)
> > > > >
> > > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > > > index 2128f3d6fe..974cc30891 100644
> > > > > --- a/hw/i386/pc.c
> > > > > +++ b/hw/i386/pc.c
> > > > > @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler  
> > > > *hotplug_dev,  
> > > > >          topo_ids.die_id = cpu->die_id;
> > > > >          topo_ids.core_id = cpu->core_id;
> > > > >          topo_ids.smt_id = cpu->thread_id;
> > > > > +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms-  
> > >cpu_type)  
> > > > ?  
> > > > > +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;  
> > > >
> > > > I'd rather not calculate some default value here,
> > > > this is the branch where we check user provided topology info and error out
> > > > asking
> > > > to provide missing bits.  
> > > Noticed that cpu->node_id is initialized to 0xFF(NUMA_NODE_UNASSIGNED).
> > > We can initialize cpu->node_id to default node like how we do it in
> > > x86_get_default_cpu_node_id.  We can use it to initialize topo_ids.node_id.
> > > This is consistent with other fields core_id, die_id etc..
> > >  
> > > >
> > > > I also wonder if we should force user to specify numa nodes on CLI if EPYC  
> > cpu is  
> > > > used.
> > > > (i.e. I'm assuming that EPYC always requires numa)  
> > >
> > > That is not true. Without numa all the cpus will be configured under one
> > > default numa node 0. Like we do it using x86_get_default_cpu_node_id.  
> > 
> > get_default_cpu_node_id() which is making things up, is going to be removed
> > eventually in favor of asking user to provide numa mapping explicitly on CLI.  
> 
> That will be good going forward.
> 
> > 
> > now if it's always only node 0, why do we need to calculate it then,
> > why not just assing 0 directly?
> > 
> > what if we have several sockets, would all vCPUs still have node-id = 0?  
> 
> If there are several nodes then socket id becomes node id.
I wonder if node id == socket id then why bother with node_id at all,
probably node id is there to allow for design where several sockets are on
the same node


> > Another question is if we have CPUs with only 1 numa node set on all of them,
> > does it require all other machinery like ACPI SRAT table defined or it's just
> > internal CPU impl. detail?  
> 
> I am not very sure about it. To me it looks like it is just internal cpu
> implementation detail.
I'd think it might confuse guest OS, when it decodes more than 1 numa node
for APIC ID/CPUID but then there are no such nodes described in ACPI.
While it might work for caches, it would miss any relation of memory mapping
to nodes or get it wrong if one doesn't match another.


> I think we have two options here.
> 
> 1. Update the patch to initialize the node_id the way it is done
> get_default_cpu_node_id.

if it were only one node for every CPU (incl. multisocket), I'd go with enabling
autonuma assigning all CPUs to default 0 node-id, since there is no ambiguity where
CPUs and RAM are mapped to.
Is it possible to use node-id=0 for all EPYC CPUs even in multisocket config?
(it seems spec allows only one node per socket, but doesn't say that node ids must
be different.)
If not, then making up node-id is not an option. 

> 2. Ask the user to pass the node_id while hot plugging the device. This is
> a clean solution. Will require some data structure changes.

Here is my brain dump of current very non obvious flow:

  1. x86_possible_cpu_arch_ids()
         ms->possible_cpus->cpus[i].props.* = x86_topo_ids_from_idx()

  2. possible numa_cpu_set()
         ms->possible_cpus->cpus[i].props.node_id = user input|0 -in autonuma case

  3. x86_cpus_init()
         // generate apic_id AND makeup node_id embedded into it
         ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(x86ms, i);
                        -> x86_apicid_from_cpu_idx_epyc() -> x86_topo_ids_from_idx_epyc()
                                                                 same as x86_topo_ids_from_idx() + node_id
                     or
                        -> x86_apicid_from_cpu_idx() -> x86_topo_ids_from_idx() 

  4. pc_cpu_pre_plug()
         // basically topo ids module node-id is not set or user provided
         cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx);

  5.
         // do it again with diff that in EPYC case it my have different node-id than cpu_slot
         x86ms->topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);

         //i.e. user input of node-id is ignored
         set socket-id/core-id/... (but not node-id) from topo_info

         numa_cpu_pre_plug(cpu_slot)
                           ^^^^^^
              if (node_id == CPU_UNSET_NUMA_NODE_ID) {
                   if (slot->props.has_node_id)
                       object_property_set_int(... slot->props.node_id, "node-id",...);
              // this applies to hotplugged without node-id and to initial CPUs (-smp X)
              // so we may end up with "node-id" being set to user defined value
              // or left unset (no numa enabled)
              // while APIC ID will have some node-id encoded in it.


that's quite a mess, maybe we should unify both
amd make x86_apicid_from_cpu_idx_epyc()/x86_apicid_from_cpu_idx() use
ms->possible_cpus->cpus[i].props instead of x86_topo_ids_from_idx()
i.e

       x86_apicid_from_cpu_idx_epyc() {
           topoids = x86_apicid_from_cpu_idx() {
                          return ms->possible_cpus->cpus[i].props
                      }
           if (ms->possible_cpus->cpus[i].props.has_node_id)
               topoids.node_id = ms->possible_cpus->cpus[i].props.node_id
           else
               error_fatal("EPYC requires use of -numa to define topology if using more than 1 socket")
       }

that way QEMU makes up only node[0] by enabling autonuma or whatever
user privided explicitly is encoded into APIC ID and it will be always consistent with cpu
*-id properties in possible_cpus and SRAT table QEMU generates.

as cleanup we can get rid of back and forth conversion [5] and use cpu_slot to set
the same ids.

Also maybe we should have a check that node-id is the same within socket in case of EPYC
if it's guarantied that EPYC won't support multiple nodes per socket.

hope it makes at least some sense.


> Let me know if you see any other option.
> 
> > 
> >   
> > > >  
> > > > >          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info,  
> > &topo_ids);  
> > > > >      }
> > > > >
> > > > > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> > > > > index 07239f95f4..ee4deb84c4 100644
> > > > > --- a/include/hw/i386/topology.h
> > > > > +++ b/include/hw/i386/topology.h
> > > > > @@ -140,6 +140,17 @@ static inline unsigned  
> > > > apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)  
> > > > >             apicid_node_width_epyc(topo_info);
> > > > >  }
> > > > >
> > > > > +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo  
> > *topo_info,  
> > > > > +                                            const X86CPUTopoIDs *topo_ids)
> > > > > +{
> > > > > +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> > > > > +    unsigned cores_per_node = DIV_ROUND_UP((topo_info->dies_per_pkg  
> > *  
> > > > > +                                            topo_info->cores_per_die *
> > > > > +                                            topo_info->threads_per_core),
> > > > > +                                            nr_nodes);
> > > > > +
> > > > > +    return (topo_ids->core_id / cores_per_node) % nr_nodes;  
> > > > what if nr_nodes == 0?
> > > >  
> > > > > +}
> > > > >  /*
> > > > >   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
> > > > >   *
> > > > >
> > > > >  
> > >
> > >  
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-25 15:18           ` Igor Mammedov
@ 2020-06-25 16:41             ` Babu Moger
  2020-06-25 18:32               ` Igor Mammedov
  0 siblings, 1 reply; 13+ messages in thread
From: Babu Moger @ 2020-06-25 16:41 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, pbonzini, rth, ehabkost, mst

Igor,

> -----Original Message-----
> From: Igor Mammedov <imammedo@redhat.com>
> Sent: Thursday, June 25, 2020 10:19 AM
> To: Moger, Babu <Babu.Moger@amd.com>
> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> pbonzini@redhat.com; rth@twiddle.net
> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> hotplug
> 
> On Wed, 24 Jun 2020 12:35:59 -0500
> Babu Moger <babu.moger@amd.com> wrote:
> 
> > > -----Original Message-----
> > > From: Igor Mammedov <imammedo@redhat.com>
> > > Sent: Wednesday, June 24, 2020 8:48 AM
> > > To: Moger, Babu <Babu.Moger@amd.com>
> > > Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> > > pbonzini@redhat.com; rth@twiddle.net
> > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and
> CPU
> > > hotplug
> > >
> > > On Tue, 16 Jun 2020 12:18:56 -0500
> > > Babu Moger <babu.moger@amd.com> wrote:
> > >
> > > > > -----Original Message-----
> > > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > > Sent: Tuesday, June 16, 2020 5:59 AM
> > > > > To: Moger, Babu <Babu.Moger@amd.com>
> > > > > Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
> > > > > mst@redhat.com; marcel.apfelbaum@gmail.com; qemu-
> devel@nongnu.org
> > > > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device
> and
> > > CPU
> > > > > hotplug
> > > > >
> > > > > On Mon, 08 Jun 2020 15:18:50 -0500
> > > > > Babu Moger <babu.moger@amd.com> wrote:
> > > > >
> > > > > > Noticed the following command failure while testing CPU hotplug.
> > > > > >
> > > > > > $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
> > > > > >   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
> > > > > >   cpu,core-id=0,socket-id=1,thread-id=0
> > > > > >
> > > > > >   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-
> id=1,
> > > > > >   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
> > > > > >   with APIC ID 21855, valid index range 0:1
> > > > > >
> > > > > > This happens because APIC ID is calculated using uninitialized memory.
> > > > > > This is happening after the addition of new field node_id in
> > > X86CPUTopoIDs
> > > > > > structure. The node_id field is uninitialized while calling
> > > > > > apicid_from_topo_ids. The problem is discussed in the thread below.
> > > > > >
> > > > >
> > >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker
> > > > > nel.org%2Fqemu-
> > > > >
> > >
> devel%2F20200602171838.GG577771%40habkost.net%2F&amp;data=02%7C01
> > > > >
> > >
> %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d
> > > > >
> > >
> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233&amp
> > > > >
> > >
> ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&amp;r
> > > > > eserved=0
> > > > > >
> > > > > > Fix the problem by initializing the node_id properly.
> > > > > >
> > > > > > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > > > > > ---
> > > > > >  hw/i386/pc.c               |    2 ++
> > > > > >  include/hw/i386/topology.h |   11 +++++++++++
> > > > > >  2 files changed, 13 insertions(+)
> > > > > >
> > > > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > > > > index 2128f3d6fe..974cc30891 100644
> > > > > > --- a/hw/i386/pc.c
> > > > > > +++ b/hw/i386/pc.c
> > > > > > @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler
> > > > > *hotplug_dev,
> > > > > >          topo_ids.die_id = cpu->die_id;
> > > > > >          topo_ids.core_id = cpu->core_id;
> > > > > >          topo_ids.smt_id = cpu->thread_id;
> > > > > > +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms-
> > > >cpu_type)
> > > > > ?
> > > > > > +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;
> > > > >
> > > > > I'd rather not calculate some default value here,
> > > > > this is the branch where we check user provided topology info and error
> out
> > > > > asking
> > > > > to provide missing bits.
> > > > Noticed that cpu->node_id is initialized to
> 0xFF(NUMA_NODE_UNASSIGNED).
> > > > We can initialize cpu->node_id to default node like how we do it in
> > > > x86_get_default_cpu_node_id.  We can use it to initialize
> topo_ids.node_id.
> > > > This is consistent with other fields core_id, die_id etc..
> > > >
> > > > >
> > > > > I also wonder if we should force user to specify numa nodes on CLI if
> EPYC
> > > cpu is
> > > > > used.
> > > > > (i.e. I'm assuming that EPYC always requires numa)
> > > >
> > > > That is not true. Without numa all the cpus will be configured under one
> > > > default numa node 0. Like we do it using x86_get_default_cpu_node_id.
> > >
> > > get_default_cpu_node_id() which is making things up, is going to be
> removed
> > > eventually in favor of asking user to provide numa mapping explicitly on CLI.
> >
> > That will be good going forward.
> >
> > >
> > > now if it's always only node 0, why do we need to calculate it then,
> > > why not just assing 0 directly?
> > >
> > > what if we have several sockets, would all vCPUs still have node-id = 0?
> >
> > If there are several nodes then socket id becomes node id.
> I wonder if node id == socket id then why bother with node_id at all,
> probably node id is there to allow for design where several sockets are on
> the same node
> 
> 
> > > Another question is if we have CPUs with only 1 numa node set on all of
> them,
> > > does it require all other machinery like ACPI SRAT table defined or it's just
> > > internal CPU impl. detail?
> >
> > I am not very sure about it. To me it looks like it is just internal cpu
> > implementation detail.
> I'd think it might confuse guest OS, when it decodes more than 1 numa node
> for APIC ID/CPUID but then there are no such nodes described in ACPI.
> While it might work for caches, it would miss any relation of memory mapping
> to nodes or get it wrong if one doesn't match another.
> 
> 
> > I think we have two options here.
> >
> > 1. Update the patch to initialize the node_id the way it is done
> > get_default_cpu_node_id.
> 
> if it were only one node for every CPU (incl. multisocket), I'd go with enabling
> autonuma assigning all CPUs to default 0 node-id, since there is no ambiguity
> where
> CPUs and RAM are mapped to.
> Is it possible to use node-id=0 for all EPYC CPUs even in multisocket config?
> (it seems spec allows only one node per socket, but doesn't say that node ids
> must
> be different.)
> If not, then making up node-id is not an option.
> 
> > 2. Ask the user to pass the node_id while hot plugging the device. This is
> > a clean solution. Will require some data structure changes.
> 
> Here is my brain dump of current very non obvious flow:
> 
>   1. x86_possible_cpu_arch_ids()
>          ms->possible_cpus->cpus[i].props.* = x86_topo_ids_from_idx()
> 
>   2. possible numa_cpu_set()
>          ms->possible_cpus->cpus[i].props.node_id = user input|0 -in autonuma
> case
> 
>   3. x86_cpus_init()
>          // generate apic_id AND makeup node_id embedded into it
>          ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(x86ms,
> i);
>                         -> x86_apicid_from_cpu_idx_epyc() ->
> x86_topo_ids_from_idx_epyc()
>                                                                  same as x86_topo_ids_from_idx() + node_id
>                      or
>                         -> x86_apicid_from_cpu_idx() -> x86_topo_ids_from_idx()
> 
>   4. pc_cpu_pre_plug()
>          // basically topo ids module node-id is not set or user provided
>          cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx);
> 
>   5.
>          // do it again with diff that in EPYC case it my have different node-id than
> cpu_slot
>          x86ms->topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
> 
>          //i.e. user input of node-id is ignored
>          set socket-id/core-id/... (but not node-id) from topo_info
> 
>          numa_cpu_pre_plug(cpu_slot)
>                            ^^^^^^
>               if (node_id == CPU_UNSET_NUMA_NODE_ID) {
>                    if (slot->props.has_node_id)
>                        object_property_set_int(... slot->props.node_id, "node-id",...);
>               // this applies to hotplugged without node-id and to initial CPUs (-smp X)
>               // so we may end up with "node-id" being set to user defined value
>               // or left unset (no numa enabled)
>               // while APIC ID will have some node-id encoded in it.
> 
> 
> that's quite a mess, maybe we should unify both
> amd make x86_apicid_from_cpu_idx_epyc()/x86_apicid_from_cpu_idx() use
> ms->possible_cpus->cpus[i].props instead of x86_topo_ids_from_idx()
> i.e
> 
>        x86_apicid_from_cpu_idx_epyc() {
>            topoids = x86_apicid_from_cpu_idx() {
>                           return ms->possible_cpus->cpus[i].props
>                       }
>            if (ms->possible_cpus->cpus[i].props.has_node_id)
>                topoids.node_id = ms->possible_cpus->cpus[i].props.node_id
>            else
>                error_fatal("EPYC requires use of -numa to define topology if using
> more than 1 socket")
>        }
> 
> that way QEMU makes up only node[0] by enabling autonuma or whatever
> user privided explicitly is encoded into APIC ID and it will be always consistent
> with cpu
> *-id properties in possible_cpus and SRAT table QEMU generates.
> 
> as cleanup we can get rid of back and forth conversion [5] and use cpu_slot to
> set
> the same ids.
> 
> Also maybe we should have a check that node-id is the same within socket in
> case of EPYC
> if it's guarantied that EPYC won't support multiple nodes per socket.
> 
> hope it makes at least some sense.

To make things clear, in case of autonuma we don't have to worry about
node_id. We just have to set it topo_ids.node_id to 0 in pc_cpu_pre_plug,
Everything will work as expected. This will solve our current problem of
uninitialized variable.

Problem here is, when user has configured the numa, then setting the
topo_ids.node_id to 0 might not work because it might create duplicate
apicids and device_add will be rejected.  As per the comments in
numa_cpu_pre_plug, this is already broken. Look at the comments below.
Looks like node_id cannot be passed down.
============================================
if (node_id == CPU_UNSET_NUMA_NODE_ID) {
        /* due to bug in libvirt, it doesn't pass node-id from props on
         * device_add as expected, so we have to fix it up here */
        if (slot->props.has_node_id) {
            object_property_set_int(OBJECT(dev), slot->props.node_id,
                                    "node-id", errp);
        }
    } else if (node_id != slot->props.node_id) {
============================================

I was trying to solve this problem setting the node_id correctly for EPYC
at least.  If you think, this is not important we can ignore it (by
setting topo_ids.node_id to 0) and move forward.  I don't see the need for
changing other topology specific code as we have already made very generic.

> 
> 
> > Let me know if you see any other option.
> >
> > >
> > >
> > > > >
> > > > > >          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info,
> > > &topo_ids);
> > > > > >      }
> > > > > >
> > > > > > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> > > > > > index 07239f95f4..ee4deb84c4 100644
> > > > > > --- a/include/hw/i386/topology.h
> > > > > > +++ b/include/hw/i386/topology.h
> > > > > > @@ -140,6 +140,17 @@ static inline unsigned
> > > > > apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)
> > > > > >             apicid_node_width_epyc(topo_info);
> > > > > >  }
> > > > > >
> > > > > > +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo
> > > *topo_info,
> > > > > > +                                            const X86CPUTopoIDs *topo_ids)
> > > > > > +{
> > > > > > +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> > > > > > +    unsigned cores_per_node = DIV_ROUND_UP((topo_info-
> >dies_per_pkg
> > > *
> > > > > > +                                            topo_info->cores_per_die *
> > > > > > +                                            topo_info->threads_per_core),
> > > > > > +                                            nr_nodes);
> > > > > > +
> > > > > > +    return (topo_ids->core_id / cores_per_node) % nr_nodes;
> > > > > what if nr_nodes == 0?
> > > > >
> > > > > > +}
> > > > > >  /*
> > > > > >   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
> > > > > >   *
> > > > > >
> > > > > >
> > > >
> > > >
> >



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-25 16:41             ` Babu Moger
@ 2020-06-25 18:32               ` Igor Mammedov
  2020-06-25 22:55                 ` Babu Moger
  0 siblings, 1 reply; 13+ messages in thread
From: Igor Mammedov @ 2020-06-25 18:32 UTC (permalink / raw)
  To: Babu Moger; +Cc: qemu-devel, pbonzini, rth, ehabkost, mst

On Thu, 25 Jun 2020 11:41:25 -0500
Babu Moger <babu.moger@amd.com> wrote:

> Igor,
> 
> > -----Original Message-----
> > From: Igor Mammedov <imammedo@redhat.com>
> > Sent: Thursday, June 25, 2020 10:19 AM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> > pbonzini@redhat.com; rth@twiddle.net
> > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> > hotplug
> > 
> > On Wed, 24 Jun 2020 12:35:59 -0500
> > Babu Moger <babu.moger@amd.com> wrote:
> >   
> > > > -----Original Message-----
> > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > Sent: Wednesday, June 24, 2020 8:48 AM
> > > > To: Moger, Babu <Babu.Moger@amd.com>
> > > > Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> > > > pbonzini@redhat.com; rth@twiddle.net
> > > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and  
> > CPU  
> > > > hotplug
> > > >
> > > > On Tue, 16 Jun 2020 12:18:56 -0500
> > > > Babu Moger <babu.moger@amd.com> wrote:
> > > >  
> > > > > > -----Original Message-----
> > > > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > > > Sent: Tuesday, June 16, 2020 5:59 AM
> > > > > > To: Moger, Babu <Babu.Moger@amd.com>
> > > > > > Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
> > > > > > mst@redhat.com; marcel.apfelbaum@gmail.com; qemu-  
> > devel@nongnu.org  
> > > > > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device  
> > and  
> > > > CPU  
> > > > > > hotplug
> > > > > >
> > > > > > On Mon, 08 Jun 2020 15:18:50 -0500
> > > > > > Babu Moger <babu.moger@amd.com> wrote:
> > > > > >  
> > > > > > > Noticed the following command failure while testing CPU hotplug.
> > > > > > >
> > > > > > > $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
> > > > > > >   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
> > > > > > >   cpu,core-id=0,socket-id=1,thread-id=0
> > > > > > >
> > > > > > >   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-  
> > id=1,  
> > > > > > >   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
> > > > > > >   with APIC ID 21855, valid index range 0:1
> > > > > > >
> > > > > > > This happens because APIC ID is calculated using uninitialized memory.
> > > > > > > This is happening after the addition of new field node_id in  
> > > > X86CPUTopoIDs  
> > > > > > > structure. The node_id field is uninitialized while calling
> > > > > > > apicid_from_topo_ids. The problem is discussed in the thread below.
> > > > > > >  
> > > > > >  
> > > >  
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker  
> > > > > > nel.org%2Fqemu-
> > > > > >  
> > > >  
> > devel%2F20200602171838.GG577771%40habkost.net%2F&amp;data=02%7C01  
> > > > > >  
> > > >  
> > %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d  
> > > > > >  
> > > >  
> > d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233&amp  
> > > > > >  
> > > >  
> > ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&amp;r  
> > > > > > eserved=0  
> > > > > > >
> > > > > > > Fix the problem by initializing the node_id properly.
> > > > > > >
> > > > > > > Signed-off-by: Babu Moger <babu.moger@amd.com>
> > > > > > > ---
> > > > > > >  hw/i386/pc.c               |    2 ++
> > > > > > >  include/hw/i386/topology.h |   11 +++++++++++
> > > > > > >  2 files changed, 13 insertions(+)
> > > > > > >
> > > > > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > > > > > index 2128f3d6fe..974cc30891 100644
> > > > > > > --- a/hw/i386/pc.c
> > > > > > > +++ b/hw/i386/pc.c
> > > > > > > @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler  
> > > > > > *hotplug_dev,  
> > > > > > >          topo_ids.die_id = cpu->die_id;
> > > > > > >          topo_ids.core_id = cpu->core_id;
> > > > > > >          topo_ids.smt_id = cpu->thread_id;
> > > > > > > +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms-  
> > > > >cpu_type)  
> > > > > > ?  
> > > > > > > +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;  
> > > > > >
> > > > > > I'd rather not calculate some default value here,
> > > > > > this is the branch where we check user provided topology info and error  
> > out  
> > > > > > asking
> > > > > > to provide missing bits.  
> > > > > Noticed that cpu->node_id is initialized to  
> > 0xFF(NUMA_NODE_UNASSIGNED).  
> > > > > We can initialize cpu->node_id to default node like how we do it in
> > > > > x86_get_default_cpu_node_id.  We can use it to initialize  
> > topo_ids.node_id.  
> > > > > This is consistent with other fields core_id, die_id etc..
> > > > >  
> > > > > >
> > > > > > I also wonder if we should force user to specify numa nodes on CLI if  
> > EPYC  
> > > > cpu is  
> > > > > > used.
> > > > > > (i.e. I'm assuming that EPYC always requires numa)  
> > > > >
> > > > > That is not true. Without numa all the cpus will be configured under one
> > > > > default numa node 0. Like we do it using x86_get_default_cpu_node_id.  
> > > >
> > > > get_default_cpu_node_id() which is making things up, is going to be  
> > removed  
> > > > eventually in favor of asking user to provide numa mapping explicitly on CLI.  
> > >
> > > That will be good going forward.
> > >  
> > > >
> > > > now if it's always only node 0, why do we need to calculate it then,
> > > > why not just assing 0 directly?
> > > >
> > > > what if we have several sockets, would all vCPUs still have node-id = 0?  
> > >
> > > If there are several nodes then socket id becomes node id.  
> > I wonder if node id == socket id then why bother with node_id at all,
> > probably node id is there to allow for design where several sockets are on
> > the same node
> > 
> >   
> > > > Another question is if we have CPUs with only 1 numa node set on all of  
> > them,  
> > > > does it require all other machinery like ACPI SRAT table defined or it's just
> > > > internal CPU impl. detail?  
> > >
> > > I am not very sure about it. To me it looks like it is just internal cpu
> > > implementation detail.  
> > I'd think it might confuse guest OS, when it decodes more than 1 numa node
> > for APIC ID/CPUID but then there are no such nodes described in ACPI.
> > While it might work for caches, it would miss any relation of memory mapping
> > to nodes or get it wrong if one doesn't match another.
> > 
> >   
> > > I think we have two options here.
> > >
> > > 1. Update the patch to initialize the node_id the way it is done
> > > get_default_cpu_node_id.  
> > 
> > if it were only one node for every CPU (incl. multisocket), I'd go with enabling
> > autonuma assigning all CPUs to default 0 node-id, since there is no ambiguity
> > where
> > CPUs and RAM are mapped to.
> > Is it possible to use node-id=0 for all EPYC CPUs even in multisocket config?
> > (it seems spec allows only one node per socket, but doesn't say that node ids
> > must
> > be different.)
> > If not, then making up node-id is not an option.
> >   
> > > 2. Ask the user to pass the node_id while hot plugging the device. This is
> > > a clean solution. Will require some data structure changes.  
> > 
> > Here is my brain dump of current very non obvious flow:
> > 
> >   1. x86_possible_cpu_arch_ids()
> >          ms->possible_cpus->cpus[i].props.* = x86_topo_ids_from_idx()
> > 
> >   2. possible numa_cpu_set()
> >          ms->possible_cpus->cpus[i].props.node_id = user input|0 -in autonuma
> > case
> > 
> >   3. x86_cpus_init()
> >          // generate apic_id AND makeup node_id embedded into it
> >          ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(x86ms,
> > i);  
> >                         -> x86_apicid_from_cpu_idx_epyc() ->  
> > x86_topo_ids_from_idx_epyc()
> >                                                                  same as x86_topo_ids_from_idx() + node_id
> >                      or  
> >                         -> x86_apicid_from_cpu_idx() -> x86_topo_ids_from_idx()  
> > 
> >   4. pc_cpu_pre_plug()
> >          // basically topo ids module node-id is not set or user provided
> >          cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx);
> > 
> >   5.
> >          // do it again with diff that in EPYC case it my have different node-id than
> > cpu_slot
> >          x86ms->topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
> > 
> >          //i.e. user input of node-id is ignored
> >          set socket-id/core-id/... (but not node-id) from topo_info
> > 
> >          numa_cpu_pre_plug(cpu_slot)
> >                            ^^^^^^
> >               if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> >                    if (slot->props.has_node_id)
> >                        object_property_set_int(... slot->props.node_id, "node-id",...);
> >               // this applies to hotplugged without node-id and to initial CPUs (-smp X)
> >               // so we may end up with "node-id" being set to user defined value
> >               // or left unset (no numa enabled)
> >               // while APIC ID will have some node-id encoded in it.
> > 
> > 
> > that's quite a mess, maybe we should unify both
> > amd make x86_apicid_from_cpu_idx_epyc()/x86_apicid_from_cpu_idx() use
> > ms->possible_cpus->cpus[i].props instead of x86_topo_ids_from_idx()
> > i.e
> > 
> >        x86_apicid_from_cpu_idx_epyc() {
> >            topoids = x86_apicid_from_cpu_idx() {
> >                           return ms->possible_cpus->cpus[i].props
> >                       }
> >            if (ms->possible_cpus->cpus[i].props.has_node_id)
> >                topoids.node_id = ms->possible_cpus->cpus[i].props.node_id
> >            else
> >                error_fatal("EPYC requires use of -numa to define topology if using
> > more than 1 socket")
> >        }
> > 
> > that way QEMU makes up only node[0] by enabling autonuma or whatever
> > user privided explicitly is encoded into APIC ID and it will be always consistent
> > with cpu
> > *-id properties in possible_cpus and SRAT table QEMU generates.
> > 
> > as cleanup we can get rid of back and forth conversion [5] and use cpu_slot to
> > set
> > the same ids.
> > 
> > Also maybe we should have a check that node-id is the same within socket in
> > case of EPYC
> > if it's guarantied that EPYC won't support multiple nodes per socket.
> > 
> > hope it makes at least some sense.  
> 
> To make things clear, in case of autonuma we don't have to worry about
> node_id. We just have to set it topo_ids.node_id to 0 in pc_cpu_pre_plug,
> Everything will work as expected. This will solve our current problem of
> uninitialized variable.

I'm proposing to enable autonuma, which in its turn will assign all CPUs to
node-id=0 in possible_cpus. And once this information is in possible_cpus,
numa_cpu_pre_plug() should take care of setting correct node-id on CPU for
the case of initial CPUs (node_id == CPU_UNSET_NUMA_NODE_ID), and in case
of hotplug numa_cpu_pre_plug() will complaing if user suppled nonsense on
with device_add.

> Problem here is, when user has configured the numa, then setting the
> topo_ids.node_id to 0 might not work because it might create duplicate
> apicids and device_add will be rejected.  As per the comments in
I don't see how such duplicate could be made, even if all CPUs have 0
node-id, there are pkg_id/core_id/thread_id wich are encoded in APIC ID,
where pkg_id is unique across machine (at least in QEMU), so I don't see
how duplicate is possible.

> numa_cpu_pre_plug, this is already broken. Look at the comments below.
> Looks like node_id cannot be passed down.
> ============================================
> if (node_id == CPU_UNSET_NUMA_NODE_ID) {
>         /* due to bug in libvirt, it doesn't pass node-id from props on
>          * device_add as expected, so we have to fix it up here */
>         if (slot->props.has_node_id) {
>             object_property_set_int(OBJECT(dev), slot->props.node_id,
>                                     "node-id", errp);
>         }
          else if (epyc)
             error("incomplete EPYC topology use -numa cpu,node-id=some-id,socket-id=%d  to configure numa node for socket",
                    cpu_socket_id)

>     } else if (node_id != slot->props.node_id) {
> ============================================
> 
> I was trying to solve this problem setting the node_id correctly for EPYC
> at least.  If you think, this is not important we can ignore it (by
> setting topo_ids.node_id to 0) and move forward.  I don't see the need for
> changing other topology specific code as we have already made very generic.

node-id - can be passed down (problem was that libvird didn't do it back then
for -device/device_add, hence above hack).

But that's not a problem, the problem is that x86_apicid_from_cpu_idx_epyc() makes
up node-id on its own, which is not big deal in case of autonuma since they happen
to match and there aren't any ambiguity, but with more numa nodes, numa config should
be defined by user explicitly and current code may end up with incoherent config,
where some parts of QEMU think CPU has one node-id while APIC ID is encoded with another. 

So after some pondering on a subject, to make sure it will look correct from all angles,
we need to:
 
 1: use single source for topo ids, i.e. pull user provided node-id from possible_cpus
    for both cpu.node-id property and for APIC ID. Hence my suggestion to change
    x86_apicid_from_cpu_idx_epyc() as described above.

 2: verify that user provided id's make sense in EPYC case. (pre_plug)

> > > Let me know if you see any other option.
> > >  
> > > >
> > > >  
> > > > > >  
> > > > > > >          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info,  
> > > > &topo_ids);  
> > > > > > >      }
> > > > > > >
> > > > > > > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> > > > > > > index 07239f95f4..ee4deb84c4 100644
> > > > > > > --- a/include/hw/i386/topology.h
> > > > > > > +++ b/include/hw/i386/topology.h
> > > > > > > @@ -140,6 +140,17 @@ static inline unsigned  
> > > > > > apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)  
> > > > > > >             apicid_node_width_epyc(topo_info);
> > > > > > >  }
> > > > > > >
> > > > > > > +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo  
> > > > *topo_info,  
> > > > > > > +                                            const X86CPUTopoIDs *topo_ids)
> > > > > > > +{
> > > > > > > +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> > > > > > > +    unsigned cores_per_node = DIV_ROUND_UP((topo_info-  
> > >dies_per_pkg  
> > > > *  
> > > > > > > +                                            topo_info->cores_per_die *
> > > > > > > +                                            topo_info->threads_per_core),
> > > > > > > +                                            nr_nodes);
> > > > > > > +
> > > > > > > +    return (topo_ids->core_id / cores_per_node) % nr_nodes;  
> > > > > > what if nr_nodes == 0?
> > > > > >  
> > > > > > > +}
> > > > > > >  /*
> > > > > > >   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
> > > > > > >   *
> > > > > > >
> > > > > > >  
> > > > >
> > > > >  
> > >  
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-25 18:32               ` Igor Mammedov
@ 2020-06-25 22:55                 ` Babu Moger
  2020-06-26 20:56                   ` Babu Moger
  2020-06-30  2:48                   ` Babu Moger
  0 siblings, 2 replies; 13+ messages in thread
From: Babu Moger @ 2020-06-25 22:55 UTC (permalink / raw)
  To: Igor Mammedov; +Cc: qemu-devel, pbonzini, rth, ehabkost, mst



On 6/25/20 1:32 PM, Igor Mammedov wrote:
> On Thu, 25 Jun 2020 11:41:25 -0500
> Babu Moger <babu.moger@amd.com> wrote:
> 
>> Igor,
>>
>>> -----Original Message-----
>>> From: Igor Mammedov <imammedo@redhat.com>
>>> Sent: Thursday, June 25, 2020 10:19 AM
>>> To: Moger, Babu <Babu.Moger@amd.com>
>>> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
>>> pbonzini@redhat.com; rth@twiddle.net
>>> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
>>> hotplug
>>>
>>> On Wed, 24 Jun 2020 12:35:59 -0500
>>> Babu Moger <babu.moger@amd.com> wrote:
>>>   
>>>>> -----Original Message-----
>>>>> From: Igor Mammedov <imammedo@redhat.com>
>>>>> Sent: Wednesday, June 24, 2020 8:48 AM
>>>>> To: Moger, Babu <Babu.Moger@amd.com>
>>>>> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
>>>>> pbonzini@redhat.com; rth@twiddle.net
>>>>> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and  
>>> CPU  
>>>>> hotplug
>>>>>
>>>>> On Tue, 16 Jun 2020 12:18:56 -0500
>>>>> Babu Moger <babu.moger@amd.com> wrote:
>>>>>  
>>>>>>> -----Original Message-----
>>>>>>> From: Igor Mammedov <imammedo@redhat.com>
>>>>>>> Sent: Tuesday, June 16, 2020 5:59 AM
>>>>>>> To: Moger, Babu <Babu.Moger@amd.com>
>>>>>>> Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
>>>>>>> mst@redhat.com; marcel.apfelbaum@gmail.com; qemu-  
>>> devel@nongnu.org  
>>>>>>> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device  
>>> and  
>>>>> CPU  
>>>>>>> hotplug
>>>>>>>
>>>>>>> On Mon, 08 Jun 2020 15:18:50 -0500
>>>>>>> Babu Moger <babu.moger@amd.com> wrote:
>>>>>>>  
>>>>>>>> Noticed the following command failure while testing CPU hotplug.
>>>>>>>>
>>>>>>>> $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
>>>>>>>>   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
>>>>>>>>   cpu,core-id=0,socket-id=1,thread-id=0
>>>>>>>>
>>>>>>>>   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-  
>>> id=1,  
>>>>>>>>   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
>>>>>>>>   with APIC ID 21855, valid index range 0:1
>>>>>>>>
>>>>>>>> This happens because APIC ID is calculated using uninitialized memory.
>>>>>>>> This is happening after the addition of new field node_id in  
>>>>> X86CPUTopoIDs  
>>>>>>>> structure. The node_id field is uninitialized while calling
>>>>>>>> apicid_from_topo_ids. The problem is discussed in the thread below.
>>>>>>>>  
>>>>>>>  
>>>>>  
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker  
>>>>>>> nel.org%2Fqemu-
>>>>>>>  
>>>>>  
>>> devel%2F20200602171838.GG577771%40habkost.net%2F&amp;data=02%7C01  
>>>>>>>  
>>>>>  
>>> %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d  
>>>>>>>  
>>>>>  
>>> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233&amp  
>>>>>>>  
>>>>>  
>>> ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&amp;r  
>>>>>>> eserved=0  
>>>>>>>>
>>>>>>>> Fix the problem by initializing the node_id properly.
>>>>>>>>
>>>>>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>>>>>>>> ---
>>>>>>>>  hw/i386/pc.c               |    2 ++
>>>>>>>>  include/hw/i386/topology.h |   11 +++++++++++
>>>>>>>>  2 files changed, 13 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>>>>>>> index 2128f3d6fe..974cc30891 100644
>>>>>>>> --- a/hw/i386/pc.c
>>>>>>>> +++ b/hw/i386/pc.c
>>>>>>>> @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler  
>>>>>>> *hotplug_dev,  
>>>>>>>>          topo_ids.die_id = cpu->die_id;
>>>>>>>>          topo_ids.core_id = cpu->core_id;
>>>>>>>>          topo_ids.smt_id = cpu->thread_id;
>>>>>>>> +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms-  
>>>>>> cpu_type)  
>>>>>>> ?  
>>>>>>>> +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;  
>>>>>>>
>>>>>>> I'd rather not calculate some default value here,
>>>>>>> this is the branch where we check user provided topology info and error  
>>> out  
>>>>>>> asking
>>>>>>> to provide missing bits.  
>>>>>> Noticed that cpu->node_id is initialized to  
>>> 0xFF(NUMA_NODE_UNASSIGNED).  
>>>>>> We can initialize cpu->node_id to default node like how we do it in
>>>>>> x86_get_default_cpu_node_id.  We can use it to initialize  
>>> topo_ids.node_id.  
>>>>>> This is consistent with other fields core_id, die_id etc..
>>>>>>  
>>>>>>>
>>>>>>> I also wonder if we should force user to specify numa nodes on CLI if  
>>> EPYC  
>>>>> cpu is  
>>>>>>> used.
>>>>>>> (i.e. I'm assuming that EPYC always requires numa)  
>>>>>>
>>>>>> That is not true. Without numa all the cpus will be configured under one
>>>>>> default numa node 0. Like we do it using x86_get_default_cpu_node_id.  
>>>>>
>>>>> get_default_cpu_node_id() which is making things up, is going to be  
>>> removed  
>>>>> eventually in favor of asking user to provide numa mapping explicitly on CLI.  
>>>>
>>>> That will be good going forward.
>>>>  
>>>>>
>>>>> now if it's always only node 0, why do we need to calculate it then,
>>>>> why not just assing 0 directly?
>>>>>
>>>>> what if we have several sockets, would all vCPUs still have node-id = 0?  
>>>>
>>>> If there are several nodes then socket id becomes node id.  
>>> I wonder if node id == socket id then why bother with node_id at all,
>>> probably node id is there to allow for design where several sockets are on
>>> the same node
>>>
>>>   
>>>>> Another question is if we have CPUs with only 1 numa node set on all of  
>>> them,  
>>>>> does it require all other machinery like ACPI SRAT table defined or it's just
>>>>> internal CPU impl. detail?  
>>>>
>>>> I am not very sure about it. To me it looks like it is just internal cpu
>>>> implementation detail.  
>>> I'd think it might confuse guest OS, when it decodes more than 1 numa node
>>> for APIC ID/CPUID but then there are no such nodes described in ACPI.
>>> While it might work for caches, it would miss any relation of memory mapping
>>> to nodes or get it wrong if one doesn't match another.
>>>
>>>   
>>>> I think we have two options here.
>>>>
>>>> 1. Update the patch to initialize the node_id the way it is done
>>>> get_default_cpu_node_id.  
>>>
>>> if it were only one node for every CPU (incl. multisocket), I'd go with enabling
>>> autonuma assigning all CPUs to default 0 node-id, since there is no ambiguity
>>> where
>>> CPUs and RAM are mapped to.
>>> Is it possible to use node-id=0 for all EPYC CPUs even in multisocket config?
>>> (it seems spec allows only one node per socket, but doesn't say that node ids
>>> must
>>> be different.)
>>> If not, then making up node-id is not an option.
>>>   
>>>> 2. Ask the user to pass the node_id while hot plugging the device. This is
>>>> a clean solution. Will require some data structure changes.  
>>>
>>> Here is my brain dump of current very non obvious flow:
>>>
>>>   1. x86_possible_cpu_arch_ids()
>>>          ms->possible_cpus->cpus[i].props.* = x86_topo_ids_from_idx()
>>>
>>>   2. possible numa_cpu_set()
>>>          ms->possible_cpus->cpus[i].props.node_id = user input|0 -in autonuma
>>> case
>>>
>>>   3. x86_cpus_init()
>>>          // generate apic_id AND makeup node_id embedded into it
>>>          ms->possible_cpus->cpus[i].arch_id = x86_cpu_apic_id_from_index(x86ms,
>>> i);  
>>>                         -> x86_apicid_from_cpu_idx_epyc() ->  
>>> x86_topo_ids_from_idx_epyc()
>>>                                                                  same as x86_topo_ids_from_idx() + node_id
>>>                      or  
>>>                         -> x86_apicid_from_cpu_idx() -> x86_topo_ids_from_idx()  
>>>
>>>   4. pc_cpu_pre_plug()
>>>          // basically topo ids module node-id is not set or user provided
>>>          cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx);
>>>
>>>   5.
>>>          // do it again with diff that in EPYC case it my have different node-id than
>>> cpu_slot
>>>          x86ms->topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
>>>
>>>          //i.e. user input of node-id is ignored
>>>          set socket-id/core-id/... (but not node-id) from topo_info
>>>
>>>          numa_cpu_pre_plug(cpu_slot)
>>>                            ^^^^^^
>>>               if (node_id == CPU_UNSET_NUMA_NODE_ID) {
>>>                    if (slot->props.has_node_id)
>>>                        object_property_set_int(... slot->props.node_id, "node-id",...);
>>>               // this applies to hotplugged without node-id and to initial CPUs (-smp X)
>>>               // so we may end up with "node-id" being set to user defined value
>>>               // or left unset (no numa enabled)
>>>               // while APIC ID will have some node-id encoded in it.
>>>
>>>
>>> that's quite a mess, maybe we should unify both
>>> amd make x86_apicid_from_cpu_idx_epyc()/x86_apicid_from_cpu_idx() use
>>> ms->possible_cpus->cpus[i].props instead of x86_topo_ids_from_idx()
>>> i.e
>>>
>>>        x86_apicid_from_cpu_idx_epyc() {
>>>            topoids = x86_apicid_from_cpu_idx() {
>>>                           return ms->possible_cpus->cpus[i].props
>>>                       }
>>>            if (ms->possible_cpus->cpus[i].props.has_node_id)
>>>                topoids.node_id = ms->possible_cpus->cpus[i].props.node_id
>>>            else
>>>                error_fatal("EPYC requires use of -numa to define topology if using
>>> more than 1 socket")
>>>        }
>>>
>>> that way QEMU makes up only node[0] by enabling autonuma or whatever
>>> user privided explicitly is encoded into APIC ID and it will be always consistent
>>> with cpu
>>> *-id properties in possible_cpus and SRAT table QEMU generates.
>>>
>>> as cleanup we can get rid of back and forth conversion [5] and use cpu_slot to
>>> set
>>> the same ids.
>>>
>>> Also maybe we should have a check that node-id is the same within socket in
>>> case of EPYC
>>> if it's guarantied that EPYC won't support multiple nodes per socket.
>>>
>>> hope it makes at least some sense.  
>>
>> To make things clear, in case of autonuma we don't have to worry about
>> node_id. We just have to set it topo_ids.node_id to 0 in pc_cpu_pre_plug,
>> Everything will work as expected. This will solve our current problem of
>> uninitialized variable.
> 
> I'm proposing to enable autonuma, which in its turn will assign all CPUs to
> node-id=0 in possible_cpus. And once this information is in possible_cpus,
> numa_cpu_pre_plug() should take care of setting correct node-id on CPU for
> the case of initial CPUs (node_id == CPU_UNSET_NUMA_NODE_ID), and in case
> of hotplug numa_cpu_pre_plug() will complaing if user suppled nonsense on
> with device_add.
> 
>> Problem here is, when user has configured the numa, then setting the
>> topo_ids.node_id to 0 might not work because it might create duplicate
>> apicids and device_add will be rejected.  As per the comments in
> I don't see how such duplicate could be made, even if all CPUs have 0
> node-id, there are pkg_id/core_id/thread_id wich are encoded in APIC ID,
> where pkg_id is unique across machine (at least in QEMU), so I don't see
> how duplicate is possible.
> 
>> numa_cpu_pre_plug, this is already broken. Look at the comments below.
>> Looks like node_id cannot be passed down.
>> ============================================
>> if (node_id == CPU_UNSET_NUMA_NODE_ID) {
>>         /* due to bug in libvirt, it doesn't pass node-id from props on
>>          * device_add as expected, so we have to fix it up here */
>>         if (slot->props.has_node_id) {
>>             object_property_set_int(OBJECT(dev), slot->props.node_id,
>>                                     "node-id", errp);
>>         }
>           else if (epyc)
>              error("incomplete EPYC topology use -numa cpu,node-id=some-id,socket-id=%d  to configure numa node for socket",
>                     cpu_socket_id)
> 
>>     } else if (node_id != slot->props.node_id) {
>> ============================================
>>
>> I was trying to solve this problem setting the node_id correctly for EPYC
>> at least.  If you think, this is not important we can ignore it (by
>> setting topo_ids.node_id to 0) and move forward.  I don't see the need for
>> changing other topology specific code as we have already made very generic.
> 
> node-id - can be passed down (problem was that libvird didn't do it back then
> for -device/device_add, hence above hack).
> 
> But that's not a problem, the problem is that x86_apicid_from_cpu_idx_epyc() makes
> up node-id on its own, which is not big deal in case of autonuma since they happen
> to match and there aren't any ambiguity, but with more numa nodes, numa config should
> be defined by user explicitly and current code may end up with incoherent config,
> where some parts of QEMU think CPU has one node-id while APIC ID is encoded with another. 
> 
> So after some pondering on a subject, to make sure it will look correct from all angles,
> we need to:
>  
>  1: use single source for topo ids, i.e. pull user provided node-id from possible_cpus
>     for both cpu.node-id property and for APIC ID. Hence my suggestion to change
>     x86_apicid_from_cpu_idx_epyc() as described above.
> 
>  2: verify that user provided id's make sense in EPYC case. (pre_plug)

Basically, you are saying to setup the props.has_node_id and props.node_id
while buidling the topology. Make sure to use the numa information if the
user has provided else build the topology as per EPYC topology model.
Right? I will have to think thru this little bit. Will try to send the
draft patch tomarrow.


> 
>>>> Let me know if you see any other option.
>>>>  
>>>>>
>>>>>  
>>>>>>>  
>>>>>>>>          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info,  
>>>>> &topo_ids);  
>>>>>>>>      }
>>>>>>>>
>>>>>>>> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
>>>>>>>> index 07239f95f4..ee4deb84c4 100644
>>>>>>>> --- a/include/hw/i386/topology.h
>>>>>>>> +++ b/include/hw/i386/topology.h
>>>>>>>> @@ -140,6 +140,17 @@ static inline unsigned  
>>>>>>> apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)  
>>>>>>>>             apicid_node_width_epyc(topo_info);
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo  
>>>>> *topo_info,  
>>>>>>>> +                                            const X86CPUTopoIDs *topo_ids)
>>>>>>>> +{
>>>>>>>> +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
>>>>>>>> +    unsigned cores_per_node = DIV_ROUND_UP((topo_info-  
>>>> dies_per_pkg  
>>>>> *  
>>>>>>>> +                                            topo_info->cores_per_die *
>>>>>>>> +                                            topo_info->threads_per_core),
>>>>>>>> +                                            nr_nodes);
>>>>>>>> +
>>>>>>>> +    return (topo_ids->core_id / cores_per_node) % nr_nodes;  
>>>>>>> what if nr_nodes == 0?
>>>>>>>  
>>>>>>>> +}
>>>>>>>>  /*
>>>>>>>>   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
>>>>>>>>   *
>>>>>>>>
>>>>>>>>  
>>>>>>
>>>>>>  
>>>>  
>>
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-25 22:55                 ` Babu Moger
@ 2020-06-26 20:56                   ` Babu Moger
  2020-06-30  2:48                   ` Babu Moger
  1 sibling, 0 replies; 13+ messages in thread
From: Babu Moger @ 2020-06-26 20:56 UTC (permalink / raw)
  To: Moger, Babu, Igor Mammedov; +Cc: qemu-devel, pbonzini, rth, ehabkost, mst


> -----Original Message-----
> From: Moger, Babu <babu.moger@amd.com>
> Sent: Thursday, June 25, 2020 5:55 PM
> To: Igor Mammedov <imammedo@redhat.com>
> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> pbonzini@redhat.com; rth@twiddle.net
> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> hotplug
> 
> 
> 
> On 6/25/20 1:32 PM, Igor Mammedov wrote:
> > On Thu, 25 Jun 2020 11:41:25 -0500
> > Babu Moger <babu.moger@amd.com> wrote:
> >
> >> Igor,
> >>
> >>> -----Original Message-----
> >>> From: Igor Mammedov <imammedo@redhat.com>
> >>> Sent: Thursday, June 25, 2020 10:19 AM
> >>> To: Moger, Babu <Babu.Moger@amd.com>
> >>> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> >>> pbonzini@redhat.com; rth@twiddle.net
> >>> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and
> CPU
> >>> hotplug
> >>>
> >>> On Wed, 24 Jun 2020 12:35:59 -0500
> >>> Babu Moger <babu.moger@amd.com> wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Igor Mammedov <imammedo@redhat.com>
> >>>>> Sent: Wednesday, June 24, 2020 8:48 AM
> >>>>> To: Moger, Babu <Babu.Moger@amd.com>
> >>>>> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> >>>>> pbonzini@redhat.com; rth@twiddle.net
> >>>>> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device
> and
> >>> CPU
> >>>>> hotplug
> >>>>>
> >>>>> On Tue, 16 Jun 2020 12:18:56 -0500
> >>>>> Babu Moger <babu.moger@amd.com> wrote:
> >>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Igor Mammedov <imammedo@redhat.com>
> >>>>>>> Sent: Tuesday, June 16, 2020 5:59 AM
> >>>>>>> To: Moger, Babu <Babu.Moger@amd.com>
> >>>>>>> Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
> >>>>>>> mst@redhat.com; marcel.apfelbaum@gmail.com; qemu-
> >>> devel@nongnu.org
> >>>>>>> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device
> >>> and
> >>>>> CPU
> >>>>>>> hotplug
> >>>>>>>
> >>>>>>> On Mon, 08 Jun 2020 15:18:50 -0500
> >>>>>>> Babu Moger <babu.moger@amd.com> wrote:
> >>>>>>>
> >>>>>>>> Noticed the following command failure while testing CPU hotplug.
> >>>>>>>>
> >>>>>>>> $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
> >>>>>>>>   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
> >>>>>>>>   cpu,core-id=0,socket-id=1,thread-id=0
> >>>>>>>>
> >>>>>>>>   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-
> >>> id=1,
> >>>>>>>>   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
> >>>>>>>>   with APIC ID 21855, valid index range 0:1
> >>>>>>>>
> >>>>>>>> This happens because APIC ID is calculated using uninitialized
> memory.
> >>>>>>>> This is happening after the addition of new field node_id in
> >>>>> X86CPUTopoIDs
> >>>>>>>> structure. The node_id field is uninitialized while calling
> >>>>>>>> apicid_from_topo_ids. The problem is discussed in the thread below.
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker
> >>>>>>> nel.org%2Fqemu-
> >>>>>>>
> >>>>>
> >>>
> devel%2F20200602171838.GG577771%40habkost.net%2F&amp;data=02%7C01
> >>>>>>>
> >>>>>
> >>>
> %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d
> >>>>>>>
> >>>>>
> >>>
> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233&amp
> >>>>>>>
> >>>>>
> >>>
> ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&amp;r
> >>>>>>> eserved=0
> >>>>>>>>
> >>>>>>>> Fix the problem by initializing the node_id properly.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> >>>>>>>> ---
> >>>>>>>>  hw/i386/pc.c               |    2 ++
> >>>>>>>>  include/hw/i386/topology.h |   11 +++++++++++
> >>>>>>>>  2 files changed, 13 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >>>>>>>> index 2128f3d6fe..974cc30891 100644
> >>>>>>>> --- a/hw/i386/pc.c
> >>>>>>>> +++ b/hw/i386/pc.c
> >>>>>>>> @@ -1585,6 +1585,8 @@ static void
> pc_cpu_pre_plug(HotplugHandler
> >>>>>>> *hotplug_dev,
> >>>>>>>>          topo_ids.die_id = cpu->die_id;
> >>>>>>>>          topo_ids.core_id = cpu->core_id;
> >>>>>>>>          topo_ids.smt_id = cpu->thread_id;
> >>>>>>>> +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms-
> >>>>>> cpu_type)
> >>>>>>> ?
> >>>>>>>> +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;
> >>>>>>>
> >>>>>>> I'd rather not calculate some default value here,
> >>>>>>> this is the branch where we check user provided topology info and
> error
> >>> out
> >>>>>>> asking
> >>>>>>> to provide missing bits.
> >>>>>> Noticed that cpu->node_id is initialized to
> >>> 0xFF(NUMA_NODE_UNASSIGNED).
> >>>>>> We can initialize cpu->node_id to default node like how we do it in
> >>>>>> x86_get_default_cpu_node_id.  We can use it to initialize
> >>> topo_ids.node_id.
> >>>>>> This is consistent with other fields core_id, die_id etc..
> >>>>>>
> >>>>>>>
> >>>>>>> I also wonder if we should force user to specify numa nodes on CLI if
> >>> EPYC
> >>>>> cpu is
> >>>>>>> used.
> >>>>>>> (i.e. I'm assuming that EPYC always requires numa)
> >>>>>>
> >>>>>> That is not true. Without numa all the cpus will be configured under one
> >>>>>> default numa node 0. Like we do it using x86_get_default_cpu_node_id.
> >>>>>
> >>>>> get_default_cpu_node_id() which is making things up, is going to be
> >>> removed
> >>>>> eventually in favor of asking user to provide numa mapping explicitly on
> CLI.
> >>>>
> >>>> That will be good going forward.
> >>>>
> >>>>>
> >>>>> now if it's always only node 0, why do we need to calculate it then,
> >>>>> why not just assing 0 directly?
> >>>>>
> >>>>> what if we have several sockets, would all vCPUs still have node-id = 0?
> >>>>
> >>>> If there are several nodes then socket id becomes node id.
> >>> I wonder if node id == socket id then why bother with node_id at all,
> >>> probably node id is there to allow for design where several sockets are on
> >>> the same node
> >>>
> >>>
> >>>>> Another question is if we have CPUs with only 1 numa node set on all of
> >>> them,
> >>>>> does it require all other machinery like ACPI SRAT table defined or it's just
> >>>>> internal CPU impl. detail?
> >>>>
> >>>> I am not very sure about it. To me it looks like it is just internal cpu
> >>>> implementation detail.
> >>> I'd think it might confuse guest OS, when it decodes more than 1 numa node
> >>> for APIC ID/CPUID but then there are no such nodes described in ACPI.
> >>> While it might work for caches, it would miss any relation of memory
> mapping
> >>> to nodes or get it wrong if one doesn't match another.
> >>>
> >>>
> >>>> I think we have two options here.
> >>>>
> >>>> 1. Update the patch to initialize the node_id the way it is done
> >>>> get_default_cpu_node_id.
> >>>
> >>> if it were only one node for every CPU (incl. multisocket), I'd go with
> enabling
> >>> autonuma assigning all CPUs to default 0 node-id, since there is no
> ambiguity
> >>> where
> >>> CPUs and RAM are mapped to.
> >>> Is it possible to use node-id=0 for all EPYC CPUs even in multisocket config?
> >>> (it seems spec allows only one node per socket, but doesn't say that node
> ids
> >>> must
> >>> be different.)
> >>> If not, then making up node-id is not an option.
> >>>
> >>>> 2. Ask the user to pass the node_id while hot plugging the device. This is
> >>>> a clean solution. Will require some data structure changes.
> >>>
> >>> Here is my brain dump of current very non obvious flow:
> >>>
> >>>   1. x86_possible_cpu_arch_ids()
> >>>          ms->possible_cpus->cpus[i].props.* = x86_topo_ids_from_idx()
> >>>
> >>>   2. possible numa_cpu_set()
> >>>          ms->possible_cpus->cpus[i].props.node_id = user input|0 -in autonuma
> >>> case
> >>>
> >>>   3. x86_cpus_init()
> >>>          // generate apic_id AND makeup node_id embedded into it
> >>>          ms->possible_cpus->cpus[i].arch_id =
> x86_cpu_apic_id_from_index(x86ms,
> >>> i);
> >>>                         -> x86_apicid_from_cpu_idx_epyc() ->
> >>> x86_topo_ids_from_idx_epyc()
> >>>                                                                  same as x86_topo_ids_from_idx() +
> node_id
> >>>                      or
> >>>                         -> x86_apicid_from_cpu_idx() -> x86_topo_ids_from_idx()
> >>>
> >>>   4. pc_cpu_pre_plug()
> >>>          // basically topo ids module node-id is not set or user provided
> >>>          cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx);
> >>>
> >>>   5.
> >>>          // do it again with diff that in EPYC case it my have different node-id
> than
> >>> cpu_slot
> >>>          x86ms->topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
> >>>
> >>>          //i.e. user input of node-id is ignored
> >>>          set socket-id/core-id/... (but not node-id) from topo_info
> >>>
> >>>          numa_cpu_pre_plug(cpu_slot)
> >>>                            ^^^^^^
> >>>               if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> >>>                    if (slot->props.has_node_id)
> >>>                        object_property_set_int(... slot->props.node_id, "node-id",...);
> >>>               // this applies to hotplugged without node-id and to initial CPUs (-
> smp X)
> >>>               // so we may end up with "node-id" being set to user defined value
> >>>               // or left unset (no numa enabled)
> >>>               // while APIC ID will have some node-id encoded in it.
> >>>
> >>>
> >>> that's quite a mess, maybe we should unify both
> >>> amd make x86_apicid_from_cpu_idx_epyc()/x86_apicid_from_cpu_idx() use
> >>> ms->possible_cpus->cpus[i].props instead of x86_topo_ids_from_idx()
> >>> i.e
> >>>
> >>>        x86_apicid_from_cpu_idx_epyc() {
> >>>            topoids = x86_apicid_from_cpu_idx() {
> >>>                           return ms->possible_cpus->cpus[i].props
> >>>                       }
> >>>            if (ms->possible_cpus->cpus[i].props.has_node_id)
> >>>                topoids.node_id = ms->possible_cpus->cpus[i].props.node_id
> >>>            else
> >>>                error_fatal("EPYC requires use of -numa to define topology if using
> >>> more than 1 socket")
> >>>        }
> >>>
> >>> that way QEMU makes up only node[0] by enabling autonuma or whatever
> >>> user privided explicitly is encoded into APIC ID and it will be always
> consistent
> >>> with cpu
> >>> *-id properties in possible_cpus and SRAT table QEMU generates.
> >>>
> >>> as cleanup we can get rid of back and forth conversion [5] and use cpu_slot
> to
> >>> set
> >>> the same ids.
> >>>
> >>> Also maybe we should have a check that node-id is the same within socket
> in
> >>> case of EPYC
> >>> if it's guarantied that EPYC won't support multiple nodes per socket.
> >>>
> >>> hope it makes at least some sense.
> >>
> >> To make things clear, in case of autonuma we don't have to worry about
> >> node_id. We just have to set it topo_ids.node_id to 0 in pc_cpu_pre_plug,
> >> Everything will work as expected. This will solve our current problem of
> >> uninitialized variable.
> >
> > I'm proposing to enable autonuma, which in its turn will assign all CPUs to
> > node-id=0 in possible_cpus. And once this information is in possible_cpus,
> > numa_cpu_pre_plug() should take care of setting correct node-id on CPU for
> > the case of initial CPUs (node_id == CPU_UNSET_NUMA_NODE_ID), and in
> case
> > of hotplug numa_cpu_pre_plug() will complaing if user suppled nonsense on
> > with device_add.
> >
> >> Problem here is, when user has configured the numa, then setting the
> >> topo_ids.node_id to 0 might not work because it might create duplicate
> >> apicids and device_add will be rejected.  As per the comments in
> > I don't see how such duplicate could be made, even if all CPUs have 0
> > node-id, there are pkg_id/core_id/thread_id wich are encoded in APIC ID,
> > where pkg_id is unique across machine (at least in QEMU), so I don't see
> > how duplicate is possible.
> >
> >> numa_cpu_pre_plug, this is already broken. Look at the comments below.
> >> Looks like node_id cannot be passed down.
> >> ============================================
> >> if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> >>         /* due to bug in libvirt, it doesn't pass node-id from props on
> >>          * device_add as expected, so we have to fix it up here */
> >>         if (slot->props.has_node_id) {
> >>             object_property_set_int(OBJECT(dev), slot->props.node_id,
> >>                                     "node-id", errp);
> >>         }
> >           else if (epyc)
> >              error("incomplete EPYC topology use -numa cpu,node-id=some-
> id,socket-id=%d  to configure numa node for socket",
> >                     cpu_socket_id)
> >
> >>     } else if (node_id != slot->props.node_id) {
> >> ============================================
> >>
> >> I was trying to solve this problem setting the node_id correctly for EPYC
> >> at least.  If you think, this is not important we can ignore it (by
> >> setting topo_ids.node_id to 0) and move forward.  I don't see the need for
> >> changing other topology specific code as we have already made very generic.
> >
> > node-id - can be passed down (problem was that libvird didn't do it back then
> > for -device/device_add, hence above hack).
> >
> > But that's not a problem, the problem is that x86_apicid_from_cpu_idx_epyc()
> makes
> > up node-id on its own, which is not big deal in case of autonuma since they
> happen
> > to match and there aren't any ambiguity, but with more numa nodes, numa
> config should
> > be defined by user explicitly and current code may end up with incoherent
> config,
> > where some parts of QEMU think CPU has one node-id while APIC ID is
> encoded with another.
> >
> > So after some pondering on a subject, to make sure it will look correct from all
> angles,
> > we need to:
> >
> >  1: use single source for topo ids, i.e. pull user provided node-id from
> possible_cpus
> >     for both cpu.node-id property and for APIC ID. Hence my suggestion to
> change
> >     x86_apicid_from_cpu_idx_epyc() as described above.
> >
> >  2: verify that user provided id's make sense in EPYC case. (pre_plug)
> 
> Basically, you are saying to setup the props.has_node_id and props.node_id
> while buidling the topology. Make sure to use the numa information if the
> user has provided else build the topology as per EPYC topology model.
> Right? I will have to think thru this little bit. Will try to send the
> draft patch tomarrow.

It is getting much more complicated. I cannot change the
x86_apicid_from_cpu_idx_epyc to access the
ms->possible_cpus->cpus[i].props because we don’t have access to
MachineState inside topology functions.  Even with these changes I don’t
know if we are going to solve the un-initialized memory problem.
I will continue to investigate. Let me know if think any other options.

Following command hotplugs the device without the node_id. Is there a
command to hotadd the device with node id? I am trying to understand if we
really need to worry about non-zero node id.

$ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
  cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
  cpu,core-id=0,socket-id=1,thread-id=0


> 
> >
> >>>> Let me know if you see any other option.
> >>>>
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>>>          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info,
> >>>>> &topo_ids);
> >>>>>>>>      }
> >>>>>>>>
> >>>>>>>> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> >>>>>>>> index 07239f95f4..ee4deb84c4 100644
> >>>>>>>> --- a/include/hw/i386/topology.h
> >>>>>>>> +++ b/include/hw/i386/topology.h
> >>>>>>>> @@ -140,6 +140,17 @@ static inline unsigned
> >>>>>>> apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)
> >>>>>>>>             apicid_node_width_epyc(topo_info);
> >>>>>>>>  }
> >>>>>>>>
> >>>>>>>> +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo
> >>>>> *topo_info,
> >>>>>>>> +                                            const X86CPUTopoIDs *topo_ids)
> >>>>>>>> +{
> >>>>>>>> +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> >>>>>>>> +    unsigned cores_per_node = DIV_ROUND_UP((topo_info-
> >>>> dies_per_pkg
> >>>>> *
> >>>>>>>> +                                            topo_info->cores_per_die *
> >>>>>>>> +                                            topo_info->threads_per_core),
> >>>>>>>> +                                            nr_nodes);
> >>>>>>>> +
> >>>>>>>> +    return (topo_ids->core_id / cores_per_node) % nr_nodes;
> >>>>>>> what if nr_nodes == 0?
> >>>>>>>
> >>>>>>>> +}
> >>>>>>>>  /*
> >>>>>>>>   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
> >>>>>>>>   *
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>
> >


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug
  2020-06-25 22:55                 ` Babu Moger
  2020-06-26 20:56                   ` Babu Moger
@ 2020-06-30  2:48                   ` Babu Moger
  1 sibling, 0 replies; 13+ messages in thread
From: Babu Moger @ 2020-06-30  2:48 UTC (permalink / raw)
  To: Babu Moger, Igor Mammedov; +Cc: qemu-devel, pbonzini, rth, ehabkost, mst



> -----Original Message-----
> From: Moger, Babu <babu.moger@amd.com>
> Sent: Thursday, June 25, 2020 5:55 PM
> To: Igor Mammedov <imammedo@redhat.com>
> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> pbonzini@redhat.com; rth@twiddle.net
> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU
> hotplug
> 
> 
> 
> On 6/25/20 1:32 PM, Igor Mammedov wrote:
> > On Thu, 25 Jun 2020 11:41:25 -0500
> > Babu Moger <babu.moger@amd.com> wrote:
> >
> >> Igor,
> >>
> >>> -----Original Message-----
> >>> From: Igor Mammedov <imammedo@redhat.com>
> >>> Sent: Thursday, June 25, 2020 10:19 AM
> >>> To: Moger, Babu <Babu.Moger@amd.com>
> >>> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> >>> pbonzini@redhat.com; rth@twiddle.net
> >>> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and
> CPU
> >>> hotplug
> >>>
> >>> On Wed, 24 Jun 2020 12:35:59 -0500
> >>> Babu Moger <babu.moger@amd.com> wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Igor Mammedov <imammedo@redhat.com>
> >>>>> Sent: Wednesday, June 24, 2020 8:48 AM
> >>>>> To: Moger, Babu <Babu.Moger@amd.com>
> >>>>> Cc: ehabkost@redhat.com; mst@redhat.com; qemu-devel@nongnu.org;
> >>>>> pbonzini@redhat.com; rth@twiddle.net
> >>>>> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device
> and
> >>> CPU
> >>>>> hotplug
> >>>>>
> >>>>> On Tue, 16 Jun 2020 12:18:56 -0500
> >>>>> Babu Moger <babu.moger@amd.com> wrote:
> >>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Igor Mammedov <imammedo@redhat.com>
> >>>>>>> Sent: Tuesday, June 16, 2020 5:59 AM
> >>>>>>> To: Moger, Babu <Babu.Moger@amd.com>
> >>>>>>> Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
> >>>>>>> mst@redhat.com; marcel.apfelbaum@gmail.com; qemu-
> >>> devel@nongnu.org
> >>>>>>> Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device
> >>> and
> >>>>> CPU
> >>>>>>> hotplug
> >>>>>>>
> >>>>>>> On Mon, 08 Jun 2020 15:18:50 -0500
> >>>>>>> Babu Moger <babu.moger@amd.com> wrote:
> >>>>>>>
> >>>>>>>> Noticed the following command failure while testing CPU hotplug.
> >>>>>>>>
> >>>>>>>> $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2,
> >>>>>>>>   cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64-
> >>>>>>>>   cpu,core-id=0,socket-id=1,thread-id=0
> >>>>>>>>
> >>>>>>>>   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-
> >>> id=1,
> >>>>>>>>   thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0]
> >>>>>>>>   with APIC ID 21855, valid index range 0:1
> >>>>>>>>
> >>>>>>>> This happens because APIC ID is calculated using uninitialized
> memory.
> >>>>>>>> This is happening after the addition of new field node_id in
> >>>>> X86CPUTopoIDs
> >>>>>>>> structure. The node_id field is uninitialized while calling
> >>>>>>>> apicid_from_topo_ids. The problem is discussed in the thread below.
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker
> >>>>>>> nel.org%2Fqemu-
> >>>>>>>
> >>>>>
> >>>
> devel%2F20200602171838.GG577771%40habkost.net%2F&amp;data=02%7C01
> >>>>>>>
> >>>>>
> >>>
> %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d
> >>>>>>>
> >>>>>
> >>>
> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233&amp
> >>>>>>>
> >>>>>
> >>>
> ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&amp;r
> >>>>>>> eserved=0
> >>>>>>>>
> >>>>>>>> Fix the problem by initializing the node_id properly.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> >>>>>>>> ---
> >>>>>>>>  hw/i386/pc.c               |    2 ++
> >>>>>>>>  include/hw/i386/topology.h |   11 +++++++++++
> >>>>>>>>  2 files changed, 13 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >>>>>>>> index 2128f3d6fe..974cc30891 100644
> >>>>>>>> --- a/hw/i386/pc.c
> >>>>>>>> +++ b/hw/i386/pc.c
> >>>>>>>> @@ -1585,6 +1585,8 @@ static void
> pc_cpu_pre_plug(HotplugHandler
> >>>>>>> *hotplug_dev,
> >>>>>>>>          topo_ids.die_id = cpu->die_id;
> >>>>>>>>          topo_ids.core_id = cpu->core_id;
> >>>>>>>>          topo_ids.smt_id = cpu->thread_id;
> >>>>>>>> +        topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms-
> >>>>>> cpu_type)
> >>>>>>> ?
> >>>>>>>> +                           x86_node_id_for_epyc(&topo_info, &topo_ids) : 0;
> >>>>>>>
> >>>>>>> I'd rather not calculate some default value here,
> >>>>>>> this is the branch where we check user provided topology info and
> error
> >>> out
> >>>>>>> asking
> >>>>>>> to provide missing bits.
> >>>>>> Noticed that cpu->node_id is initialized to
> >>> 0xFF(NUMA_NODE_UNASSIGNED).
> >>>>>> We can initialize cpu->node_id to default node like how we do it in
> >>>>>> x86_get_default_cpu_node_id.  We can use it to initialize
> >>> topo_ids.node_id.
> >>>>>> This is consistent with other fields core_id, die_id etc..
> >>>>>>
> >>>>>>>
> >>>>>>> I also wonder if we should force user to specify numa nodes on CLI if
> >>> EPYC
> >>>>> cpu is
> >>>>>>> used.
> >>>>>>> (i.e. I'm assuming that EPYC always requires numa)
> >>>>>>
> >>>>>> That is not true. Without numa all the cpus will be configured under one
> >>>>>> default numa node 0. Like we do it using x86_get_default_cpu_node_id.
> >>>>>
> >>>>> get_default_cpu_node_id() which is making things up, is going to be
> >>> removed
> >>>>> eventually in favor of asking user to provide numa mapping explicitly on
> CLI.
> >>>>
> >>>> That will be good going forward.
> >>>>
> >>>>>
> >>>>> now if it's always only node 0, why do we need to calculate it then,
> >>>>> why not just assing 0 directly?
> >>>>>
> >>>>> what if we have several sockets, would all vCPUs still have node-id = 0?
> >>>>
> >>>> If there are several nodes then socket id becomes node id.
> >>> I wonder if node id == socket id then why bother with node_id at all,
> >>> probably node id is there to allow for design where several sockets are on
> >>> the same node
> >>>
> >>>
> >>>>> Another question is if we have CPUs with only 1 numa node set on all of
> >>> them,
> >>>>> does it require all other machinery like ACPI SRAT table defined or it's just
> >>>>> internal CPU impl. detail?
> >>>>
> >>>> I am not very sure about it. To me it looks like it is just internal cpu
> >>>> implementation detail.
> >>> I'd think it might confuse guest OS, when it decodes more than 1 numa node
> >>> for APIC ID/CPUID but then there are no such nodes described in ACPI.
> >>> While it might work for caches, it would miss any relation of memory
> mapping
> >>> to nodes or get it wrong if one doesn't match another.
> >>>
> >>>
> >>>> I think we have two options here.
> >>>>
> >>>> 1. Update the patch to initialize the node_id the way it is done
> >>>> get_default_cpu_node_id.
> >>>
> >>> if it were only one node for every CPU (incl. multisocket), I'd go with
> enabling
> >>> autonuma assigning all CPUs to default 0 node-id, since there is no
> ambiguity
> >>> where
> >>> CPUs and RAM are mapped to.
> >>> Is it possible to use node-id=0 for all EPYC CPUs even in multisocket config?
> >>> (it seems spec allows only one node per socket, but doesn't say that node
> ids
> >>> must
> >>> be different.)
> >>> If not, then making up node-id is not an option.
> >>>
> >>>> 2. Ask the user to pass the node_id while hot plugging the device. This is
> >>>> a clean solution. Will require some data structure changes.
> >>>
> >>> Here is my brain dump of current very non obvious flow:
> >>>
> >>>   1. x86_possible_cpu_arch_ids()
> >>>          ms->possible_cpus->cpus[i].props.* = x86_topo_ids_from_idx()
> >>>
> >>>   2. possible numa_cpu_set()
> >>>          ms->possible_cpus->cpus[i].props.node_id = user input|0 -in autonuma
> >>> case
> >>>
> >>>   3. x86_cpus_init()
> >>>          // generate apic_id AND makeup node_id embedded into it
> >>>          ms->possible_cpus->cpus[i].arch_id =
> x86_cpu_apic_id_from_index(x86ms,
> >>> i);
> >>>                         -> x86_apicid_from_cpu_idx_epyc() ->
> >>> x86_topo_ids_from_idx_epyc()
> >>>                                                                  same as x86_topo_ids_from_idx() +
> node_id
> >>>                      or
> >>>                         -> x86_apicid_from_cpu_idx() -> x86_topo_ids_from_idx()
> >>>
> >>>   4. pc_cpu_pre_plug()
> >>>          // basically topo ids module node-id is not set or user provided
> >>>          cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx);
> >>>
> >>>   5.
> >>>          // do it again with diff that in EPYC case it my have different node-id
> than
> >>> cpu_slot
> >>>          x86ms->topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids);
> >>>
> >>>          //i.e. user input of node-id is ignored
> >>>          set socket-id/core-id/... (but not node-id) from topo_info
> >>>
> >>>          numa_cpu_pre_plug(cpu_slot)
> >>>                            ^^^^^^
> >>>               if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> >>>                    if (slot->props.has_node_id)
> >>>                        object_property_set_int(... slot->props.node_id, "node-id",...);
> >>>               // this applies to hotplugged without node-id and to initial CPUs (-
> smp X)
> >>>               // so we may end up with "node-id" being set to user defined value
> >>>               // or left unset (no numa enabled)
> >>>               // while APIC ID will have some node-id encoded in it.
> >>>
> >>>
> >>> that's quite a mess, maybe we should unify both
> >>> amd make x86_apicid_from_cpu_idx_epyc()/x86_apicid_from_cpu_idx() use
> >>> ms->possible_cpus->cpus[i].props instead of x86_topo_ids_from_idx()
> >>> i.e
> >>>
> >>>        x86_apicid_from_cpu_idx_epyc() {
> >>>            topoids = x86_apicid_from_cpu_idx() {
> >>>                           return ms->possible_cpus->cpus[i].props
> >>>                       }
> >>>            if (ms->possible_cpus->cpus[i].props.has_node_id)
> >>>                topoids.node_id = ms->possible_cpus->cpus[i].props.node_id
> >>>            else
> >>>                error_fatal("EPYC requires use of -numa to define topology if using
> >>> more than 1 socket")
> >>>        }
> >>>
> >>> that way QEMU makes up only node[0] by enabling autonuma or whatever
> >>> user privided explicitly is encoded into APIC ID and it will be always
> consistent
> >>> with cpu
> >>> *-id properties in possible_cpus and SRAT table QEMU generates.
> >>>
> >>> as cleanup we can get rid of back and forth conversion [5] and use cpu_slot
> to
> >>> set
> >>> the same ids.

Still working on this idea.  Hopefully will send the patches soon if I can
get it working.

> >>>
> >>> Also maybe we should have a check that node-id is the same within socket
> in
> >>> case of EPYC
> >>> if it's guarantied that EPYC won't support multiple nodes per socket.
> >>>
> >>> hope it makes at least some sense.
> >>
> >> To make things clear, in case of autonuma we don't have to worry about
> >> node_id. We just have to set it topo_ids.node_id to 0 in pc_cpu_pre_plug,
> >> Everything will work as expected. This will solve our current problem of
> >> uninitialized variable.
> >
> > I'm proposing to enable autonuma, which in its turn will assign all CPUs to
> > node-id=0 in possible_cpus. And once this information is in possible_cpus,
> > numa_cpu_pre_plug() should take care of setting correct node-id on CPU for
> > the case of initial CPUs (node_id == CPU_UNSET_NUMA_NODE_ID), and in
> case
> > of hotplug numa_cpu_pre_plug() will complaing if user suppled nonsense on
> > with device_add.
> >
> >> Problem here is, when user has configured the numa, then setting the
> >> topo_ids.node_id to 0 might not work because it might create duplicate
> >> apicids and device_add will be rejected.  As per the comments in
> > I don't see how such duplicate could be made, even if all CPUs have 0
> > node-id, there are pkg_id/core_id/thread_id wich are encoded in APIC ID,
> > where pkg_id is unique across machine (at least in QEMU), so I don't see
> > how duplicate is possible.
> >
> >> numa_cpu_pre_plug, this is already broken. Look at the comments below.
> >> Looks like node_id cannot be passed down.
> >> ============================================
> >> if (node_id == CPU_UNSET_NUMA_NODE_ID) {
> >>         /* due to bug in libvirt, it doesn't pass node-id from props on
> >>          * device_add as expected, so we have to fix it up here */
> >>         if (slot->props.has_node_id) {
> >>             object_property_set_int(OBJECT(dev), slot->props.node_id,
> >>                                     "node-id", errp);
> >>         }
> >           else if (epyc)
> >              error("incomplete EPYC topology use -numa cpu,node-id=some-
> id,socket-id=%d  to configure numa node for socket",
> >                     cpu_socket_id)
> >
> >>     } else if (node_id != slot->props.node_id) {
> >> ============================================
> >>
> >> I was trying to solve this problem setting the node_id correctly for EPYC
> >> at least.  If you think, this is not important we can ignore it (by
> >> setting topo_ids.node_id to 0) and move forward.  I don't see the need for
> >> changing other topology specific code as we have already made very generic.
> >
> > node-id - can be passed down (problem was that libvird didn't do it back then
> > for -device/device_add, hence above hack).
> >
> > But that's not a problem, the problem is that x86_apicid_from_cpu_idx_epyc()
> makes
> > up node-id on its own, which is not big deal in case of autonuma since they
> happen
> > to match and there aren't any ambiguity, but with more numa nodes, numa
> config should
> > be defined by user explicitly and current code may end up with incoherent
> config,
> > where some parts of QEMU think CPU has one node-id while APIC ID is
> encoded with another.
> >
> > So after some pondering on a subject, to make sure it will look correct from all
> angles,
> > we need to:
> >
> >  1: use single source for topo ids, i.e. pull user provided node-id from
> possible_cpus
> >     for both cpu.node-id property and for APIC ID. Hence my suggestion to
> change
> >     x86_apicid_from_cpu_idx_epyc() as described above.
> >
> >  2: verify that user provided id's make sense in EPYC case. (pre_plug)
> 
> Basically, you are saying to setup the props.has_node_id and props.node_id
> while buidling the topology. Make sure to use the numa information if the
> user has provided else build the topology as per EPYC topology model.
> Right? I will have to think thru this little bit. Will try to send the
> draft patch tomarrow.
> 
> 
> >
> >>>> Let me know if you see any other option.
> >>>>
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>>>          cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info,
> >>>>> &topo_ids);
> >>>>>>>>      }
> >>>>>>>>
> >>>>>>>> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> >>>>>>>> index 07239f95f4..ee4deb84c4 100644
> >>>>>>>> --- a/include/hw/i386/topology.h
> >>>>>>>> +++ b/include/hw/i386/topology.h
> >>>>>>>> @@ -140,6 +140,17 @@ static inline unsigned
> >>>>>>> apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)
> >>>>>>>>             apicid_node_width_epyc(topo_info);
> >>>>>>>>  }
> >>>>>>>>
> >>>>>>>> +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo
> >>>>> *topo_info,
> >>>>>>>> +                                            const X86CPUTopoIDs *topo_ids)
> >>>>>>>> +{
> >>>>>>>> +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> >>>>>>>> +    unsigned cores_per_node = DIV_ROUND_UP((topo_info-
> >>>> dies_per_pkg
> >>>>> *
> >>>>>>>> +                                            topo_info->cores_per_die *
> >>>>>>>> +                                            topo_info->threads_per_core),
> >>>>>>>> +                                            nr_nodes);
> >>>>>>>> +
> >>>>>>>> +    return (topo_ids->core_id / cores_per_node) % nr_nodes;
> >>>>>>> what if nr_nodes == 0?
> >>>>>>>
> >>>>>>>> +}
> >>>>>>>>  /*
> >>>>>>>>   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
> >>>>>>>>   *
> >>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>
> >


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-06-30  2:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-08 20:18 [PATCH 0/2] Fix couple of issues with AMD topology Babu Moger
2020-06-08 20:18 ` [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU hotplug Babu Moger
2020-06-16 10:59   ` Igor Mammedov
2020-06-16 17:18     ` Babu Moger
2020-06-24 13:47       ` Igor Mammedov
2020-06-24 17:35         ` Babu Moger
2020-06-25 15:18           ` Igor Mammedov
2020-06-25 16:41             ` Babu Moger
2020-06-25 18:32               ` Igor Mammedov
2020-06-25 22:55                 ` Babu Moger
2020-06-26 20:56                   ` Babu Moger
2020-06-30  2:48                   ` Babu Moger
2020-06-08 20:18 ` [PATCH 2/2] i386: Simplify CPUID_8000_001E for AMD Babu Moger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.