All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Babu Moger <babu.moger@amd.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>,
	"mst@redhat.com" <mst@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"imammedo@redhat.com" <imammedo@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>
Subject: Re: [PATCH v7 07/13] hw/386: Add EPYC mode topology decoding functions
Date: Mon, 15 Jun 2020 12:44:10 +0100	[thread overview]
Message-ID: <20200615114410.GA2884@work-vm> (raw)
In-Reply-To: <1a3bdd75-e092-1e7d-c2da-2459dd987e4b@amd.com>

* Babu Moger (babu.moger@amd.com) wrote:
> 
> 
> > -----Original Message-----
> > From: Eduardo Habkost <ehabkost@redhat.com>
> > Sent: Wednesday, June 3, 2020 10:46 AM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: marcel.apfelbaum@gmail.com; pbonzini@redhat.com; rth@twiddle.net;
> > mst@redhat.com; imammedo@redhat.com; qemu-devel@nongnu.org
> > Subject: Re: [PATCH v7 07/13] hw/386: Add EPYC mode topology decoding
> > functions
> > 
> > On Wed, Jun 03, 2020 at 10:38:46AM -0500, Babu Moger wrote:
> > >
> > >
> > > On 6/3/20 10:31 AM, Eduardo Habkost wrote:
> > > > On Wed, Jun 03, 2020 at 10:22:10AM -0500, Babu Moger wrote:
> > > >>
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: Eduardo Habkost <ehabkost@redhat.com>
> > > >>> Sent: Tuesday, June 2, 2020 6:01 PM
> > > >>> To: Moger, Babu <Babu.Moger@amd.com>
> > > >>> Cc: marcel.apfelbaum@gmail.com; pbonzini@redhat.com;
> > rth@twiddle.net;
> > > >>> mst@redhat.com; imammedo@redhat.com; qemu-devel@nongnu.org
> > > >>> Subject: Re: [PATCH v7 07/13] hw/386: Add EPYC mode topology decoding
> > > >>> functions
> > > >>>
> > > >>> On Tue, Jun 02, 2020 at 04:59:19PM -0500, Babu Moger wrote:
> > > >>>>
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: Eduardo Habkost <ehabkost@redhat.com>
> > > >>>>> Sent: Tuesday, June 2, 2020 12:19 PM
> > > >>>>> To: Moger, Babu <Babu.Moger@amd.com>
> > > >>>>> Cc: marcel.apfelbaum@gmail.com; pbonzini@redhat.com;
> > rth@twiddle.net;
> > > >>>>> mst@redhat.com; imammedo@redhat.com; qemu-devel@nongnu.org
> > > >>>>> Subject: Re: [PATCH v7 07/13] hw/386: Add EPYC mode topology
> > decoding
> > > >>>>> functions
> > > >>>>>
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> It looks like this series breaks -device and CPU hotplug:
> > > >>>>>
> > > >>>>> On Wed, Mar 11, 2020 at 05:53:34PM -0500, Babu Moger wrote:
> > > >>>>>> These functions add support for building EPYC mode topology given
> > the
> > > >>> smp
> > > >>>>>> details like numa nodes, cores, threads and sockets.
> > > >>>>>>
> > > >>>>>> The new apic id decoding is mostly similar to current apic id decoding
> > > >>>>>> except that it adds a new field node_id when numa configured.
> > Removes
> > > >>> all
> > > >>>>>> the hardcoded values. Subsequent patches will use these functions to
> > build
> > > >>>>>> the topology.
> > > >>>>>>
> > > >>>>>> Following functions are added.
> > > >>>>>> apicid_llc_width_epyc
> > > >>>>>> apicid_llc_offset_epyc
> > > >>>>>> apicid_pkg_offset_epyc
> > > >>>>>> apicid_from_topo_ids_epyc
> > > >>>>>> x86_topo_ids_from_idx_epyc
> > > >>>>>> x86_topo_ids_from_apicid_epyc
> > > >>>>>> x86_apicid_from_cpu_idx_epyc
> > > >>>>>>
> > > >>>>>> The topology details are available in Processor Programming
> > Reference
> > > >>> (PPR)
> > > >>>>>> for AMD Family 17h Model 01h, Revision B1 Processors. The revision
> > > >>> guides
> > > >>>>> are
> > > >>>>>> available from the bugzilla Link below.
> > > >>>>>> Link:
> > > >>>>>
> > > >>>
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.
> > > >>>>>
> > > >>>
> > kernel.org%2Fshow_bug.cgi%3Fid%3D206537&amp;data=02%7C01%7Cbabu.m
> > > >>>>>
> > > >>>
> > oger%40amd.com%7C3487f40d37df4d59097d08d807190248%7C3dd8961fe488
> > > >>>>>
> > > >>>
> > 4e608e11a82d994e183d%7C0%7C0%7C637267151289763739&amp;sdata=wE0
> > > >>>>> ukXIVh0l5eNQWsv6VDE9UZEVJmisofaW192gcZAI%3D&amp;reserved=0
> > > >>>>>>
> > > >>>>>> Signed-off-by: Babu Moger <babu.moger@amd.com>
> > > >>>>>> Acked-by: Igor Mammedov <imammedo@redhat.com>
> > > >>>>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > >>>>>> ---
> > > >>>>> [...]
> > > >>>>>>  typedef struct X86CPUTopoIDs {
> > > >>>>>>      unsigned pkg_id;
> > > >>>>>> +    unsigned node_id;
> > > >>>>>
> > > >>>>> You have added a new field here.
> > > >>>>>
> > > >>>>>>      unsigned die_id;
> > > >>>>>>      unsigned core_id;
> > > >>>>>>      unsigned smt_id;
> > > >>>>> [...]
> > > >>>>>> +static inline apic_id_t
> > > >>>>>> +x86_apicid_from_topo_ids_epyc(X86CPUTopoInfo *topo_info,
> > > >>>>>> +                              const X86CPUTopoIDs *topo_ids)
> > > >>>>>> +{
> > > >>>>>> +    return (topo_ids->pkg_id  << apicid_pkg_offset_epyc(topo_info)) |
> > > >>>>>> +           (topo_ids->node_id << apicid_node_offset_epyc(topo_info)) |
> > > >>>>>
> > > >>>>> You are using the new field here.
> > > >>>>>
> > > >>>>>> +           (topo_ids->die_id  << apicid_die_offset(topo_info)) |
> > > >>>>>> +           (topo_ids->core_id << apicid_core_offset(topo_info)) |
> > > >>>>>> +           topo_ids->smt_id;
> > > >>>>>> +}
> > > >>>>>
> > > >>>>> But you are not initializing node_id in one caller of
> > apicid_from_topo_ids():
> > > >>>>>
> > > >>>>> static void pc_cpu_pre_plug(HotplugHandler *hotplug_dev,
> > > >>>>>                             DeviceState *dev, Error **errp)
> > > >>>>> {
> > > >>>>>     [...]
> > > >>>>>     X86CPUTopoIDs topo_ids;
> > > >>>>>     [...]
> > > >>>>>     if (cpu->apic_id == UNASSIGNED_APIC_ID) {
> > > >>>>>         [...]
> > > >>>>>         topo_ids.pkg_id = cpu->socket_id;
> > > >>>>>         topo_ids.die_id = cpu->die_id;
> > > >>>>>         topo_ids.core_id = cpu->core_id;
> > > >>>>>         topo_ids.smt_id = cpu->thread_id;
> > > >>>>>         cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info,
> > &topo_ids);
> > > >>>>>     }
> > > >>>>>     [...]
> > > >>>>> }
> > > >>>>>
> > > >>>>> Result: -device is broken when using -cpu EPYC:
> > > >>>>>
> > > >>>>>   $ qemu-system-x86_64 -machine q35,accel=kvm -smp
> > > >>>>> 1,maxcpus=2,cores=1,threads=1,sockets=2 -cpu EPYC -device EPYC-
> > x86_64-
> > > >>>>> cpu,core-id=0,socket-id=1,thread-id=0
> > > >>>
> > > >>> [1]
> > > >>>
> > > >>>>>   qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket-
> > > >>> id=1,thread-
> > > >>>>> id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: 0] with APIC ID
> > > >>> 21855,
> > > >>>>> valid index range 0:1
> > > >>>>>
> > > >>>>> This happens because APIC ID is calculated using uninitialized
> > > >>>>> memory.
> > > >>>> This patch should initialize the node_id. But I am not sure how to
> > > >>>> reproduce the bug. Can you please send me the full command line to
> > > >>>> reproduce the problem. Also test different options.
> > > >>>
> > > >>> The full command line is above[1].
> > > >>
> > > >> I just picked up the latest tree from
> > > >> git clone
> > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.qem
> > u.org%2Fgit%2Fqemu.git&amp;data=02%7C01%7Cbabu.moger%40amd.com%7
> > C4b409b77ff2c4c4ed90608d807d53570%7C3dd8961fe4884e608e11a82d994e1
> > 83d%7C0%7C0%7C637267959601716206&amp;sdata=ih6bo9Wp0RbJgRpryzSa2
> > D0v6kr3Zfww%2B1uB%2FNyngk8%3D&amp;reserved=0 qemu
> > > >> Not seeing the problem.
> > > >>
> > > >> ./x86_64-softmmu/qemu-system-x86_64 -machine q35,accel=kvm -smp
> > > >> 1,maxcpus=2,cores=1,threads=1,sockets=2 -cpu EPYC -device
> > > >> EPYC-x86_64-cpu,core-id=0,socket-id=1,thread-id=0
> > > >> VNC server running on ::1:5900
> > > >>
> > > >> It appears to run ok.
> > >
> > > [2]
> > >
> > > >>
> > > >>>
> > > >>>
> > > >>>>
> > > >>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > >>>> index 2128f3d6fe..047b4b9391 100644
> > > >>>> --- a/hw/i386/pc.c
> > > >>>> +++ b/hw/i386/pc.c
> > > >>>> @@ -1539,6 +1539,9 @@ static void pc_cpu_pre_plug(HotplugHandler
> > > >>> *hotplug_dev,
> > > >>>>      if (cpu->apic_id == UNASSIGNED_APIC_ID) {
> > > >>>>          int max_socket = (ms->smp.max_cpus - 1) /
> > > >>>>                                  smp_threads / smp_cores / x86ms->smp_dies;
> > > >>>
> > > >>> So, here's the input you are using to calculate topo_ids.node_id:
> > > >>>
> > > >>>> +        unsigned nr_nodes = MAX(topo_info.nodes_per_pkg, 1);
> > > >>>
> > > >>> When is topo_info.nodes_per_pkg allowed to be 0?
> > > >>
> > > >> It will be zero if numa is not configured. Node_id will be zero for all
> > > >> the cores.
> > > >>
> > > >>>
> > > >>>> +        unsigned cores_per_node = DIV_ROUND_UP((x86ms->smp_dies *
> > > >>> smp_cores *
> > > >>>> +                                                smp_threads), nr_nodes);
> > > >>>
> > > >>> x86ms->smp_dies should be available at topo_info.dies_per_pkg,
> > > >>> smp_cores should available at topo_info.cores_per_die,
> > > >>> smp_threads should be available at topo_info.threads_per_core,
> > > >>> nr_nodes should be available at topo_info.nodes_per_pkg.
> > > >>>
> > > >>>>
> > > >>>>          /*
> > > >>>>           * die-id was optional in QEMU 4.0 and older, so keep it optional
> > > >>>> @@ -1585,6 +1588,7 @@ static void pc_cpu_pre_plug(HotplugHandler
> > > >>> *hotplug_dev,
> > > >>>>          topo_ids.die_id = cpu->die_id;
> > > >>>>          topo_ids.core_id = cpu->core_id;
> > > >>>>          topo_ids.smt_id = cpu->thread_id;
> > > >>>> +        topo_ids.node_id = (cpu->core_id / cores_per_node) % nr_nodes;
> > > >>>
> > > >>> apicid_from_topo_ids() have access to topo_info and topo_ids, If
> > > >>> all the information you need to calculate node_id is already
> > > >>> available inside topo_info + topo_ids, we could be calculating it
> > > >>> inside apicid_from_topo_ids().  Why don't we do it?
> > > >>>
> > > >>> Also, is topo_ids.core_id really allowed to be larger than
> > > >>> cores_per_node when calling apicid_from_topo_ids()?
> > > >>
> > > >> Yes. It is. If we have two numa nodes and 8 cores. Then cores_per_node is
> > > >> 4. Nr_nodes =2. Yes. Core_id can be larger than cores_per_node.
> > > >
> > > > I assumed core_id identified the core inside a die, and the range
> > > > would be [0, cores_per_die).  This seems to be what
> > > > apicid_from_topo_ids_epyc() expects.
> > > >
> > > > We need clearer documentation on the semantics of each *_id
> > > > field, to avoid confusion.
> > >
> > > Ok. Sure. I will add that. Can you please clarify on how to repro the
> > > issue. Marked the question at [2].
> > 
> > As it is usage of uninitialized memory, behavior is undefined and
> > not guaranteed to crash.  My QEMU binary was built with
> > --enable-debug, maybe it makes it easier to reproduce.
> 
> This patch works fine. Tested few combination. Will send the formal
> version if patch looks good.
> 
> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
> index 07239f95f4..cb28f0a57c 100644
> --- a/include/hw/i386/topology.h
> +++ b/include/hw/i386/topology.h
> @@ -140,6 +140,17 @@ static inline unsigned
> apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info)
>             apicid_node_width_epyc(topo_info);
>  }
> 
> +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo *topo_info,
> +                                            const X86CPUTopoIDs *topo_ids)
> +{
> +    unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1);
> +    unsigned cores_per_node = DIV_ROUND_UP((topo_info->dies_per_pkg *
> +                                            topo_info->cores_per_die *
> +                                            topo_info->threads_per_core),
> +                                            nr_nodes);
> +
> +    return ((topo_ids->core_id / cores_per_node) % nr_nodes);
> +}
>  /*
>   * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
>   *
> @@ -149,8 +160,11 @@ static inline apic_id_t
>  x86_apicid_from_topo_ids_epyc(X86CPUTopoInfo *topo_info,
>                                const X86CPUTopoIDs *topo_ids)
>  {
> +    /* node_id can be uninitialized. Initialize here */
> +    unsigned node_id = x86_node_id_for_epyc(topo_info, topo_ids);
> +
>      return (topo_ids->pkg_id  << apicid_pkg_offset_epyc(topo_info)) |
> -           (topo_ids->node_id << apicid_node_offset_epyc(topo_info)) |
> +           (node_id << apicid_node_offset_epyc(topo_info)) |
>             (topo_ids->die_id  << apicid_die_offset(topo_info)) |
>             (topo_ids->core_id << apicid_core_offset(topo_info)) |
>             topo_ids->smt_id;
> 

What's the status of this?  It seems to fix a hotplug case our QE hit
(RH bz 1828750)

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



  reply	other threads:[~2020-06-15 11:45 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-11 22:52 [PATCH v7 00/13] APIC ID fixes for AMD EPYC CPU model Babu Moger
2020-03-11 22:52 ` [PATCH v7 01/13] hw/i386: Introduce X86CPUTopoInfo to contain topology info Babu Moger
2020-03-12 11:11   ` Igor Mammedov
2020-03-11 22:52 ` [PATCH v7 02/13] hw/i386: Consolidate topology functions Babu Moger
2020-03-11 22:53 ` [PATCH v7 03/13] machine: Add SMP Sockets in CpuTopology Babu Moger
2020-03-11 22:53 ` [PATCH v7 04/13] hw/i386: Remove unnecessary initialization in x86_cpu_new Babu Moger
2020-03-11 22:53 ` [PATCH v7 05/13] hw/i386: Update structures to save the number of nodes per package Babu Moger
2020-03-11 22:53 ` [PATCH v7 06/13] hw/i386: Rename apicid_from_topo_ids to x86_apicid_from_topo_ids Babu Moger
2020-03-11 22:53 ` [PATCH v7 07/13] hw/386: Add EPYC mode topology decoding functions Babu Moger
2020-03-12 12:39   ` Igor Mammedov
2020-03-12 13:44     ` Babu Moger
2020-06-02 17:18   ` Eduardo Habkost
2020-06-02 17:27     ` Babu Moger
2020-06-02 21:59     ` Babu Moger
2020-06-02 23:01       ` Eduardo Habkost
2020-06-03  0:07         ` Babu Moger
2020-06-03 15:22         ` Babu Moger
2020-06-03 15:31           ` Eduardo Habkost
2020-06-03 15:38             ` Babu Moger
2020-06-03 15:45               ` Eduardo Habkost
2020-06-03 15:49                 ` Babu Moger
2020-06-03 21:49                 ` Babu Moger
2020-06-15 11:44                   ` Dr. David Alan Gilbert [this message]
2020-03-11 22:53 ` [PATCH v7 08/13] target/i386: Cleanup and use the EPYC mode topology functions Babu Moger
2020-03-11 22:53 ` [PATCH v7 09/13] hw/i386: Introduce apicid functions inside X86MachineState Babu Moger
2020-03-11 22:53 ` [PATCH v7 10/13] i386: Introduce use_epyc_apic_id_encoding in X86CPUDefinition Babu Moger
2020-03-12 12:26   ` Igor Mammedov
2020-03-11 22:54 ` [PATCH v7 11/13] hw/i386: Move arch_id decode inside x86_cpus_init Babu Moger
2020-03-12 12:31   ` Igor Mammedov
2020-03-11 22:54 ` [PATCH v7 12/13] target/i386: Enable new apic id encoding for EPYC based cpus models Babu Moger
2020-03-11 22:54 ` [PATCH v7 13/13] i386: Fix pkg_id offset for EPYC cpu models Babu Moger
2020-03-12 16:28 ` [PATCH v7 00/13] APIC ID fixes for AMD EPYC CPU model Babu Moger
2020-03-17 23:22   ` Eduardo Habkost
2020-03-17 23:31     ` Moger, Babu
2020-03-17 23:46     ` Eduardo Habkost
2020-03-18  2:43       ` Moger, Babu
2020-03-18 10:47         ` Igor Mammedov
2020-03-23 15:35           ` Moger, Babu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200615114410.GA2884@work-vm \
    --to=dgilbert@redhat.com \
    --cc=babu.moger@amd.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.