qemu-devel.nongnu.org archive mirror
From: Igor Mammedov <imammedo@redhat.com>
To: Babu Moger <babu.moger@amd.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"ehabkost@redhat.com" <ehabkost@redhat.com>
Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids from CpuInstanceProperties
Date: Mon, 27 Jul 2020 19:14:16 +0200	[thread overview]
Message-ID: <20200727191416.2bf6e34a@redhat.com> (raw)
In-Reply-To: <5b32c961-4fd0-8b8c-4475-eafff2ae48a9@amd.com>

On Mon, 27 Jul 2020 10:49:08 -0500
Babu Moger <babu.moger@amd.com> wrote:

> > -----Original Message-----
> > From: Igor Mammedov <imammedo@redhat.com>
> > Sent: Friday, July 24, 2020 12:05 PM
> > To: Moger, Babu <Babu.Moger@amd.com>
> > Cc: qemu-devel@nongnu.org; pbonzini@redhat.com; ehabkost@redhat.com;
> > rth@twiddle.net
> > Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids from
> > CpuInstanceProperties
> > 
> > On Mon, 13 Jul 2020 14:30:29 -0500
> > Babu Moger <babu.moger@amd.com> wrote:
> >   
> > > > -----Original Message-----
> > > > From: Igor Mammedov <imammedo@redhat.com>
> > > > Sent: Monday, July 13, 2020 12:32 PM
> > > > To: Moger, Babu <Babu.Moger@amd.com>
> > > > Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com; qemu-
> > > > devel@nongnu.org
> > > > Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids from
> > > > CpuInstanceProperties
> > > >
> > > > On Mon, 13 Jul 2020 11:43:33 -0500
> > > > Babu Moger <babu.moger@amd.com> wrote:
> > > >  
> > > > > On 7/13/20 11:17 AM, Igor Mammedov wrote:  
> > > > > > On Mon, 13 Jul 2020 10:02:22 -0500 Babu Moger
> > > > > > <babu.moger@amd.com> wrote:
> > > > > >  
> > > > > >>> -----Original Message-----
> > > > > >>> From: Igor Mammedov <imammedo@redhat.com>
> > > > > >>> Sent: Monday, July 13, 2020 4:08 AM
> > > > > >>> To: Moger, Babu <Babu.Moger@amd.com>
> > > > > >>> Cc: pbonzini@redhat.com; rth@twiddle.net; ehabkost@redhat.com;
> > > > > >>> qemu- devel@nongnu.org
> > > > > >>> Subject: Re: [PATCH v2 1/3] hw/i386: Initialize topo_ids from
> > > > > >>> CpuInstanceProperties  
> > > > > > [...]  
> > > > > >>>> +
> > > > > >>>> +/*
> > > > > >>>> + * Initialize topo_ids from CpuInstanceProperties
> > > > > >>>> + * node_id in CpuInstanceProperties(or in CPU device) is a
> > > > > >>>> +sequential
> > > > > >>>> + * number, but while building the topology  
> > > > > >>>  
> > > > > >>>> we need to separate it for
> > > > > >>>> + * each socket(mod nodes_per_pkg).  
> > > > > >>> could you clarify a bit more on why this is necessary?  
> > > > > >>
> > > > > >> If you have two sockets and 4 numa nodes, node_id in
> > > > > >> CpuInstanceProperties will be numbered sequentially as 0, 1, 2, 3.
> > > > > >> But in EPYC topology, it will be 0, 1, 0, 1 (basically mod
> > > > > >> number of nodes per socket).  
> > > > > >
> > > > > > I'm confused, let's suppose we have 2 EPYC sockets with 2 nodes
> > > > > > per socket, so the APIC ID would be composed like:
> > > > > >
> > > > > >  1st socket
> > > > > >    pkg_id(0) | node_id(0)
> > > > > >    pkg_id(0) | node_id(1)
> > > > > >
> > > > > >  2nd socket
> > > > > >    pkg_id(1) | node_id(0)
> > > > > >    pkg_id(1) | node_id(1)
> > > > > >
> > > > > > if that's the case, then EPYC's node_id here doesn't look like a
> > > > > > NUMA node in the sense it's usually used (above config would
> > > > > > have 4 different memory controllers => 4 conventional NUMA nodes).  
> > > > >
> > > > > EPYC model uses a combination of socket id and node id to identify
> > > > > the numa nodes. So, it internally uses all the information.  
> > > >
> > > > well with above values, EPYC's node_id doesn't look like it's
> > > > specifying a machine numa node, but rather a node index within
> > > > single socket. In which case, it doesn't make much sense calling it
> > > > NUMA node_id, it's rather some index within a socket. (it starts
> > > > looking like terminology is all mixed up)
> > > >
> > > > If you have access to a multi-socket EPYC machine, can you dump and
> > > > post here its apic ids, pls?  
> > >
> > > Here is the output from my EPYC machine with 2 sockets and 8 nodes in
> > > total (SMT disabled). The cpus 0-31 are in socket 0 and cpus 32-63 in
> > > socket 1.
> > >
> > > # lscpu
> > > Architecture:        x86_64
> > > CPU op-mode(s):      32-bit, 64-bit
> > > Byte Order:          Little Endian
> > > CPU(s):              64
> > > On-line CPU(s) list: 0-63
> > > Thread(s) per core:  1
> > > Core(s) per socket:  32
> > > Socket(s):           2
> > > NUMA node(s):        8
> > > Vendor ID:           AuthenticAMD
> > > CPU family:          23
> > > Model:               1
> > > Model name:          AMD Eng Sample: 1S1901A4VIHF5_30/19_N
> > > Stepping:            2
> > > CPU MHz:             2379.233
> > > CPU max MHz:         1900.0000
> > > CPU min MHz:         1200.0000
> > > BogoMIPS:            3792.81
> > > Virtualization:      AMD-V
> > > L1d cache:           32K
> > > L1i cache:           64K
> > > L2 cache:            512K
> > > L3 cache:            8192K
> > > NUMA node0 CPU(s):   0-7
> > > NUMA node1 CPU(s):   8-15
> > > NUMA node2 CPU(s):   16-23
> > > NUMA node3 CPU(s):   24-31
> > > NUMA node4 CPU(s):   32-39
> > > NUMA node5 CPU(s):   40-47
> > > NUMA node6 CPU(s):   48-55
> > > NUMA node7 CPU(s):   56-63
> > >
> > > Here is the output of #cpuid  -l 0x8000001e  -r  
> > 
> > 
> > (1)  
> > > You may want to refer to
> > > https://www.amd.com/system/files/TechDocs/54945_3.03_ppr_ZP_B2_pub.zip
> > > (section 2.1.12.2.1.3 ApicId Enumeration Requirements).
> > > Note that this is a general guideline. We tried to generalize it in qemu
> > > as much as possible. It is a bit complex.  
> > 
> > 
> >   
> > > CPU 0:
> > >    0x8000001e 0x00: eax=0x00000000 ebx=0x00000100 ecx=0x00000300
> > > edx=0x00000000  
> > [...]  
> > > CPU 63:
> > >    0x8000001e 0x00: eax=0x0000007e ebx=0x0000011f ecx=0x00000307
> > > edx=0x00000000
> > >  
> > > >  
> > > > >  
> > > > > >
> > > > > > I wonder if the linux guest actually uses the node_id encoded in
> > > > > > the apic id for configuring/checking numa structures, or if it just
> > > > > > uses whatever the ACPI SRAT table provided.
> > > > > >  
> > > > > >>>> + */
> > > > > >>>> +static inline void x86_init_topo_ids(X86CPUTopoInfo *topo_info,
> > > > > >>>> +                                     CpuInstanceProperties props,
> > > > > >>>> +                                     X86CPUTopoIDs *topo_ids) {
> > > > > >>>> +    topo_ids->smt_id = props.has_thread_id ? props.thread_id : 0;
> > > > > >>>> +    topo_ids->core_id = props.has_core_id ? props.core_id : 0;
> > > > > >>>> +    topo_ids->die_id = props.has_die_id ? props.die_id : 0;
> > > > > >>>> +    topo_ids->node_id = props.has_node_id ?
> > > > > >>>> +                        props.node_id %
> > > > > >>>> +MAX(topo_info->nodes_per_pkg, 1) : 0;  
> > 
> > It looks like I was wrong pushing the system-wide NUMA node-id into the APIC ID
> > (the chosen naming is a bit confusing); per [1] mentioned above, EPYC's node-id is:
> > 
> > * ApicId[6]   = Socket ID.
> > * ApicId[5:4] = Node ID.
> > * ApicId[3]   = Logical CCX L3 complex ID
> > * ApicId[2:0] = (SMT) ? {LogicalCoreID[1:0],ThreadId} : {1'b0,LogicalCoreID[1:0]}
> > 
> > which can hold only the values 0-3, and is defined as:
> > 
> > "A node, is an integrated circuit device that includes one to 8 cores (one or two
> > Core Complexes)."
> > 
> > The spec also mentions it indirectly as die-id if one looks at CPUID_Fn8000001E_EBX
> > [Core Identifiers] (Core::X86::Cpuid::CoreId) ...
> >   CoreId = ({2'b0, DieId[1:0], LogicalComplexId[0], LogicalThreadId[2:0]} >> SMT)
> > 
> > and in
> > (2)
> > CPUID_Fn8000001E_ECX [Node Identifiers] (Core::X86::Cpuid::NodeId) ...
> >   {5'b00000,1'b[SOCKET_ID],2'b[DIE_ID]}
> > 
> > The question is why we did not reuse topo_ids->die_id instead of adding the
> > confusing topo_ids->node_id in the first place?  
> 
> Initially, I thought about it. But Intel uses die_id differently than AMD,
> so I did not want to complicate things.
> If we take that route then we need to re-arrange the numa code, as we need
> the numa information to calculate the die id. So, I did not want to mix up
> things.
> 
> > 
> > Also, looking at the APIC ID and SRAT table provided here, CPUID_Fn8000001E_ECX
> > corresponds to the NUMA node id (i.e. what -numa in QEMU is used for) and the
> > Node ID embedded into ApicId[5:4] is basically the die-id.
> > 
> > The difference between the die-id implemented in QEMU and EPYC's die id
> > (topo_ids->node_id) is that the former doesn't require a numa config (maybe it
> > should, but that ship's already sailed) and gets the number of dies from the
> > '-smp dies=X' CLI option, while for EPYC we calculate topo_ids->node_id
> > implicitly from the number of numa nodes and sockets, which implicitly requires
> > that the machine 'must' be configured with -numa options.
> > 
> > Maybe we should drop this implicit calculation along with topo_ids->node_id
> > and reuse '-smp dies=X', plus extra checks for EPYC to ask for -numa if there
> > is more than 1 die; and if we need to be really strict, add checks for the
> > limit of dies/cores within socket/die according to spec[2] so the encoded
> > APIC ID and CPUID_8000001E match the spec.  
> 
> There will be complications when the user configures both die_id and
> numa_id. It will complicate things further. I will have to look closely at
> the code to see if it is feasible.

it's worth a try.
Conceptually die_id in intel/amd is the same. Most likely intel has a dedicated
memory controller on each die so it still should form a NUMA node. But that aspect
probably was ignored while implementing it in QEMU, so the onus of configuring QEMU
right is on the user's shoulders (there are no checks whatsoever that a cpu
belonging to a specific die is on the right NUMA node).

What AMD has implemented on top of that in CPU hw is to expose the NUMA node id in
CPUID_8000001E. I don't know why it was done, as usually it's the ACPI tables that
describe relations between nodes, so for the OS this info is almost useless (I'd
guess it's faster to use CPUID instead of fetching a per-cpu variable, but that's
pretty much it from the OS point of view).

> 
> > 
> > 
> >   
> > > > > >>>> +    topo_ids->pkg_id = props.has_socket_id ? props.socket_id :
> > > > > >>>> +0; }
> > > > > >>>>  /*
> > > > > >>>>   * Make APIC ID for the CPU 'cpu_index'
> > > > > >>>>   *
> > > > > >>>>  
> > > > > >>  
> > > > > >  
> > > > >  
> > >
> > >  
> 



  reply	other threads:[~2020-07-27 17:15 UTC|newest]

Thread overview: 23+ messages
2020-07-01 17:30 [PATCH v2 0/3] Fix couple of issues with AMD topology Babu Moger
2020-07-01 17:31 ` [PATCH v2 1/3] hw/i386: Initialize topo_ids from CpuInstanceProperties Babu Moger
2020-07-13  9:08   ` Igor Mammedov
2020-07-13 15:02     ` Babu Moger
2020-07-13 16:17       ` Igor Mammedov
2020-07-13 16:43         ` Babu Moger
2020-07-13 17:32           ` Igor Mammedov
2020-07-13 19:30             ` Babu Moger
2020-07-14 16:41               ` Igor Mammedov
2020-07-14 17:26                 ` Babu Moger
2020-07-14 18:26                   ` Igor Mammedov
2020-07-24 17:05               ` Igor Mammedov
2020-07-27 15:49                 ` Babu Moger
2020-07-27 17:14                   ` Igor Mammedov [this message]
2020-07-27 23:59                     ` Babu Moger
2020-07-29 14:12                       ` Igor Mammedov
2020-07-29 21:22                         ` Babu Moger
2020-07-30 11:27                           ` Igor Mammedov
2020-07-01 17:31 ` [PATCH v2 2/3] hw/i386: Build apic_id " Babu Moger
2020-07-13  9:15   ` Igor Mammedov
2020-07-13 15:00     ` Babu Moger
2020-07-01 17:31 ` [PATCH v2 3/3] hw/386: Fix uninitialized memory with -device and CPU hotplug Babu Moger
2020-07-27 16:36   ` Igor Mammedov
