QEMU-Devel Archive on lore.kernel.org
 help / color / Atom feed
From: Babu Moger <babu.moger@amd.com>
To: Eduardo Habkost <ehabkost@redhat.com>
Cc: "mst@redhat.com" <mst@redhat.com>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"imammedo@redhat.com" <imammedo@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"rth@twiddle.net" <rth@twiddle.net>
Subject: Re: [Qemu-devel] [RFC 2 PATCH 07/16] hw/386: Add new epyc mode topology decoding functions
Date: Mon, 2 Dec 2019 14:38:03 -0600
Message-ID: <d0b3badc-69d9-7538-0f47-e206c2123e6f@amd.com> (raw)
In-Reply-To: <20191011032953.GD29387@habkost.net>



On 10/10/19 10:29 PM, Eduardo Habkost wrote:
> On Fri, Sep 06, 2019 at 07:12:25PM +0000, Moger, Babu wrote:
>> These functions add support for building new epyc mode topology
>> given smp details like numa nodes, cores, threads and sockets.
>> Subsequent patches will use these functions to build the topology.
>>
>> The topology details are available in Processor Programming Reference (PPR)
>> for AMD Family 17h Model 01h, Revision B1 Processors.
>> It is available at https://www.amd.com/en/support/tech-docs
>>
>> Signed-off-by: Babu Moger <babu.moger@amd.com>
>> ---
>>  include/hw/i386/topology.h |  174 ++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 174 insertions(+)
>>
>> diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h
>> index 5a61d53f05..6fd4184f07 100644
>> --- a/include/hw/i386/topology.h
>> +++ b/include/hw/i386/topology.h
>> @@ -62,6 +62,22 @@ typedef struct X86CPUTopoInfo {
>>      unsigned nr_threads;
>>  } X86CPUTopoInfo;
>>  
>> +/*
>> + * Definitions used for building CPUID Leaf 0x8000001D and 0x8000001E
>> + * Please refer to the AMD64 Architecture Programmer’s Manual Volume 3.
>> + * Define the constants to build the cpu topology. Right now, TOPOEXT
>> + * feature is enabled only on EPYC. So, these constants are based on
>> + * EPYC supported configurations. We may need to handle the cases if
>> + * these values change in future.
>> + */
>> +
>> +/* Maximum core complexes in a node */
>> +#define MAX_CCX                  2
> 
> I suggest a more explicit name like MAX_CCX_IN_NODE.
> 
>> +/* Maximum cores in a core complex */
>> +#define MAX_CORES_IN_CCX         4
>> +/* Maximum cores in a node */
>> +#define MAX_CORES_IN_NODE        8
>> +
> 
> Having limits isn't necessarily bad, but if we look at the code
> that use those constants below:
> 
> [...]
>> +/*
>> + * Figure out the number of nodes required to build this config.
>> + * Max cores in a nodes is 8
>> + */
>> +static inline int nodes_in_pkg(X86CPUTopoInfo *topo_info)
>> +{
>> +    /*
>> +     * Create a config with user given (nr_nodes > 1) numa node config,
>> +     * else go with a standard configuration
>> +     */
>> +    if (topo_info->numa_nodes > 1) {
>> +        return DIV_ROUND_UP(topo_info->numa_nodes, topo_info->nr_sockets);
>> +    } else {
>> +        return DIV_ROUND_UP(topo_info->nr_cores, MAX_CORES_IN_NODE);
>> +    }
>> +}
> 
> Here we are trying to choose a default value for the number of
> nodes, but it's better to let the CPU or machine code choose it.
> Otherwise, this will be very painful to maintain if new CPUs with
> different limits appear.  I would just let the number of nodes
> per package to be configurable in the command-line.
> 
>> +
>> +/*
>> + * Decide the number of cores in a core complex with the given nr_cores using
>> + * following set constants MAX_CCX, MAX_CORES_IN_CCX, MAX_CORES_IN_DIE and
>> + * MAX_NODES_PER_SOCKET. Maintain symmetry as much as possible
>> + * L3 cache is shared across all cores in a core complex. So, this will also
>> + * tell us how many cores are sharing the L3 cache.
>> + */
>> +static inline int cores_in_ccx(X86CPUTopoInfo *topo_info)
>> +{
>> +    int nodes;
>> +
>> +    /* Get the number of nodes required to build this config */
>> +    nodes = nodes_in_pkg(topo_info);
>> +
>> +    /*
>> +     * Divide the cores accros all the core complexes
>> +     * Return rounded up value
>> +     */
>> +    return DIV_ROUND_UP(topo_info->nr_cores, nodes * MAX_CCX);
>> +}
> 
> Same here.  This will become painful to maintain if CPUs with a
> different CCX-per-node limit appear.  I suggest just letting the
> number of cores per CCX to be configurable in the command-line.
> The "cores" option is already defined as "cores per die" when
> nr_dies > 1.  We can easily define it to mean "cores per CCX"
> when nr_ccxs > 1.
> 
> If you still want to impose limits to some of those parameters, I
> suggest placing those limits in the CPU model table, not in
> global constants.
> 
> 
>> +
>> +/*
>> + * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
>> + *
>> + * The caller must make sure core_id < nr_cores and smt_id < nr_threads.
>> + */
>> +static inline apic_id_t x86_apicid_from_topo_ids_epyc(X86CPUTopoInfo *topo_info,
>> +                                                  const X86CPUTopoIDs *topo_ids)
>> +{
>> +    unsigned nr_ccxs = MAX_CCX;
> 
> I suggest adding nr_ccxs to X86CPUTopoInfo, instead of making it
> fixed.
> 
>> +    unsigned nr_nodes = nodes_in_pkg(topo_info);
> 
> Same here: isn't it better to have a nr_nodes_per_pkg field in
> X86CPUTopoInfo, and make it configurable in the command-line?
> 
>> +    unsigned nr_cores = MAX_CORES_IN_CCX;
> 
> nr_cores is already in X86CPUTopoInfo, why aren't you using it?
> 
> Similar questions at x86_topo_ids_from_apicid_epyc() [1].
> 
>> +    unsigned nr_threads = topo_info->nr_threads;
>> +
>> +    return (topo_ids->pkg_id  << apicid_pkg_offset_epyc(nr_nodes, nr_ccxs,
>> +                                                        nr_cores, nr_threads)) |
>> +           (topo_ids->node_id  << apicid_node_offset(nr_ccxs, nr_cores,
>> +                                                     nr_threads)) |
>> +           (topo_ids->ccx_id  << apicid_ccx_offset(nr_cores, nr_threads)) |
>> +           (topo_ids->core_id << apicid_core_offset(nr_threads)) |
>> +           topo_ids->smt_id;
>> +}
>> +
>> +static inline void x86_topo_ids_from_idx_epyc(X86CPUTopoInfo *topo_info,
>> +                                              unsigned cpu_index,
>> +                                              X86CPUTopoIDs *topo_ids)
>> +{
>> +    unsigned nr_cores = topo_info->nr_cores;
> 
> Is nr_cores cores-per-ccx?  cores-per-node?  cores-per-pkg?
> 
> 
>> +    unsigned nr_threads = topo_info->nr_threads;
>> +    unsigned core_index = cpu_index / nr_threads % nr_cores;
> 
> It's hard to understand what this variable means.  Is it a unique
> identifier for a core inside the whole system?  Is it unique
> inside a package?  Unique inside a node?  Unique inside a ccx?
> 
> 
>> +    unsigned ccx_cores = cores_in_ccx(topo_info);
>> +
>> +    topo_ids->smt_id = cpu_index % nr_threads;
>> +    topo_ids->core_id = core_index % ccx_cores; /* core id inside the ccx */
>> +    topo_ids->ccx_id = (core_index % (ccx_cores * MAX_CCX)) / ccx_cores;
>> +    topo_ids->node_id = core_index / (ccx_cores * MAX_CCX);
> 
> It looks like we have one extra constraint we must ensure
> elsewhere in the code: we need to make sure the node topology in
> topo->node_id matches the node topology defined using -numa,
> don't we?
> 
> 
>> +    topo_ids->pkg_id = cpu_index / (nr_cores * nr_threads);
> 
> Now, these calculations are over my head.  It's hard to
> understand what "nr_cores" mean, or what "ccx_cores * MAX_CCX" is
> calculating.
> 
> I will try to compare with the existing x86_topo_ids_from_idx()
> function, to see if we can make the hierarchy clearer:
> 
> Existing code is:
> 
>     topo->pkg_id = cpu_index / (nr_dies * nr_cores * nr_threads);
>     topo->die_id = cpu_index / (nr_cores * nr_threads) % nr_dies;
>     topo->core_id = cpu_index / nr_threads % nr_cores;
>     topo->smt_id = cpu_index % nr_threads;
> 
> The code is hierarchical, but not 100% clear.  Let's try to make
> it clearer:
> 
>     unsigned threads_per_core = topo->nr_threads;
>     unsigned   cores_per_die  = topo->nr_cores;
>     unsigned    dies_per_pkg  = topo->nr_dies;
> 
>     unsigned threads_per_die  = threads_per_core * cores_per_die;
>     unsigned threads_per_pkg  = threads_per_die  *  dies_per_pkg;
> 
>     unsigned thread_index = cpu_index;
>     unsigned   core_index = cpu_index / threads_per_core;
>     unsigned    die_index = cpu_index / threads_per_die;
>     unsigned    pkg_index = cpu_index / threads_per_pkg;
> 
>     topo-> smt_id = thread_index % threads_per_core;
>     topo->core_id =   core_index %   cores_per_die;
>     topo-> die_id =    die_index %    dies_per_pkg;
>     topo-> pkg_id =    pkg_index;
> 
> The logic above should be equivalent to the existing
> x86_topo_ids_from_idx() function.
> 
> Can we make it better for the smt/core/ccx/node/pkg case too?
> Let's try:
> 
>     unsigned threads_per_core = topo->nr_threads;
>     unsigned   cores_per_ccx  = topo->nr_cores;
>     unsigned    ccxs_per_node = topo->nr_ccxs;
>     unsigned   nodes_per_pkg  = topo->nodes_per_pkg;
> 
>     unsigned threads_per_ccx  = threads_per_core * cores_per_ccx;
>     unsigned threads_per_node = threads_per_ccx  *  ccxs_per_node;
>     unsigned threads_per_pkg  = threads_per_node * nodes_per_pkg;
> 
>     unsigned thread_index = cpu_index;
>     unsigned   core_index = cpu_index / threads_per_core;
>     unsigned    ccx_index = cpu_index / threads_per_ccx;
>     unsigned   node_index = cpu_index / threads_per_node;
>     unsigned    pkg_index = cpu_index / threads_per_pkg;
> 
>     topo-> smt_id = thread_index % threads_per_core;
>     topo->core_id =   core_index %   cores_per_ccx;
>     topo-> ccx_id =    ccx_index %    ccxs_per_node;
>     topo->node_id =   node_index %   nodes_per_pkg;
>     topo-> pkg_id =    pkg_index;
> 
> The logic above probably isn't equivalent to the code you wrote.
> The point here is that each variable is very clearly defined with
> a "per_*" suffix to avoid confusion, and all parameters are
> coming from X86CPUTopoInfo.
> 
>> +}
>> +
>> +/*
>> + * Calculate thread/core/package IDs for a specific topology,
>> + * based on APIC ID
>> + */
>> +static inline void x86_topo_ids_from_apicid_epyc(apic_id_t apicid,
>> +                                            X86CPUTopoInfo *topo_info,
>> +                                            X86CPUTopoIDs *topo_ids)
>> +{
>> +    unsigned nr_nodes = nodes_in_pkg(topo_info);
>> +    unsigned nr_cores = MAX_CORES_IN_CCX;
>> +    unsigned nr_threads = topo_info->nr_threads;
>> +    unsigned nr_ccxs = MAX_CCX;
> 
> Same questions as in x86_apicid_from_topo_ids_epyc() [1]:
> shouldn't nr_cores, nr_ccxs and nr_nodes come from
> X86CPUTopoInfo?
> 
>> +
>> +    topo_ids->smt_id = apicid &
>> +                   ~(0xFFFFFFFFUL << apicid_smt_width(nr_threads));
>> +
>> +    topo_ids->core_id = (apicid >> apicid_core_offset(nr_threads)) &
>> +                   ~(0xFFFFFFFFUL << apicid_core_width(nr_cores));
>> +
>> +    topo_ids->ccx_id = (apicid >> apicid_ccx_offset(nr_cores, nr_threads)) &
>> +                   ~(0xFFFFFFFFUL << apicid_ccx_width(nr_ccxs));
>> +
>> +    topo_ids->node_id = (apicid >> apicid_node_offset(nr_ccxs, nr_cores,
>> +                                                      nr_threads)) &
>> +                   ~(0xFFFFFFFFUL << apicid_node_width(nr_nodes));
>> +
>> +    topo_ids->pkg_id = apicid >> apicid_pkg_offset_epyc(nr_nodes, nr_ccxs,
>> +                                                        nr_cores, nr_threads);
>> +}
>> +
>> +/*
>> + * Make APIC ID for the CPU 'cpu_index'
>> + *
>> + * 'cpu_index' is a sequential, contiguous ID for the CPU.
>> + */
>> +static inline apic_id_t x86_apicid_from_cpu_idx_epyc(X86CPUTopoInfo *topo_info,
>> +                                                     unsigned cpu_index)
>> +{
>> +    X86CPUTopoIDs topo_ids;
>> +    x86_topo_ids_from_idx_epyc(topo_info, cpu_index, &topo_ids);
>> +    return x86_apicid_from_topo_ids_epyc(topo_info, &topo_ids);
>> +}
> 
> If we make nodes_per_pkg==1 by default, treat nr_ccxs and nr_dies
> as synonyms, and make all parameters come from X86CPUTopoInfo,
> can't we reuse exactly the same topology code for both EPYC and
> non-EPYC?  It would be better than having two separate sets of
> functions.
> 
> I have one additional request: please add unit tests for the
> functions above to test-x86-cpuid.c.  There may be opportunities
> for refactoring this code later, and unit tests will be important
> to make sure we don't break anything.
> 
>> +
>>  /* Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID
>>   *
>>   * The caller must make sure core_id < nr_cores and smt_id < nr_threads.
>>
> 

I have removed all the hardcoded values. All we need to know is the node
id if numa configured. All other fields just works fine as it is now.
Please look at v3 for more information.


  reply index

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-06 19:11 [Qemu-devel] [RFC 2 PATCH 00/16] APIC ID fixes for AMD EPYC CPU models Moger, Babu
2019-09-06 19:11 ` [Qemu-devel] [RFC 2 PATCH 01/16] numa: Split the numa functionality Moger, Babu
2019-10-10  3:25   ` Eduardo Habkost
2019-12-02 20:18     ` Babu Moger
2019-09-06 19:11 ` [Qemu-devel] [RFC 2 PATCH 02/16] hw/i386: Rename X86CPUTopoInfo structure to X86CPUTopoIDs Moger, Babu
2019-10-10  3:26   ` Eduardo Habkost
2019-09-06 19:11 ` [Qemu-devel] [RFC 2 PATCH 03/16] hw/i386: Introduce X86CPUTopoInfo to contain topology info Moger, Babu
2019-10-11  2:29   ` Eduardo Habkost
2019-12-02 20:23     ` Babu Moger
2019-10-11  3:54   ` Eduardo Habkost
2019-12-02 20:25     ` Babu Moger
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 04/16] machine: Add SMP Sockets in CpuTopology Moger, Babu
2019-10-11  2:31   ` Eduardo Habkost
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 05/16] hw/i386: Simplify topology Offset/width Calculation Moger, Babu
2019-10-11  2:32   ` Eduardo Habkost
2019-12-02 20:29     ` Babu Moger
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 06/16] hw/core: Add core complex id in X86CPU topology Moger, Babu
2019-09-06 19:20   ` Eric Blake
2019-09-06 19:34     ` Moger, Babu
2019-09-22 12:48   ` Michael S. Tsirkin
2019-09-23 14:38     ` Moger, Babu
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 07/16] hw/386: Add new epyc mode topology decoding functions Moger, Babu
2019-10-11  3:29   ` Eduardo Habkost
2019-12-02 20:38     ` Babu Moger [this message]
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 08/16] i386: Cleanup and use the new epyc mode topology functions Moger, Babu
2019-10-11  3:51   ` Eduardo Habkost
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 09/16] hw/i386: Introduce initialize_topo_info function Moger, Babu
2019-10-11  3:54   ` Eduardo Habkost
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 10/16] hw/i386: Introduce apicid_from_cpu_idx in PCMachineState Moger, Babu
2019-09-06 19:12 ` [Qemu-devel] [RFC 2 PATCH 11/16] Introduce-topo_ids_from_apicid-handler Moger, Babu
2019-09-06 19:13 ` [Qemu-devel] [RFC 2 PATCH 12/16] hw/i386: Introduce apic_id_from_topo_ids handler in PCMachineState Moger, Babu
2019-09-06 19:13 ` [Qemu-devel] [RFC 2 PATCH 13/16] machine: Add new epyc property " Moger, Babu
2019-10-11  3:59   ` Eduardo Habkost
2019-10-11 16:23     ` Moger, Babu
2019-10-11 16:59     ` Moger, Babu
2019-10-11 19:03       ` Eduardo Habkost
2019-09-06 19:13 ` [Qemu-devel] [RFC 2 PATCH 14/16] hw/i386: Introduce epyc mode function handlers Moger, Babu
2019-09-06 19:13 ` [Qemu-devel] [RFC 2 PATCH 15/16] i386: Fix pkg_id offset for epyc mode Moger, Babu
2019-10-11  4:03   ` Eduardo Habkost
2019-09-06 19:13 ` [Qemu-devel] [RFC 2 PATCH 16/16] hw/core: Fix up the machine_set_cpu_numa_node for epyc Moger, Babu
2019-10-11  4:10   ` Eduardo Habkost
2019-12-02 20:44     ` Babu Moger
2019-09-20 22:44 ` [RFC 2 PATCH 00/16] APIC ID fixes for AMD EPYC CPU models Moger, Babu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d0b3badc-69d9-7538-0f47-e206c2123e6f@amd.com \
    --to=babu.moger@amd.com \
    --cc=armbru@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

QEMU-Devel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/qemu-devel/0 qemu-devel/git/0.git
	git clone --mirror https://lore.kernel.org/qemu-devel/1 qemu-devel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 qemu-devel qemu-devel/ https://lore.kernel.org/qemu-devel \
		qemu-devel@nongnu.org
	public-inbox-index qemu-devel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.nongnu.qemu-devel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git