All of lore.kernel.org
 help / color / mirror / Atom feed
From: Babu Moger <babu.moger@amd.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: ehabkost@redhat.com, mst@redhat.com, armbru@redhat.com,
	qemu-devel@nongnu.org, pbonzini@redhat.com, rth@twiddle.net
Subject: Re: [PATCH v3 00/18] APIC ID fixes for AMD EPYC CPU models
Date: Thu, 6 Feb 2020 09:32:12 -0600	[thread overview]
Message-ID: <63d0f63c-bd84-5e42-0a30-531a54fb8af3@amd.com> (raw)
In-Reply-To: <20200206140839.378ea544@redhat.com>



On 2/6/20 7:08 AM, Igor Mammedov wrote:
> On Wed, 5 Feb 2020 13:07:31 -0600
> Babu Moger <babu.moger@amd.com> wrote:
> 
>> On 2/5/20 10:56 AM, Igor Mammedov wrote:
>>> On Wed, 5 Feb 2020 10:10:06 -0600
>>> Babu Moger <babu.moger@amd.com> wrote:
>>>   
>>>> On 2/5/20 3:38 AM, Igor Mammedov wrote:  
>>>>> On Tue, 4 Feb 2020 13:08:58 -0600
>>>>> Babu Moger <babu.moger@amd.com> wrote:
>>>>>     
>>>>>> On 2/4/20 2:02 AM, Igor Mammedov wrote:    
>>>>>>> On Mon, 3 Feb 2020 13:31:29 -0600
>>>>>>> Babu Moger <babu.moger@amd.com> wrote:
>>>>>>>       
>>>>>>>> On 2/3/20 8:59 AM, Igor Mammedov wrote:      
>>>>>>>>> On Tue, 03 Dec 2019 18:36:54 -0600
>>>>>>>>> Babu Moger <babu.moger@amd.com> wrote:
>>>>>>>>>         
>>>>>>>>>> This series fixes APIC ID encoding problems on AMD EPYC CPUs.
>>>>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.redhat.com%2Fshow_bug.cgi%3Fid%3D1728166&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C76bf8434899b41de094f08d7ab05bdf3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637165913481441118&amp;sdata=34fZQpUjScKbbc35c7ot433HA1Rz03YG6aP1ucyGUsQ%3D&amp;reserved=0
>>>>>>>>>>
>>>>>>>>>> Currently, the APIC ID is decoded based on the sequence
>>>>>>>>>> sockets->dies->cores->threads. This works for most standard AMD and other
>>>>>>>>>> vendors' configurations, but this decoding sequence does not follow that of
>>>>>>>>>> AMD's APIC ID enumeration strictly. In some cases this can cause CPU topology
>>>>>>>>>> inconsistency.  When booting a guest VM, the kernel tries to validate the
>>>>>>>>>> topology, and finds it inconsistent with the enumeration of EPYC cpu models.
>>>>>>>>>>
>>>>>>>>>> To fix the problem we need to build the topology as per the Processor
>>>>>>>>>> Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1
>>>>>>>>>> Processors. It is available at https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2F55570-B1_PUB.zip&amp;data=02%7C01%7Cbabu.moger%40amd.com%7C76bf8434899b41de094f08d7ab05bdf3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637165913481451075&amp;sdata=4YXG%2BrCP5UUXcCQX4Ly8B%2FXdlvZoFrPCgonjy0IwG0U%3D&amp;reserved=0
>>>>>>>>>>
>>>>>>>>>> Here is the text from the PPR.
>>>>>>>>>> Operating systems are expected to use Core::X86::Cpuid::SizeId[ApicIdSize], the
>>>>>>>>>> number of least significant bits in the Initial APIC ID that indicate core ID
>>>>>>>>>> within a processor, in constructing per-core CPUID masks.
>>>>>>>>>> Core::X86::Cpuid::SizeId[ApicIdSize] determines the maximum number of cores
>>>>>>>>>> (MNC) that the processor could theoretically support, not the actual number of
>>>>>>>>>> cores that are actually implemented or enabled on the processor, as indicated
>>>>>>>>>> by Core::X86::Cpuid::SizeId[NC].
>>>>>>>>>> Each Core::X86::Apic::ApicId[ApicId] register is preset as follows:
>>>>>>>>>> • ApicId[6] = Socket ID.
>>>>>>>>>> • ApicId[5:4] = Node ID.
>>>>>>>>>> • ApicId[3] = Logical CCX L3 complex ID
>>>>>>>>>> • ApicId[2:0]= (SMT) ? {LogicalCoreID[1:0],ThreadId} : {1'b0,LogicalCoreID[1:0]}        
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> After checking out all patches and some pondering, used here approach
>>>>>>>>> looks to me too intrusive for the task at hand especially where it
>>>>>>>>> comes to generic code.
>>>>>>>>>
>>>>>>>>> (Ignore till ==== to see suggestion how to simplify without reading
>>>>>>>>> reasoning behind it first)
>>>>>>>>>
>>>>>>>>> Lets look for a way to simplify it a little bit.
>>>>>>>>>
>>>>>>>>> So problem we are trying to solve,
>>>>>>>>>  1: calculate APIC IDs based on cpu type (to e more specific: for EPYC based CPUs)
>>>>>>>>>  2: it depends on knowing total number of numa nodes.
>>>>>>>>>
>>>>>>>>> Externally workflow looks like following:
>>>>>>>>>   1. user provides -smp x,sockets,cores,...,maxcpus
>>>>>>>>>       that's used by possible_cpu_arch_ids() singleton to build list of
>>>>>>>>>       possible CPUs (which is available to user via command 'hotpluggable-cpus')
>>>>>>>>>
>>>>>>>>>       Hook could be called very early and possible_cpus data might be
>>>>>>>>>       not complete. It builds a list of possible CPUs which user could
>>>>>>>>>       modify later.
>>>>>>>>>
>>>>>>>>>   2.1 user uses "-numa cpu,node-id=x,..." or legacy "-numa node,node_id=x,cpus="
>>>>>>>>>       options to assign cpus to nodes, which is one way or another calling
>>>>>>>>>       machine_set_cpu_numa_node(). The later updates 'possible_cpus' list
>>>>>>>>>       with node information. It happens early when total number of nodes
>>>>>>>>>       is not available.
>>>>>>>>>
>>>>>>>>>   2.2 user does not provide explicit node mappings for CPUs.
>>>>>>>>>       QEMU steps in and assigns possible cpus to nodes in machine_numa_finish_cpu_init()
>>>>>>>>>       (using the same machine_set_cpu_numa_node()) right before calling boards
>>>>>>>>>       specific machine init(). At that time total number of nodes is known.
>>>>>>>>>
>>>>>>>>> In 1 -- 2.1 cases, 'arch_id' in 'possible_cpus' list doesn't have to be defined before
>>>>>>>>> boards init() is run.      
>>>>>>
>>>>>> In case of 2.1, we need to have the arch_id already generated. This is
>>>>>> done inside possible_cpu_arch_ids. The arch_id is used by
>>>>>> machine_set_cpu_numa_node to assign the cpus to correct numa node.    
>>>>>
>>>>> I might have missed something but I don't see arch_id itself being used in
>>>>> machine_set_cpu_numa_node(). It only uses props part of possible_cpus    
>>>>
>>>> Before calling machine_set_cpu_numa_node, we call
>>>> cpu_index_to_instance_props -> x86_cpu_index_to_props->
>>>> possible_cpu_arch_ids->x86_possible_cpu_arch_ids.
>>>>
>>>> This sequence sets up the arch_id(in x86_cpu_apic_id_from_index) for all
>>>> the available cpus. Based on the arch_id, it also sets up the props.  
>>>
>>>
>>> x86_possible_cpu_arch_ids()
>>>    arch_id = x86_cpu_apic_id_from_index(x86ms, i)
>>>    x86_topo_ids_from_apicid(arch_id, x86ms->smp_dies, ms->smp.cores,  ms->smp.threads, &topo);
>>>    // assign socket/die/core/thread from topo
>>>
>>> so currently it uses indirect way to convert index in possible_cpus->cpus[]
>>> to socket/die/core/thread ids.
>>> But essentially it take '-smp' options and [0..max_cpus) number as original data
>>> converts it into intermediate apic_id and then reverse engineer it back to
>>> topo info.
>>>
>>> Why not use x86_topo_ids_from_idx() directly to get rid of 'props' dependency on apic_id?  
>>
>> It might work. But this feels like a work-around and delaying the problem
>> for later. Just re-arranging the numa code little bit we can address this.
> 
> The idea behind possible_cpus is to allow users query topo information
> board generates (based on -smp) at configuration time (or late) so users
> could know what -numa cpu,topo_options [and -device foo-cpu,topo_options]
> to use, initializing apic_id on the first access is secondary and I did
> it only because I could do it without additional data.
> 
> But main purpose of possible_cpus is to keep topology information.
> That includes numa node mapping, which should be stored in possible_cpus
> along with the rest of cpu topology.
> 
> Looking [12/18] numa patch, it makes -numa node,cpus legacy option
> to reintroduce data duplication, by storing mapping elsewhere and
> then putting that mapping into possible_cpus at numa complete time
> (that's what I dislike and don't see a valid reason to do so).
> 
> That also won't work if user queries hotpluggable-cpus before that time
> and it also doesn't work if user uses preferable -numa cpu,topo_options
> as both would initialize possible_cpus on the first access.
> 
> So if you need do some board specific post-processing done on topo
> information when it's complete and recalculate apic_id do it at board
> init time like was suggested before (x86_cpu_new() looks like a good
> place to do it).

Ok. Sure. Will start working on it. Thanks


      reply	other threads:[~2020-02-06 15:33 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-04  0:36 [PATCH v3 00/18] APIC ID fixes for AMD EPYC CPU models Babu Moger
2019-12-04  0:37 ` [PATCH v3 01/18] hw/i386: Rename X86CPUTopoInfo structure to X86CPUTopoIDs Babu Moger
2020-02-03 15:08   ` Igor Mammedov
2020-02-03 18:25     ` Babu Moger
2019-12-04  0:37 ` [PATCH v3 02/18] hw/i386: Introduce X86CPUTopoInfo to contain topology info Babu Moger
2020-01-28 15:44   ` Igor Mammedov
2019-12-04  0:37 ` [PATCH v3 03/18] hw/i386: Consolidate topology functions Babu Moger
2020-01-28 15:46   ` Igor Mammedov
2019-12-04  0:37 ` [PATCH v3 04/18] hw/i386: Introduce initialize_topo_info to initialize X86CPUTopoInfo Babu Moger
2020-01-28 15:49   ` Igor Mammedov
2020-01-28 16:42     ` Babu Moger
2019-12-04  0:37 ` [PATCH v3 05/18] machine: Add SMP Sockets in CpuTopology Babu Moger
2019-12-04  0:37 ` [PATCH v3 06/18] hw/core: Add core complex id in X86CPU topology Babu Moger
2020-01-28 16:27   ` Igor Mammedov
2020-01-28 16:44     ` Babu Moger
2020-01-28 16:31   ` Eric Blake
2020-01-28 16:44     ` Babu Moger
2019-12-04  0:37 ` [PATCH v3 07/18] machine: Add a new function init_apicid_fn in MachineClass Babu Moger
2020-01-28 16:29   ` Igor Mammedov
2020-01-28 19:45     ` Babu Moger
2020-01-28 20:12       ` Eduardo Habkost
2020-01-29  9:14       ` Igor Mammedov
2020-01-29 16:17         ` Babu Moger
2020-02-03 15:17           ` Igor Mammedov
2020-02-03 21:49             ` Babu Moger
2020-02-04  7:38               ` Igor Mammedov
2020-01-29 16:32         ` Babu Moger
2020-01-29 16:51           ` Eduardo Habkost
2020-01-29 17:05             ` Babu Moger
2019-12-04  0:37 ` [PATCH v3 08/18] hw/i386: Update structures for nodes_per_pkg Babu Moger
2019-12-04  0:37 ` [PATCH v3 09/18] i386: Add CPUX86Family type in CPUX86State Babu Moger
2019-12-04  0:38 ` [PATCH v3 10/18] hw/386: Add EPYC mode topology decoding functions Babu Moger
2019-12-04  0:38 ` [PATCH v3 11/18] i386: Cleanup and use the EPYC mode topology functions Babu Moger
2019-12-04  0:38 ` [PATCH v3 12/18] numa: Split the numa initialization Babu Moger
2019-12-04  0:38 ` [PATCH v3 13/18] hw/i386: Introduce apicid_from_cpu_idx in PCMachineState Babu Moger
2019-12-04  0:38 ` [PATCH v3 14/18] hw/i386: Introduce topo_ids_from_apicid handler PCMachineState Babu Moger
2019-12-04  0:38 ` [PATCH v3 15/18] hw/i386: Introduce apic_id_from_topo_ids handler in PCMachineState Babu Moger
2019-12-04  0:38 ` [PATCH v3 16/18] hw/i386: Introduce EPYC mode function handlers Babu Moger
2020-01-28 20:04   ` Eduardo Habkost
2020-01-28 21:48     ` Babu Moger
2020-01-29 16:41       ` Eduardo Habkost
2019-12-04  0:38 ` [PATCH v3 17/18] i386: Fix pkg_id offset for epyc mode Babu Moger
2019-12-04  0:39 ` [PATCH v3 18/18] tests: Update the Unit tests Babu Moger
2020-02-03 14:59 ` [PATCH v3 00/18] APIC ID fixes for AMD EPYC CPU models Igor Mammedov
2020-02-03 19:31   ` Babu Moger
2020-02-04  8:02     ` Igor Mammedov
2020-02-04 19:08       ` Babu Moger
2020-02-05  9:38         ` Igor Mammedov
2020-02-05 16:10           ` Babu Moger
2020-02-05 16:56             ` Igor Mammedov
2020-02-05 19:07               ` Babu Moger
2020-02-06 13:08                 ` Igor Mammedov
2020-02-06 15:32                   ` Babu Moger [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=63d0f63c-bd84-5e42-0a30-531a54fb8af3@amd.com \
    --to=babu.moger@amd.com \
    --cc=armbru@redhat.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.