Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID

From: "wangyanan (Y)" via <qemu-devel@nongnu.org>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>, <qemu-arm@nongnu.org>,
	<qemu-devel@nongnu.org>, <drjones@redhat.com>,
	<peter.maydell@linaro.org>, <richard.henderson@linaro.org>,
	<shan.gavin@gmail.com>, <zhenyzha@redhat.com>
Subject: Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID
Date: Mon, 21 Mar 2022 10:28:52 +0800	[thread overview]
Message-ID: <e6efb1ca-08bb-fce5-de58-b8e2079880ca@huawei.com> (raw)
In-Reply-To: <20220318142723.142157c3@redhat.com>

On 2022/3/18 21:27, Igor Mammedov wrote:
> On Fri, 18 Mar 2022 21:00:35 +0800
> "wangyanan (Y)" <wangyanan55@huawei.com> wrote:
>
>> On 2022/3/18 17:56, Igor Mammedov wrote:
>>> On Fri, 18 Mar 2022 14:23:34 +0800
>>> "wangyanan (Y)" <wangyanan55@huawei.com> wrote:
>>>   
>>>> Hi Gavin,
>>>>
>>>> On 2022/3/3 11:11, Gavin Shan wrote:
>>>>> The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
>>>>> when it isn't provided explicitly. However, the CPU topology isn't fully
>>>>> considered in the default association and it causes CPU topology broken
>>>>> warnings on booting Linux guest.
>>>>>
>>>>> For example, the following warning messages are observed when the Linux guest
>>>>> is booted with the following command lines.
>>>>>
>>>>>      /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>>>>>      -accel kvm -machine virt,gic-version=host               \
>>>>>      -cpu host                                               \
>>>>>      -smp 6,sockets=2,cores=3,threads=1                      \
>>>>>      -m 1024M,slots=16,maxmem=64G                            \
>>>>>      -object memory-backend-ram,id=mem0,size=128M            \
>>>>>      -object memory-backend-ram,id=mem1,size=128M            \
>>>>>      -object memory-backend-ram,id=mem2,size=128M            \
>>>>>      -object memory-backend-ram,id=mem3,size=128M            \
>>>>>      -object memory-backend-ram,id=mem4,size=128M            \
>>>>>      -object memory-backend-ram,id=mem4,size=384M            \
>>>>>      -numa node,nodeid=0,memdev=mem0                         \
>>>>>      -numa node,nodeid=1,memdev=mem1                         \
>>>>>      -numa node,nodeid=2,memdev=mem2                         \
>>>>>      -numa node,nodeid=3,memdev=mem3                         \
>>>>>      -numa node,nodeid=4,memdev=mem4                         \
>>>>>      -numa node,nodeid=5,memdev=mem5
>>>>>             :
>>>>>      alternatives: patching kernel code
>>>>>      BUG: arch topology borken
>>>>>      the CLS domain not a subset of the MC domain
>>>>>      <the above error log repeats>
>>>>>      BUG: arch topology borken
>>>>>      the DIE domain not a subset of the NODE domain
>>>>>
>>>>> With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
>>>>> are associated with NODE#0 to NODE#5 separately. That's incorrect because
>>>>> CPU#0/1/2 should be associated with same NUMA node because they're seated
>>>>> in same socket.
>>>>>
>>>>> This fixes the issue by populating the CPU topology in virt_possible_cpu_arch_ids()
>>>>> and considering the socket index when default CPU-to-NUMA association is given
>>>>> in virt_possible_cpu_arch_ids(). With this applied, no more CPU topology broken
>>>>> warnings are seen from the Linux guest. The 6 CPUs are associated with NODE#0/1,
>>>>> but there are no CPUs associated with NODE#2/3/4/5.
>>>> It may be better to split this patch into two. One extends
>>>> virt_possible_cpu_arch_ids,
>>>> and the other fixes the numa node ID issue.
>>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>>> ---
>>>>>     hw/arm/virt.c | 17 ++++++++++++++++-
>>>>>     1 file changed, 16 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>>> index 46bf7ceddf..dee02b60fc 100644
>>>>> --- a/hw/arm/virt.c
>>>>> +++ b/hw/arm/virt.c
>>>>> @@ -2488,7 +2488,9 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
>>>>>     
>>>>>     static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
>>>>>     {
>>>>> -    return idx % ms->numa_state->num_nodes;
>>>>> +    int64_t socket_id = ms->possible_cpus->cpus[idx].props.socket_id;
>>>>> +
>>>>> +    return socket_id % ms->numa_state->num_nodes;
>>>>>     }
>>>>>     
>>>>>     static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>>>>> @@ -2496,6 +2498,7 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>>>>>         int n;
>>>>>         unsigned int max_cpus = ms->smp.max_cpus;
>>>>>         VirtMachineState *vms = VIRT_MACHINE(ms);
>>>>> +    MachineClass *mc = MACHINE_GET_CLASS(vms);
>>>>>     
>>>>>         if (ms->possible_cpus) {
>>>>>             assert(ms->possible_cpus->len == max_cpus);
>>>>> @@ -2509,6 +2512,18 @@ static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
>>>>>             ms->possible_cpus->cpus[n].type = ms->cpu_type;
>>>>>             ms->possible_cpus->cpus[n].arch_id =
>>>>>                 virt_cpu_mp_affinity(vms, n);
>>>>> +
>>>>> +        ms->possible_cpus->cpus[n].props.has_socket_id = true;
>>>>> +        ms->possible_cpus->cpus[n].props.socket_id =
>>>>> +            n / (ms->smp.dies * ms->smp.clusters *
>>>>> +                ms->smp.cores * ms->smp.threads);
>>>>> +        if (mc->smp_props.dies_supported) {
>>>>> +            ms->possible_cpus->cpus[n].props.has_die_id = true;
>>>>> +            ms->possible_cpus->cpus[n].props.die_id =
>>>>> +                n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads);
>>>>> +        }
>>>> I still don't think we need to consider dies if it's certainly not
>>>> supported yet, IOW, we will never come into the if-branch.
>>>> We are populating arm-specific topo info instead of the generic,
>>>> we can probably uniformly update this part together with other
>>>> necessary places when we decide to support dies for arm virt
>>>> machine in the future. :)
>>> it seems we do support dies and they are supposed to be numa boundary too,
>>> so perhaps we should account for it when generating node-id.
>> Sorry, I actually meant that we currently don't support dies for arm, so
>> that
>> we will always have "mc->smp_props.dies_supported == False" here, which
>> makes the code a bit unnecessary.  dies are only supported for x86 for
>> now. :)
>>
> then perhaps add an assert() here, so that we would notice and fix this
> place when dies_supported becomes true.
A simple assert() works here, I think.

Thanks,
Yanan
>> Thanks,
>> Yanan
>>>>> +        ms->possible_cpus->cpus[n].props.has_core_id = true;
>>>>> +        ms->possible_cpus->cpus[n].props.core_id = n / ms->smp.threads;
>>>>>             ms->possible_cpus->cpus[n].props.has_thread_id = true;
>>>>>             ms->possible_cpus->cpus[n].props.thread_id = n;
>>>>>         }
>>>> Maybe we should use the same algorithm in x86_topo_ids_from_idx
>>>> to populate the IDs, so that scope of socket-id will be [0, total_sockets),
>>>> scope of thread-id is [0, threads_per_core), and so on. Then with a
>>>> group of socket/cluster/core/thread-id, we determine a CPU.
>>>>
>>>> Suggestion: For the long term, is it necessary now to add similar topo
>>>> info infrastructure for ARM, such as X86CPUTopoInfo, X86CPUTopoIDs,
>>>> x86_topo_ids_from_idx?
>>>>
>>>> Thanks,
>>>> Yanan
>>>>   
>>> .
> .