From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: Igor Mammedov <imammedo@redhat.com>,
David Gibson <david@gibson.dropbear.id.au>
Cc: Laurent Vivier <lvivier@redhat.com>,
Thomas Huth <thuth@redhat.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
qemu-devel@nongnu.org, groug@kaod.org, qemu-ppc@nongnu.org,
clg@kaod.org
Subject: Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes
Date: Tue, 30 Mar 2021 18:33:36 -0300 [thread overview]
Message-ID: <4eb701b6-19cb-6b60-95fa-61e812a4c44b@gmail.com> (raw)
In-Reply-To: <20210330015133.11cd9334@redhat.com>
On 3/29/21 8:51 PM, Igor Mammedov wrote:
> On Tue, 23 Mar 2021 12:03:58 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote:
>>> Kernel commit 4bce545903fa ("powerpc/topology: Update
>>> topology_core_cpumask") cause a regression in the pseries machine when
>>> defining certain SMP topologies [1]. The reasoning behind the change is
>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating
>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with
>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as
>>> far as the kernel understanding of SMP topologies goes, both masks are
>>> equivalent.
>>>
>>> Further discussions in the kernel mailing list [2] shown that the
>>> powerpc kernel always considered that the number of sockets were equal
>>> to the number of NUMA nodes. The claim is that it doesn't make sense,
>>> for Power hardware at least, 2+ sockets being in the same NUMA node. The
>>> immediate conclusion is that all SMP topologies the pseries machine were
>>> supplying to the kernel, with more than one socket in the same NUMA node
>>> as in [1], happened to be correctly represented in the kernel by
>>> accident during all these years.
>>>
>>> There's a case to be made for virtual topologies being detached from
>>> hardware constraints, allowing maximum flexibility to users. At the same
>>> time, this freedom can't result in unrealistic hardware representations
>>> being emulated. If the real hardware and the pseries kernel don't
>>> support multiple chips/sockets in the same NUMA node, neither should we.
>>>
>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the
>>> pseries machine. qtest changes were made to adapt to this new
>>> condition.
>>
>> Oof. I really don't like this idea. It means a bunch of fiddly work
>> for users to match these up, for no real gain. I'm also concerned
>> that this will require follow on changes in libvirt to not make this a
>> really cryptic and irritating point of failure.
>
> yes, it is annoying to users but we were very slowly tightening limitations
> on numa path, especially if it causes problems on guest side when user
> specify nonsense configs (and suspect there are few things to enforce in
> generic numa code left (currently just warnings)). So it's nothing new.
Do you mind elaborating a bit on the 'nonsense configs' bit? Do you mean
configs that are unrealistic because it's uncompliant with ACPI? I always
thought that all possible SMP/NUMA topologies QEMU allows were ACPI/x86
compliant.
If both x86 and PowerPC have restrictions in common we might want to consider
putting this code in a common helper/place.
>
> So if limit applies to new machine type it should be fine (i.e. no existing
> VMs will suffer). Later on we can deprecate invalid configurations altogether
> and just error out.
hmmm I'm not warning the user about the deprecated topologies in existing
guests in this patch. I guess it doesn't hurt to add a warning for existing
VMs.
Thanks,
DHB
>
>>>
>>> [1] https://bugzilla.redhat.com/1934421
>>> [2] https://lore.kernel.org/linuxppc-dev/daa5d05f-dbd0-05ad-7395-5d5a3d364fc6@gmail.com/
>>>
>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>>> CC: Cédric Le Goater <clg@kaod.org>
>>> CC: Igor Mammedov <imammedo@redhat.com>
>>> CC: Laurent Vivier <lvivier@redhat.com>
>>> CC: Thomas Huth <thuth@redhat.com>
>>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>>> ---
>>> hw/ppc/spapr.c | 3 ++
>>> hw/ppc/spapr_numa.c | 7 +++++
>>> include/hw/ppc/spapr.h | 1 +
>>> tests/qtest/cpu-plug-test.c | 4 +--
>>> tests/qtest/device-plug-test.c | 9 +++++-
>>> tests/qtest/numa-test.c | 52 ++++++++++++++++++++++++++++------
>>> 6 files changed, 64 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index d56418ca29..745f71c243 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -4611,8 +4611,11 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true);
>>> */
>>> static void spapr_machine_5_2_class_options(MachineClass *mc)
>>> {
>>> + SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
>>> +
>>> spapr_machine_6_0_class_options(mc);
>>> compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
>>> + smc->pre_6_0_smp_topology = true;
>>> }
>>>
>>> DEFINE_SPAPR_MACHINE(5_2, "5.2", false);
>>> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
>>> index 779f18b994..0ade43dd79 100644
>>> --- a/hw/ppc/spapr_numa.c
>>> +++ b/hw/ppc/spapr_numa.c
>>> @@ -163,6 +163,13 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
>>> int i, j, max_nodes_with_gpus;
>>> bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
>>>
>>> + if (!smc->pre_6_0_smp_topology &&
>>> + nb_numa_nodes != machine->smp.sockets) {
>>> + error_report("Number of CPU sockets must be equal to the number "
>>> + "of NUMA nodes");
>>> + exit(EXIT_FAILURE);
>>> + }
>>> +
>>> /*
>>> * For all associativity arrays: first position is the size,
>>> * position MAX_DISTANCE_REF_POINTS is always the numa_id,
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index 47cebaf3ac..98dc5d198a 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -142,6 +142,7 @@ struct SpaprMachineClass {
>>> hwaddr rma_limit; /* clamp the RMA to this size */
>>> bool pre_5_1_assoc_refpoints;
>>> bool pre_5_2_numa_associativity;
>>> + bool pre_6_0_smp_topology;
>>>
>>> bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
>>> uint64_t *buid, hwaddr *pio,
>>> diff --git a/tests/qtest/cpu-plug-test.c b/tests/qtest/cpu-plug-test.c
>>> index a1c689414b..946b9129ea 100644
>>> --- a/tests/qtest/cpu-plug-test.c
>>> +++ b/tests/qtest/cpu-plug-test.c
>>> @@ -118,8 +118,8 @@ static void add_pseries_test_case(const char *mname)
>>> data->machine = g_strdup(mname);
>>> data->cpu_model = "power8_v2.0";
>>> data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
>>> - data->sockets = 2;
>>> - data->cores = 3;
>>> + data->sockets = 1;
>>> + data->cores = 6;
>>> data->threads = 1;
>>> data->maxcpus = data->sockets * data->cores * data->threads;
>>>
>>> diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c
>>> index 559d47727a..dd7d8268d2 100644
>>> --- a/tests/qtest/device-plug-test.c
>>> +++ b/tests/qtest/device-plug-test.c
>>> @@ -91,7 +91,14 @@ static void test_spapr_cpu_unplug_request(void)
>>> {
>>> QTestState *qtest;
>>>
>>> - qtest = qtest_initf("-cpu power9_v2.0 -smp 1,maxcpus=2 "
>>> + /*
>>> + * Default smp settings will prioritize sockets over cores and
>>> + * threads, so '-smp 2,maxcpus=2' will add 2 sockets. However,
>>> + * the pseries machine requires a NUMA node for each socket
>>> + * (since 6.0.0). Specify sockets=1 to make life easier.
>>> + */
>>> + qtest = qtest_initf("-cpu power9_v2.0 "
>>> + "-smp 1,maxcpus=2,threads=1,cores=2,sockets=1 "
>>> "-device power9_v2.0-spapr-cpu-core,core-id=1,id=dev0");
>>>
>>> /* similar to test_pci_unplug_request */
>>> diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c
>>> index dc0ec571ca..bb13f7131b 100644
>>> --- a/tests/qtest/numa-test.c
>>> +++ b/tests/qtest/numa-test.c
>>> @@ -24,9 +24,17 @@ static void test_mon_explicit(const void *data)
>>> QTestState *qts;
>>> g_autofree char *s = NULL;
>>> g_autofree char *cli = NULL;
>>> + const char *arch = qtest_get_arch();
>>> +
>>> + if (g_str_equal(arch, "ppc64")) {
>>> + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
>>> + "-numa node,nodeid=0,memdev=ram,cpus=0-3 "
>>> + "-numa node,nodeid=1,cpus=4-7");
>>> + } else {
>>> + cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 "
>>> + "-numa node,nodeid=1,cpus=4-7");
>>> + }
>>>
>>> - cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 "
>>> - "-numa node,nodeid=1,cpus=4-7");
>>> qts = qtest_init(cli);
>>>
>>> s = qtest_hmp(qts, "info numa");
>>> @@ -57,10 +65,18 @@ static void test_mon_partial(const void *data)
>>> QTestState *qts;
>>> g_autofree char *s = NULL;
>>> g_autofree char *cli = NULL;
>>> + const char *arch = qtest_get_arch();
>>> +
>>> + if (g_str_equal(arch, "ppc64")) {
>>> + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
>>> + "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
>>> + "-numa node,nodeid=1,cpus=4-5 ");
>>> + } else {
>>> + cli = make_cli(data, "-smp 8 "
>>> + "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
>>> + "-numa node,nodeid=1,cpus=4-5 ");
>>> + }
>>>
>>> - cli = make_cli(data, "-smp 8 "
>>> - "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
>>> - "-numa node,nodeid=1,cpus=4-5 ");
>>> qts = qtest_init(cli);
>>>
>>> s = qtest_hmp(qts, "info numa");
>>> @@ -85,9 +101,17 @@ static void test_query_cpus(const void *data)
>>> QObject *e;
>>> QTestState *qts;
>>> g_autofree char *cli = NULL;
>>> + const char *arch = qtest_get_arch();
>>> +
>>> + if (g_str_equal(arch, "ppc64")) {
>>> + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
>>> + "-numa node,memdev=ram,cpus=0-3 "
>>> + "-numa node,cpus=4-7");
>>> + } else {
>>> + cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 "
>>> + "-numa node,cpus=4-7");
>>> + }
>>>
>>> - cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 "
>>> - "-numa node,cpus=4-7");
>>> qts = qtest_init(cli);
>>> cpus = get_cpus(qts, &resp);
>>> g_assert(cpus);
>>> @@ -177,7 +201,7 @@ static void spapr_numa_cpu(const void *data)
>>> QTestState *qts;
>>> g_autofree char *cli = NULL;
>>>
>>> - cli = make_cli(data, "-smp 4,cores=4 "
>>> + cli = make_cli(data, "-smp 4,threads=1,cores=2,sockets=2 "
>>> "-numa node,nodeid=0,memdev=ram -numa node,nodeid=1 "
>>> "-numa cpu,node-id=0,core-id=0 "
>>> "-numa cpu,node-id=0,core-id=1 "
>>> @@ -554,7 +578,17 @@ int main(int argc, char **argv)
>>>
>>> g_test_init(&argc, &argv, NULL);
>>>
>>> - qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split);
>>> + /*
>>> + * Starting on 6.0.0, for the pseries machine, '-smp 8' will only work
>>> + * if we have 8 NUMA nodes. If we specify 'smp 8,sockets=2' to match
>>> + * 2 NUMA nodes, the CPUs will be split as 0123/4567 instead of
>>> + * 0246/1357 that test_def_cpu_split expects. In short, this test is
>>> + * no longer valid for ppc64 in 6.0.0.
>>> + */
>>> + if (!g_str_equal(arch, "ppc64")) {
>>> + qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split);
>>> + }
>>> +
>>> qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit);
>>> qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial);
>>> qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus);
>>
>
next prev parent reply other threads:[~2021-03-30 21:35 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-19 18:34 [PATCH 0/2] pseries: SMP sockets must match NUMA nodes Daniel Henrique Barboza
2021-03-19 18:34 ` [PATCH 1/2] spapr: number of SMP sockets must be equal to " Daniel Henrique Barboza
2021-03-23 1:03 ` David Gibson
2021-03-23 17:21 ` Daniel Henrique Barboza
2021-03-25 2:10 ` David Gibson
2021-03-25 8:56 ` Cédric Le Goater
2021-03-25 10:15 ` Daniel Henrique Barboza
2021-03-29 4:20 ` David Gibson
2021-03-29 15:32 ` Cédric Le Goater
2021-03-29 18:32 ` Daniel Henrique Barboza
2021-03-29 23:55 ` Igor Mammedov
2021-03-31 0:57 ` David Gibson
2021-03-31 4:58 ` Michael Ellerman
2021-03-31 15:22 ` Cédric Le Goater
2021-04-01 2:53 ` David Gibson
2021-03-31 15:18 ` Cédric Le Goater
2021-03-31 17:29 ` Daniel Henrique Barboza
2021-03-31 17:40 ` Cédric Le Goater
2021-04-01 2:59 ` David Gibson
2021-04-01 9:21 ` Cédric Le Goater
2021-03-29 23:51 ` Igor Mammedov
2021-03-30 21:33 ` Daniel Henrique Barboza [this message]
2021-03-19 18:34 ` [PATCH 2/2] spapr.c: remove 'ibm,chip-id' from DT Daniel Henrique Barboza
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4eb701b6-19cb-6b60-95fa-61e812a4c44b@gmail.com \
--to=danielhb413@gmail.com \
--cc=clg@kaod.org \
--cc=david@gibson.dropbear.id.au \
--cc=groug@kaod.org \
--cc=imammedo@redhat.com \
--cc=lvivier@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
--cc=srikar@linux.vnet.ibm.com \
--cc=thuth@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).