* [PATCH 0/2] pseries: SMP sockets must match NUMA nodes @ 2021-03-19 18:34 Daniel Henrique Barboza 2021-03-19 18:34 ` [PATCH 1/2] spapr: number of SMP sockets must be equal to " Daniel Henrique Barboza 2021-03-19 18:34 ` [PATCH 2/2] spapr.c: remove 'ibm,chip-id' from DT Daniel Henrique Barboza 0 siblings, 2 replies; 23+ messages in thread From: Daniel Henrique Barboza @ 2021-03-19 18:34 UTC (permalink / raw) To: qemu-devel; +Cc: clg, Daniel Henrique Barboza, qemu-ppc, groug, david Hi, The main change made in this series, in patch 01, is the result of discussions with the kernel community. See the commit message for more details. Patch 02 is a second version of the patch sent in [1], but now the changes are only effective for the default pseries machine. It's being sent together with patch 01 since they 'share history'. [1] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg04029.html Daniel Henrique Barboza (2): spapr: number of SMP sockets must be equal to NUMA nodes spapr.c: remove 'ibm,chip-id' from DT hw/ppc/spapr.c | 10 +++++-- hw/ppc/spapr_numa.c | 7 +++++ include/hw/ppc/spapr.h | 1 + tests/qtest/cpu-plug-test.c | 4 +-- tests/qtest/device-plug-test.c | 9 +++++- tests/qtest/numa-test.c | 52 ++++++++++++++++++++++++++++------ 6 files changed, 69 insertions(+), 14 deletions(-) -- 2.29.2 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-19 18:34 [PATCH 0/2] pseries: SMP sockets must match NUMA nodes Daniel Henrique Barboza @ 2021-03-19 18:34 ` Daniel Henrique Barboza 2021-03-23 1:03 ` David Gibson 2021-03-19 18:34 ` [PATCH 2/2] spapr.c: remove 'ibm,chip-id' from DT Daniel Henrique Barboza 1 sibling, 1 reply; 23+ messages in thread From: Daniel Henrique Barboza @ 2021-03-19 18:34 UTC (permalink / raw) To: qemu-devel Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Daniel Henrique Barboza, groug, qemu-ppc, clg, Igor Mammedov, david Kernel commit 4bce545903fa ("powerpc/topology: Update topology_core_cpumask") caused a regression in the pseries machine when defining certain SMP topologies [1]. The reasoning behind the change is explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating cpu_core_mask"). In short, the cpu_core_mask logic was causing trouble in large VMs with lots of CPUs and was replaced with cpu_cpu_mask because, as far as the kernel's understanding of SMP topologies goes, both masks are equivalent. Further discussions on the kernel mailing list [2] showed that the powerpc kernel has always assumed that the number of sockets is equal to the number of NUMA nodes. The claim is that it doesn't make sense, for Power hardware at least, for 2+ sockets to be in the same NUMA node. The immediate conclusion is that all the SMP topologies the pseries machine was supplying to the kernel, with more than one socket in the same NUMA node as in [1], happened to be correctly represented in the kernel by accident during all these years. There's a case to be made for virtual topologies being detached from hardware constraints, allowing maximum flexibility to users. At the same time, this freedom can't result in unrealistic hardware representations being emulated. If the real hardware and the pseries kernel don't support multiple chips/sockets in the same NUMA node, neither should we. 
Starting in 6.0.0, all sockets must match a unique NUMA node in the pseries machine. qtest changes were made to adapt to this new condition. [1] https://bugzilla.redhat.com/1934421 [2] https://lore.kernel.org/linuxppc-dev/daa5d05f-dbd0-05ad-7395-5d5a3d364fc6@gmail.com/ CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com> CC: Cédric Le Goater <clg@kaod.org> CC: Igor Mammedov <imammedo@redhat.com> CC: Laurent Vivier <lvivier@redhat.com> CC: Thomas Huth <thuth@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> --- hw/ppc/spapr.c | 3 ++ hw/ppc/spapr_numa.c | 7 +++++ include/hw/ppc/spapr.h | 1 + tests/qtest/cpu-plug-test.c | 4 +-- tests/qtest/device-plug-test.c | 9 +++++- tests/qtest/numa-test.c | 52 ++++++++++++++++++++++++++++------ 6 files changed, 64 insertions(+), 12 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index d56418ca29..745f71c243 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4611,8 +4611,11 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true); */ static void spapr_machine_5_2_class_options(MachineClass *mc) { + SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc); + spapr_machine_6_0_class_options(mc); compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len); + smc->pre_6_0_smp_topology = true; } DEFINE_SPAPR_MACHINE(5_2, "5.2", false); diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c index 779f18b994..0ade43dd79 100644 --- a/hw/ppc/spapr_numa.c +++ b/hw/ppc/spapr_numa.c @@ -163,6 +163,13 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr, int i, j, max_nodes_with_gpus; bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr); + if (!smc->pre_6_0_smp_topology && + nb_numa_nodes != machine->smp.sockets) { + error_report("Number of CPU sockets must be equal to the number " + "of NUMA nodes"); + exit(EXIT_FAILURE); + } + /* * For all associativity arrays: first position is the size, * position MAX_DISTANCE_REF_POINTS is always the numa_id, diff --git a/include/hw/ppc/spapr.h 
b/include/hw/ppc/spapr.h index 47cebaf3ac..98dc5d198a 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -142,6 +142,7 @@ struct SpaprMachineClass { hwaddr rma_limit; /* clamp the RMA to this size */ bool pre_5_1_assoc_refpoints; bool pre_5_2_numa_associativity; + bool pre_6_0_smp_topology; bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index, uint64_t *buid, hwaddr *pio, diff --git a/tests/qtest/cpu-plug-test.c b/tests/qtest/cpu-plug-test.c index a1c689414b..946b9129ea 100644 --- a/tests/qtest/cpu-plug-test.c +++ b/tests/qtest/cpu-plug-test.c @@ -118,8 +118,8 @@ static void add_pseries_test_case(const char *mname) data->machine = g_strdup(mname); data->cpu_model = "power8_v2.0"; data->device_model = g_strdup("power8_v2.0-spapr-cpu-core"); - data->sockets = 2; - data->cores = 3; + data->sockets = 1; + data->cores = 6; data->threads = 1; data->maxcpus = data->sockets * data->cores * data->threads; diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c index 559d47727a..dd7d8268d2 100644 --- a/tests/qtest/device-plug-test.c +++ b/tests/qtest/device-plug-test.c @@ -91,7 +91,14 @@ static void test_spapr_cpu_unplug_request(void) { QTestState *qtest; - qtest = qtest_initf("-cpu power9_v2.0 -smp 1,maxcpus=2 " + /* + * Default smp settings will prioritize sockets over cores and + * threads, so '-smp 2,maxcpus=2' will add 2 sockets. However, + * the pseries machine requires a NUMA node for each socket + * (since 6.0.0). Specify sockets=1 to make life easier. 
+ */ + qtest = qtest_initf("-cpu power9_v2.0 " + "-smp 1,maxcpus=2,threads=1,cores=2,sockets=1 " "-device power9_v2.0-spapr-cpu-core,core-id=1,id=dev0"); /* similar to test_pci_unplug_request */ diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c index dc0ec571ca..bb13f7131b 100644 --- a/tests/qtest/numa-test.c +++ b/tests/qtest/numa-test.c @@ -24,9 +24,17 @@ static void test_mon_explicit(const void *data) QTestState *qts; g_autofree char *s = NULL; g_autofree char *cli = NULL; + const char *arch = qtest_get_arch(); + + if (g_str_equal(arch, "ppc64")) { + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 " + "-numa node,nodeid=0,memdev=ram,cpus=0-3 " + "-numa node,nodeid=1,cpus=4-7"); + } else { + cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 " + "-numa node,nodeid=1,cpus=4-7"); + } - cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 " - "-numa node,nodeid=1,cpus=4-7"); qts = qtest_init(cli); s = qtest_hmp(qts, "info numa"); @@ -57,10 +65,18 @@ static void test_mon_partial(const void *data) QTestState *qts; g_autofree char *s = NULL; g_autofree char *cli = NULL; + const char *arch = qtest_get_arch(); + + if (g_str_equal(arch, "ppc64")) { + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 " + "-numa node,nodeid=0,memdev=ram,cpus=0-1 " + "-numa node,nodeid=1,cpus=4-5 "); + } else { + cli = make_cli(data, "-smp 8 " + "-numa node,nodeid=0,memdev=ram,cpus=0-1 " + "-numa node,nodeid=1,cpus=4-5 "); + } - cli = make_cli(data, "-smp 8 " - "-numa node,nodeid=0,memdev=ram,cpus=0-1 " - "-numa node,nodeid=1,cpus=4-5 "); qts = qtest_init(cli); s = qtest_hmp(qts, "info numa"); @@ -85,9 +101,17 @@ static void test_query_cpus(const void *data) QObject *e; QTestState *qts; g_autofree char *cli = NULL; + const char *arch = qtest_get_arch(); + + if (g_str_equal(arch, "ppc64")) { + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 " + "-numa node,memdev=ram,cpus=0-3 " + "-numa node,cpus=4-7"); + } else { + cli = 
make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 " + "-numa node,cpus=4-7"); + } - cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 " - "-numa node,cpus=4-7"); qts = qtest_init(cli); cpus = get_cpus(qts, &resp); g_assert(cpus); @@ -177,7 +201,7 @@ static void spapr_numa_cpu(const void *data) QTestState *qts; g_autofree char *cli = NULL; - cli = make_cli(data, "-smp 4,cores=4 " + cli = make_cli(data, "-smp 4,threads=1,cores=2,sockets=2 " "-numa node,nodeid=0,memdev=ram -numa node,nodeid=1 " "-numa cpu,node-id=0,core-id=0 " "-numa cpu,node-id=0,core-id=1 " @@ -554,7 +578,17 @@ int main(int argc, char **argv) g_test_init(&argc, &argv, NULL); - qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split); + /* + * Starting on 6.0.0, for the pseries machine, '-smp 8' will only work + * if we have 8 NUMA nodes. If we specify 'smp 8,sockets=2' to match + * 2 NUMA nodes, the CPUs will be split as 0123/4567 instead of + * 0246/1357 that test_def_cpu_split expects. In short, this test is + * no longer valid for ppc64 in 6.0.0. + */ + if (!g_str_equal(arch, "ppc64")) { + qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split); + } + qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit); qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial); qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus); -- 2.29.2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-19 18:34 ` [PATCH 1/2] spapr: number of SMP sockets must be equal to " Daniel Henrique Barboza @ 2021-03-23 1:03 ` David Gibson 2021-03-23 17:21 ` Daniel Henrique Barboza 2021-03-29 23:51 ` Igor Mammedov 0 siblings, 2 replies; 23+ messages in thread From: David Gibson @ 2021-03-23 1:03 UTC (permalink / raw) To: Daniel Henrique Barboza Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, qemu-devel, groug, qemu-ppc, clg, Igor Mammedov [-- Attachment #1: Type: text/plain, Size: 10478 bytes --] On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: > Kernel commit 4bce545903fa ("powerpc/topology: Update > topology_core_cpumask") cause a regression in the pseries machine when > defining certain SMP topologies [1]. The reasoning behind the change is > explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating > cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with > large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as > far as the kernel understanding of SMP topologies goes, both masks are > equivalent. > > Further discussions in the kernel mailing list [2] shown that the > powerpc kernel always considered that the number of sockets were equal > to the number of NUMA nodes. The claim is that it doesn't make sense, > for Power hardware at least, 2+ sockets being in the same NUMA node. The > immediate conclusion is that all SMP topologies the pseries machine were > supplying to the kernel, with more than one socket in the same NUMA node > as in [1], happened to be correctly represented in the kernel by > accident during all these years. > > There's a case to be made for virtual topologies being detached from > hardware constraints, allowing maximum flexibility to users. At the same > time, this freedom can't result in unrealistic hardware representations > being emulated. 
If the real hardware and the pseries kernel don't > support multiple chips/sockets in the same NUMA node, neither should we. > > Starting in 6.0.0, all sockets must match an unique NUMA node in the > pseries machine. qtest changes were made to adapt to this new > condition. Oof. I really don't like this idea. It means a bunch of fiddly work for users to match these up, for no real gain. I'm also concerned that this will require follow on changes in libvirt to not make this a really cryptic and irritating point of failure. > > [1] https://bugzilla.redhat.com/1934421 > [2] https://lore.kernel.org/linuxppc-dev/daa5d05f-dbd0-05ad-7395-5d5a3d364fc6@gmail.com/ > > CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com> > CC: Cédric Le Goater <clg@kaod.org> > CC: Igor Mammedov <imammedo@redhat.com> > CC: Laurent Vivier <lvivier@redhat.com> > CC: Thomas Huth <thuth@redhat.com> > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> > --- > hw/ppc/spapr.c | 3 ++ > hw/ppc/spapr_numa.c | 7 +++++ > include/hw/ppc/spapr.h | 1 + > tests/qtest/cpu-plug-test.c | 4 +-- > tests/qtest/device-plug-test.c | 9 +++++- > tests/qtest/numa-test.c | 52 ++++++++++++++++++++++++++++------ > 6 files changed, 64 insertions(+), 12 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index d56418ca29..745f71c243 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -4611,8 +4611,11 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true); > */ > static void spapr_machine_5_2_class_options(MachineClass *mc) > { > + SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc); > + > spapr_machine_6_0_class_options(mc); > compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len); > + smc->pre_6_0_smp_topology = true; > } > > DEFINE_SPAPR_MACHINE(5_2, "5.2", false); > diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c > index 779f18b994..0ade43dd79 100644 > --- a/hw/ppc/spapr_numa.c > +++ b/hw/ppc/spapr_numa.c > @@ -163,6 +163,13 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr, > 
int i, j, max_nodes_with_gpus; > bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr); > > + if (!smc->pre_6_0_smp_topology && > + nb_numa_nodes != machine->smp.sockets) { > + error_report("Number of CPU sockets must be equal to the number " > + "of NUMA nodes"); > + exit(EXIT_FAILURE); > + } > + > /* > * For all associativity arrays: first position is the size, > * position MAX_DISTANCE_REF_POINTS is always the numa_id, > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > index 47cebaf3ac..98dc5d198a 100644 > --- a/include/hw/ppc/spapr.h > +++ b/include/hw/ppc/spapr.h > @@ -142,6 +142,7 @@ struct SpaprMachineClass { > hwaddr rma_limit; /* clamp the RMA to this size */ > bool pre_5_1_assoc_refpoints; > bool pre_5_2_numa_associativity; > + bool pre_6_0_smp_topology; > > bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index, > uint64_t *buid, hwaddr *pio, > diff --git a/tests/qtest/cpu-plug-test.c b/tests/qtest/cpu-plug-test.c > index a1c689414b..946b9129ea 100644 > --- a/tests/qtest/cpu-plug-test.c > +++ b/tests/qtest/cpu-plug-test.c > @@ -118,8 +118,8 @@ static void add_pseries_test_case(const char *mname) > data->machine = g_strdup(mname); > data->cpu_model = "power8_v2.0"; > data->device_model = g_strdup("power8_v2.0-spapr-cpu-core"); > - data->sockets = 2; > - data->cores = 3; > + data->sockets = 1; > + data->cores = 6; > data->threads = 1; > data->maxcpus = data->sockets * data->cores * data->threads; > > diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c > index 559d47727a..dd7d8268d2 100644 > --- a/tests/qtest/device-plug-test.c > +++ b/tests/qtest/device-plug-test.c > @@ -91,7 +91,14 @@ static void test_spapr_cpu_unplug_request(void) > { > QTestState *qtest; > > - qtest = qtest_initf("-cpu power9_v2.0 -smp 1,maxcpus=2 " > + /* > + * Default smp settings will prioritize sockets over cores and > + * threads, so '-smp 2,maxcpus=2' will add 2 sockets. 
However, > + * the pseries machine requires a NUMA node for each socket > + * (since 6.0.0). Specify sockets=1 to make life easier. > + */ > + qtest = qtest_initf("-cpu power9_v2.0 " > + "-smp 1,maxcpus=2,threads=1,cores=2,sockets=1 " > "-device power9_v2.0-spapr-cpu-core,core-id=1,id=dev0"); > > /* similar to test_pci_unplug_request */ > diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c > index dc0ec571ca..bb13f7131b 100644 > --- a/tests/qtest/numa-test.c > +++ b/tests/qtest/numa-test.c > @@ -24,9 +24,17 @@ static void test_mon_explicit(const void *data) > QTestState *qts; > g_autofree char *s = NULL; > g_autofree char *cli = NULL; > + const char *arch = qtest_get_arch(); > + > + if (g_str_equal(arch, "ppc64")) { > + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 " > + "-numa node,nodeid=0,memdev=ram,cpus=0-3 " > + "-numa node,nodeid=1,cpus=4-7"); > + } else { > + cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 " > + "-numa node,nodeid=1,cpus=4-7"); > + } > > - cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 " > - "-numa node,nodeid=1,cpus=4-7"); > qts = qtest_init(cli); > > s = qtest_hmp(qts, "info numa"); > @@ -57,10 +65,18 @@ static void test_mon_partial(const void *data) > QTestState *qts; > g_autofree char *s = NULL; > g_autofree char *cli = NULL; > + const char *arch = qtest_get_arch(); > + > + if (g_str_equal(arch, "ppc64")) { > + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 " > + "-numa node,nodeid=0,memdev=ram,cpus=0-1 " > + "-numa node,nodeid=1,cpus=4-5 "); > + } else { > + cli = make_cli(data, "-smp 8 " > + "-numa node,nodeid=0,memdev=ram,cpus=0-1 " > + "-numa node,nodeid=1,cpus=4-5 "); > + } > > - cli = make_cli(data, "-smp 8 " > - "-numa node,nodeid=0,memdev=ram,cpus=0-1 " > - "-numa node,nodeid=1,cpus=4-5 "); > qts = qtest_init(cli); > > s = qtest_hmp(qts, "info numa"); > @@ -85,9 +101,17 @@ static void test_query_cpus(const void *data) > QObject *e; > QTestState *qts; > 
g_autofree char *cli = NULL; > + const char *arch = qtest_get_arch(); > + > + if (g_str_equal(arch, "ppc64")) { > + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 " > + "-numa node,memdev=ram,cpus=0-3 " > + "-numa node,cpus=4-7"); > + } else { > + cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 " > + "-numa node,cpus=4-7"); > + } > > - cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 " > - "-numa node,cpus=4-7"); > qts = qtest_init(cli); > cpus = get_cpus(qts, &resp); > g_assert(cpus); > @@ -177,7 +201,7 @@ static void spapr_numa_cpu(const void *data) > QTestState *qts; > g_autofree char *cli = NULL; > > - cli = make_cli(data, "-smp 4,cores=4 " > + cli = make_cli(data, "-smp 4,threads=1,cores=2,sockets=2 " > "-numa node,nodeid=0,memdev=ram -numa node,nodeid=1 " > "-numa cpu,node-id=0,core-id=0 " > "-numa cpu,node-id=0,core-id=1 " > @@ -554,7 +578,17 @@ int main(int argc, char **argv) > > g_test_init(&argc, &argv, NULL); > > - qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split); > + /* > + * Starting on 6.0.0, for the pseries machine, '-smp 8' will only work > + * if we have 8 NUMA nodes. If we specify 'smp 8,sockets=2' to match > + * 2 NUMA nodes, the CPUs will be split as 0123/4567 instead of > + * 0246/1357 that test_def_cpu_split expects. In short, this test is > + * no longer valid for ppc64 in 6.0.0. > + */ > + if (!g_str_equal(arch, "ppc64")) { > + qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split); > + } > + > qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit); > qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial); > qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus); -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/~dgibson ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-23 1:03 ` David Gibson @ 2021-03-23 17:21 ` Daniel Henrique Barboza 2021-03-25 2:10 ` David Gibson 2021-03-29 23:51 ` Igor Mammedov 1 sibling, 1 reply; 23+ messages in thread From: Daniel Henrique Barboza @ 2021-03-23 17:21 UTC (permalink / raw) To: David Gibson Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, qemu-devel, groug, qemu-ppc, clg, Igor Mammedov On 3/22/21 10:03 PM, David Gibson wrote: > On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: >> Kernel commit 4bce545903fa ("powerpc/topology: Update >> topology_core_cpumask") cause a regression in the pseries machine when >> defining certain SMP topologies [1]. The reasoning behind the change is >> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating >> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with >> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as >> far as the kernel understanding of SMP topologies goes, both masks are >> equivalent. >> >> Further discussions in the kernel mailing list [2] shown that the >> powerpc kernel always considered that the number of sockets were equal >> to the number of NUMA nodes. The claim is that it doesn't make sense, >> for Power hardware at least, 2+ sockets being in the same NUMA node. The >> immediate conclusion is that all SMP topologies the pseries machine were >> supplying to the kernel, with more than one socket in the same NUMA node >> as in [1], happened to be correctly represented in the kernel by >> accident during all these years. >> >> There's a case to be made for virtual topologies being detached from >> hardware constraints, allowing maximum flexibility to users. At the same >> time, this freedom can't result in unrealistic hardware representations >> being emulated. 
If the real hardware and the pseries kernel don't >> support multiple chips/sockets in the same NUMA node, neither should we. >> >> Starting in 6.0.0, all sockets must match an unique NUMA node in the >> pseries machine. qtest changes were made to adapt to this new >> condition. > > Oof. I really don't like this idea. It means a bunch of fiddly work > for users to match these up, for no real gain. I'm also concerned > that this will require follow on changes in libvirt to not make this a > really cryptic and irritating point of failure. Haven't thought about required Libvirt changes, although I can say that there will be some amount to be made and it will probably annoy existing users (everyone that has a multiple socket per NUMA node topology). There is not much we can do from the QEMU layer aside from what I've proposed here. The other alternative is to keep interacting with the kernel folks to see if there is a way to keep our use case untouched. This also means that 'ibm,chip-id' will probably remain in use since it's the only place where we inform cores per socket information to the kernel. 
Thanks, DHB > >> >> [1] https://bugzilla.redhat.com/1934421 >> [2] https://lore.kernel.org/linuxppc-dev/daa5d05f-dbd0-05ad-7395-5d5a3d364fc6@gmail.com/ >> >> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com> >> CC: Cédric Le Goater <clg@kaod.org> >> CC: Igor Mammedov <imammedo@redhat.com> >> CC: Laurent Vivier <lvivier@redhat.com> >> CC: Thomas Huth <thuth@redhat.com> >> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> >> --- >> hw/ppc/spapr.c | 3 ++ >> hw/ppc/spapr_numa.c | 7 +++++ >> include/hw/ppc/spapr.h | 1 + >> tests/qtest/cpu-plug-test.c | 4 +-- >> tests/qtest/device-plug-test.c | 9 +++++- >> tests/qtest/numa-test.c | 52 ++++++++++++++++++++++++++++------ >> 6 files changed, 64 insertions(+), 12 deletions(-) >> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c >> index d56418ca29..745f71c243 100644 >> --- a/hw/ppc/spapr.c >> +++ b/hw/ppc/spapr.c >> @@ -4611,8 +4611,11 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true); >> */ >> static void spapr_machine_5_2_class_options(MachineClass *mc) >> { >> + SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc); >> + >> spapr_machine_6_0_class_options(mc); >> compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len); >> + smc->pre_6_0_smp_topology = true; >> } >> >> DEFINE_SPAPR_MACHINE(5_2, "5.2", false); >> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c >> index 779f18b994..0ade43dd79 100644 >> --- a/hw/ppc/spapr_numa.c >> +++ b/hw/ppc/spapr_numa.c >> @@ -163,6 +163,13 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr, >> int i, j, max_nodes_with_gpus; >> bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr); >> >> + if (!smc->pre_6_0_smp_topology && >> + nb_numa_nodes != machine->smp.sockets) { >> + error_report("Number of CPU sockets must be equal to the number " >> + "of NUMA nodes"); >> + exit(EXIT_FAILURE); >> + } >> + >> /* >> * For all associativity arrays: first position is the size, >> * position MAX_DISTANCE_REF_POINTS is always the numa_id, >> diff --git 
a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h >> index 47cebaf3ac..98dc5d198a 100644 >> --- a/include/hw/ppc/spapr.h >> +++ b/include/hw/ppc/spapr.h >> @@ -142,6 +142,7 @@ struct SpaprMachineClass { >> hwaddr rma_limit; /* clamp the RMA to this size */ >> bool pre_5_1_assoc_refpoints; >> bool pre_5_2_numa_associativity; >> + bool pre_6_0_smp_topology; >> >> bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index, >> uint64_t *buid, hwaddr *pio, >> diff --git a/tests/qtest/cpu-plug-test.c b/tests/qtest/cpu-plug-test.c >> index a1c689414b..946b9129ea 100644 >> --- a/tests/qtest/cpu-plug-test.c >> +++ b/tests/qtest/cpu-plug-test.c >> @@ -118,8 +118,8 @@ static void add_pseries_test_case(const char *mname) >> data->machine = g_strdup(mname); >> data->cpu_model = "power8_v2.0"; >> data->device_model = g_strdup("power8_v2.0-spapr-cpu-core"); >> - data->sockets = 2; >> - data->cores = 3; >> + data->sockets = 1; >> + data->cores = 6; >> data->threads = 1; >> data->maxcpus = data->sockets * data->cores * data->threads; >> >> diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c >> index 559d47727a..dd7d8268d2 100644 >> --- a/tests/qtest/device-plug-test.c >> +++ b/tests/qtest/device-plug-test.c >> @@ -91,7 +91,14 @@ static void test_spapr_cpu_unplug_request(void) >> { >> QTestState *qtest; >> >> - qtest = qtest_initf("-cpu power9_v2.0 -smp 1,maxcpus=2 " >> + /* >> + * Default smp settings will prioritize sockets over cores and >> + * threads, so '-smp 2,maxcpus=2' will add 2 sockets. However, >> + * the pseries machine requires a NUMA node for each socket >> + * (since 6.0.0). Specify sockets=1 to make life easier. 
>> + */ >> + qtest = qtest_initf("-cpu power9_v2.0 " >> + "-smp 1,maxcpus=2,threads=1,cores=2,sockets=1 " >> "-device power9_v2.0-spapr-cpu-core,core-id=1,id=dev0"); >> >> /* similar to test_pci_unplug_request */ >> diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c >> index dc0ec571ca..bb13f7131b 100644 >> --- a/tests/qtest/numa-test.c >> +++ b/tests/qtest/numa-test.c >> @@ -24,9 +24,17 @@ static void test_mon_explicit(const void *data) >> QTestState *qts; >> g_autofree char *s = NULL; >> g_autofree char *cli = NULL; >> + const char *arch = qtest_get_arch(); >> + >> + if (g_str_equal(arch, "ppc64")) { >> + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 " >> + "-numa node,nodeid=0,memdev=ram,cpus=0-3 " >> + "-numa node,nodeid=1,cpus=4-7"); >> + } else { >> + cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 " >> + "-numa node,nodeid=1,cpus=4-7"); >> + } >> >> - cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 " >> - "-numa node,nodeid=1,cpus=4-7"); >> qts = qtest_init(cli); >> >> s = qtest_hmp(qts, "info numa"); >> @@ -57,10 +65,18 @@ static void test_mon_partial(const void *data) >> QTestState *qts; >> g_autofree char *s = NULL; >> g_autofree char *cli = NULL; >> + const char *arch = qtest_get_arch(); >> + >> + if (g_str_equal(arch, "ppc64")) { >> + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 " >> + "-numa node,nodeid=0,memdev=ram,cpus=0-1 " >> + "-numa node,nodeid=1,cpus=4-5 "); >> + } else { >> + cli = make_cli(data, "-smp 8 " >> + "-numa node,nodeid=0,memdev=ram,cpus=0-1 " >> + "-numa node,nodeid=1,cpus=4-5 "); >> + } >> >> - cli = make_cli(data, "-smp 8 " >> - "-numa node,nodeid=0,memdev=ram,cpus=0-1 " >> - "-numa node,nodeid=1,cpus=4-5 "); >> qts = qtest_init(cli); >> >> s = qtest_hmp(qts, "info numa"); >> @@ -85,9 +101,17 @@ static void test_query_cpus(const void *data) >> QObject *e; >> QTestState *qts; >> g_autofree char *cli = NULL; >> + const char *arch = qtest_get_arch(); >> + >> 
+ if (g_str_equal(arch, "ppc64")) { >> + cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 " >> + "-numa node,memdev=ram,cpus=0-3 " >> + "-numa node,cpus=4-7"); >> + } else { >> + cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 " >> + "-numa node,cpus=4-7"); >> + } >> >> - cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 " >> - "-numa node,cpus=4-7"); >> qts = qtest_init(cli); >> cpus = get_cpus(qts, &resp); >> g_assert(cpus); >> @@ -177,7 +201,7 @@ static void spapr_numa_cpu(const void *data) >> QTestState *qts; >> g_autofree char *cli = NULL; >> >> - cli = make_cli(data, "-smp 4,cores=4 " >> + cli = make_cli(data, "-smp 4,threads=1,cores=2,sockets=2 " >> "-numa node,nodeid=0,memdev=ram -numa node,nodeid=1 " >> "-numa cpu,node-id=0,core-id=0 " >> "-numa cpu,node-id=0,core-id=1 " >> @@ -554,7 +578,17 @@ int main(int argc, char **argv) >> >> g_test_init(&argc, &argv, NULL); >> >> - qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split); >> + /* >> + * Starting on 6.0.0, for the pseries machine, '-smp 8' will only work >> + * if we have 8 NUMA nodes. If we specify 'smp 8,sockets=2' to match >> + * 2 NUMA nodes, the CPUs will be split as 0123/4567 instead of >> + * 0246/1357 that test_def_cpu_split expects. In short, this test is >> + * no longer valid for ppc64 in 6.0.0. >> + */ >> + if (!g_str_equal(arch, "ppc64")) { >> + qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split); >> + } >> + >> qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit); >> qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial); >> qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus); > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-23 17:21 ` Daniel Henrique Barboza @ 2021-03-25 2:10 ` David Gibson 2021-03-25 8:56 ` Cédric Le Goater 0 siblings, 1 reply; 23+ messages in thread From: David Gibson @ 2021-03-25 2:10 UTC (permalink / raw) To: Daniel Henrique Barboza Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, qemu-devel, groug, qemu-ppc, clg, Igor Mammedov [-- Attachment #1: Type: text/plain, Size: 3347 bytes --] On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: > > > On 3/22/21 10:03 PM, David Gibson wrote: > > On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: > > > Kernel commit 4bce545903fa ("powerpc/topology: Update > > > topology_core_cpumask") cause a regression in the pseries machine when > > > defining certain SMP topologies [1]. The reasoning behind the change is > > > explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating > > > cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with > > > large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as > > > far as the kernel understanding of SMP topologies goes, both masks are > > > equivalent. > > > > > > Further discussions in the kernel mailing list [2] shown that the > > > powerpc kernel always considered that the number of sockets were equal > > > to the number of NUMA nodes. The claim is that it doesn't make sense, > > > for Power hardware at least, 2+ sockets being in the same NUMA node. The > > > immediate conclusion is that all SMP topologies the pseries machine were > > > supplying to the kernel, with more than one socket in the same NUMA node > > > as in [1], happened to be correctly represented in the kernel by > > > accident during all these years. > > > > > > There's a case to be made for virtual topologies being detached from > > > hardware constraints, allowing maximum flexibility to users. 
At the same > > > time, this freedom can't result in unrealistic hardware representations > > > being emulated. If the real hardware and the pseries kernel don't > > > support multiple chips/sockets in the same NUMA node, neither should we. > > > > > > Starting in 6.0.0, all sockets must match an unique NUMA node in the > > > pseries machine. qtest changes were made to adapt to this new > > > condition. > > > > Oof. I really don't like this idea. It means a bunch of fiddly work > > for users to match these up, for no real gain. I'm also concerned > > that this will require follow on changes in libvirt to not make this a > > really cryptic and irritating point of failure. > > Haven't though about required Libvirt changes, although I can say that there > will be some amount to be mande and it will probably annoy existing users > (everyone that has a multiple socket per NUMA node topology). > > There is not much we can do from the QEMU layer aside from what I've proposed > here. The other alternative is to keep interacting with the kernel folks to > see if there is a way to keep our use case untouched. Right. Well.. not necessarily untouched, but I'm hoping for more replies from Cédric to my objections and mpe's. Even with sockets being a kinda meaningless concept in PAPR, I don't think tying it to NUMA nodes makes sense. > This also means that > 'ibm,chip-id' will probably remain in use since it's the only place where > we inform cores per socket information to the kernel. Well.. unless we can find some other sensible way to convey that information. I haven't given up hope for that yet. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-25 2:10 ` David Gibson @ 2021-03-25 8:56 ` Cédric Le Goater 2021-03-25 10:15 ` Daniel Henrique Barboza 2021-03-29 4:20 ` David Gibson 0 siblings, 2 replies; 23+ messages in thread From: Cédric Le Goater @ 2021-03-25 8:56 UTC (permalink / raw) To: David Gibson, Daniel Henrique Barboza Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, qemu-devel, groug, qemu-ppc, Igor Mammedov On 3/25/21 3:10 AM, David Gibson wrote: > On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: >> >> >> On 3/22/21 10:03 PM, David Gibson wrote: >>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: >>>> Kernel commit 4bce545903fa ("powerpc/topology: Update >>>> topology_core_cpumask") cause a regression in the pseries machine when >>>> defining certain SMP topologies [1]. The reasoning behind the change is >>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating >>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with >>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as >>>> far as the kernel understanding of SMP topologies goes, both masks are >>>> equivalent. >>>> >>>> Further discussions in the kernel mailing list [2] shown that the >>>> powerpc kernel always considered that the number of sockets were equal >>>> to the number of NUMA nodes. The claim is that it doesn't make sense, >>>> for Power hardware at least, 2+ sockets being in the same NUMA node. The >>>> immediate conclusion is that all SMP topologies the pseries machine were >>>> supplying to the kernel, with more than one socket in the same NUMA node >>>> as in [1], happened to be correctly represented in the kernel by >>>> accident during all these years. >>>> >>>> There's a case to be made for virtual topologies being detached from >>>> hardware constraints, allowing maximum flexibility to users. 
At the same >>>> time, this freedom can't result in unrealistic hardware representations >>>> being emulated. If the real hardware and the pseries kernel don't >>>> support multiple chips/sockets in the same NUMA node, neither should we. >>>> >>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the >>>> pseries machine. qtest changes were made to adapt to this new >>>> condition. >>> >>> Oof. I really don't like this idea. It means a bunch of fiddly work >>> for users to match these up, for no real gain. I'm also concerned >>> that this will require follow on changes in libvirt to not make this a >>> really cryptic and irritating point of failure. >> >> Haven't though about required Libvirt changes, although I can say that there >> will be some amount to be mande and it will probably annoy existing users >> (everyone that has a multiple socket per NUMA node topology). >> >> There is not much we can do from the QEMU layer aside from what I've proposed >> here. The other alternative is to keep interacting with the kernel folks to >> see if there is a way to keep our use case untouched. > > Right. Well.. not necessarily untouched, but I'm hoping for more > replies from Cédric to my objections and mpe's. Even with sockets > being a kinda meaningless concept in PAPR, I don't think tying it to > NUMA nodes makes sense. I did a couple of replies in different email threads but maybe not to all. I felt it was going nowhere :/ Couple of thoughts, Shouldn't we get rid of the socket concept, die also, under pseries since they don't exist under PAPR ? We only have numa nodes, cores, threads AFAICT. Should we diverged from PAPR and add extra DT properties "qemu,..." ? There are a couple of places where Linux checks for the underlying hypervisor already. >> This also means that >> 'ibm,chip-id' will probably remain in use since it's the only place where >> we inform cores per socket information to the kernel. > > Well.. 
unless we can find some other sensible way to convey that > information. I haven't given up hope for that yet. Well, we could start by fixing the value in QEMU. It is broken today. This is all coming from some work we did last year to evaluate our HW (mostly for XIVE) on 2s, 4s, 16s systems on baremetal, KVM and PowerVM. We saw some real problems because Linux did not have a clear view of the topology. See the figures here : http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210303174857.1760393-9-clg@kaod.org/ The node id is a key parameter for system resource management, memory allocation, interrupt affinity, etc. Linux scales much better if used correctly. C. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-25 8:56 ` Cédric Le Goater @ 2021-03-25 10:15 ` Daniel Henrique Barboza 2021-03-29 4:20 ` David Gibson 1 sibling, 0 replies; 23+ messages in thread From: Daniel Henrique Barboza @ 2021-03-25 10:15 UTC (permalink / raw) To: Cédric Le Goater, David Gibson Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, qemu-devel, groug, qemu-ppc, Igor Mammedov On 3/25/21 5:56 AM, Cédric Le Goater wrote: > On 3/25/21 3:10 AM, David Gibson wrote: >> On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: >>> >>> >>> On 3/22/21 10:03 PM, David Gibson wrote: >>>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: >>>>> Kernel commit 4bce545903fa ("powerpc/topology: Update >>>>> topology_core_cpumask") cause a regression in the pseries machine when >>>>> defining certain SMP topologies [1]. The reasoning behind the change is >>>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating >>>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with >>>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as >>>>> far as the kernel understanding of SMP topologies goes, both masks are >>>>> equivalent. >>>>> >>>>> Further discussions in the kernel mailing list [2] shown that the >>>>> powerpc kernel always considered that the number of sockets were equal >>>>> to the number of NUMA nodes. The claim is that it doesn't make sense, >>>>> for Power hardware at least, 2+ sockets being in the same NUMA node. The >>>>> immediate conclusion is that all SMP topologies the pseries machine were >>>>> supplying to the kernel, with more than one socket in the same NUMA node >>>>> as in [1], happened to be correctly represented in the kernel by >>>>> accident during all these years. 
>>>>> >>>>> There's a case to be made for virtual topologies being detached from >>>>> hardware constraints, allowing maximum flexibility to users. At the same >>>>> time, this freedom can't result in unrealistic hardware representations >>>>> being emulated. If the real hardware and the pseries kernel don't >>>>> support multiple chips/sockets in the same NUMA node, neither should we. >>>>> >>>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the >>>>> pseries machine. qtest changes were made to adapt to this new >>>>> condition. >>>> >>>> Oof. I really don't like this idea. It means a bunch of fiddly work >>>> for users to match these up, for no real gain. I'm also concerned >>>> that this will require follow on changes in libvirt to not make this a >>>> really cryptic and irritating point of failure. >>> >>> Haven't though about required Libvirt changes, although I can say that there >>> will be some amount to be mande and it will probably annoy existing users >>> (everyone that has a multiple socket per NUMA node topology). >>> >>> There is not much we can do from the QEMU layer aside from what I've proposed >>> here. The other alternative is to keep interacting with the kernel folks to >>> see if there is a way to keep our use case untouched. >> >> Right. Well.. not necessarily untouched, but I'm hoping for more >> replies from Cédric to my objections and mpe's. Even with sockets >> being a kinda meaningless concept in PAPR, I don't think tying it to >> NUMA nodes makes sense. > > I did a couple of replies in different email threads but maybe not > to all. I felt it was going nowhere :/ Couple of thoughts, > > Shouldn't we get rid of the socket concept, die also, under pseries > since they don't exist under PAPR ? We only have numa nodes, cores, > threads AFAICT. I don't think we work with 'die'. 
Getting rid of the 'socket' representation is sensible regarding PAPR, but the effect for pseries will be similar to what this patch is already doing: users could have multiple sockets in the same NUMA node, and then they won't. Either because we got rid of the 'socket' representation or because socket == NUMA node. > > Should we diverged from PAPR and add extra DT properties "qemu,..." ? > There are a couple of places where Linux checks for the underlying > hypervisor already. > >>> This also means that >>> 'ibm,chip-id' will probably remain in use since it's the only place where >>> we inform cores per socket information to the kernel. >> >> Well.. unless we can find some other sensible way to convey that >> information. I haven't given up hope for that yet. > > Well, we could start by fixing the value in QEMU. It is broken today. I'll look into it. It makes more sense to talk about keeping it when it's working properly. DHB > > > This is all coming from some work we did last year to evaluate our HW > (mostly for XIVE) on 2s, 4s, 16s systems on baremetal, KVM and PowerVM. > We saw some real problems because Linux did not have a clear view of the > topology. See the figures here : > > http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210303174857.1760393-9-clg@kaod.org/ > > The node id is a key parameter for system resource management, memory > allocation, interrupt affinity, etc. Linux scales much better if used > correctly. > > C. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-25 8:56 ` Cédric Le Goater 2021-03-25 10:15 ` Daniel Henrique Barboza @ 2021-03-29 4:20 ` David Gibson 2021-03-29 15:32 ` Cédric Le Goater 1 sibling, 1 reply; 23+ messages in thread From: David Gibson @ 2021-03-29 4:20 UTC (permalink / raw) To: Cédric Le Goater Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, Daniel Henrique Barboza, qemu-devel, groug, qemu-ppc, Igor Mammedov [-- Attachment #1: Type: text/plain, Size: 5505 bytes --] On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote: > On 3/25/21 3:10 AM, David Gibson wrote: > > On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: > >> > >> > >> On 3/22/21 10:03 PM, David Gibson wrote: > >>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: > >>>> Kernel commit 4bce545903fa ("powerpc/topology: Update > >>>> topology_core_cpumask") cause a regression in the pseries machine when > >>>> defining certain SMP topologies [1]. The reasoning behind the change is > >>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating > >>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with > >>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as > >>>> far as the kernel understanding of SMP topologies goes, both masks are > >>>> equivalent. > >>>> > >>>> Further discussions in the kernel mailing list [2] shown that the > >>>> powerpc kernel always considered that the number of sockets were equal > >>>> to the number of NUMA nodes. The claim is that it doesn't make sense, > >>>> for Power hardware at least, 2+ sockets being in the same NUMA node. The > >>>> immediate conclusion is that all SMP topologies the pseries machine were > >>>> supplying to the kernel, with more than one socket in the same NUMA node > >>>> as in [1], happened to be correctly represented in the kernel by > >>>> accident during all these years. 
> >>>> > >>>> There's a case to be made for virtual topologies being detached from > >>>> hardware constraints, allowing maximum flexibility to users. At the same > >>>> time, this freedom can't result in unrealistic hardware representations > >>>> being emulated. If the real hardware and the pseries kernel don't > >>>> support multiple chips/sockets in the same NUMA node, neither should we. > >>>> > >>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the > >>>> pseries machine. qtest changes were made to adapt to this new > >>>> condition. > >>> > >>> Oof. I really don't like this idea. It means a bunch of fiddly work > >>> for users to match these up, for no real gain. I'm also concerned > >>> that this will require follow on changes in libvirt to not make this a > >>> really cryptic and irritating point of failure. > >> > >> Haven't though about required Libvirt changes, although I can say that there > >> will be some amount to be mande and it will probably annoy existing users > >> (everyone that has a multiple socket per NUMA node topology). > >> > >> There is not much we can do from the QEMU layer aside from what I've proposed > >> here. The other alternative is to keep interacting with the kernel folks to > >> see if there is a way to keep our use case untouched. > > > > Right. Well.. not necessarily untouched, but I'm hoping for more > > replies from Cédric to my objections and mpe's. Even with sockets > > being a kinda meaningless concept in PAPR, I don't think tying it to > > NUMA nodes makes sense. > > I did a couple of replies in different email threads but maybe not > to all. I felt it was going nowhere :/ Couple of thoughts, I think I saw some of those, but maybe not all. > Shouldn't we get rid of the socket concept, die also, under pseries > since they don't exist under PAPR ? We only have numa nodes, cores, > threads AFAICT. Theoretically, yes. 
I'm not sure it's really practical, though, since AFAICT, both qemu and the kernel have the notion of sockets (though not dies) built into generic code. It does mean that one possible approach here - maybe the best one - is to simply declare that sockets are meaningless under PAPR, so we simply don't expect what the guest kernel reports to match what's given to qemu. It'd be nice to avoid that if we can: in a sense it's just cosmetic, but it is likely to surprise and confuse people. > Should we diverged from PAPR and add extra DT properties "qemu,..." ? > There are a couple of places where Linux checks for the underlying > hypervisor already. > > >> This also means that > >> 'ibm,chip-id' will probably remain in use since it's the only place where > >> we inform cores per socket information to the kernel. > > > > Well.. unless we can find some other sensible way to convey that > > information. I haven't given up hope for that yet. > > Well, we could start by fixing the value in QEMU. It is broken > today. Fixing what value, exactly? > This is all coming from some work we did last year to evaluate our HW > (mostly for XIVE) on 2s, 4s, 16s systems on baremetal, KVM and PowerVM. > We saw some real problems because Linux did not have a clear view of the > topology. See the figures here : > > http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210303174857.1760393-9-clg@kaod.org/ > > The node id is a key parameter for system resource management, memory > allocation, interrupt affinity, etc. Linux scales much better if used > correctly. Well, sure. And we have all the ibm,associativity stuff to convey the node ids to the guest (which has its own problems, but not that are relevant here). What's throwing me is why getting node IDs correct has anything to do with socket numbers. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_!
http://www.ozlabs.org/~dgibson ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-29 4:20 ` David Gibson @ 2021-03-29 15:32 ` Cédric Le Goater 2021-03-29 18:32 ` Daniel Henrique Barboza 0 siblings, 1 reply; 23+ messages in thread From: Cédric Le Goater @ 2021-03-29 15:32 UTC (permalink / raw) To: David Gibson Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, Daniel Henrique Barboza, qemu-devel, groug, qemu-ppc, Igor Mammedov On 3/29/21 6:20 AM, David Gibson wrote: > On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote: >> On 3/25/21 3:10 AM, David Gibson wrote: >>> On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: >>>> >>>> >>>> On 3/22/21 10:03 PM, David Gibson wrote: >>>>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: >>>>>> Kernel commit 4bce545903fa ("powerpc/topology: Update >>>>>> topology_core_cpumask") cause a regression in the pseries machine when >>>>>> defining certain SMP topologies [1]. The reasoning behind the change is >>>>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating >>>>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with >>>>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as >>>>>> far as the kernel understanding of SMP topologies goes, both masks are >>>>>> equivalent. >>>>>> >>>>>> Further discussions in the kernel mailing list [2] shown that the >>>>>> powerpc kernel always considered that the number of sockets were equal >>>>>> to the number of NUMA nodes. The claim is that it doesn't make sense, >>>>>> for Power hardware at least, 2+ sockets being in the same NUMA node. The >>>>>> immediate conclusion is that all SMP topologies the pseries machine were >>>>>> supplying to the kernel, with more than one socket in the same NUMA node >>>>>> as in [1], happened to be correctly represented in the kernel by >>>>>> accident during all these years. 
>>>>>> >>>>>> There's a case to be made for virtual topologies being detached from >>>>>> hardware constraints, allowing maximum flexibility to users. At the same >>>>>> time, this freedom can't result in unrealistic hardware representations >>>>>> being emulated. If the real hardware and the pseries kernel don't >>>>>> support multiple chips/sockets in the same NUMA node, neither should we. >>>>>> >>>>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the >>>>>> pseries machine. qtest changes were made to adapt to this new >>>>>> condition. >>>>> >>>>> Oof. I really don't like this idea. It means a bunch of fiddly work >>>>> for users to match these up, for no real gain. I'm also concerned >>>>> that this will require follow on changes in libvirt to not make this a >>>>> really cryptic and irritating point of failure. >>>> >>>> Haven't though about required Libvirt changes, although I can say that there >>>> will be some amount to be mande and it will probably annoy existing users >>>> (everyone that has a multiple socket per NUMA node topology). >>>> >>>> There is not much we can do from the QEMU layer aside from what I've proposed >>>> here. The other alternative is to keep interacting with the kernel folks to >>>> see if there is a way to keep our use case untouched. >>> >>> Right. Well.. not necessarily untouched, but I'm hoping for more >>> replies from Cédric to my objections and mpe's. Even with sockets >>> being a kinda meaningless concept in PAPR, I don't think tying it to >>> NUMA nodes makes sense. >> >> I did a couple of replies in different email threads but maybe not >> to all. I felt it was going nowhere :/ Couple of thoughts, > > I think I saw some of those, but maybe not all. > >> Shouldn't we get rid of the socket concept, die also, under pseries >> since they don't exist under PAPR ? We only have numa nodes, cores, >> threads AFAICT. > > Theoretically, yes. 
I'm not sure it's really practical, though, since > AFAICT, both qemu and the kernel have the notion of sockets (though > not dies) built into generic code. Yes. But, AFAICT, these topology notions have not reached "arch/powerpc" and PPC Linux only has a NUMA node id, on pseries and powernv. > It does mean that one possible approach here - maybe the best one - is > to simply declare that sockets are meaningless under, so we simply > don't expect what the guest kernel reports to match what's given to > qemu. > > It'd be nice to avoid that if we can: in a sense it's just cosmetic, > but it is likely to surprise and confuse people. > >> Should we diverged from PAPR and add extra DT properties "qemu,..." ? >> There are a couple of places where Linux checks for the underlying >> hypervisor already. >> >>>> This also means that >>>> 'ibm,chip-id' will probably remain in use since it's the only place where >>>> we inform cores per socket information to the kernel. >>> >>> Well.. unless we can find some other sensible way to convey that >>> information. I haven't given up hope for that yet. >> >> Well, we could start by fixing the value in QEMU. It is broken >> today. > > Fixing what value, exactly? The value of the "ibm,chip-id" since we are keeping the property under QEMU. >> This is all coming from some work we did last year to evaluate our HW >> (mostly for XIVE) on 2s, 4s, 16s systems on baremetal, KVM and PowerVM. >> We saw some real problems because Linux did not have a clear view of the >> topology. See the figures here : >> >> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210303174857.1760393-9-clg@kaod.org/ >> >> The node id is a key parameter for system resource management, memory >> allocation, interrupt affinity, etc. Linux scales much better if used >> correctly. > > Well, sure. And we have all the ibm,associativity stuff to convey the > node ids to the guest (which has its own problems, but not that are > relevant here). 
What's throwing me is why getting node IDs correct > has anything to do with socket numbers. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-29 15:32 ` Cédric Le Goater @ 2021-03-29 18:32 ` Daniel Henrique Barboza 2021-03-29 23:55 ` Igor Mammedov 2021-03-31 0:57 ` David Gibson 0 siblings, 2 replies; 23+ messages in thread From: Daniel Henrique Barboza @ 2021-03-29 18:32 UTC (permalink / raw) To: Cédric Le Goater, David Gibson Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, qemu-devel, groug, qemu-ppc, Igor Mammedov On 3/29/21 12:32 PM, Cédric Le Goater wrote: > On 3/29/21 6:20 AM, David Gibson wrote: >> On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote: >>> On 3/25/21 3:10 AM, David Gibson wrote: >>>> On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: >>>>> >>>>> >>>>> On 3/22/21 10:03 PM, David Gibson wrote: >>>>>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: >>>>>>> Kernel commit 4bce545903fa ("powerpc/topology: Update >>>>>>> topology_core_cpumask") cause a regression in the pseries machine when >>>>>>> defining certain SMP topologies [1]. The reasoning behind the change is >>>>>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating >>>>>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with >>>>>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as >>>>>>> far as the kernel understanding of SMP topologies goes, both masks are >>>>>>> equivalent. >>>>>>> >>>>>>> Further discussions in the kernel mailing list [2] shown that the >>>>>>> powerpc kernel always considered that the number of sockets were equal >>>>>>> to the number of NUMA nodes. The claim is that it doesn't make sense, >>>>>>> for Power hardware at least, 2+ sockets being in the same NUMA node. 
The >>>>>>> immediate conclusion is that all SMP topologies the pseries machine were >>>>>>> supplying to the kernel, with more than one socket in the same NUMA node >>>>>>> as in [1], happened to be correctly represented in the kernel by >>>>>>> accident during all these years. >>>>>>> >>>>>>> There's a case to be made for virtual topologies being detached from >>>>>>> hardware constraints, allowing maximum flexibility to users. At the same >>>>>>> time, this freedom can't result in unrealistic hardware representations >>>>>>> being emulated. If the real hardware and the pseries kernel don't >>>>>>> support multiple chips/sockets in the same NUMA node, neither should we. >>>>>>> >>>>>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the >>>>>>> pseries machine. qtest changes were made to adapt to this new >>>>>>> condition. >>>>>> >>>>>> Oof. I really don't like this idea. It means a bunch of fiddly work >>>>>> for users to match these up, for no real gain. I'm also concerned >>>>>> that this will require follow on changes in libvirt to not make this a >>>>>> really cryptic and irritating point of failure. >>>>> >>>>> Haven't though about required Libvirt changes, although I can say that there >>>>> will be some amount to be mande and it will probably annoy existing users >>>>> (everyone that has a multiple socket per NUMA node topology). >>>>> >>>>> There is not much we can do from the QEMU layer aside from what I've proposed >>>>> here. The other alternative is to keep interacting with the kernel folks to >>>>> see if there is a way to keep our use case untouched. >>>> >>>> Right. Well.. not necessarily untouched, but I'm hoping for more >>>> replies from Cédric to my objections and mpe's. Even with sockets >>>> being a kinda meaningless concept in PAPR, I don't think tying it to >>>> NUMA nodes makes sense. >>> >>> I did a couple of replies in different email threads but maybe not >>> to all. 
I felt it was going nowhere :/ Couple of thoughts, >> >> I think I saw some of those, but maybe not all. >> >>> Shouldn't we get rid of the socket concept, die also, under pseries >>> since they don't exist under PAPR ? We only have numa nodes, cores, >>> threads AFAICT. >> >> Theoretically, yes. I'm not sure it's really practical, though, since >> AFAICT, both qemu and the kernel have the notion of sockets (though >> not dies) built into generic code. > > Yes. But, AFAICT, these topology notions have not reached "arch/powerpc" > and PPC Linux only has a NUMA node id, on pseries and powernv. > >> It does mean that one possible approach here - maybe the best one - is >> to simply declare that sockets are meaningless under, so we simply >> don't expect what the guest kernel reports to match what's given to >> qemu. >> >> It'd be nice to avoid that if we can: in a sense it's just cosmetic, >> but it is likely to surprise and confuse people. >> >>> Should we diverged from PAPR and add extra DT properties "qemu,..." ? >>> There are a couple of places where Linux checks for the underlying >>> hypervisor already. >>> >>>>> This also means that >>>>> 'ibm,chip-id' will probably remain in use since it's the only place where >>>>> we inform cores per socket information to the kernel. >>>> >>>> Well.. unless we can find some other sensible way to convey that >>>> information. I haven't given up hope for that yet. >>> >>> Well, we could start by fixing the value in QEMU. It is broken >>> today. >> >> Fixing what value, exactly? > > The value of the "ibm,chip-id" since we are keeping the property under > QEMU. David, I believe this has to do with the discussion we had last Friday.
I mentioned that the ibm,chip-id property is being calculated in a way that assigns the same ibm,chip-id to CPUs that belong to different NUMA nodes, e.g.:

-smp 4,cores=4,maxcpus=8,threads=1 \
-numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 \
-numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1

$ dtc -I dtb -O dts fdt.dtb | grep -B2 ibm,chip-id
		ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x00>;
		ibm,pft-size = <0x00 0x19>;
		ibm,chip-id = <0x00>;
--
		ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x01>;
		ibm,pft-size = <0x00 0x19>;
		ibm,chip-id = <0x00>;
--
		ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x02>;
		ibm,pft-size = <0x00 0x19>;
		ibm,chip-id = <0x00>;
--
		ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x03>;
		ibm,pft-size = <0x00 0x19>;
		ibm,chip-id = <0x00>;

We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a different NUMA node than CPUs 0-1. This would mean that the same socket belongs to different NUMA nodes at the same time. I believe this is what Cédric wants to see addressed. Given that the property is named after the OPAL property ibm,chip-id, the kernel expects it to have the same semantics as in OPAL. Thanks, DHB > >>> This is all coming from some work we did last year to evaluate our HW >>> (mostly for XIVE) on 2s, 4s, 16s systems on baremetal, KVM and PowerVM. >>> We saw some real problems because Linux did not have a clear view of the >>> topology. See the figures here : >>> >>> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210303174857.1760393-9-clg@kaod.org/ >>> >>> The node id is a key parameter for system resource management, memory >>> allocation, interrupt affinity, etc. Linux scales much better if used >>> correctly. >> >> Well, sure. And we have all the ibm,associativity stuff to convey the >> node ids to the guest (which has its own problems, but not that are >> relevant here). What's throwing me is why getting node IDs correct >> has anything to do with socket numbers.
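[Editorial sketch of the mismatch described above. It assumes, per the thread, that the chip-id is effectively derived from the core index and the cores-per-socket count alone, with no reference to the -numa assignment; the arithmetic below is an illustration, not QEMU's actual code.]

```shell
# Topology from the example CLI: 4 cores per socket,
# CPUs 0-1 assigned to NUMA node 0, CPUs 2-3 to node 1.
cores_per_socket=4
for cpu in 0 1 2 3; do
    node=$(( cpu / 2 ))                 # node mapping from the -numa options
    chip=$(( cpu / cores_per_socket ))  # derivation never looks at the node
    echo "cpu$cpu: node=$node ibm,chip-id=$chip"
done
# All four CPUs get ibm,chip-id=0 even though they span two NUMA nodes,
# matching the dtc output quoted above.
```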
> > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-29 18:32 ` Daniel Henrique Barboza @ 2021-03-29 23:55 ` Igor Mammedov 2021-03-31 0:57 ` David Gibson 1 sibling, 0 replies; 23+ messages in thread From: Igor Mammedov @ 2021-03-29 23:55 UTC (permalink / raw) To: Daniel Henrique Barboza Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, qemu-devel, groug, qemu-ppc, Cédric Le Goater, David Gibson On Mon, 29 Mar 2021 15:32:37 -0300 Daniel Henrique Barboza <danielhb413@gmail.com> wrote: > On 3/29/21 12:32 PM, Cédric Le Goater wrote: > > On 3/29/21 6:20 AM, David Gibson wrote: > >> On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote: > >>> On 3/25/21 3:10 AM, David Gibson wrote: > >>>> On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: > >>>>> > >>>>> > >>>>> On 3/22/21 10:03 PM, David Gibson wrote: > >>>>>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: > >>>>>>> Kernel commit 4bce545903fa ("powerpc/topology: Update > >>>>>>> topology_core_cpumask") cause a regression in the pseries machine when > >>>>>>> defining certain SMP topologies [1]. The reasoning behind the change is > >>>>>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating > >>>>>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with > >>>>>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as > >>>>>>> far as the kernel understanding of SMP topologies goes, both masks are > >>>>>>> equivalent. > >>>>>>> > >>>>>>> Further discussions in the kernel mailing list [2] shown that the > >>>>>>> powerpc kernel always considered that the number of sockets were equal > >>>>>>> to the number of NUMA nodes. The claim is that it doesn't make sense, > >>>>>>> for Power hardware at least, 2+ sockets being in the same NUMA node. 
The > >>>>>>> immediate conclusion is that all SMP topologies the pseries machine were > >>>>>>> supplying to the kernel, with more than one socket in the same NUMA node > >>>>>>> as in [1], happened to be correctly represented in the kernel by > >>>>>>> accident during all these years. > >>>>>>> > >>>>>>> There's a case to be made for virtual topologies being detached from > >>>>>>> hardware constraints, allowing maximum flexibility to users. At the same > >>>>>>> time, this freedom can't result in unrealistic hardware representations > >>>>>>> being emulated. If the real hardware and the pseries kernel don't > >>>>>>> support multiple chips/sockets in the same NUMA node, neither should we. > >>>>>>> > >>>>>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the > >>>>>>> pseries machine. qtest changes were made to adapt to this new > >>>>>>> condition. > >>>>>> > >>>>>> Oof. I really don't like this idea. It means a bunch of fiddly work > >>>>>> for users to match these up, for no real gain. I'm also concerned > >>>>>> that this will require follow on changes in libvirt to not make this a > >>>>>> really cryptic and irritating point of failure. > >>>>> > >>>>> Haven't though about required Libvirt changes, although I can say that there > >>>>> will be some amount to be mande and it will probably annoy existing users > >>>>> (everyone that has a multiple socket per NUMA node topology). > >>>>> > >>>>> There is not much we can do from the QEMU layer aside from what I've proposed > >>>>> here. The other alternative is to keep interacting with the kernel folks to > >>>>> see if there is a way to keep our use case untouched. > >>>> > >>>> Right. Well.. not necessarily untouched, but I'm hoping for more > >>>> replies from Cédric to my objections and mpe's. Even with sockets > >>>> being a kinda meaningless concept in PAPR, I don't think tying it to > >>>> NUMA nodes makes sense. 
> >>> > >>> I did a couple of replies in different email threads but maybe not > >>> to all. I felt it was going nowhere :/ Couple of thoughts, > >> > >> I think I saw some of those, but maybe not all. > >> > >>> Shouldn't we get rid of the socket concept, die also, under pseries > >>> since they don't exist under PAPR ? We only have numa nodes, cores, > >>> threads AFAICT. > >> > >> Theoretically, yes. I'm not sure it's really practical, though, since > >> AFAICT, both qemu and the kernel have the notion of sockets (though > >> not dies) built into generic code. > > > > Yes. But, AFAICT, these topology notions have not reached "arch/powerpc" > > and PPC Linux only has a NUMA node id, on pseries and powernv. > > > >> It does mean that one possible approach here - maybe the best one - is > >> to simply declare that sockets are meaningless under, so we simply > >> don't expect what the guest kernel reports to match what's given to > >> qemu. > >> > >> It'd be nice to avoid that if we can: in a sense it's just cosmetic, > >> but it is likely to surprise and confuse people. > >> > >>> Should we diverged from PAPR and add extra DT properties "qemu,..." ? > >>> There are a couple of places where Linux checks for the underlying > >>> hypervisor already. > >>> > >>>>> This also means that > >>>>> 'ibm,chip-id' will probably remain in use since it's the only place where > >>>>> we inform cores per socket information to the kernel. > >>>> > >>>> Well.. unless we can find some other sensible way to convey that > >>>> information. I haven't given up hope for that yet. > >>> > >>> Well, we could start by fixing the value in QEMU. It is broken > >>> today. > >> > >> Fixing what value, exactly? > > > > The value of the "ibm,chip-id" since we are keeping the property under > > QEMU. > > David, I believe this has to do with the discussing we had last Friday. 
> > I mentioned that the ibm,chip-id property is being calculated in a way that > promotes the same ibm,chip-id in CPUs that belongs to different NUMA nodes, > e.g.: > > -smp 4,cores=4,maxcpus=8,threads=1 \ > -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 \ > -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 just heads up, 'cpus=' is going away (deprecation patch got lost during 6.0 but I plan on pushing it during 6.1 devel window), you can use '-numa cpus,node-id=a,core-id=b' instead > > > $ dtc -I dtb -O dts fdt.dtb | grep -B2 ibm,chip-id > ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x00>; > ibm,pft-size = <0x00 0x19>; > ibm,chip-id = <0x00>; > -- > ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x01>; > ibm,pft-size = <0x00 0x19>; > ibm,chip-id = <0x00>; > -- > ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x02>; > ibm,pft-size = <0x00 0x19>; > ibm,chip-id = <0x00>; > -- > ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x03>; > ibm,pft-size = <0x00 0x19>; > ibm,chip-id = <0x00>; > > We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a different > NUMA node than 0-1. This would mean that the same socket would belong to > different NUMA nodes at the same time. > > I believe this is what Cedric wants to be addressed. Given that the property is > called after the OPAL property ibm,chip-id, the kernel expects that the property > will have the same semantics as in OPAL. > > > > Thanks, > > > DHB > > > > > > > >>> This is all coming from some work we did last year to evaluate our HW > >>> (mostly for XIVE) on 2s, 4s, 16s systems on baremetal, KVM and PowerVM. > >>> We saw some real problems because Linux did not have a clear view of the > >>> topology. See the figures here : > >>> > >>> http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210303174857.1760393-9-clg@kaod.org/ > >>> > >>> The node id is a key parameter for system resource management, memory > >>> allocation, interrupt affinity, etc. 
Linux scales much better if used > >>> correctly. > >> > >> Well, sure. And we have all the ibm,associativity stuff to convey the > >> node ids to the guest (which has its own problems, but not that are > >> relevant here). What's throwing me is why getting node IDs correct > >> has anything to do with socket numbers. > > > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-29 18:32 ` Daniel Henrique Barboza 2021-03-29 23:55 ` Igor Mammedov @ 2021-03-31 0:57 ` David Gibson 2021-03-31 4:58 ` Michael Ellerman 2021-03-31 15:18 ` Cédric Le Goater 1 sibling, 2 replies; 23+ messages in thread From: David Gibson @ 2021-03-31 0:57 UTC (permalink / raw) To: Daniel Henrique Barboza Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, groug, qemu-devel, qemu-ppc, Cédric Le Goater, Igor Mammedov [-- Attachment #1: Type: text/plain, Size: 7676 bytes --] On Mon, Mar 29, 2021 at 03:32:37PM -0300, Daniel Henrique Barboza wrote: > > > On 3/29/21 12:32 PM, Cédric Le Goater wrote: > > On 3/29/21 6:20 AM, David Gibson wrote: > > > On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote: > > > > On 3/25/21 3:10 AM, David Gibson wrote: > > > > > On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: > > > > > > > > > > > > > > > > > > On 3/22/21 10:03 PM, David Gibson wrote: > > > > > > > On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: > > > > > > > > Kernel commit 4bce545903fa ("powerpc/topology: Update > > > > > > > > topology_core_cpumask") cause a regression in the pseries machine when > > > > > > > > defining certain SMP topologies [1]. The reasoning behind the change is > > > > > > > > explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating > > > > > > > > cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with > > > > > > > > large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as > > > > > > > > far as the kernel understanding of SMP topologies goes, both masks are > > > > > > > > equivalent. > > > > > > > > > > > > > > > > Further discussions in the kernel mailing list [2] shown that the > > > > > > > > powerpc kernel always considered that the number of sockets were equal > > > > > > > > to the number of NUMA nodes. 
The claim is that it doesn't make sense, > > > > > > > > for Power hardware at least, 2+ sockets being in the same NUMA node. The > > > > > > > > immediate conclusion is that all SMP topologies the pseries machine were > > > > > > > > supplying to the kernel, with more than one socket in the same NUMA node > > > > > > > > as in [1], happened to be correctly represented in the kernel by > > > > > > > > accident during all these years. > > > > > > > > > > > > > > > > There's a case to be made for virtual topologies being detached from > > > > > > > > hardware constraints, allowing maximum flexibility to users. At the same > > > > > > > > time, this freedom can't result in unrealistic hardware representations > > > > > > > > being emulated. If the real hardware and the pseries kernel don't > > > > > > > > support multiple chips/sockets in the same NUMA node, neither should we. > > > > > > > > > > > > > > > > Starting in 6.0.0, all sockets must match an unique NUMA node in the > > > > > > > > pseries machine. qtest changes were made to adapt to this new > > > > > > > > condition. > > > > > > > > > > > > > > Oof. I really don't like this idea. It means a bunch of fiddly work > > > > > > > for users to match these up, for no real gain. I'm also concerned > > > > > > > that this will require follow on changes in libvirt to not make this a > > > > > > > really cryptic and irritating point of failure. > > > > > > > > > > > > Haven't though about required Libvirt changes, although I can say that there > > > > > > will be some amount to be mande and it will probably annoy existing users > > > > > > (everyone that has a multiple socket per NUMA node topology). > > > > > > > > > > > > There is not much we can do from the QEMU layer aside from what I've proposed > > > > > > here. The other alternative is to keep interacting with the kernel folks to > > > > > > see if there is a way to keep our use case untouched. > > > > > > > > > > Right. Well.. 
not necessarily untouched, but I'm hoping for more > > > > > replies from Cédric to my objections and mpe's. Even with sockets > > > > > being a kinda meaningless concept in PAPR, I don't think tying it to > > > > > NUMA nodes makes sense. > > > > > > > > I did a couple of replies in different email threads but maybe not > > > > to all. I felt it was going nowhere :/ Couple of thoughts, > > > > > > I think I saw some of those, but maybe not all. > > > > > > > Shouldn't we get rid of the socket concept, die also, under pseries > > > > since they don't exist under PAPR ? We only have numa nodes, cores, > > > > threads AFAICT. > > > > > > Theoretically, yes. I'm not sure it's really practical, though, since > > > AFAICT, both qemu and the kernel have the notion of sockets (though > > > not dies) built into generic code. > > > > Yes. But, AFAICT, these topology notions have not reached "arch/powerpc" > > and PPC Linux only has a NUMA node id, on pseries and powernv. > > > > > It does mean that one possible approach here - maybe the best one - is > > > to simply declare that sockets are meaningless under, so we simply > > > don't expect what the guest kernel reports to match what's given to > > > qemu. > > > > > > It'd be nice to avoid that if we can: in a sense it's just cosmetic, > > > but it is likely to surprise and confuse people. > > > > > > > Should we diverged from PAPR and add extra DT properties "qemu,..." ? > > > > There are a couple of places where Linux checks for the underlying > > > > hypervisor already. > > > > > > > > > > This also means that > > > > > > 'ibm,chip-id' will probably remain in use since it's the only place where > > > > > > we inform cores per socket information to the kernel. > > > > > > > > > > Well.. unless we can find some other sensible way to convey that > > > > > information. I haven't given up hope for that yet. > > > > > > > > Well, we could start by fixing the value in QEMU. It is broken > > > > today. 
> > > > > > Fixing what value, exactly? > > > > The value of the "ibm,chip-id" since we are keeping the property under > > QEMU. > > David, I believe this has to do with the discussing we had last Friday. > > I mentioned that the ibm,chip-id property is being calculated in a way that > promotes the same ibm,chip-id in CPUs that belongs to different NUMA nodes, > e.g.: > > -smp 4,cores=4,maxcpus=8,threads=1 \ > -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 \ > -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 > > > $ dtc -I dtb -O dts fdt.dtb | grep -B2 ibm,chip-id > ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x00>; > ibm,pft-size = <0x00 0x19>; > ibm,chip-id = <0x00>; > -- > ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x01>; > ibm,pft-size = <0x00 0x19>; > ibm,chip-id = <0x00>; > -- > ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x02>; > ibm,pft-size = <0x00 0x19>; > ibm,chip-id = <0x00>; > -- > ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x03>; > ibm,pft-size = <0x00 0x19>; > ibm,chip-id = <0x00>; > We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a > different NUMA node than 0-1. This would mean that the same socket > would belong to different NUMA nodes at the same time. Right... and I'm still not seeing why that's a problem. AFAICT that's a possible, if unexpected, situation under real hardware - though maybe not for POWER9 specifically. > I believe this is what Cedric wants to be addressed. Given that the > property is called after the OPAL property ibm,chip-id, the kernel > expects that the property will have the same semantics as in OPAL. Even on powernv, I'm not clear why chip-id is tied into the NUMA configuration, rather than getting all the NUMA info from associativity properties. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! 
http://www.ozlabs.org/~dgibson
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-31 0:57 ` David Gibson @ 2021-03-31 4:58 ` Michael Ellerman 2021-03-31 15:22 ` Cédric Le Goater 2021-03-31 15:18 ` Cédric Le Goater 1 sibling, 1 reply; 23+ messages in thread From: Michael Ellerman @ 2021-03-31 4:58 UTC (permalink / raw) To: David Gibson, Daniel Henrique Barboza Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, groug, qemu-devel, qemu-ppc, Cédric Le Goater, Igor Mammedov David Gibson <david@gibson.dropbear.id.au> writes: > On Mon, Mar 29, 2021 at 03:32:37PM -0300, Daniel Henrique Barboza wrote: ... > >> We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a >> different NUMA node than 0-1. This would mean that the same socket >> would belong to different NUMA nodes at the same time. > > Right... and I'm still not seeing why that's a problem. AFAICT that's > a possible, if unexpected, situation under real hardware - though > maybe not for POWER9 specifically. I think I agree. >> I believe this is what Cedric wants to be addressed. Given that the >> property is called after the OPAL property ibm,chip-id, the kernel >> expects that the property will have the same semantics as in OPAL. > > Even on powernv, I'm not clear why chip-id is tied into the NUMA > configuration, rather than getting all the NUMA info from > associativity properties. AFAIK we don't use chip-id for anything related to NUMA, if we do I'd consider that a bug. We do use it for topology_physical_package_id(), but that's almost completely unused. cheers ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-31 4:58 ` Michael Ellerman @ 2021-03-31 15:22 ` Cédric Le Goater 2021-04-01 2:53 ` David Gibson 0 siblings, 1 reply; 23+ messages in thread From: Cédric Le Goater @ 2021-03-31 15:22 UTC (permalink / raw) To: Michael Ellerman, David Gibson, Daniel Henrique Barboza Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, qemu-devel, groug, qemu-ppc, Igor Mammedov On 3/31/21 6:58 AM, Michael Ellerman wrote: > David Gibson <david@gibson.dropbear.id.au> writes: >> On Mon, Mar 29, 2021 at 03:32:37PM -0300, Daniel Henrique Barboza wrote: > ... >> >>> We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a >>> different NUMA node than 0-1. This would mean that the same socket >>> would belong to different NUMA nodes at the same time. >> >> Right... and I'm still not seeing why that's a problem. AFAICT that's >> a possible, if unexpected, situation under real hardware - though >> maybe not for POWER9 specifically. > > I think I agree. > >>> I believe this is what Cedric wants to be addressed. Given that the >>> property is called after the OPAL property ibm,chip-id, the kernel >>> expects that the property will have the same semantics as in OPAL. >> >> Even on powernv, I'm not clear why chip-id is tied into the NUMA >> configuration, rather than getting all the NUMA info from >> associativity properties. > > AFAIK we don't use chip-id for anything related to NUMA, if we do I'd > consider that a bug. Since PAPR only has NUMA nodes, is the use of chip-id in XIVE PAPR considered as a bug ? I would say so. > We do use it for topology_physical_package_id(), but that's almost > completely unused. In that case, I think it should be fine to return -1 like under PowerVM. Thanks, C. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-31 15:22 ` Cédric Le Goater @ 2021-04-01 2:53 ` David Gibson 0 siblings, 0 replies; 23+ messages in thread From: David Gibson @ 2021-04-01 2:53 UTC (permalink / raw) To: Cédric Le Goater Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, Daniel Henrique Barboza, qemu-devel, groug, qemu-ppc, Igor Mammedov [-- Attachment #1: Type: text/plain, Size: 2182 bytes --] On Wed, Mar 31, 2021 at 05:22:39PM +0200, Cédric Le Goater wrote: > On 3/31/21 6:58 AM, Michael Ellerman wrote: > > David Gibson <david@gibson.dropbear.id.au> writes: > >> On Mon, Mar 29, 2021 at 03:32:37PM -0300, Daniel Henrique Barboza wrote: > > ... > >> > >>> We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a > >>> different NUMA node than 0-1. This would mean that the same socket > >>> would belong to different NUMA nodes at the same time. > >> > >> Right... and I'm still not seeing why that's a problem. AFAICT that's > >> a possible, if unexpected, situation under real hardware - though > >> maybe not for POWER9 specifically. > > > > I think I agree. > > > >>> I believe this is what Cedric wants to be addressed. Given that the > >>> property is called after the OPAL property ibm,chip-id, the kernel > >>> expects that the property will have the same semantics as in OPAL. > >> > >> Even on powernv, I'm not clear why chip-id is tied into the NUMA > >> configuration, rather than getting all the NUMA info from > >> associativity properties. > > > > AFAIK we don't use chip-id for anything related to NUMA, if we do I'd > > consider that a bug. > > Since PAPR only has NUMA nodes, is the use of chip-id in XIVE PAPR > considered as a bug ? I would say so. As noted in another thread, XIVE PAPR *doesn't* actually use chip_id. And even on PowerNV, I don't think this has any real connection to NUMA. 
For PowerNV we care about whether we're working within a single XIVE hardware instance, or across multiple. There's one XIVE per-chip, hence the relevance of chip-id. That happens to also match to NUMA topology on (currently existing) POWER9 chips, but I don't see that it inherently has to. > > We do use it for topology_physical_package_id(), but that's almost > > completely unused. > > In that case, I think it should be fine to return -1 like under PowerVM. > > Thanks, > > C. > > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-31 0:57 ` David Gibson 2021-03-31 4:58 ` Michael Ellerman @ 2021-03-31 15:18 ` Cédric Le Goater 2021-03-31 17:29 ` Daniel Henrique Barboza 2021-04-01 2:59 ` David Gibson 1 sibling, 2 replies; 23+ messages in thread From: Cédric Le Goater @ 2021-03-31 15:18 UTC (permalink / raw) To: David Gibson, Daniel Henrique Barboza Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, qemu-devel, groug, qemu-ppc, Igor Mammedov On 3/31/21 2:57 AM, David Gibson wrote: > On Mon, Mar 29, 2021 at 03:32:37PM -0300, Daniel Henrique Barboza wrote: >> >> >> On 3/29/21 12:32 PM, Cédric Le Goater wrote: >>> On 3/29/21 6:20 AM, David Gibson wrote: >>>> On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote: >>>>> On 3/25/21 3:10 AM, David Gibson wrote: >>>>>> On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: >>>>>>> >>>>>>> >>>>>>> On 3/22/21 10:03 PM, David Gibson wrote: >>>>>>>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: >>>>>>>>> Kernel commit 4bce545903fa ("powerpc/topology: Update >>>>>>>>> topology_core_cpumask") cause a regression in the pseries machine when >>>>>>>>> defining certain SMP topologies [1]. The reasoning behind the change is >>>>>>>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating >>>>>>>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with >>>>>>>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as >>>>>>>>> far as the kernel understanding of SMP topologies goes, both masks are >>>>>>>>> equivalent. >>>>>>>>> >>>>>>>>> Further discussions in the kernel mailing list [2] shown that the >>>>>>>>> powerpc kernel always considered that the number of sockets were equal >>>>>>>>> to the number of NUMA nodes. The claim is that it doesn't make sense, >>>>>>>>> for Power hardware at least, 2+ sockets being in the same NUMA node. 
The >>>>>>>>> immediate conclusion is that all SMP topologies the pseries machine were >>>>>>>>> supplying to the kernel, with more than one socket in the same NUMA node >>>>>>>>> as in [1], happened to be correctly represented in the kernel by >>>>>>>>> accident during all these years. >>>>>>>>> >>>>>>>>> There's a case to be made for virtual topologies being detached from >>>>>>>>> hardware constraints, allowing maximum flexibility to users. At the same >>>>>>>>> time, this freedom can't result in unrealistic hardware representations >>>>>>>>> being emulated. If the real hardware and the pseries kernel don't >>>>>>>>> support multiple chips/sockets in the same NUMA node, neither should we. >>>>>>>>> >>>>>>>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the >>>>>>>>> pseries machine. qtest changes were made to adapt to this new >>>>>>>>> condition. >>>>>>>> >>>>>>>> Oof. I really don't like this idea. It means a bunch of fiddly work >>>>>>>> for users to match these up, for no real gain. I'm also concerned >>>>>>>> that this will require follow on changes in libvirt to not make this a >>>>>>>> really cryptic and irritating point of failure. >>>>>>> >>>>>>> Haven't though about required Libvirt changes, although I can say that there >>>>>>> will be some amount to be mande and it will probably annoy existing users >>>>>>> (everyone that has a multiple socket per NUMA node topology). >>>>>>> >>>>>>> There is not much we can do from the QEMU layer aside from what I've proposed >>>>>>> here. The other alternative is to keep interacting with the kernel folks to >>>>>>> see if there is a way to keep our use case untouched. >>>>>> >>>>>> Right. Well.. not necessarily untouched, but I'm hoping for more >>>>>> replies from Cédric to my objections and mpe's. Even with sockets >>>>>> being a kinda meaningless concept in PAPR, I don't think tying it to >>>>>> NUMA nodes makes sense. 
>>>>> >>>>> I did a couple of replies in different email threads but maybe not >>>>> to all. I felt it was going nowhere :/ Couple of thoughts, >>>> >>>> I think I saw some of those, but maybe not all. >>>> >>>>> Shouldn't we get rid of the socket concept, die also, under pseries >>>>> since they don't exist under PAPR ? We only have numa nodes, cores, >>>>> threads AFAICT. >>>> >>>> Theoretically, yes. I'm not sure it's really practical, though, since >>>> AFAICT, both qemu and the kernel have the notion of sockets (though >>>> not dies) built into generic code. >>> >>> Yes. But, AFAICT, these topology notions have not reached "arch/powerpc" >>> and PPC Linux only has a NUMA node id, on pseries and powernv. >>> >>>> It does mean that one possible approach here - maybe the best one - is >>>> to simply declare that sockets are meaningless under, so we simply >>>> don't expect what the guest kernel reports to match what's given to >>>> qemu. >>>> >>>> It'd be nice to avoid that if we can: in a sense it's just cosmetic, >>>> but it is likely to surprise and confuse people. >>>> >>>>> Should we diverged from PAPR and add extra DT properties "qemu,..." ? >>>>> There are a couple of places where Linux checks for the underlying >>>>> hypervisor already. >>>>> >>>>>>> This also means that >>>>>>> 'ibm,chip-id' will probably remain in use since it's the only place where >>>>>>> we inform cores per socket information to the kernel. >>>>>> >>>>>> Well.. unless we can find some other sensible way to convey that >>>>>> information. I haven't given up hope for that yet. >>>>> >>>>> Well, we could start by fixing the value in QEMU. It is broken >>>>> today. >>>> >>>> Fixing what value, exactly? >>> >>> The value of the "ibm,chip-id" since we are keeping the property under >>> QEMU. >> >> David, I believe this has to do with the discussing we had last Friday. 
>> >> I mentioned that the ibm,chip-id property is being calculated in a way that >> promotes the same ibm,chip-id in CPUs that belongs to different NUMA nodes, >> e.g.: >> >> -smp 4,cores=4,maxcpus=8,threads=1 \ >> -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 \ >> -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >> >> >> $ dtc -I dtb -O dts fdt.dtb | grep -B2 ibm,chip-id >> ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x00>; >> ibm,pft-size = <0x00 0x19>; >> ibm,chip-id = <0x00>; >> -- >> ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x01>; >> ibm,pft-size = <0x00 0x19>; >> ibm,chip-id = <0x00>; >> -- >> ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x02>; >> ibm,pft-size = <0x00 0x19>; >> ibm,chip-id = <0x00>; >> -- >> ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x03>; >> ibm,pft-size = <0x00 0x19>; >> ibm,chip-id = <0x00>; > >> We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a >> different NUMA node than 0-1. This would mean that the same socket >> would belong to different NUMA nodes at the same time. > > Right... and I'm still not seeing why that's a problem. AFAICT that's > a possible, if unexpected, situation under real hardware - though > maybe not for POWER9 specifically. The ibm,chip-id property does not exist under PAPR. PAPR only has NUMA nodes, no sockets nor chips. And the property value is simply broken under QEMU. Try this : -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 # dmesg | grep numa [ 0.013106] numa: Node 0 CPUs: 0-1 [ 0.013136] numa: Node 1 CPUs: 2-3 # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id ibm,chip-id = <0x01>; ibm,chip-id = <0x02>; ibm,chip-id = <0x00>; ibm,chip-id = <0x03>; >> I believe this is what Cedric wants to be addressed. 
Given that the >> property is called after the OPAL property ibm,chip-id, the kernel >> expects that the property will have the same semantics as in OPAL.> > Even on powernv, I'm not clear why chip-id is tied into the NUMA > configuration, rather than getting all the NUMA info from > associativity properties. It is the case. The associativity properties are built from chip-id in OPAL though. The chip-id property is only used in low level PowerNV drivers, VAS, XSCOM, LPC, etc. It's also badly used in the common part of the XIVE driver, what I am trying to fix to introduce an IPI per node on all platforms. C. ^ permalink raw reply [flat|nested] 23+ messages in thread
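Cédric's cores=1 counter-example above can be checked with the same kind of sketch (again an assumption, not the actual spapr.c code: chip-id taken as cpu_index // (cores * threads)). With one core per socket, every present CPU becomes its own chip, numbered purely from the cpu index:

```python
# Hypothetical simplification (assumption: chip-id = cpu_index //
# (cores * threads)). For -smp 4,cores=1,maxcpus=8 each CPU is a
# single-core socket, so chip-id just counts CPUs; the -numa mapping
# plays no part in the numbering.

def chip_id(cpu_index, cores=1, threads=1):
    return cpu_index // (cores * threads)

ids = sorted(chip_id(c) for c in range(4))
print(ids)  # [0, 1, 2, 3]

# Matches the four values in the dtc output above (dtc lists the
# nodes in filesystem order, hence the shuffled 0x01 0x02 0x00 0x03).
assert ids == [0, 1, 2, 3]
```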
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-31 15:18 ` Cédric Le Goater @ 2021-03-31 17:29 ` Daniel Henrique Barboza 2021-03-31 17:40 ` Cédric Le Goater 2021-04-01 2:59 ` David Gibson 1 sibling, 1 reply; 23+ messages in thread From: Daniel Henrique Barboza @ 2021-03-31 17:29 UTC (permalink / raw) To: Cédric Le Goater, David Gibson Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, qemu-devel, groug, qemu-ppc, Igor Mammedov On 3/31/21 12:18 PM, Cédric Le Goater wrote: > On 3/31/21 2:57 AM, David Gibson wrote: >> On Mon, Mar 29, 2021 at 03:32:37PM -0300, Daniel Henrique Barboza wrote: >>> >>> >>> On 3/29/21 12:32 PM, Cédric Le Goater wrote: >>>> On 3/29/21 6:20 AM, David Gibson wrote: >>>>> On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote: >>>>>> On 3/25/21 3:10 AM, David Gibson wrote: >>>>>>> On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 3/22/21 10:03 PM, David Gibson wrote: >>>>>>>>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: >>>>>>>>>> Kernel commit 4bce545903fa ("powerpc/topology: Update >>>>>>>>>> topology_core_cpumask") cause a regression in the pseries machine when >>>>>>>>>> defining certain SMP topologies [1]. The reasoning behind the change is >>>>>>>>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating >>>>>>>>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with >>>>>>>>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as >>>>>>>>>> far as the kernel understanding of SMP topologies goes, both masks are >>>>>>>>>> equivalent. >>>>>>>>>> >>>>>>>>>> Further discussions in the kernel mailing list [2] shown that the >>>>>>>>>> powerpc kernel always considered that the number of sockets were equal >>>>>>>>>> to the number of NUMA nodes. 
The claim is that it doesn't make sense, >>>>>>>>>> for Power hardware at least, 2+ sockets being in the same NUMA node. The >>>>>>>>>> immediate conclusion is that all SMP topologies the pseries machine were >>>>>>>>>> supplying to the kernel, with more than one socket in the same NUMA node >>>>>>>>>> as in [1], happened to be correctly represented in the kernel by >>>>>>>>>> accident during all these years. >>>>>>>>>> >>>>>>>>>> There's a case to be made for virtual topologies being detached from >>>>>>>>>> hardware constraints, allowing maximum flexibility to users. At the same >>>>>>>>>> time, this freedom can't result in unrealistic hardware representations >>>>>>>>>> being emulated. If the real hardware and the pseries kernel don't >>>>>>>>>> support multiple chips/sockets in the same NUMA node, neither should we. >>>>>>>>>> >>>>>>>>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the >>>>>>>>>> pseries machine. qtest changes were made to adapt to this new >>>>>>>>>> condition. >>>>>>>>> >>>>>>>>> Oof. I really don't like this idea. It means a bunch of fiddly work >>>>>>>>> for users to match these up, for no real gain. I'm also concerned >>>>>>>>> that this will require follow on changes in libvirt to not make this a >>>>>>>>> really cryptic and irritating point of failure. >>>>>>>> >>>>>>>> Haven't though about required Libvirt changes, although I can say that there >>>>>>>> will be some amount to be mande and it will probably annoy existing users >>>>>>>> (everyone that has a multiple socket per NUMA node topology). >>>>>>>> >>>>>>>> There is not much we can do from the QEMU layer aside from what I've proposed >>>>>>>> here. The other alternative is to keep interacting with the kernel folks to >>>>>>>> see if there is a way to keep our use case untouched. >>>>>>> >>>>>>> Right. Well.. not necessarily untouched, but I'm hoping for more >>>>>>> replies from Cédric to my objections and mpe's. 
Even with sockets >>>>>>> being a kinda meaningless concept in PAPR, I don't think tying it to >>>>>>> NUMA nodes makes sense. >>>>>> >>>>>> I did a couple of replies in different email threads but maybe not >>>>>> to all. I felt it was going nowhere :/ Couple of thoughts, >>>>> >>>>> I think I saw some of those, but maybe not all. >>>>> >>>>>> Shouldn't we get rid of the socket concept, die also, under pseries >>>>>> since they don't exist under PAPR ? We only have numa nodes, cores, >>>>>> threads AFAICT. >>>>> >>>>> Theoretically, yes. I'm not sure it's really practical, though, since >>>>> AFAICT, both qemu and the kernel have the notion of sockets (though >>>>> not dies) built into generic code. >>>> >>>> Yes. But, AFAICT, these topology notions have not reached "arch/powerpc" >>>> and PPC Linux only has a NUMA node id, on pseries and powernv. >>>> >>>>> It does mean that one possible approach here - maybe the best one - is >>>>> to simply declare that sockets are meaningless under, so we simply >>>>> don't expect what the guest kernel reports to match what's given to >>>>> qemu. >>>>> >>>>> It'd be nice to avoid that if we can: in a sense it's just cosmetic, >>>>> but it is likely to surprise and confuse people. >>>>> >>>>>> Should we diverged from PAPR and add extra DT properties "qemu,..." ? >>>>>> There are a couple of places where Linux checks for the underlying >>>>>> hypervisor already. >>>>>> >>>>>>>> This also means that >>>>>>>> 'ibm,chip-id' will probably remain in use since it's the only place where >>>>>>>> we inform cores per socket information to the kernel. >>>>>>> >>>>>>> Well.. unless we can find some other sensible way to convey that >>>>>>> information. I haven't given up hope for that yet. >>>>>> >>>>>> Well, we could start by fixing the value in QEMU. It is broken >>>>>> today. >>>>> >>>>> Fixing what value, exactly? >>>> >>>> The value of the "ibm,chip-id" since we are keeping the property under >>>> QEMU. 
>>> >>> David, I believe this has to do with the discussing we had last Friday. >>> >>> I mentioned that the ibm,chip-id property is being calculated in a way that >>> promotes the same ibm,chip-id in CPUs that belongs to different NUMA nodes, >>> e.g.: >>> >>> -smp 4,cores=4,maxcpus=8,threads=1 \ >>> -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 \ >>> -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >>> >>> >>> $ dtc -I dtb -O dts fdt.dtb | grep -B2 ibm,chip-id >>> ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x00>; >>> ibm,pft-size = <0x00 0x19>; >>> ibm,chip-id = <0x00>; >>> -- >>> ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x01>; >>> ibm,pft-size = <0x00 0x19>; >>> ibm,chip-id = <0x00>; >>> -- >>> ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x02>; >>> ibm,pft-size = <0x00 0x19>; >>> ibm,chip-id = <0x00>; >>> -- >>> ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x03>; >>> ibm,pft-size = <0x00 0x19>; >>> ibm,chip-id = <0x00>; >> >>> We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a >>> different NUMA node than 0-1. This would mean that the same socket >>> would belong to different NUMA nodes at the same time. >> >> Right... and I'm still not seeing why that's a problem. AFAICT that's >> a possible, if unexpected, situation under real hardware - though >> maybe not for POWER9 specifically. > The ibm,chip-id property does not exist under PAPR. PAPR only has > NUMA nodes, no sockets nor chips. > > And the property value is simply broken under QEMU. 
Try this : > > -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 > > # dmesg | grep numa > [ 0.013106] numa: Node 0 CPUs: 0-1 > [ 0.013136] numa: Node 1 CPUs: 2-3 > > # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id > ibm,chip-id = <0x01>; > ibm,chip-id = <0x02>; > ibm,chip-id = <0x00>; > ibm,chip-id = <0x03>; These values are not wrong. When you do: -smp 4,cores=1,maxcpus=8 (....) You didn't fill threads and sockets. QEMU default is to prioritize sockets to fill the missing information, up to the maxcpus value. This means that what you did is equivalent to: -smp 4,threads=1,cores=1,sockets=8,maxcpus=8 (....) It's a 1 thread/core, 1 core/socket with 8 sockets config. Each possible CPU will sit in its own core, having its own ibm,chip-id. So: "-numa node,nodeid=0,cpus=0-1" is in fact allocating sockets 0 and 1 to NUMA node 0. Thanks, DHB > >>> I believe this is what Cedric wants to be addressed. Given that the >>> property is called after the OPAL property ibm,chip-id, the kernel >>> expects that the property will have the same semantics as in OPAL.> >> Even on powernv, I'm not clear why chip-id is tied into the NUMA >> configuration, rather than getting all the NUMA info from >> associativity properties. > > It is the case. > > The associativity properties are built from chip-id in OPAL though. > > The chip-id property is only used in low level PowerNV drivers, VAS, > XSCOM, LPC, etc. > > It's also badly used in the common part of the XIVE driver, what I am > trying to fix to introduce an IPI per node on all platforms. > > C. > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-31 17:29 ` Daniel Henrique Barboza @ 2021-03-31 17:40 ` Cédric Le Goater 0 siblings, 0 replies; 23+ messages in thread From: Cédric Le Goater @ 2021-03-31 17:40 UTC (permalink / raw) To: Daniel Henrique Barboza, David Gibson Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, qemu-devel, groug, qemu-ppc, Igor Mammedov On 3/31/21 7:29 PM, Daniel Henrique Barboza wrote: > > On 3/31/21 12:18 PM, Cédric Le Goater wrote: >> On 3/31/21 2:57 AM, David Gibson wrote: >>> On Mon, Mar 29, 2021 at 03:32:37PM -0300, Daniel Henrique Barboza wrote: >>>> >>>> >>>> On 3/29/21 12:32 PM, Cédric Le Goater wrote: >>>>> On 3/29/21 6:20 AM, David Gibson wrote: >>>>>> On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote: >>>>>>> On 3/25/21 3:10 AM, David Gibson wrote: >>>>>>>> On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 3/22/21 10:03 PM, David Gibson wrote: >>>>>>>>>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: >>>>>>>>>>> Kernel commit 4bce545903fa ("powerpc/topology: Update >>>>>>>>>>> topology_core_cpumask") cause a regression in the pseries machine when >>>>>>>>>>> defining certain SMP topologies [1]. The reasoning behind the change is >>>>>>>>>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating >>>>>>>>>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with >>>>>>>>>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as >>>>>>>>>>> far as the kernel understanding of SMP topologies goes, both masks are >>>>>>>>>>> equivalent. >>>>>>>>>>> >>>>>>>>>>> Further discussions in the kernel mailing list [2] shown that the >>>>>>>>>>> powerpc kernel always considered that the number of sockets were equal >>>>>>>>>>> to the number of NUMA nodes. 
The claim is that it doesn't make sense, >>>>>>>>>>> for Power hardware at least, 2+ sockets being in the same NUMA node. The >>>>>>>>>>> immediate conclusion is that all SMP topologies the pseries machine were >>>>>>>>>>> supplying to the kernel, with more than one socket in the same NUMA node >>>>>>>>>>> as in [1], happened to be correctly represented in the kernel by >>>>>>>>>>> accident during all these years. >>>>>>>>>>> >>>>>>>>>>> There's a case to be made for virtual topologies being detached from >>>>>>>>>>> hardware constraints, allowing maximum flexibility to users. At the same >>>>>>>>>>> time, this freedom can't result in unrealistic hardware representations >>>>>>>>>>> being emulated. If the real hardware and the pseries kernel don't >>>>>>>>>>> support multiple chips/sockets in the same NUMA node, neither should we. >>>>>>>>>>> >>>>>>>>>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the >>>>>>>>>>> pseries machine. qtest changes were made to adapt to this new >>>>>>>>>>> condition. >>>>>>>>>> >>>>>>>>>> Oof. I really don't like this idea. It means a bunch of fiddly work >>>>>>>>>> for users to match these up, for no real gain. I'm also concerned >>>>>>>>>> that this will require follow on changes in libvirt to not make this a >>>>>>>>>> really cryptic and irritating point of failure. >>>>>>>>> >>>>>>>>> Haven't though about required Libvirt changes, although I can say that there >>>>>>>>> will be some amount to be mande and it will probably annoy existing users >>>>>>>>> (everyone that has a multiple socket per NUMA node topology). >>>>>>>>> >>>>>>>>> There is not much we can do from the QEMU layer aside from what I've proposed >>>>>>>>> here. The other alternative is to keep interacting with the kernel folks to >>>>>>>>> see if there is a way to keep our use case untouched. >>>>>>>> >>>>>>>> Right. Well.. not necessarily untouched, but I'm hoping for more >>>>>>>> replies from Cédric to my objections and mpe's. 
Even with sockets >>>>>>>> being a kinda meaningless concept in PAPR, I don't think tying it to >>>>>>>> NUMA nodes makes sense. >>>>>>> >>>>>>> I did a couple of replies in different email threads but maybe not >>>>>>> to all. I felt it was going nowhere :/ Couple of thoughts, >>>>>> >>>>>> I think I saw some of those, but maybe not all. >>>>>> >>>>>>> Shouldn't we get rid of the socket concept, die also, under pseries >>>>>>> since they don't exist under PAPR ? We only have numa nodes, cores, >>>>>>> threads AFAICT. >>>>>> >>>>>> Theoretically, yes. I'm not sure it's really practical, though, since >>>>>> AFAICT, both qemu and the kernel have the notion of sockets (though >>>>>> not dies) built into generic code. >>>>> >>>>> Yes. But, AFAICT, these topology notions have not reached "arch/powerpc" >>>>> and PPC Linux only has a NUMA node id, on pseries and powernv. >>>>> >>>>>> It does mean that one possible approach here - maybe the best one - is >>>>>> to simply declare that sockets are meaningless under, so we simply >>>>>> don't expect what the guest kernel reports to match what's given to >>>>>> qemu. >>>>>> >>>>>> It'd be nice to avoid that if we can: in a sense it's just cosmetic, >>>>>> but it is likely to surprise and confuse people. >>>>>> >>>>>>> Should we diverged from PAPR and add extra DT properties "qemu,..." ? >>>>>>> There are a couple of places where Linux checks for the underlying >>>>>>> hypervisor already. >>>>>>> >>>>>>>>> This also means that >>>>>>>>> 'ibm,chip-id' will probably remain in use since it's the only place where >>>>>>>>> we inform cores per socket information to the kernel. >>>>>>>> >>>>>>>> Well.. unless we can find some other sensible way to convey that >>>>>>>> information. I haven't given up hope for that yet. >>>>>>> >>>>>>> Well, we could start by fixing the value in QEMU. It is broken >>>>>>> today. >>>>>> >>>>>> Fixing what value, exactly? 
>>>>> >>>>> The value of the "ibm,chip-id" since we are keeping the property under >>>>> QEMU. >>>> >>>> David, I believe this has to do with the discussing we had last Friday. >>>> >>>> I mentioned that the ibm,chip-id property is being calculated in a way that >>>> promotes the same ibm,chip-id in CPUs that belongs to different NUMA nodes, >>>> e.g.: >>>> >>>> -smp 4,cores=4,maxcpus=8,threads=1 \ >>>> -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 \ >>>> -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >>>> >>>> >>>> $ dtc -I dtb -O dts fdt.dtb | grep -B2 ibm,chip-id >>>> ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x00>; >>>> ibm,pft-size = <0x00 0x19>; >>>> ibm,chip-id = <0x00>; >>>> -- >>>> ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x01>; >>>> ibm,pft-size = <0x00 0x19>; >>>> ibm,chip-id = <0x00>; >>>> -- >>>> ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x02>; >>>> ibm,pft-size = <0x00 0x19>; >>>> ibm,chip-id = <0x00>; >>>> -- >>>> ibm,associativity = <0x05 0x01 0x01 0x01 0x01 0x03>; >>>> ibm,pft-size = <0x00 0x19>; >>>> ibm,chip-id = <0x00>; >>> >>>> We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a >>>> different NUMA node than 0-1. This would mean that the same socket >>>> would belong to different NUMA nodes at the same time. >>> >>> Right... and I'm still not seeing why that's a problem. AFAICT that's >>> a possible, if unexpected, situation under real hardware - though >>> maybe not for POWER9 specifically. >> The ibm,chip-id property does not exist under PAPR. PAPR only has >> NUMA nodes, no sockets nor chips. >> >> And the property value is simply broken under QEMU. 
Try this : >> >> -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >> >> # dmesg | grep numa >> [ 0.013106] numa: Node 0 CPUs: 0-1 >> [ 0.013136] numa: Node 1 CPUs: 2-3 >> >> # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id >> ibm,chip-id = <0x01>; >> ibm,chip-id = <0x02>; >> ibm,chip-id = <0x00>; >> ibm,chip-id = <0x03>; > > These values are not wrong. When you do: > > -smp 4,cores=1,maxcpus=8 (....) > > You didn't fill threads and sockets. QEMU default is to prioritize sockets > to fill the missing information, up to the maxcpus value. This means that what > you did is equivalent to: > > -smp 4,threads=1,cores=1,sockets=8,maxcpus=8 (....) > > > It's a 1 thread/core, 1 core/socket with 8 sockets config. Each possible CPU > will sit in its own core, having its own ibm,chip-id. So: > > "-numa node,nodeid=0,cpus=0-1" > > is in fact allocating sockets 0 and 1 to NUMA node 0. ok. Thanks, C. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-03-31 15:18 ` Cédric Le Goater 2021-03-31 17:29 ` Daniel Henrique Barboza @ 2021-04-01 2:59 ` David Gibson 2021-04-01 9:21 ` Cédric Le Goater 1 sibling, 1 reply; 23+ messages in thread From: David Gibson @ 2021-04-01 2:59 UTC (permalink / raw) To: Cédric Le Goater Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, Daniel Henrique Barboza, qemu-devel, groug, qemu-ppc, Igor Mammedov [-- Attachment #1: Type: text/plain, Size: 8751 bytes --] On Wed, Mar 31, 2021 at 05:18:45PM +0200, Cédric Le Goater wrote: > On 3/31/21 2:57 AM, David Gibson wrote: > > On Mon, Mar 29, 2021 at 03:32:37PM -0300, Daniel Henrique Barboza wrote: > >> > >> > >> On 3/29/21 12:32 PM, Cédric Le Goater wrote: > >>> On 3/29/21 6:20 AM, David Gibson wrote: > >>>> On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote: > >>>>> On 3/25/21 3:10 AM, David Gibson wrote: > >>>>>> On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: > >>>>>>> > >>>>>>> > >>>>>>> On 3/22/21 10:03 PM, David Gibson wrote: > >>>>>>>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: > >>>>>>>>> Kernel commit 4bce545903fa ("powerpc/topology: Update > >>>>>>>>> topology_core_cpumask") cause a regression in the pseries machine when > >>>>>>>>> defining certain SMP topologies [1]. The reasoning behind the change is > >>>>>>>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating > >>>>>>>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with > >>>>>>>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as > >>>>>>>>> far as the kernel understanding of SMP topologies goes, both masks are > >>>>>>>>> equivalent. > >>>>>>>>> > >>>>>>>>> Further discussions in the kernel mailing list [2] shown that the > >>>>>>>>> powerpc kernel always considered that the number of sockets were equal > >>>>>>>>> to the number of NUMA nodes. 
The claim is that it doesn't make sense, > >>>>>>>>> for Power hardware at least, 2+ sockets being in the same NUMA node. The > >>>>>>>>> immediate conclusion is that all SMP topologies the pseries machine were > >>>>>>>>> supplying to the kernel, with more than one socket in the same NUMA node > >>>>>>>>> as in [1], happened to be correctly represented in the kernel by > >>>>>>>>> accident during all these years. > >>>>>>>>> > >>>>>>>>> There's a case to be made for virtual topologies being detached from > >>>>>>>>> hardware constraints, allowing maximum flexibility to users. At the same > >>>>>>>>> time, this freedom can't result in unrealistic hardware representations > >>>>>>>>> being emulated. If the real hardware and the pseries kernel don't > >>>>>>>>> support multiple chips/sockets in the same NUMA node, neither should we. > >>>>>>>>> > >>>>>>>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the > >>>>>>>>> pseries machine. qtest changes were made to adapt to this new > >>>>>>>>> condition. > >>>>>>>> > >>>>>>>> Oof. I really don't like this idea. It means a bunch of fiddly work > >>>>>>>> for users to match these up, for no real gain. I'm also concerned > >>>>>>>> that this will require follow on changes in libvirt to not make this a > >>>>>>>> really cryptic and irritating point of failure. > >>>>>>> > >>>>>>> Haven't though about required Libvirt changes, although I can say that there > >>>>>>> will be some amount to be mande and it will probably annoy existing users > >>>>>>> (everyone that has a multiple socket per NUMA node topology). > >>>>>>> > >>>>>>> There is not much we can do from the QEMU layer aside from what I've proposed > >>>>>>> here. The other alternative is to keep interacting with the kernel folks to > >>>>>>> see if there is a way to keep our use case untouched. > >>>>>> > >>>>>> Right. Well.. not necessarily untouched, but I'm hoping for more > >>>>>> replies from Cédric to my objections and mpe's. 
Even with sockets > >>>>>> being a kinda meaningless concept in PAPR, I don't think tying it to > >>>>>> NUMA nodes makes sense. > >>>>> > >>>>> I did a couple of replies in different email threads but maybe not > >>>>> to all. I felt it was going nowhere :/ Couple of thoughts, > >>>> > >>>> I think I saw some of those, but maybe not all. > >>>> > >>>>> Shouldn't we get rid of the socket concept, die also, under pseries > >>>>> since they don't exist under PAPR ? We only have numa nodes, cores, > >>>>> threads AFAICT. > >>>> > >>>> Theoretically, yes. I'm not sure it's really practical, though, since > >>>> AFAICT, both qemu and the kernel have the notion of sockets (though > >>>> not dies) built into generic code. > >>> > >>> Yes. But, AFAICT, these topology notions have not reached "arch/powerpc" > >>> and PPC Linux only has a NUMA node id, on pseries and powernv. > >>> > >>>> It does mean that one possible approach here - maybe the best one - is > >>>> to simply declare that sockets are meaningless under, so we simply > >>>> don't expect what the guest kernel reports to match what's given to > >>>> qemu. > >>>> > >>>> It'd be nice to avoid that if we can: in a sense it's just cosmetic, > >>>> but it is likely to surprise and confuse people. > >>>> > >>>>> Should we diverged from PAPR and add extra DT properties "qemu,..." ? > >>>>> There are a couple of places where Linux checks for the underlying > >>>>> hypervisor already. > >>>>> > >>>>>>> This also means that > >>>>>>> 'ibm,chip-id' will probably remain in use since it's the only place where > >>>>>>> we inform cores per socket information to the kernel. > >>>>>> > >>>>>> Well.. unless we can find some other sensible way to convey that > >>>>>> information. I haven't given up hope for that yet. > >>>>> > >>>>> Well, we could start by fixing the value in QEMU. It is broken > >>>>> today. > >>>> > >>>> Fixing what value, exactly? 
> >>> > >>> The value of the "ibm,chip-id" since we are keeping the property under > >>> QEMU. > >> > >> David, I believe this has to do with the discussing we had last Friday. > >> > >> I mentioned that the ibm,chip-id property is being calculated in a way that > >> promotes the same ibm,chip-id in CPUs that belongs to different NUMA nodes, > >> e.g.: > >> > >> -smp 4,cores=4,maxcpus=8,threads=1 \ > >> -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 \ > >> -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 > >> > >> > >> $ dtc -I dtb -O dts fdt.dtb | grep -B2 ibm,chip-id > >> ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x00>; > >> ibm,pft-size = <0x00 0x19>; > >> ibm,chip-id = <0x00>; > > > >> We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a > >> different NUMA node than 0-1. This would mean that the same socket > >> would belong to different NUMA nodes at the same time. > > > > Right... and I'm still not seeing why that's a problem. AFAICT that's > > a possible, if unexpected, situation under real hardware - though > > maybe not for POWER9 specifically. > The ibm,chip-id property does not exist under PAPR. PAPR only has > NUMA nodes, no sockets nor chips. > > And the property value is simply broken under QEMU. Try this : > > -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 > > # dmesg | grep numa > [ 0.013106] numa: Node 0 CPUs: 0-1 > [ 0.013136] numa: Node 1 CPUs: 2-3 > > # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id > ibm,chip-id = <0x01>; > ibm,chip-id = <0x02>; > ibm,chip-id = <0x00>; > ibm,chip-id = <0x03>; Not seeing the problem here. You've specified 8 (possible) cpus, 1 core-per-socket, therefore there are 4 sockets, hence 4 chip-ids. 
Again, I don't see where this assumption that the chip-id is somehow related to the NUMA topology is coming from. > >> I believe this is what Cedric wants to be addressed. Given that the > >> property is called after the OPAL property ibm,chip-id, the kernel > >> expects that the property will have the same semantics as in OPAL.> > > Even on powernv, I'm not clear why chip-id is tied into the NUMA > > configuration, rather than getting all the NUMA info from > > associativity properties. > > It is the case. What exactly is the case? > The associativity properties are built from chip-id in OPAL though. Ok, so? Why do we care how OPAL builds the associativity properties once we *have* the associativity properties its built? > The chip-id property is only used in low level PowerNV drivers, VAS, > XSCOM, LPC, etc. Which AFAIK, really *do* need to know which chip they're on, not the NUMA toplogy. > It's also badly used in the common part of the XIVE driver, what I am > trying to fix to introduce an IPI per node on all platforms. See comments on other thread. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes 2021-04-01 2:59 ` David Gibson @ 2021-04-01 9:21 ` Cédric Le Goater 0 siblings, 0 replies; 23+ messages in thread From: Cédric Le Goater @ 2021-04-01 9:21 UTC (permalink / raw) To: David Gibson Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, Michael Ellerman, Daniel Henrique Barboza, qemu-devel, groug, qemu-ppc, Igor Mammedov On 4/1/21 4:59 AM, David Gibson wrote: > On Wed, Mar 31, 2021 at 05:18:45PM +0200, Cédric Le Goater wrote: >> On 3/31/21 2:57 AM, David Gibson wrote: >>> On Mon, Mar 29, 2021 at 03:32:37PM -0300, Daniel Henrique Barboza wrote: >>>> >>>> >>>> On 3/29/21 12:32 PM, Cédric Le Goater wrote: >>>>> On 3/29/21 6:20 AM, David Gibson wrote: >>>>>> On Thu, Mar 25, 2021 at 09:56:04AM +0100, Cédric Le Goater wrote: >>>>>>> On 3/25/21 3:10 AM, David Gibson wrote: >>>>>>>> On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 3/22/21 10:03 PM, David Gibson wrote: >>>>>>>>>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote: >>>>>>>>>>> Kernel commit 4bce545903fa ("powerpc/topology: Update >>>>>>>>>>> topology_core_cpumask") cause a regression in the pseries machine when >>>>>>>>>>> defining certain SMP topologies [1]. The reasoning behind the change is >>>>>>>>>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating >>>>>>>>>>> cpu_core_mask"). In short, cpu_core_mask logic was causing troubles with >>>>>>>>>>> large VMs with lots of CPUs and was changed by cpu_cpu_mask because, as >>>>>>>>>>> far as the kernel understanding of SMP topologies goes, both masks are >>>>>>>>>>> equivalent. >>>>>>>>>>> >>>>>>>>>>> Further discussions in the kernel mailing list [2] shown that the >>>>>>>>>>> powerpc kernel always considered that the number of sockets were equal >>>>>>>>>>> to the number of NUMA nodes. 
The claim is that it doesn't make sense, >>>>>>>>>>> for Power hardware at least, 2+ sockets being in the same NUMA node. The >>>>>>>>>>> immediate conclusion is that all SMP topologies the pseries machine were >>>>>>>>>>> supplying to the kernel, with more than one socket in the same NUMA node >>>>>>>>>>> as in [1], happened to be correctly represented in the kernel by >>>>>>>>>>> accident during all these years. >>>>>>>>>>> >>>>>>>>>>> There's a case to be made for virtual topologies being detached from >>>>>>>>>>> hardware constraints, allowing maximum flexibility to users. At the same >>>>>>>>>>> time, this freedom can't result in unrealistic hardware representations >>>>>>>>>>> being emulated. If the real hardware and the pseries kernel don't >>>>>>>>>>> support multiple chips/sockets in the same NUMA node, neither should we. >>>>>>>>>>> >>>>>>>>>>> Starting in 6.0.0, all sockets must match an unique NUMA node in the >>>>>>>>>>> pseries machine. qtest changes were made to adapt to this new >>>>>>>>>>> condition. >>>>>>>>>> >>>>>>>>>> Oof. I really don't like this idea. It means a bunch of fiddly work >>>>>>>>>> for users to match these up, for no real gain. I'm also concerned >>>>>>>>>> that this will require follow on changes in libvirt to not make this a >>>>>>>>>> really cryptic and irritating point of failure. >>>>>>>>> >>>>>>>>> Haven't though about required Libvirt changes, although I can say that there >>>>>>>>> will be some amount to be mande and it will probably annoy existing users >>>>>>>>> (everyone that has a multiple socket per NUMA node topology). >>>>>>>>> >>>>>>>>> There is not much we can do from the QEMU layer aside from what I've proposed >>>>>>>>> here. The other alternative is to keep interacting with the kernel folks to >>>>>>>>> see if there is a way to keep our use case untouched. >>>>>>>> >>>>>>>> Right. Well.. not necessarily untouched, but I'm hoping for more >>>>>>>> replies from Cédric to my objections and mpe's. 
Even with sockets >>>>>>>> being a kinda meaningless concept in PAPR, I don't think tying it to >>>>>>>> NUMA nodes makes sense. >>>>>>> >>>>>>> I did a couple of replies in different email threads but maybe not >>>>>>> to all. I felt it was going nowhere :/ Couple of thoughts, >>>>>> >>>>>> I think I saw some of those, but maybe not all. >>>>>> >>>>>>> Shouldn't we get rid of the socket concept, die also, under pseries >>>>>>> since they don't exist under PAPR ? We only have numa nodes, cores, >>>>>>> threads AFAICT. >>>>>> >>>>>> Theoretically, yes. I'm not sure it's really practical, though, since >>>>>> AFAICT, both qemu and the kernel have the notion of sockets (though >>>>>> not dies) built into generic code. >>>>> >>>>> Yes. But, AFAICT, these topology notions have not reached "arch/powerpc" >>>>> and PPC Linux only has a NUMA node id, on pseries and powernv. >>>>> >>>>>> It does mean that one possible approach here - maybe the best one - is >>>>>> to simply declare that sockets are meaningless under, so we simply >>>>>> don't expect what the guest kernel reports to match what's given to >>>>>> qemu. >>>>>> >>>>>> It'd be nice to avoid that if we can: in a sense it's just cosmetic, >>>>>> but it is likely to surprise and confuse people. >>>>>> >>>>>>> Should we diverged from PAPR and add extra DT properties "qemu,..." ? >>>>>>> There are a couple of places where Linux checks for the underlying >>>>>>> hypervisor already. >>>>>>> >>>>>>>>> This also means that >>>>>>>>> 'ibm,chip-id' will probably remain in use since it's the only place where >>>>>>>>> we inform cores per socket information to the kernel. >>>>>>>> >>>>>>>> Well.. unless we can find some other sensible way to convey that >>>>>>>> information. I haven't given up hope for that yet. >>>>>>> >>>>>>> Well, we could start by fixing the value in QEMU. It is broken >>>>>>> today. >>>>>> >>>>>> Fixing what value, exactly? 
>>>>> >>>>> The value of the "ibm,chip-id" since we are keeping the property under >>>>> QEMU. >>>> >>>> David, I believe this has to do with the discussing we had last Friday. >>>> >>>> I mentioned that the ibm,chip-id property is being calculated in a way that >>>> promotes the same ibm,chip-id in CPUs that belongs to different NUMA nodes, >>>> e.g.: >>>> >>>> -smp 4,cores=4,maxcpus=8,threads=1 \ >>>> -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 \ >>>> -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >>>> >>>> >>>> $ dtc -I dtb -O dts fdt.dtb | grep -B2 ibm,chip-id >>>> ibm,associativity = <0x05 0x00 0x00 0x00 0x00 0x00>; >>>> ibm,pft-size = <0x00 0x19>; >>>> ibm,chip-id = <0x00>; >>> >>>> We assign ibm,chip-id=0x0 to CPUs 0-3, but CPUs 2-3 are located in a >>>> different NUMA node than 0-1. This would mean that the same socket >>>> would belong to different NUMA nodes at the same time. >>> >>> Right... and I'm still not seeing why that's a problem. AFAICT that's >>> a possible, if unexpected, situation under real hardware - though >>> maybe not for POWER9 specifically. >> The ibm,chip-id property does not exist under PAPR. PAPR only has >> NUMA nodes, no sockets nor chips. >> >> And the property value is simply broken under QEMU. Try this : >> >> -smp 4,cores=1,maxcpus=8 -object memory-backend-ram,id=ram-node0,size=2G -numa node,nodeid=0,cpus=0-1,cpus=4-5,memdev=ram-node0 -object memory-backend-ram,id=ram-node1,size=2G -numa node,nodeid=1,cpus=2-3,cpus=6-7,memdev=ram-node1 >> >> # dmesg | grep numa >> [ 0.013106] numa: Node 0 CPUs: 0-1 >> [ 0.013136] numa: Node 1 CPUs: 2-3 >> >> # dtc -I fs /proc/device-tree/cpus/ -f | grep ibm,chip-id >> ibm,chip-id = <0x01>; >> ibm,chip-id = <0x02>; >> ibm,chip-id = <0x00>; >> ibm,chip-id = <0x03>; > > Not seeing the problem here. You've specified 8 (possible) cpus, 1 > core-per-socket, therefore there are 4 sockets, hence 4 chip-ids. 
> Again, I don't see where this assumption that the chip-id is somehow > related to the NUMA topology is coming from. > >>>> I believe this is what Cedric wants to be addressed. Given that the >>>> property is called after the OPAL property ibm,chip-id, the kernel >>>> expects that the property will have the same semantics as in OPAL.> >>> Even on powernv, I'm not clear why chip-id is tied into the NUMA >>> configuration, rather than getting all the NUMA info from >>> associativity properties. >> >> It is the case. > > What exactly is the case? Sorry, that was badly phrased. I meant: powernv gets the NUMA info from the "ibm,associativity" property. C. >> The associativity properties are built from chip-id in OPAL though. > > Ok, so? Why do we care how OPAL builds the associativity properties > once we *have* the associativity properties it's built? > >> The chip-id property is only used in low level PowerNV drivers, VAS, >> XSCOM, LPC, etc. > > Which AFAIK, really *do* need to know which chip they're on, not the > NUMA topology. > >> It's also badly used in the common part of the XIVE driver, what I am >> trying to fix to introduce an IPI per node on all platforms. > > See comments on other thread. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes
  2021-03-23  1:03   ` David Gibson
  2021-03-23 17:21     ` Daniel Henrique Barboza
@ 2021-03-29 23:51   ` Igor Mammedov
  2021-03-30 21:33     ` Daniel Henrique Barboza
  1 sibling, 1 reply; 23+ messages in thread
From: Igor Mammedov @ 2021-03-29 23:51 UTC (permalink / raw)
To: David Gibson
Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju,
	Daniel Henrique Barboza, qemu-devel, groug, qemu-ppc, clg

On Tue, 23 Mar 2021 12:03:58 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote:
> > Kernel commit 4bce545903fa ("powerpc/topology: Update
> > topology_core_cpumask") caused a regression in the pseries machine when
> > defining certain SMP topologies [1]. The reasoning behind the change is
> > explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating
> > cpu_core_mask"). In short, the cpu_core_mask logic was causing trouble
> > with large VMs with lots of CPUs and was replaced with cpu_cpu_mask
> > because, as far as the kernel's understanding of SMP topologies goes,
> > both masks are equivalent.
> >
> > Further discussion on the kernel mailing list [2] showed that the
> > powerpc kernel has always considered the number of sockets to be equal
> > to the number of NUMA nodes. The claim is that it doesn't make sense,
> > for Power hardware at least, for 2+ sockets to sit in the same NUMA
> > node. The immediate conclusion is that all the SMP topologies the
> > pseries machine was supplying to the kernel, with more than one socket
> > in the same NUMA node as in [1], happened to be correctly represented
> > in the kernel by accident during all these years.
> >
> > There's a case to be made for virtual topologies being detached from
> > hardware constraints, allowing maximum flexibility to users. At the same
> > time, this freedom can't result in unrealistic hardware representations
> > being emulated. If the real hardware and the pseries kernel don't
> > support multiple chips/sockets in the same NUMA node, neither should we.
> >
> > Starting in 6.0.0, each socket must match a unique NUMA node in the
> > pseries machine. qtest changes were made to adapt to this new
> > condition.
>
> Oof.  I really don't like this idea.  It means a bunch of fiddly work
> for users to match these up, for no real gain.  I'm also concerned
> that this will require follow-on changes in libvirt to keep this from
> becoming a really cryptic and irritating point of failure.

Yes, it is annoying to users, but we have been very slowly tightening
the limitations on the NUMA path, especially where nonsense configs
cause problems on the guest side (and I suspect there are a few things
left to enforce in the generic NUMA code, currently just warnings).
So it's nothing new.

If the limit applies only to the new machine type it should be fine
(i.e. no existing VMs will suffer). Later on we can deprecate invalid
configurations altogether and just error out.
> >
> > [1] https://bugzilla.redhat.com/1934421
> > [2] https://lore.kernel.org/linuxppc-dev/daa5d05f-dbd0-05ad-7395-5d5a3d364fc6@gmail.com/
> >
> > CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > CC: Cédric Le Goater <clg@kaod.org>
> > CC: Igor Mammedov <imammedo@redhat.com>
> > CC: Laurent Vivier <lvivier@redhat.com>
> > CC: Thomas Huth <thuth@redhat.com>
> > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> > ---
> >  hw/ppc/spapr.c                 |  3 ++
> >  hw/ppc/spapr_numa.c            |  7 +++++
> >  include/hw/ppc/spapr.h         |  1 +
> >  tests/qtest/cpu-plug-test.c    |  4 +--
> >  tests/qtest/device-plug-test.c |  9 +++++-
> >  tests/qtest/numa-test.c        | 52 ++++++++++++++++++++++++++++------
> >  6 files changed, 64 insertions(+), 12 deletions(-)
> >
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index d56418ca29..745f71c243 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -4611,8 +4611,11 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true);
> >   */
> >  static void spapr_machine_5_2_class_options(MachineClass *mc)
> >  {
> > +    SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
> > +
> >      spapr_machine_6_0_class_options(mc);
> >      compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
> > +    smc->pre_6_0_smp_topology = true;
> >  }
> >
> >  DEFINE_SPAPR_MACHINE(5_2, "5.2", false);
> > diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> > index 779f18b994..0ade43dd79 100644
> > --- a/hw/ppc/spapr_numa.c
> > +++ b/hw/ppc/spapr_numa.c
> > @@ -163,6 +163,13 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
> >      int i, j, max_nodes_with_gpus;
> >      bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
> >
> > +    if (!smc->pre_6_0_smp_topology &&
> > +        nb_numa_nodes != machine->smp.sockets) {
> > +        error_report("Number of CPU sockets must be equal to the number "
> > +                     "of NUMA nodes");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> >      /*
> >       * For all associativity arrays: first position is the size,
> >       * position MAX_DISTANCE_REF_POINTS is always the numa_id,
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index 47cebaf3ac..98dc5d198a 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -142,6 +142,7 @@ struct SpaprMachineClass {
> >      hwaddr rma_limit; /* clamp the RMA to this size */
> >      bool pre_5_1_assoc_refpoints;
> >      bool pre_5_2_numa_associativity;
> > +    bool pre_6_0_smp_topology;
> >
> >      bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
> >                            uint64_t *buid, hwaddr *pio,
> > diff --git a/tests/qtest/cpu-plug-test.c b/tests/qtest/cpu-plug-test.c
> > index a1c689414b..946b9129ea 100644
> > --- a/tests/qtest/cpu-plug-test.c
> > +++ b/tests/qtest/cpu-plug-test.c
> > @@ -118,8 +118,8 @@ static void add_pseries_test_case(const char *mname)
> >      data->machine = g_strdup(mname);
> >      data->cpu_model = "power8_v2.0";
> >      data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
> > -    data->sockets = 2;
> > -    data->cores = 3;
> > +    data->sockets = 1;
> > +    data->cores = 6;
> >      data->threads = 1;
> >      data->maxcpus = data->sockets * data->cores * data->threads;
> >
> > diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c
> > index 559d47727a..dd7d8268d2 100644
> > --- a/tests/qtest/device-plug-test.c
> > +++ b/tests/qtest/device-plug-test.c
> > @@ -91,7 +91,14 @@ static void test_spapr_cpu_unplug_request(void)
> >  {
> >      QTestState *qtest;
> >
> > -    qtest = qtest_initf("-cpu power9_v2.0 -smp 1,maxcpus=2 "
> > +    /*
> > +     * Default smp settings will prioritize sockets over cores and
> > +     * threads, so '-smp 2,maxcpus=2' will add 2 sockets. However,
> > +     * the pseries machine requires a NUMA node for each socket
> > +     * (since 6.0.0). Specify sockets=1 to make life easier.
> > +     */
> > +    qtest = qtest_initf("-cpu power9_v2.0 "
> > +                        "-smp 1,maxcpus=2,threads=1,cores=2,sockets=1 "
> >                          "-device power9_v2.0-spapr-cpu-core,core-id=1,id=dev0");
> >
> >      /* similar to test_pci_unplug_request */
> > diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c
> > index dc0ec571ca..bb13f7131b 100644
> > --- a/tests/qtest/numa-test.c
> > +++ b/tests/qtest/numa-test.c
> > @@ -24,9 +24,17 @@ static void test_mon_explicit(const void *data)
> >      QTestState *qts;
> >      g_autofree char *s = NULL;
> >      g_autofree char *cli = NULL;
> > +    const char *arch = qtest_get_arch();
> > +
> > +    if (g_str_equal(arch, "ppc64")) {
> > +        cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
> > +                       "-numa node,nodeid=0,memdev=ram,cpus=0-3 "
> > +                       "-numa node,nodeid=1,cpus=4-7");
> > +    } else {
> > +        cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 "
> > +                       "-numa node,nodeid=1,cpus=4-7");
> > +    }
> >
> > -    cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 "
> > -                   "-numa node,nodeid=1,cpus=4-7");
> >      qts = qtest_init(cli);
> >
> >      s = qtest_hmp(qts, "info numa");
> > @@ -57,10 +65,18 @@ static void test_mon_partial(const void *data)
> >      QTestState *qts;
> >      g_autofree char *s = NULL;
> >      g_autofree char *cli = NULL;
> > +    const char *arch = qtest_get_arch();
> > +
> > +    if (g_str_equal(arch, "ppc64")) {
> > +        cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
> > +                       "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
> > +                       "-numa node,nodeid=1,cpus=4-5 ");
> > +    } else {
> > +        cli = make_cli(data, "-smp 8 "
> > +                       "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
> > +                       "-numa node,nodeid=1,cpus=4-5 ");
> > +    }
> >
> > -    cli = make_cli(data, "-smp 8 "
> > -                   "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
> > -                   "-numa node,nodeid=1,cpus=4-5 ");
> >      qts = qtest_init(cli);
> >
> >      s = qtest_hmp(qts, "info numa");
> > @@ -85,9 +101,17 @@ static void test_query_cpus(const void *data)
> >      QObject *e;
> >      QTestState *qts;
> >      g_autofree char *cli = NULL;
> > +    const char *arch = qtest_get_arch();
> > +
> > +    if (g_str_equal(arch, "ppc64")) {
> > +        cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
> > +                       "-numa node,memdev=ram,cpus=0-3 "
> > +                       "-numa node,cpus=4-7");
> > +    } else {
> > +        cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 "
> > +                       "-numa node,cpus=4-7");
> > +    }
> >
> > -    cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 "
> > -                   "-numa node,cpus=4-7");
> >      qts = qtest_init(cli);
> >      cpus = get_cpus(qts, &resp);
> >      g_assert(cpus);
> > @@ -177,7 +201,7 @@ static void spapr_numa_cpu(const void *data)
> >      QTestState *qts;
> >      g_autofree char *cli = NULL;
> >
> > -    cli = make_cli(data, "-smp 4,cores=4 "
> > +    cli = make_cli(data, "-smp 4,threads=1,cores=2,sockets=2 "
> >                     "-numa node,nodeid=0,memdev=ram -numa node,nodeid=1 "
> >                     "-numa cpu,node-id=0,core-id=0 "
> >                     "-numa cpu,node-id=0,core-id=1 "
> > @@ -554,7 +578,17 @@ int main(int argc, char **argv)
> >
> >      g_test_init(&argc, &argv, NULL);
> >
> > -    qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split);
> > +    /*
> > +     * Starting on 6.0.0, for the pseries machine, '-smp 8' will only work
> > +     * if we have 8 NUMA nodes. If we specify 'smp 8,sockets=2' to match
> > +     * 2 NUMA nodes, the CPUs will be split as 0123/4567 instead of
> > +     * 0246/1357 that test_def_cpu_split expects. In short, this test is
> > +     * no longer valid for ppc64 in 6.0.0.
> > +     */
> > +    if (!g_str_equal(arch, "ppc64")) {
> > +        qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split);
> > +    }
> > +
> >      qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit);
> >      qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial);
> >      qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus);
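Igor's warn-now, error-later suggestion amounts to gating the check on the machine version, in the spirit of the pre_6_0_smp_topology flag in the patch. A hypothetical sketch of that policy, not QEMU code:

```python
import warnings

def validate_topology(sockets: int, numa_nodes: int,
                      pre_6_0_machine: bool) -> None:
    """Reject sockets != NUMA nodes on new machine types; only warn on
    legacy ones so existing VMs keep booting (assumed policy)."""
    if sockets == numa_nodes:
        return
    if pre_6_0_machine:
        # Legacy machine types: keep working, but flag the config as
        # deprecated so it can become a hard error later.
        warnings.warn("sockets != NUMA nodes is deprecated for pseries")
        return
    raise ValueError("Number of CPU sockets must be equal to the "
                     "number of NUMA nodes")

# A pseries-5.2 guest with a mismatched topology still starts (with a
# warning), while the default machine type refuses it outright.
validate_topology(sockets=2, numa_nodes=1, pre_6_0_machine=True)
try:
    validate_topology(sockets=2, numa_nodes=1, pre_6_0_machine=False)
except ValueError as err:
    print(err)
```

The design choice mirrors QEMU's usual compat-flag pattern: behavior changes are keyed off the machine class so that existing machine types (and therefore existing VMs) are unaffected.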
* Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes
  2021-03-29 23:51   ` Igor Mammedov
@ 2021-03-30 21:33     ` Daniel Henrique Barboza
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel Henrique Barboza @ 2021-03-30 21:33 UTC (permalink / raw)
To: Igor Mammedov, David Gibson
Cc: Laurent Vivier, Thomas Huth, Srikar Dronamraju, qemu-devel,
	groug, qemu-ppc, clg

On 3/29/21 8:51 PM, Igor Mammedov wrote:
> On Tue, 23 Mar 2021 12:03:58 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
>> On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote:
>>> Kernel commit 4bce545903fa ("powerpc/topology: Update
>>> topology_core_cpumask") caused a regression in the pseries machine when
>>> defining certain SMP topologies [1]. The reasoning behind the change is
>>> explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating
>>> cpu_core_mask"). In short, the cpu_core_mask logic was causing trouble
>>> with large VMs with lots of CPUs and was replaced with cpu_cpu_mask
>>> because, as far as the kernel's understanding of SMP topologies goes,
>>> both masks are equivalent.
>>>
>>> Further discussion on the kernel mailing list [2] showed that the
>>> powerpc kernel has always considered the number of sockets to be equal
>>> to the number of NUMA nodes. The claim is that it doesn't make sense,
>>> for Power hardware at least, for 2+ sockets to sit in the same NUMA
>>> node. The immediate conclusion is that all the SMP topologies the
>>> pseries machine was supplying to the kernel, with more than one socket
>>> in the same NUMA node as in [1], happened to be correctly represented
>>> in the kernel by accident during all these years.
>>>
>>> There's a case to be made for virtual topologies being detached from
>>> hardware constraints, allowing maximum flexibility to users. At the same
>>> time, this freedom can't result in unrealistic hardware representations
>>> being emulated. If the real hardware and the pseries kernel don't
>>> support multiple chips/sockets in the same NUMA node, neither should we.
>>>
>>> Starting in 6.0.0, each socket must match a unique NUMA node in the
>>> pseries machine. qtest changes were made to adapt to this new
>>> condition.
>>
>> Oof.  I really don't like this idea.  It means a bunch of fiddly work
>> for users to match these up, for no real gain.  I'm also concerned
>> that this will require follow-on changes in libvirt to keep this from
>> becoming a really cryptic and irritating point of failure.
>
> yes, it is annoying to users but we have been very slowly tightening
> the limitations on the NUMA path, especially where nonsense configs
> cause problems on the guest side (and I suspect there are a few things
> left to enforce in the generic NUMA code, currently just warnings).
> So it's nothing new.

Do you mind elaborating a bit on the 'nonsense configs' bit? Do you mean
configs that are unrealistic because they're not ACPI-compliant? I always
thought that all the SMP/NUMA topologies QEMU allows were ACPI/x86
compliant.

If both x86 and PowerPC have restrictions in common, we might want to
consider putting this code in a common helper/place.

>
> So if the limit applies only to the new machine type it should be fine
> (i.e. no existing VMs will suffer). Later on we can deprecate invalid
> configurations altogether and just error out.

Hmmm, I'm not warning the user about the deprecated topologies in
existing guests in this patch. I guess it doesn't hurt to add a warning
for existing VMs.
Thanks,

DHB

>
>>>
>>> [1] https://bugzilla.redhat.com/1934421
>>> [2] https://lore.kernel.org/linuxppc-dev/daa5d05f-dbd0-05ad-7395-5d5a3d364fc6@gmail.com/
>>>
>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>>> CC: Cédric Le Goater <clg@kaod.org>
>>> CC: Igor Mammedov <imammedo@redhat.com>
>>> CC: Laurent Vivier <lvivier@redhat.com>
>>> CC: Thomas Huth <thuth@redhat.com>
>>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>>> ---
>>>  hw/ppc/spapr.c                 |  3 ++
>>>  hw/ppc/spapr_numa.c            |  7 +++++
>>>  include/hw/ppc/spapr.h         |  1 +
>>>  tests/qtest/cpu-plug-test.c    |  4 +--
>>>  tests/qtest/device-plug-test.c |  9 +++++-
>>>  tests/qtest/numa-test.c        | 52 ++++++++++++++++++++++++++++------
>>>  6 files changed, 64 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index d56418ca29..745f71c243 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -4611,8 +4611,11 @@ DEFINE_SPAPR_MACHINE(6_0, "6.0", true);
>>>   */
>>>  static void spapr_machine_5_2_class_options(MachineClass *mc)
>>>  {
>>> +    SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
>>> +
>>>      spapr_machine_6_0_class_options(mc);
>>>      compat_props_add(mc->compat_props, hw_compat_5_2, hw_compat_5_2_len);
>>> +    smc->pre_6_0_smp_topology = true;
>>>  }
>>>
>>>  DEFINE_SPAPR_MACHINE(5_2, "5.2", false);
>>> diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
>>> index 779f18b994..0ade43dd79 100644
>>> --- a/hw/ppc/spapr_numa.c
>>> +++ b/hw/ppc/spapr_numa.c
>>> @@ -163,6 +163,13 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
>>>      int i, j, max_nodes_with_gpus;
>>>      bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
>>>
>>> +    if (!smc->pre_6_0_smp_topology &&
>>> +        nb_numa_nodes != machine->smp.sockets) {
>>> +        error_report("Number of CPU sockets must be equal to the number "
>>> +                     "of NUMA nodes");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>>      /*
>>>       * For all associativity arrays: first position is the size,
>>>       * position MAX_DISTANCE_REF_POINTS is always the numa_id,
>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>>> index 47cebaf3ac..98dc5d198a 100644
>>> --- a/include/hw/ppc/spapr.h
>>> +++ b/include/hw/ppc/spapr.h
>>> @@ -142,6 +142,7 @@ struct SpaprMachineClass {
>>>      hwaddr rma_limit; /* clamp the RMA to this size */
>>>      bool pre_5_1_assoc_refpoints;
>>>      bool pre_5_2_numa_associativity;
>>> +    bool pre_6_0_smp_topology;
>>>
>>>      bool (*phb_placement)(SpaprMachineState *spapr, uint32_t index,
>>>                            uint64_t *buid, hwaddr *pio,
>>> diff --git a/tests/qtest/cpu-plug-test.c b/tests/qtest/cpu-plug-test.c
>>> index a1c689414b..946b9129ea 100644
>>> --- a/tests/qtest/cpu-plug-test.c
>>> +++ b/tests/qtest/cpu-plug-test.c
>>> @@ -118,8 +118,8 @@ static void add_pseries_test_case(const char *mname)
>>>      data->machine = g_strdup(mname);
>>>      data->cpu_model = "power8_v2.0";
>>>      data->device_model = g_strdup("power8_v2.0-spapr-cpu-core");
>>> -    data->sockets = 2;
>>> -    data->cores = 3;
>>> +    data->sockets = 1;
>>> +    data->cores = 6;
>>>      data->threads = 1;
>>>      data->maxcpus = data->sockets * data->cores * data->threads;
>>>
>>> diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c
>>> index 559d47727a..dd7d8268d2 100644
>>> --- a/tests/qtest/device-plug-test.c
>>> +++ b/tests/qtest/device-plug-test.c
>>> @@ -91,7 +91,14 @@ static void test_spapr_cpu_unplug_request(void)
>>>  {
>>>      QTestState *qtest;
>>>
>>> -    qtest = qtest_initf("-cpu power9_v2.0 -smp 1,maxcpus=2 "
>>> +    /*
>>> +     * Default smp settings will prioritize sockets over cores and
>>> +     * threads, so '-smp 2,maxcpus=2' will add 2 sockets. However,
>>> +     * the pseries machine requires a NUMA node for each socket
>>> +     * (since 6.0.0). Specify sockets=1 to make life easier.
>>> +     */
>>> +    qtest = qtest_initf("-cpu power9_v2.0 "
>>> +                        "-smp 1,maxcpus=2,threads=1,cores=2,sockets=1 "
>>>                          "-device power9_v2.0-spapr-cpu-core,core-id=1,id=dev0");
>>>
>>>      /* similar to test_pci_unplug_request */
>>> diff --git a/tests/qtest/numa-test.c b/tests/qtest/numa-test.c
>>> index dc0ec571ca..bb13f7131b 100644
>>> --- a/tests/qtest/numa-test.c
>>> +++ b/tests/qtest/numa-test.c
>>> @@ -24,9 +24,17 @@ static void test_mon_explicit(const void *data)
>>>      QTestState *qts;
>>>      g_autofree char *s = NULL;
>>>      g_autofree char *cli = NULL;
>>> +    const char *arch = qtest_get_arch();
>>> +
>>> +    if (g_str_equal(arch, "ppc64")) {
>>> +        cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
>>> +                       "-numa node,nodeid=0,memdev=ram,cpus=0-3 "
>>> +                       "-numa node,nodeid=1,cpus=4-7");
>>> +    } else {
>>> +        cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 "
>>> +                       "-numa node,nodeid=1,cpus=4-7");
>>> +    }
>>>
>>> -    cli = make_cli(data, "-smp 8 -numa node,nodeid=0,memdev=ram,cpus=0-3 "
>>> -                   "-numa node,nodeid=1,cpus=4-7");
>>>      qts = qtest_init(cli);
>>>
>>>      s = qtest_hmp(qts, "info numa");
>>> @@ -57,10 +65,18 @@ static void test_mon_partial(const void *data)
>>>      QTestState *qts;
>>>      g_autofree char *s = NULL;
>>>      g_autofree char *cli = NULL;
>>> +    const char *arch = qtest_get_arch();
>>> +
>>> +    if (g_str_equal(arch, "ppc64")) {
>>> +        cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
>>> +                       "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
>>> +                       "-numa node,nodeid=1,cpus=4-5 ");
>>> +    } else {
>>> +        cli = make_cli(data, "-smp 8 "
>>> +                       "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
>>> +                       "-numa node,nodeid=1,cpus=4-5 ");
>>> +    }
>>>
>>> -    cli = make_cli(data, "-smp 8 "
>>> -                   "-numa node,nodeid=0,memdev=ram,cpus=0-1 "
>>> -                   "-numa node,nodeid=1,cpus=4-5 ");
>>>      qts = qtest_init(cli);
>>>
>>>      s = qtest_hmp(qts, "info numa");
>>> @@ -85,9 +101,17 @@ static void test_query_cpus(const void *data)
>>>      QObject *e;
>>>      QTestState *qts;
>>>      g_autofree char *cli = NULL;
>>> +    const char *arch = qtest_get_arch();
>>> +
>>> +    if (g_str_equal(arch, "ppc64")) {
>>> +        cli = make_cli(data, "-smp 8,threads=1,cores=4,sockets=2 "
>>> +                       "-numa node,memdev=ram,cpus=0-3 "
>>> +                       "-numa node,cpus=4-7");
>>> +    } else {
>>> +        cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 "
>>> +                       "-numa node,cpus=4-7");
>>> +    }
>>>
>>> -    cli = make_cli(data, "-smp 8 -numa node,memdev=ram,cpus=0-3 "
>>> -                   "-numa node,cpus=4-7");
>>>      qts = qtest_init(cli);
>>>      cpus = get_cpus(qts, &resp);
>>>      g_assert(cpus);
>>> @@ -177,7 +201,7 @@ static void spapr_numa_cpu(const void *data)
>>>      QTestState *qts;
>>>      g_autofree char *cli = NULL;
>>>
>>> -    cli = make_cli(data, "-smp 4,cores=4 "
>>> +    cli = make_cli(data, "-smp 4,threads=1,cores=2,sockets=2 "
>>>                     "-numa node,nodeid=0,memdev=ram -numa node,nodeid=1 "
>>>                     "-numa cpu,node-id=0,core-id=0 "
>>>                     "-numa cpu,node-id=0,core-id=1 "
>>> @@ -554,7 +578,17 @@ int main(int argc, char **argv)
>>>
>>>      g_test_init(&argc, &argv, NULL);
>>>
>>> -    qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split);
>>> +    /*
>>> +     * Starting on 6.0.0, for the pseries machine, '-smp 8' will only work
>>> +     * if we have 8 NUMA nodes. If we specify 'smp 8,sockets=2' to match
>>> +     * 2 NUMA nodes, the CPUs will be split as 0123/4567 instead of
>>> +     * 0246/1357 that test_def_cpu_split expects. In short, this test is
>>> +     * no longer valid for ppc64 in 6.0.0.
>>> +     */
>>> +    if (!g_str_equal(arch, "ppc64")) {
>>> +        qtest_add_data_func("/numa/mon/cpus/default", args, test_def_cpu_split);
>>> +    }
>>> +
>>>      qtest_add_data_func("/numa/mon/cpus/explicit", args, test_mon_explicit);
>>>      qtest_add_data_func("/numa/mon/cpus/partial", args, test_mon_partial);
>>>      qtest_add_data_func("/numa/qmp/cpus/query-cpus", args, test_query_cpus);
* [PATCH 2/2] spapr.c: remove 'ibm,chip-id' from DT
  2021-03-19 18:34 [PATCH 0/2] pseries: SMP sockets must match NUMA nodes Daniel Henrique Barboza
  2021-03-19 18:34 ` [PATCH 1/2] spapr: number of SMP sockets must be equal to " Daniel Henrique Barboza
@ 2021-03-19 18:34 ` Daniel Henrique Barboza
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel Henrique Barboza @ 2021-03-19 18:34 UTC (permalink / raw)
To: qemu-devel
Cc: Alexey Kardashevskiy, Daniel Henrique Barboza, groug, qemu-ppc,
	clg, david

The attribute 'ibm,chip-id' does not exist in PAPR. This alone would be
enough reason to remove it from the spapr DT, but before doing that,
let's give some brief context on how and why it was introduced.

'ibm,chip-id' was added to the spapr DT by commit 10582ff83279. That
commit references kernel commit 256f2d4b463d ("powerpc: Use ibm,chip-id
property to compute cpu_core_mask if available"), in which the
'ibm,chip-id' DT attribute is used to calculate the cpu_core_mask in
traverse_siblings_chip_id(). This function still needs to handle
'ibm,chip-id' not being available, to avoid breaking older guests.

Later on, kernel commit df52f6714071 ("powerpc/smp: Rework CPU topology
construction") removed traverse_siblings_chip_id() and its callers,
making the CPU topology calculation independent of the 'ibm,chip-id'
attribute.

Last, and perhaps most relevant, kernel commit 4ca234a9cbd7
("powerpc/smp: Stop updating cpu_core_mask") changed the way the kernel
calculates the cpu_core_mask, removing the use of 'ibm,chip-id' in the
calculation altogether, with consequences already discussed in the
previous patch.

All that said, since it's not in PAPR and the pseries kernel does not
rely on it, let's remove 'ibm,chip-id' from the DT for the default
machine type. This removal is related to the previous SMP change, so
re-use the same smc->pre_6_0_smp_topology flag.

CC: Alexey Kardashevskiy <aik@ozlabs.ru>
Suggested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 745f71c243..58efb51ac7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -649,6 +649,7 @@ static void spapr_dt_cpu(CPUState *cs, void *fdt, int offset,
     PowerPCCPU *cpu = POWERPC_CPU(cs);
     CPUPPCState *env = &cpu->env;
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
+    SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(ms);
     int index = spapr_get_vcpu_id(cpu);
     uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
                        0xffffffff, 0xffffffff};
@@ -745,8 +746,10 @@ static void spapr_dt_cpu(CPUState *cs, void *fdt, int offset,
 
     spapr_dt_pa_features(spapr, cpu, fdt, offset);
 
-    _FDT((fdt_setprop_cell(fdt, offset, "ibm,chip-id",
-                           cs->cpu_index / vcpus_per_socket)));
+    if (smc->pre_6_0_smp_topology) {
+        _FDT((fdt_setprop_cell(fdt, offset, "ibm,chip-id",
+                               cs->cpu_index / vcpus_per_socket)));
+    }
 
     _FDT((fdt_setprop(fdt, offset, "ibm,pft-size",
                       pft_size_prop, sizeof(pft_size_prop))));
-- 
2.29.2
end of thread, other threads:[~2021-04-01  9:22 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-19 18:34 [PATCH 0/2] pseries: SMP sockets must match NUMA nodes Daniel Henrique Barboza
2021-03-19 18:34 ` [PATCH 1/2] spapr: number of SMP sockets must be equal to " Daniel Henrique Barboza
2021-03-23  1:03   ` David Gibson
2021-03-23 17:21     ` Daniel Henrique Barboza
2021-03-25  2:10       ` David Gibson
2021-03-25  8:56         ` Cédric Le Goater
2021-03-25 10:15           ` Daniel Henrique Barboza
2021-03-29  4:20             ` David Gibson
2021-03-29 15:32               ` Cédric Le Goater
2021-03-29 18:32                 ` Daniel Henrique Barboza
2021-03-29 23:55                   ` Igor Mammedov
2021-03-31  0:57                   ` David Gibson
2021-03-31  4:58                     ` Michael Ellerman
2021-03-31 15:22                       ` Cédric Le Goater
2021-04-01  2:53                         ` David Gibson
2021-03-31 15:18                 ` Cédric Le Goater
2021-03-31 17:29                   ` Daniel Henrique Barboza
2021-03-31 17:40                     ` Cédric Le Goater
2021-04-01  2:59                       ` David Gibson
2021-04-01  9:21                         ` Cédric Le Goater
2021-03-29 23:51     ` Igor Mammedov
2021-03-30 21:33       ` Daniel Henrique Barboza
2021-03-19 18:34 ` [PATCH 2/2] spapr.c: remove 'ibm,chip-id' from DT Daniel Henrique Barboza