From: David Gibson <david@gibson.dropbear.id.au>
To: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: Laurent Vivier <lvivier@redhat.com>,
Thomas Huth <thuth@redhat.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
qemu-devel@nongnu.org, groug@kaod.org, qemu-ppc@nongnu.org,
clg@kaod.org, Igor Mammedov <imammedo@redhat.com>
Subject: Re: [PATCH 1/2] spapr: number of SMP sockets must be equal to NUMA nodes
Date: Thu, 25 Mar 2021 13:10:09 +1100
Message-ID: <YFvxAW3l4t+YznEm@yekko.fritz.box>
In-Reply-To: <2025f26f-5883-4e86-02af-5b83a8d52465@gmail.com>
On Tue, Mar 23, 2021 at 02:21:33PM -0300, Daniel Henrique Barboza wrote:
>
>
> On 3/22/21 10:03 PM, David Gibson wrote:
> > On Fri, Mar 19, 2021 at 03:34:52PM -0300, Daniel Henrique Barboza wrote:
> > > Kernel commit 4bce545903fa ("powerpc/topology: Update
> > > topology_core_cpumask") caused a regression in the pseries machine when
> > > defining certain SMP topologies [1]. The reasoning behind the change is
> > > explained in kernel commit 4ca234a9cbd7 ("powerpc/smp: Stop updating
> > > cpu_core_mask"). In short, the cpu_core_mask logic was causing trouble with
> > > large VMs with lots of CPUs and was replaced by cpu_cpu_mask because, as
> > > far as the kernel's understanding of SMP topologies goes, both masks are
> > > equivalent.
> > >
> > > Further discussions on the kernel mailing list [2] showed that the
> > > powerpc kernel always considered the number of sockets to be equal
> > > to the number of NUMA nodes. The claim is that it doesn't make sense,
> > > for Power hardware at least, for 2+ sockets to be in the same NUMA node.
> > > The immediate conclusion is that all SMP topologies the pseries machine
> > > was supplying to the kernel with more than one socket in the same NUMA
> > > node, as in [1], happened to be correctly represented in the kernel by
> > > accident during all these years.
> > >
> > > There's a case to be made for virtual topologies being detached from
> > > hardware constraints, allowing maximum flexibility to users. At the same
> > > time, this freedom can't result in unrealistic hardware representations
> > > being emulated. If the real hardware and the pseries kernel don't
> > > support multiple chips/sockets in the same NUMA node, neither should we.
> > >
> > > Starting in 6.0.0, all sockets must match a unique NUMA node in the
> > > pseries machine. qtest changes were made to adapt to this new
> > > condition.
> >
> > Oof. I really don't like this idea. It means a bunch of fiddly work
> > for users to match these up, for no real gain. I'm also concerned
> > that this will require follow on changes in libvirt to not make this a
> > really cryptic and irritating point of failure.
>
> Haven't thought about required Libvirt changes, although I can say that there
> will be some amount to be made and it will probably annoy existing users
> (everyone that has a multiple-sockets-per-NUMA-node topology).
>
> There is not much we can do from the QEMU layer aside from what I've proposed
> here. The other alternative is to keep interacting with the kernel folks to
> see if there is a way to keep our use case untouched.
Right. Well.. not necessarily untouched, but I'm hoping for more
replies from Cédric to my objections and mpe's. Even with sockets
being a kinda meaningless concept in PAPR, I don't think tying it to
NUMA nodes makes sense.
> This also means that
> 'ibm,chip-id' will probably remain in use since it's the only place where
> we pass cores-per-socket information to the kernel.
Well.. unless we can find some other sensible way to convey that
information. I haven't given up hope for that yet.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson