From: Daniel Henrique Barboza <danielhb413@gmail.com>
To: linuxppc-dev@lists.ozlabs.org
Cc: aneesh.kumar@in.ibm.com,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Cedric Le Goater <clg@kaod.org>
Subject: Advice needed on SMP regression after cpu_core_mask change
Date: Wed, 17 Mar 2021 10:00:34 -0300
Message-ID: <daa5d05f-dbd0-05ad-7395-5d5a3d364fc6@gmail.com>

Hello,

Patch 4bce545903fa ("powerpc/topology: Update topology_core_cpumask") introduced
a regression in both the upstream and the RHEL downstream kernels [1]. The assumption
made in that commit:

"Further analysis shows that cpu_core_mask and cpu_cpu_mask for any CPU would be
equal on Power"

does not seem to hold. After this commit, QEMU can no longer get the guest to report
single NUMA node SMP topologies such as:

-smp 8,maxcpus=8,cores=2,threads=2,sockets=2

In this case lscpu gives the following output (note the single socket with 4 cores
instead of the requested 2 sockets with 2 cores each):

# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
NUMA node0 CPU(s):   0-7


This is happening because the macro cpu_cpu_mask(cpu) expands to
cpumask_of_node(cpu_to_node(cpu)), which in turn expands to node_to_cpumask_map[node].
node_to_cpumask_map is a per-node array holding the cpumask of CPUs that belong to each
NUMA node (Aneesh is on CC to correct me if I'm wrong). In other words, we are now
deriving the socket topology directly from the NUMA nodes, as the sketch below shows.
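
For reference, this is roughly what the expansion looks like (paraphrased from
include/linux/topology.h and arch/powerpc/include/asm/topology.h; config guards and
the node == -1 fallback are omitted here):

/* include/linux/topology.h (paraphrased) */
static inline const struct cpumask *cpu_cpu_mask(int cpu)
{
	return cpumask_of_node(cpu_to_node(cpu));
}

/* arch/powerpc/include/asm/topology.h, CONFIG_NUMA case (simplified) */
#define cpumask_of_node(node)	(node_to_cpumask_map[node])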

If I add a second NUMA node, I get the intended SMP topology:

-smp 8,maxcpus=8,cores=2,threads=2,sockets=2
-numa node,memdev=mem0,cpus=0-3,nodeid=0 \
-numa node,memdev=mem1,cpus=4-7,nodeid=1 \

# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           2
NUMA node(s):        2
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
NUMA node0 CPU(s):   0-3
NUMA node1 CPU(s):   4-7


However, if I try a topology with a single socket spanning multiple NUMA nodes, which is
the case on Power10, e.g.:


-smp 8,maxcpus=8,cores=4,threads=2,sockets=1
-numa node,memdev=mem0,cpus=0-3,nodeid=0 \
-numa node,memdev=mem1,cpus=4-7,nodeid=1 \


This is the result:

# lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):           2
NUMA node(s):        2
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9 (architected), altivec supported
Hypervisor vendor:   KVM
Virtualization type: para
L1d cache:           32K
L1i cache:           32K
NUMA node0 CPU(s):   0-3
NUMA node1 CPU(s):   4-7


This confirms my suspicion that, at the moment, we are treating sockets and NUMA nodes
as the same thing.


Cedric, I'm CCing you because this is related to ibm,chip-id. The commit that follows
the one that caused the regression, 4ca234a9cbd7c3a65 ("powerpc/smp: Stop updating
cpu_core_mask"), removes the code that calculated cpu_core_mask. Despite the shortcomings
that led to its removal, cpu_core_mask gave a precise SMP topology, and it did so using
physical_package_id/'ibm,chip-id'. A rough sketch of what that removed code did is below.
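
This is a rough, from-memory sketch of the removed logic; the function name is mine and
the details are simplified, so please check the actual commit for the real code:

/* Simplified sketch of the cpu_core_mask update removed by 4ca234a9cbd7c3a65
 * from arch/powerpc/kernel/smp.c. Function name and details are mine. */
static void update_core_mask_from_chip_id(int cpu)
{
	int i, chipid = cpu_to_chip_id(cpu);	/* reads "ibm,chip-id" */

	if (chipid == -1)
		return;

	/* put every online CPU with the same chip id in this CPU's core mask */
	for_each_cpu(i, cpu_online_mask) {
		if (cpu_to_chip_id(i) == chipid) {
			cpumask_set_cpu(i, cpu_core_mask(cpu));
			cpumask_set_cpu(cpu, cpu_core_mask(i));
		}
	}
}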

Checking QEMU, I can say that the ibm,chip-id calculation is the only place in its code
that cares about the cores-per-socket information. Starting with 4bce545903fa the kernel
ignores that property, so QEMU is now unable to provide this information to the guest.
As far as I can tell, QEMU derives ibm,chip-id roughly as sketched below.
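
Here is a minimal, self-contained illustration of that calculation as I understand it
(the real code lives in hw/ppc/spapr.c and writes the value into the device tree; the
helper and variable names below are mine, not QEMU's):

#include <stdio.h>

/* one "chip" (socket) per group of cores * threads vcpus */
static int chip_id_for_cpu(int cpu_index, int cores, int threads)
{
	int vcpus_per_socket = cores * threads;

	return cpu_index / vcpus_per_socket;
}

int main(void)
{
	/* -smp 8,maxcpus=8,cores=2,threads=2,sockets=2 */
	for (int cpu = 0; cpu < 8; cpu++)
		printf("cpu %d -> ibm,chip-id %d\n", cpu,
		       chip_id_for_cpu(cpu, 2, 2));
	/* prints: cpus 0-3 -> chip 0, cpus 4-7 -> chip 1 */
	return 0;
}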

If we're not going to use ibm,chip-id any longer, which seems sensible given that PAPR
does not declare it, we need another way of letting the guest know how many cores per
socket we want.



[1] https://bugzilla.redhat.com/1934421



Thanks,


DHB
