linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: x86@kernel.org, Tom Lendacky <thomas.lendacky@amd.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	Dick Kennedy <dick.kennedy@broadcom.com>,
	James Smart <james.smart@broadcom.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org, linux-hwmon@vger.kernel.org,
	Jean Delvare <jdelvare@suse.com>, Huang Rui <ray.huang@amd.com>,
	Guenter Roeck <linux@roeck-us.net>,
	Steve Wahl <steve.wahl@hpe.com>,
	Mike Travis <mike.travis@hpe.com>,
	Dimitri Sivanich <dimitri.sivanich@hpe.com>,
	Russ Anderson <russ.anderson@hpe.com>
Subject: [patch 00/29] x86/cpu: Rework the topology evaluation
Date: Mon, 24 Jul 2023 19:43:50 +0200 (CEST)	[thread overview]
Message-ID: <20230724155329.474037902@linutronix.de> (raw)

Hi!

A recent commit to the CPUID leaf 0xb/0x1f parser made me look deeper at
the way how topology is evaluated. That "fix" is just yet another cure the
sypmtom hack which completely ignores the underlying disaster.

The way how topology evaluation works is to overwrite the relevant
variables as often as possible. E.g. smp_num_siblings gets overwritten a
gazillion times, which is wrong to begin with. The boot CPU writes it 3
times, each AP two times.

What's worse is that this just works by chance on hybrid systems due to the
fact that the existing ones all seem to boot on a P-Core which has
SMT. Would it boot on a E-Core which has no SMT, then parts of the early
topology evaluation including the primary thread mask which is required for
parallel CPU bringup would be completely wrong. Overwriting it later on
with the correct value does not help at all.

What's wrong today with hybrid already is the number of cores per package.
On an ADL with 8 P-Cores and 8 E-cores the resulting number of cores per
package is evaluated to be 12. Which is not further surprising because the
CPUID 0xb/0x1f parser looks at the number of logical processors at core
level and divides them by the number of SMP siblings.

   24 / 2 = 12

Just that this CPU has obviously 16 cores not 12.

It's is even clearly documented in the SDM that this is wrong.

 "Bits 15-00: The number of logical processors across all instances of this
  domain within the next higher- scoped domain relative to this current
  logical processor. (For example, in a processor socket/package comprising
  "M" dies of "N" cores each, where each core has "L" processors, the
  "die" domain subleaf value of this field would be M*N*L. In an asymmetric
  topology this would be the summation of the value across the lower domain
  level instances to create each upper domain level instance.) This number
  reflects configuration as shipped by Intel. Note, software must not use
  this field to enumerate processor topology.

  Software must not use the value of EBX[15:0] to enumerate processor topology
  of the system. The value is only intended for display and diagnostic purposes.
  The actual number of logical processors available to BIOS/OS/Applications
  may be different from the value of EBX[15:0], depending on software and
  platform hardware configurations."

This "_NOT_ to use for topology evaluation" sentence existed even before
hybrid came along and got ignored. The code worked by chance, but with
hybrid all bets are off. The code completely falls apart once CPUID leaf
0x1f enumerates any topology level between CORE and DIE, but that's not a
suprise.

The proper thing to do is to actually evaluate the full topology including
the non-present (hotpluggable) CPUs based on the APICIDs which are provided
by the firmware and a proper topology domain level parser. This can exactly
tell the number of physical packages, logical packages etc. _before_ even
booting a single AP. All of that can be evaluated upfront.

Aside of that there are too many places which do their own topology
evaluation, but there is absolutely no central point which can actually
provide all of that information in a consistent way. This needs to change.

This series implements another piece towards this: sane CPUID evaluation,
which is done at _one_ place in a proper well defined order instead of
having it sprinkled all over the CPUID evaluation code.

At the end of this series this is pretty much bug compatible with the
current CPUID evaluation code in respect to the cores per package
evaluation, but it gets rid of overwriting things like smp_num_siblings,
which is now written once, but is still not capable to work correctly on a
hybrid machine which boots from a non SMT core. These things can only be
fixed up in the next step(s).

When I tried to go further with this I ran into yet another pile of
historical layers of duct tape and haywire with a gazillion of random
variables sprinkled all over the place. That's still work in progress.  It
actually works, but the last step which switches over is not yet in a shape
that can be easily reviewed. Stay tuned.

The series is based on the APIC cleanup series:

  https://lore.kernel.org/lkml/20230724131206.500814398@linutronix.de

and also available on top of that from git:

 git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v1

Thanks,

	tglx
---
 arch/x86/kernel/cpu/topology.c              |  168 --------------------
 b/Documentation/arch/x86/topology.rst       |   12 -
 b/arch/x86/events/amd/core.c                |    2 
 b/arch/x86/events/amd/uncore.c              |    2 
 b/arch/x86/events/intel/uncore.c            |    2 
 b/arch/x86/include/asm/apic.h               |    1 
 b/arch/x86/include/asm/cacheinfo.h          |    3 
 b/arch/x86/include/asm/cpuid.h              |   32 +++
 b/arch/x86/include/asm/processor.h          |   58 ++++--
 b/arch/x86/include/asm/smp.h                |    2 
 b/arch/x86/include/asm/topology.h           |   51 +++++-
 b/arch/x86/include/asm/x86_init.h           |    2 
 b/arch/x86/kernel/amd_nb.c                  |    8 
 b/arch/x86/kernel/apic/apic_flat_64.c       |    7 
 b/arch/x86/kernel/apic/apic_noop.c          |    3 
 b/arch/x86/kernel/apic/apic_numachip.c      |   11 -
 b/arch/x86/kernel/apic/bigsmp_32.c          |    6 
 b/arch/x86/kernel/apic/local.h              |    1 
 b/arch/x86/kernel/apic/probe_32.c           |    6 
 b/arch/x86/kernel/apic/x2apic_cluster.c     |    1 
 b/arch/x86/kernel/apic/x2apic_phys.c        |    6 
 b/arch/x86/kernel/apic/x2apic_uv_x.c        |   63 +------
 b/arch/x86/kernel/cpu/Makefile              |    5 
 b/arch/x86/kernel/cpu/amd.c                 |  156 ------------------
 b/arch/x86/kernel/cpu/cacheinfo.c           |   51 ++----
 b/arch/x86/kernel/cpu/centaur.c             |    4 
 b/arch/x86/kernel/cpu/common.c              |  108 +-----------
 b/arch/x86/kernel/cpu/cpu.h                 |   14 +
 b/arch/x86/kernel/cpu/debugfs.c             |   98 +++++++++++
 b/arch/x86/kernel/cpu/hygon.c               |  133 ---------------
 b/arch/x86/kernel/cpu/intel.c               |   38 ----
 b/arch/x86/kernel/cpu/mce/amd.c             |    4 
 b/arch/x86/kernel/cpu/mce/apei.c            |    4 
 b/arch/x86/kernel/cpu/mce/core.c            |    4 
 b/arch/x86/kernel/cpu/mce/inject.c          |    7 
 b/arch/x86/kernel/cpu/proc.c                |    8 
 b/arch/x86/kernel/cpu/topology.h            |   51 ++++++
 b/arch/x86/kernel/cpu/topology_amd.c        |  179 +++++++++++++++++++++
 b/arch/x86/kernel/cpu/topology_common.c     |  233 ++++++++++++++++++++++++++++
 b/arch/x86/kernel/cpu/topology_ext.c        |  136 ++++++++++++++++
 b/arch/x86/kernel/cpu/zhaoxin.c             |   18 --
 b/arch/x86/kernel/smpboot.c                 |   64 ++++---
 b/arch/x86/kernel/vsmp_64.c                 |   13 -
 b/arch/x86/mm/amdtopology.c                 |   35 +---
 b/arch/x86/xen/apic.c                       |    8 
 b/drivers/edac/amd64_edac.c                 |    4 
 b/drivers/edac/mce_amd.c                    |    4 
 b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c |    2 
 b/drivers/hwmon/fam15h_power.c              |    7 
 b/drivers/scsi/lpfc/lpfc_init.c             |    8 
 b/drivers/virt/acrn/hsm.c                   |    2 
 51 files changed, 964 insertions(+), 881 deletions(-)





             reply	other threads:[~2023-07-24 17:44 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-24 17:43 Thomas Gleixner [this message]
2023-07-24 17:43 ` [patch 01/29] x86/cpu: Encapsulate topology information in cpuinfo_x86 Thomas Gleixner
2023-07-25  7:46   ` Thomas Gleixner
2023-07-24 17:43 ` [patch 02/29] x86/cpu: Move phys_proc_id into topology info Thomas Gleixner
2023-07-24 17:43 ` [patch 03/29] x86/cpu: Move cpu_die_id " Thomas Gleixner
2023-07-24 17:43 ` [patch 04/29] scsi: lpfc: Use topology_core_id() Thomas Gleixner
2023-07-24 17:43 ` [patch 05/29] hwmon: (fam15h_power) " Thomas Gleixner
2023-07-24 17:53   ` Guenter Roeck
2023-07-24 17:43 ` [patch 06/29] x86/cpu: Move cpu_core_id into topology info Thomas Gleixner
2023-07-24 17:44 ` [patch 07/29] x86/cpu: Move cu_id " Thomas Gleixner
2023-07-24 17:44 ` [patch 08/29] x86/cpu: Remove pointless evaluation of x86_coreid_bits Thomas Gleixner
2023-07-24 17:44 ` [patch 09/29] x86/cpu: Move logical package and die IDs into topology info Thomas Gleixner
2023-07-24 17:44 ` [patch 10/29] x86/cpu: Move cpu_l[l2]c_id " Thomas Gleixner
2023-07-24 17:44 ` [patch 11/29] x86/cpu: Provide debug interface Thomas Gleixner
2023-07-24 17:44 ` [patch 12/29] x86/cpu: Provide cpuid_read() et al Thomas Gleixner
2023-07-24 17:44 ` [patch 13/29] x86/cpu: Provide cpu_init/parse_topology() Thomas Gleixner
2023-07-24 17:44 ` [patch 14/29] x86/cpu: Add legacy topology parser Thomas Gleixner
2023-07-24 17:44 ` [patch 15/29] x86/cpu: Use common topology code for Centaur and Zhaoxin Thomas Gleixner
2023-07-24 17:44 ` [patch 16/29] x86/cpu: Move __max_die_per_package to common.c Thomas Gleixner
2023-07-24 17:44 ` [patch 17/29] x86/cpu: Provide a sane leaf 0xb/0x1f parser Thomas Gleixner
2023-07-24 20:49   ` Peter Zijlstra
2023-07-24 21:02     ` Peter Zijlstra
2023-07-25  6:51     ` Thomas Gleixner
2023-07-24 17:44 ` [patch 18/29] x86/cpu: Use common topology code for Intel Thomas Gleixner
2023-07-24 17:44 ` [patch 19/29] x86/cpu/amd: Provide a separate acessor for Node ID Thomas Gleixner
2023-07-24 17:44 ` [patch 20/29] x86/cpu: Provide an AMD/HYGON specific topology parser Thomas Gleixner
2023-07-24 17:44 ` [patch 21/29] x86/smpboot: Teach it about topo.amd_node_id Thomas Gleixner
2023-07-24 17:44 ` [patch 22/29] x86/cpu: Use common topology code for AMD Thomas Gleixner
2023-07-24 17:44 ` [patch 23/29] x86/cpu: Use common topology code for HYGON Thomas Gleixner
2023-07-24 17:44 ` [patch 24/29] x86/mm/numa: Use core domain size on AMD Thomas Gleixner
2023-07-24 17:44 ` [patch 25/29] x86/cpu: Make topology_amd_node_id() use the actual node info Thomas Gleixner
2023-07-24 17:44 ` [patch 26/29] x86/cpu: Remove topology.c Thomas Gleixner
2023-07-24 17:44 ` [patch 27/29] x86/cpu: Remove x86_coreid_bits Thomas Gleixner
2023-07-24 17:44 ` [patch 28/29] x86/apic: Remove unused phys_pkg_id() callback Thomas Gleixner
2023-07-24 17:44 ` [patch 29/29] x86/apic/uv: Remove the private leaf 0xb parser Thomas Gleixner
2023-07-24 21:05 ` [patch 00/29] x86/cpu: Rework the topology evaluation Peter Zijlstra
2023-07-25 20:17 ` Dimitri Sivanich
2023-07-26 22:15 ` Sohil Mehta
2023-07-26 22:38   ` Thomas Gleixner
2023-07-26 23:27     ` Sohil Mehta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230724155329.474037902@linutronix.de \
    --to=tglx@linutronix.de \
    --cc=andrew.cooper3@citrix.com \
    --cc=arjan@linux.intel.com \
    --cc=dick.kennedy@broadcom.com \
    --cc=dimitri.sivanich@hpe.com \
    --cc=james.smart@broadcom.com \
    --cc=jdelvare@suse.com \
    --cc=jejb@linux.ibm.com \
    --cc=linux-hwmon@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=martin.petersen@oracle.com \
    --cc=mike.travis@hpe.com \
    --cc=ray.huang@amd.com \
    --cc=russ.anderson@hpe.com \
    --cc=steve.wahl@hpe.com \
    --cc=thomas.lendacky@amd.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).