linux-kernel.vger.kernel.org archive mirror
* [patch v2 00/30] x86/apic: Rework APIC registration
@ 2024-01-23 13:10 Thomas Gleixner
  2024-01-23 13:10 ` [patch v2 01/30] x86/cpu/topology: Move registration out of APIC code Thomas Gleixner
                   ` (31 more replies)
  0 siblings, 32 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:10 UTC
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

This is a breakout from:

  https://lore.kernel.org/all/20230807130108.853357011@linutronix.de

addressing the issues of the current topology code:

  - Wrong core count on hybrid systems

  - Heuristics-based size information for packages and dies, which
    fails to work correctly with certain command line parameters.

  - Complete evaluation failure on a theoretical hybrid system which
    boots from an E-core

  - The complete insanity of manipulating global data from firmware parsers
    or the XEN/PV fake SMP enumeration. The latter is really a piece of art.

This series addresses this by

  - Consolidating all topology relevant functionality into one place

  - Providing separate interfaces for boot time and ACPI hotplug operations

  - A sane ordering of command line options and restrictions

  - A sensible way to handle the BSP problem in kdump kernels instead of
    the unreliable command line option.

  - Confinement of topology relevant variables by replacing the XEN/PV SMP
    enumeration fake with something halfway sensible.

  - Evaluation of sizes by analysing the topology via the CPUID provided
    APIC ID segmentation and the actual APIC IDs which are registered at
    boot time.

  - Removal of heuristics and broken size calculations

The idea behind this is the following:

The APIC IDs describe the system topology in multiple domain levels. The
CPUID topology parser provides the information about which part of the APIC
ID is associated with each individual level (Intel terminology):

   [ROOT][PACKAGE][DIEGRP][DIE][TILE][MODULE][CORE][THREAD]

The root space contains the package (socket) IDs. Levels which are not
enumerated consume 0 bits of space, but conceptually they are always
represented. If e.g. only the CORE and THREAD levels are enumerated, then
DIEGRP, DIE, MODULE and TILE have the same physical ID as the PACKAGE.

If SMT is not supported, then the THREAD domain is still used. It then
has the same physical ID as the CORE domain and is the only child of
the core domain.

This allows a unified view of the system independent of the enumerated
domain levels without requiring any conditionals in the code.
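
The segmentation described above can be illustrated with a small userspace
sketch. It is not code from this series: the enum, struct topo_layout and
domain_id() are hypothetical names, and the bit widths in the usage note
are made up. It only shows why levels which consume 0 bits need no special
casing: the ID of a domain is the APIC ID with the bits of all lower levels
masked off.

```c
#include <assert.h>
#include <stdint.h>

/* Domain levels, lowest first. Hypothetical names mirroring the text above. */
enum topo_domain {
	DOM_THREAD, DOM_CORE, DOM_MODULE, DOM_TILE,
	DOM_DIE, DOM_DIEGRP, DOM_PACKAGE, DOM_MAX,
};

/*
 * width[d] is the number of APIC ID bits consumed by level d. A level
 * which is not enumerated has width 0. The package takes the remaining
 * upper bits, so width[DOM_PACKAGE] is not used.
 */
struct topo_layout {
	uint8_t width[DOM_MAX];
};

/* Return the APIC ID with the bits of all levels below @dom cleared. */
static uint32_t domain_id(const struct topo_layout *l, uint32_t apic_id,
			  enum topo_domain dom)
{
	unsigned int shift = 0;

	for (int d = 0; d < (int)dom; d++)
		shift += l->width[d];

	assert(shift < 32);
	return apic_id & (UINT32_MAX << shift);
}
```

With an assumed layout of 1 THREAD bit and 3 CORE bits, APIC ID 0x1b yields
thread ID 0x1b, core ID 0x1a, and 0x10 for every level from MODULE up to
PACKAGE. With 0 THREAD bits the thread ID and the core ID are identical,
which matches the non-SMT case described above.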

AMD exposes only 4 domain levels, with obviously different terminology,
but they can easily be mapped to the Intel variant with a trivial lookup
table added to the CPUID parser.

The resulting topology information of an ADL hybrid system with 8 P-Cores
and 8 E-Cores looks like this:

 CPU topo: Max. logical packages:   1
 CPU topo: Max. logical dies:       1
 CPU topo: Max. dies per package:   1
 CPU topo: Max. threads per core:   2
 CPU topo: Num. cores per package:    16
 CPU topo: Num. threads per package:  24
 CPU topo: Allowing 24 present CPUs plus 0 hotplug CPUs
 CPU topo: Thread    :    24
 CPU topo: Core      :    16
 CPU topo: Module    :     1
 CPU topo: Tile      :     1
 CPU topo: Die       :     1
 CPU topo: Package   :     1

This happens on the boot CPU before any of the APs are started and
provides correct size information right from the start.
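
As a rough illustration of how such counts can be derived from the
registered APIC IDs alone, the sketch below counts the distinct IDs at a
given shift. It is an assumption-laden example and not the method used in
this series: count_units() and fake_hybrid_ids() are hypothetical helpers,
and the APIC ID assignment (P-cores with two threads at 2c and 2c + 1,
E-cores with a single thread at even IDs above them) is invented, not taken
from real ADL hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the distinct values of (apic_id >> shift) in ids[0..n-1]. */
static unsigned int count_units(const uint32_t *ids, size_t n,
				unsigned int shift)
{
	unsigned int count = 0;

	assert(shift < 32);
	for (size_t i = 0; i < n; i++) {
		size_t j;

		/* Only count an ID when no earlier entry shares it */
		for (j = 0; j < i; j++) {
			if ((ids[j] >> shift) == (ids[i] >> shift))
				break;
		}
		if (j == i)
			count++;
	}
	return count;
}

/*
 * Fill ids[] with an invented hybrid layout: 8 P-cores with 2 threads
 * each, followed by 8 E-cores with 1 thread each. Returns the number
 * of IDs written, which is at most @max.
 */
static size_t fake_hybrid_ids(uint32_t *ids, size_t max)
{
	size_t n = 0;

	for (uint32_t c = 0; c < 8 && n + 2 <= max; c++) {
		ids[n++] = 2 * c;	/* primary thread */
		ids[n++] = 2 * c + 1;	/* SMT sibling */
	}
	for (uint32_t e = 0; e < 8 && n < max; e++)
		ids[n++] = 2 * (8 + e);	/* E-core, no sibling */
	return n;
}
```

With 1 THREAD bit and 4 CORE bits assumed, counting at shift 0 gives 24
threads, at shift 1 gives 16 cores and at shift 5 gives 1 package, which
lines up with the numbers in the log above. The quadratic scan is only for
brevity.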

Even the XEN/PV trainwreck makes use of this now. On Dom0 it utilizes the
MADT and on DomU it provides fake APIC IDs, which combined with the
provided CPUID information make it at least look halfway realistic instead
of claiming to have one CPU per package as the current upstream code does.

This is solely addressing the core topology issues, but there is a plan for
further consolidation of other topology related information into one single
source of information instead of having a gazillion localized special
parsers and representations all over the place. There are quite a few other
things which can be simplified on top of this, like updating the various
cpumasks during CPU bringup, but that's all left for later.

Changes vs. V1:

	- Breakout of the actual topology management changes

	- Adopt DIEGRP

	- Different approach to identify the BSP on enumeration (Rui)

The current series applies on top of

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cleanup-v2

and is available from git here:

   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-full-v2

Thanks,

	tglx
---
 Documentation/admin-guide/kdump/kdump.rst                      |    7 
 Documentation/admin-guide/kernel-parameters.txt                |    9 
 Documentation/arch/x86/topology.rst                            |   24 
 arch/x86/events/intel/cstate.c                                 |    2 
 arch/x86/events/intel/uncore.c                                 |    2 
 arch/x86/events/intel/uncore_nhmex.c                           |    4 
 arch/x86/events/intel/uncore_snb.c                             |    8 
 arch/x86/events/intel/uncore_snbep.c                           |   18 
 arch/x86/events/rapl.c                                         |    2 
 arch/x86/include/asm/apic.h                                    |   10 
 arch/x86/include/asm/cpu.h                                     |   10 
 arch/x86/include/asm/mpspec.h                                  |    2 
 arch/x86/include/asm/perf_event_p4.h                           |    4 
 arch/x86/include/asm/processor.h                               |    2 
 arch/x86/include/asm/smp.h                                     |    6 
 arch/x86/include/asm/topology.h                                |   53 -
 arch/x86/kernel/acpi/boot.c                                    |   59 -
 arch/x86/kernel/apic/apic.c                                    |  186 ---
 arch/x86/kernel/cpu/Makefile                                   |   12 
 arch/x86/kernel/cpu/cacheinfo.c                                |    2 
 arch/x86/kernel/cpu/common.c                                   |   33 
 arch/x86/kernel/cpu/debugfs.c                                  |    7 
 arch/x86/kernel/cpu/mce/inject.c                               |    3 
 arch/x86/kernel/cpu/microcode/intel.c                          |    2 
 arch/x86/kernel/cpu/topology.c                                 |  484 ++++++++++
 arch/x86/kernel/cpu/topology.h                                 |   11 
 arch/x86/kernel/cpu/topology_common.c                          |   45 
 arch/x86/kernel/devicetree.c                                   |    2 
 arch/x86/kernel/jailhouse.c                                    |    2 
 arch/x86/kernel/mpparse.c                                      |   17 
 arch/x86/kernel/process.c                                      |    2 
 arch/x86/kernel/setup.c                                        |    9 
 arch/x86/kernel/smpboot.c                                      |  219 ----
 arch/x86/xen/apic.c                                            |   14 
 arch/x86/xen/enlighten_pv.c                                    |    3 
 arch/x86/xen/smp.c                                             |    2 
 arch/x86/xen/smp.h                                             |    2 
 arch/x86/xen/smp_pv.c                                          |   58 -
 drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c               |    2 
 drivers/hwmon/coretemp.c                                       |    2 
 drivers/hwmon/fam15h_power.c                                   |    2 
 drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c |    2 
 drivers/powercap/intel_rapl_common.c                           |    2 
 drivers/thermal/intel/intel_hfi.c                              |    2 
 drivers/thermal/intel/intel_powerclamp.c                       |    2 
 drivers/thermal/intel/x86_pkg_temp_thermal.c                   |    2 
 46 files changed, 698 insertions(+), 655 deletions(-)




* [patch v2 01/30] x86/cpu/topology: Move registration out of APIC code
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
@ 2024-01-23 13:10 ` Thomas Gleixner
  2024-01-23 13:10 ` [patch v2 02/30] x86/cpu/topology: Provide separate APIC registration functions Thomas Gleixner
                   ` (30 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:10 UTC
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

The APIC/CPU registration sits in the middle of the APIC code. In fact this
is a topology evaluation function and has nothing to do with the inner
workings of the local APIC.

Move it out into a file which reflects what this is about.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/apic.h    |    2 
 arch/x86/kernel/apic/apic.c    |  185 -----------------------------------------
 arch/x86/kernel/cpu/Makefile   |   12 +-
 arch/x86/kernel/cpu/topology.c |  184 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 195 insertions(+), 188 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/topology.c
---
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -171,6 +171,8 @@ extern bool apic_needs_pit(void);
 
 extern void apic_send_IPI_allbutself(unsigned int vector);
 
+extern void topology_register_boot_apic(u32 apic_id);
+
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
 #define local_apic_timer_c2_ok		1
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -68,26 +68,12 @@
 
 #include "local.h"
 
-unsigned int num_processors;
-
-unsigned disabled_cpus;
-
 /* Processor that is doing the boot up */
 u32 boot_cpu_physical_apicid __ro_after_init = BAD_APICID;
 EXPORT_SYMBOL_GPL(boot_cpu_physical_apicid);
 
 u8 boot_cpu_apic_version __ro_after_init;
 
-/* Bitmap of physically present CPUs. */
-DECLARE_BITMAP(phys_cpu_present_map, MAX_LOCAL_APIC);
-
-/*
- * Processor to be disabled specified by kernel parameter
- * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
- * avoid undefined behaviour caused by sending INIT from AP to BSP.
- */
-static u32 disabled_cpu_apicid __ro_after_init = BAD_APICID;
-
 /*
  * This variable controls which CPUs receive external NMIs.  By default,
  * external NMIs are delivered only to the BSP.
@@ -107,14 +93,6 @@ static inline bool apic_accessible(void)
 	return x2apic_mode || apic_mmio_base;
 }
 
-/*
- * Map cpu index to physical APIC ID
- */
-DEFINE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_apicid, BAD_APICID);
-DEFINE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_acpiid, CPU_ACPIID_INVALID);
-EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_apicid);
-EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_acpiid);
-
 #ifdef CONFIG_X86_32
 /* Local APIC was disabled by the BIOS and enabled by the kernel */
 static int enabled_via_apicbase __ro_after_init;
@@ -1676,8 +1654,6 @@ void apic_ap_setup(void)
 	end_local_APIC_setup();
 }
 
-static __init void cpu_set_boot_apic(void);
-
 static __init void apic_read_boot_cpu_id(bool x2apic)
 {
 	/*
@@ -1692,7 +1668,8 @@ static __init void apic_read_boot_cpu_id
 		boot_cpu_physical_apicid = read_apic_id();
 		boot_cpu_apic_version = GET_APIC_VERSION(apic_read(APIC_LVR));
 	}
-	cpu_set_boot_apic();
+	topology_register_boot_apic(boot_cpu_physical_apicid);
+	x86_32_probe_bigsmp_early();
 }
 
 #ifdef CONFIG_X86_X2APIC
@@ -2291,155 +2268,6 @@ void disconnect_bsp_APIC(int virt_wire_s
 	apic_write(APIC_LVT1, value);
 }
 
-/*
- * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
- * contiguously, it equals to current allocated max logical CPU ID plus 1.
- * All allocated CPU IDs should be in the [0, nr_logical_cpuids) range,
- * so the maximum of nr_logical_cpuids is nr_cpu_ids.
- *
- * NOTE: Reserve 0 for BSP.
- */
-static int nr_logical_cpuids = 1;
-
-/*
- * Used to store mapping between logical CPU IDs and APIC IDs.
- */
-u32 cpuid_to_apicid[] = { [0 ... NR_CPUS - 1] = BAD_APICID, };
-
-bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
-{
-	return phys_id == (u64)cpuid_to_apicid[cpu];
-}
-
-#ifdef CONFIG_SMP
-static void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid)
-{
-	/* Isolate the SMT bit(s) in the APICID and check for 0 */
-	u32 mask = (1U << (fls(smp_num_siblings) - 1)) - 1;
-
-	if (smp_num_siblings == 1 || !(apicid & mask))
-		cpumask_set_cpu(cpu, &__cpu_primary_thread_mask);
-}
-
-/*
- * Due to the utter mess of CPUID evaluation smp_num_siblings is not valid
- * during early boot. Initialize the primary thread mask before SMP
- * bringup.
- */
-static int __init smp_init_primary_thread_mask(void)
-{
-	unsigned int cpu;
-
-	/*
-	 * XEN/PV provides either none or useless topology information.
-	 * Pretend that all vCPUs are primary threads.
-	 */
-	if (xen_pv_domain()) {
-		cpumask_copy(&__cpu_primary_thread_mask, cpu_possible_mask);
-		return 0;
-	}
-
-	for (cpu = 0; cpu < nr_logical_cpuids; cpu++)
-		cpu_mark_primary_thread(cpu, cpuid_to_apicid[cpu]);
-	return 0;
-}
-early_initcall(smp_init_primary_thread_mask);
-#else
-static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
-#endif
-
-/*
- * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
- * and cpuid_to_apicid[] synchronized.
- */
-static int allocate_logical_cpuid(int apicid)
-{
-	int i;
-
-	/*
-	 * cpuid <-> apicid mapping is persistent, so when a cpu is up,
-	 * check if the kernel has allocated a cpuid for it.
-	 */
-	for (i = 0; i < nr_logical_cpuids; i++) {
-		if (cpuid_to_apicid[i] == apicid)
-			return i;
-	}
-
-	/* Allocate a new cpuid. */
-	if (nr_logical_cpuids >= nr_cpu_ids) {
-		WARN_ONCE(1, "APIC: NR_CPUS/possible_cpus limit of %u reached. "
-			     "Processor %d/0x%x and the rest are ignored.\n",
-			     nr_cpu_ids, nr_logical_cpuids, apicid);
-		return -EINVAL;
-	}
-
-	cpuid_to_apicid[nr_logical_cpuids] = apicid;
-	return nr_logical_cpuids++;
-}
-
-static void cpu_update_apic(int cpu, u32 apicid)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_X86_64)
-	early_per_cpu(x86_cpu_to_apicid, cpu) = apicid;
-#endif
-	set_cpu_possible(cpu, true);
-	set_bit(apicid, phys_cpu_present_map);
-	set_cpu_present(cpu, true);
-	num_processors++;
-
-	if (system_state != SYSTEM_BOOTING)
-		cpu_mark_primary_thread(cpu, apicid);
-}
-
-static __init void cpu_set_boot_apic(void)
-{
-	cpuid_to_apicid[0] = boot_cpu_physical_apicid;
-	cpu_update_apic(0, boot_cpu_physical_apicid);
-	x86_32_probe_bigsmp_early();
-}
-
-int generic_processor_info(int apicid)
-{
-	int cpu, max = nr_cpu_ids;
-
-	/* The boot CPU must be set before MADT/MPTABLE parsing happens */
-	if (cpuid_to_apicid[0] == BAD_APICID)
-		panic("Boot CPU APIC not registered yet\n");
-
-	if (apicid == boot_cpu_physical_apicid)
-		return 0;
-
-	if (disabled_cpu_apicid == apicid) {
-		int thiscpu = num_processors + disabled_cpus;
-
-		pr_warn("APIC: Disabling requested cpu. Processor %d/0x%x ignored.\n",
-			thiscpu, apicid);
-
-		disabled_cpus++;
-		return -ENODEV;
-	}
-
-	if (num_processors >= nr_cpu_ids) {
-		int thiscpu = max + disabled_cpus;
-
-		pr_warn("APIC: NR_CPUS/possible_cpus limit of %i reached. "
-			"Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
-
-		disabled_cpus++;
-		return -EINVAL;
-	}
-
-	cpu = allocate_logical_cpuid(apicid);
-	if (cpu < 0) {
-		disabled_cpus++;
-		return -EINVAL;
-	}
-
-	cpu_update_apic(cpu, apicid);
-	return cpu;
-}
-
-
 void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg,
 			   bool dmar)
 {
@@ -2828,15 +2656,6 @@ static int __init lapic_insert_resource(
  */
 late_initcall(lapic_insert_resource);
 
-static int __init apic_set_disabled_cpu_apicid(char *arg)
-{
-	if (!arg || !get_option(&arg, &disabled_cpu_apicid))
-		return -EINVAL;
-
-	return 0;
-}
-early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
-
 static int __init apic_set_extnmi(char *arg)
 {
 	if (!arg)
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -26,14 +26,16 @@ obj-y			+= bugs.o
 obj-y			+= aperfmperf.o
 obj-y			+= cpuid-deps.o
 obj-y			+= umwait.o
+obj-y 			+= capflags.o powerflags.o
 
-obj-$(CONFIG_PROC_FS)	+= proc.o
-obj-y += capflags.o powerflags.o
+obj-$(CONFIG_X86_LOCAL_APIC)		+= topology.o
 
-obj-$(CONFIG_IA32_FEAT_CTL) += feat_ctl.o
+obj-$(CONFIG_PROC_FS)			+= proc.o
+
+obj-$(CONFIG_IA32_FEAT_CTL)		+= feat_ctl.o
 ifdef CONFIG_CPU_SUP_INTEL
-obj-y			+= intel.o intel_pconfig.o tsx.o
-obj-$(CONFIG_PM)	+= intel_epb.o
+obj-y					+= intel.o intel_pconfig.o tsx.o
+obj-$(CONFIG_PM)			+= intel_epb.o
 endif
 obj-$(CONFIG_CPU_SUP_AMD)		+= amd.o
 obj-$(CONFIG_CPU_SUP_HYGON)		+= hygon.o
--- /dev/null
+++ b/arch/x86/kernel/cpu/topology.c
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/cpu.h>
+
+#include <xen/xen.h>
+
+#include <asm/apic.h>
+#include <asm/mpspec.h>
+#include <asm/smp.h>
+
+/*
+ * Map cpu index to physical APIC ID
+ */
+DEFINE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_apicid, BAD_APICID);
+DEFINE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_acpiid, CPU_ACPIID_INVALID);
+EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_apicid);
+EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_acpiid);
+
+/* Bitmap of physically present CPUs. */
+DECLARE_BITMAP(phys_cpu_present_map, MAX_LOCAL_APIC) __read_mostly;
+
+/* Used for CPU number allocation and parallel CPU bringup */
+u32 cpuid_to_apicid[] __read_mostly = { [0 ... NR_CPUS - 1] = BAD_APICID, };
+
+/*
+ * Processor to be disabled specified by kernel parameter
+ * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
+ * avoid undefined behaviour caused by sending INIT from AP to BSP.
+ */
+static u32 disabled_cpu_apicid __ro_after_init = BAD_APICID;
+
+unsigned int num_processors;
+unsigned disabled_cpus;
+
+/*
+ * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
+ * contiguously, it equals to current allocated max logical CPU ID plus 1.
+ * All allocated CPU IDs should be in the [0, nr_logical_cpuids) range,
+ * so the maximum of nr_logical_cpuids is nr_cpu_ids.
+ *
+ * NOTE: Reserve 0 for BSP.
+ */
+static int nr_logical_cpuids = 1;
+
+bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
+{
+	return phys_id == (u64)cpuid_to_apicid[cpu];
+}
+
+#ifdef CONFIG_SMP
+static void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid)
+{
+	/* Isolate the SMT bit(s) in the APICID and check for 0 */
+	u32 mask = (1U << (fls(smp_num_siblings) - 1)) - 1;
+
+	if (smp_num_siblings == 1 || !(apicid & mask))
+		cpumask_set_cpu(cpu, &__cpu_primary_thread_mask);
+}
+
+/*
+ * Due to the utter mess of CPUID evaluation smp_num_siblings is not valid
+ * during early boot. Initialize the primary thread mask before SMP
+ * bringup.
+ */
+static int __init smp_init_primary_thread_mask(void)
+{
+	unsigned int cpu;
+
+	/*
+	 * XEN/PV provides either none or useless topology information.
+	 * Pretend that all vCPUs are primary threads.
+	 */
+	if (xen_pv_domain()) {
+		cpumask_copy(&__cpu_primary_thread_mask, cpu_possible_mask);
+		return 0;
+	}
+
+	for (cpu = 0; cpu < nr_logical_cpuids; cpu++)
+		cpu_mark_primary_thread(cpu, cpuid_to_apicid[cpu]);
+	return 0;
+}
+early_initcall(smp_init_primary_thread_mask);
+#else
+static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
+#endif
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(int apicid)
+{
+	int i;
+
+	/*
+	 * cpuid <-> apicid mapping is persistent, so when a cpu is up,
+	 * check if the kernel has allocated a cpuid for it.
+	 */
+	for (i = 0; i < nr_logical_cpuids; i++) {
+		if (cpuid_to_apicid[i] == apicid)
+			return i;
+	}
+
+	/* Allocate a new cpuid. */
+	if (nr_logical_cpuids >= nr_cpu_ids) {
+		WARN_ONCE(1, "APIC: NR_CPUS/possible_cpus limit of %u reached. "
+			     "Processor %d/0x%x and the rest are ignored.\n",
+			     nr_cpu_ids, nr_logical_cpuids, apicid);
+		return -EINVAL;
+	}
+
+	cpuid_to_apicid[nr_logical_cpuids] = apicid;
+	return nr_logical_cpuids++;
+}
+
+static void cpu_update_apic(int cpu, u32 apicid)
+{
+#if defined(CONFIG_SMP) || defined(CONFIG_X86_64)
+	early_per_cpu(x86_cpu_to_apicid, cpu) = apicid;
+#endif
+	set_cpu_possible(cpu, true);
+	set_bit(apicid, phys_cpu_present_map);
+	set_cpu_present(cpu, true);
+	num_processors++;
+
+	if (system_state != SYSTEM_BOOTING)
+		cpu_mark_primary_thread(cpu, apicid);
+}
+
+void __init topology_register_boot_apic(u32 apic_id)
+{
+	cpuid_to_apicid[0] = apic_id;
+	cpu_update_apic(0, apic_id);
+}
+
+int generic_processor_info(int apicid)
+{
+	int cpu, max = nr_cpu_ids;
+
+	/* The boot CPU must be set before MADT/MPTABLE parsing happens */
+	if (cpuid_to_apicid[0] == BAD_APICID)
+		panic("Boot CPU APIC not registered yet\n");
+
+	if (apicid == boot_cpu_physical_apicid)
+		return 0;
+
+	if (disabled_cpu_apicid == apicid) {
+		int thiscpu = num_processors + disabled_cpus;
+
+		pr_warn("APIC: Disabling requested cpu. Processor %d/0x%x ignored.\n",
+			thiscpu, apicid);
+
+		disabled_cpus++;
+		return -ENODEV;
+	}
+
+	if (num_processors >= nr_cpu_ids) {
+		int thiscpu = max + disabled_cpus;
+
+		pr_warn("APIC: NR_CPUS/possible_cpus limit of %i reached. "
+			"Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
+
+		disabled_cpus++;
+		return -EINVAL;
+	}
+
+	cpu = allocate_logical_cpuid(apicid);
+	if (cpu < 0) {
+		disabled_cpus++;
+		return -EINVAL;
+	}
+
+	cpu_update_apic(cpu, apicid);
+	return cpu;
+}
+
+static int __init apic_set_disabled_cpu_apicid(char *arg)
+{
+	if (!arg || !get_option(&arg, &disabled_cpu_apicid))
+		return -EINVAL;
+
+	return 0;
+}
+early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);



* [patch v2 02/30] x86/cpu/topology: Provide separate APIC registration functions
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
  2024-01-23 13:10 ` [patch v2 01/30] x86/cpu/topology: Move registration out of APIC code Thomas Gleixner
@ 2024-01-23 13:10 ` Thomas Gleixner
  2024-01-23 13:10 ` [patch v2 03/30] x86/acpi: Use new " Thomas Gleixner
                   ` (29 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:10 UTC
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Aside from being a complete misnomer, generic_processor_info() is used for
both early boot registration and ACPI CPU hotplug.

While it's arguable that this can share some code, it results in code which
is hard to understand and kept around post init for no real reason.

Also the call sites do lots of manual fiddling with topology related
variables instead of having proper interfaces for the purpose which handle
the topology internals correctly.

Provide topology_register_apic(), topology_hotplug_apic() and
topology_hotunplug_apic() which have the extra magic of the call sites
incorporated and for now are wrappers around generic_processor_info().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/apic.h    |    3 +
 arch/x86/kernel/cpu/topology.c |  113 ++++++++++++++++++++++++++++++++++-------
 2 files changed, 98 insertions(+), 18 deletions(-)
---
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -171,7 +171,10 @@ extern bool apic_needs_pit(void);
 
 extern void apic_send_IPI_allbutself(unsigned int vector);
 
+extern void topology_register_apic(u32 apic_id, u32 acpi_id, bool present);
 extern void topology_register_boot_apic(u32 apic_id);
+extern int topology_hotplug_apic(u32 apic_id, u32 acpi_id);
+extern void topology_hotunplug_apic(unsigned int cpu);
 
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -84,32 +84,38 @@ early_initcall(smp_init_primary_thread_m
 static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
 #endif
 
-/*
- * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
- * and cpuid_to_apicid[] synchronized.
- */
-static int allocate_logical_cpuid(int apicid)
+static int topo_lookup_cpuid(u32 apic_id)
 {
 	int i;
 
-	/*
-	 * cpuid <-> apicid mapping is persistent, so when a cpu is up,
-	 * check if the kernel has allocated a cpuid for it.
-	 */
+	/* CPU# to APICID mapping is persistent once it is established */
 	for (i = 0; i < nr_logical_cpuids; i++) {
-		if (cpuid_to_apicid[i] == apicid)
+		if (cpuid_to_apicid[i] == apic_id)
 			return i;
 	}
+	return -ENODEV;
+}
+
+/*
+ * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
+ * and cpuid_to_apicid[] synchronized.
+ */
+static int allocate_logical_cpuid(u32 apic_id)
+{
+	int cpu = topo_lookup_cpuid(apic_id);
+
+	if (cpu >= 0)
+		return cpu;
 
 	/* Allocate a new cpuid. */
 	if (nr_logical_cpuids >= nr_cpu_ids) {
 		WARN_ONCE(1, "APIC: NR_CPUS/possible_cpus limit of %u reached. "
 			     "Processor %d/0x%x and the rest are ignored.\n",
-			     nr_cpu_ids, nr_logical_cpuids, apicid);
+			     nr_cpu_ids, nr_logical_cpuids, apic_id);
 		return -EINVAL;
 	}
 
-	cpuid_to_apicid[nr_logical_cpuids] = apicid;
+	cpuid_to_apicid[nr_logical_cpuids] = apic_id;
 	return nr_logical_cpuids++;
 }
 
@@ -127,12 +133,6 @@ static void cpu_update_apic(int cpu, u32
 		cpu_mark_primary_thread(cpu, apicid);
 }
 
-void __init topology_register_boot_apic(u32 apic_id)
-{
-	cpuid_to_apicid[0] = apic_id;
-	cpu_update_apic(0, apic_id);
-}
-
 int generic_processor_info(int apicid)
 {
 	int cpu, max = nr_cpu_ids;
@@ -174,6 +174,83 @@ int generic_processor_info(int apicid)
 	return cpu;
 }
 
+/**
+ * topology_register_apic - Register an APIC in early topology maps
+ * @apic_id:	The APIC ID to set up
+ * @acpi_id:	The ACPI ID associated to the APIC
+ * @present:	True if the corresponding CPU is present
+ */
+void __init topology_register_apic(u32 apic_id, u32 acpi_id, bool present)
+{
+	int cpu;
+
+	if (apic_id >= MAX_LOCAL_APIC) {
+		pr_err_once("APIC ID %x exceeds kernel limit of: %x\n", apic_id, MAX_LOCAL_APIC - 1);
+		return;
+	}
+
+	if (!present) {
+		disabled_cpus++;
+		return;
+	}
+
+	cpu = generic_processor_info(apic_id);
+	if (cpu >= 0)
+		early_per_cpu(x86_cpu_to_acpiid, cpu) = acpi_id;
+}
+
+/**
+ * topology_register_boot_apic - Register the boot CPU APIC
+ * @apic_id:	The APIC ID to set up
+ *
+ * Separate so CPU #0 can be assigned
+ */
+void __init topology_register_boot_apic(u32 apic_id)
+{
+	cpuid_to_apicid[0] = apic_id;
+	cpu_update_apic(0, apic_id);
+}
+
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+/**
+ * topology_hotplug_apic - Handle a physical hotplugged APIC after boot
+ * @apic_id:	The APIC ID to set up
+ * @acpi_id:	The ACPI ID associated to the APIC
+ */
+int topology_hotplug_apic(u32 apic_id, u32 acpi_id)
+{
+	int cpu;
+
+	if (apic_id >= MAX_LOCAL_APIC)
+		return -EINVAL;
+
+	cpu = topo_lookup_cpuid(apic_id);
+	if (cpu < 0) {
+		cpu = generic_processor_info(apic_id);
+		if (cpu >= 0)
+			per_cpu(x86_cpu_to_acpiid, cpu) = acpi_id;
+	}
+	return cpu;
+}
+
+/**
+ * topology_hotunplug_apic - Remove a physical hotplugged APIC after boot
+ * @cpu:	The CPU number for which the APIC ID is removed
+ */
+void topology_hotunplug_apic(unsigned int cpu)
+{
+	u32 apic_id = cpuid_to_apicid[cpu];
+
+	if (apic_id == BAD_APICID)
+		return;
+
+	per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
+	clear_bit(apic_id, phys_cpu_present_map);
+	set_cpu_present(cpu, false);
+	num_processors--;
+}
+#endif
+
 static int __init apic_set_disabled_cpu_apicid(char *arg)
 {
 	if (!arg || !get_option(&arg, &disabled_cpu_apicid))



* [patch v2 03/30] x86/acpi: Use new APIC registration functions
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
  2024-01-23 13:10 ` [patch v2 01/30] x86/cpu/topology: Move registration out of APIC code Thomas Gleixner
  2024-01-23 13:10 ` [patch v2 02/30] x86/cpu/topology: Provide separate APIC registration functions Thomas Gleixner
@ 2024-01-23 13:10 ` Thomas Gleixner
  2024-01-23 13:10 ` [patch v2 04/30] x86/jailhouse: Use new APIC registration function Thomas Gleixner
                   ` (28 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:10 UTC
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Use the new topology registration functions and make the early boot code
path __init. No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/acpi/boot.c |   44 +++++++-------------------------------------
 1 file changed, 7 insertions(+), 37 deletions(-)
---
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -164,33 +164,9 @@ static int __init acpi_parse_madt(struct
 	return 0;
 }
 
-/**
- * acpi_register_lapic - register a local apic and generates a logic cpu number
- * @id: local apic id to register
- * @acpiid: ACPI id to register
- * @enabled: this cpu is enabled or not
- *
- * Returns the logic cpu number which maps to the local apic
- */
-static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
+static __init void acpi_register_lapic(u32 apic_id, u32 acpi_id, bool present)
 {
-	int cpu;
-
-	if (id >= MAX_LOCAL_APIC) {
-		pr_info("skipped apicid that is too big\n");
-		return -EINVAL;
-	}
-
-	if (!enabled) {
-		++disabled_cpus;
-		return -EINVAL;
-	}
-
-	cpu = generic_processor_info(id);
-	if (cpu >= 0)
-		early_per_cpu(x86_cpu_to_acpiid, cpu) = acpiid;
-
-	return cpu;
+	topology_register_apic(apic_id, acpi_id, present);
 }
 
 static bool __init acpi_is_processor_usable(u32 lapic_flags)
@@ -844,12 +820,10 @@ static int acpi_map_cpu2node(acpi_handle
 	return 0;
 }
 
-int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id,
-		 int *pcpu)
+int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, u32 acpi_id, int *pcpu)
 {
-	int cpu;
+	int cpu = topology_hotplug_apic(physid, acpi_id);
 
-	cpu = acpi_register_lapic(physid, acpi_id, ACPI_MADT_ENABLED);
 	if (cpu < 0) {
 		pr_info("Unable to map lapic to logical cpu number\n");
 		return cpu;
@@ -868,15 +842,11 @@ int acpi_unmap_cpu(int cpu)
 #ifdef CONFIG_ACPI_NUMA
 	set_apicid_to_node(per_cpu(x86_cpu_to_apicid, cpu), NUMA_NO_NODE);
 #endif
-
-	per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
-	set_cpu_present(cpu, false);
-	num_processors--;
-
-	return (0);
+	topology_hotunplug_apic(cpu);
+	return 0;
 }
 EXPORT_SYMBOL(acpi_unmap_cpu);
-#endif				/* CONFIG_ACPI_HOTPLUG_CPU */
+#endif	/* CONFIG_ACPI_HOTPLUG_CPU */
 
 int acpi_register_ioapic(acpi_handle handle, u64 phys_addr, u32 gsi_base)
 {



* [patch v2 04/30] x86/jailhouse: Use new APIC registration function
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (2 preceding siblings ...)
  2024-01-23 13:10 ` [patch v2 03/30] x86/acpi: Use new " Thomas Gleixner
@ 2024-01-23 13:10 ` Thomas Gleixner
  2024-01-23 13:10 ` [patch v2 05/30] x86/of: Use new APIC registration functions Thomas Gleixner
                   ` (27 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:10 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/jailhouse.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
---
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -102,7 +102,7 @@ static void __init jailhouse_parse_smp_c
 	register_lapic_address(0xfee00000);
 
 	for (cpu = 0; cpu < setup_data.v1.num_cpus; cpu++)
-		generic_processor_info(setup_data.v1.cpu_ids[cpu]);
+		topology_register_apic(setup_data.v1.cpu_ids[cpu], CPU_ACPIID_INVALID, true);
 
 	smp_found_config = 1;
 



* [patch v2 05/30] x86/of: Use new APIC registration functions
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (3 preceding siblings ...)
  2024-01-23 13:10 ` [patch v2 04/30] x86/jailhouse: Use new APIC registration function Thomas Gleixner
@ 2024-01-23 13:10 ` Thomas Gleixner
  2024-01-23 13:10 ` [patch v2 06/30] x86/mpparse: Use new APIC registration function Thomas Gleixner
                   ` (26 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:10 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/devicetree.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
---
--- a/arch/x86/kernel/devicetree.c
+++ b/arch/x86/kernel/devicetree.c
@@ -136,7 +136,7 @@ static void __init dtb_cpu_setup(void)
 			pr_warn("%pOF: missing local APIC ID\n", dn);
 			continue;
 		}
-		generic_processor_info(apic_id);
+		topology_register_apic(apic_id, CPU_ACPIID_INVALID, true);
 	}
 }
 



* [patch v2 06/30] x86/mpparse: Use new APIC registration function
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (4 preceding siblings ...)
  2024-01-23 13:10 ` [patch v2 05/30] x86/of: Use new APIC registration functions Thomas Gleixner
@ 2024-01-23 13:10 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 07/30] x86/acpi: Dont invoke topology_register_apic() for XEN PV Thomas Gleixner
                   ` (25 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:10 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Aside from switching over to the new interface, record the number of
registered CPUs locally, which allows num_processors and disabled_cpus
to be confined to the topology code.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/mpspec.h  |    2 --
 arch/x86/kernel/cpu/topology.c |    2 +-
 arch/x86/kernel/mpparse.c      |   17 +++++++++--------
 3 files changed, 10 insertions(+), 11 deletions(-)
---
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -61,8 +61,6 @@ static inline void e820__memblock_alloc_
 #define mpparse_parse_smp_config	x86_init_noop
 #endif
 
-int generic_processor_info(int apicid);
-
 extern DECLARE_BITMAP(phys_cpu_present_map, MAX_LOCAL_APIC);
 
 static inline void reset_phys_cpu_present_map(u32 apicid)
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -133,7 +133,7 @@ static void cpu_update_apic(int cpu, u32
 		cpu_mark_primary_thread(cpu, apicid);
 }
 
-int generic_processor_info(int apicid)
+static int generic_processor_info(int apicid)
 {
 	int cpu, max = nr_cpu_ids;
 
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -36,6 +36,8 @@
  * Checksum an MP configuration block.
  */
 
+static unsigned int num_procs __initdata;
+
 static int __init mpf_checksum(unsigned char *mp, int len)
 {
 	int sum = 0;
@@ -50,16 +52,15 @@ static void __init MP_processor_info(str
 {
 	char *bootup_cpu = "";
 
-	if (!(m->cpuflag & CPU_ENABLED)) {
-		disabled_cpus++;
+	topology_register_apic(m->apicid, CPU_ACPIID_INVALID, m->cpuflag & CPU_ENABLED);
+	if (!(m->cpuflag & CPU_ENABLED))
 		return;
-	}
 
 	if (m->cpuflag & CPU_BOOTPROCESSOR)
 		bootup_cpu = " (Bootup-CPU)";
 
 	pr_info("Processor #%d%s\n", m->apicid, bootup_cpu);
-	generic_processor_info(m->apicid);
+	num_procs++;
 }
 
 #ifdef CONFIG_X86_IO_APIC
@@ -236,9 +237,9 @@ static int __init smp_read_mpc(struct mp
 		}
 	}
 
-	if (!num_processors)
+	if (!num_procs && !acpi_lapic)
 		pr_err("MPTABLE: no processors registered!\n");
-	return num_processors;
+	return num_procs || acpi_lapic;
 }
 
 #ifdef CONFIG_X86_IO_APIC
@@ -529,8 +530,8 @@ static __init void mpparse_get_smp_confi
 	} else
 		BUG();
 
-	if (!early)
-		pr_info("Processors: %d\n", num_processors);
+	if (!early && !acpi_lapic)
+		pr_info("Processors: %d\n", num_procs);
 	/*
 	 * Only use the first configuration found.
 	 */



* [patch v2 07/30] x86/acpi: Dont invoke topology_register_apic() for XEN PV
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (5 preceding siblings ...)
  2024-01-23 13:10 ` [patch v2 06/30] x86/mpparse: Use new APIC registration function Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 08/30] x86/xen/smp_pv: Register fake APICs Thomas Gleixner
                   ` (24 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

The MADT table for XEN/PV dom0 is not really useful and registering the
APICs is currently a pointless exercise because XEN/PV does not use an
APIC at all.

XEN/PV overrides the x86_init.mpparse.parse_smp_config() callback, resets
num_processors and counts how many CPUs are provided by the hypervisor.

This is in the way of cleaning up the APIC registration. Prevent MADT
registration for XEN/PV temporarily until the rework is completed and
XEN/PV can use the MADT again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/acpi/boot.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
---
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -23,6 +23,8 @@
 #include <linux/serial_core.h>
 #include <linux/pgtable.h>
 
+#include <xen/xen.h>
+
 #include <asm/e820/api.h>
 #include <asm/irqdomain.h>
 #include <asm/pci_x86.h>
@@ -166,7 +168,8 @@ static int __init acpi_parse_madt(struct
 
 static __init void acpi_register_lapic(u32 apic_id, u32 acpi_id, bool present)
 {
-	topology_register_apic(apic_id, acpi_id, present);
+	if (!xen_pv_domain())
+		topology_register_apic(apic_id, acpi_id, present);
 }
 
 static bool __init acpi_is_processor_usable(u32 lapic_flags)
@@ -1087,7 +1090,8 @@ static int __init early_acpi_parse_madt_
 		return count;
 	}
 
-	register_lapic_address(acpi_lapic_addr);
+	if (!xen_pv_domain())
+		register_lapic_address(acpi_lapic_addr);
 
 	return count;
 }



* [patch v2 08/30] x86/xen/smp_pv: Register fake APICs
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (6 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 07/30] x86/acpi: Dont invoke topology_register_apic() for XEN PV Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 09/30] x86/cpu/topology: Confine topology information Thomas Gleixner
                   ` (23 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

XEN/PV does not use the APIC. It just piggybacks on the infrastructure
and fiddles with global variables as it sees fit.

These global variables are going away, so let XENPV register pseudo APIC
IDs to keep the accounting correct and keep up the illusion that XEN/PV is
something sane.
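
The resulting enumeration can be pictured with a small userspace model.
This is only a sketch and not kernel code: vcpu_is_up(), register_apic(),
fake_smp_config(), NR_CPU_IDS and hv_vcpus are hypothetical stand-ins for
HYPERVISOR_vcpu_op(VCPUOP_is_up, ...), the topology registration
functions and nr_cpu_ids.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPU_IDS 8	/* hypothetical stand-in for nr_cpu_ids */

static unsigned int hv_vcpus = 3;	/* vCPUs the fake hypervisor provides */
static uint32_t registered[NR_CPU_IDS];
static unsigned int nr_registered;

/* Stand-in for HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL) */
static int vcpu_is_up(unsigned int cpu)
{
	return cpu < hv_vcpus ? 0 : -1;
}

/* Stand-in for topology_register_boot_apic()/topology_register_apic() */
static void register_apic(uint32_t apic_id)
{
	assert(nr_registered < NR_CPU_IDS);
	registered[nr_registered++] = apic_id;
}

/* Mirrors the loop structure of the reworked xen_pv_smp_config() */
static unsigned int fake_smp_config(void)
{
	uint32_t apicid = 0;
	unsigned int i;

	nr_registered = 0;

	/* The boot CPU always gets pseudo APIC ID 0 */
	register_apic(apicid++);

	/* Stop at the first vCPU the hypervisor does not know about */
	for (i = 1; i < NR_CPU_IDS; i++) {
		if (vcpu_is_up(i) < 0)
			break;
		register_apic(apicid++);
	}
	return nr_registered;
}
```

In this model the pseudo APIC IDs are simply consecutive numbers, so the
number of registered APICs equals the number of vCPUs the hypervisor
reports, capped at NR_CPU_IDS.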

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/xen/smp_pv.c |   35 +++++++++--------------------------
 1 file changed, 9 insertions(+), 26 deletions(-)
---
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -29,6 +29,7 @@
 #include <asm/idtentry.h>
 #include <asm/desc.h>
 #include <asm/cpu.h>
+#include <asm/apic.h>
 #include <asm/io_apic.h>
 
 #include <xen/interface/xen.h>
@@ -150,34 +151,16 @@ int xen_smp_intr_init_pv(unsigned int cp
 
 static void __init xen_pv_smp_config(void)
 {
-	int i, rc;
-	unsigned int subtract = 0;
+	u32 apicid = 0;
+	int i;
 
-	num_processors = 0;
-	disabled_cpus = 0;
-	for (i = 0; i < nr_cpu_ids; i++) {
-		rc = HYPERVISOR_vcpu_op(VCPUOP_is_up, i, NULL);
-		if (rc >= 0) {
-			num_processors++;
-			set_cpu_possible(i, true);
-		} else {
-			set_cpu_possible(i, false);
-			set_cpu_present(i, false);
-			subtract++;
-		}
+	topology_register_boot_apic(apicid++);
+
+	for (i = 1; i < nr_cpu_ids; i++) {
+		if (HYPERVISOR_vcpu_op(VCPUOP_is_up, i, NULL) < 0)
+			break;
+		topology_register_apic(apicid++, CPU_ACPIID_INVALID, true);
 	}
-#ifdef CONFIG_HOTPLUG_CPU
-	/* This is akin to using 'nr_cpus' on the Linux command line.
-	 * Which is OK as when we use 'dom0_max_vcpus=X' we can only
-	 * have up to X, while nr_cpu_ids is greater than X. This
-	 * normally is not a problem, except when CPU hotplugging
-	 * is involved and then there might be more than X CPUs
-	 * in the guest - which will not work as there is no
-	 * hypercall to expand the max number of VCPUs an already
-	 * running guest has. So cap it up to X. */
-	if (subtract)
-		set_nr_cpu_ids(nr_cpu_ids - subtract);
-#endif
 	/* Pretend to be a proper enumerated system */
 	smp_found_config = 1;
 }



* [patch v2 09/30] x86/cpu/topology: Confine topology information
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (7 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 08/30] x86/xen/smp_pv: Register fake APICs Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 10/30] x86/cpu/topology: Simplify APIC registration Thomas Gleixner
                   ` (22 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Now that all external fiddling with num_processors and disabled_cpus is
gone, move the last user, prefill_possible_map(), into the topology code
too and remove the global visibility of these variables.
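
For readers who want to follow what the moved function computes, the
sizing logic can be modelled in userspace roughly as below. This is a
sketch under assumptions: struct possible_cfg and compute_possible() are
hypothetical names, the hotplug flag models CONFIG_HOTPLUG_CPU, and the
side effects (total_cpus, set_nr_cpu_ids(), the possible mask) are left
out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bundle of the inputs prefill_possible_map() reads */
struct possible_cfg {
	int		setup_possible_cpus;	/* possible_cpus=NUM, -1 if unset */
	unsigned int	setup_max_cpus;		/* maxcpus=NUM, 0 for nosmp */
	unsigned int	nr_cpu_ids;		/* may be reduced by nr_cpus= */
	unsigned int	num_processors;		/* enabled CPUs from firmware */
	unsigned int	disabled_cpus;		/* disabled CPUs from firmware */
	bool		hotplug;		/* models CONFIG_HOTPLUG_CPU */
};

static int compute_possible(const struct possible_cfg *c)
{
	int limit = c->setup_max_cpus ? (int)c->setup_max_cpus : 1;
	int possible;

	assert(c->nr_cpu_ids > 0);

	if (c->setup_possible_cpus == -1) {
		possible = (int)c->num_processors;
		if (c->hotplug) {
			/* Reserve room for CPUs which can be plugged later */
			if (c->setup_max_cpus)
				possible += (int)c->disabled_cpus;
		} else if (possible > limit) {
			possible = limit;
		}
	} else {
		possible = c->setup_possible_cpus;
	}

	/* nr_cpu_ids is the hard upper bound */
	if (possible > (int)c->nr_cpu_ids)
		possible = (int)c->nr_cpu_ids;

	/* Without hotplug, or with maxcpus=0, maxcpus caps the result too */
	if ((!c->hotplug || !c->setup_max_cpus) && possible > limit)
		possible = limit;

	return possible;
}
```

With four enabled and two disabled CPUs, hotplug enabled and no command
line restrictions, the model yields six possible CPUs.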

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/smp.h     |    3 -
 arch/x86/kernel/apic/apic.c    |    1 
 arch/x86/kernel/cpu/topology.c |   76 +++++++++++++++++++++++++++++++++++++++--
 arch/x86/kernel/smpboot.c      |   72 --------------------------------------
 4 files changed, 74 insertions(+), 78 deletions(-)
---
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -9,7 +9,6 @@
 #include <asm/thread_info.h>
 
 extern int smp_num_siblings;
-extern unsigned int num_processors;
 
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
@@ -174,8 +173,6 @@ static inline struct cpumask *cpu_llc_sh
 }
 #endif /* CONFIG_SMP */
 
-extern unsigned disabled_cpus;
-
 #ifdef CONFIG_DEBUG_NMI_SELFTEST
 extern void nmi_selftest(void);
 #else
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2054,7 +2054,6 @@ void __init init_apic_mappings(void)
 			pr_info("APIC: disable apic facility\n");
 			apic_disable();
 		}
-		num_processors = 1;
 	}
 }
 
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -29,8 +29,8 @@ u32 cpuid_to_apicid[] __read_mostly = {
  */
 static u32 disabled_cpu_apicid __ro_after_init = BAD_APICID;
 
-unsigned int num_processors;
-unsigned disabled_cpus;
+static unsigned int num_processors;
+static unsigned int disabled_cpus;
 
 /*
  * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
@@ -174,6 +174,71 @@ static int generic_processor_info(int ap
 	return cpu;
 }
 
+static int __initdata setup_possible_cpus = -1;
+
+/*
+ * cpu_possible_mask should be static, it cannot change as cpu's
+ * are onlined, or offlined. The reason is per-cpu data-structures
+ * are allocated by some modules at init time, and don't expect to
+ * do this dynamically on cpu arrival/departure.
+ * cpu_present_mask on the other hand can change dynamically.
+ * In case when cpu_hotplug is not compiled, then we resort to current
+ * behaviour, which is cpu_possible == cpu_present.
+ * - Ashok Raj
+ *
+ * Three ways to find out the number of additional hotplug CPUs:
+ * - If the BIOS specified disabled CPUs in ACPI/mptables use that.
+ * - The user can overwrite it with possible_cpus=NUM
+ * - Otherwise don't reserve additional CPUs.
+ * We do this because additional CPUs waste a lot of memory.
+ * -AK
+ */
+__init void prefill_possible_map(void)
+{
+	int i, possible;
+
+	i = setup_max_cpus ?: 1;
+	if (setup_possible_cpus == -1) {
+		possible = num_processors;
+#ifdef CONFIG_HOTPLUG_CPU
+		if (setup_max_cpus)
+			possible += disabled_cpus;
+#else
+		if (possible > i)
+			possible = i;
+#endif
+	} else
+		possible = setup_possible_cpus;
+
+	total_cpus = max_t(int, possible, num_processors + disabled_cpus);
+
+	/* nr_cpu_ids could be reduced via nr_cpus= */
+	if (possible > nr_cpu_ids) {
+		pr_warn("%d Processors exceeds NR_CPUS limit of %u\n",
+			possible, nr_cpu_ids);
+		possible = nr_cpu_ids;
+	}
+
+#ifdef CONFIG_HOTPLUG_CPU
+	if (!setup_max_cpus)
+#endif
+	if (possible > i) {
+		pr_warn("%d Processors exceeds max_cpus limit of %u\n",
+			possible, setup_max_cpus);
+		possible = i;
+	}
+
+	set_nr_cpu_ids(possible);
+
+	pr_info("Allowing %d CPUs, %d hotplug CPUs\n",
+		possible, max_t(int, possible - num_processors, 0));
+
+	reset_cpu_possible_mask();
+
+	for (i = 0; i < possible; i++)
+		set_cpu_possible(i, true);
+}
+
 /**
  * topology_register_apic - Register an APIC in early topology maps
  * @apic_id:	The APIC ID to set up
@@ -251,6 +316,13 @@ void topology_hotunplug_apic(unsigned in
 }
 #endif
 
+static int __init _setup_possible_cpus(char *str)
+{
+	get_option(&str, &setup_possible_cpus);
+	return 0;
+}
+early_param("possible_cpus", _setup_possible_cpus);
+
 static int __init apic_set_disabled_cpu_apicid(char *arg)
 {
 	if (!arg || !get_option(&arg, &disabled_cpu_apicid))
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1291,78 +1291,6 @@ void __init native_smp_cpus_done(unsigne
 	cache_aps_init();
 }
 
-static int __initdata setup_possible_cpus = -1;
-static int __init _setup_possible_cpus(char *str)
-{
-	get_option(&str, &setup_possible_cpus);
-	return 0;
-}
-early_param("possible_cpus", _setup_possible_cpus);
-
-
-/*
- * cpu_possible_mask should be static, it cannot change as cpu's
- * are onlined, or offlined. The reason is per-cpu data-structures
- * are allocated by some modules at init time, and don't expect to
- * do this dynamically on cpu arrival/departure.
- * cpu_present_mask on the other hand can change dynamically.
- * In case when cpu_hotplug is not compiled, then we resort to current
- * behaviour, which is cpu_possible == cpu_present.
- * - Ashok Raj
- *
- * Three ways to find out the number of additional hotplug CPUs:
- * - If the BIOS specified disabled CPUs in ACPI/mptables use that.
- * - The user can overwrite it with possible_cpus=NUM
- * - Otherwise don't reserve additional CPUs.
- * We do this because additional CPUs waste a lot of memory.
- * -AK
- */
-__init void prefill_possible_map(void)
-{
-	int i, possible;
-
-	i = setup_max_cpus ?: 1;
-	if (setup_possible_cpus == -1) {
-		possible = num_processors;
-#ifdef CONFIG_HOTPLUG_CPU
-		if (setup_max_cpus)
-			possible += disabled_cpus;
-#else
-		if (possible > i)
-			possible = i;
-#endif
-	} else
-		possible = setup_possible_cpus;
-
-	total_cpus = max_t(int, possible, num_processors + disabled_cpus);
-
-	/* nr_cpu_ids could be reduced via nr_cpus= */
-	if (possible > nr_cpu_ids) {
-		pr_warn("%d Processors exceeds NR_CPUS limit of %u\n",
-			possible, nr_cpu_ids);
-		possible = nr_cpu_ids;
-	}
-
-#ifdef CONFIG_HOTPLUG_CPU
-	if (!setup_max_cpus)
-#endif
-	if (possible > i) {
-		pr_warn("%d Processors exceeds max_cpus limit of %u\n",
-			possible, setup_max_cpus);
-		possible = i;
-	}
-
-	set_nr_cpu_ids(possible);
-
-	pr_info("Allowing %d CPUs, %d hotplug CPUs\n",
-		possible, max_t(int, possible - num_processors, 0));
-
-	reset_cpu_possible_mask();
-
-	for (i = 0; i < possible; i++)
-		set_cpu_possible(i, true);
-}
-
 /* correctly size the local cpu masks */
 void __init setup_cpu_local_masks(void)
 {



* [patch v2 10/30] x86/cpu/topology: Simplify APIC registration
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (8 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 09/30] x86/cpu/topology: Confine topology information Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 11/30] x86/cpu/topology: Use a data structure for topology info Thomas Gleixner
                   ` (21 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Having the same check whether the number of assigned CPUs has reached the
nr_cpu_ids limit twice in the same code path is pointless. Repeating the
information that CPUs are ignored over and over is also pointless noise.

Remove the redundant check and reduce the noise by using a pr_warn_once().
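
The intended behaviour, a single limit check plus a message which is
emitted only once, can be sketched in userspace as follows. The names
warn_limit_once(), register_cpu(), warnings_printed and NR_CPU_IDS are
hypothetical; warn_limit_once() merely approximates what pr_warn_once()
does in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPU_IDS 4	/* hypothetical stand-in for nr_cpu_ids */

static unsigned int num_processors = 1;	/* CPU #0 is the boot CPU */
static unsigned int disabled_cpus;
static unsigned int warnings_printed;

/* Userspace approximation of pr_warn_once(): print on the first call only */
static void warn_limit_once(void)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	warnings_printed++;
	fprintf(stderr, "APIC: CPU limit of %d reached. Ignoring further CPUs\n",
		NR_CPU_IDS);
}

/* Single limit check; returns the assigned CPU number or -ENOSPC */
static int register_cpu(void)
{
	assert(num_processors <= NR_CPU_IDS);

	if (num_processors >= NR_CPU_IDS) {
		warn_limit_once();
		disabled_cpus++;
		return -ENOSPC;
	}
	return (int)num_processors++;
}
```

Every CPU beyond the limit is still accounted in disabled_cpus, but the
log only carries one line no matter how many CPUs are ignored.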

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/cpu/topology.c |   23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)
---
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -107,14 +107,6 @@ static int allocate_logical_cpuid(u32 ap
 	if (cpu >= 0)
 		return cpu;
 
-	/* Allocate a new cpuid. */
-	if (nr_logical_cpuids >= nr_cpu_ids) {
-		WARN_ONCE(1, "APIC: NR_CPUS/possible_cpus limit of %u reached. "
-			     "Processor %d/0x%x and the rest are ignored.\n",
-			     nr_cpu_ids, nr_logical_cpuids, apic_id);
-		return -EINVAL;
-	}
-
 	cpuid_to_apicid[nr_logical_cpuids] = apic_id;
 	return nr_logical_cpuids++;
 }
@@ -135,7 +127,7 @@ static void cpu_update_apic(int cpu, u32
 
 static int generic_processor_info(int apicid)
 {
-	int cpu, max = nr_cpu_ids;
+	int cpu;
 
 	/* The boot CPU must be set before MADT/MPTABLE parsing happens */
 	if (cpuid_to_apicid[0] == BAD_APICID)
@@ -155,21 +147,12 @@ static int generic_processor_info(int ap
 	}
 
 	if (num_processors >= nr_cpu_ids) {
-		int thiscpu = max + disabled_cpus;
-
-		pr_warn("APIC: NR_CPUS/possible_cpus limit of %i reached. "
-			"Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
-
+		pr_warn_once("APIC: CPU limit of %d reached. Ignoring further CPUs\n", nr_cpu_ids);
 		disabled_cpus++;
-		return -EINVAL;
+		return -ENOSPC;
 	}
 
 	cpu = allocate_logical_cpuid(apicid);
-	if (cpu < 0) {
-		disabled_cpus++;
-		return -EINVAL;
-	}
-
 	cpu_update_apic(cpu, apicid);
 	return cpu;
 }



* [patch v2 11/30] x86/cpu/topology: Use a data structure for topology info
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (9 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 10/30] x86/cpu/topology: Simplify APIC registration Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 12/30] x86/smpboot: Make error message actually useful Thomas Gleixner
                   ` (20 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Put the processor accounting into a data structure, which will gain more
topology related information in the next steps, and sanitize the accounting.
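
As an illustration of the sanitized accounting, the following userspace
sketch sorts every enumerated APIC into exactly one of the three
counters. The structure layout follows the patch, while account_apic()
and NR_CPU_IDS are hypothetical and the boot CPU special case is
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPU_IDS	4		/* hypothetical stand-in for nr_cpu_ids */
#define BAD_APICID	0xFFFFFFFFu

static struct {
	unsigned int	nr_assigned_cpus;
	unsigned int	nr_disabled_cpus;
	unsigned int	nr_rejected_cpus;
} topo_info = {
	.nr_assigned_cpus	= 1,	/* CPU #0 is reserved for the boot CPU */
};

/* Models the disable_cpu_apicid= command line parameter */
static uint32_t disabled_cpu_apicid = BAD_APICID;

/* Classify one enumerated APIC into exactly one of the three counters */
static void account_apic(uint32_t apic_id, bool present)
{
	if (!present) {
		/* Firmware marked the CPU as not present */
		topo_info.nr_disabled_cpus++;
		return;
	}

	if (apic_id == disabled_cpu_apicid ||
	    topo_info.nr_assigned_cpus >= NR_CPU_IDS) {
		/* Refused by the command line or by the CPU limit */
		topo_info.nr_rejected_cpus++;
		return;
	}

	topo_info.nr_assigned_cpus++;
	assert(topo_info.nr_assigned_cpus <= NR_CPU_IDS);
}
```

The point of the split is that a CPU which firmware reports as disabled
is counted differently from one which the kernel itself refused.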

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/cpu/topology.c |   59 ++++++++++++++++++++---------------------
 1 file changed, 29 insertions(+), 30 deletions(-)
---
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -23,25 +23,24 @@ DECLARE_BITMAP(phys_cpu_present_map, MAX
 u32 cpuid_to_apicid[] __read_mostly = { [0 ... NR_CPUS - 1] = BAD_APICID, };
 
 /*
+ * Keep track of assigned, disabled and rejected CPUs. Preset
+ * nr_assigned_cpus to 1 as CPU #0 is reserved for the boot CPU.
+ */
+static struct {
+	unsigned int		nr_assigned_cpus;
+	unsigned int		nr_disabled_cpus;
+	unsigned int		nr_rejected_cpus;
+} topo_info __read_mostly = {
+	.nr_assigned_cpus	= 1,
+};
+
+/*
  * Processor to be disabled specified by kernel parameter
  * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
  * avoid undefined behaviour caused by sending INIT from AP to BSP.
  */
 static u32 disabled_cpu_apicid __ro_after_init = BAD_APICID;
 
-static unsigned int num_processors;
-static unsigned int disabled_cpus;
-
-/*
- * The number of allocated logical CPU IDs. Since logical CPU IDs are allocated
- * contiguously, it equals to current allocated max logical CPU ID plus 1.
- * All allocated CPU IDs should be in the [0, nr_logical_cpuids) range,
- * so the maximum of nr_logical_cpuids is nr_cpu_ids.
- *
- * NOTE: Reserve 0 for BSP.
- */
-static int nr_logical_cpuids = 1;
-
 bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
 {
 	return phys_id == (u64)cpuid_to_apicid[cpu];
@@ -75,7 +74,7 @@ static int __init smp_init_primary_threa
 		return 0;
 	}
 
-	for (cpu = 0; cpu < nr_logical_cpuids; cpu++)
+	for (cpu = 0; cpu < topo_info.nr_assigned_cpus; cpu++)
 		cpu_mark_primary_thread(cpu, cpuid_to_apicid[cpu]);
 	return 0;
 }
@@ -89,7 +88,7 @@ static int topo_lookup_cpuid(u32 apic_id
 	int i;
 
 	/* CPU# to APICID mapping is persistent once it is established */
-	for (i = 0; i < nr_logical_cpuids; i++) {
+	for (i = 0; i < topo_info.nr_assigned_cpus; i++) {
 		if (cpuid_to_apicid[i] == apic_id)
 			return i;
 	}
@@ -107,22 +106,21 @@ static int allocate_logical_cpuid(u32 ap
 	if (cpu >= 0)
 		return cpu;
 
-	cpuid_to_apicid[nr_logical_cpuids] = apic_id;
-	return nr_logical_cpuids++;
+	return topo_info.nr_assigned_cpus++;
 }
 
-static void cpu_update_apic(int cpu, u32 apicid)
+static void cpu_update_apic(unsigned int cpu, u32 apic_id)
 {
 #if defined(CONFIG_SMP) || defined(CONFIG_X86_64)
-	early_per_cpu(x86_cpu_to_apicid, cpu) = apicid;
+	early_per_cpu(x86_cpu_to_apicid, cpu) = apic_id;
 #endif
+	cpuid_to_apicid[cpu] = apic_id;
 	set_cpu_possible(cpu, true);
-	set_bit(apicid, phys_cpu_present_map);
+	set_bit(apic_id, phys_cpu_present_map);
 	set_cpu_present(cpu, true);
-	num_processors++;
 
 	if (system_state != SYSTEM_BOOTING)
-		cpu_mark_primary_thread(cpu, apicid);
+		cpu_mark_primary_thread(cpu, apic_id);
 }
 
 static int generic_processor_info(int apicid)
@@ -137,18 +135,18 @@ static int generic_processor_info(int ap
 		return 0;
 
 	if (disabled_cpu_apicid == apicid) {
-		int thiscpu = num_processors + disabled_cpus;
+		int thiscpu = topo_info.nr_assigned_cpus + topo_info.nr_disabled_cpus;
 
 		pr_warn("APIC: Disabling requested cpu. Processor %d/0x%x ignored.\n",
 			thiscpu, apicid);
 
-		disabled_cpus++;
+		topo_info.nr_rejected_cpus++;
 		return -ENODEV;
 	}
 
-	if (num_processors >= nr_cpu_ids) {
+	if (topo_info.nr_assigned_cpus >= nr_cpu_ids) {
 		pr_warn_once("APIC: CPU limit of %d reached. Ignoring further CPUs\n", nr_cpu_ids);
-		disabled_cpus++;
+		topo_info.nr_rejected_cpus++;
 		return -ENOSPC;
 	}
 
@@ -178,14 +176,16 @@ static int __initdata setup_possible_cpu
  */
 __init void prefill_possible_map(void)
 {
+	unsigned int num_processors = topo_info.nr_assigned_cpus;
+	unsigned int disabled_cpus = topo_info.nr_disabled_cpus;
 	int i, possible;
 
 	i = setup_max_cpus ?: 1;
 	if (setup_possible_cpus == -1) {
-		possible = num_processors;
+		possible = topo_info.nr_assigned_cpus;
 #ifdef CONFIG_HOTPLUG_CPU
 		if (setup_max_cpus)
-			possible += disabled_cpus;
+			possible += topo_info.nr_disabled_cpus;
 #else
 		if (possible > i)
 			possible = i;
@@ -238,7 +238,7 @@ void __init topology_register_apic(u32 a
 	}
 
 	if (!present) {
-		disabled_cpus++;
+		topo_info.nr_disabled_cpus++;
 		return;
 	}
 
@@ -295,7 +295,6 @@ void topology_hotunplug_apic(unsigned in
 	per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
 	clear_bit(apic_id, phys_cpu_present_map);
 	set_cpu_present(cpu, false);
-	num_processors--;
 }
 #endif
 



* [patch v2 12/30] x86/smpboot: Make error message actually useful
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (10 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 11/30] x86/cpu/topology: Use a data structure for topology info Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 13/30] x86/cpu/topology: Sanitize the APIC admission logic Thomas Gleixner
                   ` (19 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

"smpboot: native_kick_ap: bad cpu 33" is absolutely useless information.

Replace it with something meaningful which allows decoding the failure
condition.
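
The split into two separately reported conditions can be sketched in
userspace like this. It is a simplified model: check_bringup(), the
bringup_status enum, present_map and MAX_APICS are hypothetical, and
apic_id_valid() here is only a bounds check standing in for the real
APIC driver callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BAD_APICID	0xFFFFFFFFu
#define MAX_APICS	64	/* hypothetical size of the present map */

enum bringup_status { BRINGUP_OK, BRINGUP_INVALID_ID, BRINGUP_NOT_PRESENT };

static bool present_map[MAX_APICS];	/* stand-in for phys_cpu_present_map */

/* Stand-in for apic_id_valid(): only IDs inside the map are accepted */
static bool apic_id_valid(uint32_t apicid)
{
	return apicid < MAX_APICS;
}

/* Report each failure condition separately so the log is decodable */
static enum bringup_status check_bringup(unsigned int cpu, uint32_t apicid)
{
	if (apicid == BAD_APICID || !apic_id_valid(apicid)) {
		fprintf(stderr, "CPU %u has invalid APIC ID %x. Aborting bringup\n",
			cpu, apicid);
		return BRINGUP_INVALID_ID;
	}

	/* The validity check above guarantees the index is in range */
	assert(apicid < MAX_APICS);
	if (!present_map[apicid]) {
		fprintf(stderr, "CPU %u APIC ID %x is not present. Aborting bringup\n",
			cpu, apicid);
		return BRINGUP_NOT_PRESENT;
	}
	return BRINGUP_OK;
}
```

Both messages carry the CPU number and the APIC ID, which is what makes
the failure decodable from the log alone.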

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/smpboot.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)
---
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1072,9 +1072,13 @@ int native_kick_ap(unsigned int cpu, str
 
 	pr_debug("++++++++++++++++++++=_---CPU UP  %u\n", cpu);
 
-	if (apicid == BAD_APICID || !test_bit(apicid, phys_cpu_present_map) ||
-	    !apic_id_valid(apicid)) {
-		pr_err("%s: bad cpu %d\n", __func__, cpu);
+	if (apicid == BAD_APICID || !apic_id_valid(apicid)) {
+		pr_err("CPU %u has invalid APIC ID %x. Aborting bringup\n", cpu, apicid);
+		return -EINVAL;
+	}
+
+	if (!test_bit(apicid, phys_cpu_present_map)) {
+		pr_err("CPU %u APIC ID %x is not present. Aborting bringup\n", cpu, apicid);
 		return -EINVAL;
 	}
 



* [patch v2 13/30] x86/cpu/topology: Sanitize the APIC admission logic
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (11 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 12/30] x86/smpboot: Make error message actually useful Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 14/30] x86/cpu/topology: Rework possible CPU management Thomas Gleixner
                   ` (18 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Move the actually required content of generic_processor_info() into the
call sites and use common helper functions for them. This separates the
early boot registration and the ACPI hotplug mechanism completely, which
allows further cleanups and improvements.
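
The common helpers boil down to a lookup of an existing CPU number and
the assignment of a new one. A userspace sketch, with some liberties:
topo_reset() is hypothetical, the limit check is folded into
topo_assign_cpunr(), and the mapping is stored directly in the assign
step, whereas the patch does that in topo_set_cpuids().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_CPUS		8
#define BAD_APICID	0xFFFFFFFFu

static uint32_t cpuid_to_apicid[NR_CPUS];
static unsigned int nr_assigned_cpus;

/* Hypothetical helper: clear the map and reserve CPU #0 for the boot CPU */
static void topo_reset(uint32_t boot_apic_id)
{
	unsigned int i;

	for (i = 0; i < NR_CPUS; i++)
		cpuid_to_apicid[i] = BAD_APICID;
	cpuid_to_apicid[0] = boot_apic_id;
	nr_assigned_cpus = 1;
}

/* CPU# to APIC ID mapping is persistent once it is established */
static int topo_lookup_cpuid(uint32_t apic_id)
{
	unsigned int i;

	for (i = 0; i < nr_assigned_cpus; i++) {
		if (cpuid_to_apicid[i] == apic_id)
			return (int)i;
	}
	return -ENODEV;
}

/* Reuse an existing CPU number or hand out the next free one */
static int topo_assign_cpunr(uint32_t apic_id)
{
	int cpu = topo_lookup_cpuid(apic_id);

	if (cpu >= 0)
		return cpu;
	if (nr_assigned_cpus >= NR_CPUS)
		return -ENOSPC;

	cpu = (int)nr_assigned_cpus++;
	assert(cpu < NR_CPUS);
	cpuid_to_apicid[cpu] = apic_id;
	return cpu;
}
```

Registering the same APIC ID twice returns the same CPU number, which is
the property the hotplug path relies on when a CPU is unplugged and
plugged again.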

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/cpu/topology.c |  160 +++++++++++++++++++----------------------
 1 file changed, 78 insertions(+), 82 deletions(-)
---
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -30,8 +30,10 @@ static struct {
 	unsigned int		nr_assigned_cpus;
 	unsigned int		nr_disabled_cpus;
 	unsigned int		nr_rejected_cpus;
+	u32			boot_cpu_apic_id;
 } topo_info __read_mostly = {
 	.nr_assigned_cpus	= 1,
+	.boot_cpu_apic_id	= BAD_APICID,
 };
 
 /*
@@ -83,78 +85,6 @@ early_initcall(smp_init_primary_thread_m
 static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
 #endif
 
-static int topo_lookup_cpuid(u32 apic_id)
-{
-	int i;
-
-	/* CPU# to APICID mapping is persistent once it is established */
-	for (i = 0; i < topo_info.nr_assigned_cpus; i++) {
-		if (cpuid_to_apicid[i] == apic_id)
-			return i;
-	}
-	return -ENODEV;
-}
-
-/*
- * Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
- * and cpuid_to_apicid[] synchronized.
- */
-static int allocate_logical_cpuid(u32 apic_id)
-{
-	int cpu = topo_lookup_cpuid(apic_id);
-
-	if (cpu >= 0)
-		return cpu;
-
-	return topo_info.nr_assigned_cpus++;
-}
-
-static void cpu_update_apic(unsigned int cpu, u32 apic_id)
-{
-#if defined(CONFIG_SMP) || defined(CONFIG_X86_64)
-	early_per_cpu(x86_cpu_to_apicid, cpu) = apic_id;
-#endif
-	cpuid_to_apicid[cpu] = apic_id;
-	set_cpu_possible(cpu, true);
-	set_bit(apic_id, phys_cpu_present_map);
-	set_cpu_present(cpu, true);
-
-	if (system_state != SYSTEM_BOOTING)
-		cpu_mark_primary_thread(cpu, apic_id);
-}
-
-static int generic_processor_info(int apicid)
-{
-	int cpu;
-
-	/* The boot CPU must be set before MADT/MPTABLE parsing happens */
-	if (cpuid_to_apicid[0] == BAD_APICID)
-		panic("Boot CPU APIC not registered yet\n");
-
-	if (apicid == boot_cpu_physical_apicid)
-		return 0;
-
-	if (disabled_cpu_apicid == apicid) {
-		int thiscpu = topo_info.nr_assigned_cpus + topo_info.nr_disabled_cpus;
-
-		pr_warn("APIC: Disabling requested cpu. Processor %d/0x%x ignored.\n",
-			thiscpu, apicid);
-
-		topo_info.nr_rejected_cpus++;
-		return -ENODEV;
-	}
-
-	if (topo_info.nr_assigned_cpus >= nr_cpu_ids) {
-		pr_warn_once("APIC: CPU limit of %d reached. Ignoring further CPUs\n", nr_cpu_ids);
-		topo_info.nr_rejected_cpus++;
-		return -ENOSPC;
-	}
-
-	cpu = allocate_logical_cpuid(apicid);
-	cpu_update_apic(cpu, apicid);
-	return cpu;
-}
-
 static int __initdata setup_possible_cpus = -1;
 
 /*
@@ -222,6 +152,43 @@ static int __initdata setup_possible_cpu
 		set_cpu_possible(i, true);
 }
 
+static int topo_lookup_cpuid(u32 apic_id)
+{
+	int i;
+
+	/* CPU# to APICID mapping is persistent once it is established */
+	for (i = 0; i < topo_info.nr_assigned_cpus; i++) {
+		if (cpuid_to_apicid[i] == apic_id)
+			return i;
+	}
+	return -ENODEV;
+}
+
+static int topo_assign_cpunr(u32 apic_id)
+{
+	int cpu = topo_lookup_cpuid(apic_id);
+
+	if (cpu >= 0)
+		return cpu;
+
+	return topo_info.nr_assigned_cpus++;
+}
+
+static void topo_set_cpuids(unsigned int cpu, u32 apic_id, u32 acpi_id)
+{
+#if defined(CONFIG_SMP) || defined(CONFIG_X86_64)
+	early_per_cpu(x86_cpu_to_apicid, cpu) = apic_id;
+	early_per_cpu(x86_cpu_to_acpiid, cpu) = acpi_id;
+#endif
+	cpuid_to_apicid[cpu] = apic_id;
+
+	set_cpu_possible(cpu, true);
+	set_cpu_present(cpu, true);
+
+	if (system_state != SYSTEM_BOOTING)
+		cpu_mark_primary_thread(cpu, apic_id);
+}
+
 /**
  * topology_register_apic - Register an APIC in early topology maps
  * @apic_id:	The APIC ID to set up
@@ -234,17 +201,41 @@ void __init topology_register_apic(u32 a
 
 	if (apic_id >= MAX_LOCAL_APIC) {
 		pr_err_once("APIC ID %x exceeds kernel limit of: %x\n", apic_id, MAX_LOCAL_APIC - 1);
+		topo_info.nr_rejected_cpus++;
 		return;
 	}
 
-	if (!present) {
-		topo_info.nr_disabled_cpus++;
+	/* CPU numbers exhausted? */
+	if (topo_info.nr_assigned_cpus >= nr_cpu_ids) {
+		pr_warn_once("CPU limit of %d reached. Ignoring further CPUs\n", nr_cpu_ids);
+		topo_info.nr_rejected_cpus++;
 		return;
 	}
 
-	cpu = generic_processor_info(apic_id);
-	if (cpu >= 0)
-		early_per_cpu(x86_cpu_to_acpiid, cpu) = acpi_id;
+	if (disabled_cpu_apicid == apic_id) {
+		pr_info("Disabling CPU as requested via 'disable_cpu_apicid=0x%x'.\n", apic_id);
+		topo_info.nr_rejected_cpus++;
+		return;
+	}
+
+	if (present) {
+		/*
+		 * Prevent double registration, which is valid in case of
+		 * the boot CPU APIC because that is registered before the
+		 * enumeration of the APICs via firmware parsers or VM
+		 * guest mechanisms.
+		 */
+		if (test_and_set_bit(apic_id, phys_cpu_present_map))
+			return;
+
+		if (apic_id == topo_info.boot_cpu_apic_id)
+			cpu = 0;
+		else
+			cpu = topo_assign_cpunr(apic_id);
+		topo_set_cpuids(cpu, apic_id, acpi_id);
+	} else {
+		topo_info.nr_disabled_cpus++;
+	}
 }
 
 /**
@@ -255,8 +246,10 @@ void __init topology_register_apic(u32 a
  */
 void __init topology_register_boot_apic(u32 apic_id)
 {
-	cpuid_to_apicid[0] = apic_id;
-	cpu_update_apic(0, apic_id);
+	WARN_ON_ONCE(topo_info.boot_cpu_apic_id != BAD_APICID);
+
+	topo_info.boot_cpu_apic_id = apic_id;
+	topology_register_apic(apic_id, CPU_ACPIID_INVALID, true);
 }
 
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
@@ -274,10 +267,13 @@ int topology_hotplug_apic(u32 apic_id, u
 
 	cpu = topo_lookup_cpuid(apic_id);
 	if (cpu < 0) {
-		cpu = generic_processor_info(apic_id);
-		if (cpu >= 0)
-			per_cpu(x86_cpu_to_acpiid, cpu) = acpi_id;
+		if (topo_info.nr_assigned_cpus >= nr_cpu_ids)
+			return -ENOSPC;
+
+		cpu = topo_assign_cpunr(apic_id);
 	}
+	set_bit(apic_id, phys_cpu_present_map);
+	topo_set_cpuids(cpu, apic_id, acpi_id);
 	return cpu;
 }
 



* [patch v2 14/30] x86/cpu/topology: Rework possible CPU management
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (12 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 13/30] x86/cpu/topology: Sanitize the APIC admission logic Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-31 23:47   ` Sohil Mehta
  2024-01-23 13:11 ` [patch v2 15/30] x86/cpu: Detect real BSP on crash kernels Thomas Gleixner
                   ` (17 subsequent siblings)
  31 siblings, 1 reply; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Managing possible CPUs is an unreadable and incomprehensible maze. Aside
from that it's backwards because it applies command line limits after
registering all APICs.

Rewrite it so that it:

  - Applies the command line limits upfront so that only the allowed number
    of APIC IDs can be registered.

  - Applies any late restrictions in an understandable way

  - Uses simple min_t() calculations which are trivial to follow.

  - Provides a separate function for resetting to UP mode late in the
    bringup process.
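
For illustration only (not part of the patch): a minimal user-space sketch
of the two min() based steps listed above. The names early_limit(),
late_limit(), min_uint() and struct cpu_counts are hypothetical stand-ins;
the kernel code operates on nr_cpu_ids and topo_info directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's min_t(unsigned int, a, b). */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical container for the two counters the late step adjusts. */
struct cpu_counts {
	unsigned int assigned;	/* present CPUs which own a CPU number */
	unsigned int disabled;	/* registered, not present (hotpluggable) */
};

/*
 * Early step, before firmware parsing: returns the new CPU ID limit.
 * @max_possible models 'possible_cpus=N', @force_up models options
 * such as 'maxcpus=0' or 'nosmp'.
 */
static unsigned int early_limit(unsigned int nr_cpu_ids,
				unsigned int max_possible, bool force_up)
{
	unsigned int possible = force_up ? 1 : nr_cpu_ids;

	assert(nr_cpu_ids >= 1);
	return min_uint(max_possible, possible);
}

/*
 * Late step, after enumeration: clamp the counters to what is allowed
 * and return the number of allowed CPUs.
 */
static unsigned int late_limit(struct cpu_counts *c, unsigned int nr_cpu_ids,
			       bool restrict_to_up)
{
	unsigned int total = c->assigned + c->disabled;
	unsigned int allowed = restrict_to_up ? 1 : min_uint(total, nr_cpu_ids);

	c->assigned = min_uint(allowed, c->assigned);
	c->disabled = allowed - c->assigned;
	return allowed;
}
```

With a limit of 8 CPU IDs, four present and six disabled CPUs, late_limit()
allows 8 CPUs: 4 present plus 4 hotpluggable. When restricted to UP it
allows exactly one present CPU and no hotpluggable ones.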

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/apic.h     |    5 +
 arch/x86/include/asm/cpu.h      |   10 --
 arch/x86/include/asm/topology.h |    1 
 arch/x86/kernel/cpu/topology.c  |  176 ++++++++++++++++++++++++----------------
 arch/x86/kernel/setup.c         |    9 --
 arch/x86/kernel/smpboot.c       |    6 -
 6 files changed, 118 insertions(+), 89 deletions(-)
---
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -175,6 +175,9 @@ extern void topology_register_apic(u32 a
 extern void topology_register_boot_apic(u32 apic_id);
 extern int topology_hotplug_apic(u32 apic_id, u32 acpi_id);
 extern void topology_hotunplug_apic(unsigned int cpu);
+extern void topology_apply_cmdline_limits_early(void);
+extern void topology_init_possible_cpus(void);
+extern void topology_reset_possible_cpus_up(void);
 
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
@@ -190,6 +193,8 @@ static inline void apic_intr_mode_init(v
 static inline void lapic_assign_system_vectors(void) { }
 static inline void lapic_assign_legacy_vector(unsigned int i, bool r) { }
 static inline bool apic_needs_pit(void) { return true; }
+static inline void topology_apply_cmdline_limits_early(void) { }
+static inline void topology_init_possible_cpus(void) { }
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_X86_X2APIC
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -9,18 +9,10 @@
 #include <linux/percpu.h>
 #include <asm/ibt.h>
 
-#ifdef CONFIG_SMP
-
-extern void prefill_possible_map(void);
-
-#else /* CONFIG_SMP */
-
-static inline void prefill_possible_map(void) {}
-
+#ifndef CONFIG_SMP
 #define cpu_physical_id(cpu)			boot_cpu_physical_apicid
 #define cpu_acpi_id(cpu)			0
 #define safe_smp_processor_id()			0
-
 #endif /* CONFIG_SMP */
 
 struct x86_cpu {
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -191,6 +191,7 @@ static inline bool topology_is_primary_t
 {
 	return cpumask_test_cpu(cpu, cpu_primary_thread_mask);
 }
+
 #else /* CONFIG_SMP */
 #define topology_max_packages()			(1)
 static inline int
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -5,6 +5,7 @@
 #include <xen/xen.h>
 
 #include <asm/apic.h>
+#include <asm/io_apic.h>
 #include <asm/mpspec.h>
 #include <asm/smp.h>
 
@@ -85,73 +86,6 @@ early_initcall(smp_init_primary_thread_m
 static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
 #endif
 
-static int __initdata setup_possible_cpus = -1;
-
-/*
- * cpu_possible_mask should be static, it cannot change as cpu's
- * are onlined, or offlined. The reason is per-cpu data-structures
- * are allocated by some modules at init time, and don't expect to
- * do this dynamically on cpu arrival/departure.
- * cpu_present_mask on the other hand can change dynamically.
- * In case when cpu_hotplug is not compiled, then we resort to current
- * behaviour, which is cpu_possible == cpu_present.
- * - Ashok Raj
- *
- * Three ways to find out the number of additional hotplug CPUs:
- * - If the BIOS specified disabled CPUs in ACPI/mptables use that.
- * - The user can overwrite it with possible_cpus=NUM
- * - Otherwise don't reserve additional CPUs.
- * We do this because additional CPUs waste a lot of memory.
- * -AK
- */
-__init void prefill_possible_map(void)
-{
-	unsigned int num_processors = topo_info.nr_assigned_cpus;
-	unsigned int disabled_cpus = topo_info.nr_disabled_cpus;
-	int i, possible;
-
-	i = setup_max_cpus ?: 1;
-	if (setup_possible_cpus == -1) {
-		possible = topo_info.nr_assigned_cpus;
-#ifdef CONFIG_HOTPLUG_CPU
-		if (setup_max_cpus)
-			possible += num_processors;
-#else
-		if (possible > i)
-			possible = i;
-#endif
-	} else
-		possible = setup_possible_cpus;
-
-	total_cpus = max_t(int, possible, num_processors + disabled_cpus);
-
-	/* nr_cpu_ids could be reduced via nr_cpus= */
-	if (possible > nr_cpu_ids) {
-		pr_warn("%d Processors exceeds NR_CPUS limit of %u\n",
-			possible, nr_cpu_ids);
-		possible = nr_cpu_ids;
-	}
-
-#ifdef CONFIG_HOTPLUG_CPU
-	if (!setup_max_cpus)
-#endif
-	if (possible > i) {
-		pr_warn("%d Processors exceeds max_cpus limit of %u\n",
-			possible, setup_max_cpus);
-		possible = i;
-	}
-
-	set_nr_cpu_ids(possible);
-
-	pr_info("Allowing %d CPUs, %d hotplug CPUs\n",
-		possible, max_t(int, possible - num_processors, 0));
-
-	reset_cpu_possible_mask();
-
-	for (i = 0; i < possible; i++)
-		set_cpu_possible(i, true);
-}
-
 static int topo_lookup_cpuid(u32 apic_id)
 {
 	int i;
@@ -294,12 +228,114 @@ void topology_hotunplug_apic(unsigned in
 }
 #endif
 
-static int __init _setup_possible_cpus(char *str)
+#ifdef CONFIG_SMP
+static unsigned int max_possible_cpus __initdata = NR_CPUS;
+
+/**
+ * topology_apply_cmdline_limits_early - Apply topology command line limits early
+ *
+ * Ensure that command line limits are in effect before firmware parsing
+ * takes place.
+ */
+void __init topology_apply_cmdline_limits_early(void)
+{
+	unsigned int possible = nr_cpu_ids;
+
+	/* 'maxcpus=0' 'nosmp' 'nolapic' 'disableapic' 'noapic' */
+	if (!setup_max_cpus || ioapic_is_disabled || apic_is_disabled)
+		possible = 1;
+
+	/* 'possible_cpus=N' */
+	possible = min_t(unsigned int, max_possible_cpus, possible);
+
+	if (possible < nr_cpu_ids) {
+		pr_info("Limiting to %u possible CPUs\n", possible);
+		set_nr_cpu_ids(possible);
+	}
+}
+
+static __init bool restrict_to_up(void)
+{
+	if (!smp_found_config || ioapic_is_disabled)
+		return true;
+	/*
+	 * XEN PV is special as it does not advertise the local APIC
+	 * properly, but provides a fake topology for it so that the
+	 * infrastructure works. So don't apply the restrictions vs. APIC
+	 * here.
+	 */
+	if (xen_pv_domain())
+		return false;
+
+	return apic_is_disabled;
+}
+
+void __init topology_init_possible_cpus(void)
+{
+	unsigned int assigned = topo_info.nr_assigned_cpus;
+	unsigned int disabled = topo_info.nr_disabled_cpus;
+	unsigned int total = assigned + disabled;
+	unsigned int cpu, allowed = 1;
+
+	if (!restrict_to_up()) {
+		if (WARN_ON_ONCE(assigned > nr_cpu_ids)) {
+			disabled += assigned - nr_cpu_ids;
+			assigned = nr_cpu_ids;
+		}
+		allowed = min_t(unsigned int, total, nr_cpu_ids);
+	}
+
+	if (total > allowed)
+		pr_warn("%u possible CPUs exceed the limit of %u\n", total, allowed);
+
+	assigned = min_t(unsigned int, allowed, assigned);
+	disabled = allowed - assigned;
+
+	topo_info.nr_assigned_cpus = assigned;
+	topo_info.nr_disabled_cpus = disabled;
+
+	total_cpus = allowed;
+	set_nr_cpu_ids(allowed);
+
+	pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
+	if (topo_info.nr_rejected_cpus)
+		pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
+
+	init_cpu_present(cpumask_of(0));
+	init_cpu_possible(cpumask_of(0));
+
+	for (cpu = 0; cpu < allowed; cpu++) {
+		u32 apicid = cpuid_to_apicid[cpu];
+
+		set_cpu_possible(cpu, true);
+
+		if (apicid == BAD_APICID)
+			continue;
+
+		set_cpu_present(cpu, test_bit(apicid, phys_cpu_present_map));
+	}
+}
+
+/*
+ * Late SMP disable after sizing CPU masks when APIC/IOAPIC setup failed.
+ */
+void __init topology_reset_possible_cpus_up(void)
 {
-	get_option(&str, &setup_possible_cpus);
+	init_cpu_present(cpumask_of(0));
+	init_cpu_possible(cpumask_of(0));
+
+	bitmap_zero(phys_cpu_present_map, MAX_LOCAL_APIC);
+	if (topo_info.boot_cpu_apic_id != BAD_APICID)
+		set_bit(topo_info.boot_cpu_apic_id, phys_cpu_present_map);
+}
+
+static int __init setup_possible_cpus(char *str)
+{
+	get_option(&str, &max_possible_cpus);
 	return 0;
 }
-early_param("possible_cpus", _setup_possible_cpus);
+early_param("possible_cpus", setup_possible_cpus);
+#endif
 
 static int __init apic_set_disabled_cpu_apicid(char *arg)
 {
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1129,6 +1129,8 @@ void __init setup_arch(char **cmdline_p)
 
 	early_quirks();
 
+	topology_apply_cmdline_limits_early();
+
 	/*
 	 * Parse SMP configuration. Try ACPI first and then the platform
 	 * specific parser.
@@ -1136,13 +1138,10 @@ void __init setup_arch(char **cmdline_p)
 	acpi_boot_init();
 	x86_init.mpparse.parse_smp_cfg();
 
-	/*
-	 * Systems w/o ACPI and mptables might not have it mapped the local
-	 * APIC yet, but prefill_possible_map() might need to access it.
-	 */
+	/* Last opportunity to detect and map the local APIC */
 	init_apic_mappings();
 
-	prefill_possible_map();
+	topology_init_possible_cpus();
 
 	init_cpu_to_node();
 	init_gi_nodes();
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1147,11 +1147,7 @@ static __init void disable_smp(void)
 	pr_info("SMP disabled\n");
 
 	disable_ioapic_support();
-
-	init_cpu_present(cpumask_of(0));
-	init_cpu_possible(cpumask_of(0));
-
-	reset_phys_cpu_present_map(smp_found_config ? boot_cpu_physical_apicid : 0);
+	topology_reset_possible_cpus_up();
 
 	cpumask_set_cpu(0, topology_sibling_cpumask(0));
 	cpumask_set_cpu(0, topology_core_cpumask(0));



* [patch v2 15/30] x86/cpu: Detect real BSP on crash kernels
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (13 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 14/30] x86/cpu/topology: Rework possible CPU management Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-31 17:59   ` Michael Kelley
  2024-01-23 13:11 ` [patch v2 16/30] x86/topology: Add a mechanism to track topology via APIC IDs Thomas Gleixner
                   ` (16 subsequent siblings)
  31 siblings, 1 reply; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

When a kdump kernel is started from a crashing CPU then there is no
guarantee that this CPU is the real boot CPU (BSP). If the kdump kernel
tries to online the BSP then the INIT sequence will reset the machine.

There is a command line option to prevent this, but in case of nested kdump
kernels this is wrong.

But that command line option is not required at all because the real
BSP is enumerated as the first CPU by firmware. Support for the only
known system which was different (Voyager) got removed long ago.

Detect whether the boot CPU APIC ID is the first APIC ID enumerated by
the firmware. If the first enumerated APIC ID does not match the boot
CPU APIC ID then skip registering that first APIC ID, as it belongs to
the real BSP.
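
For illustration only (not part of the patch): a minimal user-space sketch
of the detection described above. struct bsp_state is a hypothetical
container, and the BAD_APICID value is a placeholder which is not
necessarily the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BAD_APICID	0xFFFFFFFFu	/* placeholder "not set" marker */

/* Hypothetical container; the patch keeps these fields in topo_info. */
struct bsp_state {
	uint32_t boot_cpu_apic_id;	/* APIC ID of the CPU we booted on */
	uint32_t real_bsp_apic_id;	/* first APIC ID seen from firmware */
};

/*
 * Called for every APIC ID which firmware enumerates. Returns true if
 * the APIC ID must be skipped because it is the real BSP of a system
 * where this kernel was started on a different CPU.
 */
static bool check_for_real_bsp(struct bsp_state *s, uint32_t apic_id)
{
	assert(s->boot_cpu_apic_id != BAD_APICID);

	/* Only the first enumerated APIC ID is of interest */
	if (s->real_bsp_apic_id != BAD_APICID)
		return false;

	s->real_bsp_apic_id = apic_id;
	return apic_id != s->boot_cpu_apic_id;
}
```

If the kernel booted on APIC ID 4 and firmware enumerates 0 first, then 0
is skipped and all subsequent APIC IDs are registered normally. On a
regular boot, where the first enumerated APIC ID is the boot CPU itself,
nothing is skipped.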

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Check for the first enumerated APIC ID (Rui)
---
 Documentation/admin-guide/kdump/kdump.rst       |    7 -
 Documentation/admin-guide/kernel-parameters.txt |    9 --
 arch/x86/kernel/cpu/topology.c                  |   99 ++++++++++++++----------
 3 files changed, 61 insertions(+), 54 deletions(-)
---
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -191,9 +191,7 @@ Dump-capture kernel config options (Arch
    CPU is enough for kdump kernel to dump vmcore on most of systems.
 
    However, you can also specify nr_cpus=X to enable multiple processors
-   in kdump kernel. In this case, "disable_cpu_apicid=" is needed to
-   tell kdump kernel which cpu is 1st kernel's BSP. Please refer to
-   admin-guide/kernel-parameters.txt for more details.
+   in kdump kernel.
 
    With CONFIG_SMP=n, the above things are not related.
 
@@ -454,8 +452,7 @@ loading dump-capture kernel.
   to use multi-thread programs with it, such as parallel dump feature of
   makedumpfile. Otherwise, the multi-thread program may have a great
   performance degradation. To enable multi-cpu support, you should bring up an
-  SMP dump-capture kernel and specify maxcpus/nr_cpus, disable_cpu_apicid=[X]
-  options while loading it.
+  SMP dump-capture kernel and specify maxcpus/nr_cpus options while loading it.
 
 * For s390x there are two kdump modes: If a ELF header is specified with
   the elfcorehdr= kernel parameter, it is used by the kdump kernel as it
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1098,15 +1098,6 @@
 			Disable TLBIE instruction. Currently does not work
 			with KVM, with HASH MMU, or with coherent accelerators.
 
-	disable_cpu_apicid= [X86,APIC,SMP]
-			Format: <int>
-			The number of initial APIC ID for the
-			corresponding CPU to be disabled at boot,
-			mostly used for the kdump 2nd kernel to
-			disable BSP to wake up multiple CPUs without
-			causing system reset or hang due to sending
-			INIT from AP to BSP.
-
 	disable_ddw	[PPC/PSERIES]
 			Disable Dynamic DMA Window support. Use this
 			to workaround buggy firmware.
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -32,18 +32,13 @@ static struct {
 	unsigned int		nr_disabled_cpus;
 	unsigned int		nr_rejected_cpus;
 	u32			boot_cpu_apic_id;
+	u32			real_bsp_apic_id;
 } topo_info __read_mostly = {
 	.nr_assigned_cpus	= 1,
 	.boot_cpu_apic_id	= BAD_APICID,
+	.real_bsp_apic_id	= BAD_APICID,
 };
 
-/*
- * Processor to be disabled specified by kernel parameter
- * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
- * avoid undefined behaviour caused by sending INIT from AP to BSP.
- */
-static u32 disabled_cpu_apicid __ro_after_init = BAD_APICID;
-
 bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
 {
 	return phys_id == (u64)cpuid_to_apicid[cpu];
@@ -123,34 +118,40 @@ static void topo_set_cpuids(unsigned int
 		cpu_mark_primary_thread(cpu, apic_id);
 }
 
-/**
- * topology_register_apic - Register an APIC in early topology maps
- * @apic_id:	The APIC ID to set up
- * @acpi_id:	The ACPI ID associated to the APIC
- * @present:	True if the corresponding CPU is present
- */
-void __init topology_register_apic(u32 apic_id, u32 acpi_id, bool present)
+static __init bool check_for_real_bsp(u32 apic_id)
 {
-	int cpu;
+	/*
+	 * There is no real good way to detect whether this is a kdump
+	 * kernel, but except on the Voyager SMP monstrosity which is no
+	 * longer supported, the real BSP APIC ID is the first one which is
+	 * enumerated by firmware. That allows to detect whether the boot
+	 * CPU is the real BSP. If it is not, then do not register the APIC
+	 * because sending INIT to the real BSP would reset the whole
+	 * system.
+	 *
+	 * The first APIC ID which is enumerated by firmware is detectable
+	 * because the boot CPU APIC ID is registered before that without
+	 * invoking this code.
+	 */
+	if (topo_info.real_bsp_apic_id != BAD_APICID)
+		return false;
 
-	if (apic_id >= MAX_LOCAL_APIC) {
-		pr_err_once("APIC ID %x exceeds kernel limit of: %x\n", apic_id, MAX_LOCAL_APIC - 1);
-		topo_info.nr_rejected_cpus++;
-		return;
+	if (apic_id == topo_info.boot_cpu_apic_id) {
+		topo_info.real_bsp_apic_id = apic_id;
+		return false;
 	}
 
-	/* CPU numbers exhausted? */
-	if (topo_info.nr_assigned_cpus >= nr_cpu_ids) {
-		pr_warn_once("CPU limit of %d reached. Ignoring further CPUs\n", nr_cpu_ids);
-		topo_info.nr_rejected_cpus++;
-		return;
-	}
+	pr_warn("Boot CPU APIC ID not the first enumerated APIC ID: %x > %x\n",
+		topo_info.boot_cpu_apic_id, apic_id);
+	pr_warn("Crash kernel detected. Disabling real BSP to prevent machine INIT\n");
 
-	if (disabled_cpu_apicid == apic_id) {
-		pr_info("Disabling CPU as requested via 'disable_cpu_apicid=0x%x'.\n", apic_id);
-		topo_info.nr_rejected_cpus++;
-		return;
-	}
+	topo_info.real_bsp_apic_id = apic_id;
+	return true;
+}
+
+static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
+{
+	int cpu;
 
 	if (present) {
 		/*
@@ -173,6 +174,33 @@ void __init topology_register_apic(u32 a
 }
 
 /**
+ * topology_register_apic - Register an APIC in early topology maps
+ * @apic_id:	The APIC ID to set up
+ * @acpi_id:	The ACPI ID associated to the APIC
+ * @present:	True if the corresponding CPU is present
+ */
+void __init topology_register_apic(u32 apic_id, u32 acpi_id, bool present)
+{
+	if (apic_id >= MAX_LOCAL_APIC) {
+		pr_err_once("APIC ID %x exceeds kernel limit of: %x\n", apic_id, MAX_LOCAL_APIC - 1);
+		topo_info.nr_rejected_cpus++;
+		return;
+	}
+
+	/* CPU numbers exhausted? */
+	if (topo_info.nr_assigned_cpus >= nr_cpu_ids) {
+		pr_warn_once("CPU limit of %d reached. Ignoring further CPUs\n", nr_cpu_ids);
+		topo_info.nr_rejected_cpus++;
+		return;
+	}
+
+	if (check_for_real_bsp(apic_id))
+		return;
+
+	topo_register_apic(apic_id, acpi_id, present);
+}
+
+/**
  * topology_register_boot_apic - Register the boot CPU APIC
  * @apic_id:	The APIC ID to set up
  *
@@ -183,7 +211,7 @@ void __init topology_register_boot_apic(
 	WARN_ON_ONCE(topo_info.boot_cpu_apic_id != BAD_APICID);
 
 	topo_info.boot_cpu_apic_id = apic_id;
-	topology_register_apic(apic_id, CPU_ACPIID_INVALID, true);
+	topo_register_apic(apic_id, CPU_ACPIID_INVALID, true);
 }
 
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
@@ -336,12 +364,3 @@ static int __init setup_possible_cpus(ch
 }
 early_param("possible_cpus", setup_possible_cpus);
 #endif
-
-static int __init apic_set_disabled_cpu_apicid(char *arg)
-{
-	if (!arg || !get_option(&arg, &disabled_cpu_apicid))
-		return -EINVAL;
-
-	return 0;
-}
-early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);



* [patch v2 16/30] x86/topology: Add a mechanism to track topology via APIC IDs
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (14 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 15/30] x86/cpu: Detect real BSP on crash kernels Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 17/30] x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug Thomas Gleixner
                   ` (15 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Topology on X86 is determined by the registered APIC IDs and the
segmentation information retrieved from CPUID. Depending on the granularity
of the provided CPUID information the most fine grained scheme looks like
this according to Intel terminology:

   [PKG][DIEGRP][DIE][TILE][MODULE][CORE][THREAD]

Domain levels which are not enumerated consume 0 bits in the APIC ID. This
allows providing a consistent view of the topology and determining other
information precisely, like the number of cores in a package on hybrid
systems, where the existing assumption that number of cores == number of
threads / threads per core does not hold.

Provide per domain level bitmaps which record the APIC ID split into the
domain levels to make later evaluation of domain level specific information
simple. This allows calculating e.g. the logical IDs without any extra
logic.

Contrary to the existing registration mechanism this records disabled CPUs,
which are subject to later hotplug as well. That's useful for boot time
sizing of package or die dependent allocations without using heuristics.
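
For illustration only (not part of the patch): a minimal user-space sketch
of the per domain maps, reduced to three levels. The shift values, the
MAX_APIC limit and the byte-per-ID "bitmap" are assumptions made for the
example; the kernel takes the shifts from the CPUID parser and uses real
bitmaps.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

enum domain { SMT_DOMAIN, CORE_DOMAIN, PKG_DOMAIN, MAX_DOMAIN };

#define MAX_APIC	256

/*
 * Assumed APIC ID layout: bit 0 selects the thread, bits 1-3 the core,
 * bits 4 and up the package. dom_shifts[d] is the bit position at which
 * the level above domain d starts.
 */
static const unsigned int dom_shifts[MAX_DOMAIN] = { 1, 4, 8 };

/* One byte per APIC ID and domain instead of a real bitmap */
static unsigned char maps[MAX_DOMAIN][MAX_APIC];

/* Mask out the APIC ID bits below domain level @dom */
static uint32_t topo_apicid(uint32_t apicid, enum domain dom)
{
	if (dom == SMT_DOMAIN)
		return apicid;
	return apicid & (UINT_MAX << dom_shifts[dom - 1]);
}

/* Record the APIC ID in the map of every domain level */
static void register_apic(uint32_t apicid)
{
	assert(apicid < MAX_APIC);

	for (int dom = SMT_DOMAIN; dom < MAX_DOMAIN; dom++)
		maps[dom][topo_apicid(apicid, (enum domain)dom)] = 1;
}

/* Number of distinct IDs registered at domain level @dom */
static unsigned int domain_weight(enum domain dom)
{
	unsigned int i, weight = 0;

	for (i = 0; i < MAX_APIC; i++)
		weight += maps[dom][i];
	return weight;
}
```

Registering the APIC IDs 0, 1, 2, 3 (two SMT cores) and 8, 10 (two single
threaded cores) of a made-up hybrid package yields six threads, four cores
and one package. Dividing the thread count by two threads per core would
give three cores, which is the miscount the maps avoid.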

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/cpu/topology.c |   48 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)
---
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -1,5 +1,27 @@
 // SPDX-License-Identifier: GPL-2.0-only
-
+/*
+ * CPU/APIC topology
+ *
+ * The APIC IDs describe the system topology in multiple domain levels.
+ * The CPUID topology parser provides the information which part of the
+ * APIC ID is associated to the individual levels:
+ *
+ * [PACKAGE][DIEGRP][DIE][TILE][MODULE][CORE][THREAD]
+ *
+ * The root space contains the package (socket) IDs.
+ *
+ * Not enumerated levels consume 0 bits space, but conceptually they are
+ * always represented. If e.g. only CORE and THREAD levels are enumerated
+ * then the DIE, MODULE and TILE have the same physical ID as the PACKAGE.
+ *
+ * If SMT is not supported, then the THREAD domain is still used. It then
+ * has the same physical ID as the CORE domain and is the only child of
+ * the core domain.
+ *
+ * This allows a unified view on the system independent of the enumerated
+ * domain levels without requiring any conditionals in the code.
+ */
+#define pr_fmt(fmt) "CPU topo: " fmt
 #include <linux/cpu.h>
 
 #include <xen/xen.h>
@@ -9,6 +31,8 @@
 #include <asm/mpspec.h>
 #include <asm/smp.h>
 
+#include "cpu.h"
+
 /*
  * Map cpu index to physical APIC ID
  */
@@ -23,6 +47,9 @@ DECLARE_BITMAP(phys_cpu_present_map, MAX
 /* Used for CPU number allocation and parallel CPU bringup */
 u32 cpuid_to_apicid[] __read_mostly = { [0 ... NR_CPUS - 1] = BAD_APICID, };
 
+/* Bitmaps to mark registered APICs at each topology domain */
+static struct { DECLARE_BITMAP(map, MAX_LOCAL_APIC); } apic_maps[TOPO_MAX_DOMAIN] __ro_after_init;
+
 /*
  * Keep track of assigned, disabled and rejected CPUs. Present assigned
  * with 1 as CPU #0 is reserved for the boot CPU.
@@ -39,6 +66,8 @@ static struct {
 	.real_bsp_apic_id	= BAD_APICID,
 };
 
+#define domain_weight(_dom)	bitmap_weight(apic_maps[_dom].map, MAX_LOCAL_APIC)
+
 bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
 {
 	return phys_id == (u64)cpuid_to_apicid[cpu];
@@ -81,6 +110,17 @@ early_initcall(smp_init_primary_thread_m
 static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
 #endif
 
+/*
+ * Convert the APIC ID to a domain level ID by masking out the low bits
+ * below the domain level @dom.
+ */
+static inline u32 topo_apicid(u32 apicid, enum x86_topology_domains dom)
+{
+	if (dom == TOPO_SMT_DOMAIN)
+		return apicid;
+	return apicid & (UINT_MAX << x86_topo_system.dom_shifts[dom - 1]);
+}
+
 static int topo_lookup_cpuid(u32 apic_id)
 {
 	int i;
@@ -151,7 +191,7 @@ static __init bool check_for_real_bsp(u3
 
 static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
 {
-	int cpu;
+	int cpu, dom;
 
 	if (present) {
 		/*
@@ -171,6 +211,10 @@ static __init void topo_register_apic(u3
 	} else {
 		topo_info.nr_disabled_cpus++;
 	}
+
+	/* Register present and possible CPUs in the domain maps */
+	for (dom = TOPO_SMT_DOMAIN; dom < TOPO_MAX_DOMAIN; dom++)
+		set_bit(topo_apicid(apic_id, dom), apic_maps[dom].map);
 }
 
 /**



* [patch v2 17/30] x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (15 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 16/30] x86/topology: Add a mechanism to track topology via APIC IDs Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 18/30] x86/cpu/topology: Assign hotpluggable CPUIDs during init Thomas Gleixner
                   ` (14 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

The topology bitmaps track all possible APIC IDs which have been registered
during enumeration. As sizing and further topology information is going to
be derived from these bitmaps, reject attempts to hotplug an APIC ID which
was not registered during enumeration.
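
For illustration only (not part of the patch): a minimal user-space sketch
of the admission check. The registered[] array stands in for the SMT
domain bitmap, and enumerate_apic() and hotplug_apic() are hypothetical
names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_APIC	256

/* Stand-in for the SMT level map filled during enumeration */
static bool registered[MAX_APIC];

/* Boot time: record an APIC ID which firmware enumerated */
static void enumerate_apic(uint32_t apic_id)
{
	assert(apic_id < MAX_APIC);
	registered[apic_id] = true;
}

/* Hotplug time: only APIC IDs known from enumeration are admitted */
static int hotplug_apic(uint32_t apic_id)
{
	if (apic_id >= MAX_APIC)
		return -EINVAL;

	if (!registered[apic_id])
		return -ENODEV;

	return 0;
}
```

After enumerate_apic(4), a hotplug request for APIC ID 4 is accepted, one
for APIC ID 6 fails with -ENODEV and an out of range ID fails with -EINVAL.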

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/cpu/topology.c |    4 ++++
 1 file changed, 4 insertions(+)
---
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -266,6 +266,10 @@ int topology_hotplug_apic(u32 apic_id, u
 	if (apic_id >= MAX_LOCAL_APIC)
 		return -EINVAL;
 
+	/* Reject if the APIC ID was not registered during enumeration. */
+	if (!test_bit(apic_id, apic_maps[TOPO_SMT_DOMAIN].map))
+		return -ENODEV;
+
 	cpu = topo_lookup_cpuid(apic_id);
 	if (cpu < 0) {
 		if (topo_info.nr_assigned_cpus >= nr_cpu_ids)



* [patch v2 18/30] x86/cpu/topology: Assign hotpluggable CPUIDs during init
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (16 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 17/30] x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 19/30] x86/xen/smp_pv: Count number of vCPUs early Thomas Gleixner
                   ` (13 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

There is no point in assigning the CPU numbers during ACPI physical
hotplug. The number of possible hotplug CPUs is known when the possible map
is initialized, so the CPU numbers can be associated to the registered
non-present APIC IDs right there.

This allows putting more code into the __init section and makes the related
data __ro_after_init.
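
For illustration only (not part of the patch): a minimal user-space sketch
of assigning CPU numbers to registered, non-present APIC IDs in one pass
at init time. The function name, the bool arrays and the lowest-ID-first
order are assumptions made for the example and not taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_APIC	64

/*
 * Hand out the CPU numbers from @nr_assigned upwards to every APIC ID
 * which was registered during enumeration but is not present. Stops
 * when @nr_cpus is reached. Returns the new number of assigned CPUs.
 */
static unsigned int assign_hotplug_cpus(const bool *registered,
					const bool *present,
					uint32_t *cpu_to_apic,
					unsigned int nr_assigned,
					unsigned int nr_cpus)
{
	uint32_t apicid;

	assert(nr_assigned <= nr_cpus);

	for (apicid = 0; apicid < MAX_APIC && nr_assigned < nr_cpus; apicid++) {
		if (registered[apicid] && !present[apicid])
			cpu_to_apic[nr_assigned++] = apicid;
	}
	return nr_assigned;
}
```

Once this has run, a later hotplug operation only needs to look up the CPU
number for its APIC ID and never has to allocate a new one.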

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/cpu/topology.c |   29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)
---
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -45,7 +45,7 @@ EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_a
 DECLARE_BITMAP(phys_cpu_present_map, MAX_LOCAL_APIC) __read_mostly;
 
 /* Used for CPU number allocation and parallel CPU bringup */
-u32 cpuid_to_apicid[] __read_mostly = { [0 ... NR_CPUS - 1] = BAD_APICID, };
+u32 cpuid_to_apicid[] __ro_after_init = { [0 ... NR_CPUS - 1] = BAD_APICID, };
 
 /* Bitmaps to mark registered APICs at each topology domain */
 static struct { DECLARE_BITMAP(map, MAX_LOCAL_APIC); } apic_maps[TOPO_MAX_DOMAIN] __ro_after_init;
@@ -60,7 +60,7 @@ static struct {
 	unsigned int		nr_rejected_cpus;
 	u32			boot_cpu_apic_id;
 	u32			real_bsp_apic_id;
-} topo_info __read_mostly = {
+} topo_info __ro_after_init = {
 	.nr_assigned_cpus	= 1,
 	.boot_cpu_apic_id	= BAD_APICID,
 	.real_bsp_apic_id	= BAD_APICID,
@@ -133,7 +133,7 @@ static int topo_lookup_cpuid(u32 apic_id
 	return -ENODEV;
 }
 
-static int topo_assign_cpunr(u32 apic_id)
+static __init int topo_assign_cpunr(u32 apic_id)
 {
 	int cpu = topo_lookup_cpuid(apic_id);
 
@@ -149,8 +149,6 @@ static void topo_set_cpuids(unsigned int
 	early_per_cpu(x86_cpu_to_apicid, cpu) = apic_id;
 	early_per_cpu(x86_cpu_to_acpiid, cpu) = acpi_id;
 #endif
-	cpuid_to_apicid[cpu] = apic_id;
-
 	set_cpu_possible(cpu, true);
 	set_cpu_present(cpu, true);
 
@@ -207,6 +205,8 @@ static __init void topo_register_apic(u3
 			cpu = 0;
 		else
 			cpu = topo_assign_cpunr(apic_id);
+
+		cpuid_to_apicid[cpu] = apic_id;
 		topo_set_cpuids(cpu, apic_id, acpi_id);
 	} else {
 		topo_info.nr_disabled_cpus++;
@@ -276,12 +276,9 @@ int topology_hotplug_apic(u32 apic_id, u
 		return -ENODEV;
 
 	cpu = topo_lookup_cpuid(apic_id);
-	if (cpu < 0) {
-		if (topo_info.nr_assigned_cpus >= nr_cpu_ids)
-			return -ENOSPC;
+	if (cpu < 0)
+		return -ENOSPC;
 
-		cpu = topo_assign_cpunr(apic_id);
-	}
 	set_bit(apic_id, phys_cpu_present_map);
 	topo_set_cpuids(cpu, apic_id, acpi_id);
 	return cpu;
@@ -352,6 +349,7 @@ void __init topology_init_possible_cpus(
 	unsigned int disabled = topo_info.nr_disabled_cpus;
 	unsigned int total = assigned + disabled;
 	unsigned int cpu, allowed = 1;
+	u32 apicid;
 
 	if (!restrict_to_up()) {
 		if (WARN_ON_ONCE(assigned > nr_cpu_ids)) {
@@ -380,8 +378,17 @@ void __init topology_init_possible_cpus(
 	init_cpu_present(cpumask_of(0));
 	init_cpu_possible(cpumask_of(0));
 
+	/* Assign CPU numbers to non-present CPUs */
+	for (apicid = 0; disabled; disabled--, apicid++) {
+		apicid = find_next_andnot_bit(apic_maps[TOPO_SMT_DOMAIN].map, phys_cpu_present_map,
+					      MAX_LOCAL_APIC, apicid);
+		if (apicid >= MAX_LOCAL_APIC)
+			break;
+		cpuid_to_apicid[topo_info.nr_assigned_cpus++] = apicid;
+	}
+
 	for (cpu = 0; cpu < allowed; cpu++) {
-		u32 apicid = cpuid_to_apicid[cpu];
+		apicid = cpuid_to_apicid[cpu];
 
 		set_cpu_possible(cpu, true);
 



* [patch v2 19/30] x86/xen/smp_pv: Count number of vCPUs early
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (17 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 18/30] x86/cpu/topology: Assign hotpluggable CPUIDs during init Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 20/30] x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT Thomas Gleixner
                   ` (12 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

XEN/PV has a completely broken vCPU enumeration scheme, which just works by
chance and provides zero topology information. Each vCPU ends up being a
single core package.

Dom0 provides MADT which can be used for topology information, but that
table is the unmodified host table, which means that there can be more CPUs
registered than the number of vCPUs XEN provides for the dom0 guest.

DomU does not have ACPI. Both Dom0 and DomU rely on counting the possible
vCPUs via a hypercall.

To prepare for using CPUID topology information either via MADT or via fake
APIC IDs, count the number of possible CPUs during early boot and adjust
nr_cpu_ids accordingly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>



---
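Illustration only, not part of the patch: a user-space sketch of the
counting scheme used by xen_smp_count_cpus(). The hypercall is modelled by
a hypothetical probe callback. The sketch assumes that the probe returns a
negative value for a non-existent vCPU and 0 or 1 for an existing one,
which is how the patch treats the result of VCPUOP_is_up.

```c
#include <assert.h>

/* Hypothetical probe: < 0 if the vCPU does not exist, 0 (down) or 1 (up) otherwise */
typedef int (*vcpu_probe_fn)(unsigned int vcpu);

/* Count vCPUs by probing upwards from 0 until the probe fails or @limit is reached */
static unsigned int count_vcpus(vcpu_probe_fn probe, unsigned int limit)
{
	unsigned int cpus;

	assert(probe);

	for (cpus = 0; cpus < limit; cpus++) {
		if (probe(cpus) < 0)
			break;
	}
	return cpus;
}

/* Toy hypervisor which exposes four vCPUs. vCPU 2 exists, but is down. */
static int probe_four_vcpus(unsigned int vcpu)
{
	if (vcpu >= 4)
		return -1;
	return vcpu != 2;
}
```

A vCPU which is down still counts as possible because only a negative
result terminates the loop. The result never exceeds the limit passed in,
which corresponds to nr_cpu_ids only ever being reduced.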
 arch/x86/xen/enlighten_pv.c |    3 +++
 arch/x86/xen/smp.h          |    2 ++
 arch/x86/xen/smp_pv.c       |   14 ++++++++++++++
 3 files changed, 19 insertions(+)
---
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -200,6 +200,9 @@ static void __init xen_pv_init_platform(
 		xen_set_mtrr_data();
 	else
 		mtrr_overwrite_state(NULL, 0, MTRR_TYPE_WRBACK);
+
+	/* Adjust nr_cpu_ids before "enumeration" happens */
+	xen_smp_count_cpus();
 }
 
 static void __init xen_pv_guest_late_init(void)
--- a/arch/x86/xen/smp.h
+++ b/arch/x86/xen/smp.h
@@ -19,6 +19,7 @@ extern void xen_smp_intr_free(unsigned i
 int xen_smp_intr_init_pv(unsigned int cpu);
 void xen_smp_intr_free_pv(unsigned int cpu);
 
+void xen_smp_count_cpus(void);
 void xen_smp_cpus_done(unsigned int max_cpus);
 
 void xen_smp_send_reschedule(int cpu);
@@ -44,6 +45,7 @@ static inline int xen_smp_intr_init_pv(u
 	return 0;
 }
 static inline void xen_smp_intr_free_pv(unsigned int cpu) {}
+static inline void xen_smp_count_cpus(void) { }
 #endif /* CONFIG_SMP */
 
 #endif
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -411,6 +411,20 @@ static irqreturn_t xen_irq_work_interrup
 	return IRQ_HANDLED;
 }
 
+void __init xen_smp_count_cpus(void)
+{
+	unsigned int cpus;
+
+	for (cpus = 0; cpus < nr_cpu_ids; cpus++) {
+		if (HYPERVISOR_vcpu_op(VCPUOP_is_up, cpus, NULL) < 0)
+			break;
+	}
+
+	pr_info("Xen PV: Detected %u vCPUS\n", cpus);
+	if (cpus < nr_cpu_ids)
+		set_nr_cpu_ids(cpus);
+}
+
 static const struct smp_ops xen_smp_ops __initconst = {
 	.smp_prepare_boot_cpu = xen_pv_smp_prepare_boot_cpu,
 	.smp_prepare_cpus = xen_pv_smp_prepare_cpus,



* [patch v2 20/30] x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (18 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 19/30] x86/xen/smp_pv: Count number of vCPUs early Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing Thomas Gleixner
                   ` (11 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

It turns out that XEN/PV Dom0 has halfway usable CPUID/MADT enumeration
except that it cannot deal with CPUs which are enumerated as disabled in
MADT.

DomU has no MADT and provides at least rudimentary topology information in
CPUID leaves 1 and 4.

For both it's important that there are no more possible Linux CPUs than
vCPUs provided by the hypervisor.

As this is ensured by counting the vCPUs before enumeration happens:

  - Lift the restrictions in the CPUID evaluation and the MADT parser

  - Utilize MADT registration for Dom0

  - Keep the fake APIC ID registration for DomU

  - Fix the XEN APIC fake so the readout of the local APIC ID works for
    Dom0 via the hypercall and for DomU by returning the registered
    fake APIC IDs.

With that the XEN/PV fake approximates usefulness.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>



---
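Illustration only, not part of the patch: the "<< 24" in xen_apic_read()
reflects the xAPIC register layout, where the APIC ID register carries the
8-bit ID in bits 31:24. A small sketch of that encoding follows. The
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define XAPIC_ID_SHIFT	24
#define XAPIC_ID_MASK	0xFFu

/* Build the value an xAPIC ID register read would return for @apic_id */
static uint32_t apic_id_to_reg(uint32_t apic_id)
{
	assert(apic_id <= XAPIC_ID_MASK);
	return apic_id << XAPIC_ID_SHIFT;
}

/* Extract the APIC ID from an xAPIC ID register value */
static uint32_t reg_to_apic_id(uint32_t reg)
{
	return (reg >> XAPIC_ID_SHIFT) & XAPIC_ID_MASK;
}
```

The reader of the register undoes the shift, so a fake APIC ID registered
for a DomU vCPU reads back unchanged.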
 arch/x86/kernel/acpi/boot.c           |   25 ++++++++-----------------
 arch/x86/kernel/cpu/topology_common.c |    2 +-
 arch/x86/xen/apic.c                   |   14 +++++++-------
 arch/x86/xen/smp_pv.c                 |   13 ++++++++-----
 4 files changed, 24 insertions(+), 30 deletions(-)
---
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -23,8 +23,6 @@
 #include <linux/serial_core.h>
 #include <linux/pgtable.h>
 
-#include <xen/xen.h>
-
 #include <asm/e820/api.h>
 #include <asm/irqdomain.h>
 #include <asm/pci_x86.h>
@@ -166,12 +164,6 @@ static int __init acpi_parse_madt(struct
 	return 0;
 }
 
-static __init void acpi_register_lapic(u32 apic_id, u32 acpi_id, bool present)
-{
-	if (!xen_pv_domain())
-		topology_register_apic(apic_id, acpi_id, present);
-}
-
 static bool __init acpi_is_processor_usable(u32 lapic_flags)
 {
 	if (lapic_flags & ACPI_MADT_ENABLED)
@@ -233,7 +225,7 @@ acpi_parse_x2apic(union acpi_subtable_he
 		return 0;
 	}
 
-	acpi_register_lapic(apic_id, processor->uid, enabled);
+	topology_register_apic(apic_id, processor->uid, enabled);
 #else
 	pr_warn("x2apic entry ignored\n");
 #endif
@@ -268,9 +260,9 @@ acpi_parse_lapic(union acpi_subtable_hea
 	 * to not preallocating memory for all NR_CPUS
 	 * when we use CPU hotplug.
 	 */
-	acpi_register_lapic(processor->id,	/* APIC ID */
-			    processor->processor_id, /* ACPI ID */
-			    processor->lapic_flags & ACPI_MADT_ENABLED);
+	topology_register_apic(processor->id,	/* APIC ID */
+			       processor->processor_id, /* ACPI ID */
+			       processor->lapic_flags & ACPI_MADT_ENABLED);
 
 	has_lapic_cpus = true;
 	return 0;
@@ -288,9 +280,9 @@ acpi_parse_sapic(union acpi_subtable_hea
 
 	acpi_table_print_madt_entry(&header->common);
 
-	acpi_register_lapic((processor->id << 8) | processor->eid,/* APIC ID */
-			    processor->processor_id, /* ACPI ID */
-			    processor->lapic_flags & ACPI_MADT_ENABLED);
+	topology_register_apic((processor->id << 8) | processor->eid,/* APIC ID */
+			       processor->processor_id, /* ACPI ID */
+			       processor->lapic_flags & ACPI_MADT_ENABLED);
 
 	return 0;
 }
@@ -1090,8 +1082,7 @@ static int __init early_acpi_parse_madt_
 		return count;
 	}
 
-	if (!xen_pv_domain())
-		register_lapic_address(acpi_lapic_addr);
+	register_lapic_address(acpi_lapic_addr);
 
 	return count;
 }
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -77,7 +77,7 @@ static bool fake_topology(struct topo_sc
 	topology_set_dom(tscan, TOPO_SMT_DOMAIN, 0, 1);
 	topology_set_dom(tscan, TOPO_CORE_DOMAIN, 0, 1);
 
-	return tscan->c->cpuid_level < 1 || xen_pv_domain();
+	return tscan->c->cpuid_level < 1;
 }
 
 static void parse_topology(struct topo_scan *tscan, bool early)
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -43,20 +43,20 @@ static u32 xen_apic_read(u32 reg)
 	struct xen_platform_op op = {
 		.cmd = XENPF_get_cpuinfo,
 		.interface_version = XENPF_INTERFACE_VERSION,
-		.u.pcpu_info.xen_cpuid = 0,
 	};
-	int ret;
-
-	/* Shouldn't need this as APIC is turned off for PV, and we only
-	 * get called on the bootup processor. But just in case. */
-	if (!xen_initial_domain() || smp_processor_id())
-		return 0;
+	int ret, cpu;
 
 	if (reg == APIC_LVR)
 		return 0x14;
 	if (reg != APIC_ID)
 		return 0;
 
+	cpu = smp_processor_id();
+	if (!xen_initial_domain())
+		return cpu ? cpuid_to_apicid[cpu] << 24 : 0;
+
+	op.u.pcpu_info.xen_cpuid = cpu;
+
 	ret = HYPERVISOR_platform_op(&op);
 	if (ret)
 		op.u.pcpu_info.apic_id = BAD_APICID;
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -156,11 +156,9 @@ static void __init xen_pv_smp_config(voi
 
 	topology_register_boot_apic(apicid++);
 
-	for (i = 1; i < nr_cpu_ids; i++) {
-		if (HYPERVISOR_vcpu_op(VCPUOP_is_up, i, NULL) < 0)
-			break;
+	for (i = 1; i < nr_cpu_ids; i++)
 		topology_register_apic(apicid++, CPU_ACPIID_INVALID, true);
-	}
+
 	/* Pretend to be a proper enumerated system */
 	smp_found_config = 1;
 }
@@ -451,5 +449,10 @@ void __init xen_smp_init(void)
 	/* Avoid searching for BIOS MP tables */
 	x86_init.mpparse.find_mptable		= x86_init_noop;
 	x86_init.mpparse.early_parse_smp_cfg	= x86_init_noop;
-	x86_init.mpparse.parse_smp_cfg		= xen_pv_smp_config;
+
+	/* XEN/PV Dom0 has halfways sane topology information via CPUID/MADT */
+	if (xen_initial_domain())
+		x86_init.mpparse.parse_smp_cfg	= x86_init_noop;
+	else
+		x86_init.mpparse.parse_smp_cfg	= xen_pv_smp_config;
 }



* [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (19 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 20/30] x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-26  7:07   ` Zhang, Rui
  2024-01-23 13:11 ` [patch v2 22/30] x86/cpu/topology: Mop up primary thread mask handling Thomas Gleixner
                   ` (10 subsequent siblings)
  31 siblings, 1 reply; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Now that all possible APIC IDs are tracked in the topology bitmaps, it's
trivial to retrieve the real information from there.

This gets rid of the guesstimates for the maximal packages and dies per
package as the actual numbers can be determined before a single AP has been
brought up.

The number of SMT threads can now be determined correctly from the bitmaps
in all situations. Up to now a system which has SMT disabled in the BIOS
will still claim that it is SMT capable, because the lowest APIC ID bit is
reserved for that and CPUID leaf 0xb/0x1f still enumerates the SMT domain
accordingly. By calculating the bitmap weights of the SMT and the CORE
domains and relating them to each other, a system with SMT disabled in the
BIOS is now correctly reported as not SMT capable.

It also correctly handles the case where the boot CPU of a hybrid system
does not have SMT, as it takes the SMT capability of the APs fully into
account.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>



---
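Illustration only, not part of the patch: a user-space sketch of the
sizing arithmetic. count_order() is a stand-in for the kernel's
get_count_order() for counts >= 1, and units_per_parent() is a
hypothetical name for the "1U << (order(lower) - order(upper))" expression
used for both dies per package and threads per core.

```c
#include <assert.h>

/* Order of the smallest power of two which is >= @count, for count >= 1 */
static int count_order(unsigned int count)
{
	int order = 0;

	assert(count >= 1);

	while ((1ULL << order) < count)
		order++;
	return order;
}

/*
 * Number of lower domain units per upper domain unit, derived from the
 * bitmap weights of both domains. The result is always a power of two.
 */
static unsigned int units_per_parent(unsigned int nr_lower, unsigned int nr_upper)
{
	assert(nr_lower >= nr_upper);
	return 1U << (count_order(nr_lower) - count_order(nr_upper));
}
```

16 registered threads on 8 cores yield 2 threads per core. 8 threads on
8 cores, i.e. SMT disabled in the BIOS, yield 1. A hybrid system with
24 threads on 16 cores still yields 2 because the count is taken over all
registered APIC IDs and not just over the boot CPU.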
 arch/x86/include/asm/smp.h            |    3 +--
 arch/x86/include/asm/topology.h       |   23 ++++++++++++-----------
 arch/x86/kernel/cpu/common.c          |    9 ++++++---
 arch/x86/kernel/cpu/debugfs.c         |    2 +-
 arch/x86/kernel/cpu/topology.c        |   16 +++++++++++++++-
 arch/x86/kernel/cpu/topology_common.c |   24 ------------------------
 arch/x86/kernel/smpboot.c             |   16 ----------------
 arch/x86/xen/smp.c                    |    2 --
 8 files changed, 35 insertions(+), 60 deletions(-)
---
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -8,7 +8,7 @@
 #include <asm/current.h>
 #include <asm/thread_info.h>
 
-extern int smp_num_siblings;
+extern unsigned int smp_num_siblings;
 
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
@@ -109,7 +109,6 @@ void cpu_disable_common(void);
 void native_smp_prepare_boot_cpu(void);
 void smp_prepare_cpus_common(void);
 void native_smp_prepare_cpus(unsigned int max_cpus);
-void calculate_max_logical_packages(void);
 void native_smp_cpus_done(unsigned int max_cpus);
 int common_cpu_up(unsigned int cpunum, struct task_struct *tidle);
 int native_kick_ap(unsigned int cpu, struct task_struct *tidle);
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -143,7 +143,18 @@ extern const struct cpumask *cpu_cluster
 
 #define topology_amd_node_id(cpu)		(cpu_data(cpu).topo.amd_node_id)
 
-extern unsigned int __max_die_per_package;
+extern unsigned int __max_dies_per_package;
+extern unsigned int __max_logical_packages;
+
+static inline unsigned int topology_max_packages(void)
+{
+	return __max_logical_packages;
+}
+
+static inline unsigned int topology_max_die_per_package(void)
+{
+	return __max_dies_per_package;
+}
 
 #ifdef CONFIG_SMP
 #define topology_cluster_id(cpu)		(cpu_data(cpu).topo.l2c_id)
@@ -152,14 +163,6 @@ extern unsigned int __max_die_per_packag
 #define topology_core_cpumask(cpu)		(per_cpu(cpu_core_map, cpu))
 #define topology_sibling_cpumask(cpu)		(per_cpu(cpu_sibling_map, cpu))
 
-extern unsigned int __max_logical_packages;
-#define topology_max_packages()			(__max_logical_packages)
-
-static inline int topology_max_die_per_package(void)
-{
-	return __max_die_per_package;
-}
-
 extern int __max_smt_threads;
 
 static inline int topology_max_smt_threads(void)
@@ -193,13 +196,11 @@ static inline bool topology_is_primary_t
 }
 
 #else /* CONFIG_SMP */
-#define topology_max_packages()			(1)
 static inline int
 topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; }
 static inline int
 topology_update_die_map(unsigned int dieid, unsigned int cpu) { return 0; }
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
-static inline int topology_max_die_per_package(void) { return 1; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
 static inline unsigned int topology_amd_nodes_per_pkg(void) { return 0; };
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -72,11 +72,14 @@
 u32 elf_hwcap2 __read_mostly;
 
 /* Number of siblings per CPU package */
-int smp_num_siblings = 1;
+unsigned int smp_num_siblings __ro_after_init = 1;
 EXPORT_SYMBOL(smp_num_siblings);
 
-unsigned int __max_die_per_package __read_mostly = 1;
-EXPORT_SYMBOL(__max_die_per_package);
+unsigned int __max_dies_per_package __ro_after_init = 1;
+EXPORT_SYMBOL(__max_dies_per_package);
+
+unsigned int __max_logical_packages __ro_after_init = 1;
+EXPORT_SYMBOL(__max_logical_packages);
 
 static struct ppin_info {
 	int	feature;
--- a/arch/x86/kernel/cpu/debugfs.c
+++ b/arch/x86/kernel/cpu/debugfs.c
@@ -29,7 +29,7 @@ static int cpu_debug_show(struct seq_fil
 	seq_printf(m, "amd_node_id:         %u\n", c->topo.amd_node_id);
 	seq_printf(m, "amd_nodes_per_pkg:   %u\n", topology_amd_nodes_per_pkg());
 	seq_printf(m, "max_cores:           %u\n", c->x86_max_cores);
-	seq_printf(m, "max_die_per_pkg:     %u\n", __max_die_per_package);
+	seq_printf(m, "max_dies_per_pkg:    %u\n", __max_dies_per_package);
 	seq_printf(m, "smp_num_siblings:    %u\n", smp_num_siblings);
 	return 0;
 }
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -349,8 +349,8 @@ void __init topology_init_possible_cpus(
 {
 	unsigned int assigned = topo_info.nr_assigned_cpus;
 	unsigned int disabled = topo_info.nr_disabled_cpus;
+	unsigned int cnta, cntb, cpu, allowed = 1;
 	unsigned int total = assigned + disabled;
-	unsigned int cpu, allowed = 1;
 	u32 apicid;
 
 	if (!restrict_to_up()) {
@@ -373,6 +373,20 @@ void __init topology_init_possible_cpus(
 	total_cpus = allowed;
 	set_nr_cpu_ids(allowed);
 
+	cnta = domain_weight(TOPO_PKG_DOMAIN);
+	cntb = domain_weight(TOPO_DIE_DOMAIN);
+	__max_logical_packages = cnta;
+	__max_dies_per_package = 1U << (get_count_order(cntb) - get_count_order(cnta));
+
+	pr_info("Max. logical packages: %3u\n", cnta);
+	pr_info("Max. logical dies:     %3u\n", cntb);
+	pr_info("Max. dies per package: %3u\n", __max_dies_per_package);
+
+	cnta = domain_weight(TOPO_CORE_DOMAIN);
+	cntb = domain_weight(TOPO_SMT_DOMAIN);
+	smp_num_siblings = 1U << (get_count_order(cntb) - get_count_order(cnta));
+	pr_info("Max. threads per core: %3u\n", smp_num_siblings);
+
 	pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
 	if (topo_info.nr_rejected_cpus)
 		pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -196,16 +196,6 @@ void cpu_parse_topology(struct cpuinfo_x
 		       tscan.dom_shifts[dom], x86_topo_system.dom_shifts[dom]);
 	}
 
-	/* Bug compatible with the existing parsers */
-	if (tscan.dom_ncpus[TOPO_SMT_DOMAIN] > smp_num_siblings) {
-		if (system_state == SYSTEM_BOOTING) {
-			pr_warn_once("CPU%d: SMT detected and enabled late\n", cpu);
-			smp_num_siblings = tscan.dom_ncpus[TOPO_SMT_DOMAIN];
-		} else {
-			pr_warn_once("CPU%d: SMT detected after init. Too late!\n", cpu);
-		}
-	}
-
 	topo_set_ids(&tscan);
 	topo_set_max_cores(&tscan);
 }
@@ -232,20 +222,6 @@ void __init cpu_init_topology(struct cpu
 	topo_set_max_cores(&tscan);
 
 	/*
-	 * Bug compatible with the existing code. If the boot CPU does not
-	 * have SMT this ends up with one sibling. This needs way deeper
-	 * changes further down the road to get it right during early boot.
-	 */
-	smp_num_siblings = tscan.dom_ncpus[TOPO_SMT_DOMAIN];
-
-	/*
-	 * Neither it's clear whether there are as many dies as the APIC
-	 * space indicating die level is. But assume that the actual number
-	 * of CPUs gives a proper indication for now to stay bug compatible.
-	 */
-	__max_die_per_package = tscan.dom_ncpus[TOPO_DIE_DOMAIN] /
-		tscan.dom_ncpus[TOPO_DIE_DOMAIN - 1];
-	/*
 	 * AMD systems have Nodes per package which cannot be mapped to
 	 * APIC ID.
 	 */
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -139,8 +139,6 @@ static DEFINE_PER_CPU_READ_MOSTLY(struct
 	.phys_die_id	= U32_MAX,
 };
 
-unsigned int __max_logical_packages __read_mostly;
-EXPORT_SYMBOL(__max_logical_packages);
 static unsigned int logical_packages __read_mostly;
 static unsigned int logical_die __read_mostly;
 
@@ -1267,24 +1265,10 @@ void __init native_smp_prepare_boot_cpu(
 	native_pv_lock_init();
 }
 
-void __init calculate_max_logical_packages(void)
-{
-	int ncpus;
-
-	/*
-	 * Today neither Intel nor AMD support heterogeneous systems so
-	 * extrapolate the boot cpu's data to all packages.
-	 */
-	ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
-	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
-	pr_info("Max logical packages: %u\n", __max_logical_packages);
-}
-
 void __init native_smp_cpus_done(unsigned int max_cpus)
 {
 	pr_debug("Boot done\n");
 
-	calculate_max_logical_packages();
 	build_sched_topology();
 	nmi_selftest();
 	impress_friends();
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -123,8 +123,6 @@ void __init xen_smp_cpus_done(unsigned i
 {
 	if (xen_hvm_domain())
 		native_smp_cpus_done(max_cpus);
-	else
-		calculate_max_logical_packages();
 }
 
 void xen_smp_send_reschedule(int cpu)



* [patch v2 22/30] x86/cpu/topology: Mop up primary thread mask handling
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (20 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 23/30] x86/cpu/topology: Simplify cpu_mark_primary_thread() Thomas Gleixner
                   ` (9 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

The early initcall to initialize the primary thread mask is no longer
required because topology_init_possible_cpus() can mark primary threads
correctly when initializing the possible and present maps, as the number of
SMT threads is already determined correctly.

The XENPV workaround is no longer required because XENPV now registers
fake APIC IDs which will just work like any other enumeration.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>



---
 arch/x86/kernel/cpu/topology.c |   29 ++---------------------------
 1 file changed, 2 insertions(+), 27 deletions(-)
---
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -82,30 +82,6 @@ static void cpu_mark_primary_thread(unsi
 	if (smp_num_siblings == 1 || !(apicid & mask))
 		cpumask_set_cpu(cpu, &__cpu_primary_thread_mask);
 }
-
-/*
- * Due to the utter mess of CPUID evaluation smp_num_siblings is not valid
- * during early boot. Initialize the primary thread mask before SMP
- * bringup.
- */
-static int __init smp_init_primary_thread_mask(void)
-{
-	unsigned int cpu;
-
-	/*
-	 * XEN/PV provides either none or useless topology information.
-	 * Pretend that all vCPUs are primary threads.
-	 */
-	if (xen_pv_domain()) {
-		cpumask_copy(&__cpu_primary_thread_mask, cpu_possible_mask);
-		return 0;
-	}
-
-	for (cpu = 0; cpu < topo_info.nr_assigned_cpus; cpu++)
-		cpu_mark_primary_thread(cpu, cpuid_to_apicid[cpu]);
-	return 0;
-}
-early_initcall(smp_init_primary_thread_mask);
 #else
 static inline void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid) { }
 #endif
@@ -151,9 +127,6 @@ static void topo_set_cpuids(unsigned int
 #endif
 	set_cpu_possible(cpu, true);
 	set_cpu_present(cpu, true);
-
-	if (system_state != SYSTEM_BOOTING)
-		cpu_mark_primary_thread(cpu, apic_id);
 }
 
 static __init bool check_for_real_bsp(u32 apic_id)
@@ -276,6 +249,7 @@ int topology_hotplug_apic(u32 apic_id, u
 
 	set_bit(apic_id, phys_cpu_present_map);
 	topo_set_cpuids(cpu, apic_id, acpi_id);
+	cpu_mark_primary_thread(cpu, apic_id);
 	return cpu;
 }
 
@@ -411,6 +385,7 @@ void __init topology_init_possible_cpus(
 		if (apicid == BAD_APICID)
 			continue;
 
+		cpu_mark_primary_thread(cpu, apicid);
 		set_cpu_present(cpu, test_bit(apicid, phys_cpu_present_map));
 	}
 }



* [patch v2 23/30] x86/cpu/topology: Simplify cpu_mark_primary_thread()
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (21 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 22/30] x86/cpu/topology: Mop up primary thread mask handling Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 24/30] x86/cpu/topology: Provide logical pkg/die mapping Thomas Gleixner
                   ` (8 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

No point in creating a mask via fls(). smp_num_siblings is guaranteed to be
a power of 2. So just using (smp_num_siblings - 1) has the same effect.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>



---
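Illustration only, not part of the patch: a user-space check that both
mask computations agree for powers of two. fls_sketch() is a portable
stand-in for the kernel's fls(). The other helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 1-based index of the highest set bit, 0 if no bit is set */
static int fls_sketch(uint32_t x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/* The mask as computed before this patch; requires siblings >= 1 */
static uint32_t smt_mask_old(uint32_t siblings)
{
	assert(siblings >= 1);
	return (1U << (fls_sketch(siblings) - 1)) - 1;
}

/* The mask as computed by this patch; only valid for powers of two */
static uint32_t smt_mask_new(uint32_t siblings)
{
	assert(siblings && !(siblings & (siblings - 1)));
	return siblings - 1;
}

/* A thread is primary when all SMT bits of its APIC ID are zero */
static bool is_primary_thread(uint32_t apicid, uint32_t siblings)
{
	return !(apicid & smt_mask_new(siblings));
}
```

For smp_num_siblings == 1 the mask is 0, so every APIC ID is treated as a
primary thread. That is why the separate "== 1" check can be dropped.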
 arch/x86/kernel/cpu/topology.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
---
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -76,10 +76,7 @@ bool arch_match_cpu_phys_id(int cpu, u64
 #ifdef CONFIG_SMP
 static void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid)
 {
-	/* Isolate the SMT bit(s) in the APICID and check for 0 */
-	u32 mask = (1U << (fls(smp_num_siblings) - 1)) - 1;
-
-	if (smp_num_siblings == 1 || !(apicid & mask))
+	if (!(apicid & (smp_num_siblings - 1)))
 		cpumask_set_cpu(cpu, &__cpu_primary_thread_mask);
 }
 #else



* [patch v2 24/30] x86/cpu/topology: Provide logical pkg/die mapping
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (22 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 23/30] x86/cpu/topology: Simplify cpu_mark_primary_thread() Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 25/30] x86/cpu/topology: Use topology logical mapping mechanism Thomas Gleixner
                   ` (7 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

With the topology bitmaps in place the logical package and die IDs can
trivially be retrieved by determining the bitmap weight of the relevant
topology domain level up to and including the physical ID in question.

Provide a function to that effect.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>



---
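Illustration only, not part of the patch: a user-space sketch of the
lookup in topology_get_logical_id(). It assumes a toy 64-bit bitmap and
takes the number of APIC ID bits below the requested level as a plain
shift argument instead of deriving it from the domain level. The helper
names are hypothetical; weight_below() models bitmap_weight(map, nbits).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_APIC	64u	/* toy size; the kernel uses MAX_LOCAL_APIC */

/* Number of set bits in @map at bit positions below @nbits */
static int weight_below(uint64_t map, unsigned int nbits)
{
	int weight = 0;

	assert(nbits <= MAX_APIC);

	for (unsigned int bit = 0; bit < nbits; bit++)
		weight += (int)((map >> bit) & 1);
	return weight;
}

/*
 * @level_map has one bit set per registered unit (e.g. package) at the
 * APIC ID of that unit with all lower level bits cleared.
 * @shift is the number of APIC ID bits below the requested level.
 */
static int logical_id(uint64_t level_map, uint32_t apicid, unsigned int shift)
{
	uint32_t lvlid;

	assert(shift < 32);
	lvlid = apicid & (UINT32_MAX << shift);

	if (lvlid >= MAX_APIC)
		return -ERANGE;
	if (!(level_map & (1ULL << lvlid)))
		return -ENODEV;
	/* Count the registered units below @lvlid, so IDs start at 0 */
	return weight_below(level_map, lvlid);
}
```

With packages registered at APIC IDs 0, 16 and 48 and a shift of 4, APIC
ID 19 maps to logical package 1 and APIC ID 53 to logical package 2. The
function in the patch counts the set bits strictly below the level ID, so
the resulting logical IDs are dense and zero based even when the physical
IDs are sparse.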
 arch/x86/include/asm/topology.h |    9 +++++++++
 arch/x86/kernel/cpu/topology.c  |   28 ++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+)
---
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -156,6 +156,15 @@ static inline unsigned int topology_max_
 	return __max_dies_per_package;
 }
 
+#ifdef CONFIG_X86_LOCAL_APIC
+int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level);
+#else
+static inline int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level)
+{
+	return 0;
+}
+#endif
+
 #ifdef CONFIG_SMP
 #define topology_cluster_id(cpu)		(cpu_data(cpu).topo.l2c_id)
 #define topology_die_cpumask(cpu)		(per_cpu(cpu_die_map, cpu))
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -230,6 +230,34 @@ void __init topology_register_boot_apic(
 	topo_register_apic(apic_id, CPU_ACPIID_INVALID, true);
 }
 
+/**
+ * topology_get_logical_id - Retrieve the logical ID at a given topology domain level
+ * @apicid:		The APIC ID for which to lookup the logical ID
+ * @at_level:		The topology domain level to use
+ *
+ * @apicid must be a full APIC ID, not the normalized variant. It's valid to have
+ * all bits below the domain level specified by @at_level to be clear. So both
+ * real APIC IDs and backshifted normalized APIC IDs work correctly.
+ *
+ * Returns:
+ *  - >= 0:	The requested logical ID
+ *  - -ERANGE:	@apicid is out of range
+ *  - -ENODEV:	@apicid is not registered
+ */
+int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level)
+{
+	/* Remove the bits below @at_level to get the proper level ID of @apicid */
+	unsigned int lvlid = topo_apicid(apicid, at_level);
+
+	if (lvlid >= MAX_LOCAL_APIC)
+		return -ERANGE;
+	if (!test_bit(lvlid, apic_maps[at_level].map))
+		return -ENODEV;
+	/* Get the number of set bits before @lvlid. */
+	return bitmap_weight(apic_maps[at_level].map, lvlid);
+}
+EXPORT_SYMBOL_GPL(topology_get_logical_id);
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /**
  * topology_hotplug_apic - Handle a physical hotplugged APIC after boot



* [patch v2 25/30] x86/cpu/topology: Use topology logical mapping mechanism
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (23 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 24/30] x86/cpu/topology: Provide logical pkg/die mapping Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-02-01 22:31   ` Sohil Mehta
  2024-02-02  6:45   ` Zhang, Rui
  2024-01-23 13:11 ` [patch v2 26/30] x86/cpu/topology: Retrieve cores per package from topology bitmaps Thomas Gleixner
                   ` (6 subsequent siblings)
  31 siblings, 2 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Replace the logical package and die management functionality and retrieve
the logical IDs from the topology bitmaps.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>



---
 arch/x86/include/asm/topology.h       |   15 ++--
 arch/x86/kernel/cpu/common.c          |   13 ---
 arch/x86/kernel/cpu/topology_common.c |    4 +
 arch/x86/kernel/smpboot.c             |  111 ----------------------------------
 4 files changed, 12 insertions(+), 131 deletions(-)
---
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -172,6 +172,13 @@ static inline int topology_get_logical_i
 #define topology_core_cpumask(cpu)		(per_cpu(cpu_core_map, cpu))
 #define topology_sibling_cpumask(cpu)		(per_cpu(cpu_sibling_map, cpu))
 
+
+static inline int topology_phys_to_logical_pkg(unsigned int pkg)
+{
+	return topology_get_logical_id(pkg << x86_topo_system.dom_shifts[TOPO_PKG_DOMAIN],
+				       TOPO_PKG_DOMAIN);
+}
+
 extern int __max_smt_threads;
 
 static inline int topology_max_smt_threads(void)
@@ -181,10 +188,6 @@ static inline int topology_max_smt_threa
 
 #include <linux/cpu_smt.h>
 
-int topology_update_package_map(unsigned int apicid, unsigned int cpu);
-int topology_update_die_map(unsigned int dieid, unsigned int cpu);
-int topology_phys_to_logical_pkg(unsigned int pkg);
-
 extern unsigned int __amd_nodes_per_pkg;
 
 static inline unsigned int topology_amd_nodes_per_pkg(void)
@@ -205,10 +208,6 @@ static inline bool topology_is_primary_t
 }
 
 #else /* CONFIG_SMP */
-static inline int
-topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; }
-static inline int
-topology_update_die_map(unsigned int dieid, unsigned int cpu) { return 0; }
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1718,18 +1718,6 @@ static void generic_identify(struct cpui
 #endif
 }
 
-static void update_package_map(struct cpuinfo_x86 *c)
-{
-#ifdef CONFIG_SMP
-	unsigned int cpu = smp_processor_id();
-
-	BUG_ON(topology_update_package_map(c->topo.pkg_id, cpu));
-	BUG_ON(topology_update_die_map(c->topo.die_id, cpu));
-#else
-	c->topo.logical_pkg_id = 0;
-#endif
-}
-
 /*
  * This does the hard work of actually picking apart the CPU stuff...
  */
@@ -1913,7 +1901,6 @@ void identify_secondary_cpu(struct cpuin
 #ifdef CONFIG_X86_32
 	enable_sep_cpu();
 #endif
-	update_package_map(c);
 	x86_spec_ctrl_setup_ap();
 	update_srbds_msr();
 	if (boot_cpu_has_bug(X86_BUG_GDS))
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -10,6 +10,7 @@
 #include "cpu.h"
 
 struct x86_topology_system x86_topo_system __ro_after_init;
+EXPORT_SYMBOL_GPL(x86_topo_system);
 
 unsigned int __amd_nodes_per_pkg __ro_after_init;
 EXPORT_SYMBOL_GPL(__amd_nodes_per_pkg);
@@ -147,6 +148,9 @@ static void topo_set_ids(struct topo_sca
 	c->topo.pkg_id = topo_shift_apicid(apicid, TOPO_PKG_DOMAIN);
 	c->topo.die_id = topo_shift_apicid(apicid, TOPO_DIE_DOMAIN);
 
+	c->topo.logical_pkg_id = topology_get_logical_id(apicid, TOPO_PKG_DOMAIN);
+	c->topo.logical_die_id = topology_get_logical_id(apicid, TOPO_DIE_DOMAIN);
+
 	/* Package relative core ID */
 	c->topo.core_id = (apicid & topo_domain_mask(TOPO_PKG_DOMAIN)) >>
 		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -125,23 +125,6 @@ struct mwait_cpu_dead {
  */
 static DEFINE_PER_CPU_ALIGNED(struct mwait_cpu_dead, mwait_cpu_dead);
 
-/* Logical package management. */
-struct logical_maps {
-	u32	phys_pkg_id;
-	u32	phys_die_id;
-	u32	logical_pkg_id;
-	u32	logical_die_id;
-};
-
-/* Temporary workaround until the full topology mechanics is in place */
-static DEFINE_PER_CPU_READ_MOSTLY(struct logical_maps, logical_maps) = {
-	.phys_pkg_id	= U32_MAX,
-	.phys_die_id	= U32_MAX,
-};
-
-static unsigned int logical_packages __read_mostly;
-static unsigned int logical_die __read_mostly;
-
 /* Maximum number of SMT threads on any online core */
 int __read_mostly __max_smt_threads = 1;
 
@@ -334,103 +317,11 @@ static void notrace start_secondary(void
 	cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
 }
 
-/**
- * topology_phys_to_logical_pkg - Map a physical package id to a logical
- * @phys_pkg:	The physical package id to map
- *
- * Returns logical package id or -1 if not found
- */
-int topology_phys_to_logical_pkg(unsigned int phys_pkg)
-{
-	int cpu;
-
-	for_each_possible_cpu(cpu) {
-		if (per_cpu(logical_maps.phys_pkg_id, cpu) == phys_pkg)
-			return per_cpu(logical_maps.logical_pkg_id, cpu);
-	}
-	return -1;
-}
-EXPORT_SYMBOL(topology_phys_to_logical_pkg);
-
-/**
- * topology_phys_to_logical_die - Map a physical die id to logical
- * @die_id:	The physical die id to map
- * @cur_cpu:	The CPU for which the mapping is done
- *
- * Returns logical die id or -1 if not found
- */
-static int topology_phys_to_logical_die(unsigned int die_id, unsigned int cur_cpu)
-{
-	int cpu, proc_id = cpu_data(cur_cpu).topo.pkg_id;
-
-	for_each_possible_cpu(cpu) {
-		if (per_cpu(logical_maps.phys_pkg_id, cpu) == proc_id &&
-		    per_cpu(logical_maps.phys_die_id, cpu) == die_id)
-			return per_cpu(logical_maps.logical_die_id, cpu);
-	}
-	return -1;
-}
-
-/**
- * topology_update_package_map - Update the physical to logical package map
- * @pkg:	The physical package id as retrieved via CPUID
- * @cpu:	The cpu for which this is updated
- */
-int topology_update_package_map(unsigned int pkg, unsigned int cpu)
-{
-	int new;
-
-	/* Already available somewhere? */
-	new = topology_phys_to_logical_pkg(pkg);
-	if (new >= 0)
-		goto found;
-
-	new = logical_packages++;
-	if (new != pkg) {
-		pr_info("CPU %u Converting physical %u to logical package %u\n",
-			cpu, pkg, new);
-	}
-found:
-	per_cpu(logical_maps.phys_pkg_id, cpu) = pkg;
-	per_cpu(logical_maps.logical_pkg_id, cpu) = new;
-	cpu_data(cpu).topo.logical_pkg_id = new;
-	return 0;
-}
-/**
- * topology_update_die_map - Update the physical to logical die map
- * @die:	The die id as retrieved via CPUID
- * @cpu:	The cpu for which this is updated
- */
-int topology_update_die_map(unsigned int die, unsigned int cpu)
-{
-	int new;
-
-	/* Already available somewhere? */
-	new = topology_phys_to_logical_die(die, cpu);
-	if (new >= 0)
-		goto found;
-
-	new = logical_die++;
-	if (new != die) {
-		pr_info("CPU %u Converting physical %u to logical die %u\n",
-			cpu, die, new);
-	}
-found:
-	per_cpu(logical_maps.phys_die_id, cpu) = die;
-	per_cpu(logical_maps.logical_die_id, cpu) = new;
-	cpu_data(cpu).topo.logical_die_id = new;
-	return 0;
-}
-
 static void __init smp_store_boot_cpu_info(void)
 {
-	int id = 0; /* CPU 0 */
-	struct cpuinfo_x86 *c = &cpu_data(id);
+	struct cpuinfo_x86 *c = &cpu_data(0);
 
 	*c = boot_cpu_data;
-	c->cpu_index = id;
-	topology_update_package_map(c->topo.pkg_id, id);
-	topology_update_die_map(c->topo.die_id, id);
 	c->initialized = true;
 }
 



* [patch v2 26/30] x86/cpu/topology: Retrieve cores per package from topology bitmaps
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (24 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 25/30] x86/cpu/topology: Use topology logical mapping mechanism Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 27/30] x86/cpu/topology: Rename smp_num_siblings Thomas Gleixner
                   ` (5 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Similar to other sizing information, the number of cores per package can be
established from the topology bitmap.

Provide a function for retrieving that information and replace the buggy
hack in the CPUID evaluation with it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
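
A user-space sketch of the counting idea, for illustration only. It
assumes a hypothetical hybrid package of 8 cores with two threads
followed by 8 single-thread cores, an SMT shift of 1 and a package
shift of 5; the toy_* helpers and the 64-bit toy bitmap are not kernel
code. Counting the registered cores inside the package range gives 16,
where the removed hack computes 24 >> 1 = 12.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_APIC 64U	/* hypothetical toy limit, not MAX_LOCAL_APIC */

static bool toy_test_bit(unsigned long long map, unsigned int bit)
{
	return (map >> bit) & 1ULL;
}

/*
 * Count the bits set in @map within [@start, @start + (1 << @shift)),
 * i.e. the units registered inside one higher-level domain.
 */
static unsigned int toy_unit_count(unsigned long long map, unsigned int start,
				   unsigned int shift)
{
	unsigned int id, end, cnt = 0;

	/* Exclusive end of the range, clamped to the toy bitmap size */
	end = start + (1U << shift);
	if (end > TOY_MAX_APIC)
		end = TOY_MAX_APIC;

	for (id = start; id < end; id++) {
		if (toy_test_bit(map, id))
			cnt++;
	}
	return cnt;
}

/*
 * Core-level map of the assumed hybrid package. Each of the 16 cores
 * is recorded at the APIC ID of its first thread, so with an SMT
 * shift of 1 the cores sit at the even IDs 0, 2, ..., 30.
 */
static unsigned long long toy_hybrid_core_map(void)
{
	unsigned long long map = 0;
	unsigned int core;

	for (core = 0; core < 16; core++)
		map |= 1ULL << (core << 1);
	return map;
}
```

toy_unit_count(toy_hybrid_core_map(), 0, 5) walks the 32 APIC IDs of
the package and finds 16 cores, independent of how many of them have
SMT siblings.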



---
 arch/x86/kernel/cpu/topology.c        |   43 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/topology.h        |   11 ++++++++
 arch/x86/kernel/cpu/topology_common.c |   18 ++------------
 3 files changed, 57 insertions(+), 15 deletions(-)
---
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -217,6 +217,49 @@ int topology_get_logical_id(u32 apicid,
 }
 EXPORT_SYMBOL_GPL(topology_get_logical_id);
 
+/**
+ * topology_unit_count - Retrieve the count of specified units at a given topology domain level
+ * @apicid:		The APIC ID which specifies the search range
+ * @which_units:	The domain level specifying the units to count
+ * @at_level:		The domain level at which @which_units have to be counted
+ *
+ * This returns the number of possible units according to the enumerated
+ * information.
+ *
+ * E.g. topology_unit_count(apicid, TOPO_CORE_DOMAIN, TOPO_PKG_DOMAIN)
+ * counts the number of possible cores in the package to which @apicid
+ * belongs.
+ *
+ * @at_level must obviously be greater than @which_units to produce useful
+ * results.  If @at_level is equal to @which_units the result is
+ * unsurprisingly 1. If @at_level is less than @which_units the result
+ * is by definition undefined and the function returns 0.
+ */
+unsigned int topology_unit_count(u32 apicid, enum x86_topology_domains which_units,
+				 enum x86_topology_domains at_level)
+{
+	/* Remove the bits below @at_level to get the proper level ID of @apicid */
+	unsigned int lvlid = topo_apicid(apicid, at_level);
+	unsigned int id, end, cnt = 0;
+
+	if (lvlid >= MAX_LOCAL_APIC)
+		return 0;
+	if (!test_bit(lvlid, apic_maps[at_level].map))
+		return 0;
+	if (which_units > at_level)
+		return 0;
+	if (which_units == at_level)
+		return 1;
+
+	/* Calculate the exclusive end */
+	end = lvlid + (1U << x86_topo_system.dom_shifts[at_level]);
+	/* Unfortunately there is no bitmap_weight_range() */
+	for (id = find_next_bit(apic_maps[which_units].map, end, lvlid);
+	     id < end; id = find_next_bit(apic_maps[which_units].map, end, ++id))
+		cnt++;
+	return cnt;
+}
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /**
  * topology_hotplug_apic - Handle a physical hotplugged APIC after boot
--- a/arch/x86/kernel/cpu/topology.h
+++ b/arch/x86/kernel/cpu/topology.h
@@ -53,4 +53,15 @@ static inline void topology_update_dom(s
 	tscan->dom_ncpus[dom] = ncpus;
 }
 
+#ifdef CONFIG_X86_LOCAL_APIC
+unsigned int topology_unit_count(u32 apicid, enum x86_topology_domains which_units,
+				 enum x86_topology_domains at_level);
+#else
+static inline unsigned int topology_unit_count(u32 apicid, enum x86_topology_domains which_units,
+					       enum x86_topology_domains at_level)
+{
+	return 1;
+}
+#endif
+
 #endif /* ARCH_X86_TOPOLOGY_H */
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -155,25 +155,15 @@ static void topo_set_ids(struct topo_sca
 	c->topo.core_id = (apicid & topo_domain_mask(TOPO_PKG_DOMAIN)) >>
 		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
 
+	/* Maximum number of cores on this package */
+	c->x86_max_cores = topology_unit_count(apicid, TOPO_CORE_DOMAIN, TOPO_PKG_DOMAIN);
+
 	c->topo.amd_node_id = tscan->amd_node_id;
 
 	if (c->x86_vendor == X86_VENDOR_AMD)
 		cpu_topology_fixup_amd(tscan);
 }
 
-static void topo_set_max_cores(struct topo_scan *tscan)
-{
-	/*
-	 * Bug compatible for now. This is broken on hybrid systems:
-	 * 8 cores SMT + 8 cores w/o SMT
-	 * tscan.dom_ncpus[TOPO_DIEGRP_DOMAIN] = 24; 24 / 2 = 12 !!
-	 *
-	 * Cannot be fixed without further topology enumeration changes.
-	 */
-	tscan->c->x86_max_cores = tscan->dom_ncpus[TOPO_DIEGRP_DOMAIN] >>
-		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
-}
-
 void cpu_parse_topology(struct cpuinfo_x86 *c)
 {
 	unsigned int dom, cpu = smp_processor_id();
@@ -201,7 +191,6 @@ void cpu_parse_topology(struct cpuinfo_x
 	}
 
 	topo_set_ids(&tscan);
-	topo_set_max_cores(&tscan);
 }
 
 void __init cpu_init_topology(struct cpuinfo_x86 *c)
@@ -223,7 +212,6 @@ void __init cpu_init_topology(struct cpu
 	}
 
 	topo_set_ids(&tscan);
-	topo_set_max_cores(&tscan);
 
 	/*
 	 * AMD systems have Nodes per package which cannot be mapped to



* [patch v2 27/30] x86/cpu/topology: Rename smp_num_siblings
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (25 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 26/30] x86/cpu/topology: Retrieve cores per package from topology bitmaps Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 28/30] x86/cpu/topology: Rename topology_max_die_per_package() Thomas Gleixner
                   ` (4 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

smp_num_siblings is a non-intuitive name. Rename it to
__max_threads_per_core, which says what the variable holds.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
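
To make the meaning of the renamed variable concrete, here is a
stand-alone sketch of how the value is derived in
topology_init_possible_cpus() and how cpu_mark_primary_thread() uses
it. The toy_* functions are hypothetical user-space stand-ins;
toy_count_order() is assumed to mirror get_count_order(), i.e. the
rounded-up log2 of the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Rounded-up log2 of @count; -1 for a count of zero. */
static int toy_count_order(unsigned int count)
{
	int order = 0;

	if (count == 0)
		return -1;
	while (order < 31 && (1U << order) < count)
		order++;
	return order;
}

/*
 * Derive the maximum threads per core from the number of registered
 * cores and threads. Assumes @nr_threads >= @nr_cores >= 1.
 */
static unsigned int toy_max_threads_per_core(unsigned int nr_cores,
					     unsigned int nr_threads)
{
	return 1U << (toy_count_order(nr_threads) - toy_count_order(nr_cores));
}

/* A thread is primary when its SMT bits in the APIC ID are all zero. */
static bool toy_is_primary_thread(unsigned int apicid,
				  unsigned int max_threads_per_core)
{
	return !(apicid & (max_threads_per_core - 1));
}
```

For an assumed hybrid system with 16 cores and 24 threads the orders
are 4 and 5, so the result is 2 even though only half of the cores have
a sibling. That is why "max threads per core" describes the value
better than "number of siblings".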



---
 arch/x86/include/asm/perf_event_p4.h |    4 ++--
 arch/x86/include/asm/smp.h           |    2 --
 arch/x86/include/asm/topology.h      |    1 +
 arch/x86/kernel/cpu/common.c         |    6 +++---
 arch/x86/kernel/cpu/debugfs.c        |    2 +-
 arch/x86/kernel/cpu/mce/inject.c     |    2 +-
 arch/x86/kernel/cpu/topology.c       |    6 +++---
 arch/x86/kernel/process.c            |    2 +-
 arch/x86/kernel/smpboot.c            |    2 +-
 9 files changed, 13 insertions(+), 14 deletions(-)
---
--- a/arch/x86/include/asm/perf_event_p4.h
+++ b/arch/x86/include/asm/perf_event_p4.h
@@ -181,7 +181,7 @@ static inline u64 p4_clear_ht_bit(u64 co
 static inline int p4_ht_active(void)
 {
 #ifdef CONFIG_SMP
-	return smp_num_siblings > 1;
+	return __max_threads_per_core > 1;
 #endif
 	return 0;
 }
@@ -189,7 +189,7 @@ static inline int p4_ht_active(void)
 static inline int p4_ht_thread(int cpu)
 {
 #ifdef CONFIG_SMP
-	if (smp_num_siblings == 2)
+	if (__max_threads_per_core == 2)
 		return cpu != cpumask_first(this_cpu_cpumask_var_ptr(cpu_sibling_map));
 #endif
 	return 0;
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -8,8 +8,6 @@
 #include <asm/current.h>
 #include <asm/thread_info.h>
 
-extern unsigned int smp_num_siblings;
-
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -145,6 +145,7 @@ extern const struct cpumask *cpu_cluster
 
 extern unsigned int __max_dies_per_package;
 extern unsigned int __max_logical_packages;
+extern unsigned int __max_threads_per_core;
 
 static inline unsigned int topology_max_packages(void)
 {
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -72,8 +72,8 @@
 u32 elf_hwcap2 __read_mostly;
 
 /* Number of siblings per CPU package */
-unsigned int smp_num_siblings __ro_after_init = 1;
-EXPORT_SYMBOL(smp_num_siblings);
+unsigned int __max_threads_per_core __ro_after_init = 1;
+EXPORT_SYMBOL(__max_threads_per_core);
 
 unsigned int __max_dies_per_package __ro_after_init = 1;
 EXPORT_SYMBOL(__max_dies_per_package);
@@ -2249,7 +2249,7 @@ void __init arch_cpu_finalize_init(void)
 	 * identify_boot_cpu() initialized SMT support information, let the
 	 * core code know.
 	 */
-	cpu_smt_set_num_threads(smp_num_siblings, smp_num_siblings);
+	cpu_smt_set_num_threads(__max_threads_per_core, __max_threads_per_core);
 
 	if (!IS_ENABLED(CONFIG_SMP)) {
 		pr_info("CPU: ");
--- a/arch/x86/kernel/cpu/debugfs.c
+++ b/arch/x86/kernel/cpu/debugfs.c
@@ -30,7 +30,7 @@ static int cpu_debug_show(struct seq_fil
 	seq_printf(m, "amd_nodes_per_pkg:   %u\n", topology_amd_nodes_per_pkg());
 	seq_printf(m, "max_cores:           %u\n", c->x86_max_cores);
 	seq_printf(m, "max_dies_per_pkg:    %u\n", __max_dies_per_package);
-	seq_printf(m, "smp_num_siblings:    %u\n", smp_num_siblings);
+	seq_printf(m, "max_threads_per_core:%u\n", __max_threads_per_core);
 	return 0;
 }
 
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -433,7 +433,7 @@ static u32 get_nbc_for_node(int node_id)
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 	u32 cores_per_node;
 
-	cores_per_node = (c->x86_max_cores * smp_num_siblings) / topology_amd_nodes_per_pkg();
+	cores_per_node = (c->x86_max_cores * __max_threads_per_core) / topology_amd_nodes_per_pkg();
 	return cores_per_node * node_id;
 }
 
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -76,7 +76,7 @@ bool arch_match_cpu_phys_id(int cpu, u64
 #ifdef CONFIG_SMP
 static void cpu_mark_primary_thread(unsigned int cpu, unsigned int apicid)
 {
-	if (!(apicid & (smp_num_siblings - 1)))
+	if (!(apicid & (__max_threads_per_core - 1)))
 		cpumask_set_cpu(cpu, &__cpu_primary_thread_mask);
 }
 #else
@@ -417,8 +417,8 @@ void __init topology_init_possible_cpus(
 
 	cnta = domain_weight(TOPO_CORE_DOMAIN);
 	cntb = domain_weight(TOPO_SMT_DOMAIN);
-	smp_num_siblings = 1U << (get_count_order(cntb) - get_count_order(cnta));
-	pr_info("Max. threads per core: %3u\n", smp_num_siblings);
+	__max_threads_per_core = 1U << (get_count_order(cntb) - get_count_order(cnta));
+	pr_info("Max. threads per core: %3u\n", __max_threads_per_core);
 
 	pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
 	if (topo_info.nr_rejected_cpus)
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -936,7 +936,7 @@ static __cpuidle void mwait_idle(void)
 void select_idle_routine(const struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
-	if (boot_option_idle_override == IDLE_POLL && smp_num_siblings > 1)
+	if (boot_option_idle_override == IDLE_POLL && __max_threads_per_core > 1)
 		pr_warn_once("WARNING: polling idle and HT enabled, performance may degrade\n");
 #endif
 	if (x86_idle_set() || boot_option_idle_override == IDLE_POLL)
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -563,7 +563,7 @@ static void __init build_sched_topology(
 
 void set_cpu_sibling_map(int cpu)
 {
-	bool has_smt = smp_num_siblings > 1;
+	bool has_smt = __max_threads_per_core > 1;
 	bool has_mp = has_smt || boot_cpu_data.x86_max_cores > 1;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	struct cpuinfo_x86 *o;



* [patch v2 28/30] x86/cpu/topology: Rename topology_max_die_per_package()
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (26 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 27/30] x86/cpu/topology: Rename smp_num_siblings Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 29/30] x86/cpu/topology: Provide __num_[cores|threads]_per_package Thomas Gleixner
                   ` (3 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

The plural of die is dies. Rename the accessor to
topology_max_dies_per_package().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>



---
 arch/x86/events/intel/cstate.c                                 |    2 +-
 arch/x86/events/intel/uncore.c                                 |    2 +-
 arch/x86/events/intel/uncore_snbep.c                           |    2 +-
 arch/x86/events/rapl.c                                         |    2 +-
 arch/x86/include/asm/topology.h                                |    2 +-
 drivers/hwmon/coretemp.c                                       |    2 +-
 drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c |    2 +-
 drivers/powercap/intel_rapl_common.c                           |    2 +-
 drivers/thermal/intel/intel_hfi.c                              |    2 +-
 drivers/thermal/intel/intel_powerclamp.c                       |    2 +-
 drivers/thermal/intel/x86_pkg_temp_thermal.c                   |    2 +-
 11 files changed, 11 insertions(+), 11 deletions(-)
---
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -834,7 +834,7 @@ static int __init cstate_init(void)
 	}
 
 	if (has_cstate_pkg) {
-		if (topology_max_die_per_package() > 1) {
+		if (topology_max_dies_per_package() > 1) {
 			err = perf_pmu_register(&cstate_pkg_pmu,
 						"cstate_die", -1);
 		} else {
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1893,7 +1893,7 @@ static int __init intel_uncore_init(void
 		return -ENODEV;
 
 	__uncore_max_dies =
-		topology_max_packages() * topology_max_die_per_package();
+		topology_max_packages() * topology_max_dies_per_package();
 
 	id = x86_match_cpu(intel_uncore_match);
 	if (!id) {
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -1406,7 +1406,7 @@ static int topology_gidnid_map(int nodei
 	 */
 	for (i = 0; i < 8; i++) {
 		if (nodeid == GIDNIDMAP(gidnid, i)) {
-			if (topology_max_die_per_package() > 1)
+			if (topology_max_dies_per_package() > 1)
 				die_id = i;
 			else
 				die_id = topology_phys_to_logical_pkg(i);
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -674,7 +674,7 @@ static const struct attribute_group *rap
 
 static int __init init_rapl_pmus(void)
 {
-	int maxdie = topology_max_packages() * topology_max_die_per_package();
+	int maxdie = topology_max_packages() * topology_max_dies_per_package();
 	size_t size;
 
 	size = sizeof(*rapl_pmus) + maxdie * sizeof(struct rapl_pmu *);
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -152,7 +152,7 @@ static inline unsigned int topology_max_
 	return __max_logical_packages;
 }
 
-static inline unsigned int topology_max_die_per_package(void)
+static inline unsigned int topology_max_dies_per_package(void)
 {
 	return __max_dies_per_package;
 }
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -780,7 +780,7 @@ static int __init coretemp_init(void)
 	if (!x86_match_cpu(coretemp_ids))
 		return -ENODEV;
 
-	max_zones = topology_max_packages() * topology_max_die_per_package();
+	max_zones = topology_max_packages() * topology_max_dies_per_package();
 	zone_devices = kcalloc(max_zones, sizeof(struct platform_device *),
 			      GFP_KERNEL);
 	if (!zone_devices)
--- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c
+++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c
@@ -242,7 +242,7 @@ static int __init intel_uncore_init(void
 		return -ENODEV;
 
 	uncore_max_entries = topology_max_packages() *
-					topology_max_die_per_package();
+					topology_max_dies_per_package();
 	uncore_instances = kcalloc(uncore_max_entries,
 				   sizeof(*uncore_instances), GFP_KERNEL);
 	if (!uncore_instances)
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -1564,7 +1564,7 @@ struct rapl_package *rapl_add_package(in
 	if (id_is_cpu) {
 		rp->id = topology_logical_die_id(id);
 		rp->lead_cpu = id;
-		if (topology_max_die_per_package() > 1)
+		if (topology_max_dies_per_package() > 1)
 			snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d",
 				 topology_physical_package_id(id), topology_die_id(id));
 		else
--- a/drivers/thermal/intel/intel_hfi.c
+++ b/drivers/thermal/intel/intel_hfi.c
@@ -581,7 +581,7 @@ void __init intel_hfi_init(void)
 
 	/* There is one HFI instance per die/package. */
 	max_hfi_instances = topology_max_packages() *
-			    topology_max_die_per_package();
+			    topology_max_dies_per_package();
 
 	/*
 	 * This allocation may fail. CPU hotplug callbacks must check
--- a/drivers/thermal/intel/intel_powerclamp.c
+++ b/drivers/thermal/intel/intel_powerclamp.c
@@ -616,7 +616,7 @@ static int powerclamp_idle_injection_reg
 	poll_pkg_cstate_enable = false;
 	if (cpumask_equal(cpu_present_mask, idle_injection_cpu_mask)) {
 		ii_dev = idle_inject_register_full(idle_injection_cpu_mask, idle_inject_update);
-		if (topology_max_packages() == 1 && topology_max_die_per_package() == 1)
+		if (topology_max_packages() == 1 && topology_max_dies_per_package() == 1)
 			poll_pkg_cstate_enable = true;
 	} else {
 		ii_dev = idle_inject_register(idle_injection_cpu_mask);
--- a/drivers/thermal/intel/x86_pkg_temp_thermal.c
+++ b/drivers/thermal/intel/x86_pkg_temp_thermal.c
@@ -494,7 +494,7 @@ static int __init pkg_temp_thermal_init(
 	if (!x86_match_cpu(pkg_temp_thermal_ids))
 		return -ENODEV;
 
-	max_id = topology_max_packages() * topology_max_die_per_package();
+	max_id = topology_max_packages() * topology_max_dies_per_package();
 	zones = kcalloc(max_id, sizeof(struct zone_device *),
 			   GFP_KERNEL);
 	if (!zones)



* [patch v2 29/30] x86/cpu/topology: Provide __num_[cores|threads]_per_package
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (27 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 28/30] x86/cpu/topology: Rename topology_max_die_per_package() Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-23 13:11 ` [patch v2 30/30] x86/cpu/topology: Get rid of cpuinfo::x86_max_cores Thomas Gleixner
                   ` (2 subsequent siblings)
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Expose properly accounted information and accessors so the fiddling with
other topology variables can be replaced.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>



---
 arch/x86/include/asm/topology.h |   12 ++++++++++++
 arch/x86/kernel/cpu/common.c    |    6 ++++++
 arch/x86/kernel/cpu/topology.c  |    8 +++++++-
 3 files changed, 25 insertions(+), 1 deletion(-)
---
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -146,6 +146,8 @@ extern const struct cpumask *cpu_cluster
 extern unsigned int __max_dies_per_package;
 extern unsigned int __max_logical_packages;
 extern unsigned int __max_threads_per_core;
+extern unsigned int __num_threads_per_package;
+extern unsigned int __num_cores_per_package;
 
 static inline unsigned int topology_max_packages(void)
 {
@@ -157,6 +159,16 @@ static inline unsigned int topology_max_
 	return __max_dies_per_package;
 }
 
+static inline unsigned int topology_num_cores_per_package(void)
+{
+	return __num_cores_per_package;
+}
+
+static inline unsigned int topology_num_threads_per_package(void)
+{
+	return __num_threads_per_package;
+}
+
 #ifdef CONFIG_X86_LOCAL_APIC
 int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level);
 #else
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -81,6 +81,12 @@ EXPORT_SYMBOL(__max_dies_per_package);
 unsigned int __max_logical_packages __ro_after_init = 1;
 EXPORT_SYMBOL(__max_logical_packages);
 
+unsigned int __num_cores_per_package __ro_after_init = 1;
+EXPORT_SYMBOL(__num_cores_per_package);
+
+unsigned int __num_threads_per_package __ro_after_init = 1;
+EXPORT_SYMBOL(__num_threads_per_package);
+
 static struct ppin_info {
 	int	feature;
 	int	msr_ppin_ctl;
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -386,7 +386,7 @@ void __init topology_init_possible_cpus(
 	unsigned int disabled = topo_info.nr_disabled_cpus;
 	unsigned int cnta, cntb, cpu, allowed = 1;
 	unsigned int total = assigned + disabled;
-	u32 apicid;
+	u32 apicid, firstid;
 
 	if (!restrict_to_up()) {
 		if (WARN_ON_ONCE(assigned > nr_cpu_ids)) {
@@ -422,6 +422,12 @@ void __init topology_init_possible_cpus(
 	__max_threads_per_core = 1U << (get_count_order(cntb) - get_count_order(cnta));
 	pr_info("Max. threads per core: %3u\n", __max_threads_per_core);
 
+	firstid = find_first_bit(apic_maps[TOPO_SMT_DOMAIN].map, MAX_LOCAL_APIC);
+	__num_cores_per_package = topology_unit_count(firstid, TOPO_CORE_DOMAIN, TOPO_PKG_DOMAIN);
+	pr_info("Num. cores per package:   %3u\n", __num_cores_per_package);
+	__num_threads_per_package = topology_unit_count(firstid, TOPO_SMT_DOMAIN, TOPO_PKG_DOMAIN);
+	pr_info("Num. threads per package: %3u\n", __num_threads_per_package);
+
 	pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
 	if (topo_info.nr_rejected_cpus)
 		pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);



* [patch v2 30/30] x86/cpu/topology: Get rid of cpuinfo::x86_max_cores
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (28 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 29/30] x86/cpu/topology: Provide __num_[cores|threads]_per_package Thomas Gleixner
@ 2024-01-23 13:11 ` Thomas Gleixner
  2024-01-24 14:31 ` [patch v2 00/30] x86/apic: Rework APIC registration Zhang, Rui
  2024-02-01 22:10 ` Sohil Mehta
  31 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-23 13:11 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Michael Kelley, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de>

Now that __num_cores_per_package and __num_threads_per_package are
available, cpuinfo::x86_max_cores and the related math all over the place
can be replaced with the ready-to-consume data.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
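
As an illustration of why the open-coded math is worth dropping,
consider the formula this patch removes from the documentation,
threads_per_package = x86_max_cores * smp_num_siblings. The sketch
below is not kernel code and assumes a hypothetical hybrid package with
8 two-thread cores (APIC IDs 0..15) and 8 single-thread cores (even
APIC IDs 16..30). Multiplying overcounts, while counting the registered
threads does not.

```c
#include <assert.h>

/* Population count over a 64-bit toy bitmap. */
static unsigned int toy_weight(unsigned long long map)
{
	unsigned int cnt = 0;

	/* Clear the lowest set bit on every iteration */
	for (; map; map &= map - 1)
		cnt++;
	return cnt;
}

/* Thread-level map of the assumed hybrid package. */
static unsigned long long toy_hybrid_thread_map(void)
{
	unsigned long long map = 0;
	unsigned int id;

	for (id = 0; id < 16; id++)		/* 8 cores, two threads each */
		map |= 1ULL << id;
	for (id = 16; id < 32; id += 2)		/* 8 cores, one thread each */
		map |= 1ULL << id;
	return map;
}

/* The removed documentation formula. */
static unsigned int toy_threads_by_multiplication(unsigned int cores,
						  unsigned int max_threads_per_core)
{
	return cores * max_threads_per_core;
}
```

With 16 cores and a maximum of 2 threads per core the multiplication
yields 32, whereas the bitmap holds 24 threads.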



---
 Documentation/arch/x86/topology.rst              |   24 ++++++++---------------
 arch/x86/events/intel/uncore_nhmex.c             |    4 +--
 arch/x86/events/intel/uncore_snb.c               |    8 +++----
 arch/x86/events/intel/uncore_snbep.c             |   16 +++++++--------
 arch/x86/include/asm/processor.h                 |    2 -
 arch/x86/kernel/cpu/cacheinfo.c                  |    2 -
 arch/x86/kernel/cpu/common.c                     |    1 
 arch/x86/kernel/cpu/debugfs.c                    |    3 +-
 arch/x86/kernel/cpu/mce/inject.c                 |    3 --
 arch/x86/kernel/cpu/microcode/intel.c            |    2 -
 arch/x86/kernel/cpu/topology_common.c            |    3 --
 arch/x86/kernel/smpboot.c                        |    2 -
 drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c |    2 -
 drivers/hwmon/fam15h_power.c                     |    2 -
 14 files changed, 31 insertions(+), 43 deletions(-)
---
--- a/Documentation/arch/x86/topology.rst
+++ b/Documentation/arch/x86/topology.rst
@@ -47,17 +47,21 @@ AMD nomenclature for package is 'Node'.
 
 Package-related topology information in the kernel:
 
-  - cpuinfo_x86.x86_max_cores:
+  - topology_num_threads_per_package()
 
-    The number of cores in a package. This information is retrieved via CPUID.
+    The number of threads in a package.
 
-  - cpuinfo_x86.x86_max_dies:
+  - topology_num_cores_per_package()
 
-    The number of dies in a package. This information is retrieved via CPUID.
+    The number of cores in a package.
+
+  - topology_max_dies_per_package()
+
+    The maximum number of dies in a package.
 
   - cpuinfo_x86.topo.die_id:
 
-    The physical ID of the die. This information is retrieved via CPUID.
+    The physical ID of the die.
 
   - cpuinfo_x86.topo.pkg_id:
 
@@ -96,16 +100,6 @@ are SMT- or CMT-type threads.
 AMDs nomenclature for a CMT core is "Compute Unit". The kernel always uses
 "core".
 
-Core-related topology information in the kernel:
-
-  - smp_num_siblings:
-
-    The number of threads in a core. The number of threads in a package can be
-    calculated by::
-
-	threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings
-
-
 Threads
 =======
 A thread is a single scheduling unit. It's the equivalent to a logical Linux
--- a/arch/x86/events/intel/uncore_nhmex.c
+++ b/arch/x86/events/intel/uncore_nhmex.c
@@ -1221,8 +1221,8 @@ void nhmex_uncore_cpu_init(void)
 		uncore_nhmex = true;
 	else
 		nhmex_uncore_mbox.event_descs = wsmex_uncore_mbox_events;
-	if (nhmex_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
-		nhmex_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+	if (nhmex_uncore_cbox.num_boxes > topology_num_cores_per_package())
+		nhmex_uncore_cbox.num_boxes = topology_num_cores_per_package();
 	uncore_msr_uncores = nhmex_msr_uncores;
 }
 /* end of Nehalem-EX uncore support */
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -364,8 +364,8 @@ static struct intel_uncore_type *snb_msr
 void snb_uncore_cpu_init(void)
 {
 	uncore_msr_uncores = snb_msr_uncores;
-	if (snb_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
-		snb_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+	if (snb_uncore_cbox.num_boxes > topology_num_cores_per_package())
+		snb_uncore_cbox.num_boxes = topology_num_cores_per_package();
 }
 
 static void skl_uncore_msr_init_box(struct intel_uncore_box *box)
@@ -428,8 +428,8 @@ static struct intel_uncore_type *skl_msr
 void skl_uncore_cpu_init(void)
 {
 	uncore_msr_uncores = skl_msr_uncores;
-	if (skl_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
-		skl_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+	if (skl_uncore_cbox.num_boxes > topology_num_cores_per_package())
+		skl_uncore_cbox.num_boxes = topology_num_cores_per_package();
 	snb_uncore_arb.ops = &skl_uncore_msr_ops;
 }
 
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -1172,8 +1172,8 @@ static struct intel_uncore_type *snbep_m
 
 void snbep_uncore_cpu_init(void)
 {
-	if (snbep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
-		snbep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+	if (snbep_uncore_cbox.num_boxes > topology_num_cores_per_package())
+		snbep_uncore_cbox.num_boxes = topology_num_cores_per_package();
 	uncore_msr_uncores = snbep_msr_uncores;
 }
 
@@ -1845,8 +1845,8 @@ static struct intel_uncore_type *ivbep_m
 
 void ivbep_uncore_cpu_init(void)
 {
-	if (ivbep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
-		ivbep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+	if (ivbep_uncore_cbox.num_boxes > topology_num_cores_per_package())
+		ivbep_uncore_cbox.num_boxes = topology_num_cores_per_package();
 	uncore_msr_uncores = ivbep_msr_uncores;
 }
 
@@ -2917,8 +2917,8 @@ static bool hswep_has_limit_sbox(unsigne
 
 void hswep_uncore_cpu_init(void)
 {
-	if (hswep_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
-		hswep_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+	if (hswep_uncore_cbox.num_boxes > topology_num_cores_per_package())
+		hswep_uncore_cbox.num_boxes = topology_num_cores_per_package();
 
 	/* Detect 6-8 core systems with only two SBOXes */
 	if (hswep_has_limit_sbox(HSWEP_PCU_DID))
@@ -3280,8 +3280,8 @@ static struct event_constraint bdx_uncor
 
 void bdx_uncore_cpu_init(void)
 {
-	if (bdx_uncore_cbox.num_boxes > boot_cpu_data.x86_max_cores)
-		bdx_uncore_cbox.num_boxes = boot_cpu_data.x86_max_cores;
+	if (bdx_uncore_cbox.num_boxes > topology_num_cores_per_package())
+		bdx_uncore_cbox.num_boxes = topology_num_cores_per_package();
 	uncore_msr_uncores = bdx_msr_uncores;
 
 	/* Detect systems with no SBOXes */
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -149,8 +149,6 @@ struct cpuinfo_x86 {
 	unsigned long		loops_per_jiffy;
 	/* protected processor identification number */
 	u64			ppin;
-	/* cpuid returned max cores value: */
-	u16			x86_max_cores;
 	u16			x86_clflush_size;
 	/* number of cores as seen by the OS: */
 	u16			booted_cores;
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -301,7 +301,7 @@ amd_cpuid4(int leaf, union _cpuid4_leaf_
 	eax->split.type = types[leaf];
 	eax->split.level = levels[leaf];
 	eax->split.num_threads_sharing = 0;
-	eax->split.num_cores_on_die = __this_cpu_read(cpu_info.x86_max_cores) - 1;
+	eax->split.num_cores_on_die = topology_num_cores_per_package();
 
 
 	if (assoc == 0xffff)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1737,7 +1737,6 @@ static void identify_cpu(struct cpuinfo_
 	c->x86_model = c->x86_stepping = 0;	/* So far unknown... */
 	c->x86_vendor_id[0] = '\0'; /* Unset */
 	c->x86_model_id[0] = '\0';  /* Unset */
-	c->x86_max_cores = 1;
 #ifdef CONFIG_X86_64
 	c->x86_clflush_size = 64;
 	c->x86_phys_bits = 36;
--- a/arch/x86/kernel/cpu/debugfs.c
+++ b/arch/x86/kernel/cpu/debugfs.c
@@ -28,7 +28,8 @@ static int cpu_debug_show(struct seq_fil
 	seq_printf(m, "l2c_id:              %u\n", c->topo.l2c_id);
 	seq_printf(m, "amd_node_id:         %u\n", c->topo.amd_node_id);
 	seq_printf(m, "amd_nodes_per_pkg:   %u\n", topology_amd_nodes_per_pkg());
-	seq_printf(m, "max_cores:           %u\n", c->x86_max_cores);
+	seq_printf(m, "num_threads:         %u\n", __num_threads_per_package);
+	seq_printf(m, "num_cores:           %u\n", __num_cores_per_package);
 	seq_printf(m, "max_dies_per_pkg:    %u\n", __max_dies_per_package);
 	seq_printf(m, "max_threads_per_core:%u\n", __max_threads_per_core);
 	return 0;
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -430,10 +430,9 @@ static void trigger_thr_int(void *info)
 
 static u32 get_nbc_for_node(int node_id)
 {
-	struct cpuinfo_x86 *c = &boot_cpu_data;
 	u32 cores_per_node;
 
-	cores_per_node = (c->x86_max_cores * __max_threads_per_core) / topology_amd_nodes_per_pkg();
+	cores_per_node = topology_num_threads_per_package() / topology_amd_nodes_per_pkg();
 	return cores_per_node * node_id;
 }
 
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -641,7 +641,7 @@ static __init void calc_llc_size_per_cor
 {
 	u64 llc_size = c->x86_cache_size * 1024ULL;
 
-	do_div(llc_size, c->x86_max_cores);
+	do_div(llc_size, topology_num_cores_per_package());
 	llc_size_per_core = (unsigned int)llc_size;
 }
 
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -155,9 +155,6 @@ static void topo_set_ids(struct topo_sca
 	c->topo.core_id = (apicid & topo_domain_mask(TOPO_PKG_DOMAIN)) >>
 		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
 
-	/* Maximum number of cores on this package */
-	c->x86_max_cores = topology_unit_count(apicid, TOPO_CORE_DOMAIN, TOPO_PKG_DOMAIN);
-
 	c->topo.amd_node_id = tscan->amd_node_id;
 
 	if (c->x86_vendor == X86_VENDOR_AMD)
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -564,7 +564,7 @@ static void __init build_sched_topology(
 void set_cpu_sibling_map(int cpu)
 {
 	bool has_smt = __max_threads_per_core > 1;
-	bool has_mp = has_smt || boot_cpu_data.x86_max_cores > 1;
+	bool has_mp = has_smt || topology_num_cores_per_package() > 1;
 	struct cpuinfo_x86 *c = &cpu_data(cpu);
 	struct cpuinfo_x86 *o;
 	int i, threads;
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -451,7 +451,7 @@ static int vangogh_init_smc_tables(struc
 
 #ifdef CONFIG_X86
 	/* AMD x86 APU only */
-	smu->cpu_core_num = boot_cpu_data.x86_max_cores;
+	smu->cpu_core_num = topology_num_cores_per_package();
 #else
 	smu->cpu_core_num = 4;
 #endif
--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -209,7 +209,7 @@ static ssize_t power1_average_show(struc
 	 * With the new x86 topology modelling, x86_max_cores is the
 	 * compute unit number.
 	 */
-	cu_num = boot_cpu_data.x86_max_cores;
+	cu_num = topology_num_cores_per_package();
 
 	ret = read_registers(data);
 	if (ret)



* Re: [patch v2 00/30] x86/apic: Rework APIC registration
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (29 preceding siblings ...)
  2024-01-23 13:11 ` [patch v2 30/30] x86/cpu/topology: Get rid of cpuinfo::x86_max_cores Thomas Gleixner
@ 2024-01-24 14:31 ` Zhang, Rui
  2024-02-01 22:10 ` Sohil Mehta
  31 siblings, 0 replies; 44+ messages in thread
From: Zhang, Rui @ 2024-01-24 14:31 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: arjan, mhklinux, andrew.cooper3, ray.huang, thomas.lendacky,
	Wang, Wendy, Sivanich, Dimitri, Tang, Feng, kan.liang, Mehta,
	Sohil, peterz, paulmck, kprateek.nayak, jgross, andy, x86

> 
> The current series applies on top of 
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cleanup-v2
> 
> and is available from git here:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-full-v2
> 

Hi, Thomas,

Great to see the update. We do have a couple of CPU topology bugs that
rely on this rework.

Wendy and I will test all the 3 patch sets on a series of machines, but
this may take some time.

thanks,
rui


* Re: [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing
  2024-01-23 13:11 ` [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing Thomas Gleixner
@ 2024-01-26  7:07   ` Zhang, Rui
  2024-01-26 20:22     ` Thomas Gleixner
  0 siblings, 1 reply; 44+ messages in thread
From: Zhang, Rui @ 2024-01-26  7:07 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: Raj, Ashok, mhklinux, arjan, ray.huang, thomas.lendacky,
	andrew.cooper3, Sivanich, Dimitri, Tang, Feng, kan.liang, Mehta,
	Sohil, peterz, paulmck, kprateek.nayak, jgross, andy, x86


> >  
> > +       cnta = domain_weight(TOPO_PKG_DOMAIN);
> > +       cntb = domain_weight(TOPO_DIE_DOMAIN);
> > +       __max_logical_packages = cnta;
> > +       __max_dies_per_package = 1U << (get_count_order(cntb) - get_count_order(cnta));
> > +
> > +       pr_info("Max. logical packages: %3u\n", cnta);
> > +       pr_info("Max. logical dies:     %3u\n", cntb);
> > +       pr_info("Max. dies per package: %3u\n", __max_dies_per_package);
> > +
> > +       cnta = domain_weight(TOPO_CORE_DOMAIN);
> > +       cntb = domain_weight(TOPO_SMT_DOMAIN);
> > +       smp_num_siblings = 1U << (get_count_order(cntb) - get_count_order(cnta));
> > +       pr_info("Max. threads per core: %3u\n", smp_num_siblings);
> > +

I missed this, but Ashok caught it.

Say, on my Alder Lake platform, which has 4 P-cores with HT + 8 E-cores,
cnta is 12, cntb is 16, and smp_num_siblings is erroneously set to 1.

I think we should use
	smp_num_siblings = DIV_ROUND_UP(cntb, cnta);
here.
Or even check each core to get the maximum value (in case there are
more than 2 siblings in a core some day).
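
The arithmetic behind this report can be checked outside the kernel. The
following is only a user-space sketch: count_order() is a hypothetical
stand-in that is assumed to mirror the kernel's get_count_order() for
non-zero counts (the smallest order with 2^order >= count), and the two
siblings_by_*() helpers are illustrative names, not kernel functions.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for get_count_order(): smallest order such that
 * (1 << order) >= count. Assumed to match the kernel helper for count >= 1.
 */
static unsigned int count_order(unsigned int count)
{
	unsigned int order = 0;

	assert(count > 0);
	while (order < 32 && (1U << order) < count)
		order++;
	return order;
}

/* Calculation as posted in v2: difference of the rounded-up orders. */
static unsigned int siblings_by_order(unsigned int threads, unsigned int cores)
{
	assert(threads >= cores);
	return 1U << (count_order(threads) - count_order(cores));
}

/* Suggested calculation: round the thread/core ratio up, like DIV_ROUND_UP(). */
static unsigned int siblings_by_div(unsigned int threads, unsigned int cores)
{
	assert(cores > 0);
	return (threads + cores - 1) / cores;
}
```

With 16 threads and 12 cores both counts round up to order 4, so
siblings_by_order(16, 12) yields 1, while siblings_by_div(16, 12) yields 2.
On a non-hybrid layout such as 16 threads on 8 cores both variants agree.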

thanks,
rui




* Re: [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing
  2024-01-26  7:07   ` Zhang, Rui
@ 2024-01-26 20:22     ` Thomas Gleixner
  2024-01-28 20:01       ` Paul E. McKenney
  2024-02-12 16:40       ` Thomas Gleixner
  0 siblings, 2 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-01-26 20:22 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: Raj, Ashok, mhklinux, arjan, ray.huang, thomas.lendacky,
	andrew.cooper3, Sivanich, Dimitri, Tang, Feng, kan.liang, Mehta,
	Sohil, peterz, paulmck, kprateek.nayak, jgross, andy, x86

On Fri, Jan 26 2024 at 07:07, Zhang, Rui wrote:
>> >  
>> > +       cnta = domain_weight(TOPO_PKG_DOMAIN);
>> > +       cntb = domain_weight(TOPO_DIE_DOMAIN);
>> > +       __max_logical_packages = cnta;
>> > +       __max_dies_per_package = 1U << (get_count_order(cntb) - get_count_order(cnta));
>> > +
>> > +       pr_info("Max. logical packages: %3u\n", cnta);
>> > +       pr_info("Max. logical dies:     %3u\n", cntb);
>> > +       pr_info("Max. dies per package: %3u\n", __max_dies_per_package);
>> > +
>> > +       cnta = domain_weight(TOPO_CORE_DOMAIN);
>> > +       cntb = domain_weight(TOPO_SMT_DOMAIN);
>> > +       smp_num_siblings = 1U << (get_count_order(cntb) - get_count_order(cnta));
>> > +       pr_info("Max. threads per core: %3u\n", smp_num_siblings);
>> > +
>
> I missed this, but Ashok caught it.
>
> Say, on my Alder Lake platform, which has 4 P-cores with HT + 8 E-cores,
> cnta is 12, cntb is 16, and smp_num_siblings is erroneously set to 1.
>
> I think we should use
> 	smp_num_siblings = DIV_ROUND_UP(cntb, cnta);
> here.

Indeed. That's more than obvious.

> Or even check each core to get the maximum value (in case there are
> more than 2 siblings in a core some day).

We want to get rid of HT not make it worse.


* Re: [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing
  2024-01-26 20:22     ` Thomas Gleixner
@ 2024-01-28 20:01       ` Paul E. McKenney
  2024-02-12 16:40       ` Thomas Gleixner
  1 sibling, 0 replies; 44+ messages in thread
From: Paul E. McKenney @ 2024-01-28 20:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Zhang, Rui, linux-kernel, Raj, Ashok, mhklinux, arjan, ray.huang,
	thomas.lendacky, andrew.cooper3, Sivanich, Dimitri, Tang, Feng,
	kan.liang, Mehta, Sohil, peterz, kprateek.nayak, jgross, andy,
	x86

On Fri, Jan 26, 2024 at 09:22:47PM +0100, Thomas Gleixner wrote:
> On Fri, Jan 26 2024 at 07:07, Zhang, Rui wrote:
> >> >  
> >> > +       cnta = domain_weight(TOPO_PKG_DOMAIN);
> >> > +       cntb = domain_weight(TOPO_DIE_DOMAIN);
> >> > +       __max_logical_packages = cnta;
> >> > +       __max_dies_per_package = 1U << (get_count_order(cntb) - get_count_order(cnta));
> >> > +
> >> > +       pr_info("Max. logical packages: %3u\n", cnta);
> >> > +       pr_info("Max. logical dies:     %3u\n", cntb);
> >> > +       pr_info("Max. dies per package: %3u\n", __max_dies_per_package);
> >> > +
> >> > +       cnta = domain_weight(TOPO_CORE_DOMAIN);
> >> > +       cntb = domain_weight(TOPO_SMT_DOMAIN);
> >> > +       smp_num_siblings = 1U << (get_count_order(cntb) - get_count_order(cnta));
> >> > +       pr_info("Max. threads per core: %3u\n", smp_num_siblings);
> >> > +
> >
> > I missed this, but Ashok caught it.
> >
> > Say, on my Alder Lake platform, which has 4 P-cores with HT + 8 E-cores,
> > cnta is 12, cntb is 16, and smp_num_siblings is erroneously set to 1.
> >
> > I think we should use
> > 	smp_num_siblings = DIV_ROUND_UP(cntb, cnta);
> > here.
> 
> Indeed. That's more than obvious.
> 
> > Or even check each core to get the maximum value (in case there are
> > more than 2 siblings in a core some day).
> 
> We want to get rid of HT not make it worse.

Hear, hear!!!  ;-)

							Thanx, Paul


* RE: [patch v2 15/30] x86/cpu: Detect real BSP on crash kernels
  2024-01-23 13:11 ` [patch v2 15/30] x86/cpu: Detect real BSP on crash kernels Thomas Gleixner
@ 2024-01-31 17:59   ` Michael Kelley
  0 siblings, 0 replies; 44+ messages in thread
From: Michael Kelley @ 2024-01-31 17:59 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Sohil Mehta, K Prateek Nayak,
	Kan Liang, Zhang Rui, Paul E. McKenney, Feng Tang,
	Andy Shevchenko, Peter Zijlstra (Intel)

From: Thomas Gleixner <tglx@linutronix.de> Sent: Tuesday, January 23, 2024 5:11 AM
> 
> When a kdump kernel is started from a crashing CPU then there is no
> guarantee that this CPU is the real boot CPU (BSP). If the kdump kernel
> tries to online the BSP then the INIT sequence will reset the machine.
> 
> There is a command line option to prevent this, but in case of nested kdump
> kernels this is wrong.
> 
> But that command line option is not required at all because the real
> BSP is enumerated as the first CPU by firmware. Support for the only
> known system which was different (Voyager) got removed long ago.
> 
> Detect whether the boot CPU APIC ID is the first APIC ID enumerated by
> the firmware. If the first APIC ID enumerated is not matching the boot
> CPU APIC ID then skip registering it.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Check for the first enumerated APIC ID (Rui)
> ---

[snip]

>  /**
> + * topology_register_apic - Register an APIC in early topology maps
> + * @apic_id:	The APIC ID to set up
> + * @acpi_id:	The ACPI ID associated to the APIC
> + * @present:	True if the corresponding CPU is present
> + */
> +void __init topology_register_apic(u32 apic_id, u32 acpi_id, bool present)
> +{
> +	if (apic_id >= MAX_LOCAL_APIC) {
> +		pr_err_once("APIC ID %x exceeds kernel limit of: %x\n", apic_id, MAX_LOCAL_APIC - 1);
> +		topo_info.nr_rejected_cpus++;
> +		return;
> +	}
> +
> +	/* CPU numbers exhausted? */
> +	if (topo_info.nr_assigned_cpus >= nr_cpu_ids) {

I'm seeing a problem here when nr_cpus=1 is on the kernel boot line
in an otherwise multiprocessor system.  topo_info.nr_assigned_cpus
is statically initialized to "1", so when nr_cpu_ids is "1", this test
is true every time this function is called, including for the first APIC
enumerated from the MADT.  The warning message is output once,
and it's correct.  But topo_info.nr_rejected_cpus is incremented
when it shouldn't be, so the number of rejected CPUs is reported as
the # of CPUs in the system.

The # of rejected CPUs message is just cosmetic. But worse,
check_for_real_bsp() and topo_register_apic() are never called. In
a kdump kernel where the panic'ing CPU is not physical CPU zero,
I don't get any messages from check_for_real_bsp().

Michael

> +		pr_warn_once("CPU limit of %d reached. Ignoring further CPUs\n", nr_cpu_ids);
> +		topo_info.nr_rejected_cpus++;
> +		return;
> +	}
> +
> +	if (check_for_real_bsp(apic_id))
> +		return;
> +
> +	topo_register_apic(apic_id, acpi_id, present);
> +}
> +


* Re: [patch v2 14/30] x86/cpu/topology: Rework possible CPU management
  2024-01-23 13:11 ` [patch v2 14/30] x86/cpu/topology: Rework possible CPU management Thomas Gleixner
@ 2024-01-31 23:47   ` Sohil Mehta
  0 siblings, 0 replies; 44+ messages in thread
From: Sohil Mehta @ 2024-01-31 23:47 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, K Prateek Nayak, Kan Liang,
	Zhang Rui, Paul E. McKenney, Feng Tang, Andy Shevchenko,
	Michael Kelley, Peter Zijlstra (Intel)

On 1/23/2024 5:11 AM, Thomas Gleixner wrote:

> +	pr_info("Allowing %u present CPUs plus %u hotplug CPUs\n", assigned, disabled);
> +	if (topo_info.nr_rejected_cpus)
> +		pr_info("Rejected CPUs %u\n", topo_info.nr_rejected_cpus);
> +

I encountered the same issue that Michael mentions in the other thread.
This is how the messages show up in a 40-cpu system with:

nr_cpus=2 (correct)
-------------------
CPU topo: Allowing 2 present CPUs plus 0 hotplug CPUs
CPU topo: Rejected CPUs 38

nr_cpus=1 (incorrect)
---------------------
CPU topo: Allowing 1 present CPUs plus 0 hotplug CPUs
CPU topo: Rejected CPUs 40
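
Both counts can be reproduced with a small user-space model of the limit
check. This is only a sketch under stated assumptions: struct topo_model
and enumerate() are hypothetical names, the APIC at index 0 is assumed to
be the boot CPU (which takes the pre-reserved CPU 0 and does not bump the
assigned count), and the exempt_boot_apic switch is an illustrative tweak,
not necessarily how the series ends up handling it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two counters involved in the report. */
struct topo_model {
	unsigned int nr_assigned_cpus;	/* starts at 1: CPU 0 is reserved for the boot CPU */
	unsigned int nr_rejected_cpus;
};

/*
 * Feed nr_apics enumerated APICs through the "CPU numbers exhausted?" test.
 * Index 0 stands in for the boot CPU's APIC.
 */
static struct topo_model enumerate(unsigned int nr_apics, unsigned int nr_cpu_ids,
				   bool exempt_boot_apic)
{
	struct topo_model t = { .nr_assigned_cpus = 1, .nr_rejected_cpus = 0 };

	assert(nr_cpu_ids >= 1);
	for (unsigned int i = 0; i < nr_apics; i++) {
		bool is_boot = (i == 0);

		/* Limit check as posted, optionally letting the boot APIC through */
		if (t.nr_assigned_cpus >= nr_cpu_ids && !(exempt_boot_apic && is_boot)) {
			t.nr_rejected_cpus++;
			continue;
		}
		/* The boot CPU reuses the reserved slot; others take a new one */
		if (!is_boot)
			t.nr_assigned_cpus++;
	}
	return t;
}
```

In this model enumerate(40, 2, false) rejects 38 APICs and
enumerate(40, 1, false) rejects all 40, matching the two outputs above.
With the boot APIC exempted, enumerate(40, 1, true) rejects 39.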

Sohil


* Re: [patch v2 00/30] x86/apic: Rework APIC registration
  2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
                   ` (30 preceding siblings ...)
  2024-01-24 14:31 ` [patch v2 00/30] x86/apic: Rework APIC registration Zhang, Rui
@ 2024-02-01 22:10 ` Sohil Mehta
  31 siblings, 0 replies; 44+ messages in thread
From: Sohil Mehta @ 2024-02-01 22:10 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, K Prateek Nayak, Kan Liang,
	Zhang Rui, Paul E. McKenney, Feng Tang, Andy Shevchenko,
	Michael Kelley, Peter Zijlstra (Intel)

On 1/23/2024 5:10 AM, Thomas Gleixner wrote:

> and is available from git here:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-full-v2
> 

I have been testing these patches on an Ivy Bridge and a Kaby Lake
machine. They seem to be working fine for the most part (except for the
nr_cpus=1 issue).

I also went through the checkpatch errors/warnings for all the patches
and nothing noteworthy stood out. There were a couple of white-space
errors that might be worth fixing. I'll reply inline on the patches to
make it easier to fix.

Sohil


* Re: [patch v2 25/30] x86/cpu/topology: Use topology logical mapping mechanism
  2024-01-23 13:11 ` [patch v2 25/30] x86/cpu/topology: Use topology logical mapping mechanism Thomas Gleixner
@ 2024-02-01 22:31   ` Sohil Mehta
  2024-02-02  6:45   ` Zhang, Rui
  1 sibling, 0 replies; 44+ messages in thread
From: Sohil Mehta @ 2024-02-01 22:31 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, K Prateek Nayak, Kan Liang,
	Zhang Rui, Paul E. McKenney, Feng Tang, Andy Shevchenko,
	Michael Kelley, Peter Zijlstra (Intel)

On 1/23/2024 5:11 AM, Thomas Gleixner wrote:

> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -172,6 +172,13 @@ static inline int topology_get_logical_i
>  #define topology_core_cpumask(cpu)		(per_cpu(cpu_core_map, cpu))
>  #define topology_sibling_cpumask(cpu)		(per_cpu(cpu_sibling_map, cpu))
>  
> +

This additional new line can be avoided.

> +static inline int topology_phys_to_logical_pkg(unsigned int pkg)
> +{
> +	return topology_get_logical_id(pkg << x86_topo_system.dom_shifts[TOPO_PKG_DOMAIN],
> +				       TOPO_PKG_DOMAIN);
> +}
> +



* Re: [patch v2 25/30] x86/cpu/topology: Use topology logical mapping mechanism
  2024-01-23 13:11 ` [patch v2 25/30] x86/cpu/topology: Use topology logical mapping mechanism Thomas Gleixner
  2024-02-01 22:31   ` Sohil Mehta
@ 2024-02-02  6:45   ` Zhang, Rui
  2024-02-12 16:21     ` Thomas Gleixner
  1 sibling, 1 reply; 44+ messages in thread
From: Zhang, Rui @ 2024-02-02  6:45 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: mhklinux, jgross, x86, arjan, kprateek.nayak, Tang, Feng,
	kan.liang, thomas.lendacky, ray.huang, Mehta, Sohil, Sivanich,
	Dimitri, paulmck, andrew.cooper3, andy, peterz

Hi, Thomas,

> @@ -147,6 +148,9 @@ static void topo_set_ids(struct topo_sca
>         c->topo.pkg_id = topo_shift_apicid(apicid, TOPO_PKG_DOMAIN);
>         c->topo.die_id = topo_shift_apicid(apicid, TOPO_DIE_DOMAIN);
>  
> +       c->topo.logical_pkg_id = topology_get_logical_id(apicid,
> TOPO_PKG_DOMAIN);
> +       c->topo.logical_die_id = topology_get_logical_id(apicid,
> TOPO_DIE_DOMAIN);
> +

Just wondering if we could have logical_core_id.

drivers/hwmon/coretemp.c uses an array to save per core temperature
information. We cannot use core_id as array index because it can be
sparse. Currently, to get the temperature info for a specified core,
we need to traverse the array to know which core each entry maps to.

Ideally, we could have a global logical_core_id, and use that as the
array index directly.
This can also simplify kernel code in many places when checking if two
cpus are in the same core or not.

For now, I don't see a need to expose this info to userspace.
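
To illustrate the difference, here is a minimal user-space sketch. It is
not the coretemp code: struct core_temp and both lookup helpers are
hypothetical, and logical_core_id is assumed to be a dense, zero-based
index as proposed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-core record; core_id is the sparse hardware ID. */
struct core_temp {
	unsigned int core_id;
	int temp_mc;		/* millidegrees Celsius */
};

/* Sparse IDs: scan the table to find the slot that belongs to core_id. */
static struct core_temp *lookup_by_scan(struct core_temp *tbl, size_t n,
					unsigned int core_id)
{
	assert(tbl != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].core_id == core_id)
			return &tbl[i];
	}
	return NULL;
}

/* Dense logical IDs: the ID is the array index, so no scan is needed. */
static struct core_temp *lookup_by_logical_id(struct core_temp *tbl, size_t n,
					      unsigned int logical_core_id)
{
	assert(tbl != NULL);
	return logical_core_id < n ? &tbl[logical_core_id] : NULL;
}
```

For a table holding core IDs 0, 4, 8 and 12, lookup_by_scan() has to walk
three entries to find core 8, whereas lookup_by_logical_id() reaches the
same entry directly with logical ID 2.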

thanks,
rui



* Re: [patch v2 25/30] x86/cpu/topology: Use topology logical mapping mechanism
  2024-02-02  6:45   ` Zhang, Rui
@ 2024-02-12 16:21     ` Thomas Gleixner
  0 siblings, 0 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-02-12 16:21 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: mhklinux, jgross, x86, arjan, kprateek.nayak, Tang, Feng,
	kan.liang, thomas.lendacky, ray.huang, Mehta, Sohil, Sivanich,
	Dimitri, paulmck, andrew.cooper3, andy, peterz

On Fri, Feb 02 2024 at 06:45, Rui Zhang wrote:
>> @@ -147,6 +148,9 @@ static void topo_set_ids(struct topo_sca
>>         c->topo.pkg_id = topo_shift_apicid(apicid, TOPO_PKG_DOMAIN);
>>         c->topo.die_id = topo_shift_apicid(apicid, TOPO_DIE_DOMAIN);
>>  
>> +       c->topo.logical_pkg_id = topology_get_logical_id(apicid,
>> TOPO_PKG_DOMAIN);
>> +       c->topo.logical_die_id = topology_get_logical_id(apicid,
>> TOPO_DIE_DOMAIN);
>> +
>
> Just wondering if we could have logical_core_id.
>
> drivers/hwmon/coretemp.c uses an array to save per core temperature
> information. We cannot use core_id as array index because it can be
> sparse. Currently, to get the temperature info for a specified core,
> we need to traverse the array to know which core each entry maps to.
>
> Ideally, we could have a global logical_core_id, and use that as the
> array index directly.
> This can also simplify kernel code in many places when checking if two
> cpus are in the same core or not.

That's trivial to do now :)

It's an orthogonal change and we can put it on top once this pile is
merged.

Thanks,

        tglx


* Re: [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing
  2024-01-26 20:22     ` Thomas Gleixner
  2024-01-28 20:01       ` Paul E. McKenney
@ 2024-02-12 16:40       ` Thomas Gleixner
  2024-02-12 19:49         ` Michael Kelley
  2024-02-13 20:23         ` Sohil Mehta
  1 sibling, 2 replies; 44+ messages in thread
From: Thomas Gleixner @ 2024-02-12 16:40 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: Raj, Ashok, mhklinux, arjan, ray.huang, thomas.lendacky,
	andrew.cooper3, Sivanich, Dimitri, Tang, Feng, kan.liang, Mehta,
	Sohil, peterz, paulmck, kprateek.nayak, jgross, andy, x86

On Fri, Jan 26 2024 at 21:22, Thomas Gleixner wrote:
> On Fri, Jan 26 2024 at 07:07, Zhang, Rui wrote:
>> Say, on my Alder Lake platform, which has 4 P-cores with HT + 8 E-cores,
>> cnta is 12, cntb is 16, and smp_num_siblings is erroneously set to 1.
>>
>> I think we should use
>> 	smp_num_siblings = DIV_ROUND_UP(cntb, cnta);
>> here.
>
> Indeed. That's more than obvious.

I pushed out a new version which addresses this and also the fallout
Michael and Sohil reported:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-full-v3

I'll let the robot chew on it before posting it in the next few days.

Thanks,

        tglx


* RE: [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing
  2024-02-12 16:40       ` Thomas Gleixner
@ 2024-02-12 19:49         ` Michael Kelley
  2024-02-13 20:23         ` Sohil Mehta
  1 sibling, 0 replies; 44+ messages in thread
From: Michael Kelley @ 2024-02-12 19:49 UTC (permalink / raw)
  To: Thomas Gleixner, Zhang, Rui, linux-kernel
  Cc: Raj, Ashok, arjan, ray.huang, thomas.lendacky, andrew.cooper3,
	Sivanich, Dimitri, Tang, Feng, kan.liang, Mehta, Sohil, peterz,
	paulmck, kprateek.nayak, jgross, andy, x86, Dexuan Cui

From: Thomas Gleixner <tglx@linutronix.de> Sent: Monday, February 12, 2024 8:41 AM
> 
> On Fri, Jan 26 2024 at 21:22, Thomas Gleixner wrote:
> > On Fri, Jan 26 2024 at 07:07, Zhang, Rui wrote:
> >> Say, on my Alder Lake platform, which has 4 P-cores with HT + 8 E-cores,
> >> cnta is 12, cntb is 16, and smp_num_siblings is erroneously set to 1.
> >>
> >> I think we should use
> >> 	smp_num_siblings = DIV_ROUND_UP(cntb, cnta);
> >> here.
> >
> > Indeed. That's more than obvious.
> 
> I pushed out a new version which addresses this and also the fallout
> Michael and Sohil reported:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-full-v3
> 
> I'll let the robot chew on it before posting it in the next few days.
> 

I've tested the topo-full-v3 tag on a Hyper-V guest, and can confirm
that the Rejected CPU count is now correct when nr_cpus=1 is on the
kernel boot line.  And as expected, these messages are now emitted
when a kdump kernel boots on a CPU other than CPU 0:

[    0.339622] CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 2c > 0
[    0.342993] CPU topo: Crash kernel detected. Disabling real BSP to prevent machine INIT

Earlier, I did a variety of tests on topo-full-v2, and didn't see any other
problems, but I did not repeat those tests on topo-full-v3.  So overall,

Tested-by: Michael Kelley <mhklinux@outlook.com>

Tangentially and FWIW, you probably remember our discussion last
fall about Hyper-V, where in some circumstances the APIC IDs in the 
guest ACPI tables don't match the APIC ID reported by the CPUID
instruction, resulting in Linux guest "APIC id mismatch" warnings. That
problem appears to be fixed now in the Azure public cloud, and your
full set of topo revisions no longer causes mangled scheduler domains
in those circumstances.

There may be nooks and crannies in the Azure fleet that haven't gotten
the update for whatever reason, but everything I tested is good.  The
on-premises Hyper-V as part of the latest Windows 11 has *not* been
updated.  I don't have access to a Windows Server instance to test, but
it's probably not updated either.  Hopefully they will both be updated
in the coming months.
 
Michael


* Re: [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing
  2024-02-12 16:40       ` Thomas Gleixner
  2024-02-12 19:49         ` Michael Kelley
@ 2024-02-13 20:23         ` Sohil Mehta
  1 sibling, 0 replies; 44+ messages in thread
From: Sohil Mehta @ 2024-02-13 20:23 UTC (permalink / raw)
  To: Thomas Gleixner, Zhang, Rui, linux-kernel
  Cc: Raj, Ashok, mhklinux, arjan, ray.huang, thomas.lendacky,
	andrew.cooper3, Sivanich, Dimitri, Tang, Feng, kan.liang, peterz,
	paulmck, kprateek.nayak, jgross, andy, x86

On 2/12/2024 8:40 AM, Thomas Gleixner wrote:

> I pushed out a new version which addresses this and also the fallout
> Michael and Sohil reported:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-full-v3
> 

I see the nr_cpus=1 issue resolved as well with the above set.

Please feel free to add:

Tested-by: Sohil Mehta <sohil.mehta@intel.com>

Sohil


end of thread, other threads:[~2024-02-13 20:23 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
2024-01-23 13:10 [patch v2 00/30] x86/apic: Rework APIC registration Thomas Gleixner
2024-01-23 13:10 ` [patch v2 01/30] x86/cpu/topology: Move registration out of APIC code Thomas Gleixner
2024-01-23 13:10 ` [patch v2 02/30] x86/cpu/topology: Provide separate APIC registration functions Thomas Gleixner
2024-01-23 13:10 ` [patch v2 03/30] x86/acpi: Use new " Thomas Gleixner
2024-01-23 13:10 ` [patch v2 04/30] x86/jailhouse: Use new APIC registration function Thomas Gleixner
2024-01-23 13:10 ` [patch v2 05/30] x86/of: Use new APIC registration functions Thomas Gleixner
2024-01-23 13:10 ` [patch v2 06/30] x86/mpparse: Use new APIC registration function Thomas Gleixner
2024-01-23 13:11 ` [patch v2 07/30] x86/acpi: Dont invoke topology_register_apic() for XEN PV Thomas Gleixner
2024-01-23 13:11 ` [patch v2 08/30] x86/xen/smp_pv: Register fake APICs Thomas Gleixner
2024-01-23 13:11 ` [patch v2 09/30] x86/cpu/topology: Confine topology information Thomas Gleixner
2024-01-23 13:11 ` [patch v2 10/30] x86/cpu/topology: Simplify APIC registration Thomas Gleixner
2024-01-23 13:11 ` [patch v2 11/30] x86/cpu/topology: Use a data structure for topology info Thomas Gleixner
2024-01-23 13:11 ` [patch v2 12/30] x86/smpboot: Make error message actually useful Thomas Gleixner
2024-01-23 13:11 ` [patch v2 13/30] x86/cpu/topology: Sanitize the APIC admission logic Thomas Gleixner
2024-01-23 13:11 ` [patch v2 14/30] x86/cpu/topology: Rework possible CPU management Thomas Gleixner
2024-01-31 23:47   ` Sohil Mehta
2024-01-23 13:11 ` [patch v2 15/30] x86/cpu: Detect real BSP on crash kernels Thomas Gleixner
2024-01-31 17:59   ` Michael Kelley
2024-01-23 13:11 ` [patch v2 16/30] x86/topology: Add a mechanism to track topology via APIC IDs Thomas Gleixner
2024-01-23 13:11 ` [patch v2 17/30] x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug Thomas Gleixner
2024-01-23 13:11 ` [patch v2 18/30] x86/cpu/topology: Assign hotpluggable CPUIDs during init Thomas Gleixner
2024-01-23 13:11 ` [patch v2 19/30] x86/xen/smp_pv: Count number of vCPUs early Thomas Gleixner
2024-01-23 13:11 ` [patch v2 20/30] x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT Thomas Gleixner
2024-01-23 13:11 ` [patch v2 21/30] x86/cpu/topology: Use topology bitmaps for sizing Thomas Gleixner
2024-01-26  7:07   ` Zhang, Rui
2024-01-26 20:22     ` Thomas Gleixner
2024-01-28 20:01       ` Paul E. McKenney
2024-02-12 16:40       ` Thomas Gleixner
2024-02-12 19:49         ` Michael Kelley
2024-02-13 20:23         ` Sohil Mehta
2024-01-23 13:11 ` [patch v2 22/30] x86/cpu/topology: Mop up primary thread mask handling Thomas Gleixner
2024-01-23 13:11 ` [patch v2 23/30] x86/cpu/topology: Simplify cpu_mark_primary_thread() Thomas Gleixner
2024-01-23 13:11 ` [patch v2 24/30] x86/cpu/topology: Provide logical pkg/die mapping Thomas Gleixner
2024-01-23 13:11 ` [patch v2 25/30] x86/cpu/topology: Use topology logical mapping mechanism Thomas Gleixner
2024-02-01 22:31   ` Sohil Mehta
2024-02-02  6:45   ` Zhang, Rui
2024-02-12 16:21     ` Thomas Gleixner
2024-01-23 13:11 ` [patch v2 26/30] x86/cpu/topology: Retrieve cores per package from topology bitmaps Thomas Gleixner
2024-01-23 13:11 ` [patch v2 27/30] x86/cpu/topology: Rename smp_num_siblings Thomas Gleixner
2024-01-23 13:11 ` [patch v2 28/30] x86/cpu/topology: Rename topology_max_die_per_package() Thomas Gleixner
2024-01-23 13:11 ` [patch v2 29/30] x86/cpu/topology: Provide __num_[cores|threads]_per_package Thomas Gleixner
2024-01-23 13:11 ` [patch v2 30/30] x86/cpu/topology: Get rid of cpuinfo::x86_max_cores Thomas Gleixner
2024-01-24 14:31 ` [patch v2 00/30] x86/apic: Rework APIC registration Zhang, Rui
2024-02-01 22:10 ` Sohil Mehta
