* [patch V3 00/40] x86/cpu: Rework the topology evaluation
@ 2023-08-02 10:20 Thomas Gleixner
  2023-08-02 10:20 ` [patch V3 01/40] cpu/SMT: Make SMT control more robust against enumeration failures Thomas Gleixner
                   ` (42 more replies)
  0 siblings, 43 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:20 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Hi!

This is the follow-up to V2:

  https://lore.kernel.org/lkml/20230728105650.565799744@linutronix.de

which addresses the review feedback and some fallout reported on and
off-list.

TLDR:

This reworks how topology information is evaluated via CPUID in
preparation for a larger topology management overhaul to address
shortcomings of the current code vs. hybrid systems and systems which make
use of the extended topology domains in leaf 0x1f. Aside from that it's an
overdue spring cleaning to get rid of accumulated layers of duct tape and
haywire.
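
For reference, leaf 0x1f extends the leaf 0xb scheme with additional
domain levels (module, tile, die). A minimal user-space sketch of the
enumeration, illustrative only, with the level-type values taken from
the SDM:

  #include <stdio.h>
  #include <cpuid.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx, subleaf, type;

          for (subleaf = 0; subleaf < 8; subleaf++) {
                  if (!__get_cpuid_count(0x1f, subleaf, &eax, &ebx, &ecx, &edx))
                          break;
                  type = (ecx >> 8) & 0xff;
                  if (!type)      /* type 0: invalid level, terminates the walk */
                          break;
                  printf("level %u: type %u shift %u cpus %u x2apic_id 0x%x\n",
                         subleaf, type, eax & 0x1f, ebx & 0xffff, edx);
          }
          return 0;
  }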

What changed vs. V2:

  - Decoded and fixed the fallout vs. XEN/PV reported by Juergen. Thanks to
    Juergen for the remote hand debugging sessions!

    That's addressed in the first two new patches in this series. Summary:
    XEN/PV booted by pure chance since the addition of SMT control 5 years
    ago.

  - Fixed the off-by-one in the AMD parser, which was debugged by Michael

  - Addressed review comments from various people

As discussed in:

  https://lore.kernel.org/lkml/BYAPR21MB16889FD224344B1B28BE22A1D705A@BYAPR21MB1688.namprd21.prod.outlook.com
  ....
  https://lore.kernel.org/lkml/87r0omjt8c.ffs@tglx

this series unfortunately brings the Hyper-V BIOS inconsistency into
effect, which results in a slight performance impact. The L3 association
which "worked" so far by exploiting the inconsistency of the Linux topology
code is no longer supportable, as the actual shortcomings of our topology
management really need to be addressed in a consistent way.

The series is based on V3 of the APIC cleanup series:

  https://lore.kernel.org/lkml/20230801103042.936020332@linutronix.de

and also available on top of that from git:

 git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v3

Thanks,

	tglx
---
 arch/x86/kernel/cpu/topology.c              |  168 -------------------
 b/Documentation/arch/x86/topology.rst       |   12 -
 b/arch/x86/events/amd/core.c                |    2 
 b/arch/x86/events/amd/uncore.c              |    2 
 b/arch/x86/events/intel/uncore.c            |    2 
 b/arch/x86/hyperv/hv_vtl.c                  |    2 
 b/arch/x86/include/asm/apic.h               |   32 +--
 b/arch/x86/include/asm/cacheinfo.h          |    3 
 b/arch/x86/include/asm/cpuid.h              |   36 ++++
 b/arch/x86/include/asm/mpspec.h             |    2 
 b/arch/x86/include/asm/processor.h          |   60 ++++---
 b/arch/x86/include/asm/smp.h                |    4 
 b/arch/x86/include/asm/topology.h           |   51 +++++
 b/arch/x86/include/asm/x86_init.h           |    2 
 b/arch/x86/kernel/acpi/boot.c               |    4 
 b/arch/x86/kernel/amd_nb.c                  |    8 
 b/arch/x86/kernel/apic/apic.c               |   25 +-
 b/arch/x86/kernel/apic/apic_common.c        |    4 
 b/arch/x86/kernel/apic/apic_flat_64.c       |   13 -
 b/arch/x86/kernel/apic/apic_noop.c          |    9 -
 b/arch/x86/kernel/apic/apic_numachip.c      |   21 --
 b/arch/x86/kernel/apic/bigsmp_32.c          |   10 -
 b/arch/x86/kernel/apic/local.h              |    6 
 b/arch/x86/kernel/apic/probe_32.c           |   10 -
 b/arch/x86/kernel/apic/x2apic_cluster.c     |    1 
 b/arch/x86/kernel/apic/x2apic_phys.c        |   10 -
 b/arch/x86/kernel/apic/x2apic_uv_x.c        |   67 +------
 b/arch/x86/kernel/cpu/Makefile              |    5 
 b/arch/x86/kernel/cpu/amd.c                 |  156 ------------------
 b/arch/x86/kernel/cpu/cacheinfo.c           |   51 ++---
 b/arch/x86/kernel/cpu/centaur.c             |    4 
 b/arch/x86/kernel/cpu/common.c              |  111 +-----------
 b/arch/x86/kernel/cpu/cpu.h                 |   14 +
 b/arch/x86/kernel/cpu/debugfs.c             |   97 +++++++++++
 b/arch/x86/kernel/cpu/hygon.c               |  133 ---------------
 b/arch/x86/kernel/cpu/intel.c               |   38 ----
 b/arch/x86/kernel/cpu/mce/amd.c             |    4 
 b/arch/x86/kernel/cpu/mce/apei.c            |    4 
 b/arch/x86/kernel/cpu/mce/core.c            |    4 
 b/arch/x86/kernel/cpu/mce/inject.c          |    7 
 b/arch/x86/kernel/cpu/proc.c                |    8 
 b/arch/x86/kernel/cpu/topology.h            |   51 +++++
 b/arch/x86/kernel/cpu/topology_amd.c        |  179 ++++++++++++++++++++
 b/arch/x86/kernel/cpu/topology_common.c     |  240 ++++++++++++++++++++++++++++
 b/arch/x86/kernel/cpu/topology_ext.c        |  136 +++++++++++++++
 b/arch/x86/kernel/cpu/zhaoxin.c             |   18 --
 b/arch/x86/kernel/kvm.c                     |    6 
 b/arch/x86/kernel/sev.c                     |    2 
 b/arch/x86/kernel/smpboot.c                 |   97 ++++++-----
 b/arch/x86/kernel/vsmp_64.c                 |   13 -
 b/arch/x86/mm/amdtopology.c                 |   35 +---
 b/arch/x86/mm/numa.c                        |    4 
 b/arch/x86/xen/apic.c                       |   14 -
 b/arch/x86/xen/smp_pv.c                     |    3 
 b/drivers/edac/amd64_edac.c                 |    4 
 b/drivers/edac/mce_amd.c                    |    4 
 b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c |    2 
 b/drivers/hwmon/fam15h_power.c              |    7 
 b/drivers/scsi/lpfc/lpfc_init.c             |    8 
 b/drivers/virt/acrn/hsm.c                   |    2 
 b/kernel/cpu.c                              |    6 
 61 files changed, 1077 insertions(+), 956 deletions(-)




* [patch V3 01/40] cpu/SMT: Make SMT control more robust against enumeration failures
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
@ 2023-08-02 10:20 ` Thomas Gleixner
  2023-08-04 17:50   ` Borislav Petkov
  2023-08-02 10:21 ` [patch V3 02/40] x86/apic: Fake primary thread mask for XEN/PV Thomas Gleixner
                   ` (41 subsequent siblings)
  42 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:20 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

The SMT control mechanism was added as a mitigation for speculation attack
vectors. The implemented logic relies on the primary thread mask being set
up properly.

This turns out to be an issue with XEN/PV guests because their CPU hotplug
mechanics do not enumerate APICs and therefore the mask is never correctly
populated.

This went unnoticed so far because by chance XEN/PV ends up with
smp_num_siblings == 2. So cpu_smt_control stays at its default value
CPU_SMT_ENABLED and the primary thread mask is never evaluated in the
context of CPU hotplug.

This stopped "working" with the upcoming overhaul of the topology
evaluation, which legitimately provides a fake topology for XEN/PV. That
sets smp_num_siblings to 1, which causes the CPU hotplug core to refuse to
bring up the APs.

This happens because cpu_smt_control is set to CPU_SMT_NOT_SUPPORTED,
which causes cpu_smt_allowed() to evaluate the unpopulated primary thread
mask and conclude that all non-boot CPUs are not valid to be plugged.

Make cpu_smt_allowed() more robust and take CPU_SMT_NOT_SUPPORTED and
CPU_SMT_NOT_IMPLEMENTED into account.

The primary mask issue on x86 XEN/PV needs to be addressed separately as
there are users outside of the CPU hotplug code too.

Fixes: 05736e4ac13c ("cpu/hotplug: Provide knobs to control SMT")
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/cpu.c |    6 ++++++
 1 file changed, 6 insertions(+)
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -630,6 +630,12 @@ static inline bool cpu_smt_allowed(unsig
 	if (cpu_smt_control == CPU_SMT_ENABLED)
 		return true;
 
+	if (cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+		return true;
+
+	if (cpu_smt_control == CPU_SMT_NOT_IMPLEMENTED)
+		return true;
+
 	if (topology_is_primary_thread(cpu))
 		return true;
 
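
For reference, the SMT control states involved, as defined in
include/linux/cpu.h (abridged):

  enum cpuhp_smt_control {
          CPU_SMT_ENABLED,
          CPU_SMT_DISABLED,
          CPU_SMT_FORCE_DISABLED,
          CPU_SMT_NOT_SUPPORTED,
          CPU_SMT_NOT_IMPLEMENTED,
  };

With the two additional checks the primary thread mask is only consulted
when SMT has actually been disabled via the control interface, so an
unpopulated mask can no longer veto AP bringup on systems where SMT
control is unsupported or not implemented.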



* [patch V3 02/40] x86/apic: Fake primary thread mask for XEN/PV
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
  2023-08-02 10:20 ` [patch V3 01/40] cpu/SMT: Make SMT control more robust against enumeration failures Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-04 18:12   ` Borislav Petkov
  2023-08-02 10:21 ` [patch V3 03/40] x86/cpu: Encapsulate topology information in cpuinfo_x86 Thomas Gleixner
                   ` (40 subsequent siblings)
  42 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

The SMT control mechanism was added as a mitigation for speculation attack
vectors. The implemented logic relies on the primary thread mask being set
up properly.

This turns out to be an issue with XEN/PV guests because their CPU hotplug
mechanics do not enumerate APICs and therefore the mask is never correctly
populated.

This went unnoticed so far because by chance XEN/PV ends up with
smp_num_siblings == 2. So cpu_smt_control stays at its default value
CPU_SMT_ENABLED and the primary thread mask is never evaluated in the
context of CPU hotplug.

This stopped "working" with the upcoming overhaul of the topology
evaluation, which legitimately provides a fake topology for XEN/PV. That
sets smp_num_siblings to 1, which causes the CPU hotplug core to refuse to
bring up the APs.

This happens because cpu_smt_control is set to CPU_SMT_NOT_SUPPORTED,
which causes cpu_smt_allowed() to evaluate the unpopulated primary thread
mask and conclude that all non-boot CPUs are not valid to be plugged.

The core code has already been made more robust against this kind of fail,
but the primary thread mask really wants to be populated to avoid other
issues all over the place.

Just fake the mask by pretending that all XEN/PV vCPUs are primary threads,
which is consistent because all of XEN/PV's topology is fake or non-existent.

Fixes: 6a4d2657e048 ("x86/smp: Provide topology_is_primary_thread()")
Fixes: f54d4434c281 ("x86/apic: Provide cpu_primary_thread mask")
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/apic/apic.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -36,6 +36,8 @@
 #include <linux/smp.h>
 #include <linux/mm.h>
 
+#include <xen/xen.h>
+
 #include <asm/trace/irq_vectors.h>
 #include <asm/irq_remapping.h>
 #include <asm/pc-conf-reg.h>
@@ -2344,6 +2346,15 @@ static int __init smp_init_primary_threa
 {
 	unsigned int cpu;
 
+	/*
+	 * XEN/PV provides either none or useless topology information.
+	 * Pretend that all vCPUs are primary threads.
+	 */
+	if (xen_pv_domain()) {
+		cpumask_copy(&__cpu_primary_thread_mask, cpu_possible_mask);
+		return 0;
+	}
+
 	for (cpu = 0; cpu < nr_logical_cpuids; cpu++)
 		cpu_mark_primary_thread(cpu, cpuid_to_apicid[cpu]);
 	return 0;
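
The net effect is that topology_is_primary_thread() now succeeds for every
XEN/PV vCPU. For context, the x86 helper is roughly this (from
arch/x86/include/asm/topology.h):

  static inline bool topology_is_primary_thread(unsigned int cpu)
  {
          return cpumask_test_cpu(cpu, cpu_primary_thread_mask);
  }

With the mask copied from cpu_possible_mask the test trivially holds for
all possible vCPUs.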



* [patch V3 03/40] x86/cpu: Encapsulate topology information in cpuinfo_x86
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
  2023-08-02 10:20 ` [patch V3 01/40] cpu/SMT: Make SMT control more robust against enumeration failures Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 02/40] x86/apic: Fake primary thread mask for XEN/PV Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 04/40] x86/cpu: Move phys_proc_id into topology info Thomas Gleixner
                   ` (39 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

The topology-related information is randomly scattered across cpuinfo_x86.

Create a new structure cpuinfo_topology and, as a first step, move
initial_apicid and apicid into it.

Aside from being more readable, this is in preparation for replacing the
horribly fragile CPU topology evaluation code further down the road.

Consolidate the APIC ID fields to u32 as that is what the hardware
provides.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h          |   14 +++++++++-----
 arch/x86/kernel/cpu/amd.c                 |   10 +++++-----
 arch/x86/kernel/cpu/cacheinfo.c           |   20 ++++++++++----------
 arch/x86/kernel/cpu/common.c              |   18 +++++++++---------
 arch/x86/kernel/cpu/hygon.c               |   12 ++++++------
 arch/x86/kernel/cpu/mce/apei.c            |    2 +-
 arch/x86/kernel/cpu/mce/core.c            |    2 +-
 arch/x86/kernel/cpu/proc.c                |    4 ++--
 arch/x86/kernel/cpu/topology.c            |   12 ++++++------
 arch/x86/xen/apic.c                       |    2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |    2 +-
 drivers/virt/acrn/hsm.c                   |    2 +-
 12 files changed, 52 insertions(+), 48 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -74,11 +74,16 @@ extern u16 __read_mostly tlb_lld_4m[NR_I
 extern u16 __read_mostly tlb_lld_1g[NR_INFO];
 
 /*
- *  CPU type and hardware bug flags. Kept separately for each CPU.
- *  Members of this structure are referenced in head_32.S, so think twice
- *  before touching them. [mj]
+ * CPU type and hardware bug flags. Kept separately for each CPU.
  */
 
+struct cpuinfo_topology {
+	// Real APIC ID read from the local APIC
+	u32			apicid;
+	// The initial APIC ID provided by CPUID
+	u32			initial_apicid;
+};
+
 struct cpuinfo_x86 {
 	__u8			x86;		/* CPU family */
 	__u8			x86_vendor;	/* CPU vendor */
@@ -111,6 +116,7 @@ struct cpuinfo_x86 {
 	};
 	char			x86_vendor_id[16];
 	char			x86_model_id[64];
+	struct cpuinfo_topology	topo;
 	/* in KB - valid for CPUS which support this call: */
 	unsigned int		x86_cache_size;
 	int			x86_cache_alignment;	/* In bytes */
@@ -124,8 +130,6 @@ struct cpuinfo_x86 {
 	u64			ppin;
 	/* cpuid returned max cores value: */
 	u16			x86_max_cores;
-	u16			apicid;
-	u16			initial_apicid;
 	u16			x86_clflush_size;
 	/* number of cores as seen by the OS: */
 	u16			booted_cores;
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -387,9 +387,9 @@ static void amd_detect_cmp(struct cpuinf
 
 	bits = c->x86_coreid_bits;
 	/* Low order bits define the core id (index of core in socket) */
-	c->cpu_core_id = c->initial_apicid & ((1 << bits)-1);
+	c->cpu_core_id = c->topo.initial_apicid & ((1 << bits)-1);
 	/* Convert the initial APIC ID into the socket ID */
-	c->phys_proc_id = c->initial_apicid >> bits;
+	c->phys_proc_id = c->topo.initial_apicid >> bits;
 	/* use socket ID also for last level cache */
 	per_cpu(cpu_llc_id, cpu) = c->cpu_die_id = c->phys_proc_id;
 }
@@ -405,7 +405,7 @@ static void srat_detect_node(struct cpui
 #ifdef CONFIG_NUMA
 	int cpu = smp_processor_id();
 	int node;
-	unsigned apicid = c->apicid;
+	unsigned apicid = c->topo.apicid;
 
 	node = numa_cpu_node(cpu);
 	if (node == NUMA_NO_NODE)
@@ -439,7 +439,7 @@ static void srat_detect_node(struct cpui
 		 * through CPU mapping may alter the outcome, directly
 		 * access __apicid_to_node[].
 		 */
-		int ht_nodeid = c->initial_apicid;
+		int ht_nodeid = c->topo.initial_apicid;
 
 		if (__apicid_to_node[ht_nodeid] != NUMA_NO_NODE)
 			node = __apicid_to_node[ht_nodeid];
@@ -934,7 +934,7 @@ static void init_amd(struct cpuinfo_x86
 		set_cpu_cap(c, X86_FEATURE_FSRS);
 
 	/* get apicid instead of initial apic id from cpuid */
-	c->apicid = read_apic_id();
+	c->topo.apicid = read_apic_id();
 
 	/* K6s reports MCEs but don't actually have all the MSRs */
 	if (c->x86 < 6)
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -678,7 +678,7 @@ void cacheinfo_amd_init_llc_id(struct cp
 		 * LLC is at the core complex level.
 		 * Core complex ID is ApicId[3] for these processors.
 		 */
-		per_cpu(cpu_llc_id, cpu) = c->apicid >> 3;
+		per_cpu(cpu_llc_id, cpu) = c->topo.apicid >> 3;
 	} else {
 		/*
 		 * LLC ID is calculated from the number of threads sharing the
@@ -694,7 +694,7 @@ void cacheinfo_amd_init_llc_id(struct cp
 		if (num_sharing_cache) {
 			int bits = get_count_order(num_sharing_cache);
 
-			per_cpu(cpu_llc_id, cpu) = c->apicid >> bits;
+			per_cpu(cpu_llc_id, cpu) = c->topo.apicid >> bits;
 		}
 	}
 }
@@ -712,7 +712,7 @@ void cacheinfo_hygon_init_llc_id(struct
 	 * LLC is at the core complex level.
 	 * Core complex ID is ApicId[3] for these processors.
 	 */
-	per_cpu(cpu_llc_id, cpu) = c->apicid >> 3;
+	per_cpu(cpu_llc_id, cpu) = c->topo.apicid >> 3;
 }
 
 void init_amd_cacheinfo(struct cpuinfo_x86 *c)
@@ -776,13 +776,13 @@ void init_intel_cacheinfo(struct cpuinfo
 				new_l2 = this_leaf.size/1024;
 				num_threads_sharing = 1 + this_leaf.eax.split.num_threads_sharing;
 				index_msb = get_count_order(num_threads_sharing);
-				l2_id = c->apicid & ~((1 << index_msb) - 1);
+				l2_id = c->topo.apicid & ~((1 << index_msb) - 1);
 				break;
 			case 3:
 				new_l3 = this_leaf.size/1024;
 				num_threads_sharing = 1 + this_leaf.eax.split.num_threads_sharing;
 				index_msb = get_count_order(num_threads_sharing);
-				l3_id = c->apicid & ~((1 << index_msb) - 1);
+				l3_id = c->topo.apicid & ~((1 << index_msb) - 1);
 				break;
 			default:
 				break;
@@ -915,7 +915,7 @@ static int __cache_amd_cpumap_setup(unsi
 		unsigned int apicid, nshared, first, last;
 
 		nshared = base->eax.split.num_threads_sharing + 1;
-		apicid = cpu_data(cpu).apicid;
+		apicid = cpu_data(cpu).topo.apicid;
 		first = apicid - (apicid % nshared);
 		last = first + nshared - 1;
 
@@ -924,14 +924,14 @@ static int __cache_amd_cpumap_setup(unsi
 			if (!this_cpu_ci->info_list)
 				continue;
 
-			apicid = cpu_data(i).apicid;
+			apicid = cpu_data(i).topo.apicid;
 			if ((apicid < first) || (apicid > last))
 				continue;
 
 			this_leaf = this_cpu_ci->info_list + index;
 
 			for_each_online_cpu(sibling) {
-				apicid = cpu_data(sibling).apicid;
+				apicid = cpu_data(sibling).topo.apicid;
 				if ((apicid < first) || (apicid > last))
 					continue;
 				cpumask_set_cpu(sibling,
@@ -969,7 +969,7 @@ static void __cache_cpumap_setup(unsigne
 	index_msb = get_count_order(num_threads_sharing);
 
 	for_each_online_cpu(i)
-		if (cpu_data(i).apicid >> index_msb == c->apicid >> index_msb) {
+		if (cpu_data(i).topo.apicid >> index_msb == c->topo.apicid >> index_msb) {
 			struct cpu_cacheinfo *sib_cpu_ci = get_cpu_cacheinfo(i);
 
 			if (i == cpu || !sib_cpu_ci->info_list)
@@ -1024,7 +1024,7 @@ static void get_cache_id(int cpu, struct
 
 	num_threads_sharing = 1 + id4_regs->eax.split.num_threads_sharing;
 	index_msb = get_count_order(num_threads_sharing);
-	id4_regs->id = c->apicid >> index_msb;
+	id4_regs->id = c->topo.apicid >> index_msb;
 }
 
 int populate_cache_leaves(unsigned int cpu)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -899,7 +899,7 @@ void detect_ht(struct cpuinfo_x86 *c)
 		return;
 
 	index_msb = get_count_order(smp_num_siblings);
-	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, index_msb);
+	c->phys_proc_id = apic->phys_pkg_id(c->topo.initial_apicid, index_msb);
 
 	smp_num_siblings = smp_num_siblings / c->x86_max_cores;
 
@@ -907,7 +907,7 @@ void detect_ht(struct cpuinfo_x86 *c)
 
 	core_bits = get_count_order(c->x86_max_cores);
 
-	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, index_msb) &
+	c->cpu_core_id = apic->phys_pkg_id(c->topo.initial_apicid, index_msb) &
 				       ((1 << core_bits) - 1);
 #endif
 }
@@ -1721,15 +1721,15 @@ static void generic_identify(struct cpui
 	get_cpu_address_sizes(c);
 
 	if (c->cpuid_level >= 0x00000001) {
-		c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
+		c->topo.initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
 #ifdef CONFIG_X86_32
 # ifdef CONFIG_SMP
-		c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
+		c->topo.apicid = apic->phys_pkg_id(c->topo.initial_apicid, 0);
 # else
-		c->apicid = c->initial_apicid;
+		c->topo.apicid = c->topo.initial_apicid;
 # endif
 #endif
-		c->phys_proc_id = c->initial_apicid;
+		c->phys_proc_id = c->topo.initial_apicid;
 	}
 
 	get_model_name(c); /* Default name */
@@ -1763,9 +1763,9 @@ static void validate_apic_and_package_id
 
 	apicid = apic->cpu_present_to_apicid(cpu);
 
-	if (apicid != c->apicid) {
+	if (apicid != c->topo.apicid) {
 		pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: %x\n",
-		       cpu, apicid, c->initial_apicid);
+		       cpu, apicid, c->topo.initial_apicid);
 	}
 	BUG_ON(topology_update_package_map(c->phys_proc_id, cpu));
 	BUG_ON(topology_update_die_map(c->cpu_die_id, cpu));
@@ -1815,7 +1815,7 @@ static void identify_cpu(struct cpuinfo_
 	apply_forced_caps(c);
 
 #ifdef CONFIG_X86_64
-	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
+	c->topo.apicid = apic->phys_pkg_id(c->topo.initial_apicid, 0);
 #endif
 
 	/*
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -88,7 +88,7 @@ static void hygon_get_topology(struct cp
 			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
 
 		/* Socket ID is ApicId[6] for these processors. */
-		c->phys_proc_id = c->apicid >> APICID_SOCKET_ID_BIT;
+		c->phys_proc_id = c->topo.apicid >> APICID_SOCKET_ID_BIT;
 
 		cacheinfo_hygon_init_llc_id(c, cpu);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
@@ -116,9 +116,9 @@ static void hygon_detect_cmp(struct cpui
 
 	bits = c->x86_coreid_bits;
 	/* Low order bits define the core id (index of core in socket) */
-	c->cpu_core_id = c->initial_apicid & ((1 << bits)-1);
+	c->cpu_core_id = c->topo.initial_apicid & ((1 << bits)-1);
 	/* Convert the initial APIC ID into the socket ID */
-	c->phys_proc_id = c->initial_apicid >> bits;
+	c->phys_proc_id = c->topo.initial_apicid >> bits;
 	/* use socket ID also for last level cache */
 	per_cpu(cpu_llc_id, cpu) = c->cpu_die_id = c->phys_proc_id;
 }
@@ -128,7 +128,7 @@ static void srat_detect_node(struct cpui
 #ifdef CONFIG_NUMA
 	int cpu = smp_processor_id();
 	int node;
-	unsigned int apicid = c->apicid;
+	unsigned int apicid = c->topo.apicid;
 
 	node = numa_cpu_node(cpu);
 	if (node == NUMA_NO_NODE)
@@ -161,7 +161,7 @@ static void srat_detect_node(struct cpui
 		 * through CPU mapping may alter the outcome, directly
 		 * access __apicid_to_node[].
 		 */
-		int ht_nodeid = c->initial_apicid;
+		int ht_nodeid = c->topo.initial_apicid;
 
 		if (__apicid_to_node[ht_nodeid] != NUMA_NO_NODE)
 			node = __apicid_to_node[ht_nodeid];
@@ -301,7 +301,7 @@ static void init_hygon(struct cpuinfo_x8
 	set_cpu_cap(c, X86_FEATURE_REP_GOOD);
 
 	/* get apicid instead of initial apic id from cpuid */
-	c->apicid = read_apic_id();
+	c->topo.apicid = read_apic_id();
 
 	/*
 	 * XXX someone from Hygon needs to confirm this DTRT
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -103,7 +103,7 @@ int apei_smca_report_x86_error(struct cp
 	m.socketid = -1;
 
 	for_each_possible_cpu(cpu) {
-		if (cpu_data(cpu).initial_apicid == lapic_id) {
+		if (cpu_data(cpu).topo.initial_apicid == lapic_id) {
 			m.extcpu = cpu;
 			m.socketid = cpu_data(m.extcpu).phys_proc_id;
 			break;
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -124,7 +124,7 @@ void mce_setup(struct mce *m)
 	m->cpuvendor = boot_cpu_data.x86_vendor;
 	m->cpuid = cpuid_eax(1);
 	m->socketid = cpu_data(m->extcpu).phys_proc_id;
-	m->apicid = cpu_data(m->extcpu).initial_apicid;
+	m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
 	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
 	m->ppin = cpu_data(m->extcpu).ppin;
 	m->microcode = boot_cpu_data.microcode;
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -23,8 +23,8 @@ static void show_cpuinfo_core(struct seq
 		   cpumask_weight(topology_core_cpumask(cpu)));
 	seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
 	seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
-	seq_printf(m, "apicid\t\t: %d\n", c->apicid);
-	seq_printf(m, "initial apicid\t: %d\n", c->initial_apicid);
+	seq_printf(m, "apicid\t\t: %d\n", c->topo.apicid);
+	seq_printf(m, "initial apicid\t: %d\n", c->topo.initial_apicid);
 #endif
 }
 
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -78,7 +78,7 @@ int detect_extended_topology_early(struc
 	/*
 	 * initial apic id, which also represents 32-bit extended x2apic id.
 	 */
-	c->initial_apicid = edx;
+	c->topo.initial_apicid = edx;
 	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
 #endif
 	return 0;
@@ -108,7 +108,7 @@ int detect_extended_topology(struct cpui
 	 * Populate HT related information from sub-leaf level 0.
 	 */
 	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
-	c->initial_apicid = edx;
+	c->topo.initial_apicid = edx;
 	core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
 	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
 	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
@@ -146,20 +146,20 @@ int detect_extended_topology(struct cpui
 	die_select_mask = (~(-1 << die_plus_mask_width)) >>
 				core_plus_mask_width;
 
-	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
+	c->cpu_core_id = apic->phys_pkg_id(c->topo.initial_apicid,
 				ht_mask_width) & core_select_mask;
 
 	if (die_level_present) {
-		c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
+		c->cpu_die_id = apic->phys_pkg_id(c->topo.initial_apicid,
 					core_plus_mask_width) & die_select_mask;
 	}
 
-	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
+	c->phys_proc_id = apic->phys_pkg_id(c->topo.initial_apicid,
 				pkg_mask_width);
 	/*
 	 * Reinit the apicid, now that we have extended initial_apicid.
 	 */
-	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
+	c->topo.apicid = apic->phys_pkg_id(c->topo.initial_apicid, 0);
 
 	c->x86_max_cores = (core_level_siblings / smp_num_siblings);
 	__max_die_per_package = (die_level_siblings / core_level_siblings);
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -118,7 +118,7 @@ static int xen_phys_pkg_id(int initial_a
 static int xen_cpu_present_to_apicid(int cpu)
 {
 	if (cpu_present(cpu))
-		return cpu_data(cpu).apicid;
+		return cpu_data(cpu).topo.apicid;
 	else
 		return BAD_APICID;
 }
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -2255,7 +2255,7 @@ static int kfd_cpumask_to_apic_id(const
 	if (first_cpu_of_numa_node >= nr_cpu_ids)
 		return -1;
 #ifdef CONFIG_X86_64
-	return cpu_data(first_cpu_of_numa_node).apicid;
+	return cpu_data(first_cpu_of_numa_node).topo.apicid;
 #else
 	return first_cpu_of_numa_node;
 #endif
--- a/drivers/virt/acrn/hsm.c
+++ b/drivers/virt/acrn/hsm.c
@@ -447,7 +447,7 @@ static ssize_t remove_cpu_store(struct d
 	if (cpu_online(cpu))
 		remove_cpu(cpu);
 
-	lapicid = cpu_data(cpu).apicid;
+	lapicid = cpu_data(cpu).topo.apicid;
 	dev_dbg(dev, "Try to remove cpu %lld with lapicid %lld\n", cpu, lapicid);
 	ret = hcall_sos_remove_cpu(lapicid);
 	if (ret < 0) {
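
To illustrate what the callers above compute from these fields, a hedged
worked example of the AMD ID split done in amd_detect_cmp() (numbers
invented for illustration):

  /* Assume x86_coreid_bits == 3 and an initial APIC ID of 0x1a (11010b) */
  unsigned int bits    = 3;
  unsigned int apicid  = 0x1a;
  unsigned int core_id = apicid & ((1 << bits) - 1);  /* 0b010 -> core 2   */
  unsigned int pkg_id  = apicid >> bits;              /* 0b011 -> socket 3 */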



* [patch V3 04/40] x86/cpu: Move phys_proc_id into topology info
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (2 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 03/40] x86/cpu: Encapsulate topology information in cpuinfo_x86 Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 05/40] x86/cpu: Move cpu_die_id " Thomas Gleixner
                   ` (38 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Move phys_proc_id into the topology info and rename it to pkg_id, which is
the terminology used in the kernel.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/arch/x86/topology.rst  |    2 +-
 arch/x86/include/asm/processor.h     |    5 +++--
 arch/x86/include/asm/topology.h      |    2 +-
 arch/x86/include/asm/x86_init.h      |    2 +-
 arch/x86/kernel/apic/apic_numachip.c |    2 +-
 arch/x86/kernel/cpu/amd.c            |    4 ++--
 arch/x86/kernel/cpu/cacheinfo.c      |    4 ++--
 arch/x86/kernel/cpu/common.c         |    6 +++---
 arch/x86/kernel/cpu/hygon.c          |    6 +++---
 arch/x86/kernel/cpu/mce/apei.c       |    2 +-
 arch/x86/kernel/cpu/mce/core.c       |    2 +-
 arch/x86/kernel/cpu/proc.c           |    2 +-
 arch/x86/kernel/cpu/topology.c       |    3 +--
 arch/x86/kernel/smpboot.c            |   16 ++++++++--------
 drivers/scsi/lpfc/lpfc_init.c        |    6 +-----
 15 files changed, 30 insertions(+), 34 deletions(-)

--- a/Documentation/arch/x86/topology.rst
+++ b/Documentation/arch/x86/topology.rst
@@ -59,7 +59,7 @@ AMD nomenclature for package is 'Node'.
 
     The physical ID of the die. This information is retrieved via CPUID.
 
-  - cpuinfo_x86.phys_proc_id:
+  - cpuinfo_x86.topo.pkg_id:
 
     The physical ID of the package. This information is retrieved via CPUID
     and deduced from the APIC IDs of the cores in the package.
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -82,6 +82,9 @@ struct cpuinfo_topology {
 	u32			apicid;
 	// The initial APIC ID provided by CPUID
 	u32			initial_apicid;
+
+	// Physical package ID
+	u32			pkg_id;
 };
 
 struct cpuinfo_x86 {
@@ -133,8 +136,6 @@ struct cpuinfo_x86 {
 	u16			x86_clflush_size;
 	/* number of cores as seen by the OS: */
 	u16			booted_cores;
-	/* Physical processor id: */
-	u16			phys_proc_id;
 	/* Logical processor id: */
 	u16			logical_proc_id;
 	/* Core id: */
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -106,7 +106,7 @@ extern const struct cpumask *cpu_coregro
 extern const struct cpumask *cpu_clustergroup_mask(int cpu);
 
 #define topology_logical_package_id(cpu)	(cpu_data(cpu).logical_proc_id)
-#define topology_physical_package_id(cpu)	(cpu_data(cpu).phys_proc_id)
+#define topology_physical_package_id(cpu)	(cpu_data(cpu).topo.pkg_id)
 #define topology_logical_die_id(cpu)		(cpu_data(cpu).logical_die_id)
 #define topology_die_id(cpu)			(cpu_data(cpu).cpu_die_id)
 #define topology_core_id(cpu)			(cpu_data(cpu).cpu_core_id)
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -177,7 +177,7 @@ struct x86_init_ops {
  * struct x86_cpuinit_ops - platform specific cpu hotplug setups
  * @setup_percpu_clockev:	set up the per cpu clock event device
  * @early_percpu_clock_init:	early init of the per cpu clock event device
- * @fixup_cpu_id:		fixup function for cpuinfo_x86::phys_proc_id
+ * @fixup_cpu_id:		fixup function for cpuinfo_x86::topo.pkg_id
  * @parallel_bringup:		Parallel bringup control
  */
 struct x86_cpuinit_ops {
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -169,7 +169,7 @@ static void fixup_cpu_id(struct cpuinfo_
 		nodes = ((val >> 3) & 7) + 1;
 	}
 
-	c->phys_proc_id = node / nodes;
+	c->topo.pkg_id = node / nodes;
 }
 
 static int __init numachip_system_init(void)
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -389,9 +389,9 @@ static void amd_detect_cmp(struct cpuinf
 	/* Low order bits define the core id (index of core in socket) */
 	c->cpu_core_id = c->topo.initial_apicid & ((1 << bits)-1);
 	/* Convert the initial APIC ID into the socket ID */
-	c->phys_proc_id = c->topo.initial_apicid >> bits;
+	c->topo.pkg_id = c->topo.initial_apicid >> bits;
 	/* use socket ID also for last level cache */
-	per_cpu(cpu_llc_id, cpu) = c->cpu_die_id = c->phys_proc_id;
+	per_cpu(cpu_llc_id, cpu) = c->cpu_die_id = c->topo.pkg_id;
 }
 
 u32 amd_get_nodes_per_socket(void)
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -875,10 +875,10 @@ void init_intel_cacheinfo(struct cpuinfo
 	 * turns means that the only possibility is SMT (as indicated in
 	 * cpuid1). Since cpuid2 doesn't specify shared caches, and we know
 	 * that SMT shares all caches, we can unconditionally set cpu_llc_id to
-	 * c->phys_proc_id.
+	 * c->topo.pkg_id.
 	 */
 	if (per_cpu(cpu_llc_id, cpu) == BAD_APICID)
-		per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
+		per_cpu(cpu_llc_id, cpu) = c->topo.pkg_id;
 #endif
 
 	c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d));
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -899,7 +899,7 @@ void detect_ht(struct cpuinfo_x86 *c)
 		return;
 
 	index_msb = get_count_order(smp_num_siblings);
-	c->phys_proc_id = apic->phys_pkg_id(c->topo.initial_apicid, index_msb);
+	c->topo.pkg_id = apic->phys_pkg_id(c->topo.initial_apicid, index_msb);
 
 	smp_num_siblings = smp_num_siblings / c->x86_max_cores;
 
@@ -1729,7 +1729,7 @@ static void generic_identify(struct cpui
 		c->topo.apicid = c->topo.initial_apicid;
 # endif
 #endif
-		c->phys_proc_id = c->topo.initial_apicid;
+		c->topo.pkg_id = c->topo.initial_apicid;
 	}
 
 	get_model_name(c); /* Default name */
@@ -1767,7 +1767,7 @@ static void validate_apic_and_package_id
 		pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: %x\n",
 		       cpu, apicid, c->topo.initial_apicid);
 	}
-	BUG_ON(topology_update_package_map(c->phys_proc_id, cpu));
+	BUG_ON(topology_update_package_map(c->topo.pkg_id, cpu));
 	BUG_ON(topology_update_die_map(c->cpu_die_id, cpu));
 #else
 	c->logical_proc_id = 0;
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -88,7 +88,7 @@ static void hygon_get_topology(struct cp
 			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
 
 		/* Socket ID is ApicId[6] for these processors. */
-		c->phys_proc_id = c->topo.apicid >> APICID_SOCKET_ID_BIT;
+		c->topo.pkg_id = c->topo.apicid >> APICID_SOCKET_ID_BIT;
 
 		cacheinfo_hygon_init_llc_id(c, cpu);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
@@ -118,9 +118,9 @@ static void hygon_detect_cmp(struct cpui
 	/* Low order bits define the core id (index of core in socket) */
 	c->cpu_core_id = c->topo.initial_apicid & ((1 << bits)-1);
 	/* Convert the initial APIC ID into the socket ID */
-	c->phys_proc_id = c->topo.initial_apicid >> bits;
+	c->topo.pkg_id = c->topo.initial_apicid >> bits;
 	/* use socket ID also for last level cache */
-	per_cpu(cpu_llc_id, cpu) = c->cpu_die_id = c->phys_proc_id;
+	per_cpu(cpu_llc_id, cpu) = c->cpu_die_id = c->topo.pkg_id;
 }
 
 static void srat_detect_node(struct cpuinfo_x86 *c)
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -105,7 +105,7 @@ int apei_smca_report_x86_error(struct cp
 	for_each_possible_cpu(cpu) {
 		if (cpu_data(cpu).topo.initial_apicid == lapic_id) {
 			m.extcpu = cpu;
-			m.socketid = cpu_data(m.extcpu).phys_proc_id;
+			m.socketid = cpu_data(m.extcpu).topo.pkg_id;
 			break;
 		}
 	}
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -123,7 +123,7 @@ void mce_setup(struct mce *m)
 	m->time = __ktime_get_real_seconds();
 	m->cpuvendor = boot_cpu_data.x86_vendor;
 	m->cpuid = cpuid_eax(1);
-	m->socketid = cpu_data(m->extcpu).phys_proc_id;
+	m->socketid = cpu_data(m->extcpu).topo.pkg_id;
 	m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
 	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
 	m->ppin = cpu_data(m->extcpu).ppin;
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -18,7 +18,7 @@ static void show_cpuinfo_core(struct seq
 			      unsigned int cpu)
 {
 #ifdef CONFIG_SMP
-	seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
+	seq_printf(m, "physical id\t: %d\n", c->topo.pkg_id);
 	seq_printf(m, "siblings\t: %d\n",
 		   cpumask_weight(topology_core_cpumask(cpu)));
 	seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -154,8 +154,7 @@ int detect_extended_topology(struct cpui
 					core_plus_mask_width) & die_select_mask;
 	}
 
-	c->phys_proc_id = apic->phys_pkg_id(c->topo.initial_apicid,
-				pkg_mask_width);
+	c->topo.pkg_id = apic->phys_pkg_id(c->topo.initial_apicid, pkg_mask_width);
 	/*
 	 * Reinit the apicid, now that we have extended initial_apicid.
 	 */
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -347,7 +347,7 @@ int topology_phys_to_logical_pkg(unsigne
 	for_each_possible_cpu(cpu) {
 		struct cpuinfo_x86 *c = &cpu_data(cpu);
 
-		if (c->initialized && c->phys_proc_id == phys_pkg)
+		if (c->initialized && c->topo.pkg_id == phys_pkg)
 			return c->logical_proc_id;
 	}
 	return -1;
@@ -363,13 +363,13 @@ EXPORT_SYMBOL(topology_phys_to_logical_p
  */
 static int topology_phys_to_logical_die(unsigned int die_id, unsigned int cur_cpu)
 {
-	int cpu, proc_id = cpu_data(cur_cpu).phys_proc_id;
+	int cpu, proc_id = cpu_data(cur_cpu).topo.pkg_id;
 
 	for_each_possible_cpu(cpu) {
 		struct cpuinfo_x86 *c = &cpu_data(cpu);
 
 		if (c->initialized && c->cpu_die_id == die_id &&
-		    c->phys_proc_id == proc_id)
+		    c->topo.pkg_id == proc_id)
 			return c->logical_die_id;
 	}
 	return -1;
@@ -429,7 +429,7 @@ void __init smp_store_boot_cpu_info(void
 
 	*c = boot_cpu_data;
 	c->cpu_index = id;
-	topology_update_package_map(c->phys_proc_id, id);
+	topology_update_package_map(c->topo.pkg_id, id);
 	topology_update_die_map(c->cpu_die_id, id);
 	c->initialized = true;
 }
@@ -484,7 +484,7 @@ static bool match_smt(struct cpuinfo_x86
 	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
 		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
 
-		if (c->phys_proc_id == o->phys_proc_id &&
+		if (c->topo.pkg_id == o->topo.pkg_id &&
 		    c->cpu_die_id == o->cpu_die_id &&
 		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
 			if (c->cpu_core_id == o->cpu_core_id)
@@ -496,7 +496,7 @@ static bool match_smt(struct cpuinfo_x86
 				return topology_sane(c, o, "smt");
 		}
 
-	} else if (c->phys_proc_id == o->phys_proc_id &&
+	} else if (c->topo.pkg_id == o->topo.pkg_id &&
 		   c->cpu_die_id == o->cpu_die_id &&
 		   c->cpu_core_id == o->cpu_core_id) {
 		return topology_sane(c, o, "smt");
@@ -507,7 +507,7 @@ static bool match_smt(struct cpuinfo_x86
 
 static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-	if (c->phys_proc_id == o->phys_proc_id &&
+	if (c->topo.pkg_id == o->topo.pkg_id &&
 	    c->cpu_die_id == o->cpu_die_id)
 		return true;
 	return false;
@@ -535,7 +535,7 @@ static bool match_l2c(struct cpuinfo_x86
  */
 static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-	if (c->phys_proc_id == o->phys_proc_id)
+	if (c->topo.pkg_id == o->topo.pkg_id)
 		return true;
 	return false;
 }
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12428,9 +12428,6 @@ lpfc_cpu_affinity_check(struct lpfc_hba
 	int max_core_id, min_core_id;
 	struct lpfc_vector_map_info *cpup;
 	struct lpfc_vector_map_info *new_cpup;
-#ifdef CONFIG_X86
-	struct cpuinfo_x86 *cpuinfo;
-#endif
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
 	struct lpfc_hdwq_stat *c_stat;
 #endif
@@ -12444,8 +12441,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba
 	for_each_present_cpu(cpu) {
 		cpup = &phba->sli4_hba.cpu_map[cpu];
 #ifdef CONFIG_X86
-		cpuinfo = &cpu_data(cpu);
-		cpup->phys_id = cpuinfo->phys_proc_id;
+		cpup->phys_id = topology_physical_package_id(cpu);
 		cpup->core_id = cpuinfo->cpu_core_id;
 		if (lpfc_find_hyper(phba, cpu, cpup->phys_id, cpup->core_id))
 			cpup->flag |= LPFC_CPU_MAP_HYPER;
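
The lpfc hunk shows the intended pattern for drivers: use the topology
helpers instead of poking cpu_data() directly. A minimal sketch of such a
consumer (hypothetical helper, not part of this series):

  #include <linux/cpumask.h>
  #include <linux/topology.h>

  /* Count online CPUs which share a physical package with @cpu */
  static unsigned int cpus_in_same_package(unsigned int cpu)
  {
          unsigned int sibling, count = 0;
          int pkg = topology_physical_package_id(cpu);

          for_each_online_cpu(sibling) {
                  if (topology_physical_package_id(sibling) == pkg)
                          count++;
          }
          return count;
  }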



* [patch V3 05/40] x86/cpu: Move cpu_die_id into topology info
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (3 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 04/40] x86/cpu: Move phys_proc_id into topology info Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-09 14:32   ` Zhang, Rui
  2023-08-02 10:21 ` [patch V3 06/40] scsi: lpfc: Use topology_core_id() Thomas Gleixner
                   ` (37 subsequent siblings)
  42 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Move the next member, cpu_die_id, into the topology info as die_id.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 Documentation/arch/x86/topology.rst |    4 ++--
 arch/x86/include/asm/processor.h    |    4 +++-
 arch/x86/include/asm/topology.h     |    2 +-
 arch/x86/kernel/cpu/amd.c           |    8 ++++----
 arch/x86/kernel/cpu/cacheinfo.c     |    2 +-
 arch/x86/kernel/cpu/common.c        |    2 +-
 arch/x86/kernel/cpu/hygon.c         |    8 ++++----
 arch/x86/kernel/cpu/topology.c      |    2 +-
 arch/x86/kernel/smpboot.c           |   10 +++++-----
 9 files changed, 22 insertions(+), 20 deletions(-)

--- a/Documentation/arch/x86/topology.rst
+++ b/Documentation/arch/x86/topology.rst
@@ -55,7 +55,7 @@ AMD nomenclature for package is 'Node'.
 
     The number of dies in a package. This information is retrieved via CPUID.
 
-  - cpuinfo_x86.cpu_die_id:
+  - cpuinfo_x86.topo.die_id:
 
     The physical ID of the die. This information is retrieved via CPUID.
 
@@ -65,7 +65,7 @@ AMD nomenclature for package is 'Node'.
     and deduced from the APIC IDs of the cores in the package.
 
     Modern systems use this value for the socket. There may be multiple
-    packages within a socket. This value may differ from cpu_die_id.
+    packages within a socket. This value may differ from topo.die_id.
 
   - cpuinfo_x86.logical_proc_id:
 
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -85,6 +85,9 @@ struct cpuinfo_topology {
 
 	// Physical package ID
 	u32			pkg_id;
+
+	// Physical die ID on AMD, Relative on Intel
+	u32			die_id;
 };
 
 struct cpuinfo_x86 {
@@ -140,7 +143,6 @@ struct cpuinfo_x86 {
 	u16			logical_proc_id;
 	/* Core id: */
 	u16			cpu_core_id;
-	u16			cpu_die_id;
 	u16			logical_die_id;
 	/* Index into per_cpu list: */
 	u16			cpu_index;
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -108,7 +108,7 @@ extern const struct cpumask *cpu_cluster
 #define topology_logical_package_id(cpu)	(cpu_data(cpu).logical_proc_id)
 #define topology_physical_package_id(cpu)	(cpu_data(cpu).topo.pkg_id)
 #define topology_logical_die_id(cpu)		(cpu_data(cpu).logical_die_id)
-#define topology_die_id(cpu)			(cpu_data(cpu).cpu_die_id)
+#define topology_die_id(cpu)			(cpu_data(cpu).topo.die_id)
 #define topology_core_id(cpu)			(cpu_data(cpu).cpu_core_id)
 #define topology_ppin(cpu)			(cpu_data(cpu).ppin)
 
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -338,7 +338,7 @@ static void amd_get_topology(struct cpui
 
 		cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
 
-		c->cpu_die_id  = ecx & 0xff;
+		c->topo.die_id  = ecx & 0xff;
 
 		if (c->x86 == 0x15)
 			c->cu_id = ebx & 0xff;
@@ -364,9 +364,9 @@ static void amd_get_topology(struct cpui
 		u64 value;
 
 		rdmsrl(MSR_FAM10H_NODE_ID, value);
-		c->cpu_die_id = value & 7;
+		c->topo.die_id = value & 7;
 
-		per_cpu(cpu_llc_id, cpu) = c->cpu_die_id;
+		per_cpu(cpu_llc_id, cpu) = c->topo.die_id;
 	} else
 		return;
 
@@ -391,7 +391,7 @@ static void amd_detect_cmp(struct cpuinf
 	/* Convert the initial APIC ID into the socket ID */
 	c->topo.pkg_id = c->topo.initial_apicid >> bits;
 	/* use socket ID also for last level cache */
-	per_cpu(cpu_llc_id, cpu) = c->cpu_die_id = c->topo.pkg_id;
+	per_cpu(cpu_llc_id, cpu) = c->topo.die_id = c->topo.pkg_id;
 }
 
 u32 amd_get_nodes_per_socket(void)
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -672,7 +672,7 @@ void cacheinfo_amd_init_llc_id(struct cp
 
 	if (c->x86 < 0x17) {
 		/* LLC is at the node level. */
-		per_cpu(cpu_llc_id, cpu) = c->cpu_die_id;
+		per_cpu(cpu_llc_id, cpu) = c->topo.die_id;
 	} else if (c->x86 == 0x17 && c->x86_model <= 0x1F) {
 		/*
 		 * LLC is at the core complex level.
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1768,7 +1768,7 @@ static void validate_apic_and_package_id
 		       cpu, apicid, c->topo.initial_apicid);
 	}
 	BUG_ON(topology_update_package_map(c->topo.pkg_id, cpu));
-	BUG_ON(topology_update_die_map(c->cpu_die_id, cpu));
+	BUG_ON(topology_update_die_map(c->topo.die_id, cpu));
 #else
 	c->logical_proc_id = 0;
 #endif
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -72,7 +72,7 @@ static void hygon_get_topology(struct cp
 
 		cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
 
-		c->cpu_die_id  = ecx & 0xff;
+		c->topo.die_id  = ecx & 0xff;
 
 		c->cpu_core_id = ebx & 0xff;
 
@@ -95,9 +95,9 @@ static void hygon_get_topology(struct cp
 		u64 value;
 
 		rdmsrl(MSR_FAM10H_NODE_ID, value);
-		c->cpu_die_id = value & 7;
+		c->topo.die_id = value & 7;
 
-		per_cpu(cpu_llc_id, cpu) = c->cpu_die_id;
+		per_cpu(cpu_llc_id, cpu) = c->topo.die_id;
 	} else
 		return;
 
@@ -120,7 +120,7 @@ static void hygon_detect_cmp(struct cpui
 	/* Convert the initial APIC ID into the socket ID */
 	c->topo.pkg_id = c->topo.initial_apicid >> bits;
 	/* use socket ID also for last level cache */
-	per_cpu(cpu_llc_id, cpu) = c->cpu_die_id = c->topo.pkg_id;
+	per_cpu(cpu_llc_id, cpu) = c->topo.die_id = c->topo.pkg_id;
 }
 
 static void srat_detect_node(struct cpuinfo_x86 *c)
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -150,7 +150,7 @@ int detect_extended_topology(struct cpui
 				ht_mask_width) & core_select_mask;
 
 	if (die_level_present) {
-		c->cpu_die_id = apic->phys_pkg_id(c->topo.initial_apicid,
+		c->topo.die_id = apic->phys_pkg_id(c->topo.initial_apicid,
 					core_plus_mask_width) & die_select_mask;
 	}
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -368,7 +368,7 @@ static int topology_phys_to_logical_die(
 	for_each_possible_cpu(cpu) {
 		struct cpuinfo_x86 *c = &cpu_data(cpu);
 
-		if (c->initialized && c->cpu_die_id == die_id &&
+		if (c->initialized && c->topo.die_id == die_id &&
 		    c->topo.pkg_id == proc_id)
 			return c->logical_die_id;
 	}
@@ -430,7 +430,7 @@ void __init smp_store_boot_cpu_info(void
 	*c = boot_cpu_data;
 	c->cpu_index = id;
 	topology_update_package_map(c->topo.pkg_id, id);
-	topology_update_die_map(c->cpu_die_id, id);
+	topology_update_die_map(c->topo.die_id, id);
 	c->initialized = true;
 }
 
@@ -485,7 +485,7 @@ static bool match_smt(struct cpuinfo_x86
 		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
 
 		if (c->topo.pkg_id == o->topo.pkg_id &&
-		    c->cpu_die_id == o->cpu_die_id &&
+		    c->topo.die_id == o->topo.die_id &&
 		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
 			if (c->cpu_core_id == o->cpu_core_id)
 				return topology_sane(c, o, "smt");
@@ -497,7 +497,7 @@ static bool match_smt(struct cpuinfo_x86
 		}
 
 	} else if (c->topo.pkg_id == o->topo.pkg_id &&
-		   c->cpu_die_id == o->cpu_die_id &&
+		   c->topo.die_id == o->topo.die_id &&
 		   c->cpu_core_id == o->cpu_core_id) {
 		return topology_sane(c, o, "smt");
 	}
@@ -508,7 +508,7 @@ static bool match_smt(struct cpuinfo_x86
 static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
 	if (c->topo.pkg_id == o->topo.pkg_id &&
-	    c->cpu_die_id == o->cpu_die_id)
+	    c->topo.die_id == o->topo.die_id)
 		return true;
 	return false;
 }
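
As the cacheinfo hunk above shows, topo.die_id doubles as the LLC ID on
pre-Fam17h AMD parts, while early Fam17h derives the core complex ID
straight from the APIC ID. A hedged worked example of the latter:

  /* Mirroring cacheinfo_amd_init_llc_id(): on Fam17h models <= 0x1F the
   * core complex (CCX) ID is ApicId >> 3, i.e. eight APIC IDs per CCX. */
  unsigned int apicid = 0x0b;             /* binary 1011 */
  unsigned int ccx_id = apicid >> 3;      /* 1 -> second core complex */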



* [patch V3 06/40] scsi: lpfc: Use topology_core_id()
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (4 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 05/40] x86/cpu: Move cpu_die_id " Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 07/40] hwmon: (fam15h_power) " Thomas Gleixner
                   ` (36 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu,
	James E.J. Bottomley, Dick Kennedy, James Smart,
	Martin K. Petersen, linux-scsi

Use the provided topology helper.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
---
 drivers/scsi/lpfc/lpfc_init.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12442,7 +12442,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba
 		cpup = &phba->sli4_hba.cpu_map[cpu];
 #ifdef CONFIG_X86
 		cpup->phys_id = topology_physical_package_id(cpu);
-		cpup->core_id = cpuinfo->cpu_core_id;
+		cpup->core_id = topology_core_id(cpu);
 		if (lpfc_find_hyper(phba, cpu, cpup->phys_id, cpup->core_id))
 			cpup->flag |= LPFC_CPU_MAP_HYPER;
 #else



* [patch V3 07/40] hwmon: (fam15h_power) Use topology_core_id()
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (5 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 06/40] scsi: lpfc: Use topology_core_id() Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 08/40] x86/cpu: Move cpu_core_id into topology info Thomas Gleixner
                   ` (35 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu,
	Guenter Roeck, linux-hwmon, Jean Delvare

Use the provided topology helper function instead of fiddling in cpu_data.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Cc: linux-hwmon@vger.kernel.org
Cc: Jean Delvare <jdelvare@suse.com>
Cc: Huang Rui <ray.huang@amd.com>
---
 drivers/hwmon/fam15h_power.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/drivers/hwmon/fam15h_power.c
+++ b/drivers/hwmon/fam15h_power.c
@@ -17,6 +17,7 @@
 #include <linux/cpumask.h>
 #include <linux/time.h>
 #include <linux/sched.h>
+#include <linux/topology.h>
 #include <asm/processor.h>
 #include <asm/msr.h>
 
@@ -134,15 +135,13 @@ static DEVICE_ATTR_RO(power1_crit);
 static void do_read_registers_on_cu(void *_data)
 {
 	struct fam15h_power_data *data = _data;
-	int cpu, cu;
-
-	cpu = smp_processor_id();
+	int cu;
 
 	/*
 	 * With the new x86 topology modelling, cpu core id actually
 	 * is compute unit id.
 	 */
-	cu = cpu_data(cpu).cpu_core_id;
+	cu = topology_core_id(smp_processor_id());
 
 	rdmsrl_safe(MSR_F15H_CU_PWR_ACCUMULATOR, &data->cu_acc_power[cu]);
 	rdmsrl_safe(MSR_F15H_PTSC, &data->cpu_sw_pwr_ptsc[cu]);
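
A hedged sketch of the calling convention this relies on (hypothetical
caller; the function must run on the target CPU so that
topology_core_id(smp_processor_id()) resolves to that CPU's compute unit):

  /* Hypothetical: execute the read on a specific CPU and wait */
  smp_call_function_single(cpu, do_read_registers_on_cu, data, 1);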



* [patch V3 08/40] x86/cpu: Move cpu_core_id into topology info
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (6 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 07/40] hwmon: (fam15h_power) " Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 09/40] x86/cpu: Move cu_id " Thomas Gleixner
                   ` (34 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Rename it to core_id and stick it next to the other ID fields.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    4 +++-
 arch/x86/include/asm/topology.h  |    2 +-
 arch/x86/kernel/amd_nb.c         |    4 ++--
 arch/x86/kernel/cpu/amd.c        |    8 ++++----
 arch/x86/kernel/cpu/common.c     |    4 ++--
 arch/x86/kernel/cpu/hygon.c      |    4 ++--
 arch/x86/kernel/cpu/proc.c       |    2 +-
 arch/x86/kernel/cpu/topology.c   |    2 +-
 arch/x86/kernel/smpboot.c        |    6 +++---
 9 files changed, 19 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -88,6 +88,9 @@ struct cpuinfo_topology {
 
 	// Physical die ID on AMD, Relative on Intel
 	u32			die_id;
+
+	// Core ID relative to the package
+	u32			core_id;
 };
 
 struct cpuinfo_x86 {
@@ -142,7 +145,6 @@ struct cpuinfo_x86 {
 	/* Logical processor id: */
 	u16			logical_proc_id;
 	/* Core id: */
-	u16			cpu_core_id;
 	u16			logical_die_id;
 	/* Index into per_cpu list: */
 	u16			cpu_index;
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -109,7 +109,7 @@ extern const struct cpumask *cpu_cluster
 #define topology_physical_package_id(cpu)	(cpu_data(cpu).topo.pkg_id)
 #define topology_logical_die_id(cpu)		(cpu_data(cpu).logical_die_id)
 #define topology_die_id(cpu)			(cpu_data(cpu).topo.die_id)
-#define topology_core_id(cpu)			(cpu_data(cpu).cpu_core_id)
+#define topology_core_id(cpu)			(cpu_data(cpu).topo.core_id)
 #define topology_ppin(cpu)			(cpu_data(cpu).ppin)
 
 extern unsigned int __max_die_per_package;
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -378,7 +378,7 @@ int amd_get_subcaches(int cpu)
 
 	pci_read_config_dword(link, 0x1d4, &mask);
 
-	return (mask >> (4 * cpu_data(cpu).cpu_core_id)) & 0xf;
+	return (mask >> (4 * cpu_data(cpu).topo.core_id)) & 0xf;
 }
 
 int amd_set_subcaches(int cpu, unsigned long mask)
@@ -404,7 +404,7 @@ int amd_set_subcaches(int cpu, unsigned
 		pci_write_config_dword(nb->misc, 0x1b8, reg & ~0x180000);
 	}
 
-	cuid = cpu_data(cpu).cpu_core_id;
+	cuid = cpu_data(cpu).topo.core_id;
 	mask <<= 4 * cuid;
 	mask |= (0xf ^ (1 << cuid)) << 26;
 
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -306,7 +306,7 @@ static int nearby_node(int apicid)
 #endif
 
 /*
- * Fix up cpu_core_id for pre-F17h systems to be in the
+ * Fix up topo::core_id for pre-F17h systems to be in the
  * [0 .. cores_per_node - 1] range. Not really needed but
  * kept so as not to break existing setups.
  */
@@ -318,7 +318,7 @@ static void legacy_fixup_core_id(struct
 		return;
 
 	cus_per_node = c->x86_max_cores / nodes_per_socket;
-	c->cpu_core_id %= cus_per_node;
+	c->topo.core_id %= cus_per_node;
 }
 
 /*
@@ -344,7 +344,7 @@ static void amd_get_topology(struct cpui
 			c->cu_id = ebx & 0xff;
 
 		if (c->x86 >= 0x17) {
-			c->cpu_core_id = ebx & 0xff;
+			c->topo.core_id = ebx & 0xff;
 
 			if (smp_num_siblings > 1)
 				c->x86_max_cores /= smp_num_siblings;
@@ -387,7 +387,7 @@ static void amd_detect_cmp(struct cpuinf
 
 	bits = c->x86_coreid_bits;
 	/* Low order bits define the core id (index of core in socket) */
-	c->cpu_core_id = c->topo.initial_apicid & ((1 << bits)-1);
+	c->topo.core_id = c->topo.initial_apicid & ((1 << bits)-1);
 	/* Convert the initial APIC ID into the socket ID */
 	c->topo.pkg_id = c->topo.initial_apicid >> bits;
 	/* use socket ID also for last level cache */
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -907,8 +907,8 @@ void detect_ht(struct cpuinfo_x86 *c)
 
 	core_bits = get_count_order(c->x86_max_cores);
 
-	c->cpu_core_id = apic->phys_pkg_id(c->topo.initial_apicid, index_msb) &
-				       ((1 << core_bits) - 1);
+	c->topo.core_id = apic->phys_pkg_id(c->topo.initial_apicid, index_msb) &
+		((1 << core_bits) - 1);
 #endif
 }
 
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -74,7 +74,7 @@ static void hygon_get_topology(struct cp
 
 		c->topo.die_id  = ecx & 0xff;
 
-		c->cpu_core_id = ebx & 0xff;
+		c->topo.core_id = ebx & 0xff;
 
 		if (smp_num_siblings > 1)
 			c->x86_max_cores /= smp_num_siblings;
@@ -116,7 +116,7 @@ static void hygon_detect_cmp(struct cpui
 
 	bits = c->x86_coreid_bits;
 	/* Low order bits define the core id (index of core in socket) */
-	c->cpu_core_id = c->topo.initial_apicid & ((1 << bits)-1);
+	c->topo.core_id = c->topo.initial_apicid & ((1 << bits)-1);
 	/* Convert the initial APIC ID into the socket ID */
 	c->topo.pkg_id = c->topo.initial_apicid >> bits;
 	/* use socket ID also for last level cache */
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -21,7 +21,7 @@ static void show_cpuinfo_core(struct seq
 	seq_printf(m, "physical id\t: %d\n", c->topo.pkg_id);
 	seq_printf(m, "siblings\t: %d\n",
 		   cpumask_weight(topology_core_cpumask(cpu)));
-	seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
+	seq_printf(m, "core id\t\t: %d\n", c->topo.core_id);
 	seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
 	seq_printf(m, "apicid\t\t: %d\n", c->topo.apicid);
 	seq_printf(m, "initial apicid\t: %d\n", c->topo.initial_apicid);
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -146,7 +146,7 @@ int detect_extended_topology(struct cpui
 	die_select_mask = (~(-1 << die_plus_mask_width)) >>
 				core_plus_mask_width;
 
-	c->cpu_core_id = apic->phys_pkg_id(c->topo.initial_apicid,
+	c->topo.core_id = apic->phys_pkg_id(c->topo.initial_apicid,
 				ht_mask_width) & core_select_mask;
 
 	if (die_level_present) {
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -487,7 +487,7 @@ static bool match_smt(struct cpuinfo_x86
 		if (c->topo.pkg_id == o->topo.pkg_id &&
 		    c->topo.die_id == o->topo.die_id &&
 		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
-			if (c->cpu_core_id == o->cpu_core_id)
+			if (c->topo.core_id == o->topo.core_id)
 				return topology_sane(c, o, "smt");
 
 			if ((c->cu_id != 0xff) &&
@@ -498,7 +498,7 @@ static bool match_smt(struct cpuinfo_x86
 
 	} else if (c->topo.pkg_id == o->topo.pkg_id &&
 		   c->topo.die_id == o->topo.die_id &&
-		   c->cpu_core_id == o->cpu_core_id) {
+		   c->topo.core_id == o->topo.core_id) {
 		return topology_sane(c, o, "smt");
 	}
 
@@ -1439,7 +1439,7 @@ static void remove_siblinginfo(int cpu)
 	cpumask_clear(topology_sibling_cpumask(cpu));
 	cpumask_clear(topology_core_cpumask(cpu));
 	cpumask_clear(topology_die_cpumask(cpu));
-	c->cpu_core_id = 0;
+	c->topo.core_id = 0;
 	c->booted_cores = 0;
 	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
 	recompute_smt_state();


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 09/40] x86/cpu: Move cu_id into topology info
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (7 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 08/40] x86/cpu: Move cpu_core_id into topology info Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 10/40] x86/cpu: Remove pointless evaluation of x86_coreid_bits Thomas Gleixner
                   ` (33 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    4 +++-
 arch/x86/kernel/cpu/amd.c        |    2 +-
 arch/x86/kernel/cpu/common.c     |    2 +-
 arch/x86/kernel/smpboot.c        |    6 +++---
 4 files changed, 8 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -89,6 +89,9 @@ struct cpuinfo_topology {
 	// Physical die ID on AMD, Relative on Intel
 	u32			die_id;
 
+	// Compute unit ID - AMD specific
+	u32			cu_id;
+
 	// Core ID relative to the package
 	u32			core_id;
 };
@@ -109,7 +112,6 @@ struct cpuinfo_x86 {
 	__u8			x86_phys_bits;
 	/* CPUID returned core id bits: */
 	__u8			x86_coreid_bits;
-	__u8			cu_id;
 	/* Max extended CPUID function supported: */
 	__u32			extended_cpuid_level;
 	/* Maximum supported CPUID level, -1=no CPUID: */
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -341,7 +341,7 @@ static void amd_get_topology(struct cpui
 		c->topo.die_id  = ecx & 0xff;
 
 		if (c->x86 == 0x15)
-			c->cu_id = ebx & 0xff;
+			c->topo.cu_id = ebx & 0xff;
 
 		if (c->x86 >= 0x17) {
 			c->topo.core_id = ebx & 0xff;
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1789,7 +1789,7 @@ static void identify_cpu(struct cpuinfo_
 	c->x86_model_id[0] = '\0';  /* Unset */
 	c->x86_max_cores = 1;
 	c->x86_coreid_bits = 0;
-	c->cu_id = 0xff;
+	c->topo.cu_id = 0xff;
 #ifdef CONFIG_X86_64
 	c->x86_clflush_size = 64;
 	c->x86_phys_bits = 36;
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -490,9 +490,9 @@ static bool match_smt(struct cpuinfo_x86
 			if (c->topo.core_id == o->topo.core_id)
 				return topology_sane(c, o, "smt");
 
-			if ((c->cu_id != 0xff) &&
-			    (o->cu_id != 0xff) &&
-			    (c->cu_id == o->cu_id))
+			if ((c->topo.cu_id != 0xff) &&
+			    (o->topo.cu_id != 0xff) &&
+			    (c->topo.cu_id == o->topo.cu_id))
 				return topology_sane(c, o, "smt");
 		}
 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 10/40] x86/cpu: Remove pointless evaluation of x86_coreid_bits
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (8 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 09/40] x86/cpu: Move cu_id " Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 11/40] x86/cpu: Move logical package and die IDs into topology info Thomas Gleixner
                   ` (32 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

cpuinfo_x86::x86_coreid_bits is only used by the AMD NUMA topology code, so
there is no point in evaluating it on non-AMD systems.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
---
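Note: a stand-alone sketch of the leaf 0x1 evaluation which the deleted
Intel/Zhaoxin code performed (illustrative user-space code, not part of
the patch; the get_count_order() equivalent is open-coded):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x00000001, &eax, &ebx, &ecx, &edx))
		return 1;

	/*
	 * If HTT (EDX[28]) is set, EBX[23:16] holds the number of
	 * APIC IDs reserved per package.
	 */
	if (edx & (1U << 28)) {
		unsigned int nr = (ebx >> 16) & 0xff;
		/* ceil(log2(nr)), i.e. what get_count_order() yields */
		unsigned int bits = nr > 1 ? 32 - __builtin_clz(nr - 1) : 0;

		printf("x86_coreid_bits: %u\n", bits);
	}
	return 0;
}
---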
 arch/x86/kernel/cpu/intel.c   |   13 -------------
 arch/x86/kernel/cpu/zhaoxin.c |   14 --------------
 2 files changed, 27 deletions(-)

--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -488,19 +488,6 @@ static void early_init_intel(struct cpui
 		setup_clear_cpu_cap(X86_FEATURE_PGE);
 	}
 
-	if (c->cpuid_level >= 0x00000001) {
-		u32 eax, ebx, ecx, edx;
-
-		cpuid(0x00000001, &eax, &ebx, &ecx, &edx);
-		/*
-		 * If HTT (EDX[28]) is set EBX[16:23] contain the number of
-		 * apicids which are reserved per package. Store the resulting
-		 * shift value for the package management code.
-		 */
-		if (edx & (1U << 28))
-			c->x86_coreid_bits = get_count_order((ebx >> 16) & 0xff);
-	}
-
 	check_memory_type_self_snoop_errata(c);
 
 	/*
--- a/arch/x86/kernel/cpu/zhaoxin.c
+++ b/arch/x86/kernel/cpu/zhaoxin.c
@@ -65,20 +65,6 @@ static void early_init_zhaoxin(struct cp
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
 	}
-
-	if (c->cpuid_level >= 0x00000001) {
-		u32 eax, ebx, ecx, edx;
-
-		cpuid(0x00000001, &eax, &ebx, &ecx, &edx);
-		/*
-		 * If HTT (EDX[28]) is set EBX[16:23] contain the number of
-		 * apicids which are reserved per package. Store the resulting
-		 * shift value for the package management code.
-		 */
-		if (edx & (1U << 28))
-			c->x86_coreid_bits = get_count_order((ebx >> 16) & 0xff);
-	}
-
 }
 
 static void init_zhaoxin(struct cpuinfo_x86 *c)


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 11/40] x86/cpu: Move logical package and die IDs into topology info
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (9 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 10/40] x86/cpu: Remove pointless evaluation of x86_coreid_bits Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 12/40] x86/cpu: Move cpu_l[l2]c_id " Thomas Gleixner
                   ` (31 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Yet another topology-related data pair. Rename logical_proc_id to
logical_pkg_id so it fits the common naming conventions.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
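Note: the logical ID concept in isolation: firmware-provided package
IDs may be sparse, while the kernel hands out dense logical IDs in
enumeration order. A stand-alone sketch with made-up values, not the
kernel implementation:

#include <stdio.h>

#define MAX_PHYS_PKG	32

static int phys_to_logical[MAX_PHYS_PKG];	/* 0 means "not seen yet" */
static int nr_logical_pkgs;

static int logical_pkg_id(int phys_pkg)
{
	if (!phys_to_logical[phys_pkg])
		phys_to_logical[phys_pkg] = ++nr_logical_pkgs;
	return phys_to_logical[phys_pkg] - 1;
}

int main(void)
{
	int bios_pkg[] = { 0, 3, 3, 7 };	/* sparse BIOS enumeration */

	for (int i = 0; i < 4; i++)
		printf("phys %d -> logical %d\n",
		       bios_pkg[i], logical_pkg_id(bios_pkg[i]));
	return 0;
}
---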
 Documentation/arch/x86/topology.rst |    2 +-
 arch/x86/events/intel/uncore.c      |    2 +-
 arch/x86/include/asm/processor.h    |    8 ++++----
 arch/x86/include/asm/topology.h     |    4 ++--
 arch/x86/kernel/cpu/common.c        |    2 +-
 arch/x86/kernel/smpboot.c           |    8 ++++----
 6 files changed, 13 insertions(+), 13 deletions(-)

--- a/Documentation/arch/x86/topology.rst
+++ b/Documentation/arch/x86/topology.rst
@@ -67,7 +67,7 @@ AMD nomenclature for package is 'Node'.
     Modern systems use this value for the socket. There may be multiple
     packages within a socket. This value may differ from topo.die_id.
 
-  - cpuinfo_x86.logical_proc_id:
+  - cpuinfo_x86.topo.logical_pkg_id:
 
     The logical ID of the package. As we do not trust BIOSes to enumerate the
     packages in a consistent way, we introduced the concept of logical package
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -74,7 +74,7 @@ int uncore_device_to_die(struct pci_dev
 		struct cpuinfo_x86 *c = &cpu_data(cpu);
 
 		if (c->initialized && cpu_to_node(cpu) == node)
-			return c->logical_die_id;
+			return c->topo.logical_die_id;
 	}
 
 	return -1;
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -94,6 +94,10 @@ struct cpuinfo_topology {
 
 	// Core ID relative to the package
 	u32			core_id;
+
+	// Logical ID mappings
+	u32			logical_pkg_id;
+	u32			logical_die_id;
 };
 
 struct cpuinfo_x86 {
@@ -144,10 +148,6 @@ struct cpuinfo_x86 {
 	u16			x86_clflush_size;
 	/* number of cores as seen by the OS: */
 	u16			booted_cores;
-	/* Logical processor id: */
-	u16			logical_proc_id;
-	/* Core id: */
-	u16			logical_die_id;
 	/* Index into per_cpu list: */
 	u16			cpu_index;
 	/*  Is SMT active on this core? */
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -105,9 +105,9 @@ static inline void setup_node_to_cpumask
 extern const struct cpumask *cpu_coregroup_mask(int cpu);
 extern const struct cpumask *cpu_clustergroup_mask(int cpu);
 
-#define topology_logical_package_id(cpu)	(cpu_data(cpu).logical_proc_id)
+#define topology_logical_package_id(cpu)	(cpu_data(cpu).topo.logical_pkg_id)
 #define topology_physical_package_id(cpu)	(cpu_data(cpu).topo.pkg_id)
-#define topology_logical_die_id(cpu)		(cpu_data(cpu).logical_die_id)
+#define topology_logical_die_id(cpu)		(cpu_data(cpu).topo.logical_die_id)
 #define topology_die_id(cpu)			(cpu_data(cpu).topo.die_id)
 #define topology_core_id(cpu)			(cpu_data(cpu).topo.core_id)
 #define topology_ppin(cpu)			(cpu_data(cpu).ppin)
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1770,7 +1770,7 @@ static void validate_apic_and_package_id
 	BUG_ON(topology_update_package_map(c->topo.pkg_id, cpu));
 	BUG_ON(topology_update_die_map(c->topo.die_id, cpu));
 #else
-	c->logical_proc_id = 0;
+	c->topo.logical_pkg_id = 0;
 #endif
 }
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -348,7 +348,7 @@ int topology_phys_to_logical_pkg(unsigne
 		struct cpuinfo_x86 *c = &cpu_data(cpu);
 
 		if (c->initialized && c->topo.pkg_id == phys_pkg)
-			return c->logical_proc_id;
+			return c->topo.logical_pkg_id;
 	}
 	return -1;
 }
@@ -370,7 +370,7 @@ static int topology_phys_to_logical_die(
 
 		if (c->initialized && c->topo.die_id == die_id &&
 		    c->topo.pkg_id == proc_id)
-			return c->logical_die_id;
+			return c->topo.logical_die_id;
 	}
 	return -1;
 }
@@ -395,7 +395,7 @@ int topology_update_package_map(unsigned
 			cpu, pkg, new);
 	}
 found:
-	cpu_data(cpu).logical_proc_id = new;
+	cpu_data(cpu).topo.logical_pkg_id = new;
 	return 0;
 }
 /**
@@ -418,7 +418,7 @@ int topology_update_die_map(unsigned int
 			cpu, die, new);
 	}
 found:
-	cpu_data(cpu).logical_die_id = new;
+	cpu_data(cpu).topo.logical_die_id = new;
 	return 0;
 }
 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 12/40] x86/cpu: Move cpu_l[l2]c_id into topology info
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (10 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 11/40] x86/cpu: Move logical package and die IDs into topology info Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 13/40] x86/apic: Use BAD_APICID consistently Thomas Gleixner
                   ` (30 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

The topology IDs which identify the LLC and L2 domains clearly belong to
the per-CPU topology information.

Move them into cpuinfo_x86::topo and get rid of the extra per-CPU data and
the related exports.

This also paves the way to do proper topology evaluation during early boot
because it removes the only per-CPU dependency for that.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
---
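Note: a user-space analogue of the conversion, with made-up types:
parallel per-CPU arrays become members of the per-CPU info structure,
accessed through small helpers like the new per_cpu_llc_id():

#include <stdio.h>

#define NR_CPUS		4
#define BAD_APICID	0xffffffffu

struct cpuinfo_topology {
	unsigned int	llc_id;
	unsigned int	l2c_id;
};

struct cpuinfo {
	struct cpuinfo_topology	topo;
};

static struct cpuinfo cpu_info[NR_CPUS];

static unsigned int per_cpu_llc_id(unsigned int cpu)
{
	return cpu_info[cpu].topo.llc_id;
}

int main(void)
{
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
		cpu_info[cpu].topo.l2c_id = BAD_APICID;	/* not enumerated */
		cpu_info[cpu].topo.llc_id = cpu / 2;	/* two CPUs per LLC */
	}

	printf("cpu1 llc=%u cpu2 llc=%u\n",
	       per_cpu_llc_id(1), per_cpu_llc_id(2));
	return 0;
}
---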
 Documentation/arch/x86/topology.rst  |    4 +---
 arch/x86/events/amd/uncore.c         |    2 +-
 arch/x86/include/asm/cacheinfo.h     |    3 ---
 arch/x86/include/asm/processor.h     |   14 +++++++++++++-
 arch/x86/include/asm/smp.h           |    2 --
 arch/x86/include/asm/topology.h      |    2 +-
 arch/x86/kernel/apic/apic_numachip.c |    2 +-
 arch/x86/kernel/cpu/amd.c            |   12 ++++--------
 arch/x86/kernel/cpu/cacheinfo.c      |   33 ++++++++++++---------------------
 arch/x86/kernel/cpu/common.c         |   14 ++------------
 arch/x86/kernel/cpu/cpu.h            |    3 +++
 arch/x86/kernel/cpu/hygon.c          |   14 +++++---------
 arch/x86/kernel/smpboot.c            |   10 +++++-----
 13 files changed, 48 insertions(+), 67 deletions(-)

--- a/Documentation/arch/x86/topology.rst
+++ b/Documentation/arch/x86/topology.rst
@@ -79,9 +79,7 @@ AMD nomenclature for package is 'Node'.
     The maximum possible number of packages in the system. Helpful for per
     package facilities to preallocate per package information.
 
-  - cpu_llc_id:
-
-    A per-CPU variable containing:
+  - cpuinfo_x86.topo.llc_id:
 
       - On Intel, the first APIC ID of the list of CPUs sharing the Last Level
         Cache
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -537,7 +537,7 @@ static int amd_uncore_cpu_starting(unsig
 
 	if (amd_uncore_llc) {
 		uncore = *per_cpu_ptr(amd_uncore_llc, cpu);
-		uncore->id = get_llc_id(cpu);
+		uncore->id = per_cpu_llc_id(cpu);
 
 		uncore = amd_uncore_find_online_sibling(uncore, amd_uncore_llc);
 		*per_cpu_ptr(amd_uncore_llc, cpu) = uncore;
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -7,9 +7,6 @@ extern unsigned int memory_caching_contr
 #define CACHE_MTRR 0x01
 #define CACHE_PAT  0x02
 
-void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu);
-void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
-
 void cache_disable(void);
 void cache_enable(void);
 void set_cache_aps_delayed_init(bool val);
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -98,6 +98,10 @@ struct cpuinfo_topology {
 	// Logical ID mappings
 	u32			logical_pkg_id;
 	u32			logical_die_id;
+
+	// Cache level topology IDs
+	u32			llc_id;
+	u32			l2c_id;
 };
 
 struct cpuinfo_x86 {
@@ -687,7 +691,15 @@ extern int set_tsc_mode(unsigned int val
 
 DECLARE_PER_CPU(u64, msr_misc_features_shadow);
 
-extern u16 get_llc_id(unsigned int cpu);
+static inline u16 per_cpu_llc_id(unsigned int cpu)
+{
+	return per_cpu(cpu_info.topo.llc_id, cpu);
+}
+
+static inline u16 per_cpu_l2c_id(unsigned int cpu)
+{
+	return per_cpu(cpu_info.topo.l2c_id, cpu);
+}
 
 #ifdef CONFIG_CPU_SUP_AMD
 extern u32 amd_get_nodes_per_socket(void);
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -17,8 +17,6 @@ DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_
 /* cpus sharing the last level cache: */
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
-DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id);
-DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id);
 
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid);
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_acpiid);
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -115,7 +115,7 @@ extern const struct cpumask *cpu_cluster
 extern unsigned int __max_die_per_package;
 
 #ifdef CONFIG_SMP
-#define topology_cluster_id(cpu)		(per_cpu(cpu_l2c_id, cpu))
+#define topology_cluster_id(cpu)		(cpu_data(cpu).topo.l2c_id)
 #define topology_die_cpumask(cpu)		(per_cpu(cpu_die_map, cpu))
 #define topology_cluster_cpumask(cpu)		(cpu_clustergroup_mask(cpu))
 #define topology_core_cpumask(cpu)		(per_cpu(cpu_core_map, cpu))
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -161,7 +161,7 @@ static void fixup_cpu_id(struct cpuinfo_
 	u64 val;
 	u32 nodes = 1;
 
-	this_cpu_write(cpu_llc_id, node);
+	c->topo.llc_id = node;
 
 	/* Account for nodes per socket in multi-core-module processors */
 	if (boot_cpu_has(X86_FEATURE_NODEID_MSR)) {
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -329,8 +329,6 @@ static void legacy_fixup_core_id(struct
  */
 static void amd_get_topology(struct cpuinfo_x86 *c)
 {
-	int cpu = smp_processor_id();
-
 	/* get information required for multi-node processors */
 	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
 		int err;
@@ -358,15 +356,14 @@ static void amd_get_topology(struct cpui
 		if (!err)
 			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
 
-		cacheinfo_amd_init_llc_id(c, cpu);
+		cacheinfo_amd_init_llc_id(c);
 
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
 
 		rdmsrl(MSR_FAM10H_NODE_ID, value);
 		c->topo.die_id = value & 7;
-
-		per_cpu(cpu_llc_id, cpu) = c->topo.die_id;
+		c->topo.llc_id = c->topo.die_id;
 	} else
 		return;
 
@@ -383,7 +380,6 @@ static void amd_get_topology(struct cpui
 static void amd_detect_cmp(struct cpuinfo_x86 *c)
 {
 	unsigned bits;
-	int cpu = smp_processor_id();
 
 	bits = c->x86_coreid_bits;
 	/* Low order bits define the core id (index of core in socket) */
@@ -391,7 +387,7 @@ static void amd_detect_cmp(struct cpuinf
 	/* Convert the initial APIC ID into the socket ID */
 	c->topo.pkg_id = c->topo.initial_apicid >> bits;
 	/* use socket ID also for last level cache */
-	per_cpu(cpu_llc_id, cpu) = c->topo.die_id = c->topo.pkg_id;
+	c->topo.llc_id = c->topo.die_id = c->topo.pkg_id;
 }
 
 u32 amd_get_nodes_per_socket(void)
@@ -409,7 +405,7 @@ static void srat_detect_node(struct cpui
 
 	node = numa_cpu_node(cpu);
 	if (node == NUMA_NO_NODE)
-		node = get_llc_id(cpu);
+		node = per_cpu_llc_id(cpu);
 
 	/*
 	 * On multi-fabric platform (e.g. Numascale NumaChip) a
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -661,7 +661,7 @@ static int find_num_cache_leaves(struct
 	return i;
 }
 
-void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu)
+void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c)
 {
 	/*
 	 * We may have multiple LLCs if L3 caches exist, so check if we
@@ -672,13 +672,13 @@ void cacheinfo_amd_init_llc_id(struct cp
 
 	if (c->x86 < 0x17) {
 		/* LLC is at the node level. */
-		per_cpu(cpu_llc_id, cpu) = c->topo.die_id;
+		c->topo.llc_id = c->topo.die_id;
 	} else if (c->x86 == 0x17 && c->x86_model <= 0x1F) {
 		/*
 		 * LLC is at the core complex level.
 		 * Core complex ID is ApicId[3] for these processors.
 		 */
-		per_cpu(cpu_llc_id, cpu) = c->topo.apicid >> 3;
+		c->topo.llc_id = c->topo.apicid >> 3;
 	} else {
 		/*
 		 * LLC ID is calculated from the number of threads sharing the
@@ -694,12 +694,12 @@ void cacheinfo_amd_init_llc_id(struct cp
 		if (num_sharing_cache) {
 			int bits = get_count_order(num_sharing_cache);
 
-			per_cpu(cpu_llc_id, cpu) = c->topo.apicid >> bits;
+			c->topo.llc_id = c->topo.apicid >> bits;
 		}
 	}
 }
 
-void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu)
+void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c)
 {
 	/*
 	 * We may have multiple LLCs if L3 caches exist, so check if we
@@ -712,7 +712,7 @@ void cacheinfo_hygon_init_llc_id(struct
 	 * LLC is at the core complex level.
 	 * Core complex ID is ApicId[3] for these processors.
 	 */
-	per_cpu(cpu_llc_id, cpu) = c->topo.apicid >> 3;
+	c->topo.llc_id = c->topo.apicid >> 3;
 }
 
 void init_amd_cacheinfo(struct cpuinfo_x86 *c)
@@ -740,9 +740,6 @@ void init_intel_cacheinfo(struct cpuinfo
 	unsigned int new_l1d = 0, new_l1i = 0; /* Cache sizes from cpuid(4) */
 	unsigned int new_l2 = 0, new_l3 = 0, i; /* Cache sizes from cpuid(4) */
 	unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb;
-#ifdef CONFIG_SMP
-	unsigned int cpu = c->cpu_index;
-#endif
 
 	if (c->cpuid_level > 3) {
 		static int is_initialized;
@@ -856,30 +853,24 @@ void init_intel_cacheinfo(struct cpuinfo
 
 	if (new_l2) {
 		l2 = new_l2;
-#ifdef CONFIG_SMP
-		per_cpu(cpu_llc_id, cpu) = l2_id;
-		per_cpu(cpu_l2c_id, cpu) = l2_id;
-#endif
+		c->topo.llc_id = l2_id;
+		c->topo.l2c_id = l2_id;
 	}
 
 	if (new_l3) {
 		l3 = new_l3;
-#ifdef CONFIG_SMP
-		per_cpu(cpu_llc_id, cpu) = l3_id;
-#endif
+		c->topo.llc_id = l3_id;
 	}
 
-#ifdef CONFIG_SMP
 	/*
-	 * If cpu_llc_id is not yet set, this means cpuid_level < 4 which in
+	 * If llc_id is not yet set, this means cpuid_level < 4 which in
 	 * turns means that the only possibility is SMT (as indicated in
 	 * cpuid1). Since cpuid2 doesn't specify shared caches, and we know
 	 * that SMT shares all caches, we can unconditionally set cpu_llc_id to
 	 * c->topo.pkg_id.
 	 */
-	if (per_cpu(cpu_llc_id, cpu) == BAD_APICID)
-		per_cpu(cpu_llc_id, cpu) = c->topo.pkg_id;
-#endif
+	if (c->topo.llc_id == BAD_APICID)
+		c->topo.llc_id = c->topo.pkg_id;
 
 	c->x86_cache_size = l3 ? l3 : (l2 ? l2 : (l1i+l1d));
 
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -75,18 +75,6 @@ u32 elf_hwcap2 __read_mostly;
 int smp_num_siblings = 1;
 EXPORT_SYMBOL(smp_num_siblings);
 
-/* Last level cache ID of each logical CPU */
-DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID;
-
-u16 get_llc_id(unsigned int cpu)
-{
-	return per_cpu(cpu_llc_id, cpu);
-}
-EXPORT_SYMBOL_GPL(get_llc_id);
-
-/* L2 cache ID of each logical CPU */
-DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id) = BAD_APICID;
-
 static struct ppin_info {
 	int	feature;
 	int	msr_ppin_ctl;
@@ -1790,6 +1778,8 @@ static void identify_cpu(struct cpuinfo_
 	c->x86_max_cores = 1;
 	c->x86_coreid_bits = 0;
 	c->topo.cu_id = 0xff;
+	c->topo.llc_id = BAD_APICID;
+	c->topo.l2c_id = BAD_APICID;
 #ifdef CONFIG_X86_64
 	c->x86_clflush_size = 64;
 	c->x86_phys_bits = 36;
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -78,6 +78,9 @@ extern int detect_ht_early(struct cpuinf
 extern void detect_ht(struct cpuinfo_x86 *c);
 extern void check_null_seg_clears_base(struct cpuinfo_x86 *c);
 
+void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c);
+void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c);
+
 unsigned int aperfmperf_get_khz(int cpu);
 void cpu_select_mitigations(void);
 
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -63,8 +63,6 @@ static void hygon_get_topology_early(str
  */
 static void hygon_get_topology(struct cpuinfo_x86 *c)
 {
-	int cpu = smp_processor_id();
-
 	/* get information required for multi-node processors */
 	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
 		int err;
@@ -90,14 +88,13 @@ static void hygon_get_topology(struct cp
 		/* Socket ID is ApicId[6] for these processors. */
 		c->topo.pkg_id = c->topo.apicid >> APICID_SOCKET_ID_BIT;
 
-		cacheinfo_hygon_init_llc_id(c, cpu);
+		cacheinfo_hygon_init_llc_id(c);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
 
 		rdmsrl(MSR_FAM10H_NODE_ID, value);
 		c->topo.die_id = value & 7;
-
-		per_cpu(cpu_llc_id, cpu) = c->topo.die_id;
+		c->topo.llc_id = c->topo.die_id;
 	} else
 		return;
 
@@ -112,15 +109,14 @@ static void hygon_get_topology(struct cp
 static void hygon_detect_cmp(struct cpuinfo_x86 *c)
 {
 	unsigned int bits;
-	int cpu = smp_processor_id();
 
 	bits = c->x86_coreid_bits;
 	/* Low order bits define the core id (index of core in socket) */
 	c->topo.core_id = c->topo.initial_apicid & ((1 << bits)-1);
 	/* Convert the initial APIC ID into the socket ID */
 	c->topo.pkg_id = c->topo.initial_apicid >> bits;
-	/* use socket ID also for last level cache */
-	per_cpu(cpu_llc_id, cpu) = c->topo.die_id = c->topo.pkg_id;
+	/* Use package ID also for last level cache */
+	c->topo.llc_id = c->topo.die_id = c->topo.pkg_id;
 }
 
 static void srat_detect_node(struct cpuinfo_x86 *c)
@@ -132,7 +128,7 @@ static void srat_detect_node(struct cpui
 
 	node = numa_cpu_node(cpu);
 	if (node == NUMA_NO_NODE)
-		node = per_cpu(cpu_llc_id, cpu);
+		node = c->topo.llc_id;
 
 	/*
 	 * On multi-fabric platform (e.g. Numascale NumaChip) a
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -486,7 +486,7 @@ static bool match_smt(struct cpuinfo_x86
 
 		if (c->topo.pkg_id == o->topo.pkg_id &&
 		    c->topo.die_id == o->topo.die_id &&
-		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
+		    per_cpu_llc_id(cpu1) == per_cpu_llc_id(cpu2)) {
 			if (c->topo.core_id == o->topo.core_id)
 				return topology_sane(c, o, "smt");
 
@@ -518,11 +518,11 @@ static bool match_l2c(struct cpuinfo_x86
 	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
 
 	/* If the arch didn't set up l2c_id, fall back to SMT */
-	if (per_cpu(cpu_l2c_id, cpu1) == BAD_APICID)
+	if (per_cpu_l2c_id(cpu1) == BAD_APICID)
 		return match_smt(c, o);
 
 	/* Do not match if L2 cache id does not match: */
-	if (per_cpu(cpu_l2c_id, cpu1) != per_cpu(cpu_l2c_id, cpu2))
+	if (per_cpu_l2c_id(cpu1) != per_cpu_l2c_id(cpu2))
 		return false;
 
 	return topology_sane(c, o, "l2c");
@@ -568,11 +568,11 @@ static bool match_llc(struct cpuinfo_x86
 	bool intel_snc = id && id->driver_data;
 
 	/* Do not match if we do not have a valid APICID for cpu: */
-	if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
+	if (per_cpu_llc_id(cpu1) == BAD_APICID)
 		return false;
 
 	/* Do not match if LLC id does not match: */
-	if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2))
+	if (per_cpu_llc_id(cpu1) != per_cpu_llc_id(cpu2))
 		return false;
 
 	/*


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 13/40] x86/apic: Use BAD_APICID consistently
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (11 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 12/40] x86/cpu: Move cpu_l[l2]c_id " Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 14/40] x86/apic: Use u32 for APIC IDs in global data Thomas Gleixner
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

APIC ID checks compare with BAD_APICID all over the place, but some
initializers and some code which fiddles with global data structures use
-1[U] instead. That simply cannot work at all.

Fix it up and use BAD_APICID consistently all over the place.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Fixed changelog typos - Sohil
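
Note: the hazard in isolation; assuming a 32-bit BAD_APICID, a -1
sentinel stored in a narrower type no longer compares equal to it:

#include <stdio.h>

#define BAD_APICID 0xffffffffu	/* assumed 32-bit sentinel */

int main(void)
{
	unsigned short apicid16 = -1;	/* truncates to 0xffff */
	unsigned int apicid32 = -1;	/* 0xffffffff */

	printf("u16 sentinel matches: %d\n", apicid16 == BAD_APICID);	/* 0 */
	printf("u32 sentinel matches: %d\n", apicid32 == BAD_APICID);	/* 1 */
	return 0;
}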
---
 arch/x86/kernel/acpi/boot.c |    2 +-
 arch/x86/kernel/apic/apic.c |    6 ++----
 2 files changed, 3 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -852,7 +852,7 @@ int acpi_unmap_cpu(int cpu)
 	set_apicid_to_node(per_cpu(x86_cpu_to_apicid, cpu), NUMA_NO_NODE);
 #endif
 
-	per_cpu(x86_cpu_to_apicid, cpu) = -1;
+	per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
 	set_cpu_present(cpu, false);
 	num_processors--;
 
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -70,7 +70,7 @@ unsigned int num_processors;
 unsigned disabled_cpus;
 
 /* Processor that is doing the boot up */
-unsigned int boot_cpu_physical_apicid __ro_after_init = -1U;
+unsigned int boot_cpu_physical_apicid __ro_after_init = BAD_APICID;
 EXPORT_SYMBOL_GPL(boot_cpu_physical_apicid);
 
 u8 boot_cpu_apic_version __ro_after_init;
@@ -2316,9 +2316,7 @@ static int nr_logical_cpuids = 1;
 /*
  * Used to store mapping between logical CPU IDs and APIC IDs.
  */
-int cpuid_to_apicid[] = {
-	[0 ... NR_CPUS - 1] = -1,
-};
+int cpuid_to_apicid[] = { [0 ... NR_CPUS - 1] = BAD_APICID, };
 
 bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
 {


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 14/40] x86/apic: Use u32 for APIC IDs in global data
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (12 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 13/40] x86/apic: Use BAD_APICID consistently Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 15/40] x86/apic: Use u32 for check_apicid_used() Thomas Gleixner
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width, and fix up the most obvious usage sites of that.

The APIC callbacks will be addressed separately.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
---
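Note: the width matters because x2APIC IDs are full 32-bit values. A
minimal sketch of the truncation that narrower types invite (made-up
ID):

#include <stdio.h>

int main(void)
{
	unsigned int x2apic_id = 0x00010000;	/* legal on large systems */
	unsigned short apicid_u16 = x2apic_id;	/* silently becomes 0 */

	printf("stored as u16: %#x (expected %#x)\n",
	       (unsigned int)apicid_u16, x2apic_id);
	return 0;
}
---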
 arch/x86/include/asm/apic.h      |    2 +-
 arch/x86/include/asm/mpspec.h    |    2 +-
 arch/x86/include/asm/processor.h |    4 ++--
 arch/x86/include/asm/smp.h       |    2 +-
 arch/x86/kernel/apic/apic.c      |   12 ++++++------
 arch/x86/kernel/kvm.c            |    6 +++---
 arch/x86/mm/numa.c               |    4 ++--
 7 files changed, 16 insertions(+), 16 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -54,7 +54,7 @@ extern int local_apic_timer_c2_ok;
 extern bool apic_is_disabled;
 extern unsigned int lapic_timer_period;
 
-extern int cpuid_to_apicid[];
+extern u32 cpuid_to_apicid[];
 
 extern enum apic_intr_mode_id apic_intr_mode;
 enum apic_intr_mode_id {
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -37,7 +37,7 @@ extern int mp_bus_id_to_type[MAX_MP_BUSS
 
 extern DECLARE_BITMAP(mp_bus_not_pci, MAX_MP_BUSSES);
 
-extern unsigned int boot_cpu_physical_apicid;
+extern u32 boot_cpu_physical_apicid;
 extern u8 boot_cpu_apic_version;
 
 #ifdef CONFIG_X86_LOCAL_APIC
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -691,12 +691,12 @@ extern int set_tsc_mode(unsigned int val
 
 DECLARE_PER_CPU(u64, msr_misc_features_shadow);
 
-static inline u16 per_cpu_llc_id(unsigned int cpu)
+static inline u32 per_cpu_llc_id(unsigned int cpu)
 {
 	return per_cpu(cpu_info.topo.llc_id, cpu);
 }
 
-static inline u16 per_cpu_l2c_id(unsigned int cpu)
+static inline u32 per_cpu_l2c_id(unsigned int cpu)
 {
 	return per_cpu(cpu_info.topo.l2c_id, cpu);
 }
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -18,7 +18,7 @@ DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map);
 
-DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid);
+DECLARE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_apicid);
 DECLARE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_acpiid);
 
 struct task_struct;
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -70,7 +70,7 @@ unsigned int num_processors;
 unsigned disabled_cpus;
 
 /* Processor that is doing the boot up */
-unsigned int boot_cpu_physical_apicid __ro_after_init = BAD_APICID;
+u32 boot_cpu_physical_apicid __ro_after_init = BAD_APICID;
 EXPORT_SYMBOL_GPL(boot_cpu_physical_apicid);
 
 u8 boot_cpu_apic_version __ro_after_init;
@@ -85,7 +85,7 @@ physid_mask_t phys_cpu_present_map;
  * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
  * avoid undefined behaviour caused by sending INIT from AP to BSP.
  */
-static unsigned int disabled_cpu_apicid __ro_after_init = BAD_APICID;
+static u32 disabled_cpu_apicid __ro_after_init = BAD_APICID;
 
 /*
  * This variable controls which CPUs receive external NMIs.  By default,
@@ -109,7 +109,7 @@ static inline bool apic_accessible(void)
 /*
  * Map cpu index to physical APIC ID
  */
-DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
+DEFINE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_apicid, BAD_APICID);
 DEFINE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_acpiid, U32_MAX);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_apicid);
 EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_acpiid);
@@ -2316,11 +2316,11 @@ static int nr_logical_cpuids = 1;
 /*
  * Used to store mapping between logical CPU IDs and APIC IDs.
  */
-int cpuid_to_apicid[] = { [0 ... NR_CPUS - 1] = BAD_APICID, };
+u32 cpuid_to_apicid[] = { [0 ... NR_CPUS - 1] = BAD_APICID, };
 
 bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
 {
-	return phys_id == cpuid_to_apicid[cpu];
+	return phys_id == (u64)cpuid_to_apicid[cpu];
 }
 
 #ifdef CONFIG_SMP
@@ -2380,7 +2380,7 @@ static int allocate_logical_cpuid(int ap
 	return nr_logical_cpuids++;
 }
 
-static void cpu_update_apic(int cpu, int apicid)
+static void cpu_update_apic(int cpu, u32 apicid)
 {
 #if defined(CONFIG_SMP) || defined(CONFIG_X86_64)
 	early_per_cpu(x86_cpu_to_apicid, cpu) = apicid;
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -500,13 +500,13 @@ static bool pv_sched_yield_supported(voi
 static void __send_ipi_mask(const struct cpumask *mask, int vector)
 {
 	unsigned long flags;
-	int cpu, apic_id, icr;
-	int min = 0, max = 0;
+	int cpu, min = 0, max = 0;
 #ifdef CONFIG_X86_64
 	__uint128_t ipi_bitmap = 0;
 #else
 	u64 ipi_bitmap = 0;
 #endif
+	u32 apic_id, icr;
 	long ret;
 
 	if (cpumask_empty(mask))
@@ -1030,8 +1030,8 @@ arch_initcall(activate_jump_labels);
 /* Kick a cpu by its apicid. Used to wake up a halted vcpu */
 static void kvm_kick_cpu(int cpu)
 {
-	int apicid;
 	unsigned long flags = 0;
+	u32 apicid;
 
 	apicid = per_cpu(x86_cpu_to_apicid, cpu);
 	kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -56,7 +56,7 @@ s16 __apicid_to_node[MAX_LOCAL_APIC] = {
 
 int numa_cpu_node(int cpu)
 {
-	int apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
+	u32 apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
 
 	if (apicid != BAD_APICID)
 		return __apicid_to_node[apicid];
@@ -786,7 +786,7 @@ void __init init_gi_nodes(void)
 void __init init_cpu_to_node(void)
 {
 	int cpu;
-	u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
+	u32 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
 
 	BUG_ON(cpu_to_apicid == NULL);
 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 15/40] x86/apic: Use u32 for check_apicid_used()
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (13 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 14/40] x86/apic: Use u32 for APIC IDs in global data Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 16/40] x86/apic: Use u32 for cpu_present_to_apicid() Thomas Gleixner
                   ` (27 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width and move the default implementation to local.h as there are
no users outside the apic directory.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
---
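Note: default_check_apicid_used() is a plain bitmap test
(physid_isset()). A stand-alone analogue with the bitmap plumbing
open-coded:

#include <stdbool.h>
#include <stdio.h>

#define MAX_LOCAL_APIC	256
#define BITS_PER_LONG	(8 * sizeof(unsigned long))

static unsigned long phys_map[MAX_LOCAL_APIC / BITS_PER_LONG];

static bool check_apicid_used(unsigned int apicid)
{
	return phys_map[apicid / BITS_PER_LONG] &
	       (1UL << (apicid % BITS_PER_LONG));
}

int main(void)
{
	phys_map[0] |= 1UL << 5;	/* mark APIC ID 5 as used */

	printf("5 used: %d, 6 used: %d\n",
	       check_apicid_used(5), check_apicid_used(6));
	return 0;
}
---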
 arch/x86/include/asm/apic.h         |    3 +--
 arch/x86/kernel/apic/apic_common.c  |    2 +-
 arch/x86/kernel/apic/apic_flat_64.c |    2 --
 arch/x86/kernel/apic/apic_noop.c    |    2 ++
 arch/x86/kernel/apic/bigsmp_32.c    |    2 +-
 arch/x86/kernel/apic/local.h        |    1 +
 6 files changed, 6 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -292,7 +292,7 @@ struct apic {
 	int	(*acpi_madt_oem_check)(char *oem_id, char *oem_table_id);
 	bool	(*apic_id_registered)(void);
 
-	bool	(*check_apicid_used)(physid_mask_t *map, int apicid);
+	bool	(*check_apicid_used)(physid_mask_t *map, u32 apicid);
 	void	(*init_apic_ldr)(void);
 	void	(*ioapic_phys_id_map)(physid_mask_t *phys_map, physid_mask_t *retmap);
 	int	(*cpu_present_to_apicid)(int mps_cpu);
@@ -538,7 +538,6 @@ extern int default_apic_id_valid(u32 api
 extern u32 apic_default_calc_apicid(unsigned int cpu);
 extern u32 apic_flat_calc_apicid(unsigned int cpu);
 
-extern bool default_check_apicid_used(physid_mask_t *map, int apicid);
 extern void default_ioapic_phys_id_map(physid_mask_t *phys_map, physid_mask_t *retmap);
 extern int default_cpu_present_to_apicid(int mps_cpu);
 
--- a/arch/x86/kernel/apic/apic_common.c
+++ b/arch/x86/kernel/apic/apic_common.c
@@ -18,7 +18,7 @@ u32 apic_flat_calc_apicid(unsigned int c
 	return 1U << cpu;
 }
 
-bool default_check_apicid_used(physid_mask_t *map, int apicid)
+bool default_check_apicid_used(physid_mask_t *map, u32 apicid)
 {
 	return physid_isset(apicid, *map);
 }
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -158,8 +158,6 @@ static struct apic apic_physflat __ro_af
 
 	.disable_esr			= 0,
 
-	.check_apicid_used		= NULL,
-	.ioapic_phys_id_map		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
 	.phys_pkg_id			= flat_phys_pkg_id,
 
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -18,6 +18,8 @@
 
 #include <asm/apic.h>
 
+#include "local.h"
+
 static void noop_send_IPI(int cpu, int vector) { }
 static void noop_send_IPI_mask(const struct cpumask *cpumask, int vector) { }
 static void noop_send_IPI_mask_allbutself(const struct cpumask *cpumask, int vector) { }
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -18,7 +18,7 @@ static unsigned bigsmp_get_apic_id(unsig
 	return (x >> 24) & 0xFF;
 }
 
-static bool bigsmp_check_apicid_used(physid_mask_t *map, int apicid)
+static bool bigsmp_check_apicid_used(physid_mask_t *map, u32 apicid)
 {
 	return false;
 }
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -64,6 +64,7 @@ void default_send_IPI_all(int vector);
 void default_send_IPI_self(int vector);
 
 bool default_apic_id_registered(void);
+bool default_check_apicid_used(physid_mask_t *map, u32 apicid);
 
 #ifdef CONFIG_X86_32
 void default_send_IPI_mask_sequence_logical(const struct cpumask *mask, int vector);


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 16/40] x86/apic: Use u32 for cpu_present_to_apicid()
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (14 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 15/40] x86/apic: Use u32 for check_apicid_used() Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 17/40] x86/apic: Use u32 for phys_pkg_id() Thomas Gleixner
                   ` (26 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width, and fix up a few related usage sites for consistency's sake.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86/include/asm/apic.h        |    4 ++--
 arch/x86/kernel/apic/apic_common.c |    2 +-
 arch/x86/kernel/cpu/common.c       |    3 ++-
 arch/x86/kernel/smpboot.c          |   10 +++++-----
 arch/x86/xen/apic.c                |    2 +-
 5 files changed, 11 insertions(+), 10 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -295,7 +295,7 @@ struct apic {
 	bool	(*check_apicid_used)(physid_mask_t *map, u32 apicid);
 	void	(*init_apic_ldr)(void);
 	void	(*ioapic_phys_id_map)(physid_mask_t *phys_map, physid_mask_t *retmap);
-	int	(*cpu_present_to_apicid)(int mps_cpu);
+	u32	(*cpu_present_to_apicid)(int mps_cpu);
 	int	(*phys_pkg_id)(int cpuid_apic, int index_msb);
 
 	u32	(*get_apic_id)(unsigned long x);
@@ -539,7 +539,7 @@ extern u32 apic_default_calc_apicid(unsi
 extern u32 apic_flat_calc_apicid(unsigned int cpu);
 
 extern void default_ioapic_phys_id_map(physid_mask_t *phys_map, physid_mask_t *retmap);
-extern int default_cpu_present_to_apicid(int mps_cpu);
+extern u32 default_cpu_present_to_apicid(int mps_cpu);
 
 #else /* CONFIG_X86_LOCAL_APIC */
 
--- a/arch/x86/kernel/apic/apic_common.c
+++ b/arch/x86/kernel/apic/apic_common.c
@@ -28,7 +28,7 @@ void default_ioapic_phys_id_map(physid_m
 	*retmap = *phys_map;
 }
 
-int default_cpu_present_to_apicid(int mps_cpu)
+u32 default_cpu_present_to_apicid(int mps_cpu)
 {
 	if (mps_cpu < nr_cpu_ids && cpu_present(mps_cpu))
 		return (int)per_cpu(x86_cpu_to_apicid, mps_cpu);
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1747,7 +1747,8 @@ static void generic_identify(struct cpui
 static void validate_apic_and_package_id(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
-	unsigned int apicid, cpu = smp_processor_id();
+	unsigned int cpu = smp_processor_id();
+	u32 apicid;
 
 	apicid = apic->cpu_present_to_apicid(cpu);
 
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -816,7 +816,7 @@ static void __init smp_quirk_init_udelay
 /*
  * Wake up AP by INIT, INIT, STARTUP sequence.
  */
-static void send_init_sequence(int phys_apicid)
+static void send_init_sequence(u32 phys_apicid)
 {
 	int maxlvt = lapic_get_maxlvt();
 
@@ -842,7 +842,7 @@ static void send_init_sequence(int phys_
 /*
  * Wake up AP by INIT, INIT, STARTUP sequence.
  */
-static int wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
+static int wakeup_secondary_cpu_via_init(u32 phys_apicid, unsigned long start_eip)
 {
 	unsigned long send_status = 0, accept_status = 0;
 	int num_starts, j, maxlvt;
@@ -989,7 +989,7 @@ int common_cpu_up(unsigned int cpu, stru
  * Returns zero if startup was successfully sent, else error code from
  * ->wakeup_secondary_cpu.
  */
-static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
+static int do_boot_cpu(u32 apicid, int cpu, struct task_struct *idle)
 {
 	unsigned long start_ip = real_mode_header->trampoline_start;
 	int ret;
@@ -1057,7 +1057,7 @@ static int do_boot_cpu(int apicid, int c
 
 int native_kick_ap(unsigned int cpu, struct task_struct *tidle)
 {
-	int apicid = apic->cpu_present_to_apicid(cpu);
+	u32 apicid = apic->cpu_present_to_apicid(cpu);
 	int err;
 
 	lockdep_assert_irqs_enabled();
@@ -1250,7 +1250,7 @@ void arch_thaw_secondary_cpus_end(void)
 bool smp_park_other_cpus_in_init(void)
 {
 	unsigned int cpu, this_cpu = smp_processor_id();
-	unsigned int apicid;
+	u32 apicid;
 
 	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
 		return false;
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -115,7 +115,7 @@ static int xen_phys_pkg_id(int initial_a
 	return initial_apic_id >> index_msb;
 }
 
-static int xen_cpu_present_to_apicid(int cpu)
+static u32 xen_cpu_present_to_apicid(int cpu)
 {
 	if (cpu_present(cpu))
 		return cpu_data(cpu).topo.apicid;


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 17/40] x86/apic: Use u32 for phys_pkg_id()
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (15 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 16/40] x86/apic: Use u32 for cpu_present_to_apicid() Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 18/40] x86/apic: Use u32 for [gs]et_apic_id() Thomas Gleixner
                   ` (25 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width, even if that callback is going to be removed soonish.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
---
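Note: for context, every implementation of this callback reduces to
shifting the (initial) APIC ID right by the number of topology bits
below the package level. With made-up values:

#include <stdio.h>

int main(void)
{
	unsigned int initial_apicid = 0x23;	/* hypothetical */
	int index_msb = 4;			/* 16 APIC IDs per package */

	/* what flat/x2apic/xen_phys_pkg_id() all compute */
	printf("pkg_id: %u\n", initial_apicid >> index_msb);	/* 2 */
	return 0;
}
---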
 arch/x86/include/asm/apic.h          |    2 +-
 arch/x86/kernel/apic/apic_flat_64.c  |    2 +-
 arch/x86/kernel/apic/apic_noop.c     |    2 +-
 arch/x86/kernel/apic/apic_numachip.c |    2 +-
 arch/x86/kernel/apic/bigsmp_32.c     |    2 +-
 arch/x86/kernel/apic/local.h         |    2 +-
 arch/x86/kernel/apic/probe_32.c      |    2 +-
 arch/x86/kernel/apic/x2apic_phys.c   |    2 +-
 arch/x86/kernel/apic/x2apic_uv_x.c   |    2 +-
 arch/x86/kernel/vsmp_64.c            |    2 +-
 arch/x86/xen/apic.c                  |    2 +-
 11 files changed, 11 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -296,7 +296,7 @@ struct apic {
 	void	(*init_apic_ldr)(void);
 	void	(*ioapic_phys_id_map)(physid_mask_t *phys_map, physid_mask_t *retmap);
 	u32	(*cpu_present_to_apicid)(int mps_cpu);
-	int	(*phys_pkg_id)(int cpuid_apic, int index_msb);
+	u32	(*phys_pkg_id)(u32 cpuid_apic, int index_msb);
 
 	u32	(*get_apic_id)(unsigned long x);
 	u32	(*set_apic_id)(unsigned int id);
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -66,7 +66,7 @@ static u32 set_apic_id(unsigned int id)
 	return (id & 0xFF) << 24;
 }
 
-static int flat_phys_pkg_id(int initial_apic_id, int index_msb)
+static u32 flat_phys_pkg_id(u32 initial_apic_id, int index_msb)
 {
 	return initial_apic_id >> index_msb;
 }
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -29,7 +29,7 @@ static void noop_send_IPI_self(int vecto
 static void noop_apic_icr_write(u32 low, u32 id) { }
 static int noop_wakeup_secondary_cpu(int apicid, unsigned long start_eip) { return -1; }
 static u64 noop_apic_icr_read(void) { return 0; }
-static int noop_phys_pkg_id(int cpuid_apic, int index_msb) { return 0; }
+static u32 noop_phys_pkg_id(u32 cpuid_apic, int index_msb) { return 0; }
 static unsigned int noop_get_apic_id(unsigned long x) { return 0; }
 static void noop_apic_eoi(void) { }
 
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -56,7 +56,7 @@ static u32 numachip2_set_apic_id(unsigne
 	return id << 24;
 }
 
-static int numachip_phys_pkg_id(int initial_apic_id, int index_msb)
+static u32 numachip_phys_pkg_id(u32 initial_apic_id, int index_msb)
 {
 	return initial_apic_id >> index_msb;
 }
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -29,7 +29,7 @@ static void bigsmp_ioapic_phys_id_map(ph
 	physids_promote(0xFFL, retmap);
 }
 
-static int bigsmp_phys_pkg_id(int cpuid_apic, int index_msb)
+static u32 bigsmp_phys_pkg_id(u32 cpuid_apic, int index_msb)
 {
 	return cpuid_apic >> index_msb;
 }
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -17,7 +17,7 @@
 void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
 unsigned int x2apic_get_apic_id(unsigned long id);
 u32 x2apic_set_apic_id(unsigned int id);
-int x2apic_phys_pkg_id(int initial_apicid, int index_msb);
+u32 x2apic_phys_pkg_id(u32 initial_apicid, int index_msb);
 
 void x2apic_send_IPI_all(int vector);
 void x2apic_send_IPI_allbutself(int vector);
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -18,7 +18,7 @@
 
 #include "local.h"
 
-static int default_phys_pkg_id(int cpuid_apic, int index_msb)
+static u32 default_phys_pkg_id(u32 cpuid_apic, int index_msb)
 {
 	return cpuid_apic >> index_msb;
 }
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -134,7 +134,7 @@ u32 x2apic_set_apic_id(unsigned int id)
 	return id;
 }
 
-int x2apic_phys_pkg_id(int initial_apicid, int index_msb)
+u32 x2apic_phys_pkg_id(u32 initial_apicid, int index_msb)
 {
 	return initial_apicid >> index_msb;
 }
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -790,7 +790,7 @@ static unsigned int uv_read_apic_id(void
 	return x2apic_get_apic_id(apic_read(APIC_ID));
 }
 
-static int uv_phys_pkg_id(int initial_apicid, int index_msb)
+static u32 uv_phys_pkg_id(u32 initial_apicid, int index_msb)
 {
 	return uv_read_apic_id() >> index_msb;
 }
--- a/arch/x86/kernel/vsmp_64.c
+++ b/arch/x86/kernel/vsmp_64.c
@@ -127,7 +127,7 @@ static void __init vsmp_cap_cpus(void)
 #endif
 }
 
-static int apicid_phys_pkg_id(int initial_apic_id, int index_msb)
+static u32 apicid_phys_pkg_id(u32 initial_apic_id, int index_msb)
 {
 	return read_apic_id() >> index_msb;
 }
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -110,7 +110,7 @@ static int xen_madt_oem_check(char *oem_
 	return xen_pv_domain();
 }
 
-static int xen_phys_pkg_id(int initial_apic_id, int index_msb)
+static u32 xen_phys_pkg_id(u32 initial_apic_id, int index_msb)
 {
 	return initial_apic_id >> index_msb;
 }


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 18/40] x86/apic: Use u32 for [gs]et_apic_id()
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (16 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 17/40] x86/apic: Use u32 for phys_pkg_id() Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 19/40] x86/apic: Use u32 for wakeup_secondary_cpu[_64]() Thomas Gleixner
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

APIC IDs are used with random data types u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
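Note: the extraction being retyped here: for xAPIC the ID occupies bits
31:24 of the APIC_ID register, while pre-xAPIC parts only define bits
27:24, which is what the moved default_get_apic_id() distinguishes. A
stand-alone sketch with a fake register value:

#include <stdio.h>

static unsigned int get_apic_id(unsigned int reg, int xapic)
{
	return (reg >> 24) & (xapic ? 0xff : 0x0f);
}

int main(void)
{
	unsigned int reg = 0xab000000;	/* fake APIC_ID readout */

	printf("xAPIC id: %#x, legacy id: %#x\n",
	       get_apic_id(reg, 1), get_apic_id(reg, 0));
	return 0;
}
---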
 arch/x86/include/asm/apic.h          |   14 ++------------
 arch/x86/kernel/apic/apic_flat_64.c  |    4 ++--
 arch/x86/kernel/apic/apic_noop.c     |    2 +-
 arch/x86/kernel/apic/apic_numachip.c |    8 ++++----
 arch/x86/kernel/apic/bigsmp_32.c     |    2 +-
 arch/x86/kernel/apic/local.h         |    4 ++--
 arch/x86/kernel/apic/probe_32.c      |   10 ++++++++++
 arch/x86/kernel/apic/x2apic_phys.c   |    4 ++--
 arch/x86/kernel/apic/x2apic_uv_x.c   |    2 +-
 arch/x86/xen/apic.c                  |    4 ++--
 10 files changed, 27 insertions(+), 27 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -298,8 +298,8 @@ struct apic {
 	u32	(*cpu_present_to_apicid)(int mps_cpu);
 	u32	(*phys_pkg_id)(u32 cpuid_apic, int index_msb);
 
-	u32	(*get_apic_id)(unsigned long x);
-	u32	(*set_apic_id)(unsigned int id);
+	u32	(*get_apic_id)(u32 id);
+	u32	(*set_apic_id)(u32 apicid);
 
 	/* wakeup_secondary_cpu */
 	int	(*wakeup_secondary_cpu)(int apicid, unsigned long start_eip);
@@ -493,16 +493,6 @@ static inline bool lapic_vector_set_in_i
 	return !!(irr & (1U << (vector % 32)));
 }
 
-static inline unsigned default_get_apic_id(unsigned long x)
-{
-	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
-
-	if (APIC_XAPIC(ver) || boot_cpu_has(X86_FEATURE_EXTD_APICID))
-		return (x >> 24) & 0xFF;
-	else
-		return (x >> 24) & 0x0F;
-}
-
 /*
  * Warm reset vector position:
  */
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -56,12 +56,12 @@ flat_send_IPI_mask_allbutself(const stru
 	_flat_send_IPI_mask(mask, vector);
 }
 
-static unsigned int flat_get_apic_id(unsigned long x)
+static u32 flat_get_apic_id(u32 x)
 {
 	return (x >> 24) & 0xFF;
 }
 
-static u32 set_apic_id(unsigned int id)
+static u32 set_apic_id(u32 id)
 {
 	return (id & 0xFF) << 24;
 }
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -30,7 +30,7 @@ static void noop_apic_icr_write(u32 low,
 static int noop_wakeup_secondary_cpu(int apicid, unsigned long start_eip) { return -1; }
 static u64 noop_apic_icr_read(void) { return 0; }
 static u32 noop_phys_pkg_id(u32 cpuid_apic, int index_msb) { return 0; }
-static unsigned int noop_get_apic_id(unsigned long x) { return 0; }
+static u32 noop_get_apic_id(u32 apicid) { return 0; }
 static void noop_apic_eoi(void) { }
 
 static u32 noop_apic_read(u32 reg)
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -25,7 +25,7 @@ static const struct apic apic_numachip1;
 static const struct apic apic_numachip2;
 static void (*numachip_apic_icr_write)(int apicid, unsigned int val) __read_mostly;
 
-static unsigned int numachip1_get_apic_id(unsigned long x)
+static u32 numachip1_get_apic_id(u32 x)
 {
 	unsigned long value;
 	unsigned int id = (x >> 24) & 0xff;
@@ -38,12 +38,12 @@ static unsigned int numachip1_get_apic_i
 	return id;
 }
 
-static u32 numachip1_set_apic_id(unsigned int id)
+static u32 numachip1_set_apic_id(u32 id)
 {
 	return (id & 0xff) << 24;
 }
 
-static unsigned int numachip2_get_apic_id(unsigned long x)
+static u32 numachip2_get_apic_id(u32 x)
 {
 	u64 mcfg;
 
@@ -51,7 +51,7 @@ static unsigned int numachip2_get_apic_i
 	return ((mcfg >> (28 - 8)) & 0xfff00) | (x >> 24);
 }
 
-static u32 numachip2_set_apic_id(unsigned int id)
+static u32 numachip2_set_apic_id(u32 id)
 {
 	return id << 24;
 }
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -13,7 +13,7 @@
 
 #include "local.h"
 
-static unsigned bigsmp_get_apic_id(unsigned long x)
+static u32 bigsmp_get_apic_id(u32 x)
 {
 	return (x >> 24) & 0xFF;
 }
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -15,8 +15,8 @@
 
 /* X2APIC */
 void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
-unsigned int x2apic_get_apic_id(unsigned long id);
-u32 x2apic_set_apic_id(unsigned int id);
+u32 x2apic_get_apic_id(u32 id);
+u32 x2apic_set_apic_id(u32 id);
 u32 x2apic_phys_pkg_id(u32 initial_apicid, int index_msb);
 
 void x2apic_send_IPI_all(int vector);
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -23,6 +23,16 @@ static u32 default_phys_pkg_id(u32 cpuid
 	return cpuid_apic >> index_msb;
 }
 
+static u32 default_get_apic_id(u32 x)
+{
+	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
+
+	if (APIC_XAPIC(ver) || boot_cpu_has(X86_FEATURE_EXTD_APICID))
+		return (x >> 24) & 0xFF;
+	else
+		return (x >> 24) & 0x0F;
+}
+
 /* should be called last. */
 static int probe_default(void)
 {
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -124,12 +124,12 @@ static int x2apic_phys_probe(void)
 	return apic == &apic_x2apic_phys;
 }
 
-unsigned int x2apic_get_apic_id(unsigned long id)
+u32 x2apic_get_apic_id(u32 id)
 {
 	return id;
 }
 
-u32 x2apic_set_apic_id(unsigned int id)
+u32 x2apic_set_apic_id(u32 id)
 {
 	return id;
 }
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -780,7 +780,7 @@ static void uv_send_IPI_all(int vector)
 	uv_send_IPI_mask(cpu_online_mask, vector);
 }
 
-static u32 set_apic_id(unsigned int id)
+static u32 set_apic_id(u32 id)
 {
 	return id;
 }
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -33,13 +33,13 @@ static unsigned int xen_io_apic_read(uns
 	return 0xfd;
 }
 
-static u32 xen_set_apic_id(unsigned int x)
+static u32 xen_set_apic_id(u32 x)
 {
 	WARN_ON(1);
 	return x;
 }
 
-static unsigned int xen_get_apic_id(unsigned long x)
+static u32 xen_get_apic_id(u32 x)
 {
 	return ((x)>>24) & 0xFFu;
 }


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 19/40] x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (17 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 18/40] x86/apic: Use u32 for [gs]et_apic_id() Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-10  7:58   ` Qiuxu Zhuo
  2023-08-02 10:21 ` [patch V3 20/40] x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids Thomas Gleixner
                   ` (23 subsequent siblings)
  42 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

APIC IDs are used with random data types: u16, u32, int, unsigned int,
unsigned long.

Make it all consistently use u32 because that reflects the hardware
register width.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
---
 arch/x86/hyperv/hv_vtl.c             |    2 +-
 arch/x86/include/asm/apic.h          |    8 ++++----
 arch/x86/kernel/acpi/boot.c          |    2 +-
 arch/x86/kernel/apic/apic_noop.c     |    2 +-
 arch/x86/kernel/apic/apic_numachip.c |    2 +-
 arch/x86/kernel/apic/x2apic_uv_x.c   |    2 +-
 arch/x86/kernel/sev.c                |    2 +-
 7 files changed, 10 insertions(+), 10 deletions(-)

--- a/arch/x86/hyperv/hv_vtl.c
+++ b/arch/x86/hyperv/hv_vtl.c
@@ -192,7 +192,7 @@ static int hv_vtl_apicid_to_vp_id(u32 ap
 	return ret;
 }
 
-static int hv_vtl_wakeup_secondary_cpu(int apicid, unsigned long start_eip)
+static int hv_vtl_wakeup_secondary_cpu(u32 apicid, unsigned long start_eip)
 {
 	int vp_id;
 
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -302,9 +302,9 @@ struct apic {
 	u32	(*set_apic_id)(u32 apicid);
 
 	/* wakeup_secondary_cpu */
-	int	(*wakeup_secondary_cpu)(int apicid, unsigned long start_eip);
+	int	(*wakeup_secondary_cpu)(u32 apicid, unsigned long start_eip);
 	/* wakeup secondary CPU using 64-bit wakeup point */
-	int	(*wakeup_secondary_cpu_64)(int apicid, unsigned long start_eip);
+	int	(*wakeup_secondary_cpu_64)(u32 apicid, unsigned long start_eip);
 
 	char	*name;
 };
@@ -322,8 +322,8 @@ struct apic_override {
 	void	(*send_IPI_self)(int vector);
 	u64	(*icr_read)(void);
 	void	(*icr_write)(u32 low, u32 high);
-	int	(*wakeup_secondary_cpu)(int apicid, unsigned long start_eip);
-	int	(*wakeup_secondary_cpu_64)(int apicid, unsigned long start_eip);
+	int	(*wakeup_secondary_cpu)(u32 apicid, unsigned long start_eip);
+	int	(*wakeup_secondary_cpu_64)(u32 apicid, unsigned long start_eip);
 };
 
 /*
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -358,7 +358,7 @@ acpi_parse_lapic_nmi(union acpi_subtable
 }
 
 #ifdef CONFIG_X86_64
-static int acpi_wakeup_cpu(int apicid, unsigned long start_ip)
+static int acpi_wakeup_cpu(u32 apicid, unsigned long start_ip)
 {
 	/*
 	 * Remap mailbox memory only for the first call to acpi_wakeup_cpu().
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -27,7 +27,7 @@ static void noop_send_IPI_allbutself(int
 static void noop_send_IPI_all(int vector) { }
 static void noop_send_IPI_self(int vector) { }
 static void noop_apic_icr_write(u32 low, u32 id) { }
-static int noop_wakeup_secondary_cpu(int apicid, unsigned long start_eip) { return -1; }
+static int noop_wakeup_secondary_cpu(u32 apicid, unsigned long start_eip) { return -1; }
 static u64 noop_apic_icr_read(void) { return 0; }
 static u32 noop_phys_pkg_id(u32 cpuid_apic, int index_msb) { return 0; }
 static u32 noop_get_apic_id(u32 apicid) { return 0; }
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -71,7 +71,7 @@ static void numachip2_apic_icr_write(int
 	numachip2_write32_lcsr(NUMACHIP2_APIC_ICR, (apicid << 12) | val);
 }
 
-static int numachip_wakeup_secondary(int phys_apicid, unsigned long start_rip)
+static int numachip_wakeup_secondary(u32 phys_apicid, unsigned long start_rip)
 {
 	numachip_apic_icr_write(phys_apicid, APIC_DM_INIT);
 	numachip_apic_icr_write(phys_apicid, APIC_DM_STARTUP |
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -702,7 +702,7 @@ static __init void build_uv_gr_table(voi
 	}
 }
 
-static int uv_wakeup_secondary(int phys_apicid, unsigned long start_rip)
+static int uv_wakeup_secondary(u32 phys_apicid, unsigned long start_rip)
 {
 	unsigned long val;
 	int pnode;
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -940,7 +940,7 @@ static void snp_cleanup_vmsa(struct sev_
 		free_page((unsigned long)vmsa);
 }
 
-static int wakeup_cpu_via_vmgexit(int apic_id, unsigned long start_ip)
+static int wakeup_cpu_via_vmgexit(u32 apic_id, unsigned long start_ip)
 {
 	struct sev_es_save_area *cur_vmsa, *vmsa;
 	struct ghcb_state state;


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 20/40] x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (18 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 19/40] x86/apic: Use u32 for wakeup_secondary_cpu[_64]() Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 21/40] x86/cpu: Provide debug interface Thomas Gleixner
                   ` (22 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Per CPU cpuinfo is used to persist the logical package and die IDs. That's
really not the right place, simply because cpuinfo is subject to being
reinitialized when a CPU goes through an offline/online cycle.

This works by chance today, but that's far from correct and neither obvious
nor documented.

Add a per-CPU data structure which persists those logical IDs, which allows
the CPUID evaluation code to be cleaned up.

This is a temporary workaround until the larger topology management is in
place, which will make all of these logical ID management mechanics obsolete.
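
As a usage sketch (hypothetical call site, not part of this patch): the
phys-to-logical lookup keeps working across an offline/online cycle
because it now reads the persistent per CPU map instead of cpu_data():

	/* Map physical package 1 to its logical ID, -1 if none found */
	int lpkg = topology_phys_to_logical_pkg(1);

	if (lpkg < 0)
		pr_warn("No CPU in physical package 1\n");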

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/smpboot.c |   33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -124,7 +124,20 @@ struct mwait_cpu_dead {
  */
 static DEFINE_PER_CPU_ALIGNED(struct mwait_cpu_dead, mwait_cpu_dead);
 
-/* Logical package management. We might want to allocate that dynamically */
+/* Logical package management. */
+struct logical_maps {
+	u32	phys_pkg_id;
+	u32	phys_die_id;
+	u32	logical_pkg_id;
+	u32	logical_die_id;
+};
+
+/* Temporary workaround until the full topology mechanics is in place */
+static DEFINE_PER_CPU_READ_MOSTLY(struct logical_maps, logical_maps) = {
+	.phys_pkg_id	= U32_MAX,
+	.phys_die_id	= U32_MAX,
+};
+
 unsigned int __max_logical_packages __read_mostly;
 EXPORT_SYMBOL(__max_logical_packages);
 static unsigned int logical_packages __read_mostly;
@@ -345,10 +358,8 @@ int topology_phys_to_logical_pkg(unsigne
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		struct cpuinfo_x86 *c = &cpu_data(cpu);
-
-		if (c->initialized && c->topo.pkg_id == phys_pkg)
-			return c->topo.logical_pkg_id;
+		if (per_cpu(logical_maps.phys_pkg_id, cpu) == phys_pkg)
+			return per_cpu(logical_maps.logical_pkg_id, cpu);
 	}
 	return -1;
 }
@@ -366,11 +377,9 @@ static int topology_phys_to_logical_die(
 	int cpu, proc_id = cpu_data(cur_cpu).topo.pkg_id;
 
 	for_each_possible_cpu(cpu) {
-		struct cpuinfo_x86 *c = &cpu_data(cpu);
-
-		if (c->initialized && c->topo.die_id == die_id &&
-		    c->topo.pkg_id == proc_id)
-			return c->topo.logical_die_id;
+		if (per_cpu(logical_maps.phys_pkg_id, cpu) == proc_id &&
+		    per_cpu(logical_maps.phys_die_id, cpu) == die_id)
+			return per_cpu(logical_maps.logical_die_id, cpu);
 	}
 	return -1;
 }
@@ -395,6 +404,8 @@ int topology_update_package_map(unsigned
 			cpu, pkg, new);
 	}
 found:
+	per_cpu(logical_maps.phys_pkg_id, cpu) = pkg;
+	per_cpu(logical_maps.logical_pkg_id, cpu) = new;
 	cpu_data(cpu).topo.logical_pkg_id = new;
 	return 0;
 }
@@ -418,6 +429,8 @@ int topology_update_die_map(unsigned int
 			cpu, die, new);
 	}
 found:
+	per_cpu(logical_maps.phys_die_id, cpu) = die;
+	per_cpu(logical_maps.logical_die_id, cpu) = new;
 	cpu_data(cpu).topo.logical_die_id = new;
 	return 0;
 }


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 21/40] x86/cpu: Provide debug interface
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (19 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 20/40] x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 22/40] x86/cpu: Provide cpuid_read() et al Thomas Gleixner
                   ` (21 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Provide debug files which dump the topology-related information of
cpuinfo_x86. This is useful to validate the upcoming conversion of the
topology evaluation for correctness or bug compatibility.
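
For reference, the per CPU readout for CPU0 might look like this on a
hypothetical SMT system (values made up, path assumes the usual debugfs
mount point):

	# cat /sys/kernel/debug/x86/topo/cpus/0
	online:              1
	initial_apicid:      0
	apicid:              0
	pkg_id:              0
	die_id:              0
	cu_id:               255
	core_id:             0
	logical_pkg_id:      0
	logical_die_id:      0
	llc_id:              0
	l2c_id:              0
	max_cores:           8
	max_die_per_pkg:     1
	smp_num_siblings:    2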

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Don't return ENODEV when offline and make online a field.
---
 arch/x86/kernel/cpu/Makefile  |    2 +
 arch/x86/kernel/cpu/debugfs.c |   58 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -54,6 +54,8 @@ obj-$(CONFIG_X86_LOCAL_APIC)		+= perfctr
 obj-$(CONFIG_HYPERVISOR_GUEST)		+= vmware.o hypervisor.o mshyperv.o
 obj-$(CONFIG_ACRN_GUEST)		+= acrn.o
 
+obj-$(CONFIG_DEBUG_FS)			+= debugfs.o
+
 quiet_cmd_mkcapflags = MKCAP   $@
       cmd_mkcapflags = $(CONFIG_SHELL) $(srctree)/$(src)/mkcapflags.sh $@ $^
 
--- /dev/null
+++ b/arch/x86/kernel/cpu/debugfs.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/debugfs.h>
+
+#include <asm/apic.h>
+#include <asm/processor.h>
+
+static int cpu_debug_show(struct seq_file *m, void *p)
+{
+	unsigned long cpu = (unsigned long)m->private;
+	struct cpuinfo_x86 *c = per_cpu_ptr(&cpu_info, cpu);
+
+	seq_printf(m, "online:              %d\n", cpu_online(cpu));
+	if (!c->initialized)
+		return 0;
+
+	seq_printf(m, "initial_apicid:      %x\n", c->topo.initial_apicid);
+	seq_printf(m, "apicid:              %x\n", c->topo.apicid);
+	seq_printf(m, "pkg_id:              %u\n", c->topo.pkg_id);
+	seq_printf(m, "die_id:              %u\n", c->topo.die_id);
+	seq_printf(m, "cu_id:               %u\n", c->topo.cu_id);
+	seq_printf(m, "core_id:             %u\n", c->topo.core_id);
+	seq_printf(m, "logical_pkg_id:      %u\n", c->topo.logical_pkg_id);
+	seq_printf(m, "logical_die_id:      %u\n", c->topo.logical_die_id);
+	seq_printf(m, "llc_id:              %u\n", c->topo.llc_id);
+	seq_printf(m, "l2c_id:              %u\n", c->topo.l2c_id);
+	seq_printf(m, "max_cores:           %u\n", c->x86_max_cores);
+	seq_printf(m, "max_die_per_pkg:     %u\n", __max_die_per_package);
+	seq_printf(m, "smp_num_siblings:    %u\n", smp_num_siblings);
+	return 0;
+}
+
+static int cpu_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, cpu_debug_show, inode->i_private);
+}
+
+static const struct file_operations dfs_cpu_ops = {
+	.open		= cpu_debug_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static __init int cpu_init_debugfs(void)
+{
+	struct dentry *dir, *base = debugfs_create_dir("topo", arch_debugfs_dir);
+	unsigned long id;
+	char name[10];
+
+	dir = debugfs_create_dir("cpus", base);
+	for_each_possible_cpu(id) {
+		sprintf(name, "%lu", id);
+		debugfs_create_file(name, 0444, dir, (void *)id, &dfs_cpu_ops);
+	}
+	return 0;
+}
+late_initcall(cpu_init_debugfs);


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 22/40] x86/cpu: Provide cpuid_read() et al.
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (20 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 21/40] x86/cpu: Provide debug interface Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology() Thomas Gleixner
                   ` (20 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Provide a few helper functions to read CPUID leafs or individual registers
into a data structure without requiring unions.
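
As a usage sketch (hypothetical call site; the field names just mirror
the architectural layout of CPUID leaf 0x1 EBX):

	struct {
		u32	brand_id	:  8,
			clflush_units	:  8,
			nproc		:  8,
			apicid		:  8;
	} ebx;

	/*
	 * Read CPUID(0x1) EBX into the bitfield. The BUILD_BUG_ON() in
	 * the macro ensures that the target is exactly 32 bits wide.
	 */
	cpuid_leaf_reg(0x1, CPUID_EBX, &ebx);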

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/cpuid.h |   36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

--- a/arch/x86/include/asm/cpuid.h
+++ b/arch/x86/include/asm/cpuid.h
@@ -127,6 +127,42 @@ static inline unsigned int cpuid_edx(uns
 	return edx;
 }
 
+static inline void __cpuid_read(unsigned int leaf, unsigned int subleaf, u32 *regs)
+{
+	regs[CPUID_EAX] = leaf;
+	regs[CPUID_ECX] = subleaf;
+	__cpuid(regs, regs + 1, regs + 2, regs + 3);
+}
+
+#define cpuid_subleaf(leaf, subleaf, regs) {		\
+	BUILD_BUG_ON(sizeof(*(regs)) != 16);		\
+	__cpuid_read(leaf, subleaf, (u32 *)(regs));	\
+}
+
+#define cpuid_leaf(leaf, regs) {			\
+	BUILD_BUG_ON(sizeof(*(regs)) != 16);		\
+	__cpuid_read(leaf, 0, (u32 *)(regs));		\
+}
+
+static inline void __cpuid_read_reg(unsigned int leaf, unsigned int subleaf,
+				    enum cpuid_regs_idx regidx, u32 *reg)
+{
+	u32 regs[4];
+
+	__cpuid_read(leaf, subleaf, regs);
+	*reg = regs[regidx];
+}
+
+#define cpuid_subleaf_reg(leaf, subleaf, regidx, reg) {		\
+	BUILD_BUG_ON(sizeof(*(reg)) != 4);			\
+	__cpuid_read_reg(leaf, subleaf, regidx, (u32 *)(reg));	\
+}
+
+#define cpuid_leaf_reg(leaf, regidx, reg) {			\
+	BUILD_BUG_ON(sizeof(*(reg)) != 4);			\
+	__cpuid_read_reg(leaf, 0, regidx, (u32 *)(reg));	\
+}
+
 static __always_inline bool cpuid_function_is_indexed(u32 function)
 {
 	switch (function) {


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (21 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 22/40] x86/cpu: Provide cpuid_read() et al Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-04  8:14   ` K Prateek Nayak
  2023-08-12  6:41   ` Zhang, Rui
  2023-08-02 10:21 ` [patch V3 24/40] x86/cpu: Add legacy topology parser Thomas Gleixner
                   ` (19 subsequent siblings)
  42 siblings, 2 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Topology evaluation is a complete disaster and an impenetrable mess. It's
scattered all over the place, with some vendor implementations doing early
evaluation and some not. The most horrific part is the permanent
overwriting of smp_num_siblings and __max_die_per_package, instead of
establishing them once on the boot CPU and validating the result on the
APs.

The goals are:

  - One topology evaluation entry point

  - Proper sharing of pointlessly duplicated code

  - Proper structuring of the evaluation logic and preferences.

  - Evaluating important system wide information only once on the boot CPU

  - Making the 0xb/0x1f leaf parsing less convoluted and actually fixing
    the shortcomings of leaf 0x1f evaluation.

Start to consolidate the topology evaluation code by providing the entry
points for the early boot CPU evaluation and for the final parsing on the
boot CPU and the APs.

Move the trivial pieces into that new code:

   - The initialization of cpuinfo_x86::topo

   - The evaluation of CPUID leaf 1, which presets topo::initial_apicid

   - topo_apicid is set to topo::initial_apicid when invoked from early
     boot. When invoked for the final evaluation on the boot CPU it reads
     the actual APIC ID, which makes apic_get_initial_apicid() obsolete
     once everything is converted over.

Provide a temporary helper function topo_is_converted() which shields off the
not yet converted CPU vendors from invoking code which would break them.
This shielding covers all vendor CPUs which support SMP, but not the
historical pure UP ones as they only need the topology info init and
eventually the initial APIC initialization.

Provide two new members in cpuinfo_x86::topo to store the maximum number of
SMT siblings and the number of dies per package and add them to the debugfs
readout. These two members will be used to populate this information on the
boot CPU and to validate the APs against it.
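
To illustrate the mechanics with made up numbers: assume dom_shifts[SMT]
is 1 and dom_shifts[CORE] is 5, propagated to all higher domains. The
helpers in the new topology.h then decompose an APIC ID like this:

	/*
	 * APIC ID 0x23 == 0b100011:
	 *
	 *   smt_id  = 0x23 & ((1 << 1) - 1)          = 1
	 *   core_id = (0x23 >> 1) & ((1 << 4) - 1)   = 1
	 *   pkg_id  = 0x23 >> 5                      = 1
	 *
	 * i.e. thread 1 of core 1 in package 1, with room for 16 cores
	 * of 2 threads each per package.
	 */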

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/topology.h       |   19 +++
 arch/x86/kernel/cpu/Makefile          |    3 
 arch/x86/kernel/cpu/common.c          |   23 +---
 arch/x86/kernel/cpu/cpu.h             |    6 +
 arch/x86/kernel/cpu/debugfs.c         |   37 ++++++
 arch/x86/kernel/cpu/topology.h        |   31 +++++
 arch/x86/kernel/cpu/topology_common.c |  187 ++++++++++++++++++++++++++++++++++
 7 files changed, 289 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -102,6 +102,25 @@ static inline void setup_node_to_cpumask
 
 #include <asm-generic/topology.h>
 
+/* Topology information */
+enum x86_topology_domains {
+	TOPO_SMT_DOMAIN,
+	TOPO_CORE_DOMAIN,
+	TOPO_MODULE_DOMAIN,
+	TOPO_TILE_DOMAIN,
+	TOPO_DIE_DOMAIN,
+	TOPO_PKG_DOMAIN,
+	TOPO_ROOT_DOMAIN,
+	TOPO_MAX_DOMAIN,
+};
+
+struct x86_topology_system {
+	unsigned int	dom_shifts[TOPO_MAX_DOMAIN];
+	unsigned int	dom_size[TOPO_MAX_DOMAIN];
+};
+
+extern struct x86_topology_system x86_topo_system;
+
 extern const struct cpumask *cpu_coregroup_mask(int cpu);
 extern const struct cpumask *cpu_clustergroup_mask(int cpu);
 
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -17,7 +17,8 @@ KMSAN_SANITIZE_common.o := n
 # As above, instrumenting secondary CPU boot code causes boot hangs.
 KCSAN_SANITIZE_common.o := n
 
-obj-y			:= cacheinfo.o scattered.o topology.o
+obj-y			:= cacheinfo.o scattered.o
+obj-y			+= topology_common.o topology.o
 obj-y			+= common.o
 obj-y			+= rdrand.o
 obj-y			+= match.o
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1553,6 +1553,8 @@ static void __init early_identify_cpu(st
 		setup_force_cpu_cap(X86_FEATURE_CPUID);
 		cpu_parse_early_param();
 
+		cpu_init_topology(c);
+
 		if (this_cpu->c_early_init)
 			this_cpu->c_early_init(c);
 
@@ -1563,6 +1565,7 @@ static void __init early_identify_cpu(st
 			this_cpu->c_bsp_init(c);
 	} else {
 		setup_clear_cpu_cap(X86_FEATURE_CPUID);
+		cpu_init_topology(c);
 	}
 
 	setup_force_cpu_cap(X86_FEATURE_ALWAYS);
@@ -1708,18 +1711,6 @@ static void generic_identify(struct cpui
 
 	get_cpu_address_sizes(c);
 
-	if (c->cpuid_level >= 0x00000001) {
-		c->topo.initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
-#ifdef CONFIG_X86_32
-# ifdef CONFIG_SMP
-		c->topo.apicid = apic->phys_pkg_id(c->topo.initial_apicid, 0);
-# else
-		c->topo.apicid = c->topo.initial_apicid;
-# endif
-#endif
-		c->topo.pkg_id = c->topo.initial_apicid;
-	}
-
 	get_model_name(c); /* Default name */
 
 	/*
@@ -1778,9 +1769,6 @@ static void identify_cpu(struct cpuinfo_
 	c->x86_model_id[0] = '\0';  /* Unset */
 	c->x86_max_cores = 1;
 	c->x86_coreid_bits = 0;
-	c->topo.cu_id = 0xff;
-	c->topo.llc_id = BAD_APICID;
-	c->topo.l2c_id = BAD_APICID;
 #ifdef CONFIG_X86_64
 	c->x86_clflush_size = 64;
 	c->x86_phys_bits = 36;
@@ -1799,6 +1787,8 @@ static void identify_cpu(struct cpuinfo_
 
 	generic_identify(c);
 
+	cpu_parse_topology(c);
+
 	if (this_cpu->c_identify)
 		this_cpu->c_identify(c);
 
@@ -1806,7 +1796,8 @@ static void identify_cpu(struct cpuinfo_
 	apply_forced_caps(c);
 
 #ifdef CONFIG_X86_64
-	c->topo.apicid = apic->phys_pkg_id(c->topo.initial_apicid, 0);
+	if (!topo_is_converted(c))
+		c->topo.apicid = apic->phys_pkg_id(c->topo.initial_apicid, 0);
 #endif
 
 	/*
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -2,6 +2,11 @@
 #ifndef ARCH_X86_CPU_H
 #define ARCH_X86_CPU_H
 
+#include <asm/cpu.h>
+#include <asm/topology.h>
+
+#include "topology.h"
+
 /* attempt to consolidate cpu attributes */
 struct cpu_dev {
 	const char	*c_vendor;
@@ -95,4 +100,5 @@ static inline bool spectre_v2_in_eibrs_m
 	       mode == SPECTRE_V2_EIBRS_RETPOLINE ||
 	       mode == SPECTRE_V2_EIBRS_LFENCE;
 }
+
 #endif /* ARCH_X86_CPU_H */
--- a/arch/x86/kernel/cpu/debugfs.c
+++ b/arch/x86/kernel/cpu/debugfs.c
@@ -5,6 +5,8 @@
 #include <asm/apic.h>
 #include <asm/processor.h>
 
+#include "cpu.h"
+
 static int cpu_debug_show(struct seq_file *m, void *p)
 {
 	unsigned long cpu = (unsigned long)m->private;
@@ -42,12 +44,47 @@ static const struct file_operations dfs_
 	.release	= single_release,
 };
 
+static int dom_debug_show(struct seq_file *m, void *p)
+{
+	static const char *domain_names[TOPO_ROOT_DOMAIN] = {
+		[TOPO_SMT_DOMAIN]	= "Thread",
+		[TOPO_CORE_DOMAIN]	= "Core",
+		[TOPO_MODULE_DOMAIN]	= "Module",
+		[TOPO_TILE_DOMAIN]	= "Tile",
+		[TOPO_DIE_DOMAIN]	= "Die",
+		[TOPO_PKG_DOMAIN]	= "Package",
+	};
+	unsigned int dom, nthreads = 1;
+
+	for (dom = 0; dom < TOPO_ROOT_DOMAIN; dom++) {
+		nthreads *= x86_topo_system.dom_size[dom];
+		seq_printf(m, "domain: %-10s shift: %u dom_size: %5u max_threads: %5u\n",
+			   domain_names[dom], x86_topo_system.dom_shifts[dom],
+			   x86_topo_system.dom_size[dom], nthreads);
+	}
+	return 0;
+}
+
+static int dom_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, dom_debug_show, inode->i_private);
+}
+
+static const struct file_operations dfs_dom_ops = {
+	.open		= dom_debug_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static __init int cpu_init_debugfs(void)
 {
 	struct dentry *dir, *base = debugfs_create_dir("topo", arch_debugfs_dir);
 	unsigned long id;
 	char name[10];
 
+	debugfs_create_file("domains", 0444, base, NULL, &dfs_dom_ops);
+
 	dir = debugfs_create_dir("cpus", base);
 	for_each_possible_cpu(id) {
 		sprintf(name, "%lu", id);
--- /dev/null
+++ b/arch/x86/kernel/cpu/topology.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ARCH_X86_TOPOLOGY_H
+#define ARCH_X86_TOPOLOGY_H
+
+struct topo_scan {
+	struct cpuinfo_x86	*c;
+	unsigned int		dom_shifts[TOPO_MAX_DOMAIN];
+	unsigned int		dom_ncpus[TOPO_MAX_DOMAIN];
+};
+
+bool topo_is_converted(struct cpuinfo_x86 *c);
+void cpu_init_topology(struct cpuinfo_x86 *c);
+void cpu_parse_topology(struct cpuinfo_x86 *c);
+void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
+		      unsigned int shift, unsigned int ncpus);
+
+static inline u32 topo_shift_apicid(u32 apicid, enum x86_topology_domains dom)
+{
+	if (dom == TOPO_SMT_DOMAIN)
+		return apicid;
+	return apicid >> x86_topo_system.dom_shifts[dom - 1];
+}
+
+static inline u32 topo_relative_domain_id(u32 apicid, enum x86_topology_domains dom)
+{
+	if (dom != TOPO_SMT_DOMAIN)
+		apicid >>= x86_topo_system.dom_shifts[dom - 1];
+	return apicid & (x86_topo_system.dom_size[dom] - 1);
+}
+
+#endif /* ARCH_X86_TOPOLOGY_H */
--- /dev/null
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/cpu.h>
+
+#include <xen/xen.h>
+
+#include <asm/apic.h>
+#include <asm/processor.h>
+#include <asm/smp.h>
+
+#include "cpu.h"
+
+struct x86_topology_system x86_topo_system __ro_after_init;
+
+void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
+		      unsigned int shift, unsigned int ncpus)
+{
+	tscan->dom_shifts[dom] = shift;
+	tscan->dom_ncpus[dom] = ncpus;
+
+	/* Propagate to the upper levels */
+	for (dom++; dom < TOPO_MAX_DOMAIN; dom++) {
+		tscan->dom_shifts[dom] = tscan->dom_shifts[dom - 1];
+		tscan->dom_ncpus[dom] = tscan->dom_ncpus[dom - 1];
+	}
+}
+
+bool topo_is_converted(struct cpuinfo_x86 *c)
+{
+	/* Temporary until everything is converted over. */
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_AMD:
+	case X86_VENDOR_CENTAUR:
+	case X86_VENDOR_INTEL:
+	case X86_VENDOR_HYGON:
+	case X86_VENDOR_ZHAOXIN:
+		return false;
+	default:
+		/* Let all UP systems use the below */
+		return true;
+	}
+}
+
+static bool fake_topology(struct topo_scan *tscan)
+{
+	/*
+	 * Preset the CORE level shift for CPUID-less systems and XEN_PV,
+	 * which has useless CPUID information.
+	 */
+	topology_set_dom(tscan, TOPO_SMT_DOMAIN, 0, 1);
+	topology_set_dom(tscan, TOPO_CORE_DOMAIN, 1, 1);
+
+	return tscan->c->cpuid_level < 1 || xen_pv_domain();
+}
+
+static void parse_topology(struct topo_scan *tscan, bool early)
+{
+	const struct cpuinfo_topology topo_defaults = {
+		.cu_id			= 0xff,
+		.llc_id			= BAD_APICID,
+		.l2c_id			= BAD_APICID,
+	};
+	struct cpuinfo_x86 *c = tscan->c;
+	struct {
+		u32	unused0		: 16,
+			nproc		:  8,
+			apicid		:  8;
+	} ebx;
+
+	c->topo = topo_defaults;
+
+	if (fake_topology(tscan))
+		return;
+
+	/* Preset Initial APIC ID from CPUID leaf 1 */
+	cpuid_leaf_reg(1, CPUID_EBX, &ebx);
+	c->topo.initial_apicid = ebx.apicid;
+
+	/*
+	 * The initial invocation from early_identify_cpu() happens before
+	 * the APIC is mapped or X2APIC enabled. For establishing the
+	 * topology, that's not required. Use the initial APIC ID.
+	 */
+	if (early)
+		c->topo.apicid = c->topo.initial_apicid;
+	else
+		c->topo.apicid = read_apic_id();
+
+	/* The above is sufficient for UP */
+	if (!IS_ENABLED(CONFIG_SMP))
+		return;
+}
+
+static void topo_set_ids(struct topo_scan *tscan)
+{
+	struct cpuinfo_x86 *c = tscan->c;
+	u32 apicid = c->topo.apicid;
+
+	c->topo.pkg_id = topo_shift_apicid(apicid, TOPO_PKG_DOMAIN);
+	c->topo.die_id = topo_shift_apicid(apicid, TOPO_DIE_DOMAIN);
+
+	/* Relative core ID */
+	c->topo.core_id = topo_relative_domain_id(apicid, TOPO_CORE_DOMAIN);
+}
+
+static void topo_set_max_cores(struct topo_scan *tscan)
+{
+	/*
+	 * Bug compatible for now. This is broken on hybrid systems:
+	 * 8 cores SMT + 8 cores w/o SMT
+	 * tscan.dom_ncpus[TOPO_CORE_DOMAIN] = 24; 24 / 2 = 12 !!
+	 *
+	 * Cannot be fixed without further topology enumeration changes.
+	 */
+	tscan->c->x86_max_cores = tscan->dom_ncpus[TOPO_CORE_DOMAIN] >>
+		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
+}
+
+void cpu_parse_topology(struct cpuinfo_x86 *c)
+{
+	unsigned int dom, cpu = smp_processor_id();
+	struct topo_scan tscan = { .c = c, };
+
+	parse_topology(&tscan, false);
+
+	if (!topo_is_converted(c))
+		return;
+
+	for (dom = TOPO_SMT_DOMAIN; dom < TOPO_MAX_DOMAIN; dom++) {
+		if (tscan.dom_shifts[dom] == x86_topo_system.dom_shifts[dom])
+			continue;
+		pr_err(FW_BUG "CPU%d: Topology domain %u shift %u != %u\n", cpu, dom,
+		       tscan.dom_shifts[dom], x86_topo_system.dom_shifts[dom]);
+	}
+
+	/* Bug compatible with the existing parsers */
+	if (tscan.dom_ncpus[TOPO_SMT_DOMAIN] > smp_num_siblings) {
+		if (system_state == SYSTEM_BOOTING) {
+			pr_warn_once("CPU%d: SMT detected and enabled late\n", cpu);
+			smp_num_siblings = tscan.dom_ncpus[TOPO_SMT_DOMAIN];
+		} else {
+			pr_warn_once("CPU%d: SMT detected after init. Too late!\n", cpu);
+		}
+	}
+
+	topo_set_ids(&tscan);
+	topo_set_max_cores(&tscan);
+}
+
+void __init cpu_init_topology(struct cpuinfo_x86 *c)
+{
+	struct topo_scan tscan = { .c = c, };
+	unsigned int dom, sft;
+
+	parse_topology(&tscan, true);
+
+	if (!topo_is_converted(c))
+		return;
+
+	/* Copy the shift values and calculate the unit sizes. */
+	memcpy(x86_topo_system.dom_shifts, tscan.dom_shifts, sizeof(x86_topo_system.dom_shifts));
+
+	dom = TOPO_SMT_DOMAIN;
+	x86_topo_system.dom_size[dom] = 1U << x86_topo_system.dom_shifts[dom];
+
+	for (dom++; dom < TOPO_MAX_DOMAIN; dom++) {
+		sft = x86_topo_system.dom_shifts[dom] - x86_topo_system.dom_shifts[dom - 1];
+		x86_topo_system.dom_size[dom] = 1U << sft;
+	}
+
+	topo_set_ids(&tscan);
+	topo_set_max_cores(&tscan);
+
+	/*
+	 * Bug compatible with the existing code. If the boot CPU does not
+	 * have SMT this ends up with one sibling. This needs way deeper
+	 * changes further down the road to get it right during early boot.
+	 */
+	smp_num_siblings = tscan.dom_ncpus[TOPO_SMT_DOMAIN];
+
+	/*
+	 * It's not clear either whether there are as many dies as the APIC
+	 * space for the die level indicates. But assume that the actual number
+	 * of CPUs gives a proper indication for now to stay bug compatible.
+	 */
+	__max_die_per_package = tscan.dom_ncpus[TOPO_DIE_DOMAIN] /
+		tscan.dom_ncpus[TOPO_DIE_DOMAIN - 1];
+}


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 24/40] x86/cpu: Add legacy topology parser
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (22 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology() Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 25/40] x86/cpu: Use common topology code for Centaur and Zhaoxin Thomas Gleixner
                   ` (18 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

The legacy topology detection via CPUID leaf 4, which provides the number
of cores in the package, and CPUID leaf 1, which provides the number of
logical CPUs in case FEATURE_HT is enabled and the CMP_LEGACY feature
is not set, is shared by Intel, Centaur and Zhaoxin CPUs.

Lift the code from common.c without the early detection hack and provide it
as a common fallback mechanism.

Will be utilized in later changes.
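
Worked example with made up CPUID values: assume CPUID.1:EBX[23:16]
reports 8 logical processors and CPUID.4:EAX[31:26] + 1 reports 4 cores:

	/*
	 *   ebx1_nproc_shift = get_count_order(8) = 3
	 *   core_shift       = get_count_order(4) = 2
	 *   smt_shift        = 3 - 2               = 1
	 *
	 * Converted to the leaf 0xb/0x1f format, where the core level
	 * counts threads:
	 *
	 *   SMT  domain: shift 1, 2 threads per core
	 *   CORE domain: shift 3, 8 threads per package
	 */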

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Provide legacy data in leaf 0xb/0x1f format as expected by the rest of
    the code - Borislav
---
 arch/x86/kernel/cpu/common.c          |    3 ++
 arch/x86/kernel/cpu/topology.h        |    3 ++
 arch/x86/kernel/cpu/topology_common.c |   44 ++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -883,6 +883,9 @@ void detect_ht(struct cpuinfo_x86 *c)
 #ifdef CONFIG_SMP
 	int index_msb, core_bits;
 
+	if (topo_is_converted(c))
+		return;
+
 	if (detect_ht_early(c) < 0)
 		return;
 
--- a/arch/x86/kernel/cpu/topology.h
+++ b/arch/x86/kernel/cpu/topology.h
@@ -6,6 +6,9 @@ struct topo_scan {
 	struct cpuinfo_x86	*c;
 	unsigned int		dom_shifts[TOPO_MAX_DOMAIN];
 	unsigned int		dom_ncpus[TOPO_MAX_DOMAIN];
+
+	// Legacy CPUID[1]:EBX[23:16] number of logical processors
+	unsigned int		ebx1_nproc_shift;
 };
 
 bool topo_is_converted(struct cpuinfo_x86 *c);
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -24,6 +24,48 @@ void topology_set_dom(struct topo_scan *
 	}
 }
 
+static unsigned int parse_num_cores(struct cpuinfo_x86 *c)
+{
+	struct {
+		u32	cache_type	:  5,
+			unused		: 21,
+			ncores		:  6;
+	} eax;
+
+	if (c->cpuid_level < 4)
+		return 1;
+
+	cpuid_subleaf_reg(4, 0, CPUID_EAX, &eax);
+	if (!eax.cache_type)
+		return 1;
+
+	return eax.ncores + 1;
+}
+
+static void __maybe_unused parse_legacy(struct topo_scan *tscan)
+{
+	unsigned int cores, core_shift, smt_shift = 0;
+	struct cpuinfo_x86 *c = tscan->c;
+
+	cores = parse_num_cores(c);
+	core_shift = get_count_order(cores);
+
+	if (cpu_has(c, X86_FEATURE_HT)) {
+		if (!WARN_ON_ONCE(tscan->ebx1_nproc_shift < core_shift))
+			smt_shift = tscan->ebx1_nproc_shift - core_shift;
+		/*
+		 * The parser expects leaf 0xb/0x1f format, which means
+		 * the number of logical processors at core level is
+		 * counting threads.
+		 */
+		core_shift += smt_shift;
+		cores <<= smt_shift;
+	}
+
+	topology_set_dom(tscan, TOPO_SMT_DOMAIN, smt_shift, 1U << smt_shift);
+	topology_set_dom(tscan, TOPO_CORE_DOMAIN, core_shift, cores);
+}
+
 bool topo_is_converted(struct cpuinfo_x86 *c)
 {
 	/* Temporary until everything is converted over. */
@@ -88,6 +130,8 @@ static void parse_topology(struct topo_s
 	/* The above is sufficient for UP */
 	if (!IS_ENABLED(CONFIG_SMP))
 		return;
+
+	tscan->ebx1_nproc_shift = get_count_order(ebx.nproc);
 }
 
 static void topo_set_ids(struct topo_scan *tscan)


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 25/40] x86/cpu: Use common topology code for Centaur and Zhaoxin
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (23 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 24/40] x86/cpu: Add legacy topology parser Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 26/40] x86/cpu: Move __max_die_per_package to common.c Thomas Gleixner
                   ` (17 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Centaur and Zhaoxin CPUs use only the legacy SMP detection. Remove the
invocations from their 32bit paths and exempt them from the call on 64bit.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/centaur.c         |    4 ----
 arch/x86/kernel/cpu/topology_common.c |   11 ++++++++---
 arch/x86/kernel/cpu/zhaoxin.c         |    4 ----
 3 files changed, 8 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/cpu/centaur.c
+++ b/arch/x86/kernel/cpu/centaur.c
@@ -128,10 +128,6 @@ static void init_centaur(struct cpuinfo_
 #endif
 	early_init_centaur(c);
 	init_intel_cacheinfo(c);
-	detect_num_cpu_cores(c);
-#ifdef CONFIG_X86_32
-	detect_ht(c);
-#endif
 
 	if (c->cpuid_level > 9) {
 		unsigned int eax = cpuid_eax(10);
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -42,7 +42,7 @@ static unsigned int parse_num_cores(stru
 	return eax.ncores + 1;
 }
 
-static void __maybe_unused parse_legacy(struct topo_scan *tscan)
+static void parse_legacy(struct topo_scan *tscan)
 {
 	unsigned int cores, core_shift, smt_shift = 0;
 	struct cpuinfo_x86 *c = tscan->c;
@@ -71,10 +71,8 @@ bool topo_is_converted(struct cpuinfo_x8
 	/* Temporary until everything is converted over. */
 	switch (boot_cpu_data.x86_vendor) {
 	case X86_VENDOR_AMD:
-	case X86_VENDOR_CENTAUR:
 	case X86_VENDOR_INTEL:
 	case X86_VENDOR_HYGON:
-	case X86_VENDOR_ZHAOXIN:
 		return false;
 	default:
 		/* Let all UP systems use the below */
@@ -132,6 +130,13 @@ static void parse_topology(struct topo_s
 		return;
 
 	tscan->ebx1_nproc_shift = get_count_order(ebx.nproc);
+
+	switch (c->x86_vendor) {
+	case X86_VENDOR_CENTAUR:
+	case X86_VENDOR_ZHAOXIN:
+		parse_legacy(tscan);
+		break;
+	}
 }
 
 static void topo_set_ids(struct topo_scan *tscan)
--- a/arch/x86/kernel/cpu/zhaoxin.c
+++ b/arch/x86/kernel/cpu/zhaoxin.c
@@ -71,10 +71,6 @@ static void init_zhaoxin(struct cpuinfo_
 {
 	early_init_zhaoxin(c);
 	init_intel_cacheinfo(c);
-	detect_num_cpu_cores(c);
-#ifdef CONFIG_X86_32
-	detect_ht(c);
-#endif
 
 	if (c->cpuid_level > 9) {
 		unsigned int eax = cpuid_eax(10);


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 26/40] x86/cpu: Move __max_die_per_package to common.c
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (24 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 25/40] x86/cpu: Use common topology code for Centaur and Zhaoxin Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser Thomas Gleixner
                   ` (16 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

In preparation for a complete replacement of the topology leaf 0xb/0x1f
evaluation, move __max_die_per_package into the common code.

Will be removed once everything is converted over.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/common.c   |    3 +++
 arch/x86/kernel/cpu/topology.c |    3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -75,6 +75,9 @@ u32 elf_hwcap2 __read_mostly;
 int smp_num_siblings = 1;
 EXPORT_SYMBOL(smp_num_siblings);
 
+unsigned int __max_die_per_package __read_mostly = 1;
+EXPORT_SYMBOL(__max_die_per_package);
+
 static struct ppin_info {
 	int	feature;
 	int	msr_ppin_ctl;
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -25,9 +25,6 @@
 #define BITS_SHIFT_NEXT_LEVEL(eax)	((eax) & 0x1f)
 #define LEVEL_MAX_SIBLINGS(ebx)		((ebx) & 0xffff)
 
-unsigned int __max_die_per_package __read_mostly = 1;
-EXPORT_SYMBOL(__max_die_per_package);
-
 #ifdef CONFIG_SMP
 /*
  * Check if given CPUID extended topology "leaf" is implemented


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (25 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 26/40] x86/cpu: Move __max_die_per_package to common.c Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-12  8:21   ` Zhang, Rui
  2023-08-02 10:21 ` [patch V3 28/40] x86/cpu: Use common topology code for Intel Thomas Gleixner
                   ` (15 subsequent siblings)
  42 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

detect_extended_topology() along with its early() variant is a classic
example of duct tape engineering:

  - It evaluates an array of subleafs with a boatload of local variables
    for the relevant topology levels instead of using an array to save the
    enumerated information and propagate it to the right level

  - It has no boundary checks for subleafs

  - It prevents updating the die_id with a crude workaround instead of
    checking for leaf 0xb which does not provide die information.

  - It's broken vs. the number of dies evaluation as it uses:

      num_processors[DIE_LEVEL] / num_processors[CORE_LEVEL]

    which only "works" correctly if none of the intermediate
    topology levels (MODULE/TILE) are enumerated.

There is zero value in trying to "fix" that code as the only proper fix is
to rewrite it from scratch.

Implement a sane parser with proper code documentation, which will be used
for the consolidated topology evaluation in the next step.
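
To make the subleaf walk concrete, this is what leaf 0x1f might enumerate
on a hypothetical package with two dies of four SMT-2 cores each (made up
values following the architectural semantics):

	/*
	 *   subleaf 0: type SMT      shift 1  num_processors  2
	 *   subleaf 1: type CORE     shift 3  num_processors  8
	 *   subleaf 2: type DIE      shift 4  num_processors 16
	 *   subleaf 3: type INVALID  -> loop terminates
	 *
	 * topology_set_dom() propagates each parsed level upwards, so the
	 * unenumerated MODULE/TILE domains inherit the CORE values and
	 * PKG inherits the DIE values.
	 */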

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V2: Fixed up the comment alignment for registers - Peterz
---
 arch/x86/kernel/cpu/Makefile       |    2 
 arch/x86/kernel/cpu/topology.h     |   12 +++
 arch/x86/kernel/cpu/topology_ext.c |  136 +++++++++++++++++++++++++++++++++++++
 3 files changed, 149 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -18,7 +18,7 @@ KMSAN_SANITIZE_common.o := n
 KCSAN_SANITIZE_common.o := n
 
 obj-y			:= cacheinfo.o scattered.o
-obj-y			+= topology_common.o topology.o
+obj-y			+= topology_common.o topology_ext.o topology.o
 obj-y			+= common.o
 obj-y			+= rdrand.o
 obj-y			+= match.o
--- a/arch/x86/kernel/cpu/topology.h
+++ b/arch/x86/kernel/cpu/topology.h
@@ -16,6 +16,7 @@ void cpu_init_topology(struct cpuinfo_x8
 void cpu_parse_topology(struct cpuinfo_x86 *c);
 void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
 		      unsigned int shift, unsigned int ncpus);
+bool cpu_parse_topology_ext(struct topo_scan *tscan);
 
 static inline u32 topo_shift_apicid(u32 apicid, enum x86_topology_domains dom)
 {
@@ -31,4 +32,15 @@ static inline u32 topo_relative_domain_i
 	return apicid & (x86_topo_system.dom_size[dom] - 1);
 }
 
+/*
+ * Update a domain level after the fact without propagating. Used to fixup
+ * broken CPUID enumerations.
+ */
+static inline void topology_update_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
+				       unsigned int shift, unsigned int ncpus)
+{
+	tscan->dom_shifts[dom] = shift;
+	tscan->dom_ncpus[dom] = ncpus;
+}
+
 #endif /* ARCH_X86_TOPOLOGY_H */
--- /dev/null
+++ b/arch/x86/kernel/cpu/topology_ext.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/cpu.h>
+
+#include <asm/apic.h>
+#include <asm/memtype.h>
+#include <asm/processor.h>
+
+#include "cpu.h"
+
+enum topo_types {
+	INVALID_TYPE	= 0,
+	SMT_TYPE	= 1,
+	CORE_TYPE	= 2,
+	MODULE_TYPE	= 3,
+	TILE_TYPE	= 4,
+	DIE_TYPE	= 5,
+	DIEGRP_TYPE	= 6,
+	MAX_TYPE	= 7,
+};
+
+/*
+ * Use a lookup table for the case that there are future types > 6 which
+ * describe an intermediate domain level which does not exist today.
+ *
+ * A table will also be handy to parse the new AMD 0x80000026 leaf which
+ * has defined different domain types, but otherwise uses the same layout
+ * with some of the reserved bits used for new information.
+ */
+static const unsigned int topo_domain_map[MAX_TYPE] = {
+	[SMT_TYPE]	= TOPO_SMT_DOMAIN,
+	[CORE_TYPE]	= TOPO_CORE_DOMAIN,
+	[MODULE_TYPE]	= TOPO_MODULE_DOMAIN,
+	[TILE_TYPE]	= TOPO_TILE_DOMAIN,
+	[DIE_TYPE]	= TOPO_DIE_DOMAIN,
+	[DIEGRP_TYPE]	= TOPO_PKG_DOMAIN,
+};
+
+static inline bool topo_subleaf(struct topo_scan *tscan, u32 leaf, u32 subleaf)
+{
+	unsigned int dom, maxtype = leaf == 0xb ? CORE_TYPE + 1 : MAX_TYPE;
+	struct {
+		// eax
+		u32	x2apic_shift	:  5, // Number of bits to shift APIC ID right
+					      // for the topology ID at the next level
+			__rsvd0		: 27; // Reserved
+		// ebx
+		u32	num_processors	: 16, // Number of processors at current level
+			__rsvd1		: 16; // Reserved
+		// ecx
+		u32	level		:  8, // Current topology level. Same as sub leaf number
+			type		:  8, // Level type. If 0, invalid
+			__rsvd2		: 16; // Reserved
+		// edx
+		u32	x2apic_id	: 32; // X2APIC ID of the current logical processor
+	} sl;
+
+	cpuid_subleaf(leaf, subleaf, &sl);
+
+	if (!sl.num_processors || sl.type == INVALID_TYPE)
+		return false;
+
+	if (sl.type >= maxtype) {
+		/*
+		 * As the subleafs are ordered in domain level order, this
+		 * could be recovered in theory by propagating the
+		 * information at the last parsed level.
+		 *
+		 * But if the infinite wisdom of hardware folks decides to
+		 * create a new domain type between CORE and MODULE or DIE
+		 * and DIEGRP, then that would overwrite the CORE or DIE
+		 * information.
+		 *
+		 * It really would have been too obvious to make the domain
+		 * type space sparse and leave a few reserved types between
+		 * the points which might change instead of forcing
+		 * software to either create a monstrosity of workarounds
+		 * or just being up the creek without a paddle.
+		 *
+		 * Refuse to implement monstrosity, emit an error and try
+		 * to survive.
+		 */
+		pr_err_once("Topology: leaf 0x%x:%d Unknown domain type %u\n",
+			    leaf, subleaf, sl.type);
+		return true;
+	}
+
+	dom = topo_domain_map[sl.type];
+	if (!dom) {
+		tscan->c->topo.initial_apicid = sl.x2apic_id;
+	} else if (tscan->c->topo.initial_apicid != sl.x2apic_id) {
+		pr_warn_once(FW_BUG "CPUID leaf 0x%x subleaf %d APIC ID mismatch %x != %x\n",
+			     leaf, subleaf, tscan->c->topo.initial_apicid, sl.x2apic_id);
+	}
+
+	topology_set_dom(tscan, dom, sl.x2apic_shift, sl.num_processors);
+	return true;
+}
+
+static bool parse_topology_leaf(struct topo_scan *tscan, u32 leaf)
+{
+	u32 subleaf;
+
+	if (tscan->c->cpuid_level < leaf)
+		return false;
+
+	/* Read all available subleafs and populate the levels */
+	for (subleaf = 0; topo_subleaf(tscan, leaf, subleaf); subleaf++);
+
+	/* If subleaf 0 failed to parse, give up */
+	if (!subleaf)
+		return false;
+
+	/*
+	 * There are machines in the wild which have shift 0 in the subleaf
+	 * 0, but advertise 2 logical processors at that level. They are
+	 * truly SMT.
+	 */
+	if (!tscan->dom_shifts[TOPO_SMT_DOMAIN] && tscan->dom_ncpus[TOPO_SMT_DOMAIN] > 1) {
+		unsigned int sft = get_count_order(tscan->dom_ncpus[TOPO_SMT_DOMAIN]);
+
+		pr_warn_once(FW_BUG "CPUID leaf 0x%x subleaf 0 has shift level 0 but %u CPUs\n",
+			     leaf, tscan->dom_ncpus[TOPO_SMT_DOMAIN]);
+		topology_update_dom(tscan, TOPO_SMT_DOMAIN, sft, tscan->dom_ncpus[TOPO_SMT_DOMAIN]);
+	}
+
+	set_cpu_cap(tscan->c, X86_FEATURE_XTOPOLOGY);
+	return true;
+}
+
+bool cpu_parse_topology_ext(struct topo_scan *tscan)
+{
+	/* Try leaf 0x1F first. If not available try leaf 0x0b */
+	if (parse_topology_leaf(tscan, 0x1f))
+		return true;
+	return parse_topology_leaf(tscan, 0x0b);
+}


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 28/40] x86/cpu: Use common topology code for Intel
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (26 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 29/40] x86/cpu/amd: Provide a separate accessor for Node ID Thomas Gleixner
                   ` (14 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Intel CPUs use either topology leaf 0xb/0x1f evaluation or the legacy
SMP/HT evaluation based on CPUID leaf 0x1/0x4.

Move it over to the consolidated topology code and remove the random
topology hacks which are sprinkled into the Intel and the common code.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/common.c          |   65 ----------------------------------
 arch/x86/kernel/cpu/cpu.h             |    4 --
 arch/x86/kernel/cpu/intel.c           |   25 -------------
 arch/x86/kernel/cpu/topology_common.c |    5 ++
 4 files changed, 4 insertions(+), 95 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -784,19 +784,6 @@ static void get_model_name(struct cpuinf
 	*(s + 1) = '\0';
 }
 
-void detect_num_cpu_cores(struct cpuinfo_x86 *c)
-{
-	unsigned int eax, ebx, ecx, edx;
-
-	c->x86_max_cores = 1;
-	if (!IS_ENABLED(CONFIG_SMP) || c->cpuid_level < 4)
-		return;
-
-	cpuid_count(4, 0, &eax, &ebx, &ecx, &edx);
-	if (eax & 0x1f)
-		c->x86_max_cores = (eax >> 26) + 1;
-}
-
 void cpu_detect_cache_sizes(struct cpuinfo_x86 *c)
 {
 	unsigned int n, dummy, ebx, ecx, edx, l2size;
@@ -858,54 +845,6 @@ static void cpu_detect_tlb(struct cpuinf
 		tlb_lld_4m[ENTRIES], tlb_lld_1g[ENTRIES]);
 }
 
-int detect_ht_early(struct cpuinfo_x86 *c)
-{
-#ifdef CONFIG_SMP
-	u32 eax, ebx, ecx, edx;
-
-	if (!cpu_has(c, X86_FEATURE_HT))
-		return -1;
-
-	if (cpu_has(c, X86_FEATURE_CMP_LEGACY))
-		return -1;
-
-	if (cpu_has(c, X86_FEATURE_XTOPOLOGY))
-		return -1;
-
-	cpuid(1, &eax, &ebx, &ecx, &edx);
-
-	smp_num_siblings = (ebx & 0xff0000) >> 16;
-	if (smp_num_siblings == 1)
-		pr_info_once("CPU0: Hyper-Threading is disabled\n");
-#endif
-	return 0;
-}
-
-void detect_ht(struct cpuinfo_x86 *c)
-{
-#ifdef CONFIG_SMP
-	int index_msb, core_bits;
-
-	if (topo_is_converted(c))
-		return;
-
-	if (detect_ht_early(c) < 0)
-		return;
-
-	index_msb = get_count_order(smp_num_siblings);
-	c->topo.pkg_id = apic->phys_pkg_id(c->topo.initial_apicid, index_msb);
-
-	smp_num_siblings = smp_num_siblings / c->x86_max_cores;
-
-	index_msb = get_count_order(smp_num_siblings);
-
-	core_bits = get_count_order(c->x86_max_cores);
-
-	c->topo.core_id = apic->phys_pkg_id(c->topo.initial_apicid, index_msb) &
-		((1 << core_bits) - 1);
-#endif
-}
-
 static void get_cpu_vendor(struct cpuinfo_x86 *c)
 {
 	char *v = c->x86_vendor_id;
@@ -1853,10 +1792,6 @@ static void identify_cpu(struct cpuinfo_
 				c->x86, c->x86_model);
 	}
 
-#ifdef CONFIG_X86_64
-	detect_ht(c);
-#endif
-
 	x86_init_rdrand(c);
 	setup_pku(c);
 	setup_cet(c);
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -76,11 +76,7 @@ extern void init_intel_cacheinfo(struct
 extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);
 extern void init_hygon_cacheinfo(struct cpuinfo_x86 *c);
 
-extern void detect_num_cpu_cores(struct cpuinfo_x86 *c);
-extern int detect_extended_topology_early(struct cpuinfo_x86 *c);
 extern int detect_extended_topology(struct cpuinfo_x86 *c);
-extern int detect_ht_early(struct cpuinfo_x86 *c);
-extern void detect_ht(struct cpuinfo_x86 *c);
 extern void check_null_seg_clears_base(struct cpuinfo_x86 *c);
 
 void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c);
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -489,13 +489,6 @@ static void early_init_intel(struct cpui
 	}
 
 	check_memory_type_self_snoop_errata(c);
-
-	/*
-	 * Get the number of SMT siblings early from the extended topology
-	 * leaf, if available. Otherwise try the legacy SMT detection.
-	 */
-	if (detect_extended_topology_early(c) < 0)
-		detect_ht_early(c);
 }
 
 static void bsp_init_intel(struct cpuinfo_x86 *c)
@@ -777,24 +770,6 @@ static void init_intel(struct cpuinfo_x8
 
 	intel_workarounds(c);
 
-	/*
-	 * Detect the extended topology information if available. This
-	 * will reinitialise the initial_apicid which will be used
-	 * in init_intel_cacheinfo()
-	 */
-	detect_extended_topology(c);
-
-	if (!cpu_has(c, X86_FEATURE_XTOPOLOGY)) {
-		/*
-		 * let's use the legacy cpuid vector 0x1 and 0x4 for topology
-		 * detection.
-		 */
-		detect_num_cpu_cores(c);
-#ifdef CONFIG_X86_32
-		detect_ht(c);
-#endif
-	}
-
 	init_intel_cacheinfo(c);
 
 	if (c->cpuid_level > 9) {
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -64,7 +64,6 @@ bool topo_is_converted(struct cpuinfo_x8
 	/* Temporary until everything is converted over. */
 	switch (boot_cpu_data.x86_vendor) {
 	case X86_VENDOR_AMD:
-	case X86_VENDOR_INTEL:
 	case X86_VENDOR_HYGON:
 		return false;
 	default:
@@ -129,6 +128,10 @@ static void parse_topology(struct topo_s
 	case X86_VENDOR_ZHAOXIN:
 		parse_legacy(tscan);
 		break;
+	case X86_VENDOR_INTEL:
+		if (!IS_ENABLED(CONFIG_CPU_SUP_INTEL) || !cpu_parse_topology_ext(tscan))
+			parse_legacy(tscan);
+		break;
 	}
 }
 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 29/40] x86/cpu/amd: Provide a separate accessor for Node ID
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (27 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 28/40] x86/cpu: Use common topology code for Intel Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser Thomas Gleixner
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

AMD (ab)uses topology_die_id() to store the Node ID information and
topology_max_dies_per_pkg to store the number of nodes per package.

This collides with the proper processor die level enumeration which is
coming on AMD with CPUID 8000_0026, unless there is a correlation between
the two. There is zero documentation about that.

So provide new storage and new accessors which for now still access die_id
and topology_max_dies_per_pkg. Will be mopped up after AMD and HYGON are
converted over.
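
In this intermediate state the new accessors are pure aliases. Sketch of
the effective mapping (see the hunks below):

	/*
	 *   topology_amd_node_id(cpu)    -> cpu_data(cpu).topo.die_id
	 *   topology_amd_nodes_per_pkg() -> __max_die_per_package
	 *                                   (0 in the !SMP stub)
	 */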

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/events/amd/core.c       |    2 +-
 arch/x86/include/asm/processor.h |    3 +++
 arch/x86/include/asm/topology.h  |    8 ++++++++
 arch/x86/kernel/amd_nb.c         |    4 ++--
 arch/x86/kernel/cpu/cacheinfo.c  |    2 +-
 arch/x86/kernel/cpu/mce/amd.c    |    4 ++--
 arch/x86/kernel/cpu/mce/inject.c |    4 ++--
 drivers/edac/amd64_edac.c        |    4 ++--
 drivers/edac/mce_amd.c           |    4 ++--
 9 files changed, 23 insertions(+), 12 deletions(-)

--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -574,7 +574,7 @@ static void amd_pmu_cpu_starting(int cpu
 	if (!x86_pmu.amd_nb_constraints)
 		return;
 
-	nb_id = topology_die_id(cpu);
+	nb_id = topology_amd_node_id(cpu);
 	WARN_ON_ONCE(nb_id == BAD_APICID);
 
 	for_each_online_cpu(i) {
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -99,6 +99,9 @@ struct cpuinfo_topology {
 	u32			logical_pkg_id;
 	u32			logical_die_id;
 
+	// AMD Node ID and Nodes per Package info
+	u32			amd_node_id;
+
 	// Cache level topology IDs
 	u32			llc_id;
 	u32			l2c_id;
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -131,6 +131,8 @@ extern const struct cpumask *cpu_cluster
 #define topology_core_id(cpu)			(cpu_data(cpu).topo.core_id)
 #define topology_ppin(cpu)			(cpu_data(cpu).ppin)
 
+#define topology_amd_node_id(cpu)		(cpu_data(cpu).topo.die_id)
+
 extern unsigned int __max_die_per_package;
 
 #ifdef CONFIG_SMP
@@ -160,6 +162,11 @@ int topology_update_die_map(unsigned int
 int topology_phys_to_logical_pkg(unsigned int pkg);
 bool topology_smt_supported(void);
 
+static inline unsigned int topology_amd_nodes_per_pkg(void)
+{
+	return __max_die_per_package;
+}
+
 extern struct cpumask __cpu_primary_thread_mask;
 #define cpu_primary_thread_mask ((const struct cpumask *)&__cpu_primary_thread_mask)
 
@@ -182,6 +189,7 @@ static inline int topology_max_die_per_p
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
 static inline bool topology_smt_supported(void) { return false; }
+static inline unsigned int topology_amd_nodes_per_pkg(void) { return 0; };
 #endif /* !CONFIG_SMP */
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -370,7 +370,7 @@ struct resource *amd_get_mmconfig_range(
 
 int amd_get_subcaches(int cpu)
 {
-	struct pci_dev *link = node_to_amd_nb(topology_die_id(cpu))->link;
+	struct pci_dev *link = node_to_amd_nb(topology_amd_node_id(cpu))->link;
 	unsigned int mask;
 
 	if (!amd_nb_has_feature(AMD_NB_L3_PARTITIONING))
@@ -384,7 +384,7 @@ int amd_get_subcaches(int cpu)
 int amd_set_subcaches(int cpu, unsigned long mask)
 {
 	static unsigned int reset, ban;
-	struct amd_northbridge *nb = node_to_amd_nb(topology_die_id(cpu));
+	struct amd_northbridge *nb = node_to_amd_nb(topology_amd_node_id(cpu));
 	unsigned int reg;
 	int cuid;
 
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -595,7 +595,7 @@ static void amd_init_l3_cache(struct _cp
 	if (index < 3)
 		return;
 
-	node = topology_die_id(smp_processor_id());
+	node = topology_amd_node_id(smp_processor_id());
 	this_leaf->nb = node_to_amd_nb(node);
 	if (this_leaf->nb && !this_leaf->nb->l3_cache.indices)
 		amd_calc_l3_indices(this_leaf->nb);
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -1181,7 +1181,7 @@ static int threshold_create_bank(struct
 		return -ENODEV;
 
 	if (is_shared_bank(bank)) {
-		nb = node_to_amd_nb(topology_die_id(cpu));
+		nb = node_to_amd_nb(topology_amd_node_id(cpu));
 
 		/* threshold descriptor already initialized on this node? */
 		if (nb && nb->bank4) {
@@ -1285,7 +1285,7 @@ static void threshold_remove_bank(struct
 		 * The last CPU on this node using the shared bank is going
 		 * away, remove that bank now.
 		 */
-		nb = node_to_amd_nb(topology_die_id(smp_processor_id()));
+		nb = node_to_amd_nb(topology_amd_node_id(smp_processor_id()));
 		nb->bank4 = NULL;
 	}
 
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -543,8 +543,8 @@ static void do_inject(void)
 	if (boot_cpu_has(X86_FEATURE_AMD_DCM) &&
 	    b == 4 &&
 	    boot_cpu_data.x86 < 0x17) {
-		toggle_nb_mca_mst_cpu(topology_die_id(cpu));
-		cpu = get_nbc_for_node(topology_die_id(cpu));
+		toggle_nb_mca_mst_cpu(topology_amd_node_id(cpu));
+		cpu = get_nbc_for_node(topology_amd_node_id(cpu));
 	}
 
 	cpus_read_lock();
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -1907,7 +1907,7 @@ static void dct_determine_memory_type(st
 /* On F10h and later ErrAddr is MC4_ADDR[47:1] */
 static u64 get_error_address(struct amd64_pvt *pvt, struct mce *m)
 {
-	u16 mce_nid = topology_die_id(m->extcpu);
+	u16 mce_nid = topology_amd_node_id(m->extcpu);
 	struct mem_ctl_info *mci;
 	u8 start_bit = 1;
 	u8 end_bit   = 47;
@@ -3438,7 +3438,7 @@ static void get_cpus_on_this_dct_cpumask
 	int cpu;
 
 	for_each_online_cpu(cpu)
-		if (topology_die_id(cpu) == nid)
+		if (topology_amd_node_id(cpu) == nid)
 			cpumask_set_cpu(cpu, mask);
 }
 
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -1060,7 +1060,7 @@ static void decode_mc3_mce(struct mce *m
 static void decode_mc4_mce(struct mce *m)
 {
 	unsigned int fam = x86_family(m->cpuid);
-	int node_id = topology_die_id(m->extcpu);
+	int node_id = topology_amd_node_id(m->extcpu);
 	u16 ec = EC(m->status);
 	u8 xec = XEC(m->status, 0x1f);
 	u8 offset = 0;
@@ -1188,7 +1188,7 @@ static void decode_smca_error(struct mce
 
 	if ((bank_type == SMCA_UMC || bank_type == SMCA_UMC_V2) &&
 	    xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(topology_die_id(m->extcpu), m);
+		decode_dram_ecc(topology_amd_node_id(m->extcpu), m);
 }
 
 static inline void amd_decode_err_code(u16 ec)



* [patch V3 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (28 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 29/40] x86/cpu/amd: Provide a separate accessor for Node ID Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 19:28   ` Michael Kelley (LINUX)
  2023-08-02 19:51   ` [patch V3a " Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 31/40] x86/smpboot: Teach it about topo.amd_node_id Thomas Gleixner
                   ` (12 subsequent siblings)
  42 siblings, 2 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

AMD/HYGON uses various methods for topology evaluation:

  - Leaf 0x80000008 and 0x8000001e based with an optional leaf 0xb,
    which is the preferred variant for modern CPUs.

    Leaf 0xb will be superseded by leaf 0x80000026 soon, which is just
    another variant of the Intel 0x1f leaf for whatever reasons.
    
  - Leaf 0x80000008 and NODEID_MSR based

  - Legacy fallback

That code follows the principle of having random bits and pieces all over
the place, which results in multiple evaluations and impenetrable code
flows, just as the Intel parsing did.

Provide a sane implementation by clearly separating the three variants and
bringing them into the proper preference order in one place.

This provides the parsing for both AMD and HYGON because there is no point
in having a separate HYGON parser which only differs by 3 lines of
code. Any further divergence between AMD and HYGON can be handled in
different functions, while still sharing the existing parsers.
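
For reference, the fields which the parser below consumes from leaf
0x8000001e can be dumped from userspace in a few lines of C. A minimal
sketch, assuming a GCC-style <cpuid.h>; the +1 adjustments reflect that
the counts are encoded minus one, matching the structure layout used in
topology_amd.c:

  #include <cpuid.h>
  #include <stdio.h>

  int main(void)
  {
      unsigned int eax, ebx, ecx, edx;

      /* Leaf 0x8000001e is only valid when TOPOEXT (0x80000001:ECX[22]) is set */
      if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx) || !(ecx & (1U << 22))) {
          puts("TOPOEXT not enumerated");
          return 1;
      }

      __get_cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
      printf("x2apic_id:      %u\n", eax);
      printf("cuid:           %u\n", ebx & 0xff);
      printf("threads_per_cu: %u\n", ((ebx >> 8) & 0xff) + 1);
      printf("node_id:        %u\n", ecx & 0xff);
      printf("nodes_per_pkg:  %u\n", ((ecx >> 8) & 0x7) + 1);
      return 0;
  }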

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Fix the off-by-one with leaf 0x8000001e::ebx::threads_per_cu - Michael
---
 arch/x86/include/asm/topology.h       |    2 
 arch/x86/kernel/cpu/Makefile          |    2 
 arch/x86/kernel/cpu/amd.c             |    2 
 arch/x86/kernel/cpu/cacheinfo.c       |    4 
 arch/x86/kernel/cpu/cpu.h             |    2 
 arch/x86/kernel/cpu/debugfs.c         |    2 
 arch/x86/kernel/cpu/topology.h        |    6 +
 arch/x86/kernel/cpu/topology_amd.c    |  179 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/topology_common.c |   19 +++
 9 files changed, 211 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -162,6 +162,8 @@ int topology_update_die_map(unsigned int
 int topology_phys_to_logical_pkg(unsigned int pkg);
 bool topology_smt_supported(void);
 
+extern unsigned int __amd_nodes_per_pkg;
+
 static inline unsigned int topology_amd_nodes_per_pkg(void)
 {
 	return __max_die_per_package;
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -18,7 +18,7 @@ KMSAN_SANITIZE_common.o := n
 KCSAN_SANITIZE_common.o := n
 
 obj-y			:= cacheinfo.o scattered.o
-obj-y			+= topology_common.o topology_ext.o topology.o
+obj-y			+= topology_common.o topology_ext.o topology_amd.o topology.o
 obj-y			+= common.o
 obj-y			+= rdrand.o
 obj-y			+= match.o
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -423,7 +423,7 @@ static void amd_get_topology(struct cpui
 		if (!err)
 			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
 
-		cacheinfo_amd_init_llc_id(c);
+		cacheinfo_amd_init_llc_id(c, c->topo.die_id);
 
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -661,7 +661,7 @@ static int find_num_cache_leaves(struct
 	return i;
 }
 
-void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c)
+void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, u16 die_id)
 {
 	/*
 	 * We may have multiple LLCs if L3 caches exist, so check if we
@@ -672,7 +672,7 @@ void cacheinfo_amd_init_llc_id(struct cp
 
 	if (c->x86 < 0x17) {
 		/* LLC is at the node level. */
-		c->topo.llc_id = c->topo.die_id;
+		c->topo.llc_id = die_id;
 	} else if (c->x86 == 0x17 && c->x86_model <= 0x1F) {
 		/*
 		 * LLC is at the core complex level.
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -79,7 +79,7 @@ extern void init_hygon_cacheinfo(struct
 extern int detect_extended_topology(struct cpuinfo_x86 *c);
 extern void check_null_seg_clears_base(struct cpuinfo_x86 *c);
 
-void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c);
+void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, u16 die_id);
 void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c);
 
 unsigned int aperfmperf_get_khz(int cpu);
--- a/arch/x86/kernel/cpu/debugfs.c
+++ b/arch/x86/kernel/cpu/debugfs.c
@@ -26,6 +26,8 @@ static int cpu_debug_show(struct seq_fil
 	seq_printf(m, "logical_die_id:      %u\n", c->topo.logical_die_id);
 	seq_printf(m, "llc_id:              %u\n", c->topo.llc_id);
 	seq_printf(m, "l2c_id:              %u\n", c->topo.l2c_id);
+	seq_printf(m, "amd_node_id:         %u\n", c->topo.amd_node_id);
+	seq_printf(m, "amd_nodes_per_pkg:   %u\n", topology_amd_nodes_per_pkg());
 	seq_printf(m, "max_cores:           %u\n", c->x86_max_cores);
 	seq_printf(m, "max_die_per_pkg:     %u\n", __max_die_per_package);
 	seq_printf(m, "smp_num_siblings:    %u\n", smp_num_siblings);
--- a/arch/x86/kernel/cpu/topology.h
+++ b/arch/x86/kernel/cpu/topology.h
@@ -9,6 +9,10 @@ struct topo_scan {
 
 	// Legacy CPUID[1]:EBX[23:16] number of logical processors
 	unsigned int		ebx1_nproc_shift;
+
+	// AMD specific node ID which cannot be mapped into APIC space.
+	u16			amd_nodes_per_pkg;
+	u16			amd_node_id;
 };
 
 bool topo_is_converted(struct cpuinfo_x86 *c);
@@ -17,6 +21,8 @@ void cpu_parse_topology(struct cpuinfo_x
 void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
 		      unsigned int shift, unsigned int ncpus);
 bool cpu_parse_topology_ext(struct topo_scan *tscan);
+void cpu_parse_topology_amd(struct topo_scan *tscan);
+void cpu_topology_fixup_amd(struct topo_scan *tscan);
 
 static inline u32 topo_shift_apicid(u32 apicid, enum x86_topology_domains dom)
 {
--- /dev/null
+++ b/arch/x86/kernel/cpu/topology_amd.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/cpu.h>
+
+#include <asm/apic.h>
+#include <asm/memtype.h>
+#include <asm/processor.h>
+
+#include "cpu.h"
+
+static bool parse_8000_0008(struct topo_scan *tscan)
+{
+	struct {
+		u32	ncores		:  8,
+			__rsvd0		:  4,
+			apicidsize	:  4,
+			perftscsize	:  2,
+			__rsvd1		: 14;
+	} ecx;
+	unsigned int sft;
+
+	if (tscan->c->extended_cpuid_level < 0x80000008)
+		return false;
+
+	cpuid_leaf_reg(0x80000008, CPUID_ECX, &ecx);
+
+	/* If the APIC ID size is 0, then get the shift value from ecx.ncores */
+	sft = ecx.apicidsize;
+	if (!sft)
+		sft = get_count_order(ecx.ncores + 1);
+
+	topology_set_dom(tscan, TOPO_CORE_DOMAIN, sft, ecx.ncores + 1);
+	return true;
+}
+
+static void store_node(struct topo_scan *tscan, unsigned int nr_nodes, u16 node_id)
+{
+	/*
+	 * Starting with Fam 17h the DIE domain could probably be used to
+	 * retrieve the node info on AMD/HYGON. Analysis of CPUID dumps
+	 * suggests it's the topmost bit(s) of the CPU cores area, but
+	 * that's guesswork and neither enumerated nor documented.
+	 *
+	 * Up to Fam 16h this does not work at all and the legacy node ID
+	 * has to be used.
+	 */
+	tscan->amd_nodes_per_pkg = nr_nodes;
+	tscan->amd_node_id = node_id;
+}
+
+static bool parse_8000_001e(struct topo_scan *tscan, bool has_0xb)
+{
+	struct {
+		// eax
+		u32	x2apic_id	: 32;
+		// ebx
+		u32	cuid		:  8,
+			threads_per_cu	:  8,
+			__rsvd0		: 16;
+		// ecx
+		u32	nodeid		:  8,
+			nodes_per_pkg	:  3,
+			__rsvd1		: 21;
+		// edx
+		u32	__rsvd2		: 32;
+	} leaf;
+
+	if (!boot_cpu_has(X86_FEATURE_TOPOEXT))
+		return false;
+
+	cpuid_leaf(0x8000001e, &leaf);
+
+	tscan->c->topo.initial_apicid = leaf.x2apic_id;
+
+	/*
+	 * If leaf 0xb is available, then the SMT shift is set already. If
+	 * not, take it from ebx.threads_per_cu (encoded minus one) and use
+	 * topology_update_dom() - topology_set_dom() would propagate and
+	 * overwrite the already propagated CORE level.
+	 */
+	if (!has_0xb) {
+		topology_update_dom(tscan, TOPO_SMT_DOMAIN, get_count_order(leaf.threads_per_cu + 1),
+				    leaf.threads_per_cu + 1);
+	}
+
+	store_node(tscan, leaf.nodes_per_pkg + 1, leaf.nodeid);
+
+	if (tscan->c->x86_vendor == X86_VENDOR_AMD) {
+		if (tscan->c->x86 == 0x15)
+			tscan->c->topo.cu_id = leaf.cuid;
+
+		cacheinfo_amd_init_llc_id(tscan->c, leaf.nodeid);
+	} else {
+		/*
+		 * Package ID is ApicId[6..] on Hygon CPUs. See commit
+		 * e0ceeae708ce for explanation. The topology info is
+		 * screwed up: The package shift is always 6 and the node
+		 * ID is bits [4:5]. Don't touch the latter without
+		 * confirmation from the Hygon developers.
+		 */
+		topology_set_dom(tscan, TOPO_CORE_DOMAIN, 6, tscan->dom_ncpus[TOPO_CORE_DOMAIN]);
+		cacheinfo_hygon_init_llc_id(tscan->c);
+	}
+	return true;
+}
+
+static bool parse_fam10h_node_id(struct topo_scan *tscan)
+{
+	struct {
+		union {
+			u64	node_id		:  3,
+				nodes_per_pkg	:  3,
+				unused		: 58;
+			u64	msr;
+		};
+	} nid;
+
+	if (!boot_cpu_has(X86_FEATURE_NODEID_MSR))
+		return false;
+
+	rdmsrl(MSR_FAM10H_NODE_ID, nid.msr);
+	store_node(tscan, nid.nodes_per_pkg + 1, nid.node_id);
+	tscan->c->topo.llc_id = nid.node_id;
+	return true;
+}
+
+static void legacy_set_llc(struct topo_scan *tscan)
+{
+	unsigned int apicid = tscan->c->topo.initial_apicid;
+
+	/* parse_8000_0008() set everything up except llc_id */
+	tscan->c->topo.llc_id = apicid >> tscan->dom_shifts[TOPO_CORE_DOMAIN];
+}
+
+static void parse_topology_amd(struct topo_scan *tscan)
+{
+	bool has_0xb = false;
+
+	/*
+	 * If the extended topology leaf 0x8000_001e is available
+	 * try to get SMT and CORE shift from leaf 0xb first, then
+	 * try to get the CORE shift from leaf 0x8000_0008.
+	 */
+	if (boot_cpu_has(X86_FEATURE_TOPOEXT))
+		has_0xb = cpu_parse_topology_ext(tscan);
+
+	if (!has_0xb && !parse_8000_0008(tscan))
+		return;
+
+	/* Prefer leaf 0x8000001e if available */
+	if (parse_8000_001e(tscan, has_0xb))
+		return;
+
+	/* Try the NODEID MSR */
+	if (parse_fam10h_node_id(tscan))
+		return;
+
+	legacy_set_llc(tscan);
+}
+
+void cpu_parse_topology_amd(struct topo_scan *tscan)
+{
+	tscan->amd_nodes_per_pkg = 1;
+	parse_topology_amd(tscan);
+
+	if (tscan->amd_nodes_per_pkg > 1)
+		set_cpu_cap(tscan->c, X86_FEATURE_AMD_DCM);
+}
+
+void cpu_topology_fixup_amd(struct topo_scan *tscan)
+{
+	struct cpuinfo_x86 *c = tscan->c;
+
+	/*
+	 * Adjust the core_id relative to the node when there is more than
+	 * one node.
+	 */
+	if (tscan->c->x86 < 0x17 && tscan->amd_nodes_per_pkg > 1)
+		c->topo.core_id %= tscan->dom_ncpus[TOPO_CORE_DOMAIN] / tscan->amd_nodes_per_pkg;
+}
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -11,11 +11,13 @@
 
 struct x86_topology_system x86_topo_system __ro_after_init;
 
+unsigned int __amd_nodes_per_pkg __ro_after_init;
+EXPORT_SYMBOL_GPL(__amd_nodes_per_pkg);
+
 void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
 		      unsigned int shift, unsigned int ncpus)
 {
-	tscan->dom_shifts[dom] = shift;
-	tscan->dom_ncpus[dom] = ncpus;
+	topology_update_dom(tscan, dom, shift, ncpus);
 
 	/* Propagate to the upper levels */
 	for (dom++; dom < TOPO_MAX_DOMAIN; dom++) {
@@ -152,6 +154,13 @@ static void topo_set_ids(struct topo_sca
 
 	/* Relative core ID */
 	c->topo.core_id = topo_relative_domain_id(apicid, TOPO_CORE_DOMAIN);
+
+	/* Temporary workaround */
+	if (tscan->amd_nodes_per_pkg)
+		c->topo.amd_node_id = c->topo.die_id = tscan->amd_node_id;
+
+	if (c->x86_vendor == X86_VENDOR_AMD)
+		cpu_topology_fixup_amd(tscan);
 }
 
 static void topo_set_max_cores(struct topo_scan *tscan)
@@ -236,4 +245,10 @@ void __init cpu_init_topology(struct cpu
 	 */
 	__max_die_per_package = tscan.dom_ncpus[TOPO_DIE_DOMAIN] /
 		tscan.dom_ncpus[TOPO_DIE_DOMAIN - 1];
+	/*
+	 * AMD systems have Nodes per package which cannot be mapped to
+	 * APIC ID (yet).
+	 */
+	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON)
+		__amd_nodes_per_pkg = __max_die_per_package = tscan.amd_nodes_per_pkg;
 }



* [patch V3 31/40] x86/smpboot: Teach it about topo.amd_node_id
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (29 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 32/40] x86/cpu: Use common topology code for AMD Thomas Gleixner
                   ` (11 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

When switching AMD over to the new topology parser, the match functions
need to check the new topo.amd_node_id member, which holds the node ID
information on AMD systems with the extended topology feature.
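
Why the extra check matters: on an AMD multi-node package two CPUs can
share pkg_id and die_id while sitting on different nodes, so they must
not be treated as die siblings. A toy sketch of the resulting predicate,
using simplified standalone types rather than the kernel structures:

  struct toy_topo { unsigned int pkg_id, die_id, amd_node_id; };

  static int toy_match_die(const struct toy_topo *c, const struct toy_topo *o,
                           int has_topoext, unsigned int nodes_per_pkg)
  {
      if (c->pkg_id != o->pkg_id || c->die_id != o->die_id)
          return 0;
      /* Multi-node package: only CPUs on the same node are die siblings */
      if (has_topoext && nodes_per_pkg > 1)
          return c->amd_node_id == o->amd_node_id;
      return 1;
  }

With nodes_per_pkg == 2, CPUs reporting {0, 0, 0} and {0, 0, 1} now
correctly fail the die match.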

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/smpboot.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -486,6 +486,7 @@ static bool match_smt(struct cpuinfo_x86
 
 		if (c->topo.pkg_id == o->topo.pkg_id &&
 		    c->topo.die_id == o->topo.die_id &&
+		    c->topo.amd_node_id == o->topo.amd_node_id &&
 		    per_cpu_llc_id(cpu1) == per_cpu_llc_id(cpu2)) {
 			if (c->topo.core_id == o->topo.core_id)
 				return topology_sane(c, o, "smt");
@@ -507,10 +508,13 @@ static bool match_smt(struct cpuinfo_x86
 
 static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-	if (c->topo.pkg_id == o->topo.pkg_id &&
-	    c->topo.die_id == o->topo.die_id)
-		return true;
-	return false;
+	if (c->topo.pkg_id != o->topo.pkg_id || c->topo.die_id != o->topo.die_id)
+		return false;
+
+	if (boot_cpu_has(X86_FEATURE_TOPOEXT) && topology_amd_nodes_per_pkg() > 1)
+		return c->topo.amd_node_id == o->topo.amd_node_id;
+
+	return true;
 }
 
 static bool match_l2c(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)



* [patch V3 32/40] x86/cpu: Use common topology code for AMD
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (30 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 31/40] x86/smpboot: Teach it about topo.amd_node_id Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 33/40] x86/cpu: Use common topology code for HYGON Thomas Gleixner
                   ` (10 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Switch it over to the new topology evaluation mechanism and remove the
random bits and pieces which are sprinkled all over the place.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h      |    2 
 arch/x86/include/asm/topology.h       |    5 +
 arch/x86/kernel/cpu/amd.c             |  146 ----------------------------------
 arch/x86/kernel/cpu/mce/inject.c      |    3 
 arch/x86/kernel/cpu/topology_common.c |    5 -
 5 files changed, 10 insertions(+), 151 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -705,10 +705,8 @@ static inline u32 per_cpu_l2c_id(unsigne
 }
 
 #ifdef CONFIG_CPU_SUP_AMD
-extern u32 amd_get_nodes_per_socket(void);
 extern u32 amd_get_highest_perf(void);
 #else
-static inline u32 amd_get_nodes_per_socket(void)	{ return 0; }
 static inline u32 amd_get_highest_perf(void)		{ return 0; }
 #endif
 
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -121,6 +121,11 @@ struct x86_topology_system {
 
 extern struct x86_topology_system x86_topo_system;
 
+static inline unsigned int topology_get_domain_size(enum x86_topology_domains dom)
+{
+	return x86_topo_system.dom_size[dom];
+}
+
 extern const struct cpumask *cpu_coregroup_mask(int cpu);
 extern const struct cpumask *cpu_clustergroup_mask(int cpu);
 
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -28,13 +28,6 @@
 #include "cpu.h"
 
 /*
- * nodes_per_socket: Stores the number of nodes per socket.
- * Refer to Fam15h Models 00-0fh BKDG - CPUID Fn8000_001E_ECX
- * Node Identifiers[10:8]
- */
-static u32 nodes_per_socket = 1;
-
-/*
  * AMD errata checking
  *
  * Errata are defined as arrays of ints using the AMD_LEGACY_ERRATUM() or
@@ -372,97 +365,6 @@ static int nearby_node(int apicid)
 }
 #endif
 
-/*
- * Fix up topo::core_id for pre-F17h systems to be in the
- * [0 .. cores_per_node - 1] range. Not really needed but
- * kept so as not to break existing setups.
- */
-static void legacy_fixup_core_id(struct cpuinfo_x86 *c)
-{
-	u32 cus_per_node;
-
-	if (c->x86 >= 0x17)
-		return;
-
-	cus_per_node = c->x86_max_cores / nodes_per_socket;
-	c->topo.core_id %= cus_per_node;
-}
-
-/*
- * Fixup core topology information for
- * (1) AMD multi-node processors
- *     Assumption: Number of cores in each internal node is the same.
- * (2) AMD processors supporting compute units
- */
-static void amd_get_topology(struct cpuinfo_x86 *c)
-{
-	/* get information required for multi-node processors */
-	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
-		int err;
-		u32 eax, ebx, ecx, edx;
-
-		cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-
-		c->topo.die_id  = ecx & 0xff;
-
-		if (c->x86 == 0x15)
-			c->topo.cu_id = ebx & 0xff;
-
-		if (c->x86 >= 0x17) {
-			c->topo.core_id = ebx & 0xff;
-
-			if (smp_num_siblings > 1)
-				c->x86_max_cores /= smp_num_siblings;
-		}
-
-		/*
-		 * In case leaf B is available, use it to derive
-		 * topology information.
-		 */
-		err = detect_extended_topology(c);
-		if (!err)
-			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
-
-		cacheinfo_amd_init_llc_id(c, c->topo.die_id);
-
-	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
-		u64 value;
-
-		rdmsrl(MSR_FAM10H_NODE_ID, value);
-		c->topo.die_id = value & 7;
-		c->topo.llc_id = c->topo.die_id;
-	} else
-		return;
-
-	if (nodes_per_socket > 1) {
-		set_cpu_cap(c, X86_FEATURE_AMD_DCM);
-		legacy_fixup_core_id(c);
-	}
-}
-
-/*
- * On a AMD dual core setup the lower bits of the APIC id distinguish the cores.
- * Assumes number of cores is a power of two.
- */
-static void amd_detect_cmp(struct cpuinfo_x86 *c)
-{
-	unsigned bits;
-
-	bits = c->x86_coreid_bits;
-	/* Low order bits define the core id (index of core in socket) */
-	c->topo.core_id = c->topo.initial_apicid & ((1 << bits)-1);
-	/* Convert the initial APIC ID into the socket ID */
-	c->topo.pkg_id = c->topo.initial_apicid >> bits;
-	/* use socket ID also for last level cache */
-	c->topo.llc_id = c->topo.die_id = c->topo.pkg_id;
-}
-
-u32 amd_get_nodes_per_socket(void)
-{
-	return nodes_per_socket;
-}
-EXPORT_SYMBOL_GPL(amd_get_nodes_per_socket);
-
 static void srat_detect_node(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_NUMA
@@ -514,32 +416,6 @@ static void srat_detect_node(struct cpui
 #endif
 }
 
-static void early_init_amd_mc(struct cpuinfo_x86 *c)
-{
-#ifdef CONFIG_SMP
-	unsigned bits, ecx;
-
-	/* Multi core CPU? */
-	if (c->extended_cpuid_level < 0x80000008)
-		return;
-
-	ecx = cpuid_ecx(0x80000008);
-
-	c->x86_max_cores = (ecx & 0xff) + 1;
-
-	/* CPU telling us the core id bits shift? */
-	bits = (ecx >> 12) & 0xF;
-
-	/* Otherwise recompute */
-	if (bits == 0) {
-		while ((1 << bits) < c->x86_max_cores)
-			bits++;
-	}
-
-	c->x86_coreid_bits = bits;
-#endif
-}
-
 static void bsp_init_amd(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
@@ -572,18 +448,6 @@ static void bsp_init_amd(struct cpuinfo_
 	if (cpu_has(c, X86_FEATURE_MWAITX))
 		use_mwaitx_delay();
 
-	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
-		u32 ecx;
-
-		ecx = cpuid_ecx(0x8000001e);
-		__max_die_per_package = nodes_per_socket = ((ecx >> 8) & 7) + 1;
-	} else if (boot_cpu_has(X86_FEATURE_NODEID_MSR)) {
-		u64 value;
-
-		rdmsrl(MSR_FAM10H_NODE_ID, value);
-		__max_die_per_package = nodes_per_socket = ((value >> 3) & 7) + 1;
-	}
-
 	if (!boot_cpu_has(X86_FEATURE_AMD_SSBD) &&
 	    !boot_cpu_has(X86_FEATURE_VIRT_SSBD) &&
 	    c->x86 >= 0x15 && c->x86 <= 0x17) {
@@ -665,8 +529,6 @@ static void early_init_amd(struct cpuinf
 	u64 value;
 	u32 dummy;
 
-	early_init_amd_mc(c);
-
 	if (c->x86 >= 0xf)
 		set_cpu_cap(c, X86_FEATURE_K8);
 
@@ -754,9 +616,6 @@ static void early_init_amd(struct cpuinf
 			}
 		}
 	}
-
-	if (cpu_has(c, X86_FEATURE_TOPOEXT))
-		smp_num_siblings = ((cpuid_ebx(0x8000001e) >> 8) & 0xff) + 1;
 }
 
 static void init_amd_k8(struct cpuinfo_x86 *c)
@@ -1037,9 +896,6 @@ static void init_amd(struct cpuinfo_x86
 	if (cpu_has(c, X86_FEATURE_FSRM))
 		set_cpu_cap(c, X86_FEATURE_FSRS);
 
-	/* get apicid instead of initial apic id from cpuid */
-	c->topo.apicid = read_apic_id();
-
 	/* K6s reports MCEs but don't actually have all the MSRs */
 	if (c->x86 < 6)
 		clear_cpu_cap(c, X86_FEATURE_MCE);
@@ -1067,8 +923,6 @@ static void init_amd(struct cpuinfo_x86
 
 	cpu_detect_cache_sizes(c);
 
-	amd_detect_cmp(c);
-	amd_get_topology(c);
 	srat_detect_node(c);
 
 	init_amd_cacheinfo(c);
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -433,8 +433,7 @@ static u32 get_nbc_for_node(int node_id)
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 	u32 cores_per_node;
 
-	cores_per_node = (c->x86_max_cores * smp_num_siblings) / amd_get_nodes_per_socket();
-
+	cores_per_node = (c->x86_max_cores * smp_num_siblings) / topology_amd_nodes_per_pkg();
 	return cores_per_node * node_id;
 }
 
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -72,7 +72,6 @@ bool topo_is_converted(struct cpuinfo_x8
 {
 	/* Temporary until everything is converted over. */
 	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_AMD:
 	case X86_VENDOR_HYGON:
 		return false;
 	default:
@@ -133,6 +132,10 @@ static void parse_topology(struct topo_s
 	tscan->ebx1_nproc_shift = get_count_order(ebx.nproc);
 
 	switch (c->x86_vendor) {
+	case X86_VENDOR_AMD:
+		if (IS_ENABLED(CONFIG_CPU_SUP_AMD))
+			cpu_parse_topology_amd(tscan);
+		break;
 	case X86_VENDOR_CENTAUR:
 	case X86_VENDOR_ZHAOXIN:
 		parse_legacy(tscan);



* [patch V3 33/40] x86/cpu: Use common topology code for HYGON
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (31 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 32/40] x86/cpu: Use common topology code for AMD Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-11 13:00   ` Pu Wen
  2023-08-02 10:21 ` [patch V3 34/40] x86/mm/numa: Use core domain size on AMD Thomas Gleixner
                   ` (9 subsequent siblings)
  42 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Switch it over to use the consolidated topology evaluation and remove the
temporary safeguards which are no longer needed.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/common.c          |    5 -
 arch/x86/kernel/cpu/cpu.h             |    1 
 arch/x86/kernel/cpu/hygon.c           |  123 ----------------------------------
 arch/x86/kernel/cpu/topology.h        |    1 
 arch/x86/kernel/cpu/topology_common.c |   22 +-----
 5 files changed, 4 insertions(+), 148 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1740,11 +1740,6 @@ static void identify_cpu(struct cpuinfo_
 	/* Clear/Set all flags overridden by options, after probe */
 	apply_forced_caps(c);
 
-#ifdef CONFIG_X86_64
-	if (!topo_is_converted(c))
-		c->topo.apicid = apic->phys_pkg_id(c->topo.initial_apicid, 0);
-#endif
-
 	/*
 	 * Vendor-specific initialization.  In this section we
 	 * canonicalize the feature flags, meaning if there are
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -76,7 +76,6 @@ extern void init_intel_cacheinfo(struct
 extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);
 extern void init_hygon_cacheinfo(struct cpuinfo_x86 *c);
 
-extern int detect_extended_topology(struct cpuinfo_x86 *c);
 extern void check_null_seg_clears_base(struct cpuinfo_x86 *c);
 
 void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, u16 die_id);
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -20,12 +20,6 @@
 
 #define APICID_SOCKET_ID_BIT 6
 
-/*
- * nodes_per_socket: Stores the number of nodes per socket.
- * Refer to CPUID Fn8000_001E_ECX Node Identifiers[10:8]
- */
-static u32 nodes_per_socket = 1;
-
 #ifdef CONFIG_NUMA
 /*
  * To workaround broken NUMA config.  Read the comment in
@@ -49,76 +43,6 @@ static int nearby_node(int apicid)
 }
 #endif
 
-static void hygon_get_topology_early(struct cpuinfo_x86 *c)
-{
-	if (cpu_has(c, X86_FEATURE_TOPOEXT))
-		smp_num_siblings = ((cpuid_ebx(0x8000001e) >> 8) & 0xff) + 1;
-}
-
-/*
- * Fixup core topology information for
- * (1) Hygon multi-node processors
- *     Assumption: Number of cores in each internal node is the same.
- * (2) Hygon processors supporting compute units
- */
-static void hygon_get_topology(struct cpuinfo_x86 *c)
-{
-	/* get information required for multi-node processors */
-	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
-		int err;
-		u32 eax, ebx, ecx, edx;
-
-		cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
-
-		c->topo.die_id  = ecx & 0xff;
-
-		c->topo.core_id = ebx & 0xff;
-
-		if (smp_num_siblings > 1)
-			c->x86_max_cores /= smp_num_siblings;
-
-		/*
-		 * In case leaf B is available, use it to derive
-		 * topology information.
-		 */
-		err = detect_extended_topology(c);
-		if (!err)
-			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
-
-		/* Socket ID is ApicId[6] for these processors. */
-		c->topo.pkg_id = c->topo.apicid >> APICID_SOCKET_ID_BIT;
-
-		cacheinfo_hygon_init_llc_id(c);
-	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
-		u64 value;
-
-		rdmsrl(MSR_FAM10H_NODE_ID, value);
-		c->topo.die_id = value & 7;
-		c->topo.llc_id = c->topo.die_id;
-	} else
-		return;
-
-	if (nodes_per_socket > 1)
-		set_cpu_cap(c, X86_FEATURE_AMD_DCM);
-}
-
-/*
- * On Hygon setup the lower bits of the APIC id distinguish the cores.
- * Assumes number of cores is a power of two.
- */
-static void hygon_detect_cmp(struct cpuinfo_x86 *c)
-{
-	unsigned int bits;
-
-	bits = c->x86_coreid_bits;
-	/* Low order bits define the core id (index of core in socket) */
-	c->topo.core_id = c->topo.initial_apicid & ((1 << bits)-1);
-	/* Convert the initial APIC ID into the socket ID */
-	c->topo.pkg_id = c->topo.initial_apicid >> bits;
-	/* Use package ID also for last level cache */
-	c->topo.llc_id = c->topo.die_id = c->topo.pkg_id;
-}
-
 static void srat_detect_node(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_NUMA
@@ -169,32 +93,6 @@ static void srat_detect_node(struct cpui
 #endif
 }
 
-static void early_init_hygon_mc(struct cpuinfo_x86 *c)
-{
-#ifdef CONFIG_SMP
-	unsigned int bits, ecx;
-
-	/* Multi core CPU? */
-	if (c->extended_cpuid_level < 0x80000008)
-		return;
-
-	ecx = cpuid_ecx(0x80000008);
-
-	c->x86_max_cores = (ecx & 0xff) + 1;
-
-	/* CPU telling us the core id bits shift? */
-	bits = (ecx >> 12) & 0xF;
-
-	/* Otherwise recompute */
-	if (bits == 0) {
-		while ((1 << bits) < c->x86_max_cores)
-			bits++;
-	}
-
-	c->x86_coreid_bits = bits;
-#endif
-}
-
 static void bsp_init_hygon(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
@@ -208,18 +106,6 @@ static void bsp_init_hygon(struct cpuinf
 	if (cpu_has(c, X86_FEATURE_MWAITX))
 		use_mwaitx_delay();
 
-	if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
-		u32 ecx;
-
-		ecx = cpuid_ecx(0x8000001e);
-		__max_die_per_package = nodes_per_socket = ((ecx >> 8) & 7) + 1;
-	} else if (boot_cpu_has(X86_FEATURE_NODEID_MSR)) {
-		u64 value;
-
-		rdmsrl(MSR_FAM10H_NODE_ID, value);
-		__max_die_per_package = nodes_per_socket = ((value >> 3) & 7) + 1;
-	}
-
 	if (!boot_cpu_has(X86_FEATURE_AMD_SSBD) &&
 	    !boot_cpu_has(X86_FEATURE_VIRT_SSBD)) {
 		/*
@@ -238,8 +124,6 @@ static void early_init_hygon(struct cpui
 {
 	u32 dummy;
 
-	early_init_hygon_mc(c);
-
 	set_cpu_cap(c, X86_FEATURE_K8);
 
 	rdmsr_safe(MSR_AMD64_PATCH_LEVEL, &c->microcode, &dummy);
@@ -280,8 +164,6 @@ static void early_init_hygon(struct cpui
 	 * we can set it unconditionally.
 	 */
 	set_cpu_cap(c, X86_FEATURE_VMMCALL);
-
-	hygon_get_topology_early(c);
 }
 
 static void init_hygon(struct cpuinfo_x86 *c)
@@ -296,9 +178,6 @@ static void init_hygon(struct cpuinfo_x8
 
 	set_cpu_cap(c, X86_FEATURE_REP_GOOD);
 
-	/* get apicid instead of initial apic id from cpuid */
-	c->topo.apicid = read_apic_id();
-
 	/*
 	 * XXX someone from Hygon needs to confirm this DTRT
 	 *
@@ -310,8 +189,6 @@ static void init_hygon(struct cpuinfo_x8
 
 	cpu_detect_cache_sizes(c);
 
-	hygon_detect_cmp(c);
-	hygon_get_topology(c);
 	srat_detect_node(c);
 
 	init_hygon_cacheinfo(c);
--- a/arch/x86/kernel/cpu/topology.h
+++ b/arch/x86/kernel/cpu/topology.h
@@ -15,7 +15,6 @@ struct topo_scan {
 	u16			amd_node_id;
 };
 
-bool topo_is_converted(struct cpuinfo_x86 *c);
 void cpu_init_topology(struct cpuinfo_x86 *c);
 void cpu_parse_topology(struct cpuinfo_x86 *c);
 void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -68,18 +68,6 @@ static void parse_legacy(struct topo_sca
 	topology_set_dom(tscan, TOPO_CORE_DOMAIN, core_shift, cores);
 }
 
-bool topo_is_converted(struct cpuinfo_x86 *c)
-{
-	/* Temporary until everything is converted over. */
-	switch (boot_cpu_data.x86_vendor) {
-	case X86_VENDOR_HYGON:
-		return false;
-	default:
-		/* Let all UP systems use the below */
-		return true;
-	}
-}
-
 static bool fake_topology(struct topo_scan *tscan)
 {
 	/*
@@ -144,6 +132,10 @@ static void parse_topology(struct topo_s
 		if (!IS_ENABLED(CONFIG_CPU_SUP_INTEL) || !cpu_parse_topology_ext(tscan))
 			parse_legacy(tscan);
 		break;
+	case X86_VENDOR_HYGON:
+		if (IS_ENABLED(CONFIG_CPU_SUP_HYGON))
+			cpu_parse_topology_amd(tscan);
+		break;
 	}
 }
 
@@ -186,9 +178,6 @@ void cpu_parse_topology(struct cpuinfo_x
 
 	parse_topology(&tscan, false);
 
-	if (!topo_is_converted(c))
-		return;
-
 	for (dom = TOPO_SMT_DOMAIN; dom < TOPO_MAX_DOMAIN; dom++) {
 		if (tscan.dom_shifts[dom] == x86_topo_system.dom_shifts[dom])
 			continue;
@@ -217,9 +206,6 @@ void __init cpu_init_topology(struct cpu
 
 	parse_topology(&tscan, true);
 
-	if (!topo_is_converted(c))
-		return;
-
 	/* Copy the shift values and calculate the unit sizes. */
 	memcpy(x86_topo_system.dom_shifts, tscan.dom_shifts, sizeof(x86_topo_system.dom_shifts));
 



* [patch V3 34/40] x86/mm/numa: Use core domain size on AMD
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (32 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 33/40] x86/cpu: Use common topology code for HYGON Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 35/40] x86/cpu: Make topology_amd_node_id() use the actual node info Thomas Gleixner
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

cpuinfo::x86_coreid_bits is about to be phased out. Use the core domain
size from the topology information instead.

Add a comment explaining why the early MPTABLE parsing is required and
decrapify the loop which sets up the APIC ID to node map.
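
A worked example of the rewritten mapping loop, with hypothetical
numbers: assume the core domain spans 8 APIC IDs, the BSP APIC ID is 0
and four nodes were parsed. The assignment then walks consecutive APIC
ID ranges, one range per node:

  /* Sketch of the apicid -> node assignment with made-up values. */
  #include <stdio.h>

  int main(void)
  {
      unsigned int cores = 8;  /* topology_get_domain_size(TOPO_CORE_DOMAIN) */
      unsigned int apicid = 0; /* boot_cpu_physical_apicid */

      for (int node = 0; node < 4; node++) {         /* for_each_node_mask() */
          for (unsigned int j = 0; j < cores; j++, apicid++)
              printf("apicid %2u -> node %d\n", apicid, node);
      }
      return 0;
  }

So APIC IDs 0..7 land on node 0, 8..15 on node 1, and so forth.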

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/amdtopology.c |   35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

--- a/arch/x86/mm/amdtopology.c
+++ b/arch/x86/mm/amdtopology.c
@@ -54,13 +54,11 @@ static __init int find_northbridge(void)
 
 int __init amd_numa_init(void)
 {
-	u64 start = PFN_PHYS(0);
+	unsigned int numnodes, cores, apicid;
+	u64 prevbase, start = PFN_PHYS(0);
 	u64 end = PFN_PHYS(max_pfn);
-	unsigned numnodes;
-	u64 prevbase;
-	int i, j, nb;
 	u32 nodeid, reg;
-	unsigned int bits, cores, apicid_base;
+	int i, j, nb;
 
 	if (!early_pci_allowed())
 		return -EINVAL;
@@ -158,26 +156,25 @@ int __init amd_numa_init(void)
 		return -ENOENT;
 
 	/*
-	 * We seem to have valid NUMA configuration.  Map apicids to nodes
-	 * using the coreid bits from early_identify_cpu.
+	 * We seem to have valid NUMA configuration. Map apicids to nodes
+	 * using the size of the core domain in the APIC space.
 	 */
-	bits = boot_cpu_data.x86_coreid_bits;
-	cores = 1 << bits;
-	apicid_base = 0;
+	cores = topology_get_domain_size(TOPO_CORE_DOMAIN);
 
 	/*
-	 * get boot-time SMP configuration:
+	 * Scan MPTABLE to map the local APIC and ensure that the boot CPU
+	 * APIC ID is valid. This is required because on pre-ACPI/SRAT
+	 * systems IO-APICs are mapped before the boot CPU.
 	 */
 	early_get_smp_config();
 
-	if (boot_cpu_physical_apicid > 0) {
-		pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
-		apicid_base = boot_cpu_physical_apicid;
+	apicid = boot_cpu_physical_apicid;
+	if (apicid > 0)
+		pr_info("BSP APIC ID: %02x\n", apicid);
+
+	for_each_node_mask(i, numa_nodes_parsed) {
+		for (j = 0; j < cores; j++, apicid++)
+			set_apicid_to_node(apicid, i);
 	}
-
-	for_each_node_mask(i, numa_nodes_parsed)
-		for (j = apicid_base; j < cores + apicid_base; j++)
-			set_apicid_to_node((i << bits) + j, i);
-
 	return 0;
 }



* [patch V3 35/40] x86/cpu: Make topology_amd_node_id() use the actual node info
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (33 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 34/40] x86/mm/numa: Use core domain size on AMD Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 36/40] x86/cpu: Remove topology.c Thomas Gleixner
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Now that everything is converted, switch it over and remove the
intermediate operation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/topology.h       |    4 ++--
 arch/x86/kernel/cpu/topology_common.c |    7 ++-----
 2 files changed, 4 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -136,7 +136,7 @@ extern const struct cpumask *cpu_cluster
 #define topology_core_id(cpu)			(cpu_data(cpu).topo.core_id)
 #define topology_ppin(cpu)			(cpu_data(cpu).ppin)
 
-#define topology_amd_node_id(cpu)		(cpu_data(cpu).topo.die_id)
+#define topology_amd_node_id(cpu)		(cpu_data(cpu).topo.amd_node_id)
 
 extern unsigned int __max_die_per_package;
 
@@ -171,7 +171,7 @@ extern unsigned int __amd_nodes_per_pkg;
 
 static inline unsigned int topology_amd_nodes_per_pkg(void)
 {
-	return __max_die_per_package;
+	return __amd_nodes_per_pkg;
 }
 
 extern struct cpumask __cpu_primary_thread_mask;
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -143,9 +143,7 @@ static void topo_set_ids(struct topo_sca
 	/* Relative core ID */
 	c->topo.core_id = topo_relative_domain_id(apicid, TOPO_CORE_DOMAIN);
 
-	/* Temporary workaround */
-	if (tscan->amd_nodes_per_pkg)
-		c->topo.amd_node_id = c->topo.die_id = tscan->amd_node_id;
+	c->topo.amd_node_id = tscan->amd_node_id;
 
 	if (c->x86_vendor == X86_VENDOR_AMD)
 		cpu_topology_fixup_amd(tscan);
@@ -231,6 +229,5 @@ void __init cpu_init_topology(struct cpu
 	 * AMD systems have Nodes per package which cannot be mapped to
 	 * APIC ID (yet).
 	 */
-	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON)
-		__amd_nodes_per_pkg = __max_die_per_package = tscan.amd_nodes_per_pkg;
+	__amd_nodes_per_pkg = tscan.amd_nodes_per_pkg;
 }



* [patch V3 36/40] x86/cpu: Remove topology.c
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (34 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 35/40] x86/cpu: Make topology_amd_node_id() use the actual node info Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 37/40] x86/cpu: Remove x86_coreid_bits Thomas Gleixner
                   ` (6 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

No more users. Stick it into the ugly code museum.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/Makefile   |    2 
 arch/x86/kernel/cpu/topology.c |  164 -----------------------------------------
 2 files changed, 1 insertion(+), 165 deletions(-)

--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -18,7 +18,7 @@ KMSAN_SANITIZE_common.o := n
 KCSAN_SANITIZE_common.o := n
 
 obj-y			:= cacheinfo.o scattered.o
-obj-y			+= topology_common.o topology_ext.o topology_amd.o topology.o
+obj-y			+= topology_common.o topology_ext.o topology_amd.o
 obj-y			+= common.o
 obj-y			+= rdrand.o
 obj-y			+= match.o
--- a/arch/x86/kernel/cpu/topology.c
+++ /dev/null
@@ -1,164 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Check for extended topology enumeration cpuid leaf 0xb and if it
- * exists, use it for populating initial_apicid and cpu topology
- * detection.
- */
-
-#include <linux/cpu.h>
-#include <asm/apic.h>
-#include <asm/memtype.h>
-#include <asm/processor.h>
-
-#include "cpu.h"
-
-/* leaf 0xb SMT level */
-#define SMT_LEVEL	0
-
-/* extended topology sub-leaf types */
-#define INVALID_TYPE	0
-#define SMT_TYPE	1
-#define CORE_TYPE	2
-#define DIE_TYPE	5
-
-#define LEAFB_SUBTYPE(ecx)		(((ecx) >> 8) & 0xff)
-#define BITS_SHIFT_NEXT_LEVEL(eax)	((eax) & 0x1f)
-#define LEVEL_MAX_SIBLINGS(ebx)		((ebx) & 0xffff)
-
-#ifdef CONFIG_SMP
-/*
- * Check if given CPUID extended topology "leaf" is implemented
- */
-static int check_extended_topology_leaf(int leaf)
-{
-	unsigned int eax, ebx, ecx, edx;
-
-	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
-
-	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
-		return -1;
-
-	return 0;
-}
-/*
- * Return best CPUID Extended Topology Leaf supported
- */
-static int detect_extended_topology_leaf(struct cpuinfo_x86 *c)
-{
-	if (c->cpuid_level >= 0x1f) {
-		if (check_extended_topology_leaf(0x1f) == 0)
-			return 0x1f;
-	}
-
-	if (c->cpuid_level >= 0xb) {
-		if (check_extended_topology_leaf(0xb) == 0)
-			return 0xb;
-	}
-
-	return -1;
-}
-#endif
-
-int detect_extended_topology_early(struct cpuinfo_x86 *c)
-{
-#ifdef CONFIG_SMP
-	unsigned int eax, ebx, ecx, edx;
-	int leaf;
-
-	leaf = detect_extended_topology_leaf(c);
-	if (leaf < 0)
-		return -1;
-
-	set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
-
-	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
-	/*
-	 * initial apic id, which also represents 32-bit extended x2apic id.
-	 */
-	c->topo.initial_apicid = edx;
-	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
-#endif
-	return 0;
-}
-
-/*
- * Check for extended topology enumeration cpuid leaf, and if it
- * exists, use it for populating initial_apicid and cpu topology
- * detection.
- */
-int detect_extended_topology(struct cpuinfo_x86 *c)
-{
-#ifdef CONFIG_SMP
-	unsigned int eax, ebx, ecx, edx, sub_index;
-	unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
-	unsigned int core_select_mask, core_level_siblings;
-	unsigned int die_select_mask, die_level_siblings;
-	unsigned int pkg_mask_width;
-	bool die_level_present = false;
-	int leaf;
-
-	leaf = detect_extended_topology_leaf(c);
-	if (leaf < 0)
-		return -1;
-
-	/*
-	 * Populate HT related information from sub-leaf level 0.
-	 */
-	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
-	c->topo.initial_apicid = edx;
-	core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
-	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
-	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
-	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
-	pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
-
-	sub_index = 1;
-	while (true) {
-		cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
-
-		/*
-		 * Check for the Core type in the implemented sub leaves.
-		 */
-		if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
-			core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
-			core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
-			die_level_siblings = core_level_siblings;
-			die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
-		}
-		if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
-			die_level_present = true;
-			die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
-			die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
-		}
-
-		if (LEAFB_SUBTYPE(ecx) != INVALID_TYPE)
-			pkg_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
-		else
-			break;
-
-		sub_index++;
-	}
-
-	core_select_mask = (~(-1 << pkg_mask_width)) >> ht_mask_width;
-	die_select_mask = (~(-1 << die_plus_mask_width)) >>
-				core_plus_mask_width;
-
-	c->topo.core_id = apic->phys_pkg_id(c->topo.initial_apicid,
-				ht_mask_width) & core_select_mask;
-
-	if (die_level_present) {
-		c->topo.die_id = apic->phys_pkg_id(c->topo.initial_apicid,
-					core_plus_mask_width) & die_select_mask;
-	}
-
-	c->topo.pkg_id = apic->phys_pkg_id(c->topo.initial_apicid, pkg_mask_width);
-	/*
-	 * Reinit the apicid, now that we have extended initial_apicid.
-	 */
-	c->topo.apicid = apic->phys_pkg_id(c->topo.initial_apicid, 0);
-
-	c->x86_max_cores = (core_level_siblings / smp_num_siblings);
-	__max_die_per_package = (die_level_siblings / core_level_siblings);
-#endif
-	return 0;
-}



* [patch V3 37/40] x86/cpu: Remove x86_coreid_bits
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (35 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 36/40] x86/cpu: Remove topology.c Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 38/40] x86/apic: Remove unused phys_pkg_id() callback Thomas Gleixner
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

No more users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    2 --
 arch/x86/kernel/cpu/common.c     |    1 -
 2 files changed, 3 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -119,8 +119,6 @@ struct cpuinfo_x86 {
 #endif
 	__u8			x86_virt_bits;
 	__u8			x86_phys_bits;
-	/* CPUID returned core id bits: */
-	__u8			x86_coreid_bits;
 	/* Max extended CPUID function supported: */
 	__u32			extended_cpuid_level;
 	/* Maximum supported CPUID level, -1=no CPUID: */
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1712,7 +1712,6 @@ static void identify_cpu(struct cpuinfo_
 	c->x86_vendor_id[0] = '\0'; /* Unset */
 	c->x86_model_id[0] = '\0';  /* Unset */
 	c->x86_max_cores = 1;
-	c->x86_coreid_bits = 0;
 #ifdef CONFIG_X86_64
 	c->x86_clflush_size = 64;
 	c->x86_phys_bits = 36;



* [patch V3 38/40] x86/apic: Remove unused phys_pkg_id() callback
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (36 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 37/40] x86/cpu: Remove x86_coreid_bits Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:21 ` [patch V3 39/40] x86/xen/smp_pv: Remove cpudata fiddling Thomas Gleixner
                   ` (4 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Now that the core code does not use this monstrosity anymore, it's time to
put it to rest.

The only real purpose was to read the APIC ID on UV and VSMP systems for
the actual evaluation. That's what the core code does now.

For doing the actual shift operation there is truly no APIC callback
required.
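
Every phys_pkg_id() implementation boiled down to the same expression; a
minimal sketch of that operation (the helper name is made up for
illustration - the core code uses the topo_shift_apicid() inline from
the topology headers):

  /* The whole callback reduced to its actual operation (illustrative name) */
  static inline unsigned int pkg_id_from_apicid(unsigned int apicid, int index_msb)
  {
      return apicid >> index_msb;
  }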

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/apic.h           |    1 -
 arch/x86/kernel/apic/apic_flat_64.c   |    7 -------
 arch/x86/kernel/apic/apic_noop.c      |    3 ---
 arch/x86/kernel/apic/apic_numachip.c  |    7 -------
 arch/x86/kernel/apic/bigsmp_32.c      |    6 ------
 arch/x86/kernel/apic/local.h          |    1 -
 arch/x86/kernel/apic/probe_32.c       |    6 ------
 arch/x86/kernel/apic/x2apic_cluster.c |    1 -
 arch/x86/kernel/apic/x2apic_phys.c    |    6 ------
 arch/x86/kernel/apic/x2apic_uv_x.c    |   11 -----------
 arch/x86/kernel/vsmp_64.c             |   13 -------------
 arch/x86/xen/apic.c                   |    6 ------
 12 files changed, 68 deletions(-)

--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -296,7 +296,6 @@ struct apic {
 	void	(*init_apic_ldr)(void);
 	void	(*ioapic_phys_id_map)(physid_mask_t *phys_map, physid_mask_t *retmap);
 	u32	(*cpu_present_to_apicid)(int mps_cpu);
-	u32	(*phys_pkg_id)(u32 cpuid_apic, int index_msb);
 
 	u32	(*get_apic_id)(u32 id);
 	u32	(*set_apic_id)(u32 apicid);
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -66,11 +66,6 @@ static u32 set_apic_id(u32 id)
 	return (id & 0xFF) << 24;
 }
 
-static u32 flat_phys_pkg_id(u32 initial_apic_id, int index_msb)
-{
-	return initial_apic_id >> index_msb;
-}
-
 static int flat_probe(void)
 {
 	return 1;
@@ -89,7 +84,6 @@ static struct apic apic_flat __ro_after_
 
 	.init_apic_ldr			= default_init_apic_ldr,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
-	.phys_pkg_id			= flat_phys_pkg_id,
 
 	.max_apic_id			= 0xFE,
 	.get_apic_id			= flat_get_apic_id,
@@ -159,7 +153,6 @@ static struct apic apic_physflat __ro_af
 	.disable_esr			= 0,
 
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
-	.phys_pkg_id			= flat_phys_pkg_id,
 
 	.max_apic_id			= 0xFE,
 	.get_apic_id			= flat_get_apic_id,
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -29,7 +29,6 @@ static void noop_send_IPI_self(int vecto
 static void noop_apic_icr_write(u32 low, u32 id) { }
 static int noop_wakeup_secondary_cpu(u32 apicid, unsigned long start_eip) { return -1; }
 static u64 noop_apic_icr_read(void) { return 0; }
-static u32 noop_phys_pkg_id(u32 cpuid_apic, int index_msb) { return 0; }
 static u32 noop_get_apic_id(u32 apicid) { return 0; }
 static void noop_apic_eoi(void) { }
 
@@ -56,8 +55,6 @@ struct apic apic_noop __ro_after_init =
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
 
-	.phys_pkg_id			= noop_phys_pkg_id,
-
 	.max_apic_id			= 0xFE,
 	.get_apic_id			= noop_get_apic_id,
 
--- a/arch/x86/kernel/apic/apic_numachip.c
+++ b/arch/x86/kernel/apic/apic_numachip.c
@@ -56,11 +56,6 @@ static u32 numachip2_set_apic_id(u32 id)
 	return id << 24;
 }
 
-static u32 numachip_phys_pkg_id(u32 initial_apic_id, int index_msb)
-{
-	return initial_apic_id >> index_msb;
-}
-
 static void numachip1_apic_icr_write(int apicid, unsigned int val)
 {
 	write_lcsr(CSR_G3_EXT_IRQ_GEN, (apicid << 16) | val);
@@ -228,7 +223,6 @@ static const struct apic apic_numachip1
 	.disable_esr			= 0,
 
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
-	.phys_pkg_id			= numachip_phys_pkg_id,
 
 	.max_apic_id			= UINT_MAX,
 	.get_apic_id			= numachip1_get_apic_id,
@@ -265,7 +259,6 @@ static const struct apic apic_numachip2
 	.disable_esr			= 0,
 
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
-	.phys_pkg_id			= numachip_phys_pkg_id,
 
 	.max_apic_id			= UINT_MAX,
 	.get_apic_id			= numachip2_get_apic_id,
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -29,11 +29,6 @@ static void bigsmp_ioapic_phys_id_map(ph
 	physids_promote(0xFFL, retmap);
 }
 
-static u32 bigsmp_phys_pkg_id(u32 cpuid_apic, int index_msb)
-{
-	return cpuid_apic >> index_msb;
-}
-
 static void bigsmp_send_IPI_allbutself(int vector)
 {
 	default_send_IPI_mask_allbutself_phys(cpu_online_mask, vector);
@@ -88,7 +83,6 @@ static struct apic apic_bigsmp __ro_afte
 	.check_apicid_used		= bigsmp_check_apicid_used,
 	.ioapic_phys_id_map		= bigsmp_ioapic_phys_id_map,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
-	.phys_pkg_id			= bigsmp_phys_pkg_id,
 
 	.max_apic_id			= 0xFE,
 	.get_apic_id			= bigsmp_get_apic_id,
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -17,7 +17,6 @@
 void __x2apic_send_IPI_dest(unsigned int apicid, int vector, unsigned int dest);
 u32 x2apic_get_apic_id(u32 id);
 u32 x2apic_set_apic_id(u32 id);
-u32 x2apic_phys_pkg_id(u32 initial_apicid, int index_msb);
 
 void x2apic_send_IPI_all(int vector);
 void x2apic_send_IPI_allbutself(int vector);
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -18,11 +18,6 @@
 
 #include "local.h"
 
-static u32 default_phys_pkg_id(u32 cpuid_apic, int index_msb)
-{
-	return cpuid_apic >> index_msb;
-}
-
 static u32 default_get_apic_id(u32 x)
 {
 	unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
@@ -54,7 +49,6 @@ static struct apic apic_default __ro_aft
 	.init_apic_ldr			= default_init_apic_ldr,
 	.ioapic_phys_id_map		= default_ioapic_phys_id_map,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
-	.phys_pkg_id			= default_phys_pkg_id,
 
 	.max_apic_id			= 0xFE,
 	.get_apic_id			= default_get_apic_id,
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -236,7 +236,6 @@ static struct apic apic_x2apic_cluster _
 	.init_apic_ldr			= init_x2apic_ldr,
 	.ioapic_phys_id_map		= NULL,
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
-	.phys_pkg_id			= x2apic_phys_pkg_id,
 
 	.max_apic_id			= UINT_MAX,
 	.x2apic_set_max_apicid		= true,
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -134,11 +134,6 @@ u32 x2apic_set_apic_id(u32 id)
 	return id;
 }
 
-u32 x2apic_phys_pkg_id(u32 initial_apicid, int index_msb)
-{
-	return initial_apicid >> index_msb;
-}
-
 static struct apic apic_x2apic_phys __ro_after_init = {
 
 	.name				= "physical x2apic",
@@ -151,7 +146,6 @@ static struct apic apic_x2apic_phys __ro
 	.disable_esr			= 0,
 
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
-	.phys_pkg_id			= x2apic_phys_pkg_id,
 
 	.max_apic_id			= UINT_MAX,
 	.x2apic_set_max_apicid		= true,
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -785,16 +785,6 @@ static u32 set_apic_id(u32 id)
 	return id;
 }
 
-static unsigned int uv_read_apic_id(void)
-{
-	return x2apic_get_apic_id(apic_read(APIC_ID));
-}
-
-static u32 uv_phys_pkg_id(u32 initial_apicid, int index_msb)
-{
-	return uv_read_apic_id() >> index_msb;
-}
-
 static int uv_probe(void)
 {
 	return apic == &apic_x2apic_uv_x;
@@ -812,7 +802,6 @@ static struct apic apic_x2apic_uv_x __ro
 	.disable_esr			= 0,
 
 	.cpu_present_to_apicid		= default_cpu_present_to_apicid,
-	.phys_pkg_id			= uv_phys_pkg_id,
 
 	.max_apic_id			= UINT_MAX,
 	.get_apic_id			= x2apic_get_apic_id,
--- a/arch/x86/kernel/vsmp_64.c
+++ b/arch/x86/kernel/vsmp_64.c
@@ -127,25 +127,12 @@ static void __init vsmp_cap_cpus(void)
 #endif
 }
 
-static u32 apicid_phys_pkg_id(u32 initial_apic_id, int index_msb)
-{
-	return read_apic_id() >> index_msb;
-}
-
-static void vsmp_apic_post_init(void)
-{
-	/* need to update phys_pkg_id */
-	apic->phys_pkg_id = apicid_phys_pkg_id;
-}
-
 void __init vsmp_init(void)
 {
 	detect_vsmp_box();
 	if (!is_vsmp_box())
 		return;
 
-	x86_platform.apic_post_init = vsmp_apic_post_init;
-
 	vsmp_cap_cpus();
 
 	set_vsmp_ctl();
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -110,11 +110,6 @@ static int xen_madt_oem_check(char *oem_
 	return xen_pv_domain();
 }
 
-static u32 xen_phys_pkg_id(u32 initial_apic_id, int index_msb)
-{
-	return initial_apic_id >> index_msb;
-}
-
 static u32 xen_cpu_present_to_apicid(int cpu)
 {
 	if (cpu_present(cpu))
@@ -133,7 +128,6 @@ static struct apic xen_pv_apic __ro_afte
 	.disable_esr			= 0,
 
 	.cpu_present_to_apicid		= xen_cpu_present_to_apicid,
-	.phys_pkg_id			= xen_phys_pkg_id, /* detect_ht */
 
 	.max_apic_id			= UINT_MAX,
 	.get_apic_id			= xen_get_apic_id,



* [patch V3 39/40] x86/xen/smp_pv: Remove cpudata fiddling
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (37 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 38/40] x86/apic: Remove unused phys_pkg_id() callback Thomas Gleixner
@ 2023-08-02 10:21 ` Thomas Gleixner
  2023-08-02 10:22 ` [patch V3 40/40] x86/apic/uv: Remove the private leaf 0xb parser Thomas Gleixner
                   ` (3 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:21 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

The new topology CPUID parser already installs a fake topology for XEN/PV,
which ends up with cpuinfo::x86_max_cores = 1.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Juergen Gross <jgross@suse.com>
---
V2: New patch
---
 arch/x86/xen/smp_pv.c |    3 ---
 1 file changed, 3 deletions(-)

--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -73,7 +73,6 @@ static void cpu_bringup(void)
 	}
 	cpu = smp_processor_id();
 	smp_store_cpu_info(cpu);
-	cpu_data(cpu).x86_max_cores = 1;
 	set_cpu_sibling_map(cpu);
 
 	speculative_store_bypass_ht_init();
@@ -223,8 +222,6 @@ static void __init xen_pv_smp_prepare_cp
 
 	smp_prepare_cpus_common();
 
-	cpu_data(0).x86_max_cores = 1;
-
 	speculative_store_bypass_ht_init();
 
 	xen_pmu_init(0);


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3 40/40] x86/apic/uv: Remove the private leaf 0xb parser
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (38 preceding siblings ...)
  2023-08-02 10:21 ` [patch V3 39/40] x86/xen/smp_pv: Remove cpudata fiddling Thomas Gleixner
@ 2023-08-02 10:22 ` Thomas Gleixner
  2023-08-02 11:56 ` [patch V3 00/40] x86/cpu: Rework the topology evaluation Juergen Gross
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 10:22 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu,
	Steve Wahl, Mike Travis, Russ Anderson

The package shift has already been evaluated by the early CPU init.

Retire the mindless copy of the original leaf 0xb parser and use the
shift which the generic topology evaluation provides.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steve Wahl <steve.wahl@hpe.com>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Russ Anderson <russ.anderson@hpe.com>
---
 arch/x86/include/asm/topology.h    |    5 +++
 arch/x86/kernel/apic/x2apic_uv_x.c |   52 ++++++-------------------------------
 2 files changed, 14 insertions(+), 43 deletions(-)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -126,6 +126,11 @@ static inline unsigned int topology_get_
 	return x86_topo_system.dom_size[dom];
 }
 
+static inline unsigned int topology_get_domain_shift(enum x86_topology_domains dom)
+{
+	return dom == TOPO_SMT_DOMAIN ? 0 : x86_topo_system.dom_shifts[dom - 1];
+}
+
 extern const struct cpumask *cpu_coregroup_mask(int cpu);
 extern const struct cpumask *cpu_clustergroup_mask(int cpu);
 
--- a/arch/x86/kernel/apic/x2apic_uv_x.c
+++ b/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -241,54 +241,20 @@ static void __init uv_tsc_check_sync(voi
 	is_uv(UV3) ? sname.s3.field :		\
 	undef)
 
-/* [Copied from arch/x86/kernel/cpu/topology.c:detect_extended_topology()] */
-
-#define SMT_LEVEL			0	/* Leaf 0xb SMT level */
-#define INVALID_TYPE			0	/* Leaf 0xb sub-leaf types */
-#define SMT_TYPE			1
-#define CORE_TYPE			2
-#define LEAFB_SUBTYPE(ecx)		(((ecx) >> 8) & 0xff)
-#define BITS_SHIFT_NEXT_LEVEL(eax)	((eax) & 0x1f)
-
-static void set_x2apic_bits(void)
-{
-	unsigned int eax, ebx, ecx, edx, sub_index;
-	unsigned int sid_shift;
-
-	cpuid(0, &eax, &ebx, &ecx, &edx);
-	if (eax < 0xb) {
-		pr_info("UV: CPU does not have CPUID.11\n");
-		return;
-	}
-
-	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
-	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE)) {
-		pr_info("UV: CPUID.11 not implemented\n");
-		return;
-	}
-
-	sid_shift = BITS_SHIFT_NEXT_LEVEL(eax);
-	sub_index = 1;
-	do {
-		cpuid_count(0xb, sub_index, &eax, &ebx, &ecx, &edx);
-		if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
-			sid_shift = BITS_SHIFT_NEXT_LEVEL(eax);
-			break;
-		}
-		sub_index++;
-	} while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
-
-	uv_cpuid.apicid_shift	= 0;
-	uv_cpuid.apicid_mask	= (~(-1 << sid_shift));
-	uv_cpuid.socketid_shift = sid_shift;
-}
-
 static void __init early_get_apic_socketid_shift(void)
 {
+	unsigned int sid_shift = topology_get_domain_shift(TOPO_ROOT_DOMAIN);
+
 	if (is_uv2_hub() || is_uv3_hub())
 		uvh_apicid.v = uv_early_read_mmr(UVH_APICID);
 
-	set_x2apic_bits();
+	if (sid_shift) {
+		uv_cpuid.apicid_shift	= 0;
+		uv_cpuid.apicid_mask	= (~(-1 << sid_shift));
+		uv_cpuid.socketid_shift = sid_shift;
+	} else {
+		pr_info("UV: CPU does not have valid CPUID.11\n");
+	}
 
 	pr_info("UV: apicid_shift:%d apicid_mask:0x%x\n", uv_cpuid.apicid_shift, uv_cpuid.apicid_mask);
 	pr_info("UV: socketid_shift:%d pnode_mask:0x%x\n", uv_cpuid.socketid_shift, uv_cpuid.pnode_mask);
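
In essence, topology_get_domain_shift() returns the APIC ID bit
position at which a domain's ID starts, i.e. the number of bits
consumed by all lower domains. A worked example with hypothetical
shifts (2 threads per core, up to 32 cores per socket; the shifts
propagate upwards, so every domain above CORE inherits the CORE shift
here):

	dom_shifts[TOPO_SMT_DOMAIN]  == 1	/* 1 thread bit       */
	dom_shifts[TOPO_CORE_DOMAIN] == 6	/* 5 core bits on top */

	topology_get_domain_shift(TOPO_ROOT_DOMAIN) == 6

The socket ID is then apicid >> 6 and the mask below it is
~(-1 << 6) == 0x3f, which is exactly what the conversion stores in
uv_cpuid.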


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 00/40] x86/cpu: Rework the topology evaluation
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (39 preceding siblings ...)
  2023-08-02 10:22 ` [patch V3 40/40] x86/apic/uv: Remove the private leaf 0xb parser Thomas Gleixner
@ 2023-08-02 11:56 ` Juergen Gross
  2023-08-03  0:07 ` Sohil Mehta
  2023-08-03  3:44 ` Michael Kelley (LINUX)
  42 siblings, 0 replies; 88+ messages in thread
From: Juergen Gross @ 2023-08-02 11:56 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Dimitri Sivanich, Michael Kelley, Wei Liu


[-- Attachment #1.1.1: Type: text/plain, Size: 2130 bytes --]

On 02.08.23 12:20, Thomas Gleixner wrote:
> Hi!
> 
> This is the follow up to V2:
> 
>    https://lore.kernel.org/lkml/20230728105650.565799744@linutronix.de
> 
> which addresses the review feedback and some fallout reported on and
> off-list.
> 
> TLDR:
> 
> This reworks the way how topology information is evaluated via CPUID
> in preparation for a larger topology management overhaul to address
> shortcomings of the current code vs. hybrid systems and systems which make
> use of the extended topology domains in leaf 0x1f. Aside of that it's an
> overdue spring cleaning to get rid of accumulated layers of duct tape and
> haywire.
> 
> What changed vs. V2:
> 
>    - Decoded and fixed the fallout vs. XEN/PV reported by Juergen. Thanks to
>      Juergen for the remote hand debugging sessions!
> 
>      That's addressed in the first two new patches in this series. Summary:
>      XEN/PV booted by pure chance since the addition of SMT control 5 years
>      ago.
> 
>    - Fixed the off by one in the AMD parser which was debugged by Michael
> 
>    - Addressed review comments from various people
> 
> As discussed in:
> 
>    https://lore.kernel.org/lkml/BYAPR21MB16889FD224344B1B28BE22A1D705A@BYAPR21MB1688.namprd21.prod.outlook.com
>    ....
>    https://lore.kernel.org/lkml/87r0omjt8c.ffs@tglx
> 
> this series unfortunately brings the Hyper-V BIOS inconsistency into
> effect, which results in a slight performance impact. The L3 association
> which "worked" so far by exploiting the inconsistency of the Linux topology
> code is not longer supportable as we really need to get the actual short
> comings of our topology management addressed in a consistent way.
> 
> The series is based on V3 of the APIC cleanup series:
> 
>    https://lore.kernel.org/lkml/20230801103042.936020332@linutronix.de
> 
> and also available on top of that from git:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v3
> 
> Thanks,
> 
> 	tglx

For Xen PV (dom0 and unprivileged guest):

Tested-by: Juergen Gross <jgross@suse.com>


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser
  2023-08-02 10:21 ` [patch V3 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser Thomas Gleixner
@ 2023-08-02 19:28   ` Michael Kelley (LINUX)
  2023-08-02 19:46     ` Thomas Gleixner
  2023-08-02 19:51   ` [patch V3a " Thomas Gleixner
  1 sibling, 1 reply; 88+ messages in thread
From: Michael Kelley (LINUX) @ 2023-08-02 19:28 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Wei Liu

From: Thomas Gleixner <tglx@linutronix.de> Sent: Wednesday, August 2, 2023 3:22 AM
> 
> AMD/HYGON uses various methods for topology evaluation:
> 
>   - Leaf 0x80000008 and 0x8000001e based with an optional leaf 0xb,
>     which is the preferred variant for modern CPUs.
> 
>     Leaf 0xb will be superseded by leaf 0x80000026 soon, which is just
>     another variant of the Intel 0x1f leaf for whatever reasons.
> 
>   - Subleaf 0x80000008 and NODEID_MSR base
> 
>   - Legacy fallback
> 
> That code is following the principle of random bits and pieces all over the
> place which results in multiple evaluations and impenetrable code flows in
> the same way as the Intel parsing did.
> 
> Provide a sane implementation by clearly separating the three variants and
> bringing them in the proper preference order in one place.
> 
> This provides the parsing for both AMD and HYGON because there is no point
> in having a separate HYGON parser which only differs by 3 lines of
> code. Any further divergence between AMD and HYGON can be handled in
> different functions, while still sharing the existing parsers.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V3: Fix the off by one with leaf 0x8000001e::ebx::threads_per_cu - Michael
> ---

[snip]

> +
> +static bool parse_8000_001e(struct topo_scan *tscan, bool has_0xb)
> +{
> +	struct {
> +		// eax
> +		u32	x2apic_id	: 32;
> +		// ebx
> +		u32	cuid		:  8,
> +			threads_per_cu	:  8,
> +			__rsvd0		: 16;
> +		// ecx
> +		u32	nodeid		:  8,
> +			nodes_per_pkg	:  3,
> +			__rsvd1		: 21;
> +		// edx
> +		u32	__rsvd2		: 32;
> +	} leaf;
> +
> +	if (!boot_cpu_has(X86_FEATURE_TOPOEXT))
> +		return false;
> +
> +	cpuid_leaf(0x8000001e, &leaf);
> +
> +	tscan->c->topo.initial_apicid = leaf.x2apic_id;
> +
> +	/*
> +	 * If leaf 0xb is available, then SMT shift is set already. If not
> +	 * take it from ecx.threads_per_cu and use topo_update_dom() -
> +	 * topology_set_dom() would propagate and overwrite the already
> +	 * propagated CORE level.
> +	 */
> +	if (!has_0xb) {
> +		topology_update_dom(tscan, TOPO_SMT_DOMAIN, get_count_order(leaf.threads_per_cu),
> +				    leaf.threads_per_cu + 1);

Isn't the +1 also needed on the argument to get_count_order()?
I haven't actually run the config, but if hyper-threading is disabled,
presumably threads_per_cu is 0, and get_count_order returns -1.
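
For illustration, a minimal user-space sketch which mirrors the
get_count_order() semantics described above (0 yields -1, otherwise
the order of the next power of two) and shows the difference the +1
makes:

	#include <stdio.h>

	/* mirrors the kernel's get_count_order() semantics */
	static int count_order(unsigned int count)
	{
		int bits = 0;

		if (!count)
			return -1;
		for (count--; count; count >>= 1)
			bits++;
		return bits;
	}

	int main(void)
	{
		/* the threads_per_cu field is threads - 1, so SMT off means 0 */
		printf("%d\n", count_order(0));		/* -1: bogus SMT shift */
		printf("%d\n", count_order(0 + 1));	/*  0: SMT off, fixed  */
		printf("%d\n", count_order(1 + 1));	/*  1: two threads     */
		return 0;
	}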

Michael

> +	}
> +
> +	store_node(tscan, leaf.nodes_per_pkg + 1, leaf.nodeid);
> +
> +	if (tscan->c->x86_vendor == X86_VENDOR_AMD) {
> +		if (tscan->c->x86 == 0x15)
> +			tscan->c->topo.cu_id = leaf.cuid;
> +
> +		cacheinfo_amd_init_llc_id(tscan->c, leaf.nodeid);
> +	} else {
> +		/*
> +		 * Package ID is ApicId[6..] on Hygon CPUs. See commit
> +		 * e0ceeae708ce for explanation. The topology info is
> +		 * screwed up: The package shift is always 6 and the node
> +		 * ID is bit [4:5]. Don't touch the latter without
> +		 * confirmation from the Hygon developers.
> +		 */
> +		topology_set_dom(tscan, TOPO_CORE_DOMAIN, 6, tscan->dom_ncpus[TOPO_CORE_DOMAIN]);
> +		cacheinfo_hygon_init_llc_id(tscan->c);
> +	}
> +	return true;
> +}
> +

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser
  2023-08-02 19:28   ` Michael Kelley (LINUX)
@ 2023-08-02 19:46     ` Thomas Gleixner
  0 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 19:46 UTC (permalink / raw)
  To: Michael Kelley (LINUX), LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Wei Liu

On Wed, Aug 02 2023 at 19:28, Michael Kelley (LINUX) wrote:
> From: Thomas Gleixner <tglx@linutronix.de> Sent: Wednesday, August 2, 2023 3:22 AM
>> +	/*
>> +	 * If leaf 0xb is available, then SMT shift is set already. If not
>> +	 * take it from ecx.threads_per_cu and use topo_update_dom() -
>> +	 * topology_set_dom() would propagate and overwrite the already
>> +	 * propagated CORE level.
>> +	 */
>> +	if (!has_0xb) {
>> +		topology_update_dom(tscan, TOPO_SMT_DOMAIN, get_count_order(leaf.threads_per_cu),
>> +				    leaf.threads_per_cu + 1);
>
> Isn't the +1 also needed on the argument to get_count_order()?
> I haven't actually run the config, but if hyper-threading is disabled,
> presumably threads_per_cu is 0, and get_count_order returns -1.

Bah. I'm sure that I wanted to do that. No idea how I forgot about it
again.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* [patch V3a 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser
  2023-08-02 10:21 ` [patch V3 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser Thomas Gleixner
  2023-08-02 19:28   ` Michael Kelley (LINUX)
@ 2023-08-02 19:51   ` Thomas Gleixner
  2023-08-11 12:58     ` Pu Wen
  1 sibling, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-02 19:51 UTC (permalink / raw)
  To: LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

AMD/HYGON uses various methods for topology evaluation:

  - Leaf 0x80000008 and 0x8000001e based with an optional leaf 0xb,
    which is the preferred variant for modern CPUs.

    Leaf 0xb will be superseded by leaf 0x80000026 soon, which is just
    another variant of the Intel 0x1f leaf for whatever reasons.
    
  - Leaf 0x80000008 and NODEID_MSR based

  - Legacy fallback

That code is following the principle of random bits and pieces all over the
place which results in multiple evaluations and impenetrable code flows in
the same way as the Intel parsing did.

Provide a sane implementation by clearly separating the three variants and
bringing them in the proper preference order in one place.

This provides the parsing for both AMD and HYGON because there is no point
in having a separate HYGON parser which only differs by 3 lines of
code. Any further divergence between AMD and HYGON can be handled in
different functions, while still sharing the existing parsers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Fix the off by one with leaf 0x8000001e::ebx::threads_per_cu - Michael
V3a: Fix it for real.
---
 arch/x86/include/asm/topology.h       |    2 
 arch/x86/kernel/cpu/Makefile          |    2 
 arch/x86/kernel/cpu/amd.c             |    2 
 arch/x86/kernel/cpu/cacheinfo.c       |    4 
 arch/x86/kernel/cpu/cpu.h             |    2 
 arch/x86/kernel/cpu/debugfs.c         |    2 
 arch/x86/kernel/cpu/topology.h        |    6 +
 arch/x86/kernel/cpu/topology_amd.c    |  180 ++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/topology_common.c |   19 +++
 9 files changed, 212 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -162,6 +162,8 @@ int topology_update_die_map(unsigned int
 int topology_phys_to_logical_pkg(unsigned int pkg);
 bool topology_smt_supported(void);
 
+extern unsigned int __amd_nodes_per_pkg;
+
 static inline unsigned int topology_amd_nodes_per_pkg(void)
 {
 	return __max_die_per_package;
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -18,7 +18,7 @@ KMSAN_SANITIZE_common.o := n
 KCSAN_SANITIZE_common.o := n
 
 obj-y			:= cacheinfo.o scattered.o
-obj-y			+= topology_common.o topology_ext.o topology.o
+obj-y			+= topology_common.o topology_ext.o topology_amd.o topology.o
 obj-y			+= common.o
 obj-y			+= rdrand.o
 obj-y			+= match.o
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -423,7 +423,7 @@ static void amd_get_topology(struct cpui
 		if (!err)
 			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
 
-		cacheinfo_amd_init_llc_id(c);
+		cacheinfo_amd_init_llc_id(c, c->topo.die_id);
 
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
 		u64 value;
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -661,7 +661,7 @@ static int find_num_cache_leaves(struct
 	return i;
 }
 
-void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c)
+void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, u16 die_id)
 {
 	/*
 	 * We may have multiple LLCs if L3 caches exist, so check if we
@@ -672,7 +672,7 @@ void cacheinfo_amd_init_llc_id(struct cp
 
 	if (c->x86 < 0x17) {
 		/* LLC is at the node level. */
-		c->topo.llc_id = c->topo.die_id;
+		c->topo.llc_id = die_id;
 	} else if (c->x86 == 0x17 && c->x86_model <= 0x1F) {
 		/*
 		 * LLC is at the core complex level.
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -79,7 +79,7 @@ extern void init_hygon_cacheinfo(struct
 extern int detect_extended_topology(struct cpuinfo_x86 *c);
 extern void check_null_seg_clears_base(struct cpuinfo_x86 *c);
 
-void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c);
+void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, u16 die_id);
 void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c);
 
 unsigned int aperfmperf_get_khz(int cpu);
--- a/arch/x86/kernel/cpu/debugfs.c
+++ b/arch/x86/kernel/cpu/debugfs.c
@@ -26,6 +26,8 @@ static int cpu_debug_show(struct seq_fil
 	seq_printf(m, "logical_die_id:      %u\n", c->topo.logical_die_id);
 	seq_printf(m, "llc_id:              %u\n", c->topo.llc_id);
 	seq_printf(m, "l2c_id:              %u\n", c->topo.l2c_id);
+	seq_printf(m, "amd_node_id:         %u\n", c->topo.amd_node_id);
+	seq_printf(m, "amd_nodes_per_pkg:   %u\n", topology_amd_nodes_per_pkg());
 	seq_printf(m, "max_cores:           %u\n", c->x86_max_cores);
 	seq_printf(m, "max_die_per_pkg:     %u\n", __max_die_per_package);
 	seq_printf(m, "smp_num_siblings:    %u\n", smp_num_siblings);
--- a/arch/x86/kernel/cpu/topology.h
+++ b/arch/x86/kernel/cpu/topology.h
@@ -9,6 +9,10 @@ struct topo_scan {
 
 	// Legacy CPUID[1]:EBX[23:16] number of logical processors
 	unsigned int		ebx1_nproc_shift;
+
+	// AMD specific node ID which cannot be mapped into APIC space.
+	u16			amd_nodes_per_pkg;
+	u16			amd_node_id;
 };
 
 bool topo_is_converted(struct cpuinfo_x86 *c);
@@ -17,6 +21,8 @@ void cpu_parse_topology(struct cpuinfo_x
 void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
 		      unsigned int shift, unsigned int ncpus);
 bool cpu_parse_topology_ext(struct topo_scan *tscan);
+void cpu_parse_topology_amd(struct topo_scan *tscan);
+void cpu_topology_fixup_amd(struct topo_scan *tscan);
 
 static inline u32 topo_shift_apicid(u32 apicid, enum x86_topology_domains dom)
 {
--- /dev/null
+++ b/arch/x86/kernel/cpu/topology_amd.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/cpu.h>
+
+#include <asm/apic.h>
+#include <asm/memtype.h>
+#include <asm/processor.h>
+
+#include "cpu.h"
+
+static bool parse_8000_0008(struct topo_scan *tscan)
+{
+	struct {
+		u32	ncores		:  8,
+			__rsvd0		:  4,
+			apicidsize	:  4,
+			perftscsize	:  2,
+			__rsvd1		: 14;
+	} ecx;
+	unsigned int sft;
+
+	if (tscan->c->extended_cpuid_level < 0x80000008)
+		return false;
+
+	cpuid_leaf_reg(0x80000008, CPUID_ECX, &ecx);
+
+	/* If the APIC ID size is 0, then get the shift value from ecx.ncores */
+	sft = ecx.apicidsize;
+	if (!sft)
+		sft = get_count_order(ecx.ncores + 1);
+
+	topology_set_dom(tscan, TOPO_CORE_DOMAIN, sft, ecx.ncores + 1);
+	return true;
+}
+
+static void store_node(struct topo_scan *tscan, unsigned int nr_nodes, u16 node_id)
+{
+	/*
+	 * Starting with Fam 17h the DIE domain could probably be used to
+	 * retrieve the node info on AMD/HYGON. Analysis of CPUID dumps
+	 * suggests it's the topmost bit(s) of the CPU cores area, but
+	 * that's guess work and neither enumerated nor documented.
+	 *
+	 * Up to Fam 16h this does not work at all and the legacy node ID
+	 * has to be used.
+	 */
+	tscan->amd_nodes_per_pkg = nr_nodes;
+	tscan->amd_node_id = node_id;
+}
+
+static bool parse_8000_001e(struct topo_scan *tscan, bool has_0xb)
+{
+	struct {
+		// eax
+		u32	x2apic_id	: 32;
+		// ebx
+		u32	cuid		:  8,
+			threads_per_cu	:  8,
+			__rsvd0		: 16;
+		// ecx
+		u32	nodeid		:  8,
+			nodes_per_pkg	:  3,
+			__rsvd1		: 21;
+		// edx
+		u32	__rsvd2		: 32;
+	} leaf;
+
+	if (!boot_cpu_has(X86_FEATURE_TOPOEXT))
+		return false;
+
+	cpuid_leaf(0x8000001e, &leaf);
+
+	tscan->c->topo.initial_apicid = leaf.x2apic_id;
+
+	/*
+	 * If leaf 0xb is available, then SMT shift is set already. If not
+	 * take it from ebx.threads_per_cu and use topology_update_dom() -
+	 * topology_set_dom() would propagate and overwrite the already
+	 * propagated CORE level.
+	 */
+	if (!has_0xb) {
+		unsigned int nthreads = leaf.threads_per_cu + 1;
+
+		topology_update_dom(tscan, TOPO_SMT_DOMAIN, get_count_order(nthreads), nthreads);
+	}
+
+	store_node(tscan, leaf.nodes_per_pkg + 1, leaf.nodeid);
+
+	if (tscan->c->x86_vendor == X86_VENDOR_AMD) {
+		if (tscan->c->x86 == 0x15)
+			tscan->c->topo.cu_id = leaf.cuid;
+
+		cacheinfo_amd_init_llc_id(tscan->c, leaf.nodeid);
+	} else {
+		/*
+		 * Package ID is ApicId[6..] on Hygon CPUs. See commit
+		 * e0ceeae708ce for explanation. The topology info is
+		 * screwed up: The package shift is always 6 and the node
+		 * ID is bit [4:5]. Don't touch the latter without
+		 * confirmation from the Hygon developers.
+		 */
+		topology_set_dom(tscan, TOPO_CORE_DOMAIN, 6, tscan->dom_ncpus[TOPO_CORE_DOMAIN]);
+		cacheinfo_hygon_init_llc_id(tscan->c);
+	}
+	return true;
+}
+
+static bool parse_fam10h_node_id(struct topo_scan *tscan)
+{
+	struct {
+		union {
+			u64	node_id		:  3,
+				nodes_per_pkg	:  3,
+				unused		: 58;
+			u64	msr;
+		};
+	} nid;
+
+	if (!boot_cpu_has(X86_FEATURE_NODEID_MSR))
+		return false;
+
+	rdmsrl(MSR_FAM10H_NODE_ID, nid.msr);
+	store_node(tscan, nid.nodes_per_pkg + 1, nid.node_id);
+	tscan->c->topo.llc_id = nid.node_id;
+	return true;
+}
+
+static void legacy_set_llc(struct topo_scan *tscan)
+{
+	unsigned int apicid = tscan->c->topo.initial_apicid;
+
+	/* parse_8000_0008() set everything up except llc_id */
+	tscan->c->topo.llc_id = apicid >> tscan->dom_shifts[TOPO_CORE_DOMAIN];
+}
+
+static void parse_topology_amd(struct topo_scan *tscan)
+{
+	bool has_0xb = false;
+
+	/*
+	 * If the extended topology leaf 0x8000_001e is available
+	 * try to get SMT and CORE shift from leaf 0xb first, then
+	 * try to get the CORE shift from leaf 0x8000_0008.
+	 */
+	if (boot_cpu_has(X86_FEATURE_TOPOEXT))
+		has_0xb = cpu_parse_topology_ext(tscan);
+
+	if (!has_0xb && !parse_8000_0008(tscan))
+		return;
+
+	/* Prefer leaf 0x8000001e if available */
+	if (parse_8000_001e(tscan, has_0xb))
+		return;
+
+	/* Try the NODEID MSR */
+	if (parse_fam10h_node_id(tscan))
+		return;
+
+	legacy_set_llc(tscan);
+}
+
+void cpu_parse_topology_amd(struct topo_scan *tscan)
+{
+	tscan->amd_nodes_per_pkg = 1;
+	parse_topology_amd(tscan);
+
+	if (tscan->amd_nodes_per_pkg > 1)
+		set_cpu_cap(tscan->c, X86_FEATURE_AMD_DCM);
+}
+
+void cpu_topology_fixup_amd(struct topo_scan *tscan)
+{
+	struct cpuinfo_x86 *c = tscan->c;
+
+	/*
+	 * Adjust the core_id relative to the node when there is more than
+	 * one node.
+	 */
+	if (tscan->c->x86 < 0x17 && tscan->amd_nodes_per_pkg > 1)
+		c->topo.core_id %= tscan->dom_ncpus[TOPO_CORE_DOMAIN] / tscan->amd_nodes_per_pkg;
+}
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -11,11 +11,13 @@
 
 struct x86_topology_system x86_topo_system __ro_after_init;
 
+unsigned int __amd_nodes_per_pkg __ro_after_init;
+EXPORT_SYMBOL_GPL(__amd_nodes_per_pkg);
+
 void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
 		      unsigned int shift, unsigned int ncpus)
 {
-	tscan->dom_shifts[dom] = shift;
-	tscan->dom_ncpus[dom] = ncpus;
+	topology_update_dom(tscan, dom, shift, ncpus);
 
 	/* Propagate to the upper levels */
 	for (dom++; dom < TOPO_MAX_DOMAIN; dom++) {
@@ -152,6 +154,13 @@ static void topo_set_ids(struct topo_sca
 
 	/* Relative core ID */
 	c->topo.core_id = topo_relative_domain_id(apicid, TOPO_CORE_DOMAIN);
+
+	/* Temporary workaround */
+	if (tscan->amd_nodes_per_pkg)
+		c->topo.amd_node_id = c->topo.die_id = tscan->amd_node_id;
+
+	if (c->x86_vendor == X86_VENDOR_AMD)
+		cpu_topology_fixup_amd(tscan);
 }
 
 static void topo_set_max_cores(struct topo_scan *tscan)
@@ -236,4 +245,10 @@ void __init cpu_init_topology(struct cpu
 	 */
 	__max_die_per_package = tscan.dom_ncpus[TOPO_DIE_DOMAIN] /
 		tscan.dom_ncpus[TOPO_DIE_DOMAIN - 1];
+	/*
+	 * AMD systems have Nodes per package which cannot be mapped to
+	 * APIC ID (yet).
+	 */
+	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON)
+		__amd_nodes_per_pkg = __max_die_per_package = tscan.amd_nodes_per_pkg;
 }
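
The parser reads whole CPUID leaves into register-shaped structs, with
cpuid_leaf() filling eax/ebx/ecx/edx in order. For illustration, a
self-contained user-space sketch of the same overlay technique
(bitfield layout is compiler and ABI dependent, which the kernel build
controls; x86-64 GCC and an existing leaf 0x8000001e, i.e. TOPOEXT,
are assumed here):

	#include <cpuid.h>
	#include <stdio.h>
	#include <string.h>

	struct leaf_8000_001e {
		unsigned int x2apic_id;			/* eax */
		unsigned int cuid		:  8,	/* ebx */
			     threads_per_cu	:  8,
			     __rsvd0		: 16;
		unsigned int nodeid		:  8,	/* ecx */
			     nodes_per_pkg	:  3,
			     __rsvd1		: 21;
		unsigned int __rsvd2;			/* edx */
	};

	int main(void)
	{
		unsigned int r[4];
		struct leaf_8000_001e leaf;

		/* eax, ebx, ecx, edx in register order, as cpuid_leaf() does */
		__cpuid_count(0x8000001e, 0, r[0], r[1], r[2], r[3]);
		memcpy(&leaf, r, sizeof(leaf));

		printf("x2apic_id %u, threads per CU %u, node %u of %u\n",
		       leaf.x2apic_id, leaf.threads_per_cu + 1U,
		       leaf.nodeid, leaf.nodes_per_pkg + 1U);
		return 0;
	}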

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 00/40] x86/cpu: Rework the topology evaluation
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (40 preceding siblings ...)
  2023-08-02 11:56 ` [patch V3 00/40] x86/cpu: Rework the topology evaluation Juergen Gross
@ 2023-08-03  0:07 ` Sohil Mehta
  2023-08-03  3:44 ` Michael Kelley (LINUX)
  42 siblings, 0 replies; 88+ messages in thread
From: Sohil Mehta @ 2023-08-03  0:07 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

On 8/2/2023 3:20 AM, Thomas Gleixner wrote:
> The series is based on V3 of the APIC cleanup series:
> 
>   https://lore.kernel.org/lkml/20230801103042.936020332@linutronix.de
> 
> and also available on top of that from git:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v3
> 

Tested-by: Sohil Mehta <sohil.mehta@intel.com>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 00/40] x86/cpu: Rework the topology evaluation
  2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
                   ` (41 preceding siblings ...)
  2023-08-03  0:07 ` Sohil Mehta
@ 2023-08-03  3:44 ` Michael Kelley (LINUX)
  42 siblings, 0 replies; 88+ messages in thread
From: Michael Kelley (LINUX) @ 2023-08-03  3:44 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Wei Liu

From: Thomas Gleixner <tglx@linutronix.de> Sent: Wednesday, August 2, 2023 3:21 AM

> 
> Hi!
> 
> This is the follow up to V2:
> 
> 
> https://lore.kernel.org/lkml/20230728105650.565799744@linutronix.de/
> 
> which addresses the review feedback and some fallout reported on and
> off-list.
> 
> TLDR:
> 
> This reworks the way how topology information is evaluated via CPUID
> in preparation for a larger topology management overhaul to address
> shortcomings of the current code vs. hybrid systems and systems which make
> use of the extended topology domains in leaf 0x1f. Aside of that it's an
> overdue spring cleaning to get rid of accumulated layers of duct tape and
> haywire.
> 
> What changed vs. V2:
> 
>   - Decoded and fixed the fallout vs. XEN/PV reported by Juergen. Thanks to
>     Juergen for the remote hand debugging sessions!
> 
>     That's addressed in the first two new patches in this series. Summary:
>     XEN/PV booted by pure chance since the addition of SMT control 5 years
>     ago.
> 
>   - Fixed the off by one in the AMD parser which was debugged by Michael
> 
>   - Addressed review comments from various people
> 
> As discussed in:
> 
> 
> https://lore.kernel.org/lkml/BYAPR21MB16889FD224344B1B28BE22A1D705A@BYAPR21MB1688.namprd21.prod.outlook.com/
>   ....
> 
> https://lore.kernel.org/lkml/87r0omjt8c.ffs@tglx/
> 
> this series unfortunately brings the Hyper-V BIOS inconsistency into
> effect, which results in a slight performance impact. The L3 association
> which "worked" so far by exploiting the inconsistency of the Linux topology
> code is not longer supportable as we really need to get the actual short
> comings of our topology management addressed in a consistent way.
> 
> The series is based on V3 of the APIC cleanup series:
> 
> 
> https://lore.kernel.org/lkml/20230801103042.936020332@linutronix.de/
> 
> and also available on top of that from git:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git topo-cpuid-v3
> 
> Thanks,
> 
> 	tglx
> ---
>  arch/x86/kernel/cpu/topology.c              |  168 -------------------
>  b/Documentation/arch/x86/topology.rst       |   12 -
>  b/arch/x86/events/amd/core.c                |    2
>  b/arch/x86/events/amd/uncore.c              |    2
>  b/arch/x86/events/intel/uncore.c            |    2
>  b/arch/x86/hyperv/hv_vtl.c                  |    2
>  b/arch/x86/include/asm/apic.h               |   32 +--
>  b/arch/x86/include/asm/cacheinfo.h          |    3
>  b/arch/x86/include/asm/cpuid.h              |   36 ++++
>  b/arch/x86/include/asm/mpspec.h             |    2
>  b/arch/x86/include/asm/processor.h          |   60 ++++---
>  b/arch/x86/include/asm/smp.h                |    4
>  b/arch/x86/include/asm/topology.h           |   51 +++++
>  b/arch/x86/include/asm/x86_init.h           |    2
>  b/arch/x86/kernel/acpi/boot.c               |    4
>  b/arch/x86/kernel/amd_nb.c                  |    8
>  b/arch/x86/kernel/apic/apic.c               |   25 +-
>  b/arch/x86/kernel/apic/apic_common.c        |    4
>  b/arch/x86/kernel/apic/apic_flat_64.c       |   13 -
>  b/arch/x86/kernel/apic/apic_noop.c          |    9 -
>  b/arch/x86/kernel/apic/apic_numachip.c      |   21 --
>  b/arch/x86/kernel/apic/bigsmp_32.c          |   10 -
>  b/arch/x86/kernel/apic/local.h              |    6
>  b/arch/x86/kernel/apic/probe_32.c           |   10 -
>  b/arch/x86/kernel/apic/x2apic_cluster.c     |    1
>  b/arch/x86/kernel/apic/x2apic_phys.c        |   10 -
>  b/arch/x86/kernel/apic/x2apic_uv_x.c        |   67 +------
>  b/arch/x86/kernel/cpu/Makefile              |    5
>  b/arch/x86/kernel/cpu/amd.c                 |  156 ------------------
>  b/arch/x86/kernel/cpu/cacheinfo.c           |   51 ++---
>  b/arch/x86/kernel/cpu/centaur.c             |    4
>  b/arch/x86/kernel/cpu/common.c              |  111 +-----------
>  b/arch/x86/kernel/cpu/cpu.h                 |   14 +
>  b/arch/x86/kernel/cpu/debugfs.c             |   97 +++++++++++
>  b/arch/x86/kernel/cpu/hygon.c               |  133 ---------------
>  b/arch/x86/kernel/cpu/intel.c               |   38 ----
>  b/arch/x86/kernel/cpu/mce/amd.c             |    4
>  b/arch/x86/kernel/cpu/mce/apei.c            |    4
>  b/arch/x86/kernel/cpu/mce/core.c            |    4
>  b/arch/x86/kernel/cpu/mce/inject.c          |    7
>  b/arch/x86/kernel/cpu/proc.c                |    8
>  b/arch/x86/kernel/cpu/topology.h            |   51 +++++
>  b/arch/x86/kernel/cpu/topology_amd.c        |  179 ++++++++++++++++++++
>  b/arch/x86/kernel/cpu/topology_common.c     |  240 ++++++++++++++++++++++++++++
>  b/arch/x86/kernel/cpu/topology_ext.c        |  136 +++++++++++++++
>  b/arch/x86/kernel/cpu/zhaoxin.c             |   18 --
>  b/arch/x86/kernel/kvm.c                     |    6
>  b/arch/x86/kernel/sev.c                     |    2
>  b/arch/x86/kernel/smpboot.c                 |   97 ++++++-----
>  b/arch/x86/kernel/vsmp_64.c                 |   13 -
>  b/arch/x86/mm/amdtopology.c                 |   35 +---
>  b/arch/x86/mm/numa.c                        |    4
>  b/arch/x86/xen/apic.c                       |   14 -
>  b/arch/x86/xen/smp_pv.c                     |    3
>  b/drivers/edac/amd64_edac.c                 |    4
>  b/drivers/edac/mce_amd.c                    |    4
>  b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c |    2
>  b/drivers/hwmon/fam15h_power.c              |    7
>  b/drivers/scsi/lpfc/lpfc_init.c             |    8
>  b/drivers/virt/acrn/hsm.c                   |    2
>  b/kernel/cpu.c                              |    6
>  61 files changed, 1077 insertions(+), 956 deletions(-)
> 

Tested a variety of Hyper-V guest sizes running on Intel and AMD
processors of various generations.  Tested guests configured with
hyper-threading and with no hyper-threading, single NUMA node
and multi-NUMA node, etc.  Also tested a hyper-threaded VM with
the 'nosmt' option. 

All topologies look good, modulo the identified Hyper-V issue
with mis-matched APIC IDs that must be fixed by Hyper-V.

Tested-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-02 10:21 ` [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology() Thomas Gleixner
@ 2023-08-04  8:14   ` K Prateek Nayak
  2023-08-04  8:28     ` Thomas Gleixner
  2023-08-12  6:41   ` Zhang, Rui
  1 sibling, 1 reply; 88+ messages in thread
From: K Prateek Nayak @ 2023-08-04  8:14 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Hello Thomas,

On 8/2/2023 3:51 PM, Thomas Gleixner wrote:
>
> [..snip..]
>
> +static void topo_set_max_cores(struct topo_scan *tscan)
> +{
> +	/*
> +	 * Bug compatible for now. This is broken on hybrid systems:
> +	 * 8 cores SMT + 8 cores w/o SMT
> +	 * tscan.dom_ncpus[TOPO_CORE_DOMAIN] = 24; 24 / 2 = 12 !!
> +	 *
> +	 * Cannot be fixed without further topology enumeration changes.
> +	 */
> +	tscan->c->x86_max_cores = tscan->dom_ncpus[TOPO_CORE_DOMAIN] >>
> +		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
> +}
>

In Documentation/arch/x86/topology.rst, "cpuinfo_x86.x86_max_cores" is
described as "The number of cores in a package". In which case,
shouldn't the above be:

	tscan->c->x86_max_cores = tscan->dom_ncpus[TOPO_PKG_DOMAIN] >>
		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];

since, with extended topology, there could be other higher domains and
dom_ncpus[TOPO_CORE_DOMAIN] >> dom_shifts[TOPO_SMT_DOMAIN] should only
give number of cores within the next domain (TOPO_MODULE_DOMAIN).
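
As a worked example, take a hypothetical leaf 0x1f enumeration with
2 modules per package, 4 cores per module and 2 threads per core,
i.e. 16 CPUs per package:

	dom_ncpus[TOPO_CORE_DOMAIN] == 8	(threads per module)
	dom_ncpus[TOPO_PKG_DOMAIN]  == 16	(threads per package)
	dom_shifts[TOPO_SMT_DOMAIN] == 1

	 8 >> 1 == 4	cores per module
	16 >> 1 == 8	cores per package, as the documentation describes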

Am I missing something here?

--
Thanks and Regards,
Prateek

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-04  8:14   ` K Prateek Nayak
@ 2023-08-04  8:28     ` Thomas Gleixner
  2023-08-04  8:34       ` K Prateek Nayak
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-04  8:28 UTC (permalink / raw)
  To: K Prateek Nayak, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

On Fri, Aug 04 2023 at 13:44, K Prateek Nayak wrote:
> On 8/2/2023 3:51 PM, Thomas Gleixner wrote:
>>
>> [..snip..]
>>
>> +static void topo_set_max_cores(struct topo_scan *tscan)
>> +{
>> +	/*
>> +	 * Bug compatible for now. This is broken on hybrid systems:
>> +	 * 8 cores SMT + 8 cores w/o SMT
>> +	 * tscan.dom_ncpus[TOPO_CORE_DOMAIN] = 24; 24 / 2 = 12 !!
>> +	 *
>> +	 * Cannot be fixed without further topology enumeration changes.
>> +	 */
>> +	tscan->c->x86_max_cores = tscan->dom_ncpus[TOPO_CORE_DOMAIN] >>
>> +		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
>> +}
>>
>
> In Documentation/arch/x86/topology.rst, "cpuinfo_x86.x86_max_cores" is
> described as "The number of cores in a package". In which case,
> shouldn't the above be:
>
> 	tscan->c->x86_max_cores = tscan->dom_ncpus[TOPO_PKG_DOMAIN] >>
> 		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
>
> since, with extended topology, there could be other higher domains and
> dom_ncpus[TOPO_CORE_DOMAIN] >> dom_shifts[TOPO_SMT_DOMAIN] should only
> give number of cores within the next domain (TOPO_MODULE_DOMAIN).

You're right in principle.

> Am I missing something here?

The fact that this is bug compatible. It's broken in several
aspects. The real fix is in the next series, where this function goes
away and the value is computed from real topology data.

I could change this to be more "accurate" as you suggested, but that's
not making much of a difference.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-04  8:28     ` Thomas Gleixner
@ 2023-08-04  8:34       ` K Prateek Nayak
  0 siblings, 0 replies; 88+ messages in thread
From: K Prateek Nayak @ 2023-08-04  8:34 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

Hello Thomas,

On 8/4/2023 1:58 PM, Thomas Gleixner wrote:
> On Fri, Aug 04 2023 at 13:44, K Prateek Nayak wrote:
>> On 8/2/2023 3:51 PM, Thomas Gleixner wrote:
>>>
>>> [..snip..]
>>>
>>> +static void topo_set_max_cores(struct topo_scan *tscan)
>>> +{
>>> +	/*
>>> +	 * Bug compatible for now. This is broken on hybrid systems:
>>> +	 * 8 cores SMT + 8 cores w/o SMT
>>> +	 * tscan.dom_ncpus[TOPO_CORE_DOMAIN] = 24; 24 / 2 = 12 !!
>>> +	 *
>>> +	 * Cannot be fixed without further topology enumeration changes.
>>> +	 */
>>> +	tscan->c->x86_max_cores = tscan->dom_ncpus[TOPO_CORE_DOMAIN] >>
>>> +		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
>>> +}
>>>
>>
>> In Documentation/arch/x86/topology.rst, "cpuinfo_x86.x86_max_cores" is
>> described as "The number of cores in a package". In which case,
>> shouldn't the above be:
>>
>> 	tscan->c->x86_max_cores = tscan->dom_ncpus[TOPO_PKG_DOMAIN] >>
>> 		x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
>>
>> since, with extended topology, there could be other higher domains and
>> dom_ncpus[TOPO_CORE_DOMAIN] >> dom_shifts[TOPO_SMT_DOMAIN] should only
>> give number of cores within the next domain (TOPO_MODULE_DOMAIN).
> 
> You're right in principle.
> 
>> Am I missing something here?
> 
> The fact that this is bug compatible. It's broken in several
> aspects. The real fix is in the next series, where this function goes
> away and the value is computed from real topology data.
> 
> I could change this to be more "accurate" as you suggested, but that's
> not making much of a difference.

Ah! I see. Thank you for clarifying. I'll keep an eye out for the
next series.

--
Thanks and Regards,
Prateek

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 01/40] cpu/SMT: Make SMT control more robust against enumeration failures
  2023-08-02 10:20 ` [patch V3 01/40] cpu/SMT: Make SMT control more robust against enumeration failures Thomas Gleixner
@ 2023-08-04 17:50   ` Borislav Petkov
  2023-08-04 20:01     ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Borislav Petkov @ 2023-08-04 17:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven,
	Huang Rui, Juergen Gross, Dimitri Sivanich, Michael Kelley,
	Wei Liu

On Wed, Aug 02, 2023 at 12:20:59PM +0200, Thomas Gleixner wrote:
>  kernel/cpu.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -630,6 +630,12 @@ static inline bool cpu_smt_allowed(unsig

As discussed on IRC, the name and what the function does is kinda
conflicting.

What it actually queries is whether the CPU can be booted. So something
like this on top, I guess:

---
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f93ce69f7e3d..e4195d5425cb 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -659,7 +659,7 @@ static inline bool cpu_smt_thread_allowed(unsigned int cpu)
 #endif
 }
 
-static inline bool cpu_smt_allowed(unsigned int cpu)
+static inline bool cpu_bootable(unsigned int cpu)
 {
 	if (cpu_smt_control == CPU_SMT_ENABLED && cpu_smt_thread_allowed(cpu))
 		return true;
@@ -691,7 +691,7 @@ bool cpu_smt_possible(void)
 EXPORT_SYMBOL_GPL(cpu_smt_possible);
 
 #else
-static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
+static inline bool cpu_bootable(unsigned int cpu) { return true; }
 #endif
 
 static inline enum cpuhp_state
@@ -794,10 +794,10 @@ static int bringup_wait_for_ap_online(unsigned int cpu)
 	 * SMT soft disabling on X86 requires to bring the CPU out of the
 	 * BIOS 'wait for SIPI' state in order to set the CR4.MCE bit.  The
 	 * CPU marked itself as booted_once in notify_cpu_starting() so the
-	 * cpu_smt_allowed() check will now return false if this is not the
+	 * cpu_bootable() check will now return false if this is not the
 	 * primary sibling.
 	 */
-	if (!cpu_smt_allowed(cpu))
+	if (!cpu_bootable(cpu))
 		return -ECANCELED;
 	return 0;
 }
@@ -1725,7 +1725,7 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
 		err = -EBUSY;
 		goto out;
 	}
-	if (!cpu_smt_allowed(cpu)) {
+	if (!cpu_bootable(cpu)) {
 		err = -EPERM;
 		goto out;
 	}

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [patch V3 02/40] x86/apic: Fake primary thread mask for XEN/PV
  2023-08-02 10:21 ` [patch V3 02/40] x86/apic: Fake primary thread mask for XEN/PV Thomas Gleixner
@ 2023-08-04 18:12   ` Borislav Petkov
  2023-08-04 20:02     ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Borislav Petkov @ 2023-08-04 18:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven,
	Huang Rui, Juergen Gross, Dimitri Sivanich, Michael Kelley,
	Wei Liu

On Wed, Aug 02, 2023 at 12:21:01PM +0200, Thomas Gleixner wrote:
> The SMT control mechanism got added as speculation attack vector
> mitigation. The implemented logic relies on the primary thread mask to
> be set up properly.
> 
> This turns out to be an issue with XEN/PV guests because their CPU hotplug
> mechanics do not enumerate APICs and therefore the mask is never correctly
> populated.
> 
> This went unnoticed so far because by chance XEN/PV ends up with
> smp_num_siblings == 2. So smt_hot-plug_control stays at its default value

I think this means to say "cpu_smt_control". Committer, pls fix. :-)

> CPU_SMT_ENABLED and the primary thread mask is never evaluated in the
> context of CPU hotplug.
> 
> This stopped "working" with the upcoming overhaul of the topology
> evaluation which legitimately provides a fake topology for XEN/PV. That
> sets smp_num_siblings to 1, which causes the core CPU hot-plug core to
> refuse to bring up the APs.
> 
> This happens because smt_hotplug_control is set to CPU_SMT_NOT_SUPPORTED

Ditto.

> which causes cpu_smt_allowed() to evaluate the unpopulated primary thread
> mask with the conclusion that all non-boot CPUs are not valid to be
> plugged.
> 
> The core code has already been made more robust against this kind of fail,
> but the primary thread mask really wants to be populated to avoid other
> issues all over the place.
> 
> Just fake the mask by pretending that all XEN/PV vCPUs are primary threads,
> which is consistent because all of XEN/PVs topology is fake or non-existent.
> 
> Fixes: 6a4d2657e048 ("x86/smp: Provide topology_is_primary_thread()")
> Fixes: f54d4434c281 ("x86/apic: Provide cpu_primary_thread mask")
> Reported-by: Juergen Gross <jgross@suse.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/kernel/apic/apic.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -36,6 +36,8 @@
>  #include <linux/smp.h>
>  #include <linux/mm.h>
>  
> +#include <xen/xen.h>
> +
>  #include <asm/trace/irq_vectors.h>
>  #include <asm/irq_remapping.h>
>  #include <asm/pc-conf-reg.h>
> @@ -2344,6 +2346,15 @@ static int __init smp_init_primary_threa
>  {
>  	unsigned int cpu;
>  
> +	/*
> +	 * XEN/PV provides either none or useless topology information.
> +	 * Pretend that all vCPUs are primary threads.
> +	 */
> +	if (xen_pv_domain()) {
> +		cpumask_copy(&__cpu_primary_thread_mask, cpu_possible_mask);
> +		return 0;
> +	}

Can this be somewhere in the Xen init code instead?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 01/40] cpu/SMT: Make SMT control more robust against enumeration failures
  2023-08-04 17:50   ` Borislav Petkov
@ 2023-08-04 20:01     ` Thomas Gleixner
  0 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-04 20:01 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: LKML, x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven,
	Huang Rui, Juergen Gross, Dimitri Sivanich, Michael Kelley,
	Wei Liu

On Fri, Aug 04 2023 at 19:50, Borislav Petkov wrote:
> On Wed, Aug 02, 2023 at 12:20:59PM +0200, Thomas Gleixner wrote:
>>  kernel/cpu.c |    6 ++++++
>>  1 file changed, 6 insertions(+)
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -630,6 +630,12 @@ static inline bool cpu_smt_allowed(unsig
>
> As discussed on IRC, the name and what the function does is kinda
> conflicting.
>
> What it actually queries is whether the CPU can be booted. So something
> like this on top, I guess:

No objections from my side.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 02/40] x86/apic: Fake primary thread mask for XEN/PV
  2023-08-04 18:12   ` Borislav Petkov
@ 2023-08-04 20:02     ` Thomas Gleixner
  0 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-04 20:02 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: LKML, x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven,
	Huang Rui, Juergen Gross, Dimitri Sivanich, Michael Kelley,
	Wei Liu

On Fri, Aug 04 2023 at 20:12, Borislav Petkov wrote:
>> @@ -2344,6 +2346,15 @@ static int __init smp_init_primary_threa
>>  {
>>  	unsigned int cpu;
>>  
>> +	/*
>> +	 * XEN/PV provides either none or useless topology information.
>> +	 * Pretend that all vCPUs are primary threads.
>> +	 */
>> +	if (xen_pv_domain()) {
>> +		cpumask_copy(&__cpu_primary_thread_mask, cpu_possible_mask);
>> +		return 0;
>> +	}
>
> Can this be somewhere in the Xen init code instead?

Not for now. That's all going away with the 3rd installment. But right
now it's the right place to be.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 05/40] x86/cpu: Move cpu_die_id into topology info
  2023-08-02 10:21 ` [patch V3 05/40] x86/cpu: Move cpu_die_id " Thomas Gleixner
@ 2023-08-09 14:32   ` Zhang, Rui
  2023-08-09 15:14     ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Zhang, Rui @ 2023-08-09 14:32 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

Hi, Thomas,

On Wed, 2023-08-02 at 12:21 +0200, Thomas Gleixner wrote:
> Move the next member.
> 
> No functional change.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  Documentation/arch/x86/topology.rst |    4 ++--
>  arch/x86/include/asm/processor.h    |    4 +++-
>  arch/x86/include/asm/topology.h     |    2 +-
>  arch/x86/kernel/cpu/amd.c           |    8 ++++----
>  arch/x86/kernel/cpu/cacheinfo.c     |    2 +-
>  arch/x86/kernel/cpu/common.c        |    2 +-
>  arch/x86/kernel/cpu/hygon.c         |    8 ++++----
>  arch/x86/kernel/cpu/topology.c      |    2 +-
>  arch/x86/kernel/smpboot.c           |   10 +++++-----
>  9 files changed, 22 insertions(+), 20 deletions(-)
> 
> --- a/Documentation/arch/x86/topology.rst
> +++ b/Documentation/arch/x86/topology.rst
> @@ -55,7 +55,7 @@ AMD nomenclature for package is 'Node'.
>  
>      The number of dies in a package. This information is retrieved
> via CPUID.
>  
> -  - cpuinfo_x86.cpu_die_id:
> +  - cpuinfo_x86.topo_die_id:

s/topo_die_id/topo.die_id

thanks,
rui


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 05/40] x86/cpu: Move cpu_die_id into topology info
  2023-08-09 14:32   ` Zhang, Rui
@ 2023-08-09 15:14     ` Thomas Gleixner
  0 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-09 15:14 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

On Wed, Aug 09 2023 at 14:32, Rui Zhang wrote:
>> @@ -55,7 +55,7 @@ AMD nomenclature for package is 'Node'.
>>  
>>      The number of dies in a package. This information is retrieved
>> via CPUID.
>>  
>> -  - cpuinfo_x86.cpu_die_id:
>> +  - cpuinfo_x86.topo_die_id:
>
> s/topo_die_id/topo.die_id

Ooops. I surely fixed that up later and forgot to fold back.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 19/40] x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
  2023-08-02 10:21 ` [patch V3 19/40] x86/apic: Use u32 for wakeup_secondary_cpu[_64]() Thomas Gleixner
@ 2023-08-10  7:58   ` Qiuxu Zhuo
  0 siblings, 0 replies; 88+ messages in thread
From: Qiuxu Zhuo @ 2023-08-10  7:58 UTC (permalink / raw)
  To: tglx
  Cc: andrew.cooper3, arjan, dimitri.sivanich, jgross, linux-kernel,
	mikelley, ray.huang, thomas.lendacky, wei.liu, x86, qiuxu.zhuo

>From: Thomas Gleixner <tglx@linutronix.de>
> ...
>Subject: [patch V3 19/40] x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
>
>APIC IDs are used with random data types u16, u32, int, unsigned int,
>unsigned long.
>
>Make it all consistently use u32 because that reflects the hardware
>register width.
> ...

Hi Thomas,

It seems some other places (see the diff below) may also need to consistently
use u32 for APIC IDs. If you need me to create a separate patch, please let me know.

-Qiuxu

---
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index efa8128437cb..723304e9b4cd 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -514,9 +514,9 @@ extern void generic_bigsmp_probe(void);
 
 extern struct apic apic_noop;
 
-static inline unsigned int read_apic_id(void)
+static inline u32 read_apic_id(void)
 {
-	unsigned int reg = apic_read(APIC_ID);
+	u32 reg = apic_read(APIC_ID);
 
 	return apic->get_apic_id(reg);
 }
@@ -539,7 +539,7 @@ extern u32 default_cpu_present_to_apicid(int mps_cpu);
 
 #else /* CONFIG_X86_LOCAL_APIC */
 
-static inline unsigned int read_apic_id(void) { return 0; }
+static inline u32 read_apic_id(void) { return 0; }
 
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index f63ab86f6d57..9e235b71b14e 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1728,7 +1728,7 @@ static void __x2apic_enable(void)
 static int __init setup_nox2apic(char *str)
 {
 	if (x2apic_enabled()) {
-		int apicid = native_apic_msr_read(APIC_ID);
+		u32 apicid = native_apic_msr_read(APIC_ID);
 
 		if (apicid >= 255) {
 			pr_warn("Apicid: %08x, cannot enforce nox2apic\n",
@@ -2354,7 +2354,7 @@ static struct {
 	 */
 	int active;
 	/* r/w apic fields */
-	unsigned int apic_id;
+	u32 apic_id;
 	unsigned int apic_taskpri;
 	unsigned int apic_ldr;
 	unsigned int apic_dfr;
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index b54b2a6a4c32..28972a044be2 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -280,7 +280,7 @@ void default_send_IPI_mask_logical(const struct cpumask *cpumask, int vector)
 	local_irq_restore(flags);
 }
 
-static int convert_apicid_to_cpu(int apic_id)
+static int convert_apicid_to_cpu(u32 apic_id)
 {
 	int i;
 
@@ -293,7 +293,8 @@ static int convert_apicid_to_cpu(int apic_id)
 
 int safe_smp_processor_id(void)
 {
-	int apicid, cpuid;
+	u32 apicid;
+	int cpuid;
 
 	if (!boot_cpu_has(X86_FEATURE_APIC))
 		return 0;

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [patch V3a 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser
  2023-08-02 19:51   ` [patch V3a " Thomas Gleixner
@ 2023-08-11 12:58     ` Pu Wen
  2023-08-11 17:11       ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Pu Wen @ 2023-08-11 12:58 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

On 2023/8/3 3:51, Thomas Gleixner wrote:
> +	if (tscan->c->x86_vendor == X86_VENDOR_AMD) {
> +		if (tscan->c->x86 == 0x15)
> +			tscan->c->topo.cu_id = leaf.cuid;
> +
> +		cacheinfo_amd_init_llc_id(tscan->c, leaf.nodeid);
> +	} else {
> +		/*
> +		 * Package ID is ApicId[6..] on Hygon CPUs. See commit
> +		 * e0ceeae708ce for explanation. The topology info is
> +		 * screwed up: The package shift is always 6 and the node
> +		 * ID is bit [4:5]. Don't touch the latter without
> +		 * confirmation from the Hygon developers.
> +		 */
> +		topology_set_dom(tscan, TOPO_CORE_DOMAIN, 6, tscan->dom_ncpus[TOPO_CORE_DOMAIN]);

Hygon updated CPUs will not always shift by 6, and shifting by 6 is not
good for running guests. So the suggestion is to modify it like this:

     if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) && tscan->c->x86_model <= 0x3)
             topology_set_dom(tscan, TOPO_CORE_DOMAIN, 6, tscan->dom_ncpus[TOPO_CORE_DOMAIN]);

-- 
Regards,
Pu Wen

> +		cacheinfo_hygon_init_llc_id(tscan->c);
> +	}
> +	return true;
> +}
> +


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 33/40] x86/cpu: Use common topology code for HYGON
  2023-08-02 10:21 ` [patch V3 33/40] x86/cpu: Use common topology code for HYGON Thomas Gleixner
@ 2023-08-11 13:00   ` Pu Wen
  2023-08-11 17:11     ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Pu Wen @ 2023-08-11 13:00 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

On 2023/8/2 18:21, Thomas Gleixner wrote:
> --- a/arch/x86/kernel/cpu/hygon.c
> +++ b/arch/x86/kernel/cpu/hygon.c
> @@ -20,12 +20,6 @@
>   
>   #define APICID_SOCKET_ID_BIT 6

The macro should be removed, since it's no needed any more.

-- 
Regards,
Pu Wen


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3a 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser
  2023-08-11 12:58     ` Pu Wen
@ 2023-08-11 17:11       ` Thomas Gleixner
  2023-08-12  3:58         ` Pu Wen
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-11 17:11 UTC (permalink / raw)
  To: Pu Wen, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

On Fri, Aug 11 2023 at 20:58, Pu Wen wrote:
> On 2023/8/3 3:51, Thomas Gleixner wrote:
>> +	if (tscan->c->x86_vendor == X86_VENDOR_AMD) {
>> +		if (tscan->c->x86 == 0x15)
>> +			tscan->c->topo.cu_id = leaf.cuid;
>> +
>> +		cacheinfo_amd_init_llc_id(tscan->c, leaf.nodeid);
>> +	} else {
>> +		/*
>> +		 * Package ID is ApicId[6..] on Hygon CPUs. See commit
>> +		 * e0ceeae708ce for explanation. The topology info is
>> +		 * screwed up: The package shift is always 6 and the node
>> +		 * ID is bit [4:5]. Don't touch the latter without
>> +		 * confirmation from the Hygon developers.
>> +		 */
>> +		topology_set_dom(tscan, TOPO_CORE_DOMAIN, 6, tscan->dom_ncpus[TOPO_CORE_DOMAIN]);
>
> Hygon updated CPUs will not always shift by 6, and shifting by 6 is not
> good for running guests. So the suggestion is to modify it like this:
>
>      if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) && tscan->c->x86_model <= 0x3)
>              topology_set_dom(tscan, TOPO_CORE_DOMAIN, 6, tscan->dom_ncpus[TOPO_CORE_DOMAIN]);

This is exactly what the existing code does today. Can you please send a
delta patch on top of this with a proper explanation?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 33/40] x86/cpu: Use common topology code for HYGON
  2023-08-11 13:00   ` Pu Wen
@ 2023-08-11 17:11     ` Thomas Gleixner
  0 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-11 17:11 UTC (permalink / raw)
  To: Pu Wen, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

On Fri, Aug 11 2023 at 21:00, Pu Wen wrote:
> On 2023/8/2 18:21, Thomas Gleixner wrote:
>> --- a/arch/x86/kernel/cpu/hygon.c
>> +++ b/arch/x86/kernel/cpu/hygon.c
>> @@ -20,12 +20,6 @@
>>   
>>   #define APICID_SOCKET_ID_BIT 6
>
> The macro should be removed, since it's no needed any more.

Indeed.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3a 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser
  2023-08-11 17:11       ` Thomas Gleixner
@ 2023-08-12  3:58         ` Pu Wen
  2023-10-13  9:38           ` [tip: x86/core] x86/cpu/hygon: Fix the CPU topology evaluation for real tip-bot2 for Pu Wen
  0 siblings, 1 reply; 88+ messages in thread
From: Pu Wen @ 2023-08-12  3:58 UTC (permalink / raw)
  To: Thomas Gleixner, Pu Wen, LKML
  Cc: x86, Tom Lendacky, Andrew Cooper, Arjan van de Ven, Huang Rui,
	Juergen Gross, Dimitri Sivanich, Michael Kelley, Wei Liu

[-- Attachment #1: Type: text/plain, Size: 1386 bytes --]

On 2023/8/12 1:11, Thomas Gleixner wrote:
> On Fri, Aug 11 2023 at 20:58, Pu Wen wrote:
>> On 2023/8/3 3:51, Thomas Gleixner wrote:
>>> +	if (tscan->c->x86_vendor == X86_VENDOR_AMD) {
>>> +		if (tscan->c->x86 == 0x15)
>>> +			tscan->c->topo.cu_id = leaf.cuid;
>>> +
>>> +		cacheinfo_amd_init_llc_id(tscan->c, leaf.nodeid);
>>> +	} else {
>>> +		/*
>>> +		 * Package ID is ApicId[6..] on Hygon CPUs. See commit
>>> +		 * e0ceeae708ce for explanation. The topology info is
>>> +		 * screwed up: The package shift is always 6 and the node
>>> +		 * ID is bit [4:5]. Don't touch the latter without
>>> +		 * confirmation from the Hygon developers.
>>> +		 */
>>> +		topology_set_dom(tscan, TOPO_CORE_DOMAIN, 6, tscan->dom_ncpus[TOPO_CORE_DOMAIN]);
>>
>> Updated Hygon CPUs will not always shift by 6, and shift 6 is not good
>> for running guests.
>> So I suggest modifying it like this:
>>       if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) && tscan->c->x86_model <= 0x3)
>>           topology_set_dom(tscan, TOPO_CORE_DOMAIN, 6, tscan->dom_ncpus[TOPO_CORE_DOMAIN]);
> 
> This is exactly what the existing code does today. Can you please send a
> delta patch on top of this with a proper explanation?

Will.

And I think it's better to send a prerequisite patch (attached) after
patch 02 of this series, since it can be individually backported to the
stable releases.

-- 
Regards,
Pu Wen



[-- Attachment #2: x86-cpu-Refine-the-CPU-topology-deriving-method-for-Hygon.patch --]
[-- Type: text/plain, Size: 1472 bytes --]

From 768af7009f6fa9f9c40251c5979f611479165dc4 Mon Sep 17 00:00:00 2001
From: Pu Wen <puwen@hygon.cn>
Date: Sat, 12 Aug 2023 11:41:06 +0800
Subject: [PATCH] x86/cpu: Refine the CPU topology deriving method for Hygon

Updated Hygon processors will not always shift by 6; they use the
result of CPUID leaf 0xB to derive the socket ID.

When running as a guest, the APIC ID is not the same as the host's, so
just use the hypervisor's default.

Cc: <stable@vger.kernel.org> # v5.2+
Fixes: e0ceeae708ce ("x86/CPU/hygon: Fix phys_proc_id calculation logic for multi-die processors")
Signed-off-by: Pu Wen <puwen@hygon.cn>
---
 arch/x86/kernel/cpu/hygon.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index defdc594be14..a7b3ef4c4de9 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -87,8 +87,12 @@ static void hygon_get_topology(struct cpuinfo_x86 *c)
 		if (!err)
 			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
 
-		/* Socket ID is ApicId[6] for these processors. */
-		c->phys_proc_id = c->apicid >> APICID_SOCKET_ID_BIT;
+		/*
+		 * Socket ID is ApicId[6] for the processors with model <= 0x3
+		 * when running on host.
+		 */
+		if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) && c->x86_model <= 0x3)
+			c->phys_proc_id = c->apicid >> APICID_SOCKET_ID_BIT;
 
 		cacheinfo_hygon_init_llc_id(c, cpu);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-02 10:21 ` [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology() Thomas Gleixner
  2023-08-04  8:14   ` K Prateek Nayak
@ 2023-08-12  6:41   ` Zhang, Rui
  2023-08-12  8:00     ` Zhang, Rui
  2023-08-13 13:30     ` Zhang, Rui
  1 sibling, 2 replies; 88+ messages in thread
From: Zhang, Rui @ 2023-08-12  6:41 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

> +
> +static inline u32 topo_relative_domain_id(u32 apicid, enum x86_topology_domains dom)
> +{
> +       if (dom != TOPO_SMT_DOMAIN)
> +               apicid >>= x86_topo_system.dom_shifts[dom - 1];
> +       return apicid & (x86_topo_system.dom_size[dom] - 1);
> +}

topo_relative_domain_id() is used to get an id value that is unique
within its next higher level.
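
For illustration, a minimal worked example of that (the concrete shift
and size values here are made up):

        /*
         * Assume dom_shifts[TOPO_SMT_DOMAIN] = 1, dom_size[TOPO_CORE_DOMAIN] = 8
         * (one SMT bit, three core bits) and apicid = 0b11011:
         *
         *      apicid >>= dom_shifts[TOPO_CORE_DOMAIN - 1];   // drop SMT bit -> 0b1101
         *      apicid &= dom_size[TOPO_CORE_DOMAIN] - 1;      // mask 0b111   -> 0b101
         *
         * i.e. core_id = 5, unique within the next higher domain level.
         */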

> +static void topo_set_ids(struct topo_scan *tscan)
> +{
> +       struct cpuinfo_x86 *c = tscan->c;
> +       u32 apicid = c->topo.apicid;
> +
> +       c->topo.pkg_id = topo_shift_apicid(apicid, TOPO_PKG_DOMAIN);
> +       c->topo.die_id = topo_shift_apicid(apicid, TOPO_DIE_DOMAIN);
> +
> +       /* Relative core ID */
> +       c->topo.core_id = topo_relative_domain_id(apicid, TOPO_CORE_DOMAIN);

My understanding is that, to ensure a package scope unique core_id,
rather than Module/Tile scope unique, what is really needed here is
something like,
	apicid >>= x86_topo_system.dom_shifts[SMT];
	c->topo.core_id = apicid & (x86_topo_system.dom_size[PACKAGE] - 1);

I haven't had a chance to confirm this on a platform with a Module
level yet, but will do soon.

thanks,
rui



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-12  6:41   ` Zhang, Rui
@ 2023-08-12  8:00     ` Zhang, Rui
  2023-08-14  6:26       ` Thomas Gleixner
  2023-08-13 13:30     ` Zhang, Rui
  1 sibling, 1 reply; 88+ messages in thread
From: Zhang, Rui @ 2023-08-12  8:00 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: Brown, Len, Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky,
	ray.huang, andrew.cooper3, Sivanich, Dimitri, wei.liu

On Sat, 2023-08-12 at 14:38 +0800, Zhang Rui wrote:
> > +
> > +static inline u32 topo_relative_domain_id(u32 apicid, enum x86_topology_domains dom)
> > +{
> > +       if (dom != TOPO_SMT_DOMAIN)
> > +               apicid >>= x86_topo_system.dom_shifts[dom - 1];
> > +       return apicid & (x86_topo_system.dom_size[dom] - 1);
> > +}
> 
> topo_relative_domain_id() is used to get an id value that is unique
> within its next higher level.
> 
> > +static void topo_set_ids(struct topo_scan *tscan)
> > +{
> > +       struct cpuinfo_x86 *c = tscan->c;
> > +       u32 apicid = c->topo.apicid;
> > +
> > +       c->topo.pkg_id = topo_shift_apicid(apicid, TOPO_PKG_DOMAIN);
> > +       c->topo.die_id = topo_shift_apicid(apicid, TOPO_DIE_DOMAIN);

And die_id is also package-scope unique before this patch series.

> > +
> > +       /* Relative core ID */
> > +       c->topo.core_id = topo_relative_domain_id(apicid, TOPO_CORE_DOMAIN);
> 
> My understanding is that, to ensure a package scope unique core_id,
> rather than Module/Tile scope unique, what is really needed here is
> something like,
>         apicid >>= x86_topo_system.dom_shifts[SMT];
>         c->topo.core_id = apicid & (x86_topo_system.dom_size[PACKAGE] - 1);
> 
BTW, can we consider using system wide unique core_id instead?

There are a couple of advantages by using this.
CC Len, who can provide detailed justifications for this.

thanks,
rui

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-02 10:21 ` [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser Thomas Gleixner
@ 2023-08-12  8:21   ` Zhang, Rui
  2023-08-12 20:04     ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Zhang, Rui @ 2023-08-12  8:21 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

Hi, Thomas,

On Wed, 2023-08-02 at 12:21 +0200, Thomas Gleixner wrote:
> detect_extended_topology() along with its early() variant is a classic
> example for duct tape engineering:
> 
>   - It evaluates an array of subleafs with a boatload of local variables
>     for the relevant topology levels instead of using an array to save
>     the enumerated information and propagate it to the right level
> 
>   - It has no boundary checks for subleafs
> 
>   - It prevents updating the die_id with a crude workaround instead of
>     checking for leaf 0xb which does not provide die information.
> 
>   - It's broken vs. the number of dies evaluation as it uses:
> 
>       num_processors[DIE_LEVEL] / num_processors[CORE_LEVEL]
> 
>     which "works" only correctly if there is none of the intermediate
>     topology levels (MODULE/TILE) enumerated.
> 
> There is zero value in trying to "fix" that code as the only proper fix
> is to rewrite it from scratch.
> 
> Implement a sane parser with proper code documentation, which will be
> used for the consolidated topology evaluation in the next step.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> V2: Fixed up the comment alignment for registers - Peterz
> ---
>  arch/x86/kernel/cpu/Makefile       |    2 
>  arch/x86/kernel/cpu/topology.h     |   12 +++
>  arch/x86/kernel/cpu/topology_ext.c |  136 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 149 insertions(+), 1 deletion(-)
> 
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -18,7 +18,7 @@ KMSAN_SANITIZE_common.o := n
>  KCSAN_SANITIZE_common.o := n
>  
>  obj-y                  := cacheinfo.o scattered.o
> -obj-y                  += topology_common.o topology.o
> +obj-y                  += topology_common.o topology_ext.o topology.o
>  obj-y                  += common.o
>  obj-y                  += rdrand.o
>  obj-y                  += match.o
> --- a/arch/x86/kernel/cpu/topology.h
> +++ b/arch/x86/kernel/cpu/topology.h
> @@ -16,6 +16,7 @@ void cpu_init_topology(struct cpuinfo_x8
>  void cpu_parse_topology(struct cpuinfo_x86 *c);
>  void topology_set_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
>                       unsigned int shift, unsigned int ncpus);
> +bool cpu_parse_topology_ext(struct topo_scan *tscan);
>  
>  static inline u32 topo_shift_apicid(u32 apicid, enum x86_topology_domains dom)
>  {
> @@ -31,4 +32,15 @@ static inline u32 topo_relative_domain_i
>         return apicid & (x86_topo_system.dom_size[dom] - 1);
>  }
>  
> +/*
> + * Update a domain level after the fact without propagating. Used to fixup
> + * broken CPUID enumerations.
> + */
> +static inline void topology_update_dom(struct topo_scan *tscan, enum x86_topology_domains dom,
> +                                      unsigned int shift, unsigned int ncpus)
> +{
> +       tscan->dom_shifts[dom] = shift;
> +       tscan->dom_ncpus[dom] = ncpus;
> +}
> +
>  #endif /* ARCH_X86_TOPOLOGY_H */
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/topology_ext.c
> @@ -0,0 +1,136 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/cpu.h>
> +
> +#include <asm/apic.h>
> +#include <asm/memtype.h>
> +#include <asm/processor.h>
> +
> +#include "cpu.h"
> +
> +enum topo_types {
> +       INVALID_TYPE    = 0,
> +       SMT_TYPE        = 1,
> +       CORE_TYPE       = 2,
> +       MODULE_TYPE     = 3,
> +       TILE_TYPE       = 4,
> +       DIE_TYPE        = 5,
> +       DIEGRP_TYPE     = 6,
> +       MAX_TYPE        = 7,
> +};
> +
> +/*
> + * Use a lookup table for the case that there are future types > 6 which
> + * describe an intermediate domain level which does not exist today.
> + *
> + * A table will also be handy to parse the new AMD 0x80000026 leaf which
> + * has defined different domain types, but otherwise uses the same layout
> + * with some of the reserved bits used for new information.
> + */
> +static const unsigned int topo_domain_map[MAX_TYPE] = {
> +       [SMT_TYPE]      = TOPO_SMT_DOMAIN,
> +       [CORE_TYPE]     = TOPO_CORE_DOMAIN,
> +       [MODULE_TYPE]   = TOPO_MODULE_DOMAIN,
> +       [TILE_TYPE]     = TOPO_TILE_DOMAIN,
> +       [DIE_TYPE]      = TOPO_DIE_DOMAIN,
> +       [DIEGRP_TYPE]   = TOPO_PKG_DOMAIN,

May I know why DIEGRP_TYPE is mapped to TOPO_PKG_DOMAIN?

> +};
> +
> +static inline bool topo_subleaf(struct topo_scan *tscan, u32 leaf, u32 subleaf)
> +{
> +       unsigned int dom, maxtype = leaf == 0xb ? CORE_TYPE + 1 : MAX_TYPE;
> +       struct {
> +               // eax
> +               u32     x2apic_shift    :  5, // Number of bits to shift APIC ID right
> +                                             // for the topology ID at the next level
> +                       __rsvd0         : 27; // Reserved
> +               // ebx
> +               u32     num_processors  : 16, // Number of processors at current level
> +                       __rsvd1         : 16; // Reserved
> +               // ecx
> +               u32     level           :  8, // Current topology level. Same as sub leaf number
> +                       type            :  8, // Level type. If 0, invalid
> +                       __rsvd2         : 16; // Reserved
> +               // edx
> +               u32     x2apic_id       : 32; // X2APIC ID of the current logical processor
> +       } sl;
> +
> +       cpuid_subleaf(leaf, subleaf, &sl);
> +
> +       if (!sl.num_processors || sl.type == INVALID_TYPE)
> +               return false;
> +
> +       if (sl.type >= maxtype) {

It is still legal to have sparse type values in the future, and then
this check will break.
IMO, it is better to use a function to convert type to domain, and
check for unknown domain here, say, something like

diff --git a/arch/x86/kernel/cpu/topology_ext.c b/arch/x86/kernel/cpu/topology_ext.c
index 5ddc5d24435e..7720a7bc7478 100644
--- a/arch/x86/kernel/cpu/topology_ext.c
+++ b/arch/x86/kernel/cpu/topology_ext.c
@@ -26,14 +26,27 @@ enum topo_types {
  * has defined different domain types, but otherwise uses the same layout
  * with some of the reserved bits used for new information.
  */
-static const unsigned int topo_domain_map[MAX_TYPE] = {
-	[SMT_TYPE]	= TOPO_SMT_DOMAIN,
-	[CORE_TYPE]	= TOPO_CORE_DOMAIN,
-	[MODULE_TYPE]	= TOPO_MODULE_DOMAIN,
-	[TILE_TYPE]	= TOPO_TILE_DOMAIN,
-	[DIE_TYPE]	= TOPO_DIE_DOMAIN,
-	[DIEGRP_TYPE]	= TOPO_PKG_DOMAIN,
-};
+
+static enum x86_topology_domains topo_type_to_domain(int type)
+{
+	switch (type) {
+	case SMT_TYPE:
+		return TOPO_SMT_DOMAIN;
+	case CORE_TYPE:
+		return TOPO_CORE_DOMAIN;
+	case MODULE_TYPE:
+		return TOPO_MODULE_DOMAIN;
+	case TILE_TYPE:
+		return TOPO_TILE_DOMAIN;
+	case DIE_TYPE:
+		return TOPO_DIE_DOMAIN;
+	case DIEGRP_TYPE:
+		return TOPO_PKG_DOMAIN;
+	default:
+		return TOPO_MAX_DOMAIN;
+	}
+
+}
 
 static inline bool topo_subleaf(struct topo_scan *tscan, u32 leaf, u32 subleaf)
 {
@@ -59,7 +72,8 @@ static inline bool topo_subleaf(struct topo_scan *tscan, u32 leaf, u32 subleaf)
 	if (!sl.num_processors || sl.type == INVALID_TYPE)
 		return false;
 
-	if (sl.type >= maxtype) {
+	dom = topo_type_to_domain(sl.type);
+	if (dom == TOPO_MAX_DOMAIN) {
 		/*
 		 * As the subleafs are ordered in domain level order, this
 		 * could be recovered in theory by propagating the
@@ -84,7 +98,6 @@ static inline bool topo_subleaf(struct topo_scan *tscan, u32 leaf, u32 subleaf)
 		return true;
 	}
 
-	dom = topo_domain_map[sl.type];
 	if (!dom) {
 		tscan->c->topo.initial_apicid = sl.x2apic_id;
 	} else if (tscan->c->topo.initial_apicid != sl.x2apic_id) {

> +               /*
> +                * As the subleafs are ordered in domain level order, this
> +                * could be recovered in theory by propagating the
> +                * information at the last parsed level.
> +                *
> +                * But if the infinite wisdom of hardware folks decides to
> +                * create a new domain type between CORE and MODULE or DIE
> +                * and DIEGRP, then that would overwrite the CORE or DIE
> +                * information.

Sorry that I'm confused here.

Say we have CORE, FOO, MODULE; then the subleaf of FOO must be higher
than CORE but lower than MODULE.
So we parse CORE first and propagate the info to FOO/MODULE, then parse
FOO and propagate to MODULE, and parse MODULE in the end.
How could we overwrite the info of a lower level?

> +                *
> +                * It really would have been too obvious to make the domain
> +                * type space sparse and leave a few reserved types between
> +                * the points which might change instead of forcing
> +                * software to either create a monstrosity of workarounds
> +                * or just being up the creek without a paddle.

Agreed.
With a sparse type space, we know the relationship between different
types without knowing what a type really means.

> +                *
> +                * Refuse to implement monstrosity, emit an error and try
> +                * to survive.
> +                */
> +               pr_err_once("Topology: leaf 0x%x:%d Unknown domain type %u\n",
> +                           leaf, subleaf, sl.type);
> +               return true;

Don't want to be TLDR, but I can think of a couple of cases that break
Linux in different ways if we ignore the CPU topology info of an
unknown level.

So I just want to understand the strategy here: does this mean that
we're not looking for a future-proof solution, and instead we are
planning to take future updates (patch enum topo_types/enum
x86_topology_domains/topo_domain_map) whenever a new level is invented?


TBH, I'm still thinking of a future-proof proposal here.
Currently, Linux only cares about pkg_id/core_id/die_id and the
relationship between these three levels.
1. for package id: pkg_id_low = FOO.x2apic_shift (FOO is the highest
   enumerated level, no matter whether its type is known or not)
2. for core_id: as the SMT level is always enumerated, core_id_low =
   SMT.x2apic_shift, core_id_high = pkg_id_low - 1
3. for die_id: make the Linux die *OPTIONAL*.
   When DIE is enumerated via CPUID.1F, die_id_low = FOO.x2apic_shift
   (FOO is the next enumerated lower level of DIE, no matter whether
   its type is known or not), die_id_high = pkg_id_low - 1.
   When DIE is not enumerated via CPUID.1F, then the Linux die does not
   exist; adjust the die related topology information, say, die_id = -1,
   topology_max_dies_per_package = 0, etc., and don't expose the die
   sysfs I/F.

With this, we can guarantee that all the available topology information
are always valid, even when running on future platforms.
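
For illustration, a minimal sketch of the bit-range idea (the helper
name and variables are made up, not actual kernel API):

        /*
         * Extract the topology id stored in APIC ID bits [high:low].
         * Assumes the field is narrower than 32 bits.
         */
        static u32 topo_id_from_bits(u32 apicid, unsigned int low, unsigned int high)
        {
                return (apicid >> low) & ((1U << (high - low + 1)) - 1);
        }

        /* core_id_low = SMT.x2apic_shift, core_id_high = pkg_id_low - 1 */
        u32 core_id = topo_id_from_bits(apicid, core_id_low, pkg_id_low - 1);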

what do you think?

thanks,
rui

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* Re: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-12  8:21   ` Zhang, Rui
@ 2023-08-12 20:04     ` Thomas Gleixner
  2023-08-13 15:04       ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-12 20:04 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

On Sat, Aug 12 2023 at 08:21, Rui Zhang wrote:
> On Wed, 2023-08-02 at 12:21 +0200, Thomas Gleixner wrote:
>> +
>> +/*
>> + * Use a lookup table for the case that there are future types > 6 which
>> + * describe an intermediate domain level which does not exist today.
>> + *
>> + * A table will also be handy to parse the new AMD 0x80000026 leaf which
>> + * has defined different domain types, but otherwise uses the same layout
>> + * with some of the reserved bits used for new information.
>> + */
>> +static const unsigned int topo_domain_map[MAX_TYPE] = {
>> +       [SMT_TYPE]      = TOPO_SMT_DOMAIN,
>> +       [CORE_TYPE]     = TOPO_CORE_DOMAIN,
>> +       [MODULE_TYPE]   = TOPO_MODULE_DOMAIN,
>> +       [TILE_TYPE]     = TOPO_TILE_DOMAIN,
>> +       [DIE_TYPE]      = TOPO_DIE_DOMAIN,
>> +       [DIEGRP_TYPE]   = TOPO_PKG_DOMAIN,
>
> May I know why DIEGRP_TYPE is mapped to TOPO_PKG_DOMAIN?

Where else should it go? It's the topmost level, no? But diegrp is a
terminology which is not used in the kernel.

>> +
>> +       if (sl.type >= maxtype) {
>
> It is still legal to have sparse type values in the future, and then
> this check will break.
> IMO, it is better to use a function to convert type to domain, and
> check for unknown domain here, say, something like

Why? If somewhere in the future Intel decides to add UBER_TILE_TYPE,
then this will be a type larger than DIEGRP_TYPE. maxtype will then
cover the whole thing and the table will map it to the right place.

Even if in their infinite wisdom the HW folks decide to make a gap, then
the table can handle it simply by putting an invalid value into the gap
and checking for that.
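
Roughly like this minimal sketch (hypothetical, not the posted code;
assume a gap at type 7 before some future type 8):

        static const unsigned int topo_domain_map[MAX_TYPE] = {
                [SMT_TYPE]      = TOPO_SMT_DOMAIN,
                [CORE_TYPE]     = TOPO_CORE_DOMAIN,
                [MODULE_TYPE]   = TOPO_MODULE_DOMAIN,
                [TILE_TYPE]     = TOPO_TILE_DOMAIN,
                [DIE_TYPE]      = TOPO_DIE_DOMAIN,
                [DIEGRP_TYPE]   = TOPO_PKG_DOMAIN,
                [7]             = TOPO_MAX_DOMAIN,      /* reserved gap: invalid */
        };

        ...
        dom = topo_domain_map[sl.type];
        if (dom >= TOPO_MAX_DOMAIN) {
                /* Gap entry: treat like an unknown type */
                ...
        }

with MAX_TYPE bumped to cover the new top type.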

Seriously, we don't need a switch case for that.

>> +               /*
>> +                * As the subleafs are ordered in domain level order, this
>> +                * could be recovered in theory by propagating the
>> +                * information at the last parsed level.
>> +                *
>> +                * But if the infinite wisdom of hardware folks decides to
>> +                * create a new domain type between CORE and MODULE or DIE
>> +                * and DIEGRP, then that would overwrite the CORE or DIE
>> +                * information.
>
> Sorry that I'm confused here.
>
> Say we have CORE, FOO, MODULE; then the subleaf of FOO must be higher
> than CORE but lower than MODULE.
> So we parse CORE first and propagate the info to FOO/MODULE, then parse
> FOO and propagate to MODULE, and parse MODULE in the end.
> How could we overwrite the info of a lower level?

We don't know about this new thing yet. So where should we propagate to?
We could say the last level was core, so we stick the new thing into
module, but do we know that's correct? Do we know there actually is a
module? Let me rephrase that comment.

>> +                *
>> +                * Refuse to implement monstrosity, emit an error and try
>> +                * to survive.
>> +                */
>> +               pr_err_once("Topology: leaf 0x%x:%d Unknown domain type %u\n",
>> +                           leaf, subleaf, sl.type);
>> +               return true;
>
> Don't want to be TLDR, I can think of a couple cases that breaks Linux
> in different ways if we ignore the cpu topology info of an unknown
> level.

Come on. If Intel manages to create a new level then it's not rocket
science to integrate support for it a long time before actual silicon
ships. So what will break? The machines of people who use ancient
kernels on modern hardware? They can keep the pieces.

> So I just want to understand the strategy here, does this mean that
> we're not looking for a future proof solution, and instead we are
> planning to take future updates (patch enum topo_types/enum
> x86_topology_domains/topo_domain_map) whenever a new level is
> invented?

You need that anyway, no? 

> With this, we can guarantee that all the available topology information
> are always valid, even when running on future platforms.

I know that it can be made work, but is it worth the extra effort? I
don't think so.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-12  6:41   ` Zhang, Rui
  2023-08-12  8:00     ` Zhang, Rui
@ 2023-08-13 13:30     ` Zhang, Rui
  2023-08-13 14:36       ` Thomas Gleixner
  1 sibling, 1 reply; 88+ messages in thread
From: Zhang, Rui @ 2023-08-13 13:30 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

On Sat, 2023-08-12 at 06:41 +0000, Zhang, Rui wrote:
> > +
> > +static inline u32 topo_relative_domain_id(u32 apicid, enum x86_topology_domains dom)
> > +{
> > +       if (dom != TOPO_SMT_DOMAIN)
> > +               apicid >>= x86_topo_system.dom_shifts[dom - 1];
> > +       return apicid & (x86_topo_system.dom_size[dom] - 1);
> > +}
> 
> topo_relative_domain_id() is used to get an id value that is unique
> within its next higher level.
> 
> > +static void topo_set_ids(struct topo_scan *tscan)
> > +{
> > +       struct cpuinfo_x86 *c = tscan->c;
> > +       u32 apicid = c->topo.apicid;
> > +
> > +       c->topo.pkg_id = topo_shift_apicid(apicid, TOPO_PKG_DOMAIN);
> > +       c->topo.die_id = topo_shift_apicid(apicid, TOPO_DIE_DOMAIN);
> > +
> > +       /* Relative core ID */
> > +       c->topo.core_id = topo_relative_domain_id(apicid, TOPO_CORE_DOMAIN);
> 
> My understanding is that, to ensure a package scope unique core_id,
> rather than Module/Tile scope unique, what is really needed here is
> something like,
>         apicid >>= x86_topo_system.dom_shifts[SMT];
>         c->topo.core_id = apicid & (x86_topo_system.dom_size[PACKAGE] - 1);
> 
> I haven't had a chance to confirm this on a platform with a Module
> level yet, but will do soon.
> 
Tested on an Alder Lake-N platform, which has 2 E-core modules only.

[    0.212526] CPU topo: Max. logical packages:   1
[    0.212527] CPU topo: Max. logical dies:       1
[    0.212528] CPU topo: Max. dies per package:   1
[    0.212531] CPU topo: Max. threads per core:   1
[    0.212532] CPU topo: Num. cores per package:     8
[    0.212532] CPU topo: Num. threads per package:   8
[    0.212532] CPU topo: Allowing 8 present CPUs plus 0 hotplug CPUs
[    0.212535] CPU topo: Thread    :     8
[    0.212537] CPU topo: Core      :     8
[    0.212539] CPU topo: Module    :     2
[    0.212541] CPU topo: Tile      :     1
[    0.212543] CPU topo: Die       :     1
[    0.212545] CPU topo: Package   :     1

This is all good, however,

# grep . /sys/devices/system/cpu/cpu*/topology/c*_id
/sys/devices/system/cpu/cpu0/topology/cluster_id:0
/sys/devices/system/cpu/cpu0/topology/core_id:0
/sys/devices/system/cpu/cpu1/topology/cluster_id:0
/sys/devices/system/cpu/cpu1/topology/core_id:1
/sys/devices/system/cpu/cpu2/topology/cluster_id:0
/sys/devices/system/cpu/cpu2/topology/core_id:2
/sys/devices/system/cpu/cpu3/topology/cluster_id:0
/sys/devices/system/cpu/cpu3/topology/core_id:3
/sys/devices/system/cpu/cpu4/topology/cluster_id:8
/sys/devices/system/cpu/cpu4/topology/core_id:0
/sys/devices/system/cpu/cpu5/topology/cluster_id:8
/sys/devices/system/cpu/cpu5/topology/core_id:1
/sys/devices/system/cpu/cpu6/topology/cluster_id:8
/sys/devices/system/cpu/cpu6/topology/core_id:2
/sys/devices/system/cpu/cpu7/topology/cluster_id:8
/sys/devices/system/cpu/cpu7/topology/core_id:3

The core_id is broken as it is module-scope unique only. To get a
package-scope unique core_id, it should contain all bits up to the
package ID bits.

thanks,
rui


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-13 13:30     ` Zhang, Rui
@ 2023-08-13 14:36       ` Thomas Gleixner
  2023-08-14  6:20         ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-13 14:36 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

On Sun, Aug 13 2023 at 13:30, Rui Zhang wrote:
> On Sat, 2023-08-12 at 06:41 +0000, Zhang, Rui wrote:
>> > +static inline u32 topo_relative_domain_id(u32 apicid, enum x86_topology_domains dom)
>> > +{
>> > +       if (dom != TOPO_SMT_DOMAIN)
>> > +               apicid >>= x86_topo_system.dom_shifts[dom - 1];
>> > +       return apicid & (x86_topo_system.dom_size[dom] - 1);
>> > +}
>> 
>> topo_relative_domain_id() is used to get an id value that is unique
>> within its next higher level.

Correct.
 
>> > +static void topo_set_ids(struct topo_scan *tscan)
>> > +{
>> > +       struct cpuinfo_x86 *c = tscan->c;
>> > +       u32 apicid = c->topo.apicid;
>> > +
>> > +       c->topo.pkg_id = topo_shift_apicid(apicid, TOPO_PKG_DOMAIN);
>> > +       c->topo.die_id = topo_shift_apicid(apicid, TOPO_DIE_DOMAIN);
>> > +
>> > +       /* Relative core ID */
>> > +       c->topo.core_id = topo_relative_domain_id(apicid, TOPO_CORE_DOMAIN);
>> 
>> My understanding is that, to ensure a package scope unique core_id,
>> rather than Module/Tile scope unique, what is really needed here is
>> something like,
>>         apicid >>= x86_topo_system.dom_shifts[SMT];
>>         c->topo.core_id = apicid & (x86_topo_system.dom_size[PACKAGE] - 1);

Indeed.

> This is all good, however,
>
> # grep . /sys/devices/system/cpu/cpu*/topology/c*_id
> /sys/devices/system/cpu/cpu0/topology/cluster_id:0
> /sys/devices/system/cpu/cpu0/topology/core_id:0
> /sys/devices/system/cpu/cpu1/topology/cluster_id:0
> /sys/devices/system/cpu/cpu1/topology/core_id:1
> /sys/devices/system/cpu/cpu2/topology/cluster_id:0
> /sys/devices/system/cpu/cpu2/topology/core_id:2
> /sys/devices/system/cpu/cpu3/topology/cluster_id:0
> /sys/devices/system/cpu/cpu3/topology/core_id:3
> /sys/devices/system/cpu/cpu4/topology/cluster_id:8
> /sys/devices/system/cpu/cpu4/topology/core_id:0
> /sys/devices/system/cpu/cpu5/topology/cluster_id:8
> /sys/devices/system/cpu/cpu5/topology/core_id:1
> /sys/devices/system/cpu/cpu6/topology/cluster_id:8
> /sys/devices/system/cpu/cpu6/topology/core_id:2
> /sys/devices/system/cpu/cpu7/topology/cluster_id:8
> /sys/devices/system/cpu/cpu7/topology/core_id:3
>
> The core_id is broken as it is Module scope unique only. To get package
> scope unique core id, it should contain all bits up to package id bits.

Right. Let me correct that.

But aside of that, we urgently need to have a discussion about how we
look at these things. If relative, then relative to what?

Right now it's how it happened to be, but there is not really a plan
behind all that.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-12 20:04     ` Thomas Gleixner
@ 2023-08-13 15:04       ` Thomas Gleixner
  2023-08-14  8:25         ` Zhang, Rui
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-13 15:04 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

On Sat, Aug 12 2023 at 22:04, Thomas Gleixner wrote:
> On Sat, Aug 12 2023 at 08:21, Rui Zhang wrote:
>> With this, we can guarantee that all the available topology information
>> are always valid, even when running on future platforms.
>
> I know that it can be made work, but is it worth the extra effort? I
> don't think so.

So I thought more about it. For intermediate levels, i.e. something
which is squeezed between two existing levels, this works by some
definition of works.

I.e. in the example where we have UBER_TILE between TILE and DIE, we'd
set and propagate the UBER_TILE entry into the DIE slot and then
overwrite it again if there is a DIE entry too.

Where it becomes interesting is when the unknown level is past DIEGRP,
e.g. DIEGRP_CONGLOMORATE; then we'd need to overwrite the DIEGRP level,
right?

It can be done, but I don't know whether it buys us much for the purely
theoretical case of new levels added.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-13 14:36       ` Thomas Gleixner
@ 2023-08-14  6:20         ` Thomas Gleixner
  2023-08-14  6:42           ` Zhang, Rui
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-14  6:20 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

On Sun, Aug 13 2023 at 16:36, Thomas Gleixner wrote:
> On Sun, Aug 13 2023 at 13:30, Rui Zhang wrote:
>>> My understanding is that, to ensure a package scope unique core_id,
>>> rather than Module/Tile scope unique, what is really needed here is
>>> something like,
>>>>
>>>         apicid >>= x86_topo_system.dom_shifts[SMT];
>>>         c->topo.core_id = apicid & (x86_topo_system.dom_size[PACKAGE] - 1);

Actually it needs to be:

         apicid &= (1U << x86_topo_system.dom_shifts[TOPO_PKG_DOMAIN]) - 1;
         c->topo.core_id = apicid >> x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];

because otherwise you shift the lowest package ID bit into the result.
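
To make the difference concrete, a worked example (the shift values are
made up: SMT shift 1, package shift 4, apicid = 0b11010):

        apicid &= (1U << 4) - 1;        /* strip the package bits: 0b1010    */
        core_id = apicid >> 1;          /* strip the SMT bit:      0b101 = 5 */

        /*
         * Shifting first and masking with the package domain size instead
         * yields 0b1101, i.e. the lowest package ID bit leaks into core_id.
         */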

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-12  8:00     ` Zhang, Rui
@ 2023-08-14  6:26       ` Thomas Gleixner
  2023-08-14  7:11         ` Zhang, Rui
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-14  6:26 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: Brown, Len, Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky,
	ray.huang, andrew.cooper3, Sivanich, Dimitri, wei.liu

On Sat, Aug 12 2023 at 08:00, Rui Zhang wrote:
> On Sat, 2023-08-12 at 14:38 +0800, Zhang Rui wrote:
> BTW, can we consider using system wide unique core_id instead?
>
> There are a couple of advantages by using this.
> CC Len, who can provide detailed justifications for this.

I have no problem with that. But as I said before we need a discussion
about the ID representation in general so it becomes a consistent view
at all levels.

The other thing to think about is whether we really need all these IDs
explicitly stored in cpu_info::topo... The vast majority of usage sites
is in some slow path (setup, cpu hotplug, proc, sysfs ...).

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-14  6:20         ` Thomas Gleixner
@ 2023-08-14  6:42           ` Zhang, Rui
  0 siblings, 0 replies; 88+ messages in thread
From: Zhang, Rui @ 2023-08-14  6:42 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

On Mon, 2023-08-14 at 08:20 +0200, Thomas Gleixner wrote:
> On Sun, Aug 13 2023 at 16:36, Thomas Gleixner wrote:
> > On Sun, Aug 13 2023 at 13:30, Rui Zhang wrote:
> > > > My understanding is that, to ensure a package scope unique core_id,
> > > > rather than Module/Tile scope unique, what is really needed here is
> > > > something like,
> > > >         apicid >>= x86_topo_system.dom_shifts[SMT];
> > > >         c->topo.core_id = apicid & (x86_topo_system.dom_size[PACKAGE] - 1);
> 
> Actually it needs to be:
> 
>          apicid &= (1U <<
> x86_topo_system.dom_shifts[TOPO_PKG_DOMAIN]) - 1;
>          c->topo.core_id = apicid >>
> x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];
> 
> because otherwise you shift the lowest package ID bit into the
> result.

Agreed.

thanks,
rui


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology()
  2023-08-14  6:26       ` Thomas Gleixner
@ 2023-08-14  7:11         ` Zhang, Rui
  0 siblings, 0 replies; 88+ messages in thread
From: Zhang, Rui @ 2023-08-14  7:11 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: Brown, Len, Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky,
	ray.huang, andrew.cooper3, Sivanich, Dimitri, wei.liu, Luck,
	Tony

On Mon, 2023-08-14 at 08:26 +0200, Thomas Gleixner wrote:
> On Sat, Aug 12 2023 at 08:00, Rui Zhang wrote:
> > On Sat, 2023-08-12 at 14:38 +0800, Zhang Rui wrote:
> > BTW, can we consider using system wide unique core_id instead?
> > 
> > There are a couple of advantages by using this.
> > CC Len, who can provide detailed justifications for this.
> 
> I have no problem with that. But as I said before we need a
> discussion
> about the ID representation in general so it becomes a consistent
> view
> at all levels.

Agreed.
And I think Len will be back online and propose something soon and we
can start with that. :)

> 
> The other thing to think about is whether we really need all these
> IDs
> explicitly stored in cpu_info::topo... The vast majority of usage
> sites
> is in some slow path (setup, cpu hotplug, proc, sysfs ...).
> 

I don't have a strong preference here. With the new framework, this
info is really handy even if we don't cache it somewhere.

thanks,
rui

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-13 15:04       ` Thomas Gleixner
@ 2023-08-14  8:25         ` Zhang, Rui
  2023-08-14 12:26           ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Zhang, Rui @ 2023-08-14  8:25 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: Brown, Len, Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky,
	ray.huang, andrew.cooper3, Sivanich, Dimitri, wei.liu

On Sun, 2023-08-13 at 17:04 +0200, Thomas Gleixner wrote:
> On Sat, Aug 12 2023 at 22:04, Thomas Gleixner wrote:
> > On Sat, Aug 12 2023 at 08:21, Rui Zhang wrote:
> > > With this, we can guarantee that all the available topology
> > > information
> > > are always valid, even when running on future platforms.
> > 
> > I know that it can be made work, but is it worth the extra effort?
> > I
> > don't think so.
> 
> So I thought more about it. For intermediate levels, i.e. something
> which is squeezed between two existing levels, this works by some
> definition of works.

this "some definition of works" includes parsing the unknown levels,
right?

> 
> I.e. the example where we have UBER_TILE between TILE and DIE, then
> we'd
> set and propagate the UBER_TILE entry into the DIE slot and then
> overwrite it again, if there is a DIE entry too.

Well, not really.

If we have TILE/UBER_TILE/DIE in CPUID but only support TILE/DIE in
kernel, the UBER_TILE information is overwritten.

But, UBER_TILE tells us the starting bit in APIC ID for die_id.

Say,
level	type		eax.shifts
0	SMT		1
1	CORE		5
2	TILE		7
3	UBER_TILE	8
4	DIE		9

This is a 1 package system with 2 dies, each die has 2 uber_tiles and
each uber_tile has 2 tiles.

If we don't support uber_tile, what we want to see is a platform with 2
dies and each die has 4 tiles.

But topo_shift_apicid() uses x86_topo_system.dom_shifts[TILE], so what
we see is a platform with 4 dies, and each die has 2 tiles. And this is
broken.

IMO, what we really need for each domain in x86_topo_system is dom_size
and dom_offset (the id bit offset in the APIC ID), and when parsing
domain A, we can propagate its eax.shifts to the dom_offset of its
upper level domains.

With this, we set dom_offset[DIE] to 7 first when parsing TILE, then
overwrite it to 8 when parsing UBER_TILE, and set dom_offset[PACKAGE]
to 9 when parsing DIE.

Losing TILE.eax.shifts is okay, because it is for the UBER_TILE id.
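
For illustration, a rough sketch of that bookkeeping (names are made up,
not the posted code):

        /*
         * A subleaf for domain 'dom' reporting EAX shift 'shift' defines the
         * low id bit of whatever level sits above 'dom'. Propagate it to all
         * upper domains; a later (higher) subleaf simply overwrites it. For
         * a subleaf of unknown type, 'dom' is the last known domain parsed.
         */
        static void topo_propagate_offset(unsigned int dom_offset[], enum x86_topology_domains dom,
                                          unsigned int shift)
        {
                for (unsigned int d = dom + 1; d < TOPO_MAX_DOMAIN; d++)
                        dom_offset[d] = shift;
        }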

> 
> Where it becomes interesting is when the unknown level is past
> DIEGRP,
> e.g. DIEGRP_CONGLOMORATE then we'd need to overwrite the DIEGRP
> level,
> right?
> 
> It can be done, but I don't know whether it buys us much for the
> purely
> theoretical case of new levels added.
> 
> 
Similar to the previous case, the DIEGRP_CONGLOMORATE eax.shifts can be
propagated to dom_offset[PACKAGE].

But still, there is one case that we cannot handle (the reason I'm
proposing optional die support in Linux):

Say, we have new level FOO, and the CPUID is like this
level	type		eax.shifts
0	SMT		1
1	CORE		5
2	FOO		8

This can be a system with
1. 1 die and 8 FOOs if DIE is the upper level of FOO
or
2. 8 FOOs with 1 die in each FOO if DIE is the lower level of FOO

Currently, die topology information is mandatory in Linux, so we cannot
make it right without patching enum topo_types/enum
x86_topology_domains/topo_domain_map (which in fact tells the
relationship between DIE and FOO).

thanks,
rui

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-14  8:25         ` Zhang, Rui
@ 2023-08-14 12:26           ` Thomas Gleixner
  2023-08-14 14:48             ` Brown, Len
  2023-08-14 15:28             ` Zhang, Rui
  0 siblings, 2 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-14 12:26 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: Brown, Len, Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky,
	ray.huang, andrew.cooper3, Sivanich, Dimitri, wei.liu

> On Sun, 2023-08-13 at 17:04 +0200, Thomas Gleixner wrote:
>
> With this, we set dom_offset[DIE] to 7 first when parsing TILE, and
> then overwrite it to 8 when parsing UBER_TILE, and set
> dom_offset[PACKAGE] to 9 when parsinig DIE.
>
> lossing TILE.eax.shifts is okay, because it is for UBER_TILE id.

No. That's just wrong. TILE is defined and potentially used in the
kernel. How can you rightfully assume that UBER TILE is a valid
substitution? You can't.

> Currently, die topology information is mandatory in Linux, we cannot
> make it right without patching enum topo_types/enum
> x86_topology_domains/topo_domain_map (which in fact tells the
> relationship between DIE and FOO).

You cannot just willy-nilly assume at which domain level FOO sits. Look
at your example:

> Say, we have new level FOO, and the CPUID is like this
> level	type		eax.shifts
> 0	SMT		1
> 1	CORE		5
> 2	FOO		8

FOO can be anything between CORE and PKG, so you cannot tell what it
means.

Heuristics simply _cannot_ be correct by definition. So why try to
come up with them just because?

What's the problem you are trying to solve? Some real world issue or
some academic thought experiment which might never become a real problem?

Thanks,

        tglx




^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-14 12:26           ` Thomas Gleixner
@ 2023-08-14 14:48             ` Brown, Len
  2023-08-14 16:13               ` Thomas Gleixner
  2023-08-14 15:28             ` Zhang, Rui
  1 sibling, 1 reply; 88+ messages in thread
From: Brown, Len @ 2023-08-14 14:48 UTC (permalink / raw)
  To: Thomas Gleixner, Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

> What's the problem you are trying to solve? Some real world issue or some academic thought experiment which might never become a real problem?

There are several existing bugs, bad practices, and latent future bugs in today's x86 topology code.

First, with multiple cores having the same core_id:
a consumer of core_id must know about packages to understand core_id.

This is the original sin of the current interface -- which should never have used the word "sibling" *anyplace*,
because to make sense of the word sibling, you must know what *contains* the siblings.

We introduced "core_cpus" a number of years ago to address this for core_ids (and for other levels,
such as die_cpus).  Unfortunately, we can probably never actually delete thread_siblings and core_siblings
without breaking some program someplace...

core_id should be system-wide global, just like the CPU number is system-wide global.
Today, it is the only level id that is not system-wide global.
This could be implemented by simply not masking off the package_id bits when creating the core_id,
like we have done for other levels.
Yes, this could be awkward for some existing code that indexes arrays with core_id and doesn't like them to be sparse.
But that rough edge is a much smaller problem than having to comprehend a level (package) that you may
otherwise not care about.  Besides, core_ids can already be sparse today.

Secondly, with the obsolescence of CPUID.0B and its replacement with CPUID.1F, the contract between
the hardware and the software is that a level can appear in between any existing levels
(the only exception is that SMT is married to core).  It is not possible
for an old kernel to know the name or position of a new level in the hierarchy, going forward.

Today, this manifests in a (currently) latent bug that I caused, for I implemented die_id
in the style of package_id, and I shouldn't have followed that example.
Today, if CPUID.1F doesn't know anything about multiple dies, Linux conjures up
a die_id 0 in sysfs.  It should not.  The reason is that when CPUID.1F enumerates
a level that legacy code doesn't know about, we can't possibly tell if it is above DIE,
or below DIE.  If it is above DIE, then our default die_id 0 becomes bogus.

That said, I have voiced my objection inside Intel to the creation of random levels
which do not have an architectural (software) definition; and I'm advocating that
they be *removed* from the SDM until a software programming definition that
spans all generations is documented.

SMT, core, module, die and the (implicit) package may not be well documented,
but they do have existing uses and will thus live on.
The others maybe not.

-Len




^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-14 12:26           ` Thomas Gleixner
  2023-08-14 14:48             ` Brown, Len
@ 2023-08-14 15:28             ` Zhang, Rui
  2023-08-14 17:12               ` Thomas Gleixner
  1 sibling, 1 reply; 88+ messages in thread
From: Zhang, Rui @ 2023-08-14 15:28 UTC (permalink / raw)
  To: tglx, linux-kernel
  Cc: Brown, Len, Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky,
	ray.huang, andrew.cooper3, Sivanich, Dimitri, wei.liu

Hi, Thomas,

On Mon, 2023-08-14 at 14:26 +0200, Thomas Gleixner wrote:
> > On Sun, 2023-08-13 at 17:04 +0200, Thomas Gleixner wrote:
> > 
> > With this, we set dom_offset[DIE] to 7 first when parsing TILE, and
> > then overwrite it to 8 when parsing UBER_TILE, and set
> > dom_offset[PACKAGE] to 9 when parsinig DIE.
> > 
> > lossing TILE.eax.shifts is okay, because it is for UBER_TILE id.
> 
> No. That's just wrong. TILE is defined and potentially used in the
> kernel.

Sure.

>  How can you rightfully assume that UBER TILE is a valid
> substitution? You can't.

TILE.eax.shifts tells us:
1. the number of maximum addressable threads in the TILE domain, which
   should be saved in x86_topo_system.dom_size[TILE]
2. the highest bit in the APIC ID for the tile id, but we don't need
   this if we use a package/system scope unique tile id
3. the lowest bit in the APIC ID for the upper level of tile:
   if the upper level is a known level, say, die, this info is saved
   in dom_offset[die];
   if the upper level is an unknown level, then we don't need this to
   decode the topology information for the unknown level.

Maybe I missed something, but for now I don't see how things break here.

> 
> > Currently, die topology information is mandatory in Linux, we
> > cannot
> > make it right without patching enum topo_types/enum
> > x86_topology_domains/topo_domain_map (which in fact tells the
> > relationship between DIE and FOO).
> 
> You cannot just nilly willy assume at which domain level FOO sits.

exactly.

> Look
> at your example:
> 
> > Say, we have new level FOO, and the CPUID is like this
> > level   type            eax.shifts
> > 0       SMT             1
> > 1       CORE            5
> > 2       FOO             8
> 
> FOO can be anything between CORE and PKG, so you cannot tell what it
> means.

Exactly. Anything related to MODULE/TILE/DIE can break in this case.

Say this is a system with 1 package, 2 FOOs, 8 cores.

In the current design (in this patch set), the kernel has to tell how
many dies/tiles/modules this system has, and the kernel cannot do this
right.

But with an optional Die (and surely an optional module/tile), the
kernel can tell that this is a 1-package-0-die-0-tile-0-module-8-core
system before knowing what FOO means; we don't need to make up anything
we don't know.

> 
> Simply heuristics _cannot_ be correct by definition. So why trying to
> come up with them just because?
> 
> What's the problem you are trying to solve? Some real world issue or
> some academic though experiment which might never become a real
> problem?
> 
Maybe I was misleading previously. IMO, I totally agree with your
points, and "using an optional die/tile/module" is what I propose to
address these concerns.

thanks,
rui

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-14 14:48             ` Brown, Len
@ 2023-08-14 16:13               ` Thomas Gleixner
  2023-08-15 19:30                 ` Brown, Len
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-14 16:13 UTC (permalink / raw)
  To: Brown, Len, Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

Len!

On Mon, Aug 14 2023 at 14:48, Len Brown wrote:
> First, with multiple cores having the same core_id:
> a consumer of core_id must know about packages to understand core_id.
>
> This is the original sin of the current interface -- which should
> never have used the word "sibling" *anyplace*, because to make sense
> of the word sibling, you must know what *contains* the siblings.

You're conflating things. The fact that core_id is relative to the
package has absolutely nothing to do with the concept of siblings.

Whether sibling is the right terminology or not is a problem on its own.

Fact is that CPUs are structured in different domain levels and that
structuring makes some sense as it reflects the actual hardware.

The question whether this structuring has a relevance for software or
not, is independent of that. That's something which needs to be defined
and there are certainly aspects which affect scheduling, while others
affect power or other entities and some affect all of them.

> We introduced "core_cpus" a number of years ago to address this for
> core_ids (and for other levels, Such as die_cpus).  Unfortunately, we
> can probably never actually delete threads_siblings and core_siblings
> Without breaking some program someplace...

Sorry, but "core_cpus" is just a different terminology. I'm not seeing
how this is solving anything.

> core_id should be system-wide global, just like the CPU number is
> system-wide global.  Today, it is the only level id that is not
> system-wide global.

That's simply not true. cpu_info::die_id is package relative too, which
is silly to begin with and caused the addition of this noisy
logical_die_id muck.

> This could be implemented by simply not masking off the package_id
> bits when creating the core_id,

You have to shift it right by one, if the system is SMT capable, or just
use the APIC ID if it's not (i.e. 0x1f subleaf 0 has a shift of 0). Not
more, not less.

Alternatively all IDs are not shifted right at all and just the bits
below the actual level are masked off.
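
In code, the two variants would look roughly like this (a sketch, not a
patch):

        /* Variant 1: shift out the SMT bit(s), keep everything above. */
        core_id = apicid >> x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN];

        /* Variant 2: no shift at all, just mask off the bits below core. */
        core_id = apicid & ~((1U << x86_topo_system.dom_shifts[TOPO_SMT_DOMAIN]) - 1);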

> like we have done for other levels.

Which one exactly? The only level ID which is truly system wide unique
is package ID.

Die ID is not and core ID is not and there are no other levels the
current upstream code is dealing with.

> Yes, this could be awkward for some existing code that indexes arrays
> with core_id, and doesn't like them to be sparse.  But that rough edge
> is a much smaller problem than having to comprehend a level (package)
> that you may otherwise not care about.  Besides, core_ids can already
> be sparse today.

It's not awkward. It's just a matter of auditing all the places which
care about core ID and fixing them up in case they can't deal with it.
I went through the die ID usage before making die ID unique to ensure
that it won't break anything.

I surely have mopped up more complex things than that, so where is the
problem doing the same for core ID?

> Secondly, with the obsolescence of CPUID.0B and its replacement with
> CPUID.1F, the contract between the hardware and the software is that a
> level can appear in between any existing levels (the only exception is
> that SMT is married to core).

In theory, yes. But what's the practical relevance that there might be a
new level between CORE and MODULE or MODULE and TILE etc...?

> It is not possible for an old kernel to know the name or position of a
> new level in the hierarchy, going forward.

Again, where is the practical problem? These new levels are not going to
be declared willy-nilly and every other week, right?

> Today, this manifests in a (currently) latent bug that I caused,
> for I implemented die_id in the style of package_id, and I shouldn't
> have followed that example.

You did NOT. You implemented die_id relative to the package, which does
not make it unique in the same way as core_id is relative to the package
and therefore not unique.

Package ID is unique and the only reason why logical package ID exists
is because there have been systems with massive gaps between the package
IDs. That could have been handled differently, but that's again a
different story.

> Today, if CPUID.1F doesn't know anything about multiple dies, Linux
> conjures up a die_id 0 in sysfs.  It should not.  The reason is that
> when CPUID.1F enumerates a level that legacy code doesn't know about,
> we can't possibly tell if it is above DIE, or below DIE.  If it is
> above DIE, then our default die_id 0 becomes bogus.

That's an implementation problem and the code I posted fixes this by
making die_id unique and taking the documented domain levels into
account.

So if 0x1f does not enumerate dies, then each package has one die and
the die ID is the same as the package ID. It's that simple.

> That said, I have voiced my objection inside Intel to the creation of
> random levels which do not have an architectural (software)
> definition; and I'm advocating that they be *removed* from the SDM
> until a software programming definition that spans all generations is
> documented.
>
> SMT, core, module, die and the (implicit) package may not be well
> documented, but they do have existing uses and will thus live on.  The
> others maybe not.

Why remove them? If there is no documentation for using them, then
software will ignore them, but they reflect how the hardware is built,
and according to conversations with various people this topology
reflects other things which are undocumented.

What do you win by removing them from the SDM?

Absolutely nothing. Actually you lose, because if they get added with
information on how software should use those levels, then the whole
problem I discussed with Rui about imaginary new domain levels surfacing
in the future is there sooner rather than later. If we deal with them
correctly today and ignore them for now, then kernels will just work on
systems which enumerate them; they just won't make use of these levels.

The amount of extra storage to handle them is marginal and really not
worth to debate.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-14 15:28             ` Zhang, Rui
@ 2023-08-14 17:12               ` Thomas Gleixner
  0 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-14 17:12 UTC (permalink / raw)
  To: Zhang, Rui, linux-kernel
  Cc: Brown, Len, Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky,
	ray.huang, andrew.cooper3, Sivanich, Dimitri, wei.liu

On Mon, Aug 14 2023 at 15:28, Rui Zhang wrote:
> On Mon, 2023-08-14 at 14:26 +0200, Thomas Gleixner wrote:
>> What's the problem you are trying to solve? Some real world issue or
>> some academic though experiment which might never become a real
>> problem?
>> 
> Maybe I was misleading previously, IMO, I totally agree with your
> points, and "using optional die/tile/module" is what I propose to
> address these concerns.

That's exactly what's implemented. If module, tile, die are not
advertised, then you end up with:

        N   threads
        N/2 cores
        1   module
        1   tile
        1   die

in a package because the bits occupied by module, tile and die are
exactly 0.
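
Concretely (a made-up example): if leaf 0x1f enumerates only SMT with
shift 1 and CORE with shift 6, then MODULE, TILE and DIE all inherit
shift 6, their id fields span zero bits, and the relative module, tile
and die IDs are 0 on every CPU of the package.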

But from a conceptual and consistency point of view they exist, no?

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-14 16:13               ` Thomas Gleixner
@ 2023-08-15 19:30                 ` Brown, Len
  2023-08-17  9:09                   ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Brown, Len @ 2023-08-15 19:30 UTC (permalink / raw)
  To: Thomas Gleixner, Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

Hello Thomas,

It seems we need to take a momentary step back to step forward...

First, the Intel CPUID context...

Even though CPUID.B was created to be extensible, we found that adding a "die" level to it would break legacy software.
That is because some legacy software did silly things, such as hard-coding that the package level is always adjacent to the core level...

Enter CPUID.1F -- an exact clone of CPUID.B, but with a new name.  The new name guaranteed that the old broken software would not parse CPUID.1F, and gave Intel license to add levels to CPUID.1F at any time without confusing CPUID.1F-parsing software.  As 3-year-old kernels routinely run on the very latest hardware, this future-proof goal is paramount.

Multi-die/package systems shipped as the first test of CPUID.1F.  Enumerating the multi-die/package was mostly about MSR scope....

In retrospect, we under-specified what it means to enumerate a CPUID.1F die, because it has been a constant battle to get the HW people to *not* enumerate hidden dies that software does not see.

Indeed, we were equally guilty in not codifying an architectural definition of "module" and "tile", which were placed into the CPUID.1F definition mostly as place-holders, with awareness of hardware structures that were already in common use.  For example, there were already module-scoped counters that were hard-coded, and enumerating modules seemed to be a way to give architectural (re-usable) enumeration to model-specific code.

Second, failings of the Linux topology code...

I agree with you that "thread_siblings" and "core_cpus" are different words for the same thing.
This will always be true because the hardware architecture guarantees that SMT siblings are the next level down from core.

But no such definition exists for "core_siblings".   It is impossible to write correct software that reads "core_siblings" and takes any action on it.  Those could be the CPUs inside a module, or inside a die, or inside some other level that today's software can't possibly know by name.

On the other hand, die_cpus is clear -- the CPUs within a die.
Package_cpus -- the CPUs within a package.
Core_cpus -- the CPUs within a core....
Words matter.

Specific replies....

Re: globally unique core_id

I have 100% confidence that you can make the Linux kernel handle a sparse, globally unique core_id name space.
My concern is unknown exposure to joe-random-user-space program that consumes the sysfs representation.

>> Secondly, with the obsolescence of CPUID.0b and its replacement with
>> CPUID.1F, the contract between the hardware and the software is that
>> a level can appear in between any existing levels (the only exception
>> is that SMT is married to core).

> In theory, yes. But what's the practical relevance that there might be a new level between CORE and MODULE or MODULE and TILE etc...?

>> It is not possible for an old kernel to know the name or position of
>> a new level in the hierarchy, going forward.

>Again, where is the practical problem? These new levels are not going to be declared willy-nilly every other week, right?

It is irrelevant if a new level is of any practical use to Linux.

What is important is that Linux be able to parse and use the levels it finds useful, while gracefully ignoring any that it doesn't care about (or doesn't yet know about).
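
A hedged sketch of such a forward-compatible walk (level type values as
documented for leaf 0x1F; cpuid_count() is the usual CPUID helper, while
topo_set_shift() is an assumed helper, not existing kernel API):

    /* CPUID.1F level types documented today. */
    enum { LT_INVALID = 0, LT_SMT = 1, LT_CORE = 2, LT_MODULE = 3,
           LT_TILE = 4, LT_DIE = 5, LT_DIEGRP = 6 };

    static void parse_leaf_0x1f(void)
    {
            unsigned int eax, ebx, ecx, edx, sub, type, shift;

            for (sub = 0; ; sub++) {
                    cpuid_count(0x1f, sub, &eax, &ebx, &ecx, &edx);
                    type  = (ecx >> 8) & 0xff;
                    shift = eax & 0x1f;

                    if (type == LT_INVALID)
                            break;          /* end of enumeration */

                    /*
                     * Record the shift even for unknown types: such a
                     * level still consumes APIC ID bits. Only its
                     * meaning is ignored when type > LT_DIEGRP.
                     */
                    topo_set_shift(type, shift);
            }
    }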

Yes, hardware folks can drop something into the ucode and the SDM w/o us knowing ahead of time (see DieGrp in the June 2023 SDM).  Certainly they can do it in well under the 4 years' notice we'd need if we were to simply track the named levels in the SDM.

>> Today, this manifests with a (currently) latent bug that I caused,
>> for I implemented die_id in the style of package_id, and I shouldn't
>> have followed that example.

> You did NOT. You implemented die_id relative to the package, which means it is not unique, just as core_id is relative to the package and therefore not unique.

The point is that, like package_id=0 on a single-package system, I put a die_id=0 attribute in sysfs even when NO "die" level is enumerated in CPUID.1F.

That was a mistake.

>> Today, if CPUID.1F doesn't know anything about multiple DIE, Linux
>> conjures up a die_id 0 in sysfs.  It should not.  The reason is that
>> when CPUID.1F enumerates a level that legacy code doesn't know about,
>> we can't possibly tell if it is above DIE, or below DIE.  If it is
>> above DIE, then our default die_id 0 becomes bogus.

>That's an implementation problem and the code I posted fixes this by making die_id unique and taking the documented domain levels into account.

Your code change does not fix the problem above.

>So if 0x1f does not enumerate dies, then each package has one die and the die ID is the same as the package ID. It's that simple.

Unfortunately, no.

Your code will be written and shipped before level-X is defined.
A couple of years later, level-X is defined above die.
Your code runs on new hardware that enumerates packages and level-X,
but no die.
How many die_ids does this system have?

If you could see into the future, you'd answer that there are 2 die,
because there is one inside each level-X.

But since die isn't enumerated, and you don't know whether level-X is
defined to be above or below die, you can't tell if level-X is
something containing die, or something contained by die...

The proper solution is to not expose a die_id attribute in sysfs if there is no die level enumerated in CPUID.1F.
When it is enumerated, we get it right.  When it is not enumerated, we don't guess.

> What do you win by removing them from the SDM?

When you give HW people enough rope to hang themselves, they will.
Give them something vague in the SDM, and you've created a monster that is interpreted differently by different hardware teams, and no validation team on the planet can figure out whether the hardware is correct or not.
Then the definition becomes how the OS (possibly not Linux) happened to use that interface on some past chip -- and that use is not documented in the SDM -- and down the rabbit hole you go...

When the SDM precisely documents the software/hardware interface, then proper tests can be written, independent hardware teams are forced to follow the same definition, and correct software can be written once and never break.
 
-Len



^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-15 19:30                 ` Brown, Len
@ 2023-08-17  9:09                   ` Thomas Gleixner
  2023-08-18  5:01                     ` Brown, Len
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-17  9:09 UTC (permalink / raw)
  To: Brown, Len, Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

Len!

On Tue, Aug 15 2023 at 19:30, Len Brown wrote:
> In retrospect, we under-specified what it means to enumerate a
> CPUID.1F die, because it has been a constant battle to get the HW
> people to *not* enumerate hidden die that software does not see.
>
> Indeed, we were equally guilty in not codifying an architectural
> definition of "module" and "tile", which were placed into the CPUID.1F
> definition mostly as place-holders with awareness of hardware
> structures that were already in common use.  For example, there were
> already module-scoped counters that were hard-coded, and enumerating
> modules seems to be an to give architectural (re-usable) enumeration
> to model-specific code.

Sure, 0x1f is underspecified in terms of the meaning of the
intermediate levels, but it's perfectly parseable. Once there is
information on how to actually utilize MODULE or TILE for software
decisions, then it can be implemented. Until then we can still parse it
and otherwise just ignore it. Where is the problem?

> Second, failings of the Linux topology code...
>
> I agree with you that "thread_siblings" and "core_cpus" are the
> different words for the same thing.  This will always be true because
> the hardware architecture guarantees that SMT siblings are the next
> level down from core.

Right. So your "fix" was purely cosmetic.

> But no such definition exists for "core_siblings".  It is impossible
> to write correct software that reads "core_siblings" and takes any
> action on it.  Those could be the CPUs inside a module, or inside a
> die, or inside some other level that today's software can't possibly
> know by name.
>
> On the other hand, die_cpus is clear -- the CPUs within a die.
> Package_cpus -- the CPUs within a package.
> Core_cpus -- the cpus within a core....
> Words matter.

Of course terminology matters, but that's not the real problem.

> Re: globally unique core_id
>
> I have 100% confidence that you can make the Linux kernel handle a
> sparce globally unique core_id name space.  My concern is unknown
> exposure to joe-random-user-space program that consumes the sysfs
> representation.

You can simply keep the current representation for proc and sysfs with
the relative IDs and just make the kernel use unique core IDs. The
kernel needs to be sane before you can even think about user space
exposure.

>>> Secondly, with the obsolescence of CPUID.0b and its replacement with
>>> CPUID.1F, the contract between the hardware and the software is that
>>> a level can appear in between any existing levels (the only exception
>>> is that SMT is married to core).
>
>> In theory, yes. But what's the practical relevance that there might
>> be a new level between CORE and MODULE or MODULE and TILE etc...?
>
>>> It is not possible for an old kernel to know the name or position
>>> of a new level in the hierarchy, going forward.
>
>> Again, where is the practical problem? These new levels are not going
>> to be declared willy-nilly every other week, right?
>
> It is irrelevant if a new level is of any practical use to Linux.
>
> What is important is that Linux be able to parse and use the levels it
> finds useful, while gracefully ignoring any that it doesn't care about
> (or doesn't yet know about).
>
> Yes, hardware folks can drop something into the ucode and the SDM w/o
> us knowing ahead of time (see DieGrp in the June 2023 SDM).  Certainly
> they can do it in well under the 4 years' notice we'd need if we were
> to simply track the named levels in the SDM.

That's a matter of educating the hardware people. Sure, they can do
whatever they want, but if they want us to provide primary support for
the stuff they dream up, then they'd better tell us early enough.

>>> Today, this manifests with a (currently) latent bug that I caused,
>>> for I implemented die_id in the style of package_id, and I shouldn't
>>> have followed that example.
>
>> You did NOT. You implemented die_id relative to the package, which
>> means it is not unique, just as core_id is relative to the package
>> and therefore not unique.
>
> The point is that, like package_id=0 on a single-package system, I put
> a die_id=0 attribute in sysfs even when NO "die" level is enumerated
> in CPUID.1F.
>
> That was a mistake.

No. It's not a mistake. Conceptually the DIE level exists even if not
enumerated. It consumes zero bits and therefore has size 1.

>>> Today, if CPUID.1F doesn't know anything about multiple DIE, Linux
>>> conjures up a die_id 0 in sysfs.  It should not.  The reason is that
>>> when CPUID.1F enumerates a level that legacy code doesn't know about,
>>> we can't possibly tell if it is above DIE, or below DIE.  If it is
>>> above DIE, then our default die_id 0 becomes bogus.
>
>>That's an implementation problem and the code I posted fixes this by
>>making die_id unique and taking the documented domain levels into
>>account.
>
> Your code change does not fix the problem above.

Why are all of you so fixated on domain levels which are not documented
today? Either you know something which I don't know or you are just
debating an academic problem to death.

>> So if 0x1f does not enumerate dies, then each package has one die and
>> the die ID is the same as the package ID. It's that simple.
>
> Unfortunately, no.
>
> Your code will be written and shipped before level-X is defined.
>
> A couple of years later, level-X is defined above die.  Your code runs
> on new hardware that enumerates packages and level-X, but no die.  How
> many die_ids does this system have?
>
> If you could see into the future, you'd answer that there are 2 die,
> because there is one inside each level-X.
>
> But since die isn't enumerated, and you don't know whether level-X is
> defined to be above or below die, you can't tell if level-X is
> something containing die, or something contained by die...
>
> The proper solution is to not expose a die_id attribute in sysfs if
> there is no die level enumerated in CPUID.1F.  When it is enumerated,
> we get it right.  When it is not enumerated, we don't guess.

The main problem is the kernel side itself. /proc/ and /sys/ are places
where you can do conditional exposure, but you can't do that inside the
kernel.

Fact is that the APIC ID space is segmented to reflect the topology
domains described today:

     [PKG] [DIE] [TILE] [MODULE] [CORE] [THREAD]

Each of them can occupy 0 or more bits. So using an internal
representation for them which treats them as size one if not specified
is the obvious and right thing to do.

You cannot create a software monstrosity which makes everything
conditional. It's neither workable nor maintainable. You need a
consistent view independent of the enumerated levels in 0x1F and as a
consequence you have to assume that the non-enumerated levels consume
zero bits in the APIC ID space and have size one.
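
A rough sketch of such a consistent representation (names assumed, not
the actual series; topo_shift[l] holds the number of APIC ID bits below
level l, as recorded from the 0x1F walk):

    enum topo_level { TOPO_SMT, TOPO_CORE, TOPO_MODULE, TOPO_TILE,
                      TOPO_DIE, TOPO_PKG, TOPO_MAX };

    static unsigned int topo_shift[TOPO_MAX];

    /*
     * A level which was not enumerated inherits the shift of the
     * level below it, i.e. it spans zero bits and has size one.
     */
    static void topo_fill_defaults(void)
    {
            unsigned int l;

            for (l = TOPO_CORE; l < TOPO_MAX; l++) {
                    if (!topo_shift[l])
                            topo_shift[l] = topo_shift[l - 1];
            }
    }

    /* Every level always yields an ID, enumerated or not. */
    static unsigned int topo_level_id(unsigned int apicid,
                                      enum topo_level l)
    {
            return apicid >> topo_shift[l];
    }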

If the hardware people fail to understand that software needs a
consistent representation of these things, then we can give up and just
refuse to parse 0x1f at all.

Now let's look at your imaginary future system's enumeration:

     [LEVELX] [CORE] [THREAD]

You have to take LEVELX - even if unknown to the kernel - into account
to evaluate the APIC ID range which defines the package space.

Otherwise you obviously just have to ignore it, because yes, it's
unknown between which domain levels it is going to sit.

But this assumes that LEVELX is going to appear out of the blue just
because the hardware people took the wrong pills. So it's our internal
problem to educate them so this won't happen.

>> What do you win by removing them from the SDM?
>
> When you give HW people enough rope to hang themselves, they will.

You are not preventing this by removing the MODULE/TILE domains
from the SDM.

> Give them something vague in the SDM, and you've created a monster
> that is interpreted differently by different hardware teams and no
> validation team on the planet can figure out if the hardware is
> correct or not.  Then the definition becomes how the OS (possibly not
> Linux) happened to use that interface on some past chip -- and that
> use is not documented in the SDM -- and down the rabbit hole you go...
>
> When the SDM precisely documents the software/hardware interface, then
> proper tests can be written, independent hardware teams are forced to
> follow the same definition, and correct software can be written once
> and never break.

I agree with that sentiment in principle, but you lost this battle
already because there is hardware which enumerates MODULE.

[    0.212535] CPU topo: Thread    :     8
[    0.212537] CPU topo: Core      :     8
[    0.212539] CPU topo: Module    :     2

So what are we going to do about that? Just pretend that it does not
exist?

Sure, we can do that. But I'm also 100% sure that there is a meaning
which goes beyond the pure physical description of the CPU.

Yes, this description is not there today, but we still have to utilize
the enumeration level for evaluating the APIC ID bit range which defines
the package ID space.

That does not mean that we have to put a meaning on that level by doing
wild guesses. That'd be completely wrong and would cause the problems
you are so worried about.

Why does this even still have to be debated? 0x1F support was merged
more than four years ago, and we are now indulging in debating academic
problems of unknown levels appearing out of the blue in systems which
ship today.

There is a meaning to the existing levels, so time is better spent
getting that documented.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-17  9:09                   ` Thomas Gleixner
@ 2023-08-18  5:01                     ` Brown, Len
  2023-08-21 10:27                       ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Brown, Len @ 2023-08-18  5:01 UTC (permalink / raw)
  To: Thomas Gleixner, Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

Hello Thomas,

Thanks for listening.

Discussing further w/ Rui, we realized that it is not possible to future-proof the CPUID.1F code against the insertion of a new Domain at an unknown level in the hierarchy.  Further, CPUID.1F explicitly documents that we cannot count on any numeric relationship between the Domain Type IDs to tell us between which levels of the hierarchy a future Domain may reside.

For this reason, I recommend that we proceed with implementing internal kernel parsing of Domain Type ID 6 (DieGrp).
Even if DieGrp is a temporary name that changes to something else, we can assert that Domain Type ID 6 contains Die.
Thus, on a system with both DieGrp and Die, we know that multiple DieGrp produce multiple Die, rather than the reverse.

If I understand your description correctly, you propose that if a level is not enumerated, we assume its id is always 0.  I agree this is the right thing to do with Package, and we should continue doing it, because we always have a package.

However, I don't think this is the right direction for the intermediate levels -- their existence should be usable to enumerate the existence of level-specific features, and 0 suggests that those features exist when they may not.

Take, for example, module-scoped MSRs and counters that do not exist on systems that do not have modules.

We should only have a valid module_id if CPUID.1F has a non-zero shift width for Domain Type ID 3 (module).  Otherwise, we should probably be setting module_id to -1, so that code looking for those module features knows that they don't exist.  If we don't do this, that code will have to check model # (as it historically has) to determine if module-only features exist -- and we've thrown away the value of architectural level enumeration.
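
A sketch of a consumer honoring that convention (topology_module_id()
and MSR_MODULE_COUNTER are hypothetical names used only for
illustration):

    static int read_module_counter(int cpu, u64 *val)
    {
            int module_id = topology_module_id(cpu);

            /* No module level enumerated: the feature does not exist. */
            if (module_id < 0)
                    return -ENODEV;

            return rdmsrl_safe_on_cpu(cpu, MSR_MODULE_COUNTER, val);
    }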

The situation for die_id is similar, except that there are no die features that don't already exist as package features when there is just 1 die/package.  I.e. the die-aware code can simply fall back to the package code when die_id = -1.

This, of course, raises the question of the sysfs interface for die_id, and whether it should return -1 when there is no Die enumerated, or make that attribute simply not exist when there is no die_id.  Either would probably be an improvement over conjuring up a phony die_id=0 when no Die is actually enumerated.
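
If we went the route of not exposing the attribute, sysfs already has
standard machinery for it; a sketch (topology_attrs, topo_has_die() and
dev_attr_die_id are assumed names, the is_visible callback is regular
attribute-group plumbing):

    static umode_t die_id_is_visible(struct kobject *kobj,
                                     struct attribute *attr, int n)
    {
            /* Hide only die_id when no die level is enumerated. */
            if (attr == &dev_attr_die_id.attr && !topo_has_die())
                    return 0;
            return attr->mode;
    }

    static const struct attribute_group topology_group = {
            .attrs      = topology_attrs,
            .is_visible = die_id_is_visible,
    };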

Thanks,
-Len

^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-18  5:01                     ` Brown, Len
@ 2023-08-21 10:27                       ` Thomas Gleixner
  2023-08-30  2:46                         ` Brown, Len
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-21 10:27 UTC (permalink / raw)
  To: Brown, Len, Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

Len!

On Fri, Aug 18 2023 at 05:01, Len Brown wrote:
> Discussing further w/ Rui, we realized that it is not possible to
> future-proof the CPUID.1F code against the insertion of a new Domain
> at an unknown level in the hierarchy.  Further, CPUID.1F explicitly
> documents that we cannot count on any numeric relationship between
> the Domain Type IDs to tell us between which levels of the hierarchy
> a future Domain may reside.

I realized that long before and explained it to Rui already quite some
time ago, no?

> For this reason, I recommend that we proceed with implementing
> internal kernel parsing of Domain Type ID 6 (DieGrp).  Even if DieGrp
> is a temporary name that changes to something else, we can assert that
> Domain Type ID 6 contains Die.  Thus, on a system with both DieGrp and
> Die, we know that multiple DieGrp produce multiple Die, rather than
> the reverse.
>
> If I understand your description correctly, you propose that if a
> level is not enumerated, we assume its id is always 0.  I agree
> this is the right thing to do with Package, and we should continue
> doing it, because we always have a package.
>
> However, I don't think this is the right direction for the
> intermediate levels -- their existence should be usable to enumerate
> the existence of level-specific features, and 0 suggests that those
> features exist when they may not.
>
> Take, for example, module-scoped MSRs and counters that do not exist
> on systems that do not have modules.
>
> We should only have a valid module_id if CPUID.1F has a non-zero shift
> width for Domain Type ID 3 (module).  Otherwise, we should probably be
> setting module_id to -1, so that code looking for those module
> features knows that they don't exist.  If we don't do this, that code
> will have to check model # (as it historically has) to determine if
> module-only features exist -- and we've thrown away the value of
> architectural level enumeration.
>
> The situation for die_id is similar, except that there are no die
> features that don't already exist as package features when there is
> just 1 die/package.  I.e. the die-aware code can simply fall back to
> the package code when die_id = -1.
>
> This, of course, raises the question of the sysfs interface for
> die_id, and whether it should return -1 when there is no Die
> enumerated, or make that attribute simply not exist when there is no
> die_id.  Either would probably be an improvement over conjuring up a
> phony die_id=0 when no Die is actually enumerated.

You are still conflating things at different levels and the above is a
mixture of implementation details, historical anecdotes, user space
interface issues and whatever.

But you are still failing to look at it from a ground up conceptual
level.

1.) Parsing and storing enumeration

    The way 0x1F is designed and how the APIC ID space is
    partitioned makes it entirely clear that all domain levels must be
    evaluated no matter what, and it's more than obvious that
    non-enumerated domain levels occupy _zero_ bits in the APIC ID
    partitioning and therefore end up being size _one_.

2.) IDs

    As a logical consequence of #1 each level - whether enumerated or
    not - has an ID. Obviously non-enumerated levels share the ID with
    the levels up the hierarchy.

    Overloading ID with some "it's not enumerated" value is conceptually
    completely wrong.

3.) Evaluation

    Runtime evaluation of the stored information can and in most cases
    has to take the (non-)enumeration into account. But that's a
    conceptually separate issue from #1 and #2.

    #1 is providing a trivial mechanism for that as it's obvious that a
    domain level is only enumerated when it occupies more than 0 bits.

4.) User space exposure

    That's an orthogonal problem, which obviously has to be addressed,
    but #1 - #3 provide _all_ required mechanisms to preserve the
    existing (bogus) semantics and to provide a new set of sensible
    interfaces which are more than just a new name and paint for the
    already existing bogosities.

The whole problem of the existing mechanisms and the resulting nonsense
at the usage sites of topology information is that there is no concept
behind any of this. It's a mess of ad-hoc mechanisms which have been
glued on over time and which completely lacks consistency.

As a consequence the kernel has grown warts at all ends to deal with
this, and the two series I'm working on are laying the groundwork to
clean this up on top of a consistent and understandable topology
mechanism, which also allows it to grow into a central store of such
information instead of having home-brewed solutions in every other
subsystem.

I'm happy to listen to your ideas and thoughts as long as we are
debating things at the proper conceptual design levels. But I'm not
interested at all in random recommendations which just address the
symptoms and therefore perpetuate the lack of consistent design.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-21 10:27                       ` Thomas Gleixner
@ 2023-08-30  2:46                         ` Brown, Len
  2023-08-30 12:39                           ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Brown, Len @ 2023-08-30  2:46 UTC (permalink / raw)
  To: Thomas Gleixner, Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

Hello Thomas,

>    The way 0x1F is designed and how the APIC ID space is
>    partitioned makes it entirely clear that all domain levels must be
>    evaluated no matter what

Correct.

>    and it's more than obvious that
>    non-enumerated domain levels occupy _zero_ bits in the APIC ID
>    partitioning and therefore end up being size _one_.

By "size _one_", do you mean that a non-enumerated level gets a valid id, eg. id=0?

That direction would be problematic.

If CPUID.1F doesn't enumerate a module level, then there is NO module level.
Conjuring up a valid module_id=0 in this scenario is a bug.

For a module level to exist, it must occupy at least 1 bit in the APIC ID.

This is what I failed to impress upon you about die_id erroneously following the example of package_id.
Package_id is special.  There is always an architectural package, and if no package bits are set in the APIC ID, we can still assume package_id=0.

This is not true for the intermediate levels.  If they are not enumerated, they DO NOT EXIST.

This is particularly important given the ability of CPUID.1F to gain levels currently undocumented, and omit documented levels that may have no utility, or even a meaning, on future hardware.

-Len


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-30  2:46                         ` Brown, Len
@ 2023-08-30 12:39                           ` Thomas Gleixner
  2023-09-01  3:09                             ` Brown, Len
  0 siblings, 1 reply; 88+ messages in thread
From: Thomas Gleixner @ 2023-08-30 12:39 UTC (permalink / raw)
  To: Brown, Len, Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

On Wed, Aug 30 2023 at 02:46, Len Brown wrote:
>>    and it's more than obvious that
>>    non-enumerated domain levels occupy _zero_ bits in the APIC ID
>>    partitioning and therefore end up being size _one_.
>
> By "size _one_", do you mean that a non-enumerated level gets a valid
> id, e.g. id=0?
>
> That direction would be problematic.
>
> If CPUID.1F doesn't enumerate a module level, then there is NO module level.
> Conjuring up a valid module_id=0 in this scenario is a bug.
>
> For a module level to exist, it must occupy at least 1 bit in the APIC ID.
>
> This is what I failed to impress upon you about die_id erroneously
> following the example of package_id.  Package_id is special.  There is
> always an architectural package, and if no package bits are set in the
> APIC ID, we can still assume package_id=0.
>
> This is not true for the intermediate levels.  If they are not
> enumerated, they DO NOT EXIST.

Again. You are conflating an implementation detail with a conceptual
problem.

Conceptually _all_ levels exist, but the ones which occupy zero bits
have no meaning. Neither do the unknown levels, should they surface at
some point.

So as they _all_ exist the logical consequence is that even those which
occupy zero bits have an ID.

Code which is interested in information which depends on the enumeration
of the level must obviously do:

    if (level_exists(X))
    	analyse_level(X)

Whether you express that via an invalid level ID or via an explicit
check for the level is an implementation detail.

Other code paths can still utilize the resulting ID for comparisons
without checking whether the level exists or not. Such a comparison is
not per se wrong. It depends on what you want to achieve, and in some
cases the unconditional comparison just allows you to write better
code, because there are enough cases where the only information
required is whether two IDs match or not. Such code neither cares
whether there are multiple instances of that level nor does it deduce
that there is level-dependent information.
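
For illustration, the explicit check and an unconditional comparison
are both a handful of lines (a sketch reusing the assumed
topo_shift[]/topo_level_id() representation from earlier in this
thread):

    /*
     * An intermediate level exists iff it occupies at least one
     * APIC ID bit; the package level always exists.
     */
    static bool level_exists(enum topo_level l)
    {
            return topo_shift[l + 1] > topo_shift[l];
    }

    /*
     * Comparison is valid without an existence check: when DIE spans
     * zero bits, "same die" simply degenerates to "same package".
     */
    static bool same_die(unsigned int apicid1, unsigned int apicid2)
    {
            return topo_level_id(apicid1, TOPO_DIE) ==
                   topo_level_id(apicid2, TOPO_DIE);
    }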

The problem of the current implementation is not that the die ID is
automatically assigned. The problem is at the usage sites which blindly
assume that there must be a meaning. That's a completely different issue
and has absolutely nothing to do with purely mathematically deduced ID
information at any given level.

See?

Thanks,

        tglx





^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-08-30 12:39                           ` Thomas Gleixner
@ 2023-09-01  3:09                             ` Brown, Len
  2023-09-01  7:45                               ` Thomas Gleixner
  0 siblings, 1 reply; 88+ messages in thread
From: Brown, Len @ 2023-09-01  3:09 UTC (permalink / raw)
  To: Thomas Gleixner, Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

Hello Thomas,

> Conceptually _all_ levels exist, but the ones which occupy zero bits have no meaning. Neither do the unknown levels, should they surface at some point.
>
> So as they _all_ exist the logical consequence is that even those which occupy zero bits have an ID.
>
> Code which is interested in information which depends on the enumeration of the level must obviously do:
>
>    if (level_exists(X))
>    	analyse_level(X)
>
> Whether you express that via an invalid level ID or via an explicit check for the level is an implementation detail.

Thank you for acknowledging that a level with a shift-width of 0 does not exist, and thus an id for that level has no meaning.

One could argue that except for package_id and core_id, which always exist, maintainable code would *always* check that a level exists before doing *anything* with its level_id.  Color me skeptical of an implementation that does otherwise...

So what are you proposing with the statement that "conceptually _all_ levels exist"?

> The problem of the current implementation is not that the die ID is automatically assigned. The problem is at the usage sites which blindly assume that there must be a meaning. That's a completely different issue and has absolutely nothing to do with purely mathematically deduced ID information at any given level.

I agree that the code that exports the die_id attributes in topology sysfs should not do so when the die_id is meaningless.

Thanks,
-Len

Ps. It is a safe bet that new levels will "surface at some point".  For example, DieGrp surfaced this summer w/o any prior consultation with the Linux team.  But even if they did consult us and gave us the ideal 1-year before-hardware advance notice, and even if we miraculously added support in 0 time, we would still be 2 years late to prescriptively recognize this new level -- as our enterprise customers routinely run 3-year-old kernels.  This is why it is mandatory that our code be resilient to the insertion of additional future levels.  I think it can be -- as long as we continue to use globally unique IDs for all levels (IIRC, only core_id is not globally unique today) and do _nothing_ with levels that have a 0 shift-width.


^ permalink raw reply	[flat|nested] 88+ messages in thread

* RE: [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser
  2023-09-01  3:09                             ` Brown, Len
@ 2023-09-01  7:45                               ` Thomas Gleixner
  0 siblings, 0 replies; 88+ messages in thread
From: Thomas Gleixner @ 2023-09-01  7:45 UTC (permalink / raw)
  To: Brown, Len, Zhang, Rui, linux-kernel
  Cc: Gross, Jurgen, mikelley, arjan, x86, thomas.lendacky, ray.huang,
	andrew.cooper3, Sivanich, Dimitri, wei.liu

Len!

On Fri, Sep 01 2023 at 03:09, Len Brown wrote:
>> Conceptually _all_ levels exist, but the ones which occupy zero bits
>> have no meaning. Neither do the unknown levels, should they surface
>> at some point.
>>
>> So as they _all_ exist the logical consequence is that even those which occupy zero bits have an ID.
>>
>> Code which is interested in information which depends on the enumeration of the level must obviously do:
>>
>>    if (level_exists(X))
>>    	analyse_level(X)
>>
>> Whether you express that via an invalid level ID or via an explicit
>> check for the level is an implementation detail.
>
> Thank you for acknowledging that a level with a shift-width of 0 does
> not exist, and thus an id for that level has no meaning.

Even if the level is enumerated, there is no implicit meaning attached
per se. It's only relevant when there is a documented relationship
between the enumeration and secondary information attached to it.
Making implicit general assumptions about the meaning of an enumeration
is just not possible.

> One could argue that except for package_id and core_id, which always
> exist, maintainable code would *always* check that a level exists
> before doing *anything* with its level_id.  Color me skeptical of an
> implementation that does otherwise...

We have that today, no?

> So what are you proposing with the statement that "conceptually _all_
> levels exist"?

We need a consistent view on the topology, and the only consistent view
is mathematical. Which means that a shift-0 element obviously has size
one, because size = 1 << SHIFT.

As a consequence these non-enumerated levels have an ID too, which in
turn makes the view on the topology consistent and independent of the
actually enumerated levels.

>> The problem of the current implementation is not that the die ID is
>> automatically assigned. The problem is at the usage sites which
>> blindly assume that there must be a meaning. That's a completely
>> different issue and has absolutely nothing to do with purely
>> mathematically deduced ID information at any given level.
>
> I agree that the code that exports the die_id attributes in topology
> sysfs should not do so when the die_id is meaningless.

The problem is not the fact that die_id is exposed. The problem is that
the meta information which allows one to deduce meaning is not exposed
along with it. The fact that the exposure was half thought out makes it
slightly harder to correct that mistake, but I'm not yet convinced that
non-exposure is the correct answer in general.

> Ps. It is a safe bet that new levels will "surface at some point".
> For example, DieGrp surfaced this summer w/o any prior consultation
> with the Linux team.  But even if they did consult us and gave us the
> ideal 1-year before-hardware advance notice, and even if we
> miraculously added support in 0 time, we would still be 2 years late
> to prescriptively recognize this new level -- as our enterprise
> customers routinely run 3-year-old kernels.

That's a strawman, as the enterprise people backport the world and then
some. So if there is timely upstream support, then it will turn up in
the frankenkernels in time too. Arguably we could even backport the new
magic level ID to stable kernels, as we do with other important
hardware-related minimal add-ons.

> This is why it is mandatory that our code be resilient to the
> insertion of additional future levels.  I think it can be -- as long
> as we continue to use globally unique IDs for all levels (IIRC, only
> core_id is not globally unique today) and do _nothing_ with levels
> that have a 0 shift-width.

Die ID is relative too, for no real good reason. Inside the kernel,
core ID is not really required to be relative either.

Implementation-wise it's just wrong to store this information in
cpu_info instead of doing a runtime evaluation of the topology
information, which allows choosing between global and relative IDs
depending on the requirements of the particular usage site.

The primary usage of these IDs is for initialization, and everything
which needs this for hotpath usage converts it into a use-case-specific
cached representation anyway, because accessing per-CPU variables in a
hotpath is suboptimal at best.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [tip: x86/core] x86/cpu/hygon: Fix the CPU topology evaluation for real
  2023-08-12  3:58         ` Pu Wen
@ 2023-10-13  9:38           ` tip-bot2 for Pu Wen
  0 siblings, 0 replies; 88+ messages in thread
From: tip-bot2 for Pu Wen @ 2023-10-13  9:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Pu Wen, Thomas Gleixner, Peter Zijlstra (Intel),
	stable, x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     ee545b94d39a00c93dc98b1dbcbcf731d2eadeb4
Gitweb:        https://git.kernel.org/tip/ee545b94d39a00c93dc98b1dbcbcf731d2eadeb4
Author:        Pu Wen <puwen@hygon.cn>
AuthorDate:    Mon, 14 Aug 2023 10:18:26 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 10 Oct 2023 14:38:16 +02:00

x86/cpu/hygon: Fix the CPU topology evaluation for real

Hygon processors with a model ID > 3 have CPUID leaf 0xB correctly
populated and don't need the fixed package ID shift workaround. The fixup
is also incorrect when running in a guest.

Fixes: e0ceeae708ce ("x86/CPU/hygon: Fix phys_proc_id calculation logic for multi-die processors")
Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/tencent_594804A808BD93A4EBF50A994F228E3A7F07@qq.com
Link: https://lore.kernel.org/r/20230814085112.089607918@linutronix.de
---
 arch/x86/kernel/cpu/hygon.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index defdc59..a7b3ef4 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -87,8 +87,12 @@ static void hygon_get_topology(struct cpuinfo_x86 *c)
 		if (!err)
 			c->x86_coreid_bits = get_count_order(c->x86_max_cores);
 
-		/* Socket ID is ApicId[6] for these processors. */
-		c->phys_proc_id = c->apicid >> APICID_SOCKET_ID_BIT;
+		/*
+		 * Socket ID is ApicId[6] for the processors with model <= 0x3
+		 * when running on host.
+		 */
+		if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) && c->x86_model <= 0x3)
+			c->phys_proc_id = c->apicid >> APICID_SOCKET_ID_BIT;
 
 		cacheinfo_hygon_init_llc_id(c, cpu);
 	} else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {

^ permalink raw reply related	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2023-10-13  9:39 UTC | newest]

Thread overview: 88+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-02 10:20 [patch V3 00/40] x86/cpu: Rework the topology evaluation Thomas Gleixner
2023-08-02 10:20 ` [patch V3 01/40] cpu/SMT: Make SMT control more robust against enumeration failures Thomas Gleixner
2023-08-04 17:50   ` Borislav Petkov
2023-08-04 20:01     ` Thomas Gleixner
2023-08-02 10:21 ` [patch V3 02/40] x86/apic: Fake primary thread mask for XEN/PV Thomas Gleixner
2023-08-04 18:12   ` Borislav Petkov
2023-08-04 20:02     ` Thomas Gleixner
2023-08-02 10:21 ` [patch V3 03/40] x86/cpu: Encapsulate topology information in cpuinfo_x86 Thomas Gleixner
2023-08-02 10:21 ` [patch V3 04/40] x86/cpu: Move phys_proc_id into topology info Thomas Gleixner
2023-08-02 10:21 ` [patch V3 05/40] x86/cpu: Move cpu_die_id " Thomas Gleixner
2023-08-09 14:32   ` Zhang, Rui
2023-08-09 15:14     ` Thomas Gleixner
2023-08-02 10:21 ` [patch V3 06/40] scsi: lpfc: Use topology_core_id() Thomas Gleixner
2023-08-02 10:21 ` [patch V3 07/40] hwmon: (fam15h_power) " Thomas Gleixner
2023-08-02 10:21 ` [patch V3 08/40] x86/cpu: Move cpu_core_id into topology info Thomas Gleixner
2023-08-02 10:21 ` [patch V3 09/40] x86/cpu: Move cu_id " Thomas Gleixner
2023-08-02 10:21 ` [patch V3 10/40] x86/cpu: Remove pointless evaluation of x86_coreid_bits Thomas Gleixner
2023-08-02 10:21 ` [patch V3 11/40] x86/cpu: Move logical package and die IDs into topology info Thomas Gleixner
2023-08-02 10:21 ` [patch V3 12/40] x86/cpu: Move cpu_l[l2]c_id " Thomas Gleixner
2023-08-02 10:21 ` [patch V3 13/40] x86/apic: Use BAD_APICID consistently Thomas Gleixner
2023-08-02 10:21 ` [patch V3 14/40] x86/apic: Use u32 for APIC IDs in global data Thomas Gleixner
2023-08-02 10:21 ` [patch V3 15/40] x86/apic: Use u32 for check_apicid_used() Thomas Gleixner
2023-08-02 10:21 ` [patch V3 16/40] x86/apic: Use u32 for cpu_present_to_apicid() Thomas Gleixner
2023-08-02 10:21 ` [patch V3 17/40] x86/apic: Use u32 for phys_pkg_id() Thomas Gleixner
2023-08-02 10:21 ` [patch V3 18/40] x86/apic: Use u32 for [gs]et_apic_id() Thomas Gleixner
2023-08-02 10:21 ` [patch V3 19/40] x86/apic: Use u32 for wakeup_secondary_cpu[_64]() Thomas Gleixner
2023-08-10  7:58   ` Qiuxu Zhuo
2023-08-02 10:21 ` [patch V3 20/40] x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids Thomas Gleixner
2023-08-02 10:21 ` [patch V3 21/40] x86/cpu: Provide debug interface Thomas Gleixner
2023-08-02 10:21 ` [patch V3 22/40] x86/cpu: Provide cpuid_read() et al Thomas Gleixner
2023-08-02 10:21 ` [patch V3 23/40] x86/cpu: Provide cpu_init/parse_topology() Thomas Gleixner
2023-08-04  8:14   ` K Prateek Nayak
2023-08-04  8:28     ` Thomas Gleixner
2023-08-04  8:34       ` K Prateek Nayak
2023-08-12  6:41   ` Zhang, Rui
2023-08-12  8:00     ` Zhang, Rui
2023-08-14  6:26       ` Thomas Gleixner
2023-08-14  7:11         ` Zhang, Rui
2023-08-13 13:30     ` Zhang, Rui
2023-08-13 14:36       ` Thomas Gleixner
2023-08-14  6:20         ` Thomas Gleixner
2023-08-14  6:42           ` Zhang, Rui
2023-08-02 10:21 ` [patch V3 24/40] x86/cpu: Add legacy topology parser Thomas Gleixner
2023-08-02 10:21 ` [patch V3 25/40] x86/cpu: Use common topology code for Centaur and Zhaoxin Thomas Gleixner
2023-08-02 10:21 ` [patch V3 26/40] x86/cpu: Move __max_die_per_package to common.c Thomas Gleixner
2023-08-02 10:21 ` [patch V3 27/40] x86/cpu: Provide a sane leaf 0xb/0x1f parser Thomas Gleixner
2023-08-12  8:21   ` Zhang, Rui
2023-08-12 20:04     ` Thomas Gleixner
2023-08-13 15:04       ` Thomas Gleixner
2023-08-14  8:25         ` Zhang, Rui
2023-08-14 12:26           ` Thomas Gleixner
2023-08-14 14:48             ` Brown, Len
2023-08-14 16:13               ` Thomas Gleixner
2023-08-15 19:30                 ` Brown, Len
2023-08-17  9:09                   ` Thomas Gleixner
2023-08-18  5:01                     ` Brown, Len
2023-08-21 10:27                       ` Thomas Gleixner
2023-08-30  2:46                         ` Brown, Len
2023-08-30 12:39                           ` Thomas Gleixner
2023-09-01  3:09                             ` Brown, Len
2023-09-01  7:45                               ` Thomas Gleixner
2023-08-14 15:28             ` Zhang, Rui
2023-08-14 17:12               ` Thomas Gleixner
2023-08-02 10:21 ` [patch V3 28/40] x86/cpu: Use common topology code for Intel Thomas Gleixner
2023-08-02 10:21 ` [patch V3 29/40] x86/cpu/amd: Provide a separate accessor for Node ID Thomas Gleixner
2023-08-02 10:21 ` [patch V3 30/40] x86/cpu: Provide an AMD/HYGON specific topology parser Thomas Gleixner
2023-08-02 19:28   ` Michael Kelley (LINUX)
2023-08-02 19:46     ` Thomas Gleixner
2023-08-02 19:51   ` [patch V3a " Thomas Gleixner
2023-08-11 12:58     ` Pu Wen
2023-08-11 17:11       ` Thomas Gleixner
2023-08-12  3:58         ` Pu Wen
2023-10-13  9:38           ` [tip: x86/core] x86/cpu/hygon: Fix the CPU topology evaluation for real tip-bot2 for Pu Wen
2023-08-02 10:21 ` [patch V3 31/40] x86/smpboot: Teach it about topo.amd_node_id Thomas Gleixner
2023-08-02 10:21 ` [patch V3 32/40] x86/cpu: Use common topology code for AMD Thomas Gleixner
2023-08-02 10:21 ` [patch V3 33/40] x86/cpu: Use common topology code for HYGON Thomas Gleixner
2023-08-11 13:00   ` Pu Wen
2023-08-11 17:11     ` Thomas Gleixner
2023-08-02 10:21 ` [patch V3 34/40] x86/mm/numa: Use core domain size on AMD Thomas Gleixner
2023-08-02 10:21 ` [patch V3 35/40] x86/cpu: Make topology_amd_node_id() use the actual node info Thomas Gleixner
2023-08-02 10:21 ` [patch V3 36/40] x86/cpu: Remove topology.c Thomas Gleixner
2023-08-02 10:21 ` [patch V3 37/40] x86/cpu: Remove x86_coreid_bits Thomas Gleixner
2023-08-02 10:21 ` [patch V3 38/40] x86/apic: Remove unused phys_pkg_id() callback Thomas Gleixner
2023-08-02 10:21 ` [patch V3 39/40] x86/xen/smp_pv: Remove cpudata fiddling Thomas Gleixner
2023-08-02 10:22 ` [patch V3 40/40] x86/apic/uv: Remove the private leaf 0xb parser Thomas Gleixner
2023-08-02 11:56 ` [patch V3 00/40] x86/cpu: Rework the topology evaluation Juergen Gross
2023-08-03  0:07 ` Sohil Mehta
2023-08-03  3:44 ` Michael Kelley (LINUX)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).