linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 01/11] x86 topology: fix doc typo
       [not found] <20190219032259.12237-1-len.brown@intel.com>
@ 2019-02-19  3:22 ` Len Brown
  2019-02-19  3:22   ` [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording Len Brown
                     ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Len Brown @ 2019-02-19  3:22 UTC (permalink / raw)
  To: linux-drivers-review; +Cc: linux-doc

reflect actual cpuinfo_x86 field name:

s/logical_id/logical_proc_id/

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/x86/topology.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
index 2953e3ec9a02..06b3cdbc4048 100644
--- a/Documentation/x86/topology.txt
+++ b/Documentation/x86/topology.txt
@@ -51,7 +51,7 @@ The topology of a system is described in the units of:
     The physical ID of the package. This information is retrieved via CPUID
     and deduced from the APIC IDs of the cores in the package.
 
-  - cpuinfo_x86.logical_id:
+  - cpuinfo_x86.logical_proc_id:
 
     The logical ID of the package. As we do not trust BIOSes to enumerate the
     packages in a consistent way, we introduced the concept of logical package
-- 
2.18.0-rc0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording
  2019-02-19  3:22 ` [PATCH 01/11] x86 topology: fix doc typo Len Brown
@ 2019-02-19  3:22   ` Len Brown
  2019-02-19  3:40     ` Len Brown
  2019-02-19 19:33     ` Randy Dunlap
  2019-02-19  3:22   ` [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support Len Brown
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 31+ messages in thread
From: Len Brown @ 2019-02-19  3:22 UTC (permalink / raw)
  To: linux-drivers-review; +Cc: linux-doc

No semantic changes.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/cputopology.txt | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index c6e7e9196a8b..2698da7e4f49 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -3,76 +3,76 @@ How CPU topology info is exported via sysfs
 ===========================================
 
 Export CPU topology info via sysfs. Items (attributes) are similar
-to /proc/cpuinfo output of some architectures:
+to /proc/cpuinfo output of some architectures.  They reside in
+/sys/devices/system/cpu/cpuX/topology/:
 
-1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
+physical_package_id:
 
 	physical package id of cpuX. Typically corresponds to a physical
 	socket number, but the actual value is architecture and platform
 	dependent.
 
-2) /sys/devices/system/cpu/cpuX/topology/core_id:
+core_id:
 
 	the CPU core ID of cpuX. Typically it is the hardware platform's
 	identifier (rather than the kernel's).  The actual value is
 	architecture and platform dependent.
 
-3) /sys/devices/system/cpu/cpuX/topology/book_id:
+book_id:
 
 	the book ID of cpuX. Typically it is the hardware platform's
 	identifier (rather than the kernel's).	The actual value is
 	architecture and platform dependent.
 
-4) /sys/devices/system/cpu/cpuX/topology/drawer_id:
+drawer_id:
 
 	the drawer ID of cpuX. Typically it is the hardware platform's
 	identifier (rather than the kernel's).	The actual value is
 	architecture and platform dependent.
 
-5) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
+thread_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
 	core as cpuX.
 
-6) /sys/devices/system/cpu/cpuX/topology/thread_siblings_list:
+thread_siblings_list:
 
 	human-readable list of cpuX's hardware threads within the same
 	core as cpuX.
 
-7) /sys/devices/system/cpu/cpuX/topology/core_siblings:
+core_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
 	physical_package_id.
 
-8) /sys/devices/system/cpu/cpuX/topology/core_siblings_list:
+core_siblings_list:
 
 	human-readable list of cpuX's hardware threads within the same
 	physical_package_id.
 
-9) /sys/devices/system/cpu/cpuX/topology/book_siblings:
+book_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
 	book_id.
 
-10) /sys/devices/system/cpu/cpuX/topology/book_siblings_list:
+book_siblings_list:
 
 	human-readable list of cpuX's hardware threads within the same
 	book_id.
 
-11) /sys/devices/system/cpu/cpuX/topology/drawer_siblings:
+drawer_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
 	drawer_id.
 
-12) /sys/devices/system/cpu/cpuX/topology/drawer_siblings_list:
+drawer_siblings_list:
 
 	human-readable list of cpuX's hardware threads within the same
 	drawer_id.
 
-To implement it in an architecture-neutral way, a new source file,
-drivers/base/topology.c, is to export the 6 to 12 attributes. The book
-and drawer related sysfs files will only be created if CONFIG_SCHED_BOOK
-and CONFIG_SCHED_DRAWER are selected.
+Architecture-neutral, drivers/base/topology.c, exports these attributes.
+However, the book and drawer related sysfs files will only be created if
+CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
 
 CONFIG_SCHED_BOOK and CONFIG_DRAWER are currently only used on s390, where
 they reflect the cpu and cache hierarchy.
-- 
2.18.0-rc0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-19  3:22 ` [PATCH 01/11] x86 topology: fix doc typo Len Brown
  2019-02-19  3:22   ` [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording Len Brown
@ 2019-02-19  3:22   ` Len Brown
  2019-02-19  3:40     ` Len Brown
                       ` (4 more replies)
  2019-02-19  3:22   ` [PATCH 04/11] cpu topology: export die_id Len Brown
                     ` (2 subsequent siblings)
  4 siblings, 5 replies; 31+ messages in thread
From: Len Brown @ 2019-02-19  3:22 UTC (permalink / raw)
  To: linux-drivers-review; +Cc: linux-doc

Some new systems have multiple software-visible die within each package.
The new CPUID.1F leaf can enumerate this multi-die/package topology.

CPUID.1F a super-set of the CPUID.B "Extended Toplogy Leaf",
and a common updated routine can parse either leaf.

Legacy systems without CPUID.1F, and systems without multi-die/package
hardware, will see no functional change from this patch series.

Multi-die/package systems will use CPUID.B before this patch,
and CPUID.1F after this patch.  In the CPUID.B case, all die appear
(incorrectly) to software as individual packages.  In the CPUID.1F case,
the package id's reflect reality, and die_id's become meaningful.

Subsequent patches in this series update the kernel to be multi-die aware.
In particular, some software needs to know the difference between
a die-scope MSR and a package-scope MSR.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/x86/topology.txt   |  4 ++
 arch/x86/include/asm/processor.h |  4 +-
 arch/x86/kernel/cpu/topology.c   | 82 ++++++++++++++++++++++++--------
 arch/x86/kernel/smpboot.c        |  4 +-
 4 files changed, 73 insertions(+), 21 deletions(-)

diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
index 06b3cdbc4048..8107b6cfc9ea 100644
--- a/Documentation/x86/topology.txt
+++ b/Documentation/x86/topology.txt
@@ -46,6 +46,10 @@ The topology of a system is described in the units of:
 
     The number of cores in a package. This information is retrieved via CPUID.
 
+  - cpuinfo_x86.x86_max_dies:
+
+    The number of dies in a package. This information is retrieved via CPUID.
+
   - cpuinfo_x86.phys_proc_id:
 
     The physical ID of the package. This information is retrieved via CPUID
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 33051436c864..f2856fe03715 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -105,7 +105,8 @@ struct cpuinfo_x86 {
 	int			x86_power;
 	unsigned long		loops_per_jiffy;
 	/* cpuid returned max cores value: */
-	u16			 x86_max_cores;
+	u16			x86_max_cores;
+	u16			x86_max_dies;
 	u16			apicid;
 	u16			initial_apicid;
 	u16			x86_clflush_size;
@@ -117,6 +118,7 @@ struct cpuinfo_x86 {
 	u16			logical_proc_id;
 	/* Core id: */
 	u16			cpu_core_id;
+	u16			cpu_die_id;
 	/* Index into per_cpu list: */
 	u16			cpu_index;
 	u32			microcode;
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 8f6c784141d1..6dce6ee77849 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -15,33 +15,61 @@
 /* leaf 0xb SMT level */
 #define SMT_LEVEL	0
 
-/* leaf 0xb sub-leaf types */
+/* extended topology sub-leaf types */
 #define INVALID_TYPE	0
 #define SMT_TYPE	1
 #define CORE_TYPE	2
+#define DIE_TYPE	5
 
 #define LEAFB_SUBTYPE(ecx)		(((ecx) >> 8) & 0xff)
 #define BITS_SHIFT_NEXT_LEVEL(eax)	((eax) & 0x1f)
 #define LEVEL_MAX_SIBLINGS(ebx)		((ebx) & 0xffff)
 
-int detect_extended_topology_early(struct cpuinfo_x86 *c)
-{
 #ifdef CONFIG_SMP
+/*
+ * Check if given CPUID extended toplogy "leaf" is implemented
+ */
+static int check_extended_topology_leaf(int leaf)
+{
 	unsigned int eax, ebx, ecx, edx;
 
-	if (c->cpuid_level < 0xb)
+	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+
+	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
 		return -1;
 
-	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+	return 0;
+}
+/*
+ * Return best CPUID Extended Toplogy Leaf supported
+ */
+static int detect_extended_topology_leaf(struct cpuinfo_x86 *c)
+{
+	if (c->cpuid_level >= 0x1f)
+		if (check_extended_topology_leaf(0x1f) == 0)
+			return 0x1f;
 
-	/*
-	 * check if the cpuid leaf 0xb is actually implemented.
-	 */
-	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
+	if (c->cpuid_level >= 0xb)
+		if (check_extended_topology_leaf(0xb) == 0)
+			return 0xb;
+
+	return -1;
+}
+#endif
+
+int detect_extended_topology_early(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+	unsigned int eax, ebx, ecx, edx;
+	int leaf;
+
+	leaf = detect_extended_topology_leaf(c);
+	if (leaf < 0)
 		return -1;
 
 	set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
 
+	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
 	/*
 	 * initial apic id, which also represents 32-bit extended x2apic id.
 	 */
@@ -52,7 +80,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
 }
 
 /*
- * Check for extended topology enumeration cpuid leaf 0xb and if it
+ * Check for extended topology enumeration cpuid leaf, and if it
  * exists, use it for populating initial_apicid and cpu topology
  * detection.
  */
@@ -60,46 +88,62 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
 	unsigned int eax, ebx, ecx, edx, sub_index;
-	unsigned int ht_mask_width, core_plus_mask_width;
+	unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
 	unsigned int core_select_mask, core_level_siblings;
+	unsigned int die_select_mask, die_level_siblings;
+	int leaf;
 
-	if (detect_extended_topology_early(c) < 0)
+	leaf = detect_extended_topology_leaf(c);
+	if (leaf  < 0)
 		return -1;
 
 	/*
 	 * Populate HT related information from sub-leaf level 0.
 	 */
-	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+	c->initial_apicid = edx;
 	core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
 	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+	die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
 
 	sub_index = 1;
 	do {
-		cpuid_count(0xb, sub_index, &eax, &ebx, &ecx, &edx);
+		cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
 
 		/*
 		 * Check for the Core type in the implemented sub leaves.
 		 */
 		if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
 			core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+			die_level_siblings = core_level_siblings;
 			core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
-			break;
+		}
+		if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
+			die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+			die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
 		}
 
 		sub_index++;
 	} while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
 
 	core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
-
-	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, ht_mask_width)
-						 & core_select_mask;
-	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, core_plus_mask_width);
+	die_select_mask = (~(-1 << die_plus_mask_width)) >>
+				core_plus_mask_width;
+
+	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
+				ht_mask_width) & core_select_mask;
+	c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
+				core_plus_mask_width) & die_select_mask;
+	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
+				die_plus_mask_width);
 	/*
 	 * Reinit the apicid, now that we have extended initial_apicid.
 	 */
 	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
 
 	c->x86_max_cores = (core_level_siblings / smp_num_siblings);
+	c->x86_max_dies = (die_level_siblings / core_level_siblings);
 #endif
 	return 0;
 }
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ccd1f2a8e557..4250a87f57db 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
 
 		if (c->phys_proc_id == o->phys_proc_id &&
+		    c->cpu_die_id == o->cpu_die_id &&
 		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
 			if (c->cpu_core_id == o->cpu_core_id)
 				return topology_sane(c, o, "smt");
@@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 		}
 
 	} else if (c->phys_proc_id == o->phys_proc_id &&
+		   c->cpu_die_id == o->cpu_die_id &&
 		   c->cpu_core_id == o->cpu_core_id) {
 		return topology_sane(c, o, "smt");
 	}
@@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
  */
 static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-	if (c->phys_proc_id == o->phys_proc_id)
+	if (c->cpu_die_id == o->cpu_die_id)
 		return true;
 	return false;
 }
-- 
2.18.0-rc0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 04/11] cpu topology: export die_id
  2019-02-19  3:22 ` [PATCH 01/11] x86 topology: fix doc typo Len Brown
  2019-02-19  3:22   ` [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording Len Brown
  2019-02-19  3:22   ` [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support Len Brown
@ 2019-02-19  3:22   ` Len Brown
  2019-02-19  3:40     ` Len Brown
  2019-02-19  3:22   ` [PATCH 05/11] x86 topology: export die_siblings Len Brown
  2019-02-19  3:40   ` [PATCH 01/11] x86 topology: fix doc typo Len Brown
  4 siblings, 1 reply; 31+ messages in thread
From: Len Brown @ 2019-02-19  3:22 UTC (permalink / raw)
  To: linux-drivers-review; +Cc: linux-doc

Export die_id in cpu topology, for the benefit of hardware that
has multiple die per package.  die_id is quite similar to core_id,
in that it holds multiple CPUs, but is inside a package.

This needed by topology-aware user-space.  In particular,
an application, such as turbostat(8), which needs to know
the actual scope of "package-scope" MSRs.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/cputopology.txt   | 10 ++++++++--
 arch/x86/include/asm/topology.h |  1 +
 drivers/base/topology.c         |  4 ++++
 include/linux/topology.h        |  3 +++
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 2698da7e4f49..287213b4517b 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -12,6 +12,12 @@ physical_package_id:
 	socket number, but the actual value is architecture and platform
 	dependent.
 
+die_id:
+
+	the CPU die ID of cpuX. Typically it is the hardware platform's
+	identifier (rather than the kernel's).  The actual value is
+	architecture and platform dependent.
+
 core_id:
 
 	the CPU core ID of cpuX. Typically it is the hardware platform's
@@ -43,12 +49,12 @@ thread_siblings_list:
 core_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
-	physical_package_id.
+	die_id.
 
 core_siblings_list:
 
 	human-readable list of cpuX's hardware threads within the same
-	physical_package_id.
+	die_id.
 
 book_siblings:
 
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 453cf38a1c33..281be6bbc80d 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -106,6 +106,7 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);
 
 #define topology_logical_package_id(cpu)	(cpu_data(cpu).logical_proc_id)
 #define topology_physical_package_id(cpu)	(cpu_data(cpu).phys_proc_id)
+#define topology_die_id(cpu)			(cpu_data(cpu).cpu_die_id)
 #define topology_core_id(cpu)			(cpu_data(cpu).cpu_core_id)
 
 #ifdef CONFIG_SMP
diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index 5fd9f167ecc1..50352cf96f85 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -43,6 +43,9 @@ static ssize_t name##_list_show(struct device *dev,			\
 define_id_show_func(physical_package_id);
 static DEVICE_ATTR_RO(physical_package_id);
 
+define_id_show_func(die_id);
+static DEVICE_ATTR_RO(die_id);
+
 define_id_show_func(core_id);
 static DEVICE_ATTR_RO(core_id);
 
@@ -72,6 +75,7 @@ static DEVICE_ATTR_RO(drawer_siblings_list);
 
 static struct attribute *default_attrs[] = {
 	&dev_attr_physical_package_id.attr,
+	&dev_attr_die_id.attr,
 	&dev_attr_core_id.attr,
 	&dev_attr_thread_siblings.attr,
 	&dev_attr_thread_siblings_list.attr,
diff --git a/include/linux/topology.h b/include/linux/topology.h
index cb0775e1ee4b..5cc8595dd0e4 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -184,6 +184,9 @@ static inline int cpu_to_mem(int cpu)
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
+#ifndef topology_die_id
+#define topology_die_id(cpu)			((void)(cpu), -1)
+#endif
 #ifndef topology_core_id
 #define topology_core_id(cpu)			((void)(cpu), 0)
 #endif
-- 
2.18.0-rc0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 05/11] x86 topology: export die_siblings
  2019-02-19  3:22 ` [PATCH 01/11] x86 topology: fix doc typo Len Brown
                     ` (2 preceding siblings ...)
  2019-02-19  3:22   ` [PATCH 04/11] cpu topology: export die_id Len Brown
@ 2019-02-19  3:22   ` Len Brown
  2019-02-19  3:40     ` Len Brown
                       ` (2 more replies)
  2019-02-19  3:40   ` [PATCH 01/11] x86 topology: fix doc typo Len Brown
  4 siblings, 3 replies; 31+ messages in thread
From: Len Brown @ 2019-02-19  3:22 UTC (permalink / raw)
  To: linux-drivers-review; +Cc: linux-doc

like core_siblings, except it shows which die are in the same package.

This is needed for lscpu(1) to correctly display die topology.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/cputopology.txt   | 10 ++++++++++
 arch/x86/include/asm/smp.h      |  1 +
 arch/x86/include/asm/topology.h |  1 +
 arch/x86/kernel/smpboot.c       | 20 ++++++++++++++++++++
 arch/x86/xen/smp_pv.c           |  1 +
 drivers/base/topology.c         |  6 ++++++
 include/linux/topology.h        |  3 +++
 7 files changed, 42 insertions(+)

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 287213b4517b..7dd2ae3df233 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -56,6 +56,16 @@ core_siblings_list:
 	human-readable list of cpuX's hardware threads within the same
 	die_id.
 
+die_siblings:
+
+	internal kernel map of cpuX's hardware threads within the same
+	physical_package_id.
+
+die_siblings_list:
+
+	human-readable list of cpuX's hardware threads within the same
+	physical_package_id.
+
 book_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 2e95b6c1bca3..39266d193597 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -23,6 +23,7 @@ extern unsigned int num_processors;
 
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
+DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
 /* cpus sharing the last level cache: */
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
 DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id);
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 281be6bbc80d..a52a572147ba 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -110,6 +110,7 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);
 #define topology_core_id(cpu)			(cpu_data(cpu).cpu_core_id)
 
 #ifdef CONFIG_SMP
+#define topology_die_cpumask(cpu)		(per_cpu(cpu_die_map, cpu))
 #define topology_core_cpumask(cpu)		(per_cpu(cpu_core_map, cpu))
 #define topology_sibling_cpumask(cpu)		(per_cpu(cpu_sibling_map, cpu))
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4250a87f57db..42d37e4a1918 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -90,6 +90,10 @@ EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
 EXPORT_PER_CPU_SYMBOL(cpu_core_map);
 
+/* representing HT and core and die siblings of each logical CPU */
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
+EXPORT_PER_CPU_SYMBOL(cpu_die_map);
+
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
 
 /* Per CPU bogomips and other parameters */
@@ -461,6 +465,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
  * multicore group inside a NUMA node.  If this happens, we will
  * discard the MC level of the topology later.
  */
+static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	if (c->phys_proc_id == o->phys_proc_id)
+		return true;
+	return false;
+}
 static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
 	if (c->cpu_die_id == o->cpu_die_id)
@@ -530,6 +540,7 @@ void set_cpu_sibling_map(int cpu)
 		cpumask_set_cpu(cpu, topology_sibling_cpumask(cpu));
 		cpumask_set_cpu(cpu, cpu_llc_shared_mask(cpu));
 		cpumask_set_cpu(cpu, topology_core_cpumask(cpu));
+		cpumask_set_cpu(cpu, topology_die_cpumask(cpu));
 		c->booted_cores = 1;
 		return;
 	}
@@ -576,8 +587,12 @@ void set_cpu_sibling_map(int cpu)
 			} else if (i != cpu && !c->booted_cores)
 				c->booted_cores = cpu_data(i).booted_cores;
 		}
+
 		if (match_die(c, o) && !topology_same_node(c, o))
 			x86_has_numa_in_package = true;
+
+		if ((i == cpu) || (has_mp && match_pkg(c, o)))
+			link_mask(topology_die_cpumask, cpu, i);
 	}
 
 	threads = cpumask_weight(topology_sibling_cpumask(cpu));
@@ -1173,6 +1188,7 @@ static __init void disable_smp(void)
 		physid_set_mask_of_physid(0, &phys_cpu_present_map);
 	cpumask_set_cpu(0, topology_sibling_cpumask(0));
 	cpumask_set_cpu(0, topology_core_cpumask(0));
+	cpumask_set_cpu(0, topology_die_cpumask(0));
 }
 
 /*
@@ -1268,6 +1284,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	for_each_possible_cpu(i) {
 		zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
+		zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
 	}
 
@@ -1488,6 +1505,8 @@ static void remove_siblinginfo(int cpu)
 			cpu_data(sibling).booted_cores--;
 	}
 
+	for_each_cpu(sibling, topology_die_cpumask(cpu))
+		cpumask_clear_cpu(cpu, topology_die_cpumask(sibling));
 	for_each_cpu(sibling, topology_sibling_cpumask(cpu))
 		cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
 	for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
@@ -1495,6 +1514,7 @@ static void remove_siblinginfo(int cpu)
 	cpumask_clear(cpu_llc_shared_mask(cpu));
 	cpumask_clear(topology_sibling_cpumask(cpu));
 	cpumask_clear(topology_core_cpumask(cpu));
+	cpumask_clear(topology_die_cpumask(cpu));
 	c->cpu_core_id = 0;
 	c->booted_cores = 0;
 	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 145506f9fdbe..ac13b0be8448 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -251,6 +251,7 @@ static void __init xen_pv_smp_prepare_cpus(unsigned int max_cpus)
 	for_each_possible_cpu(i) {
 		zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
+		zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
 	}
 	set_cpu_sibling_map(0);
diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index 50352cf96f85..5b1317ae3262 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -57,6 +57,10 @@ define_siblings_show_func(core_siblings, core_cpumask);
 static DEVICE_ATTR_RO(core_siblings);
 static DEVICE_ATTR_RO(core_siblings_list);
 
+define_siblings_show_func(die_siblings, die_cpumask);
+static DEVICE_ATTR_RO(die_siblings);
+static DEVICE_ATTR_RO(die_siblings_list);
+
 #ifdef CONFIG_SCHED_BOOK
 define_id_show_func(book_id);
 static DEVICE_ATTR_RO(book_id);
@@ -81,6 +85,8 @@ static struct attribute *default_attrs[] = {
 	&dev_attr_thread_siblings_list.attr,
 	&dev_attr_core_siblings.attr,
 	&dev_attr_core_siblings_list.attr,
+	&dev_attr_die_siblings.attr,
+	&dev_attr_die_siblings_list.attr,
 #ifdef CONFIG_SCHED_BOOK
 	&dev_attr_book_id.attr,
 	&dev_attr_book_siblings.attr,
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 5cc8595dd0e4..47a3e3c08036 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -196,6 +196,9 @@ static inline int cpu_to_mem(int cpu)
 #ifndef topology_core_cpumask
 #define topology_core_cpumask(cpu)		cpumask_of(cpu)
 #endif
+#ifndef topology_die_cpumask
+#define topology_die_cpumask(cpu)		cpumask_of(cpu)
+#endif
 
 #ifdef CONFIG_SCHED_SMT
 static inline const struct cpumask *cpu_smt_mask(int cpu)
-- 
2.18.0-rc0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 01/11] x86 topology: fix doc typo
  2019-02-19  3:22 ` [PATCH 01/11] x86 topology: fix doc typo Len Brown
                     ` (3 preceding siblings ...)
  2019-02-19  3:22   ` [PATCH 05/11] x86 topology: export die_siblings Len Brown
@ 2019-02-19  3:40   ` Len Brown
  4 siblings, 0 replies; 31+ messages in thread
From: Len Brown @ 2019-02-19  3:40 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, Len Brown, linux-doc

From: Len Brown <len.brown@intel.com>

reflect actual cpuinfo_x86 field name:

s/logical_id/logical_proc_id/

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/x86/topology.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
index 2953e3ec9a02..06b3cdbc4048 100644
--- a/Documentation/x86/topology.txt
+++ b/Documentation/x86/topology.txt
@@ -51,7 +51,7 @@ The topology of a system is described in the units of:
     The physical ID of the package. This information is retrieved via CPUID
     and deduced from the APIC IDs of the cores in the package.
 
-  - cpuinfo_x86.logical_id:
+  - cpuinfo_x86.logical_proc_id:
 
     The logical ID of the package. As we do not trust BIOSes to enumerate the
     packages in a consistent way, we introduced the concept of logical package
-- 
2.18.0-rc0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording
  2019-02-19  3:22   ` [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording Len Brown
@ 2019-02-19  3:40     ` Len Brown
  2019-02-19 19:33     ` Randy Dunlap
  1 sibling, 0 replies; 31+ messages in thread
From: Len Brown @ 2019-02-19  3:40 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, Len Brown, linux-doc

From: Len Brown <len.brown@intel.com>

No semantic changes.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/cputopology.txt | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index c6e7e9196a8b..2698da7e4f49 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -3,76 +3,76 @@ How CPU topology info is exported via sysfs
 ===========================================
 
 Export CPU topology info via sysfs. Items (attributes) are similar
-to /proc/cpuinfo output of some architectures:
+to /proc/cpuinfo output of some architectures.  They reside in
+/sys/devices/system/cpu/cpuX/topology/:
 
-1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
+physical_package_id:
 
 	physical package id of cpuX. Typically corresponds to a physical
 	socket number, but the actual value is architecture and platform
 	dependent.
 
-2) /sys/devices/system/cpu/cpuX/topology/core_id:
+core_id:
 
 	the CPU core ID of cpuX. Typically it is the hardware platform's
 	identifier (rather than the kernel's).  The actual value is
 	architecture and platform dependent.
 
-3) /sys/devices/system/cpu/cpuX/topology/book_id:
+book_id:
 
 	the book ID of cpuX. Typically it is the hardware platform's
 	identifier (rather than the kernel's).	The actual value is
 	architecture and platform dependent.
 
-4) /sys/devices/system/cpu/cpuX/topology/drawer_id:
+drawer_id:
 
 	the drawer ID of cpuX. Typically it is the hardware platform's
 	identifier (rather than the kernel's).	The actual value is
 	architecture and platform dependent.
 
-5) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
+thread_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
 	core as cpuX.
 
-6) /sys/devices/system/cpu/cpuX/topology/thread_siblings_list:
+thread_siblings_list:
 
 	human-readable list of cpuX's hardware threads within the same
 	core as cpuX.
 
-7) /sys/devices/system/cpu/cpuX/topology/core_siblings:
+core_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
 	physical_package_id.
 
-8) /sys/devices/system/cpu/cpuX/topology/core_siblings_list:
+core_siblings_list:
 
 	human-readable list of cpuX's hardware threads within the same
 	physical_package_id.
 
-9) /sys/devices/system/cpu/cpuX/topology/book_siblings:
+book_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
 	book_id.
 
-10) /sys/devices/system/cpu/cpuX/topology/book_siblings_list:
+book_siblings_list:
 
 	human-readable list of cpuX's hardware threads within the same
 	book_id.
 
-11) /sys/devices/system/cpu/cpuX/topology/drawer_siblings:
+drawer_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
 	drawer_id.
 
-12) /sys/devices/system/cpu/cpuX/topology/drawer_siblings_list:
+drawer_siblings_list:
 
 	human-readable list of cpuX's hardware threads within the same
 	drawer_id.
 
-To implement it in an architecture-neutral way, a new source file,
-drivers/base/topology.c, is to export the 6 to 12 attributes. The book
-and drawer related sysfs files will only be created if CONFIG_SCHED_BOOK
-and CONFIG_SCHED_DRAWER are selected.
+Architecture-neutral, drivers/base/topology.c, exports these attributes.
+However, the book and drawer related sysfs files will only be created if
+CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
 
 CONFIG_SCHED_BOOK and CONFIG_DRAWER are currently only used on s390, where
 they reflect the cpu and cache hierarchy.
-- 
2.18.0-rc0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-19  3:22   ` [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support Len Brown
@ 2019-02-19  3:40     ` Len Brown
  2019-02-19 16:49     ` Liang, Kan
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Len Brown @ 2019-02-19  3:40 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, Len Brown, linux-doc

From: Len Brown <len.brown@intel.com>

Some new systems have multiple software-visible die within each package.
The new CPUID.1F leaf can enumerate this multi-die/package topology.

CPUID.1F a super-set of the CPUID.B "Extended Toplogy Leaf",
and a common updated routine can parse either leaf.

Legacy systems without CPUID.1F, and systems without multi-die/package
hardware, will see no functional change from this patch series.

Multi-die/package systems will use CPUID.B before this patch,
and CPUID.1F after this patch.  In the CPUID.B case, all die appear
(incorrectly) to software as individual packages.  In the CPUID.1F case,
the package id's reflect reality, and die_id's become meaningful.

Subsequent patches in this series update the kernel to be multi-die aware.
In particular, some software needs to know the difference between
a die-scope MSR and a package-scope MSR.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/x86/topology.txt   |  4 ++
 arch/x86/include/asm/processor.h |  4 +-
 arch/x86/kernel/cpu/topology.c   | 82 ++++++++++++++++++++++++--------
 arch/x86/kernel/smpboot.c        |  4 +-
 4 files changed, 73 insertions(+), 21 deletions(-)

diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
index 06b3cdbc4048..8107b6cfc9ea 100644
--- a/Documentation/x86/topology.txt
+++ b/Documentation/x86/topology.txt
@@ -46,6 +46,10 @@ The topology of a system is described in the units of:
 
     The number of cores in a package. This information is retrieved via CPUID.
 
+  - cpuinfo_x86.x86_max_dies:
+
+    The number of dies in a package. This information is retrieved via CPUID.
+
   - cpuinfo_x86.phys_proc_id:
 
     The physical ID of the package. This information is retrieved via CPUID
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 33051436c864..f2856fe03715 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -105,7 +105,8 @@ struct cpuinfo_x86 {
 	int			x86_power;
 	unsigned long		loops_per_jiffy;
 	/* cpuid returned max cores value: */
-	u16			 x86_max_cores;
+	u16			x86_max_cores;
+	u16			x86_max_dies;
 	u16			apicid;
 	u16			initial_apicid;
 	u16			x86_clflush_size;
@@ -117,6 +118,7 @@ struct cpuinfo_x86 {
 	u16			logical_proc_id;
 	/* Core id: */
 	u16			cpu_core_id;
+	u16			cpu_die_id;
 	/* Index into per_cpu list: */
 	u16			cpu_index;
 	u32			microcode;
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 8f6c784141d1..6dce6ee77849 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -15,33 +15,61 @@
 /* leaf 0xb SMT level */
 #define SMT_LEVEL	0
 
-/* leaf 0xb sub-leaf types */
+/* extended topology sub-leaf types */
 #define INVALID_TYPE	0
 #define SMT_TYPE	1
 #define CORE_TYPE	2
+#define DIE_TYPE	5
 
 #define LEAFB_SUBTYPE(ecx)		(((ecx) >> 8) & 0xff)
 #define BITS_SHIFT_NEXT_LEVEL(eax)	((eax) & 0x1f)
 #define LEVEL_MAX_SIBLINGS(ebx)		((ebx) & 0xffff)
 
-int detect_extended_topology_early(struct cpuinfo_x86 *c)
-{
 #ifdef CONFIG_SMP
+/*
+ * Check if given CPUID extended toplogy "leaf" is implemented
+ */
+static int check_extended_topology_leaf(int leaf)
+{
 	unsigned int eax, ebx, ecx, edx;
 
-	if (c->cpuid_level < 0xb)
+	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+
+	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
 		return -1;
 
-	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+	return 0;
+}
+/*
+ * Return best CPUID Extended Toplogy Leaf supported
+ */
+static int detect_extended_topology_leaf(struct cpuinfo_x86 *c)
+{
+	if (c->cpuid_level >= 0x1f)
+		if (check_extended_topology_leaf(0x1f) == 0)
+			return 0x1f;
 
-	/*
-	 * check if the cpuid leaf 0xb is actually implemented.
-	 */
-	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
+	if (c->cpuid_level >= 0xb)
+		if (check_extended_topology_leaf(0xb) == 0)
+			return 0xb;
+
+	return -1;
+}
+#endif
+
+int detect_extended_topology_early(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+	unsigned int eax, ebx, ecx, edx;
+	int leaf;
+
+	leaf = detect_extended_topology_leaf(c);
+	if (leaf < 0)
 		return -1;
 
 	set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
 
+	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
 	/*
 	 * initial apic id, which also represents 32-bit extended x2apic id.
 	 */
@@ -52,7 +80,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
 }
 
 /*
- * Check for extended topology enumeration cpuid leaf 0xb and if it
+ * Check for extended topology enumeration cpuid leaf, and if it
  * exists, use it for populating initial_apicid and cpu topology
  * detection.
  */
@@ -60,46 +88,62 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
 	unsigned int eax, ebx, ecx, edx, sub_index;
-	unsigned int ht_mask_width, core_plus_mask_width;
+	unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
 	unsigned int core_select_mask, core_level_siblings;
+	unsigned int die_select_mask, die_level_siblings;
+	int leaf;
 
-	if (detect_extended_topology_early(c) < 0)
+	leaf = detect_extended_topology_leaf(c);
+	if (leaf  < 0)
 		return -1;
 
 	/*
 	 * Populate HT related information from sub-leaf level 0.
 	 */
-	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
+	c->initial_apicid = edx;
 	core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
 	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+	die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
 
 	sub_index = 1;
 	do {
-		cpuid_count(0xb, sub_index, &eax, &ebx, &ecx, &edx);
+		cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
 
 		/*
 		 * Check for the Core type in the implemented sub leaves.
 		 */
 		if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
 			core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+			die_level_siblings = core_level_siblings;
 			core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
-			break;
+		}
+		if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
+			die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+			die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
 		}
 
 		sub_index++;
 	} while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
 
 	core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
-
-	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, ht_mask_width)
-						 & core_select_mask;
-	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, core_plus_mask_width);
+	die_select_mask = (~(-1 << die_plus_mask_width)) >>
+				core_plus_mask_width;
+
+	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
+				ht_mask_width) & core_select_mask;
+	c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
+				core_plus_mask_width) & die_select_mask;
+	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
+				die_plus_mask_width);
 	/*
 	 * Reinit the apicid, now that we have extended initial_apicid.
 	 */
 	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
 
 	c->x86_max_cores = (core_level_siblings / smp_num_siblings);
+	c->x86_max_dies = (die_level_siblings / core_level_siblings);
 #endif
 	return 0;
 }
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ccd1f2a8e557..4250a87f57db 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
 
 		if (c->phys_proc_id == o->phys_proc_id &&
+		    c->cpu_die_id == o->cpu_die_id &&
 		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
 			if (c->cpu_core_id == o->cpu_core_id)
 				return topology_sane(c, o, "smt");
@@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 		}
 
 	} else if (c->phys_proc_id == o->phys_proc_id &&
+		   c->cpu_die_id == o->cpu_die_id &&
 		   c->cpu_core_id == o->cpu_core_id) {
 		return topology_sane(c, o, "smt");
 	}
@@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
  */
 static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-	if (c->phys_proc_id == o->phys_proc_id)
+	if (c->cpu_die_id == o->cpu_die_id)
 		return true;
 	return false;
 }
-- 
2.18.0-rc0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 04/11] cpu topology: export die_id
  2019-02-19  3:22   ` [PATCH 04/11] cpu topology: export die_id Len Brown
@ 2019-02-19  3:40     ` Len Brown
  0 siblings, 0 replies; 31+ messages in thread
From: Len Brown @ 2019-02-19  3:40 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, Len Brown, linux-doc

From: Len Brown <len.brown@intel.com>

Export die_id in cpu topology, for the benefit of hardware that
has multiple die per package.  die_id is quite similar to core_id,
in that it holds multiple CPUs, but is inside a package.

This needed by topology-aware user-space.  In particular,
an application, such as turbostat(8), which needs to know
the actual scope of "package-scope" MSRs.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/cputopology.txt   | 10 ++++++++--
 arch/x86/include/asm/topology.h |  1 +
 drivers/base/topology.c         |  4 ++++
 include/linux/topology.h        |  3 +++
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 2698da7e4f49..287213b4517b 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -12,6 +12,12 @@ physical_package_id:
 	socket number, but the actual value is architecture and platform
 	dependent.
 
+die_id:
+
+	the CPU die ID of cpuX. Typically it is the hardware platform's
+	identifier (rather than the kernel's).  The actual value is
+	architecture and platform dependent.
+
 core_id:
 
 	the CPU core ID of cpuX. Typically it is the hardware platform's
@@ -43,12 +49,12 @@ thread_siblings_list:
 core_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
-	physical_package_id.
+	die_id.
 
 core_siblings_list:
 
 	human-readable list of cpuX's hardware threads within the same
-	physical_package_id.
+	die_id.
 
 book_siblings:
 
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 453cf38a1c33..281be6bbc80d 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -106,6 +106,7 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);
 
 #define topology_logical_package_id(cpu)	(cpu_data(cpu).logical_proc_id)
 #define topology_physical_package_id(cpu)	(cpu_data(cpu).phys_proc_id)
+#define topology_die_id(cpu)			(cpu_data(cpu).cpu_die_id)
 #define topology_core_id(cpu)			(cpu_data(cpu).cpu_core_id)
 
 #ifdef CONFIG_SMP
diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index 5fd9f167ecc1..50352cf96f85 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -43,6 +43,9 @@ static ssize_t name##_list_show(struct device *dev,			\
 define_id_show_func(physical_package_id);
 static DEVICE_ATTR_RO(physical_package_id);
 
+define_id_show_func(die_id);
+static DEVICE_ATTR_RO(die_id);
+
 define_id_show_func(core_id);
 static DEVICE_ATTR_RO(core_id);
 
@@ -72,6 +75,7 @@ static DEVICE_ATTR_RO(drawer_siblings_list);
 
 static struct attribute *default_attrs[] = {
 	&dev_attr_physical_package_id.attr,
+	&dev_attr_die_id.attr,
 	&dev_attr_core_id.attr,
 	&dev_attr_thread_siblings.attr,
 	&dev_attr_thread_siblings_list.attr,
diff --git a/include/linux/topology.h b/include/linux/topology.h
index cb0775e1ee4b..5cc8595dd0e4 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -184,6 +184,9 @@ static inline int cpu_to_mem(int cpu)
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
+#ifndef topology_die_id
+#define topology_die_id(cpu)			((void)(cpu), -1)
+#endif
 #ifndef topology_core_id
 #define topology_core_id(cpu)			((void)(cpu), 0)
 #endif
-- 
2.18.0-rc0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 05/11] x86 topology: export die_siblings
  2019-02-19  3:22   ` [PATCH 05/11] x86 topology: export die_siblings Len Brown
@ 2019-02-19  3:40     ` Len Brown
  2019-02-19 16:56     ` Liang, Kan
  2019-02-20 21:52     ` Brice Goglin
  2 siblings, 0 replies; 31+ messages in thread
From: Len Brown @ 2019-02-19  3:40 UTC (permalink / raw)
  To: x86; +Cc: linux-kernel, Len Brown, linux-doc

From: Len Brown <len.brown@intel.com>

like core_siblings, except it shows which die are in the same package.

This is needed for lscpu(1) to correctly display die topology.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
---
 Documentation/cputopology.txt   | 10 ++++++++++
 arch/x86/include/asm/smp.h      |  1 +
 arch/x86/include/asm/topology.h |  1 +
 arch/x86/kernel/smpboot.c       | 20 ++++++++++++++++++++
 arch/x86/xen/smp_pv.c           |  1 +
 drivers/base/topology.c         |  6 ++++++
 include/linux/topology.h        |  3 +++
 7 files changed, 42 insertions(+)

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 287213b4517b..7dd2ae3df233 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -56,6 +56,16 @@ core_siblings_list:
 	human-readable list of cpuX's hardware threads within the same
 	die_id.
 
+die_siblings:
+
+	internal kernel map of cpuX's hardware threads within the same
+	physical_package_id.
+
+die_siblings_list:
+
+	human-readable list of cpuX's hardware threads within the same
+	physical_package_id.
+
 book_siblings:
 
 	internal kernel map of cpuX's hardware threads within the same
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 2e95b6c1bca3..39266d193597 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -23,6 +23,7 @@ extern unsigned int num_processors;
 
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
+DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
 /* cpus sharing the last level cache: */
 DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
 DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id);
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 281be6bbc80d..a52a572147ba 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -110,6 +110,7 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);
 #define topology_core_id(cpu)			(cpu_data(cpu).cpu_core_id)
 
 #ifdef CONFIG_SMP
+#define topology_die_cpumask(cpu)		(per_cpu(cpu_die_map, cpu))
 #define topology_core_cpumask(cpu)		(per_cpu(cpu_core_map, cpu))
 #define topology_sibling_cpumask(cpu)		(per_cpu(cpu_sibling_map, cpu))
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4250a87f57db..42d37e4a1918 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -90,6 +90,10 @@ EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
 EXPORT_PER_CPU_SYMBOL(cpu_core_map);
 
+/* representing HT and core and die siblings of each logical CPU */
+DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
+EXPORT_PER_CPU_SYMBOL(cpu_die_map);
+
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
 
 /* Per CPU bogomips and other parameters */
@@ -461,6 +465,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
  * multicore group inside a NUMA node.  If this happens, we will
  * discard the MC level of the topology later.
  */
+static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	if (c->phys_proc_id == o->phys_proc_id)
+		return true;
+	return false;
+}
 static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
 	if (c->cpu_die_id == o->cpu_die_id)
@@ -530,6 +540,7 @@ void set_cpu_sibling_map(int cpu)
 		cpumask_set_cpu(cpu, topology_sibling_cpumask(cpu));
 		cpumask_set_cpu(cpu, cpu_llc_shared_mask(cpu));
 		cpumask_set_cpu(cpu, topology_core_cpumask(cpu));
+		cpumask_set_cpu(cpu, topology_die_cpumask(cpu));
 		c->booted_cores = 1;
 		return;
 	}
@@ -576,8 +587,12 @@ void set_cpu_sibling_map(int cpu)
 			} else if (i != cpu && !c->booted_cores)
 				c->booted_cores = cpu_data(i).booted_cores;
 		}
+
 		if (match_die(c, o) && !topology_same_node(c, o))
 			x86_has_numa_in_package = true;
+
+		if ((i == cpu) || (has_mp && match_pkg(c, o)))
+			link_mask(topology_die_cpumask, cpu, i);
 	}
 
 	threads = cpumask_weight(topology_sibling_cpumask(cpu));
@@ -1173,6 +1188,7 @@ static __init void disable_smp(void)
 		physid_set_mask_of_physid(0, &phys_cpu_present_map);
 	cpumask_set_cpu(0, topology_sibling_cpumask(0));
 	cpumask_set_cpu(0, topology_core_cpumask(0));
+	cpumask_set_cpu(0, topology_die_cpumask(0));
 }
 
 /*
@@ -1268,6 +1284,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	for_each_possible_cpu(i) {
 		zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
+		zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
 	}
 
@@ -1488,6 +1505,8 @@ static void remove_siblinginfo(int cpu)
 			cpu_data(sibling).booted_cores--;
 	}
 
+	for_each_cpu(sibling, topology_die_cpumask(cpu))
+		cpumask_clear_cpu(cpu, topology_die_cpumask(sibling));
 	for_each_cpu(sibling, topology_sibling_cpumask(cpu))
 		cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
 	for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
@@ -1495,6 +1514,7 @@ static void remove_siblinginfo(int cpu)
 	cpumask_clear(cpu_llc_shared_mask(cpu));
 	cpumask_clear(topology_sibling_cpumask(cpu));
 	cpumask_clear(topology_core_cpumask(cpu));
+	cpumask_clear(topology_die_cpumask(cpu));
 	c->cpu_core_id = 0;
 	c->booted_cores = 0;
 	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 145506f9fdbe..ac13b0be8448 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -251,6 +251,7 @@ static void __init xen_pv_smp_prepare_cpus(unsigned int max_cpus)
 	for_each_possible_cpu(i) {
 		zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
+		zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
 	}
 	set_cpu_sibling_map(0);
diff --git a/drivers/base/topology.c b/drivers/base/topology.c
index 50352cf96f85..5b1317ae3262 100644
--- a/drivers/base/topology.c
+++ b/drivers/base/topology.c
@@ -57,6 +57,10 @@ define_siblings_show_func(core_siblings, core_cpumask);
 static DEVICE_ATTR_RO(core_siblings);
 static DEVICE_ATTR_RO(core_siblings_list);
 
+define_siblings_show_func(die_siblings, die_cpumask);
+static DEVICE_ATTR_RO(die_siblings);
+static DEVICE_ATTR_RO(die_siblings_list);
+
 #ifdef CONFIG_SCHED_BOOK
 define_id_show_func(book_id);
 static DEVICE_ATTR_RO(book_id);
@@ -81,6 +85,8 @@ static struct attribute *default_attrs[] = {
 	&dev_attr_thread_siblings_list.attr,
 	&dev_attr_core_siblings.attr,
 	&dev_attr_core_siblings_list.attr,
+	&dev_attr_die_siblings.attr,
+	&dev_attr_die_siblings_list.attr,
 #ifdef CONFIG_SCHED_BOOK
 	&dev_attr_book_id.attr,
 	&dev_attr_book_siblings.attr,
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 5cc8595dd0e4..47a3e3c08036 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -196,6 +196,9 @@ static inline int cpu_to_mem(int cpu)
 #ifndef topology_core_cpumask
 #define topology_core_cpumask(cpu)		cpumask_of(cpu)
 #endif
+#ifndef topology_die_cpumask
+#define topology_die_cpumask(cpu)		cpumask_of(cpu)
+#endif
 
 #ifdef CONFIG_SCHED_SMT
 static inline const struct cpumask *cpu_smt_mask(int cpu)
-- 
2.18.0-rc0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-19  3:22   ` [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support Len Brown
  2019-02-19  3:40     ` Len Brown
@ 2019-02-19 16:49     ` Liang, Kan
  2019-02-19 19:27       ` Brown, Len
  2019-02-20  2:59     ` Like Xu
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Liang, Kan @ 2019-02-19 16:49 UTC (permalink / raw)
  To: Len Brown, x86; +Cc: linux-kernel, Len Brown, linux-doc



On 2/18/2019 10:40 PM, Len Brown wrote:
> From: Len Brown <len.brown@intel.com>
> 
> Some new systems have multiple software-visible die within each package.
> The new CPUID.1F leaf can enumerate this multi-die/package topology.
> 
> CPUID.1F a super-set of the CPUID.B "Extended Toplogy Leaf",
> and a common updated routine can parse either leaf.
> 
> Legacy systems without CPUID.1F, and systems without multi-die/package
> hardware, will see no functional change from this patch series.
> 
> Multi-die/package systems will use CPUID.B before this patch,
> and CPUID.1F after this patch.  In the CPUID.B case, all die appear
> (incorrectly) to software as individual packages.  In the CPUID.1F case,
> the package id's reflect reality, and die_id's become meaningful.
> 
> Subsequent patches in this series update the kernel to be multi-die aware.
> In particular, some software needs to know the difference between
> a die-scope MSR and a package-scope MSR.
> 
> Signed-off-by: Len Brown <len.brown@intel.com>
> Cc: linux-doc@vger.kernel.org
> Signed-off-by: Len Brown <len.brown@intel.com>
> ---
>   Documentation/x86/topology.txt   |  4 ++
>   arch/x86/include/asm/processor.h |  4 +-
>   arch/x86/kernel/cpu/topology.c   | 82 ++++++++++++++++++++++++--------
>   arch/x86/kernel/smpboot.c        |  4 +-
>   4 files changed, 73 insertions(+), 21 deletions(-)
> 
> diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
> index 06b3cdbc4048..8107b6cfc9ea 100644
> --- a/Documentation/x86/topology.txt
> +++ b/Documentation/x86/topology.txt
> @@ -46,6 +46,10 @@ The topology of a system is described in the units of:
>   
>       The number of cores in a package. This information is retrieved via CPUID.
>   
> +  - cpuinfo_x86.x86_max_dies:
> +
> +    The number of dies in a package. This information is retrieved via CPUID.
> +
>     - cpuinfo_x86.phys_proc_id:
>   
>       The physical ID of the package. This information is retrieved via CPUID
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 33051436c864..f2856fe03715 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -105,7 +105,8 @@ struct cpuinfo_x86 {
>   	int			x86_power;
>   	unsigned long		loops_per_jiffy;
>   	/* cpuid returned max cores value: */
> -	u16			 x86_max_cores;
> +	u16			x86_max_cores;
> +	u16			x86_max_dies;
>   	u16			apicid;
>   	u16			initial_apicid;
>   	u16			x86_clflush_size;
> @@ -117,6 +118,7 @@ struct cpuinfo_x86 {
>   	u16			logical_proc_id;
>   	/* Core id: */
>   	u16			cpu_core_id;
> +	u16			cpu_die_id;
>   	/* Index into per_cpu list: */
>   	u16			cpu_index;
>   	u32			microcode;
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 8f6c784141d1..6dce6ee77849 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -15,33 +15,61 @@
>   /* leaf 0xb SMT level */
>   #define SMT_LEVEL	0
>   
> -/* leaf 0xb sub-leaf types */
> +/* extended topology sub-leaf types */
>   #define INVALID_TYPE	0
>   #define SMT_TYPE	1
>   #define CORE_TYPE	2
> +#define DIE_TYPE	5
>   
>   #define LEAFB_SUBTYPE(ecx)		(((ecx) >> 8) & 0xff)
>   #define BITS_SHIFT_NEXT_LEVEL(eax)	((eax) & 0x1f)
>   #define LEVEL_MAX_SIBLINGS(ebx)		((ebx) & 0xffff)
>   
> -int detect_extended_topology_early(struct cpuinfo_x86 *c)
> -{
>   #ifdef CONFIG_SMP
> +/*
> + * Check if given CPUID extended toplogy "leaf" is implemented
> + */
> +static int check_extended_topology_leaf(int leaf)
> +{
>   	unsigned int eax, ebx, ecx, edx;
>   
> -	if (c->cpuid_level < 0xb)
> +	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +
> +	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
>   		return -1;
>   
> -	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +	return 0;
> +}
> +/*
> + * Return best CPUID Extended Toplogy Leaf supported
> + */
> +static int detect_extended_topology_leaf(struct cpuinfo_x86 *c)
> +{
> +	if (c->cpuid_level >= 0x1f)
> +		if (check_extended_topology_leaf(0x1f) == 0)
> +			return 0x1f;
>   
> -	/*
> -	 * check if the cpuid leaf 0xb is actually implemented.
> -	 */
> -	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
> +	if (c->cpuid_level >= 0xb)
> +		if (check_extended_topology_leaf(0xb) == 0)
> +			return 0xb;
> +
> +	return -1;
> +}
> +#endif
> +
> +int detect_extended_topology_early(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> +	unsigned int eax, ebx, ecx, edx;
> +	int leaf;
> +
> +	leaf = detect_extended_topology_leaf(c);
> +	if (leaf < 0)
>   		return -1;
>   
>   	set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
>   
> +	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
>   	/*
>   	 * initial apic id, which also represents 32-bit extended x2apic id.
>   	 */
> @@ -52,7 +80,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
>   }
>   
>   /*
> - * Check for extended topology enumeration cpuid leaf 0xb and if it
> + * Check for extended topology enumeration cpuid leaf, and if it
>    * exists, use it for populating initial_apicid and cpu topology
>    * detection.
>    */
> @@ -60,46 +88,62 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
>   {
>   #ifdef CONFIG_SMP
>   	unsigned int eax, ebx, ecx, edx, sub_index;
> -	unsigned int ht_mask_width, core_plus_mask_width;
> +	unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
>   	unsigned int core_select_mask, core_level_siblings;
> +	unsigned int die_select_mask, die_level_siblings;
> +	int leaf;
>   
> -	if (detect_extended_topology_early(c) < 0)
> +	leaf = detect_extended_topology_leaf(c);
> +	if (leaf  < 0)
>   		return -1;
>   
>   	/*
>   	 * Populate HT related information from sub-leaf level 0.
>   	 */
> -	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +	c->initial_apicid = edx;
>   	core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
>   	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> +	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +	die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>   
>   	sub_index = 1;
>   	do {
> -		cpuid_count(0xb, sub_index, &eax, &ebx, &ecx, &edx);
> +		cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
>   
>   		/*
>   		 * Check for the Core type in the implemented sub leaves.
>   		 */
>   		if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
>   			core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +			die_level_siblings = core_level_siblings;
>   			core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> -			break;
> +		}
> +		if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
> +			die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +			die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>   		}
>   
>   		sub_index++;
>   	} while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
>   
>   	core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
> -
> -	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, ht_mask_width)
> -						 & core_select_mask;
> -	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, core_plus_mask_width);
> +	die_select_mask = (~(-1 << die_plus_mask_width)) >>
> +				core_plus_mask_width;
> +
> +	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
> +				ht_mask_width) & core_select_mask;
> +	c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
> +				core_plus_mask_width) & die_select_mask;
> +	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
> +				die_plus_mask_width);
>   	/*
>   	 * Reinit the apicid, now that we have extended initial_apicid.
>   	 */
>   	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
>   
>   	c->x86_max_cores = (core_level_siblings / smp_num_siblings);
> +	c->x86_max_dies = (die_level_siblings / core_level_siblings);
>   #endif
>   	return 0;
>   }
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ccd1f2a8e557..4250a87f57db 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>   
>   		if (c->phys_proc_id == o->phys_proc_id &&
> +		    c->cpu_die_id == o->cpu_die_id &&
>   		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
>   			if (c->cpu_core_id == o->cpu_core_id)
>   				return topology_sane(c, o, "smt");
> @@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   		}
>   
>   	} else if (c->phys_proc_id == o->phys_proc_id &&
> +		   c->cpu_die_id == o->cpu_die_id &&
>   		   c->cpu_core_id == o->cpu_core_id) {
>   		return topology_sane(c, o, "smt");
>   	}
> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>    */
>   static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   {
> -	if (c->phys_proc_id == o->phys_proc_id)
> +	if (c->cpu_die_id == o->cpu_die_id)
>   		return true;
>   	return false;
>   }

Shouldn't we check the unique_die_id here?
Die from different package can have the same cpu_die_id.

Thanks,
Kan


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/11] x86 topology: export die_siblings
  2019-02-19  3:22   ` [PATCH 05/11] x86 topology: export die_siblings Len Brown
  2019-02-19  3:40     ` Len Brown
@ 2019-02-19 16:56     ` Liang, Kan
  2019-02-19 18:43       ` Brown, Len
  2019-02-20 21:52     ` Brice Goglin
  2 siblings, 1 reply; 31+ messages in thread
From: Liang, Kan @ 2019-02-19 16:56 UTC (permalink / raw)
  To: Len Brown, x86; +Cc: linux-kernel, Len Brown, linux-doc



On 2/18/2019 10:40 PM, Len Brown wrote:
> From: Len Brown <len.brown@intel.com>
> 
> like core_siblings, except it shows which die are in the same package.
> 
> This is needed for lscpu(1) to correctly display die topology.
> 
> Signed-off-by: Len Brown <len.brown@intel.com>
> Cc: linux-doc@vger.kernel.org
> Signed-off-by: Len Brown <len.brown@intel.com>
> ---
>   Documentation/cputopology.txt   | 10 ++++++++++
>   arch/x86/include/asm/smp.h      |  1 +
>   arch/x86/include/asm/topology.h |  1 +
>   arch/x86/kernel/smpboot.c       | 20 ++++++++++++++++++++
>   arch/x86/xen/smp_pv.c           |  1 +
>   drivers/base/topology.c         |  6 ++++++
>   include/linux/topology.h        |  3 +++
>   7 files changed, 42 insertions(+)
> 
> diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
> index 287213b4517b..7dd2ae3df233 100644
> --- a/Documentation/cputopology.txt
> +++ b/Documentation/cputopology.txt
> @@ -56,6 +56,16 @@ core_siblings_list:
>   	human-readable list of cpuX's hardware threads within the same
>   	die_id.
>   
> +die_siblings:
> +
> +	internal kernel map of cpuX's hardware threads within the same
> +	physical_package_id.
> +
> +die_siblings_list:
> +
> +	human-readable list of cpuX's hardware threads within the same
> +	physical_package_id.
> +
>   book_siblings:
>   
>   	internal kernel map of cpuX's hardware threads within the same
> diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
> index 2e95b6c1bca3..39266d193597 100644
> --- a/arch/x86/include/asm/smp.h
> +++ b/arch/x86/include/asm/smp.h
> @@ -23,6 +23,7 @@ extern unsigned int num_processors;
>   
>   DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>   DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
> +DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
>   /* cpus sharing the last level cache: */
>   DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
>   DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id);
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index 281be6bbc80d..a52a572147ba 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -110,6 +110,7 @@ extern const struct cpumask *cpu_coregroup_mask(int cpu);
>   #define topology_core_id(cpu)			(cpu_data(cpu).cpu_core_id)
>   
>   #ifdef CONFIG_SMP
> +#define topology_die_cpumask(cpu)		(per_cpu(cpu_die_map, cpu))
>   #define topology_core_cpumask(cpu)		(per_cpu(cpu_core_map, cpu))

Could you please update the document regarding to topology_die_cpumask 
and topology_core_cpumask in Documentation/x86/topology.txt

Thanks,
Kan

>   #define topology_sibling_cpumask(cpu)		(per_cpu(cpu_sibling_map, cpu))
>   
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 4250a87f57db..42d37e4a1918 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -90,6 +90,10 @@ EXPORT_PER_CPU_SYMBOL(cpu_sibling_map);
>   DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_core_map);
>   EXPORT_PER_CPU_SYMBOL(cpu_core_map);
>   
> +/* representing HT and core and die siblings of each logical CPU */
> +DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_die_map);
> +EXPORT_PER_CPU_SYMBOL(cpu_die_map);
> +
>   DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map);
>   
>   /* Per CPU bogomips and other parameters */
> @@ -461,6 +465,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>    * multicore group inside a NUMA node.  If this happens, we will
>    * discard the MC level of the topology later.
>    */
> +static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> +{
> +	if (c->phys_proc_id == o->phys_proc_id)
> +		return true;
> +	return false;
> +}
>   static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   {
>   	if (c->cpu_die_id == o->cpu_die_id)
> @@ -530,6 +540,7 @@ void set_cpu_sibling_map(int cpu)
>   		cpumask_set_cpu(cpu, topology_sibling_cpumask(cpu));
>   		cpumask_set_cpu(cpu, cpu_llc_shared_mask(cpu));
>   		cpumask_set_cpu(cpu, topology_core_cpumask(cpu));
> +		cpumask_set_cpu(cpu, topology_die_cpumask(cpu));
>   		c->booted_cores = 1;
>   		return;
>   	}
> @@ -576,8 +587,12 @@ void set_cpu_sibling_map(int cpu)
>   			} else if (i != cpu && !c->booted_cores)
>   				c->booted_cores = cpu_data(i).booted_cores;
>   		}
> +
>   		if (match_die(c, o) && !topology_same_node(c, o))
>   			x86_has_numa_in_package = true;
> +
> +		if ((i == cpu) || (has_mp && match_pkg(c, o)))
> +			link_mask(topology_die_cpumask, cpu, i);
>   	}
>   
>   	threads = cpumask_weight(topology_sibling_cpumask(cpu));
> @@ -1173,6 +1188,7 @@ static __init void disable_smp(void)
>   		physid_set_mask_of_physid(0, &phys_cpu_present_map);
>   	cpumask_set_cpu(0, topology_sibling_cpumask(0));
>   	cpumask_set_cpu(0, topology_core_cpumask(0));
> +	cpumask_set_cpu(0, topology_die_cpumask(0));
>   }
>   
>   /*
> @@ -1268,6 +1284,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
>   	for_each_possible_cpu(i) {
>   		zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
>   		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
> +		zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
>   		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
>   	}
>   
> @@ -1488,6 +1505,8 @@ static void remove_siblinginfo(int cpu)
>   			cpu_data(sibling).booted_cores--;
>   	}
>   
> +	for_each_cpu(sibling, topology_die_cpumask(cpu))
> +		cpumask_clear_cpu(cpu, topology_die_cpumask(sibling));
>   	for_each_cpu(sibling, topology_sibling_cpumask(cpu))
>   		cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
>   	for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
> @@ -1495,6 +1514,7 @@ static void remove_siblinginfo(int cpu)
>   	cpumask_clear(cpu_llc_shared_mask(cpu));
>   	cpumask_clear(topology_sibling_cpumask(cpu));
>   	cpumask_clear(topology_core_cpumask(cpu));
> +	cpumask_clear(topology_die_cpumask(cpu));
>   	c->cpu_core_id = 0;
>   	c->booted_cores = 0;
>   	cpumask_clear_cpu(cpu, cpu_sibling_setup_mask);
> diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
> index 145506f9fdbe..ac13b0be8448 100644
> --- a/arch/x86/xen/smp_pv.c
> +++ b/arch/x86/xen/smp_pv.c
> @@ -251,6 +251,7 @@ static void __init xen_pv_smp_prepare_cpus(unsigned int max_cpus)
>   	for_each_possible_cpu(i) {
>   		zalloc_cpumask_var(&per_cpu(cpu_sibling_map, i), GFP_KERNEL);
>   		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
> +		zalloc_cpumask_var(&per_cpu(cpu_die_map, i), GFP_KERNEL);
>   		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
>   	}
>   	set_cpu_sibling_map(0);
> diff --git a/drivers/base/topology.c b/drivers/base/topology.c
> index 50352cf96f85..5b1317ae3262 100644
> --- a/drivers/base/topology.c
> +++ b/drivers/base/topology.c
> @@ -57,6 +57,10 @@ define_siblings_show_func(core_siblings, core_cpumask);
>   static DEVICE_ATTR_RO(core_siblings);
>   static DEVICE_ATTR_RO(core_siblings_list);
>   
> +define_siblings_show_func(die_siblings, die_cpumask);
> +static DEVICE_ATTR_RO(die_siblings);
> +static DEVICE_ATTR_RO(die_siblings_list);
> +
>   #ifdef CONFIG_SCHED_BOOK
>   define_id_show_func(book_id);
>   static DEVICE_ATTR_RO(book_id);
> @@ -81,6 +85,8 @@ static struct attribute *default_attrs[] = {
>   	&dev_attr_thread_siblings_list.attr,
>   	&dev_attr_core_siblings.attr,
>   	&dev_attr_core_siblings_list.attr,
> +	&dev_attr_die_siblings.attr,
> +	&dev_attr_die_siblings_list.attr,
>   #ifdef CONFIG_SCHED_BOOK
>   	&dev_attr_book_id.attr,
>   	&dev_attr_book_siblings.attr,
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 5cc8595dd0e4..47a3e3c08036 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -196,6 +196,9 @@ static inline int cpu_to_mem(int cpu)
>   #ifndef topology_core_cpumask
>   #define topology_core_cpumask(cpu)		cpumask_of(cpu)
>   #endif
> +#ifndef topology_die_cpumask
> +#define topology_die_cpumask(cpu)		cpumask_of(cpu)
> +#endif
>   
>   #ifdef CONFIG_SCHED_SMT
>   static inline const struct cpumask *cpu_smt_mask(int cpu)
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 05/11] x86 topology: export die_siblings
  2019-02-19 16:56     ` Liang, Kan
@ 2019-02-19 18:43       ` Brown, Len
  2019-02-19 19:33         ` Liang, Kan
  0 siblings, 1 reply; 31+ messages in thread
From: Brown, Len @ 2019-02-19 18:43 UTC (permalink / raw)
  To: Liang, Kan, Len Brown, x86; +Cc: linux-kernel, linux-doc

Thanks for the comments, Kan,

>> diff --git a/Documentation/cputopology.txt 
>> b/Documentation/cputopology.txt index 287213b4517b..7dd2ae3df233 
>> 100644
>> --- a/Documentation/cputopology.txt
>> +++ b/Documentation/cputopology.txt
>> @@ -56,6 +56,16 @@ core_siblings_list:
>>   	human-readable list of cpuX's hardware threads within the same
>>   	die_id.
>>   
>> +die_siblings:
>> +
>> +	internal kernel map of cpuX's hardware threads within the same
>> +	physical_package_id.
>> +
>> +die_siblings_list:
>> +
>> +	human-readable list of cpuX's hardware threads within the same
>> +	physical_package_id.
>> +
>>   book_siblings:
>>   
>>   	internal kernel map of cpuX's hardware threads within the same diff 

> Could you please update the document regarding to topology_die_cpumask and topology_core_cpumask in Documentation/x86/topology.txt
 
I agree that the top part of this file, as updated above, should document the external sysfs interface...

I'm less excited about the center of the file trying to document the internal implementation -- as the source code
is actually more clear than the document, but here is an update, consistent with the existing file.
Let me know if it does not fully address your comment.

Thanks,
-Len
---

diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index 7dd2ae3..f39c759 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -102,6 +102,7 @@ these macros in include/asm-XXX/topology.h::
        #define topology_drawer_id(cpu)
        #define topology_sibling_cpumask(cpu)
        #define topology_core_cpumask(cpu)
+       #define topology_die_cpumask(cpu)
        #define topology_book_cpumask(cpu)
        #define topology_drawer_cpumask(cpu)

@@ -114,10 +115,11 @@ To be consistent on all architectures, include/linux/topology.h
 provides default definitions for any of the above macros that are
 not defined by include/asm-XXX/topology.h:

-1) physical_package_id: -1
-2) core_id: 0
-3) sibling_cpumask: just the given CPU
-4) core_cpumask: just the given CPU
+1) topology_physical_package_id: -1
+2) topology_core_id: 0
+3) topology_sibling_cpumask: just the given CPU
+4) topology_core_cpumask: just the given CPU
+5) topology_die_cpumask: just the given CPU

 For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
 default definitions for topology_book_id() and topology_book_cpumask().



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-19 16:49     ` Liang, Kan
@ 2019-02-19 19:27       ` Brown, Len
  0 siblings, 0 replies; 31+ messages in thread
From: Brown, Len @ 2019-02-19 19:27 UTC (permalink / raw)
  To: Liang, Kan, Len Brown, x86; +Cc: linux-kernel, linux-doc

>> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>>    */
>>   static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>>   {
>> -	if (c->>phys_proc_id == o->>phys_proc_id)
>> +	if (c->>cpu_die_id == o->>cpu_die_id)
>>   		return true;
>>   	return false;
>>   }

> Shouldn't we check the unique_die_id here?
> Die from different package can have the same cpu_die_id.

Good catch.
Updated this hunk as below.

Thanks!
-Len

@@ -461,7 +463,8 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
  */
 static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
-       if (c->phys_proc_id == o->phys_proc_id)
+       if ((c->phys_proc_id == o->phys_proc_id) &&
+               (c->cpu_die_id == o->cpu_die_id))
                return true;
        return false;
 }


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording
  2019-02-19  3:22   ` [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording Len Brown
  2019-02-19  3:40     ` Len Brown
@ 2019-02-19 19:33     ` Randy Dunlap
  2019-02-19 20:33       ` [linux-drivers-review] " Brown, Len
  1 sibling, 1 reply; 31+ messages in thread
From: Randy Dunlap @ 2019-02-19 19:33 UTC (permalink / raw)
  To: Len Brown, linux-drivers-review; +Cc: linux-doc

On 2/18/19 7:22 PM, Len Brown wrote:
> No semantic changes.
> 
> Signed-off-by: Len Brown <len.brown@intel.com>
> Cc: linux-doc@vger.kernel.org
> Signed-off-by: Len Brown <len.brown@intel.com>
> ---
>  Documentation/cputopology.txt | 34 +++++++++++++++++-----------------
>  1 file changed, 17 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
> index c6e7e9196a8b..2698da7e4f49 100644
> --- a/Documentation/cputopology.txt
> +++ b/Documentation/cputopology.txt
> @@ -3,76 +3,76 @@ How CPU topology info is exported via sysfs
>  ===========================================
>  
>  Export CPU topology info via sysfs. Items (attributes) are similar
> -to /proc/cpuinfo output of some architectures:
> +to /proc/cpuinfo output of some architectures.  They reside in
> +/sys/devices/system/cpu/cpuX/topology/:
>  
> -1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
> +physical_package_id:
>  
>  	physical package id of cpuX. Typically corresponds to a physical
>  	socket number, but the actual value is architecture and platform
>  	dependent.
>  
> -2) /sys/devices/system/cpu/cpuX/topology/core_id:
> +core_id:
>  
>  	the CPU core ID of cpuX. Typically it is the hardware platform's
>  	identifier (rather than the kernel's).  The actual value is
>  	architecture and platform dependent.
>  
> -3) /sys/devices/system/cpu/cpuX/topology/book_id:
> +book_id:
>  
>  	the book ID of cpuX. Typically it is the hardware platform's
>  	identifier (rather than the kernel's).	The actual value is
>  	architecture and platform dependent.
>  
> -4) /sys/devices/system/cpu/cpuX/topology/drawer_id:
> +drawer_id:
>  
>  	the drawer ID of cpuX. Typically it is the hardware platform's
>  	identifier (rather than the kernel's).	The actual value is
>  	architecture and platform dependent.
>  
> -5) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
> +thread_siblings:
>  
>  	internal kernel map of cpuX's hardware threads within the same
>  	core as cpuX.
>  
> -6) /sys/devices/system/cpu/cpuX/topology/thread_siblings_list:
> +thread_siblings_list:
>  
>  	human-readable list of cpuX's hardware threads within the same
>  	core as cpuX.
>  
> -7) /sys/devices/system/cpu/cpuX/topology/core_siblings:
> +core_siblings:
>  
>  	internal kernel map of cpuX's hardware threads within the same
>  	physical_package_id.
>  
> -8) /sys/devices/system/cpu/cpuX/topology/core_siblings_list:
> +core_siblings_list:
>  
>  	human-readable list of cpuX's hardware threads within the same
>  	physical_package_id.
>  
> -9) /sys/devices/system/cpu/cpuX/topology/book_siblings:
> +book_siblings:
>  
>  	internal kernel map of cpuX's hardware threads within the same
>  	book_id.
>  
> -10) /sys/devices/system/cpu/cpuX/topology/book_siblings_list:
> +book_siblings_list:
>  
>  	human-readable list of cpuX's hardware threads within the same
>  	book_id.
>  
> -11) /sys/devices/system/cpu/cpuX/topology/drawer_siblings:
> +drawer_siblings:
>  
>  	internal kernel map of cpuX's hardware threads within the same
>  	drawer_id.
>  
> -12) /sys/devices/system/cpu/cpuX/topology/drawer_siblings_list:
> +drawer_siblings_list:
>  
>  	human-readable list of cpuX's hardware threads within the same
>  	drawer_id.
>  
> -To implement it in an architecture-neutral way, a new source file,
> -drivers/base/topology.c, is to export the 6 to 12 attributes. The book
> -and drawer related sysfs files will only be created if CONFIG_SCHED_BOOK
> -and CONFIG_SCHED_DRAWER are selected.
> +Architecture-neutral, drivers/base/topology.c, exports these attributes.
> +However, the book and drawer related sysfs files will only be created if
> +CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
>  
>  CONFIG_SCHED_BOOK and CONFIG_DRAWER are currently only used on s390, where

fwiw:                    CONFIG_SCHED_DRAWER

>  they reflect the cpu and cache hierarchy.


-- 
~Randy

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/11] x86 topology: export die_siblings
  2019-02-19 18:43       ` Brown, Len
@ 2019-02-19 19:33         ` Liang, Kan
  2019-02-20 10:58           ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Liang, Kan @ 2019-02-19 19:33 UTC (permalink / raw)
  To: Brown, Len, Len Brown, x86; +Cc: linux-kernel, linux-doc



On 2/19/2019 1:43 PM, Brown, Len wrote:
> Thanks for the comments, Kan,
> 
>>> diff --git a/Documentation/cputopology.txt
>>> b/Documentation/cputopology.txt index 287213b4517b..7dd2ae3df233
>>> 100644
>>> --- a/Documentation/cputopology.txt
>>> +++ b/Documentation/cputopology.txt
>>> @@ -56,6 +56,16 @@ core_siblings_list:
>>>    	human-readable list of cpuX's hardware threads within the same
>>>    	die_id.
>>>    
>>> +die_siblings:
>>> +
>>> +	internal kernel map of cpuX's hardware threads within the same
>>> +	physical_package_id.
>>> +
>>> +die_siblings_list:
>>> +
>>> +	human-readable list of cpuX's hardware threads within the same
>>> +	physical_package_id.
>>> +
>>>    book_siblings:
>>>    
>>>    	internal kernel map of cpuX's hardware threads within the same diff
> 
>> Could you please update the document regarding to topology_die_cpumask and topology_core_cpumask in Documentation/x86/topology.txt
>   
> I agree that the top part of this file, as updated above, should document the external sysfs interface...
> 
> I'm less excited about the center of the file trying to document the internal implementation -- as the source code
> is actually more clear than the document, but here is an update, consistent with the existing file.
> Let me know if it does not fully address your comment.
>

Besides the generic document, I think we should update x86 document as 
well, which is in Documentation/x86/topology.txt.

The definition of topology_core_cpumask has to be changed to per die, right?

diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
index 8107b6c..8698a41 100644
--- a/Documentation/x86/topology.txt
+++ b/Documentation/x86/topology.txt
@@ -105,11 +105,16 @@ The topology of a system is described in the units of:

    Thread-related topology information in the kernel:

-  - topology_core_cpumask():
+  - topology_die_cpumask():

      The cpumask contains all online threads in the package to which a 
thread
      belongs.

+  - topology_core_cpumask():
+
+    The cpumask contains all online threads in the die to which a thread
+    belongs.
+
      The number of online threads is also printed in /proc/cpuinfo 
"siblings."

    - topology_sibling_cpumask():


Thanks,
Kan

> Thanks,
> -Len
> ---
> 
> diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
> index 7dd2ae3..f39c759 100644
> --- a/Documentation/cputopology.txt
> +++ b/Documentation/cputopology.txt
> @@ -102,6 +102,7 @@ these macros in include/asm-XXX/topology.h::
>          #define topology_drawer_id(cpu)
>          #define topology_sibling_cpumask(cpu)
>          #define topology_core_cpumask(cpu)
> +       #define topology_die_cpumask(cpu)
>          #define topology_book_cpumask(cpu)
>          #define topology_drawer_cpumask(cpu)
> 
> @@ -114,10 +115,11 @@ To be consistent on all architectures, include/linux/topology.h
>   provides default definitions for any of the above macros that are
>   not defined by include/asm-XXX/topology.h:
> 
> -1) physical_package_id: -1
> -2) core_id: 0
> -3) sibling_cpumask: just the given CPU
> -4) core_cpumask: just the given CPU
> +1) topology_physical_package_id: -1
> +2) topology_core_id: 0
> +3) topology_sibling_cpumask: just the given CPU
> +4) topology_core_cpumask: just the given CPU
> +5) topology_die_cpumask: just the given CPU
> 
>   For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
>   default definitions for topology_book_id() and topology_book_cpumask().
> 
> 

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* RE: [linux-drivers-review] [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording
  2019-02-19 19:33     ` Randy Dunlap
@ 2019-02-19 20:33       ` Brown, Len
  0 siblings, 0 replies; 31+ messages in thread
From: Brown, Len @ 2019-02-19 20:33 UTC (permalink / raw)
  To: Randy Dunlap, Len Brown, x86, linux-kernel; +Cc: linux-doc

>> +if CONFIG_SCHED_BOOK and CONFIG_SCHED_DRAWER are selected, respectively.
>>  
>>  CONFIG_SCHED_BOOK and CONFIG_DRAWER are currently only used on s390, 
>> where

>fwiw:                    CONFIG_SCHED_DRAWER

Heh.  Seems that every line of update to a .txt files uncovers two lines of existing bugs.

Hopefully we are better at writing software, than documentation!

Indeed, maybe if we had a compiler or checkpatch.pl file for documentation files,
They would approach the quality of the source code?

I updated the patch to fix this too.

Thanks,
-Len



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-19  3:22   ` [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support Len Brown
  2019-02-19  3:40     ` Len Brown
  2019-02-19 16:49     ` Liang, Kan
@ 2019-02-20  2:59     ` Like Xu
  2019-02-20  6:10       ` Len Brown
  2019-02-20 10:55     ` Peter Zijlstra
  2019-02-24 10:04     ` Brice Goglin
  4 siblings, 1 reply; 31+ messages in thread
From: Like Xu @ 2019-02-20  2:59 UTC (permalink / raw)
  To: Len Brown, x86; +Cc: linux-kernel, Len Brown, linux-doc

On 2019/2/19 11:40, Len Brown wrote:
> From: Len Brown <len.brown@intel.com>
> 
> Some new systems have multiple software-visible die within each package.
> The new CPUID.1F leaf can enumerate this multi-die/package topology.
> 
> CPUID.1F a super-set of the CPUID.B "Extended Toplogy Leaf",
> and a common updated routine can parse either leaf.
> 
> Legacy systems without CPUID.1F, and systems without multi-die/package
> hardware, will see no functional change from this patch series.
> 
> Multi-die/package systems will use CPUID.B before this patch,
> and CPUID.1F after this patch.  In the CPUID.B case, all die appear
> (incorrectly) to software as individual packages.  In the CPUID.1F case,
> the package id's reflect reality, and die_id's become meaningful.
> 
> Subsequent patches in this series update the kernel to be multi-die aware.
> In particular, some software needs to know the difference between
> a die-scope MSR and a package-scope MSR.
> 
> Signed-off-by: Len Brown <len.brown@intel.com>
> Cc: linux-doc@vger.kernel.org
> Signed-off-by: Len Brown <len.brown@intel.com>
> ---
>   Documentation/x86/topology.txt   |  4 ++
>   arch/x86/include/asm/processor.h |  4 +-
>   arch/x86/kernel/cpu/topology.c   | 82 ++++++++++++++++++++++++--------
>   arch/x86/kernel/smpboot.c        |  4 +-
>   4 files changed, 73 insertions(+), 21 deletions(-)
> 
> diff --git a/Documentation/x86/topology.txt b/Documentation/x86/topology.txt
> index 06b3cdbc4048..8107b6cfc9ea 100644
> --- a/Documentation/x86/topology.txt
> +++ b/Documentation/x86/topology.txt
> @@ -46,6 +46,10 @@ The topology of a system is described in the units of:
>   
>       The number of cores in a package. This information is retrieved via CPUID.
>   
> +  - cpuinfo_x86.x86_max_dies:
> +
> +    The number of dies in a package. This information is retrieved via CPUID.
> +
>     - cpuinfo_x86.phys_proc_id:
>   
>       The physical ID of the package. This information is retrieved via CPUID
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 33051436c864..f2856fe03715 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -105,7 +105,8 @@ struct cpuinfo_x86 {
>   	int			x86_power;
>   	unsigned long		loops_per_jiffy;
>   	/* cpuid returned max cores value: */
> -	u16			 x86_max_cores;
> +	u16			x86_max_cores;
> +	u16			x86_max_dies;
>   	u16			apicid;
>   	u16			initial_apicid;
>   	u16			x86_clflush_size;
> @@ -117,6 +118,7 @@ struct cpuinfo_x86 {
>   	u16			logical_proc_id;
>   	/* Core id: */
>   	u16			cpu_core_id;
> +	u16			cpu_die_id;
>   	/* Index into per_cpu list: */
>   	u16			cpu_index;
>   	u32			microcode;
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index 8f6c784141d1..6dce6ee77849 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -15,33 +15,61 @@
>   /* leaf 0xb SMT level */
>   #define SMT_LEVEL	0
>   
> -/* leaf 0xb sub-leaf types */
> +/* extended topology sub-leaf types */
>   #define INVALID_TYPE	0
>   #define SMT_TYPE	1
>   #define CORE_TYPE	2
> +#define DIE_TYPE	5

This patch set is going to export die topology via sysfs
while the extended topology sub-leaf type could be one of followings:
SMT/Core/Module/Tile/Die according to CPUID.1F from SDM.

Why not choose to export module/tile topology as well?

>   
>   #define LEAFB_SUBTYPE(ecx)		(((ecx) >> 8) & 0xff)
>   #define BITS_SHIFT_NEXT_LEVEL(eax)	((eax) & 0x1f)
>   #define LEVEL_MAX_SIBLINGS(ebx)		((ebx) & 0xffff)
>   
> -int detect_extended_topology_early(struct cpuinfo_x86 *c)
> -{
>   #ifdef CONFIG_SMP
> +/*
> + * Check if given CPUID extended toplogy "leaf" is implemented
> + */
> +static int check_extended_topology_leaf(int leaf)
> +{
>   	unsigned int eax, ebx, ecx, edx;
>   
> -	if (c->cpuid_level < 0xb)
> +	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +
> +	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
>   		return -1;
>   
> -	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +	return 0;
> +}
> +/*
> + * Return best CPUID Extended Toplogy Leaf supported
> + */
> +static int detect_extended_topology_leaf(struct cpuinfo_x86 *c)
> +{
> +	if (c->cpuid_level >= 0x1f)
> +		if (check_extended_topology_leaf(0x1f) == 0)
> +			return 0x1f;
>   
> -	/*
> -	 * check if the cpuid leaf 0xb is actually implemented.
> -	 */
> -	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
> +	if (c->cpuid_level >= 0xb)
> +		if (check_extended_topology_leaf(0xb) == 0)
> +			return 0xb;
> +
> +	return -1;
> +}
> +#endif
> +
> +int detect_extended_topology_early(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> +	unsigned int eax, ebx, ecx, edx;
> +	int leaf;
> +
> +	leaf = detect_extended_topology_leaf(c);
> +	if (leaf < 0)
>   		return -1;
>   
>   	set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
>   
> +	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
>   	/*
>   	 * initial apic id, which also represents 32-bit extended x2apic id.
>   	 */
> @@ -52,7 +80,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
>   }
>   
>   /*
> - * Check for extended topology enumeration cpuid leaf 0xb and if it
> + * Check for extended topology enumeration cpuid leaf, and if it
>    * exists, use it for populating initial_apicid and cpu topology
>    * detection.
>    */
> @@ -60,46 +88,62 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
>   {
>   #ifdef CONFIG_SMP
>   	unsigned int eax, ebx, ecx, edx, sub_index;
> -	unsigned int ht_mask_width, core_plus_mask_width;
> +	unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
>   	unsigned int core_select_mask, core_level_siblings;
> +	unsigned int die_select_mask, die_level_siblings;
> +	int leaf;
>   
> -	if (detect_extended_topology_early(c) < 0)
> +	leaf = detect_extended_topology_leaf(c);
> +	if (leaf  < 0)
>   		return -1;
>   
>   	/*
>   	 * Populate HT related information from sub-leaf level 0.
>   	 */
> -	cpuid_count(0xb, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +	c->initial_apicid = edx;
>   	core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
>   	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> +	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +	die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>   
>   	sub_index = 1;
>   	do {
> -		cpuid_count(0xb, sub_index, &eax, &ebx, &ecx, &edx);
> +		cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
>   
>   		/*
>   		 * Check for the Core type in the implemented sub leaves.
>   		 */
>   		if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
>   			core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +			die_level_siblings = core_level_siblings;
>   			core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> -			break;
> +		}
> +		if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
> +			die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +			die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>   		}
>   
>   		sub_index++;
>   	} while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
>   
>   	core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
> -
> -	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid, ht_mask_width)
> -						 & core_select_mask;
> -	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid, core_plus_mask_width);
> +	die_select_mask = (~(-1 << die_plus_mask_width)) >>
> +				core_plus_mask_width;
> +
> +	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
> +				ht_mask_width) & core_select_mask;
> +	c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
> +				core_plus_mask_width) & die_select_mask;
> +	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
> +				die_plus_mask_width);
>   	/*
>   	 * Reinit the apicid, now that we have extended initial_apicid.
>   	 */
>   	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
>   
>   	c->x86_max_cores = (core_level_siblings / smp_num_siblings);
> +	c->x86_max_dies = (die_level_siblings / core_level_siblings);
>   #endif
>   	return 0;
>   }
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ccd1f2a8e557..4250a87f57db 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>   
>   		if (c->phys_proc_id == o->phys_proc_id &&
> +		    c->cpu_die_id == o->cpu_die_id &&
>   		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
>   			if (c->cpu_core_id == o->cpu_core_id)
>   				return topology_sane(c, o, "smt");
> @@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   		}
>   
>   	} else if (c->phys_proc_id == o->phys_proc_id &&
> +		   c->cpu_die_id == o->cpu_die_id &&
>   		   c->cpu_core_id == o->cpu_core_id) {
>   		return topology_sane(c, o, "smt");
>   	}
> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>    */
>   static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   {
> -	if (c->phys_proc_id == o->phys_proc_id)
> +	if (c->cpu_die_id == o->cpu_die_id)
>   		return true;
>   	return false;
>   }
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-20  2:59     ` Like Xu
@ 2019-02-20  6:10       ` Len Brown
  0 siblings, 0 replies; 31+ messages in thread
From: Len Brown @ 2019-02-20  6:10 UTC (permalink / raw)
  To: Like Xu; +Cc: X86 ML, linux-kernel, Len Brown, linux-doc

On Tue, Feb 19, 2019 at 9:59 PM Like Xu <like.xu@linux.intel.com> wrote:

> > -/* leaf 0xb sub-leaf types */
> > +/* extended topology sub-leaf types */
> >   #define INVALID_TYPE        0
> >   #define SMT_TYPE    1
> >   #define CORE_TYPE   2
> > +#define DIE_TYPE     5
>
> This patch set is going to export die topology via sysfs
> while the extended topology sub-leaf type could be one of followings:
> SMT/Core/Module/Tile/Die according to CPUID.1F from SDM.
>
> Why not choose to export module/tile topology as well?

Excellent question.  (and thank you for reading the SDM:-)

While it is true that there are shipping products with
software-visible CPU modules and tiles,
they shipped before this mechanism was available.  As a result, there
are currently zero
products that use this mechanism to enumerate modules and tiles.  If
future products
have software-visible modules and tiles, and they choose to use this mechanism,
we can add support for them.  But I do not advocate adding code to the kernel
"just in case".

In contrast, the need to support multi-die/package products as soon as
possible is quite clear.

thanks,
Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-19  3:22   ` [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support Len Brown
                       ` (2 preceding siblings ...)
  2019-02-20  2:59     ` Like Xu
@ 2019-02-20 10:55     ` Peter Zijlstra
  2019-02-20 15:08       ` Len Brown
  2019-02-24 10:04     ` Brice Goglin
  4 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-02-20 10:55 UTC (permalink / raw)
  To: Len Brown; +Cc: x86, linux-kernel, Len Brown, linux-doc

On Mon, Feb 18, 2019 at 10:40:05PM -0500, Len Brown wrote:
> From: Len Brown <len.brown@intel.com>
> 
> Some new systems have multiple software-visible die within each package.
> The new CPUID.1F leaf can enumerate this multi-die/package topology.
> 
> CPUID.1F a super-set of the CPUID.B "Extended Toplogy Leaf",
> and a common updated routine can parse either leaf.
> 
> Legacy systems without CPUID.1F, and systems without multi-die/package
> hardware, will see no functional change from this patch series.
> 
> Multi-die/package systems will use CPUID.B before this patch,
> and CPUID.1F after this patch.  In the CPUID.B case, all die appear
> (incorrectly) to software as individual packages.  In the CPUID.1F case,
> the package id's reflect reality, and die_id's become meaningful.
> 
> Subsequent patches in this series update the kernel to be multi-die aware.
> In particular, some software needs to know the difference between
> a die-scope MSR and a package-scope MSR.

It would've been nice to have the CPUID instruction 1F leaf reference
3B-3.9 in the SDM, and maybe mention this here too.

Also, figure 8-6 uses Q,R ID, without prior mention.

> +/*
> + * Check if given CPUID extended toplogy "leaf" is implemented
> + */
> +static int check_extended_topology_leaf(int leaf)
> +{
>  	unsigned int eax, ebx, ecx, edx;
>  
> +	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +
> +	if (ebx == 0 || (LEAFB_SUBTYPE(ecx) != SMT_TYPE))
>  		return -1;
>  
> +	return 0;
> +}
> +/*
> + * Return best CPUID Extended Toplogy Leaf supported
> + */
> +static int detect_extended_topology_leaf(struct cpuinfo_x86 *c)
> +{
> +	if (c->cpuid_level >= 0x1f)
> +		if (check_extended_topology_leaf(0x1f) == 0)
> +			return 0x1f;

Coding-style requires { } on the outer if-stmt.

>  
> +	if (c->cpuid_level >= 0xb)
> +		if (check_extended_topology_leaf(0xb) == 0)
> +			return 0xb;

idem.

> +	return -1;
> +}
> +#endif
> +
> +int detect_extended_topology_early(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> +	unsigned int eax, ebx, ecx, edx;
> +	int leaf;
> +
> +	leaf = detect_extended_topology_leaf(c);
> +	if (leaf < 0)
>  		return -1;
>  
>  	set_cpu_cap(c, X86_FEATURE_XTOPOLOGY);
>  
> +	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
>  	/*
>  	 * initial apic id, which also represents 32-bit extended x2apic id.
>  	 */
> @@ -52,7 +80,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
>  }
>  
>  /*
> + * Check for extended topology enumeration cpuid leaf, and if it
>   * exists, use it for populating initial_apicid and cpu topology
>   * detection.
>   */
> @@ -60,46 +88,62 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
>  {
>  #ifdef CONFIG_SMP
>  	unsigned int eax, ebx, ecx, edx, sub_index;
> +	unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
>  	unsigned int core_select_mask, core_level_siblings;
> +	unsigned int die_select_mask, die_level_siblings;
> +	int leaf;
>  
> +	leaf = detect_extended_topology_leaf(c);
> +	if (leaf  < 0)

s/  / /

>  		return -1;
>  
>  	/*
>  	 * Populate HT related information from sub-leaf level 0.
>  	 */
> +	cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
> +	c->initial_apicid = edx;
>  	core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
>  	core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> +	die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +	die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>  
>  	sub_index = 1;
>  	do {
> +		cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
>  
>  		/*
>  		 * Check for the Core type in the implemented sub leaves.
>  		 */
>  		if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
>  			core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +			die_level_siblings = core_level_siblings;
>  			core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
> -			break;
> +		}
> +		if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
> +			die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
> +			die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
>  		}
>  
>  		sub_index++;
>  	} while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);

Personally I much prefer union based decoding of cpuid leafs over this
macro magic (git grep cpuid10_):

union cpuid1f_eax {
	struct {
		unsigned int x2apic_shift : 5;
	} split;
	unsigned int full;
};

union cpuid1f_ebx {
	struct {
		unsigned int nr_cpus : 16;
	} split;
	unsigned int full;
};

enum level_type_enum {
	invalid_type = 0,
	smt_type,
	core_type,
	module_type,
	tile_type,
	die_type,
};

union cpuid1f_ecx {
	struct {
		unsigned int subleaf : 8;
		unsigned int level_type : 8;
	} split;
	unsigned int full;
};

>  	core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
> +	die_select_mask = (~(-1 << die_plus_mask_width)) >>
> +				core_plus_mask_width;
> +
> +	c->cpu_core_id = apic->phys_pkg_id(c->initial_apicid,
> +				ht_mask_width) & core_select_mask;
> +	c->cpu_die_id = apic->phys_pkg_id(c->initial_apicid,
> +				core_plus_mask_width) & die_select_mask;
> +	c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
> +				die_plus_mask_width);
>  	/*
>  	 * Reinit the apicid, now that we have extended initial_apicid.
>  	 */
>  	c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
>  
>  	c->x86_max_cores = (core_level_siblings / smp_num_siblings);
> +	c->x86_max_dies = (die_level_siblings / core_level_siblings);
>  #endif
>  	return 0;
>  }


> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   */
>  static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  {
> -	if (c->phys_proc_id == o->phys_proc_id)
> +	if (c->cpu_die_id == o->cpu_die_id)
>  		return true;
>  	return false;
>  }

You haven't explained, and I can't readily find it in the SDM either,
how these topology bits relate to caches and interconnects.

Will these die thingies share LLC, or will LLC be per die. Will NUMA
span dies or not.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/11] x86 topology: export die_siblings
  2019-02-19 19:33         ` Liang, Kan
@ 2019-02-20 10:58           ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-02-20 10:58 UTC (permalink / raw)
  To: Liang, Kan; +Cc: Brown, Len, Len Brown, x86, linux-kernel, linux-doc

On Tue, Feb 19, 2019 at 02:33:59PM -0500, Liang, Kan wrote:
> Besides the generic document, I think we should update x86 document as well,
> which is in Documentation/x86/topology.txt.
> 
> The definition of topology_core_cpumask has to be changed to per die, right?

That's what the change to match_die() did. It limits the core mask.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-20 10:55     ` Peter Zijlstra
@ 2019-02-20 15:08       ` Len Brown
  2019-02-26 13:54         ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Len Brown @ 2019-02-20 15:08 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: X86 ML, linux-kernel, Len Brown, linux-doc

Thanks for the comments, Peter. I'll update the patch to address the
syntax points.  (Maybe checkpatch.pl should be updated to reflect your
preferences?).

About macros vs C.  I agree with your preference.
I used macros to be consistent with the existing code, and to be as
backport friendly as possible.
(a number of distros need to pull these patches into their supported kernels)
Sure, I'm willing to write in a cosmetic-only patch, after the
functional changes are upstream.

> It would've been nice to have the CPUID instruction 1F leaf reference
> 3B-3.9 in the SDM, and maybe mention this here too.

I didn't mention SDM sections because they change -- leaving stale
pointers in our commit messages.  The SDM is re-published 4 times per
year.

> Also, figure 8-6 uses Q,R ID, without prior mention.

Yeah, the tech-writer took my real example and turned it into a
generic example.  Probably a good idea, even if not gracefully
executed.  The point of undefined "Q" and "R" are that a new level can
be invented at any time in the future, and the existing code that
doesn't know about future levels should still function properly.

The back-story is that Leaf-B was supposed to work this way, but the
original SDM code example was hard-coded to assume 3-levels.  Plenty
of software was hard-coded and would have broken if we had actually
added new levels to Leaf-B.  And so Leaf-B had to be "forked" into
Leaf-1F, with the implicit contract that only new code that can
function properly in the face of unknown levels parses leaf-1F.

Yes, the parsing routine in the Linux Kernel will work fine in the
face of future unknown levels.  Some utilities (such as cpuid), would
have actually crashed if we added levels to Leaf-B.

> You haven't explained, and I can't readily find it in the SDM either,
> how these topology bits relate to caches and interconnects.
>
> Will these die thingies share LLC, or will LLC be per die. Will NUMA
> span dies or not.

Excellent question.
Cache enumeration in Leaf-4 is totally unchanged.
ACPI NUMA tables are totally unchanged.

From a scheduler point of view, imagine that a SKX system with 4 die
in 4 packages
was mechanically re-designed so that those 4 die resided in 2
double-sized packages.

They may have tweaked the links between the die, but logically it is
identical and compatible,
and the legacy kernel will function properly.

So the effect of Leaf B,1F is that it defines the scope of MSRs.
eg. what processors does a die-scope MSR cover.
That is why the rest of the patch is about sysfs topology, and about
package MSR scope.

Yes, there will be more exotic MSR situations in future products --
the first ones are pretty simple --
something  called a package-scope-MSR  in the SDM today becomes a
die-scope-MSR in this generation
on a multi-die/package system.

It also reflects how many packages appear in sysfs, and this can
effect licensing of some kinds of software.

thanks,
Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/11] x86 topology: export die_siblings
  2019-02-19  3:22   ` [PATCH 05/11] x86 topology: export die_siblings Len Brown
  2019-02-19  3:40     ` Len Brown
  2019-02-19 16:56     ` Liang, Kan
@ 2019-02-20 21:52     ` Brice Goglin
  2019-02-21  7:41       ` Len Brown
  2 siblings, 1 reply; 31+ messages in thread
From: Brice Goglin @ 2019-02-20 21:52 UTC (permalink / raw)
  To: Len Brown, x86; +Cc: linux-kernel, Len Brown, linux-doc

Le 19/02/2019 à 04:40, Len Brown a écrit :
> From: Len Brown <len.brown@intel.com>
>
> like core_siblings, except it shows which die are in the same package.
>
> This is needed for lscpu(1) to correctly display die topology.
>
> Signed-off-by: Len Brown <len.brown@intel.com>
> Cc: linux-doc@vger.kernel.org
> Signed-off-by: Len Brown <len.brown@intel.com>
> ---
>  Documentation/cputopology.txt   | 10 ++++++++++
>  arch/x86/include/asm/smp.h      |  1 +
>  arch/x86/include/asm/topology.h |  1 +
>  arch/x86/kernel/smpboot.c       | 20 ++++++++++++++++++++
>  arch/x86/xen/smp_pv.c           |  1 +
>  drivers/base/topology.c         |  6 ++++++
>  include/linux/topology.h        |  3 +++
>  7 files changed, 42 insertions(+)
>
> diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
> index 287213b4517b..7dd2ae3df233 100644
> --- a/Documentation/cputopology.txt
> +++ b/Documentation/cputopology.txt
> @@ -56,6 +56,16 @@ core_siblings_list:
>  	human-readable list of cpuX's hardware threads within the same
>  	die_id.
>  
> +die_siblings:
> +
> +	internal kernel map of cpuX's hardware threads within the same
> +	physical_package_id.
> +
> +die_siblings_list:
> +
> +	human-readable list of cpuX's hardware threads within the same
> +	physical_package_id.
> +


Hello Len,

Patches #4 and #5 are changing the meaning the core_siblings (in the
past, it always returned all threads in the entire package). All
existing user-space tools will see each die as a separate package until
they are updated to read die_siblings too. It only matters for multi-die
CPUs when running a recent kernel with an old userspace tool, but it may
still be consider as a sysfs ABI change.

Worse, things will break again if you ever add tile_siblings for
CPUID.1f "Tiles". User-space will suddenly see 2 dies of 2 cores instead
1 die of 2 tiles of 2 cores.

I understand that this isn't easy to fix. But I want to make sure people
are aware of the meaning of this change.

The proper way to avoid this is to stop having file foo_siblings refer
to "the container of foo" instead of "foo itself" (because that
container changes when you add intermediate levels). Rename sysfs files
like below, and you don't get any breakage anymore when adding
intermediate levels:

thread_siblings -> core_threads (can we do sysfs alias or symlink to
keep the old name?)

core_siblings -> die_threads

die_siblings -> package_threads (needs an alias too)

The documentation would also be much easier to read since "die_threads"
is obviously "human-readable list of cpuX's hardware threads within the
same die_id". And no need to modify the doc anymore when adding levels :)

Brice



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/11] x86 topology: export die_siblings
  2019-02-20 21:52     ` Brice Goglin
@ 2019-02-21  7:41       ` Len Brown
  2019-02-21  8:38         ` Brice Goglin
  0 siblings, 1 reply; 31+ messages in thread
From: Len Brown @ 2019-02-21  7:41 UTC (permalink / raw)
  To: Brice Goglin; +Cc: X86 ML, linux-kernel, Len Brown, linux-doc

Hi Brice,
Thank you for your suggestions!

> Patches #4 and #5 are changing the meaning the core_siblings (in the
> past, it always returned all threads in the entire package). All
> existing user-space tools will see each die as a separate package until
> they are updated to read die_siblings too. It only matters for multi-die
> CPUs when running a recent kernel with an old userspace tool, but it may
> still be consider as a sysfs ABI change.

I agree.

Exhibit 1 is the "lscpu" program.

> Worse, things will break again if you ever add tile_siblings for
> CPUID.1f "Tiles". User-space will suddenly see 2 dies of 2 cores instead
> 1 die of 2 tiles of 2 cores.

Agreed, the existing naming scheme is not resilient to future additions.

> I understand that this isn't easy to fix. But I want to make sure people
> are aware of the meaning of this change.

Here is my list of applications that care about the new CPUID leaf
and the concepts of packages and die:

cpuid
lscpu
x86_energy_perf_policy
turbostat

> The proper way to avoid this is to stop having file foo_siblings refer
> to "the container of foo" instead of "foo itself" (because that
> container changes when you add intermediate levels). Rename sysfs files
> like below, and you don't get any breakage anymore when adding
> intermediate levels:
>
> thread_siblings -> core_threads (can we do sysfs alias or symlink to
> keep the old name?)
>
> core_siblings -> die_threads
>
> die_siblings -> package_threads (needs an alias too)
>
> The documentation would also be much easier to read since "die_threads"
> is obviously "human-readable list of cpuX's hardware threads within the
> same die_id". And no need to modify the doc anymore when adding levels :)

I like your idea!

Hm, I think i'd skip creating "die_siblings", as it adds to the
fragile legacy naming scheme
that we want to deprecate.

And although it is ill-defined and has a mis-leading name, I now think
it would be
better to leave "core_siblings" as defined -- a legacy synonym for
"package_threads".  Deprecate it, but keep its original definition
until it is removed.

Updated applications would use:

core_threads
die_threads
package_threads

and they'll be future proof if/when we add any new levels.

the legacy thread_siblings and core_siblings will stick around as aliases:

core_threads (thread_siblings)
die_threads
package_threads (core_siblings)

thanks!
Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 05/11] x86 topology: export die_siblings
  2019-02-21  7:41       ` Len Brown
@ 2019-02-21  8:38         ` Brice Goglin
  0 siblings, 0 replies; 31+ messages in thread
From: Brice Goglin @ 2019-02-21  8:38 UTC (permalink / raw)
  To: Len Brown; +Cc: X86 ML, linux-kernel, Len Brown, linux-doc


Le 21/02/2019 à 08:41, Len Brown a écrit :
>
> Here is my list of applications that care about the new CPUID leaf
> and the concepts of packages and die:
>
> cpuid
> lscpu
> x86_energy_perf_policy
> turbostat


You may add hwloc/lstopo which is used by most HPC runtimes (including
your employers' OpenMP and MPI commercial products :))

Brice



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-19  3:22   ` [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support Len Brown
                       ` (3 preceding siblings ...)
  2019-02-20 10:55     ` Peter Zijlstra
@ 2019-02-24 10:04     ` Brice Goglin
  2019-02-25  5:31       ` Like Xu
  2019-02-25  8:08       ` Brown, Len
  4 siblings, 2 replies; 31+ messages in thread
From: Brice Goglin @ 2019-02-24 10:04 UTC (permalink / raw)
  To: Len Brown, x86; +Cc: linux-kernel, Len Brown, linux-doc, Like Xu

Le 19/02/2019 à 04:40, Len Brown a écrit :
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ccd1f2a8e557..4250a87f57db 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>  
>  		if (c->phys_proc_id == o->phys_proc_id &&
> +		    c->cpu_die_id == o->cpu_die_id &&
>  		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
>  			if (c->cpu_core_id == o->cpu_core_id)
>  				return topology_sane(c, o, "smt");
> @@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  		}
>  
>  	} else if (c->phys_proc_id == o->phys_proc_id &&
> +		   c->cpu_die_id == o->cpu_die_id &&
>  		   c->cpu_core_id == o->cpu_core_id) {
>  		return topology_sane(c, o, "smt");
>  	}
> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   */
>  static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  {
> -	if (c->phys_proc_id == o->phys_proc_id)
> +	if (c->cpu_die_id == o->cpu_die_id)
>  		return true;
>  	return false;
>  }

Hello Len,

I am testing your patches in a VM with CPUID.1f QEMU patches (author Like Xu is CC'ed),
booted to get 2 packages with 1 NUMA node each and 2 dies each:

-smp cpus=16,threads=2,cores=2,dies=2,sockets=2 -numa node,cpus=0-7,nodeid=0 -numa node,cpus=8-15,nodeid=1

sysfs files expose wrong information:
core_siblings contains threads of the local die AND of die with same number in other processors,
eg cpu 0-3 and 8-11 instead of 0-3 only.

The issue is that you seem to assume that die ids will always be unique across multiple packages.
Is this a valid assumption? If so, QEMU patches should be fixed.
If not, I fixed the issue by changing match_die() to check both phys_proc_id and cpu_die_id:

static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
{
	if (c->phys_proc_id == o->phys_proc_id && c->cpu_die_id == o->cpu_die_id)
 		return true;
	return false;
}

Thanks
Brice



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-24 10:04     ` Brice Goglin
@ 2019-02-25  5:31       ` Like Xu
  2019-02-25  8:08       ` Brown, Len
  1 sibling, 0 replies; 31+ messages in thread
From: Like Xu @ 2019-02-25  5:31 UTC (permalink / raw)
  To: Brice Goglin, Len Brown, x86; +Cc: linux-kernel, Len Brown, linux-doc

On 2019/2/24 18:04, Brice Goglin wrote:
> Le 19/02/2019 à 04:40, Len Brown a écrit :
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index ccd1f2a8e557..4250a87f57db 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>>   		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>>   
>>   		if (c->phys_proc_id == o->phys_proc_id &&
>> +		    c->cpu_die_id == o->cpu_die_id &&
>>   		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
>>   			if (c->cpu_core_id == o->cpu_core_id)
>>   				return topology_sane(c, o, "smt");
>> @@ -404,6 +405,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>>   		}
>>   
>>   	} else if (c->phys_proc_id == o->phys_proc_id &&
>> +		   c->cpu_die_id == o->cpu_die_id &&
>>   		   c->cpu_core_id == o->cpu_core_id) {
>>   		return topology_sane(c, o, "smt");
>>   	}
>> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>>    */
>>   static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>>   {
>> -	if (c->phys_proc_id == o->phys_proc_id)
>> +	if (c->cpu_die_id == o->cpu_die_id)
>>   		return true;
>>   	return false;
>>   }
> 
> Hello Len,
> 
> I am testing your patches in a VM with CPUID.1f QEMU patches (author Like Xu is CC'ed),
> booted to get 2 packages with 1 NUMA node each and 2 dies each:
> 
> -smp cpus=16,threads=2,cores=2,dies=2,sockets=2 -numa node,cpus=0-7,nodeid=0 -numa node,cpus=8-15,nodeid=1
> 
> sysfs files expose wrong information:
> core_siblings contains threads of the local die AND of die with same number in other processors,
> eg cpu 0-3 and 8-11 instead of 0-3 only.
> 
> The issue is that you seem to assume that die ids will always be unique across multiple packages.

I don't think the intel MCP topolohgy has this global uniqueness 
assumption for die_id according to CPUID.1F SDM.

> Is this a valid assumption? If so, QEMU patches should be fixed.
> If not, I fixed the issue by changing match_die() to check both phys_proc_id and cpu_die_id:
> 
> static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> {
> 	if (c->phys_proc_id == o->phys_proc_id && c->cpu_die_id == o->cpu_die_id)
>   		return true;
> 	return false;
> }
> 
> Thanks
> Brice
> 
> 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-24 10:04     ` Brice Goglin
  2019-02-25  5:31       ` Like Xu
@ 2019-02-25  8:08       ` Brown, Len
  1 sibling, 0 replies; 31+ messages in thread
From: Brown, Len @ 2019-02-25  8:08 UTC (permalink / raw)
  To: Brice Goglin, Len Brown, x86; +Cc: linux-kernel, linux-doc, Like Xu

Hi Brice,

Yes, you re-discovered the bug that Kan Liang pointed out last week.

I have updated this patch set, and the latest for testing is in my git tree on kernel.org or

https://github.com/lenb/linux.git x86

Note that I took your advice and left the core_siblings with its original definition,
And created package_threads as a synonym.  I will e-mail out the patch set again
When I do 2 more things:

1. add core_threads map to sysfs
2. replace unique_die_id with logical_die_id -- turns out it is useful for same reason as logical_package_id.

Thanks,
-Len




-----Original Message-----
From: Brice Goglin [mailto:Brice.Goglin@inria.fr] 
Sent: Sunday, February 24, 2019 5:04 AM
To: Len Brown <lenb@kernel.org>; x86@kernel.org
Cc: linux-kernel@vger.kernel.org; Brown, Len <len.brown@intel.com>; linux-doc@vger.kernel.org; Like Xu <like.xu@linux.intel.com>
Subject: Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support

Le 19/02/2019 à 04:40, Len Brown a écrit :
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c 
> index ccd1f2a8e557..4250a87f57db 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -393,6 +393,7 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
>  
>  		if (c->phys_proc_id == o->phys_proc_id &&
> +		    c->cpu_die_id == o->cpu_die_id &&
>  		    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2)) {
>  			if (c->cpu_core_id == o->cpu_core_id)
>  				return topology_sane(c, o, "smt"); @@ -404,6 +405,7 @@ static 
> bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  		}
>  
>  	} else if (c->phys_proc_id == o->phys_proc_id &&
> +		   c->cpu_die_id == o->cpu_die_id &&
>  		   c->cpu_core_id == o->cpu_core_id) {
>  		return topology_sane(c, o, "smt");
>  	}
> @@ -461,7 +463,7 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   */
>  static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)  
> {
> -	if (c->phys_proc_id == o->phys_proc_id)
> +	if (c->cpu_die_id == o->cpu_die_id)
>  		return true;
>  	return false;
>  }

Hello Len,

I am testing your patches in a VM with CPUID.1f QEMU patches (author Like Xu is CC'ed), booted to get 2 packages with 1 NUMA node each and 2 dies each:

-smp cpus=16,threads=2,cores=2,dies=2,sockets=2 -numa node,cpus=0-7,nodeid=0 -numa node,cpus=8-15,nodeid=1

sysfs files expose wrong information:
core_siblings contains threads of the local die AND of die with same number in other processors, eg cpu 0-3 and 8-11 instead of 0-3 only.

The issue is that you seem to assume that die ids will always be unique across multiple packages.
Is this a valid assumption? If so, QEMU patches should be fixed.
If not, I fixed the issue by changing match_die() to check both phys_proc_id and cpu_die_id:

static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o) {
	if (c->phys_proc_id == o->phys_proc_id && c->cpu_die_id == o->cpu_die_id)
 		return true;
	return false;
}

Thanks
Brice



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-20 15:08       ` Len Brown
@ 2019-02-26 13:54         ` Peter Zijlstra
  2019-02-28 15:59           ` Len Brown
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-02-26 13:54 UTC (permalink / raw)
  To: Len Brown; +Cc: X86 ML, linux-kernel, Len Brown, linux-doc

On Wed, Feb 20, 2019 at 10:08:48AM -0500, Len Brown wrote:
> Thanks for the comments, Peter. I'll update the patch to address the
> syntax points.  (Maybe checkpatch.pl should be updated to reflect your
> preferences?).

Don't know about checkpatch; I ignore plenty of its output. I think tglx
started a document somewhere for what tip prefers, but I'm not sure
where that went.

> About macros vs C.  I agree with your preference.
> I used macros to be consistent with the existing code, and to be as
> backport friendly as possible.
> (a number of distros need to pull these patches into their supported kernels)
> Sure, I'm willing to write in a cosmetic-only patch, after the
> functional changes are upstream.

Fair enough.

> > It would've been nice to have the CPUID instruction 1F leaf reference
> > 3B-3.9 in the SDM, and maybe mention this here too.
> 
> I didn't mention SDM sections because they change -- leaving stale
> pointers in our commit messages.  The SDM is re-published 4 times per
> year.

Yah, I know. Which is why I keep all SDMs. So if you say, book 3 section
8 of Jul'17, I can find it :-)

> > You haven't explained, and I can't readily find it in the SDM either,
> > how these topology bits relate to caches and interconnects.
> >
> > Will these die thingies share LLC, or will LLC be per die. Will NUMA
> > span dies or not.
> 
> Excellent question.
> Cache enumeration in Leaf-4 is totally unchanged.
> ACPI NUMA tables are totally unchanged.

Sure; and yet Sub-NUMA-Clustering broke stuff in interesting ways. I'm
trying to get a feel for how these levels will interact with all that.

Before that SNC stuff, caches had never spanned NODEs (and I still
think that is 'creative' at best).

> From a scheduler point of view, imagine that a SKX system with 4 die
> in 4 packages was mechanically re-designed so that those 4 die resided
> in 2 double-sized packages.
> 
> They may have tweaked the links between the die, but logically it is
> identical and compatible, and the legacy kernel will function
> properly.

This example has LLC in die and yes that works.

But I can imagine things like L2 in tile and L3 across tiles but within
DIE and then it _might_ make sense to still consider the tile for
scheduling.

Another option is having the LLC off die; also not unheard of.

And then there's many creative and slightly crazy ways this can all be
combined :/

> So the effect of Leaf B,1F is that it defines the scope of MSRs.  eg.
> what processors does a die-scope MSR cover.  That is why the rest of
> the patch is about sysfs topology, and about package MSR scope.
> 
> Yes, there will be more exotic MSR situations in future products --
> the first ones are pretty simple -- something  called a
> package-scope-MSR  in the SDM today becomes a die-scope-MSR in this
> generation on a multi-die/package system.

Yes :-(

> It also reflects how many packages appear in sysfs, and this can
> effect licensing of some kinds of software.

That's just plain insanity and we should not let that affect our sysfs
interfaces.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-26 13:54         ` Peter Zijlstra
@ 2019-02-28 15:59           ` Len Brown
  2019-02-28 17:56             ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Len Brown @ 2019-02-28 15:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: X86 ML, linux-kernel, Len Brown, linux-doc

On Tue, Feb 26, 2019 at 8:54 AM Peter Zijlstra <peterz@infradead.org> wrote:

> > > It would've been nice to have the CPUID instruction 1F leaf reference
> > > 3B-3.9 in the SDM, and maybe mention this here too.
> >
> > I didn't mention SDM sections because they change -- leaving stale
> > pointers in our commit messages.  The SDM is re-published 4 times per
> > year.
>
> Yah, I know. Which is why I keep all SDMs. So if you say, book 3 section
> 8 of Jul'17, I can find it :-)

The SDM is like software -- usually (but not always) you are better
off with the latest version:-)

> > Cache enumeration in Leaf-4 is totally unchanged.
> > ACPI NUMA tables are totally unchanged.
>
> Sure; and yet Sub-NUMA-Clustering broke stuff in interesting ways. I'm
> trying to get a feel for how these levels will interact with all that.
>
> Before that SNC stuff, caches had never spanned NODEs (and I still
> think that is 'creative' at best).

Yeah, SNC is sort of a curve ball.  I guess it made enough stuff run better that
it is available as an option.  But it doesn't help everything, so it is disabled
by default...

I think from a scheduler point of view, sticking with the output of
CPUID.4 for the cache topology, and the ACPI tables for the
node topology/distances, is the right strategy.

> > From a scheduler point of view, imagine that a SKX system with 4 die
> > in 4 packages was mechanically re-designed so that those 4 die resided
> > in 2 double-sized packages.
> >
> > They may have tweaked the links between the die, but logically it is
> > identical and compatible, and the legacy kernel will function
> > properly.
>
> This example has LLC in die and yes that works.
>
> But I can imagine things like L2 in tile and L3 across tiles but within
> DIE and then it _might_ make sense to still consider the tile for
> scheduling.
>
> Another option is having the LLC off die; also not unheard of.
>
> And then there's many creative and slightly crazy ways this can all be
> combined :/

If any of those crazy things happen,  CPUID.B/CPUID.1F are not
going to help software understand it -- CPUID.4 and the NUMA tables
are the tool of choice.

> > So the effect of Leaf B,1F is that it defines the scope of MSRs.  eg.
> > what processors does a die-scope MSR cover.  That is why the rest of
> > the patch is about sysfs topology, and about package MSR scope.
> >
> > Yes, there will be more exotic MSR situations in future products --
> > the first ones are pretty simple -- something  called a
> > package-scope-MSR  in the SDM today becomes a die-scope-MSR in this
> > generation on a multi-die/package system.
>
> Yes :-(
>
> > It also reflects how many packages appear in sysfs, and this can
> > effect licensing of some kinds of software.
>
> That's just plain insanity and we should not let that affect our sysfs
> interfaces.

This change isn't made for compatibility with per-package licensing.
Indeed, vendors, who license  based on package-count need to
be made aware that on a system with multi-die/package, they'll
see their package count go _down_ as a result of this change.
Thankfully, I'm told that per-package licensing is quite rare --
most stuff that cares has moved to per-CPU.

I think a good semantic side effect of this series is that it
maintains the invariant that a physical package and a socket are synonymous.
While we don't use the word "socket" in Linux anymore, the industry
broadly assume that the two are synonyms.  And people expect that a
physical package really is a physical package -- you can see it,
buy it in a box, and hold it in your hand.

Functionally, the bottom line is that it allows software to discover topology
levels that previously needed to be discovered by looking up family/model,
in the past, which was sort of annoying.  The things that care are
things that care about MSR scope.   Thankfully, the list of things that care
about MSR scope is quite finite.

thanks,
-Len




--
Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support
  2019-02-28 15:59           ` Len Brown
@ 2019-02-28 17:56             ` Peter Zijlstra
  0 siblings, 0 replies; 31+ messages in thread
From: Peter Zijlstra @ 2019-02-28 17:56 UTC (permalink / raw)
  To: Len Brown; +Cc: X86 ML, linux-kernel, Len Brown, linux-doc

On Thu, Feb 28, 2019 at 10:59:19AM -0500, Len Brown wrote:

> The SDM is like software -- usually (but not always) you are better
> off with the latest version:-)

If only it existed as a .tex file in a git repo. It has been very useful
to have older versions around a number of times.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-02-28 17:56 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190219032259.12237-1-len.brown@intel.com>
2019-02-19  3:22 ` [PATCH 01/11] x86 topology: fix doc typo Len Brown
2019-02-19  3:22   ` [PATCH 02/11] topolgy: simplify cputopology.txt formatting and wording Len Brown
2019-02-19  3:40     ` Len Brown
2019-02-19 19:33     ` Randy Dunlap
2019-02-19 20:33       ` [linux-drivers-review] " Brown, Len
2019-02-19  3:22   ` [PATCH 03/11] x86 topology: Add CPUID.1F multi-die/package support Len Brown
2019-02-19  3:40     ` Len Brown
2019-02-19 16:49     ` Liang, Kan
2019-02-19 19:27       ` Brown, Len
2019-02-20  2:59     ` Like Xu
2019-02-20  6:10       ` Len Brown
2019-02-20 10:55     ` Peter Zijlstra
2019-02-20 15:08       ` Len Brown
2019-02-26 13:54         ` Peter Zijlstra
2019-02-28 15:59           ` Len Brown
2019-02-28 17:56             ` Peter Zijlstra
2019-02-24 10:04     ` Brice Goglin
2019-02-25  5:31       ` Like Xu
2019-02-25  8:08       ` Brown, Len
2019-02-19  3:22   ` [PATCH 04/11] cpu topology: export die_id Len Brown
2019-02-19  3:40     ` Len Brown
2019-02-19  3:22   ` [PATCH 05/11] x86 topology: export die_siblings Len Brown
2019-02-19  3:40     ` Len Brown
2019-02-19 16:56     ` Liang, Kan
2019-02-19 18:43       ` Brown, Len
2019-02-19 19:33         ` Liang, Kan
2019-02-20 10:58           ` Peter Zijlstra
2019-02-20 21:52     ` Brice Goglin
2019-02-21  7:41       ` Len Brown
2019-02-21  8:38         ` Brice Goglin
2019-02-19  3:40   ` [PATCH 01/11] x86 topology: fix doc typo Len Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).