linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] drivers core: Introduce CPU type sysfs interface
@ 2020-10-03  1:17 Ricardo Neri
  2020-10-03  1:17 ` [PATCH 1/4] " Ricardo Neri
                   ` (5 more replies)
  0 siblings, 6 replies; 34+ messages in thread
From: Ricardo Neri @ 2020-10-03  1:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki
  Cc: Tony Luck, Len Brown, Ravi V. Shankar, linux-kernel, Ricardo Neri

Hybrid CPU topologies combine processors with more than one type of
micro-architecture. Hence, it is possible that individual CPUs support
slightly different features (e.g., performance counters) and different
performance properties. Thus, there may be user space entities interested
in knowing the topology of the system based on the types of available
CPUs.

Currently, there exists an interface for the CPU capacity (/sys/devices/
system/cpu/cpuX/cpu_capacity). However, CPU capacity does not always map
to CPU types (by the way, I will submit a separate series to bring such
interface to x86).

This series proposes the new interface /sys/devices/system/cpu/types
which, in hybrid parts, creates a subdirectory for each type of CPU.
Each subdirectory contains a CPU list and a CPU map that user space can
query.

Patch 1 of the series proposes the generic interface, with hooks
that architectures can override to suit their needs. The remaining three
patches implement this interface for x86 (as per request from Boris,
I pulled patch 2 from a separate submission [1]).

Thanks and BR,
Ricardo

[1]. https://lkml.org/lkml/2020/10/2/1013

Ricardo Neri (4):
  drivers core: Introduce CPU type sysfs interface
  x86/cpu: Describe hybrid CPUs in cpuinfo_x86
  x86/cpu/intel: Add function to get name of hybrid CPU types
  x86/cpu/topology: Implement the CPU type sysfs interface

 .../ABI/testing/sysfs-devices-system-cpu      |  13 ++
 arch/x86/include/asm/intel-family.h           |   4 +
 arch/x86/include/asm/processor.h              |  13 ++
 arch/x86/include/asm/topology.h               |   2 +
 arch/x86/kernel/cpu/common.c                  |   3 +
 arch/x86/kernel/cpu/cpu.h                     |   3 +
 arch/x86/kernel/cpu/intel.c                   |  23 ++
 arch/x86/kernel/cpu/topology.c                |  23 ++
 drivers/base/cpu.c                            | 214 ++++++++++++++++++
 include/linux/cpu.h                           |  12 +
 include/linux/cpuhotplug.h                    |   1 +
 11 files changed, 311 insertions(+)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  1:17 [PATCH 0/4] drivers core: Introduce CPU type sysfs interface Ricardo Neri
@ 2020-10-03  1:17 ` Ricardo Neri
  2020-10-03  3:27   ` Randy Dunlap
  2020-10-03  8:53   ` Greg Kroah-Hartman
  2020-10-03  1:17 ` [PATCH 2/4] x86/cpu: Describe hybrid CPUs in cpuinfo_x86 Ricardo Neri
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 34+ messages in thread
From: Ricardo Neri @ 2020-10-03  1:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki
  Cc: Tony Luck, Len Brown, Ravi V. Shankar, linux-kernel,
	Ricardo Neri, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

Hybrid CPU topologies combine CPUs of different microarchitectures in the
same die. Thus, even though the instruction set is compatible among all
CPUs, there may still be differences in features (e.g., some CPUs may
have counters that other CPUs do not). There may be applications
interested in knowing the micro-architecture topology of the
system to make decisions about process affinity.

While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
cpu_capacity) may be used to infer the types of micro-architecture of the
CPUs in the platform, it may not be entirely accurate. For instance, two
subsets of CPUs with different types of micro-architecture may have the
same capacity due to power or thermal constraints.

Create the new directory /sys/devices/system/cpu/types. Under such
directory, create individual subdirectories for each type of CPU micro-
architecture. Each subdirectory will have cpulist and cpumap files. This
makes it convenient for user space to read all the CPUs of the same type
at once without having to inspect each CPU individually.

Implement a generic interface using weak functions that architectures can
override to indicate a) support for CPU types, b) the CPU type number, and
c) a string to identify the CPU vendor and type.

For example, an x86 system with one Intel Core and four Intel Atom CPUs
would look like this (other architectures have the hooks to use whatever
directory naming convention below "types" that meets their needs):

user@host:~$: ls /sys/devices/system/cpu/types
intel_atom_0  intel_core_0

user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
cpulist cpumap

user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
cpulist cpumap

user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
0f

user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
0-3

user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
10

user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
4

On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
created. Add a hook for this purpose.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Len Brown <len.brown@intel.com> # Necessity of the interface
Suggested-by: Dave Hansen <dave.hansen@intel.com> # Details of the interface
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 .../ABI/testing/sysfs-devices-system-cpu      |  13 ++
 drivers/base/cpu.c                            | 214 ++++++++++++++++++
 include/linux/cpu.h                           |  12 +
 include/linux/cpuhotplug.h                    |   1 +
 4 files changed, 240 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index b555df825447..bcb3d7195102 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -614,3 +614,16 @@ Description:	SPURR ticks for cpuX when it was idle.
 
 		This sysfs interface exposes the number of SPURR ticks
 		for cpuX when it was idle.
+
+What:		/sys/devices/system/cpu/types
+		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpumap
+		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpulist
+Date:		Oct 2020
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Types of CPUs in hybrid systems
+
+		If a platform has CPUs with more than one type of micro-
+		architecture, there is one directory for each type of CPU.
+		Inside each directory, the files cpumap and cpulist contain
+		a mask and a list representing CPUs of the same type of micro-
+		architecture. These files only contain CPUs currently online.
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index d2136ab9b14a..7e98b11b15a1 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -607,11 +607,225 @@ static void __init cpu_register_vulnerabilities(void)
 static inline void cpu_register_vulnerabilities(void) { }
 #endif
 
+static ssize_t cpulist_show(struct device *device,
+			    struct device_attribute *attr,
+			    char *buf)
+{
+	struct cpumask *mask = dev_get_drvdata(device);
+
+	if (!mask)
+		return -EINVAL;
+
+	return cpumap_print_to_pagebuf(true, buf, mask);
+}
+
+static ssize_t cpumap_show(struct device *device,
+			   struct device_attribute *attr,
+			   char *buf)
+{
+	struct cpumask *mask = dev_get_drvdata(device);
+
+	if (!mask)
+		return -EINVAL;
+
+	return cpumap_print_to_pagebuf(false, buf, mask);
+}
+
+static DEVICE_ATTR_RO(cpumap);
+static DEVICE_ATTR_RO(cpulist);
+
+static struct attribute *cpu_type_attrs[] = {
+	&dev_attr_cpumap.attr,
+	&dev_attr_cpulist.attr,
+	NULL
+};
+
+static const struct attribute_group cpu_type_attr_group = {
+	.attrs = cpu_type_attrs,
+};
+
+static const struct attribute_group *cpu_type_attr_groups[] = {
+	&cpu_type_attr_group,
+	NULL,
+};
+
+/* Root device for the cpu_type sysfs entries */
+static struct device *cpu_type_device;
+/*
+ * An array of cpu_type devices, one per type of supported CPU
+ * micro-architecture.
+ */
+static struct device *cpu_type_devices[CPUTYPES_MAX_NR];
+
+/**
+ * arch_get_cpu_type() - Get the CPU type number
+ * @cpu:	Index of the CPU of which the type is needed
+ *
+ * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
+ * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
+ * type have the same type number. Type numbers are defined by each CPU
+ * architecture. A return value of 0 means that CPU types are not supported.
+ */
+u32 __weak arch_get_cpu_type(int cpu)
+{
+	return 0;
+}
+
+/**
+ * arch_has_cpu_type() - Check if CPU types are supported
+ *
+ * Returns true if the running platform supports CPU types. This is true if the
+ * platform has CPUs of more than one type of micro-architecture. Otherwise,
+ * returns false.
+ */
+bool __weak arch_has_cpu_type(void)
+{
+	return false;
+}
+
+/**
+ * arch_get_cpu_type_name() - Get the CPU type name
+ * @cpu_type:	Type of CPU micro-architecture.
+ *
+ * Returns a string name associated with the CPU micro-architecture type as
+ * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
+ * NULL if the CPU type is not known.
+ */
+const char __weak *arch_get_cpu_type_name(u32 cpu_type)
+{
+	return NULL;
+}
+
+static int cpu_type_create_device(int cpu, int idx)
+{
+	u32 cpu_type = arch_get_cpu_type(cpu);
+	struct cpumask *mask;
+	const char *name;
+	char buf[64];
+
+	mask = kzalloc(cpumask_size(), GFP_KERNEL);
+	if (!mask)
+		return -ENOMEM;
+
+	name = arch_get_cpu_type_name(cpu_type);
+	if (!name) {
+		snprintf(buf, sizeof(buf), "unknown_%u", cpu_type);
+		name = buf;
+	}
+
+	cpu_type_devices[idx] = cpu_device_create(cpu_type_device, mask,
+						  cpu_type_attr_groups, name);
+	if (IS_ERR(cpu_type_devices[idx])) {
+		int ret = PTR_ERR(cpu_type_devices[idx]);
+
+		cpu_type_devices[idx] = NULL;
+		kfree(mask);
+		return ret;
+	}
+
+	cpumask_set_cpu(cpu, mask);
+	return 0;
+}
+
+static int cpu_type_sysfs_online(unsigned int cpu)
+{
+	u32 cpu_type = arch_get_cpu_type(cpu);
+	struct cpumask *mask;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
+		u32 this_cpu_type;
+
+		/*
+		 * Unused slots are at the end of the array. Thus, reaching
+		 * an empty slot means no existing device matches; create a
+		 * new device and sysfs directory for the type of @cpu.
+		 */
+		if (!cpu_type_devices[i])
+			return cpu_type_create_device(cpu, i);
+
+		mask = dev_get_drvdata(cpu_type_devices[i]);
+		if (!mask || !cpumask_weight(mask)) {
+			/*
+			 * We should not be here. Be paranoid about
+			 * NULL pointers.
+			 */
+			dev_err(cpu_type_devices[i], "CPU mask is invalid");
+			return -EINVAL;
+		}
+
+		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
+		if (cpu_type == this_cpu_type) {
+			cpumask_set_cpu(cpu, mask);
+			return 0;
+		}
+	}
+
+	return -ENODEV;
+}
+
+static int cpu_type_sysfs_offline(unsigned int cpu)
+{
+	u32 cpu_type = arch_get_cpu_type(cpu);
+	struct cpumask *mask;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
+		u32 this_cpu_type;
+
+		if (!cpu_type_devices[i])
+			continue;
+
+		mask = dev_get_drvdata(cpu_type_devices[i]);
+		if (!mask || !cpumask_weight(mask)) {
+			/*
+			 * We should not be here. Be paranoid about
+			 * NULL pointers.
+			 */
+			dev_err(cpu_type_devices[i], "CPU mask is invalid");
+			continue;
+		}
+
+		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
+		if (cpu_type == this_cpu_type) {
+			cpumask_clear_cpu(cpu, mask);
+			return 0;
+		}
+	}
+
+	/*
+	 * If we are here, no matching cpu_type was found. This CPU was not
+	 * accounted for at hotplug online.
+	 */
+	pr_warn("Unexpected CPU offline!\n");
+
+	return -ENODEV;
+}
+
+static void __init cpu_type_sysfs_register(void)
+{
+	struct device *dev = cpu_subsys.dev_root;
+	int ret;
+
+	if (!arch_has_cpu_type())
+		return;
+
+	cpu_type_device = cpu_device_create(dev, NULL, NULL, "types");
+	if (IS_ERR(cpu_type_device))
+		return;
+
+	ret = cpuhp_setup_state(CPUHP_AP_BASE_CPUTYPE_ONLINE,
+				"base/cpu_type_sysfs:online",
+				cpu_type_sysfs_online, cpu_type_sysfs_offline);
+	if (ret)
+		dev_warn(dev, "failed to register CPU type sysfs hotplug callbacks\n");
+}
+
 void __init cpu_dev_init(void)
 {
 	if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
 		panic("Failed to register CPU subsystem");
 
 	cpu_dev_register_generic();
+	cpu_type_sysfs_register();
 	cpu_register_vulnerabilities();
 }
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 8aa84c052fdf..76fba70c7801 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -228,4 +228,16 @@ static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0;
 extern bool cpu_mitigations_off(void);
 extern bool cpu_mitigations_auto_nosmt(void);
 
+#ifndef CPUTYPES_MAX_NR
+/*
+ * The maximum number of types of CPUs. Architecture-specific implementations
+ * shall override this value.
+ */
+#define CPUTYPES_MAX_NR 0
+#endif
+
+extern u32 arch_get_cpu_type(int cpu);
+extern bool arch_has_cpu_type(void);
+extern const char *arch_get_cpu_type_name(u32 cpu_type);
+
 #endif /* _LINUX_CPU_H_ */
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 6f524bbf71a2..fd90ccb6652c 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -186,6 +186,7 @@ enum cpuhp_state {
 	CPUHP_AP_WATCHDOG_ONLINE,
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
+	CPUHP_AP_BASE_CPUTYPE_ONLINE,
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 30,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/4] x86/cpu: Describe hybrid CPUs in cpuinfo_x86
  2020-10-03  1:17 [PATCH 0/4] drivers core: Introduce CPU type sysfs interface Ricardo Neri
  2020-10-03  1:17 ` [PATCH 1/4] " Ricardo Neri
@ 2020-10-03  1:17 ` Ricardo Neri
  2020-10-03  4:07   ` kernel test robot
  2020-10-03  1:17 ` [PATCH 3/4] x86/cpu/intel: Add function to get name of hybrid CPU types Ricardo Neri
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 34+ messages in thread
From: Ricardo Neri @ 2020-10-03  1:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki
  Cc: Tony Luck, Len Brown, Ravi V. Shankar, linux-kernel,
	Ricardo Neri, Andi Kleen, Andy Lutomirski, Dave Hansen,
	Kan Liang, Peter Zijlstra (Intel),
	Sean Christopherson, Srinivas Pandruvada

When Linux runs on Intel hybrid parts (i.e., having more than one type of
CPU in the same package), subsystems that deal with specific CPU features
may need to know the type of CPU in which they run. Instead of having each
subsystem to inspect CPUID leaves on its own, add a new member to
cpuinfo_x86 that can be queried to know the type of CPU.

Also, hybrid parts have a native model ID to uniquely identify the
micro-architecture of each CPU. Please note that the native model ID is not
related to the existing x86_model_id read from CPUID leaf 0x1.

In order to uniquely identify a CPU by type and micro-architecture, combine
the aforementioned identifiers into a single new member, x86_cpu_type.

The Intel manual (SDM) defines the CPU type and the CPU native model ID as
8-bit and 24-bit identifiers, respectively; they are packed in %eax when
read from CPUID.

Also define a mask and a shift that subsystems can use to obtain the CPU
type or the native model ID separately. The native model ID needs only a
bit mask, as it uses the 24 least significant bits of %eax. The CPU type
identifier needs only a shift value, as it uses the 8 most significant
bits of %eax.
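
The packing described above can be sketched in plain C. The bit layout and
the meaning of the two fields come from this series; the macro and helper
names below are illustrative assumptions, and the sample values used in the
checks are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout of the %eax result of CPUID leaf 0x1A, per the text above. */
#define HYBRID_TYPE_SHIFT	24
#define HYBRID_MODEL_MASK	0xffffff

/* Extract the 8-bit CPU type from bits [31:24]. */
static inline uint8_t hybrid_cpu_type(uint32_t eax)
{
	return eax >> HYBRID_TYPE_SHIFT;
}

/* Extract the 24-bit native model ID from bits [23:0]. */
static inline uint32_t hybrid_native_model(uint32_t eax)
{
	return eax & HYBRID_MODEL_MASK;
}
```

For instance, a hypothetical %eax value of 0x20000005 splits into CPU type
0x20 and native model ID 5.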

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
I pulled this patch from a separate series[1] as per request from Boris.

Changes wrt to such series:
 * Use cpuid_eax() instead of cpuid_count() to read %eax result from
   CPUID. (Boris)

[1]. https://lkml.org/lkml/2020/10/2/1013
---
 arch/x86/include/asm/processor.h | 13 +++++++++++++
 arch/x86/kernel/cpu/common.c     |  3 +++
 2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f88c74d7dbd4..d86cdf2b1562 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -141,6 +141,16 @@ struct cpuinfo_x86 {
 	u32			microcode;
 	/* Address space bits used by the cache internally */
 	u8			x86_cache_bits;
+	/*
+	 * In hybrid parts, there is a CPU type and a native model ID. The
+	 * CPU type (x86_cpu_type[31:24]) describes the micro-architecture
+	 * family. The native model ID (x86_cpu_type[23:0]) describes a
+	 * specific micro-architecture version. Combining both allows one
+	 * to uniquely identify a CPU.
+	 *
+	 * Please note that the native model ID is not related to x86_model.
+	 */
+	u32			x86_cpu_type;
 	unsigned		initialized : 1;
 } __randomize_layout;
 
@@ -168,6 +178,9 @@ enum cpuid_regs_idx {
 
 #define X86_VENDOR_UNKNOWN	0xff
 
+#define X86_HYBRID_CPU_TYPE_ID_SHIFT		24
+#define X86_HYBRID_CPU_NATIVE_MODEL_ID_MASK	0xffffff
+
 /*
  * capabilities of CPUs
  */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 35ad8480c464..a66c1fdc0e27 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -932,6 +932,9 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 		c->x86_capability[CPUID_D_1_EAX] = eax;
 	}
 
+	if (cpu_has(c, X86_FEATURE_HYBRID_CPU))
+		c->x86_cpu_type = cpuid_eax(0x0000001a);
+
 	/* AMD-defined flags: level 0x80000001 */
 	eax = cpuid_eax(0x80000000);
 	c->extended_cpuid_level = eax;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/4] x86/cpu/intel: Add function to get name of hybrid CPU types
  2020-10-03  1:17 [PATCH 0/4] drivers core: Introduce CPU type sysfs interface Ricardo Neri
  2020-10-03  1:17 ` [PATCH 1/4] " Ricardo Neri
  2020-10-03  1:17 ` [PATCH 2/4] x86/cpu: Describe hybrid CPUs in cpuinfo_x86 Ricardo Neri
@ 2020-10-03  1:17 ` Ricardo Neri
  2020-10-03  1:17 ` [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface Ricardo Neri
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 34+ messages in thread
From: Ricardo Neri @ 2020-10-03  1:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki
  Cc: Tony Luck, Len Brown, Ravi V. Shankar, linux-kernel,
	Ricardo Neri, Andi Kleen, Dave Hansen, Kan Liang,
	Srinivas Pandruvada

Provide a human-friendly string name for each type of CPU micro-
architecture in Intel hybrid parts. This string is to be used in the
CPU type sysfs interface.

In order to uniquely identify CPUs of the same type, compose the name
string as <cpu_type_name>_<native_model_id_nr>.
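
A userspace sketch of the same naming scheme; hybrid_type_name() is an
illustrative stand-in for intel_get_hybrid_cpu_type_name() (the caller
supplies the buffer instead of the static one used in the patch), and only
the 0x20/0x40 type values and the <cpu_type_name>_<native_model_id_nr>
format come from this series:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Compose <cpu_type_name>_<native_model_id_nr>, or NULL if unknown. */
static const char *hybrid_type_name(uint32_t cpu_type, char *buf, size_t len)
{
	uint32_t model = cpu_type & 0xffffff;	/* native model ID */
	uint8_t type = cpu_type >> 24;		/* CPU type */
	const char *base;

	switch (type) {
	case 0x20:	/* INTEL_HYBRID_TYPE_ATOM */
		base = "intel_atom";
		break;
	case 0x40:	/* INTEL_HYBRID_TYPE_CORE */
		base = "intel_core";
		break;
	default:
		return NULL;
	}

	snprintf(buf, len, "%s_%u", base, model);
	return buf;
}
```

A type of 0x20 with native model ID 0 thus yields "intel_atom_0", matching
the directory names in patch 1.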

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/intel-family.h |  4 ++++
 arch/x86/kernel/cpu/cpu.h           |  3 +++
 arch/x86/kernel/cpu/intel.c         | 23 +++++++++++++++++++++++
 3 files changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/intel-family.h b/arch/x86/include/asm/intel-family.h
index 5e658ba2654a..4ec2272e0049 100644
--- a/arch/x86/include/asm/intel-family.h
+++ b/arch/x86/include/asm/intel-family.h
@@ -133,4 +133,8 @@
 /* Family 5 */
 #define INTEL_FAM5_QUARK_X1000		0x09 /* Quark X1000 SoC */
 
+/* Types of CPUs in hybrid parts. */
+#define INTEL_HYBRID_TYPE_ATOM		0x20
+#define INTEL_HYBRID_TYPE_CORE		0x40
+
 #endif /* _ASM_X86_INTEL_FAMILY_H */
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 9d033693519a..b4474238e1f3 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -56,8 +56,11 @@ extern __ro_after_init enum tsx_ctrl_states tsx_ctrl_state;
 extern void __init tsx_init(void);
 extern void tsx_enable(void);
 extern void tsx_disable(void);
+extern const char *intel_get_hybrid_cpu_type_name(u32 cpu_type);
 #else
 static inline void tsx_init(void) { }
+static inline const char *intel_get_hybrid_cpu_type_name(u32 cpu_type)
+{ return NULL; }
 #endif /* CONFIG_CPU_SUP_INTEL */
 
 extern void get_cpu_cap(struct cpuinfo_x86 *c);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 59a1e3ce3f14..e1dee382cf98 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1191,3 +1191,26 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
 	cpu_model_supports_sld = true;
 	split_lock_setup();
 }
+
+static char hybrid_name[64];
+
+const char *intel_get_hybrid_cpu_type_name(u32 cpu_type)
+{
+	u32 native_model_id = cpu_type & X86_HYBRID_CPU_NATIVE_MODEL_ID_MASK;
+	u8 type = cpu_type >> X86_HYBRID_CPU_TYPE_ID_SHIFT;
+
+	switch (type) {
+	case INTEL_HYBRID_TYPE_ATOM:
+		snprintf(hybrid_name, sizeof(hybrid_name), "intel_atom_%u",
+			 native_model_id);
+		break;
+	case INTEL_HYBRID_TYPE_CORE:
+		snprintf(hybrid_name, sizeof(hybrid_name), "intel_core_%u",
+			 native_model_id);
+		break;
+	default:
+		return NULL;
+	}
+
+	return hybrid_name;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface
  2020-10-03  1:17 [PATCH 0/4] drivers core: Introduce CPU type sysfs interface Ricardo Neri
                   ` (2 preceding siblings ...)
  2020-10-03  1:17 ` [PATCH 3/4] x86/cpu/intel: Add function to get name of hybrid CPU types Ricardo Neri
@ 2020-10-03  1:17 ` Ricardo Neri
  2020-10-03  3:33   ` Randy Dunlap
                     ` (2 more replies)
  2020-10-03  8:49 ` [PATCH 0/4] drivers core: Introduce " Borislav Petkov
  2020-10-06  8:51 ` Qais Yousef
  5 siblings, 3 replies; 34+ messages in thread
From: Ricardo Neri @ 2020-10-03  1:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki
  Cc: Tony Luck, Len Brown, Ravi V. Shankar, linux-kernel,
	Ricardo Neri, Andi Kleen, Dave Hansen, Kan Liang,
	Srinivas Pandruvada

Recent Intel processors combine CPUs with different types of micro-
architecture in the same package. There may be applications interested in
knowing the type topology of the system. For instance, it can be used
to determine which subsets of CPUs share a common feature.

Implement cpu_type sysfs interfaces for Intel processors.

For example, in a system with four Intel Atom CPUs and one Intel Core CPU,
these entries look as below. In this example, the native model IDs for
both types of CPUs are 0:

user@host:~$: ls /sys/devices/system/cpu/types
intel_atom_0 intel_core_0

user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
cpulist cpumap

user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
cpulist cpumap

user@host:~$ cat /sys/devices/system/cpu/types/intel_atom/cpumap
0f

user@host:~$ cat /sys/devices/system/cpu/types/intel_atom/cpulist
0-3

user@nost:~$ cat /sys/devices/system/cpu/types/intel_core/cpumap
10

user@host:~$ cat /sys/devices/system/cpu/types/intel_core/cpulist
4
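
As with patch 1, the cpumap output above can be consumed from userspace;
cpumap_weight() below is an illustrative sketch (not kernel code) that
handles only a single hex word, while real masks on large systems may be
wider and comma-separated:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Count set bits in a small sysfs cpumap hex string such as "0f" or
 * "10", i.e. the number of online CPUs of that type.
 */
static int cpumap_weight(const char *hex)
{
	unsigned long mask = strtoul(hex, NULL, 16);
	int n = 0;

	while (mask) {
		n += mask & 1;
		mask >>= 1;
	}
	return n;
}
```

In the example above, "0f" gives four Atom CPUs and "10" gives one Core CPU.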

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Len Brown <len.brown@intel.com> # Necessity of the interface
Suggested-by: Dave Hansen <dave.hansen@intel.com> # Details of the interface
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
---
 arch/x86/include/asm/topology.h |  2 ++
 arch/x86/kernel/cpu/topology.c  | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index f4234575f3fd..d4a3e1ce8338 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -218,4 +218,6 @@ static inline void arch_set_max_freq_ratio(bool turbo_disabled)
 }
 #endif
 
+#define CPUTYPES_MAX_NR 2
+
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index d3a0791bc052..709fc473f905 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -153,3 +153,26 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
 #endif
 	return 0;
 }
+
+u32 arch_get_cpu_type(int cpu)
+{
+	struct cpuinfo_x86 *c;
+
+	if (cpu < 0 || cpu >= nr_cpu_ids)
+		return 0;
+
+	c = &cpu_data(cpu);
+	return c->x86_cpu_type;
+}
+
+bool arch_has_cpu_type(void)
+{
+	return boot_cpu_has(X86_FEATURE_HYBRID_CPU);
+}
+
+const char *arch_get_cpu_type_name(u32 cpu_type)
+{
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return intel_get_hybrid_cpu_type_name(cpu_type);
+
+	return NULL;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  1:17 ` [PATCH 1/4] " Ricardo Neri
@ 2020-10-03  3:27   ` Randy Dunlap
  2020-10-06  1:15     ` Ricardo Neri
  2020-10-03  8:53   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 34+ messages in thread
From: Randy Dunlap @ 2020-10-03  3:27 UTC (permalink / raw)
  To: Ricardo Neri, Greg Kroah-Hartman, x86, Borislav Petkov,
	Ingo Molnar, Thomas Gleixner, Rafael J. Wysocki
  Cc: Tony Luck, Len Brown, Ravi V. Shankar, linux-kernel, Andi Kleen,
	Dave Hansen, Gautham R. Shenoy, Kan Liang, Srinivas Pandruvada

On 10/2/20 6:17 PM, Ricardo Neri wrote:
> +/**
> + * arch_get_cpu_type() - Get the CPU type number
> + * @cpu:	Index of the CPU of which the index is needed
> + *
> + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that

Are you sure that @cpu is non-zero?


> + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> + * type have the same type number. Type numbers are defined by each CPU
> + * architecture.
> + */
> +u32 __weak arch_get_cpu_type(int cpu)
> +{
> +	return 0;
> +}

arch_get_cpu_type() in patch 4/4 allows @cpu to be 0.


-- 
~Randy


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface
  2020-10-03  1:17 ` [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface Ricardo Neri
@ 2020-10-03  3:33   ` Randy Dunlap
  2020-10-03  5:28   ` kernel test robot
  2020-10-03  8:55   ` Greg Kroah-Hartman
  2 siblings, 0 replies; 34+ messages in thread
From: Randy Dunlap @ 2020-10-03  3:33 UTC (permalink / raw)
  To: Ricardo Neri, Greg Kroah-Hartman, x86, Borislav Petkov,
	Ingo Molnar, Thomas Gleixner, Rafael J. Wysocki
  Cc: Tony Luck, Len Brown, Ravi V. Shankar, linux-kernel, Andi Kleen,
	Dave Hansen, Kan Liang, Srinivas Pandruvada

On 10/2/20 6:17 PM, Ricardo Neri wrote:
> +u32 arch_get_cpu_type(int cpu)
> +{
> +	struct cpuinfo_x86 *c = &cpu_data(cpu);
> +
> +	if (cpu < 0 || cpu >= nr_cpu_ids)
> +		return 0;
> +
> +	return c->x86_cpu_type;
> +}

Hi,

Consider using
#include <linux/nospec.h>
and array_index_nospec() to avoid speculation problems on cpu_data.

cheers.
-- 
~Randy


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/4] x86/cpu: Describe hybrid CPUs in cpuinfo_x86
  2020-10-03  1:17 ` [PATCH 2/4] x86/cpu: Describe hybrid CPUs in cpuinfo_x86 Ricardo Neri
@ 2020-10-03  4:07   ` kernel test robot
  0 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2020-10-03  4:07 UTC (permalink / raw)
  To: Ricardo Neri, Greg Kroah-Hartman, x86, Borislav Petkov,
	Ingo Molnar, Thomas Gleixner, Rafael J. Wysocki
  Cc: kbuild-all, clang-built-linux, Tony Luck, Len Brown,
	Ravi V. Shankar, linux-kernel


Hi Ricardo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/master driver-core/driver-core-testing linus/master v5.9-rc7 next-20201002]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ricardo-Neri/drivers-core-Introduce-CPU-type-sysfs-interface/20201003-091754
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 238c91115cd05c71447ea071624a4c9fe661f970
config: x86_64-randconfig-a012-20201002 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project bcd05599d0e53977a963799d6ee4f6e0bc21331b)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/ffe255e2342693ca1a8d96d052c903824595fde8
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ricardo-Neri/drivers-core-Introduce-CPU-type-sysfs-interface/20201003-091754
        git checkout ffe255e2342693ca1a8d96d052c903824595fde8
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/x86/kernel/cpu/common.c:934:17: error: use of undeclared identifier 'X86_FEATURE_HYBRID_CPU'
           if (cpu_has(c, X86_FEATURE_HYBRID_CPU))
                          ^
   fatal error: too many errors emitted, stopping now [-ferror-limit=]
   20 errors generated.

vim +/X86_FEATURE_HYBRID_CPU +934 arch/x86/kernel/cpu/common.c

   896	
   897	void get_cpu_cap(struct cpuinfo_x86 *c)
   898	{
   899		u32 eax, ebx, ecx, edx;
   900	
   901		/* Intel-defined flags: level 0x00000001 */
   902		if (c->cpuid_level >= 0x00000001) {
   903			cpuid(0x00000001, &eax, &ebx, &ecx, &edx);
   904	
   905			c->x86_capability[CPUID_1_ECX] = ecx;
   906			c->x86_capability[CPUID_1_EDX] = edx;
   907		}
   908	
   909		/* Thermal and Power Management Leaf: level 0x00000006 (eax) */
   910		if (c->cpuid_level >= 0x00000006)
   911			c->x86_capability[CPUID_6_EAX] = cpuid_eax(0x00000006);
   912	
   913		/* Additional Intel-defined flags: level 0x00000007 */
   914		if (c->cpuid_level >= 0x00000007) {
   915			cpuid_count(0x00000007, 0, &eax, &ebx, &ecx, &edx);
   916			c->x86_capability[CPUID_7_0_EBX] = ebx;
   917			c->x86_capability[CPUID_7_ECX] = ecx;
   918			c->x86_capability[CPUID_7_EDX] = edx;
   919	
   920			/* Check valid sub-leaf index before accessing it */
   921			if (eax >= 1) {
   922				cpuid_count(0x00000007, 1, &eax, &ebx, &ecx, &edx);
   923				c->x86_capability[CPUID_7_1_EAX] = eax;
   924			}
   925		}
   926	
   927		/* Extended state features: level 0x0000000d */
   928		if (c->cpuid_level >= 0x0000000d) {
   929			cpuid_count(0x0000000d, 1, &eax, &ebx, &ecx, &edx);
   930	
   931			c->x86_capability[CPUID_D_1_EAX] = eax;
   932		}
   933	
 > 934		if (cpu_has(c, X86_FEATURE_HYBRID_CPU))
   935			c->x86_cpu_type = cpuid_eax(0x0000001a);
   936	
   937		/* AMD-defined flags: level 0x80000001 */
   938		eax = cpuid_eax(0x80000000);
   939		c->extended_cpuid_level = eax;
   940	
   941		if ((eax & 0xffff0000) == 0x80000000) {
   942			if (eax >= 0x80000001) {
   943				cpuid(0x80000001, &eax, &ebx, &ecx, &edx);
   944	
   945				c->x86_capability[CPUID_8000_0001_ECX] = ecx;
   946				c->x86_capability[CPUID_8000_0001_EDX] = edx;
   947			}
   948		}
   949	
   950		if (c->extended_cpuid_level >= 0x80000007) {
   951			cpuid(0x80000007, &eax, &ebx, &ecx, &edx);
   952	
   953			c->x86_capability[CPUID_8000_0007_EBX] = ebx;
   954			c->x86_power = edx;
   955		}
   956	
   957		if (c->extended_cpuid_level >= 0x80000008) {
   958			cpuid(0x80000008, &eax, &ebx, &ecx, &edx);
   959			c->x86_capability[CPUID_8000_0008_EBX] = ebx;
   960		}
   961	
   962		if (c->extended_cpuid_level >= 0x8000000a)
   963			c->x86_capability[CPUID_8000_000A_EDX] = cpuid_edx(0x8000000a);
   964	
   965		init_scattered_cpuid_features(c);
   966		init_speculation_control(c);
   967	
   968		/*
   969		 * Clear/Set all flags overridden by options, after probe.
   970		 * This needs to happen each time we re-probe, which may happen
   971		 * several times during CPU initialization.
   972		 */
   973		apply_forced_caps(c);
   974	}
   975	
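For context on the hunk flagged above: the value stored in c->x86_cpu_type is the raw EAX output of CPUID leaf 0x1A, whose bits 31:24 encode the core type on Intel hybrid parts. A minimal sketch of decoding that byte follows (the helper name and returned strings are illustrative, not kernel API; the 0x20/0x40 type values are as documented in the Intel SDM):

```c
#include <stdint.h>

/*
 * Decode the core-type byte from CPUID.1AH:EAX (bits 31:24).
 * Known values per the Intel SDM: 0x20 = Intel Atom, 0x40 = Intel Core.
 * Helper name and strings are illustrative only.
 */
static const char *hybrid_core_type(uint32_t eax)
{
	switch (eax >> 24) {
	case 0x20:
		return "intel_atom";
	case 0x40:
		return "intel_core";
	default:
		return "unknown";
	}
}
```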

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34752 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface
  2020-10-03  1:17 ` [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface Ricardo Neri
  2020-10-03  3:33   ` Randy Dunlap
@ 2020-10-03  5:28   ` kernel test robot
  2020-10-03  8:55   ` Greg Kroah-Hartman
  2 siblings, 0 replies; 34+ messages in thread
From: kernel test robot @ 2020-10-03  5:28 UTC (permalink / raw)
  To: Ricardo Neri, Greg Kroah-Hartman, x86, Borislav Petkov,
	Ingo Molnar, Thomas Gleixner, Rafael J. Wysocki
  Cc: kbuild-all, clang-built-linux, Tony Luck, Len Brown,
	Ravi V. Shankar, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4164 bytes --]

Hi Ricardo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/master driver-core/driver-core-testing linus/master v5.9-rc7 next-20201002]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ricardo-Neri/drivers-core-Introduce-CPU-type-sysfs-interface/20201003-091754
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 238c91115cd05c71447ea071624a4c9fe661f970
config: x86_64-randconfig-a012-20201002 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project bcd05599d0e53977a963799d6ee4f6e0bc21331b)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/6e37c052cde780c58a9dd815e667d89538e30579
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Ricardo-Neri/drivers-core-Introduce-CPU-type-sysfs-interface/20201003-091754
        git checkout 6e37c052cde780c58a9dd815e667d89538e30579
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> arch/x86/kernel/cpu/topology.c:169:22: error: use of undeclared identifier 'X86_FEATURE_HYBRID_CPU'
           return boot_cpu_has(X86_FEATURE_HYBRID_CPU);
                               ^
   fatal error: too many errors emitted, stopping now [-ferror-limit=]
   20 errors generated.

vim +/X86_FEATURE_HYBRID_CPU +169 arch/x86/kernel/cpu/topology.c

   166	
   167	bool arch_has_cpu_type(void)
   168	{
 > 169		return boot_cpu_has(X86_FEATURE_HYBRID_CPU);
   170	}
   171	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34752 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  1:17 [PATCH 0/4] drivers core: Introduce CPU type sysfs interface Ricardo Neri
                   ` (3 preceding siblings ...)
  2020-10-03  1:17 ` [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface Ricardo Neri
@ 2020-10-03  8:49 ` Borislav Petkov
  2020-10-06  0:27   ` Ricardo Neri
  2020-10-06  8:51 ` Qais Yousef
  5 siblings, 1 reply; 34+ messages in thread
From: Borislav Petkov @ 2020-10-03  8:49 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Greg Kroah-Hartman, x86, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel

On Fri, Oct 02, 2020 at 06:17:41PM -0700, Ricardo Neri wrote:
> Patch 1 of the series proposes the generic interface, with hooks
> that architectures can override to suit their needs. The remaining three
> patches implement such an interface for x86 (as per request from Boris,
> I pulled patch 2 from a separate submission [1]).

So I ask you to show me the whole thing, how this is supposed to be used
in a *real* use case and you're sending me a couple of patches which
report these heterogeneous or whatever they're gonna be called CPUs.

Are you telling me that all this development effort was done so that
you can report heterogeneity in sysfs? Or you just had to come up with
*something*?

Let me try again: please show me the *big* *picture* with all the code
how this is supposed to be used. In the patches I read a bunch of "may"
formulations of what is possible and what userspace could do and so on.

Not that - show me the *full* and *real* use cases which you are
enabling and which justify all that churn. Instead of leaving it all to
userspace CPUID and the kernel not caring one bit.

Does that make more sense?

> [1]. https://lkml.org/lkml/2020/10/2/1013

For supplying links, we use lore.kernel.org/r/<message-id> solely.
Please use that from now on.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  1:17 ` [PATCH 1/4] " Ricardo Neri
  2020-10-03  3:27   ` Randy Dunlap
@ 2020-10-03  8:53   ` Greg Kroah-Hartman
  2020-10-03 11:05     ` Greg Kroah-Hartman
  2020-10-06  0:57     ` Ricardo Neri
  1 sibling, 2 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-03  8:53 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> same die. Thus, even though the instruction set is compatible among all
> CPUs, there may still be differences in features (e.g., some CPUs may
> have counters that other CPUs do not). There may be applications
> interested in knowing the type of micro-architecture topology of the
> system to make decisions about process affinity.
> 
> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> cpu_capacity) may be used to infer the types of micro-architecture of the
> CPUs in the platform, it may not be entirely accurate. For instance, two
> subsets of CPUs with different types of micro-architecture may have the
> same capacity due to power or thermal constraints.
> 
> Create the new directory /sys/devices/system/cpu/types. Under such
> directory, create individual subdirectories for each type of CPU micro-
> architecture. Each subdirectory will have cpulist and cpumap files. This
> makes it convenient for user space to read all the CPUs of the same type
> at once without having to inspect each CPU individually.
> 
> Implement a generic interface using weak functions that architectures can
> override to indicate a) support for CPU types, b) the CPU type number, and
> c) a string to identify the CPU vendor and type.
> 
> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> would look like this (other architectures have the hooks to use whatever
> directory naming convention below "types" that meets their needs):
> 
> user@host:~$: ls /sys/devices/system/cpu/types
> intel_atom_0  intel_core_0
> 
> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> cpulist cpumap
> 
> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> cpulist cpumap
> 
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> 0f
> 
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> 0-3
> 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> 10
> 
> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> 4

The output of 'tree' sometimes makes it easier to see here, or:
	grep -R . *
also works well.
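For a sense of how user space might actually consume the cpulist files shown above, here is a hedged sketch of parsing a cpulist string such as "0-3" into a CPU bitmask (the function name is ours; validation and error handling are omitted, and CPU numbers are assumed to be below 64):

```c
#include <stdlib.h>

/*
 * Parse a sysfs cpulist string (e.g. "0-3" or "0-2,5") into a bitmask.
 * Illustrative only: no input validation, CPUs assumed < 64.
 */
static unsigned long long cpulist_to_mask(const char *list)
{
	unsigned long long mask = 0;
	const char *p = list;

	while (*p) {
		char *end;
		long start = strtol(p, &end, 10);
		long stop = start;

		/* A '-' introduces the end of a range, e.g. "0-3". */
		if (*end == '-')
			stop = strtol(end + 1, &end, 10);
		for (long cpu = start; cpu <= stop; cpu++)
			mask |= 1ULL << cpu;
		/* A ',' separates ranges; anything else ends the list. */
		if (*end != ',')
			break;
		p = end + 1;
	}
	return mask;
}
```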

> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> created. Add a hook for this purpose.

Why should these not show up if the system is not "hybrid"?

> 
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> Cc: Kan Liang <kan.liang@linux.intel.com>
> Cc: Len Brown <len.brown@intel.com>
> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: linux-kernel@vger.kernel.org
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Suggested-by: Len Brown <len.brown@intel.com> # Necessity of the interface
> Suggested-by: Dave Hansen <dave.hansen@intel.com> # Details of the interface
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> ---
>  .../ABI/testing/sysfs-devices-system-cpu      |  13 ++
>  drivers/base/cpu.c                            | 214 ++++++++++++++++++
>  include/linux/cpu.h                           |  12 +
>  include/linux/cpuhotplug.h                    |   1 +
>  4 files changed, 240 insertions(+)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index b555df825447..bcb3d7195102 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -614,3 +614,16 @@ Description:	SPURR ticks for cpuX when it was idle.
>  
>  		This sysfs interface exposes the number of SPURR ticks
>  		for cpuX when it was idle.
> +
> +What:		/sys/devices/system/cpu/types
> +		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpumap
> +		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpulist

Please split these up into different entries, because:

> +Date:		Oct 2020
> +Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
> +Description:	Types of CPUs in hybrid systems
> +
> +		If a platform has CPUs with more than one type of micro-
> +		architecture, there is one directory for each type of CPU.
> +		Inside each directory, the files cpumap and cpulist contain
> +		a mask and a list representing CPUs of the same type of micro-
> +		architecture. These files only contain CPUs currently online.

You don't describe what mask and list actually represent.  I have no clue
what they mean, so odds are, others will not either.
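To make the distinction concrete: cpumap is the hexadecimal bitmask form and cpulist the human-readable range form of the same CPU set, as produced by cpumap_print_to_pagebuf() in the patch. A hedged user-space approximation of the two formats (helper names are ours, and this handles only 32 CPUs):

```c
#include <stdio.h>

/* Approximates cpumap_print_to_pagebuf(false, ...): hex mask, e.g. "0f". */
static void mask_to_hex(unsigned int mask, char *buf, size_t len)
{
	snprintf(buf, len, "%02x", mask);
}

/* Approximates cpumap_print_to_pagebuf(true, ...): range list, e.g. "0-3". */
static void mask_to_list(unsigned int mask, char *buf, size_t len)
{
	size_t off = 0;
	int cpu = 0;

	buf[0] = '\0';
	while (cpu < 32) {
		int start;

		if (!(mask & (1u << cpu))) {
			cpu++;
			continue;
		}
		/* Find the end of this contiguous run of set bits. */
		start = cpu;
		while (cpu < 32 && (mask & (1u << cpu)))
			cpu++;
		off += snprintf(buf + off, len - off, "%s%d",
				off ? "," : "", start);
		if (cpu - 1 > start)
			off += snprintf(buf + off, len - off, "-%d", cpu - 1);
	}
}
```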

> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index d2136ab9b14a..7e98b11b15a1 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -607,11 +607,225 @@ static void __init cpu_register_vulnerabilities(void)
>  static inline void cpu_register_vulnerabilities(void) { }
>  #endif
>  
> +static ssize_t cpulist_show(struct device *device,
> +			    struct device_attribute *attr,
> +			    char *buf)
> +{
> +	struct cpumask *mask = dev_get_drvdata(device);
> +
> +	if (!mask)
> +		return -EINVAL;

How can mask be NULL?

> +
> +	return cpumap_print_to_pagebuf(true, buf, mask);
> +}
> +
> +static ssize_t cpumap_show(struct device *device,
> +			   struct device_attribute *attr,
> +			   char *buf)
> +{
> +	struct cpumask *mask = dev_get_drvdata(device);
> +
> +	if (!mask)
> +		return -EINVAL;

Again, how can that be true?

> +
> +	return cpumap_print_to_pagebuf(false, buf, mask);
> +}
> +
> +static DEVICE_ATTR_RO(cpumap);
> +static DEVICE_ATTR_RO(cpulist);
> +
> +static struct attribute *cpu_type_attrs[] = {
> +	&dev_attr_cpumap.attr,
> +	&dev_attr_cpulist.attr,
> +	NULL
> +};
> +
> +static const struct attribute_group cpu_type_attr_group = {
> +	.attrs = cpu_type_attrs,
> +};
> +
> +static const struct attribute_group *cpu_type_attr_groups[] = {
> +	&cpu_type_attr_group,
> +	NULL,
> +};

ATTRIBUTE_GROUPS()?

> +
> +/* Root device for the cpu_type sysfs entries */
> +static struct device *cpu_type_device;
> +/*

No extra blank line?

> + * An array of cpu_type devices. One per each type of supported CPU micro-
> + * architecture.
> + */
> +static struct device *cpu_type_devices[CPUTYPES_MAX_NR];
> +
> +/**
> + * arch_get_cpu_type() - Get the CPU type number
> + * @cpu:	Index of the CPU of which the index is needed
> + *
> + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
> + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> + * type have the same type number. Type numbers are defined by each CPU
> + * architecture.
> + */
> +u32 __weak arch_get_cpu_type(int cpu)

Can you just have this "cpu type" be an enumerated type so we have a
clue as to what they really are?  Each arch can then define it, right?

> +{
> +	return 0;
> +}
> +
> +/**
> + * arch_has_cpu_type() - Check if CPU types are supported
> + *
> + * Returns true if the running platform supports CPU types. This is true if the
> + * platform has CPUs of more than one type of micro-architecture. Otherwise,
> + * returns false.
> + */
> +bool __weak arch_has_cpu_type(void)
> +{
> +	return false;
> +}
> +
> +/**
> + * arch_get_cpu_type_name() - Get the CPU type name
> + * @cpu_type:	Type of CPU micro-architecture.
> + *
> + * Returns a string name associated with the CPU micro-architecture type as
> + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> + * NULL if the CPU type is not known.
> + */
> +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> +{
> +	return NULL;
> +}

Why is vendor part of this?  Shouldn't it just be arch?

I say this as "vendor" is kind of "interesting" when it comes to other
arches...

Speaking of other arches, we all know that other arches have this
feature as well, have you worked with any other groups to verify that
this interface will also work with them?
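Regarding the __weak hooks under discussion: the pattern relies on the linker preferring a strong symbol from arch code over the weak default in drivers/base. A single-file sketch of the default side is below (in the kernel the strong override would live in a separate arch object file; with no override present, the weak definitions are what gets linked):

```c
#include <stdbool.h>

/*
 * Weak defaults, mirroring the generic hooks in this patch. An
 * architecture that supports CPU types provides strong definitions
 * of the same symbols in its own object file; the linker then
 * discards these defaults.
 */
__attribute__((weak)) bool arch_has_cpu_type(void)
{
	return false;
}

__attribute__((weak)) unsigned int arch_get_cpu_type(int cpu)
{
	(void)cpu;
	return 0;
}
```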

> +
> +static int cpu_type_create_device(int cpu, int idx)
> +{
> +	u32 cpu_type = arch_get_cpu_type(cpu);
> +	struct cpumask *mask;
> +	const char *name;
> +	char buf[64];
> +
> +	mask = kzalloc(cpumask_size(), GFP_KERNEL);
> +	if (!mask)
> +		return -ENOMEM;
> +
> +	name = arch_get_cpu_type_name(cpu_type);
> +	if (!name) {
> +		snprintf(buf, sizeof(buf), "unknown_%u", cpu_type);

If there is no name, why are you still making one up?

Shouldn't you just not create it at all?

> +		name = buf;
> +	}
> +
> +	cpu_type_devices[idx] = cpu_device_create(cpu_type_device, mask,
> +						  cpu_type_attr_groups, name);
> +	if (IS_ERR(cpu_type_devices[idx]))
> +		goto free_cpumask;
> +
> +	cpumask_set_cpu(cpu, mask);
> +	return 0;
> +
> +free_cpumask:
> +	kfree(mask);
> +	return -ENOMEM;
> +}
> +
> +static int cpu_type_sysfs_online(unsigned int cpu)
> +{
> +	u32 cpu_type = arch_get_cpu_type(cpu);
> +	struct cpumask *mask;
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
> +		u32 this_cpu_type;
> +
> +		/*
> +		 * The first devices in the array are used first. Thus, create
> +		 * a new device, as well as a sysfs directory, for the type of
> +		 * @cpu.
> +		 */
> +		if (!cpu_type_devices[i])
> +			return cpu_type_create_device(cpu, i);
> +
> +		mask = dev_get_drvdata(cpu_type_devices[i]);
> +		if (!mask || !cpumask_weight(mask)) {
> +			/*
> +			 * We should not be here. Be paranoid about
> +			 * NULL pointers.
> +			 */
> +			dev_err(cpu_type_devices[i], "CPU mask is invalid");
> +			return -EINVAL;
> +		}
> +
> +		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
> +		if (cpu_type == this_cpu_type) {
> +			cpumask_set_cpu(cpu, mask);
> +			return 0;
> +		}
> +	}
> +
> +	return -ENODEV;
> +}
> +
> +static int cpu_type_sysfs_offline(unsigned int cpu)
> +{
> +	u32 cpu_type = arch_get_cpu_type(cpu);
> +	struct cpumask *mask;
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
> +		u32 this_cpu_type;
> +
> +		if (!cpu_type_devices[i])
> +			continue;
> +
> +		mask = dev_get_drvdata(cpu_type_devices[i]);
> +		if (!mask || !cpumask_weight(mask)) {
> +			/*
> +			 * We should not be here. Be paranoid about
> +			 * NULL pointers.
> +			 */
> +			dev_err(cpu_type_devices[i], "CPU mask is invalid");
> +			continue;
> +		}
> +
> +		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
> +		if (cpu_type == this_cpu_type) {
> +			cpumask_clear_cpu(cpu, mask);
> +			return 0;
> +		}
> +	}
> +
> +	/*
> +	 * If we are here, no matching cpu_type was found. This CPU was not
> +	 * accounted for at hotplug online.
> +	 */
> +	pr_warn("Unexpected CPU offline!\n");
> +
> +	return -ENODEV;
> +}
> +
> +static void __init cpu_type_sysfs_register(void)
> +{
> +	struct device *dev = cpu_subsys.dev_root;
> +	int ret;
> +
> +	if (!arch_has_cpu_type())
> +		return;
> +
> +	cpu_type_device = cpu_device_create(dev, NULL, NULL, "types");

This feels like an abuse of cpu_device_create(), doesn't it?

> +	if (IS_ERR(cpu_type_device))
> +		return;
> +
> +	ret = cpuhp_setup_state(CPUHP_AP_BASE_CPUTYPE_ONLINE,
> +				"base/cpu_type_sysfs:online",
> +				cpu_type_sysfs_online, cpu_type_sysfs_offline);
> +	if (ret)
> +		dev_warn(dev, "failed to set up CPU type sysfs\n");
> +}
> +
>  void __init cpu_dev_init(void)
>  {
>  	if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
>  		panic("Failed to register CPU subsystem");
>  
>  	cpu_dev_register_generic();
> +	cpu_type_sysfs_register();
>  	cpu_register_vulnerabilities();
>  }
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 8aa84c052fdf..76fba70c7801 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -228,4 +228,16 @@ static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0;
>  extern bool cpu_mitigations_off(void);
>  extern bool cpu_mitigations_auto_nosmt(void);
>  
> +#ifndef CPUTYPES_MAX_NR
> +/*
> + * The maximum number of types of cpus. Architecture-specific implementations
> + * shall override this value.
> + */
> +#define CPUTYPES_MAX_NR 0
> +#endif
> +
> +extern u32 arch_get_cpu_type(int cpu);
> +extern bool arch_has_cpu_type(void);
> +extern const char *arch_get_cpu_type_name(u32 cpu_type);
> +
>  #endif /* _LINUX_CPU_H_ */
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 6f524bbf71a2..fd90ccb6652c 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -186,6 +186,7 @@ enum cpuhp_state {
>  	CPUHP_AP_WATCHDOG_ONLINE,
>  	CPUHP_AP_WORKQUEUE_ONLINE,
>  	CPUHP_AP_RCUTREE_ONLINE,
> +	CPUHP_AP_BASE_CPUTYPE_ONLINE,
>  	CPUHP_AP_BASE_CACHEINFO_ONLINE,

I'm all for messing with enumerated type values, but why put it in this
part of the list and not right before the next one?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface
  2020-10-03  1:17 ` [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface Ricardo Neri
  2020-10-03  3:33   ` Randy Dunlap
  2020-10-03  5:28   ` kernel test robot
@ 2020-10-03  8:55   ` Greg Kroah-Hartman
  2020-10-06  1:05     ` Ricardo Neri
  2 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-03  8:55 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Kan Liang,
	Srinivas Pandruvada

On Fri, Oct 02, 2020 at 06:17:45PM -0700, Ricardo Neri wrote:
> Recent Intel processors combine CPUs with different types of micro-
> architecture in the same package. There may be applications interested in
> knowing the type topology of the system. For instance, it can be used
> to determine which subsets of CPUs share a common feature.
> 
> Implement cpu_type sysfs interfaces for Intel processors.
> 
> For example, in a system with four Intel Atom CPUs and one Intel Core CPU,
> these entries look as below. In this example, the native model IDs for
> both types of CPUs are 0:
> 
> user@host:~$: ls /sys/devices/system/cpu/types
> intel_atom_0 intel_core_0
> 
> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> cpulist cpumap
> 
> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> cpulist cpumap
> 
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom/cpumap
> 0f
> 
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom/cpulist
> 0-3
> 
> user@host:~$ cat /sys/devices/system/cpu/types/intel_core/cpumap
> 10
> 
> user@host:~$ cat /sys/devices/system/cpu/types/intel_core/cpulist
> 4

You used the same changelog text here as you did in patch 1/4, why?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  8:53   ` Greg Kroah-Hartman
@ 2020-10-03 11:05     ` Greg Kroah-Hartman
  2020-10-06  1:08       ` Ricardo Neri
  2020-10-06  0:57     ` Ricardo Neri
  1 sibling, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-03 11:05 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > +/**
> > + * arch_get_cpu_type_name() - Get the CPU type name
> > + * @cpu_type:	Type of CPU micro-architecture.
> > + *
> > + * Returns a string name associated with the CPU micro-architecture type as
> > + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> > + * NULL if the CPU type is not known.
> > + */
> > +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> > +{
> > +	return NULL;
> > +}
> 
> Why is vendor part of this?  Shouldn't it just be arch?
> 
> I say this as "vendor" is kind of "interesting" when it comes to other
> arches...
> 
> Speaking of other arches, we all know that other arches have this
> feature as well, have you worked with any other groups to verify that
> this interface will also work with them?

Here's one set of patches for ARM64 for much the same type of cpu
design:
	https://android-review.googlesource.com/c/kernel/common/+/1437098/3
Yes, it's not been posted to any kernel lists, but this is public so you
need to work with the ARM developers to come up with an interface that
works for everyone please.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  8:49 ` [PATCH 0/4] drivers core: Introduce " Borislav Petkov
@ 2020-10-06  0:27   ` Ricardo Neri
  0 siblings, 0 replies; 34+ messages in thread
From: Ricardo Neri @ 2020-10-06  0:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Greg Kroah-Hartman, x86, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel

On Sat, Oct 03, 2020 at 10:49:34AM +0200, Borislav Petkov wrote:
> On Fri, Oct 02, 2020 at 06:17:41PM -0700, Ricardo Neri wrote:
> > Patch 1 of the series proposes the generic interface, with hooks
> > that architectures can override to suit their needs. The remaining three
> > patches implement such an interface for x86 (as per request from Boris,
> > I pulled patch 2 from a separate submission [1]).
> 
> So I ask you to show me the whole thing, how this is supposed to be used
> in a *real* use case and you're sending me a couple of patches which
> report these heterogeneous or whatever they're gonna be called CPUs.
> 
> Are you telling me that all this development effort was done so that
> you can report heterogeneity in sysfs? Or you just had to come up with
> *something*?
> 
> Let me try again: please show me the *big* *picture* with all the code
> how this is supposed to be used. In the patches I read a bunch of "may"
> formulations of what is possible and what userspace could do and so on.
> 
> Not that - show me the *full* and *real* use cases which you are
> enabling and which justify all that churn. Instead of leaving it all to
> userspace CPUID and the kernel not caring one bit.
> 
> Does that make more sense?

Yes Boris, thanks for the clarification. The proposed sysfs interface is
one instance in which we use cpuinfo_x86.x86_cpu_type. I have other
changes that use this new member. I will post them.

> 
> > [1]. https://lkml.org/lkml/2020/10/2/1013
> 
> For supplying links, we use lore.kernel.org/r/<message-id> solely.
> Please use that from now on.

Sure Boris, I will use lore.kernel.org in the future.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  8:53   ` Greg Kroah-Hartman
  2020-10-03 11:05     ` Greg Kroah-Hartman
@ 2020-10-06  0:57     ` Ricardo Neri
  2020-10-06  7:37       ` Greg Kroah-Hartman
  1 sibling, 1 reply; 34+ messages in thread
From: Ricardo Neri @ 2020-10-06  0:57 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > same die. Thus, even though the instruction set is compatible among all
> > CPUs, there may still be differences in features (e.g., some CPUs may
> > have counters that other CPUs do not). There may be applications
> > interested in knowing the type of micro-architecture topology of the
> > system to make decisions about process affinity.
> > 
> > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > cpu_capacity) may be used to infer the types of micro-architecture of the
> > CPUs in the platform, it may not be entirely accurate. For instance, two
> > subsets of CPUs with different types of micro-architecture may have the
> > same capacity due to power or thermal constraints.
> > 
> > Create the new directory /sys/devices/system/cpu/types. Under such
> > directory, create individual subdirectories for each type of CPU micro-
> > architecture. Each subdirectory will have cpulist and cpumap files. This
> > makes it convenient for user space to read all the CPUs of the same type
> > at once without having to inspect each CPU individually.
> > 
> > Implement a generic interface using weak functions that architectures can
> > override to indicate a) support for CPU types, b) the CPU type number, and
> > c) a string to identify the CPU vendor and type.
> > 
> > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > would look like this (other architectures have the hooks to use whatever
> > directory naming convention below "types" that meets their needs):
> > 
> > user@host:~$: ls /sys/devices/system/cpu/types
> > intel_atom_0  intel_core_0
> > 
> > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > cpulist cpumap
> > 
> > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > cpulist cpumap
> > 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > 0f
> > 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > 0-3
> > 
> > user@ihost:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > 10
> > 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > 4

Thank you for the quick and detailed feedback, Greg!

> 
> The output of 'tree' sometimes makes it easier to see here, or:
> 	grep -R . *
> also works well.

Indeed, this would definitely make it more readable.

> 
> > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > created. Add a hook for this purpose.
> 
> Why should these not show up if the system is not "hybrid"?

My thinking was that on a non-hybrid system, it does not make sense to
create this interface, as all the CPUs will be of the same type.

> 
> > 
> > Cc: Andi Kleen <ak@linux.intel.com>
> > Cc: Dave Hansen <dave.hansen@intel.com>
> > Cc: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> > Cc: Kan Liang <kan.liang@linux.intel.com>
> > Cc: Len Brown <len.brown@intel.com>
> > Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
> > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Cc: linux-kernel@vger.kernel.org
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > Suggested-by: Len Brown <len.brown@intel.com> # Necessity of the interface
> > Suggested-by: Dave Hansen <dave.hansen@intel.com> # Details of the interface
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  .../ABI/testing/sysfs-devices-system-cpu      |  13 ++
> >  drivers/base/cpu.c                            | 214 ++++++++++++++++++
> >  include/linux/cpu.h                           |  12 +
> >  include/linux/cpuhotplug.h                    |   1 +
> >  4 files changed, 240 insertions(+)
> > 
> > diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > index b555df825447..bcb3d7195102 100644
> > --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> > +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> > @@ -614,3 +614,16 @@ Description:	SPURR ticks for cpuX when it was idle.
> >  
> >  		This sysfs interface exposes the number of SPURR ticks
> >  		for cpuX when it was idle.
> > +
> > +What:		/sys/devices/system/cpu/types
> > +		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpumap
> > +		/sys/devices/system/cpu/types/<vendor>_<cpu_type>/cpulist
> 
> Please split these up into different entries, because:

Sure, I can split these into two entries.

> 
> > +Date:		Oct 2020
> > +Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
> > +Description:	Types of CPUs in hybrid systems
> > +
> > +		If a platform has CPUs with more than one type of micro-
> > +		architecture, there is one directory for each type of CPU.
> > +		Inside each directory, the files cpumap and cpulist contain
> > +		a mask and a list representing CPUs of the same type of micro-
> > +		architecture. These files only contain CPUs currently online.
> 
> You don't describe what mask and list actualy represent.  I have no clue
> what they mean, so odds are, others will not either.

Sure, I will explain better what mask and list represent.

> 
> > diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> > index d2136ab9b14a..7e98b11b15a1 100644
> > --- a/drivers/base/cpu.c
> > +++ b/drivers/base/cpu.c
> > @@ -607,11 +607,225 @@ static void __init cpu_register_vulnerabilities(void)
> >  static inline void cpu_register_vulnerabilities(void) { }
> >  #endif
> >  
> > +static ssize_t cpulist_show(struct device *device,
> > +			    struct device_attribute *attr,
> > +			    char *buf)
> > +{
> > +	struct cpumask *mask = dev_get_drvdata(device);
> > +
> > +	if (!mask)
> > +		return -EINVAL;
> 
> How can mask be NULL?

It cannot. I will remove this check.

> 
> > +
> > +	return cpumap_print_to_pagebuf(true, buf, mask);
> > +}
> > +
> > +static ssize_t cpumap_show(struct device *device,
> > +			   struct device_attribute *attr,
> > +			   char *buf)
> > +{
> > +	struct cpumask *mask = dev_get_drvdata(device);
> > +
> > +	if (!mask)
> > +		return -EINVAL;
> 
> Again, how can that be true?

It cannot, I will remove this check.

> 
> > +
> > +	return cpumap_print_to_pagebuf(false, buf, mask);
> > +}
> > +
> > +static DEVICE_ATTR_RO(cpumap);
> > +static DEVICE_ATTR_RO(cpulist);
> > +
> > +static struct attribute *cpu_type_attrs[] = {
> > +	&dev_attr_cpumap.attr,
> > +	&dev_attr_cpulist.attr,
> > +	NULL
> > +};
> > +
> > +static const struct attribute_group cpu_type_attr_group = {
> > +	.attrs = cpu_type_attrs,
> > +};
> > +
> > +static const struct attribute_group *cpu_type_attr_groups[] = {
> > +	&cpu_type_attr_group,
> > +	NULL,
> > +};
> 
> ATTRIBUTE_GROUPS()?

I will use this macro.

> 
> > +
> > +/* Root device for the cpu_type sysfs entries */
> > +static struct device *cpu_type_device;
> > +/*
> 
> No extra blank line?

I will add a blank line here.
> 
> > + * An array of cpu_type devices. One per each type of supported CPU micro-
> > + * architecture.
> > + */
> > +static struct device *cpu_type_devices[CPUTYPES_MAX_NR];
> > +
> > +/**
> > + * arch_get_cpu_type() - Get the CPU type number
> > + * @cpu:	Index of the CPU of which the index is needed
> > + *
> > + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
> > + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> > + * type have the same type number. Type numbers are defined by each CPU
> > + * architecture.
> > + */
> > +u32 __weak arch_get_cpu_type(int cpu)
> 
> Can you just have this "cpu type" be an enumerated type so we have a
> clue as to what they really are?  Each arch can then define it, right?

Do you mean that each arch can have its own definition of the same
enumeration or that all archs share a common enumeration?
> 
> > +{
> > +	return 0;
> > +}
> > +
> > +/**
> > + * arch_has_cpu_type() - Check if CPU types are supported
> > + *
> > + * Returns true if the running platform supports CPU types. This is true if the
> > + * platform has CPUs of more than one type of micro-architecture. Otherwise,
> > + * returns false.
> > + */
> > +bool __weak arch_has_cpu_type(void)
> > +{
> > +	return false;
> > +}
> > +
> > +/**
> > + * arch_get_cpu_type_name() - Get the CPU type name
> > + * @cpu_type:	Type of CPU micro-architecture.
> > + *
> > + * Returns a string name associated with the CPU micro-architecture type as
> > + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> > + * NULL if the CPU type is not known.
> > + */
> > +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> > +{
> > +	return NULL;
> > +}
> 
> Why is vendor part of this?  Shouldn't it just be arch?
> 
> I say this as "vendor" is kind of "interesting" when it comes to other
> arches...

I thought of this as distinguishing, e.g., intel_atom vs a potential
amd_minizen. I guess arch is sufficient, as there will never be an
amd_atom.

> 
> Speaking of other arches, we all know that other arches have this
> feature as well, have you worked with any other groups to verify that
> this interface will also work with them?

I was not aware of the effort that you shared in a subsequent email. I
will contact the ARM engineers and discuss with them a potential
interface.

> 
> > +
> > +static int cpu_type_create_device(int cpu, int idx)
> > +{
> > +	u32 cpu_type = arch_get_cpu_type(cpu);
> > +	struct cpumask *mask;
> > +	const char *name;
> > +	char buf[64];
> > +
> > +	mask = kzalloc(cpumask_size(), GFP_KERNEL);
> > +	if (!mask)
> > +		return -ENOMEM;
> > +
> > +	name = arch_get_cpu_type_name(cpu_type);
> > +	if (!name) {
> > +		snprintf(buf, sizeof(buf), "unknown_%u", cpu_type);
> 
> If there is no name, why are you still making one up?
> 
> Shouldn't you just not create it at all?

I think all CPUs in the system must be enumerated. It may happen that
an old kernel runs on a new processor and does not know about a new
type in such a processor. In this case, it will simply say that there
is a CPU type it does not know about.
> 
> > +		name = buf;
> > +	}
> > +
> > +	cpu_type_devices[idx] = cpu_device_create(cpu_type_device, mask,
> > +						  cpu_type_attr_groups, name);
> > +	if (IS_ERR(cpu_type_devices[idx]))
> > +		goto free_cpumask;
> > +
> > +	cpumask_set_cpu(cpu, mask);
> > +	return 0;
> > +
> > +free_cpumask:
> > +	kfree(mask);
> > +	return -ENOMEM;
> > +}
> > +
> > +static int cpu_type_sysfs_online(unsigned int cpu)
> > +{
> > +	u32 cpu_type = arch_get_cpu_type(cpu);
> > +	struct cpumask *mask;
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
> > +		u32 this_cpu_type;
> > +
> > +		/*
> > +		 * The first devices in the array are used first. Thus, create
> > +		 * a new device as well as sysfs directory and for the type of
> > +		 * @cpu.
> > +		 */
> > +		if (!cpu_type_devices[i])
> > +			return cpu_type_create_device(cpu, i);
> > +
> > +		mask = dev_get_drvdata(cpu_type_devices[i]);
> > +		if (!mask && !cpumask_weight(mask)) {
> > +			/*
> > +			 * We should not be here. Be paranoid about
> > +			 * NULL pointers.
> > +			 */
> > +			dev_err(cpu_type_devices[i], "CPU mask is invalid");
> > +			return -EINVAL;
> > +		}
> > +
> > +		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
> > +		if (cpu_type == this_cpu_type) {
> > +			cpumask_set_cpu(cpu, mask);
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	return -ENODEV;
> > +}
> > +
> > +static int cpu_type_sysfs_offline(unsigned int cpu)
> > +{
> > +	u32 cpu_type = arch_get_cpu_type(cpu);
> > +	struct cpumask *mask;
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(cpu_type_devices); i++) {
> > +		u32 this_cpu_type;
> > +
> > +		if (!cpu_type_devices[i])
> > +			continue;
> > +
> > +		mask = dev_get_drvdata(cpu_type_devices[i]);
> > +		if (!mask && !cpumask_weight(mask)) {
> > +			/*
> > +			 * We should not be here. Be paranoid about
> > +			 * NULL pointers.
> > +			 */
> > +			dev_err(cpu_type_devices[i], "CPU mask is invalid");
> > +			continue;
> > +		}
> > +
> > +		this_cpu_type = arch_get_cpu_type(cpumask_first(mask));
> > +		if (cpu_type == this_cpu_type) {
> > +			cpumask_clear_cpu(cpu, mask);
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * If we are here, no matching cpu_type was found. This CPU was not
> > +	 * accounted for at hotplug online.
> > +	 */
> > +	pr_warn("Unexpected CPU offline!\n");
> > +
> > +	return -ENODEV;
> > +}
> > +
> > +static void __init cpu_type_sysfs_register(void)
> > +{
> > +	struct device *dev = cpu_subsys.dev_root;
> > +	int ret;
> > +
> > +	if (!arch_has_cpu_type())
> > +		return;
> > +
> > +	cpu_type_device = cpu_device_create(dev, NULL, NULL, "types");
> 
> This feels like an abuse of cpu_device_create(), doesn't it?

Would it be better if I used a plain struct device instead?

> 
> > +	if (IS_ERR(cpu_type_device))
> > +		return;
> > +
> > +	ret = cpuhp_setup_state(CPUHP_AP_BASE_CPUTYPE_ONLINE,
> > +				"base/cpu_type_sysfs:online",
> > +				cpu_type_sysfs_online, cpu_type_sysfs_offline);
> > +	if (ret)
> > +		dev_warn(dev, "failed to CPU type sysfs\n");
> > +}
> > +
> >  void __init cpu_dev_init(void)
> >  {
> >  	if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
> >  		panic("Failed to register CPU subsystem");
> >  
> >  	cpu_dev_register_generic();
> > +	cpu_type_sysfs_register();
> >  	cpu_register_vulnerabilities();
> >  }
> > diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> > index 8aa84c052fdf..76fba70c7801 100644
> > --- a/include/linux/cpu.h
> > +++ b/include/linux/cpu.h
> > @@ -228,4 +228,16 @@ static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0;
> >  extern bool cpu_mitigations_off(void);
> >  extern bool cpu_mitigations_auto_nosmt(void);
> >  
> > +#ifndef CPUTYPES_MAX_NR
> > +/*
> > + * The maximum number of types of cpus. Architecture-specific implementations
> > + * shall override this value.
> > + */
> > +#define CPUTYPES_MAX_NR 0
> > +#endif
> > +
> > +extern u32 arch_get_cpu_type(int cpu);
> > +extern bool arch_has_cpu_type(void);
> > +extern const char *arch_get_cpu_type_name(u32 cpu_type);
> > +
> >  #endif /* _LINUX_CPU_H_ */
> > diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> > index 6f524bbf71a2..fd90ccb6652c 100644
> > --- a/include/linux/cpuhotplug.h
> > +++ b/include/linux/cpuhotplug.h
> > @@ -186,6 +186,7 @@ enum cpuhp_state {
> >  	CPUHP_AP_WATCHDOG_ONLINE,
> >  	CPUHP_AP_WORKQUEUE_ONLINE,
> >  	CPUHP_AP_RCUTREE_ONLINE,
> > +	CPUHP_AP_BASE_CPUTYPE_ONLINE,
> >  	CPUHP_AP_BASE_CACHEINFO_ONLINE,
> 
> I'm all for messing with enumerated type values, but why put it in this
> part of the list and not right before the next one?

The reason I put CPUHP_AP_BASE_CPUTYPE_ONLINE before
CPUHP_AP_BASE_CACHEINFO_ONLINE is that both deal with enumerating and
exposing the topology of the system. I could also invert the order.

Thanks and BR,
Ricardo

* Re: [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface
  2020-10-03  8:55   ` Greg Kroah-Hartman
@ 2020-10-06  1:05     ` Ricardo Neri
  0 siblings, 0 replies; 34+ messages in thread
From: Ricardo Neri @ 2020-10-06  1:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Kan Liang,
	Srinivas Pandruvada

On Sat, Oct 03, 2020 at 10:55:06AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Oct 02, 2020 at 06:17:45PM -0700, Ricardo Neri wrote:
> > Recent Intel processors combine CPUs with different types of micro-
> > architecture in the same package. There may be applications interested in
> > knowing the type topology of the system. For instance, it can be used to
> > to determine which subsets of CPUs share a common feature.
> > 
> > Implement cpu_type sysfs interfaces for Intel processors.
> > 
> > For example, in a system with four Intel Atom CPUs and one Intel Core CPU,
> > these entries look as below. In this example, the native model IDs for
> > both types of CPUs are 0:
> > 
> > user@host:~$: ls /sys/devices/system/cpu/types
> > intel_atom_0 intel_core_0
> > 
> > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > cpulist cpumap
> > 
> > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > cpulist cpumap
> > 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom/cpumap
> > 0f
> > 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom/cpulist
> > 0-3
> > 
> > user@nost:~$ cat /sys/devices/system/cpu/types/intel_core/cpumap
> > 10
> > 
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_core/cpulist
> > 4
> 
> You used the same changelog text here as you did in patch 1/4, why?

In both changesets, if merged, somebody could conveniently do git show
on either commit and quickly see the resulting intent of the changeset.

Would it make it better if in patch 1/4 I put a hypothetical, generic
example?

Something like:

user@host:~$: ls /sys/devices/system/cpu/types
<arch>_<type_a> <arch>_<type_b>

Thanks and BR,
Ricardo

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03 11:05     ` Greg Kroah-Hartman
@ 2020-10-06  1:08       ` Ricardo Neri
  0 siblings, 0 replies; 34+ messages in thread
From: Ricardo Neri @ 2020-10-06  1:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Sat, Oct 03, 2020 at 01:05:48PM +0200, Greg Kroah-Hartman wrote:
> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > > +/**
> > > + * arch_get_cpu_type_name() - Get the CPU type name
> > > + * @cpu_type:	Type of CPU micro-architecture.
> > > + *
> > > + * Returns a string name associated with the CPU micro-architecture type as
> > > + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> > > + * NULL if the CPU type is not known.
> > > + */
> > > +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> > > +{
> > > +	return NULL;
> > > +}
> > 
> > Why is vendor part of this?  Shouldn't it just be arch?
> > 
> > I say this as "vendor" is kind of "interesting" when it comes to other
> > arches...
> > 
> > Speaking of other arches, we all know that other arches have this
> > feature as well, have you worked with any other groups to verify that
> > this interface will also work with them?
> 
> Here's one set of patches for ARM64 for much the same type of cpu
> design:
> 	https://android-review.googlesource.com/c/kernel/common/+/1437098/3
> Yes, it's not been posted to any kernel lists, but this is public so you
> need to work with the ARM developers to come up with an interface that
> works for everyone please.

Thanks for the pointer, Greg! I will study this proposal and work with
the ARM engineers.

BR,
Ricardo

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  3:27   ` Randy Dunlap
@ 2020-10-06  1:15     ` Ricardo Neri
  2020-10-10  3:14       ` Randy Dunlap
  0 siblings, 1 reply; 34+ messages in thread
From: Ricardo Neri @ 2020-10-06  1:15 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki, Tony Luck, Len Brown,
	Ravi V. Shankar, linux-kernel, Andi Kleen, Dave Hansen,
	Gautham R. Shenoy, Kan Liang, Srinivas Pandruvada

On Fri, Oct 02, 2020 at 08:27:41PM -0700, Randy Dunlap wrote:
> On 10/2/20 6:17 PM, Ricardo Neri wrote:
> > +/**
> > + * arch_get_cpu_type() - Get the CPU type number
> > + * @cpu:	Index of the CPU of which the index is needed
> > + *
> > + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that

Thank you for your feedback Randy!

> Are you sure that @cpu is non-zero?

This is the intent. Maybe it can be reworked to return -EINVAL instead?
I guess it is plausible, but less likely, that an arch defines a type
as 0xffffffea.

> 
> > + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> > + * type have the same type number. Type numbers are defined by each CPU
> > + * architecture.
> > + */
> > +u32 __weak arch_get_cpu_type(int cpu)
> > +{
> > +	return 0;
> > +}
> 
> arch_get_cpu_type() in patch 4/4 allows @cpu to be 0.

It should only return 0 if the system does not have
X86_FEATURE_HYBRID_CPU. The currently existing CPU types are all
non-zero as per the Intel SDM. Am I missing anything?

Thanks and BR,
Ricardo

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-06  0:57     ` Ricardo Neri
@ 2020-10-06  7:37       ` Greg Kroah-Hartman
  2020-10-07  3:14         ` Ricardo Neri
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-06  7:37 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > > same die. Thus, even though the instruction set is compatible among all
> > > CPUs, there may still be differences in features (e.g., some CPUs may
> > > have counters that others CPU do not). There may be applications
> > > interested in knowing the type of micro-architecture topology of the
> > > system to make decisions about process affinity.
> > > 
> > > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > > cpu_capacity) may be used to infer the types of micro-architecture of the
> > > CPUs in the platform, it may not be entirely accurate. For instance, two
> > > subsets of CPUs with different types of micro-architecture may have the
> > > same capacity due to power or thermal constraints.
> > > 
> > > Create the new directory /sys/devices/system/cpu/types. Under such
> > > directory, create individual subdirectories for each type of CPU micro-
> > > architecture. Each subdirectory will have cpulist and cpumap files. This
> > > makes it convenient for user space to read all the CPUs of the same type
> > > at once without having to inspect each CPU individually.
> > > 
> > > Implement a generic interface using weak functions that architectures can
> > > override to indicate a) support for CPU types, b) the CPU type number, and
> > > c) a string to identify the CPU vendor and type.
> > > 
> > > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > > would look like this (other architectures have the hooks to use whatever
> > > directory naming convention below "types" that meets their needs):
> > > 
> > > user@host:~$: ls /sys/devices/system/cpu/types
> > > intel_atom_0  intel_core_0
> > > 
> > > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > > cpulist cpumap
> > > 
> > > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > > cpulist cpumap
> > > 
> > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > > 0f
> > > 
> > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > > 0-3
> > > 
> > > user@ihost:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > > 10
> > > 
> > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > > 4
> 
> Thank you for the quick and detailed feedback, Greg!
> 
> > 
> > The output of 'tree' sometimes makes it easier to see here, or:
> > 	grep -R . *
> > also works well.
> 
> Indeed, this would definitely make it more readable.
> 
> > 
> > > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > > created. Add a hook for this purpose.
> > 
> > Why should these not show up if the system is not "hybrid"?
> 
> My thinking was that on a non-hybrid system, it does not make sense to
> create this interface, as all the CPUs will be of the same type.

Why not just make this an attribute in the existing cpuX directory?
Why does this have to be a totally separate directory where userspace
has to look in two different spots for the same cpu to determine what
it is?

That feels wasteful, it should be much simpler to use the existing
object, right?

That way, you also show the "type" of all cpus, no matter if they are
"hybrid" or not, again, making userspace deal with things in a much
simpler manner.

> > > +/**
> > > + * arch_get_cpu_type() - Get the CPU type number
> > > + * @cpu:	Index of the CPU of which the index is needed
> > > + *
> > > + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
> > > + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> > > + * type have the same type number. Type numbers are defined by each CPU
> > > + * architecture.
> > > + */
> > > +u32 __weak arch_get_cpu_type(int cpu)
> > 
> > Can you just have this "cpu type" be an enumerated type so we have a
> > clue as to what they really are?  Each arch can then define it, right?
> 
> Do you mean that each arch can have its own definition of the same
> enumeration or that all archs share a common enumeration?

Each arch should have its own definition of this enumerated type.

> > > +/**
> > > + * arch_get_cpu_type_name() - Get the CPU type name
> > > + * @cpu_type:	Type of CPU micro-architecture.
> > > + *
> > > + * Returns a string name associated with the CPU micro-architecture type as
> > > + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> > > + * NULL if the CPU type is not known.
> > > + */
> > > +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> > > +{
> > > +	return NULL;
> > > +}
> > 
> > Why is vendor part of this?  Shouldn't it just be arch?
> > 
> > I say this as "vendor" is kind of "interesting" when it comes to other
> > arches...
> 
> I thought of this as distinguishing, e.g., intel_atom vs a potential
> amd_minizen. I guess arch is sufficient, as there will never be an
> amd_atom.

Even if there is, that can be part of the cpu "type" that you have for
your enumerated type, right?

thanks,

greg k-h

* Re: [PATCH 0/4] drivers core: Introduce CPU type sysfs interface
  2020-10-03  1:17 [PATCH 0/4] drivers core: Introduce CPU type sysfs interface Ricardo Neri
                   ` (4 preceding siblings ...)
  2020-10-03  8:49 ` [PATCH 0/4] drivers core: Introduce " Borislav Petkov
@ 2020-10-06  8:51 ` Qais Yousef
  2020-10-07  2:50   ` Ricardo Neri
  5 siblings, 1 reply; 34+ messages in thread
From: Qais Yousef @ 2020-10-06  8:51 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki, Tony Luck, Len Brown,
	Ravi V. Shankar, linux-kernel, Dietmar Eggemann,
	Morten Rasmussen, Quentin Perret

Hi Ricardo

Adding some people who might be interested.

On 10/02/20 18:17, Ricardo Neri wrote:
> Hybrid CPU topologies combine processors with more than one type of
> micro-architecture. Hence, it is possible that individual CPUs support
> slightly different features (e.g., performance counters) and different
> performance properties. Thus, there may be user space entities interested
> in knowing the topology of the system based on the types of available
> CPUs.
> 
> Currently, there exists an interface for the CPU capacity (/sys/devices/
> system/cpu/cpuX/cpu_capacity). However, CPU capacity does not always map
> to CPU types (by the way, I will submit a separate series to bring such
> interface to x86).

Why do you need to do this mapping?

> 
> This series proposes the new interface /sys/devices/system/cpu/types
> which, in hybrid parts, creates a subdirectory for each type of CPU.
> Each subdirectory contains a CPU list and a CPU map that user space can
> query.

Why user space needs to query this info?

The rationale is missing the intention behind all of this. It seems you're
expecting software to parse this info and take decisions based on that?

Thanks

--
Qais Yousef

> 
> Patch 1 of the series proposes the generic interface, with hooks
> that architectures can override to suit their needs. The three
> patches implement such an interface for x86 (as per request from Boris,
> I pulled patch 2 from a separate submission [1]).
> 
> Thanks and BR,
> Ricardo
> 
> [1]. https://lkml.org/lkml/2020/10/2/1013
> 
> Ricardo Neri (4):
>   drivers core: Introduce CPU type sysfs interface
>   x86/cpu: Describe hybrid CPUs in cpuinfo_x86
>   x86/cpu/intel: Add function to get name of hybrid CPU types
>   x86/cpu/topology: Implement the CPU type sysfs interface
> 
>  .../ABI/testing/sysfs-devices-system-cpu      |  13 ++
>  arch/x86/include/asm/intel-family.h           |   4 +
>  arch/x86/include/asm/processor.h              |  13 ++
>  arch/x86/include/asm/topology.h               |   2 +
>  arch/x86/kernel/cpu/common.c                  |   3 +
>  arch/x86/kernel/cpu/cpu.h                     |   3 +
>  arch/x86/kernel/cpu/intel.c                   |  23 ++
>  arch/x86/kernel/cpu/topology.c                |  23 ++
>  drivers/base/cpu.c                            | 214 ++++++++++++++++++
>  include/linux/cpu.h                           |  12 +
>  include/linux/cpuhotplug.h                    |   1 +
>  11 files changed, 311 insertions(+)
> 
> -- 
> 2.17.1
> 

* Re: [PATCH 0/4] drivers core: Introduce CPU type sysfs interface
  2020-10-06  8:51 ` Qais Yousef
@ 2020-10-07  2:50   ` Ricardo Neri
  0 siblings, 0 replies; 34+ messages in thread
From: Ricardo Neri @ 2020-10-07  2:50 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki, Tony Luck, Len Brown,
	Ravi V. Shankar, linux-kernel, Dietmar Eggemann,
	Morten Rasmussen, Quentin Perret

On Tue, Oct 06, 2020 at 09:51:53AM +0100, Qais Yousef wrote:
> Hi Ricardo

Hi Qais,

Thanks for chiming in.

> 
> Adding some people who might be interested.
> 
> On 10/02/20 18:17, Ricardo Neri wrote:
> > Hybrid CPU topologies combine processors with more than one type of
> > micro-architecture. Hence, it is possible that individual CPUs support
> > slightly different features (e.g., performance counters) and different
> > performance properties. Thus, there may be user space entities interested
> > in knowing the topology of the system based on the types of available
> > CPUs.
> > 
> > Currently, there exists an interface for the CPU capacity (/sys/devices/
> > system/cpu/cpuX/cpu_capacity). However, CPU capacity does not always map
> > to CPU types (by the way, I will submit a separate series to bring such
> > interface to x86).
> 
> Why do you need to do this mapping?
> 
> > 
> > This series proposes the new interface /sys/devices/system/cpu/types
> > which, in hybrid parts, creates a subdirectory for each type of CPU.
> > Each subdirectory contains a CPU list and a CPU map that user space can
> > query.
> 
> Why user space needs to query this info?
> 
> The rationale is missing the intention behind all of this. It seems you're
> expecting software to parse this info and take decisions based on that?

I propose this interface to be consumed by applications or libraries
wanting to know the topology of the system, e.g., a utility program
that wants to depict CPUs by type. Another example is a policy manager
wanting to create a cgroup cpuset of CPUs of a given type for
performance reasons. Similarly, a program may want to affinitize itself
to a type of CPU if, for instance, certain performance events are only
counted on a given CPU type.

Also, an interface with cpumasks makes it convenient for user space, as
it does not have to traverse each CPU individually.

The alternatives I can think of are:

   a) add the type of each CPU in /proc/cpuinfo
   b) add the type of each CPU in /sys/devices/system/cpu/cpu#/type
   c) for performance, derive the CPU type from
      /sys/devices/system/cpu/cpu#/cpu_capacity
   d) add an interface similar to what I propose in this series.

None of this implies that certain instructions will run on one type of
CPU but not on another. Instead, better performance may be obtained on
one type of CPU vs another.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-06  7:37       ` Greg Kroah-Hartman
@ 2020-10-07  3:14         ` Ricardo Neri
  2020-10-07  5:15           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 34+ messages in thread
From: Ricardo Neri @ 2020-10-07  3:14 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> > On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > > > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > > > same die. Thus, even though the instruction set is compatible among all
> > > > CPUs, there may still be differences in features (e.g., some CPUs may
> > > > have counters that other CPUs do not). There may be applications
> > > > interested in knowing the type of micro-architecture topology of the
> > > > system to make decisions about process affinity.
> > > > 
> > > > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > > > cpu_capacity) may be used to infer the types of micro-architecture of the
> > > > CPUs in the platform, it may not be entirely accurate. For instance, two
> > > > subsets of CPUs with different types of micro-architecture may have the
> > > > same capacity due to power or thermal constraints.
> > > > 
> > > > Create the new directory /sys/devices/system/cpu/types. Under such
> > > > directory, create individual subdirectories for each type of CPU micro-
> > > > architecture. Each subdirectory will have cpulist and cpumap files. This
> > > > makes it convenient for user space to read all the CPUs of the same type
> > > > at once without having to inspect each CPU individually.
> > > > 
> > > > Implement a generic interface using weak functions that architectures can
> > > > override to indicate a) support for CPU types, b) the CPU type number, and
> > > > c) a string to identify the CPU vendor and type.
> > > > 
> > > > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > > > would look like this (other architectures have the hooks to use whatever
> > > > directory naming convention below "types" that meets their needs):
> > > > 
> > > > user@host:~$: ls /sys/devices/system/cpu/types
> > > > intel_atom_0  intel_core_0
> > > > 
> > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > > > cpulist cpumap
> > > > 
> > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > > > cpulist cpumap
> > > > 
> > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > > > 0f
> > > > 
> > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > > > 0-3
> > > > 
> > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > > > 10
> > > > 
> > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > > > 4
> > 
> > Thank you for the quick and detailed review, Greg!
> > 
> > > 
> > > The output of 'tree' sometimes makes it easier to see here, or:
> > > 	grep -R . *
> > > also works well.
> > 
> > Indeed, this would definitely make it more readable.
> > 
> > > 
> > > > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > > > created. Add a hook for this purpose.
> > > 
> > > Why should these not show up if the system is not "hybrid"?
> > 
> > My thinking was that on a non-hybrid system, it does not make sense to
> > create this interface, as all the CPUs will be of the same type.
> 
> Why not just have this an attribute type in the existing cpuX directory?
> Why do this have to be a totally separate directory and userspace has to
> figure out to look in two different spots for the same cpu to determine
> what it is?

But if the type is located under cpuX, userspace would need to traverse
all the CPUs and create its own cpu masks. Under the types directory it
would only need to look once for each type of CPU, IMHO.
> 
> That feels wasteful, it should be much simpler to use the existing
> object, right?
> 
> That way, you also show the "type" of all cpus, no matter if they are
> "hybrid" or not, again, making userspace deal with things in a much
> simpler manner.

Indeed, that would be simpler to implement, and perhaps a natural extension
of the existing interface. 

Lastly, legacy and non-hybrid parts will not have a type defined. Thus,
it may not make sense for them to expose a type in sysfs.

> > > > +/**
> > > > + * arch_get_cpu_type() - Get the CPU type number
> > > > + * @cpu:	Index of the CPU of which the index is needed
> > > > + *
> > > > + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
> > > > + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
> > > > + * type have the same type number. Type numbers are defined by each CPU
> > > > + * architecture.
> > > > + */
> > > > +u32 __weak arch_get_cpu_type(int cpu)
> > > 
> > > Can you just have this "cpu type" be an enumerated type so we have a
> > > clue as to what they really are?  Each arch can then define it, right?
> > 
> > Do you mean that each arch can have its own definition of the same
> > enumeration or that all archs share a common enumeration?
> 
> Each arch should have its own definition of this enumerated type.

Sure, understood.

> 
> > > > +/**
> > > > + * arch_get_cpu_type_name() - Get the CPU type name
> > > > + * @cpu_type:	Type of CPU micro-architecture.
> > > > + *
> > > > + * Returns a string name associated with the CPU micro-architecture type as
> > > > + * indicated in @cpu_type. The format shall be <vendor>_<cpu_type>. Returns
> > > > + * NULL if the CPU type is not known.
> > > > + */
> > > > +const char __weak *arch_get_cpu_type_name(u32 cpu_type)
> > > > +{
> > > > +	return NULL;
> > > > +}
> > > 
> > > Why is vendor part of this?  Shouldn't it just be arch?
> > > 
> > > I say this as "vendor" is kind of "interesting" when it comes to other
> > > arches...
> > 
> > I thought of this as maybe intel_atom vs a potential amd_minizen. I
> > guess arch is sufficient, as there will never be an amd_atom.
> 
> Even if there is, that can be part of the cpu "type" that you have for
> your enumerated type, right?

Indeed. The interface I propose lets archs define their own string
names.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-07  3:14         ` Ricardo Neri
@ 2020-10-07  5:15           ` Greg Kroah-Hartman
  2020-10-08  3:34             ` Ricardo Neri
  2020-11-12  6:19             ` Brice Goglin
  0 siblings, 2 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2020-10-07  5:15 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> > > On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > > > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > > > > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > > > > same die. Thus, even though the instruction set is compatible among all
> > > > > CPUs, there may still be differences in features (e.g., some CPUs may
> > > > > have counters that other CPUs do not). There may be applications
> > > > > interested in knowing the type of micro-architecture topology of the
> > > > > system to make decisions about process affinity.
> > > > > 
> > > > > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > > > > cpu_capacity) may be used to infer the types of micro-architecture of the
> > > > > CPUs in the platform, it may not be entirely accurate. For instance, two
> > > > > subsets of CPUs with different types of micro-architecture may have the
> > > > > same capacity due to power or thermal constraints.
> > > > > 
> > > > > Create the new directory /sys/devices/system/cpu/types. Under such
> > > > > directory, create individual subdirectories for each type of CPU micro-
> > > > > architecture. Each subdirectory will have cpulist and cpumap files. This
> > > > > makes it convenient for user space to read all the CPUs of the same type
> > > > > at once without having to inspect each CPU individually.
> > > > > 
> > > > > Implement a generic interface using weak functions that architectures can
> > > > > override to indicate a) support for CPU types, b) the CPU type number, and
> > > > > c) a string to identify the CPU vendor and type.
> > > > > 
> > > > > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > > > > would look like this (other architectures have the hooks to use whatever
> > > > > directory naming convention below "types" that meets their needs):
> > > > > 
> > > > > user@host:~$: ls /sys/devices/system/cpu/types
> > > > > intel_atom_0  intel_core_0
> > > > > 
> > > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > > > > cpulist cpumap
> > > > > 
> > > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > > > > cpulist cpumap
> > > > > 
> > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > > > > 0f
> > > > > 
> > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > > > > 0-3
> > > > > 
> > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > > > > 10
> > > > > 
> > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > > > > 4
> > > 
> > > Thank you for the quick and detailed review, Greg!
> > > 
> > > > 
> > > > The output of 'tree' sometimes makes it easier to see here, or:
> > > > 	grep -R . *
> > > > also works well.
> > > 
> > > Indeed, this would definitely make it more readable.
> > > 
> > > > 
> > > > > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > > > > created. Add a hook for this purpose.
> > > > 
> > > > Why should these not show up if the system is not "hybrid"?
> > > 
> > > My thinking was that on a non-hybrid system, it does not make sense to
> > > create this interface, as all the CPUs will be of the same type.
> > 
> > Why not just have this an attribute type in the existing cpuX directory?
> > Why do this have to be a totally separate directory and userspace has to
> > figure out to look in two different spots for the same cpu to determine
> > what it is?
> 
> But if the type is located under cpuX, userspace would need to traverse
> all the CPUs and create its own cpu masks. Under the types directory it
> would only need to look once for each type of CPU, IMHO.

What does a "mask" do?  What does userspace care about this?  You would
have to create it by traversing the directories you are creating anyway,
so it's not much different, right?

> > That feels wasteful, it should be much simpler to use the existing
> > object, right?
> > 
> > That way, you also show the "type" of all cpus, no matter if they are
> > "hybrid" or not, again, making userspace deal with things in a much
> > simpler manner.
> 
> Indeed, that would be simpler to implement, and perhaps a natural extension
> of the existing interface. 
> 
> Lastly, legacy and non-hybrid parts will not have a type defined. Thus,
> it may not make sense for them to expose a type in sysfs.

That's fine, not having a sysfs file if you don't know the type is fine.
Or you can fix that up and show the type of those as well, why wouldn't
you want to?

> > > I thought of this as maybe intel_atom vs a potential amd_minizen. I
> > > guess arch is sufficient, as there will never be an amd_atom.
> > 
> > Even if there is, that can be part of the cpu "type" that you have for
> > your enumerated type, right?
> 
> Indeed. The interface I propose lets archs define their own string
> names.

Arches _should_ define their own names, otherwise who would?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-07  5:15           ` Greg Kroah-Hartman
@ 2020-10-08  3:34             ` Ricardo Neri
  2020-11-12  6:19             ` Brice Goglin
  1 sibling, 0 replies; 34+ messages in thread
From: Ricardo Neri @ 2020-10-08  3:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Wed, Oct 07, 2020 at 07:15:46AM +0200, Greg Kroah-Hartman wrote:
> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> > On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> > > > On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> > > > > > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > > > > > same die. Thus, even though the instruction set is compatible among all
> > > > > > CPUs, there may still be differences in features (e.g., some CPUs may
> > > > > > have counters that other CPUs do not). There may be applications
> > > > > > interested in knowing the type of micro-architecture topology of the
> > > > > > system to make decisions about process affinity.
> > > > > > 
> > > > > > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > > > > > cpu_capacity) may be used to infer the types of micro-architecture of the
> > > > > > CPUs in the platform, it may not be entirely accurate. For instance, two
> > > > > > subsets of CPUs with different types of micro-architecture may have the
> > > > > > same capacity due to power or thermal constraints.
> > > > > > 
> > > > > > Create the new directory /sys/devices/system/cpu/types. Under such
> > > > > > directory, create individual subdirectories for each type of CPU micro-
> > > > > > architecture. Each subdirectory will have cpulist and cpumap files. This
> > > > > > makes it convenient for user space to read all the CPUs of the same type
> > > > > > at once without having to inspect each CPU individually.
> > > > > > 
> > > > > > Implement a generic interface using weak functions that architectures can
> > > > > > override to indicate a) support for CPU types, b) the CPU type number, and
> > > > > > c) a string to identify the CPU vendor and type.
> > > > > > 
> > > > > > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > > > > > would look like this (other architectures have the hooks to use whatever
> > > > > > directory naming convention below "types" that meets their needs):
> > > > > > 
> > > > > > user@host:~$: ls /sys/devices/system/cpu/types
> > > > > > intel_atom_0  intel_core_0
> > > > > > 
> > > > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > > > > > cpulist cpumap
> > > > > > 
> > > > > > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > > > > > cpulist cpumap
> > > > > > 
> > > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > > > > > 0f
> > > > > > 
> > > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > > > > > 0-3
> > > > > > 
> > > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > > > > > 10
> > > > > > 
> > > > > > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > > > > > 4
> > > > 
> > > > Thank you for the quick and detailed review, Greg!
> > > > 
> > > > > 
> > > > > The output of 'tree' sometimes makes it easier to see here, or:
> > > > > 	grep -R . *
> > > > > also works well.
> > > > 
> > > > Indeed, this would definitely make it more readable.
> > > > 
> > > > > 
> > > > > > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > > > > > created. Add a hook for this purpose.
> > > > > 
> > > > > Why should these not show up if the system is not "hybrid"?
> > > > 
> > > > My thinking was that on a non-hybrid system, it does not make sense to
> > > > create this interface, as all the CPUs will be of the same type.
> > > 
> > > Why not just have this an attribute type in the existing cpuX directory?
> > > Why do this have to be a totally separate directory and userspace has to
> > > figure out to look in two different spots for the same cpu to determine
> > > what it is?
> > 
> > But if the type is located under cpuX, userspace would need to traverse
> > all the CPUs and create its own cpu masks. Under the types directory it
> > would only need to look once for each type of CPU, IMHO.
> 
> What does a "mask" do?  What does userspace care about this? 

Userspace would only need to consume the cpu masks that sysfs provides,
but it would have to take the extra step of looking in another
directory.

> You would have to create it by traversing the directories you are creating anyway,
> so it's not much different, right?

True, that is a good point; userspace would still end up traversing
directories anyway.

> 
> > > That feels wasteful, it should be much simpler to use the existing
> > > object, right?
> > > 
> > > That way, you also show the "type" of all cpus, no matter if they are
> > > "hybrid" or not, again, making userspace deal with things in a much
> > > simpler manner.
> > 
> > Indeed, that would be simpler to implement, and perhaps a natural extension
> > of the existing interface. 
> > 
> > Lastly, legacy and non-hybrid parts will not have a type defined. Thus,
> > it may not make sense for them to expose a type in sysfs.
> 
> That's fine, not having a sysfs file if you don't know the type is fine.
> Or you can fix that up and show the type of those as well, why wouldn't
> you want to?

What information should be used for the fixup? On legacy x86 parts,
CPUID would not have the leaf that provides this information. Maybe an
"unknown" type?

> 
> > > > I thought of this as maybe intel_atom vs a potential amd_minizen. I
> > > > guess arch is sufficient, as there will never be an amd_atom.
> > > 
> > > Even if there is, that can be part of the cpu "type" that you have for
> > > your enumerated type, right?
> > 
> > Indeed. The interface I propose lets archs define their own string
> > names.
> 
> Arches _should_ define their own names, otherwise who would?

Yes, indeed.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-06  1:15     ` Ricardo Neri
@ 2020-10-10  3:14       ` Randy Dunlap
  0 siblings, 0 replies; 34+ messages in thread
From: Randy Dunlap @ 2020-10-10  3:14 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Greg Kroah-Hartman, x86, Borislav Petkov, Ingo Molnar,
	Thomas Gleixner, Rafael J. Wysocki, Tony Luck, Len Brown,
	Ravi V. Shankar, linux-kernel, Andi Kleen, Dave Hansen,
	Gautham R. Shenoy, Kan Liang, Srinivas Pandruvada

On 10/5/20 6:15 PM, Ricardo Neri wrote:
> On Fri, Oct 02, 2020 at 08:27:41PM -0700, Randy Dunlap wrote:
>> On 10/2/20 6:17 PM, Ricardo Neri wrote:
>>> +/**
>>> + * arch_get_cpu_type() - Get the CPU type number
>>> + * @cpu:	Index of the CPU of which the index is needed
>>> + *
>>> + * Get the CPU type number of @cpu, a non-zero unsigned 32-bit number that
> 
> Thank you for your feedback Randy!
> 
>> Are you sure that @cpu is non-zero?
> 
> This is the intent. Maybe it can be reworked to return -EINVAL instead?
> I guess it is plausible, but less likely, that an arch defines a type as
> 0xffffffea.

You lost me here.

>>
>>> + * uniquely identifies a type of CPU micro-architecture. All CPUs of the same
>>> + * type have the same type number. Type numbers are defined by each CPU
>>> + * architecture.
>>> + */
>>> +u32 __weak arch_get_cpu_type(int cpu)
>>> +{
>>> +	return 0;
>>> +}
>>
>> arch_get_cpu_type() in patch 4/4 allows @cpu to be 0.
> 
> It should not return 0 if the system does not have
> X86_FEATURE_HYBRID_CPU. The currently existing CPU types are all
> non-zero as per the Intel SDM. Am I missing anything?

@cpu is a cpu index, not a cpu type.

-- 
~Randy


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-10-07  5:15           ` Greg Kroah-Hartman
  2020-10-08  3:34             ` Ricardo Neri
@ 2020-11-12  6:19             ` Brice Goglin
  2020-11-12  6:42               ` Greg Kroah-Hartman
  1 sibling, 1 reply; 34+ messages in thread
From: Brice Goglin @ 2020-11-12  6:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Ricardo Neri
  Cc: x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
>> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
>>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
>>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
>>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
>>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
>>>>>> same die. Thus, even though the instruction set is compatible among all
>>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
>>>>>> have counters that other CPUs do not). There may be applications
>>>>>> interested in knowing the type of micro-architecture topology of the
>>>>>> system to make decisions about process affinity.
>>>>>>
>>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
>>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
>>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
>>>>>> subsets of CPUs with different types of micro-architecture may have the
>>>>>> same capacity due to power or thermal constraints.
>>>>>>
>>>>>> Create the new directory /sys/devices/system/cpu/types. Under such
>>>>>> directory, create individual subdirectories for each type of CPU micro-
>>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
>>>>>> makes it convenient for user space to read all the CPUs of the same type
>>>>>> at once without having to inspect each CPU individually.
>>>>>>
>>>>>> Implement a generic interface using weak functions that architectures can
>>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
>>>>>> c) a string to identify the CPU vendor and type.
>>>>>>
>>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
>>>>>> would look like this (other architectures have the hooks to use whatever
>>>>>> directory naming convention below "types" that meets their needs):
>>>>>>
>>>>>> user@host:~$: ls /sys/devices/system/cpu/types
>>>>>> intel_atom_0  intel_core_0
>>>>>>
>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
>>>>>> cpulist cpumap
>>>>>>
>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
>>>>>> cpulist cpumap
>>>>>>
>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
>>>>>> 0f
>>>>>>
>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
>>>>>> 0-3
>>>>>>
>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
>>>>>> 10
>>>>>>
>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
>>>>>> 4
>>>> Thank you for the quick and detailed review, Greg!
>>>>
>>>>> The output of 'tree' sometimes makes it easier to see here, or:
>>>>> 	grep -R . *
>>>>> also works well.
>>>> Indeed, this would definitely make it more readable.
>>>>
>>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
>>>>>> created. Add a hook for this purpose.
>>>>> Why should these not show up if the system is not "hybrid"?
>>>> My thinking was that on a non-hybrid system, it does not make sense to
>>>> create this interface, as all the CPUs will be of the same type.
>>> Why not just have this an attribute type in the existing cpuX directory?
>>> Why do this have to be a totally separate directory and userspace has to
>>> figure out to look in two different spots for the same cpu to determine
>>> what it is?
>> But if the type is located under cpuX, userspace would need to traverse
>> all the CPUs and create its own cpu masks. Under the types directory it
>> would only need to look once for each type of CPU, IMHO.
> What does a "mask" do?  What does userspace care about this?  You would
> have to create it by traversing the directories you are creating anyway,
> so it's not much different, right?


Hello

Sorry for the late reply. As the first userspace consumer of this
interface [1], I can confirm that reading a single file to get the mask
would be better, at least for performance reasons. On large platforms, we
already have to read thousands of sysfs files to get CPU topology and
cache information, I'd be happy not to read one more file per cpu.

Reading these sysfs files is slow, and it does not scale well when
multiple processes read them in parallel. There are ways to avoid this
repeated discovery by sharing hwloc info through XML or shmem, but it
will take years before the developers of the different runtimes all
implement this :)

Thanks

Brice

[1] hwloc and lstopo are going to expose this hybrid info to HPC
runtimes (https://www.open-mpi.org/projects/hwloc/)



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-12  6:19             ` Brice Goglin
@ 2020-11-12  6:42               ` Greg Kroah-Hartman
  2020-11-12  9:10                 ` Brice Goglin
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2020-11-12  6:42 UTC (permalink / raw)
  To: Brice Goglin
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
> > On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> >> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> >>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> >>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> >>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> >>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> >>>>>> same die. Thus, even though the instruction set is compatible among all
> >>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
> >>>>>> have counters that other CPUs do not). There may be applications
> >>>>>> interested in knowing the type of micro-architecture topology of the
> >>>>>> system to make decisions about process affinity.
> >>>>>>
> >>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> >>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
> >>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
> >>>>>> subsets of CPUs with different types of micro-architecture may have the
> >>>>>> same capacity due to power or thermal constraints.
> >>>>>>
> >>>>>> Create the new directory /sys/devices/system/cpu/types. Under such
> >>>>>> directory, create individual subdirectories for each type of CPU micro-
> >>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
> >>>>>> makes it convenient for user space to read all the CPUs of the same type
> >>>>>> at once without having to inspect each CPU individually.
> >>>>>>
> >>>>>> Implement a generic interface using weak functions that architectures can
> >>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
> >>>>>> c) a string to identify the CPU vendor and type.
> >>>>>>
> >>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> >>>>>> would look like this (other architectures have the hooks to use whatever
> >>>>>> directory naming convention below "types" that meets their needs):
> >>>>>>
> >>>>>> user@host:~$: ls /sys/devices/system/cpu/types
> >>>>>> intel_atom_0  intel_core_0
> >>>>>>
> >>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> >>>>>> cpulist cpumap
> >>>>>>
> >>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> >>>>>> cpulist cpumap
> >>>>>>
> >>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> >>>>>> 0f
> >>>>>>
> >>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> >>>>>> 0-3
> >>>>>>
> >>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> >>>>>> 10
> >>>>>>
> >>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> >>>>>> 4
> >>>> Thank you for the quick and detailed review, Greg!
> >>>>
> >>>>> The output of 'tree' sometimes makes it easier to see here, or:
> >>>>> 	grep -R . *
> >>>>> also works well.
> >>>> Indeed, this would definitely make it more readable.
> >>>>
> >>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> >>>>>> created. Add a hook for this purpose.
> >>>>> Why should these not show up if the system is not "hybrid"?
> >>>> My thinking was that on a non-hybrid system, it does not make sense to
> >>>> create this interface, as all the CPUs will be of the same type.
> >>> Why not just have this an attribute type in the existing cpuX directory?
> >>> Why do this have to be a totally separate directory and userspace has to
> >>> figure out to look in two different spots for the same cpu to determine
> >>> what it is?
> >> But if the type is located under cpuX, userspace would need to traverse
> >> all the CPUs and create its own cpu masks. Under the types directory it
> >> would only need to look once for each type of CPU, IMHO.
> > What does a "mask" do?  What does userspace care about this?  You would
> > have to create it by traversing the directories you are creating anyway,
> > so it's not much different, right?
> 
> 
> Hello
> 
> Sorry for the late reply. As the first userspace consumer of this
> interface [1], I can confirm that reading a single file to get the mask
> would be better, at least for performance reason. On large platforms, we
> already have to read thousands of sysfs files to get CPU topology and
> cache information, I'd be happy not to read one more file per cpu.
> 
> Reading these sysfs files is slow, and it does not scale well when
> multiple processes read them in parallel.

Really?  Where is the slowdown?  Would something like readfile() work
better for you for that?
	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/

How do multiple processes slow anything down? There shouldn't be any
shared locks here.

> There are ways to avoid this
> multiple discoveries by sharing hwloc info through XML or shmem, but it
> will take years before all developers of different runtimes all
> implement this :)

I don't understand, what exactly are you suggesting we do here instead?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-12  6:42               ` Greg Kroah-Hartman
@ 2020-11-12  9:10                 ` Brice Goglin
  2020-11-12 10:49                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 34+ messages in thread
From: Brice Goglin @ 2020-11-12  9:10 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada


Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
>>> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
>>>> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
>>>>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
>>>>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
>>>>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
>>>>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
>>>>>>>> same die. Thus, even though the instruction set is compatible among all
>>>>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
>>>>>>>> have counters that other CPUs do not). There may be applications
>>>>>>>> interested in knowing the type of micro-architecture topology of the
>>>>>>>> system to make decisions about process affinity.
>>>>>>>>
>>>>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
>>>>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
>>>>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
>>>>>>>> subsets of CPUs with different types of micro-architecture may have the
>>>>>>>> same capacity due to power or thermal constraints.
>>>>>>>>
>>>>>>>> Create the new directory /sys/devices/system/cpu/types. Under such
>>>>>>>> directory, create individual subdirectories for each type of CPU micro-
>>>>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
>>>>>>>> makes it convenient for user space to read all the CPUs of the same type
>>>>>>>> at once without having to inspect each CPU individually.
>>>>>>>>
>>>>>>>> Implement a generic interface using weak functions that architectures can
>>>>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
>>>>>>>> c) a string to identify the CPU vendor and type.
>>>>>>>>
>>>>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
>>>>>>>> would look like this (other architectures have the hooks to use whatever
>>>>>>>> directory naming convention below "types" that meets their needs):
>>>>>>>>
>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types
>>>>>>>> intel_atom_0  intel_core_0
>>>>>>>>
>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
>>>>>>>> cpulist cpumap
>>>>>>>>
>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
>>>>>>>> cpulist cpumap
>>>>>>>>
>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
>>>>>>>> 0f
>>>>>>>>
>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
>>>>>>>> 0-3
>>>>>>>>
>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
>>>>>>>> 10
>>>>>>>>
>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
>>>>>>>> 4
>>>>>> Thank you for the quick and detailed reply, Greg!
>>>>>>
>>>>>>> The output of 'tree' sometimes makes it easier to see here, or:
>>>>>>> 	grep -R . *
>>>>>>> also works well.
>>>>>> Indeed, this would definitely make it more readable.
>>>>>>
>>>>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
>>>>>>>> created. Add a hook for this purpose.
>>>>>>> Why should these not show up if the system is not "hybrid"?
>>>>>> My thinking was that on a non-hybrid system, it does not make sense to
>>>>>> create this interface, as all the CPUs will be of the same type.
>>>>> Why not just have this be an attribute type in the existing cpuX directory?
>>>>> Why does this have to be a totally separate directory and userspace has to
>>>>> figure out to look in two different spots for the same cpu to determine
>>>>> what it is?
>>>> But if the type is located under cpuX, userspace would need to traverse
>>>> all the CPUs and create its own cpu masks. Under the types directory it
>>>> would only need to look once for each type of CPU, IMHO.
>>> What does a "mask" do?  What does userspace care about this?  You would
>>> have to create it by traversing the directories you are creating anyway,
>>> so it's not much different, right?
>>
>> Hello
>>
>> Sorry for the late reply. As the first userspace consumer of this
>> interface [1], I can confirm that reading a single file to get the mask
>> would be better, at least for performance reasons. On large platforms, we
>> already have to read thousands of sysfs files to get CPU topology and
>> cache information, I'd be happy not to read one more file per cpu.
>>
>> Reading these sysfs files is slow, and it does not scale well when
>> multiple processes read them in parallel.
> Really?  Where is the slowdown?  Would something like readfile() work
> better for you for that?
> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/


I guess readfile would improve the sequential case by avoiding syscalls
but it would not improve the parallel case since syscalls shouldn't have
any parallel issue?

We've been watching the status of readfile() since it was posted on LKML
6 months ago, but we were actually wondering if it would end up being
included at some point.


> How does multiple processes slow anything down, there shouldn't be any
> shared locks here.


When I benchmarked this in 2016, reading a single (small) sysfs file was
41x slower when running 64 processes simultaneously on a 64-core Knights
Landing than reading from a single process. On an SGI Altix UV with 12x
8-core CPUs, reading from one process per CPU (12 total) was 60x slower
(which could mean NUMA affinity matters), and reading from one process
per core (96 total) was 491x slower.

I will try to find some time to dig further on recent kernels with perf
and readfile (both machines were running RHEL7).


>> There are ways to avoid this
>> multiple discoveries by sharing hwloc info through XML or shmem, but it
>> will take years before all developers of different runtimes all
>> implement this :)
> I don't understand, what exactly are you suggesting we do here instead?


I was just saying userspace has ways to mitigate the issue but it will
take time because many different projects are involved.

Brice




* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-12  9:10                 ` Brice Goglin
@ 2020-11-12 10:49                   ` Greg Kroah-Hartman
  2020-11-17 15:55                     ` Brice Goglin
       [not found]                     ` <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>
  0 siblings, 2 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2020-11-12 10:49 UTC (permalink / raw)
  To: Brice Goglin
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
> 
> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
> > On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
> >> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
> >>> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> >>>> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> >>>>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> >>>>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> >>>>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> >>>>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> >>>>>>>> same die. Thus, even though the instruction set is compatible among all
> >>>>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
> >>>>>>>> have counters that other CPUs do not). There may be applications
> >>>>>>>> interested in knowing the type of micro-architecture topology of the
> >>>>>>>> system to make decisions about process affinity.
> >>>>>>>>
> >>>>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> >>>>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
> >>>>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
> >>>>>>>> subsets of CPUs with different types of micro-architecture may have the
> >>>>>>>> same capacity due to power or thermal constraints.
> >>>>>>>>
> >>>>>>>> Create the new directory /sys/devices/system/cpu/types. Under such
> >>>>>>>> directory, create individual subdirectories for each type of CPU micro-
> >>>>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
> >>>>>>>> makes it convenient for user space to read all the CPUs of the same type
> >>>>>>>> at once without having to inspect each CPU individually.
> >>>>>>>>
> >>>>>>>> Implement a generic interface using weak functions that architectures can
> >>>>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
> >>>>>>>> c) a string to identify the CPU vendor and type.
> >>>>>>>>
> >>>>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> >>>>>>>> would look like this (other architectures have the hooks to use whatever
> >>>>>>>> directory naming convention below "types" that meets their needs):
> >>>>>>>>
> >>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types
> >>>>>>>> intel_atom_0  intel_core_0
> >>>>>>>>
> >>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> >>>>>>>> cpulist cpumap
> >>>>>>>>
> >>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> >>>>>>>> cpulist cpumap
> >>>>>>>>
> >>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> >>>>>>>> 0f
> >>>>>>>>
> >>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> >>>>>>>> 0-3
> >>>>>>>>
> >>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> >>>>>>>> 10
> >>>>>>>>
> >>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> >>>>>>>> 4
> >>>>>> Thank you for the quick and detailed reply, Greg!
> >>>>>>
> >>>>>>> The output of 'tree' sometimes makes it easier to see here, or:
> >>>>>>> 	grep -R . *
> >>>>>>> also works well.
> >>>>>> Indeed, this would definitely make it more readable.
> >>>>>>
> >>>>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> >>>>>>>> created. Add a hook for this purpose.
> >>>>>>> Why should these not show up if the system is not "hybrid"?
> >>>>>> My thinking was that on a non-hybrid system, it does not make sense to
> >>>>>> create this interface, as all the CPUs will be of the same type.
> >>>>> Why not just have this be an attribute type in the existing cpuX directory?
> >>>>> Why does this have to be a totally separate directory and userspace has to
> >>>>> figure out to look in two different spots for the same cpu to determine
> >>>>> what it is?
> >>>> But if the type is located under cpuX, userspace would need to traverse
> >>>> all the CPUs and create its own cpu masks. Under the types directory it
> >>>> would only need to look once for each type of CPU, IMHO.
> >>> What does a "mask" do?  What does userspace care about this?  You would
> >>> have to create it by traversing the directories you are creating anyway,
> >>> so it's not much different, right?
> >>
> >> Hello
> >>
> >> Sorry for the late reply. As the first userspace consumer of this
> >> interface [1], I can confirm that reading a single file to get the mask
> >> would be better, at least for performance reasons. On large platforms, we
> >> already have to read thousands of sysfs files to get CPU topology and
> >> cache information, I'd be happy not to read one more file per cpu.
> >>
> >> Reading these sysfs files is slow, and it does not scale well when
> >> multiple processes read them in parallel.
> > Really?  Where is the slowdown?  Would something like readfile() work
> > better for you for that?
> > 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
> 
> 
> I guess readfile would improve the sequential case by avoiding syscalls
> but it would not improve the parallel case since syscalls shouldn't have
> any parallel issue?

syscalls should not have parallel issues at all.

> We've been watching the status of readfile() since it was posted on LKML
> 6 months ago, but we were actually wondering if it would end up being
> included at some point.

It needs a solid reason to be merged.  My "test" benchmarks are fun to
run, but I have yet to find a real need for it anywhere as the
open/read/close syscall overhead seems to be lost in the noise on any
real application workload that I can find.

If you have a real need, and it reduces overhead and cpu usage, I'm more
than willing to update the patchset and resubmit it.

> > How does multiple processes slow anything down, there shouldn't be any
> > shared locks here.
> 
> 
> When I benchmarked this in 2016, reading a single (small) sysfs file was
> 41x slower when running 64 processes simultaneously on a 64-core Knights
> Landing than reading from a single process. On an SGI Altix UV with 12x
> 8-core CPUs, reading from one process per CPU (12 total) was 60x slower
> (which could mean NUMA affinity matters), and reading from one process
> per core (96 total) was 491x slower.
> 
> I will try to find some time to dig further on recent kernels with perf
> and readfile (both machines were running RHEL7).

2016 was a long time ago in kernel-land, please retest on a kernel.org
release, not a RHEL monstrosity.

> >> There are ways to avoid this
> >> multiple discoveries by sharing hwloc info through XML or shmem, but it
> >> will take years before all developers of different runtimes all
> >> implement this :)
> > I don't understand, what exactly are you suggesting we do here instead?
> 
> 
> I was just saying userspace has ways to mitigate the issue but it will
> take time because many different projects are involved.

I still don't understand, what issue are you referring to?

thanks,

greg k-h


* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
       [not found]                     ` <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>
@ 2020-11-12 11:34                       ` Greg Kroah-Hartman
  2020-11-19  8:25                       ` Fox Chen
  1 sibling, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2020-11-12 11:34 UTC (permalink / raw)
  To: Brice Goglin
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Thu, Nov 12, 2020 at 12:21:43PM +0100, Brice Goglin wrote:
> 
> Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
> > On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
> >> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
> >>> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
> >>>> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
> >>>>> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> >>>>>> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> >>>>>>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> >>>>>>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> >>>>>>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> >>>>>>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> >>>>>>>>>> same die. Thus, even though the instruction set is compatible among all
> >>>>>>>>>> CPUs, there may still be differences in features (e.g., some CPUs may
> >>>>>>>>>> have counters that other CPUs do not). There may be applications
> >>>>>>>>>> interested in knowing the type of micro-architecture topology of the
> >>>>>>>>>> system to make decisions about process affinity.
> >>>>>>>>>>
> >>>>>>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> >>>>>>>>>> cpu_capacity) may be used to infer the types of micro-architecture of the
> >>>>>>>>>> CPUs in the platform, it may not be entirely accurate. For instance, two
> >>>>>>>>>> subsets of CPUs with different types of micro-architecture may have the
> >>>>>>>>>> same capacity due to power or thermal constraints.
> >>>>>>>>>>
> >>>>>>>>>> Create the new directory /sys/devices/system/cpu/types. Under such
> >>>>>>>>>> directory, create individual subdirectories for each type of CPU micro-
> >>>>>>>>>> architecture. Each subdirectory will have cpulist and cpumap files. This
> >>>>>>>>>> makes it convenient for user space to read all the CPUs of the same type
> >>>>>>>>>> at once without having to inspect each CPU individually.
> >>>>>>>>>>
> >>>>>>>>>> Implement a generic interface using weak functions that architectures can
> >>>>>>>>>> override to indicate a) support for CPU types, b) the CPU type number, and
> >>>>>>>>>> c) a string to identify the CPU vendor and type.
> >>>>>>>>>>
> >>>>>>>>>> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> >>>>>>>>>> would look like this (other architectures have the hooks to use whatever
> >>>>>>>>>> directory naming convention below "types" that meets their needs):
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types
> >>>>>>>>>> intel_atom_0  intel_core_0
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> >>>>>>>>>> cpulist cpumap
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> >>>>>>>>>> cpulist cpumap
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> >>>>>>>>>> 0f
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> >>>>>>>>>> 0-3
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> >>>>>>>>>> 10
> >>>>>>>>>>
> >>>>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> >>>>>>>>>> 4
> >>>>>>>> Thank you for the quick and detailed reply, Greg!
> >>>>>>>>
> >>>>>>>>> The output of 'tree' sometimes makes it easier to see here, or:
> >>>>>>>>> 	grep -R . *
> >>>>>>>>> also works well.
> >>>>>>>> Indeed, this would definitely make it more readable.
> >>>>>>>>
> >>>>>>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> >>>>>>>>>> created. Add a hook for this purpose.
> >>>>>>>>> Why should these not show up if the system is not "hybrid"?
> >>>>>>>> My thinking was that on a non-hybrid system, it does not make sense to
> >>>>>>>> create this interface, as all the CPUs will be of the same type.
> >>>>>>> Why not just have this be an attribute type in the existing cpuX directory?
> >>>>>>> Why does this have to be a totally separate directory and userspace has to
> >>>>>>> figure out to look in two different spots for the same cpu to determine
> >>>>>>> what it is?
> >>>>>> But if the type is located under cpuX, userspace would need to traverse
> >>>>>> all the CPUs and create its own cpu masks. Under the types directory it
> >>>>>> would only need to look once for each type of CPU, IMHO.
> >>>>> What does a "mask" do?  What does userspace care about this?  You would
> >>>>> have to create it by traversing the directories you are creating anyway,
> >>>>> so it's not much different, right?
> >>>> Hello
> >>>>
> >>>> Sorry for the late reply. As the first userspace consumer of this
> >>>> interface [1], I can confirm that reading a single file to get the mask
> >>>> would be better, at least for performance reasons. On large platforms, we
> >>>> already have to read thousands of sysfs files to get CPU topology and
> >>>> cache information, I'd be happy not to read one more file per cpu.
> >>>>
> >>>> Reading these sysfs files is slow, and it does not scale well when
> >>>> multiple processes read them in parallel.
> >>> Really?  Where is the slowdown?  Would something like readfile() work
> >>> better for you for that?
> >>> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
> >>
> >> I guess readfile would improve the sequential case by avoiding syscalls
> >> but it would not improve the parallel case since syscalls shouldn't have
> >> any parallel issue?
> > syscalls should not have parallel issues at all.
> >
> >> We've been watching the status of readfile() since it was posted on LKML
> >> 6 months ago, but we were actually wondering if it would end up being
> >> included at some point.
> > It needs a solid reason to be merged.  My "test" benchmarks are fun to
> > run, but I have yet to find a real need for it anywhere as the
> > open/read/close syscall overhead seems to be lost in the noise on any
> > real application workload that I can find.
> >
> > If you have a real need, and it reduces overhead and cpu usage, I'm more
> > than willing to update the patchset and resubmit it.
> 
> 
> Good, I'll give it at try.
> 
> 
> >>> How does multiple processes slow anything down, there shouldn't be any
> >>> shared locks here.
> >>
> >> When I benchmarked this in 2016, reading a single (small) sysfs file was
> >> 41x slower when running 64 processes simultaneously on a 64-core Knights
> >> Landing than reading from a single process. On an SGI Altix UV with 12x
> >> 8-core CPUs, reading from one process per CPU (12 total) was 60x slower
> >> (which could mean NUMA affinity matters), and reading from one process
> >> per core (96 total) was 491x slower.
> >>
> >> I will try to find some time to dig further on recent kernels with perf
> >> and readfile (both machines were running RHEL7).
> > 2016 was a long time ago in kernel-land, please retest on a kernel.org
> > release, not a RHEL monstrosity.
> 
> 
> Quick test on 5.8.14 from Debian (fairly close to mainline) on a server
> with 2x20 cores.
> 
> I am measuring the time to do open+read+close of
> /sys/devices/system/cpu/cpu15/topology/die_id 1000 times
> 
> With a single process, it takes 2ms (2us per open+read+close, looks OK).
> 
> With one process per core (with careful binding, etc), it jumps from 2ms
> to 190ms (without much variation).
> 
> It looks like locks in kernfs_iop_permission and kernfs_dop_revalidate
> are causing the issue.
> 
> I am attaching the perf report callgraph output below.

Ouch, yes, we are hitting the single kernfs mutex for all of this, not
nice.  I'll add this to my list of things to look at in the near future,
thanks for the report!

greg k-h


* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-12 10:49                   ` Greg Kroah-Hartman
@ 2020-11-17 15:55                     ` Brice Goglin
  2020-11-18 10:45                       ` Brice Goglin
       [not found]                     ` <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>
  1 sibling, 1 reply; 34+ messages in thread
From: Brice Goglin @ 2020-11-17 15:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada


Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
>> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
>>> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>>>>
>>>> Hello
>>>>
>>>> Sorry for the late reply. As the first userspace consumer of this
>>>> interface [1], I can confirm that reading a single file to get the mask
>>>> would be better, at least for performance reasons. On large platforms, we
>>>> already have to read thousands of sysfs files to get CPU topology and
>>>> cache information, I'd be happy not to read one more file per cpu.
>>>>
>>>> Reading these sysfs files is slow, and it does not scale well when
>>>> multiple processes read them in parallel.
>>> Really?  Where is the slowdown?  Would something like readfile() work
>>> better for you for that?
>>> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
>>
>> I guess readfile would improve the sequential case by avoiding syscalls
>> but it would not improve the parallel case since syscalls shouldn't have
>> any parallel issue?
> syscalls should not have parallel issues at all.
>
>> We've been watching the status of readfile() since it was posted on LKML
>> 6 months ago, but we were actually wondering if it would end up being
>> included at some point.
> It needs a solid reason to be merged.  My "test" benchmarks are fun to
> run, but I have yet to find a real need for it anywhere as the
> open/read/close syscall overhead seems to be lost in the noise on any
> real application workload that I can find.
>
> If you have a real need, and it reduces overhead and cpu usage, I'm more
> than willing to update the patchset and resubmit it.
>
>

Hello

I updated hwloc to use readfile instead of open+read+close on all those
small sysfs/procfs files. Unfortunately the improvement is very small,
only a couple percents. On a 40 core server, our library starts in 38ms
instead of 39ms. I can't deploy your patches on larger machines, but I
tested our code on a copy of their sysfs files saved on a local disk :
For a 256-thread KNL, we go from 15ms to 14ms. For a 896-core SGI
machine, from 73ms to 71ms.

I see 200ns improvement for readfile (2300) vs open+read+close (2500) on
my server when reading a single cpu topology file. With several
thousands of sysfs files to read in the above large hwloc tests, it
confirms an overall improvement in the order of 1ms.

So, just like you said, the overhead seems to be pretty much lost in the
noise of hwloc doing its own stuff after reading hundreds of sysfs files :/

Brice




* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-17 15:55                     ` Brice Goglin
@ 2020-11-18 10:45                       ` Brice Goglin
  2020-11-18 10:57                         ` Greg Kroah-Hartman
  0 siblings, 1 reply; 34+ messages in thread
From: Brice Goglin @ 2020-11-18 10:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada


Le 17/11/2020 à 16:55, Brice Goglin a écrit :
> Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
>> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
>>> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
>>>> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>>>>> Hello
>>>>>
>>>>> Sorry for the late reply. As the first userspace consumer of this
>>>>> interface [1], I can confirm that reading a single file to get the mask
>>>>> would be better, at least for performance reasons. On large platforms, we
>>>>> already have to read thousands of sysfs files to get CPU topology and
>>>>> cache information, I'd be happy not to read one more file per cpu.
>>>>>
>>>>> Reading these sysfs files is slow, and it does not scale well when
>>>>> multiple processes read them in parallel.
>>>> Really?  Where is the slowdown?  Would something like readfile() work
>>>> better for you for that?
>>>> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
>>> I guess readfile would improve the sequential case by avoiding syscalls
>>> but it would not improve the parallel case since syscalls shouldn't have
>>> any parallel issue?
>> syscalls should not have parallel issues at all.
>>
>>> We've been watching the status of readfile() since it was posted on LKML
>>> 6 months ago, but we were actually wondering if it would end up being
>>> included at some point.
>> It needs a solid reason to be merged.  My "test" benchmarks are fun to
>> run, but I have yet to find a real need for it anywhere as the
>> open/read/close syscall overhead seems to be lost in the noise on any
>> real application workload that I can find.
>>
>> If you have a real need, and it reduces overhead and cpu usage, I'm more
>> than willing to update the patchset and resubmit it.
>>
>>
> Hello
>
> I updated hwloc to use readfile instead of open+read+close on all those
> small sysfs/procfs files. Unfortunately the improvement is very small,
> only a couple of percent. On a 40-core server, our library starts in 38ms
> instead of 39ms. I can't deploy your patches on larger machines, but I
> tested our code on a copy of their sysfs files saved on a local disk:
> For a 256-thread KNL, we go from 15ms to 14ms. For a 896-core SGI
> machine, from 73ms to 71ms.


Sorry, I forgot to update some codepaths to properly use readfile yesterday :/
Here are updated and more precise numbers that show a non-negligible improvement.
Again, we're measuring the entire hwloc topology discovery, which includes reading
many sysfs file (improved thanks to readfile) and then building a hierarchy of
objects describing the machine (not modified).

Server sysfs files (dual-socket x 20 cores x SMT-2)
default  43.48ms +/-4.48
readfile 42.15ms +/-4.58 => 3.1% better
1971 readfile calls => 674ns improvement per call

Knights Landing sysfs stored on local hard drive (64 cores x SMT-4)
default  14.60ms +/-0.91
readfile 13.63ms +/-1.05 => 6.6% better
2940 readfile calls => 329ns improvement per call

SGI Altix UV sysfs stored on local hard drive (56 sockets x 8 cores x SMT-2)
default  69.12ms +/-1.40
readfile 66.03ms +/-1.35 => 4.5% better
14525 readfile calls => 212ns improvement per call

I don't know why the first case (real sysfs files) gets a much
higher standard deviation and higher improvement per readfile call.
The other two cases match what microbenchmarks say
(about 200ns improvement per readfile call).

Brice




>
> I see 200ns improvement for readfile (2300) vs open+read+close (2500) on
> my server when reading a single cpu topology file. With several
> thousands of sysfs files to read in the above large hwloc tests, it
> confirms an overall improvement in the order of 1ms.
>
> So, just like you said, the overhead seems to be pretty much lost in the
> noise of hwloc doing its own stuff after reading hundreds of sysfs files :/
>
> Brice
>
>


* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
  2020-11-18 10:45                       ` Brice Goglin
@ 2020-11-18 10:57                         ` Greg Kroah-Hartman
  0 siblings, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2020-11-18 10:57 UTC (permalink / raw)
  To: Brice Goglin
  Cc: Ricardo Neri, x86, Borislav Petkov, Ingo Molnar, Thomas Gleixner,
	Rafael J. Wysocki, Tony Luck, Len Brown, Ravi V. Shankar,
	linux-kernel, Andi Kleen, Dave Hansen, Gautham R. Shenoy,
	Kan Liang, Srinivas Pandruvada

On Wed, Nov 18, 2020 at 11:45:46AM +0100, Brice Goglin wrote:
> 
> Le 17/11/2020 à 16:55, Brice Goglin a écrit :
> > Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
> >> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
> >>> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
> >>>> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
> >>>>> Hello
> >>>>>
> >>>>> Sorry for the late reply. As the first userspace consumer of this
> >>>>> interface [1], I can confirm that reading a single file to get the mask
> >>>>> would be better, at least for performance reason. On large platforms, we
> >>>>> already have to read thousands of sysfs files to get CPU topology and
> >>>>> cache information, I'd be happy not to read one more file per cpu.
> >>>>>
> >>>>> Reading these sysfs files is slow, and it does not scale well when
> >>>>> multiple processes read them in parallel.
> >>>> Really?  Where is the slowdown?  Would something like readfile() work
> >>>> better for you for that?
> >>>> 	https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
> >>> I guess readfile would improve the sequential case by avoiding syscalls
> >>> but it would not improve the parallel case since syscalls shouldn't have
> >>> any parallel issue?
> >> syscalls should not have parallel issues at all.
> >>
> >>> We've been watching the status of readfile() since it was posted on LKML
> >>> 6 months ago, but we were actually wondering if it would end up being
> >>> included at some point.
> >> It needs a solid reason to be merged.  My "test" benchmarks are fun to
> >> run, but I have yet to find a real need for it anywhere as the
> >> open/read/close syscall overhead seems to be lost in the noise on any
> >> real application workload that I can find.
> >>
> >> If you have a real need, and it reduces overhead and cpu usage, I'm more
> >> than willing to update the patchset and resubmit it.
> >>
> >>
> > Hello
> >
> > I updated hwloc to use readfile instead of open+read+close on all those
> > small sysfs/procfs files. Unfortunately the improvement is very small,
> > only a couple of percent. On a 40-core server, our library starts in 38ms
> > instead of 39ms. I can't deploy your patches on larger machines, but I
> > tested our code on a copy of their sysfs files saved on a local disk:
> > For a 256-thread KNL, we go from 15ms to 14ms. For a 896-core SGI
> > machine, from 73ms to 71ms.
> 
> 
> Sorry, I forgot to update some codepaths to properly use readfile yesterday :/
> Here are updated and more precise numbers that show a non-negligible improvement.
> Again, we're measuring the entire hwloc topology discovery, which includes reading
> many sysfs files (improved thanks to readfile) and then building a hierarchy of
> objects describing the machine (not modified).
> 
> Server sysfs files (dual-socket x 20 cores x SMT-2)
> default  43.48ms +/-4.48
> readfile 42.15ms +/-4.58 => 3.1% better
> 1971 readfile calls => 674ns improvement per call
> 
> Knights Landing sysfs stored on local hard drive (64 cores x SMT-4)
> default  14.60ms +/-0.91
> readfile 13.63ms +/-1.05 => 6.6% better
> 2940 readfile calls => 329ns improvement per call
> 
> SGI Altix UV sysfs stored on local hard drive (56 sockets x 8 cores x SMT-2)
> default  69.12ms +/-1.40
> readfile 66.03ms +/-1.35 => 4.5% better
> 14525 readfile calls => 212ns improvement per call
> 
> I don't know why the first case (real sysfs files) gets a much
> higher standard deviation and higher improvement per readfile call.
> The other two cases match what microbenchmarks say
> (about 200ns improvement per readfile call).

Thanks a lot for the detailed numbers.  I think 6.6% is the best that I
saw in my microbenchmarks as well.

While that would be great for a userspace task that does real work,
doing this type of thing is, arguably, not "real work" :)

I'll keep the patch updated over time just for fun, but odds are it's
not going to get merged unless we have some userspace workload that
really needs it.

But the global kernfs mutex you pointed out is an issue that we should
work on resolving.  I've pointed someone at it already, and maybe we
will have some results in a few weeks.

thanks,

greg k-h


* Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface
       [not found]                     ` <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>
  2020-11-12 11:34                       ` Greg Kroah-Hartman
@ 2020-11-19  8:25                       ` Fox Chen
  1 sibling, 0 replies; 34+ messages in thread
From: Fox Chen @ 2020-11-19  8:25 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Greg Kroah-Hartman, x86, linux-kernel

On Fri, Nov 13, 2020 at 2:15 PM Brice Goglin <brice.goglin@gmail.com> wrote:
>
>
> Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
>
> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
>
> Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
>
> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>
> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
>
> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
>
> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
>
> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
>
> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
>
> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
>
> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> same die. Thus, even though the instruction set is compatible among all
> CPUs, there may still be differences in features (e.g., some CPUs may
> have counters that other CPUs do not). There may be applications
> interested in knowing the type of micro-architecture topology of the
> system to make decisions about process affinity.
>
> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> cpu_capacity) may be used to infer the types of micro-architecture of the
> CPUs in the platform, it may not be entirely accurate. For instance, two
> subsets of CPUs with different types of micro-architecture may have the
> same capacity due to power or thermal constraints.
>
> Create the new directory /sys/devices/system/cpu/types. Under such
> directory, create individual subdirectories for each type of CPU micro-
> architecture. Each subdirectory will have cpulist and cpumap files. This
> makes it convenient for user space to read all the CPUs of the same type
> at once without having to inspect each CPU individually.
>
> Implement a generic interface using weak functions that architectures can
> override to indicate a) support for CPU types, b) the CPU type number, and
> c) a string to identify the CPU vendor and type.
>
> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> would look like this (other architectures have the hooks to use whatever
> directory naming convention below "types" that meets their needs):
>
> user@host:~$: ls /sys/devices/system/cpu/types
> intel_atom_0  intel_core_0
>
> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> cpulist cpumap
>
> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> cpulist cpumap
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> 0f
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> 0-3
>
> user@ihost:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> 10
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> 4
>
> Thank you for the quick and detailed feedback, Greg!
>
> The output of 'tree' sometimes makes it easier to see here, or:
> grep -R . *
> also works well.
>
> Indeed, this would definitely make it more readable.
>
> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> created. Add a hook for this purpose.
>
> Why should these not show up if the system is not "hybrid"?
>
> My thinking was that on a non-hybrid system, it does not make sense to
> create this interface, as all the CPUs will be of the same type.
>
> Why not just have this an attribute type in the existing cpuX directory?
> Why do this have to be a totally separate directory and userspace has to
> figure out to look in two different spots for the same cpu to determine
> what it is?
>
> But if the type is located under cpuX, userspace would need to traverse
> all the CPUs and create its own cpu masks. Under the types directory it
> would only need to look once for each type of CPU, IMHO.
>
> What does a "mask" do?  What does userspace care about this?  You would
> have to create it by traversing the directories you are creating anyway,
> so it's not much different, right?
>
> Hello
>
> Sorry for the late reply. As the first userspace consumer of this
> interface [1], I can confirm that reading a single file to get the mask
> would be better, at least for performance reason. On large platforms, we
> already have to read thousands of sysfs files to get CPU topology and
> cache information, I'd be happy not to read one more file per cpu.
>
> Reading these sysfs files is slow, and it does not scale well when
> multiple processes read them in parallel.
>
> Really?  Where is the slowdown?  Would something like readfile() work
> better for you for that?
> https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@linuxfoundation.org/
>
> I guess readfile would improve the sequential case by avoiding syscalls
> but it would not improve the parallel case since syscalls shouldn't have
> any parallel issue?
>
> syscalls should not have parallel issues at all.
>
> We've been watching the status of readfile() since it was posted on LKML
> 6 months ago, but we were actually wondering if it would end up being
> included at some point.
>
> It needs a solid reason to be merged.  My "test" benchmarks are fun to
> run, but I have yet to find a real need for it anywhere as the
> open/read/close syscall overhead seems to be lost in the noise on any
> real application workload that I can find.
>
> If you have a real need, and it reduces overhead and cpu usage, I'm more
> than willing to update the patchset and resubmit it.
>
>
> Good, I'll give it at try.
>
>
> How do multiple processes slow anything down? There shouldn't be any
> shared locks here.
>
> When I benchmarked this in 2016, reading a single (small) sysfs file was
> 41x slower when running 64 processes simultaneously on a 64-core Knights
> Landing than reading from a single process. On a SGI Altix UV with 12x
> 8-core CPUs, reading from one process per CPU (12 total) was 60x slower
> (which could mean NUMA affinity matters), and reading from one process
> per core (96 total) was 491x slower.
>
> I will try to find some time to dig further on recent kernels with perf
> and readfile (both machines were running RHEL7).
>
> 2016 was a long time ago in kernel-land, please retest on a kernel.org
> release, not a RHEL monstrosity.
>
>
> Quick test on 5.8.14 from Debian (fairly close to mainline) on a server with 2x20 cores.
>
> I am measuring the time to do open+read+close of /sys/devices/system/cpu/cpu15/topology/die_id 1000 times
>
> With a single process, it takes 2ms (2us per open+read+close, looks OK).
>
> With one process per core (with careful binding, etc), it jumps from 2ms to 190ms (without much variation).
>
> It looks like locks in kernfs_iop_permission and kernfs_dop_revalidate are causing the issue.
>
> I am attaching the perf report callgraph output below.
>
>
>
> There are ways to avoid this
> multiple discoveries by sharing hwloc info through XML or shmem, but it
> will take years before all developers of different runtimes all
> implement this :)
>
> I don't understand, what exactly are you suggesting we do here instead?
>
> I was just saying userspace has ways to mitigate the issue but it will
> take time because many different projects are involved.
>
> I still don't understand, what issue are you referring to?
>
>
> Reading many sysfs files causes application startup to take many seconds when launching multiple processes at the same time.
>
> Brice
>
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 7K of event 'cycles'
> # Event count (approx.): 5291578622
> #
> # Children      Self  Command        Shared Object      Symbol
> # ........  ........  .............  .................  .......................................
> #
>     99.91%     0.00%  fops_overhead  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
>             |
>             ---entry_SYSCALL_64_after_hwframe
>                do_syscall_64
>                |
>                |--98.69%--__x64_sys_openat
>                |          |
>                |           --98.67%--do_sys_openat2
>                |                     |
>                |                      --98.57%--do_filp_open
>                |                                path_openat
>                |                                |
>                |                                |--81.83%--link_path_walk.part.0
>                |                                |          |
>                |                                |          |--52.19%--inode_permission.part.0
>                |                                |          |          |
>                |                                |          |           --51.86%--kernfs_iop_permission
>                |                                |          |                     |
>                |                                |          |                     |--50.92%--__mutex_lock.constprop.0
>                |                                |          |                     |          |
>                |                                |          |                     |           --49.58%--osq_lock
>                |                                |          |                     |
>                |                                |          |                      --0.59%--mutex_unlock
>                |                                |          |
>                |                                |           --29.47%--walk_component
>                |                                |                     |
>                |                                |                      --29.10%--lookup_fast
>                |                                |                                |
>                |                                |                                 --28.76%--kernfs_dop_revalidate
>                |                                |                                           |
>                |                                |                                            --28.29%--__mutex_lock.constprop.0
>                |                                |                                                      |
>                |                                |                                                       --27.65%--osq_lock
>                |                                |
>                |                                |--9.60%--lookup_fast
>                |                                |          |
>                |                                |           --9.50%--kernfs_dop_revalidate
>                |                                |                     |
>                |                                |                      --9.35%--__mutex_lock.constprop.0
>                |                                |                                |
>                |                                |                                 --9.18%--osq_lock
>                |                                |
>                |                                |--6.17%--may_open
>                |                                |          |
>                |                                |           --6.13%--inode_permission.part.0
>                |                                |                     |
>                |                                |                      --6.10%--kernfs_iop_permission
>                |                                |                                |
>                |                                |                                 --5.90%--__mutex_lock.constprop.0
>                |                                |                                           |
>                |                                |                                            --5.80%--osq_lock
>                |                                |
>                |                                 --0.52%--do_dentry_open
>                |
>                 --0.63%--__prepare_exit_to_usermode
>                           |
>                            --0.58%--task_work_run
>
>     99.91%     0.01%  fops_overhead  [kernel.kallsyms]  [k] do_syscall_64
>             |
>              --99.89%--do_syscall_64
>                        |
>                        |--98.69%--__x64_sys_openat
>                        |          |
>                        |           --98.67%--do_sys_openat2
>                        |                     |
>                        |                      --98.57%--do_filp_open
>                        |                                path_openat
>                        |                                |
>                        |                                |--81.83%--link_path_walk.part.0
>                        |                                |          |
>                        |                                |          |--52.19%--inode_permission.part.0
>                        |                                |          |          |
>                        |                                |          |           --51.86%--kernfs_iop_permission
>                        |                                |          |                     |
>                        |                                |          |                     |--50.92%--__mutex_lock.constprop.0
>                        |                                |          |                     |          |
>                        |                                |          |                     |           --49.58%--osq_lock
>                        |                                |          |                     |
>                        |                                |          |                      --0.59%--mutex_unlock
>                        |                                |          |
>                        |                                |           --29.47%--walk_component
>                        |                                |                     |
>                        |                                |                      --29.10%--lookup_fast
>                        |                                |                                |
>                        |                                |                                 --28.76%--kernfs_dop_revalidate
>                        |                                |                                           |
>                        |                                |                                            --28.29%--__mutex_lock.constprop.0
>                        |                                |                                                      |
>                        |                                |                                                       --27.65%--osq_lock
>                        |                                |
>                        |                                |--9.60%--lookup_fast
>                        |                                |          |
>                        |                                |           --9.50%--kernfs_dop_revalidate
>                        |                                |                     |
>                        |                                |                      --9.35%--__mutex_lock.constprop.0
>                        |                                |                                |
>                        |                                |                                 --9.18%--osq_lock
>                        |                                |
>                        |                                |--6.17%--may_open
>                        |                                |          |
>                        |                                |           --6.13%--inode_permission.part.0
>                        |                                |                     |
>                        |                                |                      --6.10%--kernfs_iop_permission
>                        |                                |                                |
>                        |                                |                                 --5.90%--__mutex_lock.constprop.0
>                        |                                |                                           |
>                        |                                |                                            --5.80%--osq_lock
>                        |                                |
>                        |                                 --0.52%--do_dentry_open
>                        |
>                         --0.63%--__prepare_exit_to_usermode
>                                   |
>                                    --0.58%--task_work_run
>
>     98.72%     0.00%  fops_overhead  [unknown]          [k] 0x7379732f73656369
>             |
>             ---0x7379732f73656369
>                __GI___libc_open
>                |
>                 --98.70%--entry_SYSCALL_64_after_hwframe
>                           do_syscall_64
>                           |
>                            --98.66%--__x64_sys_openat
>                                      |
>                                       --98.65%--do_sys_openat2
>                                                 |
>                                                  --98.55%--do_filp_open
>                                                            path_openat
>                                                            |
>                                                            |--81.80%--link_path_walk.part.0
>                                                            |          |
>                                                            |          |--52.16%--inode_permission.part.0
>                                                            |          |          |
>                                                            |          |           --51.86%--kernfs_iop_permission
>                                                            |          |                     |
>                                                            |          |                     |--50.92%--__mutex_lock.constprop.0
>                                                            |          |                     |          |
>                                                            |          |                     |           --49.58%--osq_lock
>                                                            |          |                     |
>                                                            |          |                      --0.59%--mutex_unlock
>                                                            |          |
>                                                            |           --29.47%--walk_component
>                                                            |                     |
>                                                            |                      --29.10%--lookup_fast
>                                                            |                                |
>                                                            |                                 --28.76%--kernfs_dop_revalidate
>                                                            |                                           |
>                                                            |                                            --28.29%--__mutex_lock.constprop.0
>                                                            |                                                      |
>                                                            |                                                       --27.65%--osq_lock
>                                                            |
>                                                            |--9.60%--lookup_fast
>                                                            |          |
>                                                            |           --9.50%--kernfs_dop_revalidate
>                                                            |                     |
>                                                            |                      --9.35%--__mutex_lock.constprop.0
>                                                            |                                |
>                                                            |                                 --9.18%--osq_lock
>                                                            |
>                                                            |--6.17%--may_open
>                                                            |          |
>                                                            |           --6.13%--inode_permission.part.0
>                                                            |                     |
>                                                            |                      --6.10%--kernfs_iop_permission
>                                                            |                                |
>                                                            |                                 --5.90%--__mutex_lock.constprop.0
>                                                            |                                           |
>                                                            |                                            --5.80%--osq_lock
>                                                            |
>                                                             --0.52%--do_dentry_open
>
>     98.72%     0.00%  fops_overhead  libc-2.31.so       [.] __GI___libc_open
>             |
>             ---__GI___libc_open
>                |
>                 --98.70%--entry_SYSCALL_64_after_hwframe
>                           do_syscall_64
>                           |
>                            --98.66%--__x64_sys_openat
>                                      |
>                                       --98.65%--do_sys_openat2
>                                                 |
>                                                  --98.55%--do_filp_open
>                                                            path_openat
>                                                            |
>                                                            |--81.80%--link_path_walk.part.0
>                                                            |          |
>                                                            |          |--52.16%--inode_permission.part.0
>                                                            |          |          |
>                                                            |          |           --51.86%--kernfs_iop_permission
>                                                            |          |                     |
>                                                            |          |                     |--50.92%--__mutex_lock.constprop.0
>                                                            |          |                     |          |
>                                                            |          |                     |           --49.58%--osq_lock
>                                                            |          |                     |
>                                                            |          |                      --0.59%--mutex_unlock
>                                                            |          |
>                                                            |           --29.47%--walk_component
>                                                            |                     |
>                                                            |                      --29.10%--lookup_fast
>                                                            |                                |
>                                                            |                                 --28.76%--kernfs_dop_revalidate
>                                                            |                                           |
>                                                            |                                            --28.29%--__mutex_lock.constprop.0
>                                                            |                                                      |
>                                                            |                                                       --27.65%--osq_lock
>                                                            |
>                                                            |--9.60%--lookup_fast
>                                                            |          |
>                                                            |           --9.50%--kernfs_dop_revalidate
>                                                            |                     |
>                                                            |                      --9.35%--__mutex_lock.constprop.0
>                                                            |                                |
>                                                            |                                 --9.18%--osq_lock
>                                                            |
>                                                            |--6.17%--may_open
>                                                            |          |
>                                                            |           --6.13%--inode_permission.part.0
>                                                            |                     |
>                                                            |                      --6.10%--kernfs_iop_permission
>                                                            |                                |
>                                                            |                                 --5.90%--__mutex_lock.constprop.0
>                                                            |                                           |
>                                                            |                                            --5.80%--osq_lock
>                                                            |
>                                                             --0.52%--do_dentry_open
>
>     98.69%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_openat
>             |
>              --98.67%--__x64_sys_openat
>                        do_sys_openat2
>                        |
>                         --98.57%--do_filp_open
>                                   path_openat
>                                   |
>                                   |--81.83%--link_path_walk.part.0
>                                   |          |
>                                   |          |--52.19%--inode_permission.part.0
>                                   |          |          |
>                                   |          |           --51.86%--kernfs_iop_permission
>                                   |          |                     |
>                                   |          |                     |--50.92%--__mutex_lock.constprop.0
>                                   |          |                     |          |
>                                   |          |                     |           --49.58%--osq_lock
>                                   |          |                     |
>                                   |          |                      --0.59%--mutex_unlock
>                                   |          |
>                                   |           --29.47%--walk_component
>                                   |                     |
>                                   |                      --29.10%--lookup_fast
>                                   |                                |
>                                   |                                 --28.76%--kernfs_dop_revalidate
>                                   |                                           |
>                                   |                                            --28.29%--__mutex_lock.constprop.0
>                                   |                                                      |
>                                   |                                                       --27.65%--osq_lock
>                                   |
>                                   |--9.60%--lookup_fast
>                                   |          |
>                                   |           --9.50%--kernfs_dop_revalidate
>                                   |                     |
>                                   |                      --9.35%--__mutex_lock.constprop.0
>                                   |                                |
>                                   |                                 --9.18%--osq_lock
>                                   |
>                                   |--6.17%--may_open
>                                   |          |
>                                   |           --6.13%--inode_permission.part.0
>                                   |                     |
>                                   |                      --6.10%--kernfs_iop_permission
>                                   |                                |
>                                   |                                 --5.90%--__mutex_lock.constprop.0
>                                   |                                           |
>                                   |                                            --5.80%--osq_lock
>                                   |
>                                    --0.52%--do_dentry_open
>
>     98.67%     0.03%  fops_overhead  [kernel.kallsyms]  [k] do_sys_openat2
>             |
>              --98.65%--do_sys_openat2
>                        |
>                         --98.57%--do_filp_open
>                                   path_openat
>                                   |
>                                   |--81.83%--link_path_walk.part.0
>                                   |          |
>                                   |          |--52.19%--inode_permission.part.0
>                                   |          |          |
>                                   |          |           --51.86%--kernfs_iop_permission
>                                   |          |                     |
>                                   |          |                     |--50.92%--__mutex_lock.constprop.0
>                                   |          |                     |          |
>                                   |          |                     |           --49.58%--osq_lock
>                                   |          |                     |
>                                   |          |                      --0.59%--mutex_unlock
>                                   |          |
>                                   |           --29.47%--walk_component
>                                   |                     |
>                                   |                      --29.10%--lookup_fast
>                                   |                                |
>                                   |                                 --28.76%--kernfs_dop_revalidate
>                                   |                                           |
>                                   |                                            --28.29%--__mutex_lock.constprop.0
>                                   |                                                      |
>                                   |                                                       --27.65%--osq_lock
>                                   |
>                                   |--9.60%--lookup_fast
>                                   |          |
>                                   |           --9.50%--kernfs_dop_revalidate
>                                   |                     |
>                                   |                      --9.35%--__mutex_lock.constprop.0
>                                   |                                |
>                                   |                                 --9.18%--osq_lock
>                                   |
>                                   |--6.17%--may_open
>                                   |          |
>                                   |           --6.13%--inode_permission.part.0
>                                   |                     |
>                                   |                      --6.10%--kernfs_iop_permission
>                                   |                                |
>                                   |                                 --5.90%--__mutex_lock.constprop.0
>                                   |                                           |
>                                   |                                            --5.80%--osq_lock
>                                   |
>                                    --0.52%--do_dentry_open
>
>     98.57%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_filp_open
>             |
>             ---do_filp_open
>                path_openat
>                |
>                |--81.83%--link_path_walk.part.0
>                |          |
>                |          |--52.19%--inode_permission.part.0
>                |          |          |
>                |          |           --51.86%--kernfs_iop_permission
>                |          |                     |
>                |          |                     |--50.92%--__mutex_lock.constprop.0
>                |          |                     |          |
>                |          |                     |           --49.58%--osq_lock
>                |          |                     |
>                |          |                      --0.59%--mutex_unlock
>                |          |
>                |           --29.47%--walk_component
>                |                     |
>                |                      --29.10%--lookup_fast
>                |                                |
>                |                                 --28.76%--kernfs_dop_revalidate
>                |                                           |
>                |                                            --28.29%--__mutex_lock.constprop.0
>                |                                                      |
>                |                                                       --27.65%--osq_lock
>                |
>                |--9.60%--lookup_fast
>                |          |
>                |           --9.50%--kernfs_dop_revalidate
>                |                     |
>                |                      --9.35%--__mutex_lock.constprop.0
>                |                                |
>                |                                 --9.18%--osq_lock
>                |
>                |--6.17%--may_open
>                |          |
>                |           --6.13%--inode_permission.part.0
>                |                     |
>                |                      --6.10%--kernfs_iop_permission
>                |                                |
>                |                                 --5.90%--__mutex_lock.constprop.0
>                |                                           |
>                |                                            --5.80%--osq_lock
>                |
>                 --0.52%--do_dentry_open
>
>     98.57%     0.01%  fops_overhead  [kernel.kallsyms]  [k] path_openat
>             |
>              --98.56%--path_openat
>                        |
>                        |--81.83%--link_path_walk.part.0
>                        |          |
>                        |          |--52.19%--inode_permission.part.0
>                        |          |          |
>                        |          |           --51.86%--kernfs_iop_permission
>                        |          |                     |
>                        |          |                     |--50.92%--__mutex_lock.constprop.0
>                        |          |                     |          |
>                        |          |                     |           --49.58%--osq_lock
>                        |          |                     |
>                        |          |                      --0.59%--mutex_unlock
>                        |          |
>                        |           --29.47%--walk_component
>                        |                     |
>                        |                      --29.10%--lookup_fast
>                        |                                |
>                        |                                 --28.76%--kernfs_dop_revalidate
>                        |                                           |
>                        |                                            --28.29%--__mutex_lock.constprop.0
>                        |                                                      |
>                        |                                                       --27.65%--osq_lock
>                        |
>                        |--9.60%--lookup_fast
>                        |          |
>                        |           --9.50%--kernfs_dop_revalidate
>                        |                     |
>                        |                      --9.35%--__mutex_lock.constprop.0
>                        |                                |
>                        |                                 --9.18%--osq_lock
>                        |
>                        |--6.17%--may_open
>                        |          |
>                        |           --6.13%--inode_permission.part.0
>                        |                     |
>                        |                      --6.10%--kernfs_iop_permission
>                        |                                |
>                        |                                 --5.90%--__mutex_lock.constprop.0
>                        |                                           |
>                        |                                            --5.80%--osq_lock
>                        |
>                         --0.52%--do_dentry_open
>
>     94.52%     1.30%  fops_overhead  [kernel.kallsyms]  [k] __mutex_lock.constprop.0
>             |
>             |--93.23%--__mutex_lock.constprop.0
>             |          |
>             |          |--92.23%--osq_lock
>             |          |
>             |           --0.55%--mutex_spin_on_owner
>             |
>              --1.30%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>                        |
>                         --1.09%--link_path_walk.part.0
>                                   |
>                                    --0.75%--inode_permission.part.0
>                                              kernfs_iop_permission
>                                              __mutex_lock.constprop.0
>
>     92.24%    92.22%  fops_overhead  [kernel.kallsyms]  [k] osq_lock
>             |
>              --92.22%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>                        |
>                        |--77.21%--link_path_walk.part.0
>                        |          |
>                        |          |--49.57%--inode_permission.part.0
>                        |          |          kernfs_iop_permission
>                        |          |          __mutex_lock.constprop.0
>                        |          |          osq_lock
>                        |          |
>                        |           --27.64%--walk_component
>                        |                     lookup_fast
>                        |                     kernfs_dop_revalidate
>                        |                     __mutex_lock.constprop.0
>                        |                     osq_lock
>                        |
>                        |--9.18%--lookup_fast
>                        |          kernfs_dop_revalidate
>                        |          __mutex_lock.constprop.0
>                        |          osq_lock
>                        |
>                         --5.80%--may_open
>                                   inode_permission.part.0
>                                   kernfs_iop_permission
>                                   __mutex_lock.constprop.0
>                                   osq_lock
>
>     81.83%     0.03%  fops_overhead  [kernel.kallsyms]  [k] link_path_walk.part.0
>             |
>              --81.80%--link_path_walk.part.0
>                        |
>                        |--52.19%--inode_permission.part.0
>                        |          |
>                        |           --51.86%--kernfs_iop_permission
>                        |                     |
>                        |                     |--50.92%--__mutex_lock.constprop.0
>                        |                     |          |
>                        |                     |           --49.58%--osq_lock
>                        |                     |
>                        |                      --0.59%--mutex_unlock
>                        |
>                         --29.47%--walk_component
>                                   |
>                                    --29.10%--lookup_fast
>                                              |
>                                               --28.76%--kernfs_dop_revalidate
>                                                         |
>                                                          --28.29%--__mutex_lock.constprop.0
>                                                                    |
>                                                                     --27.65%--osq_lock
>
>     58.32%     0.24%  fops_overhead  [kernel.kallsyms]  [k] inode_permission.part.0
>             |
>              --58.08%--inode_permission.part.0
>                        |
>                         --57.97%--kernfs_iop_permission
>                                   |
>                                   |--56.81%--__mutex_lock.constprop.0
>                                   |          |
>                                   |           --55.39%--osq_lock
>                                   |
>                                    --0.73%--mutex_unlock
>
>     57.97%     0.00%  fops_overhead  [kernel.kallsyms]  [k] kernfs_iop_permission
>             |
>             ---kernfs_iop_permission
>                |
>                |--56.81%--__mutex_lock.constprop.0
>                |          |
>                |           --55.39%--osq_lock
>                |
>                 --0.73%--mutex_unlock
>
>     38.71%     0.03%  fops_overhead  [kernel.kallsyms]  [k] lookup_fast
>             |
>              --38.68%--lookup_fast
>                        |
>                         --38.26%--kernfs_dop_revalidate
>                                   |
>                                    --37.64%--__mutex_lock.constprop.0
>                                              |
>                                               --36.83%--osq_lock
>
>     38.26%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kernfs_dop_revalidate
>             |
>              --38.22%--kernfs_dop_revalidate
>                        |
>                         --37.64%--__mutex_lock.constprop.0
>                                   |
>                                    --36.83%--osq_lock
>
>     29.47%     0.03%  fops_overhead  [kernel.kallsyms]  [k] walk_component
>             |
>              --29.44%--walk_component
>                        |
>                         --29.10%--lookup_fast
>                                   |
>                                    --28.76%--kernfs_dop_revalidate
>                                              |
>                                               --28.29%--__mutex_lock.constprop.0
>                                                         |
>                                                          --27.65%--osq_lock
>
>      6.17%     0.03%  fops_overhead  [kernel.kallsyms]  [k] may_open
>             |
>              --6.14%--may_open
>                        |
>                         --6.13%--inode_permission.part.0
>                                   |
>                                    --6.10%--kernfs_iop_permission
>                                              |
>                                               --5.90%--__mutex_lock.constprop.0
>                                                         |
>                                                          --5.80%--osq_lock
>
>      1.22%     0.00%  fops_overhead  [unknown]          [k] 0x5541d68949564100
>             |
>             ---0x5541d68949564100
>                __libc_start_main
>                |
>                |--0.68%--__close
>                |          |
>                |           --0.66%--entry_SYSCALL_64_after_hwframe
>                |                     do_syscall_64
>                |                     |
>                |                      --0.61%--__prepare_exit_to_usermode
>                |                                |
>                |                                 --0.58%--task_work_run
>                |
>                 --0.54%--read
>
>      1.22%     0.00%  fops_overhead  libc-2.31.so       [.] __libc_start_main
>             |
>             ---__libc_start_main
>                |
>                |--0.68%--__close
>                |          |
>                |           --0.66%--entry_SYSCALL_64_after_hwframe
>                |                     do_syscall_64
>                |                     |
>                |                      --0.61%--__prepare_exit_to_usermode
>                |                                |
>                |                                 --0.58%--task_work_run
>                |
>                 --0.54%--read
>
>      1.06%     1.05%  fops_overhead  [kernel.kallsyms]  [k] mutex_unlock
>             |
>              --1.02%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>                        |
>                         --0.80%--link_path_walk.part.0
>                                   |
>                                    --0.60%--inode_permission.part.0
>                                              kernfs_iop_permission
>                                              |
>                                               --0.59%--mutex_unlock
>
>      0.88%     0.79%  fops_overhead  [kernel.kallsyms]  [k] mutex_lock
>             |
>              --0.68%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>
>      0.68%     0.01%  fops_overhead  libc-2.31.so       [.] __close
>             |
>              --0.67%--__close
>                        |
>                         --0.66%--entry_SYSCALL_64_after_hwframe
>                                   do_syscall_64
>                                   |
>                                    --0.61%--__prepare_exit_to_usermode
>                                              |
>                                               --0.58%--task_work_run
>
>      0.63%     0.05%  fops_overhead  [kernel.kallsyms]  [k] __prepare_exit_to_usermode
>             |
>              --0.58%--__prepare_exit_to_usermode
>                        task_work_run
>
>      0.58%     0.00%  fops_overhead  [kernel.kallsyms]  [k] task_work_run
>             |
>             ---task_work_run
>
>      0.58%     0.10%  fops_overhead  [kernel.kallsyms]  [k] dput
>      0.56%     0.55%  fops_overhead  [kernel.kallsyms]  [k] mutex_spin_on_owner
>             |
>              --0.55%--0x7379732f73656369
>                        __GI___libc_open
>                        entry_SYSCALL_64_after_hwframe
>                        do_syscall_64
>                        __x64_sys_openat
>                        do_sys_openat2
>                        do_filp_open
>                        path_openat
>
>      0.54%     0.00%  fops_overhead  libc-2.31.so       [.] read
>             |
>             ---read
>
>      0.52%     0.12%  fops_overhead  [kernel.kallsyms]  [k] do_dentry_open
>      0.50%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ksys_read
>      0.50%     0.03%  fops_overhead  [kernel.kallsyms]  [k] vfs_read
>      0.46%     0.05%  fops_overhead  [kernel.kallsyms]  [k] __fput
>      0.45%     0.45%  fops_overhead  [kernel.kallsyms]  [k] lockref_put_return
>      0.43%     0.43%  fops_overhead  [kernel.kallsyms]  [k] osq_unlock
>      0.41%     0.08%  fops_overhead  [kernel.kallsyms]  [k] step_into
>      0.41%     0.08%  fops_overhead  [kernel.kallsyms]  [k] __d_lookup
>      0.37%     0.35%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_lock
>      0.35%     0.03%  fops_overhead  [kernel.kallsyms]  [k] seq_read
>      0.28%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_open
>      0.27%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_release
>      0.16%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_put_open_node
>      0.16%     0.00%  fops_overhead  [kernel.kallsyms]  [k] terminate_walk
>      0.12%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __alloc_file
>      0.12%     0.00%  fops_overhead  [kernel.kallsyms]  [k] alloc_empty_file
>      0.12%     0.01%  fops_overhead  [kernel.kallsyms]  [k] unlazy_walk
>      0.12%     0.05%  fops_overhead  [kernel.kallsyms]  [k] _cond_resched
>      0.12%     0.07%  fops_overhead  [kernel.kallsyms]  [k] call_rcu
>      0.10%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __legitimize_path
>      0.09%     0.05%  fops_overhead  [kernel.kallsyms]  [k] sysfs_kf_seq_show
>      0.09%     0.09%  fops_overhead  [kernel.kallsyms]  [k] generic_permission
>      0.09%     0.07%  fops_overhead  [kernel.kallsyms]  [k] rcu_all_qs
>      0.09%     0.01%  fops_overhead  [kernel.kallsyms]  [k] security_file_open
>      0.08%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_alloc
>      0.08%     0.08%  fops_overhead  [kernel.kallsyms]  [k] lockref_get_not_dead
>      0.08%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kmem_cache_alloc
>      0.08%     0.08%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_open
>      0.07%     0.05%  fops_overhead  [kernel.kallsyms]  [k] kfree
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_read
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] set_nlink
>      0.05%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_start
>      0.05%     0.03%  fops_overhead  [kernel.kallsyms]  [k] path_init
>      0.05%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_close
>      0.05%     0.00%  fops_overhead  [kernel.kallsyms]  [k] filp_close
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] syscall_return_via_sysret
>      0.05%     0.03%  fops_overhead  [kernel.kallsyms]  [k] __kmalloc_node
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] rcu_segcblist_enqueue
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] vfs_open
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>      0.04%     0.03%  fops_overhead  [kernel.kallsyms]  [k] sprintf
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] dev_attr_show
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] die_id_show
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kmem_cache_free
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] fsnotify_parent
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] security_inode_permission
>      0.04%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __check_object_size
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_alloc_security
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] seq_release
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] memset_erms
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kernfs_get_active
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] try_to_wake_up
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vsnprintf
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] mntput_no_expire
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] lockref_get
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kernfs_put_active
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] fsnotify
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] locks_remove_posix
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_permission
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] rw_verify_area
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] set_root
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] nd_jump_root
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] wake_up_q
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __mutex_unlock_slowpath.constprop.0
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] getname_flags.part.0
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] task_work_add
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] fput_many
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] __legitimize_mnt
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_stop
>      0.02%     0.02%  fops_overhead  [nfs]              [k] nfs_do_access
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] _dl_map_object
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] open_path
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] __GI___open64_nocancel
>      0.02%     0.00%  fops_overhead  [nfs]              [k] nfs_permission
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_next
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] available_idle_cpu
>      0.01%     0.00%  fops_overhead  [unknown]          [k] 0x3931206e69207364
>      0.01%     0.00%  fops_overhead  libc-2.31.so       [.] __GI___libc_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ksys_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vfs_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] n_tty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] pty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] queue_work_on
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __queue_work
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] select_task_rq_fair
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] slab_free_freelist_hook
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __list_del_entry_valid
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcg_kmem_put_cache
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __syscall_return_slowpath
>      0.01%     0.01%  fops_overhead  libc-2.31.so       [.] _dl_addr
>      0.01%     0.00%  fops_overhead  [unknown]          [.] 0x756e696c2d34365f
>      0.01%     0.00%  fops_overhead  [unknown]          [.] 0x00007f4b1ca1e000
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __virt_addr_valid
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] locks_remove_file
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcpy_erms
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] update_rq_clock
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] entry_SYSCALL_64
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __check_heap_object
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_free_security
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_free
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __d_lookup_rcu
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] mntput
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] get_unused_fd_flags
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] alloc_slab_page
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __slab_alloc
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ___slab_alloc
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] allocate_slab
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __alloc_fd
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] legitimize_root
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] strncpy_from_user
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_refresh_inode
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] build_open_flags
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] strcmp
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcg_kmem_get_cache
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] asm_sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] asm_call_sysvec_on_stack
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] hrtimer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __hrtimer_run_queues
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tick_sched_timer
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tick_sched_handle
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] update_process_times
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] scheduler_tick
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] perf_iterate_ctx
>      0.01%     0.00%  fops_overhead  [unknown]          [k] 0x00007fd34e3a0627
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_execve
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_execve
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __do_execve_file
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] load_elf_binary
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] elf_map
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vm_mmap_pgoff
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_mmap
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] mmap_region
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] perf_event_mmap
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] perf_iterate_sb
>      0.00%     0.00%  perf_5.8       [unknown]          [k] 0x00007fd34e3a0627
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] perf_event_exec
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] do_syscall_64
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __x64_sys_execve
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] do_execve
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __do_execve_file
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] load_elf_binary
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] begin_new_exec
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] native_write_msr
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __intel_pmu_enable_all.constprop.0
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] acpi_os_read_memory
>
>
> #
> # (Tip: To count events in every 1000 msec: perf stat -I 1000)
> #
>

Hi Brice,

I wrote a benchmark to do open+read+close on
/sys/devices/system/cpu/cpu0/topology/die_id
https://github.com/foxhlchen/sysfs_benchmark/blob/main/main.c


+    3.39%     3.37%  a.out  [kernel.kallsyms]  [k] mutex_unlock
+    2.76%     2.74%  a.out  [kernel.kallsyms]  [k] mutex_lock
+    0.92%     0.42%  a.out  [kernel.kallsyms]  [k] __mutex_lock.constprop.0
     0.38%     0.37%  a.out  [kernel.kallsyms]  [k] mutex_spin_on_owner
     0.05%     0.05%  a.out  [kernel.kallsyms]  [k] __mutex_init
     0.01%     0.01%  a.out  [kernel.kallsyms]  [k] __mutex_lock_slowpath
     0.01%     0.00%  a.out  [kernel.kallsyms]  [k] __mutex_unlock_slowpath.constprop.0

But I failed to reproduce your result.

If possible, would you mind sharing your benchmark code? :)


thanks,
fox


end of thread, other threads:[~2020-11-19  8:26 UTC | newest]

Thread overview: 34+ messages
-- links below jump to the message on this page --
2020-10-03  1:17 [PATCH 0/4] drivers core: Introduce CPU type sysfs interface Ricardo Neri
2020-10-03  1:17 ` [PATCH 1/4] " Ricardo Neri
2020-10-03  3:27   ` Randy Dunlap
2020-10-06  1:15     ` Ricardo Neri
2020-10-10  3:14       ` Randy Dunlap
2020-10-03  8:53   ` Greg Kroah-Hartman
2020-10-03 11:05     ` Greg Kroah-Hartman
2020-10-06  1:08       ` Ricardo Neri
2020-10-06  0:57     ` Ricardo Neri
2020-10-06  7:37       ` Greg Kroah-Hartman
2020-10-07  3:14         ` Ricardo Neri
2020-10-07  5:15           ` Greg Kroah-Hartman
2020-10-08  3:34             ` Ricardo Neri
2020-11-12  6:19             ` Brice Goglin
2020-11-12  6:42               ` Greg Kroah-Hartman
2020-11-12  9:10                 ` Brice Goglin
2020-11-12 10:49                   ` Greg Kroah-Hartman
2020-11-17 15:55                     ` Brice Goglin
2020-11-18 10:45                       ` Brice Goglin
2020-11-18 10:57                         ` Greg Kroah-Hartman
     [not found]                     ` <38f290d2-4c3a-d1b0-f3cc-a0897ea10abd@gmail.com>
2020-11-12 11:34                       ` Greg Kroah-Hartman
2020-11-19  8:25                       ` Fox Chen
2020-10-03  1:17 ` [PATCH 2/4] x86/cpu: Describe hybrid CPUs in cpuinfo_x86 Ricardo Neri
2020-10-03  4:07   ` kernel test robot
2020-10-03  1:17 ` [PATCH 3/4] x86/cpu/intel: Add function to get name of hybrid CPU types Ricardo Neri
2020-10-03  1:17 ` [PATCH 4/4] x86/cpu/topology: Implement the CPU type sysfs interface Ricardo Neri
2020-10-03  3:33   ` Randy Dunlap
2020-10-03  5:28   ` kernel test robot
2020-10-03  8:55   ` Greg Kroah-Hartman
2020-10-06  1:05     ` Ricardo Neri
2020-10-03  8:49 ` [PATCH 0/4] drivers core: Introduce " Borislav Petkov
2020-10-06  0:27   ` Ricardo Neri
2020-10-06  8:51 ` Qais Yousef
2020-10-07  2:50   ` Ricardo Neri
