linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/10] perf: Multi-die/package support
@ 2019-02-19 20:00 kan.liang
  2019-02-19 20:00 ` [PATCH 01/10] perf/x86/intel: Introduce a concept "domain" as the scope of counters kan.liang
                   ` (11 more replies)
  0 siblings, 12 replies; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Add Linux perf support for multi-die/package systems. The first product
with multi-die is Xeon Cascade Lake-AP (CLX-AP).
The code is based on top of Len's multi-die/package support:
https://lkml.org/lkml/2019/2/18/1534

Patches 1-4: Generic code for existing platforms.
Perf supports miscellaneous modules, e.g. cstate, RAPL and uncore.
Their counters have the same scope of effect (per package), but each
module maintains its own scope information independently.
Abstracting the common topology-related code for these modules reduces
redundancy, especially when adding counters with a new scope.

Patches 5-8: Support die scope counters on CLX-AP for uncore, RAPL
and cstate.

Patches 9-10: Support per-die aggregation for perf stat and header.

Kan Liang (10):
  perf/x86/intel: Introduce a concept "domain" as the scope of counters
  perf/x86/intel/cstate: Apply "domain" for cstate
  perf/x86/intel/uncore: Apply "domain" for uncore
  perf/x86/intel/rapl: Apply "domain" for RAPL
  perf/x86/intel/domain: Add new domain type for die
  perf/x86/intel/uncore: Support die scope counters on CLX-AP
  perf/x86/intel/rapl: Support die scope counters on CLX-AP
  perf/x86/intel/cstate: Support die scope counters on CLX-AP
  perf header: Add die information in cpu topology
  perf stat: Support per-die aggregation

 arch/x86/events/Makefile                           |   2 +-
 arch/x86/events/domain.c                           |  81 +++++
 arch/x86/events/domain.h                           |  26 ++
 arch/x86/events/intel/cstate.c                     | 364 ++++++++++++---------
 arch/x86/events/intel/rapl.c                       | 333 ++++++++++++++-----
 arch/x86/events/intel/uncore.c                     | 247 +++++++++-----
 arch/x86/events/intel/uncore.h                     |   9 +-
 arch/x86/events/intel/uncore_snbep.c               |   2 +-
 tools/perf/Documentation/perf-stat.txt             |  10 +
 tools/perf/Documentation/perf.data-file-format.txt |   9 +-
 tools/perf/builtin-stat.c                          |  73 ++++-
 tools/perf/util/cpumap.c                           |  55 +++-
 tools/perf/util/cpumap.h                           |  10 +-
 tools/perf/util/env.c                              |   1 +
 tools/perf/util/env.h                              |   3 +
 tools/perf/util/header.c                           | 185 ++++++++++-
 tools/perf/util/stat-display.c                     |  24 +-
 tools/perf/util/stat-shadow.c                      |   1 +
 tools/perf/util/stat.c                             |   1 +
 tools/perf/util/stat.h                             |   1 +
 20 files changed, 1082 insertions(+), 355 deletions(-)
 create mode 100644 arch/x86/events/domain.c
 create mode 100644 arch/x86/events/domain.h

-- 
2.7.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 01/10] perf/x86/intel: Introduce a concept "domain" as the scope of counters
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
@ 2019-02-19 20:00 ` kan.liang
  2019-02-20 11:12   ` Peter Zijlstra
  2019-02-19 20:00 ` [PATCH 02/10] perf/x86/intel/cstate: Apply "domain" for cstate kan.liang
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Perf supports miscellaneous modules, e.g. cstate, RAPL and uncore.
The counters of these modules have a different scope of effect than the
core counters, so these modules maintain their own scope information
independently. In practice, the scope of counters is similar across
these modules.
Abstracting the common topology-related code for these modules reduces
code redundancy.
Furthermore, it helps when counters with a new scope are added, e.g.
die scope counters on CLX-AP: the similar topology code no longer needs
to be updated in each module repeatedly.

A concept, "domain", is introduced as the scope of counters.
 - Domain type: A type of domain is classified by its scope of effect.
   Currently there are two domain types, PACKAGE_DOMAIN and
   CORE_DOMAIN. Their scopes are the physical package and the physical
   core respectively.
   Add a new struct domain_type for the domain type.
 - The number of domains of each type depends on the topology of the
   machine. For example, on a 4-socket machine the number of domains of
   the PACKAGE_DOMAIN type is 4.
 - Domain ID: Each domain has an ID, which has to be consecutive.

Four common functions are abstracted.
 - domain_type_init(): Initialize a domain type. Updates the number of
   domains for the given type; for the PACKAGE_DOMAIN type, it is the
   maximum number of packages on the machine.
   Assigns a postfix string for the name of the given domain type. If
   there is more than one domain type on a system, the postfix is
   required to distinguish between domain types. For example, the cstate
   PMU names are cstate_core and cstate_pkg. If there is only one type
   on a system, the postfix is not applied, e.g. the RAPL PMU name is
   power.
 - get_domain_cpu_mask(): Return a CPU mask for a given domain type and
   a CPU.
 - get_domain_id(): Return a domain ID for a given domain type and a
   CPU.
   For now, it is only used by RAPL and uncore for the PACKAGE_DOMAIN
   type. The domain ID is the same as the logical package ID.
 - get_domain_id_from_group_id(): Return a domain ID for a given domain
   type and a group ID of a PCI bus.
   The function is used by PCI uncore blocks to calculate the mapping
   between a domain ID and a PCI bus.
   For the PACKAGE_DOMAIN type, the group ID is the same as the
   physical package ID.

The new concept will be applied to each module in the following
patches.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/Makefile |  2 +-
 arch/x86/events/domain.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/events/domain.h | 25 +++++++++++++++++
 3 files changed, 96 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/events/domain.c
 create mode 100644 arch/x86/events/domain.h

diff --git a/arch/x86/events/Makefile b/arch/x86/events/Makefile
index b8ccdb5..db638e2 100644
--- a/arch/x86/events/Makefile
+++ b/arch/x86/events/Makefile
@@ -1,4 +1,4 @@
-obj-y					+= core.o
+obj-y					+= core.o domain.o
 obj-y					+= amd/
 obj-$(CONFIG_X86_LOCAL_APIC)            += msr.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= intel/
diff --git a/arch/x86/events/domain.c b/arch/x86/events/domain.c
new file mode 100644
index 0000000..bd24c5b
--- /dev/null
+++ b/arch/x86/events/domain.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019, Intel Corporation.
+ * Define "domain" as the scope of counters
+ *
+ */
+
+#include "domain.h"
+
+int domain_type_init(struct domain_type *type)
+{
+	switch (type->type) {
+	case PACKAGE_DOMAIN:
+		type->max_domains = topology_max_packages();
+		type->postfix = "pkg";
+		return 0;
+	case CORE_DOMAIN:
+		type->postfix = "core";
+		return 0;
+	default:
+		return -1;
+	}
+}
+EXPORT_SYMBOL_GPL(domain_type_init);
+
+/* Return a CPU mask for a given domain type and a CPU. */
+const struct cpumask *get_domain_cpu_mask(int cpu, struct domain_type *type)
+{
+	switch (type->type) {
+	case PACKAGE_DOMAIN:
+		return topology_die_cpumask(cpu);
+	case CORE_DOMAIN:
+		return topology_sibling_cpumask(cpu);
+	default:
+		return NULL;
+	}
+}
+EXPORT_SYMBOL_GPL(get_domain_cpu_mask);
+
+/*
+ * Return a domain ID for a given domain type and a CPU.
+ * The domain ID has to be consecutive.
+ */
+int get_domain_id(unsigned int cpu, struct domain_type *type)
+{
+	switch (type->type) {
+	case PACKAGE_DOMAIN:
+		/* Domain id is the same as logical package id */
+		return topology_logical_package_id(cpu);
+	default:
+		return -1;
+	}
+}
+EXPORT_SYMBOL_GPL(get_domain_id);
+
+/*
+ * Return a domain ID for a given domain type and a group ID of PCI BUS.
+ * Used by uncore to calculate the mapping between a domain ID and PCI BUS.
+ */
+int get_domain_id_from_group_id(int id, struct domain_type *type)
+{
+	switch (type->type) {
+	case PACKAGE_DOMAIN:
+		/* group id is physical pkg id */
+		return topology_phys_to_logical_pkg(id);
+	default:
+		return -1;
+	}
+}
+EXPORT_SYMBOL_GPL(get_domain_id_from_group_id);
diff --git a/arch/x86/events/domain.h b/arch/x86/events/domain.h
new file mode 100644
index 0000000..c787816
--- /dev/null
+++ b/arch/x86/events/domain.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2019, Intel Corporation.
+ */
+#include <linux/perf_event.h>
+
+#define DOMAIN_NAME_LEN	32
+
+enum domain_types {
+	PACKAGE_DOMAIN = 0,
+	CORE_DOMAIN,
+
+	DOMAIN_TYPE_MAX,
+};
+
+struct domain_type {
+	enum domain_types	type;
+	unsigned int		max_domains;
+	const char		*postfix;
+};
+
+int domain_type_init(struct domain_type *type);
+const struct cpumask *get_domain_cpu_mask(int cpu, struct domain_type *type);
+int get_domain_id(unsigned int cpu, struct domain_type *type);
+int get_domain_id_from_group_id(int id, struct domain_type *type);
-- 
2.7.4



* [PATCH 02/10] perf/x86/intel/cstate: Apply "domain" for cstate
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
  2019-02-19 20:00 ` [PATCH 01/10] perf/x86/intel: Introduce a concept "domain" as the scope of counters kan.liang
@ 2019-02-19 20:00 ` kan.liang
  2019-02-19 20:00 ` [PATCH 03/10] perf/x86/intel/uncore: Apply "domain" for uncore kan.liang
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

There is duplicate code implemented to support the different scopes of
counters. Apply the new concept, "domain", to cstate to reduce the
redundancy.

Add struct cstate_pmus to store the PMU related information. Each
available type needs a dedicated cstate_pmus, which is allocated
in cstate_probe_msr().
Remove the hardcoded cstate_core_pmu and cstate_pkg_pmu. The PMU
information can now be found via the domain type.
Clean up the code in cstate_pmu_event_init(), cstate_get_attr_cpumask()
and cstate_init().

The format attrs are the same for PACKAGE_DOMAIN and CORE_DOMAIN.
Remove the duplicate code.

The cpu_mask of a domain type can be retrieved from the common
functions. Clean up cstate_cpu_init/exit and remove the duplicate code.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/cstate.c | 341 ++++++++++++++++++++++-------------------
 1 file changed, 184 insertions(+), 157 deletions(-)

diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index d2e7807..5f71606 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -96,6 +96,7 @@
 #include <asm/cpu_device_id.h>
 #include <asm/intel-family.h>
 #include "../perf_event.h"
+#include "../domain.h"
 
 MODULE_LICENSE("GPL");
 
@@ -110,14 +111,15 @@ static ssize_t __cstate_##_var##_show(struct kobject *kobj,	\
 static struct kobj_attribute format_attr_##_var =		\
 	__ATTR(_name, 0444, __cstate_##_var##_show, NULL)
 
-static ssize_t cstate_get_attr_cpumask(struct device *dev,
-				       struct device_attribute *attr,
-				       char *buf);
-
 /* Model -> events mapping */
 struct cstate_model {
-	unsigned long		core_events;
-	unsigned long		pkg_events;
+	union {
+		unsigned long	events[DOMAIN_TYPE_MAX];
+		struct {
+			unsigned long	pkg_events;
+			unsigned long	core_events;
+		};
+	};
 	unsigned long		quirks;
 };
 
@@ -130,10 +132,17 @@ struct perf_cstate_msr {
 	struct	perf_pmu_events_attr *attr;
 };
 
+struct cstate_pmus {
+	struct pmu		pmu;
+	struct domain_type	type;
+	int			event_max;
+	struct perf_cstate_msr	*msrs;
+	struct attribute	**attrs;
+	cpumask_t		cpu_mask;
+};
+static struct cstate_pmus *cstate_pmus[DOMAIN_TYPE_MAX];
 
 /* cstate_core PMU */
-static struct pmu cstate_core_pmu;
-static bool has_cstate_core;
 
 enum perf_cstate_core_events {
 	PERF_CSTATE_CORE_C1_RES = 0,
@@ -166,17 +175,33 @@ static struct attribute_group core_events_attr_group = {
 };
 
 DEFINE_CSTATE_FORMAT_ATTR(core_event, event, "config:0-63");
-static struct attribute *core_format_attrs[] = {
+static struct attribute *format_attrs[] = {
 	&format_attr_core_event.attr,
 	NULL,
 };
 
-static struct attribute_group core_format_attr_group = {
+static struct attribute_group format_attr_group = {
 	.name = "format",
-	.attrs = core_format_attrs,
+	.attrs = format_attrs,
 };
 
-static cpumask_t cstate_core_cpu_mask;
+static ssize_t cstate_get_attr_cpumask(struct device *dev,
+				       struct device_attribute *attr,
+				       char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct cstate_pmus *pmus;
+	int i;
+
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		pmus = cstate_pmus[i];
+		if (!pmus || &pmus->pmu != pmu)
+			continue;
+		return cpumap_print_to_pagebuf(true, buf, &pmus->cpu_mask);
+	}
+	return 0;
+}
+
 static DEVICE_ATTR(cpumask, S_IRUGO, cstate_get_attr_cpumask, NULL);
 
 static struct attribute *cstate_cpumask_attrs[] = {
@@ -190,15 +215,12 @@ static struct attribute_group cpumask_attr_group = {
 
 static const struct attribute_group *core_attr_groups[] = {
 	&core_events_attr_group,
-	&core_format_attr_group,
+	&format_attr_group,
 	&cpumask_attr_group,
 	NULL,
 };
 
 /* cstate_pkg PMU */
-static struct pmu cstate_pkg_pmu;
-static bool has_cstate_pkg;
-
 enum perf_cstate_pkg_events {
 	PERF_CSTATE_PKG_C2_RES = 0,
 	PERF_CSTATE_PKG_C3_RES,
@@ -238,44 +260,24 @@ static struct attribute_group pkg_events_attr_group = {
 	.attrs = pkg_events_attrs,
 };
 
-DEFINE_CSTATE_FORMAT_ATTR(pkg_event, event, "config:0-63");
-static struct attribute *pkg_format_attrs[] = {
-	&format_attr_pkg_event.attr,
-	NULL,
-};
-static struct attribute_group pkg_format_attr_group = {
-	.name = "format",
-	.attrs = pkg_format_attrs,
-};
-
-static cpumask_t cstate_pkg_cpu_mask;
-
 static const struct attribute_group *pkg_attr_groups[] = {
 	&pkg_events_attr_group,
-	&pkg_format_attr_group,
+	&format_attr_group,
 	&cpumask_attr_group,
 	NULL,
 };
 
-static ssize_t cstate_get_attr_cpumask(struct device *dev,
-				       struct device_attribute *attr,
-				       char *buf)
-{
-	struct pmu *pmu = dev_get_drvdata(dev);
-
-	if (pmu == &cstate_core_pmu)
-		return cpumap_print_to_pagebuf(true, buf, &cstate_core_cpu_mask);
-	else if (pmu == &cstate_pkg_pmu)
-		return cpumap_print_to_pagebuf(true, buf, &cstate_pkg_cpu_mask);
-	else
-		return 0;
-}
-
 static int cstate_pmu_event_init(struct perf_event *event)
 {
+	const struct cpumask *cpu_mask;
 	u64 cfg = event->attr.config;
+	struct cstate_pmus *pmus;
 	int cpu;
 
+	pmus = container_of(event->pmu, struct cstate_pmus, pmu);
+	if (!pmus)
+		return -ENOENT;
+
 	if (event->attr.type != event->pmu->type)
 		return -ENOENT;
 
@@ -292,26 +294,19 @@ static int cstate_pmu_event_init(struct perf_event *event)
 	if (event->cpu < 0)
 		return -EINVAL;
 
-	if (event->pmu == &cstate_core_pmu) {
-		if (cfg >= PERF_CSTATE_CORE_EVENT_MAX)
-			return -EINVAL;
-		if (!core_msr[cfg].attr)
-			return -EINVAL;
-		event->hw.event_base = core_msr[cfg].msr;
-		cpu = cpumask_any_and(&cstate_core_cpu_mask,
-				      topology_sibling_cpumask(event->cpu));
-	} else if (event->pmu == &cstate_pkg_pmu) {
-		if (cfg >= PERF_CSTATE_PKG_EVENT_MAX)
-			return -EINVAL;
-		cfg = array_index_nospec((unsigned long)cfg, PERF_CSTATE_PKG_EVENT_MAX);
-		if (!pkg_msr[cfg].attr)
-			return -EINVAL;
-		event->hw.event_base = pkg_msr[cfg].msr;
-		cpu = cpumask_any_and(&cstate_pkg_cpu_mask,
-				      topology_core_cpumask(event->cpu));
-	} else {
-		return -ENOENT;
-	}
+	if (cfg >= pmus->event_max)
+		return -EINVAL;
+
+	cfg = array_index_nospec((unsigned long)cfg, pmus->event_max);
+	if (!pmus->msrs[cfg].attr)
+		return -EINVAL;
+
+	event->hw.event_base = pmus->msrs[cfg].msr;
+
+	cpu_mask = get_domain_cpu_mask(event->cpu, &pmus->type);
+	if (!cpu_mask)
+		return -ENODEV;
+	cpu = cpumask_any_and(&pmus->cpu_mask, cpu_mask);
 
 	if (cpu >= nr_cpu_ids)
 		return -ENODEV;
@@ -375,85 +370,61 @@ static int cstate_pmu_event_add(struct perf_event *event, int mode)
  */
 static int cstate_cpu_exit(unsigned int cpu)
 {
+	const struct cpumask *cpu_mask;
+	struct cstate_pmus *pmus;
 	unsigned int target;
+	int i;
 
-	if (has_cstate_core &&
-	    cpumask_test_and_clear_cpu(cpu, &cstate_core_cpu_mask)) {
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!cstate_pmus[i])
+			continue;
 
-		target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
-		/* Migrate events if there is a valid target */
-		if (target < nr_cpu_ids) {
-			cpumask_set_cpu(target, &cstate_core_cpu_mask);
-			perf_pmu_migrate_context(&cstate_core_pmu, cpu, target);
-		}
-	}
+		cpu_mask = get_domain_cpu_mask(cpu, &cstate_pmus[i]->type);
+		if (!cpu_mask)
+			continue;
 
-	if (has_cstate_pkg &&
-	    cpumask_test_and_clear_cpu(cpu, &cstate_pkg_cpu_mask)) {
+		pmus = cstate_pmus[i];
+		if (!cpumask_test_and_clear_cpu(cpu, &pmus->cpu_mask))
+			continue;
 
-		target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+		target = cpumask_any_but(cpu_mask, cpu);
 		/* Migrate events if there is a valid target */
 		if (target < nr_cpu_ids) {
-			cpumask_set_cpu(target, &cstate_pkg_cpu_mask);
-			perf_pmu_migrate_context(&cstate_pkg_pmu, cpu, target);
+			cpumask_set_cpu(target, &pmus->cpu_mask);
+			perf_pmu_migrate_context(&pmus->pmu, cpu, target);
 		}
 	}
+
 	return 0;
 }
 
 static int cstate_cpu_init(unsigned int cpu)
 {
+	const struct cpumask *cpu_mask;
+	struct cstate_pmus *pmus;
 	unsigned int target;
+	int i;
 
-	/*
-	 * If this is the first online thread of that core, set it in
-	 * the core cpu mask as the designated reader.
-	 */
-	target = cpumask_any_and(&cstate_core_cpu_mask,
-				 topology_sibling_cpumask(cpu));
-
-	if (has_cstate_core && target >= nr_cpu_ids)
-		cpumask_set_cpu(cpu, &cstate_core_cpu_mask);
-
-	/*
-	 * If this is the first online thread of that package, set it
-	 * in the package cpu mask as the designated reader.
-	 */
-	target = cpumask_any_and(&cstate_pkg_cpu_mask,
-				 topology_core_cpumask(cpu));
-	if (has_cstate_pkg && target >= nr_cpu_ids)
-		cpumask_set_cpu(cpu, &cstate_pkg_cpu_mask);
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!cstate_pmus[i])
+			continue;
 
-	return 0;
-}
+		cpu_mask = get_domain_cpu_mask(cpu, &cstate_pmus[i]->type);
+		if (!cpu_mask)
+			continue;
 
-static struct pmu cstate_core_pmu = {
-	.attr_groups	= core_attr_groups,
-	.name		= "cstate_core",
-	.task_ctx_nr	= perf_invalid_context,
-	.event_init	= cstate_pmu_event_init,
-	.add		= cstate_pmu_event_add,
-	.del		= cstate_pmu_event_del,
-	.start		= cstate_pmu_event_start,
-	.stop		= cstate_pmu_event_stop,
-	.read		= cstate_pmu_event_update,
-	.capabilities	= PERF_PMU_CAP_NO_INTERRUPT,
-	.module		= THIS_MODULE,
-};
+		pmus = cstate_pmus[i];
+		/*
+		 * If this is the first online thread of that core, set it in
+		 * the core cpu mask as the designated reader.
+		 */
+		target = cpumask_any_and(&pmus->cpu_mask, cpu_mask);
 
-static struct pmu cstate_pkg_pmu = {
-	.attr_groups	= pkg_attr_groups,
-	.name		= "cstate_pkg",
-	.task_ctx_nr	= perf_invalid_context,
-	.event_init	= cstate_pmu_event_init,
-	.add		= cstate_pmu_event_add,
-	.del		= cstate_pmu_event_del,
-	.start		= cstate_pmu_event_start,
-	.stop		= cstate_pmu_event_stop,
-	.read		= cstate_pmu_event_update,
-	.capabilities	= PERF_PMU_CAP_NO_INTERRUPT,
-	.module		= THIS_MODULE,
-};
+		if (target >= nr_cpu_ids)
+			cpumask_set_cpu(cpu, &pmus->cpu_mask);
+	}
+	return 0;
+}
 
 static const struct cstate_model nhm_cstates __initconst = {
 	.core_events		= BIT(PERF_CSTATE_CORE_C3_RES) |
@@ -592,14 +563,28 @@ MODULE_DEVICE_TABLE(x86cpu, intel_cstates_match);
  * Probe the cstate events and insert the available one into sysfs attrs
  * Return false if there are no available events.
  */
-static bool __init cstate_probe_msr(const unsigned long evmsk, int max,
-                                   struct perf_cstate_msr *msr,
-                                   struct attribute **attrs)
+static bool __init cstate_probe_msr(const unsigned long evmsk,
+				    enum domain_types type)
 {
+	struct perf_cstate_msr *msr;
+	struct attribute **attrs;
+	struct cstate_pmus *pmus;
 	bool found = false;
 	unsigned int bit;
+	int max;
 	u64 val;
 
+	if (type == PACKAGE_DOMAIN) {
+		max = PERF_CSTATE_PKG_EVENT_MAX;
+		msr = pkg_msr;
+		attrs = pkg_events_attrs;
+	} else if (type == CORE_DOMAIN) {
+		max = PERF_CSTATE_CORE_EVENT_MAX;
+		msr = core_msr;
+		attrs = core_events_attrs;
+	} else
+		return false;
+
 	for (bit = 0; bit < max; bit++) {
 		if (test_bit(bit, &evmsk) && !rdmsrl_safe(msr[bit].msr, &val)) {
 			*attrs++ = &msr[bit].attr->attr.attr;
@@ -610,11 +595,32 @@ static bool __init cstate_probe_msr(const unsigned long evmsk, int max,
 	}
 	*attrs = NULL;
 
-	return found;
+	if (!found)
+		return false;
+
+	pmus = kzalloc(sizeof(struct cstate_pmus), GFP_KERNEL);
+	if (!pmus)
+		return false;
+
+	pmus->type.type = type;
+	if (domain_type_init(&pmus->type)) {
+		kfree(pmus);
+		return false;
+	}
+	pmus->event_max = max;
+	pmus->msrs = msr;
+	pmus->attrs = attrs;
+
+	cstate_pmus[type] = pmus;
+
+	return true;
 }
 
 static int __init cstate_probe(const struct cstate_model *cm)
 {
+	bool found = false;
+	enum domain_types i;
+
 	/* SLM has different MSR for PKG C6 */
 	if (cm->quirks & SLM_PKG_C6_USE_C7_MSR)
 		pkg_msr[PERF_CSTATE_PKG_C6_RES].msr = MSR_PKG_C7_RESIDENCY;
@@ -624,58 +630,79 @@ static int __init cstate_probe(const struct cstate_model *cm)
 		pkg_msr[PERF_CSTATE_CORE_C6_RES].msr = MSR_KNL_CORE_C6_RESIDENCY;
 
 
-	has_cstate_core = cstate_probe_msr(cm->core_events,
-					   PERF_CSTATE_CORE_EVENT_MAX,
-					   core_msr, core_events_attrs);
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!cm->events[i])
+			continue;
 
-	has_cstate_pkg = cstate_probe_msr(cm->pkg_events,
-					  PERF_CSTATE_PKG_EVENT_MAX,
-					  pkg_msr, pkg_events_attrs);
-
-	return (has_cstate_core || has_cstate_pkg) ? 0 : -ENODEV;
+		if (cstate_probe_msr(cm->events[i], i))
+			found = true;
+	}
+	return found ? 0 : -ENODEV;
 }
 
 static inline void cstate_cleanup(void)
 {
+	int i;
+
 	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_CSTATE_ONLINE);
 	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_CSTATE_STARTING);
 
-	if (has_cstate_core)
-		perf_pmu_unregister(&cstate_core_pmu);
-
-	if (has_cstate_pkg)
-		perf_pmu_unregister(&cstate_pkg_pmu);
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!cstate_pmus[i])
+			continue;
+		perf_pmu_unregister(&cstate_pmus[i]->pmu);
+		kfree(cstate_pmus[i]);
+		cstate_pmus[i] = NULL;
+	}
 }
 
 static int __init cstate_init(void)
 {
-	int err;
+	struct pmu *pmu;
+	char name[DOMAIN_NAME_LEN];
+	int i, err = 0;
 
 	cpuhp_setup_state(CPUHP_AP_PERF_X86_CSTATE_STARTING,
 			  "perf/x86/cstate:starting", cstate_cpu_init, NULL);
 	cpuhp_setup_state(CPUHP_AP_PERF_X86_CSTATE_ONLINE,
 			  "perf/x86/cstate:online", NULL, cstate_cpu_exit);
 
-	if (has_cstate_core) {
-		err = perf_pmu_register(&cstate_core_pmu, cstate_core_pmu.name, -1);
-		if (err) {
-			has_cstate_core = false;
-			pr_info("Failed to register cstate core pmu\n");
-			cstate_cleanup();
-			return err;
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!cstate_pmus[i])
+			continue;
+		pmu = &cstate_pmus[i]->pmu;
+
+		if (i == PACKAGE_DOMAIN)
+			pmu->attr_groups = pkg_attr_groups;
+		else if (i == CORE_DOMAIN)
+			pmu->attr_groups = core_attr_groups;
+
+		pmu->task_ctx_nr = perf_invalid_context;
+		pmu->event_init = cstate_pmu_event_init;
+		pmu->add = cstate_pmu_event_add;
+		pmu->del = cstate_pmu_event_del;
+		pmu->start = cstate_pmu_event_start;
+		pmu->stop = cstate_pmu_event_stop;
+		pmu->read = cstate_pmu_event_update;
+		pmu->capabilities = PERF_PMU_CAP_NO_INTERRUPT;
+		pmu->module = THIS_MODULE;
+
+		err = snprintf(name, DOMAIN_NAME_LEN, "cstate_%s",
+			       cstate_pmus[i]->type.postfix);
+		if (err < 0) {
+			kfree(cstate_pmus[i]);
+			cstate_pmus[i] = NULL;
+			continue;
 		}
-	}
-
-	if (has_cstate_pkg) {
-		err = perf_pmu_register(&cstate_pkg_pmu, cstate_pkg_pmu.name, -1);
+		err = perf_pmu_register(pmu, name, -1);
 		if (err) {
-			has_cstate_pkg = false;
-			pr_info("Failed to register cstate pkg pmu\n");
-			cstate_cleanup();
-			return err;
+			kfree(cstate_pmus[i]);
+			cstate_pmus[i] = NULL;
+			pr_info("Failed to register %s pmu\n", name);
 		}
 	}
-	return 0;
+
+	return err;
 }
 
 static int __init cstate_pmu_init(void)
-- 
2.7.4



* [PATCH 03/10] perf/x86/intel/uncore: Apply "domain" for uncore
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
  2019-02-19 20:00 ` [PATCH 01/10] perf/x86/intel: Introduce a concept "domain" as the scope of counters kan.liang
  2019-02-19 20:00 ` [PATCH 02/10] perf/x86/intel/cstate: Apply "domain" for cstate kan.liang
@ 2019-02-19 20:00 ` kan.liang
  2019-02-19 20:00 ` [PATCH 04/10] perf/x86/intel/rapl: Apply "domain" for RAPL kan.liang
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The uncore counters are no longer package scope only. For example,
there will be die scope counters on CLX-AP.
Apply "domain" to uncore, and make it easy to extend later.

Add domain_type to intel_uncore_type to indicate the domain type of
the uncore counters. The default is the package scope domain.

Rename pkgid to domain_id for the uncore box. Use the domain ID to
replace the package ID.

Each domain type has its own uncore_cpu_mask. Update
uncore_event_cpu_online/offline to apply the per-domain
uncore_cpu_mask.

Replace max_packages with the number of domains.

If there is more than one domain type, use a new PMU name,
"uncore_$domain_type_$other_postfix".
Use DOMAIN_NAME_LEN to replace UNCORE_PMU_NAME_LEN.
Use the more secure snprintf to replace sprintf.

The uncore_extra_pci_dev is a filter register or capability register,
not an uncore counter. It is not used on Skylake server, and probably
will not be used on future platforms. The patch does not apply the
"domain" concept to uncore_extra_pci_dev, but it can be done later if
needed.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/uncore.c       | 233 ++++++++++++++++++++++++-----------
 arch/x86/events/intel/uncore.h       |   9 +-
 arch/x86/events/intel/uncore_snbep.c |   2 +-
 3 files changed, 164 insertions(+), 80 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 27a4614..f795a73 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -14,10 +14,11 @@ struct pci_driver *uncore_pci_driver;
 DEFINE_RAW_SPINLOCK(pci2phy_map_lock);
 struct list_head pci2phy_map_head = LIST_HEAD_INIT(pci2phy_map_head);
 struct pci_extra_dev *uncore_extra_pci_dev;
-static int max_packages;
 
 /* mask of cpus that collect uncore events */
-static cpumask_t uncore_cpu_mask;
+static cpumask_t uncore_cpu_mask[DOMAIN_TYPE_MAX];
+
+static unsigned int uncore_domain_type_mask;
 
 /* constraint for the fixed counter */
 static struct event_constraint uncore_constraint_fixed =
@@ -100,13 +101,14 @@ ssize_t uncore_event_show(struct kobject *kobj,
 
 struct intel_uncore_box *uncore_pmu_to_box(struct intel_uncore_pmu *pmu, int cpu)
 {
-	unsigned int pkgid = topology_logical_package_id(cpu);
+	unsigned int id = get_domain_id(cpu, &pmu->type->type);
+	int max = pmu->type->type.max_domains;
 
 	/*
 	 * The unsigned check also catches the '-1' return value for non
 	 * existent mappings in the topology map.
 	 */
-	return pkgid < max_packages ? pmu->boxes[pkgid] : NULL;
+	return id < max ? pmu->boxes[id] : NULL;
 }
 
 u64 uncore_msr_read_counter(struct intel_uncore_box *box, struct perf_event *event)
@@ -311,7 +313,7 @@ static struct intel_uncore_box *uncore_alloc_box(struct intel_uncore_type *type,
 	uncore_pmu_init_hrtimer(box);
 	box->cpu = -1;
 	box->pci_phys_id = -1;
-	box->pkgid = -1;
+	box->domain_id = -1;
 
 	/* set default hrtimer timeout */
 	box->hrtimer_duration = UNCORE_PMU_HRTIMER_INTERVAL;
@@ -771,7 +773,15 @@ static int uncore_pmu_event_init(struct perf_event *event)
 static ssize_t uncore_get_attr_cpumask(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
-	return cpumap_print_to_pagebuf(true, buf, &uncore_cpu_mask);
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct intel_uncore_pmu *uncore_pmu;
+
+	uncore_pmu = container_of(pmu, struct intel_uncore_pmu, pmu);
+	if (uncore_pmu && uncore_pmu->type)
+		return cpumap_print_to_pagebuf(true, buf,
+					       &uncore_cpu_mask[uncore_pmu->type->type.type]);
+
+	return 0;
 }
 
 static DEVICE_ATTR(cpumask, S_IRUGO, uncore_get_attr_cpumask, NULL);
@@ -787,6 +797,8 @@ static const struct attribute_group uncore_pmu_attr_group = {
 
 static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
 {
+	size_t len;
+	char *name;
 	int ret;
 
 	if (!pmu->type->pmu) {
@@ -806,15 +818,26 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
 		pmu->pmu.attr_groups = pmu->type->attr_groups;
 	}
 
+	len = DOMAIN_NAME_LEN;
+	name = pmu->name;
+	if (hweight32(uncore_domain_type_mask) > 1)
+		ret = snprintf(name, len, "uncore_%s", pmu->type->type.postfix);
+	else
+		ret = snprintf(name, len, "uncore");
+	if (ret < 0)
+		return ret;
+
+	len -= ret;
+	name += ret;
 	if (pmu->type->num_boxes == 1) {
 		if (strlen(pmu->type->name) > 0)
-			sprintf(pmu->name, "uncore_%s", pmu->type->name);
-		else
-			sprintf(pmu->name, "uncore");
+			ret = snprintf(name, len, "_%s", pmu->type->name);
 	} else {
-		sprintf(pmu->name, "uncore_%s_%d", pmu->type->name,
-			pmu->pmu_idx);
+		ret = snprintf(name, len, "_%s_%d", pmu->type->name,
+			       pmu->pmu_idx);
 	}
+	if (ret < 0)
+		return ret;
 
 	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
 	if (!ret)
@@ -832,10 +855,10 @@ static void uncore_pmu_unregister(struct intel_uncore_pmu *pmu)
 
 static void uncore_free_boxes(struct intel_uncore_pmu *pmu)
 {
-	int pkg;
+	int i, nr = pmu->type->type.max_domains;
 
-	for (pkg = 0; pkg < max_packages; pkg++)
-		kfree(pmu->boxes[pkg]);
+	for (i = 0; i < nr; i++)
+		kfree(pmu->boxes[i]);
 	kfree(pmu->boxes);
 }
 
@@ -866,13 +889,21 @@ static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
 {
 	struct intel_uncore_pmu *pmus;
 	size_t size;
-	int i, j;
+	int i, j, nr;
 
 	pmus = kcalloc(type->num_boxes, sizeof(*pmus), GFP_KERNEL);
 	if (!pmus)
 		return -ENOMEM;
 
-	size = max_packages * sizeof(struct intel_uncore_box *);
+	if (domain_type_init(&type->type)) {
+		kfree(pmus);
+		return -ENOMEM;
+	}
+	nr = type->type.max_domains;
+	if (nr < 0)
+		return -EINVAL;
+
+	size = nr * sizeof(struct intel_uncore_box *);
 
 	for (i = 0; i < type->num_boxes; i++) {
 		pmus[i].func_id	= setid ? i : -1;
@@ -911,6 +942,8 @@ static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
 
 	type->pmu_group = &uncore_pmu_attr_group;
 
+	uncore_domain_type_mask |= (1 << type->type.type);
+
 	return 0;
 
 err:
@@ -942,25 +975,28 @@ static int uncore_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id
 	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu = NULL;
 	struct intel_uncore_box *box;
-	int phys_id, pkg, ret;
+	int phys_id, pkg, domain, ret;
 
 	phys_id = uncore_pcibus_to_physid(pdev->bus);
 	if (phys_id < 0)
 		return -ENODEV;
 
-	pkg = topology_phys_to_logical_pkg(phys_id);
-	if (pkg < 0)
-		return -EINVAL;
-
 	if (UNCORE_PCI_DEV_TYPE(id->driver_data) == UNCORE_EXTRA_PCI_DEV) {
 		int idx = UNCORE_PCI_DEV_IDX(id->driver_data);
 
+		pkg = topology_phys_to_logical_pkg(phys_id);
+		if (pkg < 0)
+			return -EINVAL;
+
 		uncore_extra_pci_dev[pkg].dev[idx] = pdev;
 		pci_set_drvdata(pdev, NULL);
 		return 0;
 	}
 
 	type = uncore_pci_uncores[UNCORE_PCI_DEV_TYPE(id->driver_data)];
+	domain = get_domain_id_from_group_id(phys_id, &type->type);
+	if (domain < 0)
+		return -EINVAL;
 
 	/*
 	 * Some platforms, e.g.  Knights Landing, use a common PCI device ID
@@ -994,7 +1030,7 @@ static int uncore_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id
 		pmu = &type->pmus[UNCORE_PCI_DEV_IDX(id->driver_data)];
 	}
 
-	if (WARN_ON_ONCE(pmu->boxes[pkg] != NULL))
+	if (WARN_ON_ONCE(pmu->boxes[domain] != NULL))
 		return -EINVAL;
 
 	box = uncore_alloc_box(type, NUMA_NO_NODE);
@@ -1008,13 +1044,13 @@ static int uncore_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id
 
 	atomic_inc(&box->refcnt);
 	box->pci_phys_id = phys_id;
-	box->pkgid = pkg;
+	box->domain_id = domain;
 	box->pci_dev = pdev;
 	box->pmu = pmu;
 	uncore_box_init(box);
 	pci_set_drvdata(pdev, box);
 
-	pmu->boxes[pkg] = box;
+	pmu->boxes[domain] = box;
 	if (atomic_inc_return(&pmu->activeboxes) > 1)
 		return 0;
 
@@ -1022,7 +1058,7 @@ static int uncore_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id
 	ret = uncore_pmu_register(pmu);
 	if (ret) {
 		pci_set_drvdata(pdev, NULL);
-		pmu->boxes[pkg] = NULL;
+		pmu->boxes[domain] = NULL;
 		uncore_box_exit(box);
 		kfree(box);
 	}
@@ -1055,7 +1091,7 @@ static void uncore_pci_remove(struct pci_dev *pdev)
 		return;
 
 	pci_set_drvdata(pdev, NULL);
-	pmu->boxes[box->pkgid] = NULL;
+	pmu->boxes[box->domain_id] = NULL;
 	if (atomic_dec_return(&pmu->activeboxes) == 0)
 		uncore_pmu_unregister(pmu);
 	uncore_box_exit(box);
@@ -1067,7 +1103,7 @@ static int __init uncore_pci_init(void)
 	size_t size;
 	int ret;
 
-	size = max_packages * sizeof(struct pci_extra_dev);
+	size = topology_max_packages() * sizeof(struct pci_extra_dev);
 	uncore_extra_pci_dev = kzalloc(size, GFP_KERNEL);
 	if (!uncore_extra_pci_dev) {
 		ret = -ENOMEM;
@@ -1114,11 +1150,11 @@ static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
 {
 	struct intel_uncore_pmu *pmu = type->pmus;
 	struct intel_uncore_box *box;
-	int i, pkg;
+	int i, id;
 
-	pkg = topology_logical_package_id(old_cpu < 0 ? new_cpu : old_cpu);
+	id = get_domain_id(old_cpu < 0 ? new_cpu : old_cpu, &type->type);
 	for (i = 0; i < type->num_boxes; i++, pmu++) {
-		box = pmu->boxes[pkg];
+		box = pmu->boxes[id];
 		if (!box)
 			continue;
 
@@ -1139,11 +1175,37 @@ static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
 	}
 }
 
-static void uncore_change_context(struct intel_uncore_type **uncores,
-				  int old_cpu, int new_cpu)
+static void uncore_change_context_offline(struct intel_uncore_type **uncores,
+					  int cpu, int *target)
 {
-	for (; *uncores; uncores++)
-		uncore_change_type_ctx(*uncores, old_cpu, new_cpu);
+	const struct cpumask *cpu_mask;
+	struct intel_uncore_type *type;
+	enum domain_types id;
+
+	for (; *uncores; uncores++) {
+		type = *uncores;
+		id = type->type.type;
+
+		if (target[id] == nr_cpu_ids) {
+
+			/* Check if exiting cpu is used for collecting uncore events */
+			if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask[id]))
+				continue;
+
+			cpu_mask = get_domain_cpu_mask(cpu, &type->type);
+			if (!cpu_mask)
+				continue;
+			/* Find a new cpu to collect uncore events */
+			target[id] = cpumask_any_but(cpu_mask, cpu);
+
+			/* Migrate uncore events to the new target */
+			if (target[id] < nr_cpu_ids)
+				cpumask_set_cpu(target[id], &uncore_cpu_mask[id]);
+			else
+				target[id] = -1;
+		}
+		uncore_change_type_ctx(type, cpu, target[id]);
+	}
 }
 
 static int uncore_event_cpu_offline(unsigned int cpu)
@@ -1151,31 +1213,19 @@ static int uncore_event_cpu_offline(unsigned int cpu)
 	struct intel_uncore_type *type, **types = uncore_msr_uncores;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	int i, pkg, target;
+	int i, id;
+	int target[DOMAIN_TYPE_MAX] = { [0 ... DOMAIN_TYPE_MAX - 1] = nr_cpu_ids };
 
-	/* Check if exiting cpu is used for collecting uncore events */
-	if (!cpumask_test_and_clear_cpu(cpu, &uncore_cpu_mask))
-		goto unref;
-	/* Find a new cpu to collect uncore events */
-	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+	uncore_change_context_offline(uncore_msr_uncores, cpu, target);
+	uncore_change_context_offline(uncore_pci_uncores, cpu, target);
 
-	/* Migrate uncore events to the new target */
-	if (target < nr_cpu_ids)
-		cpumask_set_cpu(target, &uncore_cpu_mask);
-	else
-		target = -1;
-
-	uncore_change_context(uncore_msr_uncores, cpu, target);
-	uncore_change_context(uncore_pci_uncores, cpu, target);
-
-unref:
 	/* Clear the references */
-	pkg = topology_logical_package_id(cpu);
 	for (; *types; types++) {
 		type = *types;
 		pmu = type->pmus;
+		id = get_domain_id(cpu, &type->type);
 		for (i = 0; i < type->num_boxes; i++, pmu++) {
-			box = pmu->boxes[pkg];
+			box = pmu->boxes[id];
 			if (box && atomic_dec_return(&box->refcnt) == 0)
 				uncore_box_exit(box);
 		}
@@ -1183,34 +1233,78 @@ static int uncore_event_cpu_offline(unsigned int cpu)
 	return 0;
 }
 
+static void uncore_change_context_online(struct intel_uncore_type **uncores,
+					 int cpu, int *target)
+{
+	const struct cpumask *cpu_mask;
+	struct intel_uncore_type *type;
+	enum domain_types id;
+
+	for (; *uncores; uncores++) {
+		type = *uncores;
+		id = type->type.type;
+
+		/*
+		 * Check if there is an online cpu in the domain
+		 * which already collects uncore events.
+		 * If yes, set target[id] to -1 so other uncore types
+		 * of the same domain type do not re-check.
+		 * If no, set target[id] to cpu and update the cpu mask.
+		 */
+		if (target[id] == nr_cpu_ids) {
+			cpu_mask = get_domain_cpu_mask(cpu, &type->type);
+			if (!cpu_mask)
+				continue;
+
+			target[id] = cpumask_any_and(&uncore_cpu_mask[id], cpu_mask);
+			if (target[id] < nr_cpu_ids) {
+				target[id] = -1;
+				continue;
+			}
+			target[id] = cpu;
+			cpumask_set_cpu(cpu, &uncore_cpu_mask[id]);
+		}
+
+		/*
+		 * There is an online cpu which collects
+		 * uncore events for the domain already.
+		 */
+		if (target[id] == -1)
+			continue;
+
+		uncore_change_type_ctx(type, -1, cpu);
+	}
+}
+
 static int allocate_boxes(struct intel_uncore_type **types,
-			 unsigned int pkg, unsigned int cpu)
+			  unsigned int cpu)
 {
 	struct intel_uncore_box *box, *tmp;
 	struct intel_uncore_type *type;
 	struct intel_uncore_pmu *pmu;
 	LIST_HEAD(allocated);
-	int i;
+	int i, id;
 
 	/* Try to allocate all required boxes */
 	for (; *types; types++) {
 		type = *types;
 		pmu = type->pmus;
+		id = get_domain_id(cpu, &type->type);
 		for (i = 0; i < type->num_boxes; i++, pmu++) {
-			if (pmu->boxes[pkg])
+			if (pmu->boxes[id])
 				continue;
 			box = uncore_alloc_box(type, cpu_to_node(cpu));
 			if (!box)
 				goto cleanup;
 			box->pmu = pmu;
-			box->pkgid = pkg;
+			box->domain_id = id;
 			list_add(&box->active_list, &allocated);
 		}
 	}
 	/* Install them in the pmus */
 	list_for_each_entry_safe(box, tmp, &allocated, active_list) {
 		list_del_init(&box->active_list);
-		box->pmu->boxes[pkg] = box;
+		box->pmu->boxes[box->domain_id] = box;
 	}
 	return 0;
 
@@ -1227,35 +1321,26 @@ static int uncore_event_cpu_online(unsigned int cpu)
 	struct intel_uncore_type *type, **types = uncore_msr_uncores;
 	struct intel_uncore_pmu *pmu;
 	struct intel_uncore_box *box;
-	int i, ret, pkg, target;
+	int i, ret, id;
+	int target[DOMAIN_TYPE_MAX] = { [0 ... DOMAIN_TYPE_MAX - 1] = nr_cpu_ids };
 
-	pkg = topology_logical_package_id(cpu);
-	ret = allocate_boxes(types, pkg, cpu);
+	ret = allocate_boxes(types, cpu);
 	if (ret)
 		return ret;
 
 	for (; *types; types++) {
 		type = *types;
 		pmu = type->pmus;
+		id = get_domain_id(cpu, &type->type);
 		for (i = 0; i < type->num_boxes; i++, pmu++) {
-			box = pmu->boxes[pkg];
+			box = pmu->boxes[id];
 			if (box && atomic_inc_return(&box->refcnt) == 1)
 				uncore_box_init(box);
 		}
 	}
 
-	/*
-	 * Check if there is an online cpu in the package
-	 * which collects uncore events already.
-	 */
-	target = cpumask_any_and(&uncore_cpu_mask, topology_core_cpumask(cpu));
-	if (target < nr_cpu_ids)
-		return 0;
-
-	cpumask_set_cpu(cpu, &uncore_cpu_mask);
-
-	uncore_change_context(uncore_msr_uncores, -1, cpu);
-	uncore_change_context(uncore_pci_uncores, -1, cpu);
+	uncore_change_context_online(uncore_msr_uncores, cpu, target);
+	uncore_change_context_online(uncore_pci_uncores, cpu, target);
 	return 0;
 }
 
@@ -1417,8 +1502,6 @@ static int __init intel_uncore_init(void)
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
 		return -ENODEV;
 
-	max_packages = topology_max_packages();
-
 	uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
 	if (uncore_init->pci_init) {
 		pret = uncore_init->pci_init();
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index cb46d60..3c06e1b 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -5,8 +5,8 @@
 
 #include <linux/perf_event.h>
 #include "../perf_event.h"
+#include "../domain.h"
 
-#define UNCORE_PMU_NAME_LEN		32
 #define UNCORE_PMU_HRTIMER_INTERVAL	(60LL * NSEC_PER_SEC)
 #define UNCORE_SNB_IMC_HRTIMER_INTERVAL (5ULL * NSEC_PER_SEC)
 
@@ -44,6 +44,7 @@ struct freerunning_counters;
 
 struct intel_uncore_type {
 	const char *name;
+	struct domain_type type;
 	int num_counters;
 	int num_boxes;
 	int perf_ctr_bits;
@@ -91,7 +92,7 @@ struct intel_uncore_ops {
 
 struct intel_uncore_pmu {
 	struct pmu			pmu;
-	char				name[UNCORE_PMU_NAME_LEN];
+	char				name[DOMAIN_NAME_LEN];
 	int				pmu_idx;
 	int				func_id;
 	bool				registered;
@@ -108,7 +109,7 @@ struct intel_uncore_extra_reg {
 
 struct intel_uncore_box {
 	int pci_phys_id;
-	int pkgid;	/* Logical package ID */
+	int domain_id;
 	int n_active;	/* number of active events */
 	int n_events;
 	int cpu;	/* cpu to collect events */
@@ -467,7 +468,7 @@ static inline void uncore_box_exit(struct intel_uncore_box *box)
 
 static inline bool uncore_box_is_fake(struct intel_uncore_box *box)
 {
-	return (box->pkgid < 0);
+	return (box->domain_id < 0);
 }
 
 static inline struct intel_uncore_pmu *uncore_event_to_pmu(struct perf_event *event)
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index b10e043..ba416b8 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -1058,7 +1058,7 @@ static void snbep_qpi_enable_event(struct intel_uncore_box *box, struct perf_eve
 
 	if (reg1->idx != EXTRA_REG_NONE) {
 		int idx = box->pmu->pmu_idx + SNBEP_PCI_QPI_PORT0_FILTER;
-		int pkg = box->pkgid;
+		int pkg = box->domain_id;
 		struct pci_dev *filter_pdev = uncore_extra_pci_dev[pkg].dev[idx];
 
 		if (filter_pdev) {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 04/10] perf/x86/intel/rapl: Apply "domain" for RAPL
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
                   ` (2 preceding siblings ...)
  2019-02-19 20:00 ` [PATCH 03/10] perf/x86/intel/uncore: Apply "domain" for uncore kan.liang
@ 2019-02-19 20:00 ` kan.liang
  2019-02-19 20:00 ` [PATCH 05/10] perf/x86/intel/domain: Add new domain type for die kan.liang
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The RAPL counters are no longer strictly package scope. For example,
CLX-AP has die scope RAPL counters.
Apply the "domain" abstraction to RAPL, making it easy to extend later.

Each domain type needs a dedicated rapl_pmus. The struct rapl_pmus is
modified accordingly:
 - The fixed counters may differ between domain types.
   Move rapl_cntr_mask into struct rapl_pmus.
 - The CPU mask may differ between domain types as well.
   Move rapl_cpu_mask into struct rapl_pmus, and update
   rapl_cpu_online/offline accordingly.
 - Replace maxpkg with the number of domains.

Rename rapl_pmu_events_group to rapl_pkg_pmu_events_group for domains of
PACKAGE_DOMAIN type.

Add the PMU name to the rapl_advertise() output to distinguish between
different domain types.

Extend intel_rapl_init_fun to support events from different domain
types.

If more than one domain type is present on a machine, use the new PMU
naming scheme "power_$domain_type"; otherwise, keep using "power".
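
The naming rule above (plain "power" when only one domain type is
registered, "power_<postfix>" when several coexist) can be sketched in
userspace C. DOMAIN_NAME_LEN and the "die" postfix mirror the patch;
the helper names here are illustrative only, not the kernel functions:

```c
#include <stdio.h>
#include <string.h>

#define DOMAIN_NAME_LEN 32

/* Count set bits, like the kernel's hweight32() */
static int hweight32_sketch(unsigned int mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Build the PMU name: "power" if only one domain type is in use,
 * "power_<postfix>" (e.g. "power_die") when several types coexist.
 */
static void rapl_pmu_name_sketch(char *buf, unsigned int type_mask,
				 const char *postfix)
{
	if (hweight32_sketch(type_mask) > 1)
		snprintf(buf, DOMAIN_NAME_LEN, "power_%s", postfix);
	else
		snprintf(buf, DOMAIN_NAME_LEN, "power");
}
```

The same pattern is used by the uncore patch for "uncore" vs
"uncore_<postfix>".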

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/rapl.c | 306 +++++++++++++++++++++++++++++++------------
 1 file changed, 224 insertions(+), 82 deletions(-)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 91039ff..c1ba09c 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -57,6 +57,7 @@
 #include <asm/cpu_device_id.h>
 #include <asm/intel-family.h>
 #include "../perf_event.h"
+#include "../domain.h"
 
 MODULE_LICENSE("GPL");
 
@@ -148,26 +149,31 @@ struct rapl_pmu {
 
 struct rapl_pmus {
 	struct pmu		pmu;
-	unsigned int		maxpkg;
+	struct domain_type	type;
+	unsigned int		rapl_cntr_mask;
+	cpumask_t		rapl_cpu_mask;
 	struct rapl_pmu		*pmus[];
 };
 
  /* 1/2^hw_unit Joule */
 static int rapl_hw_unit[NR_RAPL_DOMAINS] __read_mostly;
-static struct rapl_pmus *rapl_pmus;
-static cpumask_t rapl_cpu_mask;
-static unsigned int rapl_cntr_mask;
+static struct rapl_pmus *rapl_pmus[DOMAIN_TYPE_MAX];
 static u64 rapl_timer_ms;
+static unsigned int rapl_domain_type_mask;
 
-static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
+static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu,
+					       struct rapl_pmus *pmus)
 {
-	unsigned int pkgid = topology_logical_package_id(cpu);
+	unsigned int id;
+
+	if (!pmus)
+		return NULL;
+
+	id = get_domain_id(cpu, &pmus->type);
 
 	/*
 	 * The unsigned check also catches the '-1' return value for non
 	 * existent mappings in the topology map.
 	 */
-	return pkgid < rapl_pmus->maxpkg ? rapl_pmus->pmus[pkgid] : NULL;
+	return id < pmus->type.max_domains ? pmus->pmus[id] : NULL;
 }
 
 static inline u64 rapl_read_counter(struct perf_event *event)
@@ -350,10 +356,15 @@ static int rapl_pmu_event_init(struct perf_event *event)
 {
 	u64 cfg = event->attr.config & RAPL_EVENT_MASK;
 	int bit, msr, ret = 0;
+	struct rapl_pmus *pmus;
 	struct rapl_pmu *pmu;
 
+	pmus = container_of(event->pmu, struct rapl_pmus, pmu);
+	if (!pmus)
+		return -ENOENT;
+
 	/* only look at RAPL events */
-	if (event->attr.type != rapl_pmus->pmu.type)
+	if (event->attr.type != pmus->pmu.type)
 		return -ENOENT;
 
 	/* check only supported bits are set */
@@ -393,7 +404,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
 		return -EINVAL;
 	}
 	/* check event supported */
-	if (!(rapl_cntr_mask & (1 << bit)))
+	if (!(pmus->rapl_cntr_mask & (1 << bit)))
 		return -EINVAL;
 
 	/* unsupported modes and filters */
@@ -407,7 +418,7 @@ static int rapl_pmu_event_init(struct perf_event *event)
 		return -EINVAL;
 
 	/* must be done before validate_group */
-	pmu = cpu_to_rapl_pmu(event->cpu);
+	pmu = cpu_to_rapl_pmu(event->cpu, pmus);
 	if (!pmu)
 		return -EINVAL;
 	event->cpu = pmu->cpu;
@@ -425,9 +436,21 @@ static void rapl_pmu_event_read(struct perf_event *event)
 }
 
 static ssize_t rapl_get_attr_cpumask(struct device *dev,
-				struct device_attribute *attr, char *buf)
+					 struct device_attribute *attr,
+					 char *buf)
 {
-	return cpumap_print_to_pagebuf(true, buf, &rapl_cpu_mask);
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct rapl_pmus *pmus;
+	int i;
+
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		pmus = rapl_pmus[i];
+		if (!pmus || &pmus->pmu != pmu)
+			continue;
+
+		return cpumap_print_to_pagebuf(true, buf, &pmus->rapl_cpu_mask);
+	}
+	return 0;
 }
 
 static DEVICE_ATTR(cpumask, S_IRUGO, rapl_get_attr_cpumask, NULL);
@@ -543,7 +566,7 @@ static struct attribute *rapl_events_knl_attr[] = {
 	NULL,
 };
 
-static struct attribute_group rapl_pmu_events_group = {
+static struct attribute_group rapl_pkg_pmu_events_group = {
 	.name = "events",
 	.attrs = NULL, /* patched at runtime */
 };
@@ -559,39 +582,63 @@ static struct attribute_group rapl_pmu_format_group = {
 	.attrs = rapl_formats_attr,
 };
 
-static const struct attribute_group *rapl_attr_groups[] = {
+static const struct attribute_group *rapl_pkg_attr_groups[] = {
 	&rapl_pmu_attr_group,
 	&rapl_pmu_format_group,
-	&rapl_pmu_events_group,
+	&rapl_pkg_pmu_events_group,
 	NULL,
 };
 
-static int rapl_cpu_offline(unsigned int cpu)
+static int __rapl_cpu_offline(unsigned int cpu, struct rapl_pmus *pmus)
 {
-	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
+	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu, pmus);
+	const struct cpumask *cpu_mask;
 	int target;
 
+	if (!pmus)
+		return -1;
+
 	/* Check if exiting cpu is used for collecting rapl events */
-	if (!cpumask_test_and_clear_cpu(cpu, &rapl_cpu_mask))
+	if (!cpumask_test_and_clear_cpu(cpu, &pmus->rapl_cpu_mask))
 		return 0;
 
 	pmu->cpu = -1;
 	/* Find a new cpu to collect rapl events */
-	target = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+	cpu_mask = get_domain_cpu_mask(cpu, &pmus->type);
+	if (!cpu_mask)
+		return -1;
+	target = cpumask_any_but(cpu_mask, cpu);
 
 	/* Migrate rapl events to the new target */
 	if (target < nr_cpu_ids) {
-		cpumask_set_cpu(target, &rapl_cpu_mask);
+		cpumask_set_cpu(target, &pmus->rapl_cpu_mask);
 		pmu->cpu = target;
 		perf_pmu_migrate_context(pmu->pmu, cpu, target);
 	}
 	return 0;
 }
 
-static int rapl_cpu_online(unsigned int cpu)
+static int rapl_cpu_offline(unsigned int cpu)
 {
-	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
-	int target;
+	int i;
+
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!rapl_pmus[i])
+			continue;
+
+		__rapl_cpu_offline(cpu, rapl_pmus[i]);
+	}
+	return 0;
+}
+
+static int __rapl_cpu_online(unsigned int cpu, struct rapl_pmus *pmus)
+{
+	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu, pmus);
+	const struct cpumask *cpu_mask;
+	int target, id;
+
+	if (!pmus)
+		return -EINVAL;
 
 	if (!pmu) {
 		pmu = kzalloc_node(sizeof(*pmu), GFP_KERNEL, cpu_to_node(cpu));
@@ -600,26 +647,47 @@ static int rapl_cpu_online(unsigned int cpu)
 
 		raw_spin_lock_init(&pmu->lock);
 		INIT_LIST_HEAD(&pmu->active_list);
-		pmu->pmu = &rapl_pmus->pmu;
+		pmu->pmu = &pmus->pmu;
 		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
 		rapl_hrtimer_init(pmu);
 
-		rapl_pmus->pmus[topology_logical_package_id(cpu)] = pmu;
+		id = get_domain_id(cpu, &pmus->type);
+		if (id < 0) {
+			kfree(pmu);
+			return -EINVAL;
+		}
+		pmus->pmus[id] = pmu;
 	}
 
 	/*
 	 * Check if there is an online cpu in the package which collects rapl
 	 * events already.
 	 */
-	target = cpumask_any_and(&rapl_cpu_mask, topology_core_cpumask(cpu));
+	cpu_mask = get_domain_cpu_mask(cpu, &pmus->type);
+	if (!cpu_mask)
+		return -1;
+	target = cpumask_any_and(&pmus->rapl_cpu_mask, cpu_mask);
 	if (target < nr_cpu_ids)
 		return 0;
 
-	cpumask_set_cpu(cpu, &rapl_cpu_mask);
+	cpumask_set_cpu(cpu, &pmus->rapl_cpu_mask);
 	pmu->cpu = cpu;
 	return 0;
 }
 
+static int rapl_cpu_online(unsigned int cpu)
+{
+	int i;
+
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!rapl_pmus[i])
+			continue;
+
+		__rapl_cpu_online(cpu, rapl_pmus[i]);
+	}
+	return 0;
+}
+
 static int rapl_check_hw_unit(bool apply_quirk)
 {
 	u64 msr_rapl_power_unit_bits;
@@ -657,94 +725,163 @@ static int rapl_check_hw_unit(bool apply_quirk)
 
 static void __init rapl_advertise(void)
 {
-	int i;
-
-	pr_info("API unit is 2^-32 Joules, %d fixed counters, %llu ms ovfl timer\n",
-		hweight32(rapl_cntr_mask), rapl_timer_ms);
-
-	for (i = 0; i < NR_RAPL_DOMAINS; i++) {
-		if (rapl_cntr_mask & (1 << i)) {
-			pr_info("hw unit of domain %s 2^-%d Joules\n",
-				rapl_domain_names[i], rapl_hw_unit[i]);
+	int i, j;
+
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!rapl_pmus[i])
+			continue;
+
+		pr_info("%s: API unit is 2^-32 Joules, "
+			"%d fixed counters, %llu ms ovfl timer\n",
+			rapl_pmus[i]->pmu.name,
+			hweight32(rapl_pmus[i]->rapl_cntr_mask),
+			rapl_timer_ms);
+
+		for (j = 0; j < NR_RAPL_DOMAINS; j++) {
+			if (rapl_pmus[i]->rapl_cntr_mask & (1 << j)) {
+				pr_info("hw unit of domain %s 2^-%d Joules\n",
+					rapl_domain_names[j], rapl_hw_unit[j]);
+			}
 		}
 	}
 }
 
 static void cleanup_rapl_pmus(void)
 {
-	int i;
-
-	for (i = 0; i < rapl_pmus->maxpkg; i++)
-		kfree(rapl_pmus->pmus[i]);
-	kfree(rapl_pmus);
+	int i, j;
+
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!rapl_pmus[i])
+			continue;
+		for (j = 0; j < rapl_pmus[i]->type.max_domains; j++)
+			kfree(rapl_pmus[i]->pmus[j]);
+		kfree(rapl_pmus[i]);
+		rapl_pmus[i] = NULL;
+	}
 }
 
-static int __init init_rapl_pmus(void)
+struct intel_rapl_events {
+	int cntr_mask;
+	struct attribute **attrs;
+};
+
+struct intel_rapl_init_fun {
+	bool apply_quirk;
+	const struct intel_rapl_events events[DOMAIN_TYPE_MAX];
+};
+
+static int __init init_rapl_pmus(const struct intel_rapl_events *events,
+				 enum domain_types type)
 {
-	int maxpkg = topology_max_packages();
+	struct domain_type domain_type;
+	struct rapl_pmus *pmus;
 	size_t size;
 
-	size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
-	rapl_pmus = kzalloc(size, GFP_KERNEL);
-	if (!rapl_pmus)
+	domain_type.type = type;
+	if (domain_type_init(&domain_type))
+		return -ENODEV;
+
+	size = sizeof(struct rapl_pmus) + domain_type.max_domains * sizeof(struct rapl_pmu *);
+	pmus = kzalloc(size, GFP_KERNEL);
+	if (!pmus)
 		return -ENOMEM;
 
-	rapl_pmus->maxpkg		= maxpkg;
-	rapl_pmus->pmu.attr_groups	= rapl_attr_groups;
-	rapl_pmus->pmu.task_ctx_nr	= perf_invalid_context;
-	rapl_pmus->pmu.event_init	= rapl_pmu_event_init;
-	rapl_pmus->pmu.add		= rapl_pmu_event_add;
-	rapl_pmus->pmu.del		= rapl_pmu_event_del;
-	rapl_pmus->pmu.start		= rapl_pmu_event_start;
-	rapl_pmus->pmu.stop		= rapl_pmu_event_stop;
-	rapl_pmus->pmu.read		= rapl_pmu_event_read;
-	rapl_pmus->pmu.module		= THIS_MODULE;
+	memcpy(&pmus->type, &domain_type, sizeof(struct domain_type));
+	pmus->rapl_cntr_mask = events->cntr_mask;
+	if (type == PACKAGE_DOMAIN) {
+		rapl_pkg_pmu_events_group.attrs = events->attrs;
+		pmus->pmu.attr_groups = rapl_pkg_attr_groups;
+	}
+	pmus->pmu.task_ctx_nr = perf_invalid_context;
+	pmus->pmu.event_init = rapl_pmu_event_init;
+	pmus->pmu.add = rapl_pmu_event_add;
+	pmus->pmu.del = rapl_pmu_event_del;
+	pmus->pmu.start = rapl_pmu_event_start;
+	pmus->pmu.stop = rapl_pmu_event_stop;
+	pmus->pmu.read = rapl_pmu_event_read;
+	pmus->pmu.module = THIS_MODULE;
+
+	rapl_pmus[type] = pmus;
 	return 0;
 }
 
+static int __init rapl_pmus_register(void)
+{
+	bool registered = false;
+	char name[DOMAIN_NAME_LEN];
+	int i, ret;
+
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!rapl_pmus[i])
+			continue;
+
+		if (hweight32(rapl_domain_type_mask) > 1)
+			ret = snprintf(name, DOMAIN_NAME_LEN, "power_%s",
+				       rapl_pmus[i]->type.postfix);
+		else
+			ret = snprintf(name, DOMAIN_NAME_LEN, "power");
+		if (ret < 0)
+			continue;
+		ret = perf_pmu_register(&rapl_pmus[i]->pmu, name, -1);
+		if (ret) {
+			kfree(rapl_pmus[i]);
+			rapl_pmus[i] = NULL;
+			continue;
+		}
+		registered = true;
+	}
+
+	return registered ? 0 : -1;
+}
+
+static void rapl_pmus_unregister(void)
+{
+	int i;
+
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!rapl_pmus[i])
+			continue;
+		perf_pmu_unregister(&rapl_pmus[i]->pmu);
+	}
+}
+
 #define X86_RAPL_MODEL_MATCH(model, init)	\
 	{ X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY, (unsigned long)&init }
 
-struct intel_rapl_init_fun {
-	bool apply_quirk;
-	int cntr_mask;
-	struct attribute **attrs;
-};
-
 static const struct intel_rapl_init_fun snb_rapl_init __initconst = {
 	.apply_quirk = false,
-	.cntr_mask = RAPL_IDX_CLN,
-	.attrs = rapl_events_cln_attr,
+	.events[PACKAGE_DOMAIN].cntr_mask = RAPL_IDX_CLN,
+	.events[PACKAGE_DOMAIN].attrs = rapl_events_cln_attr,
 };
 
 static const struct intel_rapl_init_fun hsx_rapl_init __initconst = {
 	.apply_quirk = true,
-	.cntr_mask = RAPL_IDX_SRV,
-	.attrs = rapl_events_srv_attr,
+	.events[PACKAGE_DOMAIN].cntr_mask = RAPL_IDX_SRV,
+	.events[PACKAGE_DOMAIN].attrs = rapl_events_srv_attr,
 };
 
 static const struct intel_rapl_init_fun hsw_rapl_init __initconst = {
 	.apply_quirk = false,
-	.cntr_mask = RAPL_IDX_HSW,
-	.attrs = rapl_events_hsw_attr,
+	.events[PACKAGE_DOMAIN].cntr_mask = RAPL_IDX_HSW,
+	.events[PACKAGE_DOMAIN].attrs = rapl_events_hsw_attr,
 };
 
 static const struct intel_rapl_init_fun snbep_rapl_init __initconst = {
 	.apply_quirk = false,
-	.cntr_mask = RAPL_IDX_SRV,
-	.attrs = rapl_events_srv_attr,
+	.events[PACKAGE_DOMAIN].cntr_mask = RAPL_IDX_SRV,
+	.events[PACKAGE_DOMAIN].attrs = rapl_events_srv_attr,
 };
 
 static const struct intel_rapl_init_fun knl_rapl_init __initconst = {
 	.apply_quirk = true,
-	.cntr_mask = RAPL_IDX_KNL,
-	.attrs = rapl_events_knl_attr,
+	.events[PACKAGE_DOMAIN].cntr_mask = RAPL_IDX_KNL,
+	.events[PACKAGE_DOMAIN].attrs = rapl_events_knl_attr,
 };
 
 static const struct intel_rapl_init_fun skl_rapl_init __initconst = {
 	.apply_quirk = false,
-	.cntr_mask = RAPL_IDX_SKL_CLN,
-	.attrs = rapl_events_skl_attr,
+	.events[PACKAGE_DOMAIN].cntr_mask = RAPL_IDX_SKL_CLN,
+	.events[PACKAGE_DOMAIN].attrs = rapl_events_skl_attr,
 };
 
 static const struct x86_cpu_id rapl_cpu_match[] __initconst = {
@@ -790,7 +927,7 @@ static int __init rapl_pmu_init(void)
 	const struct x86_cpu_id *id;
 	struct intel_rapl_init_fun *rapl_init;
 	bool apply_quirk;
-	int ret;
+	int i, ret;
 
 	id = x86_match_cpu(rapl_cpu_match);
 	if (!id)
@@ -798,16 +935,21 @@ static int __init rapl_pmu_init(void)
 
 	rapl_init = (struct intel_rapl_init_fun *)id->driver_data;
 	apply_quirk = rapl_init->apply_quirk;
-	rapl_cntr_mask = rapl_init->cntr_mask;
-	rapl_pmu_events_group.attrs = rapl_init->attrs;
 
 	ret = rapl_check_hw_unit(apply_quirk);
 	if (ret)
 		return ret;
 
-	ret = init_rapl_pmus();
-	if (ret)
-		return ret;
+	for (i = 0; i < DOMAIN_TYPE_MAX; i++) {
+		if (!rapl_init->events[i].cntr_mask)
+			continue;
+		ret = init_rapl_pmus(&rapl_init->events[i], i);
+		if (ret)
+			continue;
+		rapl_domain_type_mask |= (1 << i);
+	}
+	if (hweight32(rapl_domain_type_mask) == 0)
+		return -ENODEV;
 
 	/*
 	 * Install callbacks. Core will call them for each online cpu.
@@ -818,7 +960,7 @@ static int __init rapl_pmu_init(void)
 	if (ret)
 		goto out;
 
-	ret = perf_pmu_register(&rapl_pmus->pmu, "power", -1);
+	ret = rapl_pmus_register();
 	if (ret)
 		goto out1;
 
@@ -837,7 +979,7 @@ module_init(rapl_pmu_init);
 static void __exit intel_rapl_exit(void)
 {
 	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
-	perf_pmu_unregister(&rapl_pmus->pmu);
+	rapl_pmus_unregister();
 	cleanup_rapl_pmus();
 }
 module_exit(intel_rapl_exit);
-- 
2.7.4



* [PATCH 05/10] perf/x86/intel/domain: Add new domain type for die
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
                   ` (3 preceding siblings ...)
  2019-02-19 20:00 ` [PATCH 04/10] perf/x86/intel/rapl: Apply "domain" for RAPL kan.liang
@ 2019-02-19 20:00 ` kan.liang
  2019-02-19 20:00 ` [PATCH 06/10] perf/x86/intel/cstate: Support die scope counters on CLX-AP kan.liang
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Starting with CLX-AP, some uncore and RAPL counters are die scope.

Add a new domain type DIE_DOMAIN for die scope counters.

To distinguish dies across packages, a system-wide unique die id is
used as the domain id.
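
For reference, one way such a unique die id can be composed is to fold
the package-local die id into the logical package id. This is only a
sketch — the actual topology_unique_die_id() implementation lives in
Len's topology series that this patch set builds on and may differ —
but it illustrates why max_domains is topology_max_packages() *
x86_max_dies in this patch:

```c
/*
 * Hypothetical composition of a system-wide unique die id from a
 * logical package id and a package-local die id. Not the real
 * kernel helper; for illustration only.
 */
static int unique_die_id_sketch(int logical_pkg_id, int die_in_pkg,
				int max_dies_per_pkg)
{
	if (logical_pkg_id < 0 || die_in_pkg < 0 ||
	    die_in_pkg >= max_dies_per_pkg)
		return -1;	/* invalid topology info */

	return logical_pkg_id * max_dies_per_pkg + die_in_pkg;
}
```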

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/domain.c | 11 +++++++++++
 arch/x86/events/domain.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/events/domain.c b/arch/x86/events/domain.c
index bd24c5b..1f572a5 100644
--- a/arch/x86/events/domain.c
+++ b/arch/x86/events/domain.c
@@ -17,6 +17,10 @@ int domain_type_init(struct domain_type *type)
 	case CORE_DOMAIN:
 		type->postfix = "core";
 		return 0;
+	case DIE_DOMAIN:
+		type->max_domains = topology_max_packages() * boot_cpu_data.x86_max_dies;
+		type->postfix = "die";
+		return 0;
 	default:
 		return -1;
 	}
@@ -31,6 +35,8 @@ const struct cpumask *get_domain_cpu_mask(int cpu, struct domain_type *type)
 		return topology_die_cpumask(cpu);
 	case CORE_DOMAIN:
 		return topology_sibling_cpumask(cpu);
+	case DIE_DOMAIN:
+		return topology_core_cpumask(cpu);
 	default:
 		return NULL;
 	}
@@ -47,6 +53,9 @@ int get_domain_id(unsigned int cpu, struct domain_type *type)
 	case PACKAGE_DOMAIN:
 		/* Domain id is the same as logical package id */
 		return topology_logical_package_id(cpu);
+	case DIE_DOMAIN:
+		/* Domain id is the same as logical unique die id */
+		return topology_unique_die_id(cpu);
 	default:
 		return -1;
 	}
@@ -63,6 +72,8 @@ int get_domain_id_from_group_id(int id, struct domain_type *type)
 	case PACKAGE_DOMAIN:
 		/* group id is physical pkg id*/
 		return topology_phys_to_logical_pkg(id);
+	case DIE_DOMAIN:
+		return id;
 	default:
 		return -1;
 	}
diff --git a/arch/x86/events/domain.h b/arch/x86/events/domain.h
index c787816..7815aea 100644
--- a/arch/x86/events/domain.h
+++ b/arch/x86/events/domain.h
@@ -9,6 +9,7 @@
 enum domain_types {
 	PACKAGE_DOMAIN = 0,
 	CORE_DOMAIN,
+	DIE_DOMAIN,
 
 	DOMAIN_TYPE_MAX,
 };
-- 
2.7.4



* [PATCH 06/10] perf/x86/intel/cstate: Support die scope counters on CLX-AP
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
                   ` (4 preceding siblings ...)
  2019-02-19 20:00 ` [PATCH 05/10] perf/x86/intel/domain: Add new domain type for die kan.liang
@ 2019-02-19 20:00 ` kan.liang
  2019-02-19 20:00 ` [PATCH 07/10] perf/x86/intel/uncore: " kan.liang
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

CLX-AP has the same PKG Cstate counters as SKX, but they are die scope.

Add clx_ap_cstates for CLX-AP.
The die domain shares the package code path for now.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/cstate.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c
index 5f71606..8e7d906 100644
--- a/arch/x86/events/intel/cstate.c
+++ b/arch/x86/events/intel/cstate.c
@@ -31,6 +31,7 @@
  * with the perf_event core subsystem.
  *  - 'cstate_core': The counter is available for each physical core.
  *    The counters include CORE_C*_RESIDENCY.
+ *  - 'cstate_die': The counter is available for each physical die.
  *  - 'cstate_pkg': The counter is available for each physical package.
  *    The counters include PKG_C*_RESIDENCY.
  *
@@ -118,6 +119,7 @@ struct cstate_model {
 		struct {
 			unsigned long	pkg_events;
 			unsigned long	core_events;
+			unsigned long	die_events;
 		};
 	};
 	unsigned long		quirks;
@@ -446,6 +448,17 @@ static const struct cstate_model snb_cstates __initconst = {
 				  BIT(PERF_CSTATE_PKG_C7_RES),
 };
 
+static const struct cstate_model clx_ap_cstates __initconst = {
+	.core_events		= BIT(PERF_CSTATE_CORE_C3_RES) |
+				  BIT(PERF_CSTATE_CORE_C6_RES) |
+				  BIT(PERF_CSTATE_CORE_C7_RES),
+
+	.die_events		= BIT(PERF_CSTATE_PKG_C2_RES) |
+				  BIT(PERF_CSTATE_PKG_C3_RES) |
+				  BIT(PERF_CSTATE_PKG_C6_RES) |
+				  BIT(PERF_CSTATE_PKG_C7_RES),
+};
+
 static const struct cstate_model hswult_cstates __initconst = {
 	.core_events		= BIT(PERF_CSTATE_CORE_C3_RES) |
 				  BIT(PERF_CSTATE_CORE_C6_RES) |
@@ -574,7 +587,8 @@ static bool __init cstate_probe_msr(const unsigned long evmsk,
 	int max;
 	u64 val;
 
-	if (type == PACKAGE_DOMAIN) {
+	if ((type == PACKAGE_DOMAIN) ||
+	    (type == DIE_DOMAIN)) {
 		max = PERF_CSTATE_PKG_EVENT_MAX;
 		msr = pkg_msr;
 		attrs = pkg_events_attrs;
@@ -672,7 +686,8 @@ static int __init cstate_init(void)
 			continue;
 		pmu = &cstate_pmus[i]->pmu;
 
-		if (i == PACKAGE_DOMAIN)
+		if ((i == PACKAGE_DOMAIN) ||
+		    (i == DIE_DOMAIN))
 			pmu->attr_groups = pkg_attr_groups;
 		else if (i == CORE_DOMAIN)
 			pmu->attr_groups = core_attr_groups;
@@ -707,6 +722,7 @@ static int __init cstate_init(void)
 
 static int __init cstate_pmu_init(void)
 {
+	const struct cstate_model *cstate;
 	const struct x86_cpu_id *id;
 	int err;
 
@@ -717,7 +733,12 @@ static int __init cstate_pmu_init(void)
 	if (!id)
 		return -ENODEV;
 
-	err = cstate_probe((const struct cstate_model *) id->driver_data);
+	if ((boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X) &&
+	    (boot_cpu_data.x86_max_dies > 1))
+		cstate = &clx_ap_cstates;
+	else
+		cstate = (const struct cstate_model *) id->driver_data;
+	err = cstate_probe(cstate);
 	if (err)
 		return err;
 
-- 
2.7.4



* [PATCH 07/10] perf/x86/intel/uncore: Support die scope counters on CLX-AP
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
                   ` (5 preceding siblings ...)
  2019-02-19 20:00 ` [PATCH 06/10] perf/x86/intel/cstate: Support die scope counters on CLX-AP kan.liang
@ 2019-02-19 20:00 ` kan.liang
  2019-02-19 20:00 ` [PATCH 08/10] perf/x86/intel/rapl: " kan.liang
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

CLX-AP has the same uncore counters as SKX, but they are die scope.
Add a bool variable to indicate that the uncore counters are die scope
only.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/uncore.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index f795a73..f850ab9 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -20,6 +20,8 @@ static cpumask_t uncore_cpu_mask[DOMAIN_TYPE_MAX];
 
 static unsigned int uncore_domain_type_mask;
 
+static bool die_only;
+
 /* constraint for the fixed counter */
 static struct event_constraint uncore_constraint_fixed =
 	EVENT_CONSTRAINT(~0ULL, 1 << UNCORE_PMC_IDX_FIXED, ~0ULL);
@@ -895,6 +897,9 @@ static int __init uncore_type_init(struct intel_uncore_type *type, bool setid)
 	if (!pmus)
 		return -ENOMEM;
 
+	if (die_only)
+		type->type.type = DIE_DOMAIN;
+
 	if (domain_type_init(&type->type)) {
 		kfree(pmus);
 		return -ENOMEM;
@@ -1503,6 +1508,15 @@ static int __init intel_uncore_init(void)
 		return -ENODEV;
 
 	uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
+
+	/*
+	 * CLX-AP has the same uncore counters as SKX,
+	 * but they are die scope.
+	 */
+	if ((boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X) &&
+	    (boot_cpu_data.x86_max_dies > 1))
+		die_only = true;
+
 	if (uncore_init->pci_init) {
 		pret = uncore_init->pci_init();
 		if (!pret)
-- 
2.7.4



* [PATCH 08/10] perf/x86/intel/rapl: Support die scope counters on CLX-AP
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
                   ` (6 preceding siblings ...)
  2019-02-19 20:00 ` [PATCH 07/10] perf/x86/intel/uncore: " kan.liang
@ 2019-02-19 20:00 ` kan.liang
  2019-02-19 20:00 ` [PATCH 09/10] perf header: Add die information in cpu topology kan.liang
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

CLX-AP has the same RAPL counters as SKX, but they are die scope.

Add clx_ap_rapl_init for CLX-AP.
Add new attr_groups for DIE_DOMAIN.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/rapl.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index c1ba09c..110b491 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -589,6 +589,18 @@ static const struct attribute_group *rapl_pkg_attr_groups[] = {
 	NULL,
 };
 
+static struct attribute_group rapl_die_pmu_events_group = {
+	.name = "events",
+	.attrs = NULL, /* patched at runtime */
+};
+
+static const struct attribute_group *rapl_die_attr_groups[] = {
+	&rapl_pmu_attr_group,
+	&rapl_pmu_format_group,
+	&rapl_die_pmu_events_group,
+	NULL,
+};
+
 static int __rapl_cpu_offline(unsigned int cpu, struct rapl_pmus *pmus)
 {
 	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu, pmus);
@@ -791,6 +803,9 @@ static int __init init_rapl_pmus(const struct intel_rapl_events *events,
 	if (type == PACKAGE_DOMAIN) {
 		rapl_pkg_pmu_events_group.attrs = events->attrs;
 		pmus->pmu.attr_groups = rapl_pkg_attr_groups;
+	} else if (type == DIE_DOMAIN) {
+		rapl_die_pmu_events_group.attrs = events->attrs;
+		pmus->pmu.attr_groups = rapl_die_attr_groups;
 	}
 	pmus->pmu.task_ctx_nr = perf_invalid_context;
 	pmus->pmu.event_init = rapl_pmu_event_init;
@@ -860,6 +875,12 @@ static const struct intel_rapl_init_fun hsx_rapl_init __initconst = {
 	.events[PACKAGE_DOMAIN].attrs = rapl_events_srv_attr,
 };
 
+static const struct intel_rapl_init_fun clx_ap_rapl_init __initconst = {
+	.apply_quirk = true,
+	.events[DIE_DOMAIN].cntr_mask = RAPL_IDX_SRV,
+	.events[DIE_DOMAIN].attrs = rapl_events_srv_attr,
+};
+
 static const struct intel_rapl_init_fun hsw_rapl_init __initconst = {
 	.apply_quirk = false,
 	.events[PACKAGE_DOMAIN].cntr_mask = RAPL_IDX_HSW,
@@ -933,7 +954,11 @@ static int __init rapl_pmu_init(void)
 	if (!id)
 		return -ENODEV;
 
-	rapl_init = (struct intel_rapl_init_fun *)id->driver_data;
+	if ((boot_cpu_data.x86_model == INTEL_FAM6_SKYLAKE_X) &&
+	    (boot_cpu_data.x86_max_dies > 1))
+		rapl_init = (struct intel_rapl_init_fun *)&clx_ap_rapl_init;
+	else
+		rapl_init = (struct intel_rapl_init_fun *)id->driver_data;
 	apply_quirk = rapl_init->apply_quirk;
 
 	ret = rapl_check_hw_unit(apply_quirk);
-- 
2.7.4



* [PATCH 09/10] perf header: Add die information in cpu topology
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
                   ` (7 preceding siblings ...)
  2019-02-19 20:00 ` [PATCH 08/10] perf/x86/intel/rapl: " kan.liang
@ 2019-02-19 20:00 ` kan.liang
  2019-02-19 20:00 ` [PATCH 10/10] perf stat: Support per-die aggregation kan.liang
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With the new CPUID.1F leaf, a new level type of processor topology,
'die', is introduced. Add the 'die' information in cpu topology to the
perf header.

To be compatible with old perf.data files, the patch checks the section
size before reading the die information and never reads data crossing
the section boundary. Because the new info is appended at the end of
the cpu_topology section, an old perf tool simply ignores the extra
data.
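
The boundary rule above can be sketched as follows (a hypothetical
reader struct for illustration; the patch tracks the consumed size
against ff->size in process_cpu_topology()): the parser counts the
bytes it has consumed and reads the optional die block only while the
cpu_topology section still has data left, so an old perf.data without
die data parses exactly as before.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounded reader over a feature section. */
struct reader {
	const uint8_t	*buf;
	size_t		size;	/* section size */
	size_t		pos;	/* bytes consumed so far */
};

static int read_u32(struct reader *r, uint32_t *v)
{
	if (r->pos + sizeof(*v) > r->size)
		return -1;	/* would cross the section boundary */
	memcpy(v, r->buf + r->pos, sizeof(*v));
	r->pos += sizeof(*v);
	return 0;
}

/* Returns 1 if die ids were present, 0 for a legacy section. */
static int read_die_ids(struct reader *r, uint32_t *die_ids, int nr_cpus)
{
	int i;

	if (r->pos >= r->size)	/* old perf.data: section exhausted */
		return 0;
	for (i = 0; i < nr_cpus; i++)
		if (read_u32(r, &die_ids[i]))
			return 0;
	return 1;
}
```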

The new perf tool with this patch may also be used on a legacy kernel.
Add a new function, check_x86_die_exists(), to check whether die
topology information is supported by the kernel. The function only
checks x86 and CPU 0, assuming all other CPUs have the same topology.

The die id and sibling dies string are supported in the same way as
core and socket.

Add a new function in cpumap.c to fetch die id information.
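
That fetch can be sketched as follows (a minimal sketch analogous to
the cpu_map__get_die_id() added below, reading the die_id topology file
directly instead of going through cpu__get_topology_int()): it returns
-1 when the file does not exist, i.e. on legacy kernels without
CPUID.1F die topology.

```c
#include <stdio.h>

/* Fetch a CPU's die id from sysfs; -1 if unavailable. */
static int sysfs_die_id(int cpu)
{
	char path[128];
	FILE *fp;
	int die = -1;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/topology/die_id", cpu);
	fp = fopen(path, "r");
	if (!fp)
		return -1;	/* legacy kernel: no die topology */
	if (fscanf(fp, "%d", &die) != 1)
		die = -1;
	fclose(fp);
	return die;
}
```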

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf.data-file-format.txt |   9 +-
 tools/perf/util/cpumap.c                           |   7 +
 tools/perf/util/cpumap.h                           |   1 +
 tools/perf/util/env.c                              |   1 +
 tools/perf/util/env.h                              |   3 +
 tools/perf/util/header.c                           | 185 +++++++++++++++++++--
 6 files changed, 187 insertions(+), 19 deletions(-)

diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt
index dfb218f..d2ec58d 100644
--- a/tools/perf/Documentation/perf.data-file-format.txt
+++ b/tools/perf/Documentation/perf.data-file-format.txt
@@ -154,7 +154,7 @@ struct {
 
 String lists defining the core and CPU threads topology.
 The string lists are followed by a variable length array
-which contains core_id and socket_id of each cpu.
+which contains core_id, die_id (for x86) and socket_id of each cpu.
 The number of entries can be determined by the size of the
 section minus the sizes of both string lists.
 
@@ -163,14 +163,19 @@ struct {
        struct perf_header_string_list threads; /* Variable length */
        struct {
 	      uint32_t core_id;
+	      uint32_t die_id;
 	      uint32_t socket_id;
        } cpus[nr]; /* Variable length records */
 };
 
 Example:
-	sibling cores   : 0-3
 	sibling cores   : 0-7
+	sibling dies	: 0-3
+	sibling dies	: 4-7
 	sibling threads : 0-1
 	sibling threads : 2-3
+	sibling threads : 4-5
+	sibling threads : 6-7
 
 	HEADER_NUMA_TOPOLOGY = 14,
 
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index 383674f..b84fcd4 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -373,6 +373,13 @@ int cpu_map__build_map(struct cpu_map *cpus, struct cpu_map **res,
 	return 0;
 }
 
+int cpu_map__get_die_id(int cpu)
+{
+	int value, ret = cpu__get_topology_int(cpu, "die_id", &value);
+
+	return ret ?: value;
+}
+
 int cpu_map__get_core_id(int cpu)
 {
 	int value, ret = cpu__get_topology_int(cpu, "core_id", &value);
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index ed8999d..4d66f67 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -25,6 +25,7 @@ size_t cpu_map__snprint_mask(struct cpu_map *map, char *buf, size_t size);
 size_t cpu_map__fprintf(struct cpu_map *map, FILE *fp);
 int cpu_map__get_socket_id(int cpu);
 int cpu_map__get_socket(struct cpu_map *map, int idx, void *data);
+int cpu_map__get_die_id(int cpu);
 int cpu_map__get_core_id(int cpu);
 int cpu_map__get_core(struct cpu_map *map, int idx, void *data);
 int cpu_map__build_socket_map(struct cpu_map *cpus, struct cpu_map **sockp);
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index 4c23779..4d576e5 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -87,6 +87,7 @@ int perf_env__read_cpu_topology_map(struct perf_env *env)
 	for (cpu = 0; cpu < nr_cpus; ++cpu) {
 		env->cpu[cpu].core_id	= cpu_map__get_core_id(cpu);
 		env->cpu[cpu].socket_id	= cpu_map__get_socket_id(cpu);
+		env->cpu[cpu].die_id	= cpu_map__get_die_id(cpu);
 	}
 
 	env->nr_cpus_avail = nr_cpus;
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index d01b8355..b6e6741 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -7,6 +7,7 @@
 
 struct cpu_topology_map {
 	int	socket_id;
+	int	die_id;
 	int	core_id;
 };
 
@@ -47,6 +48,7 @@ struct perf_env {
 
 	int			nr_cmdline;
 	int			nr_sibling_cores;
+	int			nr_sibling_dies;
 	int			nr_sibling_threads;
 	int			nr_numa_nodes;
 	int			nr_memory_nodes;
@@ -55,6 +57,7 @@ struct perf_env {
 	char			*cmdline;
 	const char		**cmdline_argv;
 	char			*sibling_cores;
+	char			*sibling_dies;
 	char			*sibling_threads;
 	char			*pmu_mappings;
 	struct cpu_topology_map	*cpu;
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index dec6d21..11637ca 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -557,20 +557,43 @@ static int write_cmdline(struct feat_fd *ff,
 	return 0;
 }
 
+#define DIE_SIBLINGS_LIST \
+	"/sys/devices/system/cpu/cpu0/topology/die_siblings_list"
+
+static bool check_x86_die_exists(void)
+{
+	struct utsname uts;
+
+	if (uname(&uts) < 0)
+		return false;
+
+	if (strncmp(uts.machine, "x86_64", 6))
+		return false;
+
+	if (access(DIE_SIBLINGS_LIST, F_OK) == -1)
+		return false;
+
+	return true;
+}
+
 #define CORE_SIB_FMT \
 	"/sys/devices/system/cpu/cpu%d/topology/core_siblings_list"
+#define DIE_SIB_FMT \
+	"/sys/devices/system/cpu/cpu%d/topology/die_siblings_list"
 #define THRD_SIB_FMT \
 	"/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list"
 
 struct cpu_topo {
 	u32 cpu_nr;
 	u32 core_sib;
+	u32 die_sib;
 	u32 thread_sib;
 	char **core_siblings;
+	char **die_siblings;
 	char **thread_siblings;
 };
 
-static int build_cpu_topo(struct cpu_topo *tp, int cpu)
+static int build_cpu_topo(struct cpu_topo *tp, int cpu, bool has_die)
 {
 	FILE *fp;
 	char filename[MAXPATHLEN];
@@ -583,12 +606,12 @@ static int build_cpu_topo(struct cpu_topo *tp, int cpu)
 	sprintf(filename, CORE_SIB_FMT, cpu);
 	fp = fopen(filename, "r");
 	if (!fp)
-		goto try_threads;
+		goto try_dies;
 
 	sret = getline(&buf, &len, fp);
 	fclose(fp);
 	if (sret <= 0)
-		goto try_threads;
+		goto try_dies;
 
 	p = strchr(buf, '\n');
 	if (p)
@@ -606,6 +629,34 @@ static int build_cpu_topo(struct cpu_topo *tp, int cpu)
 	}
 	ret = 0;
 
+try_dies:
+	if (has_die) {
+		sprintf(filename, DIE_SIB_FMT, cpu);
+		fp = fopen(filename, "r");
+		if (!fp)
+			goto try_threads;
+
+		sret = getline(&buf, &len, fp);
+		fclose(fp);
+		if (sret <= 0)
+			goto try_threads;
+
+		p = strchr(buf, '\n');
+		if (p)
+			*p = '\0';
+
+		for (i = 0; i < tp->die_sib; i++) {
+			if (!strcmp(buf, tp->die_siblings[i]))
+				break;
+		}
+		if (i == tp->die_sib) {
+			tp->die_siblings[i] = buf;
+			tp->die_sib++;
+			buf = NULL;
+			len = 0;
+		}
+		ret = 0;
+	}
 try_threads:
 	sprintf(filename, THRD_SIB_FMT, cpu);
 	fp = fopen(filename, "r");
@@ -636,7 +687,7 @@ static int build_cpu_topo(struct cpu_topo *tp, int cpu)
 	return ret;
 }
 
-static void free_cpu_topo(struct cpu_topo *tp)
+static void free_cpu_topo(struct cpu_topo *tp, bool has_die)
 {
 	u32 i;
 
@@ -646,17 +697,22 @@ static void free_cpu_topo(struct cpu_topo *tp)
 	for (i = 0 ; i < tp->core_sib; i++)
 		zfree(&tp->core_siblings[i]);
 
+	if (has_die) {
+		for (i = 0 ; i < tp->die_sib; i++)
+			zfree(&tp->die_siblings[i]);
+	}
+
 	for (i = 0 ; i < tp->thread_sib; i++)
 		zfree(&tp->thread_siblings[i]);
 
 	free(tp);
 }
 
-static struct cpu_topo *build_cpu_topology(void)
+static struct cpu_topo *build_cpu_topology(bool has_die)
 {
 	struct cpu_topo *tp = NULL;
 	void *addr;
-	u32 nr, i;
+	u32 nr, i, nr_addr;
 	size_t sz;
 	long ncpus;
 	int ret = -1;
@@ -674,7 +730,11 @@ static struct cpu_topo *build_cpu_topology(void)
 	nr = (u32)(ncpus & UINT_MAX);
 
 	sz = nr * sizeof(char *);
-	addr = calloc(1, sizeof(*tp) + 2 * sz);
+	if (has_die)
+		nr_addr = 3;
+	else
+		nr_addr = 2;
+	addr = calloc(1, sizeof(*tp) + nr_addr * sz);
 	if (!addr)
 		goto out_free;
 
@@ -683,13 +743,17 @@ static struct cpu_topo *build_cpu_topology(void)
 	addr += sizeof(*tp);
 	tp->core_siblings = addr;
 	addr += sz;
+	if (has_die) {
+		tp->die_siblings = addr;
+		addr += sz;
+	}
 	tp->thread_siblings = addr;
 
 	for (i = 0; i < nr; i++) {
 		if (!cpu_map__has(map, i))
 			continue;
 
-		ret = build_cpu_topo(tp, i);
+		ret = build_cpu_topo(tp, i, has_die);
 		if (ret < 0)
 			break;
 	}
@@ -697,7 +761,7 @@ static struct cpu_topo *build_cpu_topology(void)
 out_free:
 	cpu_map__put(map);
 	if (ret) {
-		free_cpu_topo(tp);
+		free_cpu_topo(tp, has_die);
 		tp = NULL;
 	}
 	return tp;
@@ -709,8 +773,11 @@ static int write_cpu_topology(struct feat_fd *ff,
 	struct cpu_topo *tp;
 	u32 i;
 	int ret, j;
+	bool has_die;
+
+	has_die = check_x86_die_exists();
 
-	tp = build_cpu_topology();
+	tp = build_cpu_topology(has_die);
 	if (!tp)
 		return -1;
 
@@ -747,8 +814,28 @@ static int write_cpu_topology(struct feat_fd *ff,
 		if (ret < 0)
 			return ret;
 	}
+
+	if (!has_die)
+		goto done;
+
+	ret = do_write(ff, &tp->die_sib, sizeof(tp->die_sib));
+	if (ret < 0)
+		goto done;
+
+	for (i = 0; i < tp->die_sib; i++) {
+		ret = do_write_string(ff, tp->die_siblings[i]);
+		if (ret < 0)
+			goto done;
+	}
+
+	for (j = 0; j < perf_env.nr_cpus_avail; j++) {
+		ret = do_write(ff, &perf_env.cpu[j].die_id,
+			       sizeof(perf_env.cpu[j].die_id));
+		if (ret < 0)
+			return ret;
+	}
 done:
-	free_cpu_topo(tp);
+	free_cpu_topo(tp, has_die);
 	return ret;
 }
 
@@ -1523,6 +1610,8 @@ static void print_cmdline(struct feat_fd *ff, FILE *fp)
 	fputc('\n', fp);
 }
 
+static bool has_die;
+
 static void print_cpu_topology(struct feat_fd *ff, FILE *fp)
 {
 	struct perf_header *ph = ff->ph;
@@ -1538,6 +1627,16 @@ static void print_cpu_topology(struct feat_fd *ff, FILE *fp)
 		str += strlen(str) + 1;
 	}
 
+	if (has_die) {
+		nr = ph->env.nr_sibling_dies;
+		str = ph->env.sibling_dies;
+
+		for (i = 0; i < nr; i++) {
+			fprintf(fp, "# sibling dies   : %s\n", str);
+			str += strlen(str) + 1;
+		}
+	}
+
 	nr = ph->env.nr_sibling_threads;
 	str = ph->env.sibling_threads;
 
@@ -1546,12 +1645,28 @@ static void print_cpu_topology(struct feat_fd *ff, FILE *fp)
 		str += strlen(str) + 1;
 	}
 
-	if (ph->env.cpu != NULL) {
-		for (i = 0; i < cpu_nr; i++)
-			fprintf(fp, "# CPU %d: Core ID %d, Socket ID %d\n", i,
-				ph->env.cpu[i].core_id, ph->env.cpu[i].socket_id);
-	} else
-		fprintf(fp, "# Core ID and Socket ID information is not available\n");
+	if (has_die) {
+		if (ph->env.cpu != NULL) {
+			for (i = 0; i < cpu_nr; i++)
+				fprintf(fp, "# CPU %d: Core ID %d, "
+					    "Die ID %d, Socket ID %d\n",
+					    i, ph->env.cpu[i].core_id,
+					    ph->env.cpu[i].die_id,
+					    ph->env.cpu[i].socket_id);
+		} else
+			fprintf(fp, "# Core ID, Die ID and Socket ID "
+				    "information is not available\n");
+	} else {
+		if (ph->env.cpu != NULL) {
+			for (i = 0; i < cpu_nr; i++)
+				fprintf(fp, "# CPU %d: Core ID %d, "
+					    "Socket ID %d\n",
+					    i, ph->env.cpu[i].core_id,
+					    ph->env.cpu[i].socket_id);
+		} else
+			fprintf(fp, "# Core ID and Socket ID "
+				    "information is not available\n");
+	}
 }
 
 static void print_clockid(struct feat_fd *ff, FILE *fp)
@@ -2245,6 +2360,7 @@ static int process_cpu_topology(struct feat_fd *ff, void *data __maybe_unused)
 			goto free_cpu;
 
 		ph->env.cpu[i].core_id = nr;
+		size += sizeof(u32);
 
 		if (do_read_u32(ff, &nr))
 			goto free_cpu;
@@ -2256,7 +2372,42 @@ static int process_cpu_topology(struct feat_fd *ff, void *data __maybe_unused)
 		}
 
 		ph->env.cpu[i].socket_id = nr;
+		size += sizeof(u32);
+	}
+
+	/*
+	 * The header may be from old perf,
+	 * which doesn't include die information.
+	 */
+	if (ff->size <= size)
+		return 0;
+
+	if (do_read_u32(ff, &nr))
+		return -1;
+
+	ph->env.nr_sibling_dies = nr;
+	size += sizeof(u32);
+
+	for (i = 0; i < nr; i++) {
+		str = do_read_string(ff);
+		if (!str)
+			goto error;
+
+		/* include a NULL character at the end */
+		if (strbuf_add(&sb, str, strlen(str) + 1) < 0)
+			goto error;
+		size += string_size(str);
+		free(str);
+	}
+	ph->env.sibling_dies = strbuf_detach(&sb, NULL);
+
+	for (i = 0; i < (u32)cpu_nr; i++) {
+		if (do_read_u32(ff, &nr))
+			goto free_cpu;
+
+		ph->env.cpu[i].die_id = nr;
 	}
+	has_die = true;
 
 	return 0;
 
-- 
2.7.4



* [PATCH 10/10] perf stat: Support per-die aggregation
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
                   ` (8 preceding siblings ...)
  2019-02-19 20:00 ` [PATCH 09/10] perf header: Add die information in cpu topology kan.liang
@ 2019-02-19 20:00 ` kan.liang
  2019-02-20 10:15 ` [PATCH 00/10] perf: Multi-die/package support Peter Zijlstra
  2019-02-20 12:46 ` Jiri Olsa
  11 siblings, 0 replies; 18+ messages in thread
From: kan.liang @ 2019-02-19 20:00 UTC (permalink / raw)
  To: peterz, tglx, acme, mingo, x86, linux-kernel
  Cc: len.brown, jolsa, namhyung, eranian, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

It is useful to aggregate counts per processor die.

Introduce a new option "--per-die" to support per-die aggregation.

The global id for each core has been changed to socket + die id + core
id. The global id for each die is socket + die id.
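
The core-id encoding above can be sketched as follows (hypothetical
helper names for illustration; the actual patch builds these ids in
cpu_map__get_core() and decodes them with cpu_map__id_to_socket(),
cpu_map__id_to_die() and cpu_map__id_to_cpu()): socket in bits 24 and
up, die in bits 16-23, core in bits 0-15.

```c
/* Build the global per-core id: socket + die id + core id. */
static inline int make_core_id(int socket, int die, int core)
{
	return (socket << 24) | ((die & 0xff) << 16) | (core & 0xffff);
}

static inline int id_to_socket(int id)	{ return id >> 24; }
static inline int id_to_die(int id)	{ return (id >> 16) & 0xff; }
static inline int id_to_core(int id)	{ return id & 0xffff; }
```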

Add die information to per-core aggregation. The output of per-core
aggregation changes from "S0-C0" to "S0-D0-C0", so any scripts which
rely on the per-core output format will probably break.

For perf stat record/report, there is no die information when
processing an old perf.data file, so the per-die result will be the
same as per-socket.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt | 10 +++++
 tools/perf/builtin-stat.c              | 73 +++++++++++++++++++++++++++++++---
 tools/perf/util/cpumap.c               | 48 +++++++++++++++++++---
 tools/perf/util/cpumap.h               |  9 ++++-
 tools/perf/util/stat-display.c         | 24 ++++++++++-
 tools/perf/util/stat-shadow.c          |  1 +
 tools/perf/util/stat.c                 |  1 +
 tools/perf/util/stat.h                 |  1 +
 8 files changed, 153 insertions(+), 14 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 4bc2085..4fd68ac 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -197,6 +197,13 @@ use --per-socket in addition to -a. (system-wide).  The output includes the
 socket number and the number of online processors on that socket. This is
 useful to gauge the amount of aggregation.
 
+--per-die::
+Aggregate counts per processor die for system-wide mode measurements.  This
+is a useful mode to detect imbalance between dies.  To enable this mode,
+use --per-die in addition to -a. (system-wide).  The output includes the
+die number and the number of online processors on that die. This is
+useful to gauge the amount of aggregation.
+
 --per-core::
 Aggregate counts per physical processor for system-wide mode measurements.  This
 is a useful mode to detect imbalance between physical cores.  To enable this mode,
@@ -236,6 +243,9 @@ Input file name.
 --per-socket::
 Aggregate counts per processor socket for system-wide mode measurements.
 
+--per-die::
+Aggregate counts per processor die for system-wide mode measurements.
+
 --per-core::
 Aggregate counts per physical processor for system-wide mode measurements.
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 63a3afc..92c6967 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -772,6 +772,8 @@ static struct option stat_options[] = {
 		    "stop workload and print counts after a timeout period in ms (>= 10ms)"),
 	OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,
 		     "aggregate counts per processor socket", AGGR_SOCKET),
+	OPT_SET_UINT(0, "per-die", &stat_config.aggr_mode,
+		     "aggregate counts per processor die", AGGR_DIE),
 	OPT_SET_UINT(0, "per-core", &stat_config.aggr_mode,
 		     "aggregate counts per physical processor core", AGGR_CORE),
 	OPT_SET_UINT(0, "per-thread", &stat_config.aggr_mode,
@@ -796,6 +798,12 @@ static int perf_stat__get_socket(struct perf_stat_config *config __maybe_unused,
 	return cpu_map__get_socket(map, cpu, NULL);
 }
 
+static int perf_stat__get_die(struct perf_stat_config *config __maybe_unused,
+			      struct cpu_map *map, int cpu)
+{
+	return cpu_map__get_die(map, cpu, NULL);
+}
+
 static int perf_stat__get_core(struct perf_stat_config *config __maybe_unused,
 			       struct cpu_map *map, int cpu)
 {
@@ -836,6 +844,12 @@ static int perf_stat__get_socket_cached(struct perf_stat_config *config,
 	return perf_stat__get_aggr(config, perf_stat__get_socket, map, idx);
 }
 
+static int perf_stat__get_die_cached(struct perf_stat_config *config,
+					struct cpu_map *map, int idx)
+{
+	return perf_stat__get_aggr(config, perf_stat__get_die, map, idx);
+}
+
 static int perf_stat__get_core_cached(struct perf_stat_config *config,
 				      struct cpu_map *map, int idx)
 {
@@ -854,6 +868,13 @@ static int perf_stat_init_aggr_mode(void)
 		}
 		stat_config.aggr_get_id = perf_stat__get_socket_cached;
 		break;
+	case AGGR_DIE:
+		if (cpu_map__build_die_map(evsel_list->cpus, &stat_config.aggr_map)) {
+			perror("cannot build die map");
+			return -1;
+		}
+		stat_config.aggr_get_id = perf_stat__get_die_cached;
+		break;
 	case AGGR_CORE:
 		if (cpu_map__build_core_map(evsel_list->cpus, &stat_config.aggr_map)) {
 			perror("cannot build core map");
@@ -910,21 +931,41 @@ static int perf_env__get_socket(struct cpu_map *map, int idx, void *data)
 	return cpu == -1 ? -1 : env->cpu[cpu].socket_id;
 }
 
+static int perf_env__get_die(struct cpu_map *map, int idx, void *data)
+{
+	struct perf_env *env = data;
+	int die = -1, cpu = perf_env__get_cpu(env, map, idx);
+
+	if (cpu != -1) {
+		/*
+		 * Encode socket in upper 8 bits
+		 * die_id is relative to socket,
+		 * we need a global id. So we combine
+		 * socket + die id
+		 */
+		die = (env->cpu[cpu].socket_id << 8) |
+		      (env->cpu[cpu].die_id & 0xff);
+	}
+
+	return die;
+}
+
 static int perf_env__get_core(struct cpu_map *map, int idx, void *data)
 {
 	struct perf_env *env = data;
 	int core = -1, cpu = perf_env__get_cpu(env, map, idx);
 
 	if (cpu != -1) {
-		int socket_id = env->cpu[cpu].socket_id;
-
 		/*
-		 * Encode socket in upper 16 bits
-		 * core_id is relative to socket, and
+		 * Encode socket in upper 24 bits
+		 * encode die id in upper 16 bits
+		 * core_id is relative to socket and die,
 		 * we need a global id. So we combine
-		 * socket + core id.
+		 * socket + die id + core id
 		 */
-		core = (socket_id << 16) | (env->cpu[cpu].core_id & 0xffff);
+		core = (env->cpu[cpu].socket_id << 24) |
+		       (env->cpu[cpu].die_id << 16) |
+		       (env->cpu[cpu].core_id & 0xffff);
 	}
 
 	return core;
@@ -936,6 +977,12 @@ static int perf_env__build_socket_map(struct perf_env *env, struct cpu_map *cpus
 	return cpu_map__build_map(cpus, sockp, perf_env__get_socket, env);
 }
 
+static int perf_env__build_die_map(struct perf_env *env, struct cpu_map *cpus,
+				   struct cpu_map **diep)
+{
+	return cpu_map__build_map(cpus, diep, perf_env__get_die, env);
+}
+
 static int perf_env__build_core_map(struct perf_env *env, struct cpu_map *cpus,
 				    struct cpu_map **corep)
 {
@@ -947,6 +994,11 @@ static int perf_stat__get_socket_file(struct perf_stat_config *config __maybe_un
 {
 	return perf_env__get_socket(map, idx, &perf_stat.session->header.env);
 }
+static int perf_stat__get_die_file(struct perf_stat_config *config __maybe_unused,
+				   struct cpu_map *map, int idx)
+{
+	return perf_env__get_die(map, idx, &perf_stat.session->header.env);
+}
 
 static int perf_stat__get_core_file(struct perf_stat_config *config __maybe_unused,
 				    struct cpu_map *map, int idx)
@@ -966,6 +1018,13 @@ static int perf_stat_init_aggr_mode_file(struct perf_stat *st)
 		}
 		stat_config.aggr_get_id = perf_stat__get_socket_file;
 		break;
+	case AGGR_DIE:
+		if (perf_env__build_die_map(env, evsel_list->cpus, &stat_config.aggr_map)) {
+			perror("cannot build die map");
+			return -1;
+		}
+		stat_config.aggr_get_id = perf_stat__get_die_file;
+		break;
 	case AGGR_CORE:
 		if (perf_env__build_core_map(env, evsel_list->cpus, &stat_config.aggr_map)) {
 			perror("cannot build core map");
@@ -1515,6 +1574,8 @@ static int __cmd_report(int argc, const char **argv)
 	OPT_STRING('i', "input", &input_name, "file", "input file name"),
 	OPT_SET_UINT(0, "per-socket", &perf_stat.aggr_mode,
 		     "aggregate counts per processor socket", AGGR_SOCKET),
+	OPT_SET_UINT(0, "per-die", &perf_stat.aggr_mode,
+		     "aggregate counts per processor die", AGGR_DIE),
 	OPT_SET_UINT(0, "per-core", &perf_stat.aggr_mode,
 		     "aggregate counts per physical processor core", AGGR_CORE),
 	OPT_SET_UINT('A', "no-aggr", &perf_stat.aggr_mode,
diff --git a/tools/perf/util/cpumap.c b/tools/perf/util/cpumap.c
index b84fcd4..f6c7aa1 100644
--- a/tools/perf/util/cpumap.c
+++ b/tools/perf/util/cpumap.c
@@ -380,6 +380,33 @@ int cpu_map__get_die_id(int cpu)
 	return ret ?: value;
 }
 
+int cpu_map__get_die(struct cpu_map *map, int idx, void *data)
+{
+	int cpu, die, s;
+
+	if (idx > map->nr)
+		return -1;
+
+	cpu = map->map[idx];
+
+	die = cpu_map__get_die_id(cpu);
+	/* There is no die_id on legacy system. */
+	if (die == -1)
+		die = 0;
+
+	s = cpu_map__get_socket(map, idx, data);
+	if (s == -1)
+		return -1;
+
+	/*
+	 * encode socket in upper 8 bits
+	 * die_id is relative to socket, and
+	 * we need a global id. So we combine
+	 * socket + die id
+	 */
+	return (s << 8) | (die & 0xff);
+}
+
 int cpu_map__get_core_id(int cpu)
 {
 	int value, ret = cpu__get_topology_int(cpu, "core_id", &value);
@@ -388,7 +415,7 @@ int cpu_map__get_core_id(int cpu)
 
 int cpu_map__get_core(struct cpu_map *map, int idx, void *data)
 {
-	int cpu, s;
+	int cpu, s, die;
 
 	if (idx > map->nr)
 		return -1;
@@ -397,17 +424,23 @@ int cpu_map__get_core(struct cpu_map *map, int idx, void *data)
 
 	cpu = cpu_map__get_core_id(cpu);
 
+	die = cpu_map__get_die(map, idx, data);
+	/* There is no die_id on legacy system. */
+	if (die == -1)
+		die = 0;
+
 	s = cpu_map__get_socket(map, idx, data);
 	if (s == -1)
 		return -1;
 
 	/*
-	 * encode socket in upper 16 bits
-	 * core_id is relative to socket, and
+	 * encode socket in upper 24 bits
+	 * encode die id in upper 16 bits
+	 * core_id is relative to socket and die,
 	 * we need a global id. So we combine
-	 * socket+ core id
+	 * socket + die id + core id
 	 */
-	return (s << 16) | (cpu & 0xffff);
+	return (s << 24) | (die << 16) | (cpu & 0xffff);
 }
 
 int cpu_map__build_socket_map(struct cpu_map *cpus, struct cpu_map **sockp)
@@ -415,6 +448,11 @@ int cpu_map__build_socket_map(struct cpu_map *cpus, struct cpu_map **sockp)
 	return cpu_map__build_map(cpus, sockp, cpu_map__get_socket, NULL);
 }
 
+int cpu_map__build_die_map(struct cpu_map *cpus, struct cpu_map **diep)
+{
+	return cpu_map__build_map(cpus, diep, cpu_map__get_die, NULL);
+}
+
 int cpu_map__build_core_map(struct cpu_map *cpus, struct cpu_map **corep)
 {
 	return cpu_map__build_map(cpus, corep, cpu_map__get_core, NULL);
diff --git a/tools/perf/util/cpumap.h b/tools/perf/util/cpumap.h
index 4d66f67..3442b7a 100644
--- a/tools/perf/util/cpumap.h
+++ b/tools/perf/util/cpumap.h
@@ -26,9 +26,11 @@ size_t cpu_map__fprintf(struct cpu_map *map, FILE *fp);
 int cpu_map__get_socket_id(int cpu);
 int cpu_map__get_socket(struct cpu_map *map, int idx, void *data);
 int cpu_map__get_die_id(int cpu);
+int cpu_map__get_die(struct cpu_map *map, int idx, void *data);
 int cpu_map__get_core_id(int cpu);
 int cpu_map__get_core(struct cpu_map *map, int idx, void *data);
 int cpu_map__build_socket_map(struct cpu_map *cpus, struct cpu_map **sockp);
+int cpu_map__build_die_map(struct cpu_map *cpus, struct cpu_map **diep);
 int cpu_map__build_core_map(struct cpu_map *cpus, struct cpu_map **corep);
 
 struct cpu_map *cpu_map__get(struct cpu_map *map);
@@ -43,7 +45,12 @@ static inline int cpu_map__socket(struct cpu_map *sock, int s)
 
 static inline int cpu_map__id_to_socket(int id)
 {
-	return id >> 16;
+	return id >> 24;
+}
+
+static inline int cpu_map__id_to_die(int id)
+{
+	return (id >> 16) & 0xff;
 }
 
 static inline int cpu_map__id_to_cpu(int id)
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 665ee37..05216a2 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -73,8 +73,9 @@ static void aggr_printout(struct perf_stat_config *config,
 {
 	switch (config->aggr_mode) {
 	case AGGR_CORE:
-		fprintf(config->output, "S%d-C%*d%s%*d%s",
+		fprintf(config->output, "S%d-D%d-C%*d%s%*d%s",
 			cpu_map__id_to_socket(id),
+			cpu_map__id_to_die(id),
 			config->csv_output ? 0 : -8,
 			cpu_map__id_to_cpu(id),
 			config->csv_sep,
@@ -82,6 +83,16 @@ static void aggr_printout(struct perf_stat_config *config,
 			nr,
 			config->csv_sep);
 		break;
+	case AGGR_DIE:
+		fprintf(config->output, "S%d-D%*d%s%*d%s",
+			cpu_map__id_to_socket(id << 16),
+			config->csv_output ? 0 : -8,
+			cpu_map__id_to_die(id << 16),
+			config->csv_sep,
+			config->csv_output ? 0 : 4,
+			nr,
+			config->csv_sep);
+		break;
 	case AGGR_SOCKET:
 		fprintf(config->output, "S%*d%s%*d%s",
 			config->csv_output ? 0 : -5,
@@ -403,6 +414,7 @@ static void printout(struct perf_stat_config *config, int id, int nr,
 			[AGGR_THREAD] = 1,
 			[AGGR_NONE] = 1,
 			[AGGR_SOCKET] = 2,
+			[AGGR_DIE] = 2,
 			[AGGR_CORE] = 2,
 		};
 
@@ -866,7 +878,8 @@ static void print_no_aggr_metric(struct perf_stat_config *config,
 }
 
 static int aggr_header_lens[] = {
-	[AGGR_CORE] = 18,
+	[AGGR_CORE] = 24,
+	[AGGR_DIE] = 18,
 	[AGGR_SOCKET] = 12,
 	[AGGR_NONE] = 6,
 	[AGGR_THREAD] = 24,
@@ -875,6 +888,7 @@ static int aggr_header_lens[] = {
 
 static const char *aggr_header_csv[] = {
 	[AGGR_CORE] 	= 	"core,cpus,",
+	[AGGR_DIE] 	= 	"die,cpus",
 	[AGGR_SOCKET] 	= 	"socket,cpus",
 	[AGGR_NONE] 	= 	"cpu,",
 	[AGGR_THREAD] 	= 	"comm-pid,",
@@ -943,6 +957,11 @@ static void print_interval(struct perf_stat_config *config,
 			if (!metric_only)
 				fprintf(output, "             counts %*s events\n", unit_width, "unit");
 			break;
+		case AGGR_DIE:
+			fprintf(output, "#           time die cpus");
+			if (!metric_only)
+				fprintf(output, "             counts %*s events\n", unit_width, "unit");
+			break;
 		case AGGR_CORE:
 			fprintf(output, "#           time core         cpus");
 			if (!metric_only)
@@ -1130,6 +1149,7 @@ perf_evlist__print_counters(struct perf_evlist *evlist,
 
 	switch (config->aggr_mode) {
 	case AGGR_CORE:
+	case AGGR_DIE:
 	case AGGR_SOCKET:
 		print_aggr(config, evlist, prefix);
 		break;
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 3c22c58..4d34dc8 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -12,6 +12,7 @@
 /*
  * AGGR_GLOBAL: Use CPU 0
  * AGGR_SOCKET: Use first CPU of socket
+ * AGGR_DIE: Use first CPU of die
  * AGGR_CORE: Use first CPU of core
  * AGGR_NONE: Use matching CPU
  * AGGR_THREAD: Not supported?
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 4d40515..6bad12f 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -272,6 +272,7 @@ process_counter_values(struct perf_stat_config *config, struct perf_evsel *evsel
 	switch (config->aggr_mode) {
 	case AGGR_THREAD:
 	case AGGR_CORE:
+	case AGGR_DIE:
 	case AGGR_SOCKET:
 	case AGGR_NONE:
 		if (!evsel->snapshot)
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 2f9c915..7032dd1 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -44,6 +44,7 @@ enum aggr_mode {
 	AGGR_NONE,
 	AGGR_GLOBAL,
 	AGGR_SOCKET,
+	AGGR_DIE,
 	AGGR_CORE,
 	AGGR_THREAD,
 	AGGR_UNSET,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 00/10] perf: Multi-die/package support
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
                   ` (9 preceding siblings ...)
  2019-02-19 20:00 ` [PATCH 10/10] perf stat: Support per-die aggregation kan.liang
@ 2019-02-20 10:15 ` Peter Zijlstra
  2019-02-20 12:46 ` Jiri Olsa
  11 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2019-02-20 10:15 UTC (permalink / raw)
  To: kan.liang
  Cc: tglx, acme, mingo, x86, linux-kernel, len.brown, jolsa, namhyung,
	eranian, ak

On Tue, Feb 19, 2019 at 12:00:01PM -0800, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Add Linux perf support for multi-die/package. The first product with
> multi-die is Xeon Cascade Lake-AP (CLX-AP).
> The code bases on the top of Len's multi-die/package support.
> https://lkml.org/lkml/2019/2/18/1534

*sigh*, don't use lkml.org links.

We have a perfectly good canonical form:

  https://lkml.kernel.org/r/20190219034013.4147-1-lenb@kernel.org

which has the added benefit of including the Message-Id such that I can
easily find the email in my local archive.

And since Len forgot to Cc me on those, I suppose I'll have to go dig
them out to make sense of these here patches.



* Re: [PATCH 01/10] perf/x86/intel: Introduce a concept "domain" as the scope of counters
  2019-02-19 20:00 ` [PATCH 01/10] perf/x86/intel: Introduce a concept "domain" as the scope of counters kan.liang
@ 2019-02-20 11:12   ` Peter Zijlstra
  2019-02-20 14:36     ` Liang, Kan
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2019-02-20 11:12 UTC (permalink / raw)
  To: kan.liang
  Cc: tglx, acme, mingo, x86, linux-kernel, len.brown, jolsa, namhyung,
	eranian, ak

On Tue, Feb 19, 2019 at 12:00:02PM -0800, kan.liang@linux.intel.com wrote:
> It's very useful to abstract several common topology related codes for
> these modules to reduce the code redundancy.

>  3 files changed, 96 insertions(+), 1 deletion(-)

So you add 100 lines, so we can remove lines when we start to use this.

Except all 3 follow up patches that employ this, all add more lines
still:

 1 file changed, 184 insertions(+), 157 deletions(-)
 3 files changed, 164 insertions(+), 80 deletions(-)
 1 file changed, 224 insertions(+), 82 deletions(-)




* Re: [PATCH 00/10] perf: Multi-die/package support
  2019-02-19 20:00 [PATCH 00/10] perf: Multi-die/package support kan.liang
                   ` (10 preceding siblings ...)
  2019-02-20 10:15 ` [PATCH 00/10] perf: Multi-die/package support Peter Zijlstra
@ 2019-02-20 12:46 ` Jiri Olsa
  2019-02-20 13:24   ` Peter Zijlstra
  11 siblings, 1 reply; 18+ messages in thread
From: Jiri Olsa @ 2019-02-20 12:46 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, tglx, acme, mingo, x86, linux-kernel, len.brown,
	namhyung, eranian, ak

On Tue, Feb 19, 2019 at 12:00:01PM -0800, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Add Linux perf support for multi-die/package. The first product with
> multi-die is Xeon Cascade Lake-AP (CLX-AP).
> The code bases on the top of Len's multi-die/package support.
> https://lkml.org/lkml/2019/2/18/1534
> 
> Patch 1-4: They are generic codes for previous platforms.
> Perf supports miscellaneous modules, e.g cstate, RAPL and uncore.
> Their counters have the same scope of effect (per package).
> But they maintain their own scope information independently.
> It's very useful to abstract several common topology related codes
> for these modules to reduce the code redundancy, especially when
> adding counters with new scope.
> 
> Patch 5-8: Support die scope counters on CLX-AP for uncore, RAPL
> and cstate.
> 
> Patch 9-10: Support per-die aggregation for perf stat and header.
> 
> Kan Liang (10):
>   perf/x86/intel: Introduce a concept "domain" as the scope of counters
>   perf/x86/intel/cstate: Apply "domain" for cstate
>   perf/x86/intel/uncore: Apply "domain" for uncore
>   perf/x86/intel/rapl: Apply "domain" for RAPL
>   perf/x86/intel/domain: Add new domain type for die
>   perf/x86/intel/uncore: Support die scope counters on CLX-AP
>   perf/x86/intel/rapl: Support die scope counters on CLX-AP
>   perf/x86/intel/cstate: Support die scope counters on CLX-AP
>   perf header: Add die information in cpu topology
>   perf stat: Support per-die aggregation

hi,
what is this based on? I'm getting conflicts when applying it
on tip or Arnaldo's perf/core

thanks,
jirka

> 
>  arch/x86/events/Makefile                           |   2 +-
>  arch/x86/events/domain.c                           |  81 +++++
>  arch/x86/events/domain.h                           |  26 ++
>  arch/x86/events/intel/cstate.c                     | 364 ++++++++++++---------
>  arch/x86/events/intel/rapl.c                       | 333 ++++++++++++++-----
>  arch/x86/events/intel/uncore.c                     | 247 +++++++++-----
>  arch/x86/events/intel/uncore.h                     |   9 +-
>  arch/x86/events/intel/uncore_snbep.c               |   2 +-
>  tools/perf/Documentation/perf-stat.txt             |  10 +
>  tools/perf/Documentation/perf.data-file-format.txt |   9 +-
>  tools/perf/builtin-stat.c                          |  73 ++++-
>  tools/perf/util/cpumap.c                           |  55 +++-
>  tools/perf/util/cpumap.h                           |  10 +-
>  tools/perf/util/env.c                              |   1 +
>  tools/perf/util/env.h                              |   3 +
>  tools/perf/util/header.c                           | 185 ++++++++++-
>  tools/perf/util/stat-display.c                     |  24 +-
>  tools/perf/util/stat-shadow.c                      |   1 +
>  tools/perf/util/stat.c                             |   1 +
>  tools/perf/util/stat.h                             |   1 +
>  20 files changed, 1082 insertions(+), 355 deletions(-)
>  create mode 100644 arch/x86/events/domain.c
>  create mode 100644 arch/x86/events/domain.h
> 
> -- 
> 2.7.4
> 


* Re: [PATCH 00/10] perf: Multi-die/package support
  2019-02-20 12:46 ` Jiri Olsa
@ 2019-02-20 13:24   ` Peter Zijlstra
  2019-02-20 13:32     ` Jiri Olsa
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2019-02-20 13:24 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: kan.liang, tglx, acme, mingo, x86, linux-kernel, len.brown,
	namhyung, eranian, ak

On Wed, Feb 20, 2019 at 01:46:19PM +0100, Jiri Olsa wrote:
> On Tue, Feb 19, 2019 at 12:00:01PM -0800, kan.liang@linux.intel.com wrote:

> > The code bases on the top of Len's multi-die/package support.
> > https://lkml.org/lkml/2019/2/18/1534

> what is based on? I'm getting conflicts when applying
> on tip or Arnaldo's perf/core




* Re: [PATCH 00/10] perf: Multi-die/package support
  2019-02-20 13:24   ` Peter Zijlstra
@ 2019-02-20 13:32     ` Jiri Olsa
  0 siblings, 0 replies; 18+ messages in thread
From: Jiri Olsa @ 2019-02-20 13:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kan.liang, tglx, acme, mingo, x86, linux-kernel, len.brown,
	namhyung, eranian, ak

On Wed, Feb 20, 2019 at 02:24:13PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 20, 2019 at 01:46:19PM +0100, Jiri Olsa wrote:
> > On Tue, Feb 19, 2019 at 12:00:01PM -0800, kan.liang@linux.intel.com wrote:
> 
> > > The code bases on the top of Len's multi-die/package support.
> > > https://lkml.org/lkml/2019/2/18/1534
> 
> > what is based on? I'm getting conflicts when applying
> > on tip or Arnaldo's perf/core

ugh, thanks ;-)

jirka


* Re: [PATCH 01/10] perf/x86/intel: Introduce a concept "domain" as the scope of counters
  2019-02-20 11:12   ` Peter Zijlstra
@ 2019-02-20 14:36     ` Liang, Kan
  2019-03-05 20:32       ` Liang, Kan
  0 siblings, 1 reply; 18+ messages in thread
From: Liang, Kan @ 2019-02-20 14:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, acme, mingo, x86, linux-kernel, len.brown, jolsa, namhyung,
	eranian, ak



On 2/20/2019 6:12 AM, Peter Zijlstra wrote:
> On Tue, Feb 19, 2019 at 12:00:02PM -0800, kan.liang@linux.intel.com wrote:
>> It's very useful to abstract several common topology related codes for
>> these modules to reduce the code redundancy.
> 
>>   3 files changed, 96 insertions(+), 1 deletion(-)
> 
> So you add 100 lines, so we can remove lines when we start to use this.
> 
> Except all 3 follow up patches that employ this, all add more lines
> still:

The previous implementation assumes that there are only one or two types 
of counters (per core or per package). The proposed solution drops that 
assumption and can support more types of counters. The new 
infrastructure needs more lines up front, but as more and more types are 
introduced, we can expect fewer lines in total with the proposed solution.

>   1 file changed, 184 insertions(+), 157 deletions(-)

This is cstate, which currently supports two types of counters (per core 
and per package).

>   3 files changed, 164 insertions(+), 80 deletions(-)
>   1 file changed, 224 insertions(+), 82 deletions(-)
> 

These are uncore and RAPL, which currently support only one type of 
counter (per package).

When there is only one type, the proposed solution has more lines.
But when there are two types, the proposed solution has a similar number 
of lines to the previous implementation.
Following this trend, I expect the proposed solution to have fewer lines 
when there are three or more types.

With die introduced, there are at least two types of counters for RAPL 
and uncore, and three types for cstate. In total, we should see fewer lines.

More types of counters may be added later. Section 8.9.1 "Hierarchical 
Mapping of Shared Resources" of SDM Vol. 3A documents 7 APIC_ID fields 
(cluster, package, die, tile, module, core and SMT), while there are 
only 4 types of counters for now.


Thanks,
Kan


* Re: [PATCH 01/10] perf/x86/intel: Introduce a concept "domain" as the scope of counters
  2019-02-20 14:36     ` Liang, Kan
@ 2019-03-05 20:32       ` Liang, Kan
  0 siblings, 0 replies; 18+ messages in thread
From: Liang, Kan @ 2019-03-05 20:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, acme, mingo, x86, linux-kernel, len.brown, jolsa, namhyung,
	eranian, ak

Hi Peter,

Is the idea (abstracting common topology-related code for perf modules) 
the right direction?
I'm asking because I'm going to submit uncore code for new platforms, 
and I'm not sure whether the new code should be based on this series.

Could you please share your opinion?

If it's the right direction, could you please review the patch 1-4?
They are for previous platforms and can be merged separately.

If it's not the right time to do the abstraction, I will rewrite the 
patches to handle only CLX-AP as a special case for now. For example, 
per-die counters can share code with per-package counters by introducing 
a new variable 'bool die_scope' and checking it before calling any 
topology-related functions.
But that's only a temporary solution for CLX-AP. We still need to add 
dedicated per-die PMU support later, if both per-die and per-package 
counters are supported on the same platform.

Thanks,
Kan

On 2/20/2019 9:36 AM, Liang, Kan wrote:
> 
> 
> On 2/20/2019 6:12 AM, Peter Zijlstra wrote:
>> On Tue, Feb 19, 2019 at 12:00:02PM -0800, kan.liang@linux.intel.com 
>> wrote:
>>> It's very useful to abstract several common topology related codes for
>>> these modules to reduce the code redundancy.
>>
>>>   3 files changed, 96 insertions(+), 1 deletion(-)
>>
>> So you add 100 lines, so we can remove lines when we start to use this.
>>
>> Except all 3 follow up patches that employ this, all add more lines
>> still:
> 
> The previous implementation assumes that there is only one or two types 
> of counters (per core or per package). The proposed solution breaks the 
> assumption and can support more types of counters. The new 
> infrastructure needs more lines. But with more and more types 
> introduced, we can expect less lines in total with the proposed solution.
> 
>>   1 file changed, 184 insertions(+), 157 deletions(-)
> 
> This is cstate which supports two types of counters (per core and per 
> package) now.
> 
>>   3 files changed, 164 insertions(+), 80 deletions(-)
>>   1 file changed, 224 insertions(+), 82 deletions(-)
>>
> 
> They are uncore and rapl which only supports one type of counters (per 
> package) now.
> 
> When there is only one type, the proposed solution has more lines.
> But when there are two types, the proposed solution has similar number 
> of lines as previous implementation.
> In this trend, I expect that the proposed solution has less lines when 
> there are three or more types.
> 
> With die introduced, there are at least two types of counters for rapl 
> and uncore, and three types for cstate. In total, we should see less lines.
> 
> There may be more types of counters added later. In 8.9.1 Hierarchical 
> Mapping of Shared Resources of SDM vol3A, it document 7 APIC_ID fields 
> (cluster, package, die, tile, module, core and SMT). There are only 4 
> types of counters for now.
> 
> 
> Thanks,
> Kan

