linux-kernel.vger.kernel.org archive mirror
* [PATCH V4 00/14] TopDown metrics support for Icelake
@ 2019-09-16 13:41 kan.liang
  2019-09-16 13:41 ` [PATCH V4 01/14] perf/x86/intel: Introduce the fourth fixed counter kan.liang
                   ` (14 more replies)
  0 siblings, 15 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Icelake has support for measuring the level 1 TopDown metrics
directly in hardware. This is implemented by an additional METRICS
register, and a new Fixed Counter 3 that measures pipeline SLOTS.

For the Ice Lake implementation of performance metrics, software
should start both PERF_METRICS and fixed counter 3 from zero.
Additionally, for certain scenarios that involve counting metrics at
high rates, software should periodically clear both in order to
maintain accurate measurements.
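
As a sketch, on the kernel side such a reset boils down to clearing the
two MSRs (using the MSR names added later in this series):

	/* Sketch: reset PERF_METRICS and fixed counter 3 together so
	 * the 8-bit metric fractions keep their precision. */
	wrmsrl(MSR_PERF_METRICS, 0);
	wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);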

IA32_PERF_GLOBAL_STATUS.OVF_PERF_METRICS[48]: if this bit is set, a
PERF_METRICS-related counter has overflowed and a PMI is triggered.
It is recommended to clear PERF_METRICS as well as fixed counter 3 in
that case; normally, an overflow of fixed counter 3 happens first. If
this bit is clear, no such overflow has occurred.
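
In the PMI handler this maps to a check of bit 48 in the overflow
status, roughly as patch 7 below implements it:

	/* OVF_PERF_METRICS: update all active topdown events; the
	 * update path also resets PERF_METRICS and fixed counter 3. */
	if (__test_and_clear_bit(48, (unsigned long *)&status)) {
		handled++;
		if (x86_pmu.update_topdown_event)
			x86_pmu.update_topdown_event(NULL);
	}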

New in Icelake
- Do not require generic counters. This allows TopDown to always be
  collected in addition to other events.
- Measure TopDown per thread/process instead of only per core

Limitation
- To get accurate results and avoid reading the METRICS register multiple
  times, the TopDown metrics events and the SLOTS event have to be in the
  same group (see the sketch below).
- The METRICS and SLOTS registers have to be cleared by SW after each
  read, to prevent loss of precision.
- Sampling read of the SLOTS and TopDown metric events together is not
  supported

Please refer to SDM Vol. 3, Section 18.3.9.3 "Performance Metrics" for
the details of the TopDown metrics.
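
For illustration, a minimal userspace sketch (not part of this series)
of the required grouping via perf_event_open(), with raw configs derived
from the encodings exported in patch 9 (config = (umask << 8) | event,
so slots = 0x0400 and topdown-retiring = 0x1000); error handling is
omitted:

	#include <linux/perf_event.h>
	#include <string.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int open_raw(__u64 config, int group_fd)
	{
		struct perf_event_attr attr;

		memset(&attr, 0, sizeof(attr));
		attr.size   = sizeof(attr);
		attr.type   = PERF_TYPE_RAW;
		attr.config = config;
		return syscall(__NR_perf_event_open, &attr, 0, -1, group_fd, 0);
	}

	int main(void)
	{
		/* SLOTS must lead the group; metric events are siblings. */
		int slots    = open_raw(0x0400, -1);
		int retiring = open_raw(0x1000, slots);

		return (slots < 0 || retiring < 0);
	}

In practice the leader would also set read_format = PERF_FORMAT_GROUP so
a single read() returns all values of the group.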

Changes since V3:
- Separate fixed counter3 definition patch
- Separate BTS index patch
- Apply Peter's cleanup patch
- Fix the name of perf capabilities for perf METRICS
- Apply patch for mul_u64_u32_div() x86_64 implementation
- Fix the check that unconditionally allowed collecting 4 extra events
- Add patch to clean up NMI handler by naming global status bit
- Add patch to reuse event_base_rdpmc for RDPMC userspace support

Changes since V2:
- Rebase on top of v5.3-rc1

Key changes since V1:
- Remove variables for reg_idx and the enabled_events[] array.
  The reg_idx can be calculated from idx at runtime.
  Use the existing active_mask to replace enabled_events.
- Choose value 47 for the fixed index of BTS.
- Support OVF_PERF_METRICS overflow bit in PMI handler
- Drop the caching mechanism and related variables.
  The new mechanism is to update all active slots/metrics events when the
  first slots/metrics event in a group is read. For each group read, the
  slots/perf_metrics MSRs are still read only once.
- Disable PMU for read of topdown events to avoid the NMI issue
- Move RDPMC support to a separate patch
- Using event=0x00,umask=0x1X for topdown metrics events
- Drop the patch which add REMOVE transaction
  We can detect x86_pmu_stop() by checking
  (event && !test_bit(event->hw.idx, cpuc->active_mask)),
  which is a good place to save the slots/metrics MSR values

Andi Kleen (2):
  perf, tools, stat: Support new per thread TopDown metrics
  perf, tools: Add documentation for topdown metrics

Kan Liang (11):
  perf/x86/intel: Introduce the fourth fixed counter
  perf/x86/intel: Set correct mask for TOPDOWN.SLOTS
  perf/x86/intel: Move BTS index to 47
  perf/x86/intel: Basic support for metrics counters
  perf/x86/intel: Fix the name of perf capabilities for perf METRICS
  perf/x86/intel: Support hardware TopDown metrics
  perf/x86/intel: Support per thread RDPMC TopDown metrics
  perf/x86/intel: Export TopDown events for Icelake
  perf/x86/intel: Disable sampling read slots and topdown
  perf/x86/intel: Name global status bit in NMI handler
  perf/x86: Use event_base_rdpmc for RDPMC userspace support

Peter Zijlstra (Intel) (1):
  x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64

 arch/x86/events/core.c                 |  80 ++++-
 arch/x86/events/intel/core.c           | 421 +++++++++++++++++++++++--
 arch/x86/events/perf_event.h           |  57 +++-
 arch/x86/include/asm/div64.h           |  13 +
 arch/x86/include/asm/msr-index.h       |   3 +
 arch/x86/include/asm/perf_event.h      |  48 ++-
 include/linux/perf_event.h             |   3 +
 tools/perf/Documentation/perf-stat.txt |   9 +-
 tools/perf/Documentation/topdown.txt   | 223 +++++++++++++
 tools/perf/builtin-stat.c              |  24 ++
 tools/perf/util/stat-shadow.c          |  89 ++++++
 tools/perf/util/stat.c                 |   4 +
 tools/perf/util/stat.h                 |   8 +
 13 files changed, 924 insertions(+), 58 deletions(-)
 create mode 100644 tools/perf/Documentation/topdown.txt

-- 
2.17.1



* [PATCH V4 01/14] perf/x86/intel: Introduce the fourth fixed counter
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 02/14] perf/x86/intel: Set correct mask for TOPDOWN.SLOTS kan.liang
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The fourth fixed counter, TOPDOWN.SLOTS, is introduced in Ice Lake.

Add MSR address and macros for the new fixed counter, which will be used
in the following patch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

New patch for V4

 arch/x86/include/asm/perf_event.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index ee26e9215f18..9f15a700d1db 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -146,12 +146,12 @@ struct x86_pmu_capability {
  */
 
 /*
- * All 3 fixed-mode PMCs are configured via this single MSR:
+ * All 4 fixed-mode PMCs are configured via this single MSR:
  */
 #define MSR_ARCH_PERFMON_FIXED_CTR_CTRL	0x38d
 
 /*
- * The counts are available in three separate MSRs:
+ * The counts are available in four separate MSRs:
  */
 
 /* Instr_Retired.Any: */
@@ -167,6 +167,11 @@ struct x86_pmu_capability {
 #define INTEL_PMC_IDX_FIXED_REF_CYCLES	(INTEL_PMC_IDX_FIXED + 2)
 #define INTEL_PMC_MSK_FIXED_REF_CYCLES	(1ULL << INTEL_PMC_IDX_FIXED_REF_CYCLES)
 
+/* TOPDOWN.SLOTS: */
+#define MSR_ARCH_PERFMON_FIXED_CTR3	0x30c
+#define INTEL_PMC_IDX_FIXED_SLOTS	(INTEL_PMC_IDX_FIXED + 3)
+#define INTEL_PMC_MSK_FIXED_SLOTS	(1ULL << INTEL_PMC_IDX_FIXED_SLOTS)
+
 /*
  * We model BTS tracing as another fixed-mode PMC.
  *
-- 
2.17.1



* [PATCH V4 02/14] perf/x86/intel: Set correct mask for TOPDOWN.SLOTS
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
  2019-09-16 13:41 ` [PATCH V4 01/14] perf/x86/intel: Introduce the fourth fixed counter kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 03/14] perf/x86/intel: Move BTS index to 47 kan.liang
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

TOPDOWN.SLOTS (0x0400) is not a generic event. It is only available on
fixed counter 3.

Don't extend its mask to generic counters.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V3:
- Separate fixed counter3 definition patch

 arch/x86/events/intel/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 43c966d1208e..f0208ee554fc 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5106,12 +5106,14 @@ __init int intel_pmu_init(void)
 
 	if (x86_pmu.event_constraints) {
 		/*
-		 * event on fixed counter2 (REF_CYCLES) only works on this
+		 * events on fixed counter2 (REF_CYCLES) and
+		 * fixed counter3 (TOPDOWN.SLOTS) only work on their own
 		 * counter, so do not extend mask to generic counters
 		 */
 		for_each_event_constraint(c, x86_pmu.event_constraints) {
 			if (c->cmask == FIXED_EVENT_FLAGS
-			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) {
+			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES
+			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_SLOTS) {
 				c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
 			}
 			c->idxmsk64 &=
-- 
2.17.1



* [PATCH V4 03/14] perf/x86/intel: Move BTS index to 47
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
  2019-09-16 13:41 ` [PATCH V4 01/14] perf/x86/intel: Introduce the fourth fixed counter kan.liang
  2019-09-16 13:41 ` [PATCH V4 02/14] perf/x86/intel: Set correct mask for TOPDOWN.SLOTS kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 04/14] perf/x86/intel: Basic support for metrics counters kan.liang
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Bit 48 in PERF_GLOBAL_STATUS is now used to indicate the overflow
status of the PERF_METRICS counters.

Move the BTS index to 47.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

New patch for V4

 arch/x86/include/asm/perf_event.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 9f15a700d1db..eb460b63e07b 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -175,11 +175,11 @@ struct x86_pmu_capability {
 /*
  * We model BTS tracing as another fixed-mode PMC.
  *
- * We choose a value in the middle of the fixed event range, since lower
+ * We choose value 47 for the fixed index of BTS, since lower
  * values are used by actual fixed events and higher values are used
  * to indicate other overflow conditions in the PERF_GLOBAL_STATUS msr.
  */
-#define INTEL_PMC_IDX_FIXED_BTS				(INTEL_PMC_IDX_FIXED + 16)
+#define INTEL_PMC_IDX_FIXED_BTS				(INTEL_PMC_IDX_FIXED + 15)
 
 #define GLOBAL_STATUS_COND_CHG				BIT_ULL(63)
 #define GLOBAL_STATUS_BUFFER_OVF			BIT_ULL(62)
-- 
2.17.1



* [PATCH V4 04/14] perf/x86/intel: Basic support for metrics counters
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (2 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 03/14] perf/x86/intel: Move BTS index to 47 kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 05/14] perf/x86/intel: Fix the name of perf capabilities for perf METRICS kan.liang
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Metrics counters (hardware counters containing multiple metrics)
are modeled as a separate register for each TopDown metric event,
with an extra reg being used for coordinating access to the
underlying register in the scheduler.

Add the basic infrastructure to separate the scheduler register indexes
from the actual hardware register indexes. In most cases the MSR address
is already used correctly, but for code using indexes we need to
calculate the correct underlying register.

The TopDown metric events share fixed counter 3, which therefore only
needs to be enabled/disabled once for all of them.

Naming:
The events which use metrics counters are called TopDown metric
events, or metric events, in the code.
Fixed counter 3 is called the TopDown slots event, or slots event.
"Topdown events" stands for metric events + slots event in the code.

Thanks to Peter Zijlstra for cleaning up the patch. All the topdown
support has been properly placed in the fixed counter functions, so
is_topdown_idx() only needs to be checked once. Also, clean up
x86_assign_hw_event() by converting multiple if-else statements to a
switch statement.
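
For orientation, the scheduler index space after this patch looks
roughly like this (a sketch; INTEL_PMC_IDX_FIXED is 32):

	/*
	 *  0..31   generic counters
	 * 32..35   fixed counters 0-3 (35 = FxCtr3, SLOTS)
	 * 47       BTS pseudo counter (moved by the previous patch)
	 * 48..51   metric pseudo counters, all backed by FxCtr3
	 */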

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V3:
- Separate BTS index patch
- Apply Peter's cleanup patch

 arch/x86/events/core.c            | 23 ++++++--
 arch/x86/events/intel/core.c      | 98 ++++++++++++++++++++++---------
 arch/x86/events/perf_event.h      | 14 +++++
 arch/x86/include/asm/msr-index.h  |  1 +
 arch/x86/include/asm/perf_event.h | 28 +++++++++
 5 files changed, 129 insertions(+), 35 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 15b90b1a8fb1..fc8b0cbe5dc1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1054,22 +1054,33 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 				struct cpu_hw_events *cpuc, int i)
 {
 	struct hw_perf_event *hwc = &event->hw;
+	int idx;
 
-	hwc->idx = cpuc->assign[i];
+	idx = hwc->idx = cpuc->assign[i];
 	hwc->last_cpu = smp_processor_id();
 	hwc->last_tag = ++cpuc->tags[i];
 
-	if (hwc->idx == INTEL_PMC_IDX_FIXED_BTS) {
+	switch (hwc->idx) {
+	case INTEL_PMC_IDX_FIXED_BTS:
 		hwc->config_base = 0;
 		hwc->event_base	= 0;
-	} else if (hwc->idx >= INTEL_PMC_IDX_FIXED) {
+		break;
+
+	case INTEL_PMC_IDX_FIXED_METRIC_BASE ... INTEL_PMC_IDX_FIXED_METRIC_BASE + 3:
+		/* All METRIC events are mapped onto the fixed SLOTS event */
+		idx = INTEL_PMC_IDX_FIXED_SLOTS;
+		/* fall through */
+	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
 		hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
-		hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 + (hwc->idx - INTEL_PMC_IDX_FIXED);
-		hwc->event_base_rdpmc = (hwc->idx - INTEL_PMC_IDX_FIXED) | 1<<30;
-	} else {
+		hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 + (idx - INTEL_PMC_IDX_FIXED);
+		hwc->event_base_rdpmc = (idx - INTEL_PMC_IDX_FIXED) | 1<<30;
+		break;
+
+	default:
 		hwc->config_base = x86_pmu_config_addr(hwc->idx);
 		hwc->event_base  = x86_pmu_event_addr(hwc->idx);
 		hwc->event_base_rdpmc = x86_pmu_rdpmc_index(hwc->idx);
+		break;
 	}
 }
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index f0208ee554fc..c7e093e5abd2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2129,27 +2129,60 @@ static inline void intel_pmu_ack_status(u64 ack)
 	wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, ack);
 }
 
-static void intel_pmu_disable_fixed(struct hw_perf_event *hwc)
+static inline bool event_is_checkpointed(struct perf_event *event)
+{
+	return unlikely(event->hw.config & HSW_IN_TX_CHECKPOINTED) != 0;
+}
+
+static inline void intel_set_masks(struct perf_event *event, int idx)
 {
-	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	if (event->attr.exclude_host)
+		__set_bit(idx, (unsigned long *)&cpuc->intel_ctrl_guest_mask);
+	if (event->attr.exclude_guest)
+		__set_bit(idx, (unsigned long *)&cpuc->intel_ctrl_host_mask);
+	if (event_is_checkpointed(event))
+		__set_bit(idx, (unsigned long *)&cpuc->intel_cp_status);
+}
+
+static inline void intel_clear_masks(struct perf_event *event, int idx)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	__clear_bit(idx, (unsigned long *)&cpuc->intel_ctrl_guest_mask);
+	__clear_bit(idx, (unsigned long *)&cpuc->intel_ctrl_host_mask);
+	__clear_bit(idx, (unsigned long *)&cpuc->intel_cp_status);
+}
+
+static void intel_pmu_disable_fixed(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
 	u64 ctrl_val, mask;
+	int idx = hwc->idx;
 
-	mask = 0xfULL << (idx * 4);
+	if (is_topdown_idx(idx)) {
+		struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+		/*
+		 * When there are other Top-Down events still active,
+		 * don't disable the SLOTS counter.
+		 */
+		if (*(u64 *)cpuc->active_mask & INTEL_PMC_OTHER_TOPDOWN_BITS(idx))
+			return;
+		idx = INTEL_PMC_IDX_FIXED_SLOTS;
+	}
 
+	intel_clear_masks(event, idx);
+
+	mask = 0xfULL << ((idx - INTEL_PMC_IDX_FIXED) * 4);
 	rdmsrl(hwc->config_base, ctrl_val);
 	ctrl_val &= ~mask;
 	wrmsrl(hwc->config_base, ctrl_val);
 }
 
-static inline bool event_is_checkpointed(struct perf_event *event)
-{
-	return (event->hw.config & HSW_IN_TX_CHECKPOINTED) != 0;
-}
-
 static void intel_pmu_disable_event(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
 	if (unlikely(hwc->idx == INTEL_PMC_IDX_FIXED_BTS)) {
 		intel_pmu_disable_bts();
@@ -2157,18 +2190,19 @@ static void intel_pmu_disable_event(struct perf_event *event)
 		return;
 	}
 
-	cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx);
-	cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx);
-	cpuc->intel_cp_status &= ~(1ull << hwc->idx);
-
-	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL))
-		intel_pmu_disable_fixed(hwc);
-	else
+	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
+		intel_pmu_disable_fixed(event);
+	} else {
+		intel_clear_masks(event, hwc->idx);
 		x86_pmu_disable_event(event);
+	}
 
 	/*
 	 * Needs to be called after x86_pmu_disable_event,
 	 * so we don't trigger the event without PEBS bit set.
+	 *
+	 * Metric stuff doesn't do PEBS. So the early exit from
+	 * intel_pmu_disable_fixed() is OK.
 	 */
 	if (unlikely(event->attr.precise_ip))
 		intel_pmu_pebs_disable(event);
@@ -2193,8 +2227,22 @@ static void intel_pmu_read_event(struct perf_event *event)
 static void intel_pmu_enable_fixed(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
 	u64 ctrl_val, mask, bits = 0;
+	int idx = hwc->idx;
+
+	if (is_topdown_idx(idx)) {
+		struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+		/*
+		 * When there are other Top-Down events already active,
+		 * don't enable the SLOTS counter.
+		 */
+		if (*(u64 *)cpuc->active_mask & INTEL_PMC_OTHER_TOPDOWN_BITS(idx))
+			return;
+
+		idx = INTEL_PMC_IDX_FIXED_SLOTS;
+	}
+
+	intel_set_masks(event, idx);
 
 	/*
 	 * Enable IRQ generation (0x8), if not PEBS,
@@ -2214,6 +2262,7 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
 	if (x86_pmu.version > 2 && hwc->config & ARCH_PERFMON_EVENTSEL_ANY)
 		bits |= 0x4;
 
+	idx -= INTEL_PMC_IDX_FIXED;
 	bits <<= (idx * 4);
 	mask = 0xfULL << (idx * 4);
 
@@ -2231,7 +2280,6 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
 static void intel_pmu_enable_event(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
 	if (unlikely(hwc->idx == INTEL_PMC_IDX_FIXED_BTS)) {
 		if (!__this_cpu_read(cpu_hw_events.enabled))
@@ -2241,23 +2289,15 @@ static void intel_pmu_enable_event(struct perf_event *event)
 		return;
 	}
 
-	if (event->attr.exclude_host)
-		cpuc->intel_ctrl_guest_mask |= (1ull << hwc->idx);
-	if (event->attr.exclude_guest)
-		cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);
-
-	if (unlikely(event_is_checkpointed(event)))
-		cpuc->intel_cp_status |= (1ull << hwc->idx);
-
 	if (unlikely(event->attr.precise_ip))
 		intel_pmu_pebs_enable(event);
 
 	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
 		intel_pmu_enable_fixed(event);
-		return;
+	} else {
+		intel_set_masks(event, hwc->idx);
+		__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
 	}
-
-	__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
 }
 
 static void intel_pmu_add_event(struct perf_event *event)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index ecacfbf4ebc1..116cfc5c3293 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -356,6 +356,20 @@ struct cpu_hw_events {
 #define FIXED_EVENT_CONSTRAINT(c, n)	\
 	EVENT_CONSTRAINT(c, (1ULL << (32+n)), FIXED_EVENT_FLAGS)
 
+/*
+ * Special metric counters do not actually exist, but get remapped
+ * to a combination of FxCtr3 + MSR_PERF_METRICS
+ *
+ * This allocates them to a dummy offset for the scheduler.
+ * This does not allow sharing of multiple users of the same
+ * metric without multiplexing, even though the hardware supports that
+ * in principle.
+ */
+
+#define METRIC_EVENT_CONSTRAINT(c, n)						\
+	EVENT_CONSTRAINT(c, (1ULL << (INTEL_PMC_IDX_FIXED_METRIC_BASE+n)),	\
+	FIXED_EVENT_FLAGS)
+
 /*
  * Constraint on the Event code + UMask
  */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index de753206b427..5ebfd97d4ac9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -798,6 +798,7 @@
 #define MSR_CORE_PERF_FIXED_CTR0	0x00000309
 #define MSR_CORE_PERF_FIXED_CTR1	0x0000030a
 #define MSR_CORE_PERF_FIXED_CTR2	0x0000030b
+#define MSR_CORE_PERF_FIXED_CTR3	0x0000030c
 #define MSR_CORE_PERF_FIXED_CTR_CTRL	0x0000038d
 #define MSR_CORE_PERF_GLOBAL_STATUS	0x0000038e
 #define MSR_CORE_PERF_GLOBAL_CTRL	0x0000038f
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index eb460b63e07b..31b96e14935b 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -181,6 +181,34 @@ struct x86_pmu_capability {
  */
 #define INTEL_PMC_IDX_FIXED_BTS				(INTEL_PMC_IDX_FIXED + 15)
 
+/*
+ * We model PERF_METRICS as more magic fixed-mode PMCs, one for each metric
+ *
+ * Internally they all map to Fixed Ctr 3 (SLOTS), and allocate PERF_METRICS
+ * as an extra_reg. PERF_METRICS has no own configuration, but we fill in
+ * the configuration of FxCtr3 to enforce that all the shared users of SLOTS
+ * have the same configuration.
+ */
+#define INTEL_PMC_IDX_FIXED_METRIC_BASE		(INTEL_PMC_IDX_FIXED + 16)
+#define INTEL_PMC_IDX_TD_RETIRING		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 0)
+#define INTEL_PMC_IDX_TD_BAD_SPEC		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 1)
+#define INTEL_PMC_IDX_TD_FE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 2)
+#define INTEL_PMC_IDX_TD_BE_BOUND		(INTEL_PMC_IDX_FIXED_METRIC_BASE + 3)
+#define INTEL_PMC_MSK_TOPDOWN			((0xfull << INTEL_PMC_IDX_FIXED_METRIC_BASE) | \
+						 INTEL_PMC_MSK_FIXED_SLOTS)
+
+static inline bool is_metric_idx(int idx)
+{
+	return (unsigned)(idx - INTEL_PMC_IDX_FIXED_METRIC_BASE) < 4;
+}
+
+static inline bool is_topdown_idx(int idx)
+{
+	return is_metric_idx(idx) || idx == INTEL_PMC_IDX_FIXED_SLOTS;
+}
+
+#define INTEL_PMC_OTHER_TOPDOWN_BITS(bit)	(~(0x1ull << bit) & INTEL_PMC_MSK_TOPDOWN)
+
 #define GLOBAL_STATUS_COND_CHG				BIT_ULL(63)
 #define GLOBAL_STATUS_BUFFER_OVF			BIT_ULL(62)
 #define GLOBAL_STATUS_UNC_OVF				BIT_ULL(61)
-- 
2.17.1



* [PATCH V4 05/14] perf/x86/intel: Fix the name of perf capabilities for perf METRICS
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (3 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 04/14] perf/x86/intel: Basic support for metrics counters kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 06/14] x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64 kan.liang
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Bit 15 of the PERF_CAPABILITIES MSR indicates that the architecture
provides built-in support for perf METRICS. Perf METRICS is not a PEBS
feature.

Rename pebs_metrics_available to perf_metrics.

Nothing uses the bit in the current code; the following patch will use it.
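
A later patch in this series gates the topdown code on this bit,
roughly:

	if (x86_pmu.intel_cap.perf_metrics)
		x86_pmu.intel_ctrl |= 1ULL << 48;	/* OVF_PERF_METRICS */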

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

New patch for V4

 arch/x86/events/perf_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 116cfc5c3293..cbbead3c12dc 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -532,7 +532,7 @@ union perf_capabilities {
 		 */
 		u64	full_width_write:1;
 		u64     pebs_baseline:1;
-		u64	pebs_metrics_available:1;
+		u64	perf_metrics:1;
 		u64	pebs_output_pt_available:1;
 	};
 	u64	capabilities;
-- 
2.17.1



* [PATCH V4 06/14] x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (4 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 05/14] perf/x86/intel: Fix the name of perf capabilities for perf METRICS kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics kan.liang
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak

From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

On x86_64 we can do a u64 * u64 -> u128 widening multiply followed by
a u128 / u64 -> u64 division to implement a sane version of
mul_u64_u32_div().
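
Patch 7 uses this to scale the SLOTS count by an 8-bit metric fraction
without losing the high bits; a fragment from its
icl_get_metrics_event_value():

	/* slots * val may exceed 64 bits; the u128 widening multiply
	 * keeps full precision before the divide by 0xff. */
	return mul_u64_u32_div(slots, val, 0xff);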

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---

New patch for V4

 arch/x86/include/asm/div64.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/div64.h b/arch/x86/include/asm/div64.h
index 20a46150e0a8..9b8cb50768c2 100644
--- a/arch/x86/include/asm/div64.h
+++ b/arch/x86/include/asm/div64.h
@@ -73,6 +73,19 @@ static inline u64 mul_u32_u32(u32 a, u32 b)
 
 #else
 # include <asm-generic/div64.h>
+
+static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 div)
+{
+	u64 q;
+
+	asm ("mulq %2; divq %3" : "=a" (q)
+				: "a" (a), "rm" ((u64)mul), "rm" ((u64)div)
+				: "rdx");
+
+	return q;
+}
+#define mul_u64_u32_div	mul_u64_u32_div
+
 #endif /* CONFIG_X86_32 */
 
 #endif /* _ASM_X86_DIV64_H */
-- 
2.17.1



* [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (5 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 06/14] x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64 kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-30 13:06   ` Peter Zijlstra
  2019-09-30 13:36   ` Peter Zijlstra
  2019-09-16 13:41 ` [PATCH V4 08/14] perf/x86/intel: Support per thread RDPMC " kan.liang
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Intro
=====

Icelake has support for measuring the four top level TopDown metrics
directly in hardware. This is implemented by an additional "metrics"
register, and a new Fixed Counter 3 that measures pipeline "slots".

Events
======

We export four metric events as separate perf events, which map to the
internal "metrics" counter register. Those events do not exist in
hardware, but can be allocated by the scheduler.

For the event mapping we use a special 0x00 event code, which is
reserved for fake events. The metric events start from umask 0x10.

When setting up such events, they point to the slots counter, and a
special callback, update_topdown_event(), reads the additional metrics
MSR to generate the metrics. Each metric is then reported by multiplying
the metric (fraction) with the slots count.

This multiplication makes it easy to keep a running count, for example
when the slots counter overflows, and makes all the standard tools, such
as perf stat, work. They can compute deltas of the values without
needing to know about the fractions. This also simplifies accumulating
the counts of child events, which would otherwise need to know how to
average fraction values.
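
A quick worked example with made-up numbers:

	u64 slots = 1000000, metrics = 0x80;	/* made-up values */
	u64 val, delta;

	val   = (metrics >> 0) & 0xff;	/* retiring is byte 0: 128 */
	delta = mul_u64_u32_div(slots, val, 0xff);
					/* 1000000 * 128 / 255 ~= 501960 */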

None of the four metric events supports sampling. Since they are
handled specially on event update, a flag, PERF_X86_EVENT_TOPDOWN, is
introduced to indicate this case.

The slots event supports both sampling and counting.
For counting, the flag is applied as well.
For sampling, it is handled normally, like other normal events.

Groups
======

To avoid reading the METRICS register multiple times, the metrics and
slots values can only be updated by the first slots/metrics event in a
group. All active slots and metrics events are updated at that time.

Reset
======

Both PERF_METRICS and fixed counter 3 are required to start from zero.
However, perf currently uses -max_period as the default initial value.
The patch forces the initial value to 0 for topdown and slots event
counting.

Additionally, for certain scenarios that involve counting metrics at
high rates, SW is required to periodically clear both MSRs in order to
maintain accurate measurements. In this patch, both PERF_METRICS and
Fixed counter 3 are reset for each read.

NMI
======

The METRICS-related registers may overflow, in which case bit 48 of the
STATUS register is set. If so, PERF_METRICS and fixed counter 3 must be
reset. The patch also updates all active slots and metrics events in the
NMI handler.

update_topdown_event() has to read two registers separately; the values
may be modified by an NMI in between. The PMU has to be disabled before
calling the function.
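
Concretely, the read path wraps the update in a disable/enable pair
(see intel_pmu_read_topdown_event() below):

	perf_pmu_disable(event->pmu);
	x86_pmu.update_topdown_event(event);
	perf_pmu_enable(event->pmu);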

RDPMC
======

RDPMC is temporarily disabled. The following patch will enable it.

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V3:
- Modify the description
- Fix unconditionally allows collecting 4 extra events

 arch/x86/events/core.c           |  38 ++++++
 arch/x86/events/intel/core.c     | 215 ++++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h     |  39 ++++++
 arch/x86/include/asm/msr-index.h |   2 +
 4 files changed, 290 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index fc8b0cbe5dc1..283de3e548b4 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -76,6 +76,9 @@ u64 x86_perf_event_update(struct perf_event *event)
 	if (idx == INTEL_PMC_IDX_FIXED_BTS)
 		return 0;
 
+	/* Specially handle the counting of Topdown slots/metrics */
+	if (unlikely(is_topdown_count(event)) && x86_pmu.update_topdown_event)
+		return x86_pmu.update_topdown_event(event);
 	/*
 	 * Careful: an NMI might modify the previous event value.
 	 *
@@ -992,6 +995,26 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	return unsched ? -EINVAL : 0;
 }
 
+static int add_nr_metric_event(struct cpu_hw_events *cpuc,
+			       struct perf_event *event,
+			       int *max_count)
+{
+	/* There are 4 TopDown metrics events. */
+	if (is_metric_event(event) && (++cpuc->n_metric_event > 4))
+		return -EINVAL;
+
+	*max_count += cpuc->n_metric_event;
+
+	return 0;
+}
+
+static void del_nr_metric_event(struct cpu_hw_events *cpuc,
+				struct perf_event *event)
+{
+	if (is_metric_event(event))
+		cpuc->n_metric_event--;
+}
+
 /*
  * dogrp: true if must collect siblings events (group)
  * returns total number of events and error code
@@ -1027,6 +1050,10 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,
 		cpuc->pebs_output = is_pebs_pt(leader) + 1;
 	}
 
+	if (x86_pmu.intel_cap.perf_metrics &&
+	    add_nr_metric_event(cpuc, leader, &max_count))
+			return -EINVAL;
+
 	if (is_x86_event(leader)) {
 		if (n >= max_count)
 			return -EINVAL;
@@ -1041,6 +1068,10 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,
 		    event->state <= PERF_EVENT_STATE_OFF)
 			continue;
 
+		if (x86_pmu.intel_cap.perf_metrics &&
+		    add_nr_metric_event(cpuc, event, &max_count))
+			return -EINVAL;
+
 		if (n >= max_count)
 			return -EINVAL;
 
@@ -1204,6 +1235,11 @@ int x86_perf_event_set_period(struct perf_event *event)
 	if (idx == INTEL_PMC_IDX_FIXED_BTS)
 		return 0;
 
+	/* Specially handle the counting of Topdown slots/metrics */
+	if (unlikely(is_topdown_count(event)) &&
+	    x86_pmu.set_topdown_event_period)
+		return x86_pmu.set_topdown_event_period(event);
+
 	/*
 	 * If we are way outside a reasonable range then just skip forward:
 	 */
@@ -1481,6 +1517,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	}
 	cpuc->event_constraint[i-1] = NULL;
 	--cpuc->n_events;
+	if (x86_pmu.intel_cap.perf_metrics)
+		del_nr_metric_event(cpuc, event);
 
 	perf_event_update_userpage(event);
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c7e093e5abd2..71f3086a8adc 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -247,6 +247,10 @@ static struct event_constraint intel_icl_event_constraints[] = {
 	FIXED_EVENT_CONSTRAINT(0x003c, 1),	/* CPU_CLK_UNHALTED.CORE */
 	FIXED_EVENT_CONSTRAINT(0x0300, 2),	/* CPU_CLK_UNHALTED.REF */
 	FIXED_EVENT_CONSTRAINT(0x0400, 3),	/* SLOTS */
+	METRIC_EVENT_CONSTRAINT(0x1000, 0),	/* Retiring metric */
+	METRIC_EVENT_CONSTRAINT(0x1100, 1),	/* Bad speculation metric */
+	METRIC_EVENT_CONSTRAINT(0x1200, 2),	/* FE bound metric */
+	METRIC_EVENT_CONSTRAINT(0x1300, 3),	/* BE bound metric */
 	INTEL_EVENT_CONSTRAINT_RANGE(0x03, 0x0a, 0xf),
 	INTEL_EVENT_CONSTRAINT_RANGE(0x1f, 0x28, 0xf),
 	INTEL_EVENT_CONSTRAINT(0x32, 0xf),	/* SW_PREFETCH_ACCESS.* */
@@ -267,6 +271,14 @@ static struct extra_reg intel_icl_extra_regs[] __read_mostly = {
 	INTEL_UEVENT_EXTRA_REG(0x01bb, MSR_OFFCORE_RSP_1, 0x3fffffbfffull, RSP_1),
 	INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd),
 	INTEL_UEVENT_EXTRA_REG(0x01c6, MSR_PEBS_FRONTEND, 0x7fff17, FE),
+	/*
+	 * The original Fixed Ctr 3 is shared by different metrics
+	 * events. So use the extra reg to enforce the same
+	 * configuration on the original register, but do not actually
+	 * write to it.
+	 */
+	INTEL_UEVENT_EXTRA_REG(0x0400, 0, -1L, TOPDOWN),
+	INTEL_UEVENT_TOPDOWN_EXTRA_REG(0x1000),
 	EVENT_EXTRA_END
 };
 
@@ -2216,10 +2228,148 @@ static void intel_pmu_del_event(struct perf_event *event)
 		intel_pmu_pebs_del(event);
 }
 
+static bool is_first_topdown_event_in_group(struct perf_event *event)
+{
+	struct perf_event *first = NULL;
+
+	if (is_topdown_event(event->group_leader))
+		first = event->group_leader;
+	else {
+		for_each_sibling_event(first, event->group_leader)
+			if (is_topdown_event(first))
+				break;
+	}
+
+	if (event == first)
+		return true;
+
+	return false;
+}
+
+static int icl_set_topdown_event_period(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	s64 left = local64_read(&hwc->period_left);
+
+	/*
+	 * Clear PERF_METRICS and Fixed counter 3 in initialization.
+	 * After that, both MSRs will be cleared for each read.
+	 * Don't need to clear them again.
+	 */
+	if (left == x86_pmu.max_period) {
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+		wrmsrl(MSR_PERF_METRICS, 0);
+		local64_set(&hwc->period_left, 0);
+	}
+
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static u64 icl_get_metrics_event_value(u64 metric, u64 slots, int idx)
+{
+	u32 val;
+
+	/*
+	 * The metric is reported as an 8bit integer fraction
+	 * suming up to 0xff.
+	 * summing up to 0xff.
+	 */
+	val = (metric >> ((idx - INTEL_PMC_IDX_FIXED_METRIC_BASE) * 8)) & 0xff;
+	return  mul_u64_u32_div(slots, val, 0xff);
+}
+
+static void __icl_update_topdown_event(struct perf_event *event,
+				       u64 slots, u64 metrics)
+{
+	int idx = event->hw.idx;
+	u64 delta;
+
+	if (is_metric_idx(idx))
+		delta = icl_get_metrics_event_value(metrics, slots, idx);
+	else
+		delta = slots;
+
+	local64_add(delta, &event->count);
+}
+
+/*
+ * Update all active Topdown events.
+ *
+ * The PERF_METRICS and Fixed counter 3 are read separately. The values may be
+ * modified by an NMI. The PMU has to be disabled before calling this function.
+ */
+static u64 icl_update_topdown_event(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_event *other;
+	u64 slots, metrics;
+	int idx;
+
+	/*
+	 * Only need to update all events for the first
+	 * slots/metrics event in a group
+	 */
+	if (event && !is_first_topdown_event_in_group(event))
+		return 0;
+
+	/* read Fixed counter 3 */
+	rdpmcl((3 | 1<<30), slots);
+	if (!slots)
+		return 0;
+
+	/* read PERF_METRICS */
+	rdpmcl((1<<29), metrics);
+
+	for_each_set_bit(idx, cpuc->active_mask, INTEL_PMC_IDX_TD_BE_BOUND + 1) {
+		if (!is_topdown_idx(idx))
+			continue;
+		other = cpuc->events[idx];
+		__icl_update_topdown_event(other, slots, metrics);
+	}
+
+	/*
+	 * Check and update this event, which may have been cleared
+	 * in active_mask e.g. x86_pmu_stop()
+	 */
+	if (event && !test_bit(event->hw.idx, cpuc->active_mask))
+		__icl_update_topdown_event(event, slots, metrics);
+
+	/*
+	 * To avoid the known issues as below, the PERF_METRICS and
+	 * Fixed counter 3 are reset for each read.
+	 * - The 8bit metrics ratio values lose precision when the
+	 *   measurement period gets longer.
+	 * - The PERF_METRICS may report wrong value if its delta was
+	 *   less than 1/255 of Fixed counter 3.
+	 */
+	wrmsrl(MSR_PERF_METRICS, 0);
+	wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+
+	return slots;
+}
+
+static void intel_pmu_read_topdown_event(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	/* Only need to call update_topdown_event() once for group read. */
+	if ((cpuc->txn_flags & PERF_PMU_TXN_READ) &&
+	    !is_first_topdown_event_in_group(event))
+		return;
+
+	perf_pmu_disable(event->pmu);
+	x86_pmu.update_topdown_event(event);
+	perf_pmu_enable(event->pmu);
+}
+
 static void intel_pmu_read_event(struct perf_event *event)
 {
 	if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
 		intel_pmu_auto_reload_read(event);
+	else if (is_topdown_count(event) && x86_pmu.update_topdown_event)
+		intel_pmu_read_topdown_event(event);
 	else
 		x86_perf_event_update(event);
 }
@@ -2431,6 +2581,15 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 			intel_pt_interrupt();
 	}
 
+	/*
+	 * Intel perf metrics
+	 */
+	if (__test_and_clear_bit(48, (unsigned long *)&status)) {
+		handled++;
+		if (x86_pmu.update_topdown_event)
+			x86_pmu.update_topdown_event(NULL);
+	}
+
 	/*
 	 * Checkpointed counters can lead to 'spurious' PMIs because the
 	 * rollback caused by the PMI will have cleared the overflow status
@@ -3349,6 +3508,42 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
 
+	/*
+	 * Config Topdown slots and metric events
+	 *
+	 * The slots event on Fixed Counter 3 can support sampling,
+	 * which will be handled normally in x86_perf_event_update().
+	 *
+	 * The metric events don't support sampling.
+	 *
+	 * For counting, topdown slots and metric events will be
+	 * handled specially for event update.
+	 * A flag PERF_X86_EVENT_TOPDOWN is applied for the case.
+	 */
+	if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
+		if (is_metric_event(event) && is_sampling_event(event))
+			return -EINVAL;
+
+		if (!is_sampling_event(event)) {
+			if (event->attr.config1 != 0)
+				return -EINVAL;
+			if (event->attr.config & ARCH_PERFMON_EVENTSEL_ANY)
+				return -EINVAL;
+			/*
+			 * Put configuration (minus event) into config1 so that
+			 * the scheduler enforces through an extra_reg that
+			 * all instances of the metrics events have the same
+			 * configuration.
+			 */
+			event->attr.config1 = event->hw.config &
+					      X86_ALL_EVENT_FLAGS;
+			event->hw.flags |= PERF_X86_EVENT_TOPDOWN;
+
+			if (is_metric_event(event))
+				event->hw.flags &= ~PERF_X86_EVENT_RDPMC_ALLOWED;
+		}
+	}
+
 	if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
 		return 0;
 
@@ -5095,6 +5290,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.rtm_abort_event = X86_CONFIG(.event=0xca, .umask=0x02);
 		x86_pmu.lbr_pt_coexist = true;
 		intel_pmu_pebs_data_source_skl(pmem);
+		x86_pmu.update_topdown_event = icl_update_topdown_event;
+		x86_pmu.set_topdown_event_period = icl_set_topdown_event_period;
 		pr_cont("Icelake events, ");
 		name = "icelake";
 		break;
@@ -5151,10 +5348,17 @@ __init int intel_pmu_init(void)
 		 * counter, so do not extend mask to generic counters
 		 */
 		for_each_event_constraint(c, x86_pmu.event_constraints) {
-			if (c->cmask == FIXED_EVENT_FLAGS
-			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES
-			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_SLOTS) {
-				c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
+			if (c->cmask == FIXED_EVENT_FLAGS) {
+				/*
+				 * Don't extend topdown slots and metrics
+				 * events to generic counters.
+				 */
+				if (c->idxmsk64 & INTEL_PMC_MSK_TOPDOWN) {
+					c->weight = hweight64(c->idxmsk64);
+					continue;
+				}
+				if (c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES)
+					c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
 			}
 			c->idxmsk64 &=
 				~(~0ULL << (INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed));
@@ -5207,6 +5411,9 @@ __init int intel_pmu_init(void)
 	if (x86_pmu.counter_freezing)
 		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
 
+	if (x86_pmu.intel_cap.perf_metrics)
+		x86_pmu.intel_ctrl |= 1ULL << 48;
+
 	return 0;
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index cbbead3c12dc..dbe969ed4403 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -40,6 +40,7 @@ enum extra_reg_type {
 	EXTRA_REG_LBR   = 2,	/* lbr_select */
 	EXTRA_REG_LDLAT = 3,	/* ld_lat_threshold */
 	EXTRA_REG_FE    = 4,    /* fe_* */
+	EXTRA_REG_TOPDOWN = 5,	/* Topdown slots/metrics */
 
 	EXTRA_REG_MAX		/* number of entries needed */
 };
@@ -77,6 +78,29 @@ static inline bool constraint_match(struct event_constraint *c, u64 ecode)
 #define PERF_X86_EVENT_AUTO_RELOAD	0x0200 /* use PEBS auto-reload */
 #define PERF_X86_EVENT_LARGE_PEBS	0x0400 /* use large PEBS */
 #define PERF_X86_EVENT_PEBS_VIA_PT	0x0800 /* use PT buffer for PEBS */
+#define PERF_X86_EVENT_TOPDOWN		0x1000 /* Count Topdown slots/metrics events */
+
+static inline bool is_topdown_count(struct perf_event *event)
+{
+	return event->hw.flags & PERF_X86_EVENT_TOPDOWN;
+}
+
+static inline bool is_metric_event(struct perf_event *event)
+{
+	return ((event->attr.config & ARCH_PERFMON_EVENTSEL_EVENT) == 0) &&
+	       ((event->attr.config & INTEL_ARCH_EVENT_MASK) >= 0x1000)  &&
+	       ((event->attr.config & INTEL_ARCH_EVENT_MASK) <= 0x1300);
+}
+
+static inline bool is_slots_event(struct perf_event *event)
+{
+	return (event->attr.config & INTEL_ARCH_EVENT_MASK) == 0x0400;
+}
+
+static inline bool is_topdown_event(struct perf_event *event)
+{
+	return is_metric_event(event) || is_slots_event(event);
+}
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -266,6 +290,12 @@ struct cpu_hw_events {
 	 */
 	u64				tfa_shadow;
 
+	/*
+	 * Perf Metrics
+	 */
+	/* number of accepted metrics events */
+	int				n_metric_event;
+
 	/*
 	 * AMD specific bits
 	 */
@@ -517,6 +547,9 @@ struct extra_reg {
 			       0xffff, \
 			       LDLAT)
 
+#define INTEL_UEVENT_TOPDOWN_EXTRA_REG(event)	\
+	EVENT_EXTRA_REG(event, 0, 0xfcff, -1L, TOPDOWN)
+
 #define EVENT_EXTRA_END EVENT_EXTRA_REG(0, 0, 0, 0, RSP_0)
 
 union perf_capabilities {
@@ -696,6 +729,12 @@ struct x86_pmu {
 	 */
 	atomic_t	lbr_exclusive[x86_lbr_exclusive_max];
 
+	/*
+	 * Intel perf metrics
+	 */
+	u64		(*update_topdown_event)(struct perf_event *event);
+	int		(*set_topdown_event_period)(struct perf_event *event);
+
 	/*
 	 * AMD bits
 	 */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 5ebfd97d4ac9..74064bedd4a1 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -118,6 +118,8 @@
 #define MSR_TURBO_RATIO_LIMIT1		0x000001ae
 #define MSR_TURBO_RATIO_LIMIT2		0x000001af
 
+#define MSR_PERF_METRICS		0x00000329
+
 #define MSR_LBR_SELECT			0x000001c8
 #define MSR_LBR_TOS			0x000001c9
 #define MSR_LBR_NHM_FROM		0x00000680
-- 
2.17.1



* [PATCH V4 08/14] perf/x86/intel: Support per thread RDPMC TopDown metrics
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (6 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-30 15:52   ` Peter Zijlstra
  2019-09-16 13:41 ` [PATCH V4 09/14] perf/x86/intel: Export TopDown events for Icelake kan.liang
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

With Icelake CPUs, the TopDown metrics are directly available as fixed
counters and do not require generic counters, which makes it possible
to measure TopDown per thread/process instead of only per core.

The metrics and slots values have to be saved/restored during context
switching.
The saved values are also used as previous values to calculate the
delta.

The PERF_METRICS MSR value will be returned when RDPMC reads a metrics
event.
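
A sketch of the schedule-in path (mirroring the change to
icl_set_topdown_event_period() in this patch):

	/* Restore the values saved at schedule-out so the deltas keep
	 * accumulating across context switches. */
	if (hwc->saved_slots && is_first_topdown_event_in_group(event)) {
		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, hwc->saved_slots);
		wrmsrl(MSR_PERF_METRICS, hwc->saved_metric);
	}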

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V3

 arch/x86/events/core.c       |   5 +-
 arch/x86/events/intel/core.c | 102 ++++++++++++++++++++++++++++-------
 include/linux/perf_event.h   |   3 ++
 3 files changed, 91 insertions(+), 19 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 283de3e548b4..585fe1a81bf0 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2198,7 +2198,10 @@ static int x86_pmu_event_idx(struct perf_event *event)
 	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return 0;
 
-	if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
+	/* Return PERF_METRICS MSR value for metrics event */
+	if (is_metric_idx(idx))
+		idx = 1 << 29;
+	else if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
 		idx -= INTEL_PMC_IDX_FIXED;
 		idx |= 1 << 30;
 	}
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 71f3086a8adc..7ec0f350d2ac 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2262,6 +2262,11 @@ static int icl_set_topdown_event_period(struct perf_event *event)
 		local64_set(&hwc->period_left, 0);
 	}
 
+	if ((hwc->saved_slots) && is_first_topdown_event_in_group(event)) {
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, hwc->saved_slots);
+		wrmsrl(MSR_PERF_METRICS, hwc->saved_metric);
+	}
+
 	perf_event_update_userpage(event);
 
 	return 0;
@@ -2280,7 +2285,7 @@ static u64 icl_get_metrics_event_value(u64 metric, u64 slots, int idx)
 	return  mul_u64_u32_div(slots, val, 0xff);
 }
 
-static void __icl_update_topdown_event(struct perf_event *event,
+static u64 icl_get_topdown_value(struct perf_event *event,
 				       u64 slots, u64 metrics)
 {
 	int idx = event->hw.idx;
@@ -2291,7 +2296,50 @@ static void __icl_update_topdown_event(struct perf_event *event,
 	else
 		delta = slots;
 
-	local64_add(delta, &event->count);
+	return delta;
+}
+
+static void __icl_update_topdown_event(struct perf_event *event,
+				       u64 slots, u64 metrics,
+				       u64 last_slots, u64 last_metrics)
+{
+	u64 delta, last = 0;
+
+	delta = icl_get_topdown_value(event, slots, metrics);
+	if (last_slots)
+		last = icl_get_topdown_value(event, last_slots, last_metrics);
+
+	/*
+	 * The 8bit integer fraction of a metric may not be accurate,
+	 * especially when the change is very small.
+	 * For example, if only a few bad_spec events happen, the fraction
+	 * may be reduced from 1 to 0. If so, the bad_spec event value
+	 * will be 0, which is definitely less than the last value.
+	 * Avoid updating event->count in this case.
+	 */
+	if (delta > last) {
+		delta -= last;
+		local64_add(delta, &event->count);
+	}
+}
+
+static void update_saved_topdown_regs(struct perf_event *event,
+				      u64 slots, u64 metrics)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_event *other;
+	int idx;
+
+	event->hw.saved_slots = slots;
+	event->hw.saved_metric = metrics;
+
+	for_each_set_bit(idx, cpuc->active_mask, INTEL_PMC_IDX_TD_BE_BOUND + 1) {
+		if (!is_topdown_idx(idx))
+			continue;
+		other = cpuc->events[idx];
+		other->hw.saved_slots = slots;
+		other->hw.saved_metric = metrics;
+	}
 }
 
 /*
@@ -2305,6 +2353,7 @@ static u64 icl_update_topdown_event(struct perf_event *event)
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_event *other;
 	u64 slots, metrics;
+	bool reset = true;
 	int idx;
 
 	/*
@@ -2326,26 +2375,46 @@ static u64 icl_update_topdown_event(struct perf_event *event)
 		if (!is_topdown_idx(idx))
 			continue;
 		other = cpuc->events[idx];
-		__icl_update_topdown_event(other, slots, metrics);
+		__icl_update_topdown_event(other, slots, metrics,
+					   event ? event->hw.saved_slots : 0,
+					   event ? event->hw.saved_metric : 0);
 	}
 
 	/*
 	 * Check and update this event, which may have been cleared
 	 * in active_mask e.g. x86_pmu_stop()
 	 */
-	if (event && !test_bit(event->hw.idx, cpuc->active_mask))
-		__icl_update_topdown_event(event, slots, metrics);
+	if (event && !test_bit(event->hw.idx, cpuc->active_mask)) {
+		__icl_update_topdown_event(event, slots, metrics,
+					   event->hw.saved_slots,
+					   event->hw.saved_metric);
 
-	/*
-	 * To avoid the known issues as below, the PERF_METRICS and
-	 * Fixed counter 3 are reset for each read.
-	 * - The 8bit metrics ratio values lose precision when the
-	 *   measurement period gets longer.
-	 * - The PERF_METRICS may report wrong value if its delta was
-	 *   less than 1/255 of Fixed counter 3.
-	 */
-	wrmsrl(MSR_PERF_METRICS, 0);
-	wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+		/*
+		 * In x86_pmu_stop(), the event is cleared in active_mask first,
+		 * then the delta is drained; this indicates a context switch
+		 * for counting.
+		 * Save metric and slots for context switch.
+		 * Don't need to reset the PERF_METRICS and Fixed counter 3.
+		 * Because the values will be restored in next schedule in.
+		 */
+		update_saved_topdown_regs(event, slots, metrics);
+		reset = false;
+	}
+
+	if (reset) {
+		/*
+		 * To avoid the known issues as below, the PERF_METRICS and
+		 * Fixed counter 3 are reset for each read.
+		 * - The 8bit metrics ratio values lose precision when the
+		 *   measurement period gets longer.
+		 * - The PERF_METRICS may report wrong value if its delta was
+		 *   less than 1/255 of Fixed counter 3.
+		 */
+		wrmsrl(MSR_PERF_METRICS, 0);
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+		if (event)
+			update_saved_topdown_regs(event, 0, 0);
+	}
 
 	return slots;
 }
@@ -3538,9 +3607,6 @@ static int intel_pmu_hw_config(struct perf_event *event)
 			event->attr.config1 = event->hw.config &
 					      X86_ALL_EVENT_FLAGS;
 			event->hw.flags |= PERF_X86_EVENT_TOPDOWN;
-
-			if (is_metric_event(event))
-				event->hw.flags &= ~PERF_X86_EVENT_RDPMC_ALLOWED;
 		}
 	}
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 61448c19a132..c125068f2e16 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -133,6 +133,9 @@ struct hw_perf_event {
 
 			struct hw_perf_event_extra extra_reg;
 			struct hw_perf_event_extra branch_reg;
+
+			u64		saved_slots;
+			u64		saved_metric;
 		};
 		struct { /* software */
 			struct hrtimer	hrtimer;
-- 
2.17.1



* [PATCH V4 09/14] perf/x86/intel: Export TopDown events for Icelake
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (7 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 08/14] perf/x86/intel: Support per thread RDPMC " kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 10/14] perf/x86/intel: Disable sampling read slots and topdown kan.liang
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Export the new TopDown metrics events for perf, which map to the
sub-metrics in the metrics register, and another event for the new
slots fixed counter. This makes the new fixed counters in Icelake
visible to the perf user tools.

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V3

 arch/x86/events/intel/core.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 7ec0f350d2ac..8218f26035e3 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -321,6 +321,12 @@ EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles,
 EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale,
 	"4", "2");
 
+EVENT_ATTR_STR(slots,			slots,		"event=0x00,umask=0x4");
+EVENT_ATTR_STR(topdown-retiring,	td_retiring,	"event=0x00,umask=0x10");
+EVENT_ATTR_STR(topdown-bad-spec,	td_bad_spec,	"event=0x00,umask=0x11");
+EVENT_ATTR_STR(topdown-fe-bound,	td_fe_bound,	"event=0x00,umask=0x12");
+EVENT_ATTR_STR(topdown-be-bound,	td_be_bound,	"event=0x00,umask=0x13");
+
 static struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(td_slots_issued),
 	EVENT_PTR(td_slots_retired),
@@ -4581,6 +4587,15 @@ static struct attribute *icl_events_attrs[] = {
 	NULL,
 };
 
+static struct attribute *icl_td_events_attrs[] = {
+	EVENT_PTR(slots),
+	EVENT_PTR(td_retiring),
+	EVENT_PTR(td_bad_spec),
+	EVENT_PTR(td_fe_bound),
+	EVENT_PTR(td_be_bound),
+	NULL,
+};
+
 static struct attribute *icl_tsx_events_attrs[] = {
 	EVENT_PTR(tx_start),
 	EVENT_PTR(tx_abort),
@@ -5352,6 +5367,7 @@ __init int intel_pmu_init(void)
 			hsw_format_attr : nhm_format_attr;
 		extra_skl_attr = skl_format_attr;
 		mem_attr = icl_events_attrs;
+		td_attr = icl_td_events_attrs;
 		tsx_attr = icl_tsx_events_attrs;
 		x86_pmu.rtm_abort_event = X86_CONFIG(.event=0xca, .umask=0x02);
 		x86_pmu.lbr_pt_coexist = true;
-- 
2.17.1



* [PATCH V4 10/14] perf/x86/intel: Disable sampling read slots and topdown
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (8 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 09/14] perf/x86/intel: Export TopDown events for Icelake kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 11/14] perf/x86/intel: Name global status bit in NMI handler kan.liang
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The slots event supports sampling. Users may do a sampling read of the
slots and metrics events, e.g. perf record -e '{slots, topdown-retiring}:S'.
But the metrics event resets fixed counter 3, which impacts the sampling
of the slots event.

Add specific validate_group() support to reject the case and error out
for Icelake.

An alternative fix would be to unconditionally disable SLOTS sampling,
but that is not a good fix, because users may want to sample only the
slots event, without any topdown metrics events.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V3

 arch/x86/events/core.c       |  4 ++++
 arch/x86/events/intel/core.c | 20 ++++++++++++++++++++
 arch/x86/events/perf_event.h |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 585fe1a81bf0..350c21aaea61 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2105,7 +2105,11 @@ static int validate_group(struct perf_event *event)
 
 	fake_cpuc->n_events = 0;
 	ret = x86_pmu.schedule_events(fake_cpuc, n, NULL);
+	if (ret)
+		goto out;
 
+	if (x86_pmu.validate_group)
+		ret = x86_pmu.validate_group(fake_cpuc, n);
 out:
 	free_fake_cpuc(fake_cpuc);
 	return ret;
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 8218f26035e3..b4b02bd24737 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4526,6 +4526,25 @@ static __init void intel_ht_bug(void)
 	x86_pmu.stop_scheduling = intel_stop_scheduling;
 }
 
+static int icl_validate_group(struct cpu_hw_events *cpuc, int n)
+{
+	bool has_sampling_slots = false, has_metrics = false;
+	struct perf_event *e;
+	int i;
+
+	for (i = 0; i < n; i++) {
+		e = cpuc->event_list[i];
+		if (is_slots_event(e) && is_sampling_event(e))
+			has_sampling_slots = true;
+
+		if (is_metric_event(e))
+			has_metrics = true;
+	}
+	if (unlikely(has_sampling_slots && has_metrics))
+		return -EINVAL;
+	return 0;
+}
+
 EVENT_ATTR_STR(mem-loads,	mem_ld_hsw,	"event=0xcd,umask=0x1,ldlat=3");
 EVENT_ATTR_STR(mem-stores,	mem_st_hsw,	"event=0xd0,umask=0x82")
 
@@ -5374,6 +5393,7 @@ __init int intel_pmu_init(void)
 		intel_pmu_pebs_data_source_skl(pmem);
 		x86_pmu.update_topdown_event = icl_update_topdown_event;
 		x86_pmu.set_topdown_event_period = icl_set_topdown_event_period;
+		x86_pmu.validate_group = icl_validate_group;
 		pr_cont("Icelake events, ");
 		name = "icelake";
 		break;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index dbe969ed4403..032c3495982a 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -661,6 +661,8 @@ struct x86_pmu {
 	int		perfctr_second_write;
 	u64		(*limit_period)(struct perf_event *event, u64 l);
 
+	int		(*validate_group)(struct cpu_hw_events *cpuc, int n);
+
 	/* PMI handler bits */
 	unsigned int	late_ack		:1,
 			counter_freezing	:1;
-- 
2.17.1


* [PATCH V4 11/14] perf/x86/intel: Name global status bit in NMI handler
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (9 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 10/14] perf/x86/intel: Disable sampling read slots and topdown kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 12/14] perf/x86: Use event_base_rdpmc for RDPMC userspace support kan.liang
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The NMI handler currently uses the raw bit index numbers of the global
status register directly. Replace the magic numbers with meaningful
names to improve the readability of the code.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

New patch for V4

 arch/x86/events/intel/core.c      | 6 +++---
 arch/x86/include/asm/perf_event.h | 7 +++++--
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b4b02bd24737..db1337ea0634 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2638,7 +2638,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	/*
 	 * PEBS overflow sets bit 62 in the global status register
 	 */
-	if (__test_and_clear_bit(62, (unsigned long *)&status)) {
+	if (__test_and_clear_bit(GLOBAL_STATUS_BUFFER_OVF_BIT, (unsigned long *)&status)) {
 		handled++;
 		x86_pmu.drain_pebs(regs);
 		status &= x86_pmu.intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
@@ -2647,7 +2647,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	/*
 	 * Intel PT
 	 */
-	if (__test_and_clear_bit(55, (unsigned long *)&status)) {
+	if (__test_and_clear_bit(GLOBAL_STATUS_TRACE_TOPAPMI_BIT, (unsigned long *)&status)) {
 		handled++;
 		if (unlikely(perf_guest_cbs && perf_guest_cbs->is_in_guest() &&
 			perf_guest_cbs->handle_intel_pt_intr))
@@ -2659,7 +2659,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	/*
 	 * Intel Perf metrics
 	 */
-	if (__test_and_clear_bit(48, (unsigned long *)&status)) {
+	if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT, (unsigned long *)&status)) {
 		handled++;
 		if (x86_pmu.update_topdown_event)
 			x86_pmu.update_topdown_event(NULL);
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 31b96e14935b..d7d3a7caba41 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -210,12 +210,15 @@ static inline bool is_topdown_idx(int idx)
 #define INTEL_PMC_OTHER_TOPDOWN_BITS(bit)	(~(0x1ull << bit) & INTEL_PMC_MSK_TOPDOWN)
 
 #define GLOBAL_STATUS_COND_CHG				BIT_ULL(63)
-#define GLOBAL_STATUS_BUFFER_OVF			BIT_ULL(62)
+#define GLOBAL_STATUS_BUFFER_OVF_BIT			62
+#define GLOBAL_STATUS_BUFFER_OVF			BIT_ULL(GLOBAL_STATUS_BUFFER_OVF_BIT)
 #define GLOBAL_STATUS_UNC_OVF				BIT_ULL(61)
 #define GLOBAL_STATUS_ASIF				BIT_ULL(60)
 #define GLOBAL_STATUS_COUNTERS_FROZEN			BIT_ULL(59)
 #define GLOBAL_STATUS_LBRS_FROZEN			BIT_ULL(58)
-#define GLOBAL_STATUS_TRACE_TOPAPMI			BIT_ULL(55)
+#define GLOBAL_STATUS_TRACE_TOPAPMI_BIT			55
+#define GLOBAL_STATUS_TRACE_TOPAPMI			BIT_ULL(GLOBAL_STATUS_TRACE_TOPAPMI_BIT)
+#define GLOBAL_STATUS_PERF_METRICS_OVF_BIT		48
 
 /*
  * Adaptive PEBS v4
-- 
2.17.1


* [PATCH V4 12/14] perf/x86: Use event_base_rdpmc for RDPMC userspace support
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (10 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 11/14] perf/x86/intel: Name global status bit in NMI handler kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 13/14] perf, tools, stat: Support new per thread TopDown metrics kan.liang
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The RDPMC index is always re-calculated in the RDPMC userspace support
path, especially for fixed counters.

The RDPMC index value is already stored in the variable
event_base_rdpmc for kernel usage, and can be reused for RDPMC
userspace support as well. Only the metrics event has to be handled
specially.
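
For reference, a minimal user-space sketch of the consumer side,
following the documented perf_event mmap self-monitoring protocol (the
function name and its argument are made up; the struct fields are the
mmap page ABI):

	#include <stdint.h>
	#include <linux/perf_event.h>
	#include <x86intrin.h>

	static uint64_t read_event_count(volatile struct perf_event_mmap_page *pc)
	{
		uint32_t seq, idx;
		uint64_t count;

		do {
			seq = pc->lock;
			__sync_synchronize();
			idx = pc->index;	/* event_base_rdpmc + 1 with this patch */
			count = pc->offset;
			if (idx)		/* zero means RDPMC is not allowed */
				count += _rdpmc(idx - 1);
			__sync_synchronize();
		} while (pc->lock != seq);

		return count;
	}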

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 350c21aaea61..10ce4c381425 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2197,20 +2197,16 @@ static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *m
 
 static int x86_pmu_event_idx(struct perf_event *event)
 {
-	int idx = event->hw.idx;
+	struct hw_perf_event *hwc = &event->hw;
 
-	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
+	if (!(hwc->flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return 0;
 
 	/* Return PERF_METRICS MSR value for metrics event */
-	if (is_metric_idx(idx))
-		idx = 1 << 29;
-	else if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
-		idx -= INTEL_PMC_IDX_FIXED;
-		idx |= 1 << 30;
-	}
-
-	return idx + 1;
+	if (is_metric_idx(hwc->idx))
+		return (1 << 29) + 1;
+	else
+		return hwc->event_base_rdpmc + 1;
 }
 
 static ssize_t get_attr_rdpmc(struct device *cdev,
-- 
2.17.1


* [PATCH V4 13/14] perf, tools, stat: Support new per thread TopDown metrics
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (11 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 12/14] perf/x86: Use event_base_rdpmc for RDPMC userspace support kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-16 13:41 ` [PATCH V4 14/14] perf, tools: Add documentation for topdown metrics kan.liang
  2019-09-30 12:48 ` [PATCH V4 00/14] TopDown metrics support for Icelake Liang, Kan
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Icelake has support for reporting per thread TopDown metrics.
These are reported differently from the previous TopDown support:
each metric is standalone, but scaled to pipeline "slots".
We don't need to do anything special for HyperThreading anymore.
Teach perf stat --topdown to handle these new metrics and
print them in the same way as the previous TopDown metrics.
The restriction of only being able to report information per core
is gone.
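
For example (./workload is a placeholder; interval mode is
recommended, as the new hint in the patch suggests):

	perf stat --topdown -I 1000 -- ./workload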

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V3

 tools/perf/Documentation/perf-stat.txt |  9 ++-
 tools/perf/builtin-stat.c              | 24 +++++++
 tools/perf/util/stat-shadow.c          | 89 ++++++++++++++++++++++++++
 tools/perf/util/stat.c                 |  4 ++
 tools/perf/util/stat.h                 |  8 +++
 5 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 930c51c01201..f90101637c76 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -279,8 +279,13 @@ if the workload is actually bound by the CPU and not by something else.
 For best results it is usually a good idea to use it with interval
 mode like -I 1000, as the bottleneck of workloads can change often.
 
-The top down metrics are collected per core instead of per
-CPU thread. Per core mode is automatically enabled
+This enables --metric-only, unless overridden with --no-metric-only.
+
+The following restrictions only apply to older Intel CPUs and Atom;
+on newer CPUs (IceLake and later) TopDown can be collected for any thread:
+
+The top down metrics are collected per core instead of per CPU thread.
+Per core mode is automatically enabled
 and -a (global monitoring) is needed, requiring root rights or
 perf.perf_event_paranoid=-1.
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7e17bf9f700a..e63f6c06db30 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -124,6 +124,14 @@ static const char * topdown_attrs[] = {
 	NULL,
 };
 
+static const char *topdown_metric_attrs[] = {
+	"topdown-retiring",
+	"topdown-bad-spec",
+	"topdown-fe-bound",
+	"topdown-be-bound",
+	NULL,
+};
+
 static const char *smi_cost_attrs = {
 	"{"
 	"msr/aperf/,"
@@ -1331,6 +1339,21 @@ static int add_default_attributes(void)
 		char *str = NULL;
 		bool warn = false;
 
+		if (topdown_filter_events(topdown_metric_attrs, &str, 1) < 0) {
+			pr_err("Out of memory\n");
+			return -1;
+		}
+		if (topdown_metric_attrs[0] && str) {
+			if (!stat_config.interval) {
+				fprintf(stat_config.output,
+					"Topdown accuracy may decrease when measuring long periods.\n"
+					"Please print the result regularly, e.g. -I1000\n");
+			}
+			goto setup_metrics;
+		}
+
+		str = NULL;
+
 		if (stat_config.aggr_mode != AGGR_GLOBAL &&
 		    stat_config.aggr_mode != AGGR_CORE) {
 			pr_err("top down event configuration requires --per-core mode\n");
@@ -1352,6 +1375,7 @@ static int add_default_attributes(void)
 		if (topdown_attrs[0] && str) {
 			if (warn)
 				arch_topdown_group_warn();
+setup_metrics:
 			err = parse_events(evsel_list, str, &errinfo);
 			if (err) {
 				fprintf(stderr,
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 70c87fdb2a43..0be18b10a96b 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -243,6 +243,18 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
 	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
 		update_runtime_stat(st, STAT_TOPDOWN_RECOVERY_BUBBLES,
 				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_RETIRING))
+		update_runtime_stat(st, STAT_TOPDOWN_RETIRING,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BAD_SPEC))
+		update_runtime_stat(st, STAT_TOPDOWN_BAD_SPEC,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_FE_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_FE_BOUND,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BE_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_BE_BOUND,
+				    ctx, cpu, count);
 	else if (perf_evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
 		update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT,
 				    ctx, cpu, count);
@@ -694,6 +706,47 @@ static double td_be_bound(int ctx, int cpu, struct runtime_stat *st)
 	return sanitize_val(1.0 - sum);
 }
 
+/*
+ * Kernel reports metrics multiplied with slots. To get back
+ * the ratios we need to recreate the sum.
+ */
+
+static double td_metric_ratio(int ctx, int cpu,
+			      enum stat_type type,
+			      struct runtime_stat *stat)
+{
+	double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, ctx, cpu);
+	double d = runtime_stat_avg(stat, type, ctx, cpu);
+
+	if (sum)
+		return d / sum;
+	return 0;
+}
+
+/*
+ * ... but only if most of the values are actually available.
+ * We allow two missing.
+ */
+
+static bool full_td(int ctx, int cpu,
+		    struct runtime_stat *stat)
+{
+	int c = 0;
+
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, ctx, cpu) > 0)
+		c++;
+	return c >= 2;
+}
+
 static void print_smi_cost(struct perf_stat_config *config,
 			   int cpu, struct evsel *evsel,
 			   struct perf_stat_output_ctx *out,
@@ -1025,6 +1078,42 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 					be_bound * 100.);
 		else
 			print_metric(config, ctxp, NULL, NULL, name, 0);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_RETIRING) &&
+			full_td(ctx, cpu, st)) {
+		double retiring = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_RETIRING, st);
+
+		if (retiring > 0.7)
+			color = PERF_COLOR_GREEN;
+		print_metric(config, ctxp, color, "%8.1f%%", "retiring",
+				retiring * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FE_BOUND) &&
+			full_td(ctx, cpu, st)) {
+		double fe_bound = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_FE_BOUND, st);
+
+		if (fe_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "frontend bound",
+				fe_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BE_BOUND) &&
+			full_td(ctx, cpu, st)) {
+		double be_bound = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_BE_BOUND, st);
+
+		if (be_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "backend bound",
+				be_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BAD_SPEC) &&
+			full_td(ctx, cpu, st)) {
+		double bad_spec = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_BAD_SPEC, st);
+
+		if (bad_spec > 0.1)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "bad speculation",
+				bad_spec * 100.);
 	} else if (evsel->metric_expr) {
 		generic_metric(config, evsel->metric_expr, evsel->metric_events, evsel->name,
 				evsel->metric_name, NULL, avg, cpu, out, st);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 8f1ea27f976f..a6201ed26457 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -94,6 +94,10 @@ static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
 	ID(TOPDOWN_SLOTS_RETIRED, topdown-slots-retired),
 	ID(TOPDOWN_FETCH_BUBBLES, topdown-fetch-bubbles),
 	ID(TOPDOWN_RECOVERY_BUBBLES, topdown-recovery-bubbles),
+	ID(TOPDOWN_RETIRING, topdown-retiring),
+	ID(TOPDOWN_BAD_SPEC, topdown-bad-spec),
+	ID(TOPDOWN_FE_BOUND, topdown-fe-bound),
+	ID(TOPDOWN_BE_BOUND, topdown-be-bound),
 	ID(SMI_NUM, msr/smi/),
 	ID(APERF, msr/aperf/),
 };
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index 14fe3e548229..2749b540218b 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -27,6 +27,10 @@ enum perf_stat_evsel_id {
 	PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_RETIRED,
 	PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_BUBBLES,
 	PERF_STAT_EVSEL_ID__TOPDOWN_RECOVERY_BUBBLES,
+	PERF_STAT_EVSEL_ID__TOPDOWN_RETIRING,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BAD_SPEC,
+	PERF_STAT_EVSEL_ID__TOPDOWN_FE_BOUND,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BE_BOUND,
 	PERF_STAT_EVSEL_ID__SMI_NUM,
 	PERF_STAT_EVSEL_ID__APERF,
 	PERF_STAT_EVSEL_ID__MAX,
@@ -80,6 +84,10 @@ enum stat_type {
 	STAT_TOPDOWN_SLOTS_RETIRED,
 	STAT_TOPDOWN_FETCH_BUBBLES,
 	STAT_TOPDOWN_RECOVERY_BUBBLES,
+	STAT_TOPDOWN_RETIRING,
+	STAT_TOPDOWN_BAD_SPEC,
+	STAT_TOPDOWN_FE_BOUND,
+	STAT_TOPDOWN_BE_BOUND,
 	STAT_SMI_NUM,
 	STAT_APERF,
 	STAT_MAX
-- 
2.17.1


* [PATCH V4 14/14] perf, tools: Add documentation for topdown metrics
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (12 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 13/14] perf, tools, stat: Support new per thread TopDown metrics kan.liang
@ 2019-09-16 13:41 ` kan.liang
  2019-09-30 12:48 ` [PATCH V4 00/14] TopDown metrics support for Icelake Liang, Kan
  14 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2019-09-16 13:41 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Add some documentation on how to use the topdown metrics in ring 3.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

No changes since V3

 tools/perf/Documentation/topdown.txt | 223 +++++++++++++++++++++++++++
 1 file changed, 223 insertions(+)
 create mode 100644 tools/perf/Documentation/topdown.txt

diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
new file mode 100644
index 000000000000..e82a74fa9243
--- /dev/null
+++ b/tools/perf/Documentation/topdown.txt
@@ -0,0 +1,223 @@
+Using TopDown metrics in user space
+-----------------------------------
+
+Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
+methodology to break down CPU pipeline execution into 4 bottlenecks:
+frontend bound, backend bound, bad speculation, retiring.
+
+For more details on TopDown see [1][5].
+
+Traditionally this was implemented by events in generic counters
+and specific formulas to compute the bottlenecks.
+
+perf stat --topdown implements this.
+
+% perf stat -a --topdown -I1000
+#           time             counts unit events
+     1.000373951      8,460,978,609      topdown-retiring          #     22.9% retiring
+     1.000373951      3,445,383,303      topdown-bad-spec          #      9.3% bad speculation
+     1.000373951     15,886,483,355      topdown-fe-bound          #     43.0% frontend bound
+     1.000373951      9,163,488,720      topdown-be-bound          #     24.8% backend bound
+     2.000782154      8,477,925,431      topdown-retiring          #     22.9% retiring
+     2.000782154      3,459,151,256      topdown-bad-spec          #      9.3% bad speculation
+     2.000782154     15,947,224,725      topdown-fe-bound          #     43.0% frontend bound
+     2.000782154      9,145,551,695      topdown-be-bound          #     24.7% backend bound
+     3.001155967      8,487,323,125      topdown-retiring          #     22.9% retiring
+     3.001155967      3,451,808,066      topdown-bad-spec          #      9.3% bad speculation
+     3.001155967     15,959,068,902      topdown-fe-bound          #     43.0% frontend bound
+     3.001155967      9,172,479,784      topdown-be-bound          #     24.7% backend bound
+...
+
+Full Top Down includes more levels that can break down the
+bottlenecks further. This is not directly implemented in perf,
+but available in other tools that can run on top of perf,
+such as toplev[2] or vtune[3].
+
+New Topdown features in Icelake
+===============================
+
+With Icelake CPUs the TopDown metrics are directly available as
+fixed counters and do not require generic counters. This allows
+TopDown to always be collected in addition to other events.
+
+This also enables measuring TopDown per thread/process instead
+of only per core.
+
+Using TopDown through RDPMC in applications on Icelake
+======================================================
+
+For more fine-grained measurements it can be useful to
+access the new counters directly from user space. This is more
+complicated, but drastically lowers overhead.
+
+On Icelake, there is a new fixed counter 3: SLOTS, which reports
+"pipeline SLOTS" (cycles multiplied by core issue width) and a
+metric register that reports slots ratios for the different bottleneck
+categories.
+
+The metrics register is CPU model specific and is not available
+on older CPUs.
+
+Example code
+============
+
+Library functions implementing the functionality described below
+are also available in libjevents [4].
+
+The application opens a perf_event file descriptor
+and sets up fixed counter 3 (SLOTS) to start and
+allow user programs to read the performance counters.
+
+Fixed counter 3 is mapped to the pseudo event event=0x00, umask=0x4,
+so the perf_event_attr structure should be initialized with
+{ .config = 0x0400, .type = PERF_TYPE_RAW }
+
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+/* Provide own perf_event_open stub because glibc doesn't */
+__attribute__((weak))
+int perf_event_open(struct perf_event_attr *attr, pid_t pid,
+		    int cpu, int group_fd, unsigned long flags)
+{
+	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+}
+
+/* open slots counter file descriptor for current task */
+struct perf_event_attr slots = {
+	.type = PERF_TYPE_RAW,
+	.size = sizeof(struct perf_event_attr),
+	.config = 0x400,
+	.exclude_kernel = 1,
+};
+
+int fd = perf_event_open(&slots, 0, -1, -1, 0);
+if (fd < 0)
+	... error ...
+
+The RDPMC instruction (or _rdpmc compiler intrinsic) can now be used
+to read slots and the topdown metrics at different points of the program:
+
+#include <stdint.h>
+#include <x86intrin.h>
+
+#define RDPMC_FIXED	(1 << 30)	/* return fixed counters */
+#define RDPMC_METRIC	(1 << 29)	/* return metric counters */
+
+#define FIXED_COUNTER_SLOTS		3
+#define METRIC_COUNTER_TOPDOWN_L1	0
+
+static inline uint64_t read_slots(void)
+{
+	return _rdpmc(RDPMC_FIXED | FIXED_COUNTER_SLOTS);
+}
+
+static inline uint64_t read_metrics(void)
+{
+	return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1);
+}
+
+Then the program can be instrumented to read these metrics at different
+points.
+
+It's not a good idea to do this with too short code regions,
+as the parallelism and overlap in the CPU program execution will
+cause too much measurement inaccuracy. For example instrumenting
+individual basic blocks is definitely too fine grained.
+
+Decoding metrics values
+=======================
+
+The value reported by read_metrics() contains four 8-bit fields,
+each representing a scaled ratio for one Level 1 bottleneck.
+All four fields add up to 0xff (= 100%).
+
+The binary ratios in the metric value can be converted to float ratios:
+
+#define GET_METRIC(m, i) (((m) >> (i*8)) & 0xff)
+
+#define TOPDOWN_RETIRING(val)	((float)GET_METRIC(val, 0) / 0xff)
+#define TOPDOWN_BAD_SPEC(val)	((float)GET_METRIC(val, 1) / 0xff)
+#define TOPDOWN_FE_BOUND(val)	((float)GET_METRIC(val, 2) / 0xff)
+#define TOPDOWN_BE_BOUND(val)	((float)GET_METRIC(val, 3) / 0xff)
+
+and then converted to percent for printing.
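+
+For example (a sketch using the macros above; the output formatting
+is arbitrary):
+
+	uint64_t m = read_metrics();
+
+	printf("retiring %.1f%% bad spec %.1f%% fe bound %.1f%% be bound %.1f%%\n",
+		TOPDOWN_RETIRING(m) * 100.,
+		TOPDOWN_BAD_SPEC(m) * 100.,
+		TOPDOWN_FE_BOUND(m) * 100.,
+		TOPDOWN_BE_BOUND(m) * 100.);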
+
+The ratios in the metric accumulate for the time when the counter
+is enabled. For measuring programs it is often useful to measure
+specific sections. For this, deltas of the metrics are needed.
+
+This can be done by scaling the metrics with the slots counter
+read at the same time.
+
+Then it's possible to take deltas of these slots counts
+measured at different points, and determine the metrics
+for that time period.
+
+	slots_a = read_slots();
+	metric_a = read_metrics();
+
+	... larger code region ...
+
+	slots_b = read_slots()
+	metric_b = read_metrics()
+
+	# compute scaled metrics for measurement a
+	retiring_slots_a = GET_METRIC(metric_a, 0) * slots_a
+	bad_spec_slots_a = GET_METRIC(metric_a, 1) * slots_a
+	fe_bound_slots_a = GET_METRIC(metric_a, 2) * slots_a
+	be_bound_slots_a = GET_METRIC(metric_a, 3) * slots_a
+
+	# compute delta scaled metrics between b and a
+	retiring_slots = GET_METRIC(metric_b, 0) * slots_b - retiring_slots_a
+	bad_spec_slots = GET_METRIC(metric_b, 1) * slots_b - bad_spec_slots_a
+	fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a
+	be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a
+
+Later the individual ratios for the measurement period can be recreated
+from these counts.
+
+	slots_delta = slots_b - slots_a
+	retiring_ratio = (float)retiring_slots / slots_delta
+	bad_spec_ratio = (float)bad_spec_slots / slots_delta
+	fe_bound_ratio = (float)fe_bound_slots / slots_delta
+	be_bound_ratio = (float)be_bound_slots / slots_delta
+
+	printf("Retiring %.2f%% Bad Speculation %.2f%% FE Bound %.2f%% BE Bound %.2f%%\n",
+		retiring_ratio * 100.,
+		bad_spec_ratio * 100.,
+		fe_bound_ratio * 100.,
+		be_bound_ratio * 100.);
+
+Resetting metrics counters
+==========================
+
+Since the individual metrics are only 8 bits wide they lose precision
+for short regions over time, because each fraction bit comes to cover
+more and more cycles. So the counters need to be reset regularly.
+
+When using the kernel perf API the kernel resets on every read.
+So as long as the reading is at reasonable intervals (every few
+seconds) the precision is good.
+
+When using perf stat it is recommended to always use the -I option,
+with an interval no longer than a few seconds:
+
+	perf stat -I 1000 --topdown ...
+
+For user programs using RDPMC directly the counter can
+be reset explicitly using ioctl:
+
+	ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);
+
+This "opens" a new measurement period.
+
+A program using RDPMC for TopDown should schedule such a reset
+regularly, as in every few seconds.
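+
+One possible policy is sketched below; the threshold value is an
+arbitrary example and should be tuned to the workload:
+
+	#define SLOTS_RESET_THRESHOLD	(1ULL << 33)
+
+	if (read_slots() > SLOTS_RESET_THRESHOLD)
+		ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);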
+
+[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
+[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
+[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
+[4] https://github.com/andikleen/pmu-tools/tree/master/jevents
+[5] https://sites.google.com/site/analysismethods/yasin-pubs
-- 
2.17.1


* Re: [PATCH V4 00/14] TopDown metrics support for Icelake
  2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
                   ` (13 preceding siblings ...)
  2019-09-16 13:41 ` [PATCH V4 14/14] perf, tools: Add documentation for topdown metrics kan.liang
@ 2019-09-30 12:48 ` Liang, Kan
  14 siblings, 0 replies; 26+ messages in thread
From: Liang, Kan @ 2019-09-30 12:48 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: tglx, jolsa, eranian, alexander.shishkin, ak

Hi Peter,

Could you please take a look at the patch set?

Thanks,
Kan

On 9/16/2019 9:41 AM, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Icelake has support for measuring the level 1 TopDown metrics
> directly in hardware. This is implemented by an additional METRICS
> register, and a new Fixed Counter 3 that measures pipeline SLOTS.
> 
> For the Ice Lake implementation of performance metrics, software
> should start both PERF_METRICS and fixed counter 3 from zero.
> Additionally, for certain scenarios that involve counting metrics at
> high rates, software is recommended to periodically clear both in
> order to maintain accurate measurements.
> 
> IA32_PERF_GLOBAL_STATUS. OVF_PERF_METRICS[48]: If this bit is set,
> it indicates that some PERF_METRICS-related counter has overflowed and
> a PMI is triggered. It is recommended to clear PERF_METRICS as well as
> fixed counter 3 in such case. However, an overflow of fixed counter 3
> should normally happen first. If this bit is clear, no such overflow has
> occurred.
> 
> New in Icelake
> - Do not require generic counters. This allows to collect TopDown always
>    in addition to other events.
> - Measuring TopDown per thread/process instead of only per core
> 
> Limitation
> - To get accurate result and avoid reading the METRICS register multiple
>    times, the TopDown metrics events and SLOTS event have to be in the
>    same group.
> - METRICS and SLOTS registers have to be cleared after each read by SW.
>    That is to prevent the lose of precision.
> - Cannot do sampling read SLOTS and TopDown metric events
> 
> Please refer SDM Vol3, 18.3.9.3 Performance Metrics for the details of
> TopDown metrics.
> 
> Changes since V3:
> - Separate fixed counter3 definition patch
> - Separate BTS index patch
> - Apply Peter's cleanup patch
> - Fix the name of perf capabilities for perf METRICS
> - Apply patch for mul_u64_u32_div() x86_64 implementation
> - Fix unconditionally allows collecting 4 extra events
> - Add patch to clean up NMI handler by naming global status bit
> - Add patch to reuse event_base_rdpmc for RDPMC userspace support
> 
> Changes since V2:
> - Rebase on top of v5.3-rc1
> 
> Key changes since V1:
> - Remove variables for reg_idx and enabled_events[] array.
>    The reg_idx can be calculated by idx in runtime.
>    Using existing active_mask to replace enabled_events.
> - Choose value 47 for the fixed index of BTS.
> - Support OVF_PERF_METRICS overflow bit in PMI handler
> - Drops the caching mechanism and related variables
>    New mechanism is to update all active slots/metrics events for the
>    first slots/metrics events in a group. For each group reading, it
>    still only read the slots/perf_metrics MSR once
> - Disable PMU for read of topdown events to avoid the NMI issue
> - Move RDPMC support to a separate patch
> - Using event=0x00,umask=0x1X for topdown metrics events
> - Drop the patch which add REMOVE transaction
>    We can indicate x86_pmu_stop() by checking
>    (event && !test_bit(event->hw.idx, cpuc->active_mask)),
>    which is a good place to save the slots/metrics MSR value
> 
> Andi Kleen (2):
>    perf, tools, stat: Support new per thread TopDown metrics
>    perf, tools: Add documentation for topdown metrics
> 
> Kan Liang (11):
>    perf/x86/intel: Introduce the fourth fixed counter
>    perf/x86/intel: Set correct mask for TOPDOWN.SLOTS
>    perf/x86/intel: Move BTS index to 47
>    perf/x86/intel: Basic support for metrics counters
>    perf/x86/intel: Fix the name of perf capabilities for perf METRICS
>    perf/x86/intel: Support hardware TopDown metrics
>    perf/x86/intel: Support per thread RDPMC TopDown metrics
>    perf/x86/intel: Export TopDown events for Icelake
>    perf/x86/intel: Disable sampling read slots and topdown
>    perf/x86/intel: Name global status bit in NMI handler
>    perf/x86: Use event_base_rdpmc for RDPMC userspace support
> 
> Peter Zijlstra (Intel) (1):
>    x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64
> 
>   arch/x86/events/core.c                 |  80 ++++-
>   arch/x86/events/intel/core.c           | 421 +++++++++++++++++++++++--
>   arch/x86/events/perf_event.h           |  57 +++-
>   arch/x86/include/asm/div64.h           |  13 +
>   arch/x86/include/asm/msr-index.h       |   3 +
>   arch/x86/include/asm/perf_event.h      |  48 ++-
>   include/linux/perf_event.h             |   3 +
>   tools/perf/Documentation/perf-stat.txt |   9 +-
>   tools/perf/Documentation/topdown.txt   | 223 +++++++++++++
>   tools/perf/builtin-stat.c              |  24 ++
>   tools/perf/util/stat-shadow.c          |  89 ++++++
>   tools/perf/util/stat.c                 |   4 +
>   tools/perf/util/stat.h                 |   8 +
>   13 files changed, 924 insertions(+), 58 deletions(-)
>   create mode 100644 tools/perf/Documentation/topdown.txt
> 

* Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics
  2019-09-16 13:41 ` [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics kan.liang
@ 2019-09-30 13:06   ` Peter Zijlstra
  2019-09-30 14:07     ` Peter Zijlstra
  2019-09-30 13:36   ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2019-09-30 13:06 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Mon, Sep 16, 2019 at 06:41:21AM -0700, kan.liang@linux.intel.com wrote:
> Reset
> ======
> 
> Both PERF_METRICS and fixed counter 3 are required to start from zero.

explain *why*, also in the comments.

> However, current perf has -max_period as default initial value.
> The patch force initial value as 0 for topdown and slots event counting.
> 
> Additionally, for certain scenarios that involve counting metrics at
> high rates, SW is required to periodically clear both MSRs in order to
> maintain accurate measurements. In this patch, both PERF_METRICS and
> Fixed counter 3 are reset for each read.

That forgoes all the good details again :/ 


> Originally-by: Andi Kleen <ak@linux.intel.com>

This is of dubious value here and in that other patch. In that other
patch I've written more lines than Andi has and here you have pretty
much rewritten everything since v1 or so.

> +static bool is_first_topdown_event_in_group(struct perf_event *event)
> +{
> +	struct perf_event *first = NULL;
> +
> +	if (is_topdown_event(event->group_leader))
> +		first = event->group_leader;
> +	else {
> +		for_each_sibling_event(first, event->group_leader)
> +			if (is_topdown_event(first))
> +				break;
> +	}
> +
> +	if (event == first)
> +		return true;
> +
> +	return false;
> +}

> +static u64 icl_update_topdown_event(struct perf_event *event)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	struct perf_event *other;
> +	u64 slots, metrics;
> +	int idx;
> +
> +	/*
> +	 * Only need to update all events for the first
> +	 * slots/metrics event in a group
> +	 */
> +	if (event && !is_first_topdown_event_in_group(event))
> +		return 0;

This is pretty crap and approaches O(n^2); let me think if there's
anything saner to do here.


* Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics
  2019-09-16 13:41 ` [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics kan.liang
  2019-09-30 13:06   ` Peter Zijlstra
@ 2019-09-30 13:36   ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2019-09-30 13:36 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Mon, Sep 16, 2019 at 06:41:21AM -0700, kan.liang@linux.intel.com wrote:
> +static int add_nr_metric_event(struct cpu_hw_events *cpuc,
> +			       struct perf_event *event,
> +			       int *max_count)
> +{
> +	/* There are 4 TopDown metrics events. */
> +	if (is_metric_event(event) && (++cpuc->n_metric_event > 4))
> +		return -EINVAL;
> +
> +	*max_count += cpuc->n_metric_event;

That's busted; it would increment with 1,2,3,4 for a total of 10.

> +
> +	return 0;
> +}

* Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics
  2019-09-30 13:06   ` Peter Zijlstra
@ 2019-09-30 14:07     ` Peter Zijlstra
  2019-09-30 14:53       ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2019-09-30 14:07 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Mon, Sep 30, 2019 at 03:06:15PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 16, 2019 at 06:41:21AM -0700, kan.liang@linux.intel.com wrote:

> > +static bool is_first_topdown_event_in_group(struct perf_event *event)
> > +{
> > +	struct perf_event *first = NULL;
> > +
> > +	if (is_topdown_event(event->group_leader))
> > +		first = event->group_leader;
> > +	else {
> > +		for_each_sibling_event(first, event->group_leader)
> > +			if (is_topdown_event(first))
> > +				break;
> > +	}
> > +
> > +	if (event == first)
> > +		return true;
> > +
> > +	return false;
> > +}
> 
> > +static u64 icl_update_topdown_event(struct perf_event *event)
> > +{
> > +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > +	struct perf_event *other;
> > +	u64 slots, metrics;
> > +	int idx;
> > +
> > +	/*
> > +	 * Only need to update all events for the first
> > +	 * slots/metrics event in a group
> > +	 */
> > +	if (event && !is_first_topdown_event_in_group(event))
> > +		return 0;
> 
> This is pretty crap and approaches O(n^2); let me think if there's
> anything saner to do here.

This is also really complicated in the case where we do
perf_remove_from_context() in the 'wrong' order.

In that case we get detached events that are not up-to-date (and never
will be). It doesn't look like that matters, but it is weird.

* Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics
  2019-09-30 14:07     ` Peter Zijlstra
@ 2019-09-30 14:53       ` Peter Zijlstra
  2019-09-30 16:17         ` Liang, Kan
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2019-09-30 14:53 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Mon, Sep 30, 2019 at 04:07:55PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 30, 2019 at 03:06:15PM +0200, Peter Zijlstra wrote:
> > On Mon, Sep 16, 2019 at 06:41:21AM -0700, kan.liang@linux.intel.com wrote:
> 
> > > +static bool is_first_topdown_event_in_group(struct perf_event *event)
> > > +{
> > > +	struct perf_event *first = NULL;
> > > +
> > > +	if (is_topdown_event(event->group_leader))
> > > +		first = event->group_leader;
> > > +	else {
> > > +		for_each_sibling_event(first, event->group_leader)
> > > +			if (is_topdown_event(first))
> > > +				break;
> > > +	}
> > > +
> > > +	if (event == first)
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > 
> > > +static u64 icl_update_topdown_event(struct perf_event *event)
> > > +{
> > > +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > > +	struct perf_event *other;
> > > +	u64 slots, metrics;
> > > +	int idx;
> > > +
> > > +	/*
> > > +	 * Only need to update all events for the first
> > > +	 * slots/metrics event in a group
> > > +	 */
> > > +	if (event && !is_first_topdown_event_in_group(event))
> > > +		return 0;
> > 
> > This is pretty crap and approaches O(n^2); let me think if there's
> > anything saner to do here.
> 
> This is also really complicated in the case where we do
> perf_remove_from_context() in the 'wrong' order.
> 
> In that case we get detached events that are not up-to-date (and never
> will be). It doesn't look like that matters, but it is weird.

So we either get called from the PMI, or read(). In the PMI there is the
perf_output_read_group() path, and that too appears broken vs the above,
it assumes perf_event_count() is up-to-date after calling pmu->read(),
which isn't true.

Now, I'm thinking that is already broken vs TXN_READ, so we should fix
that with a little something like the below (needs to be tested on
Power-hv-24x7).

---
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6272,10 +6272,22 @@ static void perf_output_read_group(struc
 	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
 		values[n++] = running;
 
+	if (leader->nr_siblings > 1)
+		leader->pmu->start_txn(leader->pmu, PERF_PMU_TXN_READ);
+
 	if ((leader != event) &&
 	    (leader->state == PERF_EVENT_STATE_ACTIVE))
 		leader->pmu->read(leader);
 
+	for_each_sibling_event(sub, leader) {
+		if ((sub != event) &&
+		    (sub->state == PERF_EVENT_STATE_ACTIVE))
+			sub->pmu->read(sub);
+	}
+
+	if (leader->nr_siblings > 1)
+		leader->pmu->commit_txn(leader->pmu, PERF_PMU_TXN_READ);
+
 	values[n++] = perf_event_count(leader);
 	if (read_format & PERF_FORMAT_ID)
 		values[n++] = primary_event_id(leader);
@@ -6285,10 +6297,6 @@ static void perf_output_read_group(struc
 	for_each_sibling_event(sub, leader) {
 		n = 0;
 
-		if ((sub != event) &&
-		    (sub->state == PERF_EVENT_STATE_ACTIVE))
-			sub->pmu->read(sub);
-
 		values[n++] = perf_event_count(sub);
 		if (read_format & PERF_FORMAT_ID)
 			values[n++] = primary_event_id(sub);


After that, I think we can simply do something like:

icl_update_topdown_event(..)
{
	int idx = event->hw.idx;

	if (is_metric_idx(idx))
		return;

	// must be FIXED_SLOTS

	/* do the thing and update SLOTS and METRIC together */
}

Hmmm?

* Re: [PATCH V4 08/14] perf/x86/intel: Support per thread RDPMC TopDown metrics
  2019-09-16 13:41 ` [PATCH V4 08/14] perf/x86/intel: Support per thread RDPMC " kan.liang
@ 2019-09-30 15:52   ` Peter Zijlstra
  2019-09-30 18:18     ` Liang, Kan
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2019-09-30 15:52 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Mon, Sep 16, 2019 at 06:41:22AM -0700, kan.liang@linux.intel.com wrote:
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 71f3086a8adc..7ec0f350d2ac 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2262,6 +2262,11 @@ static int icl_set_topdown_event_period(struct perf_event *event)
>  		local64_set(&hwc->period_left, 0);
>  	}
>  
> +	if ((hwc->saved_slots) && is_first_topdown_event_in_group(event)) {
> +		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, hwc->saved_slots);
> +		wrmsrl(MSR_PERF_METRICS, hwc->saved_metric);
> +	}

> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 61448c19a132..c125068f2e16 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -133,6 +133,9 @@ struct hw_perf_event {
>  
>  			struct hw_perf_event_extra extra_reg;
>  			struct hw_perf_event_extra branch_reg;
> +
> +			u64		saved_slots;
> +			u64		saved_metric;
>  		};
>  		struct { /* software */
>  			struct hrtimer	hrtimer;

Normal counters save their counter value in hwc->period_left, why does
slots need a new word for that?

And since using METRIC means non-sampling, why can't we stick that
saved_metric field in one of the unused sampling fields?

ISTR asking this before...

* Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics
  2019-09-30 14:53       ` Peter Zijlstra
@ 2019-09-30 16:17         ` Liang, Kan
  2019-09-30 16:21           ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Liang, Kan @ 2019-09-30 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 9/30/2019 10:53 AM, Peter Zijlstra wrote:
> 
> After that, I think we can simply do something like:
> 
> icl_update_topdown_event(..)

Should we call this function in x86_pmu_commit_txn()?
In intel_pmu_read_event(), we would simply return when TXN_READ is set
and is_topdown_count() is true.

If so, it looks like we need to add a per-cpu variable to store the
topdown group events which the user tries to read. Then we can update
these events in x86_pmu_commit_txn(), the same method as Power uses.

Is my understanding correct?

> {
> 	int idx = event->hwc.idx;
> 
> 	if (is_metric_idx(idx))
> 		return;
> 
> 	// must be FIXED_SLOTS

The FIXED_SLOTS may not be in the group.

Thanks,
Kan
> 
> 	/* do teh thing and update SLOTS and METRIC together */
> }
> 
> Hmmm?

* Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics
  2019-09-30 16:17         ` Liang, Kan
@ 2019-09-30 16:21           ` Peter Zijlstra
  2019-09-30 16:45             ` Liang, Kan
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2019-09-30 16:21 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak

On Mon, Sep 30, 2019 at 12:17:57PM -0400, Liang, Kan wrote:
> 
> 
> On 9/30/2019 10:53 AM, Peter Zijlstra wrote:
> > 
> > After that, I think we can simply do something like:
> > 
> > icl_update_topdown_event(..)
> 
> We should call this function in x86_pmu_commit_txn()?

Nah, we could use that TXN_READ stuff, but I don't think we have to. We
just rely on having done all ->read() callbacks before doing
perf_event_count().

> > {
> > 	int idx = event->hwc.idx;
> > 
> > 	if (is_metric_idx(idx))
> > 		return;
> > 
> > 	// must be FIXED_SLOTS
> 
> The FIXED_SLOTS may not be in the group.

Argh.. can we mandate that it is? that is, if you want a metric thing,
you have to have a slots counter first?

* Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics
  2019-09-30 16:21           ` Peter Zijlstra
@ 2019-09-30 16:45             ` Liang, Kan
  2019-09-30 18:18               ` Andi Kleen
  0 siblings, 1 reply; 26+ messages in thread
From: Liang, Kan @ 2019-09-30 16:45 UTC (permalink / raw)
  To: Peter Zijlstra, ak
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin



On 9/30/2019 12:21 PM, Peter Zijlstra wrote:
>>> {
>>> 	int idx = event->hwc.idx;
>>>
>>> 	if (is_metric_idx(idx))
>>> 		return;
>>>
>>> 	// must be FIXED_SLOTS
>> The FIXED_SLOTS may not be in the group.
> Argh.. can we mandate that it is? that is, if you want a metric thing,
> you have to have a slots counter first?

If we mandate that FIXED_SLOTS is in the group, we don't need the
is_first_topdown_event_in_group() check. We just update everything from
the FIXED_SLOTS event. It will definitely simplify the code.

For the perf tool, I can add a warning if users miss the SLOTS event,
or just implicitly add the event.

For RDPMC users, it looks like there is no way to warn them.
But shouldn't they always ask for both SLOTS and METRICS anyway?

Andi, what do you think? Will it be a problem for RDPMC users?

Thanks,
Kan

* Re: [PATCH V4 08/14] perf/x86/intel: Support per thread RDPMC TopDown metrics
  2019-09-30 15:52   ` Peter Zijlstra
@ 2019-09-30 18:18     ` Liang, Kan
  0 siblings, 0 replies; 26+ messages in thread
From: Liang, Kan @ 2019-09-30 18:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, tglx, jolsa, eranian, alexander.shishkin, ak



On 9/30/2019 11:52 AM, Peter Zijlstra wrote:
> On Mon, Sep 16, 2019 at 06:41:22AM -0700, kan.liang@linux.intel.com wrote:
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 71f3086a8adc..7ec0f350d2ac 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -2262,6 +2262,11 @@ static int icl_set_topdown_event_period(struct perf_event *event)
>>   		local64_set(&hwc->period_left, 0);
>>   	}
>>   
>> +	if ((hwc->saved_slots) && is_first_topdown_event_in_group(event)) {
>> +		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, hwc->saved_slots);
>> +		wrmsrl(MSR_PERF_METRICS, hwc->saved_metric);
>> +	}
> 
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 61448c19a132..c125068f2e16 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -133,6 +133,9 @@ struct hw_perf_event {
>>   
>>   			struct hw_perf_event_extra extra_reg;
>>   			struct hw_perf_event_extra branch_reg;
>> +
>> +			u64		saved_slots;
>> +			u64		saved_metric;
>>   		};
>>   		struct { /* software */
>>   			struct hrtimer	hrtimer;
> 
> Normal counters save their counter value in hwc->period_left, why does
> slots need a new word for that?
> 

We have two values which have to be stored; period_left alone is not enough.

> And since using METRIC means non-sampling, why can't we stick that
> saved_metric field in one of the unused sampling fields?
>

Yes, I think we can re-use last_period and period_left for saved_metric 
and saved_slots. I will change it in V5.


@@ -202,17 +199,26 @@ struct hw_perf_event {
  	 */
  	u64				sample_period;

-	/*
-	 * The period we started this sample with.
-	 */
-	u64				last_period;
+	union {
+		struct { /* Sampling */

-	/*
-	 * However much is left of the current period; note that this is
-	 * a full 64bit value and allows for generation of periods longer
-	 * than hardware might allow.
-	 */
-	local64_t			period_left;
+			/*
+			 * The period we started this sample with.
+			 */
+			u64				last_period;
+
+			/*
+			 * However much is left of the current period; note that this is
+			 * a full 64bit value and allows for generation of periods longer
+			 * than hardware might allow.
+			 */
+			local64_t			period_left;
+		};
+		struct { /* Topdown events counting for context switch */
+			u64				saved_metric;
+			u64				saved_slots;
+		};
+	};


Thanks,
Kan

> ISTR asking this before...
>


* Re: [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics
  2019-09-30 16:45             ` Liang, Kan
@ 2019-09-30 18:18               ` Andi Kleen
  0 siblings, 0 replies; 26+ messages in thread
From: Andi Kleen @ 2019-09-30 18:18 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Peter Zijlstra, acme, mingo, linux-kernel, tglx, jolsa, eranian,
	alexander.shishkin

> Andi, what do you think? Will it be a problem for RDPMC users?

Should be fine.

RDPMC users only need slots anyway, as they manage RDPMC on their own.

We can spell it out in the documentation. And they will see an
error if they only enable a metrics event.

-Andi

Thread overview: 26+ messages
2019-09-16 13:41 [PATCH V4 00/14] TopDown metrics support for Icelake kan.liang
2019-09-16 13:41 ` [PATCH V4 01/14] perf/x86/intel: Introduce the fourth fixed counter kan.liang
2019-09-16 13:41 ` [PATCH V4 02/14] perf/x86/intel: Set correct mask for TOPDOWN.SLOTS kan.liang
2019-09-16 13:41 ` [PATCH V4 03/14] perf/x86/intel: Move BTS index to 47 kan.liang
2019-09-16 13:41 ` [PATCH V4 04/14] perf/x86/intel: Basic support for metrics counters kan.liang
2019-09-16 13:41 ` [PATCH V4 05/14] perf/x86/intel: Fix the name of perf capabilities for perf METRICS kan.liang
2019-09-16 13:41 ` [PATCH V4 06/14] x86/math64: Provide a sane mul_u64_u32_div() implementation for x86_64 kan.liang
2019-09-16 13:41 ` [PATCH V4 07/14] perf/x86/intel: Support hardware TopDown metrics kan.liang
2019-09-30 13:06   ` Peter Zijlstra
2019-09-30 14:07     ` Peter Zijlstra
2019-09-30 14:53       ` Peter Zijlstra
2019-09-30 16:17         ` Liang, Kan
2019-09-30 16:21           ` Peter Zijlstra
2019-09-30 16:45             ` Liang, Kan
2019-09-30 18:18               ` Andi Kleen
2019-09-30 13:36   ` Peter Zijlstra
2019-09-16 13:41 ` [PATCH V4 08/14] perf/x86/intel: Support per thread RDPMC " kan.liang
2019-09-30 15:52   ` Peter Zijlstra
2019-09-30 18:18     ` Liang, Kan
2019-09-16 13:41 ` [PATCH V4 09/14] perf/x86/intel: Export TopDown events for Icelake kan.liang
2019-09-16 13:41 ` [PATCH V4 10/14] perf/x86/intel: Disable sampling read slots and topdown kan.liang
2019-09-16 13:41 ` [PATCH V4 11/14] perf/x86/intel: Name global status bit in NMI handler kan.liang
2019-09-16 13:41 ` [PATCH V4 12/14] perf/x86: Use event_base_rdpmc for RDPMC userspace support kan.liang
2019-09-16 13:41 ` [PATCH V4 13/14] perf, tools, stat: Support new per thread TopDown metrics kan.liang
2019-09-16 13:41 ` [PATCH V4 14/14] perf, tools: Add documentation for topdown metrics kan.liang
2019-09-30 12:48 ` [PATCH V4 00/14] TopDown metrics support for Icelake Liang, Kan
