linux-kernel.vger.kernel.org archive mirror
* [PATCH V7 00/14] TopDown metrics support for Icelake
@ 2020-07-23 17:11 kan.liang
  2020-07-23 17:11 ` [PATCH V7 01/14] perf/x86: Use event_base_rdpmc for the RDPMC userspace support kan.liang
                   ` (13 more replies)
  0 siblings, 14 replies; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Changes since V6:
- Correct the comments regarding the pseudo umask-code of the fixed
  counter.
- Add a new PERF_EV_CAP_COEXIST event capability
- Update the code regarding the nr_metric check
- Check the TopDown group in hw_config.
  Drop the patch for x86_pmu.validate_group.
- The pseudo umask-code of the metrics events starts from the middle of
  the pseudo event space, 0x80.

Changes since V5:
- Rebase on top of tip perf/core branch
  commit c085fb877467 ("perf/x86/intel/lbr: Support XSAVES for arch LBR
  read")
- Add more cleanup patches, e.g. ("perf/x86/intel: Use switch in
  intel_pmu_disable/enable_event"). Use more macros to replace magic
  numbers.
- Reorganize the patch set:
  - Move all cleanup related patches to the beginning of the patch set.
  - Split the TopDown metrics patch into a generic patch and an
    Ice Lake-specific patch.
- Drop extra_reg. The TopDown metrics events and slots event don't
  support any filters.

Changes since V4:
- Add a description regarding event-code naming for fixed counters
- Fix add_nr_metric_event().
  For the leader event, we have to take the accepted metrics events into
  account.
  For a sibling event, the accepted metrics events don't need to be
  counted again.
- Remove is_first_topdown_event_in_group().
  Force the slots event into a topdown group. Only update topdown events
  with the slots event.
- Re-use last_period and period_left for saved_metric and saved_slots.

Changes since V3:
- Separate fixed counter3 definition patch
- Separate BTS index patch
- Apply Peter's cleanup patch
- Fix the name of perf capabilities for perf METRICS
- Apply patch for mul_u64_u32_div() x86_64 implementation
- Fix the code that unconditionally allows collecting 4 extra events
- Add patch to clean up NMI handler by naming global status bit
- Add patch to reuse event_base_rdpmc for RDPMC userspace support

Changes since V2:
- Rebase on top of v5.3-rc1

Key changes since V1:
- Remove the reg_idx variable and the enabled_events[] array.
  The reg_idx can be calculated from idx at runtime.
  Use the existing active_mask to replace enabled_events.
- Choose value 47 for the fixed index of BTS.
- Support OVF_PERF_METRICS overflow bit in PMI handler
- Drops the caching mechanism and related variables
  The new mechanism updates all active slots/metrics events for the
  first slots/metrics event in a group. For each group read, the
  slots/PERF_METRICS MSRs are still only read once.
- Disable PMU for read of topdown events to avoid the NMI issue
- Move RDPMC support to a separate patch
- Using event=0x00,umask=0x1X for topdown metrics events
- Drop the patch which adds a REMOVE transaction.
  We can detect x86_pmu_stop() by checking
  (event && !test_bit(event->hw.idx, cpuc->active_mask)),
  which is a good place to save the slots/metrics MSR values.

Icelake has support for measuring the level 1 TopDown metrics
directly in hardware. This is implemented by an additional METRICS
register, and a new Fixed Counter 3 that measures pipeline SLOTS.

New in Icelake
- Generic counters are not required. This allows TopDown to be collected
  at all times, in addition to other events.
- TopDown is measured per thread/process instead of only per core.

For the Ice Lake implementation of performance metrics, the values in
PERF_METRICS MSR are derived from fixed counter 3. Software should start
both registers, PERF_METRICS and fixed counter 3, from zero.
Additionally, software is recommended to periodically clear both
registers in order to maintain accurate measurements. The latter is
required for certain scenarios that involve sampling metrics at high
rates. Software should always write fixed counter 3 before writing to
PERF_METRICS.
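
A minimal sketch of that reset ordering (illustrative only; the same
wrmsrl() sequence appears in the Ice Lake support patch later in this
series):

	/* Fixed counter 3 must be written (cleared) before PERF_METRICS. */
	wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
	wrmsrl(MSR_PERF_METRICS, 0);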

IA32_PERF_GLOBAL_STATUS.OVF_PERF_METRICS[48]: If this bit is set, it
indicates that a PERF_METRICS-related counter has overflowed and a PMI
is triggered. Software has to synchronize, e.g. restart, PERF_METRICS
as well as fixed counter 3. Otherwise, PERF_METRICS may return invalid
values.
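
For reference, a sketch of how the new overflow bit is consumed in the
PMI handler (the actual hunk is in the generic TopDown metrics patch of
this series):

	if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT,
				 (unsigned long *)&status)) {
		handled++;
		/* Re-read and reset PERF_METRICS and fixed counter 3. */
		if (x86_pmu.update_topdown_event)
			x86_pmu.update_topdown_event(NULL);
	}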

Limitations
- To get accurate results and avoid reading the METRICS register multiple
  times, the TopDown metrics events and the SLOTS event have to be in the
  same group, e.g., {slots, topdown-retiring, topdown-bad-spec,
  topdown-fe-bound, topdown-be-bound}.
- The METRICS and SLOTS registers have to be cleared by SW after each
  read. That is to prevent the loss of precision.
- Sampling read of the SLOTS and TopDown metric events is not supported.

Please refer to SDM Vol. 3, Section 18.3.9.3 "Performance Metrics" for
the details of the TopDown metrics.

Andi Kleen (2):
  perf, tools, stat: Support new per thread TopDown metrics
  perf, tools: Add documentation for topdown metrics

Kan Liang (12):
  perf/x86: Use event_base_rdpmc for the RDPMC userspace support
  perf/x86/intel: Name the global status bit in NMI handler
  perf/x86/intel: Introduce the fourth fixed counter
  perf/x86/intel: Move BTS index to 47
  perf/x86/intel: Fix the name of perf METRICS
  perf/x86/intel: Use switch in intel_pmu_disable/enable_event
  perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  perf/x86/intel: Generic support for hardware TopDown metrics
  perf/x86: Add a macro for RDPMC offset of fixed counters
  perf/x86/intel: Support TopDown metrics on Ice Lake
  perf/x86/intel: Support per-thread RDPMC TopDown metrics
  perf, tools, stat: Check Topdown Metric group

 arch/x86/events/core.c                 |  80 ++++--
 arch/x86/events/intel/core.c           | 363 ++++++++++++++++++++++++-
 arch/x86/events/perf_event.h           |  52 +++-
 arch/x86/include/asm/msr-index.h       |   3 +
 arch/x86/include/asm/perf_event.h      |  97 ++++++-
 include/linux/perf_event.h             |  34 ++-
 kernel/events/core.c                   |  52 +++-
 tools/perf/Documentation/perf-stat.txt |   9 +-
 tools/perf/Documentation/topdown.txt   | 235 ++++++++++++++++
 tools/perf/builtin-stat.c              |  97 +++++++
 tools/perf/util/stat-shadow.c          |  89 ++++++
 tools/perf/util/stat.c                 |   4 +
 tools/perf/util/stat.h                 |   8 +
 13 files changed, 1059 insertions(+), 64 deletions(-)
 create mode 100644 tools/perf/Documentation/topdown.txt

-- 
2.17.1



* [PATCH V7 01/14] perf/x86: Use event_base_rdpmc for the RDPMC userspace support
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 02/14] perf/x86/intel: Name the global status bit in NMI handler kan.liang
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The RDPMC index is always re-calculated for the RDPMC userspace support,
which is unnecessary.

The RDPMC index value is stored in the variable event_base_rdpmc for
kernel usage, and it can be used for the RDPMC userspace support as well.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1cbf57dc2ac8..8e108ea378e3 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2208,17 +2208,12 @@ static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *m
 
 static int x86_pmu_event_idx(struct perf_event *event)
 {
-	int idx = event->hw.idx;
+	struct hw_perf_event *hwc = &event->hw;
 
-	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
+	if (!(hwc->flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return 0;
 
-	if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
-		idx -= INTEL_PMC_IDX_FIXED;
-		idx |= 1 << 30;
-	}
-
-	return idx + 1;
+	return hwc->event_base_rdpmc + 1;
 }
 
 static ssize_t get_attr_rdpmc(struct device *cdev,
-- 
2.17.1



* [PATCH V7 02/14] perf/x86/intel: Name the global status bit in NMI handler
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
  2020-07-23 17:11 ` [PATCH V7 01/14] perf/x86: Use event_base_rdpmc for the RDPMC userspace support kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 03/14] perf/x86/intel: Introduce the fourth fixed counter kan.liang
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Magic numbers are used in the current NMI handler for the global status
bits. Use meaningful names to replace the magic numbers to improve the
readability of the code.

Remove a Tab from all GLOBAL_STATUS_* and INTEL_PMC_IDX_FIXED_BTS macros
to reduce the line length.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c      |  4 ++--
 arch/x86/include/asm/perf_event.h | 22 ++++++++++++----------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 50963472ee85..ac1408fe1aee 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2389,7 +2389,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	/*
 	 * PEBS overflow sets bit 62 in the global status register
 	 */
-	if (__test_and_clear_bit(62, (unsigned long *)&status)) {
+	if (__test_and_clear_bit(GLOBAL_STATUS_BUFFER_OVF_BIT, (unsigned long *)&status)) {
 		u64 pebs_enabled = cpuc->pebs_enabled;
 
 		handled++;
@@ -2410,7 +2410,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	/*
 	 * Intel PT
 	 */
-	if (__test_and_clear_bit(55, (unsigned long *)&status)) {
+	if (__test_and_clear_bit(GLOBAL_STATUS_TRACE_TOPAPMI_BIT, (unsigned long *)&status)) {
 		handled++;
 		if (unlikely(perf_guest_cbs && perf_guest_cbs->is_in_guest() &&
 			perf_guest_cbs->handle_intel_pt_intr))
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 0c1b13720525..fd3eba65337f 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -225,16 +225,18 @@ struct x86_pmu_capability {
  * values are used by actual fixed events and higher values are used
  * to indicate other overflow conditions in the PERF_GLOBAL_STATUS msr.
  */
-#define INTEL_PMC_IDX_FIXED_BTS				(INTEL_PMC_IDX_FIXED + 16)
-
-#define GLOBAL_STATUS_COND_CHG				BIT_ULL(63)
-#define GLOBAL_STATUS_BUFFER_OVF			BIT_ULL(62)
-#define GLOBAL_STATUS_UNC_OVF				BIT_ULL(61)
-#define GLOBAL_STATUS_ASIF				BIT_ULL(60)
-#define GLOBAL_STATUS_COUNTERS_FROZEN			BIT_ULL(59)
-#define GLOBAL_STATUS_LBRS_FROZEN_BIT			58
-#define GLOBAL_STATUS_LBRS_FROZEN			BIT_ULL(GLOBAL_STATUS_LBRS_FROZEN_BIT)
-#define GLOBAL_STATUS_TRACE_TOPAPMI			BIT_ULL(55)
+#define INTEL_PMC_IDX_FIXED_BTS			(INTEL_PMC_IDX_FIXED + 16)
+
+#define GLOBAL_STATUS_COND_CHG			BIT_ULL(63)
+#define GLOBAL_STATUS_BUFFER_OVF_BIT		62
+#define GLOBAL_STATUS_BUFFER_OVF		BIT_ULL(GLOBAL_STATUS_BUFFER_OVF_BIT)
+#define GLOBAL_STATUS_UNC_OVF			BIT_ULL(61)
+#define GLOBAL_STATUS_ASIF			BIT_ULL(60)
+#define GLOBAL_STATUS_COUNTERS_FROZEN		BIT_ULL(59)
+#define GLOBAL_STATUS_LBRS_FROZEN_BIT		58
+#define GLOBAL_STATUS_LBRS_FROZEN		BIT_ULL(GLOBAL_STATUS_LBRS_FROZEN_BIT)
+#define GLOBAL_STATUS_TRACE_TOPAPMI_BIT		55
+#define GLOBAL_STATUS_TRACE_TOPAPMI		BIT_ULL(GLOBAL_STATUS_TRACE_TOPAPMI_BIT)
 
 /*
  * We model guest LBR event tracing as another fixed-mode PMC like BTS.
-- 
2.17.1



* [PATCH V7 03/14] perf/x86/intel: Introduce the fourth fixed counter
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
  2020-07-23 17:11 ` [PATCH V7 01/14] perf/x86: Use event_base_rdpmc for the RDPMC userspace support kan.liang
  2020-07-23 17:11 ` [PATCH V7 02/14] perf/x86/intel: Name the global status bit in NMI handler kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 04/14] perf/x86/intel: Move BTS index to 47 kan.liang
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The fourth fixed counter, TOPDOWN.SLOTS, is introduced in Ice Lake to
measure the level 1 TopDown events.

Add MSR address and macros for the new fixed counter, which will be used
in a later patch.

Add comments to explain the event encoding rules for the fixed counters.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/include/asm/perf_event.h | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index fd3eba65337f..fe8110a8c75b 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -197,12 +197,24 @@ struct x86_pmu_capability {
  */
 
 /*
- * All 3 fixed-mode PMCs are configured via this single MSR:
+ * All the fixed-mode PMCs are configured via this single MSR:
  */
 #define MSR_ARCH_PERFMON_FIXED_CTR_CTRL	0x38d
 
 /*
- * The counts are available in three separate MSRs:
+ * There is no event-code assigned to the fixed-mode PMCs.
+ *
+ * For a fixed-mode PMC, which has an equivalent event on a general-purpose
+ * PMC, the event-code of the equivalent event is used for the fixed-mode PMC,
+ * e.g., Instr_Retired.Any and CPU_CLK_Unhalted.Core.
+ *
+ * For a fixed-mode PMC, which doesn't have an equivalent event, a
+ * pseudo-encoding is used, e.g., CPU_CLK_Unhalted.Ref and TOPDOWN.SLOTS.
+ * The pseudo event-code for a fixed-mode PMC must be 0x00.
+ * The pseudo umask-code is 0xX. The X equals the index of the fixed
+ * counter + 1, e.g., the fixed counter 2 has the pseudo-encoding 0x0300.
+ *
+ * The counts are available in separate MSRs:
  */
 
 /* Instr_Retired.Any: */
@@ -213,11 +225,16 @@ struct x86_pmu_capability {
 #define MSR_ARCH_PERFMON_FIXED_CTR1	0x30a
 #define INTEL_PMC_IDX_FIXED_CPU_CYCLES	(INTEL_PMC_IDX_FIXED + 1)
 
-/* CPU_CLK_Unhalted.Ref: */
+/* CPU_CLK_Unhalted.Ref: event=0x00,umask=0x3 (pseudo-encoding) */
 #define MSR_ARCH_PERFMON_FIXED_CTR2	0x30b
 #define INTEL_PMC_IDX_FIXED_REF_CYCLES	(INTEL_PMC_IDX_FIXED + 2)
 #define INTEL_PMC_MSK_FIXED_REF_CYCLES	(1ULL << INTEL_PMC_IDX_FIXED_REF_CYCLES)
 
+/* TOPDOWN.SLOTS: event=0x00,umask=0x4 (pseudo-encoding) */
+#define MSR_ARCH_PERFMON_FIXED_CTR3	0x30c
+#define INTEL_PMC_IDX_FIXED_SLOTS	(INTEL_PMC_IDX_FIXED + 3)
+#define INTEL_PMC_MSK_FIXED_SLOTS	(1ULL << INTEL_PMC_IDX_FIXED_SLOTS)
+
 /*
  * We model BTS tracing as another fixed-mode PMC.
  *
-- 
2.17.1



* [PATCH V7 04/14] perf/x86/intel: Move BTS index to 47
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (2 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 03/14] perf/x86/intel: Introduce the fourth fixed counter kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 05/14] perf/x86/intel: Fix the name of perf METRICS kan.liang
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Bit 48 in the PERF_GLOBAL_STATUS is used to indicate the overflow
status of the PERF_METRICS counters.

Move the BTS index to bit 47.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/include/asm/perf_event.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index fe8110a8c75b..58419e50ff98 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -238,11 +238,11 @@ struct x86_pmu_capability {
 /*
  * We model BTS tracing as another fixed-mode PMC.
  *
- * We choose a value in the middle of the fixed event range, since lower
+ * We choose the value 47 for the fixed index of BTS, since lower
  * values are used by actual fixed events and higher values are used
  * to indicate other overflow conditions in the PERF_GLOBAL_STATUS msr.
  */
-#define INTEL_PMC_IDX_FIXED_BTS			(INTEL_PMC_IDX_FIXED + 16)
+#define INTEL_PMC_IDX_FIXED_BTS			(INTEL_PMC_IDX_FIXED + 15)
 
 #define GLOBAL_STATUS_COND_CHG			BIT_ULL(63)
 #define GLOBAL_STATUS_BUFFER_OVF_BIT		62
-- 
2.17.1



* [PATCH V7 05/14] perf/x86/intel: Fix the name of perf METRICS
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (3 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 04/14] perf/x86/intel: Move BTS index to 47 kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 06/14] perf/x86/intel: Use switch in intel_pmu_disable/enable_event kan.liang
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Bit 15 of the PERF_CAPABILITIES MSR indicates that the perf METRICS
feature is supported. The perf METRICS feature is not a PEBS feature.

Rename pebs_metrics_available to perf_metrics.

The bit is not used in the current code. It will be used in a later
patch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/perf_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7b68ab5f19e7..5d453da8cfa6 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -537,7 +537,7 @@ union perf_capabilities {
 		 */
 		u64	full_width_write:1;
 		u64     pebs_baseline:1;
-		u64	pebs_metrics_available:1;
+		u64	perf_metrics:1;
 		u64	pebs_output_pt_available:1;
 	};
 	u64	capabilities;
-- 
2.17.1



* [PATCH V7 06/14] perf/x86/intel: Use switch in intel_pmu_disable/enable_event
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (4 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 05/14] perf/x86/intel: Fix the name of perf METRICS kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability kan.liang
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Currently, an if-else chain is used in intel_pmu_disable/enable_event to
check the type of an event. It works well, but as more and more types
are added later, e.g., perf metrics, the if-else chain may impair the
readability of the code compared to a switch statement.

There is no harm in using a switch statement to replace the if-else
here. Also, some optimizing compilers may compile a switch statement
into a jump table, which is more efficient than an if-else chain for a
large number of cases. The performance gain may not be observable for
now, because there are only 5 cases, but the benefits may show up as
more and more types are added in the future.

Use switch to replace the if-else in the intel_pmu_disable/enable_event.

If the idx is invalid, print a warning.

For the INTEL_PMC_IDX_FIXED_BTS case in intel_pmu_disable_event, there
is no need to check event->attr.precise_ip. Use return for that case.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ac1408fe1aee..76eab8178047 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2180,17 +2180,28 @@ static void intel_pmu_disable_event(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	if (idx < INTEL_PMC_IDX_FIXED) {
+	switch (idx) {
+	case 0 ... INTEL_PMC_IDX_FIXED - 1:
 		intel_clear_masks(event, idx);
 		x86_pmu_disable_event(event);
-	} else if (idx < INTEL_PMC_IDX_FIXED_BTS) {
+		break;
+	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
 		intel_clear_masks(event, idx);
 		intel_pmu_disable_fixed(event);
-	} else if (idx == INTEL_PMC_IDX_FIXED_BTS) {
+		break;
+	case INTEL_PMC_IDX_FIXED_BTS:
 		intel_pmu_disable_bts();
 		intel_pmu_drain_bts_buffer();
-	} else if (idx == INTEL_PMC_IDX_FIXED_VLBR)
+		return;
+	case INTEL_PMC_IDX_FIXED_VLBR:
 		intel_clear_masks(event, idx);
+		break;
+	default:
+		intel_clear_masks(event, idx);
+		pr_warn("Failed to disable the event with invalid index %d\n",
+			idx);
+		return;
+	}
 
 	/*
 	 * Needs to be called after x86_pmu_disable_event,
@@ -2262,18 +2273,27 @@ static void intel_pmu_enable_event(struct perf_event *event)
 	if (unlikely(event->attr.precise_ip))
 		intel_pmu_pebs_enable(event);
 
-	if (idx < INTEL_PMC_IDX_FIXED) {
+	switch (idx) {
+	case 0 ... INTEL_PMC_IDX_FIXED - 1:
 		intel_set_masks(event, idx);
 		__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
-	} else if (idx < INTEL_PMC_IDX_FIXED_BTS) {
+		break;
+	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
 		intel_set_masks(event, idx);
 		intel_pmu_enable_fixed(event);
-	} else if (idx == INTEL_PMC_IDX_FIXED_BTS) {
+		break;
+	case INTEL_PMC_IDX_FIXED_BTS:
 		if (!__this_cpu_read(cpu_hw_events.enabled))
 			return;
 		intel_pmu_enable_bts(hwc->config);
-	} else if (idx == INTEL_PMC_IDX_FIXED_VLBR)
+		break;
+	case INTEL_PMC_IDX_FIXED_VLBR:
 		intel_set_masks(event, idx);
+		break;
+	default:
+		pr_warn("Failed to enable the event with invalid index %d\n",
+			idx);
+	}
 }
 
 static void intel_pmu_add_event(struct perf_event *event)
-- 
2.17.1



* [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (5 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 06/14] perf/x86/intel: Use switch in intel_pmu_disable/enable_event kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-07-24 10:55   ` peterz
  2020-08-19  8:52   ` [tip: perf/core] perf/core: Add a new PERF_EV_CAP_SIBLING " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics kan.liang
                   ` (6 subsequent siblings)
  13 siblings, 2 replies; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Current perf assumes that events in a group are independent. Closing an
event doesn't impact the value of the other events in the same group.
If the closed event is a group member, the other events keep running as
a group after the closure. If the closed event is the leader, the other
events continue to run as singleton events.

However, the assumption is not correct anymore, e.g., the TopDown slots
and metrics events. The slots and metrics events must coexist in the
same group. If the slots event is closed, the values of the metrics
events are invalid.

Add a new PERF_EV_CAP_COEXIST event capability to indicate the
relationship among group events.

If any event with the flag is detached from the group, split the group
into singleton events, and move all events which have the flag to the
unrecoverable ERROR state. For example, if the slots event is closed in
a group {cycles, slots, topdown-retiring}, cycles becomes a singleton
event, while the topdown-retiring event is moved to the ERROR state.

The leader of a PERF_EV_CAP_COEXIST group has to be updated last.
Move perf_event__header_size(leader); to the end of perf_group_detach().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 include/linux/perf_event.h |  5 ++++
 kernel/events/core.c       | 52 +++++++++++++++++++++++++++++++++++---
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3b22db08b6fb..93631e5389bf 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -576,9 +576,14 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
  * PERF_EV_CAP_SOFTWARE: Is a software event.
  * PERF_EV_CAP_READ_ACTIVE_PKG: A CPU event (or cgroup event) that can be read
  * from any CPU in the package where it is active.
+ * PERF_EV_CAP_COEXIST: An event with this flag must coexist with other sibling
+ * events, which have the same flag. If any event with the flag is detached
+ * from the group, split the group into singleton events, and move the events
+ * with the flag to the unrecoverable ERROR state.
  */
 #define PERF_EV_CAP_SOFTWARE		BIT(0)
 #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
+#define PERF_EV_CAP_COEXIST		BIT(2)
 
 #define SWEVENT_HLIST_BITS		8
 #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7c436d705fbd..e35d549a356d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2133,10 +2133,28 @@ static inline struct list_head *get_event_list(struct perf_event *event)
 	return event->attr.pinned ? &ctx->pinned_active : &ctx->flexible_active;
 }
 
+/*
+ * If the event has PERF_EV_CAP_COEXIST capability,
+ * schedule it out and move it into the ERROR state.
+ */
+static inline void perf_remove_coexist_events(struct perf_event *event)
+{
+	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+
+	if (!(event->event_caps & PERF_EV_CAP_COEXIST))
+		return;
+
+	event_sched_out(event, cpuctx, ctx);
+	perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
+}
+
 static void perf_group_detach(struct perf_event *event)
 {
+	struct perf_event *leader = event->group_leader;
 	struct perf_event *sibling, *tmp;
 	struct perf_event_context *ctx = event->ctx;
+	bool cap_coexist = !!(event->event_caps & PERF_EV_CAP_COEXIST);
 
 	lockdep_assert_held(&ctx->lock);
 
@@ -2150,15 +2168,25 @@ static void perf_group_detach(struct perf_event *event)
 
 	perf_put_aux_event(event);
 
+	/*
+	 * If a PERF_EV_CAP_COEXIST event is detached,
+	 * split the group into singleton events.
+	 */
+	if (cap_coexist) {
+		event = leader;
+		goto split_group;
+	}
+
 	/*
 	 * If this is a sibling, remove it from its group.
 	 */
-	if (event->group_leader != event) {
+	if (leader != event) {
 		list_del_init(&event->sibling_list);
 		event->group_leader->nr_siblings--;
 		goto out;
 	}
 
+split_group:
 	/*
 	 * If this was a group event with sibling events then
 	 * upgrade the siblings to singleton events by adding them
@@ -2172,6 +2200,10 @@ static void perf_group_detach(struct perf_event *event)
 		/* Inherit group flags from the previous leader */
 		sibling->group_caps = event->group_caps;
 
+		/* Remove sibling PERF_EV_CAP_COEXIST event */
+		if (cap_coexist)
+			perf_remove_coexist_events(sibling);
+
 		if (!RB_EMPTY_NODE(&event->group_node)) {
 			add_event_to_groups(sibling, event->ctx);
 
@@ -2181,12 +2213,24 @@ static void perf_group_detach(struct perf_event *event)
 
 		WARN_ON_ONCE(sibling->ctx != event->ctx);
 	}
-
 out:
-	perf_event__header_size(event->group_leader);
 
-	for_each_sibling_event(tmp, event->group_leader)
+	for_each_sibling_event(tmp, leader)
 		perf_event__header_size(tmp);
+
+	/*
+	 * Change the leader of a PERF_EV_CAP_COEXIST group into
+	 * a singleton event. If the leader is a PERF_EV_CAP_COEXIST
+	 * event as well, remove it.
+	 */
+
+	if (cap_coexist) {
+		list_del_init(&leader->sibling_list);
+		leader->group_leader->nr_siblings = 0;
+		perf_remove_coexist_events(leader);
+	}
+
+	perf_event__header_size(leader);
 }
 
 static bool is_orphaned_event(struct perf_event *event)
-- 
2.17.1



* [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (6 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-07-24 13:19   ` peterz
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 09/14] perf/x86: Add a macro for RDPMC offset of fixed counters kan.liang
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Intro
=====

The TopDown Microarchitecture Analysis (TMA) Method is a structured
analysis methodology to identify critical performance bottlenecks in
out-of-order processors. Current perf already supports this method.

The method works well, but there is one problem. To collect the TopDown
events, several GP counters have to be used. If a user wants to collect
other events at the same time, multiplexing will probably be triggered,
which impacts the accuracy.

To free up the scarce GP counters, the hardware TopDown metrics feature
is introduced from Ice Lake. The hardware implements an additional
"metrics" register and a new Fixed Counter 3 that measures pipeline
"slots". The TopDown events can be calculated from them instead.

Events
======

The level 1 TopDown has four metrics. There is no event-code assigned to
the TopDown metrics. Four metric events are exported as separate perf
events, which map to the internal "metrics" counter register. Those
events do not exist in hardware, but can be allocated by the scheduler.

For the event mapping, a special 0x00 event code is used, which is
reserved for fake events. The metric events start from umask 0x80.
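
For illustration, the resulting pseudo-encodings, as defined by the
Ice Lake support patch later in this series (shown here only for
reference), are:

	EVENT_ATTR_STR(slots,            slots,       "event=0x00,umask=0x4");
	EVENT_ATTR_STR(topdown-retiring, td_retiring, "event=0x00,umask=0x80");
	EVENT_ATTR_STR(topdown-bad-spec, td_bad_spec, "event=0x00,umask=0x81");
	EVENT_ATTR_STR(topdown-fe-bound, td_fe_bound, "event=0x00,umask=0x82");
	EVENT_ATTR_STR(topdown-be-bound, td_be_bound, "event=0x00,umask=0x83");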

When setting up the metric events, they point to the Fixed Counter 3.
They have to be specially handled.
- Add the update_topdown_event() callback to read the additional metrics
  MSR and generate the metrics.
- Add the set_topdown_event_period() callback to initialize metrics MSR
  and the fixed counter 3.
- Add a variable n_metric to track the number of the accepted metrics
  events. The sharing between multiple users of the same metric without
  multiplexing is not allowed.
- Only enable/disable the fixed counter 3 when there are no other active
  TopDown events, which avoids unnecessary writes to the fixed control
  register.
- Disable the PMU when reading the metrics event. The metrics MSR and
  the fixed counter 3 are read separately. The values may be modified by
  an NMI.

None of the four metric events supports sampling. Since they will be
handled specially for the event update, a flag, PERF_X86_EVENT_TOPDOWN,
is introduced to indicate this case.

The slots event can support both sampling and counting.
For counting, the flag is also applied.
For sampling, it is handled normally, like other events.

Groups
======

The slots event is required in a Topdown group.
To avoid reading the METRICS register multiple times, the metrics and
slots values can only be updated by the slots event in a group.
All active slots and metrics events are updated at the same time.
Therefore, the slots event must come before any metric events in a
Topdown group, e.g., {slots, topdown-retiring}.

NMI
======

The METRICS-related register may overflow, in which case bit 48 of the
STATUS register is set. If so, PERF_METRICS and Fixed counter 3 are
required to be reset. The patch also updates all active slots and
metrics events in the NMI handler.

The update_topdown_event() callback has to read two registers
separately. The values may be modified by an NMI. The PMU has to be
disabled before calling the function.

RDPMC
======

RDPMC is temporarily disabled. A later patch will enable it.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c            |  63 +++++++++++---
 arch/x86/events/intel/core.c      | 138 ++++++++++++++++++++++++++++--
 arch/x86/events/perf_event.h      |  37 ++++++++
 arch/x86/include/asm/msr-index.h  |   1 +
 arch/x86/include/asm/perf_event.h |  47 ++++++++++
 5 files changed, 271 insertions(+), 15 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8e108ea378e3..53fcf0a2b025 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -76,6 +76,9 @@ u64 x86_perf_event_update(struct perf_event *event)
 	if (unlikely(!hwc->event_base))
 		return 0;
 
+	if (unlikely(is_topdown_count(event)) && x86_pmu.update_topdown_event)
+		return x86_pmu.update_topdown_event(event);
+
 	/*
 	 * Careful: an NMI might modify the previous event value.
 	 *
@@ -1031,6 +1034,42 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	return unsched ? -EINVAL : 0;
 }
 
+static int add_nr_metric_event(struct cpu_hw_events *cpuc,
+			       struct perf_event *event)
+{
+	if (is_metric_event(event)) {
+		if (cpuc->n_metric == INTEL_TD_METRIC_NUM)
+			return -EINVAL;
+		cpuc->n_metric++;
+	}
+
+	return 0;
+}
+
+static void del_nr_metric_event(struct cpu_hw_events *cpuc,
+				struct perf_event *event)
+{
+	if (is_metric_event(event))
+		cpuc->n_metric--;
+}
+
+static int collect_event(struct cpu_hw_events *cpuc, struct perf_event *event,
+			 int max_count, int n)
+{
+
+	if (x86_pmu.intel_cap.perf_metrics && add_nr_metric_event(cpuc, event))
+		return -EINVAL;
+
+	if (n >= max_count + cpuc->n_metric)
+		return -EINVAL;
+
+	cpuc->event_list[n] = event;
+	if (is_counter_pair(&event->hw))
+		cpuc->n_pair++;
+
+	return 0;
+}
+
 /*
  * dogrp: true if must collect siblings events (group)
  * returns total number of events and error code
@@ -1067,28 +1106,22 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,
 	}
 
 	if (is_x86_event(leader)) {
-		if (n >= max_count)
+		if (collect_event(cpuc, leader, max_count, n))
 			return -EINVAL;
-		cpuc->event_list[n] = leader;
 		n++;
-		if (is_counter_pair(&leader->hw))
-			cpuc->n_pair++;
 	}
+
 	if (!dogrp)
 		return n;
 
 	for_each_sibling_event(event, leader) {
-		if (!is_x86_event(event) ||
-		    event->state <= PERF_EVENT_STATE_OFF)
+		if (!is_x86_event(event) || event->state <= PERF_EVENT_STATE_OFF)
 			continue;
 
-		if (n >= max_count)
+		if (collect_event(cpuc, event, max_count, n))
 			return -EINVAL;
 
-		cpuc->event_list[n] = event;
 		n++;
-		if (is_counter_pair(&event->hw))
-			cpuc->n_pair++;
 	}
 	return n;
 }
@@ -1110,6 +1143,10 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 		hwc->event_base	= 0;
 		break;
 
+	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
+		/* All the metric events are mapped onto the fixed counter 3. */
+		idx = INTEL_PMC_IDX_FIXED_SLOTS;
+		/* fall through */
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS-1:
 		hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
 		hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 +
@@ -1245,6 +1282,10 @@ int x86_perf_event_set_period(struct perf_event *event)
 	if (unlikely(!hwc->event_base))
 		return 0;
 
+	if (unlikely(is_topdown_count(event)) &&
+	    x86_pmu.set_topdown_event_period)
+		return x86_pmu.set_topdown_event_period(event);
+
 	/*
 	 * If we are way outside a reasonable range then just skip forward:
 	 */
@@ -1529,6 +1570,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	}
 	cpuc->event_constraint[i-1] = NULL;
 	--cpuc->n_events;
+	if (x86_pmu.intel_cap.perf_metrics)
+		del_nr_metric_event(cpuc, event);
 
 	perf_event_update_userpage(event);
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 76eab8178047..5655e09c6878 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2165,11 +2165,24 @@ static inline void intel_clear_masks(struct perf_event *event, int idx)
 static void intel_pmu_disable_fixed(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
 	u64 ctrl_val, mask;
+	int idx = hwc->idx;
 
-	mask = 0xfULL << (idx * 4);
+	if (is_topdown_idx(idx)) {
+		struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+		/*
+		 * When there are other active TopDown events,
+		 * don't disable the fixed counter 3.
+		 */
+		if (*(u64 *)cpuc->active_mask & INTEL_PMC_OTHER_TOPDOWN_BITS(idx))
+			return;
+		idx = INTEL_PMC_IDX_FIXED_SLOTS;
+	}
+
+	intel_clear_masks(event, idx);
 
+	mask = 0xfULL << ((idx - INTEL_PMC_IDX_FIXED) * 4);
 	rdmsrl(hwc->config_base, ctrl_val);
 	ctrl_val &= ~mask;
 	wrmsrl(hwc->config_base, ctrl_val);
@@ -2186,7 +2199,7 @@ static void intel_pmu_disable_event(struct perf_event *event)
 		x86_pmu_disable_event(event);
 		break;
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
-		intel_clear_masks(event, idx);
+	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
 		intel_pmu_disable_fixed(event);
 		break;
 	case INTEL_PMC_IDX_FIXED_BTS:
@@ -2219,10 +2232,26 @@ static void intel_pmu_del_event(struct perf_event *event)
 		intel_pmu_pebs_del(event);
 }
 
+static void intel_pmu_read_topdown_event(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	/* Only need to call update_topdown_event() once for group read. */
+	if ((cpuc->txn_flags & PERF_PMU_TXN_READ) &&
+	    !is_slots_event(event))
+		return;
+
+	perf_pmu_disable(event->pmu);
+	x86_pmu.update_topdown_event(event);
+	perf_pmu_enable(event->pmu);
+}
+
 static void intel_pmu_read_event(struct perf_event *event)
 {
 	if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
 		intel_pmu_auto_reload_read(event);
+	else if (is_topdown_count(event) && x86_pmu.update_topdown_event)
+		intel_pmu_read_topdown_event(event);
 	else
 		x86_perf_event_update(event);
 }
@@ -2230,8 +2259,22 @@ static void intel_pmu_read_event(struct perf_event *event)
 static void intel_pmu_enable_fixed(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
 	u64 ctrl_val, mask, bits = 0;
+	int idx = hwc->idx;
+
+	if (is_topdown_idx(idx)) {
+		struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+		/*
+		 * When there are other active TopDown events,
+		 * don't enable the fixed counter 3 again.
+		 */
+		if (*(u64 *)cpuc->active_mask & INTEL_PMC_OTHER_TOPDOWN_BITS(idx))
+			return;
+
+		idx = INTEL_PMC_IDX_FIXED_SLOTS;
+	}
+
+	intel_set_masks(event, idx);
 
 	/*
 	 * Enable IRQ generation (0x8), if not PEBS,
@@ -2251,6 +2294,7 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
 	if (x86_pmu.version > 2 && hwc->config & ARCH_PERFMON_EVENTSEL_ANY)
 		bits |= 0x4;
 
+	idx -= INTEL_PMC_IDX_FIXED;
 	bits <<= (idx * 4);
 	mask = 0xfULL << (idx * 4);
 
@@ -2279,7 +2323,7 @@ static void intel_pmu_enable_event(struct perf_event *event)
 		__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
 		break;
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
-		intel_set_masks(event, idx);
+	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
 		intel_pmu_enable_fixed(event);
 		break;
 	case INTEL_PMC_IDX_FIXED_BTS:
@@ -2439,6 +2483,15 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 			intel_pt_interrupt();
 	}
 
+	/*
+	 * Intel Perf metrics
+	 */
+	if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT, (unsigned long *)&status)) {
+		handled++;
+		if (x86_pmu.update_topdown_event)
+			x86_pmu.update_topdown_event(NULL);
+	}
+
 	/*
 	 * Checkpointed counters can lead to 'spurious' PMIs because the
 	 * rollback caused by the PMI will have cleared the overflow status
@@ -3375,6 +3428,72 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
 
+	/*
+	 * Config Topdown slots and metric events
+	 *
+	 * The slots event on Fixed Counter 3 can support sampling,
+	 * which will be handled normally in x86_perf_event_update().
+	 *
+	 * The metric events don't support sampling.
+	 *
+	 * For counting, topdown slots and metric events will be
+	 * handled specially for event update.
+	 * A flag PERF_X86_EVENT_TOPDOWN is applied for the case.
+	 */
+	if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
+		if (is_metric_event(event)) {
+			struct perf_event *leader = event->group_leader;
+			struct perf_event *sibling;
+
+			/* The metric events don't support sampling. */
+			if (is_sampling_event(event))
+				return -EINVAL;
+
+			/* The metric events cannot be a group leader. */
+			if (leader == event)
+				return -EINVAL;
+
+			/*
+			 * The slots event cannot be the leader of a topdown
+			 * sample-read group, e.g., {slots, topdown-retiring}:S
+			 */
+			if (is_slots_event(leader) && is_sampling_event(leader))
+				return -EINVAL;
+
+			/*
+			 * The slots event must be before the metric events,
+			 * because we only update the values of a topdown
+			 * group once with the slots event.
+			 */
+			if (!is_slots_event(leader)) {
+				for_each_sibling_event(sibling, leader) {
+					if (is_slots_event(sibling))
+						break;
+					if (is_metric_event(sibling))
+						return -EINVAL;
+				}
+			}
+		}
+
+		if (!is_sampling_event(event)) {
+			if (event->attr.config1 != 0)
+				return -EINVAL;
+			/*
+			 * The TopDown metrics events and slots event don't
+			 * support any filters.
+			 */
+			if (event->attr.config & X86_ALL_EVENT_FLAGS)
+				return -EINVAL;
+
+			event->hw.flags |= PERF_X86_EVENT_TOPDOWN;
+
+			event->event_caps |= PERF_EV_CAP_COEXIST;
+
+			if (is_metric_event(event))
+				event->hw.flags &= ~PERF_X86_EVENT_RDPMC_ALLOWED;
+		}
+	}
+
 	if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
 		return 0;
 
@@ -5218,6 +5337,15 @@ __init int intel_pmu_init(void)
 		 * counter, so do not extend mask to generic counters
 		 */
 		for_each_event_constraint(c, x86_pmu.event_constraints) {
+			/*
+			 * Don't extend the topdown slots and metrics
+			 * events to the generic counters.
+			 */
+			if (c->idxmsk64 & INTEL_PMC_MSK_TOPDOWN) {
+				c->weight = hweight64(c->idxmsk64);
+				continue;
+			}
+
 			if (c->cmask == FIXED_EVENT_FLAGS
 			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) {
 				c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 5d453da8cfa6..dea009ce0e4a 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -79,6 +79,31 @@ static inline bool constraint_match(struct event_constraint *c, u64 ecode)
 #define PERF_X86_EVENT_PEBS_VIA_PT	0x0800 /* use PT buffer for PEBS */
 #define PERF_X86_EVENT_PAIR		0x1000 /* Large Increment per Cycle */
 #define PERF_X86_EVENT_LBR_SELECT	0x2000 /* Save/Restore MSR_LBR_SELECT */
+#define PERF_X86_EVENT_TOPDOWN		0x4000 /* Count Topdown slots/metrics events */
+
+static inline bool is_topdown_count(struct perf_event *event)
+{
+	return event->hw.flags & PERF_X86_EVENT_TOPDOWN;
+}
+
+static inline bool is_metric_event(struct perf_event *event)
+{
+	u64 config = event->attr.config;
+
+	return ((config & ARCH_PERFMON_EVENTSEL_EVENT) == 0) &&
+		((config & INTEL_ARCH_EVENT_MASK) >= INTEL_TD_METRIC_RETIRING)  &&
+		((config & INTEL_ARCH_EVENT_MASK) <= INTEL_TD_METRIC_MAX);
+}
+
+static inline bool is_slots_event(struct perf_event *event)
+{
+	return (event->attr.config & INTEL_ARCH_EVENT_MASK) == INTEL_TD_SLOTS;
+}
+
+static inline bool is_topdown_event(struct perf_event *event)
+{
+	return is_metric_event(event) || is_slots_event(event);
+}
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -284,6 +309,12 @@ struct cpu_hw_events {
 	 */
 	u64				tfa_shadow;
 
+	/*
+	 * Perf Metrics
+	 */
+	/* number of accepted metrics events */
+	int				n_metric;
+
 	/*
 	 * AMD specific bits
 	 */
@@ -726,6 +757,12 @@ struct x86_pmu {
 	 */
 	atomic_t	lbr_exclusive[x86_lbr_exclusive_max];
 
+	/*
+	 * Intel perf metrics
+	 */
+	u64		(*update_topdown_event)(struct perf_event *event);
+	int		(*set_topdown_event_period)(struct perf_event *event);
+
 	/*
 	 * perf task context (i.e. struct perf_event_context::task_ctx_data)
 	 * switch helper to bridge calls from perf/core to perf/x86.
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index bdc07fc6e517..d8c6cb82866d 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -853,6 +853,7 @@
 #define MSR_CORE_PERF_FIXED_CTR0	0x00000309
 #define MSR_CORE_PERF_FIXED_CTR1	0x0000030a
 #define MSR_CORE_PERF_FIXED_CTR2	0x0000030b
+#define MSR_CORE_PERF_FIXED_CTR3	0x0000030c
 #define MSR_CORE_PERF_FIXED_CTR_CTRL	0x0000038d
 #define MSR_CORE_PERF_GLOBAL_STATUS	0x0000038e
 #define MSR_CORE_PERF_GLOBAL_CTRL	0x0000038f
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 58419e50ff98..000cab7818b5 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -244,6 +244,52 @@ struct x86_pmu_capability {
  */
 #define INTEL_PMC_IDX_FIXED_BTS			(INTEL_PMC_IDX_FIXED + 15)
 
+/*
+ * The PERF_METRICS MSR is modeled as several magic fixed-mode PMCs, one for
+ * each TopDown metric event.
+ *
+ * Internally the TopDown metric events are mapped to the FxCtr 3 (SLOTS).
+ */
+#define INTEL_PMC_IDX_METRIC_BASE		(INTEL_PMC_IDX_FIXED + 16)
+#define INTEL_PMC_IDX_TD_RETIRING		(INTEL_PMC_IDX_METRIC_BASE + 0)
+#define INTEL_PMC_IDX_TD_BAD_SPEC		(INTEL_PMC_IDX_METRIC_BASE + 1)
+#define INTEL_PMC_IDX_TD_FE_BOUND		(INTEL_PMC_IDX_METRIC_BASE + 2)
+#define INTEL_PMC_IDX_TD_BE_BOUND		(INTEL_PMC_IDX_METRIC_BASE + 3)
+#define INTEL_PMC_IDX_METRIC_END		INTEL_PMC_IDX_TD_BE_BOUND
+#define INTEL_PMC_MSK_TOPDOWN			((0xfull << INTEL_PMC_IDX_METRIC_BASE) | \
+						INTEL_PMC_MSK_FIXED_SLOTS)
+
+/*
+ * There is no event-code assigned to the TopDown events.
+ *
+ * For the slots event, use the pseudo code of the fixed counter 3.
+ *
+ * For the metric events, the pseudo event-code is 0x00.
+ * The pseudo umask-code starts from the middle of the pseudo event
+ * space, 0x80.
+ */
+#define INTEL_TD_SLOTS				0x0400	/* TOPDOWN.SLOTS */
+/* Level 1 metrics */
+#define INTEL_TD_METRIC_RETIRING		0x8000	/* Retiring metric */
+#define INTEL_TD_METRIC_BAD_SPEC		0x8100	/* Bad speculation metric */
+#define INTEL_TD_METRIC_FE_BOUND		0x8200	/* FE bound metric */
+#define INTEL_TD_METRIC_BE_BOUND		0x8300	/* BE bound metric */
+#define INTEL_TD_METRIC_MAX			INTEL_TD_METRIC_BE_BOUND
+#define INTEL_TD_METRIC_NUM			4
+
+static inline bool is_metric_idx(int idx)
+{
+	return (unsigned)(idx - INTEL_PMC_IDX_METRIC_BASE) < INTEL_TD_METRIC_NUM;
+}
+
+static inline bool is_topdown_idx(int idx)
+{
+	return is_metric_idx(idx) || idx == INTEL_PMC_IDX_FIXED_SLOTS;
+}
+
+#define INTEL_PMC_OTHER_TOPDOWN_BITS(bit)	\
+			(~(0x1ull << bit) & INTEL_PMC_MSK_TOPDOWN)
+
 #define GLOBAL_STATUS_COND_CHG			BIT_ULL(63)
 #define GLOBAL_STATUS_BUFFER_OVF_BIT		62
 #define GLOBAL_STATUS_BUFFER_OVF		BIT_ULL(GLOBAL_STATUS_BUFFER_OVF_BIT)
@@ -254,6 +300,7 @@ struct x86_pmu_capability {
 #define GLOBAL_STATUS_LBRS_FROZEN		BIT_ULL(GLOBAL_STATUS_LBRS_FROZEN_BIT)
 #define GLOBAL_STATUS_TRACE_TOPAPMI_BIT		55
 #define GLOBAL_STATUS_TRACE_TOPAPMI		BIT_ULL(GLOBAL_STATUS_TRACE_TOPAPMI_BIT)
+#define GLOBAL_STATUS_PERF_METRICS_OVF_BIT	48
 
 /*
  * We model guest LBR event tracing as another fixed-mode PMC like BTS.
-- 
2.17.1



* [PATCH V7 09/14] perf/x86: Add a macro for RDPMC offset of fixed counters
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (7 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 10/14] perf/x86/intel: Support TopDown metrics on Ice Lake kan.liang
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The RDPMC base offset of fixed counters is hard-coded. Use a meaningful
name to replace the magic number to improve the readability of the code.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c            | 3 ++-
 arch/x86/include/asm/perf_event.h | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 53fcf0a2b025..ebf723f33794 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1151,7 +1151,8 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 		hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
 		hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 +
 				(idx - INTEL_PMC_IDX_FIXED);
-		hwc->event_base_rdpmc = (idx - INTEL_PMC_IDX_FIXED) | 1<<30;
+		hwc->event_base_rdpmc = (idx - INTEL_PMC_IDX_FIXED) |
+					INTEL_PMC_FIXED_RDPMC_BASE;
 		break;
 
 	default:
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 000cab7818b5..964ba312c249 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -196,6 +196,9 @@ struct x86_pmu_capability {
  * Fixed-purpose performance events:
  */
 
+/* RDPMC offset for Fixed PMCs */
+#define INTEL_PMC_FIXED_RDPMC_BASE		(1 << 30)
+
 /*
  * All the fixed-mode PMCs are configured via this single MSR:
  */
-- 
2.17.1



* [PATCH V7 10/14] perf/x86/intel: Support TopDown metrics on Ice Lake
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (8 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 09/14] perf/x86: Add a macro for RDPMC offset of fixed counters kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 11/14] perf/x86/intel: Support per-thread RDPMC TopDown metrics kan.liang
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Ice Lake supports the hardware TopDown metrics feature, which can free
up the scarce GP counters.

Update the event constraints for the metrics events. The metric counters
do not physically exist; they are mapped to a dummy offset. The sharing
between multiple users of the same metric without multiplexing is not
allowed.

Implement set_topdown_event_period for Ice Lake. The values in
PERF_METRICS MSR are derived from the fixed counter 3. Both registers
should start from zero.

Implement update_topdown_event for Ice Lake. The metric is reported by
multiplying the metric (fraction) by slots. To maintain accurate
measurements, both registers are cleared for each update. The fixed
counter 3 should always be cleared before the PERF_METRICS.
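
As an illustrative example with hypothetical numbers: each metric is an
8-bit fraction of 0xff, so if the retiring byte of PERF_METRICS reads
0x40 (64) while the slots count is 1000, the retiring event is credited
with roughly 1000 * 64 / 0xff, i.e. about 251 slots.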

Implement td_attr for the new metrics events and the new slots fixed
counter. Make them visible to the perf user tools.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c      | 118 ++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h      |  13 ++++
 arch/x86/include/asm/msr-index.h  |   2 +
 arch/x86/include/asm/perf_event.h |   2 +
 4 files changed, 135 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5655e09c6878..e7d6c3b79772 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -247,6 +247,10 @@ static struct event_constraint intel_icl_event_constraints[] = {
 	FIXED_EVENT_CONSTRAINT(0x003c, 1),	/* CPU_CLK_UNHALTED.CORE */
 	FIXED_EVENT_CONSTRAINT(0x0300, 2),	/* CPU_CLK_UNHALTED.REF */
 	FIXED_EVENT_CONSTRAINT(0x0400, 3),	/* SLOTS */
+	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_RETIRING, 0),
+	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_BAD_SPEC, 1),
+	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_FE_BOUND, 2),
+	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_BE_BOUND, 3),
 	INTEL_EVENT_CONSTRAINT_RANGE(0x03, 0x0a, 0xf),
 	INTEL_EVENT_CONSTRAINT_RANGE(0x1f, 0x28, 0xf),
 	INTEL_EVENT_CONSTRAINT(0x32, 0xf),	/* SW_PREFETCH_ACCESS.* */
@@ -309,6 +313,12 @@ EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles,
 EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale,
 	"4", "2");
 
+EVENT_ATTR_STR(slots,			slots,		"event=0x00,umask=0x4");
+EVENT_ATTR_STR(topdown-retiring,	td_retiring,	"event=0x00,umask=0x80");
+EVENT_ATTR_STR(topdown-bad-spec,	td_bad_spec,	"event=0x00,umask=0x81");
+EVENT_ATTR_STR(topdown-fe-bound,	td_fe_bound,	"event=0x00,umask=0x82");
+EVENT_ATTR_STR(topdown-be-bound,	td_be_bound,	"event=0x00,umask=0x83");
+
 static struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(td_slots_issued),
 	EVENT_PTR(td_slots_retired),
@@ -2232,6 +2242,99 @@ static void intel_pmu_del_event(struct perf_event *event)
 		intel_pmu_pebs_del(event);
 }
 
+static int icl_set_topdown_event_period(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	s64 left = local64_read(&hwc->period_left);
+
+	/*
+	 * The values in PERF_METRICS MSR are derived from fixed counter 3.
+	 * Software should start both registers, PERF_METRICS and fixed
+	 * counter 3, from zero.
+	 * Clear PERF_METRICS and Fixed counter 3 in initialization.
+	 * After that, both MSRs will be cleared for each read.
+	 * Don't need to clear them again.
+	 */
+	if (left == x86_pmu.max_period) {
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+		wrmsrl(MSR_PERF_METRICS, 0);
+		local64_set(&hwc->period_left, 0);
+	}
+
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static inline u64 icl_get_metrics_event_value(u64 metric, u64 slots, int idx)
+{
+	u32 val;
+
+	/*
+	 * The metric is reported as an 8bit integer fraction
+	 * summing up to 0xff.
+	 * slots-in-metric = (Metric / 0xff) * slots
+	 */
+	val = (metric >> ((idx - INTEL_PMC_IDX_METRIC_BASE) * 8)) & 0xff;
+	return  mul_u64_u32_div(slots, val, 0xff);
+}
+
+static void __icl_update_topdown_event(struct perf_event *event,
+				       u64 slots, u64 metrics)
+{
+	int idx = event->hw.idx;
+	u64 delta;
+
+	if (is_metric_idx(idx))
+		delta = icl_get_metrics_event_value(metrics, slots, idx);
+	else
+		delta = slots;
+
+	local64_add(delta, &event->count);
+}
+
+/*
+ * Update all active Topdown events.
+ *
+ * The PERF_METRICS and Fixed counter 3 are read separately. The values may be
+ * modified by an NMI. PMU has to be disabled before calling this function.
+ */
+static u64 icl_update_topdown_event(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_event *other;
+	u64 slots, metrics;
+	int idx;
+
+	/* read Fixed counter 3 */
+	rdpmcl((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
+	if (!slots)
+		return 0;
+
+	/* read PERF_METRICS */
+	rdpmcl(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
+
+	for_each_set_bit(idx, cpuc->active_mask, INTEL_PMC_IDX_TD_BE_BOUND + 1) {
+		if (!is_topdown_idx(idx))
+			continue;
+		other = cpuc->events[idx];
+		__icl_update_topdown_event(other, slots, metrics);
+	}
+
+	/*
+	 * Check and update this event, which may have been cleared
+	 * in active_mask e.g. x86_pmu_stop()
+	 */
+	if (event && !test_bit(event->hw.idx, cpuc->active_mask))
+		__icl_update_topdown_event(event, slots, metrics);
+
+	/* The fixed counter 3 has to be written before the PERF_METRICS. */
+	wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+	wrmsrl(MSR_PERF_METRICS, 0);
+
+	return slots;
+}
+
 static void intel_pmu_read_topdown_event(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -4494,6 +4597,15 @@ static struct attribute *icl_events_attrs[] = {
 	NULL,
 };
 
+static struct attribute *icl_td_events_attrs[] = {
+	EVENT_PTR(slots),
+	EVENT_PTR(td_retiring),
+	EVENT_PTR(td_bad_spec),
+	EVENT_PTR(td_fe_bound),
+	EVENT_PTR(td_be_bound),
+	NULL,
+};
+
 static struct attribute *icl_tsx_events_attrs[] = {
 	EVENT_PTR(tx_start),
 	EVENT_PTR(tx_abort),
@@ -5278,10 +5390,13 @@ __init int intel_pmu_init(void)
 			hsw_format_attr : nhm_format_attr;
 		extra_skl_attr = skl_format_attr;
 		mem_attr = icl_events_attrs;
+		td_attr = icl_td_events_attrs;
 		tsx_attr = icl_tsx_events_attrs;
 		x86_pmu.rtm_abort_event = X86_CONFIG(.event=0xca, .umask=0x02);
 		x86_pmu.lbr_pt_coexist = true;
 		intel_pmu_pebs_data_source_skl(pmem);
+		x86_pmu.update_topdown_event = icl_update_topdown_event;
+		x86_pmu.set_topdown_event_period = icl_set_topdown_event_period;
 		pr_cont("Icelake events, ");
 		name = "icelake";
 		break;
@@ -5401,6 +5516,9 @@ __init int intel_pmu_init(void)
 	if (x86_pmu.counter_freezing)
 		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
 
+	if (x86_pmu.intel_cap.perf_metrics)
+		x86_pmu.intel_ctrl |= 1ULL << GLOBAL_CTRL_EN_PERF_METRICS;
+
 	return 0;
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index dea009ce0e4a..345442410a4d 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -406,6 +406,19 @@ struct cpu_hw_events {
 #define FIXED_EVENT_CONSTRAINT(c, n)	\
 	EVENT_CONSTRAINT(c, (1ULL << (32+n)), FIXED_EVENT_FLAGS)
 
+/*
+ * The special metric counters do not actually exist. They are calculated from
+ * the combination of the FxCtr3 + MSR_PERF_METRICS.
+ *
+ * The special metric counters are mapped to a dummy offset for the scheduler.
+ * The sharing between multiple users of the same metric without multiplexing
+ * is not allowed, even though the hardware supports that in principle.
+ */
+
+#define METRIC_EVENT_CONSTRAINT(c, n)					\
+	EVENT_CONSTRAINT(c, (1ULL << (INTEL_PMC_IDX_METRIC_BASE + n)),	\
+			 INTEL_ARCH_EVENT_MASK)
+
 /*
  * Constraint on the Event code + UMask
  */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index d8c6cb82866d..325cfe5dd95a 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -859,6 +859,8 @@
 #define MSR_CORE_PERF_GLOBAL_CTRL	0x0000038f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL	0x00000390
 
+#define MSR_PERF_METRICS		0x00000329
+
 /* PERF_GLOBAL_OVF_CTL bits */
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT	55
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI		(1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 964ba312c249..e20aa588dc47 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -198,6 +198,7 @@ struct x86_pmu_capability {
 
 /* RDPMC offset for Fixed PMCs */
 #define INTEL_PMC_FIXED_RDPMC_BASE		(1 << 30)
+#define INTEL_PMC_FIXED_RDPMC_METRICS		(1 << 29)
 
 /*
  * All the fixed-mode PMCs are configured via this single MSR:
@@ -305,6 +306,7 @@ static inline bool is_topdown_idx(int idx)
 #define GLOBAL_STATUS_TRACE_TOPAPMI		BIT_ULL(GLOBAL_STATUS_TRACE_TOPAPMI_BIT)
 #define GLOBAL_STATUS_PERF_METRICS_OVF_BIT	48
 
+#define GLOBAL_CTRL_EN_PERF_METRICS		48
 /*
  * We model guest LBR event tracing as another fixed-mode PMC like BTS.
  *
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 11/14] perf/x86/intel: Support per-thread RDPMC TopDown metrics
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (9 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 10/14] perf/x86/intel: Support TopDown metrics on Ice Lake kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2020-07-23 17:11 ` [PATCH V7 12/14] perf, tools, stat: Support new per thread " kan.liang
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Starting from Ice Lake, the TopDown metrics are directly available as
fixed counters and do not require generic counters. Also, the TopDown
metrics can be collected per thread. Extend the RDPMC usage to support
per-thread TopDown metrics.

The RDPMC index of the PERF_METRICS will be output if RDPMC users ask
for the RDPMC index of the metrics events.

To support per-thread RDPMC TopDown, the metrics and slots counters have
to be saved/restored during context switching.

The last_period and period_left fields are not used in counting mode.
Reuse them for saved_metric and saved_slots.
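
As a reference, a minimal user-space sketch of such an RDPMC consumer
(the helper name below is made up for illustration; the full recipe,
including the SLOTS scaling, is in the topdown.txt documentation added
later in this series):

  #include <stdint.h>
  #include <x86intrin.h>
  #include <linux/perf_event.h>

  /*
   * pc is the mmapped header page of a counting metrics event.
   * pc->index is what x86_pmu_event_idx() returns, i.e.
   * INTEL_PMC_FIXED_RDPMC_METRICS + 1 for metric events;
   * 0 means RDPMC is not allowed for this event.
   */
  static inline uint64_t read_metrics_event(struct perf_event_mmap_page *pc)
  {
          uint32_t idx = pc->index;

          if (!idx)
                  return 0;

          /* Raw PERF_METRICS value; combine it with SLOTS in user space. */
          return _rdpmc(idx - 1);
  }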

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/core.c       |  5 +-
 arch/x86/events/intel/core.c | 91 +++++++++++++++++++++++++++++++-----
 include/linux/perf_event.h   | 29 ++++++++----
 3 files changed, 102 insertions(+), 23 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ebf723f33794..0f3d01562ded 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2257,7 +2257,10 @@ static int x86_pmu_event_idx(struct perf_event *event)
 	if (!(hwc->flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return 0;
 
-	return hwc->event_base_rdpmc + 1;
+	if (is_metric_idx(hwc->idx))
+		return INTEL_PMC_FIXED_RDPMC_METRICS + 1;
+	else
+		return hwc->event_base_rdpmc + 1;
 }
 
 static ssize_t get_attr_rdpmc(struct device *cdev,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index e7d6c3b79772..b0ab638e48ee 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2258,7 +2258,13 @@ static int icl_set_topdown_event_period(struct perf_event *event)
 	if (left == x86_pmu.max_period) {
 		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
 		wrmsrl(MSR_PERF_METRICS, 0);
-		local64_set(&hwc->period_left, 0);
+		hwc->saved_slots = 0;
+		hwc->saved_metric = 0;
+	}
+
+	if ((hwc->saved_slots) && is_slots_event(event)) {
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, hwc->saved_slots);
+		wrmsrl(MSR_PERF_METRICS, hwc->saved_metric);
 	}
 
 	perf_event_update_userpage(event);
@@ -2279,7 +2285,7 @@ static inline u64 icl_get_metrics_event_value(u64 metric, u64 slots, int idx)
 	return  mul_u64_u32_div(slots, val, 0xff);
 }
 
-static void __icl_update_topdown_event(struct perf_event *event,
+static u64 icl_get_topdown_value(struct perf_event *event,
 				       u64 slots, u64 metrics)
 {
 	int idx = event->hw.idx;
@@ -2290,7 +2296,50 @@ static void __icl_update_topdown_event(struct perf_event *event,
 	else
 		delta = slots;
 
-	local64_add(delta, &event->count);
+	return delta;
+}
+
+static void __icl_update_topdown_event(struct perf_event *event,
+				       u64 slots, u64 metrics,
+				       u64 last_slots, u64 last_metrics)
+{
+	u64 delta, last = 0;
+
+	delta = icl_get_topdown_value(event, slots, metrics);
+	if (last_slots)
+		last = icl_get_topdown_value(event, last_slots, last_metrics);
+
+	/*
+	 * The 8bit integer fraction of the metric may not be accurate,
+	 * especially when the change is very small.
+	 * For example, if only a few bad_spec happen, the fraction
+	 * may be reduced from 1 to 0. If so, the bad_spec event value
+	 * will be 0, which is definitely less than the last value.
+	 * Avoid updating event->count for this case.
+	 */
+	if (delta > last) {
+		delta -= last;
+		local64_add(delta, &event->count);
+	}
+}
+
+static void update_saved_topdown_regs(struct perf_event *event,
+				      u64 slots, u64 metrics)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_event *other;
+	int idx;
+
+	event->hw.saved_slots = slots;
+	event->hw.saved_metric = metrics;
+
+	for_each_set_bit(idx, cpuc->active_mask, INTEL_PMC_IDX_TD_BE_BOUND + 1) {
+		if (!is_topdown_idx(idx))
+			continue;
+		other = cpuc->events[idx];
+		other->hw.saved_slots = slots;
+		other->hw.saved_metric = metrics;
+	}
 }
 
 /*
@@ -2304,6 +2353,7 @@ static u64 icl_update_topdown_event(struct perf_event *event)
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_event *other;
 	u64 slots, metrics;
+	bool reset = true;
 	int idx;
 
 	/* read Fixed counter 3 */
@@ -2318,19 +2368,39 @@ static u64 icl_update_topdown_event(struct perf_event *event)
 		if (!is_topdown_idx(idx))
 			continue;
 		other = cpuc->events[idx];
-		__icl_update_topdown_event(other, slots, metrics);
+		__icl_update_topdown_event(other, slots, metrics,
+					   event ? event->hw.saved_slots : 0,
+					   event ? event->hw.saved_metric : 0);
 	}
 
 	/*
 	 * Check and update this event, which may have been cleared
 	 * in active_mask e.g. x86_pmu_stop()
 	 */
-	if (event && !test_bit(event->hw.idx, cpuc->active_mask))
-		__icl_update_topdown_event(event, slots, metrics);
+	if (event && !test_bit(event->hw.idx, cpuc->active_mask)) {
+		__icl_update_topdown_event(event, slots, metrics,
+					   event->hw.saved_slots,
+					   event->hw.saved_metric);
 
-	/* The fixed counter 3 has to be written before the PERF_METRICS. */
-	wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
-	wrmsrl(MSR_PERF_METRICS, 0);
+		/*
+		 * In x86_pmu_stop(), the event is cleared in active_mask first,
+		 * then the delta is drained, which indicates a context switch
+		 * for counting.
+		 * Save the metric and slots values for the context switch.
+		 * There is no need to reset PERF_METRICS and Fixed counter 3,
+		 * because the values will be restored on the next schedule in.
+		 */
+		update_saved_topdown_regs(event, slots, metrics);
+		reset = false;
+	}
+
+	if (reset) {
+		/* The fixed counter 3 has to be written before the PERF_METRICS. */
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+		wrmsrl(MSR_PERF_METRICS, 0);
+		if (event)
+			update_saved_topdown_regs(event, 0, 0);
+	}
 
 	return slots;
 }
@@ -3591,9 +3661,6 @@ static int intel_pmu_hw_config(struct perf_event *event)
 			event->hw.flags |= PERF_X86_EVENT_TOPDOWN;
 
 			event->event_caps |= PERF_EV_CAP_COEXIST;
-
-			if (is_metric_event(event))
-				event->hw.flags &= ~PERF_X86_EVENT_RDPMC_ALLOWED;
 		}
 	}
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 93631e5389bf..aa60a1381aa1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -212,17 +212,26 @@ struct hw_perf_event {
 	 */
 	u64				sample_period;
 
-	/*
-	 * The period we started this sample with.
-	 */
-	u64				last_period;
+	union {
+		struct { /* Sampling */
+			/*
+			 * The period we started this sample with.
+			 */
+			u64				last_period;
 
-	/*
-	 * However much is left of the current period; note that this is
-	 * a full 64bit value and allows for generation of periods longer
-	 * than hardware might allow.
-	 */
-	local64_t			period_left;
+			/*
+			 * However much is left of the current period;
+			 * note that this is a full 64bit value and
+			 * allows for generation of periods longer
+			 * than hardware might allow.
+			 */
+			local64_t			period_left;
+		};
+		struct { /* Topdown events counting for context switch */
+			u64				saved_metric;
+			u64				saved_slots;
+		};
+	};
 
 	/*
 	 * State for throttling the event, see __perf_event_overflow() and
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 12/14] perf, tools, stat: Support new per thread TopDown metrics
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (10 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 11/14] perf/x86/intel: Support per-thread RDPMC TopDown metrics kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-07-24  3:29   ` Andi Kleen
  2020-07-23 17:11 ` [PATCH V7 13/14] perf, tools, stat: Check Topdown Metric group kan.liang
  2020-07-23 17:11 ` [PATCH V7 14/14] perf, tools: Add documentation for topdown metrics kan.liang
  13 siblings, 1 reply; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu

From: Andi Kleen <ak@linux.intel.com>

Icelake has support for reporting per-thread TopDown metrics.
These are reported differently from the previous TopDown support:
each metric is standalone, but scaled to pipeline "slots".
We don't need to do anything special for HyperThreading anymore.
Teach perf stat --topdown to handle these new metrics and
print them in the same way as the previous TopDown metrics.
The restriction of only being able to report information per core
is gone.
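
With the per-thread support, something like

  perf stat --topdown -I 1000 -- ./mybench

(where ./mybench stands in for any workload) reports the level 1
breakdown for just that process, without requiring -a and --per-core.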

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/perf-stat.txt |  9 ++-
 tools/perf/builtin-stat.c              | 25 ++++++++
 tools/perf/util/stat-shadow.c          | 89 ++++++++++++++++++++++++++
 tools/perf/util/stat.c                 |  4 ++
 tools/perf/util/stat.h                 |  8 +++
 5 files changed, 133 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index b029ee728a0b..a1a2b9755ae8 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -317,8 +317,13 @@ if the workload is actually bound by the CPU and not by something else.
 For best results it is usually a good idea to use it with interval
 mode like -I 1000, as the bottleneck of workloads can change often.
 
-The top down metrics are collected per core instead of per
-CPU thread. Per core mode is automatically enabled
+This enables --metric-only, unless overridden with --no-metric-only.
+
+The following restrictions only apply to older Intel CPUs and Atom;
+on newer CPUs (IceLake and later) TopDown can be collected for any thread:
+
+The top down metrics are collected per core instead of per CPU thread.
+Per core mode is automatically enabled
 and -a (global monitoring) is needed, requiring root rights or
 perf.perf_event_paranoid=-1.
 
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9be020e0098a..078ea1485317 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -128,6 +128,15 @@ static const char * topdown_attrs[] = {
 	NULL,
 };
 
+static const char *topdown_metric_attrs[] = {
+	"slots",
+	"topdown-retiring",
+	"topdown-bad-spec",
+	"topdown-fe-bound",
+	"topdown-be-bound",
+	NULL,
+};
+
 static const char *smi_cost_attrs = {
 	"{"
 	"msr/aperf/,"
@@ -1638,6 +1647,21 @@ static int add_default_attributes(void)
 		char *str = NULL;
 		bool warn = false;
 
+		if (topdown_filter_events(topdown_metric_attrs, &str, 1) < 0) {
+			pr_err("Out of memory\n");
+			return -1;
+		}
+		if (topdown_metric_attrs[0] && str) {
+			if (!stat_config.interval) {
+				fprintf(stat_config.output,
+					"Topdown accuracy may decrease when measuring long periods.\n"
+					"Please print the result regularly, e.g. -I1000\n");
+			}
+			goto setup_metrics;
+		}
+
+		str = NULL;
+
 		if (stat_config.aggr_mode != AGGR_GLOBAL &&
 		    stat_config.aggr_mode != AGGR_CORE) {
 			pr_err("top down event configuration requires --per-core mode\n");
@@ -1659,6 +1683,7 @@ static int add_default_attributes(void)
 		if (topdown_attrs[0] && str) {
 			if (warn)
 				arch_topdown_group_warn();
+setup_metrics:
 			err = parse_events(evsel_list, str, &errinfo);
 			if (err) {
 				fprintf(stderr,
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index a7c13a88ecb9..933ddb601cb8 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -241,6 +241,18 @@ void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
 	else if (perf_stat_evsel__is(counter, TOPDOWN_RECOVERY_BUBBLES))
 		update_runtime_stat(st, STAT_TOPDOWN_RECOVERY_BUBBLES,
 				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_RETIRING))
+		update_runtime_stat(st, STAT_TOPDOWN_RETIRING,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BAD_SPEC))
+		update_runtime_stat(st, STAT_TOPDOWN_BAD_SPEC,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_FE_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_FE_BOUND,
+				    ctx, cpu, count);
+	else if (perf_stat_evsel__is(counter, TOPDOWN_BE_BOUND))
+		update_runtime_stat(st, STAT_TOPDOWN_BE_BOUND,
+				    ctx, cpu, count);
 	else if (evsel__match(counter, HARDWARE, HW_STALLED_CYCLES_FRONTEND))
 		update_runtime_stat(st, STAT_STALLED_CYCLES_FRONT,
 				    ctx, cpu, count);
@@ -705,6 +717,47 @@ static double td_be_bound(int ctx, int cpu, struct runtime_stat *st)
 	return sanitize_val(1.0 - sum);
 }
 
+/*
+ * Kernel reports metrics multiplied with slots. To get back
+ * the ratios we need to recreate the sum.
+ */
+
+static double td_metric_ratio(int ctx, int cpu,
+			      enum stat_type type,
+			      struct runtime_stat *stat)
+{
+	double sum = runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, ctx, cpu) +
+		runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, ctx, cpu);
+	double d = runtime_stat_avg(stat, type, ctx, cpu);
+
+	if (sum)
+		return d / sum;
+	return 0;
+}
+
+/*
+ * ... but only if most of the values are actually available.
+ * We allow two missing.
+ */
+
+static bool full_td(int ctx, int cpu,
+		    struct runtime_stat *stat)
+{
+	int c = 0;
+
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_RETIRING, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_BE_BOUND, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_FE_BOUND, ctx, cpu) > 0)
+		c++;
+	if (runtime_stat_avg(stat, STAT_TOPDOWN_BAD_SPEC, ctx, cpu) > 0)
+		c++;
+	return c >= 2;
+}
+
 static void print_smi_cost(struct perf_stat_config *config,
 			   int cpu, struct evsel *evsel,
 			   struct perf_stat_output_ctx *out,
@@ -1034,6 +1087,42 @@ void perf_stat__print_shadow_stats(struct perf_stat_config *config,
 					be_bound * 100.);
 		else
 			print_metric(config, ctxp, NULL, NULL, name, 0);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_RETIRING) &&
+			full_td(ctx, cpu, st)) {
+		double retiring = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_RETIRING, st);
+
+		if (retiring > 0.7)
+			color = PERF_COLOR_GREEN;
+		print_metric(config, ctxp, color, "%8.1f%%", "retiring",
+				retiring * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_FE_BOUND) &&
+			full_td(ctx, cpu, st)) {
+		double fe_bound = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_FE_BOUND, st);
+
+		if (fe_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "frontend bound",
+				fe_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BE_BOUND) &&
+			full_td(ctx, cpu, st)) {
+		double be_bound = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_BE_BOUND, st);
+
+		if (be_bound > 0.2)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "backend bound",
+				be_bound * 100.);
+	} else if (perf_stat_evsel__is(evsel, TOPDOWN_BAD_SPEC) &&
+			full_td(ctx, cpu, st)) {
+		double bad_spec = td_metric_ratio(ctx, cpu,
+						  STAT_TOPDOWN_BAD_SPEC, st);
+
+		if (bad_spec > 0.1)
+			color = PERF_COLOR_RED;
+		print_metric(config, ctxp, color, "%8.1f%%", "bad speculation",
+				bad_spec * 100.);
 	} else if (evsel->metric_expr) {
 		generic_metric(config, evsel->metric_expr, evsel->metric_events, evsel->name,
 				evsel->metric_name, NULL, 1, cpu, out, st);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index cdb154381a87..bd0decd6d753 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -95,6 +95,10 @@ static const char *id_str[PERF_STAT_EVSEL_ID__MAX] = {
 	ID(TOPDOWN_SLOTS_RETIRED, topdown-slots-retired),
 	ID(TOPDOWN_FETCH_BUBBLES, topdown-fetch-bubbles),
 	ID(TOPDOWN_RECOVERY_BUBBLES, topdown-recovery-bubbles),
+	ID(TOPDOWN_RETIRING, topdown-retiring),
+	ID(TOPDOWN_BAD_SPEC, topdown-bad-spec),
+	ID(TOPDOWN_FE_BOUND, topdown-fe-bound),
+	ID(TOPDOWN_BE_BOUND, topdown-be-bound),
 	ID(SMI_NUM, msr/smi/),
 	ID(APERF, msr/aperf/),
 };
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index f75ae679eb28..9705421d99a4 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -28,6 +28,10 @@ enum perf_stat_evsel_id {
 	PERF_STAT_EVSEL_ID__TOPDOWN_SLOTS_RETIRED,
 	PERF_STAT_EVSEL_ID__TOPDOWN_FETCH_BUBBLES,
 	PERF_STAT_EVSEL_ID__TOPDOWN_RECOVERY_BUBBLES,
+	PERF_STAT_EVSEL_ID__TOPDOWN_RETIRING,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BAD_SPEC,
+	PERF_STAT_EVSEL_ID__TOPDOWN_FE_BOUND,
+	PERF_STAT_EVSEL_ID__TOPDOWN_BE_BOUND,
 	PERF_STAT_EVSEL_ID__SMI_NUM,
 	PERF_STAT_EVSEL_ID__APERF,
 	PERF_STAT_EVSEL_ID__MAX,
@@ -82,6 +86,10 @@ enum stat_type {
 	STAT_TOPDOWN_SLOTS_RETIRED,
 	STAT_TOPDOWN_FETCH_BUBBLES,
 	STAT_TOPDOWN_RECOVERY_BUBBLES,
+	STAT_TOPDOWN_RETIRING,
+	STAT_TOPDOWN_BAD_SPEC,
+	STAT_TOPDOWN_FE_BOUND,
+	STAT_TOPDOWN_BE_BOUND,
 	STAT_SMI_NUM,
 	STAT_APERF,
 	STAT_MAX
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 13/14] perf, tools, stat: Check Topdown Metric group
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (11 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 12/14] perf, tools, stat: Support new per thread " kan.liang
@ 2020-07-23 17:11 ` kan.liang
  2020-07-23 17:11 ` [PATCH V7 14/14] perf, tools: Add documentation for topdown metrics kan.liang
  13 siblings, 0 replies; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The slots event is required in a Topdown Metric group.

Add a check to examine the Topdown Metric group. Error out if there is
no slots event detected.

Only check the group on platforms which use topdown_metric_attrs,
e.g. Ice Lake.
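
For example (using the metric events exported by the kernel side of
this series), a group such as

  perf stat -e '{slots,topdown-retiring,topdown-be-bound}' -a sleep 1

passes the check, while

  perf stat -e '{topdown-retiring,topdown-be-bound}' -a sleep 1

is rejected with "Topdown group must include slots event".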

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-stat.c | 72 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 078ea1485317..9886aef78440 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1441,6 +1441,72 @@ static int topdown_filter_events(const char **attr, char **str, bool use_group)
 	return 0;
 }
 
+/* Event encoding for Topdown Metric events */
+#define TOPDOWN_SLOTS		0x0400
+#define TOPDOWN_RETIRE		0x8000
+#define TOPDOWN_BAD_SPEC	0x8100
+#define TOPDOWN_FE_BOUND	0x8200
+#define TOPDOWN_BE_BOUND	0x8300
+
+static bool is_topdown_metric_event(struct evsel *counter)
+{
+	if (!counter->pmu_name)
+		return false;
+
+	if (strcmp(counter->pmu_name, "cpu"))
+		return false;
+
+	if ((counter->core.attr.config == TOPDOWN_RETIRE) ||
+	    (counter->core.attr.config == TOPDOWN_BAD_SPEC) ||
+	    (counter->core.attr.config == TOPDOWN_FE_BOUND) ||
+	    (counter->core.attr.config == TOPDOWN_BE_BOUND))
+		return true;
+
+	return false;
+}
+
+static bool is_topdown_slots_event(struct evsel *counter)
+{
+	if (!counter->pmu_name)
+		return false;
+
+	if (strcmp(counter->pmu_name, "cpu"))
+		return false;
+
+	if (counter->core.attr.config == TOPDOWN_SLOTS)
+		return true;
+
+	return false;
+}
+
+static bool topdown_check_group_member(void)
+{
+	struct evsel *counter, *leader, *member;
+	bool has_slots;
+
+	if (!pmu_have_event("cpu", topdown_metric_attrs[0]))
+		return true;
+
+	evlist__for_each_entry(evsel_list, counter) {
+		if (!is_topdown_metric_event(counter))
+			continue;
+
+		leader = counter->leader;
+		has_slots = false;
+
+		for_each_group_evsel(member, leader) {
+			if (is_topdown_slots_event(member))
+				has_slots = true;
+			counter = member;
+		}
+
+		if (!has_slots)
+			return false;
+	}
+
+	return true;
+}
+
 __weak bool arch_topdown_check_group(bool *warn)
 {
 	*warn = false;
@@ -2024,6 +2090,12 @@ int cmd_stat(int argc, const char **argv)
 					(const char **) stat_usage,
 					PARSE_OPT_STOP_AT_NON_OPTION);
 	perf_stat__collect_metric_expr(evsel_list);
+
+	if (!topdown_check_group_member()) {
+		fprintf(stderr, "Topdown group must include slots event\n");
+		goto out;
+	}
+
 	perf_stat__init_shadow_stats();
 
 	if (stat_config.csv_sep) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH V7 14/14] perf, tools: Add documentation for topdown metrics
  2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
                   ` (12 preceding siblings ...)
  2020-07-23 17:11 ` [PATCH V7 13/14] perf, tools, stat: Check Topdown Metric group kan.liang
@ 2020-07-23 17:11 ` kan.liang
  13 siblings, 0 replies; 48+ messages in thread
From: kan.liang @ 2020-07-23 17:11 UTC (permalink / raw)
  To: peterz, acme, mingo, linux-kernel
  Cc: jolsa, eranian, alexander.shishkin, ak, like.xu

From: Andi Kleen <ak@linux.intel.com>

Add some documentation on how to use the topdown metrics in ring 3.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Documentation/topdown.txt | 235 +++++++++++++++++++++++++++
 1 file changed, 235 insertions(+)
 create mode 100644 tools/perf/Documentation/topdown.txt

diff --git a/tools/perf/Documentation/topdown.txt b/tools/perf/Documentation/topdown.txt
new file mode 100644
index 000000000000..e724d2af3b8d
--- /dev/null
+++ b/tools/perf/Documentation/topdown.txt
@@ -0,0 +1,235 @@
+Using TopDown metrics in user space
+-----------------------------------
+
+Intel CPUs (since Sandy Bridge and Silvermont) support a TopDown
+methodology to break down CPU pipeline execution into 4 bottlenecks:
+frontend bound, backend bound, bad speculation, retiring.
+
+For more details on Topdown see [1][5]
+
+Traditionally this was implemented by events in generic counters
+and specific formulas to compute the bottlenecks.
+
+perf stat --topdown implements this.
+
+Full Top Down includes more levels that can break down the
+bottlenecks further. This is not directly implemented in perf,
+but available in other tools that can run on top of perf,
+such as toplev[2] or vtune[3]
+
+New Topdown features in Ice Lake
+===============================
+
+With Ice Lake CPUs the TopDown metrics are directly available as
+fixed counters and do not require generic counters. This allows
+TopDown to always be collected in addition to other events.
+
+% perf stat -a --topdown -I1000
+#           time             counts unit events
+     1.000854735     20,097,158,100      slots
+     1.000854735         79,327,616      topdown-retiring          #      0.4% retiring
+     1.000854735        157,932,715      topdown-bad-spec          #      0.8% bad speculation
+     1.000854735         81,610,855      topdown-fe-bound          #      0.4% frontend bound
+     1.000854735     19,778,286,903      topdown-be-bound          #     98.4% backend bound
+     2.003623823     20,010,908,365      slots
+     2.003623823         79,905,340      topdown-retiring          #      0.4% retiring
+     2.003623823        158,405,024      topdown-bad-spec          #      0.8% bad speculation
+     2.003623823         87,980,097      topdown-fe-bound          #      0.4% frontend bound
+     2.003623823     19,684,617,888      topdown-be-bound          #     98.4% backend bound
+     3.005828889     20,062,101,220      slots
+     3.005828889         80,077,032      topdown-retiring          #      0.4% retiring
+     3.005828889        158,682,921      topdown-bad-spec          #      0.8% bad speculation
+     3.005828889         86,579,604      topdown-fe-bound          #      0.4% frontend bound
+     3.005828889     19,736,761,649      topdown-be-bound          #     98.4% backend bound
+...
+
+This also enables measuring TopDown per thread/process instead
+of only per core.
+
+Using TopDown through RDPMC in applications on Ice Lake
+======================================================
+
+For more fine-grained measurements it can be useful to
+access the new counters directly from user space. This is more
+complicated, but drastically lowers overhead.
+
+On Ice Lake, there is a new fixed counter 3: SLOTS, which reports
+"pipeline SLOTS" (cycles multiplied by core issue width) and a
+metric register that reports slots ratios for the different bottleneck
+categories.
+
+The metrics counter is CPU model specific and is not available
+on older CPUs.
+
+Example code
+============
+
+Library functions implementing the functionality described below
+are also available in libjevents [4].
+
+The application opens a perf_event file descriptor
+and sets up fixed counter 3 (SLOTS) to start and
+allow user programs to read the performance counters.
+
+Fixed counter 3 is mapped to a pseudo event event=0x00, umask=0x04,
+so the perf_event_attr structure should be initialized with
+{ .config = 0x0400, .type = PERF_TYPE_RAW }
+
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+/* Provide own perf_event_open stub because glibc doesn't */
+__attribute__((weak))
+int perf_event_open(struct perf_event_attr *attr, pid_t pid,
+		    int cpu, int group_fd, unsigned long flags)
+{
+	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
+}
+
+/* open slots counter file descriptor for current task */
+struct perf_event_attr slots = {
+	.type = PERF_TYPE_RAW,
+	.size = sizeof(struct perf_event_attr),
+	.config = 0x400,
+	.exclude_kernel = 1,
+};
+
+int fd = perf_event_open(&slots, 0, -1, -1, 0);
+if (fd < 0)
+	... error ...
+
+The RDPMC instruction (or _rdpmc compiler intrinsic) can now be used
+to read slots and the topdown metrics at different points of the program:
+
+#include <stdint.h>
+#include <x86intrin.h>
+
+#define RDPMC_FIXED	(1 << 30)	/* return fixed counters */
+#define RDPMC_METRIC	(1 << 29)	/* return metric counters */
+
+#define FIXED_COUNTER_SLOTS		3
+#define METRIC_COUNTER_TOPDOWN_L1	0
+
+static inline uint64_t read_slots(void)
+{
+	return _rdpmc(RDPMC_FIXED | FIXED_COUNTER_SLOTS);
+}
+
+static inline uint64_t read_metrics(void)
+{
+	return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1);
+}
+
+Then the program can be instrumented to read these metrics at different
+points.
+
+It's not a good idea to do this with too short code regions,
+as the parallelism and overlap in the CPU program execution will
+cause too much measurement inaccuracy. For example instrumenting
+individual basic blocks is definitely too fine grained.
+
+Decoding metrics values
+=======================
+
+The value reported by read_metrics() contains four 8 bit fields
+that represent a scaled ratio of the Level 1 bottlenecks.
+All four fields add up to 0xff (= 100%).
+
+The binary ratios in the metric value can be converted to float ratios:
+
+#define GET_METRIC(m, i) (((m) >> (i*8)) & 0xff)
+
+#define TOPDOWN_RETIRING(val)	((float)GET_METRIC(val, 0) / 0xff)
+#define TOPDOWN_BAD_SPEC(val)	((float)GET_METRIC(val, 1) / 0xff)
+#define TOPDOWN_FE_BOUND(val)	((float)GET_METRIC(val, 2) / 0xff)
+#define TOPDOWN_BE_BOUND(val)	((float)GET_METRIC(val, 3) / 0xff)
+
+and then converted to percent for printing.
+
+The ratios in the metric accumulate for the time when the counter
+is enabled. For measuring programs it is often useful to measure
+specific sections. For this it is necessary to take deltas of the metrics.
+
+This can be done by scaling the metrics with the slots counter
+read at the same time.
+
+Then it's possible to take deltas of these slots counts
+measured at different points, and determine the metrics
+for that time period.
+
+	slots_a = read_slots();
+	metric_a = read_metrics();
+
+	... larger code region ...
+
+	slots_b = read_slots()
+	metric_b = read_metrics()
+
+	# compute scaled metrics for measurement a
+	retiring_slots_a = GET_METRIC(metric_a, 0) * slots_a
+	bad_spec_slots_a = GET_METRIC(metric_a, 1) * slots_a
+	fe_bound_slots_a = GET_METRIC(metric_a, 2) * slots_a
+	be_bound_slots_a = GET_METRIC(metric_a, 3) * slots_a
+
+	# compute delta scaled metrics between b and a
+	retiring_slots = GET_METRIC(metric_b, 0) * slots_b - retiring_slots_a
+	bad_spec_slots = GET_METRIC(metric_b, 1) * slots_b - bad_spec_slots_a
+	fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a
+	be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a
+
+Later the individual ratios for the measurement period can be recreated
+from these counts.
+
+	slots_delta = slots_b - slots_a
+	retiring_ratio = (float)retiring_slots / slots_delta
+	bad_spec_ratio = (float)bad_spec_slots / slots_delta
+	fe_bound_ratio = (float)fe_bound_slots / slots_delta
+	be_bound_ratio = (float)be_bound_slots / slots_delta
+
+	printf("Retiring %.2f%% Bad Speculation %.2f%% FE Bound %.2f%% BE Bound %.2f%%\n",
+		retiring_ratio * 100.,
+		bad_spec_ratio * 100.,
+		fe_bound_ratio * 100.,
+		be_bound_ratio * 100.);
+
+Resetting metrics counters
+==========================
+
+Since the individual metrics are only 8bit they lose precision for
+short regions over time, because each fraction bit comes to cover an
+ever larger number of cycles. So the counters need to be reset regularly.
+
+When using the kernel perf API the kernel resets on every read.
+So as long as the reading is at reasonable intervals (every few
+seconds) the precision is good.
+
+When using perf stat it is recommended to always use the -I option,
+with an interval no longer than a few seconds
+
+	perf stat -I 1000 --topdown ...
+
+For user programs using RDPMC directly the counter can
+be reset explicitly using ioctl:
+
+	ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);
+
+This "opens" a new measurement period.
+
+A program using RDPMC for TopDown should schedule such a reset
+regularly, as in every few seconds.
+
+Limits on Ice Lake
+==================
+
+All the TopDown events must be in a group with the SLOTS event.
+
+There is no sampling support for TopDown events.
+Sampling read of SLOTS and the TopDown events is also forbidden,
+for example: perf record -e '{slots, topdown-retiring}:S'
+
+[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
+[2] https://github.com/andikleen/pmu-tools/wiki/toplev-manual
+[3] https://software.intel.com/en-us/intel-vtune-amplifier-xe
+[4] https://github.com/andikleen/pmu-tools/tree/master/jevents
+[5] https://sites.google.com/site/analysismethods/yasin-pubs
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 12/14] perf, tools, stat: Support new per thread TopDown metrics
  2020-07-23 17:11 ` [PATCH V7 12/14] perf, tools, stat: Support new per thread " kan.liang
@ 2020-07-24  3:29   ` Andi Kleen
  0 siblings, 0 replies; 48+ messages in thread
From: Andi Kleen @ 2020-07-24  3:29 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, acme, mingo, linux-kernel, jolsa, eranian,
	alexander.shishkin, like.xu

> +		if (topdown_metric_attrs[0] && str) {
> +			if (!stat_config.interval) {
> +				fprintf(stat_config.output,
> +					"Topdown accuracy may decrease when measuring long periods.\n"
> +					"Please print the result regularly, e.g. -I1000\n");

Can you disable this warning when --metrics-only is used? In this case it doesn't matter
because the error is proportional to the percentage accuracy and should be invisible.

You can only see a difference when looking at the expanded counts.

-andi

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-23 17:11 ` [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability kan.liang
@ 2020-07-24 10:55   ` peterz
  2020-07-24 11:46     ` peterz
  2020-08-19  8:52   ` [tip: perf/core] perf/core: Add a new PERF_EV_CAP_SIBLING " tip-bot2 for Kan Liang
  1 sibling, 1 reply; 48+ messages in thread
From: peterz @ 2020-07-24 10:55 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu

On Thu, Jul 23, 2020 at 10:11:10AM -0700, kan.liang@linux.intel.com wrote:

> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 3b22db08b6fb..93631e5389bf 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -576,9 +576,14 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
>   * PERF_EV_CAP_SOFTWARE: Is a software event.
>   * PERF_EV_CAP_READ_ACTIVE_PKG: A CPU event (or cgroup event) that can be read
>   * from any CPU in the package where it is active.
> + * PERF_EV_CAP_COEXIST: An event with this flag must coexist with other sibling
> + * events, which have the same flag. If any event with the flag is detached
> + * from the group, split the group into singleton events, and move the events
> + * with the flag to the unrecoverable ERROR state.
>   */
>  #define PERF_EV_CAP_SOFTWARE		BIT(0)
>  #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
> +#define PERF_EV_CAP_COEXIST		BIT(2)
>  
>  #define SWEVENT_HLIST_BITS		8
>  #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 7c436d705fbd..e35d549a356d 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2133,10 +2133,28 @@ static inline struct list_head *get_event_list(struct perf_event *event)
>  	return event->attr.pinned ? &ctx->pinned_active : &ctx->flexible_active;
>  }
>  
> +/*
> + * If the event has PERF_EV_CAP_COEXIST capability,
> + * schedule it out and move it into the ERROR state.
> + */
> +static inline void perf_remove_coexist_events(struct perf_event *event)
> +{
> +	struct perf_event_context *ctx = event->ctx;
> +	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
> +
> +	if (!(event->event_caps & PERF_EV_CAP_COEXIST))
> +		return;
> +
> +	event_sched_out(event, cpuctx, ctx);
> +	perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
> +}

Ah, so the problem here is that ERROR is actually recoverable using
IOC_ENABLE. We don't want that either. Let me try and figure out of EXIT
would work.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 10:55   ` peterz
@ 2020-07-24 11:46     ` peterz
  2020-07-24 13:43       ` Liang, Kan
  2020-07-24 14:39       ` Andi Kleen
  0 siblings, 2 replies; 48+ messages in thread
From: peterz @ 2020-07-24 11:46 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu

On Fri, Jul 24, 2020 at 12:55:43PM +0200, peterz@infradead.org wrote:
> > +	event_sched_out(event, cpuctx, ctx);
> > +	perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
> > +}
> 
> Ah, so the problem here is that ERROR is actually recoverable using
> IOC_ENABLE. We don't want that either. Let me try and figure out of EXIT
> would work.

EXIT is difficult too.. Also, that COEXIST thing hurt my brian, can't we
do a simpler SIBLING thing that simply requires the event to be a group
sibling?

The consequence is that SLOTS must be the leader, is that really a
problem? You keep providing the {cycles, slots, metric-things} example,
but afaict {slots, metric-thing, cycles} should work just fine too.
PERF_SAMPLE_READ with PERF_FORMAT_GROUP doesn't need to the the leader.

The thing is, with COEXIST, if you have:

	{cycles, slots, metric1, metric2}

and you say that COEXIST events need to co-exist with at least one other
COEXIST event you end up with a problem when: close(fd_slots) happens,
because in the above case, there's two more COEXIST events, so the
requirement holds, even through this is very much not what we want.

You 'fixed' this by saying closing any COEXIST event will tear the group
up, but that's 'weird' and unexpected if you were to do:
close(fd_metric2).


So something like this.

---
Subject: perf/core: Add a new PERF_EV_CAP_SIBLING event capability

Current perf assumes that events in a group are independent. Closing an
event doesn't impact the value of the other events in the same group.
If the closed event is a member, after the event closure, other events
are still running like a group. If the closed event is a leader, other
events are running as singleton events.

Add PERF_EV_CAP_SIBLING to allow events to indicate they require being
part of a group, and when the leader dies they cannot exist
independently.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/perf_event.h |    4 ++++
 kernel/events/core.c       |   38 +++++++++++++++++++++++++++++++++-----
 2 files changed, 37 insertions(+), 5 deletions(-)

--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -576,9 +576,13 @@ typedef void (*perf_overflow_handler_t)(
  * PERF_EV_CAP_SOFTWARE: Is a software event.
  * PERF_EV_CAP_READ_ACTIVE_PKG: A CPU event (or cgroup event) that can be read
  * from any CPU in the package where it is active.
+ * PERF_EV_CAP_SIBLING: An event with this flag must be a group sibling and
+ * cannot be a group leader. If an event with this flag is detached from the
+ * group it is scheduled out and moved into an unrecoverable ERROR state.
  */
 #define PERF_EV_CAP_SOFTWARE		BIT(0)
 #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
+#define PERF_EV_CAP_SIBLING		BIT(2)
 
 #define SWEVENT_HLIST_BITS		8
 #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2133,8 +2133,24 @@ static inline struct list_head *get_even
 	return event->attr.pinned ? &ctx->pinned_active : &ctx->flexible_active;
 }
 
+/*
+ * Events that have PERF_EV_CAP_SIBLING require being part of a group and
+ * cannot exist on their own, schedule them out and move them into the ERROR
+ * state. Also see _perf_event_enable(), it will not be able to recover
+ * this ERROR state.
+ */
+static inline void perf_remove_sibling_event(struct perf_event *event)
+{
+	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+
+	event_sched_out(event, cpuctx, ctx);
+	perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
+}
+
 static void perf_group_detach(struct perf_event *event)
 {
+	struct perf_event *leader = event->group_leader;
 	struct perf_event *sibling, *tmp;
 	struct perf_event_context *ctx = event->ctx;
 
@@ -2153,7 +2169,7 @@ static void perf_group_detach(struct per
 	/*
 	 * If this is a sibling, remove it from its group.
 	 */
-	if (event->group_leader != event) {
+	if (leader != event) {
 		list_del_init(&event->sibling_list);
 		event->group_leader->nr_siblings--;
 		goto out;
@@ -2166,6 +2182,9 @@ static void perf_group_detach(struct per
 	 */
 	list_for_each_entry_safe(sibling, tmp, &event->sibling_list, sibling_list) {
 
+		if (sibling->event_caps & PERF_EV_CAP_SIBLING)
+			perf_remove_sibling_event(sibling);
+
 		sibling->group_leader = sibling;
 		list_del_init(&sibling->sibling_list);
 
@@ -2183,10 +2202,10 @@ static void perf_group_detach(struct per
 	}
 
 out:
-	perf_event__header_size(event->group_leader);
-
-	for_each_sibling_event(tmp, event->group_leader)
+	for_each_sibling_event(tmp, leader)
 		perf_event__header_size(tmp);
+
+	perf_event__header_size(leader);
 }
 
 static bool is_orphaned_event(struct perf_event *event)
@@ -2979,6 +2998,7 @@ static void _perf_event_enable(struct pe
 	raw_spin_lock_irq(&ctx->lock);
 	if (event->state >= PERF_EVENT_STATE_INACTIVE ||
 	    event->state <  PERF_EVENT_STATE_ERROR) {
+out:
 		raw_spin_unlock_irq(&ctx->lock);
 		return;
 	}
@@ -2990,8 +3010,16 @@ static void _perf_event_enable(struct pe
 	 * has gone back into error state, as distinct from the task having
 	 * been scheduled away before the cross-call arrived.
 	 */
-	if (event->state == PERF_EVENT_STATE_ERROR)
+	if (event->state == PERF_EVENT_STATE_ERROR) {
+		/*
+		 * Detached SIBLING events cannot leave ERROR state.
+		 */
+		if (event->event_caps & PERF_EV_CAP_SIBLING &&
+		    event->group_leader == event)
+			goto out;
+
 		event->state = PERF_EVENT_STATE_OFF;
+	}
 	raw_spin_unlock_irq(&ctx->lock);
 
 	event_function_call(event, __perf_event_enable, NULL);


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-23 17:11 ` [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics kan.liang
@ 2020-07-24 13:19   ` peterz
  2020-07-24 15:27     ` peterz
  2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
  1 sibling, 1 reply; 48+ messages in thread
From: peterz @ 2020-07-24 13:19 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu

On Thu, Jul 23, 2020 at 10:11:11AM -0700, kan.liang@linux.intel.com wrote:
> @@ -3375,6 +3428,72 @@ static int intel_pmu_hw_config(struct perf_event *event)
>  	if (event->attr.type != PERF_TYPE_RAW)
>  		return 0;
>  
> +	/*
> +	 * Config Topdown slots and metric events
> +	 *
> +	 * The slots event on Fixed Counter 3 can support sampling,
> +	 * which will be handled normally in x86_perf_event_update().
> +	 *
> +	 * The metric events don't support sampling.
> +	 *
> +	 * For counting, topdown slots and metric events will be
> +	 * handled specially for event update.
> +	 * A flag PERF_X86_EVENT_TOPDOWN is applied for the case.
> +	 */
> +	if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
> +		if (is_metric_event(event)) {
> +			struct perf_event *leader = event->group_leader;
> +			struct perf_event *sibling;
> +
> +			/* The metric events don't support sampling. */
> +			if (is_sampling_event(event))
> +				return -EINVAL;
> +
> +			/* The metric events cannot be a group leader. */
> +			if (leader == event)
> +				return -EINVAL;
> +
> +			/*
> +			 * The slots event cannot be the leader of a topdown
> +			 * sample-read group, e.g., {slots, topdown-retiring}:S
> +			 */
> +			if (is_slots_event(leader) && is_sampling_event(leader))
> +				return -EINVAL;

This has nothing to do with sample-read; SLOTS cannot be sampling when
coupled with the METRIC stuff because hardware is daft.

And you can have SAMPLE_READ on non-leader events just fine.

> +
> +			/*
> +			 * The slots event must be before the metric events,
> +			 * because we only update the values of a topdown
> +			 * group once with the slots event.
> +			 */
> +			if (!is_slots_event(leader)) {
> +				for_each_sibling_event(sibling, leader) {
> +					if (is_slots_event(sibling))
> +						break;
> +					if (is_metric_event(sibling))
> +						return -EINVAL;
> +				}
> +			}

Per the SIBLING patch this then wants to be:

			if (!is_slots_event(leader))
				return -EINVAL;

			event->event_caps |= PERF_EV_CAP_SIBLING.
			/*
			 * Only once we have a METRICs sibling to we
			 * need TopDown magic.
			 */
			leader->hw.flags |= PERF_X86_EVENT_TOPDOWN;
> +		}
> +
> +		if (!is_sampling_event(event)) {
> +			if (event->attr.config1 != 0)
> +				return -EINVAL;

How does this depend on sampling?

> +			/*
> +			 * The TopDown metrics events and slots event don't
> +			 * support any filters.
> +			 */
> +			if (event->attr.config & X86_ALL_EVENT_FLAGS)
> +				return -EINVAL;

That seems independent of sampling too. Even a sampling SLOTS shouldn't
be having any of those afaict.

> +
> +			event->hw.flags |= PERF_X86_EVENT_TOPDOWN;

This is confusing too, a !sampling SLOTS event without METRIC siblings
shouldn't have this set, right? So arguably, this should be like above.

> +
> +			event->event_caps |= PERF_EV_CAP_COEXIST;
> +
> +			if (is_metric_event(event))
> +				event->hw.flags &= ~PERF_X86_EVENT_RDPMC_ALLOWED;

This too seems like something that should be in the is_metric_event()
branch above.

> +		}
> +	}
> +
>  	if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
>  		return 0;
>  

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 11:46     ` peterz
@ 2020-07-24 13:43       ` Liang, Kan
  2020-07-24 13:54         ` Peter Zijlstra
  2020-07-24 14:39       ` Andi Kleen
  1 sibling, 1 reply; 48+ messages in thread
From: Liang, Kan @ 2020-07-24 13:43 UTC (permalink / raw)
  To: peterz
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu



On 7/24/2020 7:46 AM, peterz@infradead.org wrote:
> On Fri, Jul 24, 2020 at 12:55:43PM +0200, peterz@infradead.org wrote:
>>> +	event_sched_out(event, cpuctx, ctx);
>>> +	perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
>>> +}
>>
>> Ah, so the problem here is that ERROR is actually recoverable using
>> IOC_ENABLE. We don't want that either. Let me try and figure out of EXIT
>> would work.
> 
> EXIT is difficult too.. Also, that COEXIST thing hurt my brian, can't we
> do a simpler SIBLING thing that simply requires the event to be a group
> sibling?
> 
> The consequence is that SLOTS must be the leader, is that really a
> problem? You keep providing the {cycles, slots, metric-things} example,
> but afaict {slots, metric-thing, cycles} should work just fine too.
> PERF_SAMPLE_READ with PERF_FORMAT_GROUP doesn't need to the the leader.

I'm not sure I get your point.
For the PERF_SAMPLE_READ with PERF_FORMAT_GROUP case, other events can 
be the leader, e.g., {cycles, slots, metric-things}:S.
In this case, the SLOTS event is not a leader. I don't think we can 
assume that the SLOTS event must be the leader.

I think the case should be a useful case. Users may want to read the 
Topdown values for each sample.

Thanks,
Kan

> 
> The thing is, with COEXIST, if you have:
> 
> 	{cycles, slots, metric1, metric2}
> 
> and you say that COEXIST events need to co-exist with at least one other
> COEXIST event you end up with a problem when: close(fd_slots) happens,
> because in the above case, there's two more COEXIST events, so the
> requirement holds, even through this is very much not what we want.
> 
> You 'fixed' this by saying closing any COEXIST event will tear the group
> up, but that's 'weird' and unexpected if you were to do:
> close(fd_metric2).
> >
> So something like this.
> 
> ---
> Subject: perf/core: Add a new PERF_EV_CAP_SIBLING event capability
> 
> Current perf assumes that events in a group are independent. Closing an
> event doesn't impact the value of the other events in the same group.
> If the closed event is a member, after the event closure, other events
> are still running like a group. If the closed event is a leader, other
> events are running as singleton events.
> 
> Add PERF_EV_CAP_SIBLING to allow events to indicate they require being
> part of a group, and when the leader dies they cannot exist
> independently.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>   include/linux/perf_event.h |    4 ++++
>   kernel/events/core.c       |   38 +++++++++++++++++++++++++++++++++-----
>   2 files changed, 37 insertions(+), 5 deletions(-)
> 
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -576,9 +576,13 @@ typedef void (*perf_overflow_handler_t)(
>    * PERF_EV_CAP_SOFTWARE: Is a software event.
>    * PERF_EV_CAP_READ_ACTIVE_PKG: A CPU event (or cgroup event) that can be read
>    * from any CPU in the package where it is active.
> + * PERF_EV_CAP_SIBLING: An event with this flag must be a group sibling and
> + * cannot be a group leader. If an event with this flag is detached from the
> + * group it is scheduled out and moved into an unrecoverable ERROR state.
>    */
>   #define PERF_EV_CAP_SOFTWARE		BIT(0)
>   #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
> +#define PERF_EV_CAP_SIBLING		BIT(2)
>   
>   #define SWEVENT_HLIST_BITS		8
>   #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2133,8 +2133,24 @@ static inline struct list_head *get_even
>   	return event->attr.pinned ? &ctx->pinned_active : &ctx->flexible_active;
>   }
>   
> +/*
> + * Events that have PERF_EV_CAP_SIBLING require being part of a group and
> + * cannot exist on their own, schedule them out and move them into the ERROR
> + * state. Also see _perf_event_enable(), it will not be able to recover
> + * this ERROR state.
> + */
> +static inline void perf_remove_sibling_event(struct perf_event *event)
> +{
> +	struct perf_event_context *ctx = event->ctx;
> +	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
> +
> +	event_sched_out(event, cpuctx, ctx);
> +	perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
> +}
> +
>   static void perf_group_detach(struct perf_event *event)
>   {
> +	struct perf_event *leader = event->group_leader;
>   	struct perf_event *sibling, *tmp;
>   	struct perf_event_context *ctx = event->ctx;
>   
> @@ -2153,7 +2169,7 @@ static void perf_group_detach(struct per
>   	/*
>   	 * If this is a sibling, remove it from its group.
>   	 */
> -	if (event->group_leader != event) {
> +	if (leader != event) {
>   		list_del_init(&event->sibling_list);
>   		event->group_leader->nr_siblings--;
>   		goto out;
> @@ -2166,6 +2182,9 @@ static void perf_group_detach(struct per
>   	 */
>   	list_for_each_entry_safe(sibling, tmp, &event->sibling_list, sibling_list) {
>   
> +		if (sibling->event_caps & PERF_EV_CAP_SIBLING)
> +			perf_remove_sibling_event(sibling);
> +
>   		sibling->group_leader = sibling;
>   		list_del_init(&sibling->sibling_list);
>   
> @@ -2183,10 +2202,10 @@ static void perf_group_detach(struct per
>   	}
>   
>   out:
> -	perf_event__header_size(event->group_leader);
> -
> -	for_each_sibling_event(tmp, event->group_leader)
> +	for_each_sibling_event(tmp, leader)
>   		perf_event__header_size(tmp);
> +
> +	perf_event__header_size(leader);
>   }
>   
>   static bool is_orphaned_event(struct perf_event *event)
> @@ -2979,6 +2998,7 @@ static void _perf_event_enable(struct pe
>   	raw_spin_lock_irq(&ctx->lock);
>   	if (event->state >= PERF_EVENT_STATE_INACTIVE ||
>   	    event->state <  PERF_EVENT_STATE_ERROR) {
> +out:
>   		raw_spin_unlock_irq(&ctx->lock);
>   		return;
>   	}
> @@ -2990,8 +3010,16 @@ static void _perf_event_enable(struct pe
>   	 * has gone back into error state, as distinct from the task having
>   	 * been scheduled away before the cross-call arrived.
>   	 */
> -	if (event->state == PERF_EVENT_STATE_ERROR)
> +	if (event->state == PERF_EVENT_STATE_ERROR) {
> +		/*
> +		 * Detached SIBLING events cannot leave ERROR state.
> +		 */
> +		if (event->event_caps & PERF_EV_CAP_SIBLING &&
> +		    event->group_leader == event)
> +			goto out;
> +
>   		event->state = PERF_EVENT_STATE_OFF;
> +	}
>   	raw_spin_unlock_irq(&ctx->lock);
>   
>   	event_function_call(event, __perf_event_enable, NULL);
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 13:43       ` Liang, Kan
@ 2020-07-24 13:54         ` Peter Zijlstra
  2020-07-24 14:19           ` Liang, Kan
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2020-07-24 13:54 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu

On Fri, Jul 24, 2020 at 09:43:44AM -0400, Liang, Kan wrote:
> 
> 
> On 7/24/2020 7:46 AM, peterz@infradead.org wrote:
> > On Fri, Jul 24, 2020 at 12:55:43PM +0200, peterz@infradead.org wrote:
> > > > +	event_sched_out(event, cpuctx, ctx);
> > > > +	perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
> > > > +}
> > > 
> > > Ah, so the problem here is that ERROR is actually recoverable using
> > > IOC_ENABLE. We don't want that either. Let me try and figure out of EXIT
> > > would work.
> > 
> > EXIT is difficult too.. Also, that COEXIST thing hurt my brian, can't we
> > do a simpler SIBLING thing that simply requires the event to be a group
> > sibling?
> > 
> > The consequence is that SLOTS must be the leader, is that really a
> > problem? You keep providing the {cycles, slots, metric-things} example,
> > but afaict {slots, metric-thing, cycles} should work just fine too.
> > PERF_SAMPLE_READ with PERF_FORMAT_GROUP doesn't need to the the leader.
> 
> I'm not sure I get your point.
> For the PERF_SAMPLE_READ with PERF_FORMAT_GROUP case, other events can be
> the leader, e.g., {cycles, slots, metric-things}:S.
> In this case, the SLOTS event is not a leader. I don't think we can assume
> that the SLOTS event must be the leader.

You can have a sibling event with SAMPLE_READ and FORMAT_GROUP just
fine. The sampling event doesn't need to be the leader.
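
A minimal sketch of such a group (illustrative only, error handling mostly
omitted; the raw 0x0400 slots encoding follows the Icelake event list shown
later in this thread):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdio.h>

static int perf_open(struct perf_event_attr *attr, int group_fd)
{
	return syscall(__NR_perf_event_open, attr, 0, -1, group_fd, 0);
}

int main(void)
{
	struct perf_event_attr slots = { 0 }, cycles = { 0 };
	int leader, sampler;

	slots.type = PERF_TYPE_RAW;
	slots.size = sizeof(slots);
	slots.config = 0x0400;			/* slots: counting-only group leader */
	slots.disabled = 1;			/* enable the group later via ioctl() */

	cycles.type = PERF_TYPE_HARDWARE;
	cycles.size = sizeof(cycles);
	cycles.config = PERF_COUNT_HW_CPU_CYCLES;
	cycles.sample_period = 50000;
	cycles.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_READ;
	cycles.read_format = PERF_FORMAT_GROUP;	/* each sample reads the whole group */

	leader  = perf_open(&slots, -1);
	sampler = perf_open(&cycles, leader);	/* the sampling event is a sibling */
	if (leader < 0 || sampler < 0)
		perror("perf_event_open");
	return 0;
}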


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 13:54         ` Peter Zijlstra
@ 2020-07-24 14:19           ` Liang, Kan
  2020-07-24 14:32             ` Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: Liang, Kan @ 2020-07-24 14:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu



On 7/24/2020 9:54 AM, Peter Zijlstra wrote:
> On Fri, Jul 24, 2020 at 09:43:44AM -0400, Liang, Kan wrote:
>>
>>
>> On 7/24/2020 7:46 AM, peterz@infradead.org wrote:
>>> On Fri, Jul 24, 2020 at 12:55:43PM +0200, peterz@infradead.org wrote:
>>>>> +	event_sched_out(event, cpuctx, ctx);
>>>>> +	perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
>>>>> +}
>>>>
>>>> Ah, so the problem here is that ERROR is actually recoverable using
>>>> IOC_ENABLE. We don't want that either. Let me try and figure out if EXIT
>>>> would work.
>>>
>>> EXIT is difficult too.. Also, that COEXIST thing hurt my brain, can't we
>>> do a simpler SIBLING thing that simply requires the event to be a group
>>> sibling?
>>>
>>> The consequence is that SLOTS must be the leader, is that really a
>>> problem? You keep providing the {cycles, slots, metric-things} example,
>>> but afaict {slots, metric-thing, cycles} should work just fine too.
>>> PERF_SAMPLE_READ with PERF_FORMAT_GROUP doesn't need to be the leader.
>>
>> I'm not sure I get your point.
>> For the PERF_SAMPLE_READ with PERF_FORMAT_GROUP case, other events can be
>> the leader, e.g., {cycles, slots, metric-things}:S.
>> In this case, the SLOTS event is not a leader. I don't think we can assume
>> that the SLOTS event must be the leader.
> 
> You can have a sibling event with SAMPLE_READ and FORMAT_GROUP just
> fine. The sampling event doesn't need to be the leader.
> 

There will be a problem for the current perf tool, which assumes that 
the leader event is the sampling event.

I will check how we can handle it specially in the perf tool.

Thanks,
Kan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 14:19           ` Liang, Kan
@ 2020-07-24 14:32             ` Peter Zijlstra
  2020-07-24 14:46               ` Andi Kleen
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2020-07-24 14:32 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu

On Fri, Jul 24, 2020 at 10:19:25AM -0400, Liang, Kan wrote:
> There will be a problem for the current perf tool, which assumes that the
> leader event is the sampling event.
> 
> I will check how we can handle it specially in the perf tool.

Ah, okay. I've long lost track of how the tool works :/

Something that seems to 'work' is:
'{cycles,cpu/instructions,period=50000/}', so maybe you can make the
group modifier :S use any sampling event if there is one, and otherwise
designate the leader.

Then you can write things like:

  '{slots, metric1, metric2, cpu/cycles,freq=50000/}:S'

and then since cycles is specified as a sampling event, it will use
that.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 11:46     ` peterz
  2020-07-24 13:43       ` Liang, Kan
@ 2020-07-24 14:39       ` Andi Kleen
  2020-07-24 14:51         ` Peter Zijlstra
  1 sibling, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2020-07-24 14:39 UTC (permalink / raw)
  To: peterz
  Cc: kan.liang, acme, mingo, linux-kernel, jolsa, eranian,
	alexander.shishkin, like.xu

> The consequence is that SLOTS must be the leader, is that really a
> problem? You keep providing the {cycles, slots, metric-things} example,

Yes that's a problem. One (major) use case for topdown is to
sample on lots of different events, but always create groups that
also measure topdown metrics for that sampling period. For that we need
other leaders.

-Andi

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 14:32             ` Peter Zijlstra
@ 2020-07-24 14:46               ` Andi Kleen
  2020-07-24 14:59                 ` Peter Zijlstra
  0 siblings, 1 reply; 48+ messages in thread
From: Andi Kleen @ 2020-07-24 14:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Liang, Kan, acme, mingo, linux-kernel, jolsa, eranian,
	alexander.shishkin, like.xu

> Something that seems to 'work' is:
> '{cycles,cpu/instructions,period=50000/}', so maybe you can make the
> group modifier :S use any sampling event if there is one, and otherwise
> designate the leader.
> 
> Then you can write things like:
> 
>   '{slots, metric1, metric2, cpu/cycles,freq=50000/}:S'
> 
> and then since cycles is specified as a sampling event, it will use
> that.

Okay possible, but it makes things more complicated
for the user to understand and requires special documentation.
Hopefully the internal simplification is worth it.

-andi

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 14:39       ` Andi Kleen
@ 2020-07-24 14:51         ` Peter Zijlstra
  0 siblings, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2020-07-24 14:51 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kan.liang, acme, mingo, linux-kernel, jolsa, eranian,
	alexander.shishkin, like.xu

On Fri, Jul 24, 2020 at 07:39:08AM -0700, Andi Kleen wrote:
> > The consequence is that SLOTS must be the leader, is that really a
> > problem? You keep providing the {cycles, slots, metric-things} example,
> 
> Yes that's a problem. One (major) use case for topdown is to
> sample on lots of different events, but always create groups that
> also measure topdown metrics for that sampling period. For that we need
> other leaders.

Apparently the only problem here is the tool, the kernel doesn't care
which events in a group are sampling, there could be multiple.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 14:46               ` Andi Kleen
@ 2020-07-24 14:59                 ` Peter Zijlstra
  2020-07-24 16:43                   ` peterz
  0 siblings, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2020-07-24 14:59 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Liang, Kan, acme, mingo, linux-kernel, jolsa, eranian,
	alexander.shishkin, like.xu

On Fri, Jul 24, 2020 at 07:46:32AM -0700, Andi Kleen wrote:
> > Something that seems to 'work' is:
> > '{cycles,cpu/instructions,period=50000/}', so maybe you can make the
> > group modifier :S use any sampling event if there is one, and otherwise
> > designate the leader.
> > 
> > Then you can write things like:
> > 
> >   '{slots, metric1, metric2, cpu/cycles,freq=50000/}:S'
> > 
> > and then since cycles is specified as a sampling event, it will use
> > that.
> 
> Okay possible, but it makes things more complicated
> for the user to understand and requires special documentation.
> > Hopefully the internal simplification is worth it.

You already require special documentation for this metrics stuff. We
already need to state that SLOTS cannot be a sampling event, so you
already need to pay attention to this anyway.

A shortcut could be a :s event modifier, then you can write:

 '{slots, metric1, metric2, cycles:s}:S'

and have the tool select the :s tagged one.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-24 13:19   ` peterz
@ 2020-07-24 15:27     ` peterz
  2020-07-24 16:07       ` Liang, Kan
  0 siblings, 1 reply; 48+ messages in thread
From: peterz @ 2020-07-24 15:27 UTC (permalink / raw)
  To: kan.liang
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu

On Fri, Jul 24, 2020 at 03:19:06PM +0200, peterz@infradead.org wrote:
> On Thu, Jul 23, 2020 at 10:11:11AM -0700, kan.liang@linux.intel.com wrote:
> > @@ -3375,6 +3428,72 @@ static int intel_pmu_hw_config(struct perf_event *event)
> >  	if (event->attr.type != PERF_TYPE_RAW)
> >  		return 0;
> >  
> > +	/*
> > +	 * Config Topdown slots and metric events
> > +	 *
> > +	 * The slots event on Fixed Counter 3 can support sampling,
> > +	 * which will be handled normally in x86_perf_event_update().
> > +	 *
> > +	 * The metric events don't support sampling.
> > +	 *
> > +	 * For counting, topdown slots and metric events will be
> > +	 * handled specially for event update.
> > +	 * A flag PERF_X86_EVENT_TOPDOWN is applied for the case.
> > +	 */
> > +	if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
> > +		if (is_metric_event(event)) {
> > +			struct perf_event *leader = event->group_leader;
> > +			struct perf_event *sibling;
> > +
> > +			/* The metric events don't support sampling. */
> > +			if (is_sampling_event(event))
> > +				return -EINVAL;
> > +
> > +			/* The metric events cannot be a group leader. */
> > +			if (leader == event)
> > +				return -EINVAL;
> > +
> > +			/*
> > +			 * The slots event cannot be the leader of a topdown
> > +			 * sample-read group, e.g., {slots, topdown-retiring}:S
> > +			 */
> > +			if (is_slots_event(leader) && is_sampling_event(leader))
> > +				return -EINVAL;
> 
> This has nothing to do with sample-read; SLOTS cannot be sampling when
> coupled with the METRIC stuff because hardware is daft.
> 
> And you can have SAMPLE_READ on non-leader events just fine.
> 
> > +
> > +			/*
> > +			 * The slots event must be before the metric events,
> > +			 * because we only update the values of a topdown
> > +			 * group once with the slots event.
> > +			 */
> > +			if (!is_slots_event(leader)) {
> > +				for_each_sibling_event(sibling, leader) {
> > +					if (is_slots_event(sibling))
> > +						break;
> > +					if (is_metric_event(sibling))
> > +						return -EINVAL;
> > +				}
> > +			}
> 
> Per the SIBLING patch this then wants to be:
> 
> 			if (!is_slots_event(leader))
> 				return -EINVAL;
> 
> 			event->event_caps |= PERF_EV_CAP_SIBLING;
> 			/*
> 			 * Only once we have a METRICs sibling do we
> 			 * need TopDown magic.
> 			 */
> 			leader->hw.flags |= PERF_X86_EVENT_TOPDOWN;
> > +		}
> > +
> > +		if (!is_sampling_event(event)) {
> > +			if (event->attr.config1 != 0)
> > +				return -EINVAL;
> 
> How does this depend on sampling?
> 
> > +			/*
> > +			 * The TopDown metrics events and slots event don't
> > +			 * support any filters.
> > +			 */
> > +			if (event->attr.config & X86_ALL_EVENT_FLAGS)
> > +				return -EINVAL;
> 
> That seems independent of sampling too. Even a sampling SLOTS shouldn't
> be having any of those afaict.
> 
> > +
> > +			event->hw.flags |= PERF_X86_EVENT_TOPDOWN;
> 
> This is confusing too, a !sampling SLOTS event without METRIC siblings
> shouldn't have this set, right? So arguably, this should be like above.
> 
> > +
> > +			event->event_caps |= PERF_EV_CAP_COEXIST;
> > +
> > +			if (is_metric_event(event))
> > +				event->hw.flags &= ~PERF_X86_EVENT_RDPMC_ALLOWED;
> 
> This too seems like something that should be in the is_metric_event()
> branch above.
> 
> > +		}
> > +	}
> > +
> >  	if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
> >  		return 0;
> >  

FWIW, I pushed out a branch with all these changes in:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/metric

Just to get it some build love, if you want it differently, I'm happy to
throw it all out again.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-24 15:27     ` peterz
@ 2020-07-24 16:07       ` Liang, Kan
  2020-07-24 19:10         ` Liang, Kan
  0 siblings, 1 reply; 48+ messages in thread
From: Liang, Kan @ 2020-07-24 16:07 UTC (permalink / raw)
  To: peterz
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu



On 7/24/2020 11:27 AM, peterz@infradead.org wrote:
> On Fri, Jul 24, 2020 at 03:19:06PM +0200, peterz@infradead.org wrote:
>> On Thu, Jul 23, 2020 at 10:11:11AM -0700, kan.liang@linux.intel.com wrote:
>>> @@ -3375,6 +3428,72 @@ static int intel_pmu_hw_config(struct perf_event *event)
>>>   	if (event->attr.type != PERF_TYPE_RAW)
>>>   		return 0;
>>>   
>>> +	/*
>>> +	 * Config Topdown slots and metric events
>>> +	 *
>>> +	 * The slots event on Fixed Counter 3 can support sampling,
>>> +	 * which will be handled normally in x86_perf_event_update().
>>> +	 *
>>> +	 * The metric events don't support sampling.
>>> +	 *
>>> +	 * For counting, topdown slots and metric events will be
>>> +	 * handled specially for event update.
>>> +	 * A flag PERF_X86_EVENT_TOPDOWN is applied for the case.
>>> +	 */
>>> +	if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
>>> +		if (is_metric_event(event)) {
>>> +			struct perf_event *leader = event->group_leader;
>>> +			struct perf_event *sibling;
>>> +
>>> +			/* The metric events don't support sampling. */
>>> +			if (is_sampling_event(event))
>>> +				return -EINVAL;
>>> +
>>> +			/* The metric events cannot be a group leader. */
>>> +			if (leader == event)
>>> +				return -EINVAL;
>>> +
>>> +			/*
>>> +			 * The slots event cannot be the leader of a topdown
>>> +			 * sample-read group, e.g., {slots, topdown-retiring}:S
>>> +			 */
>>> +			if (is_slots_event(leader) && is_sampling_event(leader))
>>> +				return -EINVAL;
>>
>> This has nothing to do with sample-read; SLOTS cannot be sampling when
>> coupled with the METRIC stuff because hardware is daft.
>>
>> And you can have SAMPLE_READ on non-leader events just fine.
>>
>>> +
>>> +			/*
>>> +			 * The slots event must be before the metric events,
>>> +			 * because we only update the values of a topdown
>>> +			 * group once with the slots event.
>>> +			 */
>>> +			if (!is_slots_event(leader)) {
>>> +				for_each_sibling_event(sibling, leader) {
>>> +					if (is_slots_event(sibling))
>>> +						break;
>>> +					if (is_metric_event(sibling))
>>> +						return -EINVAL;
>>> +				}
>>> +			}
>>
>> Per the SIBLING patch this then wants to be:
>>
>> 			if (!is_slots_event(leader))
>> 				return -EINVAL;
>>
>> 			event->event_caps |= PERF_EV_CAP_SIBLING;
>> 			/*
>> 			 * Only once we have a METRICs sibling do we
>> 			 * need TopDown magic.
>> 			 */
>> 			leader->hw.flags |= PERF_X86_EVENT_TOPDOWN;
>>> +		}
>>> +
>>> +		if (!is_sampling_event(event)) {
>>> +			if (event->attr.config1 != 0)
>>> +				return -EINVAL;
>>
>> How does this depend on sampling?
>>
>>> +			/*
>>> +			 * The TopDown metrics events and slots event don't
>>> +			 * support any filters.
>>> +			 */
>>> +			if (event->attr.config & X86_ALL_EVENT_FLAGS)
>>> +				return -EINVAL;
>>
>> That seems independent of sampling too. Even a sampling SLOTS shouldn't
>> be having any of those afaict.
>>
>>> +
>>> +			event->hw.flags |= PERF_X86_EVENT_TOPDOWN;
>>
>> This is confusing too, a !sampling SLOTS event without METRIC siblings
>> shouldn't have this set, right? So arguably, this should be like above.
>>
>>> +
>>> +			event->event_caps |= PERF_EV_CAP_COEXIST;
>>> +
>>> +			if (is_metric_event(event))
>>> +				event->hw.flags &= ~PERF_X86_EVENT_RDPMC_ALLOWED;
>>
>> This too seems like something that should be in the is_metric_event()
>> branch above.
>>
>>> +		}
>>> +	}
>>> +
>>>   	if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
>>>   		return 0;
>>>   
> 
> FWIW, I pushed out a branch with all these changes in:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/metric
> 
> Just to get it some build love, if you want it differently, I'm happy to
> throw it all out again.

Thanks Peter.

I will pull the branch and do more tests.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 14:59                 ` Peter Zijlstra
@ 2020-07-24 16:43                   ` peterz
  2020-07-24 17:00                     ` Liang, Kan
  0 siblings, 1 reply; 48+ messages in thread
From: peterz @ 2020-07-24 16:43 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Liang, Kan, acme, mingo, linux-kernel, jolsa, eranian,
	alexander.shishkin, like.xu

On Fri, Jul 24, 2020 at 04:59:34PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 24, 2020 at 07:46:32AM -0700, Andi Kleen wrote:
> > > Something that seems to 'work' is:
> > > '{cycles,cpu/instructions,period=50000/}', so maybe you can make the
> > > group modifier :S use any sampling event if there is one, and otherwise
> > > designate the leader.
> > > 
> > > Then you can write things like:
> > > 
> > >   '{slots, metric1, metric2, cpu/cycles,freq=50000/}:S'
> > > 
> > > and then since cycles is specified as a sampling event, it will use
> > > that.
> > 
> > Okay possible, but it makes things more complicated
> > for the user to understand and requires special documentation.
> > Hopefully the internal simplification is worth it.
> 
> You already require special documentation for this metrics stuff. We
> already need to state that SLOTS cannot be a sampling event, so you
> already need to pay attention to this anyway.
> 
> A shortcut could be a :s event modifier, then you can write:
> 
>  '{slots, metric1, metric2, cycles:s}:S'
> 
> and have the tool select the :s tagged one.

Having slots as leader also would allow doing something like
FORMAT_METRIC, where we return sibling/leader in some fashion.

That also makes sense for instructions, because, IIRC,
instructions/slots is the better IPC.

And we should probably consider FORMAT_RESET.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability
  2020-07-24 16:43                   ` peterz
@ 2020-07-24 17:00                     ` Liang, Kan
  0 siblings, 0 replies; 48+ messages in thread
From: Liang, Kan @ 2020-07-24 17:00 UTC (permalink / raw)
  To: peterz, Andi Kleen
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin, like.xu



On 7/24/2020 12:43 PM, peterz@infradead.org wrote:
> On Fri, Jul 24, 2020 at 04:59:34PM +0200, Peter Zijlstra wrote:
>> On Fri, Jul 24, 2020 at 07:46:32AM -0700, Andi Kleen wrote:
>>>> Something that seems to 'work' is:
>>>> '{cycles,cpu/instructions,period=50000/}', so maybe you can make the
>>>> group modifier :S use any sampling event if there is one, and otherwise
>>>> designate the leader.
>>>>
>>>> Then you can write things like:
>>>>
>>>>    '{slots, metric1, metric2, cpu/cycles,freq=50000/}:S'
>>>>
>>>> and then since cycles is specified as a sampling event, it will use
>>>> that.
>>>
>>> Okay possible, but it makes things more complicated
>>> for the user to understand and requires special documentation.
>>> Hopefully the internal simplification is worth it.
>>
>> You already require special documentation for this metrics stuff. We
>> already need to state that SLOTS cannot be a sampling event, so you
>> already need to pay attention to this anyway.
>>
>> A shortcut could be a :s event modifier, then you can write:
>>
>>   '{slots, metric1, metric2, cycles:s}:S'
>>
>> and have the tool select the :s tagged one.

It looks like PT encountered a similar issue to ours.
They use the 2nd event of the group as the
"leader". I think we can simply extend that function to check for the slots
event in the perf tool.
https://lore.kernel.org/lkml/20200401101613.6201-17-adrian.hunter@intel.com/


> 
> Having slots as leader also would allow doing something like
> FORMAT_METRIC, where we return sibling/leader in some fashion.
> 
> That also makes sense for instructions, because, IIRC,
> instructions/slots is the better IPC.
> 
> And we should probably consider FORMAT_RESET.

What's FORMAT_RESET for?

Thanks,
Kan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-24 16:07       ` Liang, Kan
@ 2020-07-24 19:10         ` Liang, Kan
  2020-07-28 12:32           ` Peter Zijlstra
  2020-07-28 13:09           ` Peter Zijlstra
  0 siblings, 2 replies; 48+ messages in thread
From: Liang, Kan @ 2020-07-24 19:10 UTC (permalink / raw)
  To: peterz
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu



On 7/24/2020 12:07 PM, Liang, Kan wrote:
> 
> 
> On 7/24/2020 11:27 AM, peterz@infradead.org wrote:
>> On Fri, Jul 24, 2020 at 03:19:06PM +0200, peterz@infradead.org wrote:
>>> On Thu, Jul 23, 2020 at 10:11:11AM -0700, kan.liang@linux.intel.com 
>>> wrote:
>>>> @@ -3375,6 +3428,72 @@ static int intel_pmu_hw_config(struct 
>>>> perf_event *event)
>>>>       if (event->attr.type != PERF_TYPE_RAW)
>>>>           return 0;
>>>> +    /*
>>>> +     * Config Topdown slots and metric events
>>>> +     *
>>>> +     * The slots event on Fixed Counter 3 can support sampling,
>>>> +     * which will be handled normally in x86_perf_event_update().
>>>> +     *
>>>> +     * The metric events don't support sampling.
>>>> +     *
>>>> +     * For counting, topdown slots and metric events will be
>>>> +     * handled specially for event update.
>>>> +     * A flag PERF_X86_EVENT_TOPDOWN is applied for the case.
>>>> +     */
>>>> +    if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
>>>> +        if (is_metric_event(event)) {
>>>> +            struct perf_event *leader = event->group_leader;
>>>> +            struct perf_event *sibling;
>>>> +
>>>> +            /* The metric events don't support sampling. */
>>>> +            if (is_sampling_event(event))
>>>> +                return -EINVAL;
>>>> +
>>>> +            /* The metric events cannot be a group leader. */
>>>> +            if (leader == event)
>>>> +                return -EINVAL;
>>>> +
>>>> +            /*
>>>> +             * The slots event cannot be the leader of a topdown
>>>> +             * sample-read group, e.g., {slots, topdown-retiring}:S
>>>> +             */
>>>> +            if (is_slots_event(leader) && is_sampling_event(leader))
>>>> +                return -EINVAL;
>>>
>>> This has nothing to do with sample-read; SLOTS cannot be sampling when
>>> coupled with the METRIC stuff because hardware is daft.
>>>
>>> And you can have SAMPLE_READ on non-leader events just fine.
>>>
>>>> +
>>>> +            /*
>>>> +             * The slots event must be before the metric events,
>>>> +             * because we only update the values of a topdown
>>>> +             * group once with the slots event.
>>>> +             */
>>>> +            if (!is_slots_event(leader)) {
>>>> +                for_each_sibling_event(sibling, leader) {
>>>> +                    if (is_slots_event(sibling))
>>>> +                        break;
>>>> +                    if (is_metric_event(sibling))
>>>> +                        return -EINVAL;
>>>> +                }
>>>> +            }
>>>
>>> Per the SIBLING patch this then wants to be:
>>>
>>>             if (!is_slots_event(leader))
>>>                 return -EINVAL;
>>>
>>>             event->event_caps |= PERF_EV_CAP_SIBLING;
>>>             /*
>>>              * Only once we have a METRICs sibling do we
>>>              * need TopDown magic.
>>>              */
>>>             leader->hw.flags |= PERF_X86_EVENT_TOPDOWN;

Since we only set the flag for the SLOTS event now, the V7 patch will
treat the metric events as normal events, which triggers an error.

To fix the error, I think we may merge the changes as below with the 
[PATCH 08/14] perf/x86/intel: Generic support for hardware TopDown metrics.

I think we don't need the PERF_X86_EVENT_TOPDOWN flag anymore.
If it's a non-sampling slots event, apply the special function.
If it's a metric event, do nothing.

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 0f3d01562ded..02dfee0b6615 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -73,10 +73,10 @@ u64 x86_perf_event_update(struct perf_event *event)
  	u64 prev_raw_count, new_raw_count;
  	u64 delta;

-	if (unlikely(!hwc->event_base))
+	if (unlikely(!hwc->event_base || is_metric_event(event)))
  		return 0;

-	if (unlikely(is_topdown_count(event)) && x86_pmu.update_topdown_event)
+	if (unlikely(is_slots_count(event)) && x86_pmu.update_topdown_event)
  		return x86_pmu.update_topdown_event(event);

  	/*
@@ -1280,11 +1280,10 @@ int x86_perf_event_set_period(struct perf_event *event)
  	s64 period = hwc->sample_period;
  	int ret = 0, idx = hwc->idx;

-	if (unlikely(!hwc->event_base))
+	if (unlikely(!hwc->event_base || is_metric_event(event)))
  		return 0;

-	if (unlikely(is_topdown_count(event)) &&
-	    x86_pmu.set_topdown_event_period)
+	if (unlikely(is_slots_count(event)) && x86_pmu.set_topdown_event_period)
  		return x86_pmu.set_topdown_event_period(event);

  	/*
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 6cb079e0c9d9..010ac74afc09 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2405,27 +2405,18 @@ static u64 icl_update_topdown_event(struct perf_event *event)
  	return slots;
  }

-static void intel_pmu_read_topdown_event(struct perf_event *event)
+static void intel_pmu_read_event(struct perf_event *event)
  {
-	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
-
-	/* Only need to call update_topdown_event() once for group read. */
-	if ((cpuc->txn_flags & PERF_PMU_TXN_READ) &&
-	    !is_slots_event(event))
+	if (unlikely(is_metric_event(event)))
  		return;

-	perf_pmu_disable(event->pmu);
-	x86_pmu.update_topdown_event(event);
-	perf_pmu_enable(event->pmu);
-}
-
-static void intel_pmu_read_event(struct perf_event *event)
-{
  	if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
  		intel_pmu_auto_reload_read(event);
-	else if (is_topdown_count(event) && x86_pmu.update_topdown_event)
-		intel_pmu_read_topdown_event(event);
-	else
+	else if (is_slots_count(event) && x86_pmu.update_topdown_event) {
+		perf_pmu_disable(event->pmu);
+		x86_pmu.update_topdown_event(event);
+		perf_pmu_enable(event->pmu);
+	} else
  		x86_perf_event_update(event);
  }

@@ -3606,12 +3597,14 @@ static int intel_pmu_hw_config(struct perf_event *event)
  	 *
  	 * The slots event on Fixed Counter 3 can support sampling,
  	 * which will be handled normally in x86_perf_event_update().
-	 *
  	 * The metric events don't support sampling.
  	 *
-	 * For counting, topdown slots and metric events will be
-	 * handled specially for event update.
-	 * A flag PERF_X86_EVENT_TOPDOWN is applied for the case.
+	 * For the counting of the slots and metric events, the rules
+	 * as below have to be applied:
+	 * - A standalone metric event or pure metric events group is
+	 *   not supported.
+	 * - The metric events group must include the slots event.
+	 * - The slots event must be the leader of the group.
  	 */
  	if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
  		if (event->attr.config1 || event->attr.config2)
@@ -3647,11 +3640,6 @@ static int intel_pmu_hw_config(struct perf_event *event)
  				return -EINVAL;

  			event->event_caps |= PERF_EV_CAP_SIBLING;
-			/*
-			 * Only once we have a METRICs sibling do we
-			 * need TopDown magic.
-			 */
-			leader->hw.flags |= PERF_X86_EVENT_TOPDOWN;
  		}
  	}

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 345442410a4d..153db209c105 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -79,12 +79,6 @@ static inline bool constraint_match(struct event_constraint *c, u64 ecode)
  #define PERF_X86_EVENT_PEBS_VIA_PT	0x0800 /* use PT buffer for PEBS */
  #define PERF_X86_EVENT_PAIR		0x1000 /* Large Increment per Cycle */
  #define PERF_X86_EVENT_LBR_SELECT	0x2000 /* Save/Restore MSR_LBR_SELECT */
-#define PERF_X86_EVENT_TOPDOWN		0x4000 /* Count Topdown slots/metrics events */
-
-static inline bool is_topdown_count(struct perf_event *event)
-{
-	return event->hw.flags & PERF_X86_EVENT_TOPDOWN;
-}

  static inline bool is_metric_event(struct perf_event *event)
  {
@@ -105,6 +99,14 @@ static inline bool is_topdown_event(struct perf_event *event)
  	return is_metric_event(event) || is_slots_event(event);
  }

+static inline bool is_slots_count(struct perf_event *event)
+{
+	if (is_slots_event(event) && !is_sampling_event(event))
+		return true;
+
+	return false;
+}
+
  struct amd_nb {
  	int nb_id;  /* NorthBridge id */
  	int refcnt; /* reference count */


Thanks,
Kan

>>>> +        }
>>>> +
>>>> +        if (!is_sampling_event(event)) {
>>>> +            if (event->attr.config1 != 0)
>>>> +                return -EINVAL;
>>>
>>> How does this depend on sampling?
>>>
>>>> +            /*
>>>> +             * The TopDown metrics events and slots event don't
>>>> +             * support any filters.
>>>> +             */
>>>> +            if (event->attr.config & X86_ALL_EVENT_FLAGS)
>>>> +                return -EINVAL;
>>>
>>> That seems independent of sampling too. Even a sampling SLOTS shouldn't
>>> be having any of those afaict.
>>>
>>>> +
>>>> +            event->hw.flags |= PERF_X86_EVENT_TOPDOWN;
>>>
>>> This is confusing too, a !sampling SLOTS event without METRIC siblings
>>> shouldn't have this set, right? So arguably, this should be like above.
>>>
>>>> +
>>>> +            event->event_caps |= PERF_EV_CAP_COEXIST;
>>>> +
>>>> +            if (is_metric_event(event))
>>>> +                event->hw.flags &= ~PERF_X86_EVENT_RDPMC_ALLOWED;
>>>
>>> This too seems like something that should be in the is_metric_event()
>>> branch above.
>>>
>>>> +        }
>>>> +    }
>>>> +
>>>>       if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
>>>>           return 0;
>>
>> FWIW, I pushed out a branch with all these changes in:
>>
>>    git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git 
>> perf/metric
>>
>> Just to get it some build love, if you want it differently, I'm happy to
>> throw it all out again.
> 
> Thanks Peter.
> 
> I will pull the branch and do more tests.
> 
> Thanks,
> Kan

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-24 19:10         ` Liang, Kan
@ 2020-07-28 12:32           ` Peter Zijlstra
  2020-07-28 13:09           ` Peter Zijlstra
  1 sibling, 0 replies; 48+ messages in thread
From: Peter Zijlstra @ 2020-07-28 12:32 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu

On Fri, Jul 24, 2020 at 03:10:52PM -0400, Liang, Kan wrote:

> > > > Per the SIBLING patch this then wants to be:
> > > > 
> > > >             if (!is_slots_event(leader))
> > > >                 return -EINVAL;
> > > > 
> > > >             event->event_caps |= PERF_EV_CAP_SIBLING;
> > > >             /*
> > > >              * Only once we have a METRICs sibling do we
> > > >              * need TopDown magic.
> > > >              */
> > > >             leader->hw.flags |= PERF_X86_EVENT_TOPDOWN;
> 
> Since we only set the flag for the SLOTS event now, the V7 patch will treat
> the metric events as normal events, which triggers an error.

Damn, that was a silly oversight on my part.

> I think we don't need the PERF_X86_EVENT_TOPDOWN flag anymore.
> If it's a non-sampling slots event, apply the special function.
> If it's a metric event, do nothing.

> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 0f3d01562ded..02dfee0b6615 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -73,10 +73,10 @@ u64 x86_perf_event_update(struct perf_event *event)
>  	u64 prev_raw_count, new_raw_count;
>  	u64 delta;
> 
> -	if (unlikely(!hwc->event_base))
> +	if (unlikely(!hwc->event_base || is_metric_event(event)))
>  		return 0;
> 
> -	if (unlikely(is_topdown_count(event)) && x86_pmu.update_topdown_event)
> +	if (unlikely(is_slots_count(event)) && x86_pmu.update_topdown_event)
>  		return x86_pmu.update_topdown_event(event);
> 
>  	/*
> @@ -1280,11 +1280,10 @@ int x86_perf_event_set_period(struct perf_event
> *event)
>  	s64 period = hwc->sample_period;
>  	int ret = 0, idx = hwc->idx;
> 
> -	if (unlikely(!hwc->event_base))
> +	if (unlikely(!hwc->event_base || is_metric_event(event)))
>  		return 0;
> 
> -	if (unlikely(is_topdown_count(event)) &&
> -	    x86_pmu.set_topdown_event_period)
> +	if (unlikely(is_slots_count(event)) && x86_pmu.set_topdown_event_period)
>  		return x86_pmu.set_topdown_event_period(event);
> 
>  	/*

This; I don't like that much, it adds even more conditions to fairly hot
code.

I was even considering adding a static_branch for
x86_pmu.intel_cap.perf_metrics.

Anyway, let me fix this.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-24 19:10         ` Liang, Kan
  2020-07-28 12:32           ` Peter Zijlstra
@ 2020-07-28 13:09           ` Peter Zijlstra
  2020-07-28 13:28             ` Liang, Kan
  1 sibling, 1 reply; 48+ messages in thread
From: Peter Zijlstra @ 2020-07-28 13:09 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu

On Fri, Jul 24, 2020 at 03:10:52PM -0400, Liang, Kan wrote:

> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 6cb079e0c9d9..010ac74afc09 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2405,27 +2405,18 @@ static u64 icl_update_topdown_event(struct
> perf_event *event)
>  	return slots;
>  }
> 
> -static void intel_pmu_read_topdown_event(struct perf_event *event)
> +static void intel_pmu_read_event(struct perf_event *event)
>  {
> -	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> -
> -	/* Only need to call update_topdown_event() once for group read. */
> -	if ((cpuc->txn_flags & PERF_PMU_TXN_READ) &&
> -	    !is_slots_event(event))
> +	if (unlikely(is_metric_event(event)))
>  		return;
> 
> -	perf_pmu_disable(event->pmu);
> -	x86_pmu.update_topdown_event(event);
> -	perf_pmu_enable(event->pmu);
> -}
> -
> -static void intel_pmu_read_event(struct perf_event *event)
> -{
>  	if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
>  		intel_pmu_auto_reload_read(event);
> -	else if (is_topdown_count(event) && x86_pmu.update_topdown_event)
> -		intel_pmu_read_topdown_event(event);
> -	else
> +	else if (is_slots_count(event) && x86_pmu.update_topdown_event) {
> +		perf_pmu_disable(event->pmu);
> +		x86_pmu.update_topdown_event(event);
> +		perf_pmu_enable(event->pmu);
> +	} else
>  		x86_perf_event_update(event);
>  }

I'm a little puzzled by this; what happens if you:

	fd = sys_perf_event_open(&attr_slots);
	fd1 = sys_perf_event_open(&attr_metric, .group_fd=fd);

	read(fd1);

?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-28 13:09           ` Peter Zijlstra
@ 2020-07-28 13:28             ` Liang, Kan
  2020-07-28 13:44               ` peterz
  0 siblings, 1 reply; 48+ messages in thread
From: Liang, Kan @ 2020-07-28 13:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu



On 7/28/2020 9:09 AM, Peter Zijlstra wrote:
> On Fri, Jul 24, 2020 at 03:10:52PM -0400, Liang, Kan wrote:
> 
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 6cb079e0c9d9..010ac74afc09 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -2405,27 +2405,18 @@ static u64 icl_update_topdown_event(struct
>> perf_event *event)
>>   	return slots;
>>   }
>>
>> -static void intel_pmu_read_topdown_event(struct perf_event *event)
>> +static void intel_pmu_read_event(struct perf_event *event)
>>   {
>> -	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> -
>> -	/* Only need to call update_topdown_event() once for group read. */
>> -	if ((cpuc->txn_flags & PERF_PMU_TXN_READ) &&
>> -	    !is_slots_event(event))
>> +	if (unlikely(is_metric_event(event)))
>>   		return;
>>
>> -	perf_pmu_disable(event->pmu);
>> -	x86_pmu.update_topdown_event(event);
>> -	perf_pmu_enable(event->pmu);
>> -}
>> -
>> -static void intel_pmu_read_event(struct perf_event *event)
>> -{
>>   	if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
>>   		intel_pmu_auto_reload_read(event);
>> -	else if (is_topdown_count(event) && x86_pmu.update_topdown_event)
>> -		intel_pmu_read_topdown_event(event);
>> -	else
>> +	else if (is_slots_count(event) && x86_pmu.update_topdown_event) {
>> +		perf_pmu_disable(event->pmu);
>> +		x86_pmu.update_topdown_event(event);
>> +		perf_pmu_enable(event->pmu);
>> +	} else
>>   		x86_perf_event_update(event);
>>   }
> 
> I'm a little puzzled by this; what happens if you:
> 
> 	fd = sys_perf_event_open(&attr_slots);
> 	fd1 = sys_perf_event_open(&attr_metric, .group_fd=fd);
> 
> 	read(fd1);
> 
> ?
> 

I did a quick test. It depends on the .read_format of attr_metric.
If PERF_FORMAT_GROUP is applied for attr_metric, perf_read_group() will 
be invoked. The value of fd1 is updated correctly.
If the flag is not applied, 0 will be returned.

static ssize_t
__perf_read(struct perf_event *event, char __user *buf, size_t count)
{
	u64 read_format = event->attr.read_format;
	int ret;

	/*
	 * Return end-of-file for a read on an event that is in
	 * error state (i.e. because it was pinned but it couldn't be
	 * scheduled on to the CPU at some point).
	 */
	if (event->state == PERF_EVENT_STATE_ERROR)
		return 0;

	if (count < event->read_size)
		return -ENOSPC;

	WARN_ON_ONCE(event->ctx->parent_ctx);
	if (read_format & PERF_FORMAT_GROUP)
		ret = perf_read_group(event, read_format, buf);
	else
		ret = perf_read_one(event, read_format, buf);

	return ret;
}
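
For reference, a minimal userspace sketch of that scenario (illustrative
only; the raw configs 0x0400 for slots and 0x8000 for topdown-retiring
follow the event encodings shown later in this thread, and error handling
is omitted):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	struct perf_event_attr slots = { 0 }, metric = { 0 };
	/* PERF_FORMAT_GROUP read layout: nr, then one value per group member */
	struct { uint64_t nr; uint64_t values[2]; } buf;
	int fd, fd1;

	slots.type = PERF_TYPE_RAW;
	slots.size = sizeof(slots);
	slots.config = 0x0400;			/* slots, the group leader */
	slots.disabled = 1;

	metric.type = PERF_TYPE_RAW;
	metric.size = sizeof(metric);
	metric.config = 0x8000;			/* topdown-retiring, a sibling */
	metric.read_format = PERF_FORMAT_GROUP;	/* group read via the sibling fd */

	fd  = syscall(__NR_perf_event_open, &slots, 0, -1, -1, 0);
	fd1 = syscall(__NR_perf_event_open, &metric, 0, -1, fd, 0);

	ioctl(fd, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
	/* ... run the workload to be measured ... */
	if (read(fd1, &buf, sizeof(buf)) > 0)	/* values[0] = leader (slots) */
		printf("slots=%llu retiring=%llu\n",
		       (unsigned long long)buf.values[0],
		       (unsigned long long)buf.values[1]);
	return 0;
}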


Thanks,
Kan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-28 13:28             ` Liang, Kan
@ 2020-07-28 13:44               ` peterz
  2020-07-28 14:01                 ` Liang, Kan
  0 siblings, 1 reply; 48+ messages in thread
From: peterz @ 2020-07-28 13:44 UTC (permalink / raw)
  To: Liang, Kan
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu

On Tue, Jul 28, 2020 at 09:28:39AM -0400, Liang, Kan wrote:
> 
> 
> On 7/28/2020 9:09 AM, Peter Zijlstra wrote:
> > On Fri, Jul 24, 2020 at 03:10:52PM -0400, Liang, Kan wrote:
> > 
> > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > index 6cb079e0c9d9..010ac74afc09 100644
> > > --- a/arch/x86/events/intel/core.c
> > > +++ b/arch/x86/events/intel/core.c
> > > @@ -2405,27 +2405,18 @@ static u64 icl_update_topdown_event(struct
> > > perf_event *event)
> > >   	return slots;
> > >   }
> > > 
> > > -static void intel_pmu_read_topdown_event(struct perf_event *event)
> > > +static void intel_pmu_read_event(struct perf_event *event)
> > >   {
> > > -	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> > > -
> > > -	/* Only need to call update_topdown_event() once for group read. */
> > > -	if ((cpuc->txn_flags & PERF_PMU_TXN_READ) &&
> > > -	    !is_slots_event(event))
> > > +	if (unlikely(is_metric_event(event)))
> > >   		return;
> > > 
> > > -	perf_pmu_disable(event->pmu);
> > > -	x86_pmu.update_topdown_event(event);
> > > -	perf_pmu_enable(event->pmu);
> > > -}
> > > -
> > > -static void intel_pmu_read_event(struct perf_event *event)
> > > -{
> > >   	if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
> > >   		intel_pmu_auto_reload_read(event);
> > > -	else if (is_topdown_count(event) && x86_pmu.update_topdown_event)
> > > -		intel_pmu_read_topdown_event(event);
> > > -	else
> > > +	else if (is_slots_count(event) && x86_pmu.update_topdown_event) {
> > > +		perf_pmu_disable(event->pmu);
> > > +		x86_pmu.update_topdown_event(event);
> > > +		perf_pmu_enable(event->pmu);
> > > +	} else
> > >   		x86_perf_event_update(event);
> > >   }
> > 
> > I'm a little puzzled by this; what happens if you:
> > 
> > 	fd = sys_perf_event_open(&attr_slots);
> > 	fd1 = sys_perf_event_open(&attr_metric, .group_fd=fd);
> > 
> > 	read(fd1);
> > 
> > ?
> > 
> 
> I did a quick test. It depends on the .read_format of attr_metric.
> If PERF_FORMAT_GROUP is applied for attr_metric, perf_read_group() will be
> invoked. The value of fd1 is updated correctly.
> If the flag is not applied, 0 will be returned.

Exactly :-), was that intentional? Because prior to this change that
would've worked just fine.

Now, I agree it's a daft thing, because that value is pretty useless
without also having the slots value, but I feel we should be explicit in
our choices here.

If for example, we would've had hardware provide us the raw metric
counters, instead of us having to reconstruct them, this would've been a
semi useful thing.

So I'm tempted to leave things as is, and keep this 'working'.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-28 13:44               ` peterz
@ 2020-07-28 14:01                 ` Liang, Kan
  0 siblings, 0 replies; 48+ messages in thread
From: Liang, Kan @ 2020-07-28 14:01 UTC (permalink / raw)
  To: peterz
  Cc: acme, mingo, linux-kernel, jolsa, eranian, alexander.shishkin,
	ak, like.xu



On 7/28/2020 9:44 AM, peterz@infradead.org wrote:
> On Tue, Jul 28, 2020 at 09:28:39AM -0400, Liang, Kan wrote:
>>
>>
>> On 7/28/2020 9:09 AM, Peter Zijlstra wrote:
>>> On Fri, Jul 24, 2020 at 03:10:52PM -0400, Liang, Kan wrote:
>>>
>>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>>> index 6cb079e0c9d9..010ac74afc09 100644
>>>> --- a/arch/x86/events/intel/core.c
>>>> +++ b/arch/x86/events/intel/core.c
>>>> @@ -2405,27 +2405,18 @@ static u64 icl_update_topdown_event(struct
>>>> perf_event *event)
>>>>    	return slots;
>>>>    }
>>>>
>>>> -static void intel_pmu_read_topdown_event(struct perf_event *event)
>>>> +static void intel_pmu_read_event(struct perf_event *event)
>>>>    {
>>>> -	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>>> -
>>>> -	/* Only need to call update_topdown_event() once for group read. */
>>>> -	if ((cpuc->txn_flags & PERF_PMU_TXN_READ) &&
>>>> -	    !is_slots_event(event))
>>>> +	if (unlikely(is_metric_event(event)))
>>>>    		return;
>>>>
>>>> -	perf_pmu_disable(event->pmu);
>>>> -	x86_pmu.update_topdown_event(event);
>>>> -	perf_pmu_enable(event->pmu);
>>>> -}
>>>> -
>>>> -static void intel_pmu_read_event(struct perf_event *event)
>>>> -{
>>>>    	if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
>>>>    		intel_pmu_auto_reload_read(event);
>>>> -	else if (is_topdown_count(event) && x86_pmu.update_topdown_event)
>>>> -		intel_pmu_read_topdown_event(event);
>>>> -	else
>>>> +	else if (is_slots_count(event) && x86_pmu.update_topdown_event) {
>>>> +		perf_pmu_disable(event->pmu);
>>>> +		x86_pmu.update_topdown_event(event);
>>>> +		perf_pmu_enable(event->pmu);
>>>> +	} else
>>>>    		x86_perf_event_update(event);
>>>>    }
>>>
>>> I'm a little puzzled by this; what happens if you:
>>>
>>> 	fd = sys_perf_event_open(&attr_slots);
>>> 	fd1 = sys_perf_event_open(&attr_metric, .group_fd=fd);
>>>
>>> 	read(fd1);
>>>
>>> ?
>>>
>>
>> I did a quick test. It depends on the .read_format of attr_metric.
>> If PERF_FORMAT_GROUP is applied for attr_metric, perf_read_group() will be
>> invoked. The value of fd1 is updated correctly.
>> If the flag is not applied, 0 will be returned.
> 
> Exactly :-), was that intentional?

Kind of, because a metric event must be in a group with the leader slots 
event. If a user reads (treats) the metric event as a singleton event, 
an invalid value should be expected.

> Because prior to this change that
> would've worked just fine.
> 
> Now, I agree it's a daft thing, because that value is pretty useless
> without also having the slots value, but I feel we should be explicit in
> our choices here.
> 
> If for example, we would've had hardware provide us the raw metric
> counters, instead of us having to reconstruct them, this would've been a
> semi useful thing.
> 
> So I'm tempted to leave things as it, and keep this 'working'.

I will update the perf tool document to force the PERF_FORMAT_GROUP flag 
for each metric event.

Thanks,
Kan


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [tip: perf/core] perf/x86/intel: Support per-thread RDPMC TopDown metrics
  2020-07-23 17:11 ` [PATCH V7 11/14] perf/x86/intel: Support per-thread RDPMC TopDown metrics kan.liang
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     2cb5383b30d47c446ec7d884cd80f93ffcc31817
Gitweb:        https://git.kernel.org/tip/2cb5383b30d47c446ec7d884cd80f93ffcc31817
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:14 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:37 +02:00

perf/x86/intel: Support per-thread RDPMC TopDown metrics

Starting with Ice Lake, the TopDown metrics are directly available as
fixed counters and do not require generic counters. Also, the TopDown
metrics can be collected per thread. Extend the RDPMC usage to support
per-thread TopDown metrics.

The RDPMC index of the PERF_METRICS will be output if RDPMC users ask
for the RDPMC index of the metrics events.

To support per thread RDPMC TopDown, the metrics and slots counters have
to be saved/restored during the context switching.

The last_period and period_left are not used in the counting mode. Use
the fields for saved_metric and saved_slots.
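
For context, a self-monitoring RDPMC user typically follows the mmap page
protocol documented in include/uapi/linux/perf_event.h; below is an
illustrative sketch (not part of this patch, and simplified: the pmc_width
sign extension is skipped). Note that for a metric event the value RDPMC
returns is the raw PERF_METRICS MSR, which userspace still has to combine
with the slots count itself.

#include <linux/perf_event.h>
#include <stdint.h>

static inline uint64_t rdpmc(uint32_t counter)
{
	uint32_t lo, hi;

	asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (counter));
	return ((uint64_t)hi << 32) | lo;
}

/* pc points at the event's mmap'ed perf_event_mmap_page */
uint64_t read_self(struct perf_event_mmap_page *pc)
{
	uint32_t seq, idx;
	uint64_t count;

	do {
		seq = pc->lock;
		__sync_synchronize();
		idx = pc->index;	/* x86_pmu_event_idx(): rdpmc index + 1;
					 * metric events now report
					 * INTEL_PMC_FIXED_RDPMC_METRICS + 1 */
		count = pc->offset;
		if (pc->cap_user_rdpmc && idx)
			count += rdpmc(idx - 1);
		__sync_synchronize();
	} while (pc->lock != seq);

	return count;
}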

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-12-kan.liang@linux.intel.com
---
 arch/x86/events/core.c       |  5 +-
 arch/x86/events/intel/core.c | 90 ++++++++++++++++++++++++++++++-----
 include/linux/perf_event.h   | 29 +++++++----
 3 files changed, 102 insertions(+), 22 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ebf723f..0f3d015 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2257,7 +2257,10 @@ static int x86_pmu_event_idx(struct perf_event *event)
 	if (!(hwc->flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return 0;
 
-	return hwc->event_base_rdpmc + 1;
+	if (is_metric_idx(hwc->idx))
+		return INTEL_PMC_FIXED_RDPMC_METRICS + 1;
+	else
+		return hwc->event_base_rdpmc + 1;
 }
 
 static ssize_t get_attr_rdpmc(struct device *cdev,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index db83334..c72e490 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2258,7 +2258,13 @@ static int icl_set_topdown_event_period(struct perf_event *event)
 	if (left == x86_pmu.max_period) {
 		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
 		wrmsrl(MSR_PERF_METRICS, 0);
-		local64_set(&hwc->period_left, 0);
+		hwc->saved_slots = 0;
+		hwc->saved_metric = 0;
+	}
+
+	if ((hwc->saved_slots) && is_slots_event(event)) {
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, hwc->saved_slots);
+		wrmsrl(MSR_PERF_METRICS, hwc->saved_metric);
 	}
 
 	perf_event_update_userpage(event);
@@ -2279,7 +2285,7 @@ static inline u64 icl_get_metrics_event_value(u64 metric, u64 slots, int idx)
 	return  mul_u64_u32_div(slots, val, 0xff);
 }
 
-static void __icl_update_topdown_event(struct perf_event *event,
+static u64 icl_get_topdown_value(struct perf_event *event,
 				       u64 slots, u64 metrics)
 {
 	int idx = event->hw.idx;
@@ -2290,7 +2296,50 @@ static void __icl_update_topdown_event(struct perf_event *event,
 	else
 		delta = slots;
 
-	local64_add(delta, &event->count);
+	return delta;
+}
+
+static void __icl_update_topdown_event(struct perf_event *event,
+				       u64 slots, u64 metrics,
+				       u64 last_slots, u64 last_metrics)
+{
+	u64 delta, last = 0;
+
+	delta = icl_get_topdown_value(event, slots, metrics);
+	if (last_slots)
+		last = icl_get_topdown_value(event, last_slots, last_metrics);
+
+	/*
+	 * The 8bit integer fraction of the metric may not be accurate,
+	 * especially when the change is very small.
+	 * For example, if only a few bad_spec events happen, the fraction
+	 * may be reduced from 1 to 0. If so, the bad_spec event value
+	 * will be 0, which is definitely less than the last value.
+	 * Avoid updating event->count for this case.
+	 */
+	if (delta > last) {
+		delta -= last;
+		local64_add(delta, &event->count);
+	}
+}
+
+static void update_saved_topdown_regs(struct perf_event *event,
+				      u64 slots, u64 metrics)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_event *other;
+	int idx;
+
+	event->hw.saved_slots = slots;
+	event->hw.saved_metric = metrics;
+
+	for_each_set_bit(idx, cpuc->active_mask, INTEL_PMC_IDX_TD_BE_BOUND + 1) {
+		if (!is_topdown_idx(idx))
+			continue;
+		other = cpuc->events[idx];
+		other->hw.saved_slots = slots;
+		other->hw.saved_metric = metrics;
+	}
 }
 
 /*
@@ -2304,6 +2353,7 @@ static u64 icl_update_topdown_event(struct perf_event *event)
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_event *other;
 	u64 slots, metrics;
+	bool reset = true;
 	int idx;
 
 	/* read Fixed counter 3 */
@@ -2318,19 +2368,39 @@ static u64 icl_update_topdown_event(struct perf_event *event)
 		if (!is_topdown_idx(idx))
 			continue;
 		other = cpuc->events[idx];
-		__icl_update_topdown_event(other, slots, metrics);
+		__icl_update_topdown_event(other, slots, metrics,
+					   event ? event->hw.saved_slots : 0,
+					   event ? event->hw.saved_metric : 0);
 	}
 
 	/*
 	 * Check and update this event, which may have been cleared
 	 * in active_mask e.g. x86_pmu_stop()
 	 */
-	if (event && !test_bit(event->hw.idx, cpuc->active_mask))
-		__icl_update_topdown_event(event, slots, metrics);
+	if (event && !test_bit(event->hw.idx, cpuc->active_mask)) {
+		__icl_update_topdown_event(event, slots, metrics,
+					   event->hw.saved_slots,
+					   event->hw.saved_metric);
 
-	/* The fixed counter 3 has to be written before the PERF_METRICS. */
-	wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
-	wrmsrl(MSR_PERF_METRICS, 0);
+		/*
+		 * In x86_pmu_stop(), the event is cleared in active_mask first,
+		 * then drain the delta, which indicates context switch for
+		 * counting.
+		 * Save metric and slots for context switch.
+		 * Don't need to reset the PERF_METRICS and Fixed counter 3.
+		 * Because the values will be restored in next schedule in.
+		 */
+		update_saved_topdown_regs(event, slots, metrics);
+		reset = false;
+	}
+
+	if (reset) {
+		/* The fixed counter 3 has to be written before the PERF_METRICS. */
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+		wrmsrl(MSR_PERF_METRICS, 0);
+		if (event)
+			update_saved_topdown_regs(event, 0, 0);
+	}
 
 	return slots;
 }
@@ -3578,8 +3648,6 @@ static int intel_pmu_hw_config(struct perf_event *event)
 			 */
 			leader->hw.flags |= PERF_X86_EVENT_TOPDOWN;
 			event->hw.flags  |= PERF_X86_EVENT_TOPDOWN;
-
-			event->hw.flags &= ~PERF_X86_EVENT_RDPMC_ALLOWED;
 		}
 	}
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6048650..46a3974 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -212,17 +212,26 @@ struct hw_perf_event {
 	 */
 	u64				sample_period;
 
-	/*
-	 * The period we started this sample with.
-	 */
-	u64				last_period;
+	union {
+		struct { /* Sampling */
+			/*
+			 * The period we started this sample with.
+			 */
+			u64				last_period;
 
-	/*
-	 * However much is left of the current period; note that this is
-	 * a full 64bit value and allows for generation of periods longer
-	 * than hardware might allow.
-	 */
-	local64_t			period_left;
+			/*
+			 * However much is left of the current period;
+			 * note that this is a full 64bit value and
+			 * allows for generation of periods longer
+			 * than hardware might allow.
+			 */
+			local64_t			period_left;
+		};
+		struct { /* Topdown events counting for context switch */
+			u64				saved_metric;
+			u64				saved_slots;
+		};
+	};
 
 	/*
 	 * State for throttling the event, see __perf_event_overflow() and

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [tip: perf/core] perf/x86/intel: Support TopDown metrics on Ice Lake
  2020-07-23 17:11 ` [PATCH V7 10/14] perf/x86/intel: Support TopDown metrics on Ice Lake kan.liang
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     59a854e2f3b90ad2cc7368ae392de40b981ad51d
Gitweb:        https://git.kernel.org/tip/59a854e2f3b90ad2cc7368ae392de40b981ad51d
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:13 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:37 +02:00

perf/x86/intel: Support TopDown metrics on Ice Lake

Ice Lake supports the hardware TopDown metrics feature, which can free
up the scarce GP counters.

Update the event constraints for the metrics events. The metric counters
do not physically exist; they are mapped to a dummy offset. Sharing the
same metric between multiple users without multiplexing is not allowed.

Implement set_topdown_event_period for Ice Lake. The values in
PERF_METRICS MSR are derived from the fixed counter 3. Both registers
should start from zero.

Implement update_topdown_event for Ice Lake. The metric is reported by
multiplying the metric (fraction) with slots. To maintain accurate
measurements, both registers are cleared for each update. The fixed
counter 3 should always be cleared before the PERF_METRICS.
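
As a worked example (with made-up numbers): if the retiring byte of
PERF_METRICS reads 0x40 out of 0xff while fixed counter 3 has accumulated
1020 slots, the topdown-retiring event is credited 1020 * 0x40 / 0xff = 256
slots for that update.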

Implement td_attr for the new metrics events and the new slots fixed
counter. Make them visible to the perf user tools.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-11-kan.liang@linux.intel.com
---
 arch/x86/events/intel/core.c      | 118 +++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h      |  13 +++-
 arch/x86/include/asm/msr-index.h  |   2 +-
 arch/x86/include/asm/perf_event.h |   2 +-
 4 files changed, 135 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 4a43668..db83334 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -247,6 +247,10 @@ static struct event_constraint intel_icl_event_constraints[] = {
 	FIXED_EVENT_CONSTRAINT(0x003c, 1),	/* CPU_CLK_UNHALTED.CORE */
 	FIXED_EVENT_CONSTRAINT(0x0300, 2),	/* CPU_CLK_UNHALTED.REF */
 	FIXED_EVENT_CONSTRAINT(0x0400, 3),	/* SLOTS */
+	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_RETIRING, 0),
+	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_BAD_SPEC, 1),
+	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_FE_BOUND, 2),
+	METRIC_EVENT_CONSTRAINT(INTEL_TD_METRIC_BE_BOUND, 3),
 	INTEL_EVENT_CONSTRAINT_RANGE(0x03, 0x0a, 0xf),
 	INTEL_EVENT_CONSTRAINT_RANGE(0x1f, 0x28, 0xf),
 	INTEL_EVENT_CONSTRAINT(0x32, 0xf),	/* SW_PREFETCH_ACCESS.* */
@@ -309,6 +313,12 @@ EVENT_ATTR_STR_HT(topdown-recovery-bubbles, td_recovery_bubbles,
 EVENT_ATTR_STR_HT(topdown-recovery-bubbles.scale, td_recovery_bubbles_scale,
 	"4", "2");
 
+EVENT_ATTR_STR(slots,			slots,		"event=0x00,umask=0x4");
+EVENT_ATTR_STR(topdown-retiring,	td_retiring,	"event=0x00,umask=0x80");
+EVENT_ATTR_STR(topdown-bad-spec,	td_bad_spec,	"event=0x00,umask=0x81");
+EVENT_ATTR_STR(topdown-fe-bound,	td_fe_bound,	"event=0x00,umask=0x82");
+EVENT_ATTR_STR(topdown-be-bound,	td_be_bound,	"event=0x00,umask=0x83");
+
 static struct attribute *snb_events_attrs[] = {
 	EVENT_PTR(td_slots_issued),
 	EVENT_PTR(td_slots_retired),
@@ -2232,6 +2242,99 @@ static void intel_pmu_del_event(struct perf_event *event)
 		intel_pmu_pebs_del(event);
 }
 
+static int icl_set_topdown_event_period(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	s64 left = local64_read(&hwc->period_left);
+
+	/*
+	 * The values in PERF_METRICS MSR are derived from fixed counter 3.
+	 * Software should start both registers, PERF_METRICS and fixed
+	 * counter 3, from zero.
+	 * Clear PERF_METRICS and Fixed counter 3 in initialization.
+	 * After that, both MSRs will be cleared for each read.
+	 * Don't need to clear them again.
+	 */
+	if (left == x86_pmu.max_period) {
+		wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+		wrmsrl(MSR_PERF_METRICS, 0);
+		local64_set(&hwc->period_left, 0);
+	}
+
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static inline u64 icl_get_metrics_event_value(u64 metric, u64 slots, int idx)
+{
+	u32 val;
+
+	/*
+	 * The metric is reported as an 8bit integer fraction
+	 * suming up to 0xff.
+	 * slots-in-metric = (Metric / 0xff) * slots
+	 */
+	val = (metric >> ((idx - INTEL_PMC_IDX_METRIC_BASE) * 8)) & 0xff;
+	return  mul_u64_u32_div(slots, val, 0xff);
+}
+
+static void __icl_update_topdown_event(struct perf_event *event,
+				       u64 slots, u64 metrics)
+{
+	int idx = event->hw.idx;
+	u64 delta;
+
+	if (is_metric_idx(idx))
+		delta = icl_get_metrics_event_value(metrics, slots, idx);
+	else
+		delta = slots;
+
+	local64_add(delta, &event->count);
+}
+
+/*
+ * Update all active Topdown events.
+ *
+ * The PERF_METRICS and Fixed counter 3 are read separately. The values may be
+ * modify by a NMI. PMU has to be disabled before calling this function.
+ */
+static u64 icl_update_topdown_event(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct perf_event *other;
+	u64 slots, metrics;
+	int idx;
+
+	/* read Fixed counter 3 */
+	rdpmcl((3 | INTEL_PMC_FIXED_RDPMC_BASE), slots);
+	if (!slots)
+		return 0;
+
+	/* read PERF_METRICS */
+	rdpmcl(INTEL_PMC_FIXED_RDPMC_METRICS, metrics);
+
+	for_each_set_bit(idx, cpuc->active_mask, INTEL_PMC_IDX_TD_BE_BOUND + 1) {
+		if (!is_topdown_idx(idx))
+			continue;
+		other = cpuc->events[idx];
+		__icl_update_topdown_event(other, slots, metrics);
+	}
+
+	/*
+	 * Check and update this event, which may have been cleared
+	 * in active_mask e.g. x86_pmu_stop()
+	 */
+	if (event && !test_bit(event->hw.idx, cpuc->active_mask))
+		__icl_update_topdown_event(event, slots, metrics);
+
+	/* The fixed counter 3 has to be written before the PERF_METRICS. */
+	wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
+	wrmsrl(MSR_PERF_METRICS, 0);
+
+	return slots;
+}
+
 static void intel_pmu_read_topdown_event(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -4480,6 +4583,15 @@ static struct attribute *icl_events_attrs[] = {
 	NULL,
 };
 
+static struct attribute *icl_td_events_attrs[] = {
+	EVENT_PTR(slots),
+	EVENT_PTR(td_retiring),
+	EVENT_PTR(td_bad_spec),
+	EVENT_PTR(td_fe_bound),
+	EVENT_PTR(td_be_bound),
+	NULL,
+};
+
 static struct attribute *icl_tsx_events_attrs[] = {
 	EVENT_PTR(tx_start),
 	EVENT_PTR(tx_abort),
@@ -5264,10 +5376,13 @@ __init int intel_pmu_init(void)
 			hsw_format_attr : nhm_format_attr;
 		extra_skl_attr = skl_format_attr;
 		mem_attr = icl_events_attrs;
+		td_attr = icl_td_events_attrs;
 		tsx_attr = icl_tsx_events_attrs;
 		x86_pmu.rtm_abort_event = X86_CONFIG(.event=0xca, .umask=0x02);
 		x86_pmu.lbr_pt_coexist = true;
 		intel_pmu_pebs_data_source_skl(pmem);
+		x86_pmu.update_topdown_event = icl_update_topdown_event;
+		x86_pmu.set_topdown_event_period = icl_set_topdown_event_period;
 		pr_cont("Icelake events, ");
 		name = "icelake";
 		break;
@@ -5387,6 +5502,9 @@ __init int intel_pmu_init(void)
 	if (x86_pmu.counter_freezing)
 		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
 
+	if (x86_pmu.intel_cap.perf_metrics)
+		x86_pmu.intel_ctrl |= 1ULL << GLOBAL_CTRL_EN_PERF_METRICS;
+
 	return 0;
 }
 
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index dea009c..3454424 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -407,6 +407,19 @@ struct cpu_hw_events {
 	EVENT_CONSTRAINT(c, (1ULL << (32+n)), FIXED_EVENT_FLAGS)
 
 /*
+ * The special metric counters do not actually exist. They are calculated from
+ * the combination of the FxCtr3 + MSR_PERF_METRICS.
+ *
+ * The special metric counters are mapped to a dummy offset for the scheduler.
+ * The sharing between multiple users of the same metric without multiplexing
+ * is not allowed, even though the hardware supports that in principle.
+ */
+
+#define METRIC_EVENT_CONSTRAINT(c, n)					\
+	EVENT_CONSTRAINT(c, (1ULL << (INTEL_PMC_IDX_METRIC_BASE + n)),	\
+			 INTEL_ARCH_EVENT_MASK)
+
+/*
  * Constraint on the Event code + UMask
  */
 #define INTEL_UEVENT_CONSTRAINT(c, n)	\
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index fabb61c..dc131b8 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -863,6 +863,8 @@
 #define MSR_CORE_PERF_GLOBAL_CTRL	0x0000038f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL	0x00000390
 
+#define MSR_PERF_METRICS		0x00000329
+
 /* PERF_GLOBAL_OVF_CTL bits */
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT	55
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI		(1ULL << MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 964ba31..e20aa58 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -198,6 +198,7 @@ struct x86_pmu_capability {
 
 /* RDPMC offset for Fixed PMCs */
 #define INTEL_PMC_FIXED_RDPMC_BASE		(1 << 30)
+#define INTEL_PMC_FIXED_RDPMC_METRICS		(1 << 29)
 
 /*
  * All the fixed-mode PMCs are configured via this single MSR:
@@ -305,6 +306,7 @@ static inline bool is_topdown_idx(int idx)
 #define GLOBAL_STATUS_TRACE_TOPAPMI		BIT_ULL(GLOBAL_STATUS_TRACE_TOPAPMI_BIT)
 #define GLOBAL_STATUS_PERF_METRICS_OVF_BIT	48
 
+#define GLOBAL_CTRL_EN_PERF_METRICS		48
 /*
  * We model guest LBR event tracing as another fixed-mode PMC like BTS.
  *


* [tip: perf/core] perf/x86: Add a macro for RDPMC offset of fixed counters
  2020-07-23 17:11 ` [PATCH V7 09/14] perf/x86: Add a macro for RDPMC offset of fixed counters kan.liang
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     0e2e45e2ded4988f5641115fd996c75dc32e4be3
Gitweb:        https://git.kernel.org/tip/0e2e45e2ded4988f5641115fd996c75dc32e4be3
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:12 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:36 +02:00

perf/x86: Add a macro for RDPMC offset of fixed counters

The RDPMC base offset of fixed counters is hard-coded. Use a meaningful
name to replace the magic number to improve the readability of the code.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-10-kan.liang@linux.intel.com
---
 arch/x86/events/core.c            | 3 ++-
 arch/x86/include/asm/perf_event.h | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 53fcf0a..ebf723f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1151,7 +1151,8 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 		hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
 		hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 +
 				(idx - INTEL_PMC_IDX_FIXED);
-		hwc->event_base_rdpmc = (idx - INTEL_PMC_IDX_FIXED) | 1<<30;
+		hwc->event_base_rdpmc = (idx - INTEL_PMC_IDX_FIXED) |
+					INTEL_PMC_FIXED_RDPMC_BASE;
 		break;
 
 	default:
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 000cab7..964ba31 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -196,6 +196,9 @@ struct x86_pmu_capability {
  * Fixed-purpose performance events:
  */
 
+/* RDPMC offset for Fixed PMCs */
+#define INTEL_PMC_FIXED_RDPMC_BASE		(1 << 30)
+
 /*
  * All the fixed-mode PMCs are configured via this single MSR:
  */


* [tip: perf/core] perf/x86/intel: Generic support for hardware TopDown metrics
  2020-07-23 17:11 ` [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics kan.liang
  2020-07-24 13:19   ` peterz
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  1 sibling, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra, Kan Liang, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     7b2c05a15d29d0570a0d21da1e4fd5cbc85cbf13
Gitweb:        https://git.kernel.org/tip/7b2c05a15d29d0570a0d21da1e4fd5cbc85cbf13
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:11 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:36 +02:00

perf/x86/intel: Generic support for hardware TopDown metrics

Intro
=====

The TopDown Microarchitecture Analysis (TMA) Method is a structured
analysis methodology to identify critical performance bottlenecks in
out-of-order processors. Current perf already supports this method.

The method works well, but there is one problem: to collect the TopDown
events, several GP counters have to be used. If a user wants to collect
other events at the same time, multiplexing will probably be triggered,
which impacts the accuracy.

To free up the scarce GP counters, the hardware TopDown metrics feature
is introduced starting with Ice Lake. The hardware implements an
additional "metrics" register and a new Fixed Counter 3 that measures
pipeline "slots". The TopDown events can be calculated from them instead.

Events
======

The level 1 TopDown has four metrics. There is no event-code assigned
to the TopDown metrics, so the four metric events are exported as
separate perf events, which map to the internal "metrics" counter
register. Those events do not exist in hardware, but can be allocated
by the scheduler.
For the event mapping, a special 0x00 event code is used, which is
reserved for fake events. The metric events start from umask 0x80.

When a metric event is set up, it points to Fixed Counter 3 and has to
be handled specially:
- Add the update_topdown_event() callback to read the additional metrics
  MSR and generate the metrics.
- Add the set_topdown_event_period() callback to initialize the metrics
  MSR and fixed counter 3.
- Add a variable n_metric to track the number of accepted metrics
  events. Sharing the same metric between multiple users without
  multiplexing is not allowed.
- Only enable/disable fixed counter 3 when there are no other active
  TopDown events, which avoids unnecessary writes to the fixed control
  register.
- Disable the PMU when reading a metrics event. The metrics MSR and
  fixed counter 3 are read separately, and the values may be modified
  by an NMI.

None of the four metric events supports sampling. Since they are
handled specially for event update, a flag, PERF_X86_EVENT_TOPDOWN, is
introduced to indicate this case.

The slots event supports both sampling and counting.
For counting, the flag is also applied.
For sampling, it is handled like any other normal event.

Groups
======

The slots event is required in a Topdown group.
To avoid reading the METRICS register multiple times, the metrics and
slots values are only updated by the slots event in a group; all active
slots and metrics events are updated at once.
Therefore, the slots event must come before any metric events in a
Topdown group, as in the example below.
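
A minimal usage sketch from the perf tool (the event names are the ones
exported by this series; the exact command line is illustrative, not
part of the patch):

  perf stat -e '{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound}' -a sleep 1

Here slots is listed first so that it becomes the group leader, and the
four metric events are its siblings.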

NMI
======

The METRICS-related register may overflow, in which case bit 48 of the
STATUS register is set. If so, PERF_METRICS and fixed counter 3 have to
be reset. The patch also updates all active slots and metrics events in
the NMI handler.

The update_topdown_event() callback has to read two registers
separately, and the values may be modified by an NMI, so the PMU has to
be disabled before calling the function.

RDPMC
======

RDPMC is temporarily disabled. A later patch will enable it.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-9-kan.liang@linux.intel.com
---
 arch/x86/events/core.c            |  63 ++++++++++++---
 arch/x86/events/intel/core.c      | 124 +++++++++++++++++++++++++++--
 arch/x86/events/perf_event.h      |  37 +++++++++-
 arch/x86/include/asm/msr-index.h  |   1 +-
 arch/x86/include/asm/perf_event.h |  47 +++++++++++-
 5 files changed, 257 insertions(+), 15 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8e108ea..53fcf0a 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -76,6 +76,9 @@ u64 x86_perf_event_update(struct perf_event *event)
 	if (unlikely(!hwc->event_base))
 		return 0;
 
+	if (unlikely(is_topdown_count(event)) && x86_pmu.update_topdown_event)
+		return x86_pmu.update_topdown_event(event);
+
 	/*
 	 * Careful: an NMI might modify the previous event value.
 	 *
@@ -1031,6 +1034,42 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	return unsched ? -EINVAL : 0;
 }
 
+static int add_nr_metric_event(struct cpu_hw_events *cpuc,
+			       struct perf_event *event)
+{
+	if (is_metric_event(event)) {
+		if (cpuc->n_metric == INTEL_TD_METRIC_NUM)
+			return -EINVAL;
+		cpuc->n_metric++;
+	}
+
+	return 0;
+}
+
+static void del_nr_metric_event(struct cpu_hw_events *cpuc,
+				struct perf_event *event)
+{
+	if (is_metric_event(event))
+		cpuc->n_metric--;
+}
+
+static int collect_event(struct cpu_hw_events *cpuc, struct perf_event *event,
+			 int max_count, int n)
+{
+
+	if (x86_pmu.intel_cap.perf_metrics && add_nr_metric_event(cpuc, event))
+		return -EINVAL;
+
+	if (n >= max_count + cpuc->n_metric)
+		return -EINVAL;
+
+	cpuc->event_list[n] = event;
+	if (is_counter_pair(&event->hw))
+		cpuc->n_pair++;
+
+	return 0;
+}
+
 /*
  * dogrp: true if must collect siblings events (group)
  * returns total number of events and error code
@@ -1067,28 +1106,22 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,
 	}
 
 	if (is_x86_event(leader)) {
-		if (n >= max_count)
+		if (collect_event(cpuc, leader, max_count, n))
 			return -EINVAL;
-		cpuc->event_list[n] = leader;
 		n++;
-		if (is_counter_pair(&leader->hw))
-			cpuc->n_pair++;
 	}
+
 	if (!dogrp)
 		return n;
 
 	for_each_sibling_event(event, leader) {
-		if (!is_x86_event(event) ||
-		    event->state <= PERF_EVENT_STATE_OFF)
+		if (!is_x86_event(event) || event->state <= PERF_EVENT_STATE_OFF)
 			continue;
 
-		if (n >= max_count)
+		if (collect_event(cpuc, event, max_count, n))
 			return -EINVAL;
 
-		cpuc->event_list[n] = event;
 		n++;
-		if (is_counter_pair(&event->hw))
-			cpuc->n_pair++;
 	}
 	return n;
 }
@@ -1110,6 +1143,10 @@ static inline void x86_assign_hw_event(struct perf_event *event,
 		hwc->event_base	= 0;
 		break;
 
+	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
+		/* All the metric events are mapped onto the fixed counter 3. */
+		idx = INTEL_PMC_IDX_FIXED_SLOTS;
+		/* fall through */
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS-1:
 		hwc->config_base = MSR_ARCH_PERFMON_FIXED_CTR_CTRL;
 		hwc->event_base = MSR_ARCH_PERFMON_FIXED_CTR0 +
@@ -1245,6 +1282,10 @@ int x86_perf_event_set_period(struct perf_event *event)
 	if (unlikely(!hwc->event_base))
 		return 0;
 
+	if (unlikely(is_topdown_count(event)) &&
+	    x86_pmu.set_topdown_event_period)
+		return x86_pmu.set_topdown_event_period(event);
+
 	/*
 	 * If we are way outside a reasonable range then just skip forward:
 	 */
@@ -1529,6 +1570,8 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	}
 	cpuc->event_constraint[i-1] = NULL;
 	--cpuc->n_events;
+	if (x86_pmu.intel_cap.perf_metrics)
+		del_nr_metric_event(cpuc, event);
 
 	perf_event_update_userpage(event);
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 76eab81..4a43668 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2165,11 +2165,24 @@ static inline void intel_clear_masks(struct perf_event *event, int idx)
 static void intel_pmu_disable_fixed(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
 	u64 ctrl_val, mask;
+	int idx = hwc->idx;
 
-	mask = 0xfULL << (idx * 4);
+	if (is_topdown_idx(idx)) {
+		struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+		/*
+		 * When there are other active TopDown events,
+		 * don't disable the fixed counter 3.
+		 */
+		if (*(u64 *)cpuc->active_mask & INTEL_PMC_OTHER_TOPDOWN_BITS(idx))
+			return;
+		idx = INTEL_PMC_IDX_FIXED_SLOTS;
+	}
 
+	intel_clear_masks(event, idx);
+
+	mask = 0xfULL << ((idx - INTEL_PMC_IDX_FIXED) * 4);
 	rdmsrl(hwc->config_base, ctrl_val);
 	ctrl_val &= ~mask;
 	wrmsrl(hwc->config_base, ctrl_val);
@@ -2186,7 +2199,7 @@ static void intel_pmu_disable_event(struct perf_event *event)
 		x86_pmu_disable_event(event);
 		break;
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
-		intel_clear_masks(event, idx);
+	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
 		intel_pmu_disable_fixed(event);
 		break;
 	case INTEL_PMC_IDX_FIXED_BTS:
@@ -2219,10 +2232,26 @@ static void intel_pmu_del_event(struct perf_event *event)
 		intel_pmu_pebs_del(event);
 }
 
+static void intel_pmu_read_topdown_event(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	/* Only need to call update_topdown_event() once for group read. */
+	if ((cpuc->txn_flags & PERF_PMU_TXN_READ) &&
+	    !is_slots_event(event))
+		return;
+
+	perf_pmu_disable(event->pmu);
+	x86_pmu.update_topdown_event(event);
+	perf_pmu_enable(event->pmu);
+}
+
 static void intel_pmu_read_event(struct perf_event *event)
 {
 	if (event->hw.flags & PERF_X86_EVENT_AUTO_RELOAD)
 		intel_pmu_auto_reload_read(event);
+	else if (is_topdown_count(event) && x86_pmu.update_topdown_event)
+		intel_pmu_read_topdown_event(event);
 	else
 		x86_perf_event_update(event);
 }
@@ -2230,8 +2259,22 @@ static void intel_pmu_read_event(struct perf_event *event)
 static void intel_pmu_enable_fixed(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
-	int idx = hwc->idx - INTEL_PMC_IDX_FIXED;
 	u64 ctrl_val, mask, bits = 0;
+	int idx = hwc->idx;
+
+	if (is_topdown_idx(idx)) {
+		struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+		/*
+		 * When there are other active TopDown events,
+		 * don't enable the fixed counter 3 again.
+		 */
+		if (*(u64 *)cpuc->active_mask & INTEL_PMC_OTHER_TOPDOWN_BITS(idx))
+			return;
+
+		idx = INTEL_PMC_IDX_FIXED_SLOTS;
+	}
+
+	intel_set_masks(event, idx);
 
 	/*
 	 * Enable IRQ generation (0x8), if not PEBS,
@@ -2251,6 +2294,7 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
 	if (x86_pmu.version > 2 && hwc->config & ARCH_PERFMON_EVENTSEL_ANY)
 		bits |= 0x4;
 
+	idx -= INTEL_PMC_IDX_FIXED;
 	bits <<= (idx * 4);
 	mask = 0xfULL << (idx * 4);
 
@@ -2279,7 +2323,7 @@ static void intel_pmu_enable_event(struct perf_event *event)
 		__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
 		break;
 	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
-		intel_set_masks(event, idx);
+	case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
 		intel_pmu_enable_fixed(event);
 		break;
 	case INTEL_PMC_IDX_FIXED_BTS:
@@ -2440,6 +2484,15 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	}
 
 	/*
+	 * Intel Perf mertrics
+	 */
+	if (__test_and_clear_bit(GLOBAL_STATUS_PERF_METRICS_OVF_BIT, (unsigned long *)&status)) {
+		handled++;
+		if (x86_pmu.update_topdown_event)
+			x86_pmu.update_topdown_event(NULL);
+	}
+
+	/*
 	 * Checkpointed counters can lead to 'spurious' PMIs because the
 	 * rollback caused by the PMI will have cleared the overflow status
 	 * bit. Therefore always force probe these counters.
@@ -3375,6 +3428,58 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
 
+	/*
+	 * Config Topdown slots and metric events
+	 *
+	 * The slots event on Fixed Counter 3 can support sampling,
+	 * which will be handled normally in x86_perf_event_update().
+	 *
+	 * Metric events don't support sampling and require being paired
+	 * with a slots event as group leader. When the slots event
+	 * is used in a metrics group, it too cannot support sampling.
+	 */
+	if (x86_pmu.intel_cap.perf_metrics && is_topdown_event(event)) {
+		if (event->attr.config1 || event->attr.config2)
+			return -EINVAL;
+
+		/*
+		 * The TopDown metrics events and slots event don't
+		 * support any filters.
+		 */
+		if (event->attr.config & X86_ALL_EVENT_FLAGS)
+			return -EINVAL;
+
+		if (is_metric_event(event)) {
+			struct perf_event *leader = event->group_leader;
+
+			/* The metric events don't support sampling. */
+			if (is_sampling_event(event))
+				return -EINVAL;
+
+			/* The metric events require a slots group leader. */
+			if (!is_slots_event(leader))
+				return -EINVAL;
+
+			/*
+			 * The leader/SLOTS must not be a sampling event for
+			 * metric use; hardware requires it starts at 0 when used
+			 * in conjunction with MSR_PERF_METRICS.
+			 */
+			if (is_sampling_event(leader))
+				return -EINVAL;
+
+			event->event_caps |= PERF_EV_CAP_SIBLING;
+			/*
+			 * Only once we have a METRICs sibling do we
+			 * need TopDown magic.
+			 */
+			leader->hw.flags |= PERF_X86_EVENT_TOPDOWN;
+			event->hw.flags  |= PERF_X86_EVENT_TOPDOWN;
+
+			event->hw.flags &= ~PERF_X86_EVENT_RDPMC_ALLOWED;
+		}
+	}
+
 	if (!(event->attr.config & ARCH_PERFMON_EVENTSEL_ANY))
 		return 0;
 
@@ -5218,6 +5323,15 @@ __init int intel_pmu_init(void)
 		 * counter, so do not extend mask to generic counters
 		 */
 		for_each_event_constraint(c, x86_pmu.event_constraints) {
+			/*
+			 * Don't extend the topdown slots and metrics
+			 * events to the generic counters.
+			 */
+			if (c->idxmsk64 & INTEL_PMC_MSK_TOPDOWN) {
+				c->weight = hweight64(c->idxmsk64);
+				continue;
+			}
+
 			if (c->cmask == FIXED_EVENT_FLAGS
 			    && c->idxmsk64 != INTEL_PMC_MSK_FIXED_REF_CYCLES) {
 				c->idxmsk64 |= (1ULL << x86_pmu.num_counters) - 1;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 5d453da..dea009c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -79,6 +79,31 @@ static inline bool constraint_match(struct event_constraint *c, u64 ecode)
 #define PERF_X86_EVENT_PEBS_VIA_PT	0x0800 /* use PT buffer for PEBS */
 #define PERF_X86_EVENT_PAIR		0x1000 /* Large Increment per Cycle */
 #define PERF_X86_EVENT_LBR_SELECT	0x2000 /* Save/Restore MSR_LBR_SELECT */
+#define PERF_X86_EVENT_TOPDOWN		0x4000 /* Count Topdown slots/metrics events */
+
+static inline bool is_topdown_count(struct perf_event *event)
+{
+	return event->hw.flags & PERF_X86_EVENT_TOPDOWN;
+}
+
+static inline bool is_metric_event(struct perf_event *event)
+{
+	u64 config = event->attr.config;
+
+	return ((config & ARCH_PERFMON_EVENTSEL_EVENT) == 0) &&
+		((config & INTEL_ARCH_EVENT_MASK) >= INTEL_TD_METRIC_RETIRING)  &&
+		((config & INTEL_ARCH_EVENT_MASK) <= INTEL_TD_METRIC_MAX);
+}
+
+static inline bool is_slots_event(struct perf_event *event)
+{
+	return (event->attr.config & INTEL_ARCH_EVENT_MASK) == INTEL_TD_SLOTS;
+}
+
+static inline bool is_topdown_event(struct perf_event *event)
+{
+	return is_metric_event(event) || is_slots_event(event);
+}
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -285,6 +310,12 @@ struct cpu_hw_events {
 	u64				tfa_shadow;
 
 	/*
+	 * Perf Metrics
+	 */
+	/* number of accepted metrics events */
+	int				n_metric;
+
+	/*
 	 * AMD specific bits
 	 */
 	struct amd_nb			*amd_nb;
@@ -727,6 +758,12 @@ struct x86_pmu {
 	atomic_t	lbr_exclusive[x86_lbr_exclusive_max];
 
 	/*
+	 * Intel perf metrics
+	 */
+	u64		(*update_topdown_event)(struct perf_event *event);
+	int		(*set_topdown_event_period)(struct perf_event *event);
+
+	/*
 	 * perf task context (i.e. struct perf_event_context::task_ctx_data)
 	 * switch helper to bridge calls from perf/core to perf/x86.
 	 * See struct pmu::swap_task_ctx() usage for examples;
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 2859ee4..fabb61c 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -857,6 +857,7 @@
 #define MSR_CORE_PERF_FIXED_CTR0	0x00000309
 #define MSR_CORE_PERF_FIXED_CTR1	0x0000030a
 #define MSR_CORE_PERF_FIXED_CTR2	0x0000030b
+#define MSR_CORE_PERF_FIXED_CTR3	0x0000030c
 #define MSR_CORE_PERF_FIXED_CTR_CTRL	0x0000038d
 #define MSR_CORE_PERF_GLOBAL_STATUS	0x0000038e
 #define MSR_CORE_PERF_GLOBAL_CTRL	0x0000038f
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 58419e5..000cab7 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -244,6 +244,52 @@ struct x86_pmu_capability {
  */
 #define INTEL_PMC_IDX_FIXED_BTS			(INTEL_PMC_IDX_FIXED + 15)
 
+/*
+ * The PERF_METRICS MSR is modeled as several magic fixed-mode PMCs, one for
+ * each TopDown metric event.
+ *
+ * Internally the TopDown metric events are mapped to the FxCtr 3 (SLOTS).
+ */
+#define INTEL_PMC_IDX_METRIC_BASE		(INTEL_PMC_IDX_FIXED + 16)
+#define INTEL_PMC_IDX_TD_RETIRING		(INTEL_PMC_IDX_METRIC_BASE + 0)
+#define INTEL_PMC_IDX_TD_BAD_SPEC		(INTEL_PMC_IDX_METRIC_BASE + 1)
+#define INTEL_PMC_IDX_TD_FE_BOUND		(INTEL_PMC_IDX_METRIC_BASE + 2)
+#define INTEL_PMC_IDX_TD_BE_BOUND		(INTEL_PMC_IDX_METRIC_BASE + 3)
+#define INTEL_PMC_IDX_METRIC_END		INTEL_PMC_IDX_TD_BE_BOUND
+#define INTEL_PMC_MSK_TOPDOWN			((0xfull << INTEL_PMC_IDX_METRIC_BASE) | \
+						INTEL_PMC_MSK_FIXED_SLOTS)
+
+/*
+ * There is no event-code assigned to the TopDown events.
+ *
+ * For the slots event, use the pseudo code of the fixed counter 3.
+ *
+ * For the metric events, the pseudo event-code is 0x00.
+ * The pseudo umask-code starts from the middle of the pseudo event
+ * space, 0x80.
+ */
+#define INTEL_TD_SLOTS				0x0400	/* TOPDOWN.SLOTS */
+/* Level 1 metrics */
+#define INTEL_TD_METRIC_RETIRING		0x8000	/* Retiring metric */
+#define INTEL_TD_METRIC_BAD_SPEC		0x8100	/* Bad speculation metric */
+#define INTEL_TD_METRIC_FE_BOUND		0x8200	/* FE bound metric */
+#define INTEL_TD_METRIC_BE_BOUND		0x8300	/* BE bound metric */
+#define INTEL_TD_METRIC_MAX			INTEL_TD_METRIC_BE_BOUND
+#define INTEL_TD_METRIC_NUM			4
+
+static inline bool is_metric_idx(int idx)
+{
+	return (unsigned)(idx - INTEL_PMC_IDX_METRIC_BASE) < INTEL_TD_METRIC_NUM;
+}
+
+static inline bool is_topdown_idx(int idx)
+{
+	return is_metric_idx(idx) || idx == INTEL_PMC_IDX_FIXED_SLOTS;
+}
+
+#define INTEL_PMC_OTHER_TOPDOWN_BITS(bit)	\
+			(~(0x1ull << bit) & INTEL_PMC_MSK_TOPDOWN)
+
 #define GLOBAL_STATUS_COND_CHG			BIT_ULL(63)
 #define GLOBAL_STATUS_BUFFER_OVF_BIT		62
 #define GLOBAL_STATUS_BUFFER_OVF		BIT_ULL(GLOBAL_STATUS_BUFFER_OVF_BIT)
@@ -254,6 +300,7 @@ struct x86_pmu_capability {
 #define GLOBAL_STATUS_LBRS_FROZEN		BIT_ULL(GLOBAL_STATUS_LBRS_FROZEN_BIT)
 #define GLOBAL_STATUS_TRACE_TOPAPMI_BIT		55
 #define GLOBAL_STATUS_TRACE_TOPAPMI		BIT_ULL(GLOBAL_STATUS_TRACE_TOPAPMI_BIT)
+#define GLOBAL_STATUS_PERF_METRICS_OVF_BIT	48
 
 /*
  * We model guest LBR event tracing as another fixed-mode PMC like BTS.


* [tip: perf/core] perf/core: Add a new PERF_EV_CAP_SIBLING event capability
  2020-07-23 17:11 ` [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability kan.liang
  2020-07-24 10:55   ` peterz
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  1 sibling, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra, Kan Liang, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     9f0c4fa111dc909ca545c45ea20ec84da555ce16
Gitweb:        https://git.kernel.org/tip/9f0c4fa111dc909ca545c45ea20ec84da555ce16
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:10 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:36 +02:00

perf/core: Add a new PERF_EV_CAP_SIBLING event capability

Current perf assumes that events in a group are independent: closing an
event doesn't impact the value of the other events in the same group.
If the closed event is a member, the other events keep running as a
group after the closure; if the closed event is the leader, the other
events continue running as singleton events.

Add PERF_EV_CAP_SIBLING to allow events to indicate that they require
being part of a group, and that they cannot exist independently once
the leader dies.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-8-kan.liang@linux.intel.com
---
 include/linux/perf_event.h |  4 ++++-
 kernel/events/core.c       | 38 ++++++++++++++++++++++++++++++++-----
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 04a49cc..6048650 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -576,9 +576,13 @@ typedef void (*perf_overflow_handler_t)(struct perf_event *,
  * PERF_EV_CAP_SOFTWARE: Is a software event.
  * PERF_EV_CAP_READ_ACTIVE_PKG: A CPU event (or cgroup event) that can be read
  * from any CPU in the package where it is active.
+ * PERF_EV_CAP_SIBLING: An event with this flag must be a group sibling and
+ * cannot be a group leader. If an event with this flag is detached from the
+ * group it is scheduled out and moved into an unrecoverable ERROR state.
  */
 #define PERF_EV_CAP_SOFTWARE		BIT(0)
 #define PERF_EV_CAP_READ_ACTIVE_PKG	BIT(1)
+#define PERF_EV_CAP_SIBLING		BIT(2)
 
 #define SWEVENT_HLIST_BITS		8
 #define SWEVENT_HLIST_SIZE		(1 << SWEVENT_HLIST_BITS)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5bfe8e3..57efe3b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2133,8 +2133,24 @@ static inline struct list_head *get_event_list(struct perf_event *event)
 	return event->attr.pinned ? &ctx->pinned_active : &ctx->flexible_active;
 }
 
+/*
+ * Events that have PERF_EV_CAP_SIBLING require being part of a group and
+ * cannot exist on their own, schedule them out and move them into the ERROR
+ * state. Also see _perf_event_enable(), it will not be able to recover
+ * this ERROR state.
+ */
+static inline void perf_remove_sibling_event(struct perf_event *event)
+{
+	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+
+	event_sched_out(event, cpuctx, ctx);
+	perf_event_set_state(event, PERF_EVENT_STATE_ERROR);
+}
+
 static void perf_group_detach(struct perf_event *event)
 {
+	struct perf_event *leader = event->group_leader;
 	struct perf_event *sibling, *tmp;
 	struct perf_event_context *ctx = event->ctx;
 
@@ -2153,7 +2169,7 @@ static void perf_group_detach(struct perf_event *event)
 	/*
 	 * If this is a sibling, remove it from its group.
 	 */
-	if (event->group_leader != event) {
+	if (leader != event) {
 		list_del_init(&event->sibling_list);
 		event->group_leader->nr_siblings--;
 		goto out;
@@ -2166,6 +2182,9 @@ static void perf_group_detach(struct perf_event *event)
 	 */
 	list_for_each_entry_safe(sibling, tmp, &event->sibling_list, sibling_list) {
 
+		if (sibling->event_caps & PERF_EV_CAP_SIBLING)
+			perf_remove_sibling_event(sibling);
+
 		sibling->group_leader = sibling;
 		list_del_init(&sibling->sibling_list);
 
@@ -2183,10 +2202,10 @@ static void perf_group_detach(struct perf_event *event)
 	}
 
 out:
-	perf_event__header_size(event->group_leader);
-
-	for_each_sibling_event(tmp, event->group_leader)
+	for_each_sibling_event(tmp, leader)
 		perf_event__header_size(tmp);
+
+	perf_event__header_size(leader);
 }
 
 static bool is_orphaned_event(struct perf_event *event)
@@ -2979,6 +2998,7 @@ static void _perf_event_enable(struct perf_event *event)
 	raw_spin_lock_irq(&ctx->lock);
 	if (event->state >= PERF_EVENT_STATE_INACTIVE ||
 	    event->state <  PERF_EVENT_STATE_ERROR) {
+out:
 		raw_spin_unlock_irq(&ctx->lock);
 		return;
 	}
@@ -2990,8 +3010,16 @@ static void _perf_event_enable(struct perf_event *event)
 	 * has gone back into error state, as distinct from the task having
 	 * been scheduled away before the cross-call arrived.
 	 */
-	if (event->state == PERF_EVENT_STATE_ERROR)
+	if (event->state == PERF_EVENT_STATE_ERROR) {
+		/*
+		 * Detached SIBLING events cannot leave ERROR state.
+		 */
+		if (event->event_caps & PERF_EV_CAP_SIBLING &&
+		    event->group_leader == event)
+			goto out;
+
 		event->state = PERF_EVENT_STATE_OFF;
+	}
 	raw_spin_unlock_irq(&ctx->lock);
 
 	event_function_call(event, __perf_event_enable, NULL);


* [tip: perf/core] perf/x86/intel: Use switch in intel_pmu_disable/enable_event
  2020-07-23 17:11 ` [PATCH V7 06/14] perf/x86/intel: Use switch in intel_pmu_disable/enable_event kan.liang
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     58da7dbe6f036fefe504a4bb452afbd39bba73f7
Gitweb:        https://git.kernel.org/tip/58da7dbe6f036fefe504a4bb452afbd39bba73f7
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:09 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:36 +02:00

perf/x86/intel: Use switch in intel_pmu_disable/enable_event

Currently, an if-else chain is used in intel_pmu_disable/enable_event
to check the type of an event. It works well, but with more and more
types added later, e.g., perf metrics, the if-else chain may impair the
readability of the code compared to a switch statement.

There is no harm in using a switch statement to replace the if-else
here. Also, some optimizing compilers may compile a switch statement
into a jump-table, which is more efficient than if-else for a large
number of cases. The performance gain may not be observable for now,
because the number of cases is only 5, but the benefits should show up
as more types are added in the future.

Use a switch to replace the if-else in intel_pmu_disable/enable_event.

If the idx is invalid, print a warning.

For the INTEL_PMC_IDX_FIXED_BTS case in intel_pmu_disable_event, there
is no need to check event->attr.precise_ip; use return for that case.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-7-kan.liang@linux.intel.com
---
 arch/x86/events/intel/core.c | 36 +++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ac1408f..76eab81 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2180,17 +2180,28 @@ static void intel_pmu_disable_event(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	if (idx < INTEL_PMC_IDX_FIXED) {
+	switch (idx) {
+	case 0 ... INTEL_PMC_IDX_FIXED - 1:
 		intel_clear_masks(event, idx);
 		x86_pmu_disable_event(event);
-	} else if (idx < INTEL_PMC_IDX_FIXED_BTS) {
+		break;
+	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
 		intel_clear_masks(event, idx);
 		intel_pmu_disable_fixed(event);
-	} else if (idx == INTEL_PMC_IDX_FIXED_BTS) {
+		break;
+	case INTEL_PMC_IDX_FIXED_BTS:
 		intel_pmu_disable_bts();
 		intel_pmu_drain_bts_buffer();
-	} else if (idx == INTEL_PMC_IDX_FIXED_VLBR)
+		return;
+	case INTEL_PMC_IDX_FIXED_VLBR:
 		intel_clear_masks(event, idx);
+		break;
+	default:
+		intel_clear_masks(event, idx);
+		pr_warn("Failed to disable the event with invalid index %d\n",
+			idx);
+		return;
+	}
 
 	/*
 	 * Needs to be called after x86_pmu_disable_event,
@@ -2262,18 +2273,27 @@ static void intel_pmu_enable_event(struct perf_event *event)
 	if (unlikely(event->attr.precise_ip))
 		intel_pmu_pebs_enable(event);
 
-	if (idx < INTEL_PMC_IDX_FIXED) {
+	switch (idx) {
+	case 0 ... INTEL_PMC_IDX_FIXED - 1:
 		intel_set_masks(event, idx);
 		__x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
-	} else if (idx < INTEL_PMC_IDX_FIXED_BTS) {
+		break;
+	case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
 		intel_set_masks(event, idx);
 		intel_pmu_enable_fixed(event);
-	} else if (idx == INTEL_PMC_IDX_FIXED_BTS) {
+		break;
+	case INTEL_PMC_IDX_FIXED_BTS:
 		if (!__this_cpu_read(cpu_hw_events.enabled))
 			return;
 		intel_pmu_enable_bts(hwc->config);
-	} else if (idx == INTEL_PMC_IDX_FIXED_VLBR)
+		break;
+	case INTEL_PMC_IDX_FIXED_VLBR:
 		intel_set_masks(event, idx);
+		break;
+	default:
+		pr_warn("Failed to enable the event with invalid index %d\n",
+			idx);
+	}
 }
 
 static void intel_pmu_add_event(struct perf_event *event)


* [tip: perf/core] perf/x86/intel: Fix the name of perf METRICS
  2020-07-23 17:11 ` [PATCH V7 05/14] perf/x86/intel: Fix the name of perf METRICS kan.liang
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     bbdbde2a415d9f479803266cae6fb0c1a9f6c80e
Gitweb:        https://git.kernel.org/tip/bbdbde2a415d9f479803266cae6fb0c1a9f6c80e
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:08 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:35 +02:00

perf/x86/intel: Fix the name of perf METRICS

Bit 15 of the PERF_CAPABILITIES MSR indicates that the perf METRICS
feature is supported. The perf METRICS is not a PEBS feature.

Rename pebs_metrics_available to perf_metrics.

The bit is not used in the current code. It will be used in a later
patch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-6-kan.liang@linux.intel.com
---
 arch/x86/events/perf_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 7b68ab5..5d453da 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -537,7 +537,7 @@ union perf_capabilities {
 		 */
 		u64	full_width_write:1;
 		u64     pebs_baseline:1;
-		u64	pebs_metrics_available:1;
+		u64	perf_metrics:1;
 		u64	pebs_output_pt_available:1;
 	};
 	u64	capabilities;


* [tip: perf/core] perf/x86/intel: Move BTS index to 47
  2020-07-23 17:11 ` [PATCH V7 04/14] perf/x86/intel: Move BTS index to 47 kan.liang
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     d39fcc32893dac2d02900d99c38276a00cc54d60
Gitweb:        https://git.kernel.org/tip/d39fcc32893dac2d02900d99c38276a00cc54d60
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:07 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:35 +02:00

perf/x86/intel: Move BTS index to 47

Bit 48 in PERF_GLOBAL_STATUS is used to indicate the overflow status of
the PERF_METRICS counters.

Move the BTS index to bit 47.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-5-kan.liang@linux.intel.com
---
 arch/x86/include/asm/perf_event.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index fe8110a..58419e5 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -238,11 +238,11 @@ struct x86_pmu_capability {
 /*
  * We model BTS tracing as another fixed-mode PMC.
  *
- * We choose a value in the middle of the fixed event range, since lower
+ * We choose the value 47 for the fixed index of BTS, since lower
  * values are used by actual fixed events and higher values are used
  * to indicate other overflow conditions in the PERF_GLOBAL_STATUS msr.
  */
-#define INTEL_PMC_IDX_FIXED_BTS			(INTEL_PMC_IDX_FIXED + 16)
+#define INTEL_PMC_IDX_FIXED_BTS			(INTEL_PMC_IDX_FIXED + 15)
 
 #define GLOBAL_STATUS_COND_CHG			BIT_ULL(63)
 #define GLOBAL_STATUS_BUFFER_OVF_BIT		62


* [tip: perf/core] perf/x86/intel: Introduce the fourth fixed counter
  2020-07-23 17:11 ` [PATCH V7 03/14] perf/x86/intel: Introduce the fourth fixed counter kan.liang
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     6f7225099d5f3ec3019f380a0da2b456b7796cb0
Gitweb:        https://git.kernel.org/tip/6f7225099d5f3ec3019f380a0da2b456b7796cb0
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:06 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:35 +02:00

perf/x86/intel: Introduce the fourth fixed counter

The fourth fixed counter, TOPDOWN.SLOTS, is introduced in Ice Lake to
measure the level 1 TopDown events.

Add MSR address and macros for the new fixed counter, which will be used
in a later patch.

Add comments to explain the event encoding rules for the fixed counters.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-4-kan.liang@linux.intel.com
---
 arch/x86/include/asm/perf_event.h | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index fd3eba6..fe8110a 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -197,12 +197,24 @@ struct x86_pmu_capability {
  */
 
 /*
- * All 3 fixed-mode PMCs are configured via this single MSR:
+ * All the fixed-mode PMCs are configured via this single MSR:
  */
 #define MSR_ARCH_PERFMON_FIXED_CTR_CTRL	0x38d
 
 /*
- * The counts are available in three separate MSRs:
+ * There is no event-code assigned to the fixed-mode PMCs.
+ *
+ * For a fixed-mode PMC, which has an equivalent event on a general-purpose
+ * PMC, the event-code of the equivalent event is used for the fixed-mode PMC,
+ * e.g., Instr_Retired.Any and CPU_CLK_Unhalted.Core.
+ *
+ * For a fixed-mode PMC, which doesn't have an equivalent event, a
+ * pseudo-encoding is used, e.g., CPU_CLK_Unhalted.Ref and TOPDOWN.SLOTS.
+ * The pseudo event-code for a fixed-mode PMC must be 0x00.
+ * The pseudo umask-code is 0xX. The X equals the index of the fixed
+ * counter + 1, e.g., the fixed counter 2 has the pseudo-encoding 0x0300.
+ *
+ * The counts are available in separate MSRs:
  */
 
 /* Instr_Retired.Any: */
@@ -213,11 +225,16 @@ struct x86_pmu_capability {
 #define MSR_ARCH_PERFMON_FIXED_CTR1	0x30a
 #define INTEL_PMC_IDX_FIXED_CPU_CYCLES	(INTEL_PMC_IDX_FIXED + 1)
 
-/* CPU_CLK_Unhalted.Ref: */
+/* CPU_CLK_Unhalted.Ref: event=0x00,umask=0x3 (pseudo-encoding) */
 #define MSR_ARCH_PERFMON_FIXED_CTR2	0x30b
 #define INTEL_PMC_IDX_FIXED_REF_CYCLES	(INTEL_PMC_IDX_FIXED + 2)
 #define INTEL_PMC_MSK_FIXED_REF_CYCLES	(1ULL << INTEL_PMC_IDX_FIXED_REF_CYCLES)
 
+/* TOPDOWN.SLOTS: event=0x00,umask=0x4 (pseudo-encoding) */
+#define MSR_ARCH_PERFMON_FIXED_CTR3	0x30c
+#define INTEL_PMC_IDX_FIXED_SLOTS	(INTEL_PMC_IDX_FIXED + 3)
+#define INTEL_PMC_MSK_FIXED_SLOTS	(1ULL << INTEL_PMC_IDX_FIXED_SLOTS)
+
 /*
  * We model BTS tracing as another fixed-mode PMC.
  *


* [tip: perf/core] perf/x86/intel: Name the global status bit in NMI handler
  2020-07-23 17:11 ` [PATCH V7 02/14] perf/x86/intel: Name the global status bit in NMI handler kan.liang
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra, Kan Liang, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     60a2a271cf05cf046c522e1d7f62116b4bcb32a2
Gitweb:        https://git.kernel.org/tip/60a2a271cf05cf046c522e1d7f62116b4bcb32a2
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:05 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:34 +02:00

perf/x86/intel: Name the global status bit in NMI handler

Magic numbers are used in the current NMI handler for the global status
bits. Use meaningful names to replace the magic numbers and improve the
readability of the code.

Remove a tab from all GLOBAL_STATUS_* and INTEL_PMC_IDX_FIXED_BTS macro
definitions to reduce the line length.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-3-kan.liang@linux.intel.com
---
 arch/x86/events/intel/core.c      |  4 ++--
 arch/x86/include/asm/perf_event.h | 22 ++++++++++++----------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 5096347..ac1408f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2389,7 +2389,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	/*
 	 * PEBS overflow sets bit 62 in the global status register
 	 */
-	if (__test_and_clear_bit(62, (unsigned long *)&status)) {
+	if (__test_and_clear_bit(GLOBAL_STATUS_BUFFER_OVF_BIT, (unsigned long *)&status)) {
 		u64 pebs_enabled = cpuc->pebs_enabled;
 
 		handled++;
@@ -2410,7 +2410,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	/*
 	 * Intel PT
 	 */
-	if (__test_and_clear_bit(55, (unsigned long *)&status)) {
+	if (__test_and_clear_bit(GLOBAL_STATUS_TRACE_TOPAPMI_BIT, (unsigned long *)&status)) {
 		handled++;
 		if (unlikely(perf_guest_cbs && perf_guest_cbs->is_in_guest() &&
 			perf_guest_cbs->handle_intel_pt_intr))
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 0c1b137..fd3eba6 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -225,16 +225,18 @@ struct x86_pmu_capability {
  * values are used by actual fixed events and higher values are used
  * to indicate other overflow conditions in the PERF_GLOBAL_STATUS msr.
  */
-#define INTEL_PMC_IDX_FIXED_BTS				(INTEL_PMC_IDX_FIXED + 16)
-
-#define GLOBAL_STATUS_COND_CHG				BIT_ULL(63)
-#define GLOBAL_STATUS_BUFFER_OVF			BIT_ULL(62)
-#define GLOBAL_STATUS_UNC_OVF				BIT_ULL(61)
-#define GLOBAL_STATUS_ASIF				BIT_ULL(60)
-#define GLOBAL_STATUS_COUNTERS_FROZEN			BIT_ULL(59)
-#define GLOBAL_STATUS_LBRS_FROZEN_BIT			58
-#define GLOBAL_STATUS_LBRS_FROZEN			BIT_ULL(GLOBAL_STATUS_LBRS_FROZEN_BIT)
-#define GLOBAL_STATUS_TRACE_TOPAPMI			BIT_ULL(55)
+#define INTEL_PMC_IDX_FIXED_BTS			(INTEL_PMC_IDX_FIXED + 16)
+
+#define GLOBAL_STATUS_COND_CHG			BIT_ULL(63)
+#define GLOBAL_STATUS_BUFFER_OVF_BIT		62
+#define GLOBAL_STATUS_BUFFER_OVF		BIT_ULL(GLOBAL_STATUS_BUFFER_OVF_BIT)
+#define GLOBAL_STATUS_UNC_OVF			BIT_ULL(61)
+#define GLOBAL_STATUS_ASIF			BIT_ULL(60)
+#define GLOBAL_STATUS_COUNTERS_FROZEN		BIT_ULL(59)
+#define GLOBAL_STATUS_LBRS_FROZEN_BIT		58
+#define GLOBAL_STATUS_LBRS_FROZEN		BIT_ULL(GLOBAL_STATUS_LBRS_FROZEN_BIT)
+#define GLOBAL_STATUS_TRACE_TOPAPMI_BIT		55
+#define GLOBAL_STATUS_TRACE_TOPAPMI		BIT_ULL(GLOBAL_STATUS_TRACE_TOPAPMI_BIT)
 
 /*
  * We model guest LBR event tracing as another fixed-mode PMC like BTS.


* [tip: perf/core] perf/x86: Use event_base_rdpmc for the RDPMC userspace support
  2020-07-23 17:11 ` [PATCH V7 01/14] perf/x86: Use event_base_rdpmc for the RDPMC userspace support kan.liang
@ 2020-08-19  8:52   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 48+ messages in thread
From: tip-bot2 for Kan Liang @ 2020-08-19  8:52 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra, Kan Liang, x86, LKML

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     75608cb02ea5dd997990e2998eca3670cb71a18c
Gitweb:        https://git.kernel.org/tip/75608cb02ea5dd997990e2998eca3670cb71a18c
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Thu, 23 Jul 2020 10:11:04 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 18 Aug 2020 16:34:34 +02:00

perf/x86: Use event_base_rdpmc for the RDPMC userspace support

The RDPMC index is always re-calculated for the RDPMC userspace support,
which is unnecessary.

The RDPMC index value is already stored in the variable
event_base_rdpmc for kernel usage, and it can be used for the RDPMC
userspace support as well.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200723171117.9918-2-kan.liang@linux.intel.com
---
 arch/x86/events/core.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1cbf57d..8e108ea 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2208,17 +2208,12 @@ static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *m
 
 static int x86_pmu_event_idx(struct perf_event *event)
 {
-	int idx = event->hw.idx;
+	struct hw_perf_event *hwc = &event->hw;
 
-	if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
+	if (!(hwc->flags & PERF_X86_EVENT_RDPMC_ALLOWED))
 		return 0;
 
-	if (x86_pmu.num_counters_fixed && idx >= INTEL_PMC_IDX_FIXED) {
-		idx -= INTEL_PMC_IDX_FIXED;
-		idx |= 1 << 30;
-	}
-
-	return idx + 1;
+	return hwc->event_base_rdpmc + 1;
 }
 
 static ssize_t get_attr_rdpmc(struct device *cdev,


end of thread, other threads:[~2020-08-19  8:56 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-23 17:11 [PATCH V7 00/14] TopDown metrics support for Icelake kan.liang
2020-07-23 17:11 ` [PATCH V7 01/14] perf/x86: Use event_base_rdpmc for the RDPMC userspace support kan.liang
2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 02/14] perf/x86/intel: Name the global status bit in NMI handler kan.liang
2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 03/14] perf/x86/intel: Introduce the fourth fixed counter kan.liang
2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 04/14] perf/x86/intel: Move BTS index to 47 kan.liang
2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 05/14] perf/x86/intel: Fix the name of perf METRICS kan.liang
2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 06/14] perf/x86/intel: Use switch in intel_pmu_disable/enable_event kan.liang
2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 07/14] perf/core: Add a new PERF_EV_CAP_COEXIST event capability kan.liang
2020-07-24 10:55   ` peterz
2020-07-24 11:46     ` peterz
2020-07-24 13:43       ` Liang, Kan
2020-07-24 13:54         ` Peter Zijlstra
2020-07-24 14:19           ` Liang, Kan
2020-07-24 14:32             ` Peter Zijlstra
2020-07-24 14:46               ` Andi Kleen
2020-07-24 14:59                 ` Peter Zijlstra
2020-07-24 16:43                   ` peterz
2020-07-24 17:00                     ` Liang, Kan
2020-07-24 14:39       ` Andi Kleen
2020-07-24 14:51         ` Peter Zijlstra
2020-08-19  8:52   ` [tip: perf/core] perf/core: Add a new PERF_EV_CAP_SIBLING " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 08/14] perf/x86/intel: Generic support for hardware TopDown metrics kan.liang
2020-07-24 13:19   ` peterz
2020-07-24 15:27     ` peterz
2020-07-24 16:07       ` Liang, Kan
2020-07-24 19:10         ` Liang, Kan
2020-07-28 12:32           ` Peter Zijlstra
2020-07-28 13:09           ` Peter Zijlstra
2020-07-28 13:28             ` Liang, Kan
2020-07-28 13:44               ` peterz
2020-07-28 14:01                 ` Liang, Kan
2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 09/14] perf/x86: Add a macro for RDPMC offset of fixed counters kan.liang
2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 10/14] perf/x86/intel: Support TopDown metrics on Ice Lake kan.liang
2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 11/14] perf/x86/intel: Support per-thread RDPMC TopDown metrics kan.liang
2020-08-19  8:52   ` [tip: perf/core] " tip-bot2 for Kan Liang
2020-07-23 17:11 ` [PATCH V7 12/14] perf, tools, stat: Support new per thread " kan.liang
2020-07-24  3:29   ` Andi Kleen
2020-07-23 17:11 ` [PATCH V7 13/14] perf, tools, stat: Check Topdown Metric group kan.liang
2020-07-23 17:11 ` [PATCH V7 14/14] perf, tools: Add documentation for topdown metrics kan.liang
